Harnessing folksonomies for resource classification

Zubiaga Mendialdua, Arkaitz

Date

2011-07-12

Advisors

Fresno Fernández, Víctor Diego
Martínez Unanue, Raquel

Access rights

info:eu-repo/semantics/openAccess

Publisher

Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Lenguajes y Sistemas Informáticos

Abstract

In this thesis we address the problem of automatic resource classification, a task that is increasingly common and important in daily life. Cataloging books and organizing videos are two examples of activities for which an automatic classification process is increasingly necessary. We exploit the information contained in the annotations made by users of social tagging systems, which collect abundant metadata describing the content of different types of resources. So far, few works have exploited this metadata for this purpose, and those that have done so have been limited to statistical analyses. In this thesis we explore the characteristics of these systems, of the users involved in them, and of the annotations they contribute, in order to get the most out of these large collections and to make automatic resource classifiers as accurate as possible.
In our daily lives, organizing resources into a set of categories is a common task. Categories make searching through those resources easier by limiting the focus to a specific category, which significantly reduces the amount of information one must search. Categorization becomes more useful as a collection grows, because resources that are not organized appropriately become harder and harder to manage. Large collections such as books, movies, and web pages, for instance, are usually cataloged in libraries, organized in databases, and classified in directories, respectively. However, the sheer size of these collections makes organizing them manually a vast and very expensive endeavor. Recent research is moving towards automated classifiers that reduce the increasing cost and effort of the task. Most research in this field has focused on self-content, where the publisher is the only author, as the data source for discovering the aboutness of a resource. Self-content is not always representative enough, and it is sometimes difficult to access depending on the type of resource. Little work has analyzed the appropriateness of the annotations provided by users of social tagging systems as a data source, or explored how to harness them. Users of these systems save resources as bookmarks in a social environment and annotate them with tags. It has been shown that these tags facilitate retrieval of resources not only for the annotators themselves but also for the whole community. Likewise, these tags provide meaningful metadata that refers to the content of the resources. In this thesis, we use these user-provided tags to seek the most accurate classification of resources as compared to expert-driven categorizations.
After performing a set of experiments to choose a suitable classifier for this kind of task, we explore social annotations looking for the best way to use them. For this purpose, we have created three large-scale datasets including tagging data for resources from well-known social tagging systems: Delicious, LibraryThing, and GoodReads. Those resources are accompanied by categorization data from sound and consolidated expert-driven taxonomies, so that the appropriateness of social tags for predicting categories can be evaluated. Specifically, we first study several ways of representing the massive number of social tags by amalgamating the contributions of large communities of users. We analyze their suitability for the classification task, on both broader top-level categories and narrower deep-level categories. Then, we explore the nature, characteristics, and distributions of tags in folksonomies, in order to determine how the settings of each system affect tagging behavior and the usefulness of tags for the classification task. We go deeper into tag distributions by analyzing the usefulness of weighting schemes based on inverse frequency values. Finally, using state-of-the-art user behavior detection processes, we identify the users of social tagging systems whose tags better fit the classification task. To the best of our knowledge, this is the first research work performing actual classification experiments utilizing social tags. By exploring the characteristics and nature of these systems and the underlying folksonomies, this thesis sheds new light on how to get the most out of social tags for automated resource classification. Therefore, we believe that the contributions of this work are of great interest to future researchers in the field, as well as to the scientific community, in order to better understand these systems and further utilize the knowledge garnered from social tags.
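
As an illustration only, the following minimal Python sketch shows one way that tags contributed by many users could be aggregated per resource and then weighted by an inverse frequency value. It is not the representation or the weighting scheme evaluated in the thesis: the toy annotations, the function names, and the specific formula log(N / n_t) are assumptions made for this example.

```python
import math
from collections import Counter

# Hypothetical toy data (not from the thesis datasets):
# resource id -> list of (user, tag) annotations.
annotations = {
    "book_a": [("u1", "fantasy"), ("u2", "fantasy"), ("u2", "fiction"), ("u3", "dragons")],
    "book_b": [("u1", "fiction"), ("u3", "history"), ("u4", "history")],
    "book_c": [("u2", "fiction"), ("u4", "cooking")],
}

def tag_counts(annotations):
    """Merge the tags of all users into one tag -> count map per resource."""
    return {res: Counter(tag for _, tag in pairs) for res, pairs in annotations.items()}

def inverse_resource_frequency(counts):
    """Assumed weighting: log(N / n_t), with N resources and n_t resources carrying tag t."""
    n_resources = len(counts)
    # Iterating a Counter yields each distinct tag once, so this counts resources per tag.
    resources_per_tag = Counter(tag for c in counts.values() for tag in c)
    return {tag: math.log(n_resources / n) for tag, n in resources_per_tag.items()}

def weighted_vectors(counts, irf):
    """Scale each tag count by the inverse frequency weight of the tag."""
    return {res: {tag: tf * irf[tag] for tag, tf in c.items()} for res, c in counts.items()}

counts = tag_counts(annotations)
irf = inverse_resource_frequency(counts)
vectors = weighted_vectors(counts, irf)
```

In this toy example, a tag that appears on every resource ("fiction") receives weight zero, while a tag concentrated on a single resource ("fantasy") keeps a positive weight. The resulting sparse vectors could then be passed to any standard classifier.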

Keywords

internet resources, classification

Center

Escuela Internacional de Doctorado

Department

Not applicable

Handle

https://hdl.handle.net/20.500.14468/20974

Collections

Doctoral theses
