Person: Pérez García-Plaza, Alberto
ORCID
0000-0002-2710-9319
Surname
Pérez García-Plaza
Given name
Alberto
Publication
An improved fuzzy system for representing web pages in clustering tasks
(Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Lenguajes y Sistemas Informáticos, 2012-10-23)
Pérez García-Plaza, Alberto; Fresno Fernández, Víctor; Martínez Unanue, Raquel

Keeping information organized is important for making it easier to access. Although the information we need is sometimes available on the Web, it is only useful if we are able to find it. For this reason, automatic techniques for grouping documents are increasingly used.

In this thesis we are interested in document clustering, that is, grouping documents based on the similarity of their contents. In this regard, document representation plays a very important role in web page clustering and constitutes the central point of research of this dissertation. Web pages are commonly written in HTML, a format widely used on the Internet that offers explicit information (tags, in this case) about their visual representation, the typography of the text, and its structure, among other things.

The main goal of this thesis is to perform an in-depth study of how to make the most of a fuzzy model for representing HTML documents in clustering tasks. Our study explores whether any part of the system could be exploited differently to improve clustering results. We begin by analyzing the parts of the system where there is room for improvement and then study different alternatives for improving them. Thus, we do not propose a document representation from the outset; instead, we build it while trying to understand its different parts at each step.

To evaluate our results and compare the different representation proposals, we use several web page collections previously gathered to serve as gold standards. Clustering is performed using state-of-the-art algorithms, and our proposals are validated in both flat and hierarchical clustering settings. Lastly, we also test the usefulness of our approaches in two languages: English and Spanish.