Publication:
An improved fuzzy system for representing web pages in clustering tasks

dc.contributor.author: Pérez García-Plaza, Alberto
dc.contributor.director: Fresno Fernández, Víctor Diego
dc.contributor.director: Martínez Unanue, Raquel
dc.date.accessioned: 2024-05-21T13:59:19Z
dc.date.available: 2024-05-21T13:59:19Z
dc.date.issued: 2012-10-23
dc.description.abstract: Keeping information organized is important for making information access easier. Although the information we need is sometimes available on the Web, it is only useful if we are able to find it. To this end, automatic techniques for grouping documents are increasingly used. This thesis focuses on document clustering, that is, grouping documents based on the similarity of their contents. Document representation plays a very important role in web page clustering and is the central research topic of this dissertation. Web pages are commonly written in HTML, a format widely used on the Internet, which offers explicit information (in the form of tags) about their visual presentation, the typography of the text, and its structure, among other aspects. The main goal of this thesis is to study in depth how to make the most of a fuzzy model for representing HTML documents in clustering tasks. Our study examines whether any part of the system could be exploited differently to improve clustering results. We begin by analyzing the parts of the system where there is room for improvement, and then study different alternatives for improving them. Thus, we do not propose a document representation from the outset; instead, we build it step by step, trying to understand each of its parts along the way. To evaluate our results and compare the different representation proposals, we use several previously gathered web page collections as gold standards. Clustering is performed with state-of-the-art algorithms, and our proposals are validated in both plain and hierarchical clustering settings. Lastly, we also test the usefulness of our approaches in two languages: English and Spanish. [en]
dc.description.version: final version
dc.identifier.uri: https://hdl.handle.net/20.500.14468/21060
dc.language.iso: en
dc.publisher: Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Lenguajes y Sistemas Informáticos
dc.relation.center: Escuela Internacional de Doctorado
dc.relation.department: Not applicable
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0
dc.subject.keywords: web pages
dc.title: An improved fuzzy system for representing web pages in clustering tasks [es]
dc.type: doctoral theses
dc.type: doctoral thesis [en]
dspace.entity.type: Publication
relation.isAuthorOfPublication: b0e47ca7-aeb3-4081-b492-5fee378bf392
relation.isAuthorOfPublication.latestForDiscovery: b0e47ca7-aeb3-4081-b492-5fee378bf392
Files
Original bundle
Name: Documento.pdf
Size: 3.11 MB
Format: Adobe Portable Document Format