Publication: Evaluation of unsupervised clustering algorithms for variable stars data
Date
2008-09
Access rights
info:eu-repo/semantics/openAccess
Publisher
Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Inteligencia Artificial.
Abstract
The aim of this master's thesis is to assess the validity of unsupervised clustering algorithms for classifying variable star data for the Gaia mission. These techniques identify natural groupings without any prior information about the classes and their distribution and can therefore reveal new classes of objects. To this end, we evaluate two probabilistic algorithms: Autoclass, in which each cluster is characterized by a parametric distribution, and HMAC (Hierarchical Mode Association Clustering), in which each cluster is characterized by a nonparametric distribution within a hierarchical clustering. Both methods are evaluated against the same criteria (reproducibility, computation time, sensitivity to new classes, and interpretability) on datasets that can grow up to 10^8 instances. These criteria are a first step in assessing whether an algorithm is feasible to apply, but they are not sufficient to judge the quality of the clustering results. Despite the widespread use of unsupervised clustering techniques, evaluating clustering performance remains an open question; it involves determining how many clusters are actually present and how real the clustering itself is. Our clustering evaluation starts by applying expert knowledge and using a labeled dataset, which allows some clusters to be matched with some variable star types, but this is not enough to reach the objective of identifying every cluster. A review of existing indices for evaluating clustering with objective criteria is included. Clusters and data are then analyzed to understand the results obtained with both methods, which are biased by the method itself. A method for combining the clusterings of these two algorithms is also tested as a technique that optimizes multiple objective functions and tries to avoid some limitations of both algorithms.
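As an illustration of the step in which a labeled dataset is used to match some clusters with variable star types, the following is a minimal sketch and not the procedure used in the thesis. It assumes a simple majority-vote rule; the function name `match_clusters_to_classes`, the `min_purity` threshold, and the toy labels are hypothetical.

```python
from collections import Counter, defaultdict


def match_clusters_to_classes(cluster_ids, class_labels, min_purity=0.8):
    """Map each cluster to (majority class or None, purity among labeled members).

    Assumption: a cluster is matched to a class only when the fraction of its
    labeled members belonging to the majority class reaches `min_purity`.
    """
    votes = defaultdict(Counter)
    for cid, label in zip(cluster_ids, class_labels):
        if label is not None:  # unlabeled objects do not vote
            votes[cid][label] += 1

    matches = {}
    for cid in set(cluster_ids):
        counts = votes.get(cid)
        if not counts:
            # No labeled members: the cluster cannot be identified this way.
            matches[cid] = (None, 0.0)
            continue
        label, hits = counts.most_common(1)[0]
        purity = hits / sum(counts.values())
        matches[cid] = (label if purity >= min_purity else None, purity)
    return matches


# Toy data (hypothetical): cluster assignments and partial expert labels.
cluster_ids = [0, 0, 0, 1, 1, 1, 2, 2]
class_labels = ["RRLyr", "RRLyr", "RRLyr", "Cepheid", "Mira", "Cepheid", None, None]

matches = match_clusters_to_classes(cluster_ids, class_labels)
for cid in sorted(matches):
    print(cid, matches[cid])
```

In this toy case only the first cluster is matched. The second is too mixed and the third has no labeled members, which mirrors the point that labels alone do not identify every cluster.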
Keywords
Unsupervised clustering, Autoclass, HMAC, model-based clustering, hierarchical clustering, validation indices
Center
Faculties and schools::E.T.S. de Ingeniería Informática
Department
Inteligencia Artificial