Browsing by Author "Verdejo, Felisa"
Showing 1 - 4 of 4

Publication: A comparison of extrinsic clustering evaluation metrics based on formal constraints (Springer, 2009-05-11)
Artiles, Javier; Verdejo, Felisa; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio
There is a wide set of evaluation metrics available to compare the quality of text clustering algorithms. In this article, we define a few intuitive formal constraints on such metrics which shed light on which aspects of the quality of a clustering are captured by different metric families. These formal constraints are validated in an experiment involving human assessments, and compared with other constraints proposed in the literature. Our analysis of a wide range of metrics shows that only BCubed satisfies all formal constraints. We also extend the analysis to the problem of overlapping clustering, where items can simultaneously belong to more than one cluster. As BCubed cannot be directly applied to this task, we propose a modified version of BCubed that avoids the problems found with other metrics.

Publication: Combining evaluation metrics via the unanimous improvement ratio and its application in WePS clustering task (Association for the Advancement of Artificial Intelligence, 2011-12-01)
Artiles, Javier; Verdejo, Felisa; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio
Many Artificial Intelligence tasks cannot be evaluated with a single quality criterion and some sort of weighted combination is needed to provide system rankings. A problem with weighted combination measures is that slight changes in the relative weights may produce substantial changes in the system rankings. This paper introduces the Unanimous Improvement Ratio (UIR), a measure that complements standard metric combination criteria (such as van Rijsbergen's F-measure) and indicates how robust the measured differences are to changes in the relative weights of the individual metrics.
UIR is meant to elucidate whether a perceived difference between two systems is an artifact of how individual metrics are weighted. Besides discussing the theoretical foundations of UIR, this paper presents empirical results that confirm the validity and usefulness of the metric for the Text Clustering problem, where there is a tradeoff between precision- and recall-based metrics and results are particularly sensitive to the weighting scheme used to combine them. Remarkably, our experiments show that UIR can be used as a predictor of how well differences between systems measured on a given test bed will also hold in a different test bed.

Publication: EvALL: Open Access Evaluation for Information Access Systems (Association for Computing Machinery (ACM), 2017)
Almagro Cádiz, Mario; Rodríguez Vidal, Javier; Verdejo, Felisa; Amigo Cabrera, Enrique; Carrillo de Albornoz Cuadrado, Jorge Amando; Gonzalo Arroyo, Julio Antonio
The EvALL online evaluation service aims to provide a unified evaluation framework for Information Access systems that makes results completely comparable and publicly available for the whole research community.
For researchers working on a given test collection, the framework allows them to: (i) evaluate results in a way compliant with measurement theory and with state-of-the-art evaluation practices in the field; (ii) quantitatively and qualitatively compare their results with the state of the art; (iii) provide their results as reusable data to the scientific community; (iv) automatically generate evaluation figures and (low-level) interpretations of the results, both as a PDF report and as LaTeX source. For researchers running a challenge (a comparative evaluation campaign on shared data), the framework helps them to manage, store and evaluate submissions, and to preserve ground truth and system output data for future use by the research community. EvALL can be tested at http://evall.uned.es.

Publication: The contribution of linguistic features to automatic machine translation evaluation (2009-08-02)
Giménez, Jesús; Verdejo, Felisa; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio
A number of approaches to Automatic MT Evaluation based on deep linguistic knowledge have been suggested. However, n-gram-based metrics are still the dominant approach. The main reason is that the advantages of employing deeper linguistic information have not yet been clarified. In this work, we propose a novel approach for meta-evaluation of MT evaluation metrics, since correlation coefficients against human judges do not reveal details about the advantages and disadvantages of particular metrics. We then use this approach to investigate the benefits of introducing linguistic features into evaluation metrics. Overall, our experiments show that (i) both lexical and linguistic metrics present complementary advantages and (ii) combining both kinds of metrics yields the most robust meta-evaluation performance.