Person:
Amigo Cabrera, Enrique

ORCID
0000-0003-1482-824X
Surname
Amigo Cabrera
First name
Enrique

Search results

Showing 1 - 4 of 4
  • Publication
    MT Evaluation : human-like vs. human acceptable
    (2006-07-17) Giménez, Jesús; Màrquez, Lluís; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio
    We present a comparative study on Machine Translation Evaluation according to two different criteria: Human Likeness and Human Acceptability. We provide empirical evidence that there is a relationship between these two kinds of evaluation: Human Likeness implies Human Acceptability but the reverse is not true. From the point of view of automatic evaluation, this implies that metrics based on Human Likeness are more reliable for system tuning. Our results also show that current evaluation metrics are not always able to distinguish between automatic and human translations. In order to improve the descriptive power of current metrics, we propose the use of additional syntax-based metrics, and metric combinations inside the QARLA Framework.
  • Publication
    A comparison of extrinsic clustering evaluation metrics based on formal constraints
    (Springer, 2009-05-11) Artiles, Javier; Verdejo, Felisa; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio
    There is a wide set of evaluation metrics available to compare the quality of text clustering algorithms. In this article, we define a few intuitive formal constraints on such metrics which shed light on which aspects of the quality of a clustering are captured by different metric families. These formal constraints are validated in an experiment involving human assessments, and compared with other constraints proposed in the literature. Our analysis of a wide range of metrics shows that only BCubed satisfies all formal constraints. We also extend the analysis to the problem of overlapping clustering, where items can simultaneously belong to more than one cluster. As BCubed cannot be directly applied to this task, we propose a modified version of BCubed that avoids the problems found with other metrics.
  • Publication
    The contribution of linguistic features to automatic machine translation evaluation
    (2009-08-02) Giménez, Jesús; Verdejo, Felisa; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio
    A number of approaches to Automatic MT Evaluation based on deep linguistic knowledge have been suggested. However, n-gram based metrics are still the dominant approach. The main reason is that the advantages of employing deeper linguistic information have not been clarified yet. In this work, we propose a novel approach for meta-evaluation of MT evaluation metrics, since correlation coefficients against human judges do not reveal details about the advantages and disadvantages of particular metrics. We then use this approach to investigate the benefits of introducing linguistic features into evaluation metrics. Overall, our experiments show that (i) both lexical and linguistic metrics present complementary advantages and (ii) combining both kinds of metrics yields the most robust meta-evaluation performance.
  • Publication
    Combining evaluation metrics via the unanimous improvement ratio and its application in WePS clustering task
    (Association for the Advancement of Artificial Intelligence, 2011-12-01) Artiles, Javier; Verdejo, Felisa; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio
    Many Artificial Intelligence tasks cannot be evaluated with a single quality criterion and some sort of weighted combination is needed to provide system rankings. A problem of weighted combination measures is that slight changes in the relative weights may produce substantial changes in the system rankings. This paper introduces the Unanimous Improvement Ratio (UIR), a measure that complements standard metric combination criteria (such as van Rijsbergen's F-measure) and indicates how robust the measured differences are to changes in the relative weights of the individual metrics. UIR is meant to elucidate whether a perceived difference between two systems is an artifact of how individual metrics are weighted. Besides discussing the theoretical foundations of UIR, this paper presents empirical results that confirm the validity and usefulness of the metric for the Text Clustering problem, where there is a tradeoff between precision- and recall-based metrics and results are particularly sensitive to the weighting scheme used to combine them. Remarkably, our experiments show that UIR can be used as a predictor of how well differences between systems measured on a given test bed will also hold in a different test bed.
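
The idea behind UIR in the last abstract above can be illustrated with a minimal sketch. It assumes the commonly cited formulation, in which UIR(a, b) is the number of test cases where system a is at least as good as b on every metric, minus the number where b is at least as good as a on every metric, divided by the total number of test cases. The function names (`unanimously_ge`, `uir`) and the sample scores are hypothetical and are not taken from the paper; the paper itself should be consulted for the exact definition.

```python
from typing import Sequence

Scores = Sequence[float]  # one value per metric, e.g. (precision, recall)


def unanimously_ge(a: Scores, b: Scores) -> bool:
    """True if a is at least as good as b on every metric."""
    return all(x >= y for x, y in zip(a, b))


def uir(cases_a: Sequence[Scores], cases_b: Sequence[Scores]) -> float:
    """Unanimous Improvement Ratio of system a over system b (assumed formulation).

    cases_a[i] and cases_b[i] hold the metric scores of each system
    on the same test case i.
    """
    if len(cases_a) != len(cases_b) or not cases_a:
        raise ValueError("both systems need scores for the same non-empty test set")
    a_wins = sum(unanimously_ge(a, b) for a, b in zip(cases_a, cases_b))
    b_wins = sum(unanimously_ge(b, a) for a, b in zip(cases_a, cases_b))
    # Cases tied on every metric count for both systems and cancel out.
    return (a_wins - b_wins) / len(cases_a)


# Hypothetical (precision, recall) scores on four test cases.
system_a = [(0.8, 0.7), (0.6, 0.9), (0.5, 0.5), (0.9, 0.4)]
system_b = [(0.7, 0.6), (0.7, 0.8), (0.5, 0.5), (0.6, 0.3)]

print(uir(system_a, system_b))  # 0.5
```

In this toy data, system a improves on b unanimously in two test cases, the systems trade precision against recall in one, and they tie in one. Only the unanimous cases move the score, which is why the result does not depend on how precision and recall would be weighted in a combined measure.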