Authors: Giménez, Jesús; Màrquez, Lluís; Amigo Cabrera, Enrique; Gonzalo Arroyo, Julio Antonio
Title: MT Evaluation: human-like vs. human acceptable
Type: Conference proceedings
Language: en
Date issued: 2006-07-17
Date deposited: 2024-05-21
URI: https://hdl.handle.net/20.500.14468/19991
Rights: info:eu-repo/semantics/openAccess

Abstract: We present a comparative study on Machine Translation Evaluation according to two different criteria: Human Likeness and Human Acceptability. We provide empirical evidence that there is a relationship between these two kinds of evaluation: Human Likeness implies Human Acceptability, but the reverse is not true. From the point of view of automatic evaluation, this implies that metrics based on Human Likeness are more reliable for system tuning. Our results also show that current evaluation metrics are not always able to distinguish between automatic and human translations. In order to improve the descriptive power of current metrics, we propose the use of additional syntax-based metrics and metric combinations inside the QARLA Framework.