A refinement of the well-founded Information Content models with a very detailed experimental survey on WordNet.

Lastra-Díaz, Juan J.; García Serrano, Ana Mª

Access rights

info:eu-repo/semantics/openAccess

Publisher

Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Lenguajes y Sistemas Informáticos

Abstract

In a recent paper, we introduced a new family of Information Content (IC) models based on the estimation of the conditional probability between child and parent concepts. This work is motivated by the finding of two drawbacks in the computational method of that family of IC models, as well as two other gaps in the literature. The first drawback is that two of our cognitive IC models do not satisfy the axiom that constrains the sum of probabilities on the leaf nodes to be 1, whilst the second is that some ontologies with multiple inheritance could prevent the IC model from satisfying the growing monotonicity axiom in concepts with multiple parents. The first gap is the lack of a complete and updated experimental survey including a pairwise statistical significance analysis between most IC models and ontology-based similarity measures. The second gap is the lack of replication and confirmation of previous methods and results in most works. Both gaps are especially significant in the current state of the problem, in which there is no convincing winner within the family of intrinsic IC-based similarity measures and the performance margin is very narrow. To address these drawbacks and gaps, this paper introduces the following contributions: (1) a refinement of our recent family of well-founded IC models; (2) eight new intrinsic IC models and one new corpus-based IC model; and (3) a very detailed experimental survey of ontology-based similarity measures and IC models on WordNet, including the evaluation and statistical significance analysis, on the five most significant datasets, of most ontology-based similarity measures and all WordNet-based IC models reported in the literature, with the sole exception of the IC models recently introduced by Harispe et al. (2015a) and Ben Aouicha et al. (2016b).
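The two drawbacks named above can be reproduced on a toy taxonomy. The Java sketch below is NOT the authors' IC model and does not use the HESML API; every name in it (`IcAxiomsSketch`, `propagate`, `leafSum`, `monotonicityViolations`) is hypothetical. It assumes a hypothetical estimator in which p(child | parent) is split uniformly among a concept's children, and a hypothetical rule in which a concept with several parents sums the probability mass it receives from each of them. It then computes IC(c) = -ln p(c) and checks the two axioms: the leaf probabilities sum to 1, and IC never decreases from a parent to a child.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy check of two IC axioms; NOT the authors' IC model nor part of HESML. */
public class IcAxiomsSketch {

    /** Propagates p(c) top-down from a single root through a DAG given as parent -> children. */
    static Map<String, Double> propagate(Map<String, List<String>> children, String root) {
        // Count parents per concept so each concept is expanded only after all its parents (Kahn's algorithm).
        Map<String, Integer> pendingParents = new HashMap<>();
        for (List<String> kids : children.values())
            for (String k : kids) pendingParents.merge(k, 1, Integer::sum);

        Map<String, Double> p = new HashMap<>();
        p.put(root, 1.0);
        Deque<String> ready = new ArrayDeque<>(List.of(root));
        while (!ready.isEmpty()) {
            String parent = ready.poll();
            List<String> kids = children.getOrDefault(parent, List.of());
            for (String k : kids) {
                // Hypothetical estimator: p(child | parent) = 1 / #children.
                // Hypothetical aggregation: a concept with several parents sums the incoming mass.
                p.merge(k, p.get(parent) / kids.size(), Double::sum);
                if (pendingParents.merge(k, -1, Integer::sum) == 0) ready.add(k);
            }
        }
        return p;
    }

    /** Axiom 1: the probabilities of the leaf concepts should sum to 1. */
    static double leafSum(Map<String, List<String>> children, Map<String, Double> p) {
        double sum = 0.0;
        for (Map.Entry<String, List<String>> e : children.entrySet())
            if (e.getValue().isEmpty()) sum += p.get(e.getKey());
        return sum;
    }

    /** Axiom 2: IC(c) = -ln p(c) should never decrease from a parent to a child. */
    static List<String> monotonicityViolations(Map<String, List<String>> children, Map<String, Double> p) {
        List<String> violations = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : children.entrySet()) {
            double icParent = -Math.log(p.get(e.getKey()));
            for (String child : e.getValue()) {
                double icChild = -Math.log(p.get(child));
                if (icChild < icParent)
                    violations.add(String.format(Locale.ROOT, "IC(%s)=%.3f < IC(%s)=%.3f",
                            child, icChild, e.getKey(), icParent));
            }
        }
        return violations;
    }

    public static void main(String[] args) {
        // Toy taxonomy with multiple inheritance: C has two parents, A and B.
        Map<String, List<String>> children = new LinkedHashMap<>();
        children.put("R", List.of("A", "B"));
        children.put("A", List.of("C", "D"));
        children.put("B", List.of("C"));
        children.put("C", List.of());
        children.put("D", List.of());

        Map<String, Double> p = propagate(children, "R");
        System.out.printf(Locale.ROOT, "sum of leaf probabilities = %.2f%n", leafSum(children, p));
        monotonicityViolations(children, p).forEach(v -> System.out.println("monotonicity violated: " + v));
    }
}
```

Running `main` should report a leaf sum of 1.00 together with two violations, IC(C)=0.288 < IC(A)=0.693 and IC(C)=0.288 < IC(B)=0.693: summing the mass from both parents preserves the first axiom but gives C (p = 0.75) a higher probability than either parent (p = 0.5). Swapping in a different hypothetical aggregation rule, such as taking the minimum incoming contribution, restores monotonicity on this toy graph (p(C) = 0.25) but drops the leaf sum to 0.5, so the two axioms pull in different directions when concepts have multiple parents.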
The evaluation is entirely based on HESML, a Java software library developed by the authors to replicate all the methods evaluated herein. The new IC models obtain results that rival the state-of-the-art methods and improve on our previous models, whilst the experimental survey allows a detailed and conclusive picture of the state of the problem to be drawn, setting the new state of the art and quantifying the main achievements of the last three decades.

Keywords

Intrinsic Information Content models, ontology-based semantic similarity measures, IC-based similarity measures, word similarity benchmark, semantic similarity, concept similarity model, experimental survey

School

E.T.S. de Ingeniería Informática

Department

Lenguajes y Sistemas Informáticos

Handle

https://hdl.handle.net/20.500.14468/9886

Collections

Reports