A refinement of the well-founded Information Content models with a very detailed experimental survey on WordNet.

Lastra-Díaz, Juan J. and García-Serrano, Ana, A refinement of the well-founded Information Content models with a very detailed experimental survey on WordNet. Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Lenguajes y Sistemas Informáticos

Files
Name Description MIME type Size
Refinement_Espace_LastraGarcia.pdf Full text (open access) application/pdf 745.12KB

Author(s) Lastra-Díaz, Juan J.
García-Serrano, Ana
Title of report A refinement of the well-founded Information Content models with a very detailed experimental survey on WordNet.
Parent publication NLP and IR Research Group
Additional notes Technical Report TR-2016-01
Publication date 2016-07-06
Subject(s) Computer science
Abstract In a recent paper, we introduced a new family of Information Content (IC) models based on the estimation of the conditional probability between child and parent concepts. This work is motivated by the finding of two drawbacks in the computational method of that family of IC models, as well as two further gaps in the literature. The first gap is that two of our cognitive IC models do not satisfy the axiom that constrains the sum of probabilities on the leaf nodes to be 1, whilst some ontologies with multiple inheritance could prevent the IC model from satisfying the growing monotonicity axiom in concepts with multiple parents. The second gap is the lack of a complete and updated experimental survey including a pairwise statistical significance analysis between most IC models and ontology-based similarity measures. Finally, a third gap is the lack of replication and confirmation of previous methods and results in most works. The latter two gaps are especially significant in the current state of the problem, in which there is no convincing winner within the family of intrinsic IC-based similarity measures and the performance margin is very narrow. In order to bridge the aforementioned gaps, this paper introduces the following contributions: (1) a refinement of our recent family of well-founded Information Content (IC) models; (2) eight new intrinsic IC models and one new corpus-based IC model; and (3) a very detailed experimental survey of ontology-based similarity measures and Information Content (IC) models on WordNet, including the evaluation and statistical significance analysis on the five most significant datasets of most ontology-based similarity measures and all WordNet-based IC models reported in the literature, with the only exception of the IC models recently introduced by Harispe et al. (2015a) and Ben Aouicha et al. (2016b).
The evaluation is entirely based on a Java software library called HESML, which has been developed by the authors in order to replicate all methods evaluated herein. The new IC models obtain results that rival the state-of-the-art methods and improve on our previous models, whilst the experimental survey allows a detailed and conclusive picture of the state of the problem to be drawn by setting the new state of the art and quantifying the main achievements of the last three decades.
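The two axioms named in the abstract (leaf probabilities summing to 1, and IC growing monotonically from parent to child) can be illustrated with a minimal sketch. This is not the authors' method and does not use HESML: the toy taxonomy, the function names, and the choice of uniform conditional probabilities p(child | parent) are all assumptions made for illustration, and the taxonomy is restricted to single inheritance, where both axioms hold by construction.

```python
import math

# Hypothetical toy taxonomy with single inheritance: child -> parent.
PARENT = {
    "animal": "entity",
    "plant": "entity",
    "dog": "animal",
    "cat": "animal",
    "bird": "animal",
}
ROOT = "entity"


def children_of(node):
    """Return the direct children of a node."""
    return [child for child, parent in PARENT.items() if parent == node]


def probability(node):
    """p(node) as the product of conditional probabilities along the path
    to the root, assuming a uniform p(child | parent) = 1 / #children."""
    if node == ROOT:
        return 1.0
    parent = PARENT[node]
    return probability(parent) / len(children_of(parent))


def information_content(node):
    """IC(node) = -log p(node)."""
    return -math.log(probability(node))


def leaves():
    """Return all nodes without children."""
    nodes = set(PARENT) | {ROOT}
    return [node for node in nodes if not children_of(node)]


def leaf_probabilities_sum_to_one(tol=1e-9):
    """Axiom check: the probabilities of the leaf nodes sum to 1."""
    return abs(sum(probability(node) for node in leaves()) - 1.0) < tol


def ic_is_monotonic():
    """Axiom check: a child is never less informative than its parent."""
    return all(
        information_content(child) >= information_content(parent)
        for child, parent in PARENT.items()
    )
```

With multiple inheritance a concept has several parents, so the single-path product used here no longer applies; that is the situation in which the abstract says the monotonicity axiom can fail.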
Keywords Intrinsic Information Content models
ontology-based semantic similarity measures
IC-based similarity measures
word similarity benchmark
semantic similarity
concept similarity model
experimental survey
Publisher(s) Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Lenguajes y Sistemas Informáticos
Format application/pdf
Total pages 44
Identifier bibliuned:DptoLSI-ETSI-Informes-Jlastra-refinement
http://e-spacio.uned.es/fez/view/bibliuned:DptoLSI-ETSI-Informes-Jlastra-refinement
Language eng
Access rights and license info:eu-repo/semantics/openAccess
accessCondition Open access
Publication version publishedVersion

 
Created: Mon, 11 Jul 2016, 20:31:47 CET