Publication:
Information Theory–based Compositional Distributional Semantics

dc.contributor.author: Amigo Cabrera, Enrique
dc.contributor.author: Ariza Casabona, Alejandro
dc.contributor.author: Fresno Fernández, Víctor Diego
dc.contributor.author: Martí, M. Antònia
dc.date.accessioned: 2025-12-02T13:03:47Z
dc.date.available: 2025-12-02T13:03:47Z
dc.date.issued: 2022-12-01
dc.description: The registered version of this article, first published in “Computational Linguistics 2022; 48 (4): 907–948”, is available online at the publisher's website: Massachusetts Institute of Technology Press, https://doi.org/10.1162/coli_a_00454
dc.description.abstract: In the context of text representation, Compositional Distributional Semantics models aim to fuse the Distributional Hypothesis and the Principle of Compositionality. Text embedding is based on co-occurrence distributions and the representations are in turn combined by compositional functions taking into account the text structure. However, the theoretical basis of compositional functions is still an open issue. In this article we define and study the notion of Information Theory–based Compositional Distributional Semantics (ICDS): (i) We first establish formal properties for embedding, composition, and similarity functions based on Shannon’s Information Theory; (ii) we analyze the existing approaches under this prism, checking whether or not they comply with the established desirable properties; (iii) we propose two parameterizable composition and similarity functions that generalize traditional approaches while fulfilling the formal properties; and finally (iv) we perform an empirical study on several textual similarity datasets that include sentences with a high and low lexical overlap, and on the similarity between words and their description. Our theoretical analysis and empirical results show that fulfilling formal properties positively affects the accuracy of text representation models in terms of correspondence (isometry) between the embedding and meaning spaces. (en)
dc.description.version: published version
dc.identifier.citation: Enrique Amigó, Alejandro Ariza-Casabona, Victor Fresno, M. Antònia Martí (2022); Information Theory–based Compositional Distributional Semantics. Computational Linguistics; 48 (4): 907–948. doi: https://doi.org/10.1162/coli_a_00454
dc.identifier.doi: https://doi.org/10.1162/coli_a_00454
dc.identifier.issn: 0891-2017
dc.identifier.uri: https://hdl.handle.net/20.500.14468/30978
dc.journal.issue: 4
dc.journal.title: Computational Linguistics
dc.journal.volume: 48
dc.language.iso: en
dc.page.final: 948
dc.page.initial: 907
dc.publisher: Massachusetts Institute of Technology Press
dc.relation.center: E.T.S. de Ingeniería Informática
dc.relation.department: Lenguajes y Sistemas Informáticos
dc.rights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/deed.es
dc.subject: 1203.04 Artificial intelligence
dc.subject: 5705.08 Semantics
dc.subject: 1203.11 Computer software
dc.title: Information Theory–based Compositional Distributional Semantics (en)
dc.type: article (es: artículo)
dc.type: journal article (en)
dspace.entity.type: Publication
relation.isAuthorOfPublication: f96c6e59-3a7a-4b0c-9b10-deec22f8c06b
relation.isAuthorOfPublication: 80cd3492-0ff8-4c8e-a904-2858623c7fc1
relation.isAuthorOfPublication.latestForDiscovery: f96c6e59-3a7a-4b0c-9b10-deec22f8c06b
Files
Original bundle
Name: coli_a_00454 (Amigo et al, 2022)_VICTOR DIEGO FRESNO.pdf
Size: 1.25 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 3.62 KB
Format: Item-specific license agreed to upon submission