Publication: Is Anisotropy Really the Cause of BERT Embeddings not being Semantic?
Date
2022-09-01
Access rights
info:eu-repo/semantics/openAccess
Publisher
Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Inteligencia Artificial
Abstract
We conduct a set of experiments aimed at improving our understanding of the lack of semantic isometry (correspondence between the embedding space and the meaning space) of BERT's contextual word embeddings. Our empirical results show that, contrary to popular belief, anisotropy is not the root cause of the poor performance of these contextual models' embeddings in semantic tasks. What does affect both anisotropy and semantic isometry is a set of biased tokens that distort the space with non-semantic information. For each bias category (frequency, subword, punctuation, and case), we measure its magnitude and the effect of its removal. We show that these biases contribute to, but do not completely explain, the anisotropy and lack of semantic isometry of these models. We therefore hypothesise that identifying further biases will help correct the representation degradation problem. Finally, we propose a new similarity method aimed at smoothing the negative effect of biased tokens on semantic isometry and at making semantic similarity scores more explainable. We conduct an in-depth experimental study of this method, analyse its strengths and weaknesses, and propose future applications for it.
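Anisotropy, as studied here, is commonly estimated as the mean cosine similarity between contextual embeddings of tokens drawn from unrelated sentences. The following is a minimal sketch of such an estimate, together with a crude proxy for removing two of the bias categories named in the abstract (punctuation and subword tokens). It assumes the HuggingFace transformers and PyTorch libraries; the model choice, sentences, and filtering heuristic are illustrative assumptions, not the thesis's actual protocol.

    import string
    import torch
    from transformers import AutoTokenizer, AutoModel

    MODEL = "bert-base-uncased"  # illustrative choice, not the thesis's setup
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = AutoModel.from_pretrained(MODEL)
    model.eval()

    # Unrelated sentences: a high mean cosine similarity among their token
    # embeddings indicates an anisotropic (cone-shaped) embedding space.
    sentences = [
        "The cat sat on the mat.",
        "Quantum computers factor integers efficiently.",
        "She sold seashells by the seashore.",
    ]

    all_vecs, all_toks = [], []
    with torch.no_grad():
        for s in sentences:
            enc = tokenizer(s, return_tensors="pt")
            hidden = model(**enc).last_hidden_state[0]  # (seq_len, dim)
            toks = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
            all_vecs.append(hidden[1:-1])  # drop [CLS] and [SEP]
            all_toks.extend(toks[1:-1])

    def mean_cosine(vecs):
        """Mean pairwise cosine similarity over distinct token pairs."""
        v = torch.nn.functional.normalize(vecs, dim=-1)
        sim = v @ v.T
        off_diag = sim[~torch.eye(len(v), dtype=torch.bool)]
        return off_diag.mean().item()

    vecs = torch.cat(all_vecs)
    print(f"anisotropy estimate (all tokens): {mean_cosine(vecs):.3f}")

    # Crude proxy for two bias categories from the abstract: exclude
    # punctuation tokens and subword pieces (WordPiece '##' prefix).
    keep = [i for i, t in enumerate(all_toks)
            if t not in string.punctuation and not t.startswith("##")]
    print(f"anisotropy estimate (biased tokens removed): {mean_cosine(vecs[keep]):.3f}")

A drop in the estimate after filtering would be consistent with the abstract's claim that biased tokens contribute to, without fully explaining, the anisotropy of the space.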
Keywords
semantic textual similarity, sentence embeddings, transformers, natural language processing, deep learning
Center
Faculties and schools::E.T.S. de Ingeniería Informática
Department
Inteligencia Artificial