Publication: Is Anisotropy Really the Cause of BERT Embeddings not being Semantic?
dc.contributor.author | Fuster Baggetto, Alejandro | |
dc.contributor.director | Fresno Fernández, Víctor | |
dc.date.accessioned | 2024-05-20T12:26:37Z | |
dc.date.available | 2024-05-20T12:26:37Z | |
dc.date.issued | 2022-09-01 | |
dc.description.abstract | We conduct a set of experiments aimed at improving our understanding of the lack of semantic isometry (correspondence between the embedding and meaning spaces) of the contextual word embeddings of BERT. Our empirical results show that, contrary to popular belief, anisotropy is not the root cause of the poor performance of these contextual models’ embeddings in semantic tasks. What does affect both anisotropy and semantic isometry is a set of biased tokens that distort the space with non-semantic information. For each bias category (frequency, subword, punctuation, and case), we measure its magnitude and the effect of its removal. We show that these biases contribute to, but do not completely explain, the anisotropy and lack of semantic isometry of these models. Therefore, we hypothesise that the discovery of new biases will contribute to the objective of correcting the representation degradation problem. Finally, we propose a new similarity method aimed at smoothing the negative effect of biased tokens on semantic isometry and at increasing the explainability of semantic similarity scores. We conduct an in-depth experimentation with this method, analyse its strengths and weaknesses, and propose future applications for it. | en |
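The anisotropy the abstract refers to is commonly quantified as the expected cosine similarity between the embeddings of random word pairs: near 0 for an isotropic space, near 1 when all vectors occupy a narrow cone. A minimal sketch of that metric (using random vectors in place of actual BERT embeddings, so the numbers are purely illustrative):

```python
import numpy as np

def anisotropy(embs: np.ndarray) -> float:
    """Mean pairwise cosine similarity between distinct embeddings.

    Values near 0 indicate an isotropic space; values near 1 indicate
    that all embeddings point in roughly the same direction.
    """
    normed = embs / np.linalg.norm(embs, axis=1, keepdims=True)
    sims = normed @ normed.T
    n = len(embs)
    # Subtract the n self-similarities on the diagonal, then average
    # over the n * (n - 1) remaining ordered pairs.
    return float((sims.sum() - n) / (n * (n - 1)))

rng = np.random.default_rng(0)
isotropic = rng.standard_normal((100, 768))  # directions spread uniformly
anisotropic = isotropic + 5.0                # shared offset squeezes vectors into a cone
print(anisotropy(isotropic))    # close to 0
print(anisotropy(anisotropic))  # close to 1
```

Removing a dominant shared direction (as bias-correction methods for contextual embeddings do) moves the second score back toward the first, which is the kind of effect the thesis measures per bias category.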
dc.description.version | final version | |
dc.identifier.uri | https://hdl.handle.net/20.500.14468/14260 | |
dc.language.iso | en | |
dc.publisher | Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Inteligencia Artificial | |
dc.relation.center | Facultades y escuelas::E.T.S. de Ingeniería Informática | |
dc.relation.degree | Máster universitario en Investigación en Inteligencia Artificial | |
dc.relation.department | Inteligencia Artificial | |
dc.rights | info:eu-repo/semantics/openAccess | |
dc.rights.uri | https://creativecommons.org/licenses/by-nc-nd/4.0/deed.es | |
dc.subject.keywords | semantic textual similarity | |
dc.subject.keywords | sentence embeddings | |
dc.subject.keywords | transformers | |
dc.subject.keywords | natural language processing | |
dc.subject.keywords | deep learning | |
dc.title | Is Anisotropy Really the Cause of BERT Embeddings not being Semantic? | es |
dc.type | tesis de maestría | es |
dc.type | master thesis | en |
dspace.entity.type | Publication |
Files
Original bundle
1 - 1 of 1
- Name: Fuster_Baggetto_Alejandro_TFM.pdf
- Size: 1.67 MB
- Format: Adobe Portable Document Format