Person: Ros Muñoz, Salvador
ORCID
0000-0001-6330-4958
Last name
Ros Muñoz
First name
Salvador
8 results
Search results
Showing 1 - 8 of 8
Publication: Test-driving information theory-based compositional distributional semantics: A case study on Spanish song lyrics (ELSEVIER, 2025-06-15)
Ghajari Espinosa, Adrián; Benito Santos, Alejandro; Ros Muñoz, Salvador; Fresno Fernández, Víctor Diego; González Blanco, Elena
Song lyrics pose unique challenges for semantic similarity assessment due to their metaphorical language, structural patterns, and cultural nuances, characteristics that often challenge standard natural language processing (NLP) approaches. These challenges stem from a tension between compositional and distributional semantics: while lyrics follow compositional structures, their meaning depends heavily on context and interpretation. The Information Theory-based Compositional Distributional Semantics framework offers a principled approach by integrating information theory with compositional rules and distributional representations. We evaluate eight embedding models on Spanish song lyrics, including multilingual, monolingual contextual, and static embeddings. Results show that multilingual models consistently outperform monolingual alternatives, with the domain-adapted ALBERTI achieving the highest F1 macro scores (78.92 ± 10.86). Our analysis reveals that monolingual models generate highly anisotropic embedding spaces, significantly impacting performance with traditional metrics. The Information Contrast Model metric proves particularly effective, providing improvements up to 18.04 percentage points over cosine similarity. Additionally, composition functions maintaining longer accumulated vector norms consistently outperform standard averaging approaches.
Our findings have important implications for NLP applications and challenge standard practices in similarity calculation, showing that effectiveness varies with both task nature and model characteristics.

Publication: TEI-friendly annotation scheme for medieval named entities: a case on a Spanish medieval corpus (Springer Nature, 2021-02-27)
Álvarez Mellado, Elena; Díez-Platas, María Luisa; Ruiz-Fabo, Pablo; Bermúdez, Helena; Ros Muñoz, Salvador; González-Blanco, Elena; Springer Nature
Medieval documents are a rich source of historical data. Performing named-entity recognition (NER) on this genre of texts can provide us with valuable historical evidence. However, traditional NER categories and schemes are usually designed with modern documents in mind (i.e. journalistic text) and the general-domain NER annotation schemes fail to capture the nature of medieval entities. In this paper we explore the challenges of performing named-entity annotation on a corpus of Spanish medieval documents: we discuss the mismatches that arise when applying traditional NER categories to a corpus of Spanish medieval documents and we propose a novel humanist-friendly TEI-compliant annotation scheme and guidelines intended to capture the particular nature of medieval entities.

Publication: Digital humanities in Spain: Historical perspective and current scenario (Ediciones Profesionales de la Información, 2020-12-19)
Toscano, Maurizio; Rabadán, Aroa; Ros Muñoz, Salvador; González-Blanco, Elena; Ediciones Profesionales de la Información
The objective of this study was to provide the global community of interested scholars with an updated understanding of Digital Humanities in Spain, in terms of researchers and research centres, disciplines involved and research topics of interest, trends in digital resources development, main funding bodies and the evolution of their investment since the early nineties.
One of the characteristics that differentiates this study from previous approaches is the information used to carry out the research. It combines large datasets of publicly available data from trusted sources with a handpicked selection of records grouping information scattered over the Web. Most of the evidence detected by other studies has been numerically confirmed. At the same time, the new metrics and values established constitute a reference base for monitoring the future evolution of the discipline and thus favour comparisons. Half of the researchers were found to be affiliated with only nine institutions, whereas the other half were scattered across 84 locations. Department affiliation showed a varied pattern of the different degrees of specialization in each institution. Although the major historic role played by Philology was confirmed, the rising interest of other areas of the Humanities and Social Science produced a wider picture, which helped to identify five large clusters of research topics, centred on major disciplines. The quantitative analysis of funding, a dimension almost unexplored in the Humanities, proved to be a valuable way to assess the discipline and its historical evolution. In fact, it revealed interesting trends that led to our proposal of a three-phase periodization in the consolidation of Digital Humanities in Spain. The paper concludes with a set of recommendations regarding how to successfully deal with issues that can harm the future development of this research area and the role that Spanish researchers can play in the international context.

Publication: When linguistics meets web technologies. Recent advances in modelling linguistic linked data (Sage Journals, 2022-06-15)
Fahad Khan, Anas; Chiarcos, Christian; Declerck, Thierry; Gifu, Daniela; Blanco García, Elena; Gracia, Jorge; Ionov, Maxim; Labropoulou, Penny; Mambrini, Francesco; McCrae, John P.; Pagé-Perron, Émilie; Passarotti, Marco; Ros Muñoz, Salvador; Truică, Ciprian-Octavian; IOS Press
This article provides a comprehensive and up-to-date survey of models and vocabularies for creating linguistic linked data (LLD), focusing on the latest developments in the area and both building upon and complementing previous works covering similar territory. The article begins with an overview of some recent trends which have had a significant impact on linked data models and vocabularies. Next, we give a general overview of existing vocabularies and models for different categories of LLD resource. We then look at some of the latest developments in community standards and initiatives, including descriptions of recent work on the OntoLex-Lemon model, a survey of recent initiatives in linguistic annotation and LLD, and a discussion of the LLD metadata vocabularies META-SHARE and lime. In the next part of the paper, we focus on the influence of projects on LLD models and vocabularies, starting with a general survey of relevant projects, before dedicating individual sections to a number of recent projects and their impact on LLD vocabularies and models. Finally, in the conclusion, we look ahead at some future challenges for LLD models and vocabularies.
The appendix to the paper consists of a brief introduction to the OntoLex-Lemon model.

Publication: Querying the Depths: Unveiling the Strengths and Struggles of Large Language Models in SPARQL Generation (Sociedad Española para el procesamiento del lenguaje natural, 2024-05-15)
Ghajari Espinosa, Adrián; Ros Muñoz, Salvador; Pérez Pozo, Álvaro; Fresno Fernández, Víctor Diego; SEPLN, Sociedad Española para el Procesamiento del lenguaje natural
In the quest to democratize access to databases and knowledge graphs, the ability to express queries in natural language and obtain the requested information becomes paramount, particularly for individuals lacking formal training in query languages. This situation affects SPARQL, the standard for querying ontology-based knowledge graphs, posing a significant barrier to many, hindering their ability to leverage these rich resources for research and analysis. To address this gap, our research delves into harnessing the power of Large Language Models (LLMs) to facilitate the generation of SPARQL queries directly from natural language descriptions. For this purpose, we have explored the most popular prompt engineering techniques, a powerful tool in crafting queries that help generative AI models understand and produce specific or generalized outputs based on the quality of provided prompts, without the need for additional training. By integrating few-shot learning (FSL), Chain-of-Thought (CoT) reasoning, and Retrieval-Augmented Generation (RAG), we devise prompts that streamline the creation of effective SPARQL queries, facilitating more straightforward access to ontology knowledge graphs. Our analysis involved prompts evaluated across three distinct LLMs: DeepSeek-Code 6.7b, CodeLlama-13b and GPT 3.5 TURBO. The comparative results revealed marginal variations in accuracy among these models, with FSL emerging as the most effective technique.
Our results highlight the potential of LLMs to make knowledge graphs more accessible to a broader audience, but also that much more research is needed to get results comparable to human performance.

Publication: Medieval Spanish (12th–15th centuries) named entity recognition and attribute annotation system based on contextual information (asis&t, 2020-08-19)
Díez Platas, Mª Luisa; Ros Muñoz, Salvador; González-Blanco, Elena; Ruiz Fabo, Pablo; Álvarez Mellado, Elena; Asis&t
The recognition of named entities in Spanish medieval texts presents great complexity, involving specific challenges: first, the complex morphosyntactic characteristics of proper-noun use in medieval texts; second, the lack of strict orthographic standards; finally, diachronic and geographical variations in Spanish from the 12th to the 15th century. In this period, named entities usually appear as complex text structures. For example, it was frequent to add nicknames and information about the person's role in society and geographic origin. To tackle this complexity, a named entity recognition and classification system has been implemented. The system uses contextual cues based on semantics to detect entities and assign a type. Given the occurrence of entities with attached attributes, entity contexts are also parsed to determine entity-type-specific dependencies for these attributes. Moreover, it uses a variant generator to handle the diachronic evolution of Spanish medieval terms from a phonetic and morphosyntactic viewpoint. The tool iteratively enriches its proper lexica, dictionaries, and gazetteers. The system was evaluated on a corpus of over 3,000 manually annotated entities of different types and periods, obtaining F1 scores between 0.74 and 0.87.
Attribute annotation was evaluated for person and role name attributes with an overall F1 of 0.75.

Publication: Automated Metric Analysis of Spanish Poetry: Two Complementary Approaches (IEEE Access, 2021-03-30)
Marco Remón, Guillermo; De la Rosa, Javier; Gonzalo Arroyo, Julio Antonio; Ros Muñoz, Salvador; González-Blanco, Elena; IEEE Access
The automatic metric analysis (commonly referred to as scansion) of Spanish poetry is not a trivial problem since it combines the nuances of the language, the different poetic traditions related to melodic patterns, and the personal stylistic preferences and intentions of the author. In this paper, we explore two alternative algorithmic approaches tailored to different application scenarios. The first approach, Rantanplan, is a rule-based method that consists of four Natural Language Processing modules that work together to perform scansion and other related analysis: Part of Speech tagging, syllabification, stress assignment, and metrical adjustment. The second approach, Jumper, explores the possibility of performing scansion without syllabification, with a twofold purpose: to minimize the errors propagated in different parts of the linguistic processing pipeline (including the syllabification step), and to improve the efficiency of the process. Both systems outperform the state of the art and provide either a more informative solution (suitable, for instance, for teaching purposes) or more efficient processing (when a correct scansion is all the linguistic knowledge required, as in scholarly philological studies).
The combined use of both systems turns out to provide a practical tool to clean up manual annotation errors in corpora.

Publication: DISCO PAL: Diachronic Spanish sonnet corpus with psychological and affective labels (Springer, 2021-10-13)
Barbado, Alberto; Fresno Fernández, Víctor Diego; Manjarrés Riesco, Ángeles; Ros Muñoz, Salvador
Nowadays, there are many applications of text mining over corpora from different languages. However, most of them are based on texts in prose, and applications that work with poetry texts are lacking. An example of an application of text mining in poetry is the usage of features derived from their individual words in order to capture the lexical, sublexical and interlexical meaning, and infer the General Affective Meaning (GAM) of the text. However, even though this proposal has proved useful for poetry in some languages, there is a lack of studies for both Spanish poetry and for highly-structured poetic compositions such as sonnets. This article presents a study over an annotated corpus of Spanish sonnets, in order to analyse whether it is possible to build features from their individual words for predicting their GAM. The purpose of this is to model sonnets at an affective level. The article also analyses the relationship between the GAM of the sonnets and the content itself. For this, we consider the content from a psychological perspective, identifying with tags when a sonnet is related to a specific term. Then, we study how GAM changes according to each of those psychological terms. The corpus used contains 274 Spanish sonnets from authors of different centuries, from the fifteenth to the nineteenth. This corpus was annotated by different domain experts. The experts annotated the poems with affective and lexico-semantic features, as well as with domain concepts that belong to psychology.
Thanks to this, the corpus of sonnets can be used in different applications, such as poetry recommender systems, personality text mining studies of the authors, or the usage of poetry for therapeutic purposes.