Person:
Ros Muñoz, Salvador

ORCID
0000-0001-6330-4958
Last name
Ros Muñoz
First name
Salvador

Search results

Showing 1 - 10 of 15
  • Publication
    TEI-friendly annotation scheme for medieval named entities: a case on a Spanish medieval corpus
    (Springer, 2021-02-27) Álvarez Mellado, Elena; Díez Platas, María Luisa; Ruiz Fabo, Pablo; Bermúdez, Helena; Ros Muñoz, Salvador; González Blanco, Elena
    Medieval documents are a rich source of historical data. Performing named-entity recognition (NER) on this genre of texts can provide us with valuable historical evidence. However, traditional NER categories and schemes are usually designed with modern documents in mind (i.e. journalistic text) and the general-domain NER annotation schemes fail to capture the nature of medieval entities. In this paper we explore the challenges of performing named-entity annotation on a corpus of Spanish medieval documents: we discuss the mismatches that arise when applying traditional NER categories to a corpus of Spanish medieval documents and we propose a novel humanist-friendly TEI-compliant annotation scheme and guidelines intended to capture the particular nature of medieval entities.
  • Publication
    Exploring Spanish contemporary song lyrics through digital humanities methods: some thematic and structural properties
    (Oxford University Press, 2021-11-08) Hernández Lorenzo, Laura; Díaz Paredes, Aitor; Pérez Pozo, Álvaro; Ros Muñoz, Salvador; González Blanco, Elena
    In this article, we present a quantitative study with Digital Humanities methods on an extensive corpus of Spanish contemporary song lyrics, a type of text related to poetry. On the one hand, poetry and songs not only have been connected since their origins, but they share some characteristics, such as the division in lines or the use of rhymes. On the other hand, Digital Humanities quantitative approaches have already been applied to poetry, but we still lack a study in the same fashion for lyrics. Taking advantage of the advances in automatic scansion and syllabification, rhyme detection, or Topic Modeling technologies, the present study analyzes Spanish contemporary song lyrics’ main thematic and structural properties, comparing them with those used in poetic texts. Our results offered new insights into the characteristics of the analyzed texts and their connections to poetic ones.
  • Publication
    EVI-LINHD, a virtual research environment for the Spanish speaking community
    (Oxford University Press, 2017-12) González-Blanco García, Elena; Rio Riande, Gimena del; Díez Platas, María Luisa; Olmo, Álvaro del; Urízar, Miguel; Martínez Cantón, Clara Isabel; Ros Muñoz, Salvador; Pastor Vargas, Rafael; Robles Gómez, Antonio; Caminero Herráez, Agustín Carlos
    Laboratorio de Innovación en Humanidades Digitales (UNED) has developed Entorno Virtual de Investigación del Laboratorio de Innovación en Humanidades Digitales (EVI-LINHD), the first virtual research environment devoted mainly to Spanish speakers interested in digital scholarly edition. EVI-LINHD combines different open-source software for developing a complete digital project: (1) a Web-based application markup tool, TEIscribe, combined with an eXistdb solution and a TEIPublisher platform, (2) Omeka for digital libraries, and (3) WordPress for simple Web pages. All these instances are linked to a local installation of the LINDAT/Common Language Resources and Technology Infrastructure (CLARIN) digital repository. LINDAT/CLARIN allows EVI-LINHD users to have their projects deposited and stored safely. Thanks to this solution, EVI-LINHD projects also improve their visibility. The specific metadata profile used in the repository is based on Dublin Core, and it is enriched with the Spanish translation of DARIAH’s Taxonomy of Digital Research Activities in the Humanities.
  • Publication
    Digital humanities in Spain: historical perspective and current scenario
    (Ediciones Profesionales de la Información (EPI), 2020-12-19) Toscano, Maurizio; Rabadán, Aroa; Ros Muñoz, Salvador; González-Blanco, Elena
    The objective of this study was to provide the global community of interested scholars with an updated understanding of Digital Humanities in Spain, in terms of researchers and research centres, disciplines involved and research topics of interest, trends in digital resources development, main funding bodies and the evolution of their investment since the early nineties. One of the characteristics that differentiates this study from previous approaches is the information used to carry out the research. It combines large datasets of publicly available data from trusted sources with a handpicked selection of records grouping information scattered over the Web. Most of the evidence detected by other studies has been numerically confirmed. At the same time, the new metrics and values established constitute a reference base for monitoring the future evolution of the discipline and thus favour comparisons. Half of the researchers were found to be affiliated to only nine institutions, whereas the other half of them were scattered across 84 locations. Department affiliation showed a varied pattern of the different degrees of specialization in each institution. Although the major historic role played by Philology was confirmed, the rising interest of other areas of the Humanities and Social Science produces a wider picture, which helped to identify five large clusters of research topics, centred on major disciplines. The quantitative analysis of funding, a dimension almost unexplored in the Humanities, proved to be a valuable way to assess the discipline and its historical evolution. In fact, it revealed interesting trends that led to our proposal of a three-phase periodization in the consolidation of Digital Humanities in Spain. The paper concludes with a set of recommendations regarding how to successfully deal with issues that can harm the future development of this research area and the role that Spanish researchers can play in the international context.
  • Publication
    Test-driving information theory-based compositional distributional semantics: A case study on Spanish song lyrics
    (Elsevier, 2025-06-15) Ghajari Espinosa, Adrián; Benito Santos, Alejandro; Ros Muñoz, Salvador; Fresno Fernández, Víctor Diego; González Blanco, Elena
    Song lyrics pose unique challenges for semantic similarity assessment due to their metaphorical language, structural patterns, and cultural nuances - characteristics that often challenge standard natural language processing (NLP) approaches. These challenges stem from a tension between compositional and distributional semantics: while lyrics follow compositional structures, their meaning depends heavily on context and interpretation. The Information Theory-based Compositional Distributional Semantics framework offers a principled approach by integrating information theory with compositional rules and distributional representations. We evaluate eight embedding models on Spanish song lyrics, including multilingual, monolingual contextual, and static embeddings. Results show that multilingual models consistently outperform monolingual alternatives, with the domain-adapted ALBERTI achieving the highest F1 macro scores (78.92 ± 10.86). Our analysis reveals that monolingual models generate highly anisotropic embedding spaces, significantly impacting performance with traditional metrics. The Information Contrast Model metric proves particularly effective, providing improvements up to 18.04 percentage points over cosine similarity. Additionally, composition functions maintaining longer accumulated vector norms consistently outperform standard averaging approaches. Our findings have important implications for NLP applications and challenge standard practices in similarity calculation, showing that effectiveness varies with both task nature and model characteristics.
  • Publication
    DISCO PAL: Diachronic Spanish sonnet corpus with psychological and affective labels
    (Springer, 2021-10-13) Barbado, Alberto; Fresno Fernández, Víctor Diego; Manjarrés Riesco, Ángeles; Ros Muñoz, Salvador
    Nowadays, there are many applications of text mining over corpora from different languages. However, most of them are based on texts in prose, lacking applications that work with poetry texts. An example of an application of text mining in poetry is the usage of features derived from their individual words in order to capture the lexical, sublexical and interlexical meaning, and infer the General Affective Meaning (GAM) of the text. However, even though this proposal has been proved as useful for poetry in some languages, there is a lack of studies for both Spanish poetry and for highly-structured poetic compositions such as sonnets. This article presents a study over an annotated corpus of Spanish sonnets, in order to analyse if it is possible to build features from their individual words for predicting their GAM. The purpose of this is to model sonnets at an affective level. The article also analyses the relationship between the GAM of the sonnets and the content itself. For this, we consider the content from a psychological perspective, identifying with tags when a sonnet is related to a specific term. Then, we study how GAM changes according to each of those psychological terms. The corpus used contains 274 Spanish sonnets from authors of different centuries, from fifteenth to nineteenth. This corpus was annotated by different domain experts. The experts annotated the poems with affective and lexico-semantic features, as well as with domain concepts that belong to psychology. Thanks to this, the corpus of sonnets can be used in different applications, such as poetry recommender systems, personality text mining studies of the authors, or the usage of poetry for therapeutic purposes.
  • Publication
    When linguistics meets web technologies. Recent advances in modelling linguistic linked data
    (IOS Press, 2022-06-15) Fahad Khan, Anas; Chiarcos, Christian; Declerck, Thierry; Gifu, Daniela; Blanco García, Elena; Gracia, Jorge; Ionov, Maxim; Labropoulou, Penny; Mambrini, Francesco; McCrae, John P.; Pagé-Perron, Émilie; Passarotti, Marco; Ros Muñoz, Salvador; Truică, Ciprian-Octavian
    This article provides a comprehensive and up-to-date survey of models and vocabularies for creating linguistic linked data (LLD) focusing on the latest developments in the area and both building upon and complementing previous works covering similar territory. The article begins with an overview of some recent trends which have had a significant impact on linked data models and vocabularies. Next, we give a general overview of existing vocabularies and models for different categories of LLD resource. After which we look at some of the latest developments in community standards and initiatives including descriptions of recent work on the OntoLex-Lemon model, a survey of recent initiatives in linguistic annotation and LLD, and a discussion of the LLD metadata vocabularies META-SHARE and lime. In the next part of the paper, we focus on the influence of projects on LLD models and vocabularies, starting with a general survey of relevant projects, before dedicating individual sections to a number of recent projects and their impact on LLD vocabularies and models. Finally, in the conclusion, we look ahead at some future challenges for LLD models and vocabularies. The appendix to the paper consists of a brief introduction to the OntoLex-Lemon model.
  • Publication
    Querying the depths: Unveiling the strengths and struggles of large language models in SPARQL generation
    (Sociedad Española para el procesamiento del lenguaje natural, 2024) Ghajari Espinosa, Adrián; Ros Muñoz, Salvador; Pérez Pozo, Álvaro
    The emergence of the Semantic Web has precipitated a proliferation of structured data manifested in the form of knowledge graphs, underscoring the imperative of natural language interfaces to enhance accessibility to these repositories of information. The capacity to articulate queries in natural language and subsequently retrieve data through SPARQL queries assumes paramount importance. In the present investigation, we have scrutinized the efficacy of in-context learning based on an agent-based architecture in facilitating the construction of SPARQL queries. Contrary to initial expectations, the augmentation of in-context learning prompts through agent-based mechanisms has been found to diminish the efficacy of Language Model-based Systems (LLMS), as it is perceived as extraneous "noise," thereby delineating the constraints inherent in this approach. The results highlight the need to delve deeper into the intricacies of model training and fine-tuning, focusing on the relational aspects of ontology schemas.
  • Publication
    Hispanic Medieval Tagger (HisMeTag): una aplicación web para el etiquetado de entidades en textos medievales [Hispanic Medieval Tagger (HisMeTag): a web application for tagging entities in medieval texts]
    Díez Platas, María Luisa; González-Blanco García, Elena; Rio Riande, Gimena del; Tobarra Abad, María de los Llanos; Ros Muñoz, Salvador; Robles Gómez, Antonio; Caminero Herráez, Agustín Carlos
    HisMeTag locates named entities in texts written in medieval Spanish by means of an automatic named-entity recognition (NER) process and NLP techniques for linguistic processing and for generating the different variants that existed in the medieval period. It locates and tags known terms and proposes new terms for validation.
  • Publication
    The Automatic Quantitative Metrical Analysis of Spanish Poetry with Rantanplan: A Preliminary Approach
    (ICL CAS, 2021) Hernández Lorenzo, Laura; Sisto, Mirella De; Pérez Pozo, Álvaro; Rosa, Javier de la; Ros Muñoz, Salvador; González Blanco, Elena; Plecháč, P.; Kolár, R.; Bories, A.; Říha, J.
    In this paper, we present a quantitative approach to Spanish poetry and versification based on the application of our own automatic metrical tool, Rantanplan, to the complete poetic works of four early modern Spanish poets. All of the poetry of these four representative authors—Garcilaso de la Vega (1503–1536), Fernando de Herrera (1534–1597), Luis de Góngora (1561–1627), and Lope de Vega (1562–1635)—was automatically processed and stress positions were extracted. Thanks to the development of a new stanza identification feature of Rantanplan, we were able to detect metrical structures as well. By completing a quantitative analysis of the stress positions, line lengths, and stanzas used by each author, we aim to model their complete metrical profiles.