Person:
Ros Muñoz, Salvador

ORCID
0000-0001-6330-4958
Surnames
Ros Muñoz
Given name
Salvador

Search results

Showing 1 - 8 of 8
  • Publication
    TEI-friendly annotation scheme for medieval named entities: a case on a Spanish medieval corpus
    (Springer Nature, 2021-02-27) Álvarez Mellado, Elena; Díez-Platas, María Luisa; Ruiz-Fabo, Pablo; Bermúdez, Helena; Ros Muñoz, Salvador; González-Blanco, Elena
    Medieval documents are a rich source of historical data. Performing named-entity recognition (NER) on this genre of texts can provide us with valuable historical evidence. However, traditional NER categories and schemes are usually designed with modern documents (e.g. journalistic text) in mind, and general-domain NER annotation schemes fail to capture the nature of medieval entities. In this paper we explore the challenges of performing named-entity annotation on a corpus of Spanish medieval documents: we discuss the mismatches that arise when applying traditional NER categories to such a corpus, and we propose a novel humanist-friendly, TEI-compliant annotation scheme and guidelines intended to capture the particular nature of medieval entities.
  • Publication
    Medieval Spanish (12th–15th centuries) named entity recognition and attribute annotation system based on contextual information
    (WILEY, 2021) Díez Platas, Mª Luisa; Ros Muñoz, Salvador; González-Blanco, Elena; Ruiz Fabo, Pablo; Álvarez Mellado, Elena
    The recognition of named entities in Spanish medieval texts is highly complex and involves three specific challenges: first, the complex morphosyntactic characteristics of proper-noun use in medieval texts; second, the lack of strict orthographic standards; and finally, diachronic and geographical variation in Spanish from the 12th to the 15th century. In this period, named entities usually appear as complex textual structures; for example, nicknames and information about a person's role in society and geographic origin were frequently attached to names. To tackle this complexity, a named entity recognition and classification system has been implemented. The system uses semantics-based contextual cues to detect entities and assign them a type. Because entities often occur with attached attributes, entity contexts are also parsed to determine entity-type-specific dependencies for these attributes. Moreover, the system uses a variant generator to handle the diachronic evolution of Spanish medieval terms from a phonetic and morphosyntactic viewpoint. The tool iteratively enriches its own lexica, dictionaries, and gazetteers. The system was evaluated on a corpus of over 3,000 manually annotated entities of different types and periods, obtaining F1 scores between 0.74 and 0.87. Attribute annotation was evaluated for person and role-name attributes, with an overall F1 of 0.75.
  • Publication
    DISCO PAL: Diachronic Spanish sonnet corpus with psychological and affective labels
    (Springer, 2021-10-13) Barbado, Alberto; Fresno Fernández, Víctor Diego; Manjarrés Riesco, Ángeles; Ros Muñoz, Salvador
    Nowadays, text mining is applied to corpora in many different languages. However, most applications are based on prose, and applications that work with poetry are lacking. One example of text mining applied to poetry is the use of features derived from individual words to capture the lexical, sublexical and interlexical meaning of a text and infer its General Affective Meaning (GAM). However, although this approach has proved useful for poetry in some languages, studies are lacking both for Spanish poetry and for highly structured poetic compositions such as sonnets. This article presents a study of an annotated corpus of Spanish sonnets that analyses whether it is possible to build features from their individual words to predict their GAM. The purpose is to model sonnets at an affective level. The article also analyses the relationship between the GAM of the sonnets and their content. To this end, we consider the content from a psychological perspective, using tags to identify when a sonnet is related to a specific psychological term. We then study how GAM changes according to each of those terms. The corpus contains 274 Spanish sonnets by authors from the fifteenth to the nineteenth century. It was annotated by several domain experts, who labelled the poems with affective and lexico-semantic features, as well as with domain concepts from psychology. Thanks to this, the corpus of sonnets can be used in different applications, such as poetry recommender systems, text mining studies of the authors' personality, or the use of poetry for therapeutic purposes.
  • Publication
    When linguistics meets web technologies. Recent advances in modelling linguistic linked data
    (IOS Press, 2022-06-15) Fahad Khan, Anas; Chiarcos, Christian; Declerck, Thierry; Gifu, Daniela; Blanco García, Elena; Gracia, Jorge; Ionov, Maxim; Labropoulou, Penny; Mambrini, Francesco; McCrae, John P.; Pagé-Perron, Émilie; Passarotti, Marco; Ros Muñoz, Salvador; Truică, Ciprian-Octavian
    This article provides a comprehensive and up-to-date survey of models and vocabularies for creating linguistic linked data (LLD), focusing on the latest developments in the area and both building upon and complementing previous works covering similar territory. The article begins with an overview of some recent trends which have had a significant impact on linked data models and vocabularies. Next, we give a general overview of existing vocabularies and models for different categories of LLD resources. We then look at some of the latest developments in community standards and initiatives, including descriptions of recent work on the OntoLex-Lemon model, a survey of recent initiatives in linguistic annotation and LLD, and a discussion of the LLD metadata vocabularies META-SHARE and lime. In the next part of the paper, we focus on the influence of projects on LLD models and vocabularies, starting with a general survey of relevant projects, before dedicating individual sections to a number of recent projects and their impact on LLD vocabularies and models. Finally, in the conclusion, we look ahead at some future challenges for LLD models and vocabularies. The appendix to the paper consists of a brief introduction to the OntoLex-Lemon model.
  • Publication
    Digital humanities in Spain: historical perspective and current scenario
    (Ediciones Profesionales de la Información, 2020-12-19) Toscano, Maurizio; Rabadán, Aroa; Ros Muñoz, Salvador; González-Blanco, Elena
    The objective of this study was to provide the global community of interested scholars with an updated understanding of Digital Humanities in Spain, in terms of researchers and research centres, disciplines involved and research topics of interest, trends in digital resources development, main funding bodies and the evolution of their investment since the early nineties. One of the characteristics that differentiates this study from previous approaches is the information used to carry out the research: it combines large datasets of publicly available data from trusted sources with a handpicked selection of records grouping information scattered over the Web. Most of the evidence detected by other studies has been numerically confirmed. At the same time, the new metrics and values established constitute a reference base for monitoring the future evolution of the discipline and thus favour comparisons. Half of the researchers were found to be affiliated with only nine institutions, whereas the other half were scattered across 84 locations. Departmental affiliation revealed varying degrees of specialization across institutions. Although the major historic role played by Philology was confirmed, the rising interest from other areas of the Humanities and Social Sciences produced a wider picture, which helped to identify five large clusters of research topics, centred on major disciplines. The quantitative analysis of funding, a dimension almost unexplored in the Humanities, proved to be a valuable way to assess the discipline and its historical evolution. In fact, it revealed interesting trends that led to our proposal of a three-phase periodization in the consolidation of Digital Humanities in Spain. The paper concludes with a set of recommendations on how to deal successfully with issues that could harm the future development of this research area, and on the role that Spanish researchers can play in the international context.
  • Publication
    Automated metric analysis of Spanish Poetry: two complementary approaches
    (IEEE Xplore, 2021-03-30) Marco Remón, Guillermo; De la Rosa, Javier; Gonzalo Arroyo, Julio Antonio; Ros Muñoz, Salvador; González-Blanco, Elena
    The automatic metric analysis (commonly referred to as scansion) of Spanish poetry is not a trivial problem, since it combines the nuances of the language, the different poetic traditions related to melodic patterns, and the personal stylistic preferences and intentions of the author. In this paper, we explore two alternative algorithmic approaches tailored to different application scenarios. The first approach, Rantanplan, is a rule-based method that consists of four Natural Language Processing modules that work together to perform scansion and other related analyses: Part of Speech tagging, syllabification, stress assignment, and metrical adjustment. The second approach, Jumper, explores the possibility of performing scansion without syllabification, with a twofold purpose: to minimize the errors propagated through different parts of the linguistic processing pipeline (including the syllabification step), and to improve the efficiency of the process. Both systems outperform the state of the art and provide either a more informative solution (suitable, for instance, for teaching purposes) or more efficient processing (when a correct scansion is all the linguistic knowledge required, as in scholarly philological studies). Used together, the two systems also provide a practical tool for cleaning up manual annotation errors in corpora.
  • Publication
    Querying the Depths: Unveiling the Strengths and Struggles of Large Language Models in SPARQL Generation
    (Sociedad Española para el procesamiento del lenguaje natural, 2024) Ghajari Espinosa, Adrián; Ros Muñoz, Salvador; Pérez Pozo, Álvaro; Fresno Fernández, Víctor Diego
    In the quest to democratize access to databases and knowledge graphs, the ability to express queries in natural language and obtain the requested information becomes paramount, particularly for individuals lacking formal training in query languages. This is the case for SPARQL, the standard query language for ontology-based knowledge graphs, which poses a significant barrier to many users and hinders their ability to leverage these rich resources for research and analysis. To address this gap, our research explores how Large Language Models (LLMs) can be used to generate SPARQL queries directly from natural language descriptions. For this purpose, we explored the most popular prompt engineering techniques, which help generative AI models understand a task and produce specific or generalized outputs from well-crafted prompts, without the need for additional training. By integrating few-shot learning (FSL), Chain-of-Thought (CoT) reasoning, and Retrieval-Augmented Generation (RAG), we devise prompts that streamline the creation of effective SPARQL queries, facilitating more straightforward access to ontology knowledge graphs. We evaluated these prompts on three distinct LLMs: DeepSeek-Code 6.7b, CodeLlama-13b and GPT 3.5 TURBO. The comparative results revealed marginal variations in accuracy among these models, with FSL emerging as the most effective technique. Our results highlight the potential of LLMs to make knowledge graphs more accessible to a broader audience, but also show that much more research is needed to achieve results comparable to human performance.
  • Publication
    Exploring Spanish contemporary song lyrics through digital humanities methods: some thematic and structural properties
    (Oxford Academic, 2021-11-08) Hernández Lorenzo, Laura; Díaz Paredes, Aitor; Pérez Pozo, Álvaro; Ros Muñoz, Salvador; González-Blanco, Elena
    In this article, we present a quantitative study, using Digital Humanities methods, of an extensive corpus of Spanish contemporary song lyrics, a type of text related to poetry. On the one hand, poetry and songs have not only been connected since their origins, but they also share some characteristics, such as the division into lines or the use of rhyme. On the other hand, Digital Humanities quantitative approaches have already been applied to poetry, but a comparable study of lyrics is still lacking. Taking advantage of advances in automatic scansion and syllabification, rhyme detection, and Topic Modeling technologies, the present study analyzes the main thematic and structural properties of Spanish contemporary song lyrics, comparing them with those found in poetic texts. Our results offer new insights into the characteristics of the analyzed texts and their connections to poetic ones.