Person:
Ros Muñoz, Salvador

ORCID

0000-0001-6330-4958

Last name

Ros Muñoz

First name

Salvador

Search results

Showing 1 - 8 of 8

TEI-friendly annotation scheme for medieval named entities: a case on a Spanish medieval corpus
(Springer, 2021-02-27) Álvarez Mellado, Elena; Díez Platas, María Luisa; Ruiz Fabo, Pablo; Bermúdez, Helena; Ros Muñoz, Salvador; González Blanco, Elena
Medieval documents are a rich source of historical data. Performing named-entity recognition (NER) on this genre of texts can provide us with valuable historical evidence. However, traditional NER categories and schemes are usually designed with modern documents in mind (i.e. journalistic text) and the general-domain NER annotation schemes fail to capture the nature of medieval entities. In this paper we explore the challenges of performing named-entity annotation on a corpus of Spanish medieval documents: we discuss the mismatches that arise when applying traditional NER categories to a corpus of Spanish medieval documents and we propose a novel humanist-friendly TEI-compliant annotation scheme and guidelines intended to capture the particular nature of medieval entities.
Exploring Spanish contemporary song lyrics through digital humanities methods: some thematic and structural properties
(Oxford University Press, 2021-11-08) Hernández Lorenzo, Laura; Díaz Paredes, Aitor; Pérez Pozo, Álvaro; Ros Muñoz, Salvador; González Blanco, Elena
In this article, we present a quantitative study with Digital Humanities methods on an extensive corpus of Spanish contemporary song lyrics, a type of text related to poetry. On the one hand, poetry and songs not only have been connected since their origins, but they share some characteristics, such as the division in lines or the use of rhymes. On the other hand, Digital Humanities quantitative approaches have already been applied to poetry, but we still lack a study in the same fashion for lyrics. Taking advantage of the advances in automatic scansion and syllabification, rhyme detection, or Topic Modeling technologies, the present study analyzes Spanish contemporary song lyrics’ main thematic and structural properties, comparing them with those used in poetic texts. Our results offered new insights into the characteristics of the analyzed texts and their connections to poetic ones.
Digital humanities in Spain: historical perspective and current scenario
(Ediciones Profesionales de la Información (EPI), 2020-12-19) Toscano, Maurizio; Rabadán, Aroa; Ros Muñoz, Salvador; González-Blanco, Elena
The objective of this study was to provide the global community of interested scholars with an updated understanding of Digital Humanities in Spain, in terms of researchers and research centres, disciplines involved and research topics of interest, trends in digital resources development, main funding bodies and the evolution of their investment since the early nineties. One of the characteristics that differentiates this study from previous approaches is the information used to carry out the research. It combines large datasets of publicly available data from trusted sources with a handpicked selection of records grouping information scattered over the Web. Most of the evidence detected by other studies has been numerically confirmed. At the same time, the new metrics and values established constitute a reference base for monitoring the future evolution of the discipline and thus favour comparisons. Half of the researchers were found to be affiliated to only nine institutions, whereas the other half of them were scattered across 84 locations. Department affiliation showed a varied pattern of the different degrees of specialization in each institution. Although the major historic role played by Philology was confirmed, the rising interest of other areas of the Humanities and Social Science produces a wider picture, which helped to identify five large clusters of research topics, centred on major disciplines. The quantitative analysis of funding, a dimension almost unexplored in the Humanities, proved to be a valuable way to assess the discipline and its historical evolution. In fact, it revealed interesting trends that led to our proposal of a three-phase periodization in the consolidation of Digital Humanities in Spain. The paper concludes with a set of recommendations regarding how to successfully deal with issues that can harm the future development of this research area and the role that Spanish researchers can play in the international context.
DISCO PAL: Diachronic Spanish sonnet corpus with psychological and affective labels
(Springer, 2021-10-13) Barbado, Alberto; Fresno Fernández, Víctor Diego; Manjarrés Riesco, Ángeles; Ros Muñoz, Salvador
Nowadays, there are many applications of text mining over corpora from different languages. However, most of them are based on texts in prose, lacking applications that work with poetry texts. An example of an application of text mining in poetry is the usage of features derived from their individual words in order to capture the lexical, sublexical and interlexical meaning, and infer the General Affective Meaning (GAM) of the text. However, even though this proposal has been proved as useful for poetry in some languages, there is a lack of studies for both Spanish poetry and for highly-structured poetic compositions such as sonnets. This article presents a study over an annotated corpus of Spanish sonnets, in order to analyse if it is possible to build features from their individual words for predicting their GAM. The purpose of this is to model sonnets at an affective level. The article also analyses the relationship between the GAM of the sonnets and the content itself. For this, we consider the content from a psychological perspective, identifying with tags when a sonnet is related to a specific term. Then, we study how GAM changes according to each of those psychological terms. The corpus used contains 274 Spanish sonnets from authors of different centuries, from the fifteenth to the nineteenth. This corpus was annotated by different domain experts. The experts annotated the poems with affective and lexico-semantic features, as well as with domain concepts that belong to psychology. Thanks to this, the corpus of sonnets can be used in different applications, such as poetry recommender systems, personality text mining studies of the authors, or the usage of poetry for therapeutic purposes.
When linguistics meets web technologies. Recent advances in modelling linguistic linked data
(IOS Press, 2022-06-15) Fahad Khan, Anas; Chiarcos, Christian; Declerck, Thierry; Gifu, Daniela; Blanco García, Elena; Gracia, Jorge; Ionov, Maxim; Labropoulou, Penny; Mambrini, Francesco; McCrae, John P.; Pagé-Perron, Émilie; Passarotti, Marco; Ros Muñoz, Salvador; Truică, Ciprian-Octavian
This article provides a comprehensive and up-to-date survey of models and vocabularies for creating linguistic linked data (LLD) focusing on the latest developments in the area and both building upon and complementing previous works covering similar territory. The article begins with an overview of some recent trends which have had a significant impact on linked data models and vocabularies. Next, we give a general overview of existing vocabularies and models for different categories of LLD resource. After which we look at some of the latest developments in community standards and initiatives including descriptions of recent work on the OntoLex-Lemon model, a survey of recent initiatives in linguistic annotation and LLD, and a discussion of the LLD metadata vocabularies META-SHARE and lime. In the next part of the paper, we focus on the influence of projects on LLD models and vocabularies, starting with a general survey of relevant projects, before dedicating individual sections to a number of recent projects and their impact on LLD vocabularies and models. Finally, in the conclusion, we look ahead at some future challenges for LLD models and vocabularies. The appendix to the paper consists of a brief introduction to the OntoLex-Lemon model.
Querying the depths: Unveiling the strengths and struggles of large language models in SPARQL generation
(Sociedad Española para el procesamiento del lenguaje natural, 2024) Ghajari Espinosa, Adrián; Ros Muñoz, Salvador; Pérez Pozo, Álvaro
The emergence of the Semantic Web has precipitated a proliferation of structured data manifested in the form of knowledge graphs, underscoring the imperative of natural language interfaces to enhance accessibility to these repositories of information. The capacity to articulate queries in natural language and subsequently retrieve data through SPARQL queries assumes paramount importance. In the present investigation, we have scrutinized the efficacy of in-context learning based on an agent-based architecture in facilitating the construction of SPARQL queries. Contrary to initial expectations, the augmentation of in-context learning prompts through agent-based mechanisms has been found to diminish the efficacy of Language Model-based Systems (LLMS), as it is perceived as extraneous "noise," thereby delineating the constraints inherent in this approach. The results highlight the need to delve deeper into the intricacies of model training and fine-tuning, focusing on the relational aspects of ontology schemas.
Medieval Spanish (12th–15th centuries) named entity recognition and attribute annotation system based on contextual information
(WILEY, 2021) Díez Platas, Mª Luisa; Ros Muñoz, Salvador; González-Blanco, Elena; Ruiz Fabo, Pablo; Álvarez Mellado, Elena
The recognition of named entities in Spanish medieval texts presents great complexity, involving specific challenges: First, the complex morphosyntactic characteristics in proper-noun use in medieval texts. Second, the lack of strict orthographic standards. Finally, diachronic and geographical variations in Spanish from the 12th to 15th century. In this period, named entities usually appear as complex text structures. For example, it was frequent to add nicknames and information about the person's role in society and geographic origin. To tackle this complexity, a named entity recognition and classification system has been implemented. The system uses contextual cues based on semantics to detect entities and assign a type. Given the occurrence of entities with attached attributes, entity contexts are also parsed to determine entity-type-specific dependencies for these attributes. Moreover, it uses a variant generator to handle the diachronic evolution of Spanish medieval terms from a phonetic and morphosyntactic viewpoint. The tool iteratively enriches its proper lexica, dictionaries, and gazetteers. The system was evaluated on a corpus of over 3,000 manually annotated entities of different types and periods, obtaining F1 scores between 0.74 and 0.87. Attribute annotation was evaluated for person and role name attributes with an overall F1 of 0.75.
Automated metric analysis of Spanish Poetry: two complementary approaches
(IEEE, 2021-03-30) Marco Remón, Guillermo; De la Rosa, Javier; Gonzalo Arroyo, Julio Antonio; Ros Muñoz, Salvador; González Blanco, Elena
The automatic metric analysis (commonly referred to as scansion) of Spanish poetry is not a trivial problem since it combines the nuances of the language, the different poetic traditions related to melodic patterns, and the personal stylistic preferences and intentions of the author. In this paper, we explore two alternative algorithmic approaches tailored to different application scenarios. The first approach, Rantanplan, is a rule-based method that consists of four Natural Language Processing modules that work together to perform scansion and other related analysis: Part of Speech tagging, syllabification, stress assignment, and metrical adjustment. The second approach, Jumper, explores the possibility of performing scansion without syllabification, with a twofold purpose: to minimize the errors propagated in different parts of the linguistic processing pipeline (including the syllabification step), and to improve the efficiency of the process. Both systems outperform the state of the art and provide either a more informative solution (suitable, for instance, for teaching purposes) or a more efficient processing (when a correct scansion is all the linguistic knowledge required, as in scholarly philological studies). The combined use of both systems turns out to provide a practical tool to clean up manual annotation errors in corpora.