Person:
Araujo Serna, M. Lourdes

ORCID
0000-0002-7657-4794
Last name
Araujo Serna
First name
M. Lourdes

Search results

Showing 1 - 4 of 4
  • Publication
    Disentangling categorical relationships through a graph of co-occurrences
    (American Physical Society, 2011-10-19) Borge Holthoefer, Javier; Arenas, Alex; Capitán, José A.; Cuesta, José A.; Martínez Romo, Juan; Araujo Serna, M. Lourdes
    The mesoscopic structure of complex networks has proven a powerful level of description to understand the linchpins of the system represented by the network. Nevertheless, the mapping of a series of relationships between elements, in terms of a graph, is sometimes not straightforward. Given that all the information we would extract using complex network tools depends on this initial graph, it is mandatory to preprocess the data to build it in the most accurate manner. Here we propose a procedure to build a network, attending only to statistically significant relations between constituents. We use a paradigmatic example of word associations to show the development of our approach. Analyzing the modular structure of the obtained network, we are able to disentangle categorical relations, disambiguating words with a success comparable to that of the best algorithms designed for the same purpose.
  • Publication
    Analyzing information retrieval methods to recover broken web links
    (2011-06-19) Martínez Romo, Juan; Araujo Serna, M. Lourdes
    In this work we compare different techniques to automatically find candidate web pages to substitute broken links. We extract information from the anchor text, the content of the page containing the link, and the cache page in some digital library. The selected information is processed and submitted to a search engine. We have compared different information retrieval methods for both the selection of terms used to construct the queries submitted to the search engine and the ranking of the candidate pages that it provides, in order to help the user find the best replacement. In particular, we have used term frequencies and a language model approach for the selection of terms, and co-occurrence measures and a language model approach for ranking the final results. To test the different methods, we have also defined a methodology that does not require user judgments, which increases the objectivity of the results.
  • Publication
    Web spam detection : new classification features based on qualified link analysis and language models
    (Institute of Electrical and Electronics Engineers (IEEE), 2010-09-01) Araujo Serna, M. Lourdes; Martínez Romo, Juan
    Web spam is a serious problem for search engines because the quality of their results can be severely degraded by the presence of this kind of page. In this paper, we present an efficient spam detection system based on a classifier that combines new link-based features with language-model (LM)-based ones. These features are not only related to quantitative data extracted from the Web pages, but also to qualitative properties, mainly of the page links. We consider, for instance, the ability of a search engine to find, using information provided by the page for a given link, the page that the link actually points at. This can be regarded as indicative of the link reliability. We also check the coherence between a page and another one pointed at by any of its links. Two pages linked by a hyperlink should be semantically related, by at least a weak contextual relation. Thus, we apply an LM approach to different sources of information from a Web page that belong to the context of a link, in order to provide high-quality indicators of Web spam. We have specifically applied the Kullback–Leibler divergence to different combinations of these sources of information in order to characterize the relationship between two linked pages. The result is a system that significantly improves the detection of Web spam using fewer features, on two large public datasets, WEBSPAM-UK2006 and WEBSPAM-UK2007.
  • Publication
    Detecting malicious tweets in trending topics using a statistical analysis of language
    (Elsevier, 2013-06-01) Martínez Romo, Juan; Araujo Serna, M. Lourdes
    Twitter spam detection is a recent area of research in which most previous works have focused on the identification of malicious user accounts and honeypot-based approaches. In this paper, however, we present a methodology based on two new aspects: the detection of spam tweets in isolation, without previous information about the user; and the application of a statistical analysis of language to detect spam in trending topics. Trending topics capture the emerging Internet trends and topics of discussion that are on everybody's lips. This growing microblogging phenomenon therefore allows spammers to disseminate malicious tweets quickly and massively. We present the first work that tries to detect spam tweets in real time using language as the primary tool. We first collected and labeled a large dataset with 34 K trending topics and 20 million tweets. Then, we proposed a reduced set of features that are hard for spammers to manipulate. In addition, we have developed a machine learning system with some orthogonal features that can be combined with other sets of features with the aim of analyzing emergent characteristics of spam in social networks. We have also conducted an extensive evaluation showing that our system obtains an F-measure at the same level as the best state-of-the-art systems based on the detection of spam accounts. Thus, our system can be applied to Twitter spam detection in trending topics in real time, due mainly to the analysis of tweets instead of user accounts.
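
Illustrative note: the web spam abstract above mentions applying the Kullback–Leibler divergence to language models built from a link's context and from the page it points at. The abstracts do not describe how those models are built, so the following is only a minimal sketch under stated assumptions: whitespace tokenization, unigram models, and additive smoothing are choices made here for illustration, and the names `unigram_lm`, `kl_divergence`, and `link_divergence` are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter


def unigram_lm(text, vocab, alpha=1.0):
    """Unigram distribution over `vocab` with additive smoothing.

    Assumption: lowercased whitespace tokenization; the source abstract
    does not specify a tokenizer or a smoothing scheme.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) in bits; p and q share keys and are strictly positive."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p)


def link_divergence(source_text, target_text, alpha=1.0):
    """Divergence between a link's context and the text of the linked page.

    Lower values suggest the two texts use similar vocabulary.
    """
    vocab = set(source_text.lower().split()) | set(target_text.lower().split())
    p = unigram_lm(source_text, vocab, alpha)
    q = unigram_lm(target_text, vocab, alpha)
    return kl_divergence(p, q)


if __name__ == "__main__":
    anchor = "cheap flights to madrid"
    related = "book cheap flights to madrid and barcelona"
    unrelated = "win money casino poker bonus now"
    print(link_divergence(anchor, related))
    print(link_divergence(anchor, unrelated))
```

In this sketch a link whose context shares vocabulary with its target page yields a smaller divergence than one pointing at unrelated text; a value like this could serve as one feature among others in a classifier, which is the role the abstract assigns to such measures.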