Person: Martín Arevalillo, Jorge
ORCID: 0000-0003-1944-3699
Last name: Martín Arevalillo
First name: Jorge
Search results
Showing 1 - 10 of 14 results
Publication: Bayesian networks established functional differences between breast cancer subtypes (PLOS, 2020-06-11)
Trilla Fuertes, Lucía; Gámez Pozo, Ángelo; López Vacas, Rocío; López Camacho, Elena; Prado Vázquez, Guillermo; Zapater Moros, Andrea; Díaz Almirón, Mariana; Ferrer Gómez, María; Nanni, Paolo; Zamora Auñón, Pilar; Espinosa, Enrique; Maín, Paloma; Fresno Vara, Juan Ángel; Martín Arevalillo, Jorge; Navarro Veguillas, Hilario
Breast cancer is a heterogeneous disease. In clinical practice, tumors are classified as hormonal receptor positive, HER2 positive and triple negative tumors. In previous works, our group defined a new hormonal receptor positive subgroup, the TN-like subtype, which had a prognosis and a molecular profile more similar to triple negative tumors. In this study, proteomics and Bayesian networks were used to characterize protein relationships in 96 breast tumor samples. Components obtained by these methods had a clear functional structure. The analysis of these components suggested differences in processes such as mitochondrial function or extracellular matrix between breast cancer subtypes, including our newly defined TN-like subtype. In addition, one of the components, mainly related to extracellular matrix processes, had prognostic value in this cohort. Functional approaches make it possible to build hypotheses about regulatory mechanisms and to establish new relationships among proteins in the breast cancer context.

Publication: Skewness-Based Projection Pursuit as an Eigenvector Problem in Scale Mixtures of Skew-Normal Distributions (MDPI, 2021-06-03)
Martín Arevalillo, Jorge; Navarro Veguillas, Hilario
This paper addresses the projection pursuit problem assuming that the distribution of the input vector belongs to the flexible and wide family of multivariate scale mixtures of skew normal distributions.
Under this assumption, skewness-based projection pursuit is set out as an eigenvector problem, described in terms of the third order cumulant matrix, as well as an eigenvector problem that involves the simultaneous diagonalization of the scatter matrices of the model. Both approaches lead to dominant eigenvectors proportional to the shape parametric vector, which accounts for the multivariate asymmetry of the model; they also shed light on the parametric interpretability of the invariant coordinate selection method and point out some alternatives for estimating the projection pursuit direction. The theoretical findings are further investigated through a simulation study whose results provide insights about the usefulness of skewness model-based projection pursuit in statistical practice.

Publication: Skewness-Kurtosis Model-Based Projection Pursuit with Application to Summarizing Gene Expression Data (MDPI, 2022-04-24)
Martín Arevalillo, Jorge; Navarro Veguillas, Hilario
Non-normality is common when dealing with gene expression data. Thus, flexible models are needed in order to account for the underlying asymmetry and heavy tails of multivariate gene expression measures. This paper addresses the issue by exploring the projection pursuit problem under a flexible framework where the underlying model is assumed to follow a multivariate skew-t distribution. Under this assumption, projection pursuit with skewness and kurtosis indices is addressed as a natural approach for data reduction. The work examines its properties, giving some theoretical insights and delving into the computational side with regard to the application to real gene expression data. The results of the theory are illustrated by means of a simulation study; the outputs of the simulation are used in combination with the theoretical insights to shed light on the usefulness of skewness-kurtosis projection pursuit for summarizing multivariate gene expression data.
The application to gene expression measures of patients diagnosed with triple-negative breast cancer gives promising findings that may help explain the heterogeneity of this type of tumor.

Publication: A novel approach to triple-negative breast cancer molecular classification reveals a luminal immune-positive subgroup with good prognoses (Springer Nature, 2019-02-07)
Prado Vázquez, Guillermo; Gámez Pozo, Ángelo; Trilla Fuertes, Lucía; Zapater Moros, Andrea; Ferrer Gómez, María; Díaz Almirón, Mariana; López Vacas, Rocío; Maín, Paloma; Feliu Batlle, Jaime; Zamora Auñón, Pilar; Espinosa, Enrique; Fresno Vara, Juan Ángel; Martín Arevalillo, Jorge; Navarro Veguillas, Hilario
Triple-negative breast cancer is a heterogeneous disease characterized by a lack of hormonal receptors and HER2 overexpression. It is the only breast cancer subgroup that does not benefit from targeted therapies, and its prognosis is poor. Several studies have developed specific molecular classifications for triple-negative breast cancer. However, these molecular subtypes have had little impact in the clinical setting. Gene expression data and clinical information from 494 triple-negative breast tumors were obtained from public databases. First, a probabilistic graphical model approach to associate gene expression profiles was performed. Then, sparse k-means was used to establish a new molecular classification. Results were then verified in a second database including 153 triple-negative breast tumors treated with neoadjuvant chemotherapy. Clinical and gene expression data from 494 triple-negative breast tumors were analyzed. Tumors in the dataset were divided into four subgroups (luminal-androgen receptor expressing, basal, claudin-low and claudin-high), using the cancer stem cell hypothesis as reference. These four subgroups were defined and characterized through hierarchical clustering and probabilistic graphical models and compared with previously defined classifications.
In addition, two subgroups related to immune activity were defined. This immune activity showed prognostic value in the whole cohort and in the luminal subgroup. The claudin-high subgroup showed poor response to neoadjuvant chemotherapy. Through a novel analytical approach we proved that there are at least two independent sources of biological information: cellular and immune. Thus, we developed two different and overlapping triple-negative breast cancer classifications and showed that the luminal immune-positive subgroup had better prognoses than the luminal immune-negative. Finally, this work paves the way for using the defined classifications as predictive features in the neoadjuvant scenario.

Publication: A stochastic ordering based on the canonical transformation of skew-normal vectors (Springer, 2018-04-25)
Martín Arevalillo, Jorge; Navarro Veguillas, Hilario
In this paper, we define a new skewness ordering that enables stochastic comparisons for vectors that follow a multivariate skew-normal distribution. The new ordering is based on the canonical transformation associated with the multivariate skew-normal distribution and on the well-known convex transform order applied to the only skewed component of such canonical transformation. We examine the connection between the proposed ordering and the multivariate convex transform order studied by Belzunce et al. (TEST 24(4):813–834, 2015). Several standard skewness measures like Mardia’s and Malkovich–Afifi’s indices are revisited and interpreted in connection with the new ordering; we also study its relationship with the J-divergence between skew-normal and normal random vectors and with the negentropy.
Some artificial data are used in simulation experiments to illustrate the theoretical discussion; a real data application is provided as well.

Publication: A New Separation Index and Classification Techniques Based on Shannon Entropy (Springer, 2023-09-22)
Navarro, Jorge; Buono, Francesco; Martín Arevalillo, Jorge; https://orcid.org/0000-0003-2822-915X; https://orcid.org/0000-0002-3569-4052
The purpose is to use Shannon entropy measures to develop classification techniques and an index which estimates the separation of the groups in a finite mixture model. These measures can be applied to machine learning techniques such as discriminant analysis, cluster analysis, exploratory data analysis, etc. If we know the number of groups and we have training samples from each group (supervised learning), the index is used to measure the separation of the groups. Here some entropy measures are used to classify new individuals into one of these groups. If we are not sure about the number of groups (unsupervised learning), the index can be used to determine the optimal number of groups from an entropy (information/uncertainty) criterion. It can also be used to determine the best variables in order to separate the groups. In all the cases we assume that we have absolutely continuous random variables and we use the Shannon entropy based on the probability density function. Theoretical, parametric and non-parametric techniques are proposed to get approximations of these entropy measures in practice. An application to gene selection in a colon cancer discrimination study with a large number of variables is provided as well.

Publication: Ensemble learning from model based trees with application to differential price sensitivity assessment (Elsevier, 2021-05)
Martín Arevalillo, Jorge
The assessment of price sensitivity is a relevant issue with important implications in decision making for revenue management.
The issue has attracted interest among companies evolving towards a data-driven culture through the exploitation of their data sources. Thus, the design of pricing strategies that rely on analytics to identify groups of customers that exhibit differential price sensitivity has great potential for revenue managers. This work proposes a data-driven approach, using ensemble learning from model based trees, to assess differential price sensitivity in a similar way as the random forests algorithm does to assess variable importance. A differential price sensitivity score is defined and a ranking is obtained as a result, so that the top ranked variables can be selected as candidate inputs for segmentation and differential price sensitivity group finding. Then optimal price allocation is carried out on the derived groups in order to compute the expected revenues, which are compared with the revenues given by un-optimized prices and by optimal price allocation derived from the logit estimation of the bid response function. The proposed approach is validated in synthetic experiments and by application to the real business case of an auto lending company; the resulting revenues show its benefit.

Publication: Patterns of differential expression by association in omic data using a new measure based on ensemble learning (De Gruyter, 2023-11-23)
Martín Arevalillo, Jorge; Martín Arevalillo, Raquel; https://orcid.org/0000-0003-0674-0053
The ongoing development of high-throughput technologies is allowing the simultaneous monitoring of the expression levels for hundreds or thousands of biological inputs with the proliferation of what has been coined as omic data sources. One relevant issue when analyzing such data sources is concerned with the detection of differential expression across two experimental conditions, clinical status or two classes of a biological outcome.
While a great deal of univariate data analysis approaches have been developed to address the issue, strategies for assessing interaction patterns of differential expression are scarce in the literature and have been limited to ad hoc solutions. This paper contributes to the problem by exploiting the facilities of an ensemble learning algorithm like random forests to propose a measure that assesses the differential expression explained by the interaction of the omic variables, so that subtle biological patterns may be uncovered as a result. The out of bag error rate, which is an estimate of the predictive accuracy of a random forests classifier, is used as a by-product to propose a new measure that assesses interaction patterns of differential expression. Its performance is studied in synthetic scenarios and it is also applied to real studies on SARS-CoV-2 and colon cancer data, where it uncovers associations that remain undetected by other methods. Our proposal is aimed at providing a novel approach that may help experts in biomedical and life sciences to unravel insightful interaction patterns that may decipher the molecular mechanisms underlying biological and clinical outcomes.

Publication: On connections between skewed, weighted and distorted distributions: applications to model extreme value distributions (Springer, 2023-08-05)
Navarro, Jorge; Martín Arevalillo, Jorge
The purpose of the paper is to explore the connections between skew symmetric, weighted and distorted univariate distributions as well as how they appear related to the distributions of the extreme values in a sample of identically distributed random variables under both the independence and dependence scenarios. Some extensions of the concept of skewed distributions are proposed in order to cover the most general cases of extremes. Their natural connections to the likelihood ratio ordering and the role played by the P–P plots for handling these models are also highlighted.
The results can also be applied to order statistics and coherent systems although these cases do not always lead to skewed distributions. The theoretical findings are illustrated by applied examples to model extremes as well as by several applications concerned with the analysis of artificial and real data.

Publication: On the empirical approximation to quantiles from Lugannani–Rice saddlepoint formula (Elsevier, 2024)
Martín Arevalillo, Jorge
The Lugannani–Rice saddlepoint formula approximates the tail probability and the cumulative distribution function of the sample mean of independent and identically distributed variables. This note revisits the Lugannani–Rice formula with a proposal for inverting it to approximate the quantile of the distribution empirically. The asymptotic behavior of the empirical approximation is assessed theoretically and its numerical accuracy for finite samples is studied and compared with the normal approximation and the second order Cornish-Fisher expansion by means of a simulation study. The outcomes of the simulation experiment shed light on the limitations of the empirical inversions of saddlepoint formulae to approximate quantiles