Person: Schames Kreitchmann, Rodrigo
Last name: Schames Kreitchmann
First name: Rodrigo
8 results
Search results
Showing 1 - 8 of 8

Publication: Exploring Approaches for Estimating Parameters in Cognitive Diagnosis Models with Small Sample Sizes (MDPI, 2023-04-27)
Sorrel, Miguel A.; Escudero, Scarlett; Nájera, Pablo; Vázquez Lira, Ramsés; Schames Kreitchmann, Rodrigo
Cognitive diagnostic models (CDMs) are increasingly being used in various assessment contexts to identify cognitive processes and provide tailored feedback. However, the most commonly used estimation method for CDMs, marginal maximum likelihood estimation with Expectation–Maximization (MMLE-EM), can present difficulties when sample sizes are small. This study compares the results of different estimation methods for CDMs under varying sample sizes using simulated and empirical data. The methods compared include MMLE-EM, Bayes modal, Markov chain Monte Carlo, a non-parametric method, and a parsimonious parametric model such as Restricted DINA. We varied the sample size and assessed the bias in the estimation of item parameters, the precision in attribute classification, the bias in the reliability estimate, and computational cost. The findings suggest that alternative estimation methods are preferred over MMLE-EM under low sample-size conditions, whereas comparable results are obtained under large sample-size conditions. Practitioners should consider using alternative estimation methods when working with small samples to obtain more accurate estimates of CDM parameters. This study aims to maximize the potential of CDMs by providing guidance on the estimation of the parameters.

Publication: Enhancing Content Validity Assessment With Item Response Theory Modeling (Colegio Oficial de Psicólogos del Principado de Asturias, 2024)
Nájera, Pablo; Sanz, Susana; Sorrel, Miguel Ángel; Schames Kreitchmann, Rodrigo
Background: Ensuring the validity of assessments requires a thorough examination of a test's content. Subject matter experts (SMEs) are commonly employed to evaluate the relevance, representativeness, and appropriateness of the items. This article proposes integrating item response theory (IRT) into the evaluations made by SMEs. IRT provides discrimination and threshold parameters for the SMEs, revealing how well they differentiate relevant from irrelevant items, detecting suboptimal performance, and also improving the estimation of item relevance. Method: The use of IRT was compared with traditional indices (content validity index and Aiken's V) on conscientiousness items. The accuracy of the SMEs in discriminating whether or not the items measured conscientiousness was assessed, as was whether their ratings could predict the items' factor loadings. Results: The IRT scores identified the conscientiousness items well (R² = 0.57) and predicted their factor loadings (R² = 0.45). They also showed incremental validity, explaining between 11% and 17% more variance than the traditional indices. Conclusions: Using IRT in SME evaluations improves item alignment and better predicts factor loadings, enhancing the content validity of instruments.

Publication: Improving reliability estimation in cognitive diagnosis modeling (Springer, 2023-10-01)
Torre, Jimmy de la; Sorrel, Miguel A.; Nájera, Pablo; Abad, Francisco; Schames Kreitchmann, Rodrigo
Cognitive diagnosis models (CDMs) are used in educational, clinical, or personnel selection settings to classify respondents with respect to discrete attributes, identifying strengths and needs, and thus allowing tailored training or treatment to be provided. As in any assessment, an accurate reliability estimation is crucial for valid score interpretations. In this sense, most CDM reliability indices are based on the posterior probabilities of the estimated attribute profiles. These posteriors are traditionally computed using point estimates for the model parameters as approximations to their population values.
If the uncertainty around these parameters is unaccounted for, the posteriors may be overly peaked, resulting in overestimated reliabilities. This article presents a multiple imputation (MI) procedure to integrate out the model parameters in the estimation of the posterior distributions, thus correcting the reliability estimation. A simulation study was conducted to compare the MI procedure with the traditional reliability estimation. Five factors were manipulated: the attribute structure, the CDM model (DINA and G-DINA), test length, sample size, and item quality. Additionally, an illustration using the Examination for the Certificate of Proficiency in English data was analyzed. The effect of sample size was studied by sampling subsets of subjects from the complete data. In both studies, the traditional reliability estimation systematically provided overestimated reliabilities, whereas the MI procedure offered more accurate results. Accordingly, practitioners in small educational or clinical settings should be aware that the reliability estimation using model parameter point estimates may be positively biased. R code for the MI procedure is made available.

Publication: FoCo: una aplicación Shiny para la evaluación formativa usando modelos de diagnóstico cognitivo [FoCo: a Shiny application for formative assessment using cognitive diagnosis models] (Colegio Oficial de la Psicología de Madrid, 2023-05-03)
Sanz, Susana; Nájera, Pablo; Moreno, José David; Sorrel, Miguel A.; Schames Kreitchmann, Rodrigo; Martínez Huertas, José Ángel
Combining formative and summative assessments could improve evaluation. Cognitive diagnosis modeling (CDM) has been proposed for diagnosing students' strengths and weaknesses in formative assessment. However, no software allows it to be implemented easily. FoCo (https://foco.shinyapps.io/FoCo/) was therefore developed to perform CDM and classical test theory analyses. Responses from 86 undergraduate students to a research methods exam were analyzed, diagnosing their strengths and needs regarding their mastery of the course contents and the first three competencies of Bloom's taxonomy, and the validity of the results was examined. The analysis was informative: for students with similar scores, it was possible to detect different strengths and weaknesses. In addition, these attributes were found to predict relevant criteria. FoCo is expected to facilitate the use of CDM in educational settings.

Publication: A two-dimensional multiple-choice model accounting for omissions (Frontiers Media, 2018-12-11)
Abad, Francisco; Ponsoda, Vicente; Schames Kreitchmann, Rodrigo
This paper presents a new two-dimensional Multiple-Choice Model accounting for Omissions (MCMO). Based on Thissen and Steinberg multiple-choice models, the MCMO defines omitted responses as the result of the respondent not knowing the correct answer and deciding to omit rather than to guess given a latent propensity to omit. First, using a Monte Carlo simulation, the accuracy of the parameters estimated from data with different sample sizes (500, 1,000, and 2,000 subjects), test lengths (20, 40, and 80 items), and percentages of omissions (5, 10, and 15%) was investigated. Then, the appropriateness of the MCMO to the Trends in International Mathematics and Science Study (TIMSS) Advanced 2015 mathematics and physics multiple-choice items was analyzed and compared with the Holman and Glas' Between-item Multi-dimensional IRT model (B-MIRT) and with the three-parameter logistic (3PL) model with omissions treated as incorrect responses. The results of the simulation study showed a good recovery of scale and position parameters. Pseudo-guessing parameters (d) were less accurate, but this inaccuracy did not seem to have an important effect on the estimation of abilities.
The precision of the propensity to omit strongly depended on the ability values (the higher the ability, the worse the estimate of the propensity to omit). In the empirical study, the empirical reliability for ability estimates was high in both physics and mathematics. As in the simulation study, the estimates of the propensity to omit were less reliable and their precision varied with ability. Regarding the absolute item fit, the MCMO fitted the data better than the other models. Also, the MCMO offered significant increments in convergent validity between scores from multiple-choice and constructed-response items, with an increase of around 0.02 to 0.04 in R² in comparison with the two other methods. Finally, the high correlation between the country means of the propensity to omit in mathematics and physics suggests that (1) the propensity to omit is somehow affected by the country of residence of the examinees, and (2) the propensity to omit is independent of the test contents.

Publication: The Journey from Likert to Forced-Choice Questionnaires: Evidence of the Invariance of Item Parameters (Colegio Oficial de Psicólogos de Madrid, 2019-06-21)
Morillo Cuadrado, Daniel Vicente; Abad, Francisco; Schames Kreitchmann, Rodrigo; Leenen, Iwin; Hontangas, Pedro; Ponsoda, Vicente
Multidimensional forced-choice questionnaires are widely regarded in the personnel selection literature for their ability to control response biases. Recently developed IRT models usually rely on the assumption that item parameters remain invariant when they are paired in forced-choice blocks, without giving it much consideration. This study aims to test this assumption empirically on the MUPP-2PL model, comparing the parameter estimates of the forced-choice format to their graded-scale equivalent on a Big Five personality instrument. The assumption was found to hold reasonably well, especially for the discrimination parameters.
In the cases in which it was violated, we briefly discuss the likely factors that may lead to non-invariance. We conclude by discussing the practical implications of the results and providing a few guidelines for the design of forced-choice questionnaires based on the invariance assumption.

Publication: Controlling for response biases in self-report scales: Forced-choice vs. psychometric modeling of Likert items (Frontiers Media, 2019-10-15)
Abad, Francisco; Ponsoda, Vicente; Nieto, María Dolores; Schames Kreitchmann, Rodrigo; Morillo Cuadrado, Daniel Vicente
One important problem in the measurement of non-cognitive characteristics such as personality traits and attitudes is that it has traditionally been carried out through Likert scales, which are susceptible to response biases such as social desirability (SDR) and acquiescent (ACQ) responding. Given the variability of these response styles in the population, ignoring their possible effects on the scores may compromise the fairness and the validity of the assessments. Also, response-style-induced errors of measurement can affect the reliability estimates and overestimate convergent validity by correlating higher with other Likert-scale-based measures. Conversely, they can attenuate the predictive power over non-Likert-based indicators, given that the scores contain more errors. This study compares the validity of the Big Five personality scores obtained: (1) ignoring the SDR and ACQ in graded-scale items (GSQ), (2) accounting for SDR and ACQ with a compensatory IRT model, and (3) using forced-choice blocks with a multi-unidimensional pairwise preference model (MUPP) variant for dominance items. The overall results suggest that ignoring SDR and ACQ offered the worst validity evidence, with a higher correlation between personality and SDR scores. The two remaining strategies have their own advantages and disadvantages.
The results from the empirical reliability and the convergent validity analysis indicate that when modeling social desirability with graded-scale items, the SDR factor apparently captures part of the variance of the Agreeableness factor. On the other hand, the correlation between the corrected GSQ-based Openness to Experience scores and the University Access Examination grades was higher than the one with the uncorrected GSQ-based scores, and considerably higher than that using the estimates from the forced-choice data. Conversely, the criterion-related validity of the Forced Choice Questionnaire (FCQ) scores was similar to the results found in meta-analytic studies, correlating higher with Conscientiousness. Nonetheless, the FCQ scores had considerably lower reliabilities and would demand administering more blocks. Finally, the results are discussed, and some notes are provided for the treatment of SDR and ACQ in future studies.

Publication: On bank assembly and block selection in multidimensional forced-choice adaptive assessments (SAGE, 2023-04-01)
Sorrel, Miguel A.; Abad, Francisco; Schames Kreitchmann, Rodrigo
Multidimensional forced-choice (FC) questionnaires have been consistently found to reduce the effects of socially desirable responding and faking in non-cognitive assessments. Although FC has been considered problematic for providing ipsative scores under the classical test theory, IRT models enable the estimation of non-ipsative scores from FC responses. However, while some authors indicate that blocks composed of opposite-keyed items are necessary to retrieve normative scores, others suggest that these blocks may be less robust to faking, thus impairing the assessment validity. Accordingly, this article presents a simulation study to investigate whether it is possible to retrieve normative scores using only positively keyed items in pairwise FC computerized adaptive testing (CAT).
Specifically, a simulation study addressed the effect of (1) different bank assembly methods (a randomly assembled bank, an optimally assembled bank, and blocks assembled on-the-fly considering every possible pair of items), and (2) block selection rules (i.e., T, and Bayesian D and A-rules) on estimate accuracy, ipsativity, and overlap rates. Moreover, different questionnaire lengths (30 and 60) and trait structures (independent or positively correlated) were studied, and a non-adaptive questionnaire was included as a baseline in each condition. In general, very good trait estimates were retrieved, despite using only positively keyed items. Although the best trait accuracy and lowest ipsativity were found using the Bayesian A-rule with questionnaires assembled on-the-fly, the T-rule under this method led to the worst results. This points to the importance of considering both aspects when designing FC CAT.