Scientific output
Browsing Scientific output by Author "Abad, Francisco"
Showing 1 - 4 of 4
Publication: A two-dimensional multiple-choice model accounting for omissions (Frontiers Media, 2018-12-11)
Abad, Francisco; Ponsoda, Vicente; Schames Kreitchmann, Rodrigo
This paper presents a new two-dimensional Multiple-Choice Model accounting for Omissions (MCMO). Based on Thissen and Steinberg's multiple-choice models, the MCMO defines omitted responses as the result of the respondent not knowing the correct answer and deciding to omit rather than to guess, given a latent propensity to omit. First, a Monte Carlo simulation was used to investigate the accuracy of the parameters estimated from data with different sample sizes (500, 1,000, and 2,000 subjects), test lengths (20, 40, and 80 items), and percentages of omissions (5, 10, and 15%). Next, the appropriateness of the MCMO for the Trends in International Mathematics and Science Study (TIMSS) Advanced 2015 mathematics and physics multiple-choice items was analyzed and compared with Holman and Glas' Between-item Multi-dimensional IRT model (B-MIRT) and with the three-parameter logistic (3PL) model with omissions treated as incorrect responses. The results of the simulation study showed a good recovery of scale and position parameters. Pseudo-guessing parameters (d) were less accurate, but this inaccuracy did not seem to have an important effect on the estimation of abilities. The precision of the propensity-to-omit estimates depended strongly on the ability values (the higher the ability, the worse the estimate of the propensity to omit). In the empirical study, the empirical reliability of the ability estimates was high in both physics and mathematics. As in the simulation study, the estimates of the propensity to omit were less reliable and their precision varied with ability. Regarding the absolute item fit, the MCMO fitted the data better than the other models.
Also, the MCMO offered significant increments in convergent validity between scores from multiple-choice and constructed-response items, with an increase of around 0.02 to 0.04 in R² in comparison with the other two methods. Finally, the high correlation between the country means of the propensity to omit in mathematics and physics suggests that (1) the propensity to omit is somehow affected by the country of residence of the examinees, and (2) the propensity to omit is independent of the test contents.

Publication: Controlling for response biases in self-report scales: Forced-choice vs. psychometric modeling of Likert items (Frontiers Media, 2019-10-15)
Abad, Francisco; Ponsoda, Vicente; Nieto, María Dolores; Morillo, Daniel; Schames Kreitchmann, Rodrigo
One important problem in the measurement of non-cognitive characteristics such as personality traits and attitudes is that it has traditionally relied on Likert scales, which are susceptible to response biases such as social desirability (SDR) and acquiescent (ACQ) responding. Given the variability of these response styles in the population, ignoring their possible effects on the scores may compromise the fairness and the validity of the assessments. Also, response-style-induced errors of measurement can affect the reliability estimates and lead to overestimated convergent validity, because the scores correlate more highly with other Likert-scale-based measures. Conversely, such errors can attenuate the predictive power over non-Likert-based indicators, given that the scores contain more error. This study compares the validity of the Big Five personality scores obtained by: (1) ignoring SDR and ACQ in graded-scale items (GSQ), (2) accounting for SDR and ACQ with a compensatory IRT model, and (3) using forced-choice blocks with a multi-unidimensional pairwise preference model (MUPP) variant for dominance items.
The overall results suggest that ignoring SDR and ACQ offered the worst validity evidence, with a higher correlation between personality and SDR scores. The two remaining strategies have their own advantages and disadvantages. The results from the empirical reliability and the convergent validity analysis indicate that when modeling social desirability with graded-scale items, the SDR factor apparently captures part of the variance of the Agreeableness factor. On the other hand, the correlation between the corrected GSQ-based Openness to Experience scores and the University Access Examination grades was higher than the one with the uncorrected GSQ-based scores, and considerably higher than that using the estimates from the forced-choice data. Conversely, the criterion-related validity of the Forced Choice Questionnaire (FCQ) scores was similar to the results found in meta-analytic studies, correlating higher with Conscientiousness. Nonetheless, the FCQ scores had considerably lower reliabilities and would demand administering more blocks. Finally, the results are discussed, and some notes are provided for the treatment of SDR and ACQ in future studies.

Publication: Improving reliability estimation in cognitive diagnosis modeling (Springer, 2023-10-01)
Torre, Jimmy de la; Sorrel, Miguel A.; Nájera, Pablo; Abad, Francisco; Schames Kreitchmann, Rodrigo
Cognitive diagnosis models (CDMs) are used in educational, clinical, or personnel selection settings to classify respondents with respect to discrete attributes, identifying strengths and needs and thus making it possible to provide tailored training or treatment. As in any assessment, accurate reliability estimation is crucial for valid score interpretations. Most CDM reliability indices are based on the posterior probabilities of the estimated attribute profiles. These posteriors are traditionally computed using point estimates of the model parameters as approximations to their population values.
If the uncertainty around these parameters is unaccounted for, the posteriors may be overly peaked, leading to overestimated reliabilities. This article presents a multiple imputation (MI) procedure to integrate out the model parameters in the estimation of the posterior distributions, thus correcting the reliability estimation. A simulation study was conducted to compare the MI procedure with the traditional reliability estimation. Five factors were manipulated: the attribute structure, the CDM (DINA and G-DINA), test length, sample size, and item quality. Additionally, data from the Examination for the Certificate of Proficiency in English were analyzed as an illustration. The effect of sample size was studied by sampling subsets of subjects from the complete data. In both studies, the traditional reliability estimation systematically provided overestimated reliabilities, whereas the MI procedure offered more accurate results. Accordingly, practitioners in small educational or clinical settings should be aware that reliability estimates based on model parameter point estimates may be positively biased. R code for the MI procedure is made available.

Publication: On bank assembly and block selection in multidimensional forced-choice adaptive assessments (SAGE, 2023-04-01)
Sorrel, Miguel A.; Abad, Francisco; Schames Kreitchmann, Rodrigo
Multidimensional forced-choice (FC) questionnaires have been consistently found to reduce the effects of socially desirable responding and faking in non-cognitive assessments. Although FC has been considered problematic for providing ipsative scores under classical test theory, IRT models enable the estimation of non-ipsative scores from FC responses.
However, while some authors indicate that blocks composed of opposite-keyed items are necessary to retrieve normative scores, others suggest that these blocks may be less robust to faking, thus impairing the assessment validity. Accordingly, this article presents a simulation study to investigate whether it is possible to retrieve normative scores using only positively keyed items in pairwise FC computerized adaptive testing (CAT). Specifically, the simulation study addressed the effect of (1) different bank assembly methods (a randomly assembled bank, an optimally assembled bank, and blocks assembled on-the-fly considering every possible pair of items), and (2) block selection rules (i.e., the T-rule and the Bayesian D- and A-rules) on the accuracy of the estimates and on the ipsativity and overlap rates. Moreover, different questionnaire lengths (30 and 60) and trait structures (independent or positively correlated) were studied, and a non-adaptive questionnaire was included as a baseline in each condition. In general, very good trait estimates were retrieved, despite using only positively keyed items. Although the best trait accuracy and lowest ipsativity were found using the Bayesian A-rule with questionnaires assembled on-the-fly, the T-rule under this method led to the worst results. This points to the importance of considering both aspects when designing FC CAT.