Hernández Lorenzo, Laura

Hernández Lorenzo
    From stage to page: Stylistic variation in fictional speech
    (De Gruyter, 2024) Šeļa, Artjoms; Nagy, Ben; Byszuk, Joanna; Hernández Lorenzo, Laura; Szemes, Botond; Eder, Maciej
    Stylometry is mostly applied to authorial style. More recently, researchers have begun investigating the style of characters, finding that although there is detectable stylistic variation, the variation remains within authorial bounds. In this article, we address the stylistic distinctiveness of characters in drama. Our primary contribution is methodological; we introduce and evaluate two non-parametric methods to produce a summary statistic for character distinctiveness that can be usefully applied and compared across languages and times. This is a significant advance – previous approaches have either been based on pairwise similarities (which cannot be easily compared) or indirect methods that attempt to infer distinctiveness using classification accuracy. Our first method is based on bootstrap distances between 3-gram probability distributions, the second (reminiscent of ‘unmasking’ techniques) on word keyness curves. Both methods are validated and explored by applying them to a reasonably large corpus (a subset of DraCor): we analyze 3301 characters drawn from 2324 works, covering five centuries and four languages (French, German, Russian, and the works of Shakespeare). Both methods appear useful; the 3-gram method is statistically more powerful, but the word keyness method offers rich interpretability. Both methods are able to capture phonological differences such as accent or dialect, as well as broad differences in topic and lexical richness. Based on exploratory analysis, we find that smaller characters tend to be more distinctive and that women are cross-linguistically more distinctive than men, with this latter finding carefully interrogated using multiple regression. This greater distinctiveness stems from a historical tendency for female characters to be restricted to an ‘internal narrative domain’ covering mainly direct discourse and family/romantic themes. It is hoped that direct, comparable statistical measures will form a basis for more sophisticated future studies, and advances in theory.