Publication:
None of the above: comparing scenarios for answerability detection in question answering systems

dc.contributor.author: Reyes Montesinos, Julio
dc.contributor.author: Rodrigo Yuste, Álvaro
dc.contributor.author: Peñas Padilla, Anselmo
dc.date.accessioned: 2025-08-06T06:03:46Z
dc.date.available: 2025-08-06T06:03:46Z
dc.date.issued: 2025-07-04
dc.description: The registered version of this article, first published in Applied Intelligence 55, 887 (2025), is available online from the publisher's website: https://doi.org/10.1007/s10489-025-06765-y.
dc.description.abstract: Question Answering (QA) is often used to assess the reasoning capabilities of NLP systems. For a QA system, it is crucial to be able to determine answerability: whether the question can be answered with the information at hand. Previous works have studied answerability by including a fixed proportion of unanswerable questions in a collection, without explaining the reasons for that proportion or its impact on systems' results. Furthermore, they do not answer the question of whether systems actually learn to determine answerability. This work aims to answer that question, providing a systematic analysis of how the ratio of unanswerable questions in training data affects QA systems. To that end, we create a series of versions of the well-known multiple-choice QA dataset RACE by modifying different proportions of questions to make them unanswerable, and then train and evaluate several Large Language Models on them. We show that LLMs tend to overfit the distribution of unanswerable questions encountered during training, while the ability to decide on answerability always comes at the expense of finding the answer when it exists. Our experiments also show that a proportion of unanswerable questions around 30%, as found in existing datasets, produces the most discriminating systems. We hope these findings offer useful guidelines for future dataset designers looking to address the problem of answerability.
dc.description.version: published version
dc.identifier.citation: Reyes-Montesinos, J., Rodrigo, Á. & Peñas, A. None of the above: comparing scenarios for answerability detection in question answering systems. Appl Intell 55, 887 (2025). https://doi.org/10.1007/s10489-025-06765-y
dc.identifier.doi: https://doi.org/10.1007/s10489-025-06765-y
dc.identifier.issn: 0924-669X; eISSN: 1573-7497
dc.identifier.uri: https://hdl.handle.net/20.500.14468/29834
dc.journal.title: Applied Intelligence
dc.journal.volume: 55
dc.language.iso: en
dc.page.initial: 887
dc.publisher: Springer
dc.relation.center: E.T.S. de Ingeniería Informática
dc.relation.department: Lenguajes y Sistemas Informáticos
dc.rights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/deed.es
dc.subject: 1203.17 Informática
dc.subject.keywords: Question answering
dc.subject.keywords: Answerability
dc.subject.keywords: Multiple choice
dc.title: None of the above: comparing scenarios for answerability detection in question answering systems
dc.type: journal article
dspace.entity.type: Publication
relation.isAuthorOfPublication: fd81aefe-8163-4abc-950e-1d764f1ff4c6
relation.isAuthorOfPublication: 90ababf8-3bd1-44b2-9d12-368f2c6568ac
relation.isAuthorOfPublication: 1e1b14bc-1284-4aef-908c-bccf31bd055e
relation.isAuthorOfPublication.latestForDiscovery: fd81aefe-8163-4abc-950e-1d764f1ff4c6
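
The abstract above describes building versions of RACE in which a controlled fraction of questions is made unanswerable. As a rough illustration of that idea, here is a minimal Python sketch; the record format (an options list plus an answer index), the "None of the above" substitution strategy, and the function names are all assumptions made for illustration, not the authors' actual procedure:

```python
import random

def make_unanswerable(question: dict) -> dict:
    """Replace the gold option with "None of the above" so that no
    listed option answers the question. Field names are hypothetical."""
    q = dict(question)
    options = list(q["options"])
    options[q["answer_idx"]] = "None of the above"
    q.update(options=options, answer_idx=None, answerable=False)
    return q

def build_variant(questions, unanswerable_ratio, seed=0):
    """Return one dataset version in which `unanswerable_ratio` of the
    questions (e.g., 0.3 for ~30%) have been made unanswerable."""
    rng = random.Random(seed)
    n_flip = round(len(questions) * unanswerable_ratio)
    flip = set(rng.sample(range(len(questions)), n_flip))
    return [make_unanswerable(q) if i in flip else dict(q, answerable=True)
            for i, q in enumerate(questions)]

# Toy usage: build training variants at several ratios.
toy = [{"question": "2+2?", "options": ["3", "4", "5", "6"], "answer_idx": 1}
       for _ in range(10)]
for ratio in (0.0, 0.1, 0.3, 0.5):
    variant = build_variant(toy, ratio)
    print(ratio, sum(not q["answerable"] for q in variant), "unanswerable")
```

Fixing the random seed keeps the answerable/unanswerable split reproducible across ratio settings, so differences in system behaviour can be attributed to the ratio rather than to which particular questions were modified.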
Files
Original bundle
Name: ReyesMontesinos_Julio_None_of_the_above.pdf
Size: 3.55 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 3.62 KB
Description: Item-specific license agreed to upon submission