Relevance of automated generated short summaries of scientific abstract: use case scenario in healthcare

Gregor Štiglic, Kasandra Musović, Lucija Gosak, Nino Fijačko, Primož Kocbek

2022-06-16

PDF Code Project DOI

Image credit: IEEE

Abstract

The recent development and successful deployment of large pre-trained natural language models in few-shot and zero-shot scenarios enabled impressive results in different downstream tasks. One such task is abstractive text summarization combining understanding, information compression, and language generation, which might be of great potential in healthcare, where time is a premium. A potential real-world scenario is explored, where the relevance of extremely short summaries from the latest scientific literature is generated by state-of-the-art (SOTA) models and is evaluated by healthcare experts, and can be seen as support at the point of care. In a small-scale study, a baseline fine-tuned model (Semantic Scholar TLDR) was compared with three SOTA models (OpenAI Curie, OpenAI Davinci, and PEGASUS-XSum). Healthcare experts evaluated the abstracts and extremely short summaries considering the scenario and it was observed that in terms of relevance, the OpenAI models and Semantic Scholar TLDR model differed just slightly, i.e. OpenAI Curie model had the highest average score of 4.59 (SD=1.69), followed by with 4.58 (SD=1.58) and OpenAI Davinci with 4.48 (SD=1.87). On the other hand, the PEGASUS-XSum model’s relevance was significantly lower, with 4.01 (SD=1.81). A deeper analysis of selected short summaries has shown that some concepts are difficult to understand for AI models that still have difficulty to “understand”, which often results in uninformative or false facts. One should be aware that extreme summarization using AI-based approaches is still a relatively new field of research and the technology is still not ready to be used in clinical practice. Our small-sample study indicates that it could already support the healthcare experts in the decision-making process.

Type

Conference paper

Publication

In 2022 IEEE 10th International Conference on Healthcare Informatics, Jun 11.-14., Rochester, Minnesota, USA, pp. 599-605

Gregor Štiglic

Associate Professor and head of Research Institute

My research interests include predictive models in healthcare, interpretability of complex models.

Kasandra Musović

PhD Student

My research interests include the newest pedagogical technologies in different healthcare fields and their effect on individual persons. Specific areas of interest include how serious game in gamification affect the level of physiological and psychological aspects in critical situations, such as cardiopulmonary resuscitation.

Nino Fijačko

PhD Student

Primož Kocbek

PhD Student

My research interests include statistical models and machine learning techniques with applications in healthcare. My specific areas of interest include temporal data analysis, interpretability of prediction models, stability of algorithms, advanced machine learning methods on massive datasets, e.g. deep neural networks.

Relevance of automated generated short summaries of scientific abstract: use case scenario in healthcare

Abstract

Gregor Štiglic

Associate Professor and head of Research Institute

Kasandra Musović

PhD Student

Lucija Gosak

PhD Student

Nino Fijačko

PhD Student

Primož Kocbek

PhD Student

Related