AI Framework Improves Factuality of Medical Summaries

Artificial Intelligence and Medical Facts: A New Approach for Evaluating Lay Summaries
The automated creation of medical summaries in plain language, known as Plain Language Summaries (PLS), offers great potential for making complex medical information accessible to a wide audience. At the same time, the use of large language models carries the risk of hallucinations, i.e., the generation of false or misleading information. This is particularly problematic in the medical context, as misinformation can have serious consequences for patients' health.
Previous methods for evaluating the factuality of texts, such as approaches based on entailment or question-answering, encounter difficulties when evaluating PLS. One reason for this is the phenomenon of elaborative explanation. PLS often contain additional information not present in the original document, such as definitions, background information, or examples. These additions serve to improve understanding, but can make factuality assessment difficult because they cannot be directly compared with the source text.
A new approach to this problem is PlainQAFact, a framework for automatically evaluating the factuality of biomedical PLS. PlainQAFact builds on PlainFact, a fine-grained, human-annotated dataset. PlainFact distinguishes between sentence-level factuality types, enabling separate evaluation of sentences that summarize information from the source document and sentences that add elaborative explanations.
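To make that distinction concrete, a record in such a dataset might look like the sketch below. The field names and the identifier are illustrative assumptions for this article, not PlainFact's actual schema:

```python
# Illustrative record shape only; the real PlainFact schema may differ.
plainfact_example = {
    "summary_sentence": "Hypertension means abnormally high blood pressure.",
    "source_abstract_id": "ABSTRACT-0001",  # hypothetical identifier
    "factuality_type": "elaboration",       # vs. "summary": restates source content
    "needs_external_knowledge": True,       # elaborations cannot be checked
                                            # against the source text alone
}
```

The key point is the label: a "summary" sentence can be verified directly against the source abstract, while an "elaboration" must be checked against external knowledge.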
The framework works in two steps. First, the factuality type of each sentence is classified. Then its factuality is scored with a retrieval-augmented, QA-based procedure: questions are generated from the sentence and answered using relevant retrieved documents, and the agreement between these answers and the sentence under evaluation serves as the factuality score.
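The two-step flow described above can be sketched as follows. The classifier, retriever, and agreement scorer here are deliberately naive word-overlap stand-ins (the real framework uses trained language-model components); only the control flow is illustrated:

```python
# Sketch of a classify-then-score factuality pipeline in the spirit of
# PlainQAFact. All three components are toy word-overlap stand-ins.
from dataclasses import dataclass


def tokens(text: str) -> list[str]:
    return [w.strip(".,;:()").lower() for w in text.split()]


def classify_factuality_type(sentence: str, source: str) -> str:
    # Stand-in classifier: if every content word of the sentence occurs in
    # the source, treat it as a summary; otherwise as an elaboration that
    # needs external knowledge for verification.
    content = {w for w in tokens(sentence) if len(w) > 3}
    return "summary" if content <= set(tokens(source)) else "elaboration"


def retrieve(sentence: str, corpus: list[str], k: int = 1) -> list[str]:
    # Stand-in retriever: rank documents by word overlap with the sentence.
    words = set(tokens(sentence))
    ranked = sorted(corpus, key=lambda doc: -len(words & set(tokens(doc))))
    return ranked[:k]


def qa_agreement(sentence: str, evidence: list[str]) -> float:
    # Stand-in for QA-based scoring: fraction of the sentence's content
    # words supported by (i.e. present in) the retrieved evidence.
    content = [w for w in tokens(sentence) if len(w) > 3]
    pool = set()
    for doc in evidence:
        pool |= set(tokens(doc))
    return sum(w in pool for w in content) / len(content) if content else 1.0


@dataclass
class SentenceResult:
    sentence: str
    factuality_type: str
    score: float


def evaluate(sentence: str, source: str, knowledge_base: list[str]) -> SentenceResult:
    ftype = classify_factuality_type(sentence, source)
    # Summaries are checked against the source itself; elaborations against
    # retrieved external knowledge, mirroring the retrieval-augmented step.
    evidence = [source] if ftype == "summary" else retrieve(sentence, knowledge_base)
    return SentenceResult(sentence, ftype, qa_agreement(sentence, evidence))
```

For example, against a source abstract about a blood-pressure trial, a restated finding would be routed to the "summary" branch and checked against the abstract, while a sentence defining hypertension would be routed to the "elaboration" branch and checked against retrieved reference documents.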
PlainQAFact is characterized by its low complexity and high computational efficiency. Empirical results show that existing metrics for factuality evaluation reach their limits with PLS, especially with elaborative explanations. PlainQAFact, on the other hand, achieves state-of-the-art performance. Further analyses of the framework's effectiveness regarding various external knowledge sources, answer extraction strategies, overlap measures, and document granularities support the robustness of the approach.
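One standard overlap measure of the kind such ablations compare is SQuAD-style token-level F1 between a predicted answer and a reference answer. Whether PlainQAFact uses exactly this variant is an assumption here; a minimal version looks like this:

```python
# SQuAD-style token-level F1 between two answer strings: the harmonic mean
# of token precision and recall, with multiset (Counter) overlap.
from collections import Counter


def token_f1(pred: str, gold: str) -> float:
    pred_toks = pred.lower().split()
    gold_toks = gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)  # multiset intersection
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

Exact matches score 1.0, disjoint answers 0.0, and partial matches fall in between, which makes the measure more forgiving than exact-match comparison when answers are phrased slightly differently.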
PlainQAFact and Mindverse: Potential for the Future
The development of robust and efficient methods for factuality evaluation is a crucial step for the use of AI in sensitive areas such as medicine. PlainQAFact offers a promising approach here. This opens up new possibilities for companies like Mindverse, which offer AI-powered content solutions.
Integrating PlainQAFact into the content creation pipeline could significantly improve the quality and reliability of automatically generated medical summaries. This would not only strengthen user trust in AI-generated content but also further unlock the potential of AI in healthcare.
The combination of advanced AI models and reliable evaluation methods like PlainQAFact enables the development of innovative solutions that meet the needs of both medical professionals and patients.
Bibliography:
You, Z., & Guo, Y. (2024). PlainQAFact: Automatic Factuality Evaluation Metric for Biomedical Plain Language Summaries Generation. *Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)*, 5805–5824.
Maynez, J., Narayan, S., Bohnet, B., & McDonald, R. (2020). On Faithfulness and Factuality in Abstractive Summarization. *Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics*, 1906–1919.
Kryscinski, W., Min, B., Mostafazadeh, N., Rajani, N. F., Zhang, Z., Song, X., Daniel, M., Gottumukkala, A., Gui, L., Han, S., Hu, W., Ji, Y., Liu, P., Nadeem, M., Parmavik, P., Radhakrishnan, J., Rekabsaz, N., Sagot, B., Shakeri, H., Wu, H., Wu, X., Yu, H., Yuan, W., & Zhao, W. (2023). Do Automatic Factuality Metrics Measure Factuality? A Critical Evaluation. *Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)*, 5759–5775.