Assessing the Ability of Artificial Intelligence-Driven Language Processing Frameworks to Create Patient-Oriented Medical Education Material on Hypothermia
DOI: https://doi.org/10.62573/knxh0m41
Keywords: artificial intelligence, hypothermia, patient education
Abstract
Introduction: Artificial Intelligence-Driven Language Processing Frameworks (AI-LPFs) such as ChatGPT, Grok, and Gemini are increasingly being explored for their ability to generate patient-oriented medical education material (PEM). While prior studies have assessed AI-generated PEM in various medical fields, their applicability to operational medicine remains understudied. Given the significance of hypothermia in operational and civilian settings, this study evaluates the quality and readability of AI-generated PEM on hypothermia.
Methods: Three AI-LPFs (ChatGPT-4, Grok-3, and Gemini 2.0 Flash) were prompted to generate PEM on hypothermia. Readability was assessed using the Flesch-Kincaid reading grade level and Flesch Reading Ease Score (FRE). Additional text metrics included PEM length, the proportion of complex words and sentences, and average sentence and word length. The quality of AI-generated PEM was scored using the CDC Clear Communication Index (CCI), and content accuracy was assessed through fact-checking against the Wilderness Medical Society guidelines. A benchmark PEM from the American Red Cross was included for comparison.
Results: Readability analysis showed that the PEM from Gemini and the American Red Cross met NIH recommendations for an 8th-grade reading level, whereas the PEM from ChatGPT and Grok were slightly above this threshold. Grok generated the most comprehensive PEM and was the only one to categorize hypothermia as mild, moderate, or severe, in line with Wilderness Medical Society guidelines. Unlike the other AI-generated PEM, it also addressed both EMS activation and CPR. The PEM from Grok scored the highest on the CDC CCI, outperforming the other AI-generated PEM and the benchmark from the American Red Cross. A manual review confirmed that all AI-generated PEM were factually accurate.
Conclusion: AI-LPFs successfully produced factually accurate PEM on hypothermia, with Grok generating the most comprehensive material. These findings suggest AI-LPFs have potential for enhancing public education on operational medicine topics. Further refinement of AI-generated PEM to improve readability and adherence to established guidelines may enhance their utility as reliable educational tools.
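The two readability metrics named in the Methods follow standard published formulas, which can be sketched in a few lines. The example below is illustrative only and is not the tool used in the study: the function names `count_syllables` and `readability` are hypothetical, and the syllable counter is a rough vowel-group heuristic, so its scores may differ from those of dedicated readability software.

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' as silent ("make"), but not in "-le" endings ("stable").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict:
    """Return Flesch Reading Ease and Flesch-Kincaid grade level for text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")

    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)

    return {
        # Higher scores indicate easier text.
        "flesch_reading_ease": 206.835
        - 1.015 * words_per_sentence
        - 84.6 * syllables_per_word,
        # Approximate US school grade needed to read the text.
        "flesch_kincaid_grade": 0.39 * words_per_sentence
        + 11.8 * syllables_per_word
        - 15.59,
    }


if __name__ == "__main__":
    sample = "Move the person to a warm place. Remove any wet clothing."
    print(readability(sample))
```

Both scores depend only on average sentence length and average syllables per word, which is why shorter sentences and simpler vocabulary move a text toward the 8th-grade target mentioned in the Results.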
License
Copyright (c) 2025 Adam Schwartz, Alfred Urba (Author)

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under a Creative Commons Attribution-NonCommercial (CC BY-NC) 4.0 License that allows others to share the work with an acknowledgment of the work’s authorship and initial publication in this journal.