ISSN 1016-5169 | E-ISSN 1308-4488
Can Large Language Models Guide Aortic Stenosis Management? A Comparative Analysis of ChatGPT and Gemini AI [Turk Kardiyol Dern Ars]
Turk Kardiyol Dern Ars. Ahead of Print: TKDA-54968 | DOI: 10.5543/tkda.2025.54968

Ali Sezgin¹, Veysel Ozan Tanık¹, Murat Akdoğan¹, Yusuf Bozkurt Şahin¹, Kürşat Akbuğa¹, Vedat Hekimsoy¹, Çağatay Tunca¹, Erhan Saraçoğlu¹, Bülent Özlek²
¹ Department of Cardiology, Ankara Etlik City Hospital, Ankara, Türkiye
² Department of Cardiology, Muğla Sıtkı Koçman University, School of Medicine, Muğla, Türkiye


OBJECTIVE
Aortic stenosis (AS) management requires the integration of complex clinical, imaging, and risk stratification data. Large language models (LLMs) such as ChatGPT and Gemini AI have shown promise in healthcare, but their performance in valvular heart disease, particularly AS, has not been thoroughly assessed. This study aimed to systematically compare ChatGPT and Gemini AI in addressing guideline-based and clinical scenario questions related to AS.

METHOD
Forty open-ended AS-related questions were developed, comprising 20 knowledge-based and 20 clinical scenario items based on the 2021 ESC/EACTS guidelines. Both models were independently queried. Responses were evaluated by two blinded cardiologists using a structured 4-point scoring system. Composite scores were categorized, and comparisons were made using Wilcoxon signed-rank and chi-square tests.
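As an illustration of the paired comparison named above, the following is a minimal sketch of how the Wilcoxon signed-rank statistic can be computed for two sets of scores given to the same questions. The function name `wilcoxon_signed_rank` and the example scores are hypothetical and are not taken from the study; the sketch returns only the rank sums, not a p-value, and it assumes the common convention of discarding zero differences and averaging tied ranks.

```python
def wilcoxon_signed_rank(x, y):
    """Return (W_plus, W_minus, n_nonzero) for paired samples x and y.

    Zero differences are dropped and tied absolute differences receive
    the average of the ranks they span (an assumed, common convention).
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)

    i = 0
    while i < len(ordered):
        # Find the end of the run of tied absolute differences.
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-indexed
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1

    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    return w_plus, w_minus, len(ordered)


# Hypothetical 4-point scores for six questions (illustrative only).
model_a_scores = [4, 4, 4, 3, 4, 4]
model_b_scores = [4, 3, 2, 4, 4, 1]
print(wilcoxon_signed_rank(model_a_scores, model_b_scores))
```

The smaller of the two rank sums is typically compared against a reference distribution to obtain the p-value; in practice a statistics package would be used for that step.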

RESULTS
Gemini AI achieved a significantly higher mean overall score than ChatGPT (3.96 ± 0.17 vs 3.56 ± 0.87; p = 0.003). Fully guideline-compliant responses were more frequent with Gemini AI (95.0%) than ChatGPT (72.5%), though the overall compliance distribution did not reach conventional significance (p = 0.067). Gemini AI performed more consistently across both question types. Inter-rater agreement was excellent for ChatGPT (κ = 0.94) and moderate for Gemini AI (κ = 0.66).
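For readers unfamiliar with the κ values reported above, the following is a minimal sketch of unweighted Cohen's kappa for two raters. Whether the study used the unweighted or a weighted form is not stated here, so the unweighted form is an assumption; the function name `cohens_kappa` and the example ratings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of category labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)

    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Agreement expected by chance, from each rater's marginal frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if expected == 1:
        # Both raters used a single identical category; kappa is undefined,
        # and returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical 4-point scores from two raters on eight responses.
rater_1 = [4, 4, 3, 2, 4, 1, 4, 3]
rater_2 = [4, 4, 3, 2, 4, 1, 3, 3]
print(round(cohens_kappa(rater_1, rater_2), 3))
```

Because kappa corrects for chance agreement, it can be comparatively low when almost all ratings fall into a single category, even if raw agreement is high; this is a general property of the statistic and may be relevant when scores cluster near the top of the scale.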

CONCLUSION
Gemini AI demonstrated superior accuracy, consistency, and guideline adherence compared to ChatGPT. While LLMs show potential as adjunctive tools in cardiovascular care, expert oversight remains essential, and further model refinement is needed before clinical integration, particularly in the management of AS.

Keywords: Aortic stenosis, artificial intelligence, clinical decision support, guideline adherence, large language models

Corresponding Author: Bülent Özlek, Türkiye
Manuscript Language: English


Journal Metrics

Journal Citation Indicator: 0.18
CiteScore: 1.1
Source Normalized Impact per Paper: 0.22
SCImago Journal Rank: 0.348


Copyright © 2025 Archives of the Turkish Society of Cardiology



Kare Publishing is a subsidiary of Kare Media.