Chief Medical Information Officer, San Joaquin General Hospital, California
Background & Introduction: Stigmatizing language in the electronic health record (EHR) has been associated with adverse patient experiences and disparities in substance use disorder care, including opioid use disorder (OUD). Prior studies have shown that terminology choice in clinical documentation may influence patient trust, engagement, and continuity of care. However, scalable methods to audit documentation language in routine clinical records remain limited. Existing approaches have relied primarily on manual chart review or keyword-based natural language processing (NLP) to identify overt labels, which lack contextual interpretation and are difficult to deploy at scale.
Discharges against medical advice (AMA) represent a clinically meaningful outcome in OUD care, yet differences in documentation language associated with AMA have not been well characterized within diagnostically homogeneous populations. Recent advances in locally deployable large language models (LLMs) offer new opportunities for contextual language analysis while preserving patient privacy.
The objective of this study was to develop and evaluate a privacy-preserving, local LLM–enabled pipeline to audit documentation language in OUD admissions and to compare predefined linguistic markers historically considered stigmatizing between AMA and non-AMA cohorts.
Methods: We conducted a retrospective cohort study of 477 OUD-associated inpatient admissions, analyzing discharge summaries extracted from the MIMIC-IV database. To ensure data privacy and HIPAA alignment, all analyses were performed locally using an on-device, privacy-preserving deployment of the Llama-3.3-70B large language model. A hybrid NLP–LLM pipeline was employed: first, a predefined 14-term lexicon, informed by national guidelines, identified sentences containing candidate substance-use-related terminology. These candidate sentences were then evaluated by the locally deployed LLM using behavioral-bias–focused prompts to classify contextual usage as stigmatizing or neutral clinical language. To improve construct validity, sentences documenting administrative AMA disposition were excluded from stigma counts using a keyword-based text filter, ensuring that flags reflected clinician-attributed narrative rather than the discharge event itself. Automated classifications were supported by manual expert review of a random sample of flagged sentences (n = 100). Statistical significance was assessed using Fisher’s exact test for stigma prevalence and the Mann–Whitney U test for stigma-marker frequency per admission.
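The first, deterministic stage of such a pipeline can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the abstract names only "abuse," "IVDU," and "refused" from the 14-term lexicon, so the remaining patterns and the administrative-AMA filter phrases below are assumptions.

```python
import re

# Hypothetical subset of the 14-term lexicon; only "abuse", "IVDU", and
# "refused" are named in the abstract, the rest are illustrative assumptions.
LEXICON = [r"\babuse[rd]?\b", r"\bIVDU\b", r"\brefused\b", r"\baddict\b"]

# Administrative-disposition phrases excluded from stigma counts (assumed).
AMA_ADMIN = [r"against medical advice", r"\bAMA\b"]

def candidate_sentences(note: str) -> list[str]:
    """Stage 1: lexicon screen over sentences, minus administrative AMA
    statements. Surviving sentences would then go to the local LLM for
    contextual adjudication as stigmatizing vs. neutral usage."""
    out = []
    for sent in re.split(r"(?<=[.!?])\s+", note):
        if any(re.search(p, sent, re.IGNORECASE) for p in AMA_ADMIN):
            continue  # disposition statement, not clinician framing
        if any(re.search(p, sent, re.IGNORECASE) for p in LEXICON):
            out.append(sent)
    return out
```

Filtering disposition statements before counting is what keeps the outcome (AMA discharge) from mechanically inflating the exposure (stigmatizing language).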
Results: The automated audit identified a significant disparity in documentation patterns between groups. Stigmatizing language was present in 64.2% (145/226) of discharge summaries among admissions resulting in discharge against medical advice (AMA) compared with 42.6% (107/251) in the non-AMA cohort. After exclusion of administrative AMA terminology, the AMA cohort had 2.41 times the odds of containing stigmatizing language (OR 2.41; 95% CI 1.67–3.49; p < 0.0001, Fisher’s exact test). Non-parametric analysis confirmed a significantly higher frequency of stigma markers per admission in the AMA cohort (p < 0.0001, Mann–Whitney U test).
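The reported effect size can be reproduced directly from the counts above. The sketch below recomputes the odds ratio and a Woolf (log-OR) 95% interval; the abstract's interval was presumably computed by an exact method, so the approximate lower bound differs slightly (≈1.66 vs. 1.67).

```python
import math

# 2x2 table from the reported counts: rows = cohort, cols = flagged / not flagged
a, b = 145, 226 - 145   # AMA cohort: 145 flagged, 81 not
c, d = 107, 251 - 107   # non-AMA cohort: 107 flagged, 144 not

odds_ratio = (a * d) / (b * c)  # (145/81) / (107/144) ≈ 2.41, as reported

# Woolf approximation: 95% CI on the log odds ratio
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se)  # ≈ 1.66 (reported exact: 1.67)
hi = math.exp(math.log(odds_ratio) + 1.96 * se)  # ≈ 3.49
```

In practice, `scipy.stats.fisher_exact` on the same 2×2 table yields the exact p-value reported in the abstract.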
The most prevalent markers were legacy substance-use labels such as “abuse” and “IVDU,” as well as behavioral descriptors such as “refused.” The locally deployed LLM served as a rule-constrained contextual adjudicator to validate candidate stigmatizing language identified by deterministic NLP. These findings suggest that documentation differences associated with AMA discharge within an OUD-only cohort are driven by lexical choices and narrative framing rather than diagnostic content alone.
Conclusion & Discussion: In this retrospective analysis of discharge summaries from an OUD-only inpatient cohort, the AMA cohort demonstrated a higher prevalence and frequency of predefined stigmatizing lexical markers compared with the non-AMA cohort. This disparity persisted after excluding administrative AMA terminology and neutral diagnostic language, suggesting that differences are driven by narrative framing rather than clinical content.
These findings demonstrate the feasibility of combining deterministic NLP with privacy-preserving, local LLMs to audit documentation at scale. Here, the LLM functioned as a rule-constrained, contextual adjudicator to validate candidate language rather than inferring clinician intent. This offers a governance-aligned method for health systems to evaluate documentation practices without external data exposure.
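The adjudicator role described here can be made rule-constrained by design: a fixed prompt template and a strict, one-word verdict parser, with no free-form inference of intent. The prompt wording below is an assumption (the study's actual prompts are not given in the abstract), and the model call itself is omitted to stay runtime-agnostic.

```python
def build_adjudication_prompt(sentence: str) -> str:
    """Assumed prompt shape for a local Llama-3.3-70B adjudicator; the
    study's actual behavioral-bias-focused prompt is not published here."""
    return (
        "You audit clinical documentation for stigmatizing framing of "
        "substance use. Classify the flagged terminology in the sentence "
        "as STIGMATIZING (pejorative or behavioral framing of the patient) "
        "or NEUTRAL (clinical or diagnostic usage). Answer with one word.\n\n"
        f"Sentence: {sentence}"
    )

def parse_verdict(reply: str) -> bool:
    """Rule-constrained parse: count a flag only on an explicit
    STIGMATIZING verdict; anything else defaults to neutral."""
    return reply.strip().upper().startswith("STIGMATIZING")
```

Defaulting ambiguous replies to neutral keeps the pipeline conservative, consistent with the confirmatory role described above.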
Limitations include the use of retrospective discharge summaries and a focus on predefined markers rather than implicit stigma. Additionally, these observed associations do not imply causality. Finally, the LLM primarily served a confirmatory role and was not formally evaluated for balanced discrimination, including its recall for non-stigmatizing language. Despite these limitations, this work supports the use of local, auditable language models as a practical, scalable tool for quality improvement and documentation review in OUD care.
References:
Barcelona V, Scharp D, Idnay BR, Moen H, Cato K, Topaz M. Identifying stigmatizing language in clinical documentation: a scoping review of emerging literature. PLoS One. 2024;19(6):e0303653. doi:10.1371/journal.pone.0303653
Chen H, Alfred M, Cohen E. Efficient detection of stigmatizing language in electronic health records via in-context learning: comparative analysis and validation study. JMIR Med Inform. 2025;13:e68955. doi:10.2196/68955
Goddu AP, O'Conor KJ, Lanzkron S, et al. Do words matter? Stigmatizing language and the transmission of bias in the medical record. J Gen Intern Med. 2018;33(5):685-691. doi:10.1007/s11606-017-4289-2
Johnson AEW, Pollard TJ, Shen L, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. 2023;10:1. doi:10.1038/s41597-022-01899-x
Sethi R, Caskey J, Gao Y, et al. Detecting stigmatizing language in clinical notes with large language models for addiction care. npj Health Syst. 2026;3:15. doi:10.1038/s44401-026-00069-0
Disclosure(s):
Joseph A. Izzo, MD: No financial relationships to disclose
Learning Objectives:
Identify the significant disparity in stigmatizing language documented in the electronic health records of patients with opioid use disorder who are discharged against medical advice compared with those discharged routinely.
Evaluate the feasibility of using locally deployed, privacy-preserving large language models as a scalable tool for auditing clinical documentation and narrative framing in addiction medicine.
Discuss how narrative bias in clinical documentation acts as a modifiable, system-level contributor to health inequities and treatment fragmentation for vulnerable patient populations.