Chief Medical Information Officer, San Joaquin General Hospital, California
Background & Introduction: Stigmatizing language in the electronic health record (EHR) has been associated with adverse patient experiences and disparities in substance use disorder care, including opioid use disorder (OUD). Prior studies have shown that terminology choice in clinical documentation may influence patient trust, engagement, and continuity of care. However, scalable methods to audit documentation language in routine clinical records remain limited. Existing approaches have relied primarily on manual chart review or keyword-based natural language processing (NLP) to identify overt labels, which lack contextual interpretation and are difficult to deploy at scale.
Discharges against medical advice (AMA) represent a clinically meaningful outcome in OUD care, yet differences in documentation language associated with AMA have not been well characterized within diagnostically homogeneous populations. Recent advances in locally deployable large language models (LLMs) offer new opportunities for contextual language analysis while preserving patient privacy.
The objective of this study was to develop and evaluate a privacy-preserving, local LLM–enabled pipeline to audit documentation language in OUD admissions and to compare predefined linguistic markers historically considered stigmatizing between AMA and non-AMA cohorts.
Methods: We conducted a retrospective cohort study of 477 OUD-associated inpatient admissions, analyzing discharge summaries extracted from the MIMIC-IV database. To ensure data privacy and HIPAA alignment, all analyses were performed locally using an on-device, privacy-preserving deployment of the Llama-3.3-70B large language model. A hybrid NLP–LLM pipeline was employed: first, a predefined 14-term lexicon, informed by national guidelines, identified sentences containing candidate substance-use-related terminology. These candidate sentences were then evaluated by the locally deployed LLM using behavioral-bias–focused prompts to classify contextual usage as stigmatizing or neutral clinical language. To improve construct validity, sentences documenting administrative AMA disposition were excluded from stigma counts using a keyword-based text filter, ensuring that flags reflected clinician-attributed narrative rather than the discharge event itself. Automated classifications were supported by manual expert review of a random sample of flagged sentences (n = 100). Statistical significance was assessed using Fisher’s exact test for stigma prevalence and the Mann–Whitney U test for stigma-marker frequency per admission.
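The first, deterministic stage of such a pipeline can be sketched in a few lines. This is a minimal illustration, not the study's implementation: the abstract names only "abuse," "IVDU," and "refused" from the 14-term lexicon, so the remaining patterns and the administrative-AMA filter phrases below are assumptions.

```python
import re

# Hypothetical subset of the 14-term lexicon; only "abuse", "IVDU", and
# "refused" are named in the abstract, the rest are illustrative assumptions.
LEXICON = [r"\babuse[rd]?\b", r"\bIVDU\b", r"\brefused\b", r"\baddict\b"]

# Administrative-disposition phrases excluded from stigma counts (assumed).
AMA_ADMIN = [r"against medical advice", r"\bAMA\b"]

def candidate_sentences(note: str) -> list[str]:
    """Stage 1: lexicon screen over sentences, minus administrative AMA
    statements. Surviving sentences would then go to the local LLM for
    contextual adjudication as stigmatizing vs. neutral usage."""
    out = []
    for sent in re.split(r"(?<=[.!?])\s+", note):
        if any(re.search(p, sent, re.IGNORECASE) for p in AMA_ADMIN):
            continue  # disposition statement, not clinician framing
        if any(re.search(p, sent, re.IGNORECASE) for p in LEXICON):
            out.append(sent)
    return out
```

Filtering disposition statements before counting is what keeps the outcome (AMA discharge) from mechanically inflating the exposure (stigmatizing language).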
Results: The automated audit identified a significant disparity in documentation patterns between groups. Stigmatizing language was present in 64.2% (145/226) of discharge summaries among admissions resulting in discharge against medical advice (AMA) compared with 42.6% (107/251) in the non-AMA cohort. After exclusion of administrative AMA terminology, the AMA cohort had 2.41 times the odds of containing stigmatizing language (OR 2.41; 95% CI 1.67–3.49; p < 0.0001, Fisher’s exact test). Non-parametric analysis confirmed a significantly higher frequency of stigma markers per admission in the AMA cohort (p < 0.0001, Mann–Whitney U test).
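The reported effect size can be reproduced directly from the counts above. The sketch below recomputes the odds ratio and a Woolf (log-OR) 95% interval; the abstract's interval was presumably computed by an exact method, so the approximate lower bound differs slightly (≈1.66 vs. 1.67).

```python
import math

# 2x2 table from the reported counts: rows = cohort, cols = flagged / not flagged
a, b = 145, 226 - 145   # AMA cohort: 145 flagged, 81 not
c, d = 107, 251 - 107   # non-AMA cohort: 107 flagged, 144 not

odds_ratio = (a * d) / (b * c)  # (145/81) / (107/144) ≈ 2.41, as reported

# Woolf approximation: 95% CI on the log odds ratio
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
lo = math.exp(math.log(odds_ratio) - 1.96 * se)  # ≈ 1.66 (reported exact: 1.67)
hi = math.exp(math.log(odds_ratio) + 1.96 * se)  # ≈ 3.49
```

In practice, `scipy.stats.fisher_exact` on the same 2×2 table yields the exact p-value reported in the abstract.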
The most prevalent markers were legacy substance-use labels such as “abuse” and “IVDU,” as well as behavioral descriptors such as “refused.” The locally deployed LLM served as a rule-constrained contextual adjudicator to validate candidate stigmatizing language identified by deterministic NLP. These findings suggest that documentation differences associated with AMA discharge within an OUD-only cohort are driven by lexical choices and narrative framing rather than diagnostic content alone.
Conclusion & Discussion: In this retrospective analysis of discharge summaries from an OUD-only inpatient cohort, the AMA cohort demonstrated a higher prevalence and frequency of predefined stigmatizing lexical markers compared with the non-AMA cohort. This disparity persisted after excluding administrative AMA terminology and neutral diagnostic language, suggesting that differences are driven by narrative framing rather than clinical content.
These findings demonstrate the feasibility of combining deterministic NLP with privacy-preserving, local LLMs to audit documentation at scale. Here, the LLM functioned as a rule-constrained, contextual adjudicator to validate candidate language rather than inferring clinician intent. This offers a governance-aligned method for health systems to evaluate documentation practices without external data exposure.
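The adjudicator role described here can be made rule-constrained by design: a fixed prompt template and a strict, one-word verdict parser, with no free-form inference of intent. The prompt wording below is an assumption (the study's actual prompts are not given in the abstract), and the model call itself is omitted to stay runtime-agnostic.

```python
def build_adjudication_prompt(sentence: str) -> str:
    """Assumed prompt shape for a local Llama-3.3-70B adjudicator; the
    study's actual behavioral-bias-focused prompt is not published here."""
    return (
        "You audit clinical documentation for stigmatizing framing of "
        "substance use. Classify the flagged terminology in the sentence "
        "as STIGMATIZING (pejorative or behavioral framing of the patient) "
        "or NEUTRAL (clinical or diagnostic usage). Answer with one word.\n\n"
        f"Sentence: {sentence}"
    )

def parse_verdict(reply: str) -> bool:
    """Rule-constrained parse: count a flag only on an explicit
    STIGMATIZING verdict; anything else defaults to neutral."""
    return reply.strip().upper().startswith("STIGMATIZING")
```

Defaulting ambiguous replies to neutral keeps the pipeline conservative, consistent with the confirmatory role described above.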
Limitations include the use of retrospective discharge summaries and a focus on predefined markers rather than implicit stigma. Additionally, these observed associations do not imply causality. Finally, the LLM primarily served a confirmatory role and was not formally evaluated for balanced discrimination, including its recall for non-stigmatizing language. Despite these limitations, this work supports the use of local, auditable language models as a practical, scalable tool for quality improvement and documentation review in OUD care.
References:
Barcelona V, Scharp D, Idnay BR, Moen H, Cato K, Topaz M. Identifying stigmatizing language in clinical documentation: a scoping review of emerging literature. PLoS One. 2024;19(6):e0303653. doi:10.1371/journal.pone.0303653
Chen H, Alfred M, Cohen E. Efficient detection of stigmatizing language in electronic health records via in-context learning: comparative analysis and validation study. JMIR Med Inform. 2025;13:e68955. doi:10.2196/68955
Goddu AP, O'Conor KJ, Lanzkron S, et al. Do words matter? Stigmatizing language and the transmission of bias in the medical record. J Gen Intern Med. 2018;33(5):685-691. doi:10.1007/s11606-017-4289-2
Johnson AEW, Pollard TJ, Shen L, et al. MIMIC-IV, a freely accessible electronic health record dataset. Sci Data. 2023;10:1. doi:10.1038/s41597-022-01899-x
Sethi R, Caskey J, Gao Y, et al. Detecting stigmatizing language in clinical notes with large language models for addiction care. npj Health Syst. 2026;3:15. doi:10.1038/s44401-026-00069-0
Disclosure(s):
Joseph A. Izzo, MD: No financial relationships to disclose
Learning Objectives:
Identify the significant disparity in stigmatizing language documented in the electronic health records of patients with opioid use disorder who are discharged against medical advice compared with those discharged routinely.
Evaluate the feasibility of using locally deployed, privacy-preserving large language models as a scalable tool for auditing clinical documentation and narrative framing in addiction medicine.
Discuss how narrative bias in clinical documentation acts as a modifiable, system-level contributor to health inequities and treatment fragmentation for vulnerable patient populations.