7. Natural Language Processing (NLP), Part 1
The Role of Natural Language Processing in Healthcare
Introduction to the Topic
- Peter Szolovits introduces the discussion on natural language processing (NLP) in machine learning within healthcare, highlighting a focus on non-neural network methods today and neural network methods in the next session.
- Dr. Katherine Liao, a rheumatologist from Partners HealthCare, will join for a Q&A session later, sharing insights from their collaborative work.
Importance of Clinical Text
- Szolovits emphasizes the significance of clinical text analysis, noting that while some methods are conceptually appealing, they may not be practically feasible.
- A common approach involves term spotting to identify key words and phrases indicative of diseases or symptoms, which is essential for current clinical research practices.
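Term spotting of this kind can be sketched in a few lines. This is a minimal illustration with a made-up term list, not the vocabulary used in the work described:

```python
import re

# Hypothetical term list; a real system would draw on a curated vocabulary.
TERMS = ["rheumatoid arthritis", "joint pain", "morning stiffness", "methotrexate"]

def spot_terms(note: str) -> set:
    """Return the known terms that appear in a clinical note (case-insensitive)."""
    found = set()
    lowered = note.lower()
    for term in TERMS:
        # Word boundaries avoid matching inside longer words.
        if re.search(r"\b" + re.escape(term) + r"\b", lowered):
            found.add(term)
    return found

note = "Pt reports morning stiffness and joint pain; started methotrexate."
print(spot_terms(note))  # finds all three mentioned terms
```

Even this naive version captures why term spotting is attractive: it is cheap, transparent, and easy for clinicians to audit.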
Case Study: Discharge Summary Analysis
- An example discharge summary from MIMIC illustrates how de-identified patient data can still convey critical medical information despite anonymization.
- The case study focuses on Mr. Blind, detailing his medical history and treatment outcomes to highlight the importance of narrative texts in understanding patient conditions.
Research Project Insights
- Szolovits discusses a project aimed at identifying genetic correlates of rheumatoid arthritis by analyzing billing codes associated with patient visits.
- Initial findings revealed that only 19% of patients billed for rheumatoid arthritis actually had the condition, raising concerns about billing code accuracy.
Challenges with Billing Codes
- The low positive predictive value is attributed to billing codes being designed primarily for insurance purposes rather than accurate disease representation.
- Szolovits explains how multiple billing codes can inflate perceived diagnoses; for instance, various tests related to rheumatoid arthritis might lead to multiple charges even if results are negative.
Ensuring Data Quality
- To improve diagnostic accuracy, researchers proposed requiring three distinct billing codes for rheumatoid arthritis instead of one; this increased positive predictive value to 27%.
Rheumatoid Arthritis Diagnosis: The Role of Data and Natural Language Processing
Overview of the Study
- The study aimed to evaluate the effectiveness of using codified data (lab values, prescriptions, demographics) versus narrative text (nursing notes, doctor's notes) in diagnosing rheumatoid arthritis (RA).
- Initial results showed a positive predictive value of 88% with codified data alone; natural language processing on narrative text alone did about as well, at 89%.
- Combining both data types in a joint model yielded an impressive positive predictive value of 94%, indicating significant improvements through integrated approaches.
Methodology
- The research involved analyzing electronic medical records (EMRs) from approximately four million patients, focusing on 29,000 individuals with at least one ICD-9 code for RA or an anti-CCP titer.
- A sample of 500 cases was selected for algorithm training to predict true RA diagnoses. Validation involved rheumatologists reviewing these cases for accuracy.
Data Selection and Analysis
- Specific ICD-9 codes were utilized while excluding codes for other rheumatologic diseases. To avoid double counting from billing practices, codes appearing within a week of each other were counted only once.
- Various lab tests were analyzed, including rheumatoid factor and anti-cyclic citrullinated peptide levels. Counting patient data points served as a proxy for illness severity.
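The code-counting rules described above (collapse codes that fall within a week of each other, then require a minimum count before including a patient) can be sketched as follows; the dates are invented for illustration:

```python
from datetime import date, timedelta

def count_distinct_codes(visit_dates, window_days=7):
    """Count billing-code occurrences, collapsing any that fall within
    `window_days` of the previously counted one (likely duplicate billing)."""
    count = 0
    last_counted = None
    for d in sorted(visit_dates):
        if last_counted is None or (d - last_counted) > timedelta(days=window_days):
            count += 1
            last_counted = d
    return count

# Hypothetical patient: four RA billing codes, two within the same week.
dates = [date(2009, 1, 5), date(2009, 1, 8), date(2009, 3, 2), date(2009, 6, 10)]
n = count_distinct_codes(dates)
print(n, "distinct codes ->", "include" if n >= 3 else "exclude")  # 3 -> include
```

The count itself then doubles as a crude severity proxy: sicker patients generate more visits and therefore more distinct codes.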
Narrative Text Processing
- A system called HITEx was employed to extract entities from narrative texts such as health provider notes and discharge summaries. This tool was effective for its time but is not state-of-the-art today.
- The extraction process included disease mentions, medications, lab results, and radiology findings. Negation detection was crucial to ensure accurate interpretations when conditions were stated as absent.
Predictive Modeling Insights
- Logistic regression was used to build the predictive model incorporating both NLP-derived features and codified data predictors.
- Key predictors included direct mentions of RA in notes and seropositivity indicators. Negative predictors also played a role in refining diagnosis accuracy.
Discussion on Regression Coefficients
- Questions arose regarding standardized regression coefficients in logistic regression models; each one expresses the change in the log-odds of the outcome per one-standard-deviation change in its predictor.
- Standardization puts predictors on a common scale, so coefficient magnitudes can be compared directly to judge each predictor's relative weight in the model's predictions.
Study Replication Across Institutions
Exploring the Feasibility of Multi-Institutional Studies
- The discussion begins with a proposal to replicate a study at Vanderbilt and Northwestern University, both of which have electronic medical record (EMR) systems and are interested in identifying patients with rheumatoid arthritis.
- Differences in EMR systems necessitate varied extraction methods for medications and natural language queries, leading to concerns about the study's replicability across different institutions.
- Contrary to expectations, the model performance was fairly similar across institutions despite differences in data extraction methods, indicating some level of generalizability.
Concerns About Predictive Value
- A notable concern arises regarding the positive predictive value (PPV), which was calculated differently in this study compared to previous findings, resulting in lower values that raise questions about reliability.
- ROC curve analysis reveals that training on data from Partners or Vanderbilt yielded better results when tested against other datasets than vice versa, suggesting variability in algorithm performance based on training data.
Challenges of Interpreting Medical Notes
Complexity of Nursing Notes
- An example from an old paper illustrates the difficulty of interpreting nursing notes due to their unstructured format and use of abbreviations that may not be universally understood.
- The complexity is highlighted by specific abbreviations like "SOB" for shortness of breath and "DOE" for dyspnea on exertion, emphasizing the need for clarity in medical documentation.
Applications of Natural Language Processing (NLP)
- NLP can be utilized to codify terms within notes into standardized codes such as ICD-9 for conditions like rheumatoid arthritis.
- De-identification is another critical application where NLP helps remove identifying information from records while maintaining essential medical facts.
Identifying Relationships Within Textual Data
Understanding Entity Relationships
- It’s important to determine various aspects related to entities mentioned in texts, including timeframes, locations, and degrees of certainty regarding medical conditions or treatments.
- Identifying relationships between entities—such as causation or treatment indications—is crucial for accurate interpretation and analysis.
Challenges with Electronic Medical Records
- Summarization poses significant challenges due to repetitive entries caused by copy-pasting practices within electronic medical records. This can lead to inaccuracies if changes are not consistently updated across notes.
Understanding the Challenges of Medical Text Analysis
The Complexity of Identifying Health Information
- The process of determining whether text constitutes protected health information often involves aggregate judgments, where not all words are significant.
- An example illustrates this complexity: a researcher studying tobacco mosaic virus was misidentified as a smoker due to contextually misleading terms.
- Another case presented is that of a patient who quit smoking two days prior, raising questions about their classification as a smoker.
Historical Context and Development in Natural Language Processing
- A historical note references work from 1966 by the speaker's PhD advisor, who proposed parsing English text based on grammatical rules to derive semantic meaning.
- This approach led to the development of systems that could analyze narrative texts in fields like anthropology, demonstrating early applications of natural language processing.
Early Attempts at Natural Language Access in Medicine
- In the 1980s, SRI developed Diamond Diagram, aimed at enabling users unfamiliar with command languages to interact with computers using English.
- Researchers attempted to apply similar principles for accessing medical texts but faced challenges with system performance and user adaptability.
User Adaptability and System Limitations
- Users tend to adapt better than computer systems; they learn how to phrase commands effectively despite rigid syntax requirements imposed by early systems.
- A notable experiment involved creating an artificial language for cardiology notes. However, it proved insufficiently expressive for clinical needs, leading to its discontinuation after initial trials.
Evolving Approaches in Medical Text Analysis
- Traditional methods for identifying relevant medical terms relied heavily on expert input followed by keyword searches through notes.
How to Improve Medical Term Identification
Understanding Negation in Medical Discharge Summaries
- The discussion turns to finding medical terms that perform better than an initial, expert-supplied list. A simple algorithm for identifying negated findings in discharge summaries is introduced.
- The algorithm involves a dictionary lookup within a large database of medical terms (UMLS), focusing on patterns where negation phrases appear near UMLS terms.
- Examples of negation phrases include "no sign of," "ruled out," and "absence of," which indicate the non-presence of conditions or findings.
- The algorithm accounts for exceptions, such as "gram negative," which does not imply negativity regarding associated conditions. Despite its simplicity, it achieves reasonable sensitivity and specificity rates.
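A minimal sketch in the spirit of the algorithm described (dictionary lookup plus nearby negation phrases, with an exception list); the term, phrase, and exception lists here are toy stand-ins, not the UMLS-backed lists the real system uses:

```python
import re

NEGATION_PHRASES = ["no sign of", "ruled out", "absence of", "denies"]
EXCEPTIONS = ["gram negative"]  # "negative" here does not negate a finding
TERMS = ["pneumonia", "fever", "rheumatoid arthritis"]

def negated_terms(sentence: str) -> set:
    """Return dictionary terms preceded by a negation phrase within a
    five-word window, skipping known exception phrases."""
    s = sentence.lower()
    for exc in EXCEPTIONS:
        s = s.replace(exc, " ")  # mask exception phrases before matching
    found = set()
    for term in TERMS:
        for neg in NEGATION_PHRASES:
            # Negation phrase, then up to 5 intervening words, then the term.
            pattern = re.escape(neg) + r"(?:\W+\w+){0,5}?\W+" + re.escape(term)
            if re.search(pattern, s):
                found.add(term)
    return found

print(negated_terms("Chest X-ray ruled out pneumonia; patient denies fever."))
```

The real algorithm's appeal is exactly this simplicity: a window-based pattern match over a large dictionary gets reasonable sensitivity and specificity without any parsing.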
Generalization Techniques in Phenotyping
- To enhance term identification, related terms like hypo- or hypernyms are utilized, allowing for diagnostic reasoning based on symptoms mentioned alongside diseases.
- Associating the right set of terms with a specific medical concept, known as phenotyping, is itself a machine learning problem that can be attacked recursively.
Overview of the Unified Medical Language System (UMLS)
- The UMLS was established in the mid-1980s to unify various medical terminologies into a meta-thesaurus, facilitating better communication across different medical fields.
- It maps synonymous expressions from diverse terminologies (e.g., myocardial infarction vs. heart attack), aiding normalization across databases and enhancing natural language processing capabilities.
Structure and Content of UMLS
- Currently, there are approximately 3.7 million distinct concepts within UMLS, organized hierarchically with semantic types assigned to each concept for easier data navigation.
- Semantic networks categorize relationships among concepts into 54 relations and 127 types, providing a structured approach to understanding complex medical terminology.
Application of Semantic Types in Data Analysis
- Common semantic types include therapeutic procedures and drug classifications; these categories help streamline searches through vast datasets derived from human medicine and veterinary research alike.
Normalization and NLP in Medical Records
Lexical Variants and Normalization
- The concept of lexical variant generators is introduced, which helps normalize medical text by converting statements into lowercase, alphabetized versions.
- An example illustrates how the word "was" is normalized to "be," and highlights a pitfall of normalization: the resulting form can collide with an abbreviation (e.g., "Be" for beryllium), leading to misinterpretation.
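The lexical-variant idea can be sketched with a toy inflection table (the real lexical tools use large generated tables, not this hand-written dictionary):

```python
# Toy inflection table standing in for the real lexical-variant tables.
BASE_FORMS = {"was": "be", "is": "be", "were": "be", "extremities": "extremity"}

def normalize(phrase: str) -> str:
    """Lowercase, map each word to its base form, and alphabetize the words."""
    words = [BASE_FORMS.get(w, w) for w in phrase.lower().split()]
    return " ".join(sorted(words))

# Two different surface forms normalize to the same string,
# so they can be matched against each other.
print(normalize("The patient WAS febrile"))   # -> "be febrile patient the"
print(normalize("the patient is febrile"))    # -> "be febrile patient the"
```

Alphabetizing discards word order on purpose: the goal is matching variant phrasings against a thesaurus, not preserving syntax.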
Tools for Concept Mapping
- An online tool demonstrates the translation of phrases like "weakness of the upper extremities" into more standardized medical concepts such as "proximal weakness."
- MetaMap, developed by the National Library of Medicine, is mentioned as a tool that finds mappings from text to medical concepts, aiding in understanding negation and specific conditions.
Advancements in Phenotyping Techniques
Evolution of Research Methods
- Katherine Liao discusses improvements made since 2010 in phenotyping techniques used in research settings, indicating they are now on version five with enhanced automation.
- The challenge of obtaining clinician labels through chart reviews is highlighted as a rate-limiting factor; efforts are being made to streamline this process.
Reducing Chart Review Burden
- Strategies have been implemented to reduce the amount of chart review required by narrowing down feature space and determining essential gold standard labels.
- A new approach involves using publicly available resources (e.g., Wikipedia, Medline) to generate term lists without needing extensive input from clinicians.
Automating Data Processing
- The methodology has shifted from manual list creation to processing articles directly for relevant terms through majority voting among multiple sources.
- Training now utilizes silver standard labels instead of solely relying on gold standards, allowing for flexibility when no specific ICD code exists for certain phenotypes.
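The majority-voting step can be sketched as follows; the sources and terms are invented, standing in for term lists pulled from articles such as Wikipedia or Medline entries:

```python
from collections import Counter

# Hypothetical term lists extracted from three public articles on a disease.
sources = [
    {"joint pain", "anti-ccp", "methotrexate", "fatigue"},
    {"joint pain", "anti-ccp", "morning stiffness"},
    {"joint pain", "methotrexate", "anti-ccp", "fever"},
]

def majority_terms(term_sets, threshold=None):
    """Keep only the terms that appear in a majority of the sources."""
    if threshold is None:
        threshold = len(term_sets) // 2 + 1
    counts = Counter(t for s in term_sets for t in s)
    return {t for t, c in counts.items() if c >= threshold}

print(majority_terms(sources))  # terms appearing in at least 2 of 3 sources
```

Voting across sources filters out idiosyncratic terms from any single article, producing a candidate list without clinician input.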
Current Applications and Future Directions
Integration with Biobanks
- The updated pipeline is integrated into Partners HealthCare's biobank system, linking patient blood samples with electronic health records (EHR) and facilitating research opportunities.
Broader Disease Targeting
Understanding the Pipeline for Disease Prevalence
Overview of Disease Prevalence and Research Approaches
- The base pipeline is most effective for diseases with a prevalence of 1% or higher, such as rheumatoid arthritis, which affects about 1% of the population.
- Rare diseases that occur episodically are less suited for this approach due to their lower prevalence and sporadic nature. Most common diseases fall above the 1% threshold.
Genotyping Patients: Past vs Present
- A decade ago, genotyping was costly ($500-$700 per patient) and limited to well-defined cases like rheumatoid arthritis (RA). Careful selection was crucial to avoid wasted resources.
- With advancements in biobanks like VA MVP and UK Biobank, blood samples are now systematically collected and genotyped without a specific study focus, streamlining research processes.
Changes in Research Methodology
- Current methodologies allow researchers to access existing genotype data rather than needing to identify patients first, significantly altering research dynamics compared to ten years ago.
Clinical Applications of Data Analysis
- There is potential for clinical applications using machine learning algorithms to analyze pathology reports; however, widespread implementation has not yet been achieved. Challenges remain due to varying adoption rates of Electronic Health Records (EHR).
- The distinction between research and clinical settings is critical; misclassification in clinical practice can have serious consequences compared to research studies where it may only affect statistical power. Thus, accuracy is paramount in clinical applications.
Future Directions in Phenotyping Algorithms
- Ongoing efforts aim to develop algorithms that output disease probabilities rather than definitive classifications, particularly useful in remote areas lacking specialists like rheumatologists. This could enhance decision-making through telehealth consultations.
Optimism About Future Developments
Understanding Phenotyping and Mapping in Medical Informatics
The Role of UMLS in Mapping Keywords
- The Unified Medical Language System (UMLS) provides a framework for mapping keywords to terms and concepts, which is foundational for the discussed processes.
- Initial mapping efforts were highly manual; however, there has been a shift towards automation using systems like UMLS to enhance efficiency across various phenotypes.
Tools Used for Phenotyping
- Katherine Liao mentions utilizing NILE, a system developed by Sheng Yu, as it is less computationally intensive than cTAKES, which was deemed too detailed for their needs.
- NILE focuses on identifying whether a concept was mentioned and whether it was negated, rather than extracting extensive detail. The tool has been validated over time through various testing methods.
Machine Learning Approaches in Concept Identification
- A paper from David Sontag's group discusses using anchor concepts to automate the identification of relevant medical terms based on their frequent co-occurrence with other terms found in sources like Wikipedia or Mayo Clinic data.
- This approach formalizes the idea of leveraging certain indicative terms to train machine learning models that can identify additional useful terms, creating what is referred to as a "silver standard" derived from a smaller gold standard dataset.
Collaboration Between Academics and Machine Learning Experts
- Katherine Liao describes her experience collaborating with academics through the i2b2 project led by Zak Kohane, emphasizing the importance of structured meetings to address problems collectively.
- She notes that initial skepticism existed among colleagues regarding the integration of AI into clinical research but acknowledges that such collaborations have become more mainstream over time.
Challenges in Interdisciplinary Collaboration
- Effective collaboration requires bringing together individuals with diverse expertise; having team members who can communicate across disciplines is crucial for success. Liao highlights this necessity when discussing her team's dynamics during collaborative sessions.
- The process involves overcoming significant barriers due to differing terminologies and approaches between fields, necessitating patience and effort from all parties involved. Liao uses an analogy about "kissing frogs" to illustrate the challenges faced in finding suitable collaborators.
Addressing Alarm Fatigue in EMR Systems
- Alarm fatigue emerges as a major barrier due to increased regulations accompanying electronic medical records (EMRs), complicating physicians' workflows significantly since 2010 when EMRs became prevalent.
Barriers to Effective EMR Implementation
Alarm Fatigue and User-Friendliness of EMRs
- Alarm fatigue is a significant barrier in clinical settings, compounded by the user-unfriendly nature of Electronic Medical Records (EMRs), which are primarily designed for billing rather than clinical care.
- The challenge lies not in the science itself but in the implementation of systems that can effectively integrate into clinical workflows.
Historical Context and Lessons Learned
- Peter Szolovits recalls a past experience where a drug-drug interaction system was implemented at Brigham, highlighting issues with excessive alerts that led to desensitization among clinicians.
- A dataset from First Databank was used to identify potential drug interactions, but the overwhelming number of alerts diminished their effectiveness due to lack of prioritization.
Streamlining Alerts for Clinical Relevance
- To address alert fatigue, senior doctors reduced the list of drug interactions from thousands down to 20 critical alerts based on actual adverse events experienced at the hospital.
- However, newer systems like Epic have reverted to generating multiple alerts per order, recreating the earlier problem.
Challenges with ICD Coding Systems
- Katherine Liao discusses challenges related to ICD coding systems post-ICD-10 implementation, noting its impact on data accuracy and granularity compared to ICD-9.
- The transition introduced more specific codes for conditions like rheumatoid arthritis but raised concerns about accuracy and increased complexity in coding practices.
Mapping Issues Between ICD Versions
- There are significant challenges in mapping between ICD-9 and ICD-10 due to differences in disease descriptions and classifications that complicate data harmonization efforts.
- Current efforts focus on developing systematic approaches for counting and categorizing codes accurately within healthcare systems like the VA.
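The mapping difficulty comes from the fact that a single ICD-9 code typically fans out to many more specific ICD-10 codes. A toy one-to-many lookup makes the point; the two ICD-10 targets shown are only a small illustrative subset of the real fan-out for RA:

```python
# Illustrative one-to-many map: one ICD-9 code corresponds to several
# more specific ICD-10 codes (only a couple are listed per code here).
ICD9_TO_ICD10 = {
    "714.0": ["M05.79", "M06.09"],  # rheumatoid arthritis -> seropositive / other
    "250.00": ["E11.9"],            # type 2 diabetes without complications
}

def harmonize(code: str) -> list:
    """Map an ICD-9 code to candidate ICD-10 codes. When several candidates
    exist, the ambiguity must be resolved downstream, e.g., from lab data."""
    return ICD9_TO_ICD10.get(code, [])

print(harmonize("714.0"))  # two candidates: the ICD-9 code cannot distinguish them
```

Because the ICD-9 side never recorded the distinction (e.g., seropositive vs. seronegative RA), no lookup table alone can harmonize the histories; that is what makes longitudinal counting across the transition hard.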
Future Directions: Development of ICD-11
- Discussion around the upcoming ICD-11 highlights optimism due to its foundation on SNOMED, which offers a more logical structure compared to previous versions.
Understanding NLP and Clinical Data Labeling
The Role of Language in NLP
- Peter Szolovits discusses the necessity of a structured language for describing complex situations, hinting at a potential shift towards more natural language processing (NLP)-based systems.
- Katherine Liao mentions Chris's perspective on the evolution of this language, suggesting it may resemble older models like Fred Thompson or the Diamond Diagram.
Addressing Ambiguities in Clinical Data
- A question arises regarding how clinicians handle ambiguities when labeling training data, particularly in uncertain cases such as rheumatoid arthritis (RA).
- Liao explains their approach involves categorizing data into three groups: definite, possible, and no. This system acknowledges inherent ambiguities in clinical diagnoses.
Importance of Structured Adjudication
- Liao emphasizes the need for structured adjudication processes during clinical trials to define disease phenotypes accurately. This includes having multiple reviewers assess cases to ensure reliability.
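Reliability across multiple reviewers is usually quantified with a chance-corrected agreement statistic such as Cohen's kappa. A small sketch with invented chart-review labels (not data from the study):

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two reviewers' label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    # Expected agreement if each reviewer labeled at random with
    # their own observed label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical three-way labels (definite / possible / no) from two reviewers.
r1 = ["definite", "definite", "possible", "no", "no", "possible"]
r2 = ["definite", "possible", "possible", "no", "no", "no"]
print(round(cohens_kappa(r1, r2), 2))  # 0.5
```

A kappa well above zero indicates agreement beyond chance; low values signal that the phenotype definition or the adjudication instructions need tightening before labels are used for training.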