Disclaimer:

The following are my sections of a (very lengthy) ethical analysis on Large Language Models (LLMs) in US Healthcare as part of my graduate ethics class term paper. Sections missing include: History of the Technology, Power & Justice Analysis, and Normatively Informed Recommendations. The following sections are included:

  • Executive Summary
  • Social Context & Its Problem
  • Technical De-mystification
  • Privacy Analysis
  • Accountability Analysis

Section 1: Executive Summary

With extensive amounts of data becoming available to learn from, AI systems have seen a surge in use cases over the past decade. Large Language Models (LLMs) are AI systems designed to comprehend and generate human-like language by learning from large collections of text. These systems can be developed for a variety of tasks: generating text, carrying on conversations, translating languages, and summarizing documents are just the beginning of what they are capable of. Given these applications, integrating LLMs into healthcare promises meaningful improvements in patient care. However, before these systems can be integrated into US healthcare, a handful of concerns must be addressed to earn public acceptance.

Throughout this paper, we analyze issues that arise when discussing the integration of LLMs into healthcare. Specifically, we look at the current state of the US healthcare system and identify the flaws that leave it vulnerable to automation through these systems. We analyze how these systems impact patient privacy by using private medical information to train AI tools, how they do or do not comply with current legal structures, and how they may be misused by large tech companies. We then describe in detail how disparate impacts lead to unequal patient treatment, how algorithmic bias reinforces social inequalities that limit healthcare opportunities, and how integrating these systems can shift current power structures from medical professionals to large tech companies. Finally, we present a detailed discussion of the principles implemented to hold these systems accountable, the gaps that allow moral crumple zones to form and shift blame onto innocent medical professionals, and the lack of resources impacted individuals currently have to make their voices heard. This is not to say that integrating LLMs into healthcare has only negative implications, but understanding the potential downfalls is critical when considering their use in potentially life-threatening areas.

From the conclusions of this analysis, we offer a collection of recommendations that aim to address the identified issues and promote equitable, ethical, and effective use of LLMs in healthcare. Collecting and using diverse, representative training data will help prevent existing biases from being reproduced in system outputs and promote equitable health care for all. Transparency and accountability must be maintained through clear documentation of all system decisions, including the data collected and the training methods used. By promoting interdisciplinary collaboration between developers, healthcare professionals, ethicists, and patients, we can ensure that these systems are designed and implemented with a comprehensive understanding of their impacts. Finally, strengthening privacy and data security measures to protect the patient information used in these systems is imperative, ensuring it is de-identified and complies with current legal standards. By implementing these recommendations, we can mitigate the known risks associated with AI while capturing its potential to improve the overall US healthcare system.

Section 2: Social Context & Its Problem

Healthcare is a universal service essential to individual wellbeing, but not all countries provide equal access to it. Despite spending the highest percentage of its Gross Domestic Product (GDP) on healthcare among high-income countries, the US is the only one that lacks guaranteed universal health coverage (Gunja, Gumas, and Williams II 2023). A high percentage of Americans either struggle to pay for required medical care or avoid necessary treatment entirely out of fear of the financial burden attached to these services. As a system that provides services based on an individual's ability to pay rather than their need for medical attention, the US healthcare industry is plagued by many flaws. The COVID-19 pandemic amplified the lack of access to healthcare services and has continued to cripple the current state of US medical care. Despite the system's high spending, the US has one of the lowest numbers of practicing physicians among high-income countries. Too much investment in the medical field goes to the wrong areas, emphasizing technological advancement and unnecessary medical procedures rather than improving patients' overall health as a result of these services (Shmerling 2021). Three key issues must be examined to understand the current state of the US healthcare system and how they have propelled it into its vulnerable position: affordability, quality, and access.

A key factor is that the affordability of healthcare in the US depends on an individual's ability to pay for insurance, whether out of pocket or through an employer. Consider a single person whose employer does not sponsor health insurance and who must pay for their own coverage. As of 2023, they can expect to pay $8,435 per year for health insurance, which does not fully cover medical visits. They must still pay a copay, which averages between $40 and $120 per visit depending on the type of care. A person without insurance can expect to pay between $300 and $600 depending on the type of care required. This price covers only the visit, not follow-ups or any medications and procedures that may become necessary. Despite being necessary to afford medical treatment in the US, insurance does not guarantee low costs. A large share of insured adults remain concerned about affording monthly premiums and the unexpected out-of-pocket costs excluded from coverage, and many are in medical debt as a result (Lopes et al. 2024). Alongside the unaffordability of medical visits, the inability to afford required medical treatment is another limiting factor in the current US healthcare system.

Take the example of insulin, a medication for type-1 and type-2 diabetes that helps control blood sugar levels and is essential for living with the condition. In a 2018 study by the US Department of Health and Human Services, insulin prices in the US were compared with those in 33 other countries to assess affordability. The results were shocking: the average price in America was more than ten times the average across the other countries in the study (Robins 2021). With an average price of $98.70 per vial in the US compared to $7.52 in the UK, the lack of universal health care harms not only an individual's finances but also their overall wellbeing if they cannot afford required medication. Crucially, this is not an optional treatment for diabetes; it is required to live with the condition. As Clarke Buckley emphasizes in the article, a person cannot simply decide not to take insulin because of the cost, so their number one priority becomes worrying about how to afford this expensive medication. Although the American Diabetes Association (ADA) has helped improve insulin affordability, much of this assistance comes through government programs such as Medicare and Medicaid, whose benefits not all individuals can receive.

An overall lack of affordability has left the US healthcare system desperate for cheaper alternatives. One solution comes through telehealth services, offered by a variety of companies, that provide online consultations at a much cheaper, though still higher than average, price. Although this is a step in the right direction, telehealth cannot provide the same services as an in-person medical visit. If the condition requires further care, the individual must pay for both the telehealth consultation and the recommended in-person follow-up, only increasing an already high financial burden. As this shows, the extremely high cost of healthcare does not translate into high-quality services for these individuals. In fact, as discussed below, despite these high expenses the US scores extremely poorly on many key health measurements.

When we examine the quality of medical services provided by the US healthcare system, we might expect high costs to correlate with high quality. The opposite is true. Among high-income countries, the US has the lowest life expectancy at birth, with an average lifespan of 77 years compared to an average of 80.4 years elsewhere. It also has the highest death rates for treatable and avoidable conditions, which can be prevented through timely and effective care such as exams and screenings (Gunja, Gumas, and Williams II 2023). Given the high cost of medical visits discussed above, however, these preventative measures cannot be applied successfully in the system's current state. Overall health quality lags despite high medical spending, much of which has been attributed to the system's misplaced focus. The US healthcare system is run as a business rather than a public service, with less emphasis on quality and most of it aimed at maximizing profits for health insurance companies. The combination of high costs and low-quality service leaves many individuals struggling to receive care and further reduces access to medical attention in the US.

When discussing access to healthcare, two main components contribute to the issue: financial barriers and time allocation. As with affordability, the high cost of healthcare services limits access based on income level and the ability to obtain insurance coverage. Since the financial costs of health insurance have been discussed above, it is important to consider their effect on access to healthcare services as well. Emergency medical care is protected under the Emergency Medical Treatment and Labor Act (EMTALA), which requires emergency departments to provide care regardless of a patient's health insurance, or lack thereof. This applies only to life-threatening conditions, and non-emergency care comes at a different cost. Many healthcare providers require proof of health insurance or payment before providing medical services and can refuse to treat a patient who cannot meet these requirements. This financial barrier can limit access to medical care for those in low-income households, and although other treatment options exist, the time required to wait for treatment adds yet another barrier to healthcare access ("Access to Health Services", n.d.).

Timely access to care is another contributing factor when individuals seek medical attention. In a 2023 national survey, a large share of US adults reported unreasonably long waits for medical treatment, with roughly a quarter of respondents waiting over two months to receive the care they needed (Black 2023). With such long wait times and a lack of easy access to healthcare, an unnecessary burden is placed on individuals to find alternative treatment options, and some give up on seeking medical care altogether. One commonly proposed solution to poor access is the emergency department, but it only further limits access due to extensive wait times: the median time US patients spent waiting for care in emergency rooms was 160 minutes, not including the actual medical attention. As with the wait times discussed above, the need to set aside large blocks of time for non-guaranteed care leads many individuals to forgo medical attention and ignore potentially life-threatening conditions. The lack of accessible healthcare options has left the US system in dire need of revision and vulnerable to options that may not be beneficial, and the push for AI systems as the solution to these problems could worsen the issue if not properly implemented.

The integration of AI systems into US healthcare has already begun, and the system's current state, with these unresolved issues, leaves it extremely vulnerable. The problems discussed above, along with a shortage of practicing physicians, leave the field open to automation to fill the gaps that cannot currently be covered. We already see AI applied in healthcare in a variety of systems, such as medical image analysis and smart IoT device analysis. Many large medical device manufacturers, such as General Electric, Siemens, and Philips, have begun integrating AI systems into their products. We are also now seeing large tech companies such as Google, Apple, and Amazon push for advancements in AI medical device research. Although they contribute to the overall advancement of the field, their focus is on business achievements rather than on benefiting public health (Park 2020, 2). The collection of vast amounts of data is necessary for these products to succeed, but using this information outside of its intended purpose raises substantial privacy concerns for patients.

Although a wide range of AI systems is currently being developed, we focus on the integration of Large Language Models (LLMs) within the context of the US healthcare system today. Patient records and other private documents provide vast amounts of data that these systems can learn from to improve medical care, but they may also do the exact opposite of what is intended. Companies such as Allscripts, Athenahealth, Cerner, eClinicalWorks, and Epic are already conducting research to optimize patient healthcare through the development of these systems. Similarly, IBM has continued its efforts on Watson for Oncology, which provides optimized treatments for cancer patients that can be personalized using electronic medical records (Park 2020, 4). These systems are not without flaws, and a misdiagnosis by an AI tool can be the difference between life and death. Many entities pushing for AI in healthcare acknowledge that these systems can be biased based on an individual's personal features, misrepresent certain minority communities, and violate patient privacy by using their data to learn from and diagnose others. However, they believe that by mitigating these risks, the "small" tradeoff can provide enormous benefits for both patients and doctors across all communities. As the White House briefing on utilizing AI to improve health outcomes puts it, "[its] broader adoption could help doctors and health care workers deliver higher-quality, more empathetic care to patients in communities across the country while cutting healthcare costs by hundreds of billions of dollars annually" (Brainard, Tanden, and Prabhakar 2023). All of this sounds promising: fixing the main issues that plague the current healthcare system by reducing costs, increasing quality, and providing universal access across communities. However, it raises an issue we have seen before, where the US healthcare system invests in technological advancements with the hope, but no guarantee, of improving patient care.

Although AI integration in US healthcare offers significant potential benefits for patient experience and medical treatment, it is not a replacement for the shortage of practicing physicians or for the inability to access health care in low-income communities. AI medical tools can create yet another barrier: although they can be accessed from anywhere, they require technology that not all individuals have. The vulnerable state of the US healthcare system makes AI integration sound promising, a potential solution to all the issues that currently prevent affordable, quality care. However, without addressing the core issues presented in this discussion, no amount of technology will bring US healthcare quality and affordability to the level of other high-income countries.

Section 3: Technical De-mystification

Large Language Models (LLMs) are AI systems designed to comprehend and generate human-like language by learning from large collections of text. They belong to the field of Natural Language Processing (NLP), an area of AI focused on enabling computers to understand the structures behind human language. These systems can be developed for a variety of tasks: generating bodies of text, carrying on conversations, translating languages, and summarizing documents are just the beginning of what this technology is capable of. Text comprehension is one of the main use cases, allowing the systems to understand and respond to text inputs in order to carry out conversations. One example is Duolingo Max, an educational service that incorporates LLMs to let users hold live conversations with virtual characters in different languages to practice real-world conversation skills (Duolingo 2023). Text analysis is another common task, in which meaningful information is extracted to better understand the meaning behind language. Grammarly, an online grammar and spell-checking service, uses LLMs to analyze text and provide information on tone that helps writers understand how their words will be perceived, a task commonly referred to as sentiment analysis. Each of these tasks requires a different type of learning, and grasping how this is achieved is a key step in understanding these systems.

To understand how LLMs work, we must define what type of information they learn from and how they are trained. These systems require vast amounts of text-based data, often collected from sources such as the internet, articles, books, and other publicly available material. Training is the core of how these systems learn: they read in large amounts of text and extract patterns that allow them to produce outputs matching what is expected. This process is repeated until the system achieves a certain level of accuracy, typically dependent on the task it is being developed for. Once trained, these systems perform well on tasks requiring general knowledge but struggle with topic-specific questions. After a social context is chosen for the system, a method known as "fine-tuning" continues the training and increases its understanding of a specific domain. For a system being deployed in healthcare, this could include medical documents, Electronic Health Records (EHRs), books on human anatomy, and any other text-based material that would increase the system's medical knowledge. Once this process is completed, the system is considered ready for deployment in real-world environments. Although these systems are highly advanced in their capabilities, they are restricted by their intended purpose in what they can and cannot do.
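To make the training and fine-tuning step more concrete, the sketch below shows how a small, general-purpose language model might be fine-tuned on domain text using the open-source Hugging Face Transformers library. The base model name, the file of de-identified notes, and the hyperparameters are illustrative assumptions rather than a recommended clinical setup.

    # Minimal sketch of domain fine-tuning with the Hugging Face Transformers library.
    # The base model, the notes file, and the hyperparameters are illustrative only.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    base_model = "gpt2"  # small general-purpose model, assumed for illustration
    tokenizer = AutoTokenizer.from_pretrained(base_model)
    tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained(base_model)

    # Assume de-identified clinical notes have been exported to a plain-text file.
    dataset = load_dataset("text", data_files={"train": "deidentified_notes.txt"})

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True, max_length=512)

    tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

    # Causal language modeling: the model learns to predict the next token.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    args = TrainingArguments(output_dir="clinical-ft",
                             num_train_epochs=1,
                             per_device_train_batch_size=2)

    trainer = Trainer(model=model, args=args,
                      train_dataset=tokenized["train"],
                      data_collator=collator)
    trainer.train()  # continues training the general model on the medical text

The key design point is that fine-tuning reuses the general model's existing language knowledge and only adjusts it toward the chosen domain, which is why the quality of the domain data matters so much.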

In a fascinating study by Yiu et al., LLMs were tested against children on a variety of tasks to determine what they are and are not capable of. The findings show that, compared to humans, these systems were not successful at tasks requiring innovation but excelled when tasks required generating a response based on abstractions of existing knowledge (Yiu 2023, 2). These systems excel at imitating knowledge, not at innovating new ways of problem solving. Throughout this paper we focus on their use in US healthcare, and it is essential to understand what they can and cannot do in that social context. Looking at their current capabilities, we see many advantages to using them as a tool for medical professionals. Virtual health assistants can let patients report symptoms and receive medical recommendations based on the provided information, removing the need for medical visits for non-critical conditions. These systems are also capable of condensing large amounts of text into brief summaries, allowing years of medical records to be quickly collected and presented to doctors in real time and improving diagnostic accuracy and patient understanding. To grasp what LLMs cannot do, there are two main points. First, these systems lack the interpersonal skills that would allow them to have emotional understanding or sympathy for an individual, a crucial component of the patient-doctor relationship. Although they produce human-like language, they do not communicate or interact with individuals as effectively as we do. Most importantly, these systems cannot tell whether what they output is right or wrong, and in the context of healthcare this is the difference between giving correct or incorrect medical advice that can directly affect a person's wellbeing. Beyond these, there are further limitations preventing the widespread adoption of this technology.
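As an illustration of the summarization capability described above, the sketch below condenses a short clinical note with an off-the-shelf summarization model from the Hugging Face Transformers library. The model name and the sample note are illustrative assumptions; a deployed clinical summarizer would be fine-tuned and validated on medical text rather than used off the shelf.

    # Minimal sketch of summarizing a clinical note with a general-purpose model.
    # "facebook/bart-large-cnn" and the sample note are placeholders for illustration.
    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    note = (
        "Patient is a 58-year-old with a ten-year history of type 2 diabetes, "
        "hypertension managed with lisinopril, and a most recent A1C of 8.2. "
        "Reports intermittent chest tightness on exertion over the past month."
    )

    # Return a short abstractive summary for quick physician review.
    summary = summarizer(note, max_length=40, min_length=10, do_sample=False)
    print(summary[0]["summary_text"])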

Although LLMs have the potential for a very bright future in healthcare, certain limitations hinder their public acceptance. These systems are only aware of the text they are trained on, and if that text contains unknown biases that negatively impact certain groups of people, the outputs will only continue to reinforce these harmful concepts. Building on this, and as discussed earlier, these systems learn from large amounts of general text data in order to understand human language; to be integrated into healthcare, however, they must receive further specialized training on medical data. Accessing this type of data, which includes Electronic Health Records (EHRs), poses a threat to patient privacy, as personal information is fed into these systems to learn from and to make recommendations for other patients. Finally, a common issue with many AI systems, and especially LLMs, is their very low level of interpretability: it is often difficult to determine why certain outputs or decisions are made. This raises the question of who we hold accountable when these systems give a patient incorrect medical advice. There is currently no answer, but defining one is crucial to reducing the limitations that impede the acceptance of LLMs in the healthcare field.

Section 5: Privacy Analysis

Over the past few years, large language models (LLMs) have become a leading technology in the machine learning community. These models require substantial amounts of text data to train, and because of this the medical field is in a prime position for integration with this technology. Extensive text-based data such as Electronic Health Records (EHRs), doctors' and clinical notes, prescriptions, and medical textbooks all provide rich sources for these models to learn from. This technology has the potential to greatly improve the subpar state of the current US healthcare system, improving access to and affordability of medical care for all communities. Through online chatbots for symptom assessment, the identification of new treatments, the development of effective therapy techniques, and overall improvements in patients' medical treatment, LLMs do have a bright future in the medical field. To achieve this, however, we must also protect the privacy of those whose information is used to develop these systems. Many concerns surround the integration of LLMs into US healthcare, but one of the most crucial is avoiding breaches of patient confidentiality. Medical records are protected documents, and to use this data to train these models, researchers must comply with HIPAA protocols to protect the identities of these individuals. Similarly, the misuse of this data outside of its intended context is a concern that few companies address but that must be discussed. To fully understand the impact that integrating LLMs into the US healthcare system can have, we must analyze a handful of concerns that determine its effect on privacy as a whole. How the data used in these models is collected, the protection or breach of individual and group privacy, the lack of legal frameworks in place, and insights into how we can mitigate violations of medical privacy are all topics that require further discussion when it comes to integrating these AI systems into the healthcare field.

Before we can discuss how data is collected, we must first understand what type of data is gathered to train these AI systems. Since LLMs take text as input, information such as Electronic Health Records (EHRs) and medical literature is collected for use. Although there are many use cases for LLMs in healthcare, we focus here on their application to medical research. To begin, it is important to know that EHRs are protected under the HIPAA Privacy Rule, which establishes standards to protect individuals' medical records and other health information that can be used to identify them ("The HIPAA Privacy Rule" 2022). This creates a gray area in which certain medical information cannot be given to these models but may be required to reach the performance necessary for accurate diagnoses. These records contain information on patient demographics, medical history, medications, lab results, and diagnoses over time. Because they comprise a patient's entire medical history, they contain a rich amount of data from which patterns can be drawn and diagnostic improvements can be made. However, they also hold the power to reinforce incorrect information and biases present in historically inaccurate medical records.

Medical literature tends to be more straightforward in its use case: it is intended to help these LLMs understand medical concepts and current advancements in the field so that they can accurately produce medical language. Data sources include scientific papers, clinical trials, case studies, and published research, all of which aid in understanding disease treatment and terminology. However, as with EHRs, depending on the sources the data is drawn from, it can produce unintended bias toward certain demographics if the research does not correctly represent all groups. Not only do the methods of collecting both EHRs and medical literature raise concerns, but the quality of these sources can directly impact the performance and fairness of these AI systems. Collecting large amounts of data is necessary for these systems to perform well, yet many healthcare providers are reluctant to share personal medical information. It becomes a tradeoff between privacy concerns and model performance: to increase the accuracy of these LLMs, we must provide more and more personal medical information that not all patients may be comfortable disclosing. To understand this, we must look further into how the data is collected for these systems and how it can affect the individual.

The collection of medical data can be a complex task, not only in terms of the legal policies that must be followed but also in obtaining the information from healthcare providers. In a 2020 paper discussing how AI is changing medical sciences, the authors acknowledge that the current limiting factor for AI healthcare systems is access to large amounts of data, due to patient privacy and the risk of data breaches. Because personal information is present in EHR data, patients should have the right to decide what they are and are not willing to share for use in these models, which can lead to fragmented data collection. Not only could large amounts of data be missing because of patients' desire to maintain privacy, but sharing is often limited between healthcare organizations, which can degrade the reliability of these models if they are not trained on enough data (Basu et al. 2020). However, it is difficult to determine whether patients are actually presented with the option of what they are willing to share, or whether data anonymization is used instead to maintain patient privacy without their knowledge.

There are three key components that can mitigate risks in medical data collection and keep patient knowledge and privacy at the forefront: de-identifying personal data, securing patient consent, and establishing robust data-sharing agreements that comply with current HIPAA protocols (Whittaker 2023). However, whether all, or any, of these practices are currently being followed has proven difficult to determine. Throughout the documentation and articles explored here, the emphasis has been on the relationship between hospitals and the AI researchers developing these models. Although they mention patient privacy as a key reason hospitals are hesitant to share electronic health records for use in LLMs, there is no mention of patient consent or awareness that their information is being shared. Many sources state that patients have the right to be informed how their data is used and protected in these systems, and that they should be able to withdraw consent at any time they deem appropriate. But this raises the question of how consent is given for data collection: is it an opt-in or opt-out system, and is the public aware of this? Although data is anonymized before sharing, there are still methods of re-identifying individuals from these records that can compromise privacy and their ability to receive medical care. The lack of guidelines on how data is collected, and of clarity about patient consent, raises serious concerns about violations of individual privacy and requires further analysis of the current state.

When it comes to the privacy of individuals' medical records, these documents are protected under safeguards set forth by the HIPAA Privacy Rule. These standards allow patients to understand and decide how their health information is used, but with the application of LLMs in healthcare this has become a gray area. The standards do not restrict every use of information; they protect only against an individual being identified from their medical records. To comply with HIPAA regulations, AI systems such as LLMs can be used to scan large amounts of EHRs and anonymize sensitive information to create immense data sets. De-identification removes all personally identifiable information so that no single person can be identified from the data being used, protecting their medical history and future care. However, just as LLMs can remove this information, they can also enable re-identification, compromising patient anonymity by identifying individuals from their medical records ("AI in Healthcare; What it means for HIPAA" 2023). Combined with the immense amount of data being collected, the security of where this data is stored is another issue that can compromise an individual's privacy.
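To illustrate what de-identification of free-text records involves at the simplest level, the sketch below replaces a few obvious identifiers with placeholder tags using rule-based patterns. This is only a toy illustration: HIPAA Safe Harbor de-identification covers eighteen identifier types, and production systems typically rely on trained NLP models rather than a handful of regular expressions.

    # Toy sketch of rule-based de-identification for free-text clinical notes.
    # The patterns, tags, and sample note are illustrative; real systems use trained NLP models.
    import re

    PATTERNS = {
        "[NAME]": re.compile(r"\b(?:Mr|Mrs|Ms|Dr)\.\s+[A-Z][a-z]+\b"),
        "[PHONE]": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
        "[SSN]": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
        "[DATE]": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
        "[MRN]": re.compile(r"\bMRN[:\s]*\d+\b"),
    }

    def deidentify(note):
        # Replace each obvious identifier with a placeholder tag.
        for tag, pattern in PATTERNS.items():
            note = pattern.sub(tag, note)
        return note

    sample = "Dr. Smith saw the patient (MRN: 483920) on 04/12/2024; callback 555-123-4567."
    print(deidentify(sample))
    # -> [NAME] saw the patient ([MRN]) on [DATE]; callback [PHONE].

Even this simple pass shows why de-identification is fragile: anything the patterns miss, such as a name written without a title, stays in the record, which is part of why re-identification remains possible.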

With the collection of substantial amounts of data containing sensitive personal information, these storage systems have increasingly become targets for data breaches as LLM applications in healthcare advance. Because of this, the need to keep the data anonymous and confidential only increases as we begin to apply it within AI systems. The US Department of Health and Human Services (HHS) maintains a list of breaches of health information affecting 500 or more individuals over the past 24 months. With over 900 currently open medical data breach cases, millions of individuals have had their information compromised and their privacy breached because electronic medical records were not properly protected. Although most appear to be small-scale leaks, one notable breach involved the Kaiser Foundation Health Plan, in which 13.4 million individuals lost privacy due to unauthorized access/disclosure on April 12, 2024. Although a data breach is, for the most part, an accidental sharing of patient information, we also need to examine the current state of patient consent to understand how individuals view their medical history being used to train these AI systems.

Consent is an essential component of acquiring individuals' personal medical information to train AI systems, but there is currently no standard practice for how this is implemented in healthcare. We can infer the current state to be an opt-out procedure, in which presumed consent is given by individuals to hospitals to use their anonymized medical records for these AI systems. Announced on March 19, 2024 by US Senators Ben Ray Lujan and Peter Welch, the AI CONSENT Act has been proposed to "get the express consent of the public before using their private, personal data to train your AI models… This legislation will help strengthen consumer protections and give Americans the power to determine how their data is used by online platforms. We cannot allow the public to be caught in the crossfire of a data arms race" (Senator Ben Ray Lujan 2024). Legal standards of this kind are a step in the right direction, but they are also an alarming revelation of the current state of consent for the use of personal data in AI healthcare systems. The act aims to require opt-in consent standards for researchers using individuals' data to train AI systems, but it also exposes the lack of consent in the current system. Patients may be unaware that their information is being used in LLMs, or possibly used outside of the context in which it was intended.

Take the example of DeepMind, an AI development company acquired by Google that has been a leader in healthcare applications of these systems. Through a data-sharing agreement with the UK's National Health Service (NHS), roughly 1.6 million patients had their data uploaded to DeepMind servers without explicit consent. Employees at Google, with no medical expertise, had full access to non-anonymized patient records in order to analyze them and train AI systems for medical care (Khan et al. 2023). A lack of transparency, explicit consent, and data anonymization led to the violation of millions of patients' right to privacy and further demonstrated the need for legal frameworks to be put in place. Not only do LLMs in healthcare pose a potential threat to individual privacy; these systems are also capable of targeting entire communities based on misrepresentative data, threatening group privacy.

One of the main benefits of LLMs also poses one of the largest threats to group privacy. These systems' ability to recognize patterns and correlations in the data they are trained on allows them to identify characteristics common within certain groups. Because these models are trained on large amounts of text data, they can permit others to gain knowledge of a person's health record, often parties the individual would not want to have access to this information. From this, entire groups can be categorized based on assumptions made by the system, whether correct or incorrect, which further violates privacy (Price & Cohen 2019, 40). If an individual falls within one of the identified groups and negative connotations surround the system's suggestions, they may experience both discrimination and embarrassment. In addition, medical treatment for these groups could be directly affected, as discussed below.

Just as this accidental identification can violate privacy, it can also have real-world implications that directly affect the individuals who fall within these groups. If the system produces a detailed profile of groups based on its data, external companies can then use this information to take actions that invade privacy. If employers or insurance companies are able to learn sensitive patient information from private medical records, specifically regarding life-threatening or treatment-intensive diseases, they may decide not to employ or insure the individual. This is crucial for individuals within the United States, as health insurance is typically tied to employment (Price & Cohen 2019, 41). Even if the individuals identified as belonging to the group are unaware of potential medical conditions, outputs from these systems can directly affect their ability to obtain fair medical coverage through an unintended use of private medical records. Similarly, if the group being inadvertently identified pertains to a high-risk condition or behavior, individuals may experience increased monitoring as a result, ranging from more frequent medical visits to unnecessary preventative actions or even misdiagnosis based on assumptions made about these groups.

Another major concern for privacy is the ability of large tech companies to re-identify individuals by cross-referencing de-identified medical records with previously collected data. One case study is Dinerstein v. Google, in which a collaboration between the University of Chicago Medicine and Google involved sharing de-identified medical records to develop an AI system for the hospital. Dinerstein filed a class-action complaint against both parties, claiming that patients could easily be re-identified by combining their medical records with geolocation data collected by Google, and that no consent was obtained before sharing this information (Ethics and governance of artificial intelligence for health: WHO guidance 2021, 38). With this data, targeted ads for medication and prescriptions could be presented to individuals based on their medical history, which is both intrusive and a breach of privacy. If companies have access to de-identified medical information, they can unlock new forms of advertisement and prioritize profit over the privacy of these groups. In some cases, companies do not even need to cross-reference the medical records to identify the individuals.
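A toy sketch of the linkage concern raised in Dinerstein v. Google appears below: if de-identified records still carry quasi-identifiers such as ZIP code, birth year, and sex, joining them against auxiliary data a company already holds can tie names back to diagnoses. All records here are fabricated for illustration; the point is the mechanism, not any real dataset.

    # Toy sketch of a linkage re-identification check on fabricated records.
    # "De-identified" rows still carry quasi-identifiers (ZIP, birth year, sex).
    import pandas as pd

    deidentified = pd.DataFrame({
        "zip": ["60637", "60615", "60637"],
        "birth_year": [1959, 1984, 1972],
        "sex": ["M", "F", "M"],
        "diagnosis": ["type 2 diabetes", "asthma", "hypertension"],
    })

    # Auxiliary data a company might already hold (e.g. account or location data).
    auxiliary = pd.DataFrame({
        "name": ["J. Doe", "A. Roe"],
        "zip": ["60637", "60615"],
        "birth_year": [1959, 1984],
        "sex": ["M", "F"],
    })

    # Joining on the shared quasi-identifiers links names back to diagnoses.
    linked = auxiliary.merge(deidentified, on=["zip", "birth_year", "sex"])
    print(linked[["name", "diagnosis"]])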

Take for example Google's Project Nightingale, a collaborative effort with a large healthcare provider, Ascension, in which the personal medical data of up to 50 million customers was shared for storage on Google Cloud. Surely the hospital got the consent of its patients to share this information with an external company, all medical records were de-identified, and only medical professionals had access to the data, right? All three are untrue. Google obtained access to roughly 50 million medical records that were not anonymized, no patients were consulted before sharing the information, and any Google employee working on the project had access to all of the medical records. An entire group of patients who trusted Ascension as their medical provider to protect their personal information experienced privacy violations that gave a tech company full access to their non-anonymized medical history. Just as the previous case study revealed, this would allow targeted advertisements for individuals' identified medical conditions and violate group privacy. It is often said that data is the new gold, and medical records contain an abundance of valuable information that these systems can use and learn from. However, ensuring that privacy breaches do not continue, and that consent is clearly obtained from the patients whose data is used to train these systems, is a key component of the successful integration of LLMs into the field of healthcare.

Section 7: Accountability Analysis

The Organisation for Economic Co-operation and Development (OECD) is an intergovernmental organization whose purpose is to develop global policy standards in a variety of areas, and the United States is a member country. Most recently, it has adopted principles for trustworthy AI that define what accountability means for these systems. Paraphrased from the OECD's legal instrument, AI systems should ensure traceability of the decisions made during the system's life cycle, enabling outputs and responses to be analyzed in light of the queries that produced them. In addition, these systems should apply systematic risk management approaches and adopt responsible processes to address risks related to the systems, including cooperation between different resources, system users, and stakeholders to mitigate risks and ensure human rights are protected (OECD, Recommendation of the Council on Artificial Intelligence 2024). Both of these structures ensure that the systems can be held accountable for the outputs they produce, and that impacted individuals are included in improving the tools and minimizing risks in future use. However, these are merely recommendations, not required policies for creating AI-based systems. Despite the US adhering to the AI principles set forth by the OECD, we do not always see the recommended accountability regulations when these AI systems are developed and deployed in healthcare.
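To make the traceability requirement more concrete, the sketch below logs each query and response, along with the model version and the user's role, to an append-only audit file so that a decision can be reviewed later. The field names and JSONL storage format are assumptions for illustration, not part of the OECD recommendation or any regulatory standard.

    # Minimal sketch of an append-only audit trail for LLM queries and responses.
    # The field names and JSONL format are assumptions, not a regulatory standard.
    import hashlib
    import json
    from datetime import datetime, timezone

    def log_interaction(path, model_version, prompt, response, user_role):
        record = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "model_version": model_version,
            "user_role": user_role,  # e.g. "clinician" or "patient"
            "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
            "prompt": prompt,
            "response": response,
        }
        # Append one JSON record per line so decisions can be audited later.
        with open(path, "a", encoding="utf-8") as f:
            f.write(json.dumps(record) + "\n")

    log_interaction("audit_log.jsonl",
                    model_version="clinical-ft-0.1",
                    prompt="Summarize the attached discharge note.",
                    response="Patient admitted for ...",
                    user_role="clinician")

An audit trail of this kind does not assign blame by itself, but without it the question of which query, model version, and output led to a harmful decision cannot even be reconstructed.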

The US Food and Drug Administration (FDA) is a government agency responsible for protecting public health by assuring the safety and security of medical devices, with strict regulations on the technology. It is responsible for holding the medical industry accountable, taking legal action when violations of human rights occur as a result of these medical devices. LLMs, however, are computer-based AI tools with no physical component. As defined by the FDA, clinical decision support (CDS) software is a broad term for any technology that provides health care providers and patients with specific information, intelligently presented at appropriate times, to enhance health and health care (U.S. Food and Drug Administration, n.d.). Although the FDA regulates physical devices that utilize AI, software-based systems such as LLMs are exempt from FDA regulation when they fall under the category of CDS. By claiming these systems only support medical professionals in making recommendations rather than replacing them, developers keep the systems exempt from regulation, and the FDA cannot hold them accountable for negative impacts on public health. This arrangement falls into a legal gray area in which accountability cannot be defined, since LLMs are an unregulated tool based on how their use case is defined.

When discussing LLMs in healthcare, two main types need to be defined: advisory and independent systems, each requiring a different level of accountability. An advisory system is an AI tool used only by medical professionals to assist in medical diagnosis, where the human clinician makes the final decision on whether to use the system's recommendation. An independent system is an AI tool accessible to the general public, in which the user interprets the output and makes their own decision. A general issue with both system types is that the developers are often not held accountable for the system, and the undue burden is placed on either the affected patient or the medical professional who used the technology (Ethics and governance of artificial intelligence for health: WHO guidance 2021, 42). Each type sees accountability fall on a different party and creates gaps in different areas that can lead to moral crumple zones. To be clear, a moral crumple zone refers to accountability for an action being wrongfully attributed to a human actor who had only limited control over the behavior of an AI system (Elish 2019, 1). The medical field already struggles with complex definitions of accountability when medical malpractice occurs, and the integration of LLMs only further complicates this.

Advisory systems refer to LLMs capable of summarizing patients' medical records, highlighting key medical conditions, and providing an initial diagnosis for medical professionals. For these systems, accountability appears secure, since medical professionals use the AI as a tool and still make the final decision before a diagnosis is given. However, this arrangement can create a dilemma with two equally undesirable choices. Either clinicians spend the time to develop their own diagnosis of the best course of action, meaning the AI system adds little value, or they accept the advice blindly, unaware whether the recommendations are correct (Habli, Lawton, & Porter 2020, 253). The moral crumple zone in this type of system, and the point where norms of accountability begin to change, arises when medical professionals and AI systems produce different recommendations from the same patient information. If the AI system is incorrect but the medical professional catches the error and provides the correct diagnosis, no accountability is imposed on the system; re-training it so it no longer makes the error is the furthest extent of accountability we see, and no legal consequences follow. However, if the AI system is incorrect and the medical professional accepts the advice, accountability falls on the human rather than the AI tool, since the human is the "final decider" in these advisory systems. Assigning the final decision to a clinician creates moral and safety dilemmas for them and raises questions about how far a medical professional is accountable for patient harm (Habli, Lawton, & Porter 2020, 252). These systems are designed to be better than their human counterparts, but if they still require human intervention to verify recommendations, their use becomes questionable as a whole. Blame cannot simply be shifted from the system to the medical professional, even if clinicians are expected to identify incorrect recommendations. Advisory systems must be held directly accountable for misdiagnoses, whether that means legal consequences or removing the system from use.

An independent system could be a tool such as a medical chatbot, where symptoms are provided and an estimated diagnosis is produced. For these systems, accountability becomes a gray area with no clear definition, which is best shown through an example. There are a variety of chatbots available for medical diagnosis, such as Ada Health, which can assess user-provided symptoms and suggest possible conditions. However, most of these medical chatbots require signing up for a service or being covered by a partnered insurance provider. One free service is ChatGPT, an online LLM capable of offering a medical diagnosis based on a list of provided symptoms (OpenAI 2023). Below, the symptoms of allergies provided by the Asthma and Allergy Foundation of America are used as input to ChatGPT to receive a medical diagnosis:

  • Prompt: Medical diagnosis for the following symptoms: swelling, rash, itching, and shortness of breath.

  • Response: The combination of symptoms you described could indicate several possible medical conditions, some of which could be serious. However, I must emphasize that I’m not a doctor, and a proper diagnosis requires a thorough examination by a qualified healthcare professional. That being said, the symptoms you mentioned could be associated with conditions such as…

The chatbot proceeds to list a variety of allergy-based conditions, but gives no clear answer as to which one specifically applies. Notably, these independent systems shift accountability before providing any feedback on the user's input. Ensuring the patient knows the information provided does not come from a medical professional, even though it maintains much of the same knowledge as one, shifts accountability to the user for following any of the recommendations. This raises a question: what if a patient using a medical chatbot inputs the wrong symptoms and receives an incorrect diagnosis, which they interpret as correct? Who becomes responsible in this case, the user for providing the wrong information or the system for providing the wrong recommendation? These gaps in accountability are what make these systems difficult to fully understand, and the moral crumple zone sees the patient taking accountability for a simple mistake that could lead to a life-threatening misdiagnosis. Although we do not see it currently, these systems also need to be held accountable for the misinformation they provide. Whether that responsibility falls on the developers creating the system or the healthcare provider deploying it, accountability needs to be shifted away from the patient to ensure medical malpractice remains clearly defined as these systems become integrated.
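For readers curious how an interaction like the one above might look programmatically, the sketch below sends the same symptom prompt through the OpenAI Python client. The model name, the system instruction, and the decision to query an API at all are illustrative assumptions; the example above was produced through the ChatGPT web interface, not this code.

    # Minimal sketch of querying an LLM for a symptom-based assessment.
    # Assumes the OpenAI Python client and an OPENAI_API_KEY in the environment;
    # the model name and system instruction are illustrative choices.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    symptoms = "swelling, rash, itching, and shortness of breath"

    completion = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "You are not a doctor. Always advise the user to seek care from a qualified professional."},
            {"role": "user",
             "content": f"Medical diagnosis for the following symptoms: {symptoms}."},
        ],
    )

    # The disclaimer-laden reply mirrors the accountability shift discussed above.
    print(completion.choices[0].message.content)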

The use of technology in medical diagnosis is not a new concept; many non-AI technologies such as ultrasounds, MRIs, and other imaging devices are already in use. Although we hold medical professionals accountable when the results from these devices are misread, we do not hold them accountable when the device itself malfunctions and they correctly interpret the faulty results. Similarly, if recommendations from the AI system are correct but misread by the medical professional, accountability does fall on them. However, as with faulty equipment, if the algorithm itself has an unknown error that produces misinformation, we cannot place accountability on the medical professional, but rather on the developer of the system. Medical professionals operate these tools, but they have no input into their design or creation. This leads to a moral crumple zone when tech companies and developers shift accountability to human users, who become scapegoats for faults in the internal decisions of systems they have no control over (Ethics and governance of artificial intelligence for health: WHO guidance 2021, 44). This is not to say that medical professionals bear no accountability with these systems, as they still have to interpret the results and judge whether they are correct based on their own medical knowledge. However, the majority of the blame for these misdiagnosis recommendations needs to fall on the developers of the system who overlooked unknown biases in the data or errors in the algorithms that have inadvertently impacted individuals' lives through misinformation.

Before looking at how impacted individuals can be consulted to redesign the system, or at what resources they have to hold these systems accountable, it is important to understand the principles outlined for them. In October 2022, the AI Bill of Rights was proposed by the White House Office of Science and Technology Policy (OSTP), identifying five principles to guide the design and use of AI systems for the American public. The principle of "Human Alternatives, Consideration, and Fallback" aims to ensure that impacted individuals have resources available in the event of an error in the AI system. Specifically, it suggests that individuals have access to the means to provide feedback to the system's operators in the event of an error, or to appeal the impacts they experience ("Blueprint for an AI Bill of Rights", 2023). Although this is loosely defined, it does ensure that some mechanism exists for people to hold the system accountable and to be consulted in providing feedback to improve it. However, the extent of these resources depends heavily on the area in which the system is deployed, and the medical field does not offer patients many.

Individuals impacted by the use of LLMs, whether through a medical professional's use or their own independent use of the system, still have very limited resources for making their concerns heard. As with any improper medical treatment, the only current recourse is to file a medical malpractice lawsuit against the provider. However, this too is a gray area, since we do not have a clear definition of who is accountable when AI tools provide incorrect medical information. Similarly, there is no way to provide feedback directly to the developers of the system, only to the medical provider that deploys it. Although many hospitals have a patient relations department where patients can raise concerns about ethical issues experienced with these AI systems, there is no evidence that providing feedback there contributes to the redesign of the system. The disconnect between developers and patients makes feedback difficult, as the medical provider acts as a middleman responsible for gathering information from impacted users and passing it to developers for redesigning the system. Individuals typically have mechanisms such as FDA MedWatch, which allows them to report safety issues with regulated medical products. However, as discussed at the beginning of this section, LLMs fall under the CDS category and are exempt from FDA regulation based on their use case. This means patients cannot file complaints through this resource and have yet to receive a comparable legal mechanism for AI-based systems. As more individuals are impacted by these systems, in both positive and negative ways, it is crucial that we develop mechanisms for incorporating user feedback into the redesign of these tools.

In addition to accountability, helping individuals understand how these systems work can help them identify when they have been impacted by them. It is important that those utilizing these systems, whether patients or medical professionals, understand how they work. The previously discussed AI Bill of Rights also contains a principle on "Notice and Explanation", which aims to ensure the following: those responsible for the development and deployment of AI systems should provide generally accessible, clearly worded descriptions of the system's functionality, notice of its use, who is accountable for it, and explanations of outcomes that help individuals understand how they are impacted ("Blueprint for an AI Bill of Rights", 2023). Because of this, it is entirely possible for impacted individuals to understand how LLMs work. Although they might not grasp every detail of how these systems are designed and built, a general knowledge of their applications and limitations can help people better understand and identify misinformation coming from them. However, just as understanding how a car works does not mean we know which party to hold accountable when something breaks, understanding how an LLM produces an output does not tell us where in the overall process an error occurred or which party to hold accountable. Do we hold the person who collected the input data accountable if misinformation was initially provided? Do we hold the developer accountable if something in the algorithm causes an error? Or do we hold the user accountable if the outputs are misinterpreted or assumed to be correct without proof? Understanding does not imply accountability, and although knowing how the system works can help prevent major issues, it is not enough to hold the system accountable when an output is perceived to be incorrect.

Section 9: References

“Access to Health Services.” n.d. Office of Disease Prevention and Health Promotion. Accessed May 1, 2024. https://health.gov/healthypeople/priority-areas/social-determinants-health/literature-summaries/access-health-services.

“AI in Healthcare; What it means for HIPAA.” 2023. Accountable HQ. https://www.accountablehq.com/post/ai-and-hipaa.

Basu, Kanadpriya, Ritwik Sinha, Aihui Ong, and Treena Basu. 2020. “Artificial Intelligence: How is It Changing Medical Sciences and Its Future?” NCBI. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7640807/.

Black, Bryan. 2023. “Two in Five Americans Report Unreasonable Health Care Wait Times.” American Association of Nurse Practitioners. https://www.aanp.org/news-feed/two-in-five-americans-report-unreasonable-health-care-wait-times.

“Blueprint for an AI Bill of Rights.” The White House, November 22, 2023. https://www.whitehouse.gov/ostp/ai-bill-of-rights/.

Brainard, Lael, Neera Tanden, and Arati Prabhakar. 2023. “Delivering on the Promise of AI to Improve Health Outcomes.” The White House. https://www.whitehouse.gov/briefing-room/blog/2023/12/14/delivering-on-the-promise-of-ai-to-improve-health-outcomes/.

Duolingo. 2023. “Introducing Duolingo Max, a Learning Experience Powered by GPT-4.” Duolingo Blog. March 14, 2023. https://blog.duolingo.com/duolingo-max/.

Elish, Madeleine Clare. 2019. "Moral Crumple Zones: Cautionary Tales in Human-Robot Interaction" (pre-print). Engaging Science, Technology, and Society. http://dx.doi.org/10.2139/ssrn.2757236

Ethics and governance of artificial intelligence for health: WHO guidance. Geneva: World Health Organization; 2021. License: CC BY-NC-SA 3.0 IGO

Gunja, Munira Z., Evan D. Gumas, and Reginald D. Williams II. 2023. “U.S. Health Care from a Global Perspective, 2022: Accelerating Spending, Worsening Outcomes.” Commonwealth Fund. https://www.commonwealthfund.org/publications/issue-briefs/2023/jan/us-health-care-global-perspective-2022.

Habli, I., Lawton, T., & Porter, Z. (2020). Artificial intelligence in health care: accountability and safety. Bulletin of the World Health Organization, 98(4), 251–256. https://doi.org/10.2471/BLT.19.237487

Johnson, A. E. W., Pollard, T. J., Shen, L., Lehman, L.-W. H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Celi, L. A., & Mark, R. G. (2016). MIMIC-III, a freely accessible critical care database. Scientific Data, 3(1), 160035–160035. https://doi.org/10.1038/sdata.2016.35

Khan, Bangul, Hajira Fatima, Ayatullah Quereshi, Sanjay Kumar, Abdul Hanan, Jawad Hussain, and Saad Abdullah. 2023. “Drawbacks of Artificial Intelligence and Their Potential Solutions in the Healthcare Sector.” NCBI. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9908503/.

Kroll, J. A., Huey, J., Barocas, S., Felten, E. W., Reidenberg, J. R., Robinson, D. G., & Yu, H. (2017). “Accountable algorithms.” University of Pennsylvania Law Review, 165, 633.

Lopes, Lunna, Alex Montero, Marley Presiado, and Liz Hamel. 2024. “Americans’ Challenges with Health Care Costs.” KFF. https://www.kff.org/health-costs/issue-brief/americans-challenges-with-health-care-costs/.

OECD, Recommendation of the Council on Artificial Intelligence, OECD/LEGAL/0449

OpenAI. (2023). ChatGPT (Mar 14 version) [Large language model]. https://chat.openai.com/chat

Park, C. W., Seo, S. W., Kang, N., Ko, B., Choi, B. W., Park, C. M., Chang, D. K., Kim, H., Kim, H., Lee, H., Jang, J., Ye, J. C., Jeon, J. H., Seo, J. B., Kim, K. J., Jung, K. H., Kim, N., Paek, S., Shin, S. Y., Yoo, S., … Yoon, H. J. (2020). Artificial Intelligence in Health Care: Current Applications and Issues. Journal of Korean medical science, 35(42), e379. https://doi.org/10.3346/jkms.2020.35.e379

Price, W.N., and I.G. Cohen. 2019. “Privacy in the Age of Medical Big Data.” Nature Medicine 25: 37–43. https://doi.org/10.1038/s41591-018-0272-7

Radley, David C., Arnav Shah, Sara R. Collins, Neil R. Powe, and Laurie C. Zephyrin. 2024. “Advancing Racial Equity in U.S. Health Care: State Disparities.” Commonwealth Fund. https://www.commonwealthfund.org/publications/fund-reports/2024/apr/advancing-racial-equity-us-health-care.

Robins, Geoff. 2021. “The Astronomical Price of Insulin Hurts American Families.” RAND. https://www.rand.org/pubs/articles/2021/the-astronomical-price-of-insulin-hurts-american-families.html.

Senator Ben Ray Lujan. 2024. “Luján, Welch Introduce Bill to Require Online Platforms Receive Consumers' Consent Before Using Their Personal Data to Train AI Models - Senator Ben Ray Luján.” March 19, 2024. https://www.lujan.senate.gov/newsroom/press-releases/lujan-welch-introduce-billto-require-online-platforms-receive-consumers-consent-before-using-their-personal-data-to-train-ai-models/.

Shmerling, Robert H. 2021. “Is our healthcare system broken?” Harvard Health. https://www.health.harvard.edu/blog/is-our-healthcare-system-broken-202107132542.

U.S. Food and Drug Administration, “Clinical Decision Support Software—Draft Guidance for Industry and Food and Drug Administration Staff” (2019), https://www.fda.gov/media/109618/download.

Whittaker, Becky. 2023. “Healthcare AI and HIPAA privacy concerns: Everything you need to know.” Tebra. https://www.tebra.com/theintake/practice-operations/legal-and-compliance/privacy-concerns-with-ai-in-healthcare.

Yiu, E., Kosoy, E., & Gopnik, A. (2023). Transmission Versus Truth, Imitation Versus Innovation: What Children Can Do That Large Language and Language-and-Vision Models Cannot (Yet). Perspectives on Psychological Science, 0(0). https://doi.org/10.1177/17456916231201401