When using GenAI tools, it is important to understand how they are trained, as any input or prompt may be used to further train that AI depending on model-specific data privacy settings and policies. This is especially risky for data covered by HIPAA, FERPA, or other regulations protecting personally identifiable information (PII). GenAI providers should offer transparent information on how your input is used, and it is best practice to read the data storage and training policies (often linked at the bottom of the page) so you know what you are agreeing to. Do not enter personal, sensitive, or proprietary data, and always consider that anything entered into an LLM may raise ethical, legal, or safety concerns.
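One practical safeguard implied by the guidance above is scrubbing obvious identifiers from a prompt before it ever leaves your machine. The sketch below is a minimal illustration, not a substitute for dedicated PII-detection tooling; the patterns and placeholder labels are assumptions for demonstration and will miss many kinds of sensitive data (names, medical record numbers, student IDs).

```python
import re

# Illustrative patterns only -- real PII detection needs a dedicated
# tool or human review. Each match is replaced with a [LABEL] placeholder.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def redact(prompt: str) -> str:
    """Replace matches of each pattern with its placeholder label."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt

print(redact("Email jane.doe@ou.edu or call 405-555-1234 about SSN 123-45-6789."))
# -> Email [EMAIL] or call [PHONE] about SSN [SSN].
```

Redacting before submission reduces what a provider could store or train on, but it does not change the provider's policies, so the opt-out and enterprise protections discussed elsewhere in this guide still matter.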
One solution to consider when working with personal or sensitive data is installing an open-source model locally on a personal machine. Running the model without internet access or server connections (e.g., behind a firewall) provides an extra layer of security. Individuals can also create personal or business accounts with Amazon Bedrock, a fully managed service that provides access to foundation models (including Amazon's Nova family), data customization techniques, and the ability to build GenAI applications with secure data protection and privacy.
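As one concrete way to run a model locally, the commands below sketch a workflow using Ollama, a popular open-source model runner. The tool choice and model name are illustrative assumptions, not recommendations from this guide; any local runner that works offline serves the same purpose.

```shell
# Pull an open-source model once while online (Ollama must be installed first):
ollama pull llama3

# Then disconnect from the network or block outbound traffic with a firewall,
# and run the model entirely on the local machine -- prompts never leave it:
ollama run llama3 "Summarize the attached notes."
```

Because inference happens on your own hardware, prompts are not sent to a third-party server and cannot be stored or used for training by a provider.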
Another option for organizations and businesses is to work with a private cloud or a virtual network (VNet)/virtual private cloud (VPC). Private clouds offer dedicated working environments to a single organization, allowing for control over security and infrastructure. VPCs are isolated working environments within a larger public cloud that offer scalability and enhanced security while minimizing the operational costs of a full-scale private cloud.
Amazon Bedrock User Guide | Microsoft Azure - Private Cloud | VNet vs VPC | AWS VPC
GenAI models are not all the same when it comes to accessibility and security; much depends on whether a model is open-source or closed-source. Understanding the advantages and disadvantages of both can inform decisions about data safety and privacy.
Popular LLMs such as ChatGPT and Gemini are closed-source models, meaning they are developed and maintained within a closed environment and users cannot access the underlying code or training data. Closed-source models offer dedicated expertise and support and protect the developer's intellectual property. To cover support and maintenance costs, they typically charge subscription fees for the full range of features and limit users' access and customizability. Input data is processed on the provider's servers, where it may be at risk of security breaches and may be stored or used for further training.
Open-source models such as Llama 3 (Meta), Grok, and BLOOM are examples of LLMs (or components of them) that anyone can access, use, and distribute under certain licensing terms. Open-source models can be customized, inspected, and modified by users, enabling community-driven collaboration that continually improves the models and helps surface security flaws. However, open-source models typically lack official technical support, predictable operating costs, and centralized security.
GenAI models do not all treat data the same way, and user prompts may be used to further train or improve a model. For example, the free version of OpenAI's ChatGPT uses input to improve its models by default, and users must opt out of training in their settings. ChatGPT Enterprise and API users receive enhanced security and privacy: their prompts are not used for training, which protects professional and institutional data. For OU students, faculty, and staff, Copilot is available with institutional data protection through Microsoft 365.
Copilot:
Data, Privacy, and Security for Microsoft 365 Copilot
OpenAI (ChatGPT):
Data | Privacy Policy | Usage Policies
Google Gemini:
Gemini Privacy Hub | Gemini Documentation
Anthropic (Claude):