Package to test Prompt Injection Against OpenAI's ChatGPT, Google's Gemini and Azure Open AI

These details have not been verified by PyPI

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Prompt Injection Benchmarking

The mother of all prompt injection benchmarking repositories, this is the one you have been waiting for (or soon will be)

Analysing ChatGPT-4 and Gemini Pro Jailbreak Detection (CyberPunk mode)

This repository contains Python code to analyze the Hugging Face Jailbreak dataset against OpenAI's ChatGPT-4 an Gemini Pro models. The code sends prompts from the dataset, processes and tabulates results.

Anxious to check the results?

The table below shows the total injection attack prompts according to the data set and the number of detected attacks by ChatGPT-4 and Gemini Pro.

Prompts	GPT-4	Gemini
139
Detected	133	56
Not Attack	TBD	TBD
Missed Attack	TBD	TBD

Next steps

Test with Azure Open AI w/wo Jailbreak Risk Detection
Tabulate missed attacks (attack not caught by the model) vs. not attack (the model explicitly did not consider an attack)

Requirements

To run this code, you need the following:

Python 3
OpenAI Python library
Hugging Face datasets library
A valid OpenAI API key
A valid Google API key
A valid Azure Open AI endpoint and all the config that comes with it

If you do not have all these keys the code will skip that LLM, no need to worry.

Setup

Clone this repository:

git clone https://github.com/BenderScript/PromptInjectionBench.git
cd PromptInjectionBench

Create a .env

Create a .env file in the project root directory that contains your OpenAI API, Azure and Google Keys. If you do not have all these keys the code will skip that LLM, no need to worry.

OPENAI_API_KEY=your key>
#
GOOGLE_API_KEY=<your key>
# Azure
AZURE_OPENAI_API_KEY=your key>
AZURE_MODEL_NAME=gpt-4
AZURE_OPENAI_ENDPOINT=your endpoint>
AZURE_OPENAI_API_VERSION=<your api verion, normally 2023-12-01-preview>
AZURE_OPENAI_DEPLOYMENT=<your deployment name. This is the name you gave when deploying the model>

Install the required libraries using pip3 and the provided requirements.txt file:

pip3 install -r requirements.txt

Running the Analysis

To analyze the Jailbreak dataset with ChatGPT-4 and Gemini Pro, execute the following Python script:

uvicorn prompt_injection_bench.server:prompt_bench_app --reload --port 9002

If Everything goes well, you should see the following page at http://127.0.0.1:9001

Landing page

This script loads the dataset, iterates through prompts, sends them to ChatGPT-4, and detects potential injection attacks in the generated responses.

Testing

See the demo below where the App checks a prompt with a malicious URL and injection.

Demo

Code Structure

server.py: The main script that loads the dataset, drives the test, and performs the analysis.
gemini* : The gemini pro code
openai* : The openai code

Skipping "Benign" Prompts

In the interest of time, the code skips prompts labeled as "benign." This helps focus the analysis on potentially harmful prompts where injection attacks might occur.

Tabulating Attacks vs. Detected Attacks

The script tabulates the total number of generated texts and the number of texts where potential injection attacks are detected. It counts these attacks based on the presence of specific keywords such as "I'm sorry," "apologize," or "developed by OpenAI" in the generated text.

Results

The script will print information about each prompt, including its type, the generated text from ChatGPT-4, and whether an injection attack was detected based on the presence of specific keywords.

License

This code is provided under the Apache License 2.0. Feel free to use and modify it as needed.

This analysis is provided as a reference and demonstration of using OpenAI's ChatGPT-4 model for evaluating prompt injection attacks in text datasets.

For more information about OpenAI's GPT-4 model and the Hugging Face Jailbreak dataset, please refer to the official documentation and sources:


These explanations added to the README.md should help users understand why "benign" prompts are skipped and how the code tabulates attacks vs. detected attacks.

Project details

These details have not been verified by PyPI

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

0.1.13

Mar 13, 2024

0.1.12

Feb 20, 2024

0.1.11

Feb 15, 2024

0.1.9

Feb 9, 2024

0.1.8

Feb 4, 2024

This version

0.1.7

Feb 4, 2024

0.1.6

Jan 26, 2024

0.1.4

Jan 24, 2024

0.1.3

Jan 24, 2024

0.1.2

Jan 22, 2024

0.1.1

Jan 21, 2024

0.1.0

Jan 21, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

prompt_injection_bench-0.1.7.tar.gz (13.1 MB view hashes)

Uploaded Feb 4, 2024 Source

Built Distribution

prompt_injection_bench-0.1.7-py3-none-any.whl (13.1 MB view hashes)

Uploaded Feb 4, 2024 Python 3

Hashes for prompt_injection_bench-0.1.7.tar.gz

Hashes for prompt_injection_bench-0.1.7.tar.gz
Algorithm	Hash digest
SHA256	`34ec14178f2c7925d4e7c12b38052db218106740658e7cde68617823b1bbc520`
MD5	`73b57b5b9d5870c9040109b1b40add0e`
BLAKE2b-256	`cfdf90963c01d1822368af8fce7f4ca46dc7002a19f8d95cfb6f80e2f99ea2e3`

Hashes for prompt_injection_bench-0.1.7-py3-none-any.whl

Hashes for prompt_injection_bench-0.1.7-py3-none-any.whl
Algorithm	Hash digest
SHA256	`03fa7ff48d19eb375adf9fb75cbf39fc359e660cbfe103fefbea098801bc1441`
MD5	`7ea2853542c73fc74585bba797e100fd`
BLAKE2b-256	`de43408c1612e56568be2573caf3109e624abcdb644651c7527af302515c7162`