Amazon Foundation Model Evaluations

Foundation Model Evaluations Library

FMEval is a library for evaluating Large Language Models (LLMs), helping you select the best LLM for your use case. The library can help evaluate LLMs on the following tasks:

  • Open-ended generation - the production of natural human responses to general questions that do not have a pre-defined structure.
  • Text summarization - the verbatim extraction of a few pieces of highly relevant text (extraction) or a condensed rewording of the original text (abstraction).
  • Question Answering - the generation of a relevant and accurate response to a question.
  • Classification - assigning a category, such as a label or score, to text based on its content.

The library contains the following:

  • Implementation of popular metrics (eval algorithms) such as Accuracy, Toxicity, Semantic Robustness and Prompt Stereotyping for evaluating LLMs across different tasks.
  • Implementation of the ModelRunner interface. ModelRunner encapsulates the logic for invoking LLMs, exposing a predict method that greatly simplifies interactions with LLMs within eval algorithm code. The interface can be extended by users for their own LLMs, as sketched below. Built-in support is provided for AWS SageMaker JumpStart Endpoints, AWS SageMaker Endpoints and Bedrock Models.
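
If you want to plug in a model that is not covered by the built-in runners, the sketch below shows one way a custom ModelRunner could look. It is only a sketch: the import path, the (generated text, log probability) return contract of predict, and the HTTP endpoint are assumptions to verify against the library documentation.

from typing import Optional, Tuple

import requests

# Assumed import path for the abstract ModelRunner interface.
from fmeval.model_runners.model_runner import ModelRunner


class HttpModelRunner(ModelRunner):
    """Sketch of a runner that invokes an LLM behind a hypothetical HTTP endpoint."""

    def __init__(self, endpoint_url: str):
        self._endpoint_url = endpoint_url

    def predict(self, prompt: str) -> Tuple[Optional[str], Optional[float]]:
        # Send the prompt to the endpoint and return the generated text.
        # The second tuple element is an optional log probability, which this
        # hypothetical endpoint does not provide, so None is returned.
        response = requests.post(self._endpoint_url, json={"prompt": prompt}, timeout=30)
        response.raise_for_status()
        return response.json().get("generated_text"), None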

Installation

To install the package from PyPI, run:

pip install fmeval

Usage

You can see examples of running evaluations on your LLMs with built-in or custom datasets in the examples folder.

Main steps for using fmeval are:

  1. Create a ModelRunner that can invoke your LLM. Built-in support is provided for AWS SageMaker JumpStart Endpoints, AWS SageMaker Endpoints and AWS Bedrock Models. You can also extend the ModelRunner interface for LLMs hosted anywhere.
  2. Use any of the supported eval_algorithms.
eval_algo = get_eval_algorithm("toxicity", ToxicityConfig())
eval_output = eval_algo.evaluate(model=model_runner)

Note: You can update the default eval config parameters for your specific use case.
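Putting the two steps together, a minimal end-to-end sketch might look like the following. The import paths are assumptions, so check the library documentation and the examples folder for the exact module layout.

# Assumed import paths; verify against the fmeval documentation.
from fmeval.eval import get_eval_algorithm
from fmeval.eval_algorithms.toxicity import ToxicityConfig

model_runner = ...  # any ModelRunner implementation from step 1 (built-in or custom)

eval_algo = get_eval_algorithm("toxicity", ToxicityConfig())
eval_output = eval_algo.evaluate(model=model_runner)
print(eval_output)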

Using a custom dataset for an evaluation

The library ships with built-in datasets that the eval algorithms consume to compute scores. You can use a custom dataset instead, as follows.

  1. Create a DataConfig for your custom dataset
config = DataConfig(
    dataset_name="custom_dataset",
    dataset_uri="./custom_dataset.jsonl",
    dataset_mime_type="application/jsonlines",
    model_input_location="question",
    target_output_location="answer",
)
  2. Use an eval algorithm with the custom dataset
eval_algo = get_eval_algorithm("toxicity", ToxicityConfig())
eval_output = eval_algo.evaluate(model=model_runner, dataset_config=config)
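
For reference, the custom dataset file above is in JSON Lines format, and each record must expose keys matching model_input_location ("question") and target_output_location ("answer"). The records below are purely illustrative; a small script like this produces a compatible file:

import json

# Illustrative records; each line holds one example whose keys match the
# model_input_location ("question") and target_output_location ("answer")
# fields declared in the DataConfig above.
records = [
    {"question": "What is the capital of France?", "answer": "Paris"},
    {"question": "Who wrote Hamlet?", "answer": "William Shakespeare"},
]

with open("custom_dataset.jsonl", "w") as f:
    for record in records:
        f.write(json.dumps(record) + "\n")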

Please refer to the code documentation and examples for further details on using the eval algorithms.

Development

Setup

Once a virtual environment with Python 3.10 is set up, run the following command to install all dependencies:

./devtool all

Adding python dependencies

We use Poetry to manage Python dependencies in this project. To add a new dependency, update the pyproject.toml file and run the poetry update command to refresh the poetry.lock file (which is checked in).

Aside from adding dependencies as described above, everything else should be managed with devtool commands.
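
For example, assuming the project keeps its runtime dependencies in the standard [tool.poetry.dependencies] table of pyproject.toml, adding a hypothetical package would mean adding an entry such as:

some-package = "^1.0"

and then regenerating the lock file:

poetry update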

Adding your own Eval Algorithm

Details TBA

Security

See CONTRIBUTING for more information.

License

This project is licensed under the Apache-2.0 License.
