MicroLlama
The smallest possible LLM API. Build a question-and-answer interface for your own content in a few minutes. Uses OpenAI embeddings, GPT-3.5 and Faiss, via Langchain.
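Under the hood this is a standard retrieval-augmented setup: your documents are embedded with OpenAI, stored in a Faiss index, and the best matches for each question are handed to GPT-3.5 to answer from. As a rough illustration only (this is not microllama's actual code, and the faiss_index path is an assumption), the query path looks something like this in the classic Langchain API:

# Illustrative sketch of the retrieval flow, not microllama's internals.
# Load a saved Faiss index, retrieve the chunks most similar to the
# question, and have GPT-3.5 answer from them.
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains import RetrievalQA

index = FAISS.load_local("faiss_index", OpenAIEmbeddings())
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo"),
    retriever=index.as_retriever(),
)
print(qa.run("What is MicroLlama?"))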
Usage
- Combine your source documents into a single JSON file called source.json. It should look like this:
[
  {
    "source": "Reference to the source of your content. Typically a title.",
    "url": "URL for your source. This key is optional.",
    "content": "Your content as a single string. If there's a title or summary, put these first, separated by new lines."
  },
  ...
]
See example.source.json for an example; a small script for generating this file is sketched below, after the usage steps.
- Install dependencies:
pip install langchain faiss-cpu openai fastapi "uvicorn[standard]"
- Get an OpenAI API key and add it to the environment, e.g. export OPENAI_API_KEY=sk-etc. Note that indexing and querying require OpenAI credits, which aren't free.
- Run your server with uvicorn serve:app. If the search index doesn't exist, it'll be created and stored.
- Query your documents at /api/ask?your question or use the simple front-end at /
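Once the server is running, any HTTP client will do. A stdlib-only example, assuming the question is simply URL-encoded as the raw query string, as in the endpoint above:

# Query a locally running microllama server. Assumes uvicorn's default
# port 8000 and that the question is passed as the raw query string.
from urllib.parse import quote
from urllib.request import urlopen

question = "What is this project about?"
with urlopen("http://127.0.0.1:8000/api/ask?" + quote(question)) as resp:
    print(resp.read().decode("utf-8"))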
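And if your content starts out as a folder of plain-text files, a small script can produce source.json in the format shown in the first step. This is a hypothetical helper, not part of microllama; the docs/ folder and the use of filenames as titles are assumptions:

# build_source.py -- hypothetical helper, not part of microllama.
# Turns a folder of .txt files into the source.json format above,
# using each filename as the "source" title.
import json
from pathlib import Path

docs = []
for path in sorted(Path("docs").glob("*.txt")):
    docs.append({
        "source": path.stem,  # a title; the optional "url" key is omitted here
        "content": path.read_text(encoding="utf-8"),
    })

Path("source.json").write_text(json.dumps(docs, indent=2), encoding="utf-8")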
Deploying your API
On Fly.io
Sign up for a Fly.io account and install flyctl. Then:
fly launch # answer no to Postgres, Redis and deploying now
fly secrets set OPENAI_API_KEY=sk-etc
fly deploy
On Google Cloud Run
gcloud run deploy --source . --set-env-vars="OPENAI_API_KEY=sk-etc"
For Cloud Run and other serverless platforms you should probably generate the FAISS index at container build time, to reduce cold starts. See the two commented lines in Dockerfile.
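One way to do that is a small build step that creates and saves the index before the server ever starts. A sketch using the classic Langchain API (build_index.py and the faiss_index path are assumptions, not microllama's actual internals):

# build_index.py -- sketch of pre-building the Faiss index at image build time.
# Reads source.json (format above), embeds each document with OpenAI,
# and saves an index the server can load at startup.
import json
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import FAISS

with open("source.json") as f:
    docs = json.load(f)

index = FAISS.from_texts(
    texts=[d["content"] for d in docs],
    embedding=OpenAIEmbeddings(),
    metadatas=[{"source": d["source"], "url": d.get("url")} for d in docs],
)
index.save_local("faiss_index")

Note that embedding at build time means OPENAI_API_KEY must be available to the build as well, not just at runtime.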
Based on
- Langchain
- Simon Willison's blog post, datasette-openai and datasette-faiss.
- FastAPI
- GPT Index
- Dagster blog post
TODO
- Use a splitter that generates more meaningful fragments, e.g.:

from langchain.text_splitter import SpacyTextSplitter

text_splitter = SpacyTextSplitter(chunk_size=700, chunk_overlap=200, separator=" ")
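Note that SpacyTextSplitter needs spacy itself plus a language pipeline (roughly pip install spacy and python -m spacy download en_core_web_sm), which the install command above doesn't pull in.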