ChatGPT on a budget


The code for this blog can be found in this repository.

For those of us who use ChatGPT on a daily basis, whether for fun, learning, side projects, or serious production workloads, the costs can rack up quickly. Call me thrifty, but I don't like spending money when I don't have to, and as it turns out, there is quite a neat way of bringing down your LLM costs: a feature called the Batch API.


It's all about budgeting

Together with my girlfriend, I try to track our expenses. Every month we set aside a bit of money for things like groceries, utilities and so on. Each of these categories gets its own budget so that at the end of the month we can see where our money went. It's a neat way of tracking our spending habits. We use a tool called Actual Budget, which I can't praise enough - it's free, open source, has a great UI and has an amazing team behind it. Consider supporting them if you use their software. As you can imagine, tracking all of this is also a lot of work, so like every engineer, I try to automate the process. 🤓 The nice part is that I also use ChatGPT for certain steps, so I will show you how to stay within your budget.


Batch API

Luckily for me, my banking app allows exporting monthly transactions in CSV format. However, Actual Budget expects a slightly different format, so I have to transform the CSV. While most of this process is regular code, I use ChatGPT to categorize each of my transactions.

Because I don't need the categories immediately, this makes a great use case for the Batch API. By using this API, I save 50% on ChatGPT costs; the only catch is that I have to wait, because the whole API works asynchronously. OpenAI guarantees that your requests will be processed within 24 hours, but in practice this usually happens faster. When I wrote this blog, it took only 15 minutes! Let's batch up our transactions!

The Batch API is also available via the Azure OpenAI Service.

I wrote a small Python script that reads a CSV file with dummy bank transactions, which we then hand off to ChatGPT as a single batch. The categorization prompt is clearly biased by this dummy data, but feel free to adjust it to your needs; it's good enough for example purposes.
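If you want to follow along without cloning the repository, here is a minimal sketch of the CSV-reading part (the file name and column names are my assumptions; your bank export will almost certainly differ):

import csv

# Hypothetical column names - adjust them to whatever your bank export actually uses.
def load_transactions(path: str) -> list[dict]:
    with open(path, newline="", encoding="utf-8") as f:
        return [
            {"date": row["Date"], "payee": row["Payee"], "amount": row["Amount"]}
            for row in csv.DictReader(f)
        ]

transactions = load_transactions("dummy_transactions.csv")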

The Batch API is not entirely straightforward and roughly consists of four steps:

  1. Create a .jsonl batch file with your requests
  2. Upload the batch file
  3. Create the batch
  4. Retrieve the results once the batch completes (within 24 hours)

Before I embarked on the Batch API journey, I had never heard of the JSONL format. According to the JSONL website, it is well suited for bulk imports thanks to its flexibility. Since each line is still valid JSON, I'll be using the json module from the Python standard library. Run uv run create_batch.py, which will produce a file such as this one.
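To give you a feel for step 1, here is a rough sketch of what such a script does, assuming the transactions list from the CSV sketch above and a hypothetical categorization prompt (the real script in the repository is more elaborate):

import json

# Hypothetical prompt - the real one is tailored to the dummy data in the repository.
SYSTEM_PROMPT = "Categorize the bank transaction into one of: Groceries, Utilities, Dining, Other."

with open("batch.jsonl", "w", encoding="utf-8") as f:
    for i, tx in enumerate(transactions):
        request = {
            # custom_id lets us match each result back to its transaction later
            "custom_id": f"transaction-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": "gpt-4o-mini",
                "messages": [
                    {"role": "system", "content": SYSTEM_PROMPT},
                    {"role": "user", "content": f"{tx['date']} {tx['payee']} {tx['amount']}"},
                ],
            },
        }
        # JSONL simply means one complete JSON object per line
        f.write(json.dumps(request) + "\n")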

In a separate post, I wrote more about ChatGPT structured outputs, which I'm using here as well. Next up, we upload our batch file by running uv run upload_batch.py; within the same script, we then call another endpoint that creates the batch.
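Under the hood, these two steps boil down to two calls with the official openai Python SDK; a condensed sketch, assuming the batch.jsonl file from above:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Step 2: upload the .jsonl file, marking it as batch input
batch_file = client.files.create(file=open("batch.jsonl", "rb"), purpose="batch")

# Step 3: create the batch pointing at the uploaded file
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Keep the response around - we need the batch id later
with open("uploaded-batch.json", "w", encoding="utf-8") as f:
    f.write(batch.model_dump_json(indent=2))

Here is the response, stored in uploaded-batch.json.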

{
  "id": "batch_6708afc33ee481909572c2ac60d28e43",
  "completion_window": "24h",
  "created_at": 1728622531,
  "endpoint": "/v1/chat/completions",
  "input_file_id": "file-rMdafG9KruWxFrzK670KjBtg",
  "object": "batch",
  "status": "validating",
  "cancelled_at": null,
  "cancelling_at": null,
  "completed_at": null,
  "error_file_id": null,
  "errors": null,
  "expired_at": null,
  "expires_at": 1728708931,
  "failed_at": null,
  "finalizing_at": null,
  "in_progress_at": null,
  "metadata": null,
  "output_file_id": null,
  "request_counts": { "completed": 0, "failed": 0, "total": 0 }
}

It's very important to save it, because it's our confirmation that the batch has been created. Remember, we now need to wait. You can think of this as ordering on Amazon: once you order, you get an order number. In our case, it's the id field, which we will use to retrieve our results from ChatGPT.

After a while, execute uv run download_batch.py and marvel at the batching magic! The best part: that was 50% cheaper than your usual LLM costs.
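For completeness, a stripped-down sketch of what the retrieval step looks like, assuming the batch id we saved in uploaded-batch.json:

import json
from openai import OpenAI

client = OpenAI()

with open("uploaded-batch.json", encoding="utf-8") as f:
    batch_id = json.load(f)["id"]

batch = client.batches.retrieve(batch_id)
if batch.status == "completed":
    # The output is itself a JSONL file: one response per request, matched by custom_id
    output = client.files.content(batch.output_file_id)
    for line in output.text.splitlines():
        result = json.loads(line)
        category = result["response"]["body"]["choices"][0]["message"]["content"]
        print(result["custom_id"], category)
else:
    print(f"Batch is still {batch.status}, check back later.")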

A few days ago, Anthropic, one of OpenAI's main competitors, also announced their own Batch API feature. It's quite similar to this one, and I'll cover it in another post.


Real-world savings and shortcomings

It's apparent that the Batch API is not suitable for every workload, because no one likes waiting, especially not for up to 24 hours. However, for those non-critical paths where you don't need an immediate response, it can mean a massive cost reduction. A careful observer might have noticed that one of the supported endpoints for the Batch API is /v1/embeddings, which in my opinion is the best use case for this API. Folks building RAG systems can benefit massively from it.
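For illustration, an embeddings request line in the batch file looks almost identical to the chat one; only the url and body change (the model name here is just an example):

# One line of a hypothetical embeddings batch file
embedding_request = {
    "custom_id": "chunk-0",
    "method": "POST",
    "url": "/v1/embeddings",
    "body": {"model": "text-embedding-3-small", "input": "A chunk of a document to embed..."},
}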

To paraphrase a saying from my mother tongue - if you don't pay while crossing this bridge, you'll pay on the next one. In other words, we did cut our LLM costs by 50%, but on the other side there are costs that might not be so evident. To run the Batch API in production, you undoubtedly need to manage more code that orchestrates all those steps. Since we have to wait for results, we could use a task queue that periodically checks back on the batch status, but that brings in more infrastructure to maintain. Don't get me wrong, it's worth it despite the additional maintenance cost, but it's good to keep this in mind when working with the Batch API.
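To make the waiting concrete, the orchestration can start as small as a periodic check along these lines - a naive sketch; in production you would hang this off a scheduler or task queue instead of a blocking loop:

import time
from openai import OpenAI

client = OpenAI()

def wait_for_batch(batch_id: str, poll_seconds: int = 300):
    # Naive polling loop - a real setup would schedule this check instead of blocking
    while True:
        batch = client.batches.retrieve(batch_id)
        if batch.status in ("completed", "failed", "expired", "cancelled"):
            return batch
        time.sleep(poll_seconds)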

If this blog saved you a bit of money and you are still reading, you owe me a beer. Enjoy! 🍻