Not "Just A Wrapper" - Using OpenAI with Tools
AI has gone mainstream, and OpenAI's ChatGPT is ushering in a wave of new applications layered on top of one of the most exciting APIs that indie hackers have ever had to play with. Many of these apps are just shallow wrappers over OpenAI's technology, offering little more than structured prompts or basic integrations. This trend mirrors past hype cycles like cryptocurrency, where people tried to capitalize on buzzwords without being truly innovative.
This all has a very negative connotation, but don't get me wrong - wrappers are not inherently bad. Vercel is essentially a wrapper over AWS, providing a nicer user interface and a better developer experience, and people love it for that. Never feel bad! If you are learning, creating a wrapper over OpenAI's functionality is a great way to think through all of the decisions they made when building a user-facing front end.
If you are just starting with the Assistants API, then check out this introduction first. I'll assume some basic knowledge from here:
If you want to create an application that truly enhances the capabilities of AI (and is therefore cool), then using tools is a great way to do that. OpenAI provides an intuitive API that you can use to give your application a little more horsepower and uniqueness, making it more than just a wrapper.
What this Covers
- Deep(ish) Dive on Each Tool - learn how to integrate data into your AI workflow, enable assistants to write and execute code, and leverage function calling.
- Practical Examples and Code - enough to get you started with each of the three main tool types.
Types of Tools
There are (currently) three types of tools that you can use with OpenAI's APIs: File Search, Code Interpreter, and Function Calling. With these three classes of tools, you can add a ton of value and functionality to your AI application.
File Search
File search is the newest tool to be added to the Assistants API and is incredibly useful for one of the most popular custom AI patterns: RAG (retrieval augmented generation). All language models and generative AI systems have some kind of knowledge cutoff; training takes a long time, so the source data is inherently non-current. If you want to build an AI that can use current data, or even add data on the fly, you will need to retrieve and integrate that data into your threads or completion contexts. Make sure to give your assistant the "file_search" tool when creating it (note the "tools" section):
import OpenAI from "openai";

const openai = new OpenAI();

const assistant = await openai.beta.assistants.create({
  name: "Study Buddy Pro",
  instructions: `You are an intelligent study partner.
    I will upload course notes for you to help me learn from.`,
  model: "gpt-4o",
  tools: [{ type: "file_search" }],
});
A great example of this is the plethora of "Chat with your PDF" apps that are going around. At the time of writing, OpenAI supports parsing about 20 different file types. You can add a file to a thread by first uploading the file and then attaching it to a message:
import fs from "fs";

const classnotes = await openai.files.create({
  file: fs.createReadStream("class-notes.pdf"),
  purpose: "assistants",
});

const thread = await openai.beta.threads.create({
  messages: [
    {
      role: "user",
      content: "Concisely, what did we cover in class today?",
      attachments: [
        {
          file_id: classnotes.id,
          tools: [{ type: "file_search" }],
        },
      ],
    },
  ],
});
This file will be accessible from messages belonging to the thread. You can add a file to either (1) an assistant or (2) a thread, and the choice should depend on how you want to control access to the file: files attached to an assistant are accessible from any thread that assistant is part of.
When you execute a run, if a file was referenced or searched, you will receive annotation data that describes what files were used.
const run = await openai.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
});

const messages = await openai.beta.threads.messages.list(thread.id, {
  run_id: run.id,
});

const message = messages.data.pop()!;
if (message.content[0].type === "text") {
  const { text } = message.content[0];
  console.log(text.value);
  console.log(text.annotations);
}
The annotations data also includes start_index and end_index values that tell you exactly which part of the response was "upgraded" using the file data. The maximum file size is 512 MB. If you need more, you can add multiple files to a vector_store and search it as one entity; this is actually what happens behind the scenes when your file is added to an assistant and run. Check out the API docs for more detail about how this works and the relevant defaults like chunk sizes and indexing strategies.
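Here's a minimal sketch of that multi-file approach, reusing the openai client and fs import from the snippets above (the store name and file names are placeholders, and these helpers live under the SDK's beta namespace at the time of writing):
// Create a vector store that will hold several related files
const vectorStore = await openai.beta.vectorStores.create({
  name: "Lecture Notes",
});

// Upload multiple files in one batch and wait for indexing to finish
await openai.beta.vectorStores.fileBatches.uploadAndPoll(vectorStore.id, {
  files: [
    fs.createReadStream("week-1-notes.pdf"),
    fs.createReadStream("week-2-notes.pdf"),
  ],
});

// Attach the whole store to the assistant so file_search treats it as one entity
await openai.beta.assistants.update(assistant.id, {
  tool_resources: { file_search: { vector_store_ids: [vectorStore.id] } },
});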
Code Interpreter
This tool allows your assistants to run code that they write. You simply ask questions with code_interpreter enabled, and the assistant will write code, run it, keep track of logs and output, and then summarize the results for you. It's great for performing simple data analysis, but you need to interpret the output carefully. First, create an assistant with the tool enabled:
const assistant = await openai.beta.assistants.create({
  instructions:
    "You are an expert data analyst, write and run code to answer my questions.",
  model: "gpt-4o",
  tools: [{ type: "code_interpreter" }],
});
Your assistant can now write and run its own code. To make this really useful, try adding a file to a thread. To let the assistant access that file from the code it writes, you need to specify the attachment's tool type as code_interpreter.
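Here's a sketch of what that can look like, following the same upload pattern as the file search section (the sales-data.csv file name and the question are placeholders):
const dataFile = await openai.files.create({
  file: fs.createReadStream("sales-data.csv"),
  purpose: "assistants",
});

const thread = await openai.beta.threads.create({
  messages: [
    {
      role: "user",
      content: "What was the average monthly revenue in this data set?",
      attachments: [
        // The tool type here is what lets code interpreter read the file
        { file_id: dataFile.id, tools: [{ type: "code_interpreter" }] },
      ],
    },
  ],
});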
If code interpreter was used in a run, the assistant will output some data, including input and output logs, so you can really dig into exactly what ChatGPT did. The assistant can also return (1) images or (2) data files. This is very useful for generating a csv or a plot of data, for example. If they affected the output, data files will be found in the citations (similar to what we saw with file search). Image files will be in the assistant message response. Both have file_id values that you can use to access them using the SDK. For example:
const response = await openai.files.content("file-id-value");
// Extract the binary data from the Response object
const image_data = await response.arrayBuffer();
// Convert the binary data to a Buffer
const image_data_buffer = Buffer.from(image_data);
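If you want to persist a generated plot, here's a rough sketch that builds on the message handling shown in the file search section; it assumes the assistant's message contains an image_file content block and reuses the fs import from earlier:
// Look for a generated image among the message's content blocks
const imageBlock = message.content.find((block) => block.type === "image_file");
if (imageBlock && imageBlock.type === "image_file") {
  const response = await openai.files.content(imageBlock.image_file.file_id);
  const buffer = Buffer.from(await response.arrayBuffer());
  fs.writeFileSync("./generated-plot.png", buffer);
}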
If you want to see what's possible with this tool, check out a fun conversation I had with a code_interpreter-enabled Custom GPT at the link below. I uploaded a csv with public historical data from my favourite collectible card game and we analyzed it together:
Function Calling
This is the most open-ended of the tools and gives you a lot of control over how it works. It allows you to describe functions and their required parameters to your assistants, which they can use when they need to. During a run, if the assistant determines that it would be useful to call your function, it will return the function name and input arguments and then wait for your response.
This is the tool that I find the most exciting. Imagine using it for UI Automation (think controlling a map in a front end), image recognition (call another API in your function) or even e-commerce or transactional functionality. The possibilities are endless!
Here is an example I built into an assistant recently: a tool for getting a search engine results page (SERP). Start by describing your function and telling your assistant about it:
const serp = {
  name: "internet_search",
  description: "Get information on recent events from the web.",
  parameters: {
    type: "object",
    properties: {
      query: {
        type: "string",
        description: `The search query to use.
          For example: "Where is Sporty Spice from?"`,
      },
    },
    required: ["query"],
  },
};
const assistant = await client.beta.assistants.create({
  model: "gpt-4o",
  instructions:
    "You are a helpful research assistant. Use the provided functions to answer questions.",
  // Function definitions must be wrapped in a tool object of type "function"
  tools: [{ type: "function", function: serp }],
});
The properties object tells the assistant what parameters it needs to provide to the function; in this case, simply query. Parameters follow the JSON Schema specification, so any of the following are valid types: object, array, number, string, boolean, or null.
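As a purely illustrative sketch (add_product and its parameters are made up for this example), a richer schema might mix several of those types:
const addProduct = {
  name: "add_product",
  description: "Add a product to the catalogue.",
  parameters: {
    type: "object",
    properties: {
      name: { type: "string", description: "Display name of the product." },
      unit_price: { type: "number", description: "Price in USD." },
      tags: {
        type: "array",
        items: { type: "string" },
        description: "Optional labels, for example [\"sale\", \"new\"].",
      },
    },
    required: ["name", "unit_price"],
  },
};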
When you execute a run, if the assistant determines that a response requires_action, it will wait until you submitToolOutputs and then use those outputs in its response. You'll need to implement getToolOutputs yourself, and it should handle your function by name:
public async getToolOutputs(
  toolsToCall: OpenAI.Beta.Threads.Runs.RequiredActionFunctionToolCall[]
): Promise<OpenAI.Beta.Threads.Runs.RunSubmitToolOutputsParams.ToolOutput[]> {
  const toolOutputs = await Promise.all(
    toolsToCall.map(async (call) => {
      if (call.function.name === "internet_search") {
        // Arguments arrive as a JSON string, so parse out the query
        const query = JSON.parse(call.function.arguments)["query"];
        console.log("tool called -- internet_search", query);
        return {
          tool_call_id: call.id,
          output: await this.serpProvider.search(query),
        };
      }
      return undefined;
    })
  );
  // Drop any tool calls we didn't handle
  return toolOutputs.filter(
    (output): output is OpenAI.Beta.Threads.Runs.RunSubmitToolOutputsParams.ToolOutput =>
      output !== undefined
  );
}
These tool outputs are submitted back to OpenAI, and the assistant then finishes formulating its response using the results. For more details, check out the official API docs.
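Putting it together, here's a rough sketch of the full loop, assuming the getToolOutputs method above lives on a hypothetical searchService object and that the thread already contains the user's question:
const run = await client.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
});

if (run.status === "requires_action" && run.required_action) {
  // The run is paused, waiting for us to execute the requested tool calls
  const toolCalls = run.required_action.submit_tool_outputs.tool_calls;
  const toolOutputs = await searchService.getToolOutputs(toolCalls);

  // Hand the results back so the assistant can finish formulating its answer
  await client.beta.threads.runs.submitToolOutputsAndPoll(thread.id, run.id, {
    tool_outputs: toolOutputs,
  });
}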
Thanks for reading! Need to get started quickly? Grab one of these working examples from OpenAI, and happy building!