Not "Just A Wrapper" - Using OpenAI with Tools

Not "Just A Wrapper" - Using OpenAI with Tools
/imagine giant whiteboard with a colorful diagram, clear details, box diagrams, outline, plan, arrows title at top says "Not Just a Wrapper" --ar 16:9

AI has gone mainstream, and OpenAI's ChatGPT is ushering in a wave of new applications layered on top of one of the most exciting APIs that indie hackers have ever had to play with. Many of these apps are just shallow wrappers over OpenAI's technology, offering little more than structured prompts or basic integrations. This trend mirrors past hype cycles like cryptocurrency, where people tried to capitalize on buzzwords without being truly innovative.

💡
A "shallow wrapper" over AI typically refers to a software or interface layer that encapsulates the functionality of an AI system without adding significant additional intelligence or depth. This wrapper provides a way to interact with the AI, often simplifying the user experience or integrating it into a larger system, but it doesn't fundamentally change or enhance the underlying AI capabilities.

This all has a very negative connotation, but don't get me wrong - wrappers are not inherently bad. Vercel is essentially a wrapper over AWS, providing a nice user interface and a better developer experience, and people love it. Never feel bad! If you are learning, then creating a wrapper over OpenAI's functionality is a great way to think about all of the decisions they made when creating a user-facing front end.

If you are just starting with the Assistants API, then check out this introduction first. I'll assume some basic knowledge from here:

The OpenAI Assistants API - What is it?
The OpenAI Assistants API streamlines the management of conversational AI by abstracting message handling, making it easier for developers. The API revolves around three key concepts: Assistants (customized AI models with specific instructions and tools), Messages (conversation content with roles), and Threads (conversations with a unique ID for context).

If you want to create an application that truly enhances the capabilities of AI (and is therefore cool), then using tools is a great way to do that. OpenAI have provided an intuitive API that you can use to give your application a little more horsepower and uniqueness, making it more than just a wrapper.

What this Covers

  • Deep(ish) Dive on Each Tool - learn how to integrate data into your AI workflow, enable assistants to write and execute code, and leverage function calling.
  • Practical Examples and Code - Enough to get you started with each of the three main tool types

Types of Tools

There are (currently) three types of tools that you can use with OpenAI's APIs and these are: File Search, Code Interpreter, and Function Calling. With these three classes of tools, you can add a ton of value / functionality to your AI application.

File Search

File search is the newest tool to be added to the Assistants API and is incredibly useful for one of the most popular custom AI patterns: RAG (retrieval augmented generation). All language models and generative AI systems have some kind of knowledge cutoff; training takes a long time, so the source data is inherently non-current. If you want to build an AI that can use current data, or even add data on the fly, then you need to be able to retrieve that data and integrate it into your threads / completion contexts. Make sure to give your assistant the "file_search" tool when creating it (note the "tools" section):

const assistant = await openai.beta.assistants.create({
  name: "Study Buddy Pro",
  instructions: `You are an intelligent study partner. 
  I will upload course notes for you to help me learn from.`,
  model: "gpt-4o",
  tools: [{ type: "file_search" }],
});

A great example of this is the plethora of "Chat with your PDF" apps that are going around. At the time of writing, OpenAI supports parsing about 20 different file types. You can add a file to a thread by first uploading the file, and then attaching it to a message:

const classnotes = await openai.files.create({
  file: fs.createReadStream("class-notes.pdf"),
  purpose: "assistants",
});

const thread = await openai.beta.threads.create({
  messages: [
    {
      role: "user",
      content: "Concisely, what did we cover in class today?",
      attachments: [
        { 
          file_id: classnotes.id, 
          tools: [{ type: "file_search" }] 
        }
      ],
    },
  ],
});

This file will be accessible from messages belonging to the thread. You add a file to either (1) an assistant, or (2) a thread. The choice of which should depend on how you want to control access to the file. Files attached to assistants are accessible from any thread that assistant is part of.

When you execute a run, if a file was referenced or searched, you will receive annotation data that describes what files were used.

const run = await openai.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
});
 
const messages = await openai.beta.threads.messages.list(thread.id, {
  run_id: run.id,
});
 
const message = messages.data.pop()!;
if (message.content[0].type === "text") {
  const { text } = message.content[0];

  console.log(text.value);
  console.log(text.annotations);
}

The annotations data also provides a start_index and end_index, which tell you exactly what part of the response was "upgraded" using the file data. The maximum file size is 512 MB, but if you need more, you can add multiple files to a vector_store and search it as one entity. This is actually what happens behind the scenes when your file is added to an assistant and run. Check out the API docs for more detail about how this works and the relevant defaults like chunk sizes and indexing strategies.
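
If you want a file to be available from every thread an assistant touches, attach it at the assistant level instead. In the v2 Assistants API this goes through a vector store: create one, add your files, and reference its id in the assistant's tool_resources. Here is a rough sketch - the file name and store name are placeholders, and in some SDK versions the namespace is openai.vectorStores rather than openai.beta.vectorStores:

// Upload the file(s) you want the assistant to be able to search
const syllabus = await openai.files.create({
  file: fs.createReadStream("course-syllabus.pdf"),
  purpose: "assistants",
});

// Group one or more files into a vector store so they can be searched as a single entity
const vectorStore = await openai.beta.vectorStores.create({
  name: "Course Materials",
  file_ids: [syllabus.id],
});

// Point the assistant at the vector store; any thread this assistant runs on can now search it
const assistant = await openai.beta.assistants.create({
  name: "Study Buddy Pro",
  instructions: "You are an intelligent study partner.",
  model: "gpt-4o",
  tools: [{ type: "file_search" }],
  tool_resources: {
    file_search: { vector_store_ids: [vectorStore.id] },
  },
});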

Code Interpreter

This tool allows your assistants to run code that they write. You simply ask questions with code_interpreter enabled, and the assistant will write code, run it, keep track of logs and output, and then summarize the results for you. It's great for performing simple data analysis, but you need to interpret the output carefully. First, create an assistant with the tool enabled:

const assistant = await openai.beta.assistants.create({
  instructions: "You are an expert data analyst, write and run code to answer my questions.",
  model: "gpt-4o",
  tools: [{"type": "code_interpreter"}]
});

Your assistant can now write and run its own code. To make this really useful, try adding a file to a thread. To let the assistant access this file from the code it writes, you need to specify the tool type as code_interpreter.

const file = await openai.files.create({
  file: fs.createReadStream("mydata.csv"),
  purpose: "assistants",
});

const thread = await openai.beta.threads.create({
  messages: [
    {
      role: "user",
      content: "What columns are present in that file?",
      attachments: [
        {
          file_id: file.id,
          tools: [{ type: "code_interpreter" }]
        }
      ]
    }
  ]
});

Adding a file with code interpreter is great for things like computing average values of columns or any basic data science task. Treat it like a talented Python dev with infinite patience.

If code interpreter was used in a run, the assistant will output some data, including input and output logs, so you can really dig into exactly what ChatGPT did. The assistant can also return (1) images or (2) data files, which is very useful for generating a CSV or a plot of data, for example. Data files that affected the output will be found in the annotations (similar to what we saw with file search), while image files appear in the assistant's message response. Both have file_id values that you can use to access them using the SDK. For example:

const response = await openai.files.content("file-id-value");

// Extract the binary data from the Response object
const image_data = await response.arrayBuffer();

// Convert the binary data to a Buffer and write it to disk
const image_data_buffer = Buffer.from(image_data);
fs.writeFileSync("./my-image.png", image_data_buffer);

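Where do those file_id values come from? Generated images show up as image_file content blocks on the assistant's message, and the code the assistant wrote (plus its logs and file outputs) is visible through the run steps endpoint. Here is a rough sketch, reusing the thread and assistant from the CSV example above:

// Run the thread (createAndPoll waits for the run to finish, as in the file search example)
const run = await openai.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
});

// Any generated image appears as an "image_file" content block on the assistant's message
const messages = await openai.beta.threads.messages.list(thread.id, { run_id: run.id });
for (const message of messages.data) {
  for (const part of message.content) {
    if (part.type === "image_file") {
      console.log("generated image:", part.image_file.file_id);
    }
  }
}

// The run steps show exactly what code was written and what it printed
const steps = await openai.beta.threads.runs.steps.list(thread.id, run.id);
for (const step of steps.data) {
  if (step.step_details.type === "tool_calls") {
    for (const call of step.step_details.tool_calls) {
      if (call.type === "code_interpreter") {
        console.log("code:", call.code_interpreter.input);
        console.log("outputs:", call.code_interpreter.outputs);
      }
    }
  }
}
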
If you want to see what's possible with this tool, check out a fun conversation I had with a code_interpreter-enabled Custom GPT, linked below. I uploaded a CSV with public historical data from my favourite collectible card game and we analyzed it together:

ChatGPT
A conversational AI system that listens, learns, and challenges

I added some Magic: The Gathering data from 17lands.com to a GPT and then we worked together to develop and compute a formula for WAR (wins above replacement) for cards.

Function Calling

This is the most open-ended of the tools and gives you a lot of control over how it works. It allows you to describe functions and their required parameters to your assistants, which they can call when they need to. During a run, if the assistant determines that it would be useful to call your function, it will return the name of the function and its inputs, then wait for your response.

This is the tool that I find the most exciting. Imagine using it for UI Automation (think controlling a map in a front end), image recognition (call another API in your function) or even e-commerce or transactional functionality. The possibilities are endless!

Here is an example I built into an assistant recently: I added a tool for fetching a search engine results page. Start by describing your function and telling your assistant about it:

const serp =  {
  "name": "internet_search",
  "description": "Get information on recent events from the web.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": `The search query to use. 
For example: "Where is Sporty Spice from?"`
      }
    },
    "required": [
      "query"
    ]
  }
}

const assistant = await client.beta.assistants.create({
  model: "gpt-4o",
  instructions:
    "You are a helpful research assistant. Use the provided functions to answer questions.",
  tools: [{ type: "function", function: serp }]
});

The properties object tells the assistant what parameters it needs to provide to the function - in this case, simply query. It follows the JSON Schema specification, so any of the following are valid types: object, array, number, string, boolean, or null.
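
For instance, if you wanted to let the assistant also control how many results come back and whether to restrict the search to news, you could extend the schema with a number and a boolean. This is a hypothetical variation on the example above, not something the assistant requires:

const serpWithOptions = {
  "name": "internet_search",
  "description": "Get information on recent events from the web.",
  "parameters": {
    "type": "object",
    "properties": {
      "query": {
        "type": "string",
        "description": "The search query to use."
      },
      "num_results": {
        "type": "number",
        "description": "How many results to return (defaults to 5)."
      },
      "news_only": {
        "type": "boolean",
        "description": "Whether to restrict the search to news sources."
      }
    },
    "required": ["query"]
  }
}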

When you execute a run, if the assistant determines that a response requires_action, it will wait until you submitToolOutputs and then use those outputs in its response.

let run = await this.openai.beta.threads.runs.create(
    threadId,
    {
        assistant_id: assistantId
    }
);

while (['queued', 'in_progress', 'cancelling', 'requires_action'].includes(run.status)) {
    if (run.status === 'requires_action') {
        if (run.required_action?.submit_tool_outputs) {
            const tool_outputs = await this.getToolOutputs(run.required_action.submit_tool_outputs.tool_calls);
            await this.openai.beta.threads.runs.submitToolOutputs(
                threadId,
                run.id,
                { tool_outputs }
            );
        }
    }

    // Wait briefly before re-fetching the run to check its latest status
    await new Promise(resolve => setTimeout(resolve, 500));
    run = await this.openai.beta.threads.runs.retrieve(
        run.thread_id,
        run.id
    );
}

You can also do this using streaming if you don't want to poll for responses.
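
Here is a rough sketch of the streaming approach using the v4 Node SDK's run helpers (method names like stream, submitToolOutputsStream, and the textDelta event come from that SDK and may differ slightly in other versions):

const stream = this.openai.beta.threads.runs.stream(threadId, {
    assistant_id: assistantId
});

// Print text to the console as it arrives
stream.on('textDelta', delta => process.stdout.write(delta.value ?? ''));

// When the run pauses for tool outputs, compute them and continue on a new stream
stream.on('event', async event => {
    if (event.event === 'thread.run.requires_action' && event.data.required_action) {
        const tool_outputs = await this.getToolOutputs(event.data.required_action.submit_tool_outputs.tool_calls);

        this.openai.beta.threads.runs
            .submitToolOutputsStream(threadId, event.data.id, { tool_outputs })
            .on('textDelta', delta => process.stdout.write(delta.value ?? ''));
    }
});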

You'll need to implement getToolOutputs and it should handle your function name specifically:

public async getToolOutputs(toolsToCall: OpenAI.Beta.Threads.Runs.RequiredActionFunctionToolCall[]): Promise<OpenAI.Beta.Threads.Runs.RunSubmitToolOutputsParams.ToolOutput[]> {

    // Resolve every tool call the run asked for; unknown function names fall through and are filtered out
    const toolOutputArray = (await Promise.all(toolsToCall.map(async call => {
        if (call.function.name === 'internet_search') {
            const query = JSON.parse(call.function.arguments)["query"];
            console.log('tool called -- internet_search', query);
            return {
                tool_call_id: call.id,
                output: await this.serpProvider.search(query)
            }
        }
    }))).filter(Boolean)

    return toolOutputArray;
}

These tool outputs are submitted back to OpenAI, and then it will finish formulating a response using the results. For more details, check out the official API docs.

Thanks for reading! Need to get started quickly? Grab one of these working examples from OpenAI, and happy building!

GitHub - openai/openai-assistants-quickstart: OpenAI Assistants API quickstart with Next.js.
OpenAI Assistants API quickstart with Next.js. Contribute to openai/openai-assistants-quickstart development by creating an account on GitHub.

If you're looking to start developing an AI application quickly, check out OpenAI's Assistants Quickstart ☝️