The OpenAI Assistants API - What is it?


The OpenAI Assistants API streamlines conversational AI by abstracting message handling away from developers. It revolves around three key concepts: Assistants (customized AI models with specific instructions and tools), Messages (conversation content with roles), and Threads (conversations with a unique ID that preserves context). The API supports asynchronous operations and real-time streaming responses, and it may very well define the structure of similar APIs from other companies.

What this Covers

  • Overview of the OpenAI Assistants API: Introduction to the API's purpose and its core concepts (Assistants, Messages, Threads).
  • Assistant Customization: How to create and configure assistants with custom instructions, tools, and files.
  • Thread and Message Management: Process of creating threads and managing conversation messages.
  • Run Execution and Response Handling: Explanation of asynchronous run execution, including polling for completion and handling streaming responses.

Managing Messages

Have you experimented with "completions" using ChatGPT? They're the most common entry point to this ecosystem. You pass OpenAI (or any conversational model; most support completions) an array of messages and ask it to "complete" the conversation. It replies with its best guess at a meaningful next message in the sequence.

  const completion = await openai.chat.completions.create({
    messages: [{
      role: "system",
      content: "You are a confident expert in marketing." 
    }],
    model: "gpt-4",
  });

  console.log(completion.choices[0]);

If a user replies, you need to add their new message to this array and request a new completion. The Assistants API promises to abstract away all of this message management for you, and let me tell you: the idea of not having to manage all of those messages is a delight. In my opinion, it's also a brilliant moat because it solves an immediate problem for AI developers. So, how does it work?
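To make the pain concrete, here is a minimal sketch of the bookkeeping you would otherwise own yourself: an ever-growing messages array that you append to and resend on every turn. The `addMessage` and `nextTurn` helpers are invented for illustration; the completion call follows the official `openai` Node SDK shape shown above.

```javascript
// The history you must carry around yourself without the Assistants API.
const history = [
  { role: "system", content: "You are a confident expert in marketing." },
];

// Append a message without mutating the existing history.
function addMessage(messages, role, content) {
  return [...messages, { role, content }];
}

// One conversational turn: add the user's text, complete, add the reply.
async function nextTurn(openai, messages, userText) {
  const withUser = addMessage(messages, "user", userText);
  const completion = await openai.chat.completions.create({
    messages: withUser,
    model: "gpt-4",
  });
  const reply = completion.choices[0].message;
  return addMessage(withUser, reply.role, reply.content);
}
```

Every turn, the whole array goes back over the wire; the Assistants API keeps that state server-side for you.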

The OpenAI Assistants API

There are three main concepts to understand: Assistants, Messages, and Threads. Fortunately the naming of these entities makes it straightforward to see what they are doing, but let's dive into each one.

Assistants - This API (obviously) revolves around assistants. An assistant is a customized version of an OpenAI model (e.g., GPT-4) that has custom system instructions, documents or files, and possibly some tools at its disposal. Think of system instructions as the base-level operating manual for the system. They guide every word or action the LLM may take. For example: "You are a confident expert in marketing; you provide short, quippy advice that is actionable." The tools you provide to an assistant let it pause and gather more information before continuing. A classic example is a SERP ("search engine results page") function, which lets the assistant search for more information and use it when crafting a reply.


const file = await openai.files.create({
  file: fs.createReadStream("baseball-data.csv"),
  purpose: "assistants",
});

const assistant = await openai.beta.assistants.create({
  name: "Data visualizer",
  description: "You are great at creating beautiful sabermetric visualizations. You analyze data present in .csv files, understand trends, and come up with data visualizations relevant to those trends. You also share a brief text summary of the trends observed.",
  model: "gpt-4o",
  tools: [{"type": "code_interpreter"}],
  tool_resources: {
    "code_interpreter": {
      "file_ids": [file.id]
    }
  }
});

Creating an assistant. If you attach files, make sure to include the code_interpreter tool (note: tools and files are optional).

Threads - A thread is a conversation in which we pass messages back and forth. Threads can contain messages, documents, and even images (coming soon). When you add messages to a thread, the assistant associated with that thread automatically has access to the conversation history. We start a new thread without an assistant; we will later specify which assistant should continue the thread, but we begin by creating a thread resource.

const thread = await openai.beta.threads.create();

That thread ☝️ has an id, and we can use it to keep referring to the conversation. We now have an empty thread, so it is time to start populating it.

Messages - Each message in a conversation is a simple object with a role and a content property. The role is typically "user", "system", or "assistant". We can manually add assistant messages to threads, but it is conventional to let the model add those messages during a run. You can add an individual message to a specific thread using the thread.id.

const message = await openai.beta.threads.messages.create(
  thread.id,
  {
    role: "user",
    content: "Visualize the distribution of WAR across players in the data set."
  }
);

Now that we have a thread with a message and an assistant ready to reply, it is time to execute a run. We can also specify how deep into the history the run should look if we are concerned about using too many tokens.

Run Execution

A run assigns an assistant to a thread and uses all (or some) of the messages in that thread to append a new message with the role "assistant". Running is an asynchronous process with a few states you need to care about: queued, in_progress, requires_action, cancelling, and completed (runs can also fail, expire, or be cancelled). Essentially, you either stream the results back or keep polling until your run completes (it's typically pretty quick).
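A run is started against a thread by naming the assistant that should reply. As a sketch, assuming the `openai` client plus the `thread` and `assistant` objects from the earlier snippets: the `truncation_strategy` parameter is what lets you cap how much history the run reads (and therefore how many tokens it spends). The `buildRunParams` and `startRun` helpers are invented for illustration.

```javascript
// Build the parameters for a run: which assistant replies, and how much
// of the thread's history it is allowed to read.
function buildRunParams(assistantId, lastMessages) {
  return {
    assistant_id: assistantId,
    // Only consider the most recent N messages in the thread.
    truncation_strategy: { type: "last_messages", last_messages: lastMessages },
  };
}

// Kick off the run; the polling loop then watches its status.
async function startRun(openai, threadId, assistantId) {
  return openai.beta.threads.runs.create(
    threadId,
    buildRunParams(assistantId, 10)
  );
}
```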

while (['queued', 'in_progress', 'cancelling', 'requires_action'].includes(run.status)) {

  if (run.status === 'requires_action') {
    console.log('requires action!');
    if (run.required_action.type === 'submit_tool_outputs') {
        // run our own functions and hand the results back to the run
        const tool_outputs = await getToolOutputs(run.required_action.submit_tool_outputs.tool_calls);

        await openai.beta.threads.runs.submitToolOutputs(
            run.thread_id,
            run.id,
            { tool_outputs }
        );
    }
  }

  // wait for a second and then refetch the run
  await new Promise(resolve => setTimeout(resolve, 1000));
  run = await openai.beta.threads.runs.retrieve(
    run.thread_id,
    run.id
  );
}

if (run.status === 'completed') {
  const messages = await openai.beta.threads.messages.list(
    run.thread_id
  );

  // messages come back newest-first, so the reply is at index 0
  console.log(messages.data[0].content[0].text.value);
}

Poll for a "completed" status during a run. The new message will be at the top of the list.

OpenAI also offers "streaming" responses. If you have chatted on their web interface, you have seen these (it looks like the AI is actually writing in front of your eyes). You can find more information on streaming responses in their official docs.
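As a rough sketch of what streaming looks like with the official `openai` Node SDK's streaming helper (the `textDelta`, `end`, and `error` event names follow that helper; the `streamRun` wrapper itself is invented for illustration):

```javascript
// Stream a run's reply token-by-token instead of polling.
// onText receives each text fragment as the assistant writes it.
function streamRun(openai, threadId, assistantId, onText) {
  return new Promise((resolve, reject) => {
    openai.beta.threads.runs
      .stream(threadId, { assistant_id: assistantId })
      .on("textDelta", (delta) => onText(delta.value))
      .on("end", resolve)
      .on("error", reject);
  });
}

// Usage sketch: print fragments as they arrive.
// await streamRun(openai, thread.id, assistant.id, (t) => process.stdout.write(t));
```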

Tools - Power up your AI.

I have left a reference to "getToolOutputs" in the code example above. Tools let you super-power your assistants. You can add basically any functionality you can imagine as long as it runs fast enough. Querying databases, checking calendars, e-commerce integrations, sending emails - anything you can do in a function can be provided to the Assistant as a tool. It will call code that you control and use the output in it's subsequent completions. Tools are awesome and I can (and perhaps will) write a whole article on just them. A particularly interesting use case is to let one assistant use another assistant as a tool to answer questions with specific domain knowledge.
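Here is one hypothetical way getToolOutputs could be implemented. The tool name (`get_current_date`) and the handler registry are invented for illustration; what matters is the shape of the result: each entry pairs a `tool_call_id` with a string `output`, which is what submitToolOutputs expects.

```javascript
// Registry of the functions this assistant is allowed to call.
// These names would match the tool definitions on the assistant.
const toolHandlers = {
  // Invented example tool: return today's date as YYYY-MM-DD.
  get_current_date: async () => new Date().toISOString().slice(0, 10),
};

// Map the run's tool_calls to { tool_call_id, output } pairs.
async function getToolOutputs(toolCalls) {
  return Promise.all(
    toolCalls.map(async (call) => {
      const handler = toolHandlers[call.function.name];
      const args = JSON.parse(call.function.arguments || "{}");
      return {
        tool_call_id: call.id,
        output: handler ? String(await handler(args)) : "Unknown tool",
      };
    })
  );
}
```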

In a nutshell, the OpenAI Assistants API is a game-changer for anyone working with conversational AI. It takes the hassle out of managing messages and context, making it super easy to create customized AI assistants. With features like real-time responses and asynchronous operations, you can build smarter, more responsive interactions. This API is setting a new benchmark in the industry, and it's likely to inspire a wave of similar tools from other companies.