Adding support for Gemini thought signatures
Mar 7, 2026
Gemini 3 and 2.5 models require thought signatures for multi-turn tool calling. Here's what they are, why they break OpenAI-compatible proxies, and how we fixed it in Helix with a global signature cache.
If you've tried running Gemini 2.5 Flash or Gemini 3 through an OpenAI-compatible proxy recently, you've probably hit this: tool calls work on the first turn, the model asks to call a function, you send back the result, and then you get a 400 error. No useful message. Just a flat rejection.
The cause is a feature Google calls "thought signatures," and it took us a while to figure out what was happening.
What are thought signatures?
When Gemini's thinking models reason through a problem, they produce internal chain-of-thought before generating a response. That reasoning is private; you don't see it in the API output. But it matters for what comes next.
Thought signatures are encrypted snapshots of that internal reasoning, attached to function call parts in the API response. They're opaque byte arrays. You can't read them, decode them, or generate them yourself. Their only purpose is to be passed back to the model on the next turn so it can pick up where it left off.
It's basically a session token for the model's train of thought. Without it, the model loses context about why it decided to call a particular function, and the API rejects the follow-up.
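On the wire, the signature shows up as an opaque base64 field sitting alongside the function call inside a response part. A sketch of the shape (the function name and payload here are placeholders, not real values):

```json
{
  "role": "model",
  "parts": [
    {
      "functionCall": {
        "name": "get_weather",
        "args": { "city": "London" }
      },
      "thoughtSignature": "<opaque base64 bytes>"
    }
  ]
}
```

Whatever lands in `thoughtSignature` has to be echoed back verbatim on the next turn.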
Why this matters for tool calling
Multi-turn tool calling is how agents get things done. The pattern is:
- User sends a message
- Model responds with a function call (plus a thought signature)
- Your code executes the function and sends back the result
- Model processes the result and either responds or calls another function
Step 3 is where things break. The thought signature from step 2 has to be attached to the function call part when you send the conversation history back. If it's missing, Gemini 3 returns a 400. Gemini 2.5 is more forgiving: missing signatures degrade response quality but don't hard-fail.
You get errors like:
"Function call is missing a thought_signature in functionCall parts. This is required for tools to work correctly, and missing thought_signature may lead to degraded model performance. Additional data, function call `default_api:HelixProjects` , position 4. Please refer to https://ai.google.dev/gemini-api/docs/thought-signatures for more details."
Or, surfaced at a higher level by the Go client library:
genai GenerateContent error: Error 400, Message: Function call is missing a thought_signature in functionCall parts. This is required for tools to work correctly, and missing thought_signature may lead to degraded model performance. Additional data, function call `default_api:ListSpecTasks` , position 2. Please refer to https://ai.google.dev/gemini-api/docs/thought-signatures for more details., Status: INVALID_ARGUMENT, Details: []
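The check behind those errors is easy to mimic. Here's a hypothetical validator that walks a conversation history and flags every function-call part that lost its signature; the types are simplified stand-ins, not the genai SDK's:

```go
package main

import "fmt"

// part is a simplified stand-in for a Gemini content part.
type part struct {
	FunctionName     string // empty if this part is not a function call
	ThoughtSignature []byte // opaque bytes; nil means "missing"
}

// missingSignatures reports the positions of function-call parts with no
// thought signature attached, mimicking the check behind the
// 400 INVALID_ARGUMENT response.
func missingSignatures(history []part) []int {
	var missing []int
	for i, p := range history {
		if p.FunctionName != "" && len(p.ThoughtSignature) == 0 {
			missing = append(missing, i)
		}
	}
	return missing
}

func main() {
	history := []part{
		{},                              // user text
		{FunctionName: "ListSpecTasks"}, // signature dropped in conversion
		{},                              // tool result
	}
	fmt.Println(missingSignatures(history)) // flags the stripped call
}
```

An OpenAI-format round trip strips every `ThoughtSignature`, so after one conversion pass this check fails for every function call in the history.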
For sequential tool calls (check a flight, then book a taxi based on the result), each step produces its own signature that needs to be threaded through. For parallel calls (check weather in London and Paris at the same time), only the first function call part carries a signature, but it still has to come back.
The problem with OpenAI-compatible proxies
Most AI orchestration platforms, Helix included, standardize on the OpenAI chat completions format internally: you convert provider-specific responses into it, route them through your middleware, and convert back when needed.
The OpenAI format has no concept of thought signatures. There's no field for them on tool calls. So when a Gemini response gets converted, the signatures get silently dropped. When the conversation continues and the history goes back to Gemini, those signatures are gone. The model rejects the request.
This is what issue #1835 reported: Gemini Flash 2.5 stopped working for any workflow involving tool calls.
How we fixed it: a global signature cache
The fix in PR #1845 is straightforward. Since the OpenAI wire format can't carry thought signatures, we store them separately and reattach them when converting back to Gemini's native format.
When we receive a response from Gemini containing function calls with thought signatures, we generate a stable ID for each function call and cache the signature bytes keyed by that ID. The response then gets converted to OpenAI format as normal. The signature doesn't need to travel through the rest of the stack.
When assembling the next request, we look up each tool call ID in the cache and reattach the corresponding thought signature before sending it to Gemini.
The cache is a thread-safe map:
type thoughtSignatureCache struct {
	mu    sync.RWMutex
	store map[string][]byte // FunctionCall.ID -> ThoughtSignature bytes
}

var globalThoughtSigCache = &thoughtSignatureCache{
	store: make(map[string][]byte),
}

It's global on purpose, not scoped to a single client instance, because different parts of a conversation can be handled by different client instances (a reasoning model for planning, a generation model for output). Tool call IDs are UUIDs, so there's no collision risk.
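The PR's actual accessor names aren't shown here, so the methods below are a hypothetical sketch of the two operations the post describes: store signature bytes under a tool call ID when a response arrives, and look them up when rebuilding the next request. The struct is repeated so the sketch compiles on its own:

```go
package main

import (
	"fmt"
	"sync"
)

type thoughtSignatureCache struct {
	mu    sync.RWMutex
	store map[string][]byte // FunctionCall.ID -> ThoughtSignature bytes
}

// put records the signature bytes for a tool call ID when a Gemini
// response is converted to OpenAI format.
func (c *thoughtSignatureCache) put(id string, sig []byte) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.store[id] = sig
}

// get retrieves the signature when the conversation history is
// converted back to Gemini's native format.
func (c *thoughtSignatureCache) get(id string) ([]byte, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	sig, ok := c.store[id]
	return sig, ok
}

func main() {
	cache := &thoughtSignatureCache{store: make(map[string][]byte)}
	cache.put("call_abc123", []byte("opaque-signature-bytes"))
	if sig, ok := cache.get("call_abc123"); ok {
		fmt.Printf("reattaching %d signature bytes\n", len(sig))
	}
}
```

The RWMutex fits the access pattern: many concurrent conversations read signatures back, while writes only happen when a new model response lands.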
Switching to the native SDK
The PR also makes a bigger change: Gemini requests now go through Google's native genai SDK instead of being squeezed through the OpenAI compatibility endpoint.
Previously, Helix talked to Gemini via Google's OpenAI-compatible endpoint (/v1beta/openai). That worked for simple chat completions, but we were converting OpenAI format to OpenAI format with extra steps, and losing provider-specific features like thought signatures along the way.
Now, when the client detects a Google provider URL, it routes through a native code path. It converts OpenAI-format messages to genai.Content objects directly, maps system messages to Gemini's SystemInstruction field, merges consecutive same-role messages (Gemini requires alternating roles), and preserves thought signatures through the cache on both streaming and non-streaming paths. The genai responses get converted back to OpenAI format for the rest of the stack.
The conversion also handles tool definitions, tool choice configuration, multi-modal content, and finish reason mappings. Internal thought parts get filtered out of the response since they're not part of the final output.
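Of those conversion steps, the role merging is the easiest to get wrong. Here's a hypothetical sketch, on simplified types rather than the genai SDK's, of collapsing consecutive same-role messages so the history alternates:

```go
package main

import "fmt"

// message is a simplified stand-in for a converted chat message.
type message struct {
	Role string // "user" or "model"
	Text string
}

// mergeConsecutiveRoles collapses runs of same-role messages into one,
// since Gemini requires user and model roles to alternate.
func mergeConsecutiveRoles(msgs []message) []message {
	var out []message
	for _, m := range msgs {
		if n := len(out); n > 0 && out[n-1].Role == m.Role {
			out[n-1].Text += "\n" + m.Text
			continue
		}
		out = append(out, m)
	}
	return out
}

func main() {
	merged := mergeConsecutiveRoles([]message{
		{Role: "user", Text: "check flights to Paris"},
		{Role: "user", Text: "then book a taxi"},
		{Role: "model", Text: "on it"},
	})
	fmt.Println(len(merged)) // prints 2
}
```

Real parts carry function calls and media as well as text, so the production merge has to concatenate part lists rather than strings, but the alternation rule is the same.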
What changes for Helix users
Nothing. If you're running Gemini models through Helix, whether on a Sovereign Server or through the cloud platform, tool calling now works correctly across multi-turn conversations. Your agents can use Gemini 2.5 Flash, Gemini 3, and future thinking models without hitting signature errors. The caching is transparent. Your agent definitions, tool configs, and conversation histories stay the same.
Why this matters beyond Gemini
Google shipped a change that broke multi-turn tool calls for anyone using their thinking models through an OpenAI-compatible layer. If you're calling the Gemini API directly, you need to update your code to handle signatures. If you're calling it through Helix, we already did.
Providers add features, change behavior, and sometimes break things. That's the reality of building on multiple model APIs. The PR is on GitHub if you want to read the implementation. The test suite covers both unit tests for the conversion logic and integration tests that run the full multi-turn tool calling flow against the real Gemini API.