
Why can LLMs without tool-calling capability call tools?

2026-05-06 · 2026-05-17
AI · LLM · IT

Tool calling looks like a model feature, but the model only ever predicts the next token. The "capability" is really two things that live outside the model: a trained output pattern and a runtime that parses it. That's why an LLM without native tool calling can still call tools.

With native tool calling

  1. Pass tool definitions to the provider; they are flattened into a special system prompt.
  2. Invoke the LLM.
  3. The model emits trained tool-call tokens (e.g. Llama 3.1's <|python_tag|>{...}<|eom_id|>); the provider's runtime parses them and returns a structured signal such as Anthropic's stop_reason: "tool_use".
  4. Execute the tool on your server.
  5. Send the result back as a tool result and re-invoke.
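Step 3 is the part that usually stays hidden inside the provider. A minimal sketch of what such a runtime might do is shown here, assuming the Llama 3.1 style delimiters from the example above and a JSON payload; the function name `parse_native_output`, the payload keys, and the returned dictionary shape are hypothetical, not any provider's actual implementation.

```python
import json

# Assumed delimiters, taken from the example in step 3 (Llama 3.1 style).
TOOL_START = "<|python_tag|>"
TOOL_END = "<|eom_id|>"


def parse_native_output(raw: str) -> dict:
    """Turn raw decoded text into a structured response, as a runtime might."""
    start = raw.find(TOOL_START)
    if start == -1:
        # No tool-call marker: treat the whole output as a normal answer.
        return {"stop_reason": "end_turn", "text": raw, "tool_call": None}

    payload = raw[start + len(TOOL_START):]
    end = payload.find(TOOL_END)
    if end != -1:
        payload = payload[:end]

    try:
        call = json.loads(payload)
    except json.JSONDecodeError:
        # Malformed payload: fall back to plain text rather than guessing.
        return {"stop_reason": "end_turn", "text": raw, "tool_call": None}

    return {"stop_reason": "tool_use", "text": raw[:start], "tool_call": call}


# Hand-written sample output; the payload keys are an assumption.
raw_output = '<|python_tag|>{"name": "get_weather", "parameters": {"city": "Tokyo"}}<|eom_id|>'
result = parse_native_output(raw_output)
print(result["stop_reason"], result["tool_call"])
```

The model's contribution is only the text in `raw_output`; everything that makes it look like a structured API response happens in ordinary string handling.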

Example

The Anthropic Messages API takes tools as a top-level tools array. Each tool has a name, a description, and a JSON-schema input_schema:

```json
{
  "model": "claude-sonnet-4-20250514",
  "max_tokens": 1024,
  "tools": [
    {
      "name": "get_weather",
      "description": "Get current weather for a city. Use this whenever the user asks about weather, temperature, or conditions in a specific location.",
      "input_schema": {
        "type": "object",
        "properties": {
          "city": { "type": "string" },
          "unit": { "type": "string", "enum": ["c", "f"], "default": "c" }
        },
        "required": ["city"]
      }
    }
  ],
  "messages": [
    { "role": "user", "content": "What's the weather in Tokyo?" }
  ]
}
```

When the model decides to use the tool, the response ends with stop_reason: "tool_use" and contains a structured tool_use content block. You execute the tool and reply with a matching tool_result content block in the next turn.
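The follow-up turn can be sketched without any network call. The example below assumes a response shaped like the one described above (a `tool_use` block with `id`, `name`, and `input`) and builds the next `messages` list; the `get_weather` implementation, the `TOOLS` registry, the `build_follow_up` helper, and the `toolu_example` id are all made up for illustration.

```python
def get_weather(city: str, unit: str = "c") -> str:
    # Hypothetical stand-in for a real weather lookup.
    return f"Sunny, 22 degrees {unit.upper()} in {city}"


# Maps tool names from the request to local callables.
TOOLS = {"get_weather": get_weather}


def build_follow_up(messages: list, response: dict) -> list:
    """Append the assistant turn plus one tool_result per tool_use block."""
    results = []
    for block in response["content"]:
        if block["type"] != "tool_use":
            continue
        output = TOOLS[block["name"]](**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # must match the tool_use id
            "content": output,
        })
    return messages + [
        {"role": "assistant", "content": response["content"]},
        {"role": "user", "content": results},
    ]


# Hand-written response in the shape described above; the id is invented.
response = {
    "stop_reason": "tool_use",
    "content": [
        {"type": "tool_use", "id": "toolu_example",
         "name": "get_weather", "input": {"city": "Tokyo"}},
    ],
}
messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
if response["stop_reason"] == "tool_use":
    messages = build_follow_up(messages, response)
print(messages[-1])
```

Sending the resulting `messages` back with the same `tools` array is the re-invoke step of the native flow.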

Without native tool calling

  1. Describe the tools in the prompt and instruct the model to emit a fixed pattern (e.g. <tool>...</tool>).
  2. Register the closing tag as a stop sequence, and optionally enforce the schema with constrained decoding.
  3. Invoke the LLM. Decoding halts when the closing tag appears; finish_reason is just "stop".
  4. Detect the pattern in the text, parse it, and execute the tool.
  5. Send the result back in the next prompt and re-invoke.

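The two flows differ only in how the tool call is delimited and detected: a special token parsed by the provider on one side, a stop sequence plus a pattern match in your app on the other. Both differences live in the runtime, not in the model.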
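Step 4 of the non-native flow is plain string handling. A minimal sketch follows, assuming a `<tool_call>...</tool_call>` pattern carrying a JSON object with `name` and `arguments` keys; the helper name `extract_tool_call` is hypothetical. One detail worth handling: many runtimes omit the stop sequence from the returned text, so the closing tag may be missing.

```python
import json

OPEN_TAG = "<tool_call>"
CLOSE_TAG = "</tool_call>"


def extract_tool_call(text: str):
    """Return (name, arguments) if the text contains a tool call, else None."""
    start = text.find(OPEN_TAG)
    if start == -1:
        return None  # plain-text answer

    body = text[start + len(OPEN_TAG):]
    # The closing tag is the stop sequence, so it may have been trimmed.
    end = body.find(CLOSE_TAG)
    if end != -1:
        body = body[:end]

    try:
        call = json.loads(body)
    except json.JSONDecodeError:
        return None  # malformed call: let the caller decide how to retry

    if not isinstance(call, dict) or "name" not in call:
        return None
    return call["name"], call.get("arguments", {})


trimmed = '<tool_call>{"name": "get_weather", "arguments": {"city": "Tokyo"}}'
print(extract_tool_call(trimmed))
print(extract_tool_call("Weather is the state of the atmosphere."))
```

Returning `None` for malformed output keeps the decision about retrying or answering in plain text with the caller, which is one reasonable design among several.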

Example

Below is a minimal prompt that lets any model, even one without native tool calling, call tools. The <tool_call> tag convention it uses is common among open-weight models, and several major inference servers ship parsers for it out of the box.

System prompt

```markdown
You are an assistant that can answer directly OR call tools.

<tools>
[
  {
    "name": "get_weather",
    "description": "Get current weather for a city. Use this whenever the user asks about weather or conditions in a specific location.",
    "parameters": {
      "city": { "type": "string" }
    }
  }
]
</tools>

## Rules

- If a tool is needed, output ONLY: <tool_call>{"name": "...", "arguments": {...}}</tool_call>, then stop.
- Otherwise, answer the user in plain text.
- After a <tool_result>...</tool_result> message, use the result to answer in plain text.

## Examples

User: How is the weather in Tokyo today?
Assistant: <tool_call>{"name": "get_weather", "arguments": {"city": "Tokyo"}}</tool_call>

User: What's the weather like in Paris?
Assistant: <tool_call>{"name": "get_weather", "arguments": {"city": "Paris"}}</tool_call>

User: What is weather?
Assistant: Weather is the state of the atmosphere at a particular place and time — temperature, humidity, wind, precipitation, and so on.
```
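To show how this prompt drives the whole loop, here is a rough sketch with the model replaced by a stub. `fake_llm` is a hypothetical stand-in for any completion endpoint that honors a stop sequence, and `get_weather` returns a canned string; neither is a real API. The prompt layout (`User:` / `Assistant:` lines) is also an assumption made for the sketch.

```python
import json

OPEN_TAG = "<tool_call>"
STOP = "</tool_call>"


def fake_llm(prompt: str, stop: str) -> str:
    """Hypothetical model stub: calls the tool once, then answers in text."""
    if prompt.endswith("</tool_result>\nAssistant: "):
        return "It is sunny in Tokyo."
    full = '<tool_call>{"name": "get_weather", "arguments": {"city": "Tokyo"}}</tool_call>'
    return full.split(stop)[0]  # text is cut just before the stop sequence


def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # canned result for illustration


TOOLS = {"get_weather": get_weather}


def run(system_prompt: str, user_message: str, max_turns: int = 3) -> str:
    prompt = f"{system_prompt}\n\nUser: {user_message}\nAssistant: "
    text = ""
    for _ in range(max_turns):
        text = fake_llm(prompt, stop=STOP)
        if not text.startswith(OPEN_TAG):
            return text  # plain-text answer ends the loop
        call = json.loads(text[len(OPEN_TAG):])
        result = TOOLS[call["name"]](**call["arguments"])
        # Restore the trimmed closing tag, append the result, and re-invoke.
        prompt += f"{text}{STOP}\n<tool_result>{result}</tool_result>\nAssistant: "
    return text


answer = run("(system prompt from above)", "How is the weather in Tokyo today?")
print(answer)
```

Swapping `fake_llm` for a real completion call is the only change needed to try this against an actual model; the `max_turns` cap is there so a model that keeps calling tools cannot loop forever.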

What actually differs

Both flows produce the same outcome. The real difference is who owns the boundary and what makes the format reliable:

  • Native: the provider's runtime detects a special tool-call token, and fine-tuning makes the format reliable.
  • Non-native: your app detects the pattern after a stop sequence, and the prompt (plus optional constrained decoding) makes the format reliable.

Add constrained decoding to the non-native path and the two converge in format reliability; what remains is just where the boundary lives.
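Constrained decoding is easiest to see on a toy scale. The sketch below is not how production servers implement it (they typically mask logits against a grammar or JSON schema); it only illustrates the core idea of discarding any token that would take the output outside an allowed set. The vocabulary, scores, and function name are invented for the example.

```python
def constrained_greedy_decode(scores_per_step, vocab, allowed_outputs):
    """Greedy decoding that only picks tokens keeping the output a valid prefix."""
    output = ""
    for scores in scores_per_step:
        if output in allowed_outputs:
            break  # a complete allowed string has been produced
        # Mask: keep only tokens that extend the output toward an allowed string.
        valid = [
            i for i, tok in enumerate(vocab)
            if any(target.startswith(output + tok) for target in allowed_outputs)
        ]
        if not valid:
            break
        best = max(valid, key=lambda i: scores[i])
        output += vocab[best]
    return output


vocab = ["get_", "weather", "time", "banana"]
allowed = {"get_weather", "get_time"}
# Made-up scores: unconstrained greedy decoding would pick "banana" each step.
scores = [[0.1, 0.2, 0.1, 0.9], [0.0, 0.3, 0.2, 0.9]]
decoded = constrained_greedy_decode(scores, vocab, allowed)
print(decoded)  # → get_weather
```

The mask fixes the shape of the output, not the decision behind it: the model's scores still choose between `get_weather` and `get_time`.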

Strip everything else away and tool calling is one loop: prompt → pattern → stop → execute. Whether you wire it up yourself or hand it to a provider's runtime is an implementation detail. AI agent framework SDKs have become the default, and they absorb this abstraction so completely that it stops feeling like an abstraction at all — which is exactly why the boundary between "native" and "non-native" has gotten so blurry.

Ikuma Yamashita

I like Rust. For work, I'm an infrastructure engineer, and as a hobby, I'm an application engineer. I enjoy drawing illustrations and other creative pursuits.