Skip to content

fix(conversation): preserve tool result JSON structure in sliding window management #94

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

cagataycali
Copy link
Member

@cagataycali cagataycali commented May 23, 2025

Description

Previously, the sliding window conversation manager would convert toolResult JSON
structures to plain text when trimming messages, causing LLMs to lose the ability
to properly parse tool results. This led to hallucinations and incorrect tool calls.

Issue: #95

The issue manifested when the conversation exceeded the window size and the trim
point fell on a message containing toolResult content. The _map_tool_result_content()
method would convert structured JSON like:
{"toolResult": {"toolUseId": "123", "content": [...], "status": "success"}}
into plain text like:
"Tool Result JSON Content: {...}"

This fix:

  • Removes the problematic _map_tool_result_content() method entirely
  • Adds a new _find_safe_trim_index() method that intelligently finds cut points
    that preserve tool use/result pairs together
  • Introduces helper functions to track tool relationships:
    • has_tool_use() and has_tool_result() for checking message content
    • get_tool_use_ids() and get_tool_result_ids() for extracting tool IDs
  • Ensures tool use and result pairs are never separated during trimming
  • Maintains the original JSON structure of tool results throughout the conversation

The new approach:

  1. Maps all tool IDs to their message indices
  2. Adjusts trim points to keep related tool interactions together
  3. Falls back to basic trimming only when no safe cut point exists
  4. Preserves the integrity of the conversation structure for LLMs

Testing shows this eliminates the hallucination issues and maintains proper
tool interaction patterns even when conversations exceed the window size.

BREAKING CHANGE: The internal message trimming behavior has changed. While this
maintains API compatibility, it may result in slightly different message retention
patterns when the window size is exceeded.

Checklist

  • I have read the CONTRIBUTING document
  • I have added tests that prove my fix is effective or my feature works
  • [N/A] I have updated the documentation accordingly
  • [N/A] I have added an appropriate example to the documentation to outline the feature
  • [N/A] My changes generate no new warnings
  • [N/A] Any dependent changes have been merged and published

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

…dow management

Previously, the sliding window conversation manager would convert toolResult JSON
structures to plain text when trimming messages, causing LLMs to lose the ability
to properly parse tool results. This led to hallucinations and incorrect tool calls.

The issue manifested when the conversation exceeded the window size and the trim
point fell on a message containing toolResult content. The _map_tool_result_content()
method would convert structured JSON like:
  {"toolResult": {"toolUseId": "123", "content": [...], "status": "success"}}
into plain text like:
  "Tool Result JSON Content: {...}"

This fix:
- Removes the problematic _map_tool_result_content() method entirely
- Adds a new _find_safe_trim_index() method that intelligently finds cut points
  that preserve tool use/result pairs together
- Introduces helper functions to track tool relationships:
  - has_tool_use() and has_tool_result() for checking message content
  - get_tool_use_ids() and get_tool_result_ids() for extracting tool IDs
- Ensures tool use and result pairs are never separated during trimming
- Maintains the original JSON structure of tool results throughout the conversation

The new approach:
1. Maps all tool IDs to their message indices
2. Adjusts trim points to keep related tool interactions together
3. Falls back to basic trimming only when no safe cut point exists
4. Preserves the integrity of the conversation structure for LLMs

Testing shows this eliminates the hallucination issues and maintains proper
tool interaction patterns even when conversations exceed the window size.

BREAKING CHANGE: The internal message trimming behavior has changed. While this
maintains API compatibility, it may result in slightly different message retention
patterns when the window size is exceeded.
tool_use_indices = {} # toolUseId -> message index
tool_result_indices = {} # toolUseId -> message index

for i, message in enumerate(messages):
Copy link
Contributor

@fede-dash fede-dash May 25, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_tool_use_ids() and get_tool_result_ids() here, consider inlining the logic to avoid scanning message["content"] twice per message and allocating temporary lists.

A single-pass loop over message["content"] can extract both toolUseId and toolResultId efficiently:

for content in message.get("content", []):
    if "toolUse" in content:
        tool_id = content["toolUse"].get("toolUseId")
        if tool_id:
            tool_use_indices[tool_id] = i
    if "toolResult" in content:
        tool_id = content["toolResult"].get("toolUseId")
        if tool_id:
            tool_result_indices[tool_id] = i

This avoids unnecessary allocations and keeps the loop flat

safe_index = initial_trim_index

# Adjust if we would cut in the middle of a tool use/result pair
for tool_id, use_idx in tool_use_indices.items():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you would like to avoid so much nesting

for tool_id, use_idx in tool_use_indices.items():
    if tool_id not in tool_result_indices:
        continue

    result_idx = tool_result_indices[tool_id]

    # If the pair would be split by the cut, move it earlier to keep them together
    if use_idx < safe_index <= result_idx:
        safe_index = min(safe_index, use_idx)
        continue

    # Invalid ordering: toolResult appears before toolUse
    if result_idx < safe_index < use_idx:
        logger.warning("tool_id=<%s> | found toolResult before toolUse", tool_id)

@cagataycali
Copy link
Member Author

Resolved in: #120

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants