-
Notifications
You must be signed in to change notification settings - Fork 183
fix(conversation): preserve tool result JSON structure in sliding window management #94
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…dow management Previously, the sliding window conversation manager would convert toolResult JSON structures to plain text when trimming messages, causing LLMs to lose the ability to properly parse tool results. This led to hallucinations and incorrect tool calls. The issue manifested when the conversation exceeded the window size and the trim point fell on a message containing toolResult content. The _map_tool_result_content() method would convert structured JSON like: {"toolResult": {"toolUseId": "123", "content": [...], "status": "success"}} into plain text like: "Tool Result JSON Content: {...}" This fix: - Removes the problematic _map_tool_result_content() method entirely - Adds a new _find_safe_trim_index() method that intelligently finds cut points that preserve tool use/result pairs together - Introduces helper functions to track tool relationships: - has_tool_use() and has_tool_result() for checking message content - get_tool_use_ids() and get_tool_result_ids() for extracting tool IDs - Ensures tool use and result pairs are never separated during trimming - Maintains the original JSON structure of tool results throughout the conversation The new approach: 1. Maps all tool IDs to their message indices 2. Adjusts trim points to keep related tool interactions together 3. Falls back to basic trimming only when no safe cut point exists 4. Preserves the integrity of the conversation structure for LLMs Testing shows this eliminates the hallucination issues and maintains proper tool interaction patterns even when conversations exceed the window size. BREAKING CHANGE: The internal message trimming behavior has changed. While this maintains API compatibility, it may result in slightly different message retention patterns when the window size is exceeded.
tool_use_indices = {} # toolUseId -> message index | ||
tool_result_indices = {} # toolUseId -> message index | ||
|
||
for i, message in enumerate(messages): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
get_tool_use_ids() and get_tool_result_ids() here, consider inlining the logic to avoid scanning message["content"] twice per message and allocating temporary lists.
A single-pass loop over message["content"] can extract both toolUseId and toolResultId efficiently:
for content in message.get("content", []):
if "toolUse" in content:
tool_id = content["toolUse"].get("toolUseId")
if tool_id:
tool_use_indices[tool_id] = i
if "toolResult" in content:
tool_id = content["toolResult"].get("toolUseId")
if tool_id:
tool_result_indices[tool_id] = i
This avoids unnecessary allocations and keeps the loop flat
safe_index = initial_trim_index | ||
|
||
# Adjust if we would cut in the middle of a tool use/result pair | ||
for tool_id, use_idx in tool_use_indices.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you would like to avoid so much nesting
for tool_id, use_idx in tool_use_indices.items():
if tool_id not in tool_result_indices:
continue
result_idx = tool_result_indices[tool_id]
# If the pair would be split by the cut, move it earlier to keep them together
if use_idx < safe_index <= result_idx:
safe_index = min(safe_index, use_idx)
continue
# Invalid ordering: toolResult appears before toolUse
if result_idx < safe_index < use_idx:
logger.warning("tool_id=<%s> | found toolResult before toolUse", tool_id)
Resolved in: #120 |
Description
Previously, the sliding window conversation manager would convert toolResult JSON
structures to plain text when trimming messages, causing LLMs to lose the ability
to properly parse tool results. This led to hallucinations and incorrect tool calls.
Issue: #95
The issue manifested when the conversation exceeded the window size and the trim
point fell on a message containing toolResult content. The _map_tool_result_content()
method would convert structured JSON like:
{"toolResult": {"toolUseId": "123", "content": [...], "status": "success"}}
into plain text like:
"Tool Result JSON Content: {...}"
This fix:
that preserve tool use/result pairs together
The new approach:
Testing shows this eliminates the hallucination issues and maintains proper
tool interaction patterns even when conversations exceed the window size.
BREAKING CHANGE: The internal message trimming behavior has changed. While this
maintains API compatibility, it may result in slightly different message retention
patterns when the window size is exceeded.
Checklist
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.