HE-4 multi_tool_orchestrate — 税理士 agent demo¶
Audience: 税理士 / 会計士 / 行政書士 / 司法書士 agent operators. Goal: Replace N round trips (search + houjin lookup + filing window + reasoning chain) with one bundled call. Billing: transparent — each dispatched call still bills as 1 ¥3/req unit (税込 ¥3.30). The bundle discount is in network round trips and agent prompt overhead, not in ¥.
Sensitive surface — every response carries the canonical §52 / §47条の2 / §72 / §1 / §3 disclaimer envelope. NO LLM is invoked server-side; HE-4 is a deterministic dispatcher only.
Why one tool for many calls?¶
A 税理士 agent answering "I have client 1234567890123. Tell me the
relevant ものづくり-style subsidies, the company's 360° posture, the next
filing window, and walk me through the legal-reasoning chain." normally
issues four sequential MCP calls:
agent → MCP: search_programs(q="ものづくり")
agent ← MCP: { results: [...] }
agent → MCP: get_houjin_360(houjin_bangou="1234567890123")
agent ← MCP: { ... }
agent → MCP: find_filing_window(houjin_bangou="1234567890123")
agent ← MCP: { ... }
agent → MCP: walk_reasoning_chain(query="...")
agent ← MCP: { ... }
Each round trip pays the agent ↔ server network cost and another
tool_use block in the LLM prompt context. For an agent that has
already decided on four independent queries, this is wasted work.
HE-4 collapses all four into:
agent → MCP: multi_tool_orchestrate([
{tool: search_programs, args: {...}},
{tool: get_houjin_360, args: {...}},
{tool: find_filing_window, args: {...}},
{tool: walk_reasoning_chain, args: {...}},
])
agent ← MCP: { results: [{...}×4], summary, billing, _disclaimer }
The internal dispatcher uses asyncio.gather, so the four calls run
concurrently on the server side. The agent sees one round trip.
End-to-end example¶
Request¶
from jpintel_mcp.mcp.moat_lane_tools.he4_orchestrate import multi_tool_orchestrate
bundle = multi_tool_orchestrate(
tool_calls=[
{"tool": "search_programs", "args": {"q": "ものづくり", "limit": 3}},
{"tool": "get_houjin_360", "args": {"houjin_bangou": "1234567890123"}},
{"tool": "find_filing_window", "args": {"houjin_bangou": "1234567890123"}},
{"tool": "walk_reasoning_chain", "args": {"query": "中小企業 設備投資"}},
],
parallel=True,
fail_strategy="partial",
max_concurrent=10,
)
Response shape (abridged)¶
{
"tool_name": "multi_tool_orchestrate",
"schema_version": "moat.he4.v1",
"primary_result": {
"status": "ok",
"lane_id": "HE-4",
"upstream_module": "jpintel_mcp.moat.he4_orchestrate",
"primary_input": {
"tool_calls_count": 4,
"parallel": true,
"fail_strategy": "partial",
"max_concurrent": 10
}
},
"results": [
{"tool_call_idx": 0, "tool": "search_programs", "status": "ok", "result": { ... }, "latency_ms": 42},
{"tool_call_idx": 1, "tool": "get_houjin_360", "status": "ok", "result": { ... }, "latency_ms": 38},
{"tool_call_idx": 2, "tool": "find_filing_window", "status": "ok", "result": { ... }, "latency_ms": 19},
{"tool_call_idx": 3, "tool": "walk_reasoning_chain", "status": "ok", "result": { ... }, "latency_ms": 71}
],
"summary": {
"total_calls": 4,
"ok": 4,
"error": 0,
"rejected": 0,
"skipped": 0,
"total_latency_ms": 73
},
"billing": {
"unit": 4,
"yen": 12,
"_bundle_discount": "1 round trip vs 4 = ~75% agent ↔ server network saving (¥ price unchanged: ¥3 per dispatched call)."
},
"_disclaimer": "本 response は moat lane の retrieval / モデル推論結果で…",
"_provenance": {
"source_module": "jpintel_mcp.moat.he4_orchestrate",
"lane_id": "HE-4",
"wrap_kind": "moat_lane_he4_orchestrate",
"observed_at": "2026-05-17",
"registered_tool_count": 218
}
}
Note: total_latency_ms is the wall-clock of the server-side gather,
not the sum of per-call latencies. With parallel=True, the bundle
finishes in roughly max(per_call_latencies).
Failure modes & how to handle them¶
| status | meaning | billed? |
|---|---|---|
ok |
tool returned successfully | yes (¥3) |
error |
tool raised — bad args, runtime exception | yes (¥3) |
rejected |
tool name unknown / private (_x) / recursion refused |
no |
skipped |
all_or_nothing short-circuit cancelled this call | no |
In fail_strategy="partial" (default), one bad call does not poison
the bundle. Inspect results[i].status per call.
In fail_strategy="all_or_nothing", the orchestrator stops at the
first error or rejected and marks the remaining calls as skipped.
Useful when a downstream client refuses to display a partial bundle.
Security model¶
- Allowlist: dispatched tool names must already be registered on
the MCP server (i.e. present in
mcp._tool_manager.list_tools()). Unknown names are rejected witherror="unknown_tool — not registered on this MCP server.". - Private tool refusal: any tool name starting with
_is refused defensively, even if it somehow appears in the registry. - Anti-recursion: calling
multi_tool_orchestratefrom inside the bundle is rejected to prevent stack-amplification. - Bounded fan-out:
max_concurrentcaps the asyncio semaphore at 10 by default; the hard cap is 32. - Bounded input:
tool_callslength is capped at 32.
Honest performance notes¶
The dominant cost for most jpcite tools is SQLite I/O on warm cache (a
few ms per call). On 4 warm-cache search_programs calls, the
server-side parallel=True total can be slower than parallel=False
by 5-10ms because asyncio.to_thread adds per-call overhead. The win
is asymmetric:
- Network: 1 round trip vs N — a real saving regardless.
- LLM prompt overhead: 1
tool_useblock vs N — measurable when the agent's context budget is tight. - CPU latency: only matters when individual tools take 100ms+ (e.g. cross-encoder rerank, large FTS scans, network-bound fetches). HE-4 makes those bundles ~N× faster end-to-end.
Use HE-4 when you already know you need ≥2 independent queries. Don't use it as a "wrap every single call" idiom — for a 1-call bundle you save no round trips and pay the same ¥3.