A project retrospective on giving AI hands - client-side tool execution, the MCP gap on native clients, and knowing when to ship an idea as a blog post instead of an app.

The gap I kept noticing
Open any AI chat app on your iPhone. ChatGPT. Claude. Gemini. They’re all the same thing underneath: a text box that sends your message to a server, gets a response, displays it. Beautifully polished, genuinely useful - and completely sandboxed from the device they’re running on.
Ask ChatGPT “where am I right now?” and it’ll tell you to check your Maps app. Ask it to remind you of something at a specific time and it might generate a calendar event you have to copy manually. The AI is powerful, but it has no hands. It can’t touch your phone.
I kept running into this on my iPhone. Then I noticed the same thing on my Mac - different shape, same problem.
MCP: it helps, but not as much as you’d think
If you’ve been following the AI tooling space, you’ve heard of MCP - Model Context Protocol. It’s a standardised way to connect AI models to tools: file systems, databases, APIs, web browsers. On desktop, Claude can read your files and run terminal commands. That part works.



But MCP runs in a server process. And a server process - even one running locally on your Mac - isn’t the same as a native app. It can’t hold a CoreLocation session. It can’t post a local notification through the system notification centre. It can’t toggle the torch on your iPhone. These APIs live behind platform permission systems that exist specifically to gate access to app-level sandboxes, not server processes.
On mobile the gap is even wider. iOS has no concept of a local MCP server at all. Every AI app on your iPhone is a sandboxed client making requests to something remote. There’s no way to run a sidecar process that bridges to device hardware the way Claude Desktop can on a Mac.
So the problem isn’t just “nobody built the right MCP server.” For a meaningful category of device capabilities, no MCP server - local or remote - can reach them. That’s a structural constraint, not a tooling gap.
I started thinking about it differently: what if the AI didn’t need to run the tools itself? What if it just needed to ask for them?
The architecture: client-side tool execution
The core idea behind ChatBridger is a pattern I’m calling client-side tool execution. It works like this:
- The native client (iOS or macOS app) registers a set of tools it knows how to run locally - device_location, send_notification, toggle_flashlight, etc.
- When the user sends a message, those tool definitions are sent to the backend along with the message.
- The backend (Actors) runs an AI agent. If the agent decides to call a tool, instead of executing it server-side, it emits a client_tool_execution_required event and pauses.
- The app receives this event, executes the tool locally (with full access to device APIs and user permissions), and sends the result back.
- The agent resumes with the result and continues the conversation.
From the AI’s perspective, it called a tool and got a result. It doesn’t know or care that the execution happened inside a sandboxed native app on a physical device. From the app’s perspective, it just ran some Swift code with CoreLocation or UserNotifications. The intelligence is in the middle.
The backend pauses the agent loop using the OpenAI Agents SDK’s StopAtTools behaviour - when the agent calls a registered client tool, the streaming response terminates cleanly with a structured event over NDJSON.
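To make the pause concrete, here is roughly what that terminating event looks like as a sketch. The field names (tool_call_id, tool_name, arguments) are illustrative assumptions, not the actual wire format Actors uses:

```python
import json

# Hypothetical shape of the pause event. Field names here
# (tool_call_id, tool_name, arguments) are assumptions for
# illustration, not the exact Actors schema.
event = {
    "type": "client_tool_execution_required",
    "tool_call_id": "call_abc123",
    "tool_name": "device_location",
    "arguments": {},
}

# NDJSON framing: one JSON object per line, newline-terminated.
line = json.dumps(event) + "\n"
print(line, end="")
```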
The app picks this up from the stream, dispatches to the right tool implementation, waits for the result (including handling any permission dialogs), then POSTs back to the same endpoint with tool_results filled in.
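The continuation payload might look something like this - again a sketch, with key names (conversation_id, tool_results, tool_call_id, output) assumed rather than taken from the real /chat/send_message schema:

```python
import json

# Illustrative continuation body. Key names are assumptions,
# not the exact schema of the /chat/send_message endpoint.
continuation = {
    "conversation_id": "conv_123",
    "tool_results": [
        {
            "tool_call_id": "call_abc123",
            "output": {"latitude": 51.5074, "longitude": -0.1278},
        }
    ],
}

body = json.dumps(continuation)
```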
The backend patches the session history - replacing the PENDING_CLIENT_EXECUTION placeholder with the real result - and resumes the agent from where it left off. A single /chat/send_message endpoint handles both the initial message and the continuation; it detects which one it’s dealing with based on whether tool_results is present.
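In outline, that single-endpoint dispatch reduces to a branch on whether tool_results is present. A minimal sketch, assuming a session object with a mutable history list - the placeholder constant matches the one described above, but the function and field names are hypothetical:

```python
PENDING_CLIENT_EXECUTION = "PENDING_CLIENT_EXECUTION"


def handle_send_message(session, message=None, tool_results=None):
    """Single entry point for both new messages and continuations.

    Sketch only: `session` is assumed to expose a mutable `history`
    list of dicts; the real Actors service streams NDJSON events
    rather than returning a string.
    """
    if tool_results:
        # Continuation: patch placeholder entries with the real
        # client-side results, then resume the paused agent run.
        results_by_id = {r["tool_call_id"]: r["output"] for r in tool_results}
        for entry in session.history:
            if entry.get("output") == PENDING_CLIENT_EXECUTION:
                call_id = entry["tool_call_id"]
                if call_id in results_by_id:
                    entry["output"] = results_by_id[call_id]
        return "resumed"
    # Initial message: append it and start a fresh agent run.
    session.history.append({"role": "user", "content": message})
    return "started"
```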
What I built
Actors (backend)
Actors is a FastAPI service in Python. It handles:
- Authentication via Supabase JWT - every request is scoped to a user
- Conversation persistence - full lifecycle with list, archive, star, delete, and automatic title generation (runs as a background task after each response)
- Multi-agent architecture built on the OpenAI Agents SDK - agents can hand off to each other and run with different system prompts
- MCP integration - server-side tools via Model Context Protocol for things that can run in the cloud
- NDJSON streaming - real-time events including text_delta, tool_call, tool_output, agent_updated, client_tool_execution_required, and done
- The client tool delegation mechanism described above
The streaming event model was designed to give the frontend enough information to render a rich UI - tool calls appear as they’re invoked, text streams in token by token, and the client knows exactly when to pause and execute something locally.
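A consumer of that stream is essentially a line-by-line JSON parser dispatching on event type. A minimal sketch in Python (the Swift client does the equivalent; the event payload shapes are illustrative assumptions):

```python
import json
from typing import Iterable

def consume_stream(lines: Iterable[str]):
    """Dispatch NDJSON events; return accumulated text and any
    pending client tool request. Event shapes are illustrative."""
    text_parts = []
    pending_tool = None
    for raw in lines:
        event = json.loads(raw)
        kind = event["type"]
        if kind == "text_delta":
            text_parts.append(event["delta"])  # token-by-token text
        elif kind == "client_tool_execution_required":
            pending_tool = event               # pause: run the tool locally
        elif kind == "done":
            break
    return "".join(text_parts), pending_tool
```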
ChatBridger (frontend)
ChatBridger is a SwiftUI application targeting iOS and macOS. One backend, one codebase, two platforms - and each platform advertises only the tools it can actually run.


- Navigation: NavigationStack on iPhone, NavigationSplitView on Mac with full keyboard shortcut support
- Chat UI: real-time streaming with markdown rendering and a live tool inspector panel showing tool calls as they happen
- A ClientTool protocol that makes it straightforward to register new platform capabilities
- Three initial tools, each with platform-appropriate availability:
- Location - CoreLocation with reverse geocoding, available on iOS and macOS
- Notifications - UserNotifications, available on both platforms
- Flashlight - AVCaptureDevice torch, iOS only (Macs don’t have a torch; the tool simply isn’t registered on macOS)
- Per-tool permission management - users can see which tools are registered, enable or disable them individually, and the app checks both system-level permissions and its own in-app permission layer before executing anything

The permission model was something I spent real time on. The AI can only access tools the user has explicitly enabled. For tools that require system permissions (location, notifications), the app requests them lazily on first use rather than upfront - so you only see the system dialog when the AI actually tries to call that tool, not when you install the app.
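The two-layer gate can be sketched as a short predicate - Python for illustration only (the real implementation is Swift), with both the in-app registry and the system-permission callback as hypothetical names:

```python
def can_execute(tool_name, user_enabled, request_system_permission):
    """Return True only if the user enabled the tool in-app AND the
    system permission is granted. The system dialog is requested
    lazily, on first actual use of the tool."""
    if not user_enabled.get(tool_name, False):
        return False  # in-app layer: tools are off unless enabled
    # System layer: triggers the OS permission dialog if not yet asked.
    return request_system_permission(tool_name)
```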





The demos worked. That was the problem.
The flashlight demo was always a hit. “Turn on my flashlight” - and it actually turned on. Real hardware, real response, no copy-pasting or Shortcuts automation. Location worked across both iPhone and Mac: “what neighbourhood am I in?” and the AI would query the GPS, reverse-geocode the coordinates, and answer naturally.
But a flashlight demo isn’t a product.
I kept asking: what does someone actually do with this on day ten? What’s the conversation that makes them open this app instead of just using Siri or the native Maps app or asking Claude Desktop? I couldn’t find a convincing answer. The device integration was the feature, but it wasn’t attached to anything the user was trying to accomplish that they couldn’t accomplish another way.
The original motivation was also weakening. I built Actors partly because the MCP ecosystem felt immature and limiting. That stopped being true quickly - more tools, better documentation, broader platform support. The gap I was filling on the server side was narrowing fast.
And I noticed I was spending almost all my time on infrastructure - the streaming pipeline, session management, the permission system, the platform divergences - and almost none on anything a user would care about. That’s a signal worth listening to.
What I’d do differently
Start with the use case, not the architecture. The client-side tool execution pattern is genuinely interesting, but I designed it before I knew what it was for. The right order is: find a specific problem someone has, then figure out whether this pattern solves it better than the alternatives.
The native client integration story isn’t dead - it’s just specific. An AI with access to your location, your calendar, your health data, your contacts is a different value proposition than a generic chat interface - whether that’s on your phone or your Mac. But it needs a vertical. A fitness app where the AI actually sees your workout data. A travel app that knows where you are in real time. Something where the native access is load-bearing, not a party trick.
Knowing when to stop is part of the work. I got real things from this project: streaming agent infrastructure, session management, cross-platform SwiftUI patterns, a novel tool delegation architecture, and a clearer sense of where the native client AI gap actually is. Shipping it as an open-source project and a writeup is a better outcome than grinding for another six months hoping the use case materialises.
The code
Both projects are now open source.
- Actors (backend): https://github.com/shreyashag/chat-bridger-backend
- ChatBridger (iOS/macOS): https://github.com/shreyashag/chat-bridger
If you’re building something that needs client-side tool execution - a browser extension that needs DOM access, a desktop app that needs filesystem access without a server round-trip, a mobile app that needs device APIs - the pattern here might be worth adapting. The ClientTool protocol in Swift is clean enough to extend, and the backend’s session patching approach is model-agnostic.
If you find the use case I couldn’t, I’d genuinely like to hear about it.
Stack: Python 3.13 · FastAPI · OpenAI Agents SDK · Supabase · SwiftUI (iOS · macOS) · CoreLocation · UserNotifications · NDJSON streaming