A project retrospective on giving AI hands - client-side tool execution, the MCP gap on native clients, and knowing when to ship an idea as a blog post instead of an app.

The gap I kept noticing
Open any AI chat app on your iPhone. ChatGPT. Claude. Gemini. They’re all the same thing underneath: a text box that sends your message to a server, gets a response, displays it. Beautifully polished, genuinely useful - and completely sandboxed from the device they’re running on.
Ask ChatGPT “where am I right now?” and it’ll tell you to check your Maps app. Ask it to remind you of something at a specific time and it might generate a calendar event you have to copy manually. The AI is powerful, but it has no hands. It can’t touch your phone.
I kept running into this on my iPhone. Then I noticed the same thing on my Mac - different shape, same problem.
MCP: it helps, but not as much as you’d think
If you’ve been following the AI tooling space, you’ve heard of MCP - Model Context Protocol. It’s a standardised way to connect AI models to tools: file systems, databases, APIs, web browsers. On desktop, Claude can read your files and run terminal commands. That part works.



But MCP runs in a server process. And a server process - even one running locally on your Mac - isn’t the same as a native app. It can’t hold a CoreLocation session. It can’t post a local notification through the system notification centre. It can’t toggle the torch on your iPhone. These APIs live behind platform permission systems that exist specifically to gate access to app-level sandboxes, not server processes.
On mobile the gap is even wider. iOS has no concept of a local MCP server at all. Every AI app on your iPhone is a sandboxed client making requests to something remote. There’s no way to run a sidecar process that bridges to device hardware the way Claude Desktop can on a Mac.
So the problem isn’t just “nobody built the right MCP server.” For a meaningful category of device capabilities, no MCP server - local or remote - can reach them. That’s a structural constraint, not a tooling gap.
I started thinking about it differently: what if the AI didn’t need to run the tools itself? What if it just needed to ask for them?
The architecture: client-side tool execution
The core idea behind ChatBridger is a pattern I’m calling client-side tool execution. It works like this:
- The native client (iOS or macOS app) registers a set of tools it knows how to run locally - device_location, send_notification, toggle_flashlight, etc.
- When the user sends a message, those tool definitions are sent to the backend along with the message.
- The backend (Actors) runs an AI agent. If the agent decides to call a tool, instead of executing it server-side, it emits a client_tool_execution_required event and pauses.
- The app receives this event, executes the tool locally (with full access to device APIs and user permissions), and sends the result back.
- The agent resumes with the result and continues the conversation.
From the AI’s perspective, it called a tool and got a result. It doesn’t know or care that the execution happened inside a sandboxed native app on a physical device. From the app’s perspective, it just ran some Swift code with CoreLocation or UserNotifications. The intelligence is in the middle.
The backend pauses the agent loop using the OpenAI Agents SDK’s StopAtTools behaviour - when the agent calls a registered client tool, the streaming response terminates cleanly with a structured event over NDJSON.
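To make the pause concrete, here is roughly what that terminating event looks like as a sketch. The field names (tool_call_id, tool_name, arguments) are illustrative assumptions, not the actual wire format Actors uses:

```python
import json

# Hypothetical shape of the pause event. Field names here
# (tool_call_id, tool_name, arguments) are assumptions for
# illustration, not the exact Actors schema.
event = {
    "type": "client_tool_execution_required",
    "tool_call_id": "call_abc123",
    "tool_name": "device_location",
    "arguments": {},
}

# NDJSON framing: one JSON object per line, newline-terminated.
line = json.dumps(event) + "\n"
print(line, end="")
```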
The app picks this up from the stream, dispatches to the right tool implementation, waits for the result (including handling any permission dialogs), then POSTs back to the same endpoint with tool_results filled in.
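The continuation payload might look something like this - again a sketch, with key names (conversation_id, tool_results, tool_call_id, output) assumed rather than taken from the real /chat/send_message schema:

```python
import json

# Illustrative continuation body. Key names are assumptions,
# not the exact schema of the /chat/send_message endpoint.
continuation = {
    "conversation_id": "conv_123",
    "tool_results": [
        {
            "tool_call_id": "call_abc123",
            "output": {"latitude": 51.5074, "longitude": -0.1278},
        }
    ],
}

body = json.dumps(continuation)
```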
The backend patches the session history - replacing the PENDING_CLIENT_EXECUTION placeholder with the real result - and resumes the agent from where it left off. A single /chat/send_message endpoint handles both the initial message and the continuation; it detects which one it’s dealing with based on whether tool_results is present.
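In outline, that single-endpoint dispatch reduces to a branch on whether tool_results is present. A minimal sketch, assuming a session object with a mutable history list - the placeholder constant matches the one described above, but the function and field names are hypothetical:

```python
PENDING_CLIENT_EXECUTION = "PENDING_CLIENT_EXECUTION"


def handle_send_message(session, message=None, tool_results=None):
    """Single entry point for both new messages and continuations.

    Sketch only: `session` is assumed to expose a mutable `history`
    list of dicts; the real Actors service streams NDJSON events
    rather than returning a string.
    """
    if tool_results:
        # Continuation: patch placeholder entries with the real
        # client-side results, then resume the paused agent run.
        results_by_id = {r["tool_call_id"]: r["output"] for r in tool_results}
        for entry in session.history:
            if entry.get("output") == PENDING_CLIENT_EXECUTION:
                call_id = entry["tool_call_id"]
                if call_id in results_by_id:
                    entry["output"] = results_by_id[call_id]
        return "resumed"
    # Initial message: append it and start a fresh agent run.
    session.history.append({"role": "user", "content": message})
    return "started"
```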
What I built
Actors (backend)
Actors is a FastAPI service in Python. It handles:
- Authentication via Supabase JWT - every request is scoped to a user
- Conversation persistence - full lifecycle with list, archive, star, delete, and automatic title generation (runs as a background task after each response)
- Multi-agent architecture built on the OpenAI Agents SDK - agents can hand off to each other and run with different system prompts
- MCP integration - server-side tools via Model Context Protocol for things that can run in the cloud
- NDJSON streaming - real-time events including text_delta, tool_call, tool_output, agent_updated, client_tool_execution_required, and done
- The client tool delegation mechanism described above
The streaming event model was designed to give the frontend enough information to render a rich UI - tool calls appear as they’re invoked, text streams in token by token, and the client knows exactly when to pause and execute something locally.
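A consumer of that stream is essentially a line-by-line JSON parser dispatching on event type. A minimal sketch in Python (the Swift client does the equivalent; the event payload shapes are illustrative assumptions):

```python
import json
from typing import Iterable

def consume_stream(lines: Iterable[str]):
    """Dispatch NDJSON events; return accumulated text and any
    pending client tool request. Event shapes are illustrative."""
    text_parts = []
    pending_tool = None
    for raw in lines:
        event = json.loads(raw)
        kind = event["type"]
        if kind == "text_delta":
            text_parts.append(event["delta"])  # token-by-token text
        elif kind == "client_tool_execution_required":
            pending_tool = event               # pause: run the tool locally
        elif kind == "done":
            break
    return "".join(text_parts), pending_tool
```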
ChatBridger (frontend)
ChatBridger is a SwiftUI application targeting iOS and macOS. One backend, one codebase, two platforms - and each platform advertises only the tools it can actually run.


- Navigation: NavigationStack on iPhone, NavigationSplitView on Mac with full keyboard shortcut support
- Chat UI: real-time streaming with markdown rendering and a live tool inspector panel showing tool calls as they happen
- A ClientTool protocol that makes it straightforward to register new platform capabilities
- Three initial tools, each with platform-appropriate availability:
- Location - CoreLocation with reverse geocoding, available on iOS and macOS
- Notifications - UserNotifications, available on both platforms
- Flashlight - AVCaptureDevice torch, iOS only (Macs don’t have a torch; the tool simply isn’t registered on macOS)
- Per-tool permission management - users can see which tools are registered, enable or disable them individually, and the app checks both system-level permissions and its own in-app permission layer before executing anything

The permission model was something I spent real time on. The AI can only access tools the user has explicitly enabled. For tools that require system permissions (location, notifications), the app requests them lazily on first use rather than upfront - so you only see the system dialog when the AI actually tries to call that tool, not when you install the app.
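The two-layer gate can be sketched as a short predicate - Python for illustration only (the real implementation is Swift), with both the in-app registry and the system-permission callback as hypothetical names:

```python
def can_execute(tool_name, user_enabled, request_system_permission):
    """Return True only if the user enabled the tool in-app AND the
    system permission is granted. The system dialog is requested
    lazily, on first actual use of the tool."""
    if not user_enabled.get(tool_name, False):
        return False  # in-app layer: tools are off unless enabled
    # System layer: triggers the OS permission dialog if not yet asked.
    return request_system_permission(tool_name)
```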





The demos worked. That was the problem.
The flashlight demo was always a hit. “Turn on my flashlight” - and it actually turned on. Real hardware, real response, no copy-pasting or Shortcuts automation. Location worked across both iPhone and Mac: “what neighbourhood am I in?” and the AI would query the GPS, reverse-geocode the coordinates, and answer naturally.
But a flashlight demo isn’t a product.
I kept asking: what does someone actually do with this on day ten? What’s the conversation that makes them open this app instead of just using Siri or the native Maps app or asking Claude Desktop? I couldn’t find a convincing answer. The device integration was the feature, but it wasn’t attached to anything the user was trying to accomplish that they couldn’t accomplish another way.
The original motivation was also weakening. I built Actors partly because the MCP ecosystem felt immature and limiting. That stopped being true quickly - more tools, better documentation, broader platform support. The gap I was filling on the server side was narrowing fast.
And I noticed I was spending almost all my time on infrastructure - the streaming pipeline, session management, the permission system, the platform divergences - and almost none on anything a user would care about. That’s a signal worth listening to.
What I’d do differently
Start with the use case, not the architecture. The client-side tool execution pattern is genuinely interesting, but I designed it before I knew what it was for. The right order is: find a specific problem someone has, then figure out whether this pattern solves it better than the alternatives.
The native client integration story isn’t dead - it’s just specific. An AI with access to your location, your calendar, your health data, your contacts is a different value proposition than a generic chat interface - whether that’s on your phone or your Mac. But it needs a vertical. A fitness app where the AI actually sees your workout data. A travel app that knows where you are in real time. Something where the native access is load-bearing, not a party trick.
Knowing when to stop is part of the work. I got real things from this project: streaming agent infrastructure, session management, cross-platform SwiftUI patterns, a novel tool delegation architecture, and a clearer sense of where the native client AI gap actually is. Shipping it as an open-source project and a writeup is a better outcome than grinding for another six months hoping the use case materialises.
The code
Both projects are now open source.
- Actors (backend): https://github.com/shreyashag/chat-bridger-backend
- ChatBridger (iOS/macOS): https://github.com/shreyashag/chat-bridger
If you’re building something that needs client-side tool execution - a browser extension that needs DOM access, a desktop app that needs filesystem access without a server round-trip, a mobile app that needs device APIs - the pattern here might be worth adapting. The ClientTool protocol in Swift is clean enough to extend, and the backend’s session patching approach is model-agnostic.
If you find the use case I couldn’t, I’d genuinely like to hear about it.
Stack: Python 3.13 · FastAPI · OpenAI Agents SDK · Supabase · SwiftUI (iOS · macOS) · CoreLocation · UserNotifications · NDJSON streaming