This is my AI Assistant, and there will be many like it

If you are a software person, vibe coder, etc., sitting here wondering where our do-anything AI assistants were, you may be using one already.

With a combination of tools and code we already have, accessibility APIs, MCP, skills, and relentless optimism, CLI agents like Claude Code, Codex, etc., can complete many "real life" tasks with minimal guidance. If a service has a webpage with some semblance of an API, your agent can probably burn tokens until it figures it out.

As many of us have probably considered, it seemed fun to go out and build an assistant of sorts around these CLIs. This led me to a working, open-source, deployable, semi-sovereign, local-ish assistant system that anyone can run.

When I ask it to pay a parking ticket, it will go out, write an app, and iterate until it can prompt me for payment information. If I want to know the weather forecast, it will render a custom chart focused specifically on precipitation so I know when I need to plow my driveway. And when I ask it to write down an idea in our shared Apple note, it will fight with the macOS accessibility APIs until the idea is added as I ramble on.

Anyway, I built an AI assistant that I call Bigwig, and here are some demos, use cases, and capabilities.

The source and installation instructions are available on GitHub. The iOS app isn't approved yet, but you can email me to join the TestFlight; my address is on the homepage of this site.

How it works

[Architecture diagram: the Bigwig iOS app exchanges an audio stream with the OpenAI Realtime API (gpt-realtime, transcription) over WebRTC. A sideband websocket handles tool routing through $ bigwig web on the public network (self-hosted / PaaS), which performs verified pairing (POST /session) and session proxying. A worker websocket (connect_call) links it to $ bigwig worker in a private sandbox (Docker / Linux / macOS), where a sideband client routes tool calls (run_task) to a coding agent (amp / Claude Code / opencode / etc.) over a bridge on port 9100, with access to events, files, CLIs, skills, browser, and code. Events and results flow back to the app as voice, content cards (file, code, message, progress: images, screenshots, markdown, task history, status updates), and ask_user patterns (text, select, confirm, form: prompts, menus, yes/no dialogs, rich forms, file uploads, camera).]

Realtime API as an Orchestrator

The iPhone app connects to OpenAI Realtime for low-latency voice and data input over WebRTC. The model has tools it can call that execute on the worker, via the worker's sideband websocket connection to OpenAI, as well as tools that execute on the iOS client. Every tool call is broadcast to all channels (the sideband websocket and the WebRTC data channel), and the worker and the client (the iPhone app) each decide which tools to act on.
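A rough sketch of that fan-out, assuming hypothetical message shapes, with tool names taken from the diagram above (the real Bigwig wire format may differ):

```typescript
// Hypothetical types; the real protocol may differ.
type ToolCall = { id: string; name: string; args: unknown };

const WORKER_TOOLS = new Set(["run_task", "connect_call"]); // executed on the worker
const CLIENT_TOOLS = new Set(["ask_user", "request_file"]); // executed on the iPhone

// Sender side: every tool call goes out on every channel
// (sideband websocket and WebRTC data channel alike).
function broadcast(channels: { send(msg: string): void }[], call: ToolCall): void {
  const msg = JSON.stringify(call);
  for (const ch of channels) ch.send(msg);
}

// Receiver side (worker shown; the iOS client does the mirror-image check
// against CLIENT_TOOLS): act only on the calls you own.
function onWorkerMessage(raw: string): void {
  const call: ToolCall = JSON.parse(raw);
  if (!WORKER_TOOLS.has(call.name)) return; // the iOS client will handle it
  // ...execute the tool, then report the result back over the sideband ws
}
```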

Run your own computer, host the connectivity

I love BYOC (bring-your-own-compute) for a "data plane" like this worker, and I think it's getting easier. I would probably not use a SaaS version of this service: the data I create and manage with the coding agent may be sensitive, may grow unbounded, and would lock me into a particular structure. I also want to hook into that data at times and do fun additional work on it. The flexibility of running this wherever I want is truly great. However, you do need to host the webserver in a way that works for you: behind a VPN, on a local network, or on the public internet. If you don't trust your network, and you probably shouldn't, you need to put the server behind mTLS.
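For the public-internet case, here is a minimal sketch of what "behind mTLS" can look like using Node's built-in https module, with placeholder certificate paths and port; terminating mTLS at a reverse proxy in front of the web process works just as well:

```typescript
import { createServer } from "node:https";
import { readFileSync } from "node:fs";

// Placeholder paths: a server keypair plus the CA that signs client certs.
const server = createServer(
  {
    key: readFileSync("server-key.pem"),
    cert: readFileSync("server-cert.pem"),
    ca: readFileSync("client-ca.pem"), // only clients signed by this CA get in
    requestCert: true,                 // ask every client for a certificate
    rejectUnauthorized: true,          // fail the TLS handshake if it doesn't verify
  },
  (req, res) => {
    // By the time a request lands here, the client cert has been verified.
    res.writeHead(200);
    res.end("ok\n");
  }
);

server.listen(9443);
```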

Local hosting also means that if you are a home automation nerd (I'm not), you could let this machine run in your network with targeted connectivity to home automation systems or other devices and build some very fun and truly intelligent automations. Imagine describing the "vibe" of a lighting environment you want and your agent writing code or running CLIs to figure out what that would look like.

Use your preferred agent

The CLI coding agents are too damned good, and there are many, each with unique strengths. I am personally a fan of the expensive but exceptionally well-designed Amp, but in this system they are all treated similarly. As of publishing, it supports Amp and Claude Code.

When tasks come in, a CLI agent is started from a pool and not spun down until the user ends the session. The agent manages long-running context, communicates with the voice agent over stdin/stdout, and talks directly to the iPhone client over a proxied websocket via custom tools like send_html, send_markdown, request_file, ask_user, etc.
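A rough sketch of that lifecycle, with hypothetical names and a placeholder agent invocation (the real worker differs in the details):

```typescript
import { spawn, type ChildProcess } from "node:child_process";

const idle: ChildProcess[] = [];

// Take a warm agent from the pool, or start a fresh one; it is not
// spun down until the user ends the session. Placeholder invocation:
// the real flags for non-interactive streaming are omitted here.
function acquireAgent(onEvent: (line: string) => void): ChildProcess {
  const agent = idle.pop() ?? spawn("claude", { stdio: ["pipe", "pipe", "inherit"] });
  // Events and results stream back over stdout, one line at a time.
  agent.stdout!.removeAllListeners("data");
  agent.stdout!.on("data", (chunk: Buffer) => {
    for (const line of chunk.toString().split("\n")) {
      if (line.trim()) onEvent(line); // forward to the voice agent / iPhone client
    }
  });
  return agent;
}

// Requests from the voice session go in over stdin.
function sendTask(agent: ChildProcess, prompt: string): void {
  agent.stdin!.write(prompt + "\n");
}
```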

Why this approach?

  • CLI coding agents already work. By using coding agents for general problem solving, you can sidestep a variety of "agent problems": memory (the filesystem, git), tool calling and tool development (a CLI agent provides many, many useful tools and can write its own on the fly), and long-running persistence (they are all increasingly tuned to solve the problem, no matter what).
  • Voice is underutilized. All this automation ideally unencumbers us and gives us the opportunity to be in the world, moving around, doing things. I have always felt voice is an amazing interface to support that. Current voice experiences are held back by narrow and underpowered tooling, but this feels like an (incredibly sophisticated) leap forward, given the range of tools the underlying coding agent can orchestrate.
  • Agents can and should write small apps for you. I believe we will see more custom, "just-in-time", tailored software. A CLI agent can create an encapsulated "application" for your request in what is effectively realtime. Skills can optionally encode this for future reuse or sharing. I'm also very interested in projects like Google's A2UI, which provides tools for generating UIs.
  • Reducing risk through sandboxing. My best effort here was to treat the AI assistant like another human: give it its own accounts and its own sandboxed, DMZ'd computer with guardrails on its network access (a sketch of the container setup follows this list). But, in my opinion, the first and third points above require the --yolo-equivalent flags to make the magic happen. Controlling specific tool use in a static way is extremely limiting and inflexible. This demands better solutions for secrets management, authentication workflows, and sandboxing.
  • I like nice mobile phone experiences. I did not want to use a terminal on my iPhone or iPad, or to text a context-rich language model over iMessage or similar. I wanted a bit more interactivity, history, and a dedicated experience. I want to talk "on the phone" with this agent while driving, and to use native autofill, uploads, and so on.
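On the sandboxing point, here is an illustrative sketch of one way to DMZ the worker: launch it in a container on a dedicated network with capabilities dropped. The Docker flags are standard, but the network, volume, and image names are placeholders, not what Bigwig actually ships:

```typescript
import { spawn } from "node:child_process";

// Run the worker as its own "person": an isolated container, a dedicated
// Docker network (with whatever egress rules you attach to it), and a
// home directory that is its alone.
spawn(
  "docker",
  [
    "run", "--rm",
    "--network", "bigwig-dmz",             // placeholder user-defined network
    "--cap-drop", "ALL",                   // no extra Linux capabilities
    "--memory", "4g",                      // cap the blast radius of runaways
    "--volume", "/srv/bigwig:/home/agent", // the agent's own home, nothing else
    "bigwig-worker",                       // placeholder image name
    "bigwig", "worker",
  ],
  { stdio: "inherit" }
);
```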

I have no idea if anyone else would use this. I do think there's a revolution towards highly personalized software as the cost of building it continues to spiral down. Hence the title of this post.

This is all open source, and the iOS application will be public on the App Store with a configurable endpoint for connectivity. The bigwig Bun-compiled TS binary contains the worker process (the data plane that you run on a home computer, public VM/container, etc.) and the web process, which can run on whatever network you deem appropriate.

I hope you are having as much fun as I am.

Written December 2025.