Writing

Wiring up an always-on personal AI agent

The premise was simple. Get a personal AI agent running for myself — always on, reachable from my phone, on the cheapest stack I could make work.

OpenClaw is the platform that turned out to fit the brief. It’s @steipete’s open-source personal AI assistant. It sits between an LLM and the real world, with chat apps like WhatsApp as the interface and tools (browser, shell, web search) as the hands.

Here’s the shape of what my instance ended up as.

The hardware

That’s the whole fleet. Both are devices I already owned.

The platform

OpenClaw runs as a systemd user service on the Pi, so it comes up automatically on boot and stays running. Installed via npm, configured through a single openclaw.json file. That’s it. No Docker, no Kubernetes, no managed anything.

The brain: GPT-5.3 Codex, via ChatGPT OAuth

This is the part that surprised me most.

OpenClaw can authenticate to OpenAI’s models through ChatGPT account OAuth instead of an API key. For personal-volume use, that means the agent runs on my flat ChatGPT subscription rather than per-token API billing. No metered invoices, no fear of leaving a loop running overnight.

GPT-5.3 Codex is the default model for everything — the chat interface, the scheduled jobs, the tools.

The fallback: local LLM on the Mac

Ollama runs on the Mac with Qwen3 8B, exposed to the agent as an alternative model. It’s not the primary anymore — for the judgement-heavy tasks I throw at it, the hosted model is still meaningfully smarter — but having a private, zero-cost local path in reserve felt worth keeping.

I’d originally hoped to make this the primary brain. The hardware reality is that 8GB of RAM on an M1 only realistically gets you to 3B-class models at usable speeds, and those weren’t quite smart enough for what I wanted. Bumping up to an 8B as a fallback was the compromise.

The interface: WhatsApp

The agent lives in WhatsApp. You message it like a contact.

OpenClaw’s WhatsApp channel plugin handles the bridge — personal WhatsApp, not the Business API. It’s allowlist-based, so only approved numbers can actually talk to it. Setup was much less painful than I expected.

The big advantage: no new app to install, no new habit to build. The thing I already use all day became the control surface for the agent.

The tools wired in

Four of them:

CloakBrowser is the one I’d flag as non-obvious. Most flight, travel, and aggregator sites detect and block conventional scraping immediately. CloakBrowser patches Chromium at the source level so it looks like a real human browser. Without it, half the things I wanted the agent to monitor would just hit a wall.

So the shape of the thing ended up looking like this:

Architecture diagram: phone via WhatsApp to a Raspberry Pi 5 running OpenClaw as the headless gateway, to GPT-5.3 Codex as the brain via ChatGPT OAuth, to web search via CloakBrowser and Tavily. A secondary branch shows the MacBook with Ollama and Qwen3 8B as a local fallback model.

What it actually does

The jobs running right now are kind of random. They’re me poking at the edges — seeing what the agent can do, where it breaks, what’s worth pushing further. None of it is critical workflow, and that’s the point.

Two flavours of scheduled jobs.

OpenClaw cron jobs — the AI runs the task and decides what to say:

System crontab job — a deterministic Python script, no AI in the loop:

The split between the two is deliberate. Summarising RSS feeds is a judgement task — the AI is good at it. Scraping a flight site for an exact route on exact dates is a precision task — a plain script with CloakBrowser is more reliable and more predictable.

What’s deliberately not connected (yet)

The obvious next integrations would be email, Notion, and the project management tools I live in every day. None of those are wired up right now, and that’s on purpose.

This is exploration mode. I want to understand the limits, the failure modes, the rough edges — what the agent quietly gets wrong, how often, what kind of supervision it actually needs — before I hand it the keys to anything I rely on. That kind of confidence is hard to build if the first thing you do is point it at your inbox.

Email and Notion are on the roadmap. Just not yet.

Design decisions worth flagging

Where Claude Code fit in

Claude Code is the thing that actually let me get all of this wired together without stalling.

I’m a business operator, not a deep engineer. The Pi side, the systemd service, the WhatsApp bridge, the OAuth setup, the cron syntax, the Playwright script, the CloakBrowser config — every one of those was something I could have spent a week getting stuck on. Instead I talked through each problem with Claude Code and kept moving.

That’s the part of this experiment I keep coming back to. The bottleneck used to be technical depth. Now the bottleneck is mostly just deciding what I actually want the thing to do.

Where it goes next

The setup isn’t finished. I keep adding tools, tweaking what’s scheduled, refining how the agent talks to me. Some of it works. Some of it doesn’t.

The longer-term plan is to swap the Pi out for a new Mac mini once the next chip lands. Same headless role, but with enough horsepower to actually host a usable local model in one box — which would collapse the brain and the gateway onto the same machine and probably let me move off the hosted backend too.

For now though, Pi + WhatsApp + GPT-5.3 Codex on a ChatGPT subscription is the sweet spot. The whole point of the exercise was figuring out how low the floor is. Turns out the floor is pretty low.

Back to building.
— Howie

If this was useful, pass it along.