Wiring up an always-on personal AI agent

The premise was simple. Get a personal AI agent running for myself. Always on, reachable from my phone, on the cheapest stack I could make work.

OpenClaw is the platform that turned out to fit the brief. It’s @steipete’s open-source personal AI assistant. It sits between an LLM and the real world, with chat apps like WhatsApp as the interface and tools (browser, shell, web search) as the hands.

Here’s the shape of what my instance ended up as.

The hardware

Raspberry Pi 5, running Debian 13 (Trixie) ARM64. Sits next to the router. On 24/7. Costs a few cents a day in electricity.
MacBook on the same LAN, reachable from the Pi over SSH. Used as a secondary box for larger storage and a fallback LLM.

That’s the whole fleet. Both are devices I already owned.
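
The "few cents a day" figure for the Pi is simple arithmetic, sketched here. The 5 W average draw and the 0.30 per kWh tariff are illustrative assumptions, not measurements from this setup; swap in your own numbers.

```python
def daily_cost_cents(avg_watts: float, price_per_kwh: float) -> float:
    """Electricity cost of one always-on device for 24 hours, in cents."""
    kwh_per_day = avg_watts * 24 / 1000  # watts -> kWh over one day
    return kwh_per_day * price_per_kwh * 100


# Assumed inputs (not measured): 5 W average draw, 0.30 per kWh.
print(round(daily_cost_cents(5, 0.30), 1))  # 3.6
```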

The platform

OpenClaw runs as a systemd user service on the Pi, so it comes up automatically on boot and stays running. Installed via npm, configured through a single openclaw.json file. That’s it. No Docker, no Kubernetes, no managed anything.

The brain: GPT-5.3 Codex, via ChatGPT OAuth

This is the part that surprised me most.

OpenClaw can authenticate to OpenAI’s models through ChatGPT account OAuth instead of an API key. For personal-volume use, that means the agent runs on my flat ChatGPT subscription rather than per-token API billing. No metered invoices, no fear of leaving a loop running overnight.

GPT-5.3 Codex is the default model for everything: the chat interface, the scheduled jobs, the tools.

The fallback: local LLM on the Mac

Ollama runs on the Mac with Qwen3 8B, exposed to the agent as an alternative model. It’s not the primary anymore. For the judgement-heavy tasks I throw at it, the hosted model is still meaningfully smarter, but having a private, zero-cost local path in reserve felt worth keeping.

I had originally hoped to make the local model the primary brain. The hardware reality is that 8GB of RAM on an M1 realistically only gets you 3B-class models at usable speeds, and those weren’t quite smart enough for what I wanted. Running an 8B purely as a fallback, where speed matters less, was the compromise.
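
The primary-then-fallback idea can be sketched in a few lines. This is not how OpenClaw wires its models internally, only an illustration of the pattern. It uses Ollama's `/api/generate` endpoint on its default port 11434; the hostname `macbook.local`, the helper names, and the assumption that Ollama has been configured to accept connections from the LAN are all hypothetical.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical hostname for the Mac; 11434 is Ollama's default port.
OLLAMA_URL = "http://macbook.local:11434/api/generate"


def ask_ollama(prompt: str, model: str = "qwen3:8b", timeout: float = 120.0) -> str:
    """Send one non-streaming prompt to Ollama and return the generated text."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["response"]


def ask_with_fallback(
    prompt: str,
    primary: Callable[[str], str],
    fallback: Callable[[str], str] = ask_ollama,
) -> str:
    """Try the hosted model first; use the local model only if that call fails."""
    try:
        return primary(prompt)
    except Exception:
        return fallback(prompt)
```

Passing the two models in as plain callables keeps the fallback logic independent of whichever client talks to the hosted model.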

The interface: WhatsApp

The agent lives in WhatsApp. You message it like a contact.

OpenClaw’s WhatsApp channel plugin handles the bridge. Personal WhatsApp, not the Business API. It’s allowlist-based, so only approved numbers can actually talk to it. Setup was much less painful than I expected.
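
To make "allowlist-based" concrete, here is a minimal sketch of the idea. It is not OpenClaw's actual implementation or config format; the function names and the example number are made up for illustration.

```python
import re

# Hypothetical allowlist; the number is a placeholder, not a real contact.
ALLOWED_NUMBERS = {"+15550100199"}


def normalize(number: str) -> str:
    """Strip spaces, dashes and brackets so equivalent numbers compare equal."""
    return "+" + re.sub(r"\D", "", number)


def is_allowed(sender: str, allowlist: set[str] = ALLOWED_NUMBERS) -> bool:
    """Return True only if the sender's number is on the allowlist."""
    return normalize(sender) in {normalize(n) for n in allowlist}
```

Anything not on the list is simply ignored, so the default for an unknown sender is no access.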

The big advantage: no new app to install, no new habit to build. The thing I already use all day became the control surface for the agent.

The tools wired in

Four of them:

web_fetch pulls any URL: RSS feeds, public APIs, web pages.
web_search does live search via Tavily (free tier is enough at personal volume).
exec runs shell commands on the Pi.
CloakBrowser is a stealth Chromium build that bypasses bot detection on JS-heavy sites that normal scrapers can’t touch.

CloakBrowser is the one I would flag as non-obvious. Most flight, travel, and aggregator sites detect and block conventional scraping immediately. CloakBrowser patches Chromium at the source level so it looks like a real human browser. Without it, half the things I wanted the agent to monitor would just hit a wall.

So the shape of the thing ended up looking like this:

[Architecture diagram: phone via WhatsApp, to a Raspberry Pi 5 running OpenClaw as the headless gateway, to GPT-5.3 Codex as the brain via ChatGPT OAuth, to web search via CloakBrowser and Tavily. A secondary branch shows the MacBook with Ollama and Qwen3 8B as a local fallback model.]

What it actually does

The jobs running right now are kind of random. They’re me poking at the edges, seeing what the agent can do, where it breaks, what’s worth pushing further. None of it is critical workflow, and that’s the point.

Two flavours of scheduled jobs.

OpenClaw cron jobs, where the AI runs the task and decides what to say:

Daily news briefing pulls a long list of RSS feeds covering my work domain, summarises the last 24 hours into something readable, and pings the result to WhatsApp every morning.
Lottery monitor checks the next prize pool and tells me to buy a ticket if it’s above a threshold worth a bet.
Market scan hits a public jobs API a few times a week to surface where roles in my space are moving: which functions are hiring, what skills (especially AI-adjacent) are showing up in the postings, where the market is drifting.
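
For the news briefing, the "last 24 hours" step is the only mechanical part; the summarising is left to the model. A minimal sketch of that filter, assuming plain RSS 2.0 feeds with a `pubDate` on each item (Atom feeds would need different tag names). The function name is made up for illustration and this is not the agent's own code.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def recent_items(rss_xml: str, now: datetime,
                 window: timedelta = timedelta(hours=24)) -> list[tuple[str, str]]:
    """Return (title, link) for RSS items published within `window` before `now`.

    `now` must be timezone-aware, e.g. datetime.now(timezone.utc).
    """
    recent = []
    for item in ET.fromstring(rss_xml).iter("item"):
        raw_date = item.findtext("pubDate")
        if not raw_date:
            continue  # no date, so the item's age is unknown
        try:
            published = parsedate_to_datetime(raw_date)
        except (TypeError, ValueError):
            continue  # unparseable date, skip rather than guess
        if published.tzinfo is None:
            published = published.replace(tzinfo=timezone.utc)  # assume UTC
        if timedelta(0) <= now - published <= window:
            recent.append((item.findtext("title", default=""),
                           item.findtext("link", default="")))
    return recent
```

Filtering before the model sees anything keeps the prompt short and means the summary cannot drift into older stories.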

System crontab job, a deterministic Python script with no AI in the loop:

Flight price monitor uses Playwright + CloakBrowser to scrape a flight aggregator for a specific trip I’m planning and pings WhatsApp if the total drops below a target.

The split between the two is deliberate. Summarising RSS feeds is a judgement task, and the AI is good at it. Scraping a flight site for an exact route on exact dates is a precision task, and a plain script with CloakBrowser is more reliable and more predictable.
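
The deterministic half of the flight monitor comes down to "parse a number, compare it to a target". A minimal sketch of that decision step, leaving out the Playwright + CloakBrowser scraping entirely. The function names are hypothetical, and the parser assumes US-style formatting such as "$1,234.50".

```python
import re
from decimal import Decimal


def parse_price(text: str) -> Decimal:
    """Extract the first number from scraped text like 'Total: $1,234.50'."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no price found in {text!r}")
    return Decimal(match.group().replace(",", ""))


def should_alert(price_text: str, target: Decimal) -> bool:
    """True only when the scraped total is strictly below the target."""
    return parse_price(price_text) < target
```

Raising on unparseable text, instead of returning a default, means a changed page layout shows up as a failed run and not as a silent missed alert.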

What I haven’t let it touch yet

The obvious next step is to plug it into the things I actually live in. My email, Notion, the work tools I’m in all day. I haven’t done any of that, and I keep catching myself before I do.

The honest reason is that I don’t trust it enough yet. I still don’t really know where it slips up, or how much I would need to watch over it. Right now when it gets something wrong, the worst case is a slightly off news summary, or a lottery reminder I didn’t need. Nobody gets hurt. The day it’s sitting in my inbox, a small mistake suddenly costs me something real.

So I’m letting it earn that slowly. I would rather live with it a while longer and get a feel for its habits before I hand it anything that matters. Email and Notion will probably happen at some point. Just not while it can still surprise me.

Looking back at what made it work

A few things made this work better than I expected, and they weren’t the ones I thought would matter.

Keeping it on the Pi instead of a cloud server was the first. I had assumed I would need to rent a little server somewhere, but the whole thing sits happily on a box next to my router for next to nothing. No monthly bill hanging over it, which matters a lot when it’s still just a thing I’m playing with.

Talking to it through WhatsApp was another big one. I didn’t have to build an app or talk anyone into installing something new. It lives in the chat app I already have open all day, and honestly that is the only reason I actually use it.

The thing I didn’t expect to matter so much was splitting the jobs into two kinds. When the task needs judgement, like summarising the news, I let the AI run it. When it needs to be exact, like checking a flight price, I use a plain script instead. Working out which job is which saved me more grief than any clever model ever would have.

And then there is the cost. Running it off my flat ChatGPT subscription instead of paying per token means I never watch the meter. I can leave jobs running overnight without bracing for a bill. That one thing is the difference between treating it like something fragile and actually letting it loose.

Where Claude Code fit in

Claude Code is the thing that actually let me get all of this wired together without stalling.

I’m a business operator, not a deep engineer. The Pi side, the systemd service, the WhatsApp bridge, the OAuth setup, the cron syntax, the Playwright script, the CloakBrowser config: every one of those was something I could have spent a week getting stuck on. Instead I talked through each problem with Claude Code and kept moving.

That’s the part of this experiment I keep coming back to. The bottleneck used to be technical depth. Now the bottleneck is mostly just deciding what I actually want the thing to do.

Where it goes next

The setup isn’t finished. I keep adding tools, tweaking what’s scheduled, refining how the agent talks to me. Some of it works. Some of it doesn’t.

The longer-term plan is to swap the Pi out for a new Mac mini once the next chip lands. Same headless role, but with enough horsepower to actually host a usable local model in one box, which would collapse the brain and the gateway onto the same machine and probably let me move off the hosted backend too.

For now though, Pi + WhatsApp + GPT-5.3 Codex on a ChatGPT subscription is the sweet spot. The whole point of the exercise was figuring out how low the floor is. Turns out the floor is pretty low.

— Howie