I haven't shared much about this, but in my free time I've been venturing into the self-hosted AI space. I acquired an old gaming machine with a decent graphics card from 4 years ago (RTX 4070S), put Linux on it, and spent some time getting Hermes Agent (https://hermes-agent.nousresearch.com/) running on it.

I got it running with various sparse versions of Qwen 3 and managed to cobble together a few scripts to do things like scrape news and flight data, but I kept running into timeout errors at various levels of the Hermes stack. It's really not set up to work with agents that take multiple minutes to respond, and after fixing things in a bunch of different places I got tired of it and switched it back to Claude. I did find a fork that supports Zulip, and I really love it as an interface for many long-running async conversations.

Then I decided to try some autonomous coding with local models and fell hard down the Steve Yegge beads/Gas Town/Gas City rabbit hole. I took Gas City (which is like the SDK for agent interactions, extracted from Gas Town) and got it running. I tried running the entire thing on local models only, but it wasn't working at all. I ended up with Claude as the mayor of the city, overseeing a bunch of short-lived agents that use Qwen on my GPU to try to write code and open PRs. They aren't doing a very good job yet, but the mayor and I learn and improve things a bit more every day.

I'm not a fan of the super-extractive metaphors of Gas Town, but I really like the beads db as a way of getting agents to cooperate. It's basically an issue tracker, but some issues get labeled as memories, some get labeled as mail, and some even represent agents. That creates an observable system of cooperation: an agent spins up, reads its mail, completes its task, hands off to another agent, and shuts down. I'm trying to run all these agents serially to limit GPU contention, and it somewhat works, but it goes against the system's design, which is basically a mega bonfire of tokens.

The biggest weakness, I think, is that the free models that fit in 12GB of VRAM just aren't good enough at coding. But the goal I'm working towards is getting frontier-quality code from free models on my own machine by chaining together enough hill-climbing loops (planner, coder, architect review, QA review, bounce it back to the coder, etc.). I'm also thinking a lot about what the right interface is for me to review the work; right now it just produces pull requests that I review normally.
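To make the handoff pattern concrete, here's a toy sketch of the idea. To be clear, this is not how beads or Gas City actually work, and it doesn't use their APIs: `Tracker`, `send_mail`, `run_serially`, the `to:<role>` labels and the planner/coder/reviewer stubs are all hypothetical. It models mail and agents as labeled issues in one tracker, runs exactly one agent at a time (standing in for serial GPU use), and has the reviewer bounce the first draft back to the coder to show one turn of a hill-climbing loop.

```python
from dataclasses import dataclass, field


@dataclass
class Issue:
    id: int
    title: str
    labels: set = field(default_factory=set)
    body: str = ""
    closed: bool = False


class Tracker:
    """Hypothetical in-memory stand-in for a beads-style issue tracker (not the beads API)."""

    def __init__(self):
        self.issues = []

    def create(self, title, labels, body=""):
        issue = Issue(len(self.issues) + 1, title, set(labels), body)
        self.issues.append(issue)
        return issue

    def open_with(self, label):
        # Open issues carrying the label, oldest first (creation order)
        return [i for i in self.issues if not i.closed and label in i.labels]


def send_mail(tracker, to, body):
    # Mail is just an issue labeled "mail" plus a hypothetical recipient label
    tracker.create(f"mail for {to}", {"mail", f"to:{to}"}, body)


# Hypothetical stage stubs: each role turns its mail into (next_role, message).
# In a real setup each of these would be a model call.
def planner(msg):
    return "coder", f"plan({msg})"


def coder(msg):
    return "reviewer", f"code({msg})"


def reviewer(msg):
    # Bounce the first draft back to the coder; accept the second one
    if msg.count("code(") < 2:
        return "coder", f"fix({msg})"
    return None, msg


ROLES = {"planner": planner, "coder": coder, "reviewer": reviewer}


def run_serially(tracker, task, max_steps=20):
    """Run one short-lived agent at a time so only one model holds the GPU."""
    send_mail(tracker, "planner", task)
    history, result = [], None
    for _ in range(max_steps):
        inbox = tracker.open_with("mail")
        if not inbox:
            break  # nothing left to hand off
        mail = inbox[0]
        role = next(label[3:] for label in mail.labels if label.startswith("to:"))
        agent = tracker.create(f"{role} agent", {"agent"})  # agent "spins up"
        next_role, result = ROLES[role](mail.body)          # does its task
        mail.closed = True                                   # mail has been read
        if next_role is not None:
            send_mail(tracker, next_role, result)            # hand off
        agent.closed = True                                  # agent shuts down
        history.append(role)
    return history, result


tracker = Tracker()
history, result = run_serially(tracker, "task")
print(history)  # ['planner', 'coder', 'reviewer', 'coder', 'reviewer']
print(result)   # code(fix(code(plan(task))))
```

Because everything, including the agents themselves, lives in the tracker, you can inspect the issue list afterwards to see who did what. The `max_steps` cap is a crude guard against a review loop that never converges.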

This has been my first time running --dangerously-skip-permissions agents 24/7 on my own hardware, and it feels quite cyborgian.

mplorentz on Nostr