$ cat rpa-was-right-all-along.md
Date: 2025-11-30
I’ve been trying to use the Comet agent almost every day.
Partly because it’s fun, partly because I really want the “personal AI assistant” dream to be real. I open the sidebar, enter a simple prompt, and watch it start clicking around while I’m off browsing something else.
And every week there’s a new twist on the same idea: "Meet your AI coworker for ___."
Coworker for finance. Coworker for ops. Coworker for your inbox. A coworker that "lives in your browser" and “just does the work.”
The pitch: an AI "employee" that works 24/7, adapts to your tools, remembers your context, and costs less than an intern.
It’s the enterprise dream. If that were real, every ops and finance team would already have one.
They don’t. And I don’t think the reason is just “the models aren’t good enough yet.” Even if you gave me a model with 10x better accuracy and a 10x larger context window tomorrow, I still wouldn’t hand it my bank login or payroll system and say “go wild.”
The problems I keep running into are structural.
That’s why the winning design won’t look like a computer-use agent. It’ll look like a compiler.
I’m building one version of that idea (Granite), so assume some bias. But even if you ignore the product entirely, the architecture is what matters.
When people say “AI coworker,” it's usually some variant of this loop:
```python
while not done:
    screen = take_screenshot(state)
    next_action = llm.decide_next_action(screen)
    state = desktop.do_action(next_action)
```
It’s impressive in isolation.
But then you run it a few more times, and no two runs take quite the same path.
For an enterprise, that’s a terrible combo: unpredictable behavior at an unpredictable cost.
You can add layers of protection with guardrails, approval flows, and manual reviews, but at that point, it's not much of a coworker.
What most teams actually want sounds more like:
"Please just do this boring task the same way every time. If you can't do it, then I'll take care of it."
This brings me to the pattern I keep running into.
Back-office work almost everywhere follows the same split: roughly 95% routine, 5% exceptions.
For example, in accounting, 95% of invoice processing is boring: fields line up, the customer is known, numbers match, and dates are fine. The other 5% is where a PO doesn’t exist, or the customer is on credit hold.
Humans earn their keep on that 5%.
The current “AI coworker” inverts this. We use a non-deterministic loop for the 95% that could be a program, and then ask the same loop to improvise on the messy 5%.
Assume your agent is 99.9% reliable per action (an extremely generous estimate). A realistic business workflow can easily involve 200 distinct actions (a very conservative estimate).
The probability that an entire run is perfect: 0.999^200 ≈ 0.82.
So even with 99.9% reliable steps, about 18% of runs fail somewhere.
Push it to 99.99% per step: 0.9999^200 ≈ 0.98.
Now “only” 2% of runs fail. That still means 20,000 broken cases per million. And that assumes independent errors; real runs often have cascading failures.
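If you want to sanity-check those numbers yourself, here’s a minimal sketch, assuming every step succeeds independently with the same probability:

```python
# Chance that an n-step run completes with zero errors,
# assuming n independent steps that each succeed with probability p.
def run_success_rate(p: float, n: int = 200) -> float:
    return p ** n

for p in (0.999, 0.9999):
    ok = run_success_rate(p)
    print(f"p={p}: {ok:.1%} of runs perfect, {1 - ok:.1%} break somewhere")
```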
That’s not “good enough.” We’ve chosen the least predictable, most expensive mechanism for the part of the work that is most amenable to being turned into a program.
We already have a thing that clicks through UIs in deterministic ways: RPA (UiPath, BluePrism, etc.).
It has automated huge chunks of enterprise work for years:
RPA bots log into SAP, Oracle, legacy web apps, and other enterprise tools. They click the same buttons in the same order every night, and the world mostly keeps working.
Engineers hate them for good reasons. RPA is painful to build and maintain. From what I’ve seen (and done myself), it takes weeks or months of building and testing to have a complete workflow. It often requires expensive consultants who deeply understand both the business and the RPA software.
So most companies end up with a short head of workflows that were worth scripting and a long tail that never gets automated.
The short head exists because determinism + predictable cost matter. The long tail exists because creating that determinism is too expensive.
The new “AI coworker” pitch basically says:
“Skip the scripting. Just let a smart agent drive the UI live.”
I think the more interesting direction is:
“Keep the determinism. Make creating it feel like a demo and not a project.”
That’s where the “compiler” mental model comes in.
Let's redraw the workflow.
Agents as co-workers today: a prompt goes in, and the agent drives the UI live, re-deciding every step on every run.
But in my head, the ideal pipeline looks more like this: you demonstrate or describe the task once, an LLM compiles it into a deterministic workflow, that workflow runs unattended, and only failures escalate to a scoped agent.
This allows the deterministic workflows to own the 95% and AI agents to carry the unexpected 5%.
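Concretely, the control flow could look something like the sketch below. Treat the names as hypothetical (`WorkflowError`, `repair_agent`, and the `workflow` methods are illustrative, not a real API):

```python
class WorkflowError(RuntimeError):
    """A compiled step failed; carries enough context to diagnose."""

def run(workflow, repair_agent, inputs, max_repairs=1):
    # Deterministic path first; an agent only ever touches failures.
    for attempt in range(max_repairs + 1):
        try:
            return workflow.execute(inputs)        # the boring 95%
        except WorkflowError as err:               # loud, structured failure
            if attempt == max_repairs:
                raise                              # hand it back to a human
            patch = repair_agent.diagnose(workflow, err)  # the messy 5%
            workflow = workflow.apply(patch)       # optionally gated by review
```

The point of the structure: the agent never drives production directly. It only produces patches, and those patches can be reviewed before they ever run.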
Once you view agentic coworkers as compilers, the enterprise story gets more reasonable. You can now version workflows, test them before they touch production, assert invariants, and audit every run.
This is where our compiler model really earns its keep.
Let’s look at another fairly simple task: automating invoice approvals.
You want the invariant:
“If the amount changes after the user approves, abort.”
As code, this would be boring:
```python
def approve_invoice(driver, invoice_id):
    driver.open_invoice(invoice_id)
    before = driver.read_amount()        # snapshot the amount before acting
    driver.click("#approve-button")
    driver.wait_for(".toast.approved")   # block until the UI confirms
    after = driver.read_amount()         # re-read once the approval lands
    if after != before:
        raise RuntimeError(
            f"Amount changed during approval for {invoice_id}: {before} -> {after}"
        )
```
This is trivial. If the invariant breaks, the workflow stops.
But with an end-to-end agent, you’re relying on the model to remember the before-amount, re-check it after clicking, notice the mismatch, and decide to abort, all on its own.
You can prompt for that, but you can’t assert it with the same confidence, and that’s the gap the compiler model closes.
I’m currently building Granite around this compiler mental model.
Right now, it works roughly like this:
You describe a manual computer task or business process in plain language. Or you sit down at a VM and do it once or twice the way you normally would inside the tool you want to automate. Granite records what happens and uses an LLM to learn how that software is being used.
Granite then creates a deterministic automation that you can parameterize and trigger via API.
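To give a sense of what "parameterize and trigger via API" might look like in practice, here’s an illustrative call. The endpoint and payload are hypothetical, purely to show the shape of the idea, not Granite’s actual API:

```python
import requests

# Hypothetical endpoint and payload for triggering a compiled workflow.
resp = requests.post(
    "https://api.example.com/workflows/invoice-approval/runs",
    json={"invoice_id": "INV-1042"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # same inputs, same clicks, same result on every run
```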
When a compiled workflow breaks, it fails loudly and falls back to a tightly scoped agent whose only job is to diagnose, propose a change, and patch the workflow. You can review those changes, gate them, or let them flow automatically.
We’re also building a memory store on top of this, so our agents can learn over time which workflows tend to fail, which repairs worked before, and which inputs correlate with which edge cases. That memory lives outside the model and gradually improves how new workflows are compiled and how failures are handled.
The long-term goal is still something that feels like a coworker. Just not one that roams freely on a desktop.
More like an engineer on the team: it watches you do the task once, writes the automation, and only steps back in when something breaks.
The “AI coworker” in that world compiles workflows, monitors runs, and proposes repairs; it doesn’t improvise live on your desktop.
If that works, the “coworker” part shows up as orchestration: deterministic workflows as the "hands" that execute tasks, plus suggestions for improving the workflows themselves.
It should be a diligent engineer who writes and tests automation but refuses to be clever in production.
And that, personally, is the coworker I'd be willing to give my bank login to.
If you're building something similar or have feedback or opinions, I'd love to connect! Reach out to me on X or LinkedIn.