Self-Driving Software: Solver Launches Autonomous AI Coder

What self-driving is to the automobile industry, Solver hopes to be for the programming industry. Solver is the brainchild of Mark Gabel, formerly the chief scientist at Viv (an AI assistant company founded by the creators of Siri and then sold to Samsung in 2016). Gabel claims Solver is a paradigm shift in AI-assisted coding, because it allows developers to hand off tasks for AI to build autonomously. He says it’s a clear step up from mere “AI autocomplete tools” like GitHub Copilot, Cursor and Devin.

“This idea of level 4, full self-driving software engineering — the ability to sort of throw tasks over the fence and let AI work on them autonomously — this is so clearly the future,” Gabel told me.

Solver CEO Mark Gabel demonstrating his autonomous AI system.

Ordinarily, this kind of talk from Silicon Valley can be dismissed as hype, but according to Solver, there is one key industry benchmark that will soon support its big claims. Solver says that it will shortly attain the number one spot on SWE-bench, an AI benchmark run by Princeton University “for evaluating large language models on real-world software issues collected from GitHub.” Note: At the time of publishing this article, SWE-bench did not include Solver — but we will update this post when it appears.

Update, 23 October 2024: SWE-bench has now updated (see screenshot below) and Solver is listed at numbers 2 and 4 in the “Verified” section, behind Claude Sonnet 3.5, which received an upgrade yesterday.

SWE-bench screenshot dated 23 October 2024, showing Solver at numbers 2 and 4.

There is more Siri and Viv DNA in Solver than just Gabel. His technical co-founder is Daniel Lord, a former software engineer at both Siri and Viv Labs, who is head of engineering at Solver. The pair have recently been joined by Dag Kittlaus, one of the co-founders of Siri and CEO of the company when it was sold to Apple in 2010. Kittlaus, who is executive chairman of Solver, told me he joined because “this team [Mark and Daniel] are the cream of the crop of the team that’s worked for me for the last two companies, so I was very eager to get involved when Mark invited me to do so.”

How Solver defines “level-4 programming automation”.

Solver Product and LLMs

Solver will initially be available in private beta as a web application, but the company will soon release an API that can hook into IDEs like Visual Studio Code and those from JetBrains. Solver also claims to be “relatively language-agnostic,” although some of its features are temporarily restricted to Python. “If it’s text and checked into Git, Solver can work with it,” the documentation states.

“When you train LLMs on all sorts of languages, the AI finds the connections and similarities between them,” Gabel explained. “So we try to train it [Solver] on everything, and we try to evaluate and use it on essentially everything — so it can even write stuff like COBOL.”

On the LLM side, Solver is using a mix of industry-leading models and its own proprietary model.

“We are using the latest in frontier models,” Gabel said. “So we’re certainly using, like, [OpenAI] o1, GPT-4 Pro, and [Anthropic] Sonnet, and others in that category. We also have fully proprietary models as well. So we trained one of our own foundation models entirely from scratch and created a family of proprietary models called TOTAL-HUNK.”

How Solver Works

I was given access to the private beta, where I could connect my GitHub repositories so that Solver could both train on that data and work with it. The main screen is basically a giant prompt to “describe your task.”

Solver home screen.

So, what exactly does Solver do? The company says it “not only writes code, it can run, test, improve, and iterate.” In the demo that Gabel showed me, he mostly focused on using Solver to optimize code and to fix bugs. The company refers to this as “scut work that consumes the majority of [a developer’s] time” and says that Solver will “automatically complete” such work.

The company adds that users “can also delegate entire tasks to Solver.” To demonstrate this, Gabel showed me an app his team uses called Linear, which he said was an issue tracker similar to Jira. In the demo, there were some tests in a branch that were broken. First he showed me what he would do prior to Solver: he’d cast his eye over the code, try to figure out which tests were failing and where exactly the bug was, and (as he put it) “hunt and peck” for a solution. He might use GitHub Copilot, or a similar tool, to suggest a solution. But with Solver, he continued, he simply types the following in the text box: “Alignment tests are broken on this branch. Try to fix them.”

In the demo, there was a little wait while Solver scanned the repository. “Forgive us, those few extra seconds were scanning millions of lines of code,” Gabel commented. “It was basically just absorbing the entire thing, what we call repository-based reasoning.”

Solver scanning a repository.

Gabel then explained what happened after the repository was scanned during this demo.

“In this case, what it [Solver] ended up doing was absorbing all the information relevant to the task, running the failing test, and then figuring out how to fix the failing test, fixing it for me, then running it again to kind of certify to me that it was done.”
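The loop Gabel describes (run the failing test, change the code, run the test again to confirm) can be sketched in a few lines of Python. This is only an illustration of the general pattern, not Solver's implementation, which the company has not published. Every name here (`repair_loop`, `run_tests`, `propose_fix`, `RepairResult`) is hypothetical, and the `propose_fix` stand-in is a plain string replacement where a real system would presumably call a model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RepairResult:
    fixed: bool      # True if the tests pass at the end
    attempts: int    # number of candidate fixes that were tried
    source: str      # the final version of the code


def repair_loop(
    source: str,
    run_tests: Callable[[str], bool],
    propose_fix: Callable[[str], str],
    max_attempts: int = 3,
) -> RepairResult:
    """Run the tests; while they fail, apply a candidate fix and re-run."""
    attempts = 0
    while not run_tests(source):
        if attempts >= max_attempts:
            return RepairResult(False, attempts, source)
        source = propose_fix(source)
        attempts += 1
    # The loop only exits here after a passing test run, which is the
    # "run it again to certify it is done" step.
    return RepairResult(True, attempts, source)


# Toy stand-ins (assumptions for illustration only).
def run_tests(source: str) -> bool:
    namespace: dict = {}
    exec(source, namespace)  # load the code under test
    return namespace["add"](2, 3) == 5


def propose_fix(source: str) -> str:
    # A real agent would generate this edit; here it is hard-coded.
    return source.replace("a - b", "a + b")


BROKEN = "def add(a, b):\n    return a - b\n"
result = repair_loop(BROKEN, run_tests, propose_fix)
print(result.fixed, result.attempts)
```

The attempt cap matters in a sketch like this: without it, a fixer that never produces a passing change would loop forever.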

After that, you can open a pull request and sync the automated fix to your main branch.

Of course, this was only a demo, so it remains to be seen how successfully developers in the wild will be able to get accurate solutions from Solver. But at the very least you can see how attractive this will be for developers, particularly when fixing bugs and otherwise optimizing their programs.

Solver after it has finished solving a task.

Elastic Engineering

The Solver team believes its autonomous coding product will help developers build and maintain software products at scale — what it has termed “elastic engineering.”

“The vision here is to make software engineering a scalable utility, and I believe that we’ve done that,” said Gabel. “And really the big differentiator here is that, unlike this new wave of AI autocomplete tools — like your GitHub Copilots and stuff like that — the big difference here is developers can delegate entire tasks to Solver, close their laptop, walk away, come back and have them done.”

As per the demo, it looks likely that Solver will be used mostly for “scut” tasks (“from unit testing and debugging to refactoring, performance optimization, and more,” as the company puts it). So this probably won’t be a magical genie that creates entirely new software programs for you. But then, why bother being a programmer at all if that were the case?

Originally published at The New Stack: https://thenewstack.io/self-driving-software-solver-launches-autonomous-ai-coder/