
The Many Ends of Programming - Ray Myers


Transcript

Hi, I'm Ray Myers. I'm currently Chief Architect at All Hands AI, makers of the leading open source coding agent called Open Hands. But I'd like to talk about something different today. And I'm proud to be presenting at the online track for AI Engineer World's Fair. But I feel like I've actually snuck in the back door today because this is not an AI talk.

Perhaps this is not even a programming talk. This is a talk about empathy. This is a talk about listening to each other. But if those things are uncomfortable for you, don't worry, because we will have the pleasant, comforting backdrop of AI and programming. For starters, let me take you through a day in the life of an AI skeptic, which is the role I so often find myself in.

What happens is someone will say something provocative in public, like you may have seen the CEO of Anthropic, Dario Amodei, say a few months ago in a Council on Foreign Relations interview. There were many interesting things in that interview, but a quote that got shared around a lot was: in 12 months, we may be in a world where AI is writing essentially all the code.

Now, he was speaking to a general audience. As software engineers, we hear the part about writing the code a little bit differently, because we understand that the job contains, you know, other factors. But regardless, I posted a friendly challenge in response to that, to do with, like, could you replace the software in one mainframe?

Can we kill even one mainframe? How difficult is that right now with these AI tools that are supposedly soon to write essentially all the code, right? And the specifics of my challenge are not really that relevant right now.

Just over the past two years, I have repeatedly said different forms of this: LLMs, large language models, break old code. I say that a lot because I think it is ignored. The importance of maintaining old code and keeping it alive is already ignored, and the extent to which these AI tools perform much better at writing new code than at editing code that already exists, I feel, is doubly ignored as a result.

I feel like I raise pretty basic questions and point out pretty obvious limitations a lot of the time and get somewhat extreme reactions, honestly. You know, I've been called a Luddite. I've been told I have my head in the sand. I've been told I'm missing the big picture. In the case where I posted that challenge, actually, the full quote was, I was completely missing the big picture so much that it physically hurts to read my post.

And, you know, if my posts have hurt you, I'm sorry. Honestly, it doesn't feel good to be talked to in any of these ways, right? So you may have had to feel that way at some point as well. Or maybe you've been told this. You've been told that you'll be left behind.

I struggle with this one, honestly. I feel that left behind is verbiage better suited for, you know, some post-apocalyptic religious prophecy in the form of a B-movie franchise starring Kirk Cameron than for some sort of nuanced technical discussion. I heard this one recently from someone who was saying it with a straight face, someone whose work I respect, which, you know, goes back a long time.

They said resistance is futile. Again, I cannot comprehend what would make someone want to say things like this. For all the money in the bank, do you recall what that quote is from? Was it from the hero of that story? No. It's a quote from the Borg, from Star Trek, one of the most notorious villains in the entire science fiction genre.

If we find ourselves quoting the Borg in earnest, maybe we should reassess what side we're on. When Picard was captured and being mind-controlled by the AI, he said resistance is futile as Locutus. But the real Picard would never say resistance is futile. Now, Picard would say if you're on the side of truth, you should resist to the last breath.

He embodied that again and again. But we need to breathe. I'm responding to emotionally charged rhetoric and now I've started spewing out my own emotionally charged rhetoric in response. This is not helping me listen to these people. They mean well. You know, I've chosen to engage. I've made the decision to engage in these conversations and I need to be able to try to do that productively.

I need to be able to hear people out. I think we need to back up and decide what are we even talking about. It is something very important. It makes sense that a lot of us feel strongly about it, which then leads to us being in conflict with each other.

Because we're pondering, what is the future of software? What is it going to look like next? What can we make it? I mean, that is deeply interesting. If there's one thing I am grateful to AI for, it's probably, even more than the technology, the opportunity for us all to have been thrown into this one conversation about the future of the craft.

It's a very difficult conversation to be in of late. And I'm going to try to make it a little easier with this. But I think it is an important one worth having. I've identified six scenarios that seem to be embedded in the views people have been putting out over the last few years.

And I'd like to share them with you for the remainder of this talk. Let's get started. These are Extreme Completion, the Devocalypse, the Abstraction Leap, Uncharted Waters, the Review Economy, and the Infinite Pile of Garbage. We will discuss each of these briefly. Now, Extreme Completion is probably the most conservative of these views because it is already happening.

We can just see it happen. There's no real doubt about whether this one will happen. It's just a matter of, you know, how much and with what impact. And that is simply that the autocomplete-style editors like Cursor, like GitHub Copilot, are just going to continue to do more of the typing for us and be a great convenience, right?

So in that scenario, in Extreme Completion, our job doesn't fundamentally change. Even as they progress to these agents that can take a few more steps, you can still have, you know, fairly extreme completion with it still needing to be on an engineer's leash most of the time. Such that you could argue our role is not changing a great deal.

So pretty much everyone agrees this at least is happening and is a somewhat significant shift, right? This is what this looks like. If you're at this conference, you've surely seen it. This is an example of using the Cursor IDE, which is a fork of VS Code with a lot of AI features built in.

So there's a function called clean' (clean prime) to remove trailing white space. And I've prompted it here in this little pop-up thing that happens when I hit Ctrl+K: make the function clean'' (clean prime prime) to remove both leading and trailing white space. And sort of based on that example, it's going to make another Haskell function pop out for me.

There it is. This is pretty cool. By the way, the reason I happen to pick the programming language Haskell for this example is that it has a very strong type system. I think that type theory is a very promising counterbalance to add more certainty into the flow when we have the uncertainty of LLMs doing code gen.
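For a sense of what that demo involves, here is a minimal sketch of what the two Haskell functions might look like; the names and bodies are reconstructed from the description above, not copied from the actual code on screen.

```haskell
import Data.Char (isSpace)

-- The existing function from the demo: remove trailing whitespace.
clean' :: String -> String
clean' = reverse . dropWhile isSpace . reverse

-- What the Ctrl+K prompt asks for: remove leading and trailing whitespace.
clean'' :: String -> String
clean'' = dropWhile isSpace . clean'

main :: IO ()
main = print (clean'' "  hello  ")  -- "hello"
```

The strong types are part of the appeal here: if the model produces something that doesn't fit the `String -> String` signature, the compiler rejects it before it ever reaches review.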

Regardless, here's another example that is maybe a little more where we think things are going, where I have delegated an entire task via Slack. So I'm just in my work chat here. And this is a new experimental way of interacting with the OpenHands agent. I've said, hey, there's this pull request here where I've added this thing to a log statement, and probably some other things ought to have that too.

Could you just poke around and make a PR that's adding that where it ought to go, right? And a few minutes later, boom, back comes this 48-file pull request. Like, this is really cool. If you had shown this to me a couple of years ago, I would have thought it was unreal that we have this.

Nonetheless, it is a fairly discrete, you know, task. Not a lot of, quote unquote, thinking needed to be done here. But there's a lot of grunt work that's been taken up by being able to do stuff like this. This sort of, you know, all-purpose tech-debt dirt shoveler.

Really neat. Still, I file it under extreme completion. It is on a very short leash. A lot of expertise still involved. Much more extreme is the scenario of the devocalypse, the developer apocalypse. At least that's what it is to us as software developers. If you're not a software developer, if you are dependent on software developers, actually, they don't see this as an apocalypse at all, right?

They see this as the innovator's paradise scenario, because they're no longer dependent on us to bring their ideas to fruition. It sounds very nice from their point of view. And I'll say this: even though we benefit from this not happening at the moment, if it actually is possible, if we can deliver it, I think it is desirable.

Like, we should do that. I think really that would be programming fulfilling its destiny. It would be finishing the project of computer science. You know, the objection to it wouldn't be this isn't desirable. I think the objection would be it's not feasible to do. But, of course, because of these different incentives, someone who sees us objecting to it can always say, hey, you just want to keep your job.

And, like, I do want my friends to keep our jobs, right? Of course, I want that. But I just don't think that's why I'm saying the things that I'm saying about how some of these solutions look unsustainable. You know, I think we have real expertise that these critiques come from.

So if you're going to predict the devocalypse, you still must say how we are going to get there, right? And abstraction leap is one of those ideas. It has a few flavors I'll talk about. But basically, if you're a believer in abstraction leap, you think that what we currently think of as code will no longer be the kind of level of abstraction, the substrate in which we do our main work.

Doing something like Java code or Rust code today, you know, that will eventually be in the position of assembly language or JVM bytecode or LLVM bitcode or something to that effect. Only highly specialized people would need to do that. Most of us can just live up here doing something more pleasant and productive, whatever that may be.

But internally, it's ultimately code-like as we currently understand it, right? So how does that happen? One way people say this will happen is with the prompts as code flavor of the scenario, as I call it. So natural language instead of source code becomes the main human-facing artifact that we manipulate.

So how does that scale? Well, maybe it gets some kind of structure, right? Maybe there are a bunch of little requirements inside folders that are interconnected. I don't know. It is structured somehow. It is tested somehow. Now, the objection to this, other than maybe, you know, vagueness, would be predictability.

There is serious reason to doubt that LLM prompts constitute an abstraction, at least as far as something that you're able to really build on because of how unpredictable it is. Perhaps in order to build really large, long-lived projects, this just is not a sturdy enough foundation. It's not a clean abstraction.

You could say that it will become one. We have to see that happen. Or, you know, some people believe that it's possible to make up for that unpredictability with some sort of control. Here's an attempt to do this, right? This is actually from a few years ago now, the Parsel paper.

And you can go to this GitHub repo, if you like, or you can read the paper. And this is an example, for instance, where they've got 61 lines of just these text prompts and example input and output of all these different functions. And from these prompts and examples, it generates a 220-line Python program that functions as a Lisp interpreter, right?

So they've, using these structured prompts, generated an entire Lisp interpreter. Like, that's pretty neat, right? And yet, you know, we largely don't believe structured prompts as code as a way to build real applications is a solved problem still. So, you know, when you try to operationalize this, are we going to be able to make this into a real product that is better than its alternatives?

I think that's very much still up in the air. Another flavor of abstraction leap, which I personally think is a little more promising, is domain-specific languages, or DSLs. Now, if you believe you're unfamiliar with DSLs, you actually probably are familiar with them, because in programming there are many of them in common use, right?

So, CSS, right, that you use to style web pages, is an example, or SQL queries, or regular expressions, right? And these are domain-specific languages for particular programming tasks. And then you also have ones that are made for particular business domains; even things like Excel, you know, arguably are a domain-specific language.

So, these are very prolific, very successful oftentimes. And what you do if you're operationalizing this, you know, in an enterprise scenario, is you ultimately are investing in creating a particular specialized programming environment that, you know, optimizes for the kinds of thoughts you usually need to express in your business domain, right?

And that is this upfront investment that can yield great reliability, great quality, great productivity, you know, when it works well. And it can also backfire, of course. And over the last 20 years, the kind of risk-to-reward trade-off has been steadily improving with lots of tooling that's made it easier to create these specialized environments, such as language workbenches, right?

So, examples of that are JetBrains MPS, right? One called LionWeb, one called Xtext. There's one you could call a language workbench, though they would say it is a language-oriented programming environment, called Racket, which is a very interesting Lisp dialect. And there is every reason to believe that language models in various ways would even further improve the cost, you know, the risk-reward trade-off of creating these specialized environments.

Can they help us generate, you know, some of the code to process these DSLs? Or can they help on the side of the editor giving the, you know, business domain users suggestions to allow them to more quickly adapt to these languages? So, I think this is a very promising area.
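To make the DSL idea a bit more concrete, here is a minimal sketch of an embedded DSL in Haskell for a made-up business domain; all of the names are hypothetical and this is just an illustration of the idea, not anything from the talk's examples. A language workbench like MPS goes much further, but the core move is the same: the domain vocabulary becomes the abstraction, and general-purpose code becomes the plumbing underneath it.

```haskell
-- A tiny pricing DSL for a hypothetical business domain.
data Price
  = Flat Double              -- a fixed amount
  | PerUnit Double Int       -- a rate multiplied by a unit count
  | Discount Double Price    -- a percentage off another price
  deriving Show

-- The interpreter: reduce a Price expression to a number.
evalPrice :: Price -> Double
evalPrice (Flat amount)        = amount
evalPrice (PerUnit rate units) = rate * fromIntegral units
evalPrice (Discount pct inner) = evalPrice inner * (1 - pct / 100)

main :: IO ()
main = print (evalPrice (Discount 10 (PerUnit 4.5 20)))  -- 81.0
```

A domain expert can read and write expressions like `Discount 10 (PerUnit 4.5 20)` without caring how the interpreter works, which is exactly the kind of leverage the abstraction leap scenario is betting on.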

I've actually declared it the year of DSLs on my YouTube channel, Craft vs. Cruft. On the right, you can see my announcement of that. It's a recent video if you want to dive a little deeper. Or, better yet, you could watch this talk called Empowerment of Subject Matter Experts by DSL aficionado Markus Völter.

But again, an abstraction leap scenario. Ultimately, what's underneath that abstraction is still fundamentally code-like. It is still basically following the rules as we understand them today. Next, we have uncharted waters. Many people argue that the future, the foundation, will not even be code-like. It will be like nothing we have seen before.

So, what does that mean? Some say that it will be direct model inference. So, you will be not just using LLMs in code or using LLMs to code, but LLMs will be like the processors themselves. That model inference will simply be the new computation. Or maybe the AI becomes super intelligent and it invents a new programming paradigm we can't even conceive of.

I think uncharted waters are certainly possible, but in order to plan for them, in order to take these possibilities seriously, we need to chart them. We need to really see what works and what really scales. And I don't think we've seen anything like this really prove out yet. Getting back to the mundane, we have the next-to-last scenario, the review economy.

So, this one is created, for instance, by the extreme completion. It has become very cheap to create all these pull requests, like the one I generated, you know, in that previous slide. And we're still stuck checking their output, because they're not good enough to just completely approve.

So, ultimately, we are just reading pull requests from these AIs slinging them at us, you know. Many people find this to be kind of a dismal scenario, like the least fun part of the job has just become our whole job. Kind of depressing. I don't see this so much as an endgame, but maybe it will be a pit stop.

There are certainly companies that have already seen, you know, that this is some kind of reality. But I think it's a pit stop along the way to something better. I think it is a sign that you're not managing your bottlenecks well. Pro tip: there's a body of knowledge called the Theory of Constraints, introduced by Eliyahu Goldratt, starting with the novel The Goal.

But there's been a lot of work since then. The thinking processes in the Theory of Constraints are really helpful whenever you're in some situation that looks like this, right? Where everything's held back by this one choke point, like in this case, manual developer review. Examples of what, you know, improving that can look like are shifting further left.

Like I mentioned with type theory, like any number of things, how can we reduce the error rate such that we're less bound by needing to review these? Or maybe it's just a matter of prioritization. Maybe we need to pick and choose which of these things we're even going to try to review.

And, you know, by shipping a third of those items, we get 90% of the value, because we're doing a good job of picking the right ones, the ones that really are going to give us something. The last scenario, and the most dismal, is the infinite pile of garbage. In this scenario, coding assistants made us feel more productive, but ultimately just exploded tech debt and dug us into a hole that even the AI could not dig us out of.

The quality of our products gets worse over time instead of better. Ultimately, it's a world of hurt. We hope that this doesn't happen, of course, and there's reason to think it may already be starting to happen. For example, a group called Uplevel did some investigation, a controlled trial, and they found that developers who had access to a coding assistant were putting out a significantly higher bug rate while not even having better throughput on their issues.

That's pretty dismal. GitClear also has a white paper, and they have a number of interesting things in it, one of which is that this past year, 2024, was the first year they've seen where the percentage of code that was copy-pasted exceeded that which was moved in a refactoring.

If you know your way around code quality, that's a big red flag. Lots of copy and paste creates a lot of risk. Some people could argue that with these AI tools, the practices, our intuition for what is a good thing or not may need to change, but that needs to be borne out.

You need to prove that. Unsurprisingly, GitHub actually had the opposite finding. They did not find that the Copilot tool they sell decreases quality. They found that it increases it. Now, they were doing a controlled experiment involving a fixed task. It was not, like these other examples, you know, an example on real work code, and there's any number of factors that could impact this.

So they have a blog post, "Does GitHub Copilot improve code quality?" Admittedly, much better graphic design. They say, yes, it does. So what do we make of this? I mean, these results are obviously, you know, contradictory and ambiguous, but, you know, even more importantly, they're all from sources that have something to sell.

You know, I have seen some more rigorous academic work start to come together. I'd love next year to be talking about not some white papers and a blog, but, you know, maybe a meta-analysis of multiple different, you know, independent academic studies of what results we have in the wild with AI coding assistants.

Will we be able to do that? I mean, I hope so. But one way or another, you know, do these make things better under certain circumstances and worse under other circumstances? Does it matter how we use them? Does it matter what we use them on? These are all things we need to understand so that we know we're making the situation better and not worse.

People will say the models will get better, but the products we use them to build will get better only if we make them better. We need to be deliberate. Again, these are the scenarios. Extreme completion, devocalypse, abstraction leap, uncharted waters, review economy, and infinite pile of garbage. Now, these interrelate in various ways, right?

So, for example, many fear, as I just mentioned, that by doing extreme completion we will try to get to the devocalypse, but actually we will overinvest in these tools before we're capable of dealing with their results, and we'll slide right past the devocalypse and into the infinite pile of garbage. Some think, and this is maybe the closest to my point of view, that a really promising area would be to combine the abstraction leap, maybe with domain-specific languages, with the extreme completion.

We have a very nice DSL or, say, a formal-methods-based, you know, specification system, and then we combine that with the extreme completion to, you know, help us navigate it ergonomically. So, this could work very well. There are these multiple different endgames, and if someone believes in a different one than you, you might both be right, because many of them are going to play out.

The industry is vast. This is going to impact different areas very differently. We wouldn't expect the AI coding impact on, like, video game programming to be the same as in healthcare tech, right? And lastly, I want to leave us with this. We get a say. People have spoken about AI coding as though it is some meteor from outside the solar system just coming at us to hit us, and we're these passive observers.

We get a say on what happens. This is something that we are actively building together. So, I think we need to ask ourselves, what do we want from software? What is the goal? Do we want there to be no programmers? Or do we want everyone to be a programmer?

I don't think those are the same thing. Is it somewhere in between? Where? Do we want software to be better of higher quality? Or do we just need more software and we don't care about the quality? Again, not the same thing. As the skills required to do our jobs change, what do we want to happen?

Do we think it's a good idea to just let people go who have diligently learned what yesterday was the thing we needed? Or are we going to figure out how to continue to develop a valuable relationship with those people? I want many things from software. I want it to work well.

I want it to provide value for the people who use it and for the people who build it. And I want people who work together to treat each other well. Well, thank you very much for having me, and I look forward to your questions.