Microsoft is dogfooding AI dev tools’ future
👋 Hi, this is Gergely with a subscriber-only issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers.

Impressions from a week in Seattle, at Microsoft’s annual developer conference. Microsoft is eating its own dogfood with Copilot – and it’s not tasty
Last week, I was in Seattle and stopped by at Microsoft’s annual developer conference, BUILD. I spent a bunch of time with GitHub CEO, Thomas Dohmke, interviewed Scott Guthrie, who leads Microsoft’s Cloud+AI group, and chatted with Jay Parikh, who heads up a new org called CoreAI. Interviews with Thomas and Scott will be published as podcasts soon. I also talked extensively with developers at BUILD, and with engineering leaders at tech companies in the city. Today’s issue is a roundup of everything I heard and learned. We cover:
I attended BUILD on my own dime; neither Microsoft nor any other vendor paid for my travel, accommodation, and other expenses (they did offer). More in my ethics statement.

1. AI, AI, AI

The only theme of Microsoft's event was AI. Indeed, I can barely think of a major announcement from the tech giant that was not GenAI-related. Even when Microsoft unveiled improved security infrastructure, it was for AI applications. The one exception was the open sourcing of the Windows Subsystem for Linux (WSL). Some notable announcements:
It was my first visit to BUILD, and I talked with engineers who have been coming for years. They felt this was the first BUILD at which a single topic – in this case, AI – has dominated everything, and I heard a few complaints about the narrow focus. These reactions reminded me of Amazon’s annual event last year, which caused similar grumbles. In The Pulse 101, AWS Serverless Hero Luc van Donkersgoed said:
I think I now better understand why Microsoft and other Big Tech giants are going so big on AI. There was nothing but AI positivity from every Microsoft presenter, and nobody mentioned trade-offs or current shortcomings, like hallucinations and getting stuck on tasks. But these folks are very much aware of the current limitations and developers’ frustrations, so why the blanket optimism?

Basically, there’s little downside in being over-optimistic about the impact of AI, but there’s a major downside in being a little too pessimistic. If optimism is overstated, then people just spent too much time and effort on tools that get used less than predicted. However, if there’s too much pessimism and it leads to inaction, then startups coming from nowhere can become billion-dollar businesses and rivals.

This is exactly what happened with Google and Transformers: the search giant invented Transformers in 2017, but didn’t really look into how to turn them into products. So, a small startup called OpenAI created ChatGPT, and now has more than 500 million weekly active users, and a business worth $300B. Google is playing catchup. It was a costly mistake to not be optimistic enough about the impact of Transformers and AI!

My hunch is that Microsoft does not want to make the same error with developer tools. On that basis, take projections from the company’s leadership with a pinch of salt, and know that being optimistic is just what they do, but that there’s also a sensible rationale for this approach.

2. Copilot as “peer programmer”

In the past year, many companies have made bold promises about tools that replace developers: Devin, billed as the world’s first “AI engineer”; Magic.dev, which aims to build a “superhuman software engineer”; and Google, whose chief scientist, Jeff Dean, recently shared that he expects AI systems to operate at the level of a “junior engineer” by 2026.
In all cases, the message to business decision makers seems clear: spend money on AI and you won’t have to hire engineers because the AI will be just as good as devs, if not better. I previously analyzed how startups like Magic.dev and Devin might have seen no other option for raising funding than to make bold claims, since GitHub Copilot has already won the “copilot” category. These startups resorted to marketing stunts to claim they have (or will have) a product that’s equal to human developers.

In contrast, Microsoft is not talking about replacing developers. It says GitHub Copilot – and the latest version of the product called “Coding agent” – are tools to partner with developers, akin to a peer programmer. At BUILD, Microsoft showed a more advanced demo of how their latest tools work across the development lifecycle – from planning, all the way to deployment and oncall – than I’ve seen from anyone else.

3. AI agents for day-to-day work

At BUILD, Microsoft stressed that all its demos were real, live, and used the tools that all developers can access immediately. This felt like a stab at Apple, which faked its demo for Apple Intelligence, and possibly a reference to Devin, whose claim that its “AI Engineer” could complete real tasks on Upwork wasn’t correct. I also take it as a sign that Microsoft understands that people can be tricked once with puffed-up demos, but that there’s a longer term cost, and the company would rather do something blander which works than set unrealisable expectations.

The live demo featured an imaginary project to put together an events page for the conference. It was not too ambitious or complex, and showed off that Copilot is integrated into far more places than before. Jay Parikh (EVP of CoreAI at Microsoft) and Jessica Hawk (CVP of Data, AI, and Digital Apps at Microsoft) worked together to simulate a mini dev team, whose third “member” was Copilot.
The demo showed examples of the type of collaboration devs can do with the preview version of GitHub Copilot in Visual Studio:

Information gathering

Jay acted as a new joiner to the team, and asked Copilot for context on a new codebase. This is a pretty basic GenAI use case, which also seems a useful one.

Create a PR

Jay asked the agent to improve the Readme of the project, and the agent responded by opening a PR based on this prompt. Important context: the GitHub UI clearly shows that the agent is working on behalf of a developer:
I want to highlight the wording used by GitHub because it’s significant: Unlike many startups that treat AI agents as “autonomous” tools that do work on their own, with human devs left to clean up the mistakes, Copilot makes some things clear:
Assign a ticket

You can assign a ticket to the Copilot Agent, and the agent starts work asynchronously. The outcome is usually a PR that needs to be reviewed.
Follow written team coding guidelines

The coding agent follows coding guidelines defined in copilot-instructions.md. This is a very similar approach to what Cursor does with its rules files (.cursorrules or .cursor/rules): it’s a way to add “persistent” context for the agent to use before runs.
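To make the idea of “persistent” context concrete, here is a minimal sketch of what it amounts to conceptually: the same guidelines file gets prepended to every one-off task before the model sees it. This is an illustration only, not how Copilot is actually implemented; the function name `build_prompt` and the “Team guidelines” / “Task” section labels are assumptions made up for this example.

```python
from pathlib import Path


def build_prompt(task: str, instructions_path: Path) -> str:
    """Prepend persistent team guidelines (if the file exists) to a task prompt.

    Hypothetical helper: the section labels below are illustrative, not a
    format that Copilot or any other tool is known to use.
    """
    guidelines = ""
    if instructions_path.exists():
        guidelines = instructions_path.read_text(encoding="utf-8").strip()

    sections = []
    if guidelines:
        # The guidelines are identical for every run, hence "persistent".
        sections.append("Team guidelines:\n" + guidelines)
    # The task is the only part that changes from run to run.
    sections.append("Task:\n" + task.strip())
    return "\n\n".join(sections)
```

The point of the design is that developers write the guidelines once, in the repository, rather than repeating them in every prompt or ticket.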
Connect to Figma and other tools using MCP

Microsoft added support for the Model Context Protocol (MCP), which makes tools like GitHub, Figma, or databases available both in the IDE (VS Code) and to the agent. This means the agent can be pointed to a Figma file, told to implement it using e.g. React, and it does so.
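For readers who have not looked at MCP yet: messages are JSON-RPC 2.0, and a client typically discovers a server's tools with `tools/list` and invokes one with `tools/call`. The sketch below only builds those request messages so that their shape is visible; it does not connect to anything. The tool name `get_design_file` and the `file_key` argument are hypothetical and are not taken from Figma's or GitHub's actual MCP servers.

```python
import json
from itertools import count
from typing import Optional

# JSON-RPC requests carry an id so that responses can be matched to them.
_request_ids = count(1)


def jsonrpc_request(method: str, params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request, the envelope MCP messages use."""
    message = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


def list_tools_request() -> str:
    """Ask an MCP server which tools it exposes."""
    return jsonrpc_request("tools/list")


def call_tool_request(name: str, arguments: dict) -> str:
    """Ask an MCP server to run one of its tools with the given arguments."""
    return jsonrpc_request("tools/call", {"name": name, "arguments": arguments})


if __name__ == "__main__":
    print(list_tools_request())
    # Hypothetical tool and argument names, for illustration only.
    print(call_tool_request("get_design_file", {"file_key": "abc123"}))
```

Because every tool sits behind the same two calls, an IDE or agent can gain a new capability by adding a server, without tool-specific integration code.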
Draft commit messages

Commit messages usually summarize your work, and AIs are pretty good at summarizing things. It’s no surprise Copilot does well at summarizing the changes into a commit message. As a developer, you can change this, but it gives a good enough draft, and saves a lot of effort.

Act on code review feedback

When the agent submits a pull request, it will now act on comments added to it. This is not too surprising as the comment will be yet another input for the agent, but Microsoft did a nice job in integrating it into the workflow, and showing how these agents can use more tools. For example, here’s a comment instructing the agent to add an architecture diagram about the structure of the code, using a specific library (Mermaid.js): Upon adding the comment, the agent picks it up immediately and kicks off work, asynchronously: And it updates the PR by adding a diagram, in this case. We can imagine how this works similarly with other comments and change requests:

Principles of using agents

From this presentation, there are signs of how Microsoft sees Copilot being used:
All this makes sense. My only addition would be to not forget the obvious fact that this “sidekick” is not human, and does not “learn” from interactions. I keep making this error when I use agentic mode in Windsurf or Cursor: even after solving a problem with the tool, like having it query a database table after giving it guidance, the tool “forgets” all this detail. These tools have limited context, and unlike humans, who (usually) learn from previous actions, AI agents currently do not.

Demos that feel artificial

There are many things I like about what Microsoft showed on stage:
However, the demo itself was underwhelming for a tech company of Microsoft’s stature. Also, the demo felt a bit unrealistic while it showcased the agent’s functionality. On one hand, this is the nature of demos, right? But on the other, they omit the reality of working on a larger, often messy codebase, within a larger team.

To be fair, these kinds of demos would have been impossible even a year ago, simply because AI models were less capable. Microsoft did a good job of integrating the agent with a sensible UI, and using increasingly better models to show that agents can take on simpler work and deliver acceptable results.

I want to stress that the output I observed was okay, but I couldn’t call it standout. What Microsoft demoed can be one-shotted these days with a single prompt by tools like Lovable, bolt.new, Vercel v0, or similar. To its credit, Microsoft did real demos, but used overly-simple scenarios. The project was a web-based one – the environment in which LLMs are strongest – all the tickets were very well-defined, the codebase was simple, and the demo was well rehearsed.

4. Dogfood that’s not tasty: Copilot Agent fumbles in real world with .NET

To its credit, Microsoft is dogfooding Copilot itself, and doing this in the open. The .NET team has been experimenting with using Copilot Agent for this complex and large project. This codebase is not easy to work on, even as an experienced engineer, and Copilot lags behind the capabilities of experienced engineers – heck, sometimes even junior ones!...