Deepdive: How 10 tech companies choose the next generation of dev tools
👋 Hi, this is Gergely with a subscriber-only issue of the Pragmatic Engineer Newsletter. In every issue, I cover challenges at Big Tech and startups through the lens of engineering managers and senior engineers. If you’ve been forwarded this email, you can subscribe here.

Tech businesses from seed-stage startups to publicly-listed companies reveal how they select and roll out next-generation IDEs, CLIs, and code review tools. And how they learn which ones work… and which don’t.

Right now, it seems like almost every tech company is changing its developer tooling stack. That’s a big shift from eighteen months ago, when the answer to “what to use for AI-assisted coding?” was simple: buy a GitHub Copilot license and boot up ChatGPT. In our AI tooling survey in 2024, those two tools racked up more mentions than all the others combined.

But no more. Today, a plethora of tools outpace Copilot in various ways, like Cursor, Claude Code, Codex, and Gemini CLI. There are also AI code review tools like CodeRabbit, Graphite, and Greptile, not to mention all the MCP integrations which plug into agentic tools.

So, for this deepdive I asked 10 tech companies which tools their engineers use and, crucially, how they made their choices from among all the options. These businesses range from a 5-person seed-stage startup to one that employs 1,500 people and is publicly listed. All are anonymous, except for Wealthsimple and WeTravel. WeTravel has also kindly shared the most detailed measurement framework I’ve yet seen. We cover:
The goal of this article is to showcase what tech companies of different sizes are doing, and to offer a few pointers on measuring and comparing the tools. It’s hard to do, but not impossible, as two in-depth case studies illustrate, below. Don’t forget: what matters is to find tools that work for your team. During this research, I found vendors that are beloved by one company and loathed in other workplaces. No single vendor is rated highly by every team in all contexts.

As always, I have no affiliation with any vendor mentioned in this article, and was not paid to mention any of them. I used to be an investor in Graphite, but no longer am. For more details, see my ethics statement.

1. Speed, trust, & show-and-tell: how small teams select tools

Decisions are informal and made quickly at the smallest businesses in our survey, with the decisive factor being how people feel about the tools. Trial periods are short, at around two weeks, and individual developers have outsized influence on whether a tool is adopted or binned, with decisions spreading organically. Below are two examples:

Seed-stage logistics startup (20 people, 5 engineers)

The head of engineering at this startup describes their approach as high-trust and developer-led:
Developers there suggest which tools to try and decide whether to keep using them or to seek alternatives. For AI code reviews, the team first tried Korbit for around a week, but the tool felt “off”, so they road-tested CodeRabbit, which “stuck” within a few days:
And that was that: decision made. As a small team, it’s easy to switch to something better and it only takes a single engineer to suggest it. The broader tooling stack of this startup has evolved quickly over the last year:
“Show and tells” – where team members show colleagues their tooling setups during weekly team meetings and demos – are used by this startup to identify which tools do or don’t work:
The team makes a clear distinction between company-wide tools like Claude and CodeRabbit that everyone is expected to use, and devs’ personal environments (IDE choice, terminal setup), over which individuals have full autonomy. By now, almost everyone has migrated to Claude Code, but six months ago the team was evenly split between Cursor and Claude Code. The head of engineering said:
Series A startup (30 people, 15 engineers)

A staff engineer at this company says the team is split on Cursor versus Claude Code, with the latter gaining momentum. He also says code reviews cause headaches:
They evaluated three code review tools: Cursor’s Bugbot (okay but not great), Graphite (not good), and Greptile (good). They’re now trialing Greptile for PR approvals, taking advantage of its confidence-scoring feature. What works really well for this team is maintaining extensive Agents.md and Claude.md files, which are very handy because they’re used by:
These two files help maintain a single source of truth for coding-style guidance across the toolchain. There’s praise for Cursor’s integration with Linear and Slack from a staff engineer:
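Returning to the Agents.md and Claude.md setup above: the article doesn’t say how this team keeps the two files in sync, but one common way to stop duplicated guidance files from drifting is to make one file canonical and symlink the other to it. Below is a minimal sketch of that approach; the guideline text and repo layout are invented for illustration.

```python
# Sketch: keep AGENTS.md canonical and point CLAUDE.md (the filename
# Claude Code reads) at it via a symlink, so the guidance cannot drift.
# The guideline content here is invented for illustration.
from pathlib import Path

Path("AGENTS.md").write_text(
    "# Coding guidelines\n"
    "- Prefer small, focused PRs.\n"
    "- Run the linter before committing.\n"
)

claude = Path("CLAUDE.md")
if claude.is_symlink() or claude.exists():
    claude.unlink()  # replace any stale copy with the symlink
claude.symlink_to("AGENTS.md")

# Both filenames now resolve to identical content.
print(claude.read_text() == Path("AGENTS.md").read_text())  # True
```

One trade-off with symlinks: they behave differently on Windows and some CI checkouts, so teams there sometimes copy the file in a pre-commit hook instead.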
Series D observability company (150 people, 60 engineers)

The director of open source at this place summarizes what’s stuck there:
An interesting signal at this company is that non-engineers have jumped onto Claude Code. Product managers, solutions engineers, and technical account managers alike are using it more than the median engineer, and they’re handling customer bug reports by opening Claude Code PRs directly:
2. How mid-to-large companies choose

At companies with 150+ engineers, it’s not about how a tool “feels”. Instead, existing vendor relationships may be decisive, and there’s often pressure from the C-level (leadership team), as well as security and compliance matters to address. There’s also the new challenge of coordinating tooling rollouts across several departments and potentially hundreds of engineers. This is where a decisive CTO can cut through red tape to achieve faster adoption. Our first case study covers how one fintech business did precisely that.

EU-based software company (500 people, 150 engineers)

This place’s experience is a cautionary tale of what can happen when leadership moves on AI tooling without a plan for what comes next. A senior engineer there says:
But it wasn’t, as the Copilot rollout was immediately met with questions about alternatives:
They got “stuck”, unable to approve any new tools for six months. The attempt to create a formal approval process has stalled because legal and IT are gridlocked, with the European Union’s AI Act causing concerns and governance questions:
Meanwhile, their default Copilot setup uses GPT-4.1, a 10-month-old model. Many developers there don’t know if they can change the model or use coding agents. This creates a vicious cycle: the tool feels underwhelming, which suppresses adoption and makes it harder to justify further investment in better options.

Cloud infrastructure company (900 people, 300 engineers)

A principal engineer responsible for AI tooling at this company describes the constant push-pull between developer enthusiasm and executive scrutiny:
The answer to this also came from the exec team: pricing. Execs simply did not want to invest in the tools, and pricing remains a persistent headache. Claude’s team plan is ~$150/month, Cursor’s is ~$65, and this company’s C-level was not comfortable with going from Copilot’s $40/month to Cursor’s $65/month. The principal engineer also worries that costs will keep mounting, even with approval to move to Claude Code’s $150/month:
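To make the executives’ hesitation concrete, here is a back-of-envelope annual cost comparison using the approximate per-seat prices quoted above. The assumption that all 300 engineers get a seat is mine for illustration, not this company’s stated plan.

```python
# Back-of-envelope annual cost per tool for a 300-engineer org.
# Per-seat monthly prices are the approximate figures quoted above;
# giving every engineer a seat is an assumption for illustration.
ENGINEERS = 300
MONTHLY_PER_SEAT = {"Copilot": 40, "Cursor": 65, "Claude Code": 150}

annual = {tool: ENGINEERS * price * 12 for tool, price in MONTHLY_PER_SEAT.items()}
print(annual)  # {'Copilot': 144000, 'Cursor': 234000, 'Claude Code': 540000}

# The jump the C-level balked at, in annual terms:
print(annual["Claude Code"] - annual["Copilot"])  # 396000 extra per year
```

At this scale, the Copilot-to-Claude-Code jump is close to $400k a year, which helps explain why pricing, not capability, is the sticking point.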
Public travel company (1,500 people, 800 engineers)

A staff engineer at this business highlighted vendor lock-in as a primary concern:
They rolled out GitHub Copilot last year and are now evaluating Claude Code as a replacement. They remain cautious, given that the per-engineer cost is steep with Claude.

Public tech company (2,000 people, 700 engineers, productivity space)

The engineering leader in charge of dev productivity at this business calls security the biggest challenge:
Unsurprisingly, the tooling selection process is more in-depth at companies of this size, with many vendors as options. Here’s how they go about things:
Evaluation is more organized and beta trials are common, he says:
3. Measurement problem: metrics are needed but none work

If there’s one theme that unites every company in this deepdive regardless of size, it’s the struggle to measure whether AI tools actually work. Execs want data, but engineers distrust the data that exists. Meanwhile, vendors’ own metrics are mostly useless. Among our research sample, the EU-based software company debated options and only found bad or worse ones:
There’s also the point that some of the most valuable uses of AI lie not in writing code, but in research, idea generation, debugging, and so on, which makes measuring code generated by AI tools a dead end. Meanwhile, vibe-coded scripts and tools that never hit production can feel like real productivity breakthroughs. In the end, this company chose lines of code generated by Copilot as the “official” metric, which met a predictable response:
The principal engineer at the 900-person cloud infrastructure company was more blunt:
This principal engineer dismissed developer-productivity vendors’ own measurement approaches:
A fundamental question remains unanswered, says the principal engineer:
4. How Wealthsimple measured and decided

Wealthsimple is a Canadian fintech company employing about 1,500 people, around 600 of whom are engineers. I talked with CTO Diederik van Liere about how they choose AI code review and AI coding tools. For AI code review tools they ran a thorough measurement process; for AI coding tools, it was more of a push from Diederik. He shared exclusive details on their exact measurement process, and how they landed on Graphite for code review and Claude Code for coding:

Choosing an AI code review tool via a “shootout” process...
