You're a #CTO. Your board asks: "What's our ROI on AI coding tools?" Your answer: "40% of our code is AI-generated!" They respond: "So what? Are we shipping faster? Are customers happier?"

Most CTOs are measuring AI impact completely wrong. Here's what some are tracking:
- Percentage of AI-generated code
- Developer hours saved per week
- Lines of code produced
- AI tool adoption rates

These metrics are like measuring how fast your assembly line workers attach parts while ignoring whether your cars actually start.

Here's what you SHOULD measure instead:
1. Delivered business value
2. Customer cycle time
3. Development throughput
4. Quality and reliability
5. Total cost of delivery (not just development)
6. Team satisfaction

Software development isn't a typing competition; it's a complex system. If AI makes your developers 30% faster but your deployment takes 2 weeks and QA adds another week, your customer delivery improves by maybe 7%. You've sped up the wrong part.

The solution: A/B test your teams. Give half your teams AI tools and measure business outcomes over 2-3 release cycles. Track what customers actually experience, not how much developers produce.

Companies that measure business impact from AI will pull ahead. Those measuring vanity metrics will wonder why their expensive tools aren't moving the needle. Stop measuring how much code AI generates. Start measuring how much faster you deliver value to customers.

What are you actually measuring? And is it moving your business forward?

-> Follow me for more about building great tech organizations at scale. More insights in my book "All Hands on Tech"
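The 30%-versus-7% arithmetic in the post can be checked with a small sketch. The post does not say how long development itself takes, so the one-week figure below is an assumption chosen so that the numbers line up, and "30% faster" is read here as 30% less development time. The function and variable names are illustrative only.

```python
def delivery_improvement(stage_weeks, stage, time_saved_fraction):
    """Fraction by which end-to-end delivery time shrinks when one stage speeds up."""
    before = sum(stage_weeks.values())
    saved = stage_weeks[stage] * time_saved_fraction
    return saved / before


# Deployment (2 weeks) and QA (1 week) come from the post.
# The 1-week development time is an ASSUMPTION, not stated in the post.
pipeline = {"development": 1.0, "qa": 1.0, "deployment": 2.0}

improvement = delivery_improvement(pipeline, "development", 0.30)
print(f"End-to-end improvement: {improvement:.1%}")  # prints: End-to-end improvement: 7.5%
```

Under these assumptions, shaving 30% off a stage that accounts for a quarter of the total only moves the end-to-end number by about 7.5%; the longer the untouched stages, the smaller the gain.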
Developer Productivity Metrics
-
Has Amazon cracked the code on developer productivity with its cost to serve software (CTS-SW) metric? Amazon applied its well-known "working backwards" methodology to developer productivity. "Working backwards" in this case means starting with the outcome: concrete returns for the business. This is measured by looking at the rate of customer-facing changes delivered by developers, i.e. "what the team deems valuable enough to review, merge, deploy, and support for customers", in the words of the blog post by Jim Haughwout https://lnkd.in/eqvW5wbi . This metric is different from other measures of developer productivity, which look only at velocity or time saved. Instead, "CTS-SW directly links investments in the developer experience to those outcomes by assessing how frequently we deliver new or better experiences. Some organizations fall into the anti-pattern of calculating minutes saved to measure value, but that approach isn’t customer-centered and doesn’t prove value creation." This aligns with Gartner's own research on developer productivity. In our 2024 Software Engineering survey, we asked which productivity metric organizations are using to measure their developers. We also asked about a basket of ten success metrics, including software usability, retention of top performers, and meeting security standards. This allowed us to find out which productivity metric was most associated with success. What we found in our survey was that *rate of customer-facing changes* is the metric most associated with success. Some other productivity metrics were actually *negatively associated* with success. But *rate of customer-facing changes* is what organizations should focus on. Sadly, our survey found that few organizations (just 22%) use this metric. I presented this data at our #GartnerApps summit [and the next summit is coming up in September: https://lnkd.in/ey2kpc2 ] Every metric gets gamed. So I always recommend "gaming the gaming". A developer might game the CTS-SW metric by focusing more on customer-facing changes. But... this is actually a good thing. You're gaming the gaming. We will be watching closely how this metric gets adopted alongside DORA, SPACE, and other metrics in the industry.
-
Stop chasing waterfall (and vanity metrics)! Forget vanity metrics and focus on 4 simple Flow Metrics. Vanity metrics like velocity or the number of commits or pull request reviews per developer can do more harm than good. "What gets measured, gets managed", which means that what gets measured gets gamed, and developers are some really smart people who quickly learn to game the system. Flow Metrics are in your system anyway and can help you create a better narrative around metrics. You are not measuring individual contributions. You are not comparing one team with another. You simply want to create a more stable system by improving the flow of work.

Here are the 4 Flow Metrics:
-> Work In Progress: The number of work items started but not finished. Too much WIP? Expect delays, context-switching, and all the madness that follows.
-> Throughput: The number of work items finished per unit of time. Think of it as a speedometer for value delivery.
-> Work Item Age: The amount of elapsed time between when a work item started and the current time. High values here? Work is probably waiting around longer than it’s getting done. A crucial measure for predictability.
-> Cycle Time: The amount of elapsed time between when a work item started and when it finished. It tells you how long work takes from start to finish and helps you answer "when will it be done?"

Follow me for more tips on improving your ways of working!
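The four definitions above translate almost directly into code. The following is a minimal sketch, assuming each work item only records a start date and an optional finish date; the `WorkItem` class, the function names, and the sample dates are all hypothetical, and elapsed time is counted in whole days without the "+1 day" convention some teams use.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class WorkItem:
    started: date
    finished: Optional[date] = None  # None means the item is still in progress


def work_in_progress(items):
    """Number of items started but not finished."""
    return sum(1 for i in items if i.finished is None)


def throughput(items, period_start, period_end):
    """Number of items finished within the period (inclusive)."""
    return sum(
        1 for i in items
        if i.finished is not None and period_start <= i.finished <= period_end
    )


def work_item_ages(items, today):
    """Elapsed days since start, for each unfinished item."""
    return [(today - i.started).days for i in items if i.finished is None]


def cycle_times(items):
    """Elapsed days from start to finish, for each finished item."""
    return [(i.finished - i.started).days for i in items if i.finished is not None]


# Hypothetical sample data
items = [
    WorkItem(date(2024, 3, 1), date(2024, 3, 5)),
    WorkItem(date(2024, 3, 2), date(2024, 3, 12)),
    WorkItem(date(2024, 3, 4)),
    WorkItem(date(2024, 3, 10)),
]
today = date(2024, 3, 15)

wip = work_in_progress(items)                                  # 2
done = throughput(items, date(2024, 3, 1), date(2024, 3, 14))  # 2
ages = work_item_ages(items, today)                            # [11, 5]
cycles = cycle_times(items)                                    # [4, 10]
print(wip, done, ages, cycles)
```

Note that none of these functions take a developer or team name as input, which matches the post's point that flow metrics describe the system rather than individuals.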
-
My approach to developer productivity metrics has changed a lot in the last few years. I used to recommend that leaders go deep in the research — SPACE, DORA, DevEx — to come up with their own list of metrics that fits their leadership and business needs. But now we have more information about what’s actually working in the field, and my guidance has changed. I have a very clear answer about what to measure, and it's the DX Core 4 framework. DX Core 4 unifies SPACE, DORA, and DevEx, and gives you a prescriptive list of key metrics to track. 🔹 Robust: Four dimensions that hold each other in tension for a comprehensive view into performance. 🔹 Easy to deploy: Get a baseline in weeks, not months. 🔹 Balanced: Qualitative and quantitative data to tell you not just what’s going on, but why, so that you can improve. 🔹 Peer benchmarks: See industry 50th, 75th, and 90th percentile values, including segmentation for size, sector, and even mobile engineers. This framework is based on years of research and field experience from real companies using metrics in their day-to-day operations. This framework was developed by Abi Noda and me, with collaboration from the creators of DORA, SPACE, and DevEx, and feedback from experts and our incredible DX customers. Read more here: https://lnkd.in/dSbr8aAD
-
The best-performing software engineering teams measure both output and outcomes. Measuring only one often means underperforming in the other. While debates persist about which is more important, our research shows that measuring both is critical. Otherwise, you risk landing in Quadrant 2 (building the wrong things quickly) or Quadrant 3 (building the right things slowly and eventually getting outperformed by a competitor). As an organization grows and matures, this becomes even more critical. You can't rely on intuition, politics, or relationships; you need to stop "winging it" and start making data-driven decisions. How do you measure outcomes? Outcomes are the business results that come from building the right things. These can be measured using product feature prioritization frameworks. How do you measure output? Measuring output is challenging because traditional methods don’t measure it accurately: 1. Lines of Code: Encourages verbose or redundant code. 2. Number of Commits/PRs: Leads to artificially small commits or pull requests. 3. Story Points: Subjective and not comparable across teams; may inflate task estimates. 4. Surveys: Great for understanding team satisfaction but not for measuring output or productivity. 5. DORA Metrics: Measure DevOps performance, not productivity. Deployment sizes vary within & across teams, and these metrics can be easily gamed when used as productivity measures. Measuring how often you’re deploying is meaningless from a productivity perspective unless you’re also measuring _what_ is being deployed. We propose a different way of measuring software engineering output. Using an algorithmic model developed from research conducted at Stanford, we quantitatively assess software engineering productivity by evaluating the impact of commits on the software's functionality (i.e., we measure output delivered). We connect to Git and quantify the impact of the source code in every commit. 
The algorithmic model generates a language-agnostic metric for evaluating & benchmarking individual developers, teams, and entire organizations. We're publishing several research papers on this, with the first pre-print released in September. Please leave a comment if you’d like to read it. Interested in leveraging this for your organization? Message me to learn more. #softwareengineering #softwaredevelopment #devops
-
The "10,000 lines of code per day with AI" claim perfectly demonstrates how AI productivity metrics have become completely detached from software development reality. Measuring developer productivity by lines of code generated reveals a fundamental misunderstanding of what creates business value in software development. Quality, maintainability, and problem-solving matter more than output volume. I've worked with teams that generate thousands of AI-assisted code lines daily while struggling with basic debugging, architecture decisions, and requirement gathering. High-volume code generation often creates more technical debt than business value. The most productive developers I know focus on solving complex problems efficiently rather than maximizing code output. They write less code that accomplishes more, not more code that accomplishes less. AI tools can accelerate certain coding tasks, but they can't replace systematic thinking, domain expertise, or architectural planning. These cognitive capabilities determine software quality and business impact. Organizations obsessing over AI-enhanced code generation metrics typically miss fundamental software development challenges: unclear requirements, poor process design, and inadequate quality assurance. A better approach: measure AI impact on delivered business value rather than intermediate outputs like code volume or development speed. Sometimes the most valuable code is the code you don't need to write. #SoftwareDevelopment #AI #ProductivityMetrics #TechManagement #CodeQuality
-
Most CTOs can't answer this question: "Where are we actually spending our engineering hours?" And that's a $10M+ blind spot. I was talking to a CTO recently who thought his team was spending 80% of their time on new features. Reality: They were spending 45% of their time on new features and 55% on technical debt, bug fixes, and unplanned work. That's not a developer problem. That's a business problem. When you don't have visibility into how code quality impacts your engineering investment, you can't make strategic decisions about where to focus. Here's what engineering leaders are starting to track: → Investment Hours by Category: How much time goes to features vs. debt vs. maintenance → Change Failure Rate Impact: What percentage of deployments require immediate fixes → Cycle Time Trends: How code quality affects your ability to deliver features quickly → Developer Focus Time: How much uninterrupted time developers get for strategic work The teams that measure this stuff are making data-driven decisions about technical debt prioritization. Instead of arguing about whether to "slow down and fix things," they're showing exactly how much fixing specific quality issues will accelerate future delivery. Quality isn't the opposite of speed. Poor quality is what makes you slow. But you can only optimize what you can measure. What metrics do you use to connect code quality to business outcomes? #EngineeringIntelligence #InvestmentHours #TechnicalDebt #EngineeringMetrics
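"Investment Hours by Category" can be approximated from any time-tracking or ticket-tagging export. The sketch below assumes a simple list of `(category, hours)` pairs; the category names and hour values are hypothetical and were picked only to mirror the 45%/55% split described in the post.

```python
from collections import defaultdict


def investment_share(entries):
    """Return each category's share of total logged hours."""
    totals = defaultdict(float)
    for category, hours in entries:
        totals[category] += hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {category: hours / grand_total for category, hours in totals.items()}


# Hypothetical logged hours for one team over one period
entries = [
    ("feature", 180),
    ("tech_debt", 100),
    ("bug_fix", 80),
    ("unplanned", 40),
]

shares = investment_share(entries)
non_feature = 1 - shares["feature"]
print(f"features: {shares['feature']:.0%}, everything else: {non_feature:.0%}")
```

The hard part in practice is usually the categorization itself (deciding which tickets count as debt versus feature work), which this sketch takes as given.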
-
Most software development KPIs measure very short-term output, NOT process efficiency or output stability. Velocity tells you how much work was completed. It doesn't tell you that 60% of cycle time was spent waiting or whether the code sucks. If a space shuttle can get to outer space and fly 17,000 mph…but then it explodes, who cares how fast it was able to go? Leadership teams often optimize for metrics that measure output and are easy to game, while the most common underlying problems are process efficiency and progress sustainability. Low process efficiency means bottlenecks at handoffs between roles or teams and misalignment about quality standards and expectations. Code stability shows technical quality. Low code stability means insufficient testing, lack of alignment/clarity on requirements, or technical debt. The velocity that matters is how fast a team is moving forward with a stable product over months, not how many feature tickets are done at the end of one sprint. That can be deceptive and lead to poor quality…and most dangerously, it can be like carbon monoxide…you don’t see it until it's too late. Here are 4 KPIs that measure project health instead of immediate output: 1) How often tickets are rejected by QA. 2) Quantity and quality of test coverage. Supporting logic that led to this choice. 3) The presence of instrumentation (user analytics, error monitoring, logging). 4) Quantity of new bugs in existing features after the deployment of new features.
-
For decades, engineering teams have been measured by lines of code, commit counts, and PRs merged—but does more code actually mean more productivity? 🚀 Some of the best developers write LESS code, not more. 🚀 The fastest-moving teams focus on outcomes, not just output. 🚀 High commit counts can mean inefficiency, not impact. Recent research from DORA, GitHub, and real-world case studies from IT Revolution debunk the myth that developer activity = developer productivity. Here’s why: 🔹 DORA Research: After studying thousands of engineering teams, DORA (DevOps Research & Assessment) found that the best teams optimize for four key engineering performance metrics: ✅ Deployment Frequency → How often do we ship value to users? ✅ Lead Time for Changes → How fast can an idea go from code to production? ✅ Change Failure Rate → Are we improving quality, or just shipping fast? ✅ MTTR (Mean Time to Restore) → Can we recover quickly when things go wrong? → Notice what’s missing? Not a single metric is based on lines of code, commits, or individual developer output. 🔹 GitHub’s Data: GitHub found that developers working remotely during 2020 pushed more code than ever—but many felt less productive. Why? Longer workdays masked inefficiencies. More commits ≠ meaningful work; some were just fighting bad tooling or slow reviews. Teams that automated workflows (CI/CD, code reviews) merged PRs faster and felt more productive. 🔹 IT Revolution case studies: High-performing engineering orgs measure outcomes, not just outputs. The best teams: Shift from tracking commit counts → to measuring customer value. Use DORA metrics to improve DevOps flow, not micromanage engineers. View engineering productivity as a team effort, not an individual scoreboard. If you want a high-performing engineering org, don’t just push developers to write more code. Instead, ask: ✅ Are we shipping value faster? ✅ Are we reducing friction in our workflows? ✅ Are our developers able to focus on meaningful work? 
🚨 The takeaway? Great engineering teams don’t write the most code—they deliver the most impact. 📢 What’s the worst “productivity metric” you’ve ever seen? Drop a comment below 👇 #DeveloperProductivity #SoftwareDevelopment #DORA #GitHub #EngineeringLeadership
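The four DORA metrics listed in the post can be derived from a plain deployment log. This is a rough sketch under stated assumptions: the `Deployment` record and its fields are hypothetical, lead time is measured from commit to deploy, and simple means are used for brevity (many teams prefer medians).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deployment:
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False                    # did this change cause a failure in production?
    restored_at: Optional[datetime] = None  # when service was restored, if it failed


def dora_summary(deployments, period_days):
    """Compute the four DORA-style metrics over a list of deployments."""
    if not deployments:
        raise ValueError("need at least one deployment")
    n = len(deployments)
    lead_times = [d.deployed_at - d.committed_at for d in deployments]
    failures = [d for d in deployments if d.failed]
    restores = [d.restored_at - d.deployed_at for d in failures if d.restored_at is not None]
    return {
        "deployment_frequency_per_day": n / period_days,
        "mean_lead_time": sum(lead_times, timedelta()) / n,
        "change_failure_rate": len(failures) / n,
        "mean_time_to_restore": (
            sum(restores, timedelta()) / len(restores) if restores else None
        ),
    }


# Hypothetical one-week deployment log
deployments = [
    Deployment(datetime(2024, 3, 1, 9), datetime(2024, 3, 1, 15)),
    Deployment(datetime(2024, 3, 2, 9), datetime(2024, 3, 3, 9),
               failed=True, restored_at=datetime(2024, 3, 3, 11)),
    Deployment(datetime(2024, 3, 4, 10), datetime(2024, 3, 4, 16)),
    Deployment(datetime(2024, 3, 5, 8), datetime(2024, 3, 5, 20)),
]

summary = dora_summary(deployments, period_days=7)
for name, value in summary.items():
    print(name, value)
```

As in the post, nothing here counts lines of code, commits, or per-developer output; every input is a property of a deployment.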
-
How can you measure that your team is getting more productive when they use AI coding tools? I've seen leaders at big companies struggle to answer this question to non-technical executives when they have to justify ballooning software costs from Cursor, Claude Code, etc. driven by token-usage pricing models. Here are some metrics you can use to prove ROI. - 𝗦𝗼𝗳𝘁𝘄𝗮𝗿𝗲 𝗱𝗲𝗹𝗶𝘃𝗲𝗿𝘆 𝘀𝗽𝗲𝗲𝗱 typically measured through feature lead time (time from feature request to deployed in production) - 𝗦𝗵𝗶𝗽𝗽𝗶𝗻𝗴 𝗰𝗮𝗱𝗲𝗻𝗰𝗲 measured by code check-in frequency, time to merge pull requests and number of requests merged, and number of tickets closed in project management software like Jira - 𝗦𝗼𝗳𝘁𝘄𝗮𝗿𝗲 𝗾𝘂𝗮𝗹𝗶𝘁𝘆 measured by change failure rate and changes in # of production incidents - 𝗗𝗲𝘃𝗲𝗹𝗼𝗽𝗲𝗿 𝗿𝗲𝗽𝗼𝗿𝘁𝗲𝗱 𝘀𝗮𝘁𝗶𝘀𝗳𝗮𝗰𝘁𝗶𝗼𝗻 measured through periodic engineering surveys - 𝗗𝗢𝗥𝗔 𝗺𝗲𝘁𝗿𝗶𝗰𝘀 which are standardized performance indicators - 𝗧𝗼𝗸𝗲𝗻 𝗰𝗼𝗻𝘀𝘂𝗺𝗽𝘁𝗶𝗼𝗻 over a period of time. In other words, if you are seeing your team use more tokens in Claude Code or Cursor, then they are by definition using the tools more. This is of course noisy because you could burn a lot of tokens doing nothing productive and ship very little production code. - 𝗙𝗶𝗻𝗲-𝗴𝗿𝗮𝗶𝗻𝗲𝗱 𝗮𝘁𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻 𝗺𝗲𝗮𝘀𝘂𝗿𝗲𝘀 for lines of code (i.e. which one was generated by humans vs agents). Many coding platforms support this including Claude Code and Cursor and then there are also some open-source vendor-agnostic offerings. There’s no golden metric so I think the right way to measure ROI is a mix of metrics combining individual code attribution coupled with output-oriented measures like pull requests merged.