30% Less is More: GitClear Releases Research on 12,638 Pull Requests

GitClear utilizes the largest known set of pull requests to investigate the extent to which the lines labeled "changes" by GitHub are in fact substantively changed. The company finds that the Myers diff algorithm (released 1986) used by GitHub and others results in 25-30% more code lines to review, relative to a more modern code diffing tool.

Seattle, WA, June 14, 2024 --(PR.com)-- GitClear has released a comprehensive 2024 pull request research paper that analyzes 12,638 pull requests in an effort to quantify the extent to which modern code review practices subject software engineers to unnecessary work. The research finds that "changed lines pending review" can be reduced by 30% by using more precise diff analysis than the Myers algorithm (used by GitHub, Bitbucket, GitKraken, GitLab and Azure DevOps) offers.

"Pull request review consumes 2-5 hours per week for the median developer in 2024," says Bill Harding, GitClear CEO and one of the paper's researchers. "But the tools that are being used to review code are all based off a 1986 algorithm that is insufficient to identify many types of no-op changes."

Harding continues, "For this research, we wanted to quantify the extent to which this outdated diff algorithm -- still utilized by all the major git platforms, almost 40 years after it was released -- fails to recognize common idioms like 'moved' or 'updated' lines. Our research suggests that developers could be spending 25-30% less time reviewing code if they adopted a more modern diffing algorithm, such as the Commit Cruncher algorithm GitClear provides." Given the 2-5 hours per week that the median developer spends on pull request review, a 25-30% reduction works out to roughly 30-90 minutes, so these findings imply that developers could cut code review time by around 60 minutes per week by using a more modern diff tool than Myers.
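To illustrate what a "moved" line looks like to a plain line-based diff, here is a minimal Python sketch. It makes several assumptions: it uses the standard library's difflib.SequenceMatcher as a stand-in line differ (difflib does not implement the Myers algorithm), the function names changed_lines and count_moves are hypothetical, and the exact-text matching shown is a naive illustration rather than GitClear's Commit Cruncher algorithm.

```python
from collections import Counter
from difflib import SequenceMatcher


def changed_lines(before, after):
    """Return (removed, added) lines as a plain line-based diff reports them."""
    removed, added = [], []
    matcher = SequenceMatcher(a=before, b=after, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            removed.extend(before[i1:i2])
        if tag in ("insert", "replace"):
            added.extend(after[j1:j2])
    return removed, added


def count_moves(removed, added):
    """Count lines whose exact text appears on both sides of the diff.

    Naive assumption: identical text removed in one place and added in
    another is treated as a move, not as new code to review.
    """
    overlap = Counter(removed) & Counter(added)
    return sum(overlap.values())


# Hypothetical example: main() is moved above helper() and one line is edited.
before = [
    "import os",
    "def helper():",
    "    return 1",
    "def main():",
    "    return helper()",
]
after = [
    "import os",
    "def main():",
    "    return helper() + 1",
    "def helper():",
    "    return 1",
]

removed, added = changed_lines(before, after)
raw = len(removed) + len(added)   # lines a plain diff flags for review
moved = count_moves(removed, added)
remaining = raw - 2 * moved       # each move is counted once as removed, once as added

print(raw, moved, remaining)      # prints: 4 1 2
```

In this toy case the plain diff flags four lines, but two of them are the same "def main():" line reported once as removed and once as added; only the edited return statement is a substantive change.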

The pull request research builds upon previous first-party software development research that GitClear has released, including "Coding on Copilot: 2023 Data Suggests Downward Pressure on Code Quality," which was released in January 2024 and cited by Stack Overflow, GeekWire, Lee Atchison, and more than 10 other media outlets. The unprecedented size of GitClear's code change database allows the company to publish research with much larger datasets (i.e., hundreds of millions of changed lines) than are available to academic researchers.
William Harding
Email is best contact method