79182297

Date: 2024-11-12 18:22:42
Score: 1

Note: 🫵 You can help. This issue has been filed as Gerrit bug 40015217. Please +1 it and upvote this question to help it get the attention it deserves.

Cause

File rename operations in a git commit are identified by computing the similarity (percentage of lines unchanged) of pairs of files in the commit: one that was deleted and one that was added. If the similarity exceeds some threshold, the file deletion and file addition together are treated as a single file rename instead. Git’s default similarity threshold for rename detection is 50%. This is clearly documented and Google knows it well.

Gerrit uses JGit, and for some reason JGit’s similarity threshold for rename detection is 60% and has been since at least 2010 (commit 978535b). What’s more, the threshold isn’t customizable. jgit diff has a -M option for detecting renames, but it doesn’t accept a custom threshold.
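
To make the gap between the two thresholds concrete, here is a minimal sketch of threshold-based rename detection. It is a simplification: it uses Python's difflib to count unchanged lines, whereas git and JGit use their own similarity scoring, so real scores will differ. The function names are hypothetical, and treating a score equal to the threshold as a rename is an assumption.

```python
from difflib import SequenceMatcher

GIT_DEFAULT_THRESHOLD = 50  # percent, git's default as described above
JGIT_THRESHOLD = 60         # percent, JGit's fixed value as described above


def similarity_percent(old_text: str, new_text: str) -> int:
    """Approximate similarity: unchanged lines as a percentage of the larger file."""
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    larger = max(len(old_lines), len(new_lines))
    if larger == 0:
        return 0
    matcher = SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return unchanged * 100 // larger


def classify(old_text: str, new_text: str, threshold: int) -> str:
    """Classify a deleted/added file pair (assumption: score >= threshold is a rename)."""
    if similarity_percent(old_text, new_text) >= threshold:
        return "rename"
    return "delete + add"


if __name__ == "__main__":
    # A 20-line file whose first 11 lines survive the move: 55% similar.
    old = "\n".join(f"line {i}" for i in range(20))
    new = "\n".join(
        [f"line {i}" for i in range(11)] + [f"changed {i}" for i in range(9)]
    )
    print(similarity_percent(old, new))
    print("git default:", classify(old, new, GIT_DEFAULT_THRESHOLD))
    print("JGit:", classify(old, new, JGIT_THRESHOLD))
```

Any file pair whose similarity lands between 50% and 60% falls into this gap: a rename under git's default, but a separate deletion and addition under the 60% threshold.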

Effects

Here are some of the problems that Gerrit users face as a result of Gerrit using a different rename detection threshold than git:

Workarounds

Authors can usually work around this issue by splitting a file rename + edits into multiple commits. In some compiled languages like Java, file renames usually require some edits (such as to the package line and/or class name) for the file to continue compiling. The minimum edits required to appease the compiler usually keep a file’s similarity index over 60%, though. Other edits can go in separate (prior or follow-up) commits.

Authors may not realize, however, that Gerrit won’t properly identify their file rename until after they’ve prepared the commit, written a commit title and description, and uploaded it for review. Their local git installation and git tools (such as IntelliJ’s git integration) properly identified the file rename before they uploaded the commit to Gerrit. At this point, reworking the commit to work around a Gerrit limitation can carry a significant time cost, since the commit may have several more on top of it, and all of them might need to be rebased and have acceptance tests run again as a result. In short, the workaround may not always be quick and low-cost.
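
One way to catch the problem before uploading is to ask local git to classify the commit at both thresholds and compare the results. The sketch below does that with git's `--find-renames=<n>%` option. It is only an approximation: it assumes git's similarity score is close to the one JGit computes, which is not guaranteed. The helper names are hypothetical, and the parser assumes plain tab-separated `--name-status` output without quoted paths.

```python
import subprocess


def name_status(rev: str, threshold_percent: int) -> str:
    """Return git's name-status listing for one commit at a given rename threshold."""
    result = subprocess.run(
        [
            "git", "show", "--format=", "--name-status",
            f"--find-renames={threshold_percent}%", rev,
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def parse_renames(output: str) -> set[tuple[str, str]]:
    """Extract (old_path, new_path) pairs from lines such as 'R055<TAB>old<TAB>new'."""
    renames = set()
    for line in output.splitlines():
        fields = line.split("\t")
        if len(fields) == 3 and fields[0].startswith("R"):
            renames.add((fields[1], fields[2]))
    return renames


def renames_at_risk(rev: str = "HEAD") -> set[tuple[str, str]]:
    """Renames detected at 50% that would no longer be detected at 60%."""
    at_50 = parse_renames(name_status(rev, 50))
    at_60 = parse_renames(name_status(rev, 60))
    return at_50 - at_60


if __name__ == "__main__":
    # Run inside a git repository before uploading the commit for review.
    for old_path, new_path in sorted(renames_at_risk()):
        print(f"may show as delete + add at 60%: {old_path} -> {new_path}")
```

If the script prints anything, those files are candidates for splitting the rename and the edits into separate commits before the review starts.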

Other reports

This issue has been reported and discussed in several other places.

Possible solutions

  1. JGit could be updated to use git’s default similarity threshold for rename detection (50%).
  2. JGit could be updated to make the similarity threshold for rename detection configurable, and Gerrit could set it to git’s default value (50%).
  3. JGit could be updated to make the similarity threshold for rename detection configurable, and Gerrit could create a setting for what threshold to use for the installation or for each repository.
  4. Gerrit could switch to using git rather than JGit for diffs.

Insights from other products

It seems that Bitbucket (formerly Stash) recently made the copy/rename similarity thresholds configurable (BSERV-3249). It uses git rather than JGit.

IntelliJ/IDEA (a Java application) once tried using JGit for its git plugin, but concluded in this IJPL-88784 comment (emphasis added):

My comments given 18 months ago are still valid. JGit is still below Git. Moreover, you might be surprised, but we actually gave JGit a try: we used it in IDEA 11.X and 12.0 for HTTP remote operations. Our users got a lot of problems that were not easy to fix or even to reproduce, so we've rolled back to native Git. If other projects are happy with JGit, that's their funeral, we have our own vision on the subject.

So it is not because we are lazy to rewrite the plugin. We just don't want to fix issues for the JGit team, and on the other hand we can't say users "blame JGit" if they experience problems with IDEA which they don't have in the command line: they will still blame IDEA, because they don't care which library do we use inside.

Reasons:
  • Blacklisted phrase (0.5): upvote
  • RegEx Blacklisted phrase (3): You can help
  • Long answer (-1):
  • Has code block (-0.5):
  • High reputation (-1):
Posted by: jaredjacobs