[GH-ISSUE #3845] On squashing commits and git workflows #2423

Open
opened 2026-05-05 09:05:53 -06:00 by gitea-mirror · 0 comments
Owner

Originally created by @kmk3 on GitHub (Dec 27, 2020).
Original GitHub issue: https://github.com/netblue30/firejail/issues/3845

Recently I opened two Pull Requests which were integrated using GitHub's
"squash and merge" method. This causes GitHub to replace the author date
(i.e.: when the changes were made) of the original commit with "the date that
the PR was merged" (which is what the commit date is for) and to integrate the
resulting commit. I find historical value in preserving the original commit
date, as the registry of when the patch was made and it can also be considered
part of its context. Thus, replacing it with an unrelated value pretending to
be the actual author date bothers me. As an extreme example, imagine merging a
PR this way a year after it was opened. Not only would the date be way off
than what was intended, but also what made sense at the time of change might
not make as much sense at a later time (e.g.: using functions which later
became deprecated). Moreover, note that this is a GitHub-specific deficiency;
git merge --squash at least puts the metadata in the commit message. But
still, ideally the author would be the one to squash their own commits.

Further, when a branch has more than 1 commit, there are additional problems
with the method:

  • GitHub mangles the commit subject (uses PR title)
  • Commits that were once a single logical change each become a big blob of
    changes

(This hasn't happened to any of my PRs in this project, but since it's not
documented and since it's been done to single commits, I assume that it can
happen to larger branches as well)

I see that squash-merge has been used in a few PRs as a quick way to turn many
amend commits into a singular one (like a web-based git rebase). Which is
fine when used in your own branches (and I can understand why it would be done
if a PR is opened with many amend commits), but doing it to someone else's
branch without notice would be very rude, as in my case you'd be integrating
under my name a patch other than the ones I've sent, as if I had written and
submitted that tweaked version myself.

I personally try to write "good" commits (e.g.: clarify context, ensure 1
commit per logical change), so that 6 months later I can quickly understand the
what/why by just reading each commit locally, without depending on external
sources (and possibly refer to it when I need to do something similar). That
usually entails many local rebases before sending the patches, so if such
branch-squashing happened to me, it would be a punch to the face.

With that said, I ask three things:

  1. Ask for changes instead of directly changing someone else's commits (yes, I
    know it's tempting to simply do that)
  2. Use "squash and merge" only after the author(s) agreed to it
  3. Document on e.g.: CONTRIBUTING.md that "squash and merge" is part of the
    workflow so that whoever sends patches knows what to expect

Perhaps the latter 2 could be simplified to adding an "I agree to have all of
my commits squashed" checkbox on PULL_REQUEST_TEMPLATE.md. Either way, being
upfront about it would be an improvement.


Now, about the merge methods; I noticed that their usage varies in this
repository:

$ git show -q --pretty='%h %s' master
039e77e7 remove trailing whitespaces
# Total commits
$ git log --oneline | wc -l
6701
# Normal commits+
$ git log --oneline --no-merges | grep -vE '\(#[0-9]+\)$' | wc -l
5231
# Merge commits
$ git log --oneline --merges | wc -l
1040
# Squashed commits+
$ git log --oneline --committer='GitHub' | grep -E '\(#[0-9]+\)$' | wc -l
316

Again, it's not documented when/why each method is used, so I can't tell for
sure, but when glancing at the graph with tig, it seems to depend on whoever
is doing the merge.

Anyways, I have recently done some testing on the way GitHub handles these
methods and these are the main differences that I've noticed:

Advantages:

  • Merge (git): Registers committer, keeps original commits
  • Merge (GitHub): Registers committer, keeps original commits, registers PR number
  • Rebase-merge (git): Linear git history, registers committer
  • Rebase-merge (GitHub): Linear git history
  • Squash-merge (git): Linear git history, registers committer
  • Squash-merge (GitHub): Linear git history, registers PR number

Disadvantages:

  • Merge (git): Non-linear history, no PR number
  • Merge (GitHub): Non-linear history
  • Rebase-merge (any): Changes history++, no PR number
  • Squash-merge (git): Changes history
  • Squash-merge (GitHub): Changes history, mangles title, author name and author
    date

Personally, I usually try to stick with either merge or rebase-merge for a
given integration branch (e.g.: master), to make it easier to reason about the
log (e.g.: with tig) and also for scripting, though the integration workflow
is simple enough here that neither matter too much. Following that, I see that
many small PRs are routinely integrated, so using merge everywhere could turn
the graph into spaghetti, but I can't say for sure looking at the current
version. Plus, there are some interesting discussions which happen on the PR
itself rather than on issues, so using rebase-merge for everything would make
these exchanges less discoverable. A remedy for that would be to add the PR
number manually to the commit message when desired and re-push, but that's more
brittle and might be too much added friction, considering the current
workflows.

Thus, if I had to come up with an alternative, I would suggest using "rebase
and merge" for trivial changes which have little to no discussion/no important
information on the PR and normal merges for everything else, which is a
workflow that does not divert much from what is currently being done. Or try
using just merges for e.g.: two weeks and then see what the graph looks like.

TL;DR: Please don't squash my commits; use merge or rebase instead.


+May miss entries for "Normal commits" and contain false positives for
"Squashed commits" if/when the commit was made in the web UI and is referencing
an issue rather than a PR.

++By "changes history" I mean that the original commits get turned into new
commits, with new SHA-1 checksums.

Originally created by @kmk3 on GitHub (Dec 27, 2020). Original GitHub issue: https://github.com/netblue30/firejail/issues/3845 Recently I opened two Pull Requests which were integrated using GitHub's "squash and merge" method. This causes GitHub to replace the author date (i.e.: when the changes were made) of the original commit with "the date that the PR was merged" (which is what the commit date is for) and to integrate the resulting commit. I find historical value in preserving the original commit date, as the registry of when the patch was made and it can also be considered part of its context. Thus, replacing it with an unrelated value pretending to be the actual author date bothers me. As an extreme example, imagine merging a PR this way a year after it was opened. Not only would the date be way off than what was intended, but also what made sense at the time of change might not make as much sense at a later time (e.g.: using functions which later became deprecated). Moreover, note that this is a GitHub-specific deficiency; `git merge --squash` at least puts the metadata in the commit message. But still, ideally the author would be the one to squash their own commits. Further, when a branch has more than 1 commit, there are additional problems with the method: * GitHub mangles the commit subject (uses PR title) * Commits that were once a single logical change each become a big blob of changes (This hasn't happened to any of my PRs in this project, but since it's not documented and since it's been done to single commits, I assume that it can happen to larger branches as well) I see that squash-merge has been used in a few PRs as a quick way to turn many amend commits into a singular one (like a web-based `git rebase`). Which is fine when used in your own branches (and I can understand why it would be done if a PR is opened with many amend commits), but doing it to someone else's branch without notice would be very rude, as in my case you'd be integrating under my name a patch other than the ones I've sent, as if I had written and submitted that tweaked version myself. I personally try to write "[good][1]" commits (e.g.: clarify context, ensure 1 commit per logical change), so that 6 months later I can quickly understand the what/why by just reading each commit locally, without depending on external sources (and possibly refer to it when I need to do something similar). That usually entails many local rebases before sending the patches, so if such branch-squashing happened to me, it would be a punch to the face. With that said, I ask three things: 1. Ask for changes instead of directly changing someone else's commits (yes, I know it's tempting to simply do that) 2. Use "squash and merge" only after the author(s) agreed to it 3. Document on e.g.: `CONTRIBUTING.md` that "squash and merge" is part of the workflow so that whoever sends patches knows what to expect Perhaps the latter 2 could be simplified to adding an "I agree to have all of my commits squashed" checkbox on `PULL_REQUEST_TEMPLATE.md`. Either way, being upfront about it would be an improvement. --- Now, about the merge methods; I noticed that their usage varies in this repository: ```console $ git show -q --pretty='%h %s' master 039e77e7 remove trailing whitespaces # Total commits $ git log --oneline | wc -l 6701 # Normal commits+ $ git log --oneline --no-merges | grep -vE '\(#[0-9]+\)$' | wc -l 5231 # Merge commits $ git log --oneline --merges | wc -l 1040 # Squashed commits+ $ git log --oneline --committer='GitHub' | grep -E '\(#[0-9]+\)$' | wc -l 316 ``` Again, it's not documented when/why each method is used, so I can't tell for sure, but when glancing at the graph with `tig`, it seems to depend on whoever is doing the merge. Anyways, I have recently done some testing on the way GitHub handles these methods and these are the main differences that I've noticed: Advantages: * Merge (git): Registers committer, keeps original commits * Merge (GitHub): Registers committer, keeps original commits, registers PR number * Rebase-merge (git): Linear git history, registers committer * Rebase-merge (GitHub): Linear git history * Squash-merge (git): Linear git history, registers committer * Squash-merge (GitHub): Linear git history, registers PR number Disadvantages: * Merge (git): Non-linear history, no PR number * Merge (GitHub): Non-linear history * Rebase-merge (any): Changes history++, no PR number * Squash-merge (git): Changes history * Squash-merge (GitHub): Changes history, mangles title, author name and author date Personally, I usually try to stick with either merge or rebase-merge for a given integration branch (e.g.: master), to make it easier to reason about the log (e.g.: with `tig`) and also for scripting, though the integration workflow is simple enough here that neither matter too much. Following that, I see that many small PRs are routinely integrated, so using merge everywhere _could_ turn the graph into spaghetti, but I can't say for sure looking at the current version. Plus, there are some interesting discussions which happen on the PR itself rather than on issues, so using rebase-merge for everything would make these exchanges less discoverable. A remedy for that would be to add the PR number manually to the commit message when desired and re-push, but that's more brittle and might be too much added friction, considering the current workflows. Thus, if I had to come up with an alternative, I would suggest using "rebase and merge" for trivial changes which have little to no discussion/no important information on the PR and normal merges for everything else, which is a workflow that does not divert much from what is currently being done. Or try using just merges for e.g.: two weeks and then see what the graph looks like. TL;DR: Please don't squash my commits; use merge or rebase instead. --- +May miss entries for "Normal commits" and contain false positives for "Squashed commits" if/when the commit was made in the web UI and is referencing an issue rather than a PR. ++By "changes history" I mean that the original commits get turned into new commits, with new SHA-1 checksums. [1]: https://chris.beams.io/posts/git-commit/
Sign in to join this conversation.
No milestone
No project
No assignees
1 participant
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: github-starred/firejail#2423
No description provided.