It's often necessary to find not just which commit changed a certain keyword, but also the specific file, whether the keyword was added or removed, the actual line of text, and details like the commit author and date. Standard git log -S or git log -G can find commits, but getting this detailed, line-specific output requires a bit more work.
This tutorial provides a shell script that does exactly that.
You want to search your entire Git history for commits where a specific keyword appears in the added or removed lines of a file's diff. For each match, you need to see:
The full Commit ID
The Commit Date (in YYYY-MM-DD format)
The Commit Author's Name
The path to the modified file
Whether the line containing the keyword was an [ADDITION] or [DELETION]
The actual text of the line containing the keyword
This script iterates through your Git history, inspects diffs, and formats the output as described.
#!/bin/sh

# 1. Check that exactly one argument (the keyword) was passed to the script.
if [ "$#" -ne 1 ]; then
    echo "Usage: $0 <keyword>"
    echo "Error: Please provide a keyword to search for."
    exit 1
fi

# Use the first command-line argument as the keyword.
KEYWORD="$1"

# The grep pattern:
#   ^[+-]    : Line starting with '+' (addition) or '-' (deletion).
#   .*       : Followed by any sequence of characters (possibly empty).
#   $KEYWORD : The keyword itself (as a substring).
# Note: the diff's file header lines ("--- a/path", "+++ b/path") also start
# with '-' or '+', so a keyword that occurs in a file path can match them.
GREP_PATTERN='^[+-].*'"$KEYWORD"

echo "Searching for commits containing '$KEYWORD' in diffs..."

# 2. Find commits where the keyword appears in ANY change made by the commit.
#    git log -G treats KEYWORD as a regex.
#    "format:" does not end the last line with a newline, so the extra
#    [ -n "$commit_id" ] test keeps `read` from dropping the final commit.
git log --all --pretty="format:%H" -G"$KEYWORD" | while IFS= read -r commit_id || [ -n "$commit_id" ]; do
    # Get the author and date for this commit.
    #   %an = author name
    #   %ad = author date; --date=short formats it as YYYY-MM-DD.
    commit_author_name=$(git show -s --format="%an" "$commit_id")
    commit_author_date=$(git show -s --format="%ad" --date=short "$commit_id")

    # 3. For each matching commit, list the files it modified.
    #    --root makes diff-tree list files for a root (parentless) commit too.
    git diff-tree --root --no-commit-id --name-only -r "$commit_id" | while IFS= read -r file_path; do
        # Skip empty lines.
        if [ -n "$file_path" ]; then
            # 4. Get the diff for THIS file in THIS commit, then let grep
            #    (with -E for extended regex) pick out the added/deleted
            #    lines that contain the keyword.
            git show --pretty="format:" --unified=0 "$commit_id" -- "$file_path" | \
                grep --color=never -E "$GREP_PATTERN" | \
                while IFS= read -r matched_line; do
                    # 5. Determine the change type and extract the line text.
                    #    printf is used instead of echo so that backslashes in
                    #    the matched line are passed through unchanged.
                    change_char=$(printf '%s\n' "$matched_line" | cut -c1)
                    line_text=$(printf '%s\n' "$matched_line" | cut -c2-)  # Text after the +/- marker

                    if [ "$change_char" = "+" ]; then
                        change_type="[ADDITION]"
                    elif [ "$change_char" = "-" ]; then
                        change_type="[DELETION]"
                    else
                        change_type="[???]"  # Should not happen because of GREP_PATTERN
                    fi

                    # 6. Print the collected information, including date and author.
                    printf '%s\n' "$commit_id [$commit_author_date, $commit_author_name] $file_path $change_type: $line_text"
                done
        fi
    done
done

echo "Search completed for '$KEYWORD'."
Argument Parsing: The script first checks if exactly one argument (the keyword) is provided. If not, it prints a usage message and exits.
Initial Commit Search: git log --all --pretty="format:%H" -G"$KEYWORD" searches all branches for commits where the diff's patch text contains the specified KEYWORD. The -G option treats the keyword as a regular expression. The command outputs only the commit hashes (%H).
Author and Date Fetching: For each commit_id found, git show -s --format="%an" and git show -s --format="%ad" --date=short are used to retrieve the author's name and the authoring date (formatted as YYYY-MM-DD), respectively. The -s option suppresses diff output, making these calls efficient.
File Iteration: git diff-tree, run with --no-commit-id --name-only -r on "$commit_id", lists all files modified in the current commit.
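Diff Inspection: For each modified file, git show --pretty="format:" --unified=0 "$commit_id" -- "$file_path" displays the diff (patch) for that specific file within that commit. --unified=0 omits context lines, so the output contains only the diff headers and the changed lines.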
Line Matching: The output of git show is piped to grep --color=never -E "$GREP_PATTERN".
GREP_PATTERN ('^[+-].*'"$KEYWORD") searches for lines starting with + or - (indicating added or removed lines) that contain the KEYWORD.
--color=never ensures that grep doesn't output color codes if it is configured to colorize by default, which would interfere with text parsing.
-E enables extended regular expressions for the pattern.
Line Processing: Each matching line found by grep is processed:
The first character (+ or -) is extracted using cut -c1 to determine if it's an [ADDITION] or [DELETION].
The rest of the line text (after the +/-) is extracted using cut -c2-.
Output: Finally, all the collected information (commit ID, date, author, file path, change type, and line text) is printed to the console.
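One caveat of the Line Matching step: the file header lines of a diff (--- a/path and +++ b/path) also begin with - or +, so a keyword that occurs in a file path can produce a spurious match. The following is a minimal sketch of a stricter filter, assuming awk is available. It only looks at lines that follow an @@ hunk header, and it matches the keyword as a literal substring rather than as a regex. The function name filter_hunk_lines and the variable KEYWORD_LITERAL are hypothetical and not part of the script above.

```sh
#!/bin/sh
# Hypothetical helper (an illustration, not part of the original script).
# Reads the unified diff of one or more files on stdin and prints the
# added/removed lines inside hunks that contain the literal keyword "$1".
filter_hunk_lines() {
    # The keyword is passed through the environment so that awk does not
    # interpret backslashes in it (as it would with -v).
    KEYWORD_LITERAL="$1" awk '
        BEGIN { kw = ENVIRON["KEYWORD_LITERAL"] }
        /^@@/          { in_hunk = 1; next }   # hunk header: changed lines follow
        /^diff --git / { in_hunk = 0 }         # next file: back to header lines
        !in_hunk       { next }                # skip "--- a/..." and "+++ b/..."
        /^[+-]/ && index(substr($0, 2), kw) > 0 {
            type = (substr($0, 1, 1) == "+") ? "[ADDITION]" : "[DELETION]"
            print type ": " substr($0, 2)
        }
    '
}
```

Under these assumptions it could take the place of the grep and cut steps, for example: git show --pretty="format:" --unified=0 "$commit_id" -- "$file_path" | filter_hunk_lines "$KEYWORD"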
Save the Script: Copy the script above into a new file in your project directory or a directory in your PATH. Let's name it git_search_diff.sh.
Make it Executable: Open your terminal, navigate to where you saved the file, and run:
chmod +x git_search_diff.sh
Run the Script: Execute the script from the root directory of your Git repository, providing the keyword you want to search for as an argument.
./git_search_diff.sh "your_keyword_here"
For example, to search for the keyword "API_KEY":
./git_search_diff.sh "API_KEY"
Or to search for "Netflix":
./git_search_diff.sh "Netflix"
The output will look something like this:
Searching for commits containing 'your_keyword_here' in diffs...
abcdef1234567890abcdef1234567890abcdef12 [2023-05-15, John Doe] src/example.js [ADDITION]: // TODO: Integrate your_keyword_here for new feature
fedcba0987654321fedcba0987654321fedcba09 [2022-11-01, Jane Smith] config/settings.py [DELETION]: OLD_API_KEY_FORMAT_WITH_your_keyword_here = "..."
...
Search completed for 'your_keyword_here'.
Keyword as Regex: Both git log -G"$KEYWORD" and grep -E "$GREP_PATTERN" treat the provided keyword as a regular expression. If your keyword contains special regex characters (e.g., ., *, +, ?, [], (), \) and you want to search for them literally, you'll need to escape them when providing the argument (e.g., \. for a literal dot).
Date Format: The script uses --date=short for a YYYY-MM-DD date format. You can change this in the commit_author_date=$(git show ...) line to other formats like --date=iso, --date=rfc2822, or --date=relative if you prefer.
Performance: On very large repositories with extensive histories, the script might take some time to run as it iterates through commits and files and executes multiple Git commands.
Shell Compatibility: The script uses #!/bin/sh and standard POSIX utilities like grep, cut, and echo, so it should be broadly compatible across different Unix-like systems. The one exception is grep's --color option, a non-POSIX extension that GNU and BSD grep support.
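For the Keyword as Regex note above, the escaping can also be done automatically rather than by hand. The following is a minimal sketch, assuming a POSIX sed. The helper name escape_ere is hypothetical and not part of the script.

```sh
#!/bin/sh
# Hypothetical helper (an illustration, not part of the original script).
# Prints "$1" with every extended-regex metacharacter prefixed by a
# backslash, so that the result matches the original text literally.
escape_ere() {
    # Bracket expression contents: ] [ \ . * ^ $ + ? ( ) { } |
    printf '%s\n' "$1" | sed 's/[][\.*^$+?(){}|]/\\&/g'
}
```

A literal search for v1.0 could then be run as ./git_search_diff.sh "$(escape_ere 'v1.0')", which passes v1\.0 to the script.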
This script provides a powerful way to pinpoint exactly where and how specific keywords were introduced or removed in your project's history, along with valuable contextual information.