Analyzing history
The six pre-built pgit analyses (churn, coupling, hotspots, authors, activity, bus-factor) with their filters, sorting, and output formats.
Analyzing history
pgit analyze wraps optimized SQL behind one-word subcommands. Each one answers a common question about a codebase's history, and each is tuned for pgit's storage so you do not have to think about delta chains. There are six.
Files changed in the most commits.
File pairs that change together.
Churn rolled up by directory.
Commits per contributor.
Commit volume over time.
Files only one person touches.
Shared options
Every subcommand accepts the same core flags:
| Flag | Default | Effect |
|---|---|---|
--limit, -n |
25 | Maximum rows to show |
--path, -p |
(none) | Glob filter on file paths, for example src/**/*.go |
--sort |
per-analysis | Sort by a named column |
--reverse |
off | Flip the sort order |
--json |
off | Emit a JSON array |
--raw |
off | Tab-separated output, for piping |
--no-pager |
off | Plain table instead of the interactive viewer |
--remote |
(local) | Run against a configured remote database |
--timeout |
5m | Abort if the analysis runs longer |
By default results open in an interactive table you can search, sort, expand columns, and copy from. --json and --raw switch to non-interactive output for scripts.
--path "src/**/*.go" matches .go files at any depth under src/. A plain * matches within a single path segment.
Every analysis takes --remote <name> to run against a configured remote database instead of your local one, for example pgit analyze churn --remote origin. The query runs where the data lives, so you can interrogate a shared database without pulling it down. See Remotes.
churn
Rank files by how many commits touched them. Maintenance hotspots tend to surface here.
pgit analyze churn --limit 20Columns: path, versions. This query reads only heap tables, so it stays fast on large repos. Default sort is versions descending.
coupling
Find file pairs most often modified in the same commit. High coupling between unrelated files hints at a missing abstraction or a hidden dependency.
pgit analyze coupling --min 5Columns: file_a, file_b, commits_together. Two extra flags shape it:
--min(default 2) sets the minimum co-change count to include a pair.--max-files(default 100) skips commits that touch more than that many files, because a tree-wide reformat couples everything to everything and drowns out real signal.
The pair counting runs in Go rather than as a SQL self-join, which keeps it tractable on millions of file references.
hotspots
Aggregate churn by directory, so you see which subsystems accumulate change rather than which individual files do.
pgit analyze hotspots --depth 2Columns: directory, files, total_versions, avg_versions. --depth (default 1) controls how many directory levels to roll up to; depth 1 is top-level, depth 2 splits one level deeper. Files at the repository root are grouped under (root).
authors
Rank contributors by commit count, with the span of their activity.
pgit analyze authorsColumns: author, email, commits, first_commit, last_commit. This one reads commit metadata, so it streams pgit_commits once front-to-back (the cheapest way to read a delta chain). On a very large history it takes seconds rather than milliseconds.
activity
Commit counts bucketed by time period, with empty periods included so the timeline has no gaps.
pgit analyze activity --period month --chartColumns: period, commits, and optionally bar. Options:
--periodisweek,month(default),quarter, oryear.--chartadds an ASCII bar column for a quick visual.
Output is chronological, so --sort does not apply here; use --reverse to flip oldest-first to newest-first.
bus-factor
For each file, count the distinct authors who have ever touched it. Files with a single author are knowledge silos: a bus-factor risk.
pgit analyze bus-factor --max-authors 1Columns: path, authors, author_list. --max-authors (default 0, meaning no cap) shows only files with at most that many authors, so --max-authors 1 lists the pure silos. Results sort by author count ascending, most vulnerable first.
Output for scripts and agents
Anything you can see, you can pipe. --json gives structured rows; --raw gives tab-separated values:
pgit analyze churn --json | jq '.[0]'pgit analyze coupling --raw | awk -F'\t' '$3 > 50'When the built-in analyses do not answer your exact question, drop to SQL.
Where to go next
Write your own queries and search across history.
The tables and columns these analyses read.
Did this work on your setup?
Not rated yet