Quickstart
Import a real git repository into pgit and run your first analyses and SQL queries, start to finish, in about ten minutes.
The fastest way to understand pgit is to point it at a repository you already know and start asking questions. In this walkthrough you import a real git repo, run a couple of built-in analyses, drop into SQL, and search across history. By the end you will have the whole loop in your hands.
You need pgit installed and a working Docker or Podman installation. If pgit doctor is happy, you are set. If not, start at Installation.
1. Get a repository to play with
Any git repo works. If you do not have one handy, clone a small public project:
    git clone https://github.com/junegunn/fzf
2. Initialize a pgit workspace
Make a separate directory for the analysis and initialize pgit there. Keeping it apart from the source repo is tidy, not required.
    mkdir fzf-analysis && cd fzf-analysis
    pgit init
pgit init creates a .pgit/ directory for local config and detects your container runtime. It does not start the database yet; the next command does that for you.
3. Import the history
Point import at the git repo and pick a branch:
    pgit import ../fzf --branch master
A few things happen on this first run:
- the local pg-xpatch container starts (and the image is pulled if needed),
- pgit runs git fast-export to read the full history, including merges, renames, and real author dates,
- a worker pool streams the content into PostgreSQL with delta compression.
Leave off --branch and pgit imports the current branch, or shows a picker when there is more than one. Pass --branch <name> to skip straight to one.
When it finishes, every version of every file is in the database, ready to query.
4. Run your first analyses
The analyze commands wrap optimized queries behind one-word subcommands. Start with churn, the files that changed the most:
    pgit analyze churn
Then coupling, the files that tend to change together:
    pgit analyze coupling
    file_a          file_b         commits_together
    ─────────────── ────────────── ────────────────
    CHANGELOG.md    man/man1/fzf.1              311
    src/terminal.go src/options.go              288
    man/man1/fzf.1  src/options.go              276
That is the real top of fzf's coupling at the time of writing. Your output depends on the repo you imported and when, and results open in an interactive viewer you can search, sort, and copy from. Add --json or --raw to pipe the data somewhere else.
Every analysis takes --limit, a --path glob filter, and sort flags. Analyzing history covers all six.
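The shared flags can be combined on one command line. The following is a minimal sketch, not a documented recipe: it assumes --limit takes a row count and --path takes a quoted glob, and the values 20 and src/*, as well as the output file name churn.json, are illustrative choices rather than anything pgit prescribes.

```shell
# Illustrative values (assumptions); adjust them to your repository.
LIMIT=20
PATH_GLOB='src/*'
OUT=churn.json

# Guarded so the snippet is a no-op where pgit is not installed.
if command -v pgit >/dev/null 2>&1; then
  # --json is the flag the text above names for piping data elsewhere;
  # here the output is redirected to a file instead of the viewer.
  pgit analyze churn --limit "$LIMIT" --path "$PATH_GLOB" --json > "$OUT"
else
  echo "pgit not found; skipping"
fi
```

Quoting the glob keeps the shell from expanding it before pgit sees it.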
5. Drop into SQL
When a built-in analysis is not exactly what you want, the data is all in plain tables. Ask for the ten most recent commits:
    pgit sql "SELECT id, author_name, message FROM pgit_commits ORDER BY seq DESC LIMIT 10"
Not sure what is in there? pgit documents its own schema:
    pgit sql schema                # all tables
    pgit sql schema pgit_commits   # one table's columns
    pgit sql examples              # ready-to-run example queries
SQL is read-only by default, so you cannot corrupt the import by exploring. Querying with SQL goes deeper.
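Because the data sits in plain tables, ordinary aggregates work too. As a small sketch, the query below counts commits per author. It uses only the pgit_commits table and the author_name column that appear in the query above; the QUERY variable name and the limit of 10 are arbitrary choices for illustration.

```shell
# Top authors by commit count, using only names shown earlier in this section.
QUERY='SELECT author_name, COUNT(*) AS commits FROM pgit_commits GROUP BY author_name ORDER BY commits DESC LIMIT 10'

# Guarded so the snippet is safe to paste where pgit is not installed.
if command -v pgit >/dev/null 2>&1; then
  pgit sql "$QUERY"
else
  echo "pgit not found; would run: pgit sql \"$QUERY\""
fi
```

Keeping the SQL in a single-quoted variable avoids escaping problems once a query grows beyond one line.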
6. Search across every version
Unlike git grep, pgit can search history, not just the current checkout:
    pgit search "TODO" --path "*.go"            # latest version of each file
    pgit search --all "panic\(" --ignore-case   # every version ever (the pattern is a regex, so escape the paren)
What you just did
Initialized a pgit workspace with pgit init.
Imported a full git history with pgit import.
Ran analyze churn and analyze coupling.
Queried the raw tables with pgit sql.
Searched across all of history with pgit search.
That is the core loop: import, then ask.
Optional: the native git-like workflow
pgit can also record commits itself, no import required. This is the secondary use case, but it is there when you want it:
    pgit config user.name "Your Name"
    pgit config user.email "you@example.com"
    pgit add .                        # stage everything (-A includes untracked)
    pgit commit -m "Initial commit"   # or omit -m to open your editor
    pgit log
Remember that imported history is append-only, so this workflow is best on a repo you started in pgit rather than one you imported.
Where to go next
The tables behind these commands, and why the queries are shaped this way.
Branches, workers, resuming, and importing straight into a remote.
All six analyses, with filters and output formats.
Custom queries, the schema, and search.