Importing a git repository
Import a git history into pgit in depth, covering branches, parallel workers, resuming an interrupted run, and importing straight into a remote.
pgit import is the command you will reach for most. It reads a git repository's full history and writes it into pgit's tables with delta compression. This guide covers the options that matter once you move past the basics.
The basics
Run import from inside a pgit workspace (pgit init first) and point it at a git repo:
    pgit init
    pgit import /path/to/repo --branch main
The path argument defaults to the current directory, so pgit import with no path imports the repo you are standing in. The target must contain a .git directory.
Under the hood, pgit runs git fast-export (with --reencode=yes --show-original-ids) so it gets correct handling of merges, renames, and full commit messages, then streams the content into PostgreSQL through a pool of workers. Locally, the database container starts automatically if it is not already running.
Choosing a branch
    pgit import /path/to/repo --branch develop
If you leave --branch off, pgit imports the current branch, or shows an interactive picker if the repo has several branches. Each import brings in one branch's history.
Parallel workers
Blob import is parallelized. Control the worker count with --workers (-w):
    pgit import /path/to/repo --workers 8
The default comes from import.workers in your global config (a conservative 3 on a typical laptop), and pgit caps the effective count at the number of CPU cores, since more workers than cores only adds contention. On a big machine, raising both the config default and this flag is the single biggest lever on import speed.
Worker count is only half the story. For a multi-million-commit import, the container's memory and xpatch cache settings matter just as much. See Configuration and tuning for a worked profile.
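As a rough illustration of the cap described above, the helper below computes the smaller of the requested worker count and the core count. The function name effective_workers is hypothetical and not part of pgit; it only mirrors the documented rule so you can see what a given --workers value would effectively become on the current machine.

```shell
# Hypothetical helper, not part of pgit: mirrors the documented rule that
# the effective worker count is capped at the number of CPU cores.
# Usage: effective_workers REQUESTED [CORES]
effective_workers() {
  ew_requested=$1
  # Detect the core count unless one is passed explicitly (handy for testing).
  ew_cores=${2:-$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)}
  if [ "$ew_requested" -gt "$ew_cores" ]; then
    echo "$ew_cores"
  else
    echo "$ew_requested"
  fi
}

# Print (without running) the import command for a request of 8 workers.
echo "pgit import /path/to/repo --workers $(effective_workers 8)"
```

Passing a value above the core count is harmless under this rule; it simply has no extra effect.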
Resuming an interrupted import
Large imports can take a while, and pgit tracks its progress so you do not have to start over. If a run is interrupted, re-running import tells you what state the database is in. To continue from where it stopped:
    pgit import /path/to/repo --resume
Resume picks up the blob phase when commits are already in, or skips already-inserted commits otherwise. The states pgit distinguishes:
| Situation | What pgit does |
|---|---|
| Database empty | Normal import |
| Commits inserted, blobs incomplete | --resume continues the blob phase |
| Import already complete | Refuses, unless you pass --force |
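For unattended runs, the resume behaviour in the table can be wrapped in a small retry loop. The sketch below is an assumption-laden illustration, not a pgit feature: import_with_resume and PGIT_BIN are hypothetical names, and it assumes an interrupted import exits with a non-zero status. It never passes --force, so a finished import is left untouched.

```shell
# Hypothetical wrapper, not part of pgit. PGIT_BIN can be overridden
# (for example with a stub) and defaults to the real CLI.
PGIT_BIN=${PGIT_BIN:-pgit}

# Usage: import_with_resume REPO_PATH [MAX_RESUMES]
import_with_resume() {
  iwr_repo=$1
  iwr_retries=${2:-3}

  # First attempt: a normal import.
  "$PGIT_BIN" import "$iwr_repo" && return 0

  # Assumption: an interrupted import exits non-zero, so each retry
  # passes --resume to continue from the recorded progress.
  iwr_attempt=1
  while [ "$iwr_attempt" -le "$iwr_retries" ]; do
    "$PGIT_BIN" import "$iwr_repo" --resume && return 0
    iwr_attempt=$((iwr_attempt + 1))
  done

  # Every attempt failed; leave the decision to the operator.
  return 1
}
```

Calling import_with_resume /path/to/repo 5 would try one normal import and then up to five resumes. If the failure has another cause, such as an already complete import being refused, the retries fail as well and the function returns 1.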
Re-importing and --force
Imported history is immutable, so pgit will not quietly overwrite a finished import. To wipe the database and start fresh, pass --force:
    pgit import /path/to/repo --force
pgit's storage schema is versioned. When you upgrade pgit to a version with a newer schema, an existing database is rejected with a message telling you to re-import (pgit import --force). This is by design: the layout is append-only and optimized per version, so there is no in-place migration.
Importing straight into a remote
By default import writes to your local container. To import directly into a remote pg-xpatch database (skipping the local container entirely), name a configured remote:
    pgit remote add origin postgres://user:pass@host:5432/mydb
    pgit import /path/to/repo --remote origin
pgit initializes the schema on the remote if needed. This is useful for populating a shared database without a local copy. See Remotes for the connection model.
Reusing a fast-export file
If you already have a git fast-export stream (or want to export once and import several times), skip the re-export with --fastexport:
    git fast-export --reencode=yes --show-original-ids master > repo.stream
    pgit import /path/to/repo --fastexport repo.stream
Other flags
--dry-run (-n) reports what would be imported without writing anything.
--timeout bounds the whole operation (default 24h); raise it for very large repos, for example --timeout 48h.
What gets stored
After import you have the full history in queryable tables: commits, every file version, the path-to-group mapping, and the commit graph. From here the repository is read-only and ready for analysis.
Where to go next
Run the pre-built analyses on what you just imported.
Ask anything the built-ins do not cover.
Make a large import faster.