Versioning Large Files with Git LFS
I’ve written before about the advantages of keeping content in version control. A quick recap of the benefits:
- Content is portable - just
- Full content history is available; you can roll back changes and recover earlier revisions
- Content updates can take advantage of the same workflows popular with code: pull request→review→merge
While the benefits are significant, one of the major pain points of this approach arises when large non-text files are introduced to your content. Given that interesting content is often media-heavy, this can be a real problem. Enter Git LFS.
Problems With Versioning Binary Files
Git is very good at handling text files efficiently, but not so great with files like images, videos, and other binary formats that can’t be expressed in plaintext. Images don’t compress as easily as text, and a large image can be orders of magnitude larger than even a very long text file, simply because an image carries far more raw data than text does.
Git is clever, but it may not be as magic as you think. Everything you commit to a repository is in there somewhere, so if you commit lots of big files, you will end up with a large repo that takes longer to clone and is slower at performing Git operations.
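To make this concrete, here is a minimal sketch that commits two revisions of a 1 MiB binary file to a throwaway repository and then checks what ended up in history. The file name photo.bin, the random data standing in for an image, and the demo identity are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Work in a throwaway directory so no real repository is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init --quiet .
git config user.email "demo@example.com"   # placeholder identity for the demo
git config user.name "Demo"

# Commit two unrelated 1 MiB "images" under the same file name.
# Random bytes stand in for image data because they barely compress.
for revision in 1 2; do
  head -c 1048576 /dev/urandom > photo.bin
  git add photo.bin
  git commit --quiet -m "Revision $revision of photo.bin"
done

# The working tree holds one copy, but history keeps a blob per revision.
blob_count="$(git rev-list --objects --all | grep -c 'photo.bin')"
repo_kib="$(du -sk .git | cut -f 1)"

echo "Blobs in history for photo.bin: $blob_count"
echo ".git directory size: ${repo_kib} KiB"
```

Only one photo.bin is visible in the working tree, yet the .git directory carries both revisions, and everyone who clones the repo downloads both.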
Git Large File Storage
Git is a distributed version control system, meaning that there is no canonical “master” repository. Every cloned copy of the repo has equal importance in the eyes of Git.
Of course, it’s convenient for teams to use Git in a more centralized manner; a copy of the repo hosted on a cloud provider such as GitHub serves as the canonical copy, the source of truth.
Git LFS takes advantage of this centralized pattern. Large files are stored in the cloud, and these files are referenced via pointers in local copies of the repo. When a pull occurs, the appropriate version of the file is downloaded from the remote. Updates to a large file will still create multiple copies of the file, but these copies will be stored on the cloud and won’t have to be downloaded by everyone who clones the repo.
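The pointers mentioned above are small text files that record a spec version, a SHA-256 hash of the real file, and its size in bytes. The sketch below rebuilds that three-line layout by hand to show what Git stores in place of the large file. The helper name make_pointer is hypothetical, the script assumes sha256sum from GNU coreutils is available, and in real use Git LFS generates these pointers for you.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Print a Git LFS style pointer for the file given as the first argument.
# make_pointer is a hypothetical helper for illustration only.
make_pointer() {
  local file="$1"
  local oid size
  oid="$(sha256sum "$file" | cut -d ' ' -f 1)"   # SHA-256 of the file contents
  size="$(wc -c < "$file" | tr -d ' ')"          # size of the file in bytes
  printf 'version https://git-lfs.github.com/spec/v1\n'
  printf 'oid sha256:%s\n' "$oid"
  printf 'size %s\n' "$size"
}

# Example: a 5-byte stand-in for a large image.
sample="$(mktemp)"
printf 'hello' > "$sample"
make_pointer "$sample"
```

However large the original file is, the pointer stays around a hundred bytes or so, which is why clones stay small.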
Setting Up LFS in Your Git Repo
Git LFS is easy to set up and works transparently on your repository. Once you configure Git LFS in your repo, you can continue to commit and push large files as you would normally.
- Download and install the Git LFS extension from the Git LFS website.
- Navigate to your repository and run git lfs install.
- Run git lfs track followed by the file pattern you want. To track all PNG files, for example, run git lfs track "*.png".
- Commit the .gitattributes file as well as any existing files that are now tracked in LFS, and push the changes to your remote.
That’s it! You’re now using Git LFS to handle your large binary files.
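The steps above can be sketched as a small shell function. This assumes the Git LFS extension is already installed and that you run it from inside a Git repository. The function name track_with_lfs, the images/ directory, and the origin and main names in the usage comments are assumptions for illustration.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Enable Git LFS in the current repository and track files matching a pattern.
# track_with_lfs is a hypothetical helper wrapping the steps listed above.
track_with_lfs() {
  local pattern="$1"

  git lfs install              # set up the LFS filters and hooks
  git lfs track "$pattern"     # records the pattern in .gitattributes
  git add .gitattributes
  git commit -m "Track $pattern with Git LFS"
}

# Hypothetical usage, run from the root of your repository:
#   track_with_lfs "*.png"
#   git add images/            # PNGs added from now on are stored via LFS
#   git commit -m "Add images"
#   git push origin main       # remote and branch names are assumptions
```

After tracking "*.png", the .gitattributes file contains a line like `*.png filter=lfs diff=lfs merge=lfs -text`, which is what tells Git to hand matching files to LFS.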
Forestry Now Supports Git LFS
Forestry is now able to handle files stored in Git LFS correctly, so this is an excellent and simple solution if you want to use the image transformation functionality available in Hugo or Gatsby. Since Git LFS works transparently on your repository, if you’re already using the default media library setting of Commit Media to Repository, you don’t need to do anything else to take advantage of Git LFS in Forestry.
Images stored in GitLab and Bitbucket private repositories will not be shown in Forestry due to current limitations: https://gitlab.com/gitlab-org/gitlab-ce/issues/45149