Package: git-buildpackage
Version: 0.9.30
Severity: normal
User: de...@kali.org
Usertags: origin-kali

Dear Maintainer,

In Kali Linux, we package an upstream that uses Git LFS to store a big file (a GeoIP database). The upstream is at: https://github.com/rsmusllp/king-phisher

In a previous version, upstream versioned the database "as is": it was a regular file in the Git repo. In a subsequent version, they switched to Git LFS to store this file.

Gbp doesn't handle this transition well. Apparently this is due to the combination of:

  * "gbp clone" disabling Git attributes (hence Git LFS)
  * "gbp import-orig" doing no such thing

I'm the person who updated this package, so my local copy of king-phisher doesn't have the Git attributes disabled, and everything works fine for me. However, other folks who clone the repo complain, as it leads to an unclean Git checkout, and I don't know what the way forward is.

For a longer (and hopefully crystal-clear) explanation, I prepared a Git repo and a walkthrough to reproduce the issue. There we go :)

Let's first clone the king-phisher package *before* upstream switched to Git LFS:

    $ gbp clone https://gitlab.com/arnaudr/king-phisher.git
    $ cd king-phisher
    $ cat .gitattributes
    cat: .gitattributes: No such file or directory
    $ ls -l data/server/king_phisher/GeoLite2-City.mmdb
    -rw-r--r-- 1 arno arno 61615395 Dec 15 11:53 data/server/king_phisher/GeoLite2-City.mmdb

So at this point, the file GeoLite2-City.mmdb is versioned "as is": it is a regular file.

Now let's update the package to the latest Git snapshot:

    $ gbp import-orig --uscan
    gbp:info: Launching uscan...
    Downloading data/server/king_phisher/GeoLite2-City.mmdb (62 MB)
    gbp:info: Using uscan downloaded tarball ../king-phisher_1.15.0+git20221107.orig.tar.xz
    What is the upstream version? [1.15.0+git20221107]
    gbp:info: Importing '../king-phisher_1.15.0+git20221107.orig.tar.xz' to branch 'upstream'...
    gbp:info: Source package is king-phisher
    gbp:info: Upstream version is 1.15.0+git20221107
    gbp:info: Replacing upstream source on 'kali/master'
    gbp:info: Successfully imported version 1.15.0+git20221107 of ../king-phisher_1.15.0+git20221107.orig.tar.xz

The line "Downloading data/server/king_phisher/GeoLite2-City.mmdb (62 MB)" comes from git lfs, which is downloading the file.

And here's the situation now:

    $ cat .gitattributes
    *.mmdb filter=lfs diff=lfs merge=lfs -text
    $ cat .git/info/attributes
    cat: .git/info/attributes: No such file or directory
    $ ls -l data/server/king_phisher/GeoLite2-City.mmdb
    -rw-r--r-- 1 arno arno 61615395 Dec 15 11:56 data/server/king_phisher/GeoLite2-City.mmdb

So we can see the git lfs filter in .gitattributes, and we can see that .git/info/attributes doesn't exist (more on that below).

Let's push that work (I prepared a fork to push changes):

    $ git remote add arnaudr2 g...@gitlab.com:arnaudr/king-phisher2.git
    $ git push arnaudr2 : --follow-tags
    Locking support detected on remote "arnaudr2". Consider enabling it with:
      $ git config lfs.https://gitlab.com/arnaudr/king-phisher2.git/info/lfs.locksverify true
    Locking support detected on remote "arnaudr2". Consider enabling it with:
      $ git config lfs.https://gitlab.com/arnaudr/king-phisher2.git/info/lfs.locksverify true
    Locking support detected on remote "arnaudr2". Consider enabling it with:
      $ git config lfs.https://gitlab.com/arnaudr/king-phisher2.git/info/lfs.locksverify true
    Locking support detected on remote "arnaudr2". Consider enabling it with:
      $ git config lfs.https://gitlab.com/arnaudr/king-phisher2.git/info/lfs.locksverify true
    Uploading LFS objects: 100% (1/1), 62 MB | 3.4 MB/s, done.
    Enumerating objects: 112, done.
    Counting objects: 100% (82/82), done.
    Delta compression using up to 8 threads
    Compressing objects: 100% (46/46), done.
    Writing objects: 100% (49/49), 19.06 KiB | 19.06 MiB/s, done.
    Total 49 (delta 29), reused 5 (delta 0), pack-reused 0
    remote:
    remote: To create a merge request for pristine-tar, visit:
    remote:   https://gitlab.com/arnaudr/king-phisher2/-/merge_requests/new?merge_request%5Bsource_branch%5D=pristine-tar
    remote:
    remote:
    remote: To create a merge request for upstream, visit:
    remote:   https://gitlab.com/arnaudr/king-phisher2/-/merge_requests/new?merge_request%5Bsource_branch%5D=upstream
    remote:
    To gitlab.com:arnaudr/king-phisher2.git
       c5db68b..dbf4ce7  kali/master -> kali/master
       d9ec6a5..e4e9390  pristine-tar -> pristine-tar
       be63910..f4f0fae  upstream -> upstream
     * [new tag]         upstream/1.15.0+git20221107 -> upstream/1.15.0+git20221107

And now, the issue: when we clone this repo with gbp, the resulting repo is not clean. Let's try:

    $ gbp clone -v g...@gitlab.com:arnaudr/king-phisher2.git
    gbp:debug: ['git', 'rev-parse', '--show-cdup']
    gbp:info: Cloning from 'g...@gitlab.com:arnaudr/king-phisher2.git'
    gbp:debug: ['git', 'clone', '--quiet', 'g...@gitlab.com:arnaudr/king-phisher2.git']
    gbp:debug: ['git', 'rev-parse', '--show-cdup']
    gbp:debug: ['git', 'rev-parse', '--is-bare-repository']
    gbp:debug: ['git', 'rev-parse', '--git-dir']
    gbp:debug: ['git', 'rev-parse', '--show-cdup']
    gbp:debug: ['git', 'rev-parse', '--is-bare-repository']
    gbp:debug: ['git', 'rev-parse', '--git-dir']
    gbp:debug: Will track branches: ['kali/master', 'upstream', 'pristine-tar']
    gbp:debug: ['git', 'show-ref', '--verify', 'refs/remotes/origin/kali/master']
    gbp:debug: ['git', 'show-ref', '--verify', 'refs/heads/kali/master']
    gbp:debug: ['git', 'show-ref', '--verify', 'refs/remotes/origin/upstream']
    gbp:debug: ['git', 'show-ref', '--verify', 'refs/heads/upstream']
    gbp:debug: ['git', 'branch', 'upstream', 'origin/upstream']
    gbp:debug: ['git', 'show-ref', '--verify', 'refs/remotes/origin/pristine-tar']
    gbp:debug: ['git', 'show-ref', '--verify', 'refs/heads/pristine-tar']
    gbp:debug: ['git', 'branch', 'pristine-tar', 'origin/pristine-tar']
    gbp:debug: ['git', 'show-ref', '--verify', 'refs/remotes/kali/master']
    gbp:debug: ['git', 'config', 'user.name', 'Arnaud Rebillout']
    gbp:debug: ['git', 'config', 'user.email', 'arna...@kali.org']
    gbp:debug: ['git', 'ls-tree', '-z', '-r', '-l', 'HEAD', '--']
    gbp:debug: Found non-empty .gitattributes: b'.gitattributes'
    gbp:debug: Configuring Git attributes
    $ cd king-phisher2
    $ git status
    On branch kali/master
    Your branch is up to date with 'origin/kali/master'.

    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
            modified:   data/server/king_phisher/GeoLite2-City.mmdb

    no changes added to commit (use "git add" and/or "git commit -a")
    $ cat .gitattributes
    *.mmdb filter=lfs diff=lfs merge=lfs -text
    $ cat .git/info/attributes
    # Added by git-buildpackage to disable .gitattributes found in the upstream tree
    [attr]dgit-defuse-attrs -text -eol -crlf -ident -filter -working-tree-encoding
    * -export-ignore
    * dgit-defuse-attrs
    $ ls -l data/server/king_phisher/GeoLite2-City.mmdb
    -rw-r--r-- 1 arno arno 61615395 Dec 15 12:12 data/server/king_phisher/GeoLite2-City.mmdb

As we can see above (my interpretation):

  * during the 'gbp clone' step, the 'git clone' command will actually trigger git lfs, and download the GeoLite2 database (assuming you have the package git-lfs installed on your machine).
  * then at the end of the gbp clone operation, we can see "Configuring Git attributes", and this is when gbp creates the file .git/info/attributes
  * as a result, the git repo is in an unclean state

To bring the Git repo back into shape, we can either:

1) Undo what gbp just did:

    $ rm -fr .git/info/attributes

2) Undo what git lfs did:

    $ git checkout data/server/king_phisher/GeoLite2-City.mmdb
    Updated 1 path from the index
    $ cat data/server/king_phisher/GeoLite2-City.mmdb
    version https://git-lfs.github.com/spec/v1
    oid sha256:a253d9cd68fe17b00087da24375f31f07cd4bb3852dc5fe3afe37b8f59e5abd0
    size 61615395

As we can see with option 2), the LFS file becomes a short metadata file, because that's what's really in the Git repo, before "git lfs" replaces it with the "real file" that it fetches from somewhere else.

== Questions

How should Git LFS files be handled? When "gbp clone" disables the gitattributes, it disables Git LFS in turn: is that intended or not? Does gbp have an opinion on that?

In any case, it seems that disabling the gitattributes after 'git clone' has run is too late, because the Git LFS objects were already fetched.

Thanks for reading, and please help me understand how we should handle those LFS files.

Arnaud
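
P.S. To make the two states in option 2) concrete, here is a minimal sketch (not gbp code, and not a proposed patch) that tells whether some bytes look like a Git LFS pointer, i.e. the short metadata file shown above, rather than the real payload. The function name parse_lfs_pointer is hypothetical, and the 1024-byte cap is my reading of the Git LFS pointer format, so treat both as assumptions.

```python
import re

# First line of the pointer file, as shown by "cat" in the walkthrough above.
LFS_SPEC_LINE = "version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(data: bytes):
    """Return {'oid': ..., 'size': ...} if data looks like an LFS pointer, else None.

    Hypothetical helper for illustration only.
    """
    # Assumption: pointer files are small text files (under 1024 bytes),
    # unlike the 61 MB GeoLite2 database itself.
    if len(data) >= 1024:
        return None
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return None
    lines = text.splitlines()
    if not lines or lines[0] != LFS_SPEC_LINE:
        return None
    # Remaining lines are "key value" pairs such as "oid sha256:..." and "size N".
    fields = dict(line.split(" ", 1) for line in lines[1:] if " " in line)
    oid = re.fullmatch(r"sha256:([0-9a-f]{64})", fields.get("oid", ""))
    size = re.fullmatch(r"[0-9]+", fields.get("size", ""))
    if oid is None or size is None:
        return None
    return {"oid": oid.group(1), "size": int(size.group(0))}


# The pointer content from option 2) above.
pointer = (
    b"version https://git-lfs.github.com/spec/v1\n"
    b"oid sha256:a253d9cd68fe17b00087da24375f31f07cd4bb3852dc5fe3afe37b8f59e5abd0\n"
    b"size 61615395\n"
)
info = parse_lfs_pointer(pointer)
print(info)
```

With a check like this, one could compare what sits in the working tree (real file or pointer) against what the index holds, which is exactly the mismatch that "git status" reports after "gbp clone".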