Package: git-buildpackage
Version: 0.9.30
Severity: normal
User: de...@kali.org
Usertags: origin-kali

Dear Maintainer,

In Kali Linux, we package an upstream that uses Git LFS to store a big
file (a GeoIP database). The upstream is at:
https://github.com/rsmusllp/king-phisher

In earlier versions, upstream committed the database "as is", as a
regular file in the Git repo. Then, in a later version, they switched
to storing this file with Git LFS.

Gbp doesn't handle this transition well. As far as I can tell, this is
due to the combination of:
* "gbp clone" disabling Git attributes (and hence Git LFS)
* "gbp import-orig" not doing the same

I'm the person who updated this package, so my local copy of
king-phisher doesn't have the Git attributes disabled, and everything
works fine for me. However, other people who clone the repo complain
that they end up with an unclean Git checkout, and I don't know what
the way forward is.

For a longer (and hopefully crystal-clear) explanation, I prepared a
Git repo and a walkthrough to reproduce the issue. Here we go :)

Let's first clone the king-phisher package *before* upstream switched to
Git LFS:

  $ gbp clone https://gitlab.com/arnaudr/king-phisher.git
  $ cd king-phisher
  $ cat .gitattributes
  cat: .gitattributes: No such file or directory
  $ ls -l data/server/king_phisher/GeoLite2-City.mmdb
  -rw-r--r-- 1 arno arno 61615395 Dec 15 11:53 data/server/king_phisher/GeoLite2-City.mmdb

So at this point, the file GeoLite2-City.mmdb is versioned "as is", as
a regular file.

Now let's update the package to the latest Git snapshot:

  $ gbp import-orig --uscan
  gbp:info: Launching uscan...
  Downloading data/server/king_phisher/GeoLite2-City.mmdb (62 MB)
  gbp:info: Using uscan downloaded tarball ../king-phisher_1.15.0+git20221107.orig.tar.xz
  What is the upstream version? [1.15.0+git20221107] 
  gbp:info: Importing '../king-phisher_1.15.0+git20221107.orig.tar.xz' to branch 'upstream'...
  gbp:info: Source package is king-phisher
  gbp:info: Upstream version is 1.15.0+git20221107
  gbp:info: Replacing upstream source on 'kali/master'
  gbp:info: Successfully imported version 1.15.0+git20221107 of ../king-phisher_1.15.0+git20221107.orig.tar.xz

The line "Downloading data/server/king_phisher/GeoLite2-City.mmdb (62
MB)" comes from Git LFS, which is downloading the file. Here's the
situation now:

  $ cat .gitattributes 
  *.mmdb filter=lfs diff=lfs merge=lfs -text
  $ cat .git/info/attributes
  cat: .git/info/attributes: No such file or directory
  $ ls -l data/server/king_phisher/GeoLite2-City.mmdb
  -rw-r--r-- 1 arno arno 61615395 Dec 15 11:56 data/server/king_phisher/GeoLite2-City.mmdb

So we can see the Git LFS attributes in .gitattributes, and we can see
that .git/info/attributes doesn't exist (more on that below).

Let's push that work (I prepared a fork to push changes):

  $ git remote add arnaudr2 g...@gitlab.com:arnaudr/king-phisher2.git
  $ git push arnaudr2 : --follow-tags
  Locking support detected on remote "arnaudr2". Consider enabling it with:
    $ git config lfs.https://gitlab.com/arnaudr/king-phisher2.git/info/lfs.locksverify true
  Locking support detected on remote "arnaudr2". Consider enabling it with:
    $ git config lfs.https://gitlab.com/arnaudr/king-phisher2.git/info/lfs.locksverify true
  Locking support detected on remote "arnaudr2". Consider enabling it with:
    $ git config lfs.https://gitlab.com/arnaudr/king-phisher2.git/info/lfs.locksverify true
  Locking support detected on remote "arnaudr2". Consider enabling it with:
    $ git config lfs.https://gitlab.com/arnaudr/king-phisher2.git/info/lfs.locksverify true
  Uploading LFS objects: 100% (1/1), 62 MB | 3.4 MB/s, done.
  Enumerating objects: 112, done.
  Counting objects: 100% (82/82), done.
  Delta compression using up to 8 threads
  Compressing objects: 100% (46/46), done.
  Writing objects: 100% (49/49), 19.06 KiB | 19.06 MiB/s, done.
  Total 49 (delta 29), reused 5 (delta 0), pack-reused 0
  remote:
  remote: To create a merge request for pristine-tar, visit:
  remote:   https://gitlab.com/arnaudr/king-phisher2/-/merge_requests/new?merge_request%5Bsource_branch%5D=pristine-tar
  remote:
  remote:
  remote: To create a merge request for upstream, visit:
  remote:   https://gitlab.com/arnaudr/king-phisher2/-/merge_requests/new?merge_request%5Bsource_branch%5D=upstream
  remote:
  To gitlab.com:arnaudr/king-phisher2.git
     c5db68b..dbf4ce7  kali/master -> kali/master
     d9ec6a5..e4e9390  pristine-tar -> pristine-tar
     be63910..f4f0fae  upstream -> upstream
   * [new tag]         upstream/1.15.0+git20221107 -> upstream/1.15.0+git20221107

And now, the issue: when we clone this repo with gbp, the resulting repo
is not clean. Let's try:

  $ gbp clone -v g...@gitlab.com:arnaudr/king-phisher2.git
  gbp:debug: ['git', 'rev-parse', '--show-cdup']
  gbp:info: Cloning from 'g...@gitlab.com:arnaudr/king-phisher2.git'
  gbp:debug: ['git', 'clone', '--quiet', 'g...@gitlab.com:arnaudr/king-phisher2.git']
  gbp:debug: ['git', 'rev-parse', '--show-cdup']
  gbp:debug: ['git', 'rev-parse', '--is-bare-repository']
  gbp:debug: ['git', 'rev-parse', '--git-dir']
  gbp:debug: ['git', 'rev-parse', '--show-cdup']
  gbp:debug: ['git', 'rev-parse', '--is-bare-repository']
  gbp:debug: ['git', 'rev-parse', '--git-dir']
  gbp:debug: Will track branches: ['kali/master', 'upstream', 'pristine-tar']
  gbp:debug: ['git', 'show-ref', '--verify', 'refs/remotes/origin/kali/master']
  gbp:debug: ['git', 'show-ref', '--verify', 'refs/heads/kali/master']
  gbp:debug: ['git', 'show-ref', '--verify', 'refs/remotes/origin/upstream']
  gbp:debug: ['git', 'show-ref', '--verify', 'refs/heads/upstream']
  gbp:debug: ['git', 'branch', 'upstream', 'origin/upstream']
  gbp:debug: ['git', 'show-ref', '--verify', 'refs/remotes/origin/pristine-tar']
  gbp:debug: ['git', 'show-ref', '--verify', 'refs/heads/pristine-tar']
  gbp:debug: ['git', 'branch', 'pristine-tar', 'origin/pristine-tar']
  gbp:debug: ['git', 'show-ref', '--verify', 'refs/remotes/kali/master']
  gbp:debug: ['git', 'config', 'user.name', 'Arnaud Rebillout']
  gbp:debug: ['git', 'config', 'user.email', 'arna...@kali.org']
  gbp:debug: ['git', 'ls-tree', '-z', '-r', '-l', 'HEAD', '--']
  gbp:debug: Found non-empty .gitattributes: b'.gitattributes'
  gbp:debug: Configuring Git attributes
  
  $ cd king-phisher2
  
  $ git status
  On branch kali/master
  Your branch is up to date with 'origin/kali/master'.
  
  Changes not staged for commit:
    (use "git add <file>..." to update what will be committed)
    (use "git restore <file>..." to discard changes in working directory)
        modified:   data/server/king_phisher/GeoLite2-City.mmdb
  
  no changes added to commit (use "git add" and/or "git commit -a")
  
  $ cat .gitattributes 
  *.mmdb filter=lfs diff=lfs merge=lfs -text
  $ cat .git/info/attributes 
  # Added by git-buildpackage to disable .gitattributes found in the upstream tree
  [attr]dgit-defuse-attrs  -text -eol -crlf -ident -filter -working-tree-encoding
  * -export-ignore
  * dgit-defuse-attrs
  $ ls -l data/server/king_phisher/GeoLite2-City.mmdb 
  -rw-r--r-- 1 arno arno 61615395 Dec 15 12:12 data/server/king_phisher/GeoLite2-City.mmdb
  
Here is my interpretation of the above:
* during the 'gbp clone' step, the 'git clone' command actually
  triggers Git LFS, which downloads the GeoLite2 database (assuming the
  git-lfs package is installed on your machine).
* then at the end of the gbp clone operation, we can see "Configuring
  Git attributes": this is when gbp creates the file
  .git/info/attributes, which disables the LFS filter.
* as a result, the real 62 MB file in the working tree no longer
  matches the small pointer file that is actually committed, and the
  Git repo is in an unclean state.
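To double-check this interpretation without touching the real package,
here is a minimal sketch, assuming only that git is installed (the
throwaway repo and the variable names are made up for illustration).
It uses 'git check-attr' to show how the lines gbp writes to
.git/info/attributes override the 'filter=lfs' attribute coming from
the committed .gitattributes: it should print 'filter: lfs' first, then
'filter: unset'.

```shell
#!/bin/sh
# Sketch only: a throwaway repo (not the real package) to see which
# "filter" attribute Git resolves for the database, before and after
# adding the lines that gbp writes to .git/info/attributes.
# Assumes git is installed; git-lfs is not needed for "git check-attr".
set -eu

repo=$(mktemp -d)
trap 'rm -rf "$repo"' EXIT
cd "$repo"
git init -q .

# Same attributes line as in upstream's .gitattributes
printf '*.mmdb filter=lfs diff=lfs merge=lfs -text\n' > .gitattributes

f=data/server/king_phisher/GeoLite2-City.mmdb
before=$(git check-attr filter -- "$f")
echo "$before"

# Same content as the .git/info/attributes created by "gbp clone";
# this file takes precedence over .gitattributes in the working tree
mkdir -p .git/info
cat > .git/info/attributes <<'EOF'
# Added by git-buildpackage to disable .gitattributes found in the upstream tree
[attr]dgit-defuse-attrs  -text -eol -crlf -ident -filter -working-tree-encoding
* -export-ignore
* dgit-defuse-attrs
EOF

after=$(git check-attr filter -- "$f")
echo "$after"
```

On the real clone, running 'git check-attr filter --
data/server/king_phisher/GeoLite2-City.mmdb' should likewise report
'unset' once gbp has written .git/info/attributes.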

To bring the Git repo back into shape, we can either:

1) Undo what gbp just did:

    $ rm -fr .git/info/attributes

2) Undo what git lfs did:

    $ git checkout data/server/king_phisher/GeoLite2-City.mmdb
    Updated 1 path from the index
    $ cat data/server/king_phisher/GeoLite2-City.mmdb
    version https://git-lfs.github.com/spec/v1
    oid sha256:a253d9cd68fe17b00087da24375f31f07cd4bb3852dc5fe3afe37b8f59e5abd0
    size 61615395

As we can see with option 2), the LFS file turns back into a short
pointer file, because that's what's really committed in the Git repo;
normally Git LFS replaces it with the "real file", which it fetches
from the LFS server.
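The pointer format shown above is simple enough to rebuild by hand:
the oid is the SHA-256 of the file content and the size is its length
in bytes (61615395 here, the same number 'ls -l' reports). As a rough
sketch using only coreutils (the function name 'lfs_pointer' is made up
for illustration, and this is not how git-lfs itself is invoked):

```shell
#!/bin/sh
# Sketch only: rebuild the three-line Git LFS pointer text for a file,
# following the layout shown above (version, oid sha256:<hex>, size <bytes>).
# "lfs_pointer" is a hypothetical helper, not part of git-lfs.
set -eu

lfs_pointer() {
    oid=$(sha256sum "$1" | cut -d' ' -f1)   # hex digest of the content
    size=$(wc -c < "$1" | tr -d ' ')        # length in bytes
    printf 'version https://git-lfs.github.com/spec/v1\n'
    printf 'oid sha256:%s\n' "$oid"
    printf 'size %s\n' "$size"
}

# Demo on a small throwaway file rather than the 62 MB database
tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf 'hello\n' > "$tmp"
pointer=$(lfs_pointer "$tmp")
echo "$pointer"
```

Run against the downloaded database ('lfs_pointer
data/server/king_phisher/GeoLite2-City.mmdb'), it should print the same
oid and size as the pointer that 'git checkout' restored, provided the
download is intact.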

  == Questions

How should Git LFS files be handled? When "gbp clone" disables the
gitattributes, it disables Git LFS in turn: is that intended or not?
Does gbp have an opinion on that? In any case, it seems that disabling
the gitattributes after 'git clone' has run is too late, because the
Git LFS objects have already been fetched.

Thanks for reading, and please help me understand how we should handle
those LFS files.

Arnaud
