SHA-1 collision in repository?
When we try to commit a very specific version of a very specific binary file, we get a SHA-1 collision error from the Subversion repository:

D:\confidential>svn commit secret.bin -m "Testing broken commit"
Sending        secret.bin
Transmitting file data .svn: E16: Commit failed (details follow):
svn: E16: SHA1 of reps '604440 34 134255 136680 c9f4fabc4d093612fece03c339401058 db11617ef1454332336e00abc311d44bc698f3b3 605556-czmh/_8' and '-1 0 134255 136680 c9f4fabc4d093612fece03c339401058 db11617ef1454332336e00abc311d44bc698f3b3 605556-czmh/_8' matches (db11617ef1454332336e00abc311d44bc698f3b3) but contents differ

What can cause this? This file is a binary pixel shader compiled from a build process. It is certainly not one of Google's SHA-1 collision PDF files. We also scanned the repository to confirm that nobody has committed Google's collision files.

Occam's Razor suggests that something is wrong with our repository or Subversion itself, rather than this being a true SHA-1 collision. In that case, what is wrong with our repository?

If this really is a SHA-1 collision, it would be major cryptography news that someone randomly ran into a second collision without even trying. In that case, is there a method by which we could recover the two files that supposedly have the same SHA-1? The collision doesn't appear to be in the file itself, but in some sort of diff or revision output?

Thanks,
Melissa
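The two rep descriptors in the error can be compared field by field. The C sketch below assumes the space-separated layout Philip describes later in the thread for text: lines: revision, offset, size, expanded size, MD5, SHA-1, then a trailing token. The rep_desc struct and both functions are hypothetical helpers, not Subversion APIs. Reading the -1 revision as "not yet committed" is also an assumption. The sketch shows that the two descriptors agree on both sizes and both checksums, and differ only in where the rep is stored.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical struct for one rep descriptor as printed in the error:
   "rev offset size expanded_size md5 sha1 trailing-token". */
typedef struct {
    long rev;                /* -1 for the in-progress rep (assumption) */
    long long offset;        /* "offset", as Philip names it */
    long long size;          /* stored size */
    long long expanded_size; /* size of the reconstructed fulltext */
    char md5[33];            /* 32 hex digits + NUL */
    char sha1[41];           /* 40 hex digits + NUL */
    char uniquifier[64];     /* trailing token, e.g. "605556-czmh/_8" */
} rep_desc;

/* Parse one descriptor; returns 1 only if all seven fields were read. */
int parse_rep(const char *s, rep_desc *out)
{
    assert(s != NULL && out != NULL);
    return sscanf(s, "%ld %lld %lld %lld %32s %40s %63s",
                  &out->rev, &out->offset, &out->size,
                  &out->expanded_size, out->md5, out->sha1,
                  out->uniquifier) == 7;
}

/* 1 if both reps claim the same content (sizes and both checksums),
   regardless of which revision and offset they are stored at. */
int same_claimed_content(const rep_desc *a, const rep_desc *b)
{
    return a->size == b->size
        && a->expanded_size == b->expanded_size
        && strcmp(a->md5, b->md5) == 0
        && strcmp(a->sha1, b->sha1) == 0;
}
```

Every recorded measure agrees. The "contents differ" part of the error therefore comes from the byte-level comparison of the two fulltexts, not from this metadata.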
Re: SHA-1 collision in repository?
That was one document we ran into when searching, yes.

We can do an svnsync, but this will take about a week to run: the repository is 43 GB with 600,000 commits. I guess we'll start it now.

On Thu, Feb 22, 2018 at 2:04 PM, Matt Simmons wrote:
> Hi Melissa,
>
> That definitely is interesting.
>
> I assume you have read
> http://blogs.collab.net/subversion/subversion-sha1-collision-problem-statement-prevention-remediation-options
>
> If you do an svnsync to another location and attempt the commit there, does
> the problem replicate itself?
>
> --Matt
Re: SHA-1 collision in repository?
I'm not subscribed to this mailing list, so I have no standard way to reply to Philip's email. I don't even know his email address.

> That pattern, all of MD5, SHA1 and size matching, is exactly what
> happens if a SHA1 collision is committed using an old version of
> Subversion where the rep-cache does not detect collisions. The first
> part of the collision would have been committed in r604440 and the
> second part in r605556.
>
> If that is the case, and a SHA1 collision did occur, then:
>
>    svnadmin verify -r604440 path/to/repository
>
> will succeed while:
>
>    svnadmin verify -r605556 path/to/repository
>
> will fail with an MD5 checksum error.
>
> If this is what you see then unfortunately the colliding r605556 content
> has been elided and the r605556 revision is corrupt.

Revision 605556 is simply the current revision number of the repository at the time of the attempted commit, and is unrelated to the problem. If I attempt the commit now, the number is higher, but the error message is otherwise the same.

Something I did notice is that the commit I'm trying to do is a reversion to an older version of the same file. The version of the file from r604440, which the error refers to, is identical to the file I'm trying to commit, but the file currently in the repository is different. If I commit a dummy version of the file, then commit the version I actually want, the latter commit works.

Could the collision be in a "diff" instead of the files themselves?

Melissa

On Thu, Feb 22, 2018 at 2:45 PM, Matt Simmons wrote:
> I would get more advice from people here before you invest that time. I'm a
> relative amateur and would listen to people with more experience than
> myself.
>
> --Matt
Re: SHA-1 collision in repository?
On Fri, Feb 23, 2018 at 2:50 PM, Philip Martin wrote:
> Stefan Sperling writes:
>
> I think this might be the case since you mentioned earlier that you
> could not find a file with the given checksum. The checksums apply to
> the repository format, i.e. before keyword/eol transformation, and if
> you were calculating checksums from working copy files the values would
> be different.
>
> One way to check repository format checksums is to use "svn info" on
> working copy files. The checksum reported for one of the files modified
> in r604440 should be db11617ef1454332336e00abc311d44bc698f3b3

db11617ef1454332336e00abc311d44bc698f3b3 is the SHA-1 of the actual pixel shader file I'm trying to commit; in other words, it doesn't seem to be a hash of a special format. Similarly, c9f4fabc4d093612fece03c339401058 is its MD5.

>>>> Melissa
>>
>> Hi Melissa,
>>
>> What is the output of the 'svnadmin verify' commands which Philip
>> wrote about above?
>
> That does require server access.

I had our server admin run the commands:

[root@meow ~]# svnadmin verify -r604440 /srv/subversion/repositories/meow/
* Verifying repository metadata ...
* Verified revision 604440.
[root@meow ~]# svnadmin verify -r605556 /srv/subversion/repositories/meow/
* Verifying repository metadata ...
* Verified revision 605556.

("meow" replaces confidential names. No, I don't know why the server admin is running svnadmin as root >.<)

> Since you refer to the r604440 content does that mean you can
> successfully checkout, or update to, that revision? If so that would
> indicate that the revision is not corrupt in the repository.

I was able to branch (svn copy) the affected branch to a new branch, and committing the same file to the new branch gives the same error. Checking out that revision works fine; only that commit is affected.

I started an svnsync yesterday to clone the repository to my desktop machine. It's at revision ~15 now, so maybe on Monday or Tuesday I'll be able to try things safely on my local machine. Once it's there, I'll be able to compile TortoiseSVN and debug it while pointing it at a file:// repository. (TortoiseSVN instead of command-line svn because TortoiseSVN is compiled with Visual C++ and is therefore many times easier to debug.)

Melissa
Re: SHA-1 collision in repository?
-bash-4.1$ sqlite3 rep-cache.db "select * from rep_cache where hash='db11617ef1454332336e00abc311d44bc698f3b3'"
db11617ef1454332336e00abc311d44bc698f3b3|604440|34|134255|136680

The line from the grep -a command containing that hash is below. They all match.

text: 604440 34 134255 136680 c9f4fabc4d093612fece03c339401058 db11617ef1454332336e00abc311d44bc698f3b3 604439-cyqm/_13

In other news, unknown whether related to the current problem, my attempt to clone the repository to my local computer is failing:

D:\>svnsync sync file:///d:/svnclone
Transmitting file data .svnsync: E16: SHA1 of reps '227170 153 193 57465 bb52be764a04d511ebb06e1889910dcf e6291ab119036eb783d0136afccdb3b445867364 227184-4vap/_4o' and '-1 0 193 57465 bb52be764a04d511ebb06e1889910dcf e6291ab119036eb783d0136afccdb3b445867364 227184-4vap/_4o' matches (e6291ab119036eb783d0136afccdb3b445867364) but contents differ
svnsync: E160004: Filesystem is corrupt
svnsync: E200014: Checksum mismatch while reading representation:
   expected:  bb52be764a04d511ebb06e1889910dcf
     actual:  80a10d37de91cadc604ba30e379651b3

This is odd, because revision 227185 (the revision it's trying to commit) verifies fine on the originating server:

-bash-4.1$ sudo svnadmin verify -r227170 /srv/subversion/repositories/meow
* Verifying repository metadata ...
* Verifying metadata at revision 227170 ...
* Verified revision 227170.
-bash-4.1$ sudo svnadmin verify -r227185 /srv/subversion/repositories/meow
* Verifying repository metadata ...
* Verified revision 227185.

On Fri, Feb 23, 2018 at 5:42 PM, Philip Martin wrote:
> Philip Martin writes:
>
>> There are a couple of options:
>>
>> A) disable rep-caching by editing fsfs.conf inside the repository
>>
>> B) reset the mapping by deleting/renaming the file db/rep-cache.db
>> inside the repository (but please rename rather than delete if you
>> want to help us identify the corruption)
>>
>> Doing either of these should allow the commit to succeed.
>
> To verify the corruption start with the rep-cache:
>
>    sqlite3 db/rep-cache.db "select * from rep_cache where
>    hash='db11617ef1454332336e00abc311d44bc698f3b3'"
>
> That should give you five numbers: the hash, the revision (604440), the
> offset, the size and the expanded size.
>
> Then examine the revision file for r604440. It could be unpacked:
>
>    grep -a "^text: 604440.*/_" db/revs/604/604440
>
> or packed:
>
>    grep -a "^text: 604440.*/_" db/revs/604.pack/pack
>
> One of the lines from grep should contain the hash and that line should
> start:
>
>    text: 604440
>
> followed by three more numbers then hashes and other stuff. The three
> numbers are the offset, size and expanded size and should match the
> values from the rep-cache but I suspect the rep-cache has the wrong
> offset.
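Philip's manual check can also be scripted. The check is that the hash, revision, offset, size and expanded size agree between the rep-cache and the revision file. A rough C sketch, assuming sqlite3's default pipe-separated output for the rep_cache row and the field order Philip gives for the text: line. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The fields Philip lists for a rep-cache row and for a text: line. */
typedef struct {
    char sha1[41];
    long rev;
    long long offset, size, expanded_size;
} rep_location;

/* Parse a row from sqlite3's default list mode:
   hash|rev|offset|size|expanded_size */
int parse_cache_row(const char *row, rep_location *out)
{
    assert(row != NULL && out != NULL);
    return sscanf(row, "%40[0-9a-f]|%ld|%lld|%lld|%lld",
                  out->sha1, &out->rev, &out->offset,
                  &out->size, &out->expanded_size) == 5;
}

/* Parse a revision-file line: text: rev offset size expanded md5 sha1 ... */
int parse_text_line(const char *line, rep_location *out)
{
    char md5[33]; /* read but not compared; the rep-cache has no MD5 */
    assert(line != NULL && out != NULL);
    return sscanf(line, "text: %ld %lld %lld %lld %32s %40s",
                  &out->rev, &out->offset, &out->size,
                  &out->expanded_size, md5, out->sha1) == 6;
}

/* Philip's criterion: same hash, revision, offset and both sizes. */
int cache_matches_revfile(const rep_location *c, const rep_location *t)
{
    return strcmp(c->sha1, t->sha1) == 0
        && c->rev == t->rev
        && c->offset == t->offset
        && c->size == t->size
        && c->expanded_size == t->expanded_size;
}
```

When it is given the row and the text: line that Melissa reported for r604440, the check passes. This is consistent with her observation that they all match. A wrong offset in the rep-cache, as Philip suspected, would make it fail.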
Re: SHA-1 collision in repository?
On Tue, Feb 27, 2018 at 05:54 Philip Martin wrote:
> Myria writes:
>
> > -bash-4.1$ sqlite3 rep-cache.db "select * from rep_cache where
> > hash='db11617ef1454332336e00abc311d44bc698f3b3'"
> > db11617ef1454332336e00abc311d44bc698f3b3|604440|34|134255|136680
> >
> > The line from the grep -a command containing that hash is below. They
> > all match.
> > text: 604440 34 134255 136680 c9f4fabc4d093612fece03c339401058
> > db11617ef1454332336e00abc311d44bc698f3b3 604439-cyqm/_13
>
> The rep-cache looks correct. There doesn't seem to be any corruption in
> the repository: you confirmed that you could retrieve the revision in
> question, and that you could verify the revision, and the rep-cache
> looks OK. So why is the commit that attempts to reuse the data in the
> revision failing? I don't know :-(
>
> > In other news, unknown whether related to the current problem, my
> > attempt to clone the repository to my local computer is failing:
> >
> > D:\>svnsync sync file:///d:/svnclone
> > Transmitting file data .svnsync:
> > E16: SHA1 of reps '227170 153 193 57465
> > bb52be764a04d511ebb06e1889910dcf
> > e6291ab119036eb783d0136afccdb3b445867364 227184-4vap/_4o' and '-1 0
> > 193 57465 bb52be764a04d511ebb06e1889910dcf
> > e6291ab119036eb783d0136afccdb3b445867364 227184-4vap/_4o' matches
> > (e6291ab119036eb783d0136afccdb3b445867364) but contents differ
> > svnsync: E160004: Filesystem is corrupt
> > svnsync: E200014: Checksum mismatch while reading representation:
> >    expected:  bb52be764a04d511ebb06e1889910dcf
> >      actual:  80a10d37de91cadc604ba30e379651b3
> >
> > This is odd, because revision 227185 (the revision it's trying to
> > commit) verifies fine on the originating server:
>
> That's an error committing to the new repository on your local machine,
> i.e. the problem is in the new repository not the repository on the
> originating server. Can you run "svnadmin verify" on the new
> repository? You may want to use -M to increase the cache size for the
> verify command as the default is small.
>
> It would be odd for svnsync to create a corrupt repository, so I half
> expect verify to report no problems. If that is the case it seems to be
> the original problem again: an apparently valid repository with a
> checksum error only on commit. So this problem is happening on two
> repositories, on two machines with different OS.

Not to mention that the two revisions complained about are unrelated, and two-thirds of the repository history apart.

One thing that's interesting is that the commit the svnsync failed on is a gigantic commit: 1.8 GB. Maybe svnsync is failing because of a Subversion bug with huge files...?

I started an svnadmin verify on my incomplete local copy last night, and no problems were reported when it finished this morning. I'll try again with the -M option you mention. I'll also start an svnsync from a Linux machine.

I'm going to see how hard it would be to just copy the 43 GB repository directly. We'd have to shut down the Subversion service during the copy, so it might be a while before I have a chance to.
Re: SHA-1 collision in repository?
On Wed, Feb 28, 2018 at 6:17 AM, Nico Kadel-Garcia wrote:
> On Tue, Feb 27, 2018 at 4:09 PM, Myria wrote:
>
>> Not to mention that the two revisions complained about are unrelated, and
>> 2/3 the repository history apart.
>>
>> One thing that's interesting is that the commit the svnsync failed on is a
>> gigantic commit. It's 1.8 GB. Maybe that svnsync is failing because of a
>> Subversion bug with huge files...?
>
> Hmm. Could 2 GB filesize limits be involved?
>
> When someone starts encountering this kind of issue with such large
> commits, it leads me to think "what the heck was in that commit"?
> There are various tools more likely to break when hammered that hard,
> such as pre-commit hooks written carelessly in Python that try to
> preload a hash with the contents of the file and just say "holy son
> of a !@#$, I'm out of resources!!!". Been there, done that, had to
> explain the concept of reading a text file with a loop to the
> programmer in question.

The error with the 1.8 GB commit occurred when trying to replicate the repository in order to diagnose the original problem. The original problem does not involve large files.

Also, I have no control over what was in the repository five years ago. The huge files were compiled versions of WebKit libraries. The alternative to committing these very large files would have been to quadruple the build times, because it takes four times longer to build WebKit than it does to build our project.

In other news, I can now reproduce the huge-commit problem in TortoiseSVN by committing to my "file:" partial copy of the repository. However, with SourceForge down due to a DDoS, I cannot get the source code to TortoiseSVN in order to debug it.

This means it is very likely a Subversion bug, probably something in 1.8.x or 1.9.x. The commit that now stops svnsync was probably originally made with 1.6 or 1.7, and it succeeded back then.

Melissa
Re: SHA-1 collision in repository?
I just found out that the file causing the error in the large commit is not the large file; it's one of the smaller files, about 55 KB. If I commit that single smaller file from the large commit, it errors the same way as the original r227185 commit does. This is exactly like the original problem with committing the pixel shader.

I managed to get the db/transactions/227184-4vb2.txn directory by breakpointing kernel32!DeleteFileW in TortoiseSVN (so I could get the contents before TortoiseSVN deleted them at failure). I don't know how they're useful, though.

The only way I know how to proceed is to wait until the source code to TortoiseSVN is available so that I can debug it in Visual Studio. Is there something else I can do?
Re: SHA-1 collision in repository?
The problem is identical on the Windows command line, Windows TortoiseSVN, Ubuntu Linux, Ubuntu Linux on Windows, and macOS. I'm just bad at GDB.

On Thu, Mar 1, 2018 at 9:09 PM, Nico Kadel-Garcia wrote:
> Sorry that I've not been paying attention to every detail. Do you see
> the same issues if you use the Subversion from Cygwin, which is
> probably a lot easier to recompile?
Re: SHA-1 collision in repository?
How can I dump out the two things that Subversion thinks have the same SHA-1 checksum but don't match? This seems to be rather difficult to do.

That said, it's far more likely that there's a bug in Subversion than that we randomly hit a SHA-1 collision.

On Sun, Mar 4, 2018 at 02:29 Philip Martin wrote:
> Nathan Hartman writes:
>
> > Does this mean that content being committed to the repository is never
> > elided based on the SHA hash alone but only after a fulltext
> > verification that the content actually already exists in the
> > repository?
>
> That's correct. Fulltext matching was added in 1.9.6 and 1.8.18; older
> versions of Subversion relied on the SHA1 match alone.
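Philip contrasts two rep-sharing policies, and they can be sketched side by side. This is a toy model over in-memory buffers. The enum, the function name and the verify_fulltext flag are hypothetical. The real check compares streams rather than buffers; according to the backtrace later in the thread, it is get_shared_rep in libsvn_fs_fs, calling svn_stream_contents_same2.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical outcomes of trying to share an existing representation. */
typedef enum { STORE_NEW, REUSE_OLD, CONTENTS_DIFFER } share_result;

/* Decide whether new content may reuse an existing rep found in the
   rep-cache. verify_fulltext == 0 models the pre-1.9.6/1.8.18 policy
   (SHA-1 match alone); 1 models the later fulltext comparison. */
share_result try_share(const char *old_sha1, const unsigned char *old_text,
                       size_t old_len, const char *new_sha1,
                       const unsigned char *new_text, size_t new_len,
                       int verify_fulltext)
{
    assert(old_sha1 != NULL && new_sha1 != NULL);
    if (strcmp(old_sha1, new_sha1) != 0)
        return STORE_NEW;       /* no rep-cache hit */
    if (!verify_fulltext)
        return REUSE_OLD;       /* trust the hash alone */
    if (old_len == new_len && memcmp(old_text, new_text, new_len) == 0)
        return REUSE_OLD;       /* genuinely identical content */
    return CONTENTS_DIFFER;     /* hash matched, bytes did not */
}
```

With verify_fulltext set to 0, two different texts that share a SHA-1 are silently merged, which is the older behaviour Philip describes. With it set to 1, the same input produces the "contents differ" outcome seen in this thread.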
Fwd: SHA-1 collision in repository?
Gmail keeps doing reply instead of reply all. I'm having to manually add the users list back now.

Below is the thread I sent.

---------- Forwarded message ----------
From: Myria
Date: Mon, Mar 5, 2018 at 6:37 PM
Subject: Re: SHA-1 collision in repository?
To: Philip Martin

I now know where the checksum error happens, but not why.

svn: E200014: Checksum mismatch while reading representation:
   expected:  bb52be764a04d511ebb06e1889910dcf
     actual:  80a10d37de91cadc604ba30e379651b3

It's calculating the MD5 of only the first 16 KB of the input file and comparing it against the MD5 of the entire file. The 16 KB number seems to be SVN__STREAM_CHUNK_SIZE.

bb52be764a04d511ebb06e1889910dcf is the MD5 of the entire file.
80a10d37de91cadc604ba30e379651b3 is the MD5 of the first 16384 bytes.

On Mon, Mar 5, 2018 at 5:23 PM, Myria wrote:
> I managed to compile a Subversion command line client with debugging
> information and optimizations disabled, and can reproduce the problem
> with GDB attached.
>
> Here is a backtrace at the time at which the error occurs. A few line
> numbers in stream.c will be off by a few lines due to a few printf's
> I added.
>
> #0  svn_checksum_mismatch_err (expected=0x7ffdcf00,
>     actual=0x7a0700a0, scratch_pool=0x7a070028,
>     fmt=0x7c259ac0 "Checksum mismatch while reading representation")
>     at subversion/libsvn_subr/checksum.c:638
> #1  0x7c2123de in rep_read_contents (baton=0x7a1f6190,
>     buf=0x7a1f66a8 "// "..., len=0x7ffdcf88)
>     at subversion/libsvn_fs_fs/cached_data.c:2062
> #2  0x7e5645fd in svn_stream_read_full (stream=0x7a1f6470,
>     buffer=0x7a1f66a8 "// "..., len=0x7ffdcf88)
>     at subversion/libsvn_subr/stream.c:193
> #3  0x7e5653f3 in svn_stream_contents_same2
>     (same=0x7ffdd01c, stream1=0x7a1f6470,
>     stream2=0x7a1f6650, pool=0x7a1e0028)
>     at subversion/libsvn_subr/stream.c:589
> #4  0x7c247226 in get_shared_rep (old_rep=0x7ffdd188,
>     fs=0x7f601030, rep=0x7a0e20b8,
>     file=0x7a1e0390, offset=0, reps_hash=0x0,
>     result_pool=0x7f5e0028, scratch_pool=0x7a1e0028)
>     at subversion/libsvn_fs_fs/transaction.c:2280
> #5  0x7c247734 in rep_write_contents_close
>     (baton=0x7a232ff0) at subversion/libsvn_fs_fs/transaction.c:2370
> #6  0x7e56492b in svn_stream_close (stream=0x7a233140)
>     at subversion/libsvn_subr/stream.c:274
> #7  0x7e841001 in apply_window (window=0x0,
>     baton=0x7a1000a0) at subversion/libsvn_delta/text_delta.c:732
> #8  0x7c2520d2 in window_consumer (window=0x0,
>     baton=0x7f5f1ab8) at subversion/libsvn_fs_fs/tree.c:2935
> #9  0x7e8405ef in svn_txdelta_run (source=0x7f5f1a18,
>     target=0x7f5f1298,
>     handler=0x7c25209f ,
>     handler_baton=0x7f5f1ab8, checksum_kind=svn_checksum_md5,
>     checksum=0x7ffdd458, cancel_func=0x0, cancel_baton=0x0,
>     result_pool=0x7f5e0028,
>     scratch_pool=0x7f5e0028) at subversion/libsvn_delta/text_delta.c:454
> #10 0x7ee98a57 in svn_wc__internal_transmit_text_deltas (tempfile=0x0,
>     new_text_base_md5_checksum=0x7ffdd5b0,
>     new_text_base_sha1_checksum=0x7ffdd5b8, db=0x7f6c17d8,
>     local_abspath=0x7f672d08
>     "/mnt/d/svntest/repository/directory/Redacted.cpp",
>     fulltext=0, editor=0x7f673700, file_baton=0x7f510110,
>     result_pool=0x7f6c0028,
>     scratch_pool=0x7f5e0028)
>     at subversion/libsvn_wc/adm_crawler.c:1109
> #11 0x7ee98d68 in svn_wc_transmit_text_deltas3
>     (new_text_base_md5_checksum=0x7ffdd5b0,
>     new_text_base_sha1_checksum=0x7ffdd5b8, wc_ctx=0x7f6c17c0,
>     local_abspath=0x7f672d08
>     "/mnt/d/svntest/repository/directory/Redacted.cpp",
>     fulltext=0, editor=0x7f673700, file_baton=0x7f510110,
>     result_pool=0x7f6c0028,
>     scratch_pool=0x7f5e0028) at subversion/libsvn_wc/adm_crawler.c:1199
> #12 0x7f18eb12 in svn_client__do_commit (
>     base_url=0x7f6142c0 "file:///mnt/d/svntest/repository/directory",
>     commit_items=0x7f672c48, editor=0x7f673700,
>     edit_baton=0x7f6300a0,
>     notify_path_prefix=0x7f672900 "/mnt/d/svntest/repository",
>     sha1_checksums=0x7ffdd750,
>     ctx=0x7f6c16f0, result_pool=0x7f6c0028,
>     scratch_pool=0x7f650028)
>     at subversion/libsvn_client/commit_util.c:1920
> #13 0x7f18a5f9 in svn_client_commit6 (targets=0x7f670a18,
>     depth=svn_depth_infinity, keep_locks=0,
>     keep_changelists=0, commit_as_operations=1,
>     include_file_externals=0, include_dir_externals=0,
>     changelists=0x7f6c0780, revprop_t
Re: SHA-1 collision in repository?
When Subversion reaches the failing checksum comparison in rep_read_contents, rb->len is 16384. It thinks it is then done reading the entire file and can compare the checksum, but it has not finished reading the file yet.

rb->rep.expanded_size is correct at the error point: 57465. rep_read_get_baton sets rb->len to rb->rep.expanded_size, so I don't know why the value changed by the time rep_read_contents got its paws on the baton. I saw that rb->len might be getting clobbered by rep_read_contents's call to build_rep_list, which has the following line of code:

    *expanded_size = first_rep->expanded_size;

expanded_size is &rep->len. I haven't had a chance to debug this area yet, so it might be fine.

I verified with sqlite3 that the rep-cache.db has the correct size (57465):

$ sqlite3 /mnt/d/svnclone/db/rep-cache.db "select * from rep_cache where hash='e6291ab119036eb783d0136afccdb3b445867364'"
e6291ab119036eb783d0136afccdb3b445867364|227170|153|193|57465
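The early stop can be modelled as a reader that trusts a remaining-length counter. Once the counter reaches zero, the reader assumes it has seen the whole text and compares digests. A minimal C sketch, assuming the 16384-byte chunk size reported in the thread. read_and_verify is hypothetical, not rep_read_contents. 64-bit FNV-1a stands in for MD5 purely to keep the example dependency-free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SIZE 16384                   /* chunk size from the thread */
#define FNV_OFFSET 14695981039346656037ULL /* 64-bit FNV-1a offset basis */
#define FNV_PRIME  1099511628211ULL        /* 64-bit FNV-1a prime */

/* Stand-in for MD5: 64-bit FNV-1a, continued from running state h. */
static uint64_t fnv1a(const unsigned char *p, size_t n, uint64_t h)
{
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= FNV_PRIME;
    }
    return h;
}

/* Hypothetical reader: consumes at most `len` bytes (the role rb->len
   plays in the thread) in CHUNK_SIZE pieces, then checks the digest of
   what it consumed against `expected`. Returns 1 on a match. */
int read_and_verify(const unsigned char *data, size_t data_size,
                    size_t len, uint64_t expected)
{
    uint64_t h = FNV_OFFSET;
    size_t pos = 0;
    assert(data != NULL);
    while (len > 0 && pos < data_size) {
        size_t n = len < CHUNK_SIZE ? len : CHUNK_SIZE;
        if (n > data_size - pos)
            n = data_size - pos;
        h = fnv1a(data + pos, n, h);
        pos += n;
        len -= n;
    }
    /* Once len is exhausted the reader believes it saw the whole text. */
    return h == expected;
}
```

With len = 57465, the digest covers the whole text and matches. With len = 16384, it covers only the first chunk, so it equals the prefix digest and fails against the whole-file digest. That is the same pattern as the reported mismatch: "expected" is the whole-file MD5 and "actual" is the MD5 of the first 16384 bytes.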
Re: SHA-1 collision in repository?
Final email for the night >.< What's clobbering the expanded_size is this in build_rep_list: /* The value as stored in the data struct. 0 is either for unknown length or actually zero length. */ *expanded_size = first_rep->expanded_size; first_rep->expanded_size here is zero for the last call to this function before the error. In every other case before the error, the two values are equal. Then this code executes: if (*expanded_size == 0) if (rep_header->type == svn_fs_fs__rep_plain || first_rep->size != 4) *expanded_size = first_rep->size; first_rep->size is 16384, and this is why rb->len becomes 16384, leading to the error. I don't know what all this code is doing, but that's the proximate cause of the failure. Melissa On Mon, Mar 5, 2018 at 7:41 PM, Myria wrote: > When Subversion gets to this part of rep_read_contents, rb->len is > 16384. It thinks it is then done reading the entire file, and can > compare the checksum, but it's not done with the file yet. > > rb->rep.expanded_size is correct at the error point, 57465. > rep_read_get_baton sets rb->len to rb->rep.expanded_size, so I don't > know why the value changed by the time rep_read_contents got its paws > on the baton. I saw that rb->len might be getting clobbered by > rep_read_content's call to build_rep_list, which has the following > line of code: > > *expanded_size = first_rep->expanded_size; > > expanded_size is &rep->len. I haven't had a chance to debug this area > yet, so it might be fine. > > I verified with sqlite3 that the rep-cache.db has the correct size (57465): > > $ sqlite3 /mnt/d/svnclone/db/rep-cache.db "select * from rep_cache > where hash='e6291ab119036eb783d0136afccdb3b445867364'" > e6291ab119036eb783d0136afccdb3b445867364|227170|153|193|57465 > > > On Mon, Mar 5, 2018 at 6:56 PM, Myria wrote: >> GMail keeps doing reply instead of reply all. I'm having to manually >> add the users list back now. >> >> Below is the thread I sent. 
>> >> >> -- Forwarded message -- >> From: Myria >> Date: Mon, Mar 5, 2018 at 6:37 PM >> Subject: Re: SHA-1 collision in repository? >> To: Philip Martin >> >> >> I now know where the checksum error happens, but not why. >> >> svn: E200014: Checksum mismatch while reading representation: >>expected: bb52be764a04d511ebb06e1889910dcf >> actual: 80a10d37de91cadc604ba30e379651b3 >> >> It's calculating the MD5 of only the first 16 KB of the input file and >> comparing against the MD5 of the entire file. The 16 KB number seems >> to be SVN__STREAM_CHUNK_SIZE. >> >> bb52be764a04d511ebb06e1889910dcf is the MD5 of the entire file. >> 80a10d37de91cadc604ba30e379651b3 is the MD5 of the first 16384 bytes. >> >> >> On Mon, Mar 5, 2018 at 5:23 PM, Myria wrote: >>> I managed to compile a subversion command line client with debugging >>> information and optimizations disabled, and can reproduce the problem >>> with GDB attached. >>> >>> Here is a backtrace at the time at which the error occurs. A few line >>> numbers in stream.c will be wrong by a few lines due to a few printf's >>> I added. 
>>> >>> #0 svn_checksum_mismatch_err (expected=0x7ffdcf00, >>> actual=0x7a0700a0, scratch_pool=0x7a070028, >>> fmt=0x7c259ac0 "Checksum mismatch while reading >>> representation") at subversion/libsvn_subr/checksum.c:638 >>> #1 0x7c2123de in rep_read_contents (baton=0x7a1f6190, >>> buf=0x7a1f66a8 "// "..., len=0x7ffdcf88) >>> at subversion/libsvn_fs_fs/cached_data.c:2062 >>> #2 0x7e5645fd in svn_stream_read_full (stream=0x7a1f6470, >>> buffer=0x7a1f66a8 "// "..., len=0x7ffdcf88) >>> at subversion/libsvn_subr/stream.c:193 >>> #3 0x7e5653f3 in svn_stream_contents_same2 >>> (same=0x7ffdd01c, stream1=0x7a1f6470, >>> stream2=0x7a1f6650, pool=0x7a1e0028) at >>> subversion/libsvn_subr/stream.c:589 >>> #4 0x7c247226 in get_shared_rep (old_rep=0x7ffdd188, >>> fs=0x7f601030, rep=0x7a0e20b8, >>> file=0x7a1e0390, offset=0, reps_hash=0x0, >>> result_pool=0x7f5e0028, scratch_pool=0x7a1e0028) >>> at subversion/libsvn_fs_fs/transaction.c:2280 >>> #5 0x7c247734 in rep_write_contents_close >>> (baton=0x7a232ff0) at subversion/libsvn_fs_fs/transaction.c:2370
Re: SHA-1 collision in repository?
During rep_write_contents_close, there is a call to get_shared_rep. get_shared_rep calls svn_fs_fs__get_contents_from_file, which has the code pasted below. /* Build the representation list (delta chain). */ if (rh->type == svn_fs_fs__rep_plain) { rb->rs_list = apr_array_make(pool, 0, sizeof(rep_state_t *)); rb->src_state = rs; } else if (rh->type == svn_fs_fs__rep_self_delta) { rb->rs_list = apr_array_make(pool, 1, sizeof(rep_state_t *)); APR_ARRAY_PUSH(rb->rs_list, rep_state_t *) = rs; rb->src_state = NULL; } else { representation_t next_rep = { 0 }; /* skip "SVNx" diff marker */ rs->current = 4; /* REP's base rep is inside a proper revision. * It can be reconstructed in the usual way. */ next_rep.revision = rh->base_revision; next_rep.item_index = rh->base_item_index; next_rep.size = rh->base_length; svn_fs_fs__id_txn_reset(&next_rep.txn_id); SVN_ERR(build_rep_list(&rb->rs_list, &rb->base_window, &rb->src_state, &rb->len, rb->fs, &next_rep, rb->filehandle_pool)); The bug occurs because build_rep_list is called with first_rep->expanded_size set to zero. It is zero because first_rep is the second-to-last argument to build_rep_list (&next_rep), and the code above zero-initializes next_rep, including its expanded_size field: representation_t next_rep = { 0 }; Does the code just need the following line? I can't judge whether it's correct >.< next_rep.expanded_size = rb->rep.expanded_size; Melissa On Wed, Mar 7, 2018 at 9:02 AM, Nathan Hartman wrote: > On Mar 5, 2018, at 10:54 PM, Myria wrote: >> >> Final email for the night >.< >> >> What's clobbering the expanded_size is this in build_rep_list: >> >> /* The value as stored in the data struct. >> 0 is either for unknown length or actually zero length. */ >> *expanded_size = first_rep->expanded_size; >> >> first_rep->expanded_size here is zero for the last call to this >> function before the error. In every other case before the error, the >> two values are equal.
>> >> Then this code executes: >> >> if (*expanded_size == 0) >>if (rep_header->type == svn_fs_fs__rep_plain || first_rep->size != 4) >> *expanded_size = first_rep->size; >> >> first_rep->size is 16384, and this is why rb->len becomes 16384, >> leading to the error. >> >> I don't know what all this code is doing, but that's the proximate >> cause of the failure. >> >> Melissa > > Has it been possible to determine what is setting expanded_size to 0 before > that last call? I wonder if there is specific logic that decides (perhaps > incorrectly?) to do that? > > Alternatively is it being clobbered by some out-of-bounds access, > use-after-free, or another such issue? > > Is it possible in your debugger setup to determine the address of that > variable and set a breakpoint that triggers when that memory is written? (It > may be called a watchpoint?) > > Which leads me to another thought: if you can set such a breakpoint / > watchpoint and it does not trigger, then this expanded_size might not be the > same instance in that final call. Perhaps a shallow copy of an enclosing > structure is made which leaves out the known size and sets it to 0 for some > reason, and that final call is given that (incomplete) copy. > > Caveat: I am not familiar with the codebase but these are my thoughts based > on adventures in other code bases. >
Re: SHA-1 collision in repository?
I found out that the fulltext whose checksum is 80a10d37de91cadc604ba30e379651b3 is the first 16384 bytes of the file (see other parts of this thread). 16384 is SVN__STREAM_CHUNK_SIZE. On Fri, Mar 2, 2018 at 3:07 PM, Daniel Shahaf wrote: > Daniel Shahaf wrote on Fri, Mar 02, 2018 at 22:57:51 +0000: >> Myria wrote on Mon, Feb 26, 2018 at 13:41:05 -0800: >> > In other news, unknown whether related to the current problem, my >> > attempt to clone the repository to my local computer is failing: >> > >> > D:\>svnsync sync file:///d:/svnclone >> > Transmitting file data >> > .svnsync: >> > E160000: SHA1 of reps '227170 153 193 57465 >> > bb52be764a04d511ebb06e1889910dcf >> > e6291ab119036eb783d0136afccdb3b445867364 227184-4vap/_4o' and '-1 0 >> > 193 57465 bb52be764a04d511ebb06e1889910dcf >> > e6291ab119036eb783d0136afccdb3b445867364 227184-4vap/_4o' matches >> > (e6291ab119036eb783d0136afccdb3b445867364) but contents differ >> > svnsync: E160004: Filesystem is corrupt >> > svnsync: E200014: Checksum mismatch while reading representation: >> >expected: bb52be764a04d511ebb06e1889910dcf >> > actual: 80a10d37de91cadc604ba30e379651b3 >> >> When this error happens, could you print the first lines of the two reps >> identical? The first line is "PLAIN\n" or "DELTA\n" or "DELTA 42 43 44\n". >> (I wonder whether we have some stray whitespace that's transparent to parsing >> but breaks checksums.) > > On second thought I'm not sure this makes sense. A better question is: can we > obtain the fulltext whose checksum is 80a10d37de91cadc604ba30e379651b3? > >> Do you happen to have a copy of the repository lying around that you can run >> 'grep -a 80a10d37de91cadc604ba30e379651b3 db/revs/{0,1,2,...,227}' on? >> Admittedly that's a bit of a shot in the dark. >> >> Cheers, >> >> Daniel