Files with identical SHA1 breaks the repo


Files with identical SHA1 breaks the repo

sunny256
Earlier today, the first known SHA1 collision was presented:

  https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
  http://shattered.io/

It turns out that adding these two PDF files to a svn repository makes
it impossible to checkout the repository properly if both files exist in
the repo. This script demonstrates what happens:

--- CUT
#!/bin/sh

if test -e repo -o -e wc1 -o -e wc2; then
  echo repo, wc1 or wc2 already exist >&2
  exit 1
fi
svnadmin create repo
svn co file://$(pwd)/repo wc1
cd wc1
wget https://shattered.it/static/shattered-1.pdf
wget https://shattered.it/static/shattered-2.pdf
svn add *.pdf
svn ci -m "Add files with identical SHA1"
cd ..
svn co file://$(pwd)/repo wc2
--- CUT

This happens:

  $ ./runme
  Checked out revision 0.
  --2017-02-23 20:41:05--  https://shattered.it/static/shattered-1.pdf
  Resolving shattered.it (shattered.it)... 216.239.38.21, 216.239.36.21, 216.239.32.21, ...
  Connecting to shattered.it (shattered.it)|216.239.38.21|:443... connected.
  HTTP request sent, awaiting response... 200 OK
  Length: 422435 (413K) [application/pdf]
  Saving to: ‘shattered-1.pdf’

  shattered-1.pdf   100%[===============>] 412.53K  --.-KB/s   in 0.04s

  2017-02-23 20:41:05 (10.9 MB/s) - ‘shattered-1.pdf’ saved [422435/422435]

  --2017-02-23 20:41:05--  https://shattered.it/static/shattered-2.pdf
  Resolving shattered.it (shattered.it)... 216.239.38.21, 216.239.36.21, 216.239.32.21, ...
  Connecting to shattered.it (shattered.it)|216.239.38.21|:443... connected.
  HTTP request sent, awaiting response... 200 OK
  Length: 422435 (413K) [application/pdf]
  Saving to: ‘shattered-2.pdf’

  shattered-2.pdf   100%[===============>] 412.53K  --.-KB/s   in 0.04s

  2017-02-23 20:41:06 (9.03 MB/s) - ‘shattered-2.pdf’ saved [422435/422435]

  A  (bin)  shattered-1.pdf
  A  (bin)  shattered-2.pdf
  Adding  (bin)  shattered-1.pdf
  Adding  (bin)  shattered-2.pdf
  Transmitting file data ..
  Committed revision 1.
  A    wc2/shattered-1.pdf
  svn: E200014: Checksum mismatch for '/home/sunny/src/git/svn-sha1/wc2/shattered-2.pdf':
     expected:  5bd9d8cabc46041579a311230539b8d1
       actual:  ee4aa52b139d925f8d8884402b0a750c

  $

Tested with svn-1.8.10, which is the default svn in Debian 8.7, newest
stable. shattered-1.pdf is checked out, but not shattered-2.pdf.
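For anyone who wants to confirm that two downloaded files really collide before feeding them to svn, here is a minimal sketch. It assumes GNU coreutils (`sha1sum`, `md5sum`); the function name `is_sha1_collision` is made up for illustration and is not part of any svn tooling.

```shell
#!/bin/sh
# Succeeds only if the two files share a SHA-1 digest while a second hash
# (MD5 here) shows that their contents differ.
# Assumption: GNU coreutils; on BSD systems use sha1/md5 instead.
is_sha1_collision() {
  sha1_a=$(sha1sum < "$1" | cut -d' ' -f1)
  sha1_b=$(sha1sum < "$2" | cut -d' ' -f1)
  md5_a=$(md5sum < "$1" | cut -d' ' -f1)
  md5_b=$(md5sum < "$2" | cut -d' ' -f1)
  [ "$sha1_a" = "$sha1_b" ] && [ "$md5_a" != "$md5_b" ]
}
```

Run as `is_sha1_collision shattered-1.pdf shattered-2.pdf && echo collision`. Identical files and ordinary differing files both make the function fail, so only a genuine collision is reported.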

This is the only known SHA-1 collision at the moment, but Google will
release the collision code in 90 days, so we can expect this not to last
forever.

Regards,
Øyvind

+-| Øyvind A. Holm <[hidden email]> - N 60.37604° E 5.33339° |-+
| OpenPGP: 0xFB0CBEE894A506E5 - http://www.sunbase.org/pubkey.asc |
| Fingerprint: A006 05D6 E676 B319 55E2  E77E FB0C BEE8 94A5 06E5 |
+------------| cb5c25a6-fa01-11e6-8cd8-db5caa6d21d3 |-------------+


Re: Files with identical SHA1 breaks the repo

Stefan Sperling-9
On Thu, Feb 23, 2017 at 09:02:28PM +0100, Øyvind A. Holm wrote:
> Earlier today, the first known SHA1 collision was presented:
>
>   https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
>   http://shattered.io/
>
> It turns out that adding these two PDF files to a svn repository makes
> it impossible to checkout the repository properly if both files exist in
> the repo. This script demonstrates what happens:

As a workaround, disable rep-sharing and the error goes away.

[[[
#!/bin/sh

if test -e repo -o -e wc1 -o -e wc2; then
  echo repo, wc1 or wc2 already exist >&2
  exit 1
fi
svnadmin create repo
sed -i -e 's/# enable-rep-sharing = true/enable-rep-sharing = false/' repo/db/fsfs.conf
svn co file://$(pwd)/repo wc1
cd wc1
wget https://shattered.it/static/shattered-1.pdf
wget https://shattered.it/static/shattered-2.pdf
svn add *.pdf
svn ci -m "Add files with identical SHA1"
cd ..
svn co file://$(pwd)/repo wc2
]]]

A  (bin)  shattered-1.pdf
A  (bin)  shattered-2.pdf            
Adding  (bin)  shattered-1.pdf            
Adding  (bin)  shattered-2.pdf
Transmitting file data ..done        
Committing transaction...                            
Committed revision 1.
A    wc2/shattered-1.pdf
A    wc2/shattered-2.pdf                          
Checked out revision 1.
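
The sed one-liner in the script only matches the stock commented-out line of a freshly created repository. To check what an existing fsfs.conf actually says, something like the following could be used. It is a rough sketch: the function name `rep_sharing_enabled` is hypothetical, and it assumes that an absent or commented-out option means "enabled", which holds only for repository formats that support rep-sharing.

```shell
#!/bin/sh
# Succeeds if rep-sharing is effectively enabled according to the given
# fsfs.conf. Commented-out lines are ignored, so the default applies.
# Assumption: default is "enabled" (true for formats supporting rep-sharing).
rep_sharing_enabled() {
  value=$(sed -n 's/^[[:space:]]*enable-rep-sharing[[:space:]]*=[[:space:]]*\([A-Za-z0-9]*\).*/\1/p' "$1" |
    tail -n 1 | tr '[:upper:]' '[:lower:]')
  case "$value" in
    false|no|off|0) return 1 ;;   # explicitly disabled
    *) return 0 ;;                # unset or any other value: treat as enabled
  esac
}
```

Usage: `rep_sharing_enabled repo/db/fsfs.conf && echo "rep-sharing is on"`.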

Re: Files with identical SHA1 breaks the repo

Branko Čibej
On 24.02.2017 11:51, Stefan Sperling wrote:

> On Thu, Feb 23, 2017 at 09:02:28PM +0100, Øyvind A. Holm wrote:
>> Earlier today, the first known SHA1 collision was presented:
>>
>>   https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
>>   http://shattered.io/
>>
>> It turns out that adding these two PDF files to a svn repository makes
>> it impossible to checkout the repository properly if both files exist in
>> the repo. This script demonstrates what happens:
> As a workaround, disable rep-sharing and the error goes away.

This is precisely why rep-sharing is disabled by default when the
repository is created.

-- Brane

Re: Files with identical SHA1 breaks the repo

Daniel Shahaf-2
Branko Čibej wrote on Fri, Feb 24, 2017 at 12:18:05 +0100:

> On 24.02.2017 11:51, Stefan Sperling wrote:
> > On Thu, Feb 23, 2017 at 09:02:28PM +0100, Øyvind A. Holm wrote:
> >> Earlier today, the first known SHA1 collision was presented:
> >>
> >>   https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
> >>   http://shattered.io/
> >>
> >> It turns out that adding these two PDF files to a svn repository makes
> >> it impossible to checkout the repository properly if both files exist in
> >> the repo. This script demonstrates what happens:
> > As a workaround, disable rep-sharing and the error goes away.
>
> This is precisely why rep-sharing is disabled by default when the
> repository is created.

It's _enabled_ by default:

  /* Initialize ffd->rep_sharing_allowed. */
  if (ffd->format >= SVN_FS_FS__MIN_REP_SHARING_FORMAT)
    SVN_ERR(svn_config_get_bool(config, &ffd->rep_sharing_allowed,
                                CONFIG_SECTION_REP_SHARING,
                                CONFIG_OPTION_ENABLE_REP_SHARING, TRUE));
  else
    ffd->rep_sharing_allowed = FALSE;


Re: Files with identical SHA1 breaks the repo

Stefan Hett-2
On 2/24/2017 12:28 PM, Daniel Shahaf wrote:

> Branko Čibej wrote on Fri, Feb 24, 2017 at 12:18:05 +0100:
>> On 24.02.2017 11:51, Stefan Sperling wrote:
>>> On Thu, Feb 23, 2017 at 09:02:28PM +0100, Øyvind A. Holm wrote:
>>>> Earlier today, the first known SHA1 collision was presented:
>>>>
>>>>    https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
>>>>    http://shattered.io/
>>>>
>>>> It turns out that adding these two PDF files to a svn repository makes
>>>> it impossible to checkout the repository properly if both files exist in
>>>> the repo. This script demonstrates what happens:
>>> As a workaround, disable rep-sharing and the error goes away.
>> This is precisely why rep-sharing is disabled by default when the
>> repository is created.
> It's _enabled_ by default:
>
>    /* Initialize ffd->rep_sharing_allowed. */
>    if (ffd->format >= SVN_FS_FS__MIN_REP_SHARING_FORMAT)
>      SVN_ERR(svn_config_get_bool(config, &ffd->rep_sharing_allowed,
>                                  CONFIG_SECTION_REP_SHARING,
>                                  CONFIG_OPTION_ENABLE_REP_SHARING, TRUE));
>    else
>      ffd->rep_sharing_allowed = FALSE;
I take it that Brane is referring to SVN 1.8 (where it is disabled by
default, if I'm not mistaken).

--
Regards,
Stefan Hett


Re: Files with identical SHA1 breaks the repo

Stefan Hett-2
On 2/24/2017 12:48 PM, Stefan Hett wrote:

> On 2/24/2017 12:28 PM, Daniel Shahaf wrote:
>> Branko Čibej wrote on Fri, Feb 24, 2017 at 12:18:05 +0100:
>>> On 24.02.2017 11:51, Stefan Sperling wrote:
>>>> On Thu, Feb 23, 2017 at 09:02:28PM +0100, Øyvind A. Holm wrote:
>>>>> Earlier today, the first known SHA1 collision was presented:
>>>>>
>>>>> https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
>>>>>    http://shattered.io/
>>>>>
>>>>> It turns out that adding these two PDF files to a svn repository
>>>>> makes
>>>>> it impossible to checkout the repository properly if both files
>>>>> exist in
>>>>> the repo. This script demonstrates what happens:
>>>> As a workaround, disable rep-sharing and the error goes away.
>>> This is precisely why rep-sharing is disabled by default when the
>>> repository is created.
>> It's _enabled_ by default:
>>
>>    /* Initialize ffd->rep_sharing_allowed. */
>>    if (ffd->format >= SVN_FS_FS__MIN_REP_SHARING_FORMAT)
>>      SVN_ERR(svn_config_get_bool(config, &ffd->rep_sharing_allowed,
>>                                  CONFIG_SECTION_REP_SHARING,
>> CONFIG_OPTION_ENABLE_REP_SHARING, TRUE));
>>    else
>>      ffd->rep_sharing_allowed = FALSE;
> I take it that Brane is referring to SVN 1.8 (where it is disabled by
> default, if I'm not mistaken).

Disregard this. I was mixing up rep-sharing with revprop caching.

--
Regards,
Stefan Hett


Re: Files with identical SHA1 breaks the repo

Branko Čibej
In reply to this post by Daniel Shahaf-2
On 24.02.2017 12:28, Daniel Shahaf wrote:

> Branko Čibej wrote on Fri, Feb 24, 2017 at 12:18:05 +0100:
>> On 24.02.2017 11:51, Stefan Sperling wrote:
>>> On Thu, Feb 23, 2017 at 09:02:28PM +0100, Øyvind A. Holm wrote:
>>>> Earlier today, the first known SHA1 collision was presented:
>>>>
>>>>   https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
>>>>   http://shattered.io/
>>>>
>>>> It turns out that adding these two PDF files to a svn repository makes
>>>> it impossible to checkout the repository properly if both files exist in
>>>> the repo. This script demonstrates what happens:
>>> As a workaround, disable rep-sharing and the error goes away.
>> This is precisely why rep-sharing is disabled by default when the
>> repository is created.
> It's _enabled_ by default:
>
>   /* Initialize ffd->rep_sharing_allowed. */
>   if (ffd->format >= SVN_FS_FS__MIN_REP_SHARING_FORMAT)
>     SVN_ERR(svn_config_get_bool(config, &ffd->rep_sharing_allowed,
>                                 CONFIG_SECTION_REP_SHARING,
>                                 CONFIG_OPTION_ENABLE_REP_SHARING, TRUE));
>   else
>     ffd->rep_sharing_allowed = FALSE;

*WHAT*

Since when?

I see now that the default fsfs.conf says the same thing, but this is crazy.

-- Brane

Re: Files with identical SHA1 breaks the repo

Paul Hammant-3
In reply to this post by sunny256
Linus weighs in on Git's use of SHA1 (may be interesting):
http://marc.info/?l=git&m=148787047422954&w=2


Re: Files with identical SHA1 breaks the repo

Andreas Stieger
> Linus weighs in on Git's use of SHA1 (may be interesting)
> http://marc.info/?l=git&m=148787047422954&w=2

It affects svn more due to its use of sha1 for versioned entities (here: files) rather than trees.

Andreas

Re: Files with identical SHA1 breaks the repo

Daniel Shahaf-2
In reply to this post by Branko Čibej
Branko Čibej wrote on Fri, Feb 24, 2017 at 12:59:16 +0100:

> On 24.02.2017 12:28, Daniel Shahaf wrote:
> > Branko Čibej wrote on Fri, Feb 24, 2017 at 12:18:05 +0100:
> >> This is precisely why rep-sharing is disabled by default when the
> >> repository is created.
> >
> > It's _enabled_ by default:
>
> *WHAT*
>
> Since when?

Since rep-sharing was first released (= 1.6.0).

> I see now that the default fsfs.conf says the same thing, but this is crazy.

Re: Files with identical SHA1 breaks the repo

Branko Čibej
In reply to this post by Andreas Stieger
On 24.02.2017 13:41, Andreas Stieger wrote:
>> Linus weighs in on Git's use of SHA1 (may be interesting)
>> http://marc.info/?l=git&m=148787047422954&w=2
> It affects svn more due to its use of sha1 for versioned entities (here: files) rather than trees.

Except that content indexing in Subversion is an optional implementation
detail of one repository implementation whereas in Git the SHA1 unique
identity of a commit is built into the design.

-- Brane

Re: Files with identical SHA1 breaks the repo

Branko Čibej
In reply to this post by Daniel Shahaf-2
On 24.02.2017 14:52, Daniel Shahaf wrote:

> Branko Čibej wrote on Fri, Feb 24, 2017 at 12:59:16 +0100:
>> On 24.02.2017 12:28, Daniel Shahaf wrote:
>>> Branko Čibej wrote on Fri, Feb 24, 2017 at 12:18:05 +0100:
>>>> This is precisely why rep-sharing is disabled by default when the
>>>> repository is created.
>>> It's _enabled_ by default:
>> *WHAT*
>>
>> Since when?
> Since rep-sharing was first released (= 1.6.0).
>
>> I see now that the default fsfs.conf says the same thing, but this is crazy.

I'm pretty sure we should switch the default to off, and not just
because SHA1 has been cracked. Our promise has always been data
integrity: content indexing that does not use the whole text as key
degrades that promise.
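
To illustrate the point about using the whole text as the key: a de-duplication step can treat the hash as a hint only and confirm with a byte-for-byte comparison before sharing storage. The sketch below shows the idea in shell. It is not how FSFS is implemented, and the function name `can_share_rep` is invented for this example.

```shell
#!/bin/sh
# Succeeds only if both files have the same SHA-1 AND identical bytes.
# The hash comparison is a cheap first filter; cmp is the real test.
# Assumption: GNU coreutils sha1sum is available.
can_share_rep() {
  [ "$(sha1sum < "$1" | cut -d' ' -f1)" = "$(sha1sum < "$2" | cut -d' ' -f1)" ] &&
    cmp -s "$1" "$2"
}
```

With this check, a pair like the two shattered PDFs passes the hash filter but fails `cmp`, so the two files would be stored separately.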

-- Brane

Re: Files with identical SHA1 breaks the repo

Stefan Hett-2
In reply to this post by sunny256
On 2/23/2017 9:02 PM, Øyvind A. Holm wrote:
> This is the only known SHA-1 collision at the moment, but Google will
> release the collision code in 90 days, so we can expect this not to last
> forever.
Reading up on that in an article in a German magazine [1] clarifies that
the effort to create that collision is still quite large (6500 CPU years + 100
GPU years to calculate it). So this puts the impact into perspective a bit.
Certainly I'm not trying to say that the situation on SVN's side
should/could not be improved, though.

[1]
https://www.heise.de/newsticker/meldung/Todesstoss-Forscher-zerschmettern-SHA-1-3633589.html

--
Regards,
Stefan Hett


Re: Files with identical SHA1 breaks the repo

Andreas Stieger
Hi,

"Stefan Hett" wrote:

> On 2/23/2017 9:02 PM, Øyvind A. Holm wrote:
> > This is the only known SHA-1 collision at the moment, but Google will
> > release the collision code in 90 days, so we can expect this not to last
> > forever.
> Reading up on that in an article in a German magazine [1] clarifies that
> the effort to create that collision is still quite large (6500 CPU years + 100
> GPU years to calculate it). So this puts the impact into perspective a bit.
> Certainly I'm not trying to say that the situation on SVN's side
> should/could not be improved, though.
>
> [1]
> https://www.heise.de/newsticker/meldung/Todesstoss-Forscher-zerschmettern-SHA-1-3633589.html

An occurrence of this issue in a production repository with the published PDFs:
https://bugs.webkit.org/show_bug.cgi?id=168774#c29

Andreas

Re: Files with identical SHA1 breaks the repo

Stefan Sperling
On Fri, Feb 24, 2017 at 04:17:44PM +0100, Andreas Stieger wrote:

> Hi,
>
> "Stefan Hett" wrote:
> > On 2/23/2017 9:02 PM, Øyvind A. Holm wrote:
> > > This is the only known SHA-1 collision at the moment, but Google will
> > > release the collision code in 90 days, so we can expect this not to last
> > > forever.
> > Reading up on that in an article in a German magazine [1] clarifies that
> > the effort to create that collision is still quite large (6500 CPU years + 100
> > GPU years to calculate it). So this puts the impact into perspective a bit.
> > Certainly I'm not trying to say that the situation on SVN's side
> > should/could not be improved, though.
> >
> > [1]
> > https://www.heise.de/newsticker/meldung/Todesstoss-Forscher-zerschmettern-SHA-1-3633589.html
>
> An occurrence of this issue in a production repository with the published PDFs:
> https://bugs.webkit.org/show_bug.cgi?id=168774#c29
>
> Andreas

Well, what did they expect? Did they expect that all software which is
part of their toolchain has ever been tested with files that produce
a SHA1 collision? Nobody had such files until yesterday...
They should have tried this on a test repository first.

Anyway, so SVN has multiple problems with SHA1 collisions.

One problem is that the libsvn_wc code does the wrong thing when SHA1
hashes match but MD5 hashes do not. The error on checkout is happening
because pristines are keyed on SHA1, and only one pristine is saved:

$ ls .svn/pristine/
38/
$ ls .svn/pristine/38/
38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
$ sha1 .svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
SHA1 (.svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base) = 38762cf7f55934b34d179ae6a4c80cadccbb7f0a
$ md5 .svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base
MD5 (.svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base) = ee4aa52b139d925f8d8884402b0a750c

By design, the current working copy format cannot store both of these PDFs.
This is hard to solve without a working copy format bump :-/
The best fix would probably be moving libsvn_wc to SHA256 or SHA3.
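
The listing above suggests how a pristine's location follows from its content: a directory named after the first two hex digits of the SHA1, then the full digest plus `.svn-base`. A small sketch that reproduces that mapping follows. The layout is read off the listing rather than taken from the libsvn_wc source, and `pristine_path` is a made-up helper name.

```shell
#!/bin/sh
# Print the pristine-store path a file's content would map to, relative to
# the working copy root, following the layout seen in the listing above.
# Assumption: GNU coreutils sha1sum is available.
pristine_path() {
  sha1=$(sha1sum < "$1" | cut -d' ' -f1)
  prefix=$(printf '%s' "$sha1" | cut -c1-2)
  printf '.svn/pristine/%s/%s.svn-base\n' "$prefix" "$sha1"
}
```

Because the path depends on nothing but the SHA1, both shattered PDFs map to the same `.svn/pristine/38/38762cf7f55934b34d179ae6a4c80cadccbb7f0a.svn-base`, which is why only one of them can be stored.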

FSFS looks alright. The node records for these two PDFs look like this:

[[[
id: 0-1.0.r1/5
type: file
count: 0
text: 1 3 381130 422435 ee4aa52b139d925f8d8884402b0a750c 38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_3
props: 1 4 56 44 cfa89e28d5298bc69638e814df40c883
cpath: /shattered-1.pdf
copyroot: 0 /

id: 2-1.0.r1/6
type: file
count: 0
text: 1 3 381130 422435 5bd9d8cabc46041579a311230539b8d1 38762cf7f55934b34d179ae6a4c80cadccbb7f0a 0-0/_4
props: 1 4 56 44 cfa89e28d5298bc69638e814df40c883
cpath: /shattered-2.pdf
copyroot: 0 /
]]]

We should look into making the FSFS code make use of both checksums to
handle ambiguities. It seems about time to add SHA256 and/or SHA3 as well.
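
As a rough illustration of using both checksums, the following sketch scans node records like the ones above and prints every SHA1 that appears with more than one MD5. It assumes the `text:` field order seen in those two records (MD5 as the sixth whitespace-separated field, SHA1 as the seventh, counting `text:` itself as the first); `find_sha1_clashes` is a hypothetical name, not an svnadmin feature.

```shell
#!/bin/sh
# Read FSFS-style node records (from files or stdin) and print each SHA1
# that is paired with two different MD5 digests.
find_sha1_clashes() {
  awk '
    $1 == "text:" {
      md5 = $6; sha1 = $7
      if ((sha1 in seen) && seen[sha1] != md5) {
        print sha1          # same SHA1, different MD5: ambiguous
      } else {
        seen[sha1] = md5    # first sighting, or a harmless exact repeat
      }
    }
  ' "$@"
}
```

Fed the two records shown above, it reports the shared SHA1 once.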

'svnadmin load' fails, too:

$ svnadmin create repo2
$ vi repo2/db/fsfs.conf # disable rep-sharing
$ svnadmin dump repo > repo.dump
* Dumped revision 0.
* Dumped revision 1.
$ svnadmin load repo2 < repo.dump
<<< Started new transaction, based on original revision 1
     * editing path : shattered-1.pdf ... done.
     * editing path : shattered-2.pdf ...subversion/libsvn_repos/load.c:709,
subversion/libsvn_repos/load.c:351,
subversion/libsvn_subr/stream.c:273,
subversion/libsvn_subr/checksum.c:658: (apr_err=SVN_ERR_CHECKSUM_MISMATCH)
svnadmin: E200014: Checksum mismatch for '/shattered-2.pdf':
   expected:  5bd9d8cabc46041579a311230539b8d1
     actual:  ee4aa52b139d925f8d8884402b0a750c

Again, the dump file looks OK. This problem occurs somewhere in the
commit processing path. No time to debug this ATM.

Re: Files with identical SHA1 breaks the repo

Mark Phippard-3
In reply to this post by Stefan Sperling-9
Someone may want to jump in here:


Mark



Re: Files with identical SHA1 breaks the repo

Barry Scott
Surely two files with the same hash were always a possibility, no matter what the hash function is?

Barry


Re: Files with identical SHA1 breaks the repo

Mark Phippard-3
In reply to this post by Stefan Sperling-9
On Fri, Feb 24, 2017 at 5:51 AM, Stefan Sperling <[hidden email]> wrote:
On Thu, Feb 23, 2017 at 09:02:28PM +0100, Øyvind A. Holm wrote:
> Earlier today, the first known SHA1 collision was presented:
>
>   https://security.googleblog.com/2017/02/announcing-first-sha1-collision.html
>   http://shattered.io/
>
> It turns out that adding these two PDF files to a svn repository makes
> it impossible to checkout the repository properly if both files exist in
> the repo. This script demonstrates what happens:

As a workaround, disable rep-sharing and the error goes away.

[[[
#!/bin/sh

if test -e repo -o -e wc1 -o -e wc2; then
  echo repo, wc1 or wc2 already exist >&2
  exit 1
fi
svnadmin create repo
sed -i -e 's/# enable-rep-sharing = true/enable-rep-sharing = false/' repo/db/fsfs.conf
svn co file://$(pwd)/repo wc1
cd wc1
wget https://shattered.it/static/shattered-1.pdf
wget https://shattered.it/static/shattered-2.pdf
svn add *.pdf
svn ci -m "Add files with identical SHA1"
cd ..
svn co file://$(pwd)/repo wc2
]]]

A  (bin)  shattered-1.pdf
A  (bin)  shattered-2.pdf
Adding  (bin)  shattered-1.pdf
Adding  (bin)  shattered-2.pdf
Transmitting file data ..done
Committing transaction...
Committed revision 1.
A    wc2/shattered-1.pdf
A    wc2/shattered-2.pdf
Checked out revision 1.


Note that while this does fix the error, because of the SHA1-keyed storage sharing in the working copy you do not actually get the correct files.  Both PDFs wind up being the same file; I imagine whichever one you receive first is the one you get.

So not only does rep sharing need to be fixed, the WC pristine storage is also broken by this.



Re: Files with identical SHA1 breaks the repo

Stefan Sperling
On Fri, Feb 24, 2017 at 01:03:09PM -0500, Mark Phippard wrote:
> Note that while this does fix the error, because of the SHA1-keyed storage
> sharing in the working copy you do not actually get the correct files.
> Both PDFs wind up being the same file; I imagine whichever one you receive
> first is the one you get.
>
> So not only does rep sharing need to be fixed, the WC pristine storage is
> also broken by this.

Yes, indeed.

I believe we should prepare a new working copy format for 1.10.0 which
addresses this problem. I don't see a good way of fixing it without
a format bump. The bright side of this is that it gives us a good
reason to get 1.10.0 ready ASAP.

We can switch to a better hash algorithm with a WC format bump.
If we are willing to dispose of de-duplication in the pristine store we could
make the pristine store future proof by adding a "salt" to each row in the
pristine table. Say 64 bytes of data prepended to file content, which are
random but stay fixed throughout the lifetime of a pristine.
This way, there are 64 bytes of data not controlled by repository content
which affect the hash algorithm's result before data from repository content
gets mixed in. Now hash collisions in repository content become much less
of a problem for the working copy. However, the pristine store would stop
de-duplicating content. So perhaps this is not the best approach.
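
The salting idea can be sketched in a few lines of shell. This is only an illustration of the principle, not a proposed pristine-store implementation: the helper names `new_salt` and `salted_sha256` are invented, SHA-256 is picked arbitrarily as the "better hash", and GNU coreutils plus `/dev/urandom` are assumed.

```shell
#!/bin/sh
# Write a fresh 64-byte random salt to the given file. It would be created
# once per pristine and kept unchanged for that pristine's lifetime.
new_salt() {
  head -c 64 /dev/urandom > "$1"
}

# Hash the salt followed by the file content, so the digest depends on
# 64 bytes the repository content does not control.
salted_sha256() {
  cat "$1" "$2" | sha256sum | cut -d' ' -f1
}
```

Usage: `new_salt s1; salted_sha256 s1 shattered-1.pdf`. Identical content stored under two different salts gets two different keys, which is exactly the loss of de-duplication mentioned above.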

The rep-cache uses hashes only for de-duplication so it very much relies on
hash collisions being negligible. We should upgrade the hashing algorithm
in a way that 'svnadmin upgrade' can take care of (for new revisions).
Perhaps we should disable the feature by default in a 1.9.x patch release
and advise users to turn it off until they can upgrade to 1.10.

We might have to give up on ra_serf's approach of avoiding retransmissions
of content which is already stored in the pristine store. This is now just
as broken as the rep-cache is. We might be able to salvage it for future
clients, but we should probably send multiple hashes and make it as easy as
possible to add newer hash algorithms in future versions without disturbing
older clients. Perhaps as a first step we should just disable this feature?

Re: Files with identical SHA1 breaks the repo

Andreas Stieger
In reply to this post by sunny256
Hi,

This hook script can prevent further corruption caused by files built on
the two known 320-byte prefixes.
https://svn.apache.org/r1784336
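
To give a feel for a prefix-based check, here is a minimal sketch that tests whether a candidate file starts with the same 320 bytes as a reference file. This is not the script committed in r1784336. The function name `shares_320_prefix` is made up, and the sketch assumes you keep a local copy of a known-bad file to compare against.

```shell
#!/bin/sh
# Succeeds if the first 320 bytes of both files are identical.
# Assumption: head -c and sha256sum (GNU coreutils) are available.
shares_320_prefix() {
  a=$(head -c 320 "$1" | sha256sum)
  b=$(head -c 320 "$2" | sha256sum)
  [ "$a" = "$b" ]
}
```

For example, `shares_320_prefix incoming.pdf shattered-1.pdf && echo "reject"`, where `incoming.pdf` stands for whatever file is being examined.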

Andreas