Data corruption bug in WC, apparently due to race condition?

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Data corruption bug in WC, apparently due to race condition?

Karl Fogel-2
This is with stock SVN version 1.9.6 (r1800392) as packaged in Debian GNU/Linux 'testing' distro, compiled Jul 22 2017, 06:41:45 on x86_64-pc-linux-gnu.  (I dist-upgraded from the Debian package tree this morning, FWIW.)

I think I've encountered a data corruption bug due to a race condition somewhere in the working copy library.  I don't have a reliable reproduction recipe, but here's what I did:

  1) Start an SVN commit.

  2) While the commit is in progress, edit and save a
     change to one of the files that's being committed.

  3) The commit completes successfully, or "successfully", but...

...now the repository has the file contents from before my edit of (2), the working copy has the file contents from *after* my edit of (2), but SVN reports no local modifications when I do 'svn status'.  That's... bad, right? :-)

The transcript below, from after the above steps, gives more detail.  What the transcript shows should be impossible:

  $ svn up
  Updating '.':
  At revision 6916.
  $ svn status -q
  ### no output, so no local mods that SVN is aware of ###
  $ svn info org/main.org
  Path: org/main.org
  Name: main.org
  Working Copy Root Path: /home/kfogel/private
  URL: https://svn.red-bean.com/kfogel-private/trunk/org/main.org
  Relative URL: ^/trunk/org/main.org
  Repository Root: https://svn.red-bean.com/kfogel-private
  Repository UUID: c51825e8-60e0-0310-a34c-dea7275b9f40
  Revision: 6916
  Node Kind: file
  Schedule: normal
  Last Changed Author: kfogel
  Last Changed Rev: 6916
  Last Changed Date: 2017-07-27 01:01:45 -0500 (Thu, 27 Jul 2017)
  Text Last Updated: 2017-07-27 01:01:36 -0500 (Thu, 27 Jul 2017)
  Checksum: ab3e79907b1107753ec961826fb22d9975bc6e4e
 
  $ svn ls -v org/main.org
     6916 kfogel          35258 Jul 27 01:01 main.org
  $ svn cat -r6916 org/main.org > org/main.org-from-repository-r6916
  $ sha1sum org/main.org
  25d8f1383a453908782a1e550dad84b0bcafc0c9  org/main.org
  $ sha1sum org/main.org-from-repository-r6916
  ab3e79907b1107753ec961826fb22d9975bc6e4e  org/main.org-from-repository-r6916
  $ diff -u org/main.org org/main.org-from-repository-r6916
  --- org/main.org 2017-07-27 01:01:36.259540030 -0500
  +++ org/main.org-from-repository-r6916 2017-07-27 01:03:19.472775400 -0500
  @@ -9,6 +9,7 @@
       To: Karl Fogel <[hidden email]>
       Date: Wed, 26 Jul 2017 20:29:40 -0400
       Message-ID: <[hidden email]>
  +  - Erie Meyer email
     - Baltimore Hotel fix
     - Record United ORD<->LGA flight information
       United Confirmation Number FGD887
  $ svn --version
  svn, version 1.9.6 (r1800392)
     compiled Jul 22 2017, 06:41:45 on x86_64-pc-linux-gnu
 
  Copyright (C) 2017 The Apache Software Foundation.
  This software consists of contributions made by many people;
  see the NOTICE file for more information.
  Subversion is open source software, see http://subversion.apache.org/
 
  The following repository access (RA) modules are available:
 
  * ra_svn : Module for accessing a repository using the svn network protocol.
    - with Cyrus SASL authentication
    - handles 'svn' scheme
  * ra_local : Module for accessing a repository on local disk.
    - handles 'file' scheme
  * ra_serf : Module for accessing a repository via WebDAV protocol using serf.
    - using serf 1.3.9 (compiled with 1.3.9)
    - handles 'http' scheme
    - handles 'https' scheme
 
  The following authentication credential caches are available:
 
  * Plaintext cache in /home/kfogel/.subversion
  * Gnome Keyring
  * GPG-Agent
  * KWallet (KDE)
 
  $ svn ci -m "Nothing to commit, because SVN mistakenly thinks there are no local mods."
  ### no commit happens ###
  $ cd org
  $ svn status main.org
  $ mv main.org was-local-main.org
  $ svn up
  Updating '.':
  Restored 'main.org'
  At revision 6916.
  $ diff -u main.org was-local-main.org
  --- main.org 2017-07-27 01:19:02.779501081 -0500
  +++ was-local-main.org 2017-07-27 01:01:36.259540030 -0500
  @@ -9,7 +9,6 @@
       To: Karl Fogel <[hidden email]>
       Date: Wed, 26 Jul 2017 20:29:40 -0400
       Message-ID: <[hidden email]>
  -  - Erie Meyer email
     - Baltimore Hotel fix
     - Record United ORD<->LGA flight information
       United Confirmation Number FGD887
  $

Of course, it would be very difficult to reproduce this -- the file save probably has to happen at exactly the wrong moment during the commit.  But maybe it's possible to find it just knowing that it's possible.  In the meantime, I've saved a copy of the relevant top-level .svn/ directory and of the working copy subdirectory in which the bug occurred (from right before the "cd org" step above), in case those would help.

I'm not subscribed to dev@ these days, so please keep me CC'd on any followups.

Best regards,
-Karl
Reply | Threaded
Open this post in threaded view
|

Re: Data corruption bug in WC, apparently due to race condition?

Karl Fogel-2
A bit more info, following up to my previous mail:

In the transcript in my previous mail, at the end I move the corrupt file out of the way and do 'svn up' to restore a good copy from the repository.  FWIW, the restoration worked:

  $ cp main.org real-main.org
  $ cp was-local-main.org main.org
  $ svn status -q
  M       main.org
  $ cp real-main.org main.org
  $ svn status -q
  $

So, the original bug is still there, but at least the obvious method of recovering manually works as intended: the working copy now correctly detects the modified file as modified.

Best regards,
-Karl

> This is with stock SVN version 1.9.6 (r1800392) as packaged in Debian
> GNU/Linux 'testing' distro, compiled Jul 22 2017, 06:41:45 on
> x86_64-pc-linux-gnu.  (I dist-upgraded from the Debian package tree this
> morning, FWIW.)
>
> I think I've encountered a data corruption bug due to a race condition
> somewhere in the working copy library.  I don't have a reliable
> reproduction recipe, but here's what I did:
>
>   1) Start an SVN commit.
>
>   2) While the commit is in progress, edit and save a
>      change to one of the files that's being committed.
>
>   3) The commit completes successfully, or "successfully", but...
>
> ...now the repository has the file contents from before my edit of (2),
> the working copy has the file contents from *after* my edit of (2), but
> SVN reports no local modifications when I do 'svn status'.
> That's... bad, right? :-)
>
> The transcript below, from after the above steps, gives more detail.
> What the transcript shows should be impossible:
>
>   $ svn up
>   Updating '.':
>   At revision 6916.
>   $ svn status -q
>   ### no output, so no local mods that SVN is aware of ###
>   $ svn info org/main.org
>   Path: org/main.org
>   Name: main.org
>   Working Copy Root Path: /home/kfogel/private
>   URL: https://svn.red-bean.com/kfogel-private/trunk/org/main.org
>   Relative URL: ^/trunk/org/main.org
>   Repository Root: https://svn.red-bean.com/kfogel-private
>   Repository UUID: c51825e8-60e0-0310-a34c-dea7275b9f40
>   Revision: 6916
>   Node Kind: file
>   Schedule: normal
>   Last Changed Author: kfogel
>   Last Changed Rev: 6916
>   Last Changed Date: 2017-07-27 01:01:45 -0500 (Thu, 27 Jul 2017)
>   Text Last Updated: 2017-07-27 01:01:36 -0500 (Thu, 27 Jul 2017)
>   Checksum: ab3e79907b1107753ec961826fb22d9975bc6e4e
>  
>   $ svn ls -v org/main.org
>      6916 kfogel          35258 Jul 27 01:01 main.org
>   $ svn cat -r6916 org/main.org > org/main.org-from-repository-r6916
>   $ sha1sum org/main.org
>   25d8f1383a453908782a1e550dad84b0bcafc0c9  org/main.org
>   $ sha1sum org/main.org-from-repository-r6916
>   ab3e79907b1107753ec961826fb22d9975bc6e4e  org/main.org-from-repository-r6916
>   $ diff -u org/main.org org/main.org-from-repository-r6916
>   --- org/main.org 2017-07-27 01:01:36.259540030 -0500
>   +++ org/main.org-from-repository-r6916 2017-07-27 01:03:19.472775400 -0500
>   @@ -9,6 +9,7 @@
>        To: Karl Fogel <[hidden email]>
>        Date: Wed, 26 Jul 2017 20:29:40 -0400
>        Message-ID: <[hidden email]>
>   +  - Erie Meyer email
>      - Baltimore Hotel fix
>      - Record United ORD<->LGA flight information
>        United Confirmation Number FGD887
>   $ svn --version
>   svn, version 1.9.6 (r1800392)
>      compiled Jul 22 2017, 06:41:45 on x86_64-pc-linux-gnu
>  
>   Copyright (C) 2017 The Apache Software Foundation.
>   This software consists of contributions made by many people;
>   see the NOTICE file for more information.
>   Subversion is open source software, see http://subversion.apache.org/
>  
>   The following repository access (RA) modules are available:
>  
>   * ra_svn : Module for accessing a repository using the svn network protocol.
>     - with Cyrus SASL authentication
>     - handles 'svn' scheme
>   * ra_local : Module for accessing a repository on local disk.
>     - handles 'file' scheme
>   * ra_serf : Module for accessing a repository via WebDAV protocol using serf.
>     - using serf 1.3.9 (compiled with 1.3.9)
>     - handles 'http' scheme
>     - handles 'https' scheme
>  
>   The following authentication credential caches are available:
>  
>   * Plaintext cache in /home/kfogel/.subversion
>   * Gnome Keyring
>   * GPG-Agent
>   * KWallet (KDE)
>  
>   $ svn ci -m "Nothing to commit, because SVN mistakenly thinks there are no local mods."
>   ### no commit happens ###
>   $ cd org
>   $ svn status main.org
>   $ mv main.org was-local-main.org
>   $ svn up
>   Updating '.':
>   Restored 'main.org'
>   At revision 6916.
>   $ diff -u main.org was-local-main.org
>   --- main.org 2017-07-27 01:19:02.779501081 -0500
>   +++ was-local-main.org 2017-07-27 01:01:36.259540030 -0500
>   @@ -9,7 +9,6 @@
>        To: Karl Fogel <[hidden email]>
>        Date: Wed, 26 Jul 2017 20:29:40 -0400
>        Message-ID: <[hidden email]>
>   -  - Erie Meyer email
>      - Baltimore Hotel fix
>      - Record United ORD<->LGA flight information
>        United Confirmation Number FGD887
>   $
>
> Of course, it would be very difficult to reproduce this -- the file save
> probably has to happen at exactly the wrong moment during the commit.
> But maybe it's possible to find it just knowing that it's possible.  In
> the meantime, I've saved a copy of the relevant top-level .svn/
> directory and of the working copy subdirectory in which the bug occurred
> (from right before the "cd org" step above), in case those would help.
>
> I'm not subscribed to dev@ these days, so please keep me CC'd on any
> followups.
>
> Best regards,
> -Karl
Reply | Threaded
Open this post in threaded view
|

Re: Data corruption bug in WC, apparently due to race condition?

Philip Martin
In reply to this post by Karl Fogel-2
Karl Fogel <[hidden email]> writes:

>   1) Start an SVN commit.
>
>   2) While the commit is in progress, edit and save a
>      change to one of the files that's being committed.
>
>   3) The commit completes successfully, or "successfully", but...
>
> ...now the repository has the file contents from before my edit of
> (2), the working copy has the file contents from *after* my edit of
> (2), but SVN reports no local modifications when I do 'svn status'.
> That's... bad, right? :-)

The post-commit processing on the client side is not checking for
modifications before recording filesize/timestamp in the nodes table in
.svn/wc.db.

In first terminal:

  $ svnadmin create repo
  $ svnmucc -mm put <(echo foo) file://`pwd`/repo/f
  $ svn co file://`pwd`/repo wc
  $ echo bar > wc/f
  $ gdb --arg svn ci -mm wc
  (gdb) b svn_client__do_commit
  (gdb) r
  hit breakpoint
  (gdb) fin
  run to end of svn_client__do_commit

Switch to second terminal:

  $ svn st wc
    L     wc
  M       wc/f
  $ cat wc/.svn/pristine/*/*
  foo
  bar
  $ echo zigzag > wc/f

Switch back to first terminal:

  (gdb) c
  (gdb) q
  $

I believe that reproduces the problem:

  $ svn cat -r1 wc/f
  foo
  $ svn cat -r2 wc/f
  bar
  $ cat wc/f
  zigzag
  $ sqlite3 wc/.svn/wc.db "select translated_size from nodes where local_relpath='f'"
  7
  $ svn st wc
  $
  $ touch wc/f   # to break timestamp
  $ svn st wc
  M      wc/f

To fix this we would need to have the client's post-commit processing do
a full-text comparison to catch modifications before storing the
timestamp/size in .svn/wc.db.  Avoid a race is a bit tricky, perhaps:

  1) stat() to get timestamp/filesize
  2) full-text compare to ensure text is unchanged
  3) stat() to ensure timestamp/filesize is unchanged
  4) store timestamp/filesize

--
Philip
Reply | Threaded
Open this post in threaded view
|

Re: Data corruption bug in WC, apparently due to race condition?

Karl Fogel-2
Philip Martin <[hidden email]> writes:
>The post-commit processing on the client side is not checking for
>modifications before recording filesize/timestamp in the nodes table in
>.svn/wc.db.

My hat is off to you for tracking this down, Philip!  Thanks.  Comments/questions at the end, after your transcript.

>In first terminal:
>
>  $ svnadmin create repo
>  $ svnmucc -mm put <(echo foo) file://`pwd`/repo/f
>  $ svn co file://`pwd`/repo wc
>  $ echo bar > wc/f
>  $ gdb --arg svn ci -mm wc
>  (gdb) b svn_client__do_commit
>  (gdb) r
>  hit breakpoint
>  (gdb) fin
>  run to end of svn_client__do_commit
>
>Switch to second terminal:
>
>  $ svn st wc
>    L     wc
>  M       wc/f
>  $ cat wc/.svn/pristine/*/*
>  foo
>  bar
>  $ echo zigzag > wc/f
>
>Switch back to first terminal:
>
>  (gdb) c
>  (gdb) q
>  $
>
>I believe that reproduces the problem:
>
>  $ svn cat -r1 wc/f
>  foo
>  $ svn cat -r2 wc/f
>  bar
>  $ cat wc/f
>  zigzag
>  $ sqlite3 wc/.svn/wc.db "select translated_size from nodes where local_relpath='f'"
>  7
>  $ svn st wc
>  $
>  $ touch wc/f   # to break timestamp
>  $ svn st wc
>  M      wc/f
>
>To fix this we would need to have the client's post-commit processing do
>a full-text comparison to catch modifications before storing the
>timestamp/size in .svn/wc.db.  Avoid a race is a bit tricky, perhaps:
>
>  1) stat() to get timestamp/filesize
>  2) full-text compare to ensure text is unchanged
>  3) stat() to ensure timestamp/filesize is unchanged
>  4) store timestamp/filesize

I'm not familiar these days with the current archicture of libsvn_client and libsvn_wc, but just in principle, there should be an easier way to do this than the above, without re-comparing full-texts [1] (or, equivalently, re-calculating the hash).

When the client sends the file for commit, have it remember the timestamp, file size, and hash of the working file (as of the exact version that was used for the commit -- and if the file is being streamily appended to during the commit, or something like that, well, then just remember the relevant values for what was sent in the commit).  Then during commit finalization, just store that remembered metadata, *not* metadata derived from the possibly-now-changed working file.

In other words, why isn't the commit process just taking a data-and-metadata "snapshot" of each working file, and using that snapshot for both the commit and the post-commit bookkeeping on the client side?

If the client were to do that, then if a working file gets modified during the commit, the file would naturally show up as modified afterwards without any special checks like your step (3) above.  (I guess yet another way to say it is: your steps 1-4 are fine, but they should all happen as part of the commit, and all be done by the time the post-commit stage arrives.)

(Also, I thought this was how we were always doing things!  But my memory is fuzzy, and/or things might have changed.)

Am I missing some subtlety about this?

Best regards,
-Karl

[1] In any case, a true full-text comparison should rarely be necessary.  First we can look at the file size from the directory entry and see if it's as expected; in most cases it will differ if the contents differ, so that's the first "early out".  Then we could look at the first 1024 (or whatever) bytes; many files, if they have changed, will show some change near the beginning, so that's a second "early out".  I guess a third early-out would be to do an lseek into the middle and just see if the byte there is as expected? :-)  But yes, eventually, a full-text comparison, i.e., a hash recalculation, may be necessary.
Reply | Threaded
Open this post in threaded view
|

Re: Data corruption bug in WC, apparently due to race condition?

Philip Martin
Karl Fogel <[hidden email]> writes:

> Am I missing some subtlety about this?

I think you are correct.  I'm not really sure why we are not doing it
like that already, but it is a while since I looked at the commit code.
We do store the new pristine text-base before starting the post-commit
processing, we need to add metadata to that data "snapshot".

--
Philip
Reply | Threaded
Open this post in threaded view
|

RE: Data corruption bug in WC, apparently due to race condition?

Bert Huijben-5
In reply to this post by Philip Martin


> -----Original Message-----
> From: Philip Martin [mailto:[hidden email]]
> Sent: donderdag 27 juli 2017 19:57
> To: Karl Fogel <[hidden email]>
> Cc: Subversion Dev <[hidden email]>
> Subject: Re: Data corruption bug in WC, apparently due to race condition?
>
> Karl Fogel <[hidden email]> writes:
>
> >   1) Start an SVN commit.
> >
> >   2) While the commit is in progress, edit and save a
> >      change to one of the files that's being committed.
> >
> >   3) The commit completes successfully, or "successfully", but...
> >
> > ...now the repository has the file contents from before my edit of
> > (2), the working copy has the file contents from *after* my edit of
> > (2), but SVN reports no local modifications when I do 'svn status'.
> > That's... bad, right? :-)
>
> The post-commit processing on the client side is not checking for
> modifications before recording filesize/timestamp in the nodes table in
> .svn/wc.db.

There used to be a system for this around the release of Subversion 1.7.0.
Not sure what happened to this...
There should be a comparison (probably via one of the wc question apis) if
the timestamp changed since what was committed.

But even then there is a theoretical issue, between the file install and
then obtaining the timestamp+size directly after that. But you shouldn't be
able to trigger that without a debugger. Unlike this case where the whole
commit is locked.


We should do better, but we don't promise that this editing while committing
is 100% stable. During the working copy lock we should have exclusive access
to the locked parts of the working copy...

        Bert

Reply | Threaded
Open this post in threaded view
|

Re: Data corruption bug in WC, apparently due to race condition?

Karl Fogel-2
"Bert Huijben" <[hidden email]> writes:
>There used to be a system for this around the release of Subversion 1.7.0.
>Not sure what happened to this...
>There should be a comparison (probably via one of the wc question apis) if
>the timestamp changed since what was committed.

There is no need to do such a check at commit time.  We do it for 'svn status', or when 'svn commit' is figuring out which files have been modified, and we should be able to guarantee that each working file's timestamp is in good order by the time a commit is finished, so that those other commands can rely on the timestamps.

>But even then there is a theoretical issue, between the file install and
>then obtaining the timestamp+size directly after that. But you shouldn't be
>able to trigger that without a debugger. Unlike this case where the whole
>commit is locked.

We should be able to promise that stability, even in theory.  The way to avoid the theoretical issue is to install the new working file using the timestamp of the tmp file that is being atomically moved into place -- which I think is what already happens, actually.  That is, when we install the working file, we always do so by first building a tmp file somewhere and then atomically renaming it to the working file, right?  That rename should be preserving the timestamp of the tmp source (at least on POSIX systems?).  Try this script:

  #!/bin/sh
  rm -rf foo bar
  touch foo
  ls -l --time-style=full-iso foo
  sleep 1
  mv foo bar
  ls -l --time-style=full-iso bar

Here's the output it produces on my system:

  -rw-r--r-- 1 kfogel kfogel 0 2017-08-01 13:23:33.264610172 -0500 foo
  -rw-r--r-- 1 kfogel kfogel 0 2017-08-01 13:23:33.264610172 -0500 bar

>We should do better, but we don't promise that this editing while committing
>is 100% stable. During the working copy lock we should have exclusive access
>to the locked parts of the working copy...

I think it's unreasonable to expect users to know what files might be involved in a commit at any given time.  The user (or bot) doing the editing might not even be the same as the user (or bot) that triggered the commit.  The audience for Subversion's working copy lock is Subversion itself, not "everyone on the system".  Subversion operations are aware of the lock and will honor it, but we can't expect other programs/people to do so.

Best regards,
-Karl
Reply | Threaded
Open this post in threaded view
|

Re: Data corruption bug in WC, apparently due to race condition?

Karl Fogel-2
In reply to this post by Philip Martin
I filed https://issues.apache.org/jira/browse/SVN-4708 about this (after searching to see if anything had already been filed about it).

Best regards
­Karl

Philip Martin <[hidden email]> writes:
>Karl Fogel <[hidden email]> writes:
>
>> Am I missing some subtlety about this?
>
>I think you are correct.  I'm not really sure why we are not doing it
>like that already, but it is a while since I looked at the commit code.
>We do store the new pristine text-base before starting the post-commit
>processing, we need to add metadata to that data "snapshot".