Re: svn commit: r1813898 - in /subversion/trunk/subversion: libsvn_fs_fs/transaction.c libsvn_repos/reporter.c tests/cmdline/basic_tests.py tests/cmdline/svnadmin_tests.py

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1813898 - in /subversion/trunk/subversion: libsvn_fs_fs/transaction.c libsvn_repos/reporter.c tests/cmdline/basic_tests.py tests/cmdline/svnadmin_tests.py

Evgeny Kotkov
Stefan Sperling <[hidden email]> writes:

> However, if rep-sharing is enabled, svn_fs_props_changed() does not work
> as advertised because properties do not carry a SHA1 checksum with a
> "uniquifier" which identifies the transaction they were created in.
> The uniquifier is used by svn_fs_props_changed() to tell apart property
> representations which share content but were created in different revisions.
>
> To fix that problem, make FSFS write SHA1 checksums along with uniquifiers
> for file properties, just as it is already done for file content.
>
> A source code comment indicates that SHA1 checksums for property reps
> were not written due to concerns over disk space. In hindsight, this was
> a bad trade-off because it affected correctness of svn_fs_props_changed().

Thinking about this change, it could be that writing the additional 40-byte
SHA1 for the property representations is going to eliminate the benefit of
sharing them in the first place.

If I recall correctly, rep sharing for properties is mostly there to gain
from deduplicating small properties, such as svn:eol-style or svn:keywords.
But with the additional SHA1 written in every representation string, this
overhead is likely to take more space than the property itself.

An alternative approach that might be worth considering here would be:

 (1) Extend the on-disk format and allow representation strings without
     SHA1, but with the uniquifier, something like this (where "-" stands
     for "no SHA1"):

     15 0 563 7809 28ef320a82e7bd11eebdf3502d69e608 - 14-g/_5

     (The new format would be allowed starting from FSFS 8.)

 (2) Use the new format to allow rep sharing for properties that writes
     the uniquifier so that svn_fs_props_changed() would work correctly,
     and doesn't introduce the overhead of writing SHA1 in the representation
     string for every property.

 (3) Disable rep sharing for properties in FSFS formats < 8 that cannot
     read and write such representation strings without SHA1, but with an
     uniquifier.

Barring objections and alternative suggestions, I could give a shot at
implementing this.


Regards,
Evgeny Kotkov
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1813898 - in /subversion/trunk/subversion: libsvn_fs_fs/transaction.c libsvn_repos/reporter.c tests/cmdline/basic_tests.py tests/cmdline/svnadmin_tests.py

Stefan Fuhrmann
On 22.11.2017 22:08, Evgeny Kotkov wrote:

> Stefan Sperling <[hidden email]> writes:
>
>> However, if rep-sharing is enabled, svn_fs_props_changed() does not work
>> as advertised because properties do not carry a SHA1 checksum with a
>> "uniquifier" which identifies the transaction they were created in.
>> The uniquifier is used by svn_fs_props_changed() to tell apart property
>> representations which share content but were created in different revisions.
>>
>> To fix that problem, make FSFS write SHA1 checksums along with uniquifiers
>> for file properties, just as it is already done for file content.
>>
>> A source code comment indicates that SHA1 checksums for property reps
>> were not written due to concerns over disk space. In hindsight, this was
>> a bad trade-off because it affected correctness of svn_fs_props_changed().
> Thinking about this change, it could be that writing the additional 40-byte
> SHA1 for the property representations is going to eliminate the benefit of
> sharing them in the first place.
Here some more background after digging through the commit history.

The original comment about space saving was ill-worded. It should have
said something like "storing sha1 etc. is pointless because nobody uses
it for props & dirs". Hence, it is a waste of space.

In 1.8, svn_fs_fs__dag_things_different got broken for props due to the
missing uniquifier but that went unnoticed. This has not been a deliberate
trade-off but a mere oversight of the potential implication of shared reps.
It needs to be fixed.

One reason for rep sharing is that rep changes tend to be rare and often
not "in sync" with text changes. That means, they must be pulled from
a different repo location than both, the node tree (dirs and noderevs)
and the file contents. Eliminating most of these accesses by reading
only a few shared proplists should give some speedup.

Space savings on disk are not the main focus. Those might happen for
commits with mass prop changes and a longer list of standard-props
at each node. Savings in the membuffer cache may be more significant,
though, because its entry directory has a limited size and storage may
remain unused if the average cached item size drops below 1k.

> If I recall correctly, rep sharing for properties is mostly there to gain
> from deduplicating small properties, such as svn:eol-style or svn:keywords.
> But with the additional SHA1 written in every representation string, this
> overhead is likely to take more space than the property itself.
>
> An alternative approach that might be worth considering here would be:
>
>   (1) Extend the on-disk format and allow representation strings without
>       SHA1, but with the uniquifier, something like this (where "-" stands
>       for "no SHA1"):
>
>       15 0 563 7809 28ef320a82e7bd11eebdf3502d69e608 - 14-g/_5
>
>       (The new format would be allowed starting from FSFS 8.)
Yes, that would be one option. If you are willing to provide a patch,
I would probably +1 it. Not bothering with the space savings would
be a valid choice as well, IMO.
>   (2) Use the new format to allow rep sharing for properties that writes
>       the uniquifier so that svn_fs_props_changed() would work correctly,
>       and doesn't introduce the overhead of writing SHA1 in the representation
>       string for every property.
>
>   (3) Disable rep sharing for properties in FSFS formats < 8 that cannot
>       read and write such representation strings without SHA1, but with an
>       uniquifier.
I'd rather have the pre fsfs-8 repos use the full sha1+uniquifier.
That is not worse than storing the props themselves and will
preserve the benefits mentioned above.
> Barring objections and alternative suggestions, I could give a shot at
> implementing this.
Go for it. Maybe notify me once you are done b/c currently, I don't
monitor the commit activity closely.

-- Stefan^2.
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1813898 - in /subversion/trunk/subversion: libsvn_fs_fs/transaction.c libsvn_repos/reporter.c tests/cmdline/basic_tests.py tests/cmdline/svnadmin_tests.py

Evgeny Kotkov
Stefan Fuhrmann <[hidden email]> writes:

>> An alternative approach that might be worth considering here would be:
>>
>>   (1) Extend the on-disk format and allow representation strings without
>>       SHA1, but with the uniquifier, something like this (where "-" stands
>>       for "no SHA1"):
>>
>>       15 0 563 7809 28ef320a82e7bd11eebdf3502d69e608 - 14-g/_5
>>
>>       (The new format would be allowed starting from FSFS 8.)
>
> Yes, that would be one option. If you are willing to provide a patch,
> I would probably +1 it. Not bothering with the space savings would
> be a valid choice as well, IMO.
>>
>>   (2) Use the new format to allow rep sharing for properties that writes
>>       the uniquifier so that svn_fs_props_changed() would work correctly,
>>       and doesn't introduce the overhead of writing SHA1 in the
>>       representation string for every property.

[...]

>> Barring objections and alternative suggestions, I could give a shot at
>> implementing this.
>
> Go for it. Maybe notify me once you are done b/c currently, I don't
> monitor the commit activity closely.

I committed the implementations of (1) and (2) in

    https://svn.apache.org/r1816347 and
    https://svn.apache.org/r1816348

respectively.


Thanks,
Evgeny Kotkov
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1813898 - in /subversion/trunk/subversion: libsvn_fs_fs/transaction.c libsvn_repos/reporter.c tests/cmdline/basic_tests.py tests/cmdline/svnadmin_tests.py

Stefan Fuhrmann


On 25.11.2017 21:27, Evgeny Kotkov wrote:

> Stefan Fuhrmann <[hidden email]> writes:
>
>>> An alternative approach that might be worth considering here would be:
>>>
>>>    (1) Extend the on-disk format and allow representation strings without
>>>        SHA1, but with the uniquifier, something like this (where "-" stands
>>>        for "no SHA1"):
>>>
>>>        15 0 563 7809 28ef320a82e7bd11eebdf3502d69e608 - 14-g/_5
>>>
>>>        (The new format would be allowed starting from FSFS 8.)
>>
>> Yes, that would be one option. If you are willing to provide a patch,
>> I would probably +1 it. Not bothering with the space savings would
>> be a valid choice as well, IMO.
>>>
>>>    (2) Use the new format to allow rep sharing for properties that writes
>>>        the uniquifier so that svn_fs_props_changed() would work correctly,
>>>        and doesn't introduce the overhead of writing SHA1 in the
>>>        representation string for every property.
>
> [...]
>
>>> Barring objections and alternative suggestions, I could give a shot at
>>> implementing this.
>>
>> Go for it. Maybe notify me once you are done b/c currently, I don't
>> monitor the commit activity closely.
>
> I committed the implementations of (1) and (2) in
>
>      https://svn.apache.org/r1816347 and
>      https://svn.apache.org/r1816348
>
> respectively.

Reviewed, tested and seems to work fine. Thank you!

Test expectations may need to be adapted.
With v7 repos (see r1816402), I get two
test failures while v4 and v6 pass:

[[[
W: EXPECTED STDERR (regexp):
W: | .*Found malformed header '[^']*' in revision file|.*Missing id field in
node-rev.*
W: ACTUAL STDERR:
W: | * Error verifying repository metadata.
W: | subversion/svnadmin/svnadmin.c:2191,
W: | subversion/libsvn_repos/dump.c:2511,
W: | subversion/libsvn_repos/dump.c:2425,
W: | subversion/svnadmin/svnadmin.c:965,
W: | subversion/libsvn_fs/fs-loader.c:619,
W: | subversion/libsvn_fs_fs/verify.c:914,
W: | subversion/libsvn_fs_fs/verify.c:876,
W: | subversion/libsvn_fs_fs/verify.c:233,
W: | subversion/libsvn_fs_fs/rev_file.c:263,
W: | subversion/libsvn_fs_fs/low_level.c:237,
W: | subversion/libsvn_subr/checksum.c:432: (apr_err=SVN_ERR_BAD_CHECKSUM_PARSE)
W: | svnadmin: E125012: Invalid character in hex checksum
W: CWD: /dev/shm/trunk/subversion/tests/cmdline
W: EXCEPTION: SVNUnmatchedError
Traceback (most recent call last):
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/main.py", line 1872, in run
     rc = self.pred.run(sandbox)
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/testcase.py", line 258,
in run
     return self._delegate.run(sandbox)
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/testcase.py", line 178,
in run
     result = self.func(sandbox)
   File "/run/shm/trunk/subversion/tests/cmdline/svnadmin_tests.py", line 923,
in verify_incremental_fsfs
     expected_stderr=".*Found malformed header '[^']*' in revision file"
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/verify.py", line 455,
in verify_outputs
     compare_and_display_lines(message, label, expected, actual, raisable)
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/verify.py", line 428,
in compare_and_display_lines
     raise raisable
SVNUnmatchedError
FAIL:  svnadmin_tests.py 12: svnadmin verify detects corruption dump can't
]]]

[[[
W: Unexpected error while running 'svnadmin verify'.
W: EXPECTED STDERR (regexp):
W: | .*Path '.*' is not in UTF-8.*
W: ACTUAL STDERR:
W: | * Error verifying repository metadata.
W: | subversion/svnadmin/svnadmin.c:2191,
W: | subversion/libsvn_repos/dump.c:2511,
W: | subversion/libsvn_repos/dump.c:2425,
W: | subversion/svnadmin/svnadmin.c:965,
W: | subversion/libsvn_fs/fs-loader.c:619,
W: | subversion/libsvn_fs_fs/verify.c:914,
W: | subversion/libsvn_fs_fs/verify.c:876,
W: | subversion/libsvn_fs_fs/verify.c:717,
W: | subversion/libsvn_fs_fs/verify.c:553,
W: | subversion/libsvn_fs_fs/verify.c:527: (apr_err=SVN_ERR_FS_CORRUPT)
W: | svnadmin: E160004: Checksum mismatch in item at offset 84 of length 254
bytes in file svn-test-work/repositories/svnadmin_tests-24/db/revs/0/1
W: CWD: /dev/shm/trunk/subversion/tests/cmdline
W: EXCEPTION: SVNUnmatchedError
Traceback (most recent call last):
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/main.py", line 1872, in run
     rc = self.pred.run(sandbox)
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/testcase.py", line 258,
in run
     return self._delegate.run(sandbox)
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/testcase.py", line 258,
in run
     return self._delegate.run(sandbox)
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/testcase.py", line 178,
in run
     result = self.func(sandbox)
   File "/run/shm/trunk/subversion/tests/cmdline/svnadmin_tests.py", line 1706,
in verify_non_utf8_paths
     [], errput, None, ".*Path '.*' is not in UTF-8.*")
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/verify.py", line 455,
in verify_outputs
     compare_and_display_lines(message, label, expected, actual, raisable)
   File "/dev/shm/trunk/subversion/tests/cmdline/svntest/verify.py", line 428,
in compare_and_display_lines
     raise raisable
SVNUnmatchedError
FAIL:  svnadmin_tests.py 24: svnadmin verify with non-UTF-8 paths
]]]

-- Stefan^2.