Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
20 messages Options
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Daniel Shahaf-2
[hidden email] wrote on Fri, 14 Jul 2017 11:13 +0000:

> Author: kotkov
> Date: Fri Jul 14 11:13:47 2017
> New Revision: 1801940
>
> URL: http://svn.apache.org/viewvc?rev=1801940&view=rev
> Log:
> fsfs: Add initial support for LZ4 compression.
>
> This can significantly (up to 3 times) improve the speed of commits and
> other operations with large binary or incompressible files, while still
> maintaining a decent compression ratio.

> From the client perspective, the patch starts using LZ4 compression
> only for file:// protocol, and the support/negotiation of the use of svndiff2
> with LZ4 compression for http:// and svn:// can be added later.

To be clear, ra_svn in current trunk is interoperable with 1.9, right?
I.e., it doesn't use svndiff2 over the wire.

> With this patch, LZ4 compression will be enabled for fsfs repositories which
> specify compression-level=1 in fsfs.conf.
>
> +++ subversion/trunk/subversion/libsvn_fs_fs/transaction.c Fri Jul 14 11:13:47 2017
> @@ -2159,6 +2159,38 @@ rep_write_cleanup(void *data)
>    return APR_SUCCESS;
>  }
>  
> +static void
> +txdelta_to_svndiff(svn_txdelta_window_handler_t *handler,
> +                   void **handler_baton,
> +                   svn_stream_t *output,
> +                   svn_fs_t *fs,
> +                   apr_pool_t *pool)
> +{
> +  fs_fs_data_t *ffd = fs->fsap_data;
> +  int svndiff_version;
> +  int svndiff_compression_level;
> +
> +  if (ffd->delta_compression_level == 1 &&
> +      ffd->format >= SVN_FS_FS__MIN_SVNDIFF2_FORMAT)
> +    {
> +      svndiff_version = 2;
> +      svndiff_compression_level = 0;
> +    }
> +  else if (ffd->format >= SVN_FS_FS__MIN_SVNDIFF1_FORMAT)
> +    {
> +      svndiff_version = 1;
> +      svndiff_compression_level = ffd->delta_compression_level;
> +    }
> +  else
> +    {
> +      svndiff_version = 0;
> +      svndiff_compression_level = 0;
> +    }
> +
> +  svn_txdelta_to_svndiff3(handler, handler_baton, output, svndiff_version,
> +                          svndiff_compression_level, pool);
> +}

I'm a bit uncomfortable with this logic.

1. It violates the principle of least surprise: compression-level=9
means 'gzip -9', compression-level=5 means 'gzip -5', but
compression-level=1 means LZ4 (with the default acceleration_factor)
rather than 'gzip -1'.

2. It leaves no way to use zlib level 1 in f8 filesystems.  This seems
like a decision that should be left to the admin, rather than hardcoded
into the library.

3. What if somebody wanted to add a backend with, say, xz compression.
(xz compression also takes levels like gzip.)  Would it make sense to have
two tunables:
.
    compression-backend = { lz4 | zlib }
    compression-level = {1..N for lz4, 1..9 for zlib}
.
and then other compression backends could be easily added?

This would also allow admins to set the 'acceleration_factor' of lz4.

> The interoperability is implemented by bumping the format of svndiff
> to 2 and the repository file system format to 8.

Cheers,

Daniel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Evgeny Kotkov
Daniel Shahaf <[hidden email]> writes:

> To be clear, ra_svn in current trunk is interoperable with 1.9, right?
> I.e., it doesn't use svndiff2 over the wire.

Yes, both ra_svn and svnserve currently behave as in 1.9, and only
negotiate and use svndiff0 and svndiff1.

> I'm a bit uncomfortable with this logic.
>
> 1. It violates the principle of least surprise: compression-level=9
> means 'gzip -9', compression-level=5 means 'gzip -5', but
> compression-level=1 means LZ4 (with the default acceleration_factor)
> rather than 'gzip -1'.
>
> 2. It leaves no way to use zlib level 1 in f8 filesystems.  This seems
> like a decision that should be left to the admin, rather than hardcoded
> into the library.
>
> 3. What if somebody wanted to add a backend with, say, xz compression.
> (xz compression also takes levels like gzip.)  Would it make sense to have
> two tunables:
> .
>     compression-backend = { lz4 | zlib }
>     compression-level = {1..N for lz4, 1..9 for zlib}
> .
> and then other compression backends could be easily added?
>
> This would also allow admins to set the 'acceleration_factor' of lz4.

First of all, I agree that this logic has drawbacks if observed from the
idealistic point of view:
  - the choice of the compression algorithm happens implicitly, and
  - it doesn't allow users to use stop using LZ4 for any reason with
    the compression level set to 1.

In the meanwhile, I think that the current approach is quite pragmatic,
as LZ4 is a suitable alternative for zlib1, and considering the big picture
with a similar configuration knob in mod_dav_svn, where the choice of the
compression algorithm is tied to the negotiation of the wire format for
older clients (see below).

When I've been thinking about allowing an explicit choice of the algorithm,
I had a slightly different line of thought, opposed to "compression-backend
+ compression-level", with a single option:

  compression = none | lz4 | zlib | zlib-1 ... zlib-9

  (The rationale is to avoid having two dependent options; as well as that,
   currently, I don't have the data showing that being able to tune the
   acceleration factor of LZ4 can noticeably improve performance.)

However, there are a couple of difficulties with porting this approach to
mod_dav_svn, i.e., if we introduce the SVNCompression directive.  There
are clients that don't use LZ4, so, presumably, this options would require
specifying all formats that a server can use, in the preferred order:

  SVNCompression "lz4, zlib"

While such approach is explicit, it also has a couple of drawbacks, as it:
  - leaves a window for mistakes (say, if the user sets "SVNCompression lz4"
    and inadvertently disables compression for older clients),
  - is not forward-compatible, as new compression algorithms require the
    server to be reconfigured, and
  - adds complexity.

As we are lucky that LZ4 is a suitable alternative for zlib1, and that our
current configuration knobs are not tightly bound to zlib, I propose that we
keep the current logic for now and postpone the generic solution up to the
moment when we add another compression algorithm that does not fit this
scheme or requires additional configuration.

In other words, we can always do this separately, when it's absolutely needed
(say, if we find ourselves adding a compression algorithm such as zstd).


Regards,
Evgeny Kotkov
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Daniel Shahaf-2
Good morning Evgeny,

First of all, thanks for the well written response.

Evgeny Kotkov wrote on Mon, 24 Jul 2017 19:19 +0300:

> Daniel Shahaf <[hidden email]> writes:
> > I'm a bit uncomfortable with this logic.
> >
> > 1. It violates the principle of least surprise: compression-level=9
> > means 'gzip -9', compression-level=5 means 'gzip -5', but
> > compression-level=1 means LZ4 (with the default acceleration_factor)
> > rather than 'gzip -1'.
> >
> > 2. It leaves no way to use zlib level 1 in f8 filesystems.  This seems
> > like a decision that should be left to the admin, rather than hardcoded
> > into the library.
> >
> > 3. What if somebody wanted to add a backend with, say, xz compression.
> > (xz compression also takes levels like gzip.)  Would it make sense to have
> > two tunables:
> > .
> >     compression-backend = { lz4 | zlib }
> >     compression-level = {1..N for lz4, 1..9 for zlib}
> > .
> > and then other compression backends could be easily added?
> >
> > This would also allow admins to set the 'acceleration_factor' of lz4.
>
> First of all, I agree that this logic has drawbacks if observed from the
> idealistic point of view:
>   - the choice of the compression algorithm happens implicitly, and
>   - it doesn't allow users to use stop using LZ4 for any reason with
>     the compression level set to 1.
>
> In the meanwhile, I think that the current approach is quite pragmatic,
> as LZ4 is a suitable alternative for zlib1, and considering the big picture
> with a similar configuration knob in mod_dav_svn, where the choice of the
> compression algorithm is tied to the negotiation of the wire format for
> older clients (see below).

I'm afraid I disagree with both of these points.

You keep stating "lz4 is better than zlib(level=1)" as an absolute fact.
I do not see it this way.  Yes, there is a benchmark you cited that
scored lz4 higher than zlib, but that does not imply that lz4 is
universally preferable to zlib (and that it will remain so for the
lifetime of the 1.10.x series).

Secondly, the compatibility requirements of mod_dav_svn and FSFS are
different.  mod_dav_svn is supposed to support multiple client minor
versions simultaneously; FSFS did a format bump explicitly because
supporting 1.9 and 1.10 simultaneously was not possible.

> When I've been thinking about allowing an explicit choice of the algorithm,
> I had a slightly different line of thought, opposed to "compression-backend
> + compression-level", with a single option:
>
>   compression = none | lz4 | zlib | zlib-1 ... zlib-9
>
>   (The rationale is to avoid having two dependent options; as well as that,
>    currently, I don't have the data showing that being able to tune the
>    acceleration factor of LZ4 can noticeably improve performance.)

+1

> However, there are a couple of difficulties with porting this approach to
> mod_dav_svn, i.e., if we introduce the SVNCompression directive.  There
> are clients that don't use LZ4, so, presumably, this options would require
> specifying all formats that a server can use, in the preferred order:
>
>   SVNCompression "lz4, zlib"
>
> While such approach is explicit, it also has a couple of drawbacks, as it:
>   - leaves a window for mistakes (say, if the user sets "SVNCompression lz4"
>     and inadvertently disables compression for older clients),
>   - is not forward-compatible, as new compression algorithms require the
>     server to be reconfigured, and
>   - adds complexity.
>

- We can detect the configuration "SVNCompression lz4" and error out on it.

- Forward compatibility: I'm not convinced that we even _need_ forward
  compatibility for this; we can just tell people who set this knob that
  they may miss out on new compression algorithms if they don't review
  the setting of that knob when they upgrade to a new minor version.
  (That's exactly how the client-side config:auth:password-stores
  works.)  There are other options to solve the compatibility issue, but
  as you say, they would add complexity.
 
- I don't see how the fact that «SVNCompression "lz4, zlib"» might be
  considered too complex to add, affects my arguments about fsfs.conf.
  As I said earlier, FSFS f8 does not need to support 1.9 and 1.10
  clients in parallel, so it has no need for a compression negotiation
  configuration.  Perhaps you could clarify the fsfs part of your
  argument?

> As we are lucky that LZ4 is a suitable alternative for zlib1, and that our
> current configuration knobs are not tightly bound to zlib, I propose that we
> keep the current logic for now and postpone the generic solution up to the
> moment when we add another compression algorithm that does not fit this
> scheme or requires additional configuration.
>
> In other words, we can always do this separately, when it's absolutely needed
> (say, if we find ourselves adding a compression algorithm such as zstd).

Yes, there are further changes we could make if we find ourselves adding
more compression algorithms, but it is premature to consider them.  At
this point, I simply suggest to add a fsfs.conf "compression" knob, with
the syntax you proposed, that overrides compression-level if both are set.
We can make further changes as and when.

Cheers,

Daniel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Jacek Materna
On Tue, Jul 25, 2017 at 6:39 AM, Daniel Shahaf <[hidden email]> wrote:

> Good morning Evgeny,
>
> First of all, thanks for the well written response.
>
> Evgeny Kotkov wrote on Mon, 24 Jul 2017 19:19 +0300:
>> Daniel Shahaf <[hidden email]> writes:
>> > I'm a bit uncomfortable with this logic.
>> >
>> > 1. It violates the principle of least surprise: compression-level=9
>> > means 'gzip -9', compression-level=5 means 'gzip -5', but
>> > compression-level=1 means LZ4 (with the default acceleration_factor)
>> > rather than 'gzip -1'.
>> >
>> > 2. It leaves no way to use zlib level 1 in f8 filesystems.  This seems
>> > like a decision that should be left to the admin, rather than hardcoded
>> > into the library.
>> >
>> > 3. What if somebody wanted to add a backend with, say, xz compression.
>> > (xz compression also takes levels like gzip.)  Would it make sense to have
>> > two tunables:
>> > .
>> >     compression-backend = { lz4 | zlib }
>> >     compression-level = {1..N for lz4, 1..9 for zlib}
>> > .
>> > and then other compression backends could be easily added?
>> >
>> > This would also allow admins to set the 'acceleration_factor' of lz4.
>>
>> First of all, I agree that this logic has drawbacks if observed from the
>> idealistic point of view:
>>   - the choice of the compression algorithm happens implicitly, and
>>   - it doesn't allow users to use stop using LZ4 for any reason with
>>     the compression level set to 1.
>>
>> In the meanwhile, I think that the current approach is quite pragmatic,
>> as LZ4 is a suitable alternative for zlib1, and considering the big picture
>> with a similar configuration knob in mod_dav_svn, where the choice of the
>> compression algorithm is tied to the negotiation of the wire format for
>> older clients (see below).
>
> I'm afraid I disagree with both of these points.
>
> You keep stating "lz4 is better than zlib(level=1)" as an absolute fact.
> I do not see it this way.  Yes, there is a benchmark you cited that
> scored lz4 higher than zlib, but that does not imply that lz4 is
> universally preferable to zlib (and that it will remain so for the
> lifetime of the 1.10.x series).
>
> Secondly, the compatibility requirements of mod_dav_svn and FSFS are
> different.  mod_dav_svn is supposed to support multiple client minor
> versions simultaneously; FSFS did a format bump explicitly because
> supporting 1.9 and 1.10 simultaneously was not possible.
>
>> When I've been thinking about allowing an explicit choice of the algorithm,
>> I had a slightly different line of thought, opposed to "compression-backend
>> + compression-level", with a single option:
>>
>>   compression = none | lz4 | zlib | zlib-1 ... zlib-9
>>
>>   (The rationale is to avoid having two dependent options; as well as that,
>>    currently, I don't have the data showing that being able to tune the
>>    acceleration factor of LZ4 can noticeably improve performance.)
>
> +1

+1

>
>> However, there are a couple of difficulties with porting this approach to
>> mod_dav_svn, i.e., if we introduce the SVNCompression directive.  There
>> are clients that don't use LZ4, so, presumably, this options would require
>> specifying all formats that a server can use, in the preferred order:
>>
>>   SVNCompression "lz4, zlib"
>>
>> While such approach is explicit, it also has a couple of drawbacks, as it:
>>   - leaves a window for mistakes (say, if the user sets "SVNCompression lz4"
>>     and inadvertently disables compression for older clients),
>>   - is not forward-compatible, as new compression algorithms require the
>>     server to be reconfigured, and
>>   - adds complexity.
>>
>
> - We can detect the configuration "SVNCompression lz4" and error out on it.
>
> - Forward compatibility: I'm not convinced that we even _need_ forward
>   compatibility for this; we can just tell people who set this knob that
>   they may miss out on new compression algorithms if they don't review
>   the setting of that knob when they upgrade to a new minor version.
>   (That's exactly how the client-side config:auth:password-stores
>   works.)  There are other options to solve the compatibility issue, but
>   as you say, they would add complexity.

+1

>
> - I don't see how the fact that «SVNCompression "lz4, zlib"» might be
>   considered too complex to add, affects my arguments about fsfs.conf.
>   As I said earlier, FSFS f8 does not need to support 1.9 and 1.10
>   clients in parallel, so it has no need for a compression negotiation
>   configuration.  Perhaps you could clarify the fsfs part of your
>   argument?
>
>> As we are lucky that LZ4 is a suitable alternative for zlib1, and that our
>> current configuration knobs are not tightly bound to zlib, I propose that we
>> keep the current logic for now and postpone the generic solution up to the
>> moment when we add another compression algorithm that does not fit this
>> scheme or requires additional configuration.
>>
>> In other words, we can always do this separately, when it's absolutely needed
>> (say, if we find ourselves adding a compression algorithm such as zstd).
>
> Yes, there are further changes we could make if we find ourselves adding
> more compression algorithms, but it is premature to consider them.  At
> this point, I simply suggest to add a fsfs.conf "compression" knob, with
> the syntax you proposed, that overrides compression-level if both are set.
> We can make further changes as and when.

+1

>
> Cheers,
>
> Daniel
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Evgeny Kotkov
In reply to this post by Daniel Shahaf-2
Daniel Shahaf <[hidden email]> writes:

> - I don't see how the fact that «SVNCompression "lz4, zlib"» might be
>   considered too complex to add, affects my arguments about fsfs.conf.
>   As I said earlier, FSFS f8 does not need to support 1.9 and 1.10
>   clients in parallel, so it has no need for a compression negotiation
>   configuration.  Perhaps you could clarify the fsfs part of your
>   argument?

[...]

> Yes, there are further changes we could make if we find ourselves adding
> more compression algorithms, but it is premature to consider them.  At
> this point, I simply suggest to add a fsfs.conf "compression" knob, with
> the syntax you proposed, that overrides compression-level if both are set.
> We can make further changes as and when.

With a bit more thought on this, I agree that providing an explicit knob
(compression = ...) in fsfs.conf would be more appropriate than what we
have now.

I had an assumption that it would be nice to keep the configuration in
fsfs.conf and in mod_dav_svn working in a similar way.  But, as they
have different scopes (and only the latter requires negotiation), there
is no reason not to have the explicit configuration in fsfs.conf.  After all,
being explicit about what gets written on the disk is better.

Let me see what I can come up with regarding the new "compression = ..."
option.

>> While such approach is explicit, it also has a couple of drawbacks, as it:
>>   - leaves a window for mistakes (say, if the user sets "SVNCompression lz4"
>>     and inadvertently disables compression for older clients),
>>   - is not forward-compatible, as new compression algorithms require the
>>     server to be reconfigured, and
>>   - adds complexity.
>
> - We can detect the configuration "SVNCompression lz4" and error out on it.

(A minor side note for future readers)

I think that raising an error in this case might not be the right thing to do,
as this configuration could actually represent what a user wants.

For instance, for 10 or 100 Mbps LAN, where the throughput is limited by
the slow network, using fast compression can be better than disabling
compression altogether.  In this case, a user might want to avoid costly
zlib compression, but make use of LZ4 with newer clients, and that could
be done with "SVNCompression lz4".


Regards,
Evgeny Kotkov
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Stefan Sperling
In reply to this post by Evgeny Kotkov
On Mon, Jul 24, 2017 at 07:19:09PM +0300, Evgeny Kotkov wrote:

> However, there are a couple of difficulties with porting this approach to
> mod_dav_svn, i.e., if we introduce the SVNCompression directive.  There
> are clients that don't use LZ4, so, presumably, this options would require
> specifying all formats that a server can use, in the preferred order:
>
>   SVNCompression "lz4, zlib"
>
> While such approach is explicit, it also has a couple of drawbacks, as it:
>   - leaves a window for mistakes (say, if the user sets "SVNCompression lz4"
>     and inadvertently disables compression for older clients),
>   - is not forward-compatible, as new compression algorithms require the
>     server to be reconfigured, and
>   - adds complexity.

Does mod_dav_svn really need an option for this?

Can't lz4 (svndiff2) be negotiated as a mutual protocol capability of
client and server? Why won't a simple logic such as the following work:

  If the client announces lz4 compression level 1 support, use it.
  Else, use zlib.

As an admin or user my expectation would be that clients and servers pick
the most suitable protocol automatically. At best, options I set should
indicate what I think the ideal protocol configuration would be, but if that
won't work (e.g. because an old client conencts) I would expect a fallback.

I think we should also consider which decision people who are not experts
will make if we give them such options. I would expect many admins would
use a logic such as:

  lz4 is better (newer) than zlib!
  compression level 9 is better than level 1!

and then set their server to lz4 with compression-level 9...

Not even having to think about any of these options is much better
for our users and saves them time.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Evgeny Kotkov
Stefan Sperling <[hidden email]> writes:

> Does mod_dav_svn really need an option for this?
>
> Can't lz4 (svndiff2) be negotiated as a mutual protocol capability of
> client and server? Why won't a simple logic such as the following work:
>
>   If the client announces lz4 compression level 1 support, use it.
>   Else, use zlib.

The current state is that there are no additional options in mod_dav_svn,
and the "SVNCompression" option has just been discussed as an explicit
alternative to what we have now (but it also has a couple of potential
drawbacks, and I don't plan to implement it at this time).

Currently, the negotiation scheme works similarly to what you have described.
If the server has "SVNCompressionLevel 1", and the client supports LZ4, then
it will be used.  Otherwise, in case of older clients, they will be using zlib
compression with the corresponding compression level.

Apart from this, I was willing to propose that we switch to LZ4 compression
by default for both mod_dav_svn and newly created FSFS repositories.  That
is, by making "1" the default value of SVNCompressionLevel directive and
by making "lz4" the default value of the discussed "compression=" knob in
fsfs.conf.

(I will take this to a separate thread once all the necessary things are
in place.)


Regards,
Evgeny Kotkov
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Philip Martin
In reply to this post by Daniel Shahaf-2
[hidden email] writes:

> Author: kotkov
> Date: Fri Jul 14 11:13:47 2017
> New Revision: 1801940
>
> URL: http://svn.apache.org/viewvc?rev=1801940&view=rev
> Log:
> fsfs: Add initial support for LZ4 compression.

> * build.conf
>   (libsvn_subr): Build LZ4 library sources.

Why do we have an embedded copy of the source code?  That's not our
usual policy with 3rd party libraries and the LZ4 library is available
on Linux systems:

  $ pkg-config lz4 --libs
  -llz4

If the reason for embedding is simply to avoid an external dependency
then I think that is wrong -- we should not accumuate 3rd party code in
our code base.  I suppose there could be an argument to embed the LZ4
code if the compressed form is not stable from one release to another.
Is that the case?  If so then how stable is the current format?

--
Philip
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Stefan Sperling
In reply to this post by Evgeny Kotkov
On Wed, Jul 26, 2017 at 01:18:00PM +0300, Evgeny Kotkov wrote:

> Stefan Sperling <[hidden email]> writes:
>
> > Does mod_dav_svn really need an option for this?
> >
> > Can't lz4 (svndiff2) be negotiated as a mutual protocol capability of
> > client and server? Why won't a simple logic such as the following work:
> >
> >   If the client announces lz4 compression level 1 support, use it.
> >   Else, use zlib.
>
> The current state is that there are no additional options in mod_dav_svn,
> and the "SVNCompression" option has just been discussed as an explicit
> alternative to what we have now (but it also has a couple of potential
> drawbacks, and I don't plan to implement it at this time).
>
> Currently, the negotiation scheme works similarly to what you have described.
> If the server has "SVNCompressionLevel 1", and the client supports LZ4, then
> it will be used.  Otherwise, in case of older clients, they will be using zlib
> compression with the corresponding compression level.
>
> Apart from this, I was willing to propose that we switch to LZ4 compression
> by default for both mod_dav_svn and newly created FSFS repositories.  That
> is, by making "1" the default value of SVNCompressionLevel directive and
> by making "lz4" the default value of the discussed "compression=" knob in
> fsfs.conf.
>
> (I will take this to a separate thread once all the necessary things are
> in place.)
>
>
> Regards,
> Evgeny Kotkov

Sounds like we're headed in a good direction :-)

I agree that 'compression-level = 1' -> lz4 makes sense if you know
what to expect from lz4 vs. zlib. But honestly imagining myself giving
a workshop for administrators and explaining it like that, I see people
shake their heads in disbelief. It looks too much like an implementation
detail leaked into the config interface. A compression=lz4/zlib knob would
make a lot more sense to most people, with a corresponding subordinate
default for compression-level (lz4: 1, zlib: 5).

I strongly believe we should not require an admin to even look at fsfs.conf
to enable lz4. Because if admins look at this file today, they will easily get
overwhelmed with choices. I think we've been a bit too eager with adding
knobs to fsfs.conf in general. Most people never tweak them, and if anyone
asks I generally recommend to just leave them all at their defaults.
In hindsight, I would say all these options have a questionable reason
for existence and we should not have exposed them:

fail-stop
enable-rep-sharing
enable-dir-deltification
enable-props-deltification
max-deltification-walk
max-linear-deltification
compression-level
revprop-pack-size
compress-packed-revprops
block-size
l2p-page-size
p2l-page-size

I mean, apart from Stefan2, which person on this planet *really* knows what
*all* of these knobs do, why anyone would want to tweak which knob, and how
they potentially interact?

It would have been better to automatically adjust some of these values
at run-time where possible, or decide on doing something one way instead
of supporting several ways.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Philip Martin
In reply to this post by Philip Martin
Philip Martin <[hidden email]> writes:

>   $ pkg-config lz4 --libs
>   -llz4

Typo, that should be

      pkg-config liblz4 --libs

The way the lz4 code is currently embedded in libsvn_subr makes it
awkward to add support for an external liblz4.

--
Philip
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Evgeny Kotkov
In reply to this post by Stefan Sperling
Stefan Sperling <[hidden email]> writes:

> Sounds like we're headed in a good direction :-)

One particular thing that bothers me about using LZ4 as the new default
is that I think that a decision like that goes a bit against the usual policy
(add new optional feature, switch the default in the next minor release).

But, personally, in this case I would say that it would make sense to
immediately make LZ4 the new default and available to the users, if there
won't be objections to that.

> I agree that 'compression-level = 1' -> lz4 makes sense if you know
> what to expect from lz4 vs. zlib. But honestly imagining myself giving
> a workshop for administrators and explaining it like that, I see people
> shake their heads in disbelief. It looks too much like an implementation
> detail leaked into the config interface. A compression=lz4/zlib knob would
> make a lot more sense to most people, with a corresponding subordinate
> default for compression-level (lz4: 1, zlib: 5).

I think that the proposed solution with the new option that overrides (and,
essentially, deprecates) 'compression-level':

  compression = none | lz4 | zlib | zlib-1 ... zlib-9

is explicit and should be understood by the users in case they stumble
across it.

Please note that I don't think that we should change the behavior of
'compression-level' in fsfs.conf.  In other words, the 'compression-level'
would still mean "zlib compression level", exactly as it works now.  As it's
going to be superseded by the new 'compression' option, it's not necessary
and maybe even wrong to also change the existing meaning of 'compression-
level.


Regards,
Evgeny Kotkov
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Stefan Sperling
On Wed, Jul 26, 2017 at 02:41:31PM +0300, Evgeny Kotkov wrote:

> Stefan Sperling <[hidden email]> writes:
>
> > Sounds like we're headed in a good direction :-)
>
> One particular thing that bothers me about using LZ4 as the new default
> is that I think that a decision like that goes a bit against the usual policy
> (add new optional feature, switch the default in the next minor release).
>
> But, personally, in this case I would say that it would make sense to
> immediately make LZ4 the new default and available to the users, if there
> won't be objections to that.

If our test suite passes with lz4 I see no reason for concern about
making it the default right away.

> I think that the proposed solution with the new option that overrides (and,
> essentially, deprecates) 'compression-level':
>
>   compression = none | lz4 | zlib | zlib-1 ... zlib-9
>
> is explicit and should be understood by the users in case they stumble
> across it.

Makes sense.

> Please note that I don't think that we should change the behavior of
> 'compression-level' in fsfs.conf.  In other words, the 'compression-level'
> would still mean "zlib compression level", exactly as it works now.

Sure. Very explicitly deprecated, please. If an option is marked deprecated
then admins can tell they don't need to waste time thinking about it.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Stefan Sperling
In reply to this post by Philip Martin
On Wed, Jul 26, 2017 at 12:36:05PM +0100, Philip Martin wrote:

> Philip Martin <[hidden email]> writes:
>
> >   $ pkg-config lz4 --libs
> >   -llz4
>
> Typo, that should be
>
>       pkg-config liblz4 --libs
>
> The way the lz4 code is currently embedded in libsvn_subr makes it
> awkward to add support for an external liblz4.

I agree that an external library should be used during the build.
It makes life a lot easier for packagers on Unix-style systems,
and is the expected de-facto standard in that ecosystem.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Evgeny Kotkov
Stefan Sperling <[hidden email]> writes:

>> The way the lz4 code is currently embedded in libsvn_subr makes it
>> awkward to add support for an external liblz4.
>
> I agree that an external library should be used during the build.
> It makes life a lot easier for packagers on Unix-style systems,
> and is the expected de-facto standard in that ecosystem.

I would very much prefer if we didn't have the mandatory dependency on
the external LZ4 library.

This would be painful in case it's not available out of the box on the target
platform, and would add even more steps to the egregiously complicated
build process on Windows.  And building using an embedded copy of the
source isn't something new, as we already do that for utf8proc and with
sqlite-amalgamation.

Furthermore, we only require the core part of the LZ4 library — that is, two
relatively small files (lz4.c and lz4.h), which I think were specifically
designed this way to simplify the process of using LZ4 in various
applications.


Regards,
Evgeny Kotkov
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Daniel Shahaf-2
Evgeny Kotkov wrote on Wed, 26 Jul 2017 15:48 +0300:
> And building using an embedded copy of the source isn't something new,
> as we already do that for utf8proc and with sqlite-amalgamation.

utf8proc is indeed a precedent, but sqlite is not.

- We support linking against an external sqlite lib too, not just
  against sqlite-amalgamation.  (On Unix, at least, which is what stsp
  and Philip discuss)

- sqlite-amalgamation is not _embedded_ in the source; it is retrieved
  by the user, so they can use an sqlite that was released after the
  subversion tarball they're building.

> Furthermore, we only require the core part of the LZ4 library — that is, two
> relatively small files (lz4.c and lz4.h), which I think were specifically
> designed this way to simplify the process of using LZ4 in various
> applications.

This may be related to something I've been meaning to bring up: adding
lz4 has added five new warnings to the build (see attachment).  I meant
to bring them up and ask if they could please be addressed, but I wonder
if those warnings are related to your "designed to be importable" point.

Cheers,

Daniel

lz4-warnings.txt (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Stefan Sperling
In reply to this post by Evgeny Kotkov
On Wed, Jul 26, 2017 at 03:48:33PM +0300, Evgeny Kotkov wrote:

> Stefan Sperling <[hidden email]> writes:
>
> >> The way the lz4 code is currently embedded in libsvn_subr makes it
> >> awkward to add support for an external liblz4.
> >
> > I agree that an external library should be used during the build.
> > It makes life a lot easier for packagers on Unix-style systems,
> > and is the expected de-facto standard in that ecosystem.
>
> I would very much prefer if we didn't have the mandatory dependency on
> the external LZ4 library.

That's not what is being proposed. It's fine if the build can optionally
use a copy provided by the user, or even a copy embedded in our code.
But using that internal copy should not be mandatory.

Who will be blamed if, in the future, a package manager for some Linux/BSD
system fixes an exploitable bug in lz4, and accidentally leaves some systems
vulnerable because of a missing patch to SVN's internal copy?
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Philip Martin
Stefan Sperling <[hidden email]> writes:

> Who will be blamed if, in the future, a package manager for some Linux/BSD
> system fixes an exploitable bug in lz4, and accidentally leaves some systems
> vulnerable because of a missing patch to SVN's internal copy?

Let us consider Subversion's embedded utf8proc.  How well are we doing
from a security point of view?

In 2012 we imported upstream 1.1.5 and since then we have made various
minor fixes to the code but as far as I can tell we are still using that
1.1.5 code.  Upstream has made several releases including one in 2016
with the ominous sounding change: "Buffer overrun fix".  Is that change
relevant to our code?  Can it be exploited in Subversion?  Has anyone
ever checked?  I've just had a quick look and our version of utf8proc is
so out-of-date it is hard to determine whether we are vulnerable or not.

Our use of utf8proc is mostly (completely?) confined to the client so if
there is an exploit it is probably not that interesting.  The same can
not be said of the proposal to have the server use an embedded LZ4 to
decode untrusted network data.

--
Philip
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Evgeny Kotkov
In reply to this post by Stefan Sperling
Stefan Sperling <[hidden email]> writes:

> That's not what is being proposed. It's fine if the build can optionally
> use a copy provided by the user, or even a copy embedded in our code.
> But using that internal copy should not be mandatory.

I could have misinterpreted one of the previous e-mails, as I was thinking
that the proposal was to *only* allow building with LZ4 being an external
dependency.

With that said, I do not mind providing the necessary options to allow
building against the non-bundled LZ4 library, although I am getting a feeling
that it would be a slight overkill.

If the consensus here is that we should provide such options, I'll put that
on my todo list.  In the meanwhile, if someone could jump in and help with
the necessary changes to the Makefiles and etc., that would be greatly
appreciated.


Regards,
Evgeny Kotkov
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Philip Martin
Evgeny Kotkov <[hidden email]> writes:

> on my todo list.  In the meanwhile, if someone could jump in and help with
> the necessary changes to the Makefiles and etc., that would be greatly
> appreciated.

Patch in preparation...

--
Philip
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: svn commit: r1801940 - in /subversion/trunk: ./ notes/ subversion/include/ subversion/include/private/ subversion/libsvn_delta/ subversion/libsvn_fs_fs/ subversion/libsvn_subr/ subversion/tests/libsvn_delta/ subversion/tests/libsvn_subr/

Evgeny Kotkov
In reply to this post by Evgeny Kotkov
Evgeny Kotkov <[hidden email]> writes:

> With a bit more thought on this, I agree that providing an explicit knob
> (compression = ...) in fsfs.conf would be more appropriate than what we
> have now.
>
> I had an assumption that it would be nice to keep the configuration in
> fsfs.conf and in mod_dav_svn working in a similar way.  But, as they
> have different scopes (and only the latter requires negotiation), there
> is no reason not to have the explicit configuration in fsfs.conf.  After all,
> being explicit about what gets written on the disk is better.
>
> Let me see what I can come up with regarding the new "compression = ..."
> option.

Committed in https://svn.apache.org/r1803639

Regards,
Evgeny Kotkov
Loading...