Proposal: new fsfs.conf properties


Proposal: new fsfs.conf properties

Paul Hammant-3
1. compression-exempt-suffixes = mp3,mp4,jpeg

2. deltification-exempt-suffixes = mp3,mp4,jpeg

Regardless of the setting of 'compression-level', #1 above would mean certain things can skip the compression attempt.  It must give up at a certain point, right?

Same for deltification re #2
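
As a rough illustration of the two proposed options, the sketch below parses them with Python's `configparser` and checks a path's suffix against each list. The option names come from the proposal above; the `[exemptions]` section name and the helpers `load_suffixes` and `is_exempt` are assumptions invented for this example, and nothing here reflects how FSFS actually reads its configuration.

```python
import configparser
import os

# Hypothetical fsfs.conf fragment; the section name is invented for this sketch.
SAMPLE_CONF = """
[exemptions]
compression-exempt-suffixes = mp3,mp4,jpeg
deltification-exempt-suffixes = mp3,mp4,jpeg
"""


def load_suffixes(conf_text, option):
    """Return the comma-separated suffix list of `option` as a lowercase set."""
    parser = configparser.ConfigParser()
    parser.read_string(conf_text)
    raw = parser.get("exemptions", option, fallback="")
    return {s.strip().lower() for s in raw.split(",") if s.strip()}


def is_exempt(path, suffixes):
    """True if the file name's suffix (case-insensitive) is in `suffixes`."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return ext in suffixes


no_compress = load_suffixes(SAMPLE_CONF, "compression-exempt-suffixes")
no_delta = load_suffixes(SAMPLE_CONF, "deltification-exempt-suffixes")

for name in ("trunk/song.MP3", "trunk/photo.jpg", "trunk/README.txt"):
    print(name, is_exempt(name, no_compress), is_exempt(name, no_delta))
```

Note that `photo.jpg` is not matched by the list as written, because only `jpeg` is listed; a purely suffix-based scheme needs every spelling enumerated.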

I'm assuming debate happens now. Then y'all let me go off and diligently file a Jira ticket for this feature request, or I slink away suitably admonished...

- Paul

Re: Proposal: new fsfs.conf properties

bahrep
Hello Paul,

On Sat, Jul 8, 2017 at 2:51 AM, Paul Hammant <[hidden email]> wrote:

>
> 1. compression-exempt-suffixes = mp3,mp4,jpeg
>
> 2. deltification-exempt-suffixes = mp3,mp4,jpeg
>
> Regardless of the setting of 'compression-level', #1 above would mean certain things can skip the compression attempt.  It must give up at a certain point, right?
>
> Same for deltification re #2
>
> I'm assuming debate happens now. Then y'all let me go off and diligently file a Jira ticket for this feature request, or I slink away suitably admonished...
>
> - Paul

I'm not sure whether this is going to be useful. How do you expect
these exemptions to help Subversion users? What's the story?

--
With best regards,
Pavel Lyalyakin
VisualSVN Team

RE: Proposal: new fsfs.conf properties

Markus Schaber
Hi,


Best regards

Markus Schaber

CODESYS® a trademark of 3S-Smart Software Solutions GmbH

Inspiring Automation Solutions

3S-Smart Software Solutions GmbH
Dipl.-Inf. Markus Schaber | Product Development Core Technology
Memminger Str. 151 | 87439 Kempten | Germany
Tel. +49-831-54031-979 | Fax +49-831-54031-50

E-Mail: [hidden email] | Web: http://www.codesys.com | CODESYS store: http://store.codesys.com
CODESYS forum: http://forum.codesys.com

Managing Directors: Dipl.Inf. Dieter Hess, Dipl.Inf. Manfred Werner | Trade register: Kempten HRB 6186 | Tax ID No.: DE 167014915

This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received
this e-mail in error) please notify the sender immediately and destroy this e-mail. Any unauthorised copying, disclosure
or distribution of the material in this e-mail is strictly forbidden

RE: Proposal: new fsfs.conf properties

Markus Schaber
In reply to this post by bahrep
Hi,

(Sorry, it seems my previous message was sent _very_ prematurely :-(

From: Pavel Lyalyakin [mailto:[hidden email]]

> I'm not sure whether this is going to be useful. How do you expect these
> exemptions to help Subversion users? What's the story?

I agree partly. Skipping compression for known "incompressible" formats like mpX, png or gif can bring performance benefits by saving some CPU cycles (see the recent performance discussions on this list).

However, I'm not sure whether the same holds for deltification. There are editing tasks which do not re-encode the whole image / movie, and they can profit from deltification, for example:

- Lossless rotation / cropping of JPEG images.
- Editing / stripping the EXIF data of JPEG images.
- Embedding / dropping the preview thumbnail of JPEG images.
- Lossless MP3 editing (e.g. via mp3DirectCut).
- Editing MP3 metadata (e.g. the song title).
(... and more ...)

In all those cases, skipping deltification can drastically increase storage.
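
The difference between the two cases can be sketched in a few lines of Python. This is only an approximation under stated assumptions: random bytes stand in for an already-compressed audio payload, a short text prefix stands in for the metadata, and zlib with a preset dictionary (`zdict`) stands in for a real delta encoder. Subversion's own svndiff format is not used here, and the names `compressed_size` and `delta_size` are made up for the example.

```python
import os
import zlib


def compressed_size(data, level=5):
    """Size of `data` after plain zlib compression (no base version)."""
    return len(zlib.compress(data, level))


def delta_size(base, target):
    """Size of `target` when deflated with `base` as a preset dictionary.

    This is a stand-in for deltification: long runs shared with `base`
    are encoded as back-references instead of literal bytes.
    """
    comp = zlib.compressobj(level=9, zdict=base)
    return len(comp.compress(target) + comp.flush())


# Random bytes behave like already-compressed media: zlib cannot shrink them.
payload = os.urandom(16 * 1024)
old = b"TITLE=Old Song\x00" + payload
new = b"TITLE=New Song Title\x00" + payload

print("full size:        ", len(new))
print("compressed alone: ", compressed_size(new))
print("delta against old:", delta_size(old, new))
```

Compression alone gains nothing on such a file, while the delta against the previous version stays small because only the metadata prefix changed.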


Best regards

Markus Schaber


Re: Proposal: new fsfs.conf properties

Paul Hammant-3
Markus - I've read your section on deltification and I can see evidence in what you wrote that you're concurrently in favor of and against it (the file-suffix exclusion idea). Can you re-read and clarify?

Thanks,

- Paul


RE: Proposal: new fsfs.conf properties

Markus Schaber
Hi, Paul,


From: Paul Hammant [mailto:[hidden email]]

> Markus - I've read your section on deltification and I can see evidence in what you wrote that you're concurrently in favor of and against it (the file-suffix exclusion idea). Can you re-read and clarify?

To summarize:

I expect significant benefits in some use cases from skipping compression, so I'm +1 if benchmarks prove it's worth the effort.

I see the danger of drastically increased bandwidth and storage size (transferring/storing the whole mp3 instead of just a few changed metadata bytes) in some common use cases when deltification is skipped. Thus, I'm skeptical (count it as -0), and I'd suggest running benchmarks for those cases before implementation, and clearly documenting the possible negative effects if it's implemented.


Best regards

Markus Schaber


Re: Proposal: new fsfs.conf properties

Branko Čibej
On 11.07.2017 15:39, Markus Schaber wrote:

> To summarize:
>
> I expect significant benefits in some use cases from skipping compression, so I'm +1 if benchmarks prove it's worth the effort.
>
> I see the danger of drastically increased bandwidth and storage size (transferring/storing the whole mp3 instead of just a few changed metadata bytes) in some common use cases when deltification is skipped. Thus, I'm skeptical (count it as -0), and I'd suggest running benchmarks for those cases before implementation, and clearly documenting the possible negative effects if it's implemented.


So, first of all, if this is server-side configuration, it has _no_
effect on the client so the client will continue to send (compressed)
deltas. This will have exactly zero effect on bandwidth or client CPU
utilization.

Another issue I have with the proposal is the idea to use file suffixes.
That's usually the wrong way to go about things (case in point: Windows
does it, with disastrous results). It's much better to determine file
format by inspection, such as, e.g., libmagic does. We already have
optional support for libmagic in the client (to set svn:mime-type).

-- Brane

Re: Proposal: new fsfs.conf properties

Paul Hammant-3
I'm perfectly happy for the solution to be mime-type based. 

Maybe we can take the mime-type to suffix table from Apache itself to do the translation :- https://svn.apache.org/repos/asf/httpd/httpd/trunk/docs/conf/mime.types :-P
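
A minimal sketch of that translation, assuming the table is in the plain `mime.types` layout (one media type per line followed by its extensions, `#` for comments). The sample lines below are only written in that format for illustration, so consult the real file for the actual mappings; `parse_mime_types`, `mime_for_path` and the `EXEMPT_TYPES` set are hypothetical names, not anything Subversion provides.

```python
import os

# Sample lines in the mime.types layout: "<media type> <ext> [<ext> ...]".
SAMPLE_MIME_TYPES = """\
# lines starting with '#' are comments
audio/mpeg    mpga mp2 mp2a mp3 m2a m3a
image/jpeg    jpeg jpg jpe
video/mp4     mp4 mp4v mpg4
text/plain    txt text conf def list log in
"""


def parse_mime_types(text):
    """Build a {suffix: media type} table from mime.types-style text."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        media_type, *suffixes = line.split()
        for suffix in suffixes:
            table[suffix.lower()] = media_type
    return table


def mime_for_path(path, table):
    """Look up the media type for a path by suffix, or None if unknown."""
    ext = os.path.splitext(path)[1].lstrip(".").lower()
    return table.get(ext)


# Hypothetical policy: media types for which compression would be skipped.
EXEMPT_TYPES = {"audio/mpeg", "image/jpeg", "video/mp4"}

table = parse_mime_types(SAMPLE_MIME_TYPES)
for name in ("clip.MP4", "photo.jpg", "notes.txt", "Makefile"):
    mtype = mime_for_path(name, table)
    print(name, mtype, mtype in EXEMPT_TYPES)
```

Configuring by media type rather than by suffix means `jpg`, `jpeg` and `jpe` are all covered by a single `image/jpeg` entry.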

I used it (implicitly) in a Subversion backed wysi-wiki ten years ago - yeesh!:  https://www.youtube.com/watch?v=WfjK0Pb6IIM (26 seconds of your time: Svn, DAV, Auto-increment, a JavaWeb-app to add on a site experience via Sitemesh, and Mozilla's SeaMonkey editing a page and browsing it).

Re: Proposal: new fsfs.conf properties

Daniel Shahaf-5
In reply to this post by Markus Schaber
On Tue, Jul 11, 2017 at 01:39:56PM +0000, Markus Schaber wrote:
> To summarize:
>
> I expect significant benefits in some use cases by skipping the
> compression, thus I'm +1 if benchmarks prove it's worth the effort.

It is easy to have deltification without compression, either by using
svndiff0 (instead of svndiff1) or by using svndiff1 with zlib
compression level set appropriately.
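
What "compression level set appropriately" means at the zlib level can be seen with Python's `zlib` module: level 0 wraps the input in stored blocks without trying to compress it. This sketch only exercises zlib itself, not Subversion's svndiff encoder, and the random input is an assumption standing in for incompressible data.

```python
import os
import zlib

# Random bytes stand in for data that zlib cannot shrink.
data = os.urandom(64 * 1024)

stored = zlib.compress(data, 0)    # level 0: no compression attempted
deflated = zlib.compress(data, 9)  # level 9: maximum effort

print("input:  ", len(data))
print("level 0:", len(stored))
print("level 9:", len(deflated))
```

Both outputs round-trip through `zlib.decompress`; on incompressible input neither is smaller than the original, so the extra effort at level 9 buys nothing.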

> I see the danger of drastically increased bandwidth and storage size
> (transferring/storing the whole mp3 instead of just some changed
> metadata bytes) in some common use cases when deltification is skipped.
> Thus, I'm skeptical (count it as -0), and I'd kindly suggest to do
> some benchmarks for those cases before implementation, and clear
> documentation of the possible negative effects if it's implemented.

Regarding compression, would it make sense for the server to compute the
compressed delta and, if it turns out to be larger than X% of the
uncompressed & undeltified file, to just store the latter?  I.e., to
compute the DELTA rep but use a PLAIN rep if the DELTA rep would be
larger than X% (in bytes) of the PLAIN rep?

IIRC this is already so with X=100, but for some filetypes it might make
sense to set X lower.
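
The rule described above fits in a few lines. This is a sketch of the decision only, with hypothetical names (`choose_rep`, `threshold_percent`); it is not FSFS code, and the default of 100 simply mirrors the recollection that this is the current behaviour.

```python
def choose_rep(plain_size, delta_size, threshold_percent=100.0):
    """Pick the representation to store for one file revision.

    Returns "PLAIN" when the delta is larger than `threshold_percent`
    percent of the plain (full-text) size, otherwise "DELTA".
    """
    if plain_size == 0:
        return "PLAIN"
    if delta_size * 100 > plain_size * threshold_percent:
        return "PLAIN"
    return "DELTA"


# With X=100 a delta is kept whenever it is no larger than the full text.
print(choose_rep(plain_size=1000, delta_size=900))
# With X=80 the same delta is no longer considered worth keeping.
print(choose_rep(plain_size=1000, delta_size=900, threshold_percent=80))
```

Lowering X trades some storage for cheaper reads, since a PLAIN rep does not need to be reconstructed from a delta chain.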

Cheers,

Daniel

Re: Proposal: new fsfs.conf properties

Paul Hammant-3
So I'm after a time saving. I'm perfectly happy for the backend to waste space (in my configuration), I just don't want it to take 15 mins to transfer a single 15GB file into Subversion.

In my configuration, I'd like to pre-advise Subversion to save as much time as possible for uploads, by skipping steps that are known in advance to be meaningless for the use case.

- Paul 


Re: Proposal: new fsfs.conf properties

Stefan Sperling
In reply to this post by Branko Čibej
On Tue, Jul 11, 2017 at 09:11:58PM +0200, Branko Čibej wrote:
> Another issue I have with the proposal is the idea to use file suffixes.
> That's usually the wrong way to go about things (case in point: Windows
> does it, with disastrous results). It's much better to determine file
> format by inspection, such as, e.g., libmagic does. We already have
> optional support for libmagic in the client (to set svn:mime-type).

I would not feel comfortable having the server parse arbitrary data with
libmagic. The libmagic code is not very safe to run on untrusted input.
I have seen libmagic crash my svn client on several occasions even on
text files I wrote.

At the client side it's a bit less dangerous because users have already
told svn to add the files in question to version control, and a libmagic
exploit running on the client machine can do less harm than a server-side one.

Granted, commits are usually authenticated. If we did this we should at
least make really sure that no unauthenticated access can trigger this code.
Ideally, it would be sandboxed somehow if we started using it on the server.

Re: Proposal: new fsfs.conf properties

Branko Čibej
On 11.07.2017 22:50, Stefan Sperling wrote:

> I would not feel comfortable having the server parse arbitrary data with
> libmagic. The libmagic code is not very safe to run on untrusted input.
> I have seen libmagic crash my svn client on several occasions even on
> text files I wrote.
>
> At the client side it's a bit less dangerous because users have already
> told svn to add the files in question to version control, and a libmagic
> exploit running on the client machine can do less harm than a server-side one.
>
> Granted, commits are usually authenticated. If we did this we should at
> least make really sure that no unauthenticated access can trigger this code.
> Ideally, it would be sandboxed somehow if we started using it on the server.

I wasn't really proposing to use libmagic on the server. My point is
that instead of using file name suffixes (which the compression and
deltification code don't know about), we'd do some sort of inspection
instead. Detecting ZIP files, or gzip/bzip2/xz-compressed files, etc.,
is fairly easy just from looking at a few bytes of headers. Same goes
for most image and video formats.
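
A minimal sketch of that kind of header inspection, using the well-known leading "magic" bytes of a handful of formats. The list is deliberately short and the function names (`sniff_format`, `looks_precompressed`) are invented for this example; it shows the idea, not a proposed implementation for the storage layer.

```python
import bz2
import gzip
import lzma

# Leading bytes of some common compressed formats.
MAGIC_PREFIXES = [
    (b"PK\x03\x04", "zip"),
    (b"\x1f\x8b", "gzip"),
    (b"BZh", "bzip2"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
]


def sniff_format(head):
    """Return a format name if `head` starts with a known signature."""
    for prefix, name in MAGIC_PREFIXES:
        if head.startswith(prefix):
            return name
    return None


def looks_precompressed(head):
    """True if the first bytes identify an already-compressed format."""
    return sniff_format(head) is not None


samples = {
    "archive.gz": gzip.compress(b"hello"),
    "archive.bz2": bz2.compress(b"hello"),
    "archive.xz": lzma.compress(b"hello"),
    "spaceinvaders.jpg": b" " * 64,  # text full of spaces with a .jpg name
}
for name, content in samples.items():
    print(name, sniff_format(content[:16]))
```

The last sample is the case mentioned in the next paragraph: the name says JPEG, but the content does not carry a JPEG signature, so inspection would not exempt it.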

Of course, one could always concoct a file that would trick such
inspection, but at least that's marginally harder to do than commit a
large text file full of spaces and calling it 'spaceinvaders.jpg'. :)

Random binary data is harder to detect, but we already deal with that
after the fact by using the plain text if the delta is too large.

-- Brane

Re: Proposal: new fsfs.conf properties

Branko Čibej
On 12.07.2017 12:09, Branko Čibej wrote:

> I wasn't really proposing to use libmagic on the server. My point is
> that instead of using file name suffixes (which the compression and
> deltification code don't know about), we'd do some sort of inspection
> instead. Detecting ZIP files, or gzip/bzip2/xz-compressed files, etc.,
> is fairly easy just from looking at a few bytes of headers. Same goes
> for most image and video formats.
>
> Of course, one could always concoct a file that would trick such
> inspection, but at least that's marginally harder to do than commit a
> large text file full of spaces and calling it 'spaceinvaders.jpg'. :)
>
> Random binary data is harder to detect, but we already deal with that
> after the fact by using the plain text if the delta is too large.

Oh and another thing: I'd prefer to _not_ make such a feature
configurable with yet another knob. We have too many knobs ... either
make it safe to use and always-on, or forget about it. IMO.

-- Brane


Re: Proposal: new fsfs.conf properties

Johan Corveleyn-3
In reply to this post by Branko Čibej
On Wed, Jul 12, 2017 at 12:09 PM, Branko Čibej <[hidden email]> wrote:

> I wasn't really proposing to use libmagic on the server. My point is
> that instead of using file name suffixes (which the compression and
> deltification code don't know about), we'd do some sort of inspection
> instead. Detecting ZIP files, or gzip/bzip2/xz-compressed files, etc.,
> is fairly easy just from looking at a few bytes of headers. Same goes
> for most image and video formats.
>
> Of course, one could always concoct a file that would trick such
> inspection, but at least that's marginally harder to do than commit a
> large text file full of spaces and calling it 'spaceinvaders.jpg'. :)
>
> Random binary data is harder to detect, but we already deal with that
> after the fact by using the plain text if the delta is too large.

We could also make the process driven by a "client-side suggestion".
Driving it from the client-side also gives us the possibility to
eliminate the client-side deltification overhead.

I.e. the client has some logic (libmagic, suffix, looking at the first
100 bytes, ...) to determine that it's not worth deltifying and/or
compressing. It doesn't do deltification itself, and lets the server
know that it probably shouldn't either (or the server sees that the
client hasn't deltified, so accepts the content automatically as
"non-deltifiable / non-compressible"?).

This may need a server-side config setting to make it respect or ignore
the client-side suggestion.

--
Johan

Re: Proposal: new fsfs.conf properties

Branko Čibej
On 12.07.2017 12:24, Johan Corveleyn wrote:

> We could also make the process driven by a "client-side suggestion".
> Driving it from the client-side also gives us the possibility to
> eliminate the client-side deltification overhead.
>
> I.e. the client has some logic (libmagic, suffix, looking at the first
> 100 bytes, ...) to determine that it's not worth deltifying and/or
> compressing. It doesn't do deltification itself, and lets the server
> know that it probably shouldn't either (or the server sees that the
> client hasn't deltified, so accepts the content automatically as
> "non-deltifiable / non-compressible"?).
>
> Maybe needs a server-side config setting to make it respect or ignore
> the client-side suggestion.

That's such an easy way to let a malicious client explode the
repository size. And ... there's really no reason to complicate things.
The server's storage layer can cheaply do all the necessary checks without
having to believe the client, and without adding yet another
(dangerous!) config knob.

-- Brane

Re: Proposal: new fsfs.conf properties

Johan Corveleyn-3
On Wed, Jul 12, 2017 at 12:27 PM, Branko Čibej <[hidden email]> wrote:

> On 12.07.2017 12:24, Johan Corveleyn wrote:
>> On Wed, Jul 12, 2017 at 12:09 PM, Branko Čibej <[hidden email]> wrote:
>>> On 11.07.2017 22:50, Stefan Sperling wrote:
>>>> On Tue, Jul 11, 2017 at 09:11:58PM +0200, Branko Čibej wrote:
>>>>> Another issue I have with the proposal is the idea to use file suffixes.
>>>>> That's usually the wrong way to go about things (case in point: Windows
>>>>> does it, with didastrous results). It's much better to determine file
>>>>> format by inspection, such as, e.g., libmagic does. We already have
>>>>> optional support for libmagic in the client (to set svn:mime-type).
>>>> I would not feel comfortable having the server parse arbitrary data with
>>>> libmagic. The libmagic code is not very safe to run on untrusted input.
>>>> I have seen libmagic crash my svn client on several occasions even on
>>>> text files I wrote.
>>>>
>>>> At the client side it's a bit less dangerous because users have already
>>>> told svn to add the files in question to version control, and a libmagic
>>>> exploit running on the client machine can do less harm than a server-side one.
>>>>
>>>> Granted, commits are usually authenticated. If we did this we should at
>>>> least make really sure that no unauthenticated access can trigger this code.
>>>> Ideally, it would be sandboxed somehow if we started using it on the server.
>>> I wasn't really proposing to use libmagic on the server. My point is
>>> that instead of using file name suffixes (which the compression and
>>> deltification code don't know about), we'd do some sort of inspection
>>> instead. Detecting ZIP files, or gzip/bzip2/xz-compressed files, etc.,
>>> is fairly easy just from looking at a few bytes of headers. Same goes
>>> for most image and video formats.
>>>
>>> Of course, one could always concoct a file that would trick such
>>> inspection, but at least that's marginally harder to do than commit a
>>> large text file full of spaces and calling it 'spaceinvaders.jpg'. :)
>>>
>>> Random binary data is harder to detect, but we already deal with that
>>> after the fact by using the plain text if the delta is too large.
>> We could also make the process driven by a "client-side suggestion".
>> Driving it from the client-side also gives us the possibility to
>> eliminate the client-side deltification overhead.
>>
>> I.e. the client has some logic (libmagic, suffix, looking at the first
>> 100 bytes, ...) to determine that it's not worth deltifying and/or
>> compressing. It doesn't do deltification itself, and lets the server
>> know that it probably shouldn't either (or the server sees that the
>> client hasn't deltified, so accepts the content automatically as
>> "non-deltifiable / non-compressible"?).
>>
>> Maybe needs a server-side config setting to make it respect or ignore
>> the client-side suggestion.
>
> That's such an easy way to make a malicious client explode the
> repository size. And ... there's really no reason to complicate. The
> server's storage layer can cheaply do all the necessary checks without
> having to believe the client, and without adding yet another
> (dangerous!) config knob.

Yes, well in any case allowing this by server-side inspection will
also open up possibilities for blowing up the repository by a
malicious client.

In fact, making it coupled with "client also non-deltifies" forces the
client to also send those huge files over the wire, making it a little
bit more difficult to DoS the server by blowing it up. If the client
can still deltify (only sending a few bytes), but trick the server
into storing those as full-texts, the attack can be more powerful I
guess.

--
Johan

RE: Proposal: new fsfs.conf properties

Markus Schaber
Hi,

From: Johan Corveleyn [mailto:[hidden email]]
> > That's such an easy way to make a malicious client explode the
> > repository size. And ... there's really no reason to complicate. The
> > server's storage layer can cheaply do all the necessary checks without
> > having to believe the client, and without adding yet another
> > (dangerous!) config knob.
>
> Yes, well in any case allowing this by server-side inspection will also open
> up possibilities for blowing up the repository by a malicious client.

A malicious user can always "explode" the server by just uploading/overwriting huge random files. Using svnmucc and a unix pipe, he doesn't even need a local file or working copy for that.

Thus, I think listening to a client hint in general will not open a completely new security hole. SVN repositories are a kind of data storage, and we cannot prevent users from abusing it by storing data...

> In fact, making it coupled with "client also non-deltifies" forces the client
> to also send those huge files over the wire, making it a little bit more
> difficult to DoS the server by blowing it up. If the client can still deltify
> (only sending a few bytes), but trick the server into storing those as full-
> texts, the attack can be more powerful I guess.

Yes, I think allowing deltification for the client while storing non-deltified on the server amplifies the possible attack, so we should be careful.

Could the server use the already pre-deltified and -compressed representation coming from the client, without compressing and re-deltifying itself (while still verifying it, of course)?

On the other hand, I'd also hesitate to automatically skip deltification and compression just because the client delivers uncompressed or non-deltified content. This will effectively disable deltification and compression for svnmucc, DAV-autoversioning and maybe some other use cases.


Best regards

Markus Schaber


Re: Proposal: new fsfs.conf properties

Daniel Shahaf-2
In reply to this post by Branko Čibej
Branko Čibej wrote on Wed, 12 Jul 2017 12:09 +0200:
> I wasn't really proposing to use libmagic on the server. My point is
> that instead of using file name suffixes (which the compression and
> deltification code don't know about), we'd do some sort of inspection
> instead. Detecting ZIP files, or gzip/bzip2/xz-compressed files, etc.,
> is fairly easy just from looking at a few bytes of headers. Same goes
> for most image and video formats.

That's an option, but it would mean re-solving the problem libmagic
solves.  Is there a way for us to use libmagic securely?

E.g., we could give to libmagic only the first 10 or 20 bytes of the
file (which is enough for it to recognise mpeg/jpeg/xz files, in my
testing), or we could ask libmagic to provide an API that only runs
'safe' magic file tests (e.g., strcmp/memcmp-based tests only)…
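As a rough illustration of the memcmp-style idea, here is a minimal Python sketch that looks only at the first few bytes of a buffer and compares them against fixed signatures. The signature table and the names `MAGIC_PREFIXES`, `HEADER_BYTES` and `looks_precompressed` are made up for this example; this is not libmagic and not anything Subversion does today.

```python
# Hypothetical sketch: prefix-only detection of already-compressed formats.
# Only fixed byte comparisons are performed; the file body is never parsed.

MAGIC_PREFIXES = {
    bytes.fromhex("1f8b"): "gzip",
    b"BZh": "bzip2",
    bytes.fromhex("fd377a585a00"): "xz",
    b"PK\x03\x04": "zip",
    bytes.fromhex("ffd8ff"): "jpeg",
    bytes.fromhex("89504e470d0a1a0a"): "png",
}

# Assumption for this sketch: never look past the first 16 bytes.
HEADER_BYTES = 16


def looks_precompressed(data: bytes):
    """Return a format name if the header matches a known signature, else None."""
    head = data[:HEADER_BYTES]
    for prefix, name in MAGIC_PREFIXES.items():
        if head.startswith(prefix):
            return name
    return None


if __name__ == "__main__":
    print(looks_precompressed(bytes.fromhex("1f8b0800")))
    print(looks_precompressed(b" " * 100))
```

A file full of spaces yields no match here regardless of what it is called, while a crafted header can still fool the check, which is the limitation Brane already pointed out.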

Cheers,

daniel

Re: Proposal: new fsfs.conf properties

Paul Hammant-3
In reply to this post by Johan Corveleyn-3
You know, in all seriousness I think the (empty by default) list of exempted file suffixes is the best way forward.  If suffixes are good enough for Apache itself to use (link provided earlier), they are good enough in this scenario on the server side of Svn. If the function in question doesn't know the file name, then that param should be added to the function's args (and backwards through all the functions in the stack until it reaches the place where the resource name is known).
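To make the suffix idea concrete, here is a small Python sketch of how a comma-separated suffix list could be parsed and matched against a path. The value format follows the proposal (e.g. `mp3,mp4,jpeg`); the helper names `parse_suffix_list` and `is_exempt`, and the choice to match case-insensitively, are assumptions for illustration only and not existing Subversion code.

```python
# Hypothetical sketch of suffix-based exemption, not Subversion code.

def parse_suffix_list(value: str) -> frozenset:
    """Parse a value such as 'mp3,mp4,jpeg' into a set of lowercase suffixes."""
    return frozenset(
        item.strip().lower().lstrip(".")
        for item in value.split(",")
        if item.strip()
    )


def is_exempt(path: str, exempt: frozenset) -> bool:
    """Return True if the last path component ends in an exempted suffix."""
    name = path.rsplit("/", 1)[-1]
    if "." not in name:
        return False
    return name.rsplit(".", 1)[1].lower() in exempt


if __name__ == "__main__":
    exempt = parse_suffix_list("mp3,mp4,jpeg")
    print(is_exempt("trunk/media/intro.MP4", exempt))
    print(is_exempt("trunk/README", exempt))
```

With an empty setting the parsed set is empty, so nothing is exempted, which matches the "empty by default" behaviour described above. Note that the match is on the name only: the content is never looked at.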

I'd be cranking up JetBrains' CLion myself to do the refactoring and giving you (cough) a pull request, but I've not done any C since 1991 - https://paulhammant.com/images/swapcols_mag_page.jpg - (top right).   Anyone in NYC want to bring me up to speed with the build, and acclimate me to the source?  Or by ScreenHero (will send invites).

- Paul


Re: Proposal: new fsfs.conf properties

Mark Phippard-3
In reply to this post by Paul Hammant-3
I cannot find it in the archives, so maybe this happened in IRC, but I remember one time suggesting we add a new versioned svn:XXX property to control this.  This could then be set by the client based on extension if desired.  I recall my suggestion was a compression on|off property that, when turned off, would cause us to omit the deltification the client does when sending to the server.  If it makes sense for the server to do something similar when it stores the file, great.

I suggested that if we wanted to get really fancy we could also let the property control the size of the window used when we run xdelta or whatever we do.  I recall that for some binary files, if you give it a larger window it could do a better job of compressing.  I am no expert here; I just recall this being mentioned in the past as a tuning option that could make a difference.  So a property would be a way to expose that option.

I think the main use case though is to just disable it for files where someone knows they would be better off skipping deltification, because it is not going to reduce the size and will just use a lot of time and CPU.
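For anyone who wants to see the effect, here is a small Python sketch comparing how well zlib compresses repetitive text versus random bytes. Random bytes are only a stand-in for already-compressed media such as mp3 or jpeg payloads, the function name `compression_ratio` and the level 5 are arbitrary choices for this example, and none of this reflects how the repository backend actually measures things.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 5) -> float:
    """Return compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data, level)) / len(data)


# Highly repetitive input, similar in spirit to source code or logs.
text_like = b"the quick brown fox jumps over the lazy dog\n" * 2000

# Random bytes as a rough proxy for data that is already compressed.
random_like = os.urandom(len(text_like))

if __name__ == "__main__":
    print(f"text-like:   {compression_ratio(text_like):.3f}")
    print(f"random-like: {compression_ratio(random_like):.3f}")
```

The second ratio should come out at roughly 1.0 or slightly above, meaning the CPU time spent on the attempt buys nothing for that kind of content.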

--