Glob syntax for svndumpfilter exclude --pattern

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
7 messages Options
Reply | Threaded
Open this post in threaded view
|

Glob syntax for svndumpfilter exclude --pattern

Julian Foad-5
The '--pattern' option to svndumpfilter exclude/include was introduced
in Subversion 1.7 by http://svn.apache.org/r880381 (cc: Brane):

[[[
$ svndumpfilter --help exclude
[...]
   --pattern : Treat the path prefixes as file glob patterns.
]]]

We don't say what sort of glob patterns it accepts, even in the Book
[1]. In particular, for a customer's use case, I would like to know
whether it supports matching a variable number of path elements, for
which a '/**/' syntax is typically used.

Its implementation uses svn_cstring_match_glob_list() which doesn't say
either. The implementation of that uses apr_fnmatch(flags=0).

apr_fnmatch is documented. In particular it says '*' matches "Any
sequence of zero or more characters". (We are *not* giving the flag
APR_FNM_PATHNAME "Slash must be matched by slash".)

So the particular answer I was looking for is that '*' will cross
multiple path elements:

[[[
   $ svndumpfilter exclude --pattern '*/alpha'
   Excluding prefix patterns:
    '/*/alpha'
   [...]
   Dropped 2 nodes:
    '/foo/alpha'
    '/foo/bar/alpha'
]]]

(I noticed a suboptimal edge case: the above example doesn't match the
path '/alpha', because the implementation adds a leading slash to the
specified pattern, making '/*/alpha', and then insists on matching both
slashes literally.)

- Julian


[1]
http://svnbook.red-bean.com/nightly/en/svn.ref.svndumpfilter.html#svn.ref.svndumpfilter.sw.pattern

Reply | Threaded
Open this post in threaded view
|

Re: Glob syntax for svndumpfilter exclude --pattern

Julian Foad-6
Julian Foad wrote:
> [[[
> $ svndumpfilter --help exclude
> [...]
>   --pattern : Treat the path prefixes as file glob patterns.
> ]]]

I expanded the help a little in revision 1783741, to include what I felt
are the most useful brief details:

[[[
$ svndumpfilter --help exclude
[...]
   --pattern    : Treat the path prefixes as file glob patterns.
                  Glob special characters are '*' '?' '[]' and '\'.
                  Character '/' is not treated specially, so
                  pattern /*/foo matches paths /a/foo and /a/b/foo.
]]]

- Julian
Reply | Threaded
Open this post in threaded view
|

Re: Glob syntax for svndumpfilter exclude --pattern

Branko Čibej
In reply to this post by Julian Foad-5
On 20.02.2017 11:23, Julian Foad wrote:
> (I noticed a suboptimal edge case: the above example doesn't match the
> path '/alpha', because the implementation adds a leading slash to the
> specified pattern, making '/*/alpha', and then insists on matching
> both slashes literally.)

This is a bug. Everything else is as expected.

-- Brane

Reply | Threaded
Open this post in threaded view
|

Re: Glob syntax for svndumpfilter exclude --pattern

Julian Foad-5
Branko Čibej wrote:
> On 20.02.2017 11:23, Julian Foad wrote:
>> (I noticed a suboptimal edge case: the above example doesn't match the
>> path '/alpha', because the implementation adds a leading slash to the
>> specified pattern, making '/*/alpha', and then insists on matching
>> both slashes literally.)
>
> This is a bug.

Actually, adding the leading slash is not really relevant. The more
general case is that pattern "/trunk/*/README" won't match path
"/trunk/README", as a consequence of not treating slash as special in
paths. A user might well expect it to match, knowing that the pattern
"/trunk//README" *will* match due to our canonicalizing the input
patterns before matching. So there is an inconsistency there.

That may not have been intended, but I don't propose we change it now,
because I can't think of any simple change that would be an improvement.

> Everything else is as expected.

Thanks.

- Julian
Reply | Threaded
Open this post in threaded view
|

Re: Glob syntax for svndumpfilter exclude --pattern

Branko Čibej
On 20.02.2017 17:17, Julian Foad wrote:

> Branko Čibej wrote:
>> On 20.02.2017 11:23, Julian Foad wrote:
>>> (I noticed a suboptimal edge case: the above example doesn't match the
>>> path '/alpha', because the implementation adds a leading slash to the
>>> specified pattern, making '/*/alpha', and then insists on matching
>>> both slashes literally.)
>>
>> This is a bug.
>
> Actually, adding the leading slash is not really relevant. The more
> general case is that pattern "/trunk/*/README" won't match path
> "/trunk/README", as a consequence of not treating slash as special in
> paths. A user might well expect it to match, knowing that the pattern
> "/trunk//README" *will* match due to our canonicalizing the input
> patterns before matching. So there is an inconsistency there.

But the user didn't write /trunk//README, she wrote /trunk/*/README
which can't be canonicalized to /trunk/README. Remember these are not
paths but glob patterns, if we started "canonicalizing" them it'd be --
well, interesting -- to try to avoid side effects.

-- Brane
Reply | Threaded
Open this post in threaded view
|

Re: Glob syntax for svndumpfilter exclude --pattern

Julian Foad-6
Branko Čibej wrote:

> On 20.02.2017 17:17, Julian Foad wrote:
> > Actually, adding the leading slash is not really relevant. The more
> > general case is that pattern "/trunk/*/README" won't match path
> > "/trunk/README", as a consequence of not treating slash as special in
> > paths. A user might well expect it to match, knowing that the pattern
> > "/trunk//README" *will* match due to our canonicalizing the input
> > patterns before matching. So there is an inconsistency there.
>
> But the user didn't write /trunk//README, she wrote /trunk/*/README
> which can't be canonicalized to /trunk/README. Remember these are not
> paths but glob patterns, if we started "canonicalizing" them it'd be --
> well, interesting -- to try to avoid side effects.

Yes, I know, and I don't propose it.

- Julian

Reply | Threaded
Open this post in threaded view
|

Re: Glob syntax for svndumpfilter exclude --pattern

Branko Čibej
On 22.02.2017 15:21, Julian Foad wrote:

> Branko Čibej wrote:
>> On 20.02.2017 17:17, Julian Foad wrote:
>> > Actually, adding the leading slash is not really relevant. The more
>> > general case is that pattern "/trunk/*/README" won't match path
>> > "/trunk/README", as a consequence of not treating slash as special in
>> > paths. A user might well expect it to match, knowing that the pattern
>> > "/trunk//README" *will* match due to our canonicalizing the input
>> > patterns before matching. So there is an inconsistency there.
>>
>> But the user didn't write /trunk//README, she wrote /trunk/*/README
>> which can't be canonicalized to /trunk/README. Remember these are not
>> paths but glob patterns, if we started "canonicalizing" them it'd be --
>> well, interesting -- to try to avoid side effects.
>
> Yes, I know, and I don't propose it.

For the record, I seem to recall that I chose the glob variant that does
not treat / as special precisely because APR's glob flavour does not
provide ** (i.e., the recursive form of * when / is special).

-- Brane