RE: svn commit: r1806548 - in /subversion/trunk/subversion: svn/svn.c svnbench/svnbench.c tests/cmdline/basic_tests.py

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
17 messages Options
Reply | Threaded
Open this post in threaded view
|

RE: svn commit: r1806548 - in /subversion/trunk/subversion: svn/svn.c svnbench/svnbench.c tests/cmdline/basic_tests.py

Bert Huijben-5


> -----Original Message-----
> From: [hidden email] [mailto:[hidden email]]
> Sent: dinsdag 29 augustus 2017 11:57
> To: [hidden email]
> Subject: svn commit: r1806548 - in /subversion/trunk/subversion: svn/svn.c
> svnbench/svnbench.c tests/cmdline/basic_tests.py
>
> Author: stefan2
> Date: Tue Aug 29 09:56:31 2017
> New Revision: 1806548
>
> URL: http://svn.apache.org/viewvc?rev=1806548&view=rev
> Log:
> As Julian discovered, '--search' as used with 'svn log' is may not suitable
> for 'svn ls'.  File name matching should be case-sensitive and requires
> full patterns just like e.g. the ordinary unix command line 'ls' command.
>
> Therefore, introduce a separate '--pattern' option for 'svn log' that works
> similar to patterns with Unix command line 'ls'.  Since the actual matching
> already confirms to that, we only need a different option pre-processing.

Perhaps we could use --glob, to allow other syntax patterns later?

Not sure... perhaps --glob is too technical.

        Bert


Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1806548 - in /subversion/trunk/subversion: svn/svn.c svnbench/svnbench.c tests/cmdline/basic_tests.py

Evgeny Kotkov
Bert Huijben <[hidden email]> writes:

>> As Julian discovered, '--search' as used with 'svn log' is may not suitable
>> for 'svn ls'.  File name matching should be case-sensitive and requires
>> full patterns just like e.g. the ordinary unix command line 'ls' command.
>>
>> Therefore, introduce a separate '--pattern' option for 'svn log' that works
>> similar to patterns with Unix command line 'ls'.  Since the actual matching
>> already confirms to that, we only need a different option pre-processing.
>
> Perhaps we could use --glob, to allow other syntax patterns later?
>
> Not sure... perhaps --glob is too technical.

My 2 cents on this would be that having "svn ls --pattern" and "svn log
--search" which only differ in terms of case sensitivity have a chance of
being confusing for the users.

Perhaps, a slightly better way would be to keep the case-insensitive behavior
by default, but add a "--case-sensitive" switch to handle the cases where
it's required?

That is,

    svn ls --search

    svn ls --search --case-sensitive


Regards,
Evgeny Kotkov
Reply | Threaded
Open this post in threaded view
|

svn ls --search/pattern/glob/case-insensitive [was: svn commit: r1806548 ...]

Julian Foad-5
Evgeny Kotkov wrote:

> Bert Huijben <[hidden email]> writes:
>>> As Julian discovered, '--search' as used with 'svn log' is may not suitable
>>> for 'svn ls'.  File name matching should be case-sensitive and requires
>>> full patterns just like e.g. the ordinary unix command line 'ls' command.
>>>
>>> Therefore, introduce a separate '--pattern' option for 'svn log' that works
>>> similar to patterns with Unix command line 'ls'.  Since the actual matching
>>> already confirms to that, we only need a different option pre-processing.
>>
>> Perhaps we could use --glob, to allow other syntax patterns later?
>>
>> Not sure... perhaps --glob is too technical.
>
> My 2 cents on this would be that having "svn ls --pattern" and "svn log
> --search" which only differ in terms of case sensitivity have a chance of
> being confusing for the users.

I agree that's confusing, even just having two different option names is
confusing.

> Perhaps, a slightly better way would be to keep the case-insensitive behavior
> by default, but add a "--case-sensitive" switch to handle the cases where
> it's required?
>
> That is,
>
>     svn ls --search
>
>     svn ls --search --case-sensitive

I don't think a case-sensitive mode is necessary. Remember, this is a
user interface. It is to help users find things. It does not attempt to
be a programmatic search tool with all the precision and flexibility of
Unix 'find -name -maxdepth -prune ...'. I expect almost zero use cases
would require case-sensitive matching at this UI level, and that is why
we made 'svn log --search' just be case-insensitive always.

Simplicity and consistency are key.

- Julian

Reply | Threaded
Open this post in threaded view
|

Re: svn ls --search/pattern/glob/case-insensitive [was: svn commit: r1806548 ...]

Stefan Fuhrmann


On 29.08.2017 12:46, Julian Foad wrote:

> Evgeny Kotkov wrote:
>> Bert Huijben <[hidden email]> writes:
>>>> As Julian discovered, '--search' as used with 'svn log' is may not suitable
>>>> for 'svn ls'.  File name matching should be case-sensitive and requires
>>>> full patterns just like e.g. the ordinary unix command line 'ls' command.
>>>>
>>>> Therefore, introduce a separate '--pattern' option for 'svn log' that works
>>>> similar to patterns with Unix command line 'ls'.  Since the actual matching
>>>> already confirms to that, we only need a different option pre-processing.
>>>
>>> Perhaps we could use --glob, to allow other syntax patterns later?
>>>
>>> Not sure... perhaps --glob is too technical.
>>
>> My 2 cents on this would be that having "svn ls --pattern" and "svn log
>> --search" which only differ in terms of case sensitivity have a chance of
>> being confusing for the users.
>
> I agree that's confusing, even just having two different option names is
> confusing.
>
>> Perhaps, a slightly better way would be to keep the case-insensitive behavior
>> by default, but add a "--case-sensitive" switch to handle the cases where
>> it's required?
>>
>> That is,
>>
>>      svn ls --search
>>
>>      svn ls --search --case-sensitive
>
> I don't think a case-sensitive mode is necessary. Remember, this is a
> user interface. It is to help users find things. It does not attempt to
> be a programmatic search tool with all the precision and flexibility of
> Unix 'find -name -maxdepth -prune ...'. I expect almost zero use cases
> would require case-sensitive matching at this UI level, and that is why
> we made 'svn log --search' just be case-insensitive always.

How would you implement the case-insensitive comparison
on the server side consistent with the client-side locals?

-- Stefan^2.
Reply | Threaded
Open this post in threaded view
|

Re: svn ls --search/pattern/glob/case-insensitive [was: svn commit: r1806548 ...]

Stefan Sperling-9
On Tue, Aug 29, 2017 at 01:12:07PM +0200, Stefan Fuhrmann wrote:
> How would you implement the case-insensitive comparison
> on the server side consistent with the client-side locals?

As far as I can tell the utf8proc code which the client uses
for this is local-independent.
Reply | Threaded
Open this post in threaded view
|

Re: svn ls --search/pattern/glob/case-insensitive [was: svn commit: r1806548 ...]

Branko Čibej
On 29.08.2017 13:28, Stefan Sperling wrote:
> On Tue, Aug 29, 2017 at 01:12:07PM +0200, Stefan Fuhrmann wrote:
>> How would you implement the case-insensitive comparison
>> on the server side consistent with the client-side locals?
> As far as I can tell the utf8proc code which the client uses
> for this is local-independent.

"Locale," but yes. Utf8proc relies on the Unicode generic case-folding
rules, which are indeed locale-independent. They won't always be 100%
grammatically correct for a particular environment, but from my reading
they'll be good enough for the purpose, which is to find stuff not write
a linguistic thesis. A few strictly-speaking false-positive edge cases
won't hurt.

-- Brane
Reply | Threaded
Open this post in threaded view
|

Re: svn ls --search/pattern/glob/case-insensitive

Stefan Fuhrmann


On 29.08.2017 14:04, Branko Čibej wrote:

> On 29.08.2017 13:28, Stefan Sperling wrote:
>> On Tue, Aug 29, 2017 at 01:12:07PM +0200, Stefan Fuhrmann wrote:
>>> How would you implement the case-insensitive comparison
>>> on the server side consistent with the client-side locals?
>> As far as I can tell the utf8proc code which the client uses
>> for this is local-independent.
>
> "Locale," but yes. Utf8proc relies on the Unicode generic case-folding
> rules, which are indeed locale-independent. They won't always be 100%
> grammatically correct for a particular environment, but from my reading
> they'll be good enough for the purpose, which is to find stuff not write
> a linguistic thesis. A few strictly-speaking false-positive edge cases
> won't hurt.

Then it should not be hard to extend the existing code to
do "fuzzy" comparisons using generic case- and accent folding.

Because I think that strict glob-like patterns need to be supported
as well, I suggest to have two options:

    --search does a fuzzy search just like we use it in other commands.
             We implicitly add leading an trailing '*'.

    --search-glob performs a fully case- and accent-sensitive glob
                  matching.

As with '--search' alone, the user is free to specify any number
and combination of the above but without the support of '--search-and' -
at least for now.

Thoughts? Better names for '--search-glob'?

-- Stefan^2.
Reply | Threaded
Open this post in threaded view
|

Re: svn ls --search/pattern/glob/case-insensitive

Stefan Sperling
On Fri, Sep 01, 2017 at 06:36:46AM +0200, Stefan Fuhrmann wrote:

> Because I think that strict glob-like patterns need to be supported
> as well, I suggest to have two options:
>
>    --search does a fuzzy search just like we use it in other commands.
>             We implicitly add leading an trailing '*'.
>
>    --search-glob performs a fully case- and accent-sensitive glob
>                  matching.
>
> As with '--search' alone, the user is free to specify any number
> and combination of the above but without the support of '--search-and' -
> at least for now.
>
> Thoughts? Better names for '--search-glob'?

I don't see a need to invent a new option name.

I would say it's OK to have --search options with slightly different
semantics for different subcommands. Their semantics just shouldn't
be entirely different. If one matches case-sensitively and the other
does not, I would say that's too different. But I don't think we
need to be consistent about implicit *-globbing rules.

So having 'svn list --search' do what you describe above as
'svn list --search-glob' would be entirely acceptable to me.
Reply | Threaded
Open this post in threaded view
|

Re: svn ls --search/pattern/glob/case-insensitive

Stefan Fuhrmann
On 01.09.2017 13:30, Stefan Sperling wrote:

> On Fri, Sep 01, 2017 at 06:36:46AM +0200, Stefan Fuhrmann wrote:
>> Because I think that strict glob-like patterns need to be supported
>> as well, I suggest to have two options:
>>
>>     --search does a fuzzy search just like we use it in other commands.
>>              We implicitly add leading an trailing '*'.
>>
>>     --search-glob performs a fully case- and accent-sensitive glob
>>                   matching.
>>
>> As with '--search' alone, the user is free to specify any number
>> and combination of the above but without the support of '--search-and' -
>> at least for now.
>>
>> Thoughts? Better names for '--search-glob'?
>
> I don't see a need to invent a new option name.
>
> I would say it's OK to have --search options with slightly different
> semantics for different subcommands. Their semantics just shouldn't
> be entirely different. If one matches case-sensitively and the other
> does not, I would say that's too different. But I don't think we
> need to be consistent about implicit *-globbing rules.
>
> So having 'svn list --search' do what you describe above as
> 'svn list --search-glob' would be entirely acceptable to me.

Yeah, my impression is that most people don't see a need
for an exact matching option.  And given that the "fuzzy"
matching will rarely produce false negatives, users could
easily post-filter the results if they want to remove
false positives.

That would leave us with 'svn ls --search' working almost
exactly like 'svn list --search', except that we must not
add implicit '*' globs on both sides of the pattern.  If
we did that "*.c" would match "hallo.cpp", and that would
be confusing to most users.

So, if there are no strong opinions against it, I will undo
the "--pattern" option commit and change the matching code
to case- and accent-insensitive.

-- Stefan^2.
Reply | Threaded
Open this post in threaded view
|

Re: svn ls --search/pattern/glob/case-insensitive

Branko Čibej
In reply to this post by Stefan Fuhrmann
On 01.09.2017 06:36, Stefan Fuhrmann wrote:

>
>
> On 29.08.2017 14:04, Branko Čibej wrote:
>> On 29.08.2017 13:28, Stefan Sperling wrote:
>>> On Tue, Aug 29, 2017 at 01:12:07PM +0200, Stefan Fuhrmann wrote:
>>>> How would you implement the case-insensitive comparison
>>>> on the server side consistent with the client-side locals?
>>> As far as I can tell the utf8proc code which the client uses
>>> for this is local-independent.
>>
>> "Locale," but yes. Utf8proc relies on the Unicode generic case-folding
>> rules, which are indeed locale-independent. They won't always be 100%
>> grammatically correct for a particular environment, but from my reading
>> they'll be good enough for the purpose, which is to find stuff not write
>> a linguistic thesis. A few strictly-speaking false-positive edge cases
>> won't hurt.
>
> Then it should not be hard to extend the existing code to
> do "fuzzy" comparisons using generic case- and accent folding.

I'd be very surprised if we don't have that kind of matching code
somewhere already ... indeed: normalize_cstring() in
libsvn_subr/utf8proc.c does exactly this kind of transformation using
utf8proc, exposed as a (private) API in svn_utf__xfrm() which has a
nodding acquaintance to the standard strxfrm().

-- Brane
Reply | Threaded
Open this post in threaded view
|

Re: svn ls --search/pattern/glob/case-insensitive

Julian Foad-5
In reply to this post by Stefan Fuhrmann
Stefan Fuhrmann wrote:

> On 01.09.2017 13:30, Stefan Sperling wrote:
>> On Fri, Sep 01, 2017 at 06:36:46AM +0200, Stefan Fuhrmann wrote:
>>> Because I think that strict glob-like patterns need to be supported
>>> as well, I suggest to have two options:
>>>
>>>     --search does a fuzzy search just like we use it in other commands.
>>>              We implicitly add leading an trailing '*'.
>>>
>>>     --search-glob performs a fully case- and accent-sensitive glob
>>>                   matching.
>>>
>>> As with '--search' alone, the user is free to specify any number
>>> and combination of the above but without the support of '--search-and' -
>>> at least for now.
>>>
>>> Thoughts? Better names for '--search-glob'?
>>
>> I don't see a need to invent a new option name.
>>
>> I would say it's OK to have --search options with slightly different
>> semantics for different subcommands. Their semantics just shouldn't
>> be entirely different. If one matches case-sensitively and the other
>> does not, I would say that's too different. But I don't think we
>> need to be consistent about implicit *-globbing rules.
>>
>> So having 'svn list --search' do what you describe above as
>> 'svn list --search-glob' would be entirely acceptable to me.

(I'm unsure exactly what you meant there, stsp -- that seems to
contradict the previous paragraph, if 'svn log --search remains
case-insensitive.)

Mainly I think about the issue this way: What is the benefit to the user
if path matching in "svn list --search" works differently from path
matching in "svn log --search"? I see no benefit and some harms if it
works differently.

Altogether it feels like we are getting close to letting ourselves
produce something confusing just because we are thinking about what's
useful for one particular use case at a time, and/or because the
implementation happens to be already structured in such a way, and I
think we should step back a bit. How would we want to explain the search
option to the users?

> Yeah, my impression is that most people don't see a need
> for an exact matching option.  And given that the "fuzzy"
> matching will rarely produce false negatives, users could
> easily post-filter the results if they want to remove
> false positives.
>
> That would leave us with 'svn ls --search' working almost
> exactly like 'svn list --search', except that we must not

(I assume you meant "exactly like 'svn log --search'".)

> add implicit '*' globs on both sides of the pattern.  If
> we did that "*.c" would match "hallo.cpp", and that would
> be confusing to most users.

Another aspect that I'm not sure about is how they match a whole path:
"svn log --search" seems to match the pattern against the whole path,
not treating slashes specially: pattern "trunk*.c" would match path
"/subversion/trunk/subversion/svn/svn.c"). On the other hand, "svn list
--search" seems to match the pattern against just one path component at
a time -- possibly just the last component? I haven't tested it enough
to know, nor seen it documented.

If that behaviour is completely different between "log" and "list"...
that's another source of confusion that I think we should address one
way or another.

> So, if there are no strong opinions against it, I will undo
> the "--pattern" option commit and change the matching code
> to case- and accent-insensitive.

That part sounds good to me.

- Julian
Reply | Threaded
Open this post in threaded view
|

Re: svn ls --search/pattern/glob/case-insensitive

Stefan Sperling
On Fri, Sep 01, 2017 at 03:01:33PM +0100, Julian Foad wrote:
> (I'm unsure exactly what you meant there, stsp -- that seems to
> contradict the previous paragraph, if 'svn log --search remains
> case-insensitive.)

The most important point for me is that I don't think appending and
prepending * globs is useful in the context of a path search with 'ls'.

Consider a revision such as:

$ svn log -r 3 -v
------------------------------------------------------------------------
r3 | stsp | 2017-09-01 16:12:23 +0200 (Fri, 01 Sep 2017) | 1 line
Changed paths:
   A /trunk/dog.txt
   A /trunk/lazy_dog.txt

the quick brown fox jumps over the lazy dog
------------------------------------------------------------------------

The behaviour for log --search is that either '--search lazy' or
'--search dog' will match this revision. This is fine because we are
looking for log message output rather than path names.

$ svn log --search dog
------------------------------------------------------------------------
r3 | stsp | 2017-09-01 16:12:23 +0200 (Fri, 01 Sep 2017) | 1 line

the quick brown fox jumps over the lazy dog
------------------------------------------------------------------------
$ svn log --search lazy
------------------------------------------------------------------------
r3 | stsp | 2017-09-01 16:12:23 +0200 (Fri, 01 Sep 2017) | 1 line

the quick brown fox jumps over the lazy dog
------------------------------------------------------------------------

However, when matching paths with 'svn ls' I would not expect 'dog*' to
match lazy_dog.txt. The output is a list of paths, not a log message.
I would expect matching to behave more like a unix shell does with ls(1).

The current behaviour as implemented by --pattern looks like:

$ svn ls -r3 --pattern 'dog*' ^/    
$ svn ls -r3 --pattern 'dog*' ^/trunk
dog.txt
$ svn ls -r3 --pattern '*dog*' ^/trunk
dog.txt
lazy_dog.txt

The above behaviour looks good to me.

However, as you already pointed out, matching child path components
seems to be impossible at present:

$ svn ls -r3 --pattern 'trunk/*'    
$
$ svn ls -R -r3 --pattern 'trunk/*' ^/
$  

These should print:

trunk/dog.txt
trunk/lazy_dog.txt

There is of course an interaction with --depth to consider.
This already works:

$ svn ls -R -r3 --pattern '*dog*' ^/      
trunk/dog.txt
trunk/lazy_dog.txt
$ svn ls -R -r3 --pattern '*trunk*' ^/
trunk/
$

Which suggests that matching happens after depth filtering.
Perhaps a default depth should be chosen based on the pattern to allow
patterns such as 'trunk/*' to match?

Regarding case:

If 'list --search' is designed to be case-insensitive, then this:

$ svn ls -r3 --pattern 'Dog*' ^/trunk
$

should print:

dog.txt

If it was case-sensitive, I would use:

$ svn ls -r3 --pattern '[dD]og*' ^/trunk

I could live with either possibility.
Reply | Threaded
Open this post in threaded view
|

Re: svn ls --search/pattern/glob/case-insensitive

Julian Foad-5
Stefan Sperling wrote:
> On Fri, Sep 01, 2017 at 03:01:33PM +0100, Julian Foad wrote:
>> (I'm unsure exactly what you meant there, stsp -- that seems to
>> contradict the previous paragraph, if 'svn log --search remains
>> case-insensitive.)
>
> The most important point for me is that I don't think appending and
> prepending * globs is useful in the context of a path search with 'ls'.

Agreed.

> Consider a revision such as:
>
> $ svn log -r 3 -v
> ------------------------------------------------------------------------
> r3 | stsp | 2017-09-01 16:12:23 +0200 (Fri, 01 Sep 2017) | 1 line
> Changed paths:
>    A /trunk/dog.txt
>    A /trunk/lazy_dog.txt
>
> the quick brown fox jumps over the lazy dog
> ------------------------------------------------------------------------
>
> The behaviour for log --search is that either '--search lazy' or
> '--search dog' will match this revision. This is fine because we are
> looking for log message output rather than path names.
>
> $ svn log --search dog
> ------------------------------------------------------------------------
> r3 | stsp | 2017-09-01 16:12:23 +0200 (Fri, 01 Sep 2017) | 1 line
>
> the quick brown fox jumps over the lazy dog
> ------------------------------------------------------------------------
> $ svn log --search lazy
> ------------------------------------------------------------------------
> r3 | stsp | 2017-09-01 16:12:23 +0200 (Fri, 01 Sep 2017) | 1 line
>
> the quick brown fox jumps over the lazy dog
> ------------------------------------------------------------------------

I agree this is fine when we are talking about searching in the *log
message*.

But when I talk about consistency between "list" and "log", I'm thinking
of the searching in *paths*, which should (and does) work like this:

$ svn log -vq --search quick
$ # no results

$ svn log -vq --search dog
Changed paths:
   A /trunk/dog.txt
   A /trunk/lazy_dog.txt


> However, when matching paths with 'svn ls' I would not expect 'dog*' to
> match lazy_dog.txt. The output is a list of paths, not a log message.
> I would expect matching to behave more like a unix shell does with ls(1).

Agreed.

And, for consistency, when matching paths with "svn log -vq --search..."
I would expect the same (I would not expect 'dog*' to match
lazy_dog.txt). At present the implementation does match it, because it
treats the pattern as a substring both when searching in the log message
and also when searching in the paths. I propose that one way to get
consistency is if we change "svn log --search" so it treats the pattern
as a substring in the log message but not in the paths.

> The current behaviour as implemented by --pattern looks like:
>
> $ svn ls -r3 --pattern 'dog*' ^/    
> $ svn ls -r3 --pattern 'dog*' ^/trunk
> dog.txt
> $ svn ls -r3 --pattern '*dog*' ^/trunk
> dog.txt
> lazy_dog.txt
>
> The above behaviour looks good to me.
>
> However, as you already pointed out, matching child path components
> seems to be impossible at present:
>
> $ svn ls -r3 --pattern 'trunk/*'    
> $
> $ svn ls -R -r3 --pattern 'trunk/*' ^/
> $  
>
> These should print:
>
> trunk/dog.txt
> trunk/lazy_dog.txt
>
> There is of course an interaction with --depth to consider.
> This already works:
>
> $ svn ls -R -r3 --pattern '*dog*' ^/      
> trunk/dog.txt
> trunk/lazy_dog.txt
> $ svn ls -R -r3 --pattern '*trunk*' ^/
> trunk/
> $
>
> Which suggests that matching happens after depth filtering.
> Perhaps a default depth should be chosen based on the pattern to allow
> patterns such as 'trunk/*' to match?
>
> Regarding case:
>
> If 'list --search' is designed to be case-insensitive, then this:
>
> $ svn ls -r3 --pattern 'Dog*' ^/trunk
> $
>
> should print:
>
> dog.txt
[...]

All of the above looks good to me, apart from maybe we haven't nailed
the details of how paths starting with "^/" or "/" are matched.

- Julian

Reply | Threaded
Open this post in threaded view
|

Re: svn ls --search/pattern/glob/case-insensitive

Stefan Sperling
On Fri, Sep 01, 2017 at 04:24:28PM +0100, Julian Foad wrote:
> And, for consistency, when matching paths with "svn log -vq --search..."
> I would expect the same (I would not expect 'dog*' to match
> lazy_dog.txt). At present the implementation does match it, because it
> treats the pattern as a substring both when searching in the log message
> and also when searching in the paths. I propose that one way to get
> consistency is if we change "svn log --search" so it treats the pattern
> as a substring in the log message but not in the paths.

Yes, that makes sense. Here's an attempt at it.

My patch is not quite correct because it does not support slashes in
the search pattern and it will not match across multiple path components.
Any suggestion how to best implement this?

$ svn log -v -r4 --search 'lazy*'
------------------------------------------------------------------------
r4 | stsp | 2017-09-01 17:42:07 +0200 (Fri, 01 Sep 2017) | 1 line
Changed paths:
   M /trunk/lazy_dog.txt

this is a log message
------------------------------------------------------------------------
$ svn log -v -r4 --search 'dog*'  
------------------------------------------------------------------------

This should match but does not:
$ svn log -v -r4 --search '/trunk/*dog*'
------------------------------------------------------------------------
$


Index: subversion/svn/log-cmd.c
===================================================================
--- subversion/svn/log-cmd.c (revision 1806831)
+++ subversion/svn/log-cmd.c (working copy)
@@ -160,8 +160,10 @@ match_search_pattern(const char *search_pattern,
   if (changed_paths)
     {
       apr_hash_index_t *hi;
+      apr_pool_t *iterpool;
 
       /* Does a changed path match the search pattern? */
+      iterpool = svn_pool_create(pool);
       for (hi = apr_hash_first(pool, changed_paths);
            hi;
            hi = apr_hash_next(hi))
@@ -168,17 +170,37 @@ match_search_pattern(const char *search_pattern,
         {
           const char *path = apr_hash_this_key(hi);
           svn_log_changed_path2_t *log_item;
+          apr_array_header_t *components;
+          int i;
 
-          if (match(pattern, path, buf))
-            return TRUE;
+          svn_pool_clear(iterpool);
 
+          components = svn_path_decompose(path, iterpool);
+          for (i = 0; i < components->nelts; i++)
+            {
+              const char *p = APR_ARRAY_IDX(components, i, const char *);
+
+              if (match(search_pattern, p, buf))
+                return TRUE;
+            }
+
           /* Match copy-from paths, too. */
           log_item = apr_hash_this_val(hi);
           if (log_item->copyfrom_path
-              && SVN_IS_VALID_REVNUM(log_item->copyfrom_rev)
-              && match(pattern, log_item->copyfrom_path, buf))
-            return TRUE;
+              && SVN_IS_VALID_REVNUM(log_item->copyfrom_rev))
+            {
+                components = svn_path_decompose(log_item->copyfrom_path,
+                                                iterpool);
+                for (i = 0; i < components->nelts; i++)
+                  {
+                    const char *p = APR_ARRAY_IDX(components, i, const char *);
+
+                    if (match(search_pattern, p, buf))
+                      return TRUE;
+                  }
+            }
         }
+      svn_pool_destroy(iterpool);
     }
 
   return FALSE;
Reply | Threaded
Open this post in threaded view
|

Re: svn ls --search/pattern/glob/case-insensitive

Johan Corveleyn-3
On Fri, Sep 1, 2017 at 5:55 PM, Stefan Sperling <[hidden email]> wrote:

> On Fri, Sep 01, 2017 at 04:24:28PM +0100, Julian Foad wrote:
>> And, for consistency, when matching paths with "svn log -vq --search..."
>> I would expect the same (I would not expect 'dog*' to match
>> lazy_dog.txt). At present the implementation does match it, because it
>> treats the pattern as a substring both when searching in the log message
>> and also when searching in the paths. I propose that one way to get
>> consistency is if we change "svn log --search" so it treats the pattern
>> as a substring in the log message but not in the paths.
>
> Yes, that makes sense. Here's an attempt at it.

I don't think that's a good idea. I've used "svn log -v --search"
before to search for "stuff" in the changed paths list, but I have
never considered that as globbing similar to what a unix shell would
do. I.e. it feels absolutely natural (to me) that "svn log --search"
just looks for a substring in "the entire output of the 'svn log'
command", without discerning between the different parts of that
output.

For example, I've recently searched for "(from" to find revisions with
"A+'ed" files or directories (to locate interesting examples to test
the tree conflict resolution :-)). I just guessed that would work, and
it did.

I think it would be weird to treat the changed paths list differently
from all the other output of "svn log".

If we'd like special matching for the "changed paths" list, I suggest
that should be done with a new option ("svn log -v --search-path" or
something), and "log --search" remains for just matching a substring
in the entire output. Hmm, but then maybe the "svn list" option should
also be named --search-path.

Tangentially (if we take a step back): I think an option name like
--filter or --include could be made to apply to more subcommands (in
the future):

* svn ls -R --filter '*.txt'  ## list all *.txt files (recursively)
* svn cat -R --filter '*.txt'  ## cat all the *.txt files (recursively)
* svn export --filter 'build.xml'  ## export build.xml files into a
sparse directory structure

or

* svn ls -R --include '*.txt'
* svn cat -R --include '*.txt'
* svn export --include 'build.xml'


Of course only the first of those bullets is done (almost) ...
Maybe I'm dreaming a bit too much now :-)

--
Johan
Reply | Threaded
Open this post in threaded view
|

Re: svn ls --search/pattern/glob/case-insensitive

Stefan Fuhrmann
In reply to this post by Branko Čibej


On 01.09.2017 15:31, Branko Čibej wrote:

> On 01.09.2017 06:36, Stefan Fuhrmann wrote:
>>
>>
>> On 29.08.2017 14:04, Branko Čibej wrote:
>>> On 29.08.2017 13:28, Stefan Sperling wrote:
>>>> On Tue, Aug 29, 2017 at 01:12:07PM +0200, Stefan Fuhrmann wrote:
>>>>> How would you implement the case-insensitive comparison
>>>>> on the server side consistent with the client-side locals?
>>>> As far as I can tell the utf8proc code which the client uses
>>>> for this is local-independent.
>>>
>>> "Locale," but yes. Utf8proc relies on the Unicode generic case-folding
>>> rules, which are indeed locale-independent. They won't always be 100%
>>> grammatically correct for a particular environment, but from my reading
>>> they'll be good enough for the purpose, which is to find stuff not write
>>> a linguistic thesis. A few strictly-speaking false-positive edge cases
>>> won't hurt.
>>
>> Then it should not be hard to extend the existing code to
>> do "fuzzy" comparisons using generic case- and accent folding.
>
> I'd be very surprised if we don't have that kind of matching code
> somewhere already ... indeed: normalize_cstring() in
> libsvn_subr/utf8proc.c does exactly this kind of transformation using
> utf8proc, exposed as a (private) API in svn_utf__xfrm() which has a
> nodding acquaintance to the standard strxfrm().

I've seen that begin called in the client but was not sure
whether that is generic logic or locale-dependent.

-- Stefan^2.
Reply | Threaded
Open this post in threaded view
|

Re: svn ls --search/pattern/glob/case-insensitive

Stefan Fuhrmann
In reply to this post by Julian Foad-5


On 01.09.2017 17:24, Julian Foad wrote:
> Stefan Sperling wrote:
>> On Fri, Sep 01, 2017 at 03:01:33PM +0100, Julian Foad wrote:

>> The above behaviour looks good to me.
>>
>> However, as you already pointed out, matching child path components
>> seems to be impossible at present:
>>
>> $ svn ls -r3 --pattern 'trunk/*'
>> $
>> $ svn ls -R -r3 --pattern 'trunk/*' ^/
>> $
>>
>> These should print:
>>
>> trunk/dog.txt
>> trunk/lazy_dog.txt
>>
>> There is of course an interaction with --depth to consider.
>> This already works:
>>
>> $ svn ls -R -r3 --pattern '*dog*' ^/
>> trunk/dog.txt
>> trunk/lazy_dog.txt
>> $ svn ls -R -r3 --pattern '*trunk*' ^/
>> trunk/
>> $
>>
>> Which suggests that matching happens after depth filtering.
>> Perhaps a default depth should be chosen based on the pattern to allow
>> patterns such as 'trunk/*' to match?
>>
>> Regarding case:
>>
>> If 'list --search' is designed to be case-insensitive, then this:
>>
>> $ svn ls -r3 --pattern 'Dog*' ^/trunk
>> $
>>
>> should print:
>>
>> dog.txt
> [...]
>
> All of the above looks good to me, apart from maybe we haven't nailed
> the details of how paths starting with "^/" or "/" are matched.

I've been thinking about matching whole paths instead of
just the (latest) segment.  But it seems to either make
the pattern spec needlessly complicated, difficult to
predict or too loose:

* Finding all "readme"s:
   "*/readme*" instead of"readme*"

* Implicit "*/" prefixes might be added, but only of the
   pattern does not start with a "/":
   "foo" -> "*/foo"
   "foo/bar" -> "*/foo/bar"
   "*/foo" -> "*/foo" (unchanged)

* Which one of the following should match "foo/bar"?
   "foo/*/bar", "foo*/bar", "foo*bar"

* Which one of the following should match everything in
   the whole sub-tree "/foo", which would not include "/foo"
   itself?
   "foo/*", "foo/", "foo*"

I guess we could come up with a solution for all of these
and the implementation will most likely not be too hard.
But it all seems convoluted for typical use-cases like
"Where the heck is file xyz?", "Did somebody create a
branch folder inside trunk?" or "How many Python files
are there?"

-- Stefan^2.