Re: svn commit: r1807056 - in /subversion/trunk/subversion: include/private/svn_utf_private.h libsvn_client/list.c libsvn_repos/list.c libsvn_subr/utf8proc.c

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1807056 - in /subversion/trunk/subversion: include/private/svn_utf_private.h libsvn_client/list.c libsvn_repos/list.c libsvn_subr/utf8proc.c

Branko Čibej
On 02.09.2017 17:12, [hidden email] wrote:

> +svn_boolean_t
> +svn_utf__fuzzy_glob_match(const char *str,
> +                          const apr_array_header_t *patterns,
> +                          svn_membuf_t *buf)
> +{
> +  const char *normalized;
> +  svn_error_t *err;
> +  int i;
> +
> +  /* Try to normalize case and accents in STR.
> +   *
> +   * If that should fail for some reason, continue with the original STR.
> +   * There is still a fair chance that it matches "*.ext" pattern despite
> +   * being "broken" UTF8. */


What evidence do you have for this statement? It is, quite frankly,
completely bonkers.

"Broken," as you put in quotes, means it's not UTF-8. What kind of UTF-8
do you think there's a fair chance it'll match then?

-- Brane

Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1807056 - in /subversion/trunk/subversion: include/private/svn_utf_private.h libsvn_client/list.c libsvn_repos/list.c libsvn_subr/utf8proc.c

Stefan Fuhrmann


On 02.09.2017 17:17, Branko Čibej wrote:

> On 02.09.2017 17:12, [hidden email] wrote:
>> +svn_boolean_t
>> +svn_utf__fuzzy_glob_match(const char *str,
>> +                          const apr_array_header_t *patterns,
>> +                          svn_membuf_t *buf)
>> +{
>> +  const char *normalized;
>> +  svn_error_t *err;
>> +  int i;
>> +
>> +  /* Try to normalize case and accents in STR.
>> +   *
>> +   * If that should fail for some reason, continue with the original STR.
>> +   * There is still a fair chance that it matches "*.ext" pattern despite
>> +   * being "broken" UTF8. */
>
>
> What evidence do you have for this statement? It is, quite frankly,
> completely bonkers.
>
> "Broken," as you put in quotes, means it's not UTF-8. What kind of UTF-8
> do you think there's a fair chance it'll match then?

I've encountered old repositories where some path names
were apparently not converted properly into UTF8 but
contained whatever locale-based strings the client sent.

Those would still match "*.someExtension" and similar
patterns despite having non-UTF8.  I would like to cover
those as well.

I see 3 options here:

1. Make these cases a non-match, hiding them in the output.
2. Handle these cases in the callers, duplicating that part.
3. Keep it as it is.

-- Stefan^2.
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1807056 - in /subversion/trunk/subversion: include/private/svn_utf_private.h libsvn_client/list.c libsvn_repos/list.c libsvn_subr/utf8proc.c

Branko Čibej
On 02.09.2017 18:06, Stefan Fuhrmann wrote:

>
>
> On 02.09.2017 17:17, Branko Čibej wrote:
>> On 02.09.2017 17:12, [hidden email] wrote:
>>> +svn_boolean_t
>>> +svn_utf__fuzzy_glob_match(const char *str,
>>> +                          const apr_array_header_t *patterns,
>>> +                          svn_membuf_t *buf)
>>> +{
>>> +  const char *normalized;
>>> +  svn_error_t *err;
>>> +  int i;
>>> +
>>> +  /* Try to normalize case and accents in STR.
>>> +   *
>>> +   * If that should fail for some reason, continue with the
>>> original STR.
>>> +   * There is still a fair chance that it matches "*.ext" pattern
>>> despite
>>> +   * being "broken" UTF8. */
>>
>>
>> What evidence do you have for this statement? It is, quite frankly,
>> completely bonkers.
>>
>> "Broken," as you put in quotes, means it's not UTF-8. What kind of UTF-8
>> do you think there's a fair chance it'll match then?
>
> I've encountered old repositories where some path names
> were apparently not converted properly into UTF8 but
> contained whatever locale-based strings the client sent.

And such repositories were used without problems with current clients? I
can hardly believe that.

> Those would still match "*.someExtension" and similar
> patterns despite having non-UTF8.  I would like to cover
> those as well.

But not *.ك

I think you're trying to solve an edge case in an API function by means
of an undocumented implementation detail. My advice is to throw it out.

-- Brane

Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1807056 - in /subversion/trunk/subversion: include/private/svn_utf_private.h libsvn_client/list.c libsvn_repos/list.c libsvn_subr/utf8proc.c

Stefan Fuhrmann


On 02.09.2017 18:26, Branko Čibej wrote:

> On 02.09.2017 18:06, Stefan Fuhrmann wrote:
>>
>>
>> On 02.09.2017 17:17, Branko Čibej wrote:
>>> On 02.09.2017 17:12, [hidden email] wrote:
>>>> +svn_boolean_t
>>>> +svn_utf__fuzzy_glob_match(const char *str,
>>>> +                          const apr_array_header_t *patterns,
>>>> +                          svn_membuf_t *buf)
>>>> +{
>>>> +  const char *normalized;
>>>> +  svn_error_t *err;
>>>> +  int i;
>>>> +
>>>> +  /* Try to normalize case and accents in STR.
>>>> +   *
>>>> +   * If that should fail for some reason, continue with the
>>>> original STR.
>>>> +   * There is still a fair chance that it matches "*.ext" pattern
>>>> despite
>>>> +   * being "broken" UTF8. */
>>>
>>>
>>> What evidence do you have for this statement? It is, quite frankly,
>>> completely bonkers.
>>>
>>> "Broken," as you put in quotes, means it's not UTF-8. What kind of UTF-8
>>> do you think there's a fair chance it'll match then?
>>
>> I've encountered old repositories where some path names
>> were apparently not converted properly into UTF8 but
>> contained whatever locale-based strings the client sent.
>
> And such repositories were used without problems with current clients? I
> can hardly believe that.
>
>> Those would still match "*.someExtension" and similar
>> patterns despite having non-UTF8.  I would like to cover
>> those as well.
>
> But not *.ك
>
> I think you're trying to solve an edge case in an API function by means
> of an undocumented implementation detail. My advice is to throw it out.

On second thought, I agree.  Removed in r1807155.

-- Stefan^2.