Re: svn commit: r1881985 - /subversion/trunk/subversion/tests/cmdline/merge_tests.py

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1881985 - /subversion/trunk/subversion/tests/cmdline/merge_tests.py

Daniel Shahaf-2
[hidden email] wrote on Thu, 24 Sep 2020 17:06 -0000:

> Author: futatuki
> Date: Thu Sep 24 17:06:39 2020
> New Revision: 1881985
>
> URL: http://svn.apache.org/viewvc?rev=1881985&view=rev
> Log:
> Follow up to r1880192: Fix an EOL issue in test on Windows.
>
> * subversion/tests/cmdline/merge-tests.py
>   (merge_deleted_folder_with_mergeinfo_2): Use os.linesep instead of '\n'
>   in expected values of svn:mergeinfo.
>
> Modified:
>     subversion/trunk/subversion/tests/cmdline/merge_tests.py
>
> Modified: subversion/trunk/subversion/tests/cmdline/merge_tests.py
> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/tests/cmdline/merge_tests.py?rev=1881985&r1=1881984&r2=1881985&view=diff
> ==============================================================================
> --- subversion/trunk/subversion/tests/cmdline/merge_tests.py (original)
> +++ subversion/trunk/subversion/tests/cmdline/merge_tests.py Thu Sep 24 17:06:39 2020
> @@ -18639,11 +18639,19 @@ def merge_deleted_folder_with_mergeinfo_
>                                         )
>  
>    # verify that mergeinfo is set/changed on A/D, A/D/G, A/D/G2.
> +
> +  # NOTE: When writing out multi-line prop values in svn:* props, the
> +  # client converts to local encoding and local eol style.
> +  # Therefore, the expected output must contain the right kind of eoln
> +  # strings. That's why we use os.linesep in the tests below, not just
> +  # plain '\n'. The _last_ \n is also from the client, but it's not
> +  # part of the prop value and it doesn't get converted in the pipe.

I'm confused.

First, if the client applies any transformation at all to property
values, that'd be a bug.  Property values are opaque octet sequences,
just like file contents, so they must be emitted verbatim.  It so
happens that values of svn:* properties are UTF-8 with LF line endings,
so that's what Python should see, regardless of the local encoding and
EOL style.

For the same reason, I'd have expected the expected output to be given
as bytes objects rather than as str objects.

Second, where is the 'last \n' that the comment refers to?  I don't see
it in the code.  I wouldn't expect a trailing newline to be emitted,
either, since svnadmin.tests.check_prop() passes --strict, which is
documented as a deprecated alias to --no-newline.

Thanks!

Daniel


>    expected_mergeinfo = [
>      ('A',       ['/branch_A:3-7']),
> -    ('A/D',     ['/branch_A/D:5-7\n', '/branch_B/C:1*']),
> -    ('A/D/G',   ['/branch_A/D/G:5-7\n', '/branch_B/C/G:1*']),
> -    ('A/D/G2',  ['/branch_A/D/G2:5-7\n', '/branch_B/C/G2:1*']),
> +    ('A/D',     ['/branch_A/D:5-7'+os.linesep, '/branch_B/C:1*']),
> +    ('A/D/G',   ['/branch_A/D/G:5-7'+os.linesep, '/branch_B/C/G:1*']),
> +    ('A/D/G2',  ['/branch_A/D/G2:5-7'+os.linesep, '/branch_B/C/G2:1*']),
>      ]
>    for path, mergeinfo in expected_mergeinfo:
>      svntest.actions.check_prop('svn:mergeinfo', sbox.ospath(path),
>
>

Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1881985 - /subversion/trunk/subversion/tests/cmdline/merge_tests.py

Yasuhito FUTATSUKI-2
On 2020/09/25 16:06, Daniel Shahaf wrote:

> [hidden email] wrote on Thu, 24 Sep 2020 17:06 -0000:
>> Author: futatuki
>> Date: Thu Sep 24 17:06:39 2020
>> New Revision: 1881985
>>
>> URL: http://svn.apache.org/viewvc?rev=1881985&view=rev
>> Log:
>> Follow up to r1880192: Fix an EOL issue in test on Windows.
>>
>> * subversion/tests/cmdline/merge-tests.py
>>   (merge_deleted_folder_with_mergeinfo_2): Use os.linesep instead of '\n'
>>   in expected values of svn:mergeinfo.
>>
>> Modified:
>>     subversion/trunk/subversion/tests/cmdline/merge_tests.py
>>
>> Modified: subversion/trunk/subversion/tests/cmdline/merge_tests.py
>> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/tests/cmdline/merge_tests.py?rev=1881985&r1=1881984&r2=1881985&view=diff
>> ==============================================================================
>> --- subversion/trunk/subversion/tests/cmdline/merge_tests.py (original)
>> +++ subversion/trunk/subversion/tests/cmdline/merge_tests.py Thu Sep 24 17:06:39 2020
>> @@ -18639,11 +18639,19 @@ def merge_deleted_folder_with_mergeinfo_
>>                                         )
>>  
>>    # verify that mergeinfo is set/changed on A/D, A/D/G, A/D/G2.
>> +
>> +  # NOTE: When writing out multi-line prop values in svn:* props, the
>> +  # client converts to local encoding and local eol style.
>> +  # Therefore, the expected output must contain the right kind of eoln
>> +  # strings. That's why we use os.linesep in the tests below, not just
>> +  # plain '\n'. The _last_ \n is also from the client, but it's not
>> +  # part of the prop value and it doesn't get converted in the pipe.

Before answer the questions, the comment above was brougt from
prop_value_conversion() in prop_tests.py.
 
> I'm confused.
>
> First, if the client applies any transformation at all to property
> values, that'd be a bug.  Property values are opaque octet sequences,
> just like file contents, so they must be emitted verbatim.  It so
> happens that values of svn:* properties are UTF-8 with LF line endings,
> so that's what Python should see, regardless of the local encoding and
> EOL style.

I judged that EOL conversion of property values on 'svn pg' isn't bug
because this comment is found in test for it. If it is a bug,
prop_tests.py also does not test correctly.

> For the same reason, I'd have expected the expected output to be given
> as bytes objects rather than as str objects.

Those expected values are converted to bytes, just below of the diff
lines.

> Second, where is the 'last \n' that the comment refers to?  I don't see
> it in the code.  I wouldn't expect a trailing newline to be emitted,
> either, since svnadmin.tests.check_prop() passes --strict, which is
> documented as a deprecated alias to --no-newline.

The last line of the added comment above was extra, by mistake.
 

> Thanks!
>
> Daniel
>
>
>>    expected_mergeinfo = [
>>      ('A',       ['/branch_A:3-7']),
>> -    ('A/D',     ['/branch_A/D:5-7\n', '/branch_B/C:1*']),
>> -    ('A/D/G',   ['/branch_A/D/G:5-7\n', '/branch_B/C/G:1*']),
>> -    ('A/D/G2',  ['/branch_A/D/G2:5-7\n', '/branch_B/C/G2:1*']),
>> +    ('A/D',     ['/branch_A/D:5-7'+os.linesep, '/branch_B/C:1*']),
>> +    ('A/D/G',   ['/branch_A/D/G:5-7'+os.linesep, '/branch_B/C/G:1*']),
>> +    ('A/D/G2',  ['/branch_A/D/G2:5-7'+os.linesep, '/branch_B/C/G2:1*']),
>>      ]
>>    for path, mergeinfo in expected_mergeinfo:
>>      svntest.actions.check_prop('svn:mergeinfo', sbox.ospath(path),
<<                                 [m.encode() for m in mergeinfo])

Cheers,
--
Yasuhito FUTATSUKI <[hidden email]>
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1881985 - /subversion/trunk/subversion/tests/cmdline/merge_tests.py

Yasuhito FUTATSUKI-2
On 2020/09/25 17:05, Yasuhito FUTATSUKI wrote:
> On 2020/09/25 16:06, Daniel Shahaf wrote:

>> First, if the client applies any transformation at all to property
>> values, that'd be a bug.  Property values are opaque octet sequences,
>> just like file contents, so they must be emitted verbatim.  It so
>> happens that values of svn:* properties are UTF-8 with LF line endings,
>> so that's what Python should see, regardless of the local encoding and
>> EOL style.
>
> I judged that EOL conversion of property values on 'svn pg' isn't bug
> because this comment is found in test for it. If it is a bug,
> prop_tests.py also does not test correctly.

This behavior was introduced in r1003238, as a "fix" of the issue #3721.

Cheers,
--
Yasuhito FUTATSUKI <[hidden email]>
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1881985 - /subversion/trunk/subversion/tests/cmdline/merge_tests.py

Daniel Shahaf-2
In reply to this post by Yasuhito FUTATSUKI-2
Yasuhito FUTATSUKI wrote on Fri, 25 Sep 2020 17:05 +0900:

> On 2020/09/25 16:06, Daniel Shahaf wrote:
> > [hidden email] wrote on Thu, 24 Sep 2020 17:06 -0000:  
> >> Author: futatuki
> >> Date: Thu Sep 24 17:06:39 2020
> >> New Revision: 1881985
> >>
> >> URL: http://svn.apache.org/viewvc?rev=1881985&view=rev
> >> Log:
> >> Follow up to r1880192: Fix an EOL issue in test on Windows.
> >>
> >> * subversion/tests/cmdline/merge-tests.py
> >>   (merge_deleted_folder_with_mergeinfo_2): Use os.linesep instead of '\n'
> >>   in expected values of svn:mergeinfo.
> >>
> >> Modified:
> >>     subversion/trunk/subversion/tests/cmdline/merge_tests.py
> >>
> >> Modified: subversion/trunk/subversion/tests/cmdline/merge_tests.py
> >> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/tests/cmdline/merge_tests.py?rev=1881985&r1=1881984&r2=1881985&view=diff
> >> ==============================================================================
> >> --- subversion/trunk/subversion/tests/cmdline/merge_tests.py (original)
> >> +++ subversion/trunk/subversion/tests/cmdline/merge_tests.py Thu Sep 24 17:06:39 2020
> >> @@ -18639,11 +18639,19 @@ def merge_deleted_folder_with_mergeinfo_
> >>                                         )
> >>  
> >>    # verify that mergeinfo is set/changed on A/D, A/D/G, A/D/G2.
> >> +
> >> +  # NOTE: When writing out multi-line prop values in svn:* props, the
> >> +  # client converts to local encoding and local eol style.
> >> +  # Therefore, the expected output must contain the right kind of eoln
> >> +  # strings. That's why we use os.linesep in the tests below, not just
> >> +  # plain '\n'. The _last_ \n is also from the client, but it's not
> >> +  # part of the prop value and it doesn't get converted in the pipe.  
>
> Before answer the questions, the comment above was brougt from
> prop_value_conversion() in prop_tests.py.
>  

*nod*

> > I'm confused.
> >
> > First, if the client applies any transformation at all to property
> > values, that'd be a bug.  Property values are opaque octet sequences,
> > just like file contents, so they must be emitted verbatim.  It so
> > happens that values of svn:* properties are UTF-8 with LF line endings,
> > so that's what Python should see, regardless of the local encoding and
> > EOL style.  
>
> I judged that EOL conversion of property values on 'svn pg' isn't bug
> because this comment is found in test for it.

If you mean prop_value_conversion(), that test isn't about EOL style
and encodings at all.  It's about normalization of the values of some
svn:* properties, such as this:
.
    % svn ps -q svn:executable yes iota
    % svn pg svn:executable iota
    *
    %

The test for binary property values is prop_tests.py:binary_props().  It
doesn't cover newlines.

> If it is a bug, prop_tests.py also does not test correctly.

I've looked further and I think it's working as designed, but the
design is rather unintuitive.

In the data model, property values are binary data.  That's why, in
the API, property hashes' value type is svn_string_t.  However, the
cmdline client doesn't print all properties as binary data; it prints
svn:* properties as text, transcoded and EOL-converted, even as it
prints other properties in binary:
.
    https://svn.apache.org/viewvc/subversion/tags/1.0.0/subversion/clients/cmdline/props.c?view=markup#l50

The code still exists in trunk@HEAD, in propget-cmd.c.

These semantics mean prop_tests.py and merge_tests.py are both correct
to expect os.linesep in the output.

However, I find these semantics rather unintuitive.  Imagine someone
using a foo:ignore property as staging for svn:ignore:

     1 % svn propset svn:ignore "予定表.txt" ./
     2 property 'svn:ignore' set on '.'
     3 % svn propset foo:ignore "予定表.txt" ./
     4 property 'foo:ignore' set on '.'
     5 % LC_ALL=ja_JP.eucjp svn pl -v
     6 Properties on '.':
     7  foo:ignore
     8    予定表.txt
     9  svn:ignore
    10    ͽɽ.txt
.
The same sequence of bytes is emitted differently depending on what
property it belongs to — once in binary (foo:ignore), once in the local
encoding (svn:ignore), within the same list (!).  (My terminal was in
UTF-8 mode the whole time.)

    11 % LC_ALL=C svn pg --strict svn:ignore
    12 {U+4E88}{U+5B9A}{U+8868}.txt
.
There is no way in the cmdline client to get the value of an svn:*
property that's not representable in the local encoding; in part
because…

    13 % svn propset svn:ignore "{U+4E88}.txt" ./
    14 property 'svn:ignore' set on '.'
    15 % sqlite3 .svn/wc.db .dump | me
    16 (svn:ignore 29 {U+4E88}{U+5B9A}{U+8868}.txt )
    17 % svn pg --strict svn:ignore
    18 {U+4E88}{U+5B9A}{U+8868}.txt
.
…the cmdline client's output is not unambiguously parseable.  (These
commands were run in a UTF-8 locale.)

[On line 15 I grepped for the property skel (line 16)'s hexdump
representation — that's how sqlite3 dumps render BLOB columns — and
manually extracted and converted it.  E.g., to find "4E88", I'd grep
for its ASCII hex representation, "34453838".]

On the other hand, for «propset» there is a --encoding=UTF-8 flag that
allows property values not representable in the current encoding to be
set.  (The value still undergoes EOL conversion.  I haven't checked
whether it undergoes composition or decomposition.)  However, one still
has to know that «svn propset foo:ignore -F» takes binary input while
«svn propset svn:ignore -F» takes input in the local encoding.

So, I think there are a number of different issues/gotchas here:

- prop_tests.py:binary_props() should cover property values that contain
  bytes values such as 0x0A, 0x0D, and 0x80-0x8F (newlines and non-ASCII).

- Getting the value of an svn:* property is different to getting the
  value of any other property, in both «propget» and «proplist».

- It's not possible to get the raw value of an svn:* property in
  a working copy if the value is not representable in the local encoding.

- The {U+hhhh} fallback encoding is ambiguous.  (And in any case,
  I guess scripts are going to have to roll their own parsing if they
  want to undo this escaping?)

- Setting the value of an svn:* property is different to setting the
  value of any other property, and there will not necessarily be an
  error message.  (For instance, imagine extracting an svn:ignore value
  from a dumpfile into a temporary file and then using «svn propset -F»
  on the temporary file, while the local encoding is ISO-8859-1.  That
  will silently succeed.)

- The remark in prop_value_conversion() about a 'last \n' has bitrotted.
  [It should have been removed in r845498 (= r5424) — which,
  coincidentally, is also the revision in which binary_props() was
  added.]

> > For the same reason, I'd have expected the expected output to be given
> > as bytes objects rather than as str objects.  
>
> Those expected values are converted to bytes, just below of the diff
> lines.

Thanks, I missed that.

Yasuhito FUTATSUKI wrote on Fri, 25 Sep 2020 19:09 +0900:

> On 2020/09/25 17:05, Yasuhito FUTATSUKI wrote:
> > On 2020/09/25 16:06, Daniel Shahaf wrote:  
>
> >> First, if the client applies any transformation at all to property
> >> values, that'd be a bug.  Property values are opaque octet sequences,
> >> just like file contents, so they must be emitted verbatim.  It so
> >> happens that values of svn:* properties are UTF-8 with LF line endings,
> >> so that's what Python should see, regardless of the local encoding and
> >> EOL style.  
> >
> > I judged that EOL conversion of property values on 'svn pg' isn't bug
> > because this comment is found in test for it. If it is a bug,
> > prop_tests.py also does not test correctly.  
>
> This behavior was introduced in r1003238, as a "fix" of the issue #3721.

Good find — but if that's the case, why was the comment describing this
behaviour was added to the test much earlier, in r845369 (= r5295)?

Cheers,

Daniel
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1881985 - /subversion/trunk/subversion/tests/cmdline/merge_tests.py

Yasuhito FUTATSUKI-2
I'm sorry for replying late.

On 2020/09/26 19:12, Daniel Shahaf wrote:
> Yasuhito FUTATSUKI wrote on Fri, 25 Sep 2020 19:09 +0900:

First of all, I'd like to correct my misunderstanding.

>> On 2020/09/25 17:05, Yasuhito FUTATSUKI wrote:
>>> On 2020/09/25 16:06, Daniel Shahaf wrote:  
>>
>>>> First, if the client applies any transformation at all to property
>>>> values, that'd be a bug.  Property values are opaque octet sequences,
>>>> just like file contents, so they must be emitted verbatim.  It so
>>>> happens that values of svn:* properties are UTF-8 with LF line endings,
>>>> so that's what Python should see, regardless of the local encoding and
>>>> EOL style.  
>>>
>>> I judged that EOL conversion of property values on 'svn pg' isn't bug
>>> because this comment is found in test for it. If it is a bug,
>>> prop_tests.py also does not test correctly.  
>>
>> This behavior was introduced in r1003238, as a "fix" of the issue #3721.
>
> Good find — but if that's the case, why was the comment describing this
> behaviour was added to the test much earlier, in r845369 (= r5295)?

I'm very sorry, this was my misunderstanding. r1003238 was fix of
EOL style on a property name printing, not for a value.

> Yasuhito FUTATSUKI wrote on Fri, 25 Sep 2020 17:05 +0900:
>> On 2020/09/25 16:06, Daniel Shahaf wrote:
>>> [hidden email] wrote on Thu, 24 Sep 2020 17:06 -0000:  
>>>> Author: futatuki
>>>> Date: Thu Sep 24 17:06:39 2020
>>>> New Revision: 1881985
>>>>
>>>> URL: http://svn.apache.org/viewvc?rev=1881985&view=rev
>>>> Log:
>>>> Follow up to r1880192: Fix an EOL issue in test on Windows.
>>>>
>>>> * subversion/tests/cmdline/merge-tests.py
>>>>   (merge_deleted_folder_with_mergeinfo_2): Use os.linesep instead of '\n'
>>>>   in expected values of svn:mergeinfo.
>>>>
>>>> Modified:
>>>>     subversion/trunk/subversion/tests/cmdline/merge_tests.py
>>>>
>>>> Modified: subversion/trunk/subversion/tests/cmdline/merge_tests.py
>>>> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/tests/cmdline/merge_tests.py?rev=1881985&r1=1881984&r2=1881985&view=diff
>>>> ==============================================================================
>>>> --- subversion/trunk/subversion/tests/cmdline/merge_tests.py (original)
>>>> +++ subversion/trunk/subversion/tests/cmdline/merge_tests.py Thu Sep 24 17:06:39 2020
>>>> @@ -18639,11 +18639,19 @@ def merge_deleted_folder_with_mergeinfo_
>>>>                                         )
>>>>  
>>>>    # verify that mergeinfo is set/changed on A/D, A/D/G, A/D/G2.
>>>> +
>>>> +  # NOTE: When writing out multi-line prop values in svn:* props, the
>>>> +  # client converts to local encoding and local eol style.
>>>> +  # Therefore, the expected output must contain the right kind of eoln
>>>> +  # strings. That's why we use os.linesep in the tests below, not just
>>>> +  # plain '\n'. The _last_ \n is also from the client, but it's not
>>>> +  # part of the prop value and it doesn't get converted in the pipe.  
>>
>> Before answer the questions, the comment above was brougt from
                                                      ^^^^^^ brought

>> prop_value_conversion() in prop_tests.py.
>>  
>
> *nod*
>
>>> I'm confused.
>>>
>>> First, if the client applies any transformation at all to property
>>> values, that'd be a bug.  Property values are opaque octet sequences,
>>> just like file contents, so they must be emitted verbatim.  It so
>>> happens that values of svn:* properties are UTF-8 with LF line endings,
>>> so that's what Python should see, regardless of the local encoding and
>>> EOL style.  
>>
>> I judged that EOL conversion of property values on 'svn pg' isn't bug
>> because this comment is found in test for it.
>
> If you mean prop_value_conversion(), that test isn't about EOL style
> and encodings at all.  It's about normalization of the values of some
> svn:* properties, such as this:
> .
>     % svn ps -q svn:executable yes iota
>     % svn pg svn:executable iota
>     *
>     %
>
> The test for binary property values is prop_tests.py:binary_props().  It
> doesn't cover newlines.
>
>> If it is a bug, prop_tests.py also does not test correctly.
>
> I've looked further and I think it's working as designed, but the
> design is rather unintuitive.
>
> In the data model, property values are binary data.  That's why, in
> the API, property hashes' value type is svn_string_t.  However, the
> cmdline client doesn't print all properties as binary data; it prints
> svn:* properties as text, transcoded and EOL-converted, even as it
> prints other properties in binary:
> .
>     https://svn.apache.org/viewvc/subversion/tags/1.0.0/subversion/clients/cmdline/props.c?view=markup#l50
>
> The code still exists in trunk@HEAD, in propget-cmd.c.
>
> These semantics mean prop_tests.py and merge_tests.py are both correct
> to expect os.linesep in the output.
>

I confirmed it the code on trunk. Thank you. In propget-cmd.c,
svn_cl__propget() calls print_single_prop() through print_properties()
for non-xml output, and print_single_prop() calls
svn_cmdline__print_prop_hash() for single value for "svn pg -v" or
calls svn_subst_detranslate_string() for "svn:*" props before output
it to the out stream.

I'll fix the comment in merge_tests.py. As the issue on the comment
in prop_tests.py is not related to the change on r1881985 directly,
I won't fix it simultanously.

I think the rest of your mail needs farther discussion, so I'll
try to reply later, with changing Subject: header.

Cheers,
--
Yasuhito FUTATSUKI <[hidden email]>
Reply | Threaded
Open this post in threaded view
|

Some issues on svn propget (Re: svn commit: r1881985 - /subversion/trunk/subversion/tests/cmdline/merge_tests.py)

Yasuhito FUTATSUKI-2
In reply to this post by Daniel Shahaf-2
I changed a subject to let it be noticeable.

(Sorry for reply too late.)

On 2020/09/26 19:12, Daniel Shahaf wrote:

> Yasuhito FUTATSUKI wrote on Fri, 25 Sep 2020 17:05 +0900:
>> On 2020/09/25 16:06, Daniel Shahaf wrote:
>>> [hidden email] wrote on Thu, 24 Sep 2020 17:06 -0000:  
>>>> Author: futatuki
>>>> Date: Thu Sep 24 17:06:39 2020
>>>> New Revision: 1881985
>>>>
>>>> URL: http://svn.apache.org/viewvc?rev=1881985&view=rev
>>>> Log:
>>>> Follow up to r1880192: Fix an EOL issue in test on Windows.
>>>>
>>>> * subversion/tests/cmdline/merge-tests.py
>>>>   (merge_deleted_folder_with_mergeinfo_2): Use os.linesep instead of '\n'
>>>>   in expected values of svn:mergeinfo.
>>>>
>>>> Modified:
>>>>     subversion/trunk/subversion/tests/cmdline/merge_tests.py
>>>>
>>>> Modified: subversion/trunk/subversion/tests/cmdline/merge_tests.py
>>>> URL: http://svn.apache.org/viewvc/subversion/trunk/subversion/tests/cmdline/merge_tests.py?rev=1881985&r1=1881984&r2=1881985&view=diff
>>>> ==============================================================================
>>>> --- subversion/trunk/subversion/tests/cmdline/merge_tests.py (original)
>>>> +++ subversion/trunk/subversion/tests/cmdline/merge_tests.py Thu Sep 24 17:06:39 2020
>>>> @@ -18639,11 +18639,19 @@ def merge_deleted_folder_with_mergeinfo_
>>>>                                         )
>>>>  
>>>>    # verify that mergeinfo is set/changed on A/D, A/D/G, A/D/G2.
>>>> +
>>>> +  # NOTE: When writing out multi-line prop values in svn:* props, the
>>>> +  # client converts to local encoding and local eol style.
>>>> +  # Therefore, the expected output must contain the right kind of eoln
>>>> +  # strings. That's why we use os.linesep in the tests below, not just
>>>> +  # plain '\n'. The _last_ \n is also from the client, but it's not
>>>> +  # part of the prop value and it doesn't get converted in the pipe.  
>>
>> Before answer the questions, the comment above was brougt from
>> prop_value_conversion() in prop_tests.py.
>>  
>
> *nod*
>
>>> I'm confused.
>>>
>>> First, if the client applies any transformation at all to property
>>> values, that'd be a bug.  Property values are opaque octet sequences,
>>> just like file contents, so they must be emitted verbatim.  It so
>>> happens that values of svn:* properties are UTF-8 with LF line endings,
>>> so that's what Python should see, regardless of the local encoding and
>>> EOL style.  
>>
>> I judged that EOL conversion of property values on 'svn pg' isn't bug
>> because this comment is found in test for it.
>
> If you mean prop_value_conversion(), that test isn't about EOL style
> and encodings at all.  It's about normalization of the values of some
> svn:* properties, such as this:
> .
>     % svn ps -q svn:executable yes iota
>     % svn pg svn:executable iota
>     *
>     %
>
> The test for binary property values is prop_tests.py:binary_props().  It
> doesn't cover newlines.
>
>> If it is a bug, prop_tests.py also does not test correctly.
>
> I've looked further and I think it's working as designed, but the
> design is rather unintuitive.
>
> In the data model, property values are binary data.  That's why, in
> the API, property hashes' value type is svn_string_t.  However, the
> cmdline client doesn't print all properties as binary data; it prints
> svn:* properties as text, transcoded and EOL-converted, even as it
> prints other properties in binary:
> .
>     https://svn.apache.org/viewvc/subversion/tags/1.0.0/subversion/clients/cmdline/props.c?view=markup#l50
>
> The code still exists in trunk@HEAD, in propget-cmd.c.
>
> These semantics mean prop_tests.py and merge_tests.py are both correct
> to expect os.linesep in the output.

I've commited fix of the comment only, in 1882105 on last Tuesday.
 

> However, I find these semantics rather unintuitive.  Imagine someone
> using a foo:ignore property as staging for svn:ignore:
>
>      1 % svn propset svn:ignore "予定表.txt" ./
>      2 property 'svn:ignore' set on '.'
>      3 % svn propset foo:ignore "予定表.txt" ./
>      4 property 'foo:ignore' set on '.'
>      5 % LC_ALL=ja_JP.eucjp svn pl -v
>      6 Properties on '.':
>      7  foo:ignore
>      8    予定表.txt
>      9  svn:ignore
>     10    ͽɽ.txt
> .
> The same sequence of bytes is emitted differently depending on what
> property it belongs to — once in binary (foo:ignore), once in the local
> encoding (svn:ignore), within the same list (!).  (My terminal was in
> UTF-8 mode the whole time.)
>
>     11 % LC_ALL=C svn pg --strict svn:ignore
>     12 {U+4E88}{U+5B9A}{U+8868}.txt
> .
> There is no way in the cmdline client to get the value of an svn:*
> property that's not representable in the local encoding; in part
> because…
>
>     13 % svn propset svn:ignore "{U+4E88}.txt" ./
>     14 property 'svn:ignore' set on '.'
>     15 % sqlite3 .svn/wc.db .dump | me
>     16 (svn:ignore 29 {U+4E88}{U+5B9A}{U+8868}.txt )
>     17 % svn pg --strict svn:ignore
>     18 {U+4E88}{U+5B9A}{U+8868}.txt
> .
> …the cmdline client's output is not unambiguously parseable.  (These
> commands were run in a UTF-8 locale.)
>
> [On line 15 I grepped for the property skel (line 16)'s hexdump
> representation — that's how sqlite3 dumps render BLOB columns — and
> manually extracted and converted it.  E.g., to find "4E88", I'd grep
> for its ASCII hex representation, "34453838".]
>
> On the other hand, for «propset» there is a --encoding=UTF-8 flag that
> allows property values not representable in the current encoding to be
> set.  (The value still undergoes EOL conversion.  I haven't checked
> whether it undergoes composition or decomposition.)  However, one still
> has to know that «svn propset foo:ignore -F» takes binary input while
> «svn propset svn:ignore -F» takes input in the local encoding.
>
> So, I think there are a number of different issues/gotchas here:
>
> - prop_tests.py:binary_props() should cover property values that contain
>   bytes values such as 0x0A, 0x0D, and 0x80-0x8F (newlines and non-ASCII).

Sure.
 
> - Getting the value of an svn:* property is different to getting the
>   value of any other property, in both «propget» and «proplist».

Also it is not explicitly tested in prop_tests.py especially about
transcoding, although it is troublesome to check the transcode about
them because the test suite don't pass environment variables explicitly
to subprocess and there is no warranty that the system running the test
have locale other than "C".

> - It's not possible to get the raw value of an svn:* property in
>   a working copy if the value is not representable in the local encoding.

I belive that if we want to get property values precisely, we should
use xml output, although --no-newline is enough in most case except
this case.
 
> - The {U+hhhh} fallback encoding is ambiguous.  (And in any case,
>   I guess scripts are going to have to roll their own parsing if they
>   want to undo this escaping?)

I also think non-xml output is not for scripts, but mainly visualize.

> - Setting the value of an svn:* property is different to setting the
>   value of any other property, and there will not necessarily be an
>   error message.  (For instance, imagine extracting an svn:ignore value
>   from a dumpfile into a temporary file and then using «svn propset -F»
>   on the temporary file, while the local encoding is ISO-8859-1.  That
>   will silently succeed.)

Indeed, it is not intuitive.
 
> - The remark in prop_value_conversion() about a 'last \n' has bitrotted.
>   [It should have been removed in r845498 (= r5424) — which,
>   coincidentally, is also the revision in which binary_props() was
>   added.]

I agreed, but did not touched prop_tests.py in 1882105.

Cheers,
--
Yasuhito FUTATSUKI <[hidden email]>
Reply | Threaded
Open this post in threaded view
|

Re: Some issues on svn propget (Re: svn commit: r1881985 - /subversion/trunk/subversion/tests/cmdline/merge_tests.py)

Daniel Shahaf-2
Yasuhito FUTATSUKI wrote on Sun, 04 Oct 2020 21:56 +0900:
> (Sorry for reply too late.)

No worries.

> On 2020/09/26 19:12, Daniel Shahaf wrote:
> >      1 % svn propset svn:ignore "予定表.txt" ./
> >      2 property 'svn:ignore' set on '.'
> >      3 % svn propset foo:ignore "予定表.txt" ./
> >      4 property 'foo:ignore' set on '.'
> >      5 % LC_ALL=ja_JP.eucjp svn pl -v
> >      6 Properties on '.':
> >      7  foo:ignore
> >      8    予定表.txt
> >      9  svn:ignore
> >     10    ͽɽ.txt
> >
> >     11 % LC_ALL=C svn pg --strict svn:ignore
> >     12 {U+4E88}{U+5B9A}{U+8868}.txt
> >
> >     13 % svn propset svn:ignore "{U+4E88}.txt" ./
> >     14 property 'svn:ignore' set on '.'
> >     15 % sqlite3 .svn/wc.db .dump | me
> >     16 (svn:ignore 29 {U+4E88}{U+5B9A}{U+8868}.txt )
> >     17 % svn pg --strict svn:ignore
> >     18 {U+4E88}{U+5B9A}{U+8868}.txt
> > .
> > So, I think there are a number of different issues/gotchas here:
> >
> > - It's not possible to get the raw value of an svn:* property in
> >   a working copy if the value is not representable in the local encoding.  
>
> I belive that if we want to get property values precisely, we should
> use xml output, although --no-newline is enough in most case except
> this case.

Hmm, that's an interesting one.  On the one hand, «propget --xml»
does resolve the ambiguity issue of the ad-hoc escaping; on the other
hand:

- We shouldn't require CLI users to use an XML parser in order to
  retrieve values of binary blobs.

- The XML document declares itself to be in UTF-8.  Does that mean XML
  parsers are allowed to treat the dumped property values as UTF-8 and,
  for example, convert the byte sequence (that comprises the value) to
  another byte sequence, that's equivalent when treated as UTF-8 but
  not equivalent when treated as binary blobs?  (For example, convert
  the UTF-8 to composed or decomposed normal form.)

Thanks for the (snipped) remainder; I agree.

Cheers,

Daniel
Reply | Threaded
Open this post in threaded view
|

Re: Some issues on svn propget (Re: svn commit: r1881985 - /subversion/trunk/subversion/tests/cmdline/merge_tests.py)

Yasuhito FUTATSUKI-2
On 2020/10/05 1:57, Daniel Shahaf wrote:
> Yasuhito FUTATSUKI wrote on Sun, 04 Oct 2020 21:56 +0900:

>> On 2020/09/26 19:12, Daniel Shahaf wrote:
>>>      1 % svn propset svn:ignore "予定表.txt" ./
>>>      2 property 'svn:ignore' set on '.'
>>>      3 % svn propset foo:ignore "予定表.txt" ./
>>>      4 property 'foo:ignore' set on '.'
>>>      5 % LC_ALL=ja_JP.eucjp svn pl -v
>>>      6 Properties on '.':
>>>      7  foo:ignore
>>>      8    予定表.txt
>>>      9  svn:ignore
>>>     10    ͽɽ.txt
>>>
>>>     11 % LC_ALL=C svn pg --strict svn:ignore
>>>     12 {U+4E88}{U+5B9A}{U+8868}.txt
>>>
>>>     13 % svn propset svn:ignore "{U+4E88}.txt" ./
>>>     14 property 'svn:ignore' set on '.'
>>>     15 % sqlite3 .svn/wc.db .dump | me
>>>     16 (svn:ignore 29 {U+4E88}{U+5B9A}{U+8868}.txt )
>>>     17 % svn pg --strict svn:ignore
>>>     18 {U+4E88}{U+5B9A}{U+8868}.txt
>>> .
>>> So, I think there are a number of different issues/gotchas here:
>>>
>>> - It's not possible to get the raw value of an svn:* property in
>>>   a working copy if the value is not representable in the local encoding.  
>>
>> I belive that if we want to get property values precisely, we should
>> use xml output, although --no-newline is enough in most case except
>> this case.
>
> Hmm, that's an interesting one.  On the one hand, «propget --xml»
> does resolve the ambiguity issue of the ad-hoc escaping; on the other
> hand:
>
> - We shouldn't require CLI users to use an XML parser in order to
>   retrieve values of binary blobs.

Then do we need a new output format for "strict" values?

> - The XML document declares itself to be in UTF-8.  Does that mean XML
>   parsers are allowed to treat the dumped property values as UTF-8 and,
>   for example, convert the byte sequence (that comprises the value) to
>   another byte sequence, that's equivalent when treated as UTF-8 but
>   not equivalent when treated as binary blobs?  (For example, convert
>   the UTF-8 to composed or decomposed normal form.)

At least we expect there is no conversion of byte sequence on parsing,
if the value is considered to be safe by svn_xml_is_xml_safe(). If it
is not so, I think outputs of --xml is broken.

Moreover, as properties have no meta data about its contents, we can't
determine a property is a text or not even if it contains only printable
characters, like 'eicar.com'[1]. So it is not so curious even if we might
use base64 encoding for all properties (but I don't think it is good
idea).

[1] https://svn.haxx.se/dev/archive-2016-03/0043.shtml
(Yes, I was also trapped by it yesterday.)

Cheers,
--
Yasuhito FUTATSUKI <[hidden email]>
Reply | Threaded
Open this post in threaded view
|

Re: Some issues on svn propget (Re: svn commit: r1881985 - /subversion/trunk/subversion/tests/cmdline/merge_tests.py)

Daniel Shahaf-2
Yasuhito FUTATSUKI wrote on Mon, 05 Oct 2020 13:30 +00:00:

> On 2020/10/05 1:57, Daniel Shahaf wrote:
> > Yasuhito FUTATSUKI wrote on Sun, 04 Oct 2020 21:56 +0900:
>
> >> On 2020/09/26 19:12, Daniel Shahaf wrote:
> >>>      1 % svn propset svn:ignore "予定表.txt" ./
> >>>      2 property 'svn:ignore' set on '.'
> >>>      3 % svn propset foo:ignore "予定表.txt" ./
> >>>      4 property 'foo:ignore' set on '.'
> >>>      5 % LC_ALL=ja_JP.eucjp svn pl -v
> >>>      6 Properties on '.':
> >>>      7  foo:ignore
> >>>      8    予定表.txt
> >>>      9  svn:ignore
> >>>     10    ͽɽ.txt
> >>>
> >>>     11 % LC_ALL=C svn pg --strict svn:ignore
> >>>     12 {U+4E88}{U+5B9A}{U+8868}.txt
> >>>
> >>>     13 % svn propset svn:ignore "{U+4E88}.txt" ./
> >>>     14 property 'svn:ignore' set on '.'
> >>>     15 % sqlite3 .svn/wc.db .dump | me
> >>>     16 (svn:ignore 29 {U+4E88}{U+5B9A}{U+8868}.txt )
> >>>     17 % svn pg --strict svn:ignore
> >>>     18 {U+4E88}{U+5B9A}{U+8868}.txt
> >>> .
> >>> So, I think there are a number of different issues/gotchas here:
> >>>
> >>> - It's not possible to get the raw value of an svn:* property in
> >>>   a working copy if the value is not representable in the local encoding.  
> >>
> >> I belive that if we want to get property values precisely, we should
> >> use xml output, although --no-newline is enough in most case except
> >> this case.
> >
> > Hmm, that's an interesting one.  On the one hand, «propget --xml»
> > does resolve the ambiguity issue of the ad-hoc escaping; on the other
> > hand:
> >
> > - We shouldn't require CLI users to use an XML parser in order to
> >   retrieve values of binary blobs.
>
> Then do we need a new output format for "strict" values?

+1

> > - The XML document declares itself to be in UTF-8.  Does that mean XML
> >   parsers are allowed to treat the dumped property values as UTF-8 and,
> >   for example, convert the byte sequence (that comprises the value) to
> >   another byte sequence, that's equivalent when treated as UTF-8 but
> >   not equivalent when treated as binary blobs?  (For example, convert
> >   the UTF-8 to composed or decomposed normal form.)
>
> At least we expect there is no conversion of byte sequence on parsing,
> if the value is considered to be safe by svn_xml_is_xml_safe(). If it
> is not so, I think outputs of --xml is broken.
>

I don't think svn_xml_is_xml_safe() addresses the above concern.  By
code inspection, that function will return TRUE on the strings «é»
(U+00E9, bytes: C3 A9) and «é» (U+0065 U+0301, bytes: 65 CC 81), so the
XML/Unicode/renormalization question remains.  (The two example strings
here are the composed/decomposed equivalents of each other.)

I agree that if XML parsers are allowed to do such renormalizations,
then «propget --xml» is broken.

> Moreover, as properties have no meta data about its contents, we can't
> determine a property is a text or not even if it contains only printable
> characters, like 'eicar.com'[1].

Yes, property values are binary, so serialization/deserialization ought
to preserve them byte for byte.

However, when dumping a particular value, there's nothing stopping us
from inspecting that value to see whether it, say, consists entirely of
printable ASCII or not, and taking different codepaths depending on the
result of the check.

> So it is not so curious even if we might
> use base64 encoding for all properties (but I don't think it is good
> idea).

Well, using base64 unconditionally would definitely be robust, but I think
it'd be too much: we could, for one, forgo base64 for property values
that are entirely ASCII.

Cheers,

Daniel