Re: svn commit: r1846237 - /subversion/trunk/subversion/tests/cmdline/pegrev_parse_tests.py

classic Classic list List threaded Threaded
9 messages Options
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1846237 - /subversion/trunk/subversion/tests/cmdline/pegrev_parse_tests.py

Branko Čibej
On 09.11.2018 12:54, [hidden email] wrote:
> Author: brane
> Date: Fri Nov  9 11:54:21 2018
> New Revision: 1846237
>
> URL: http://svn.apache.org/viewvc?rev=1846237&view=rev
> Log:
> More tests for peg revision parsing.
[...]
> +@Wimp("Removes E instead of reporting ENOENT or E125001 for E/@")
> +def remove_subdir_7a_no_escape_peg(sbox):
> +  "remove missing 'E/@' without pegrev escape"
> +  do_remove(sbox, 'E/@', 'E/@') #, "svn: E125001: 'E/@'")

Yes indeed ... our target parsing is so incredibly reliable that
attempting to delete a non-existent 'E/@' will happily delete 'E' with
all its contents. I think this may be too much of a good thing.

-- Brane

Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1846237 - /subversion/trunk/subversion/tests/cmdline/pegrev_parse_tests.py

Daniel Shahaf-2
Branko Čibej wrote on Fri, 09 Nov 2018 12:56 +0100:

> On 09.11.2018 12:54, [hidden email] wrote:
> > Author: brane
> > Date: Fri Nov  9 11:54:21 2018
> > New Revision: 1846237
> >
> > URL: http://svn.apache.org/viewvc?rev=1846237&view=rev
> > Log:
> > More tests for peg revision parsing.
> [...]
> > +@Wimp("Removes E instead of reporting ENOENT or E125001 for E/@")
> > +def remove_subdir_7a_no_escape_peg(sbox):
> > +  "remove missing 'E/@' without pegrev escape"
> > +  do_remove(sbox, 'E/@', 'E/@') #, "svn: E125001: 'E/@'")
>
> Yes indeed ... our target parsing is so incredibly reliable that
> attempting to delete a non-existent 'E/@' will happily delete 'E' with
> all its contents. I think this may be too much of a good thing.

The documented escaping rule for CLI consumers is to append an "@" after
every filename.  By this rule, a CLI consumer that _wants_ to remove E/
would run "svn rm E/@".  That command should parse as { target = 'E/',
peg = '' } regardless of whether the directory "E" contains a child
named "@".

I don't understand what change you're proposing.

Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1846237 - /subversion/trunk/subversion/tests/cmdline/pegrev_parse_tests.py

Branko Čibej
On 10.11.2018 01:31, Daniel Shahaf wrote:

> Branko Čibej wrote on Fri, 09 Nov 2018 12:56 +0100:
>> On 09.11.2018 12:54, [hidden email] wrote:
>>> Author: brane
>>> Date: Fri Nov  9 11:54:21 2018
>>> New Revision: 1846237
>>>
>>> URL: http://svn.apache.org/viewvc?rev=1846237&view=rev
>>> Log:
>>> More tests for peg revision parsing.
>> [...]
>>> +@Wimp("Removes E instead of reporting ENOENT or E125001 for E/@")
>>> +def remove_subdir_7a_no_escape_peg(sbox):
>>> +  "remove missing 'E/@' without pegrev escape"
>>> +  do_remove(sbox, 'E/@', 'E/@') #, "svn: E125001: 'E/@'")
>> Yes indeed ... our target parsing is so incredibly reliable that
>> attempting to delete a non-existent 'E/@' will happily delete 'E' with
>> all its contents. I think this may be too much of a good thing.
> The documented escaping rule for CLI consumers is to append an "@" after
> every filename.  By this rule, a CLI consumer that _wants_ to remove E/
> would run "svn rm E/@".  That command should parse as { target = 'E/',
> peg = '' } regardless of whether the directory "E" contains a child
> named "@".
>
> I don't understand what change you're proposing.

Consider this:

$ svn rm E@
D         E
$ svn rm @
svn: E125001: '@' is just a peg revision. Maybe try '@@' instead?
$ svn rm E/@
D         E


would it not be more consistent that trying to remove 'E/@' would raise
the same error as trying to remove '@'? Because otherwise the behaviour
of the client depends on whether you're removing things from the current
directory or not. This becomes especially error-prone in the case when
E/@ exists:

$ svn rm E/@
D         E
D         E/@


I'm pretty sure we should not remove a parent directory if the child
might have been the intended target.

These usages were not ambiguous and possibly error-prone until we
invented the peg-revision syntax. At the time we failed to strictly
define the semantics , so maybe it's time we did that now. For example,
given the path

    foo@bar/baz


the parser will treat 'bar/baz' as the peg-revision specifier, even
though it's obvious that's not what the user had in mind. If we changed
the parser to only look for peg revisions on the basename of the path,
then at least the only restriction we'll have is that directory
separators can't be part of the peg revision specifier — which should be
an acceptable restriction, since currently we only support known
keywords, revision numbers or dates, none of which contain forward- or
backslashes.

Obviously that would only be a parser change, peg revisions would still
apply to whole paths for command semantics.

I'm aware that doing this would change the meaning of some forms of
commands, but for now I'm fairly sure we'd only produce new errors, not
silently behave differently. From the current set of tests, it seems to
follow that this inconsistency only arises with targets whose basename
starts with '@' or '.@' (and probably '..@'), the latter because '.'
(and '..') are removed by canonicalization.

-- Brane

Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1846237 - /subversion/trunk/subversion/tests/cmdline/pegrev_parse_tests.py

Daniel Shahaf-2
Branko Čibej wrote on Sat, Nov 10, 2018 at 15:12:48 +0100:

> On 10.11.2018 01:31, Daniel Shahaf wrote:
> > Branko Čibej wrote on Fri, 09 Nov 2018 12:56 +0100:
> >> On 09.11.2018 12:54, [hidden email] wrote:
> >>> Author: brane
> >>> Date: Fri Nov  9 11:54:21 2018
> >>> New Revision: 1846237
> >>>
> >>> URL: http://svn.apache.org/viewvc?rev=1846237&view=rev
> >>> Log:
> >>> More tests for peg revision parsing.
> >> [...]
> >>> +@Wimp("Removes E instead of reporting ENOENT or E125001 for E/@")
> >>> +def remove_subdir_7a_no_escape_peg(sbox):
> >>> +  "remove missing 'E/@' without pegrev escape"
> >>> +  do_remove(sbox, 'E/@', 'E/@') #, "svn: E125001: 'E/@'")
> >> Yes indeed ... our target parsing is so incredibly reliable that
> >> attempting to delete a non-existent 'E/@' will happily delete 'E' with
> >> all its contents. I think this may be too much of a good thing.
> > The documented escaping rule for CLI consumers is to append an "@" after
> > every filename.  By this rule, a CLI consumer that _wants_ to remove E/
> > would run "svn rm E/@".  That command should parse as { target = 'E/',
> > peg = '' } regardless of whether the directory "E" contains a child
> > named "@".
> >
> > I don't understand what change you're proposing.
>
> Consider this:
>
> $ svn rm E@
> D         E
> $ svn rm @
> svn: E125001: '@' is just a peg revision. Maybe try '@@' instead?
> $ svn rm E/@
> D         E
>
>
> would it not be more consistent that trying to remove 'E/@' would raise
> the same error as trying to remove '@'?

The argument parsing is documented to work as follows: remove the last
"@" and everything after it — that's the peg rev.  What remains is the
target dirent or URL.  Thus, ultimately the reason for the different
errors is that "E/" is a valid OS-level path specification but "" is not
(in the sense that «stat("E/")» works and «stat("")» errors).

In other words, your examples are simply usage errors, but it so happens
that one of them is a syntax error and one of them is not.  (It would
have been good if both usage errors had been syntax errors, but we're
12 years too late to get that right to begin with.  C'est la vie.)

The correct syntaxes for removing "./@" and "E/@", of course, are
«svn rm E/@@» and «svn rm @@» respectively.  

I think that «svn rm E/@» should _not_ remove "E/@", for two reasons:

1. Because that would be backwards incompatible.

2. In order to preserve the property that «svn ${subcommand} -- ${target}@»
is parsed in exactly the same way regardless of the values of
${subcommand} and ${target} and regardless of the tree contents.

IIRC «svn up @» used to be the same as «svn up ./», so I think we
shouldn't make it mean "update the file './@'" because that would be
a silent incompatibility for some users.  I don't remember what the last
minor line with the non-error meaning was, though.

> Because otherwise the behaviour of the client depends on whether you're
> removing things from the current directory or not.

To be more precise, the question is whether one spells the command
«svn rm @» or «svn rm ./@».

> This becomes especially error-prone in the case when
> E/@ exists:
>
> $ svn rm E/@
> D         E
> D         E/@
>
>
> I'm pretty sure we should not remove a parent directory if the child
> might have been the intended target.
>

I see that there's an argument that we should require some option here,
à la --force-log, but frankly, I don't think files literally called "@"
or "@HEAD" or "@{2018-01-01}" and so on are common enough to worry
about.

Whatever we do, we must have a documented syntax that means "recursively
delete E/ and everything thereunder" and works regardless of what
filenames E/ contains.

> These usages were not ambiguous and possibly error-prone until we
> invented the peg-revision syntax. At the time we failed to strictly
> define the semantics , so maybe it's time we did that now. For example,
> given the path
>
>     foo@bar/baz
>
> the parser will treat 'bar/baz' as the peg-revision specifier, even
> though it's obvious that's not what the user had in mind.
> If we changed the parser to only look for peg revisions on the basename of
> the path, then at least the only restriction we'll have is that directory
> separators can't be part of the peg revision specifier — which should be an
> acceptable restriction, since currently we only support known keywords,
> revision numbers or dates, none of which contain forward- or backslashes.
>
> Obviously that would only be a parser change, peg revisions would still
> apply to whole paths for command semantics.
>

So you're proposing changing the parsing rule from "the last @" to
"the last @ that isn't followed by any os.path.sep"?  That's forward
compatible if we constrain ourselves never to use slash or backslash in
a pegrev specifier, and backwards compatible in that every non-erroneous
usage will continue to work, so it's something we can consider.

How would that affect scheme://user@hostname/ URLs?  What about typos
such as «foo@HEA/D»?

> I'm aware that doing this would change the meaning of some forms of
> commands, but for now I'm fairly sure we'd only produce new errors, not
> silently behave differently. From the current set of tests, it seems to
> follow that this inconsistency only arises with targets whose basename
> starts with '@' or '.@' (and probably '..@'), the latter because '.'
> (and '..') are removed by canonicalization.

I don't follow what you mean by "this"; do you refer to forbidding
slashes in pegrevs or to something else?  (The former wouldn't seem to
address all the concerns you described.)

Cheers,

Daniel

P.S.  Other languages also have examples of usage errors that aren't
syntax errors.  For example, in C the tokens «+», «-», and «*» can be
either unary or binary operators, so «double hypotenuse(double a, double b)
{ return sqrt(*a + b*b); }» is a syntax error but «return sqrt( + b*b)»
is not.
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1846237 - /subversion/trunk/subversion/tests/cmdline/pegrev_parse_tests.py

Branko Čibej
On 11.11.2018 07:29, Daniel Shahaf wrote:

> Branko Čibej wrote on Sat, Nov 10, 2018 at 15:12:48 +0100:
>> On 10.11.2018 01:31, Daniel Shahaf wrote:
>>> Branko Čibej wrote on Fri, 09 Nov 2018 12:56 +0100:
>>>> On 09.11.2018 12:54, [hidden email] wrote:
>>>>> Author: brane
>>>>> Date: Fri Nov  9 11:54:21 2018
>>>>> New Revision: 1846237
>>>>>
>>>>> URL: http://svn.apache.org/viewvc?rev=1846237&view=rev
>>>>> Log:
>>>>> More tests for peg revision parsing.
>>>> [...]
>>>>> +@Wimp("Removes E instead of reporting ENOENT or E125001 for E/@")
>>>>> +def remove_subdir_7a_no_escape_peg(sbox):
>>>>> +  "remove missing 'E/@' without pegrev escape"
>>>>> +  do_remove(sbox, 'E/@', 'E/@') #, "svn: E125001: 'E/@'")
>>>> Yes indeed ... our target parsing is so incredibly reliable that
>>>> attempting to delete a non-existent 'E/@' will happily delete 'E' with
>>>> all its contents. I think this may be too much of a good thing.
>>> The documented escaping rule for CLI consumers is to append an "@" after
>>> every filename.  By this rule, a CLI consumer that _wants_ to remove E/
>>> would run "svn rm E/@".  That command should parse as { target = 'E/',
>>> peg = '' } regardless of whether the directory "E" contains a child
>>> named "@".
>>>
>>> I don't understand what change you're proposing.
>> Consider this:
>>
>> $ svn rm E@
>> D         E
>> $ svn rm @
>> svn: E125001: '@' is just a peg revision. Maybe try '@@' instead?
>> $ svn rm E/@
>> D         E
>>
>>
>> would it not be more consistent that trying to remove 'E/@' would raise
>> the same error as trying to remove '@'?
> The argument parsing is documented to work as follows: remove the last
> "@" and everything after it — that's the peg rev.  What remains is the
> target dirent or URL.  Thus, ultimately the reason for the different
> errors is that "E/" is a valid OS-level path specification but "" is not
> (in the sense that «stat("E/")» works and «stat("")» errors).
>
> In other words, your examples are simply usage errors, but it so happens
> that one of them is a syntax error and one of them is not.  (It would
> have been good if both usage errors had been syntax errors, but we're
> 12 years too late to get that right to begin with.  C'est la vie.)

Life can also be fixed to suck less.

> The correct syntaxes for removing "./@" and "E/@", of course, are
> «svn rm E/@@» and «svn rm @@» respectively.  
>
> I think that «svn rm E/@» should _not_ remove "E/@", for two reasons:

I didn't say it should. I said it should error out just like 'svn rm @'
does.


> 1. Because that would be backwards incompatible.

Obviouisly.

> 2. In order to preserve the property that «svn ${subcommand} -- ${target}@»
> is parsed in exactly the same way regardless of the values of
> ${subcommand} and ${target} and regardless of the tree contents.

I do think that this rule was not sufficiently thought out.

> IIRC «svn up @» used to be the same as «svn up ./», so I think we
> shouldn't make it mean "update the file './@'" because that would be
> a silent incompatibility for some users.

Again, I didn't say it should. I said it should be an error because
'./@' is ambiguous.

>   I don't remember what the last
> minor line with the non-error meaning was, though.
>
>> Because otherwise the behaviour of the client depends on whether you're
>> removing things from the current directory or not.
> To be more precise, the question is whether one spells the command
> «svn rm @» or «svn rm ./@».

or 'svn rm .@' or 'svn rm './.@'. I'd argue that premature
canonicalization is a bad thing.


>> This becomes especially error-prone in the case when
>> E/@ exists:
>>
>> $ svn rm E/@
>> D         E
>> D         E/@
>>
>>
>> I'm pretty sure we should not remove a parent directory if the child
>> might have been the intended target.
>>
> I see that there's an argument that we should require some option here,
> à la --force-log, but frankly, I don't think files literally called "@"
> or "@HEAD" or "@{2018-01-01}" and so on are common enough to worry
> about.

Perhaps not, but then there's SVN-4530, so I'd say they're common enough.


> Whatever we do, we must have a documented syntax that means "recursively
> delete E/ and everything thereunder" and works regardless of what
> filenames E/ contains.

Yes, it's called 'svn rm E'.


>> These usages were not ambiguous and possibly error-prone until we
>> invented the peg-revision syntax. At the time we failed to strictly
>> define the semantics , so maybe it's time we did that now. For example,
>> given the path
>>
>>     foo@bar/baz
>>
>> the parser will treat 'bar/baz' as the peg-revision specifier, even
>> though it's obvious that's not what the user had in mind.
>> If we changed the parser to only look for peg revisions on the basename of
>> the path, then at least the only restriction we'll have is that directory
>> separators can't be part of the peg revision specifier — which should be an
>> acceptable restriction, since currently we only support known keywords,
>> revision numbers or dates, none of which contain forward- or backslashes.
>>
>> Obviously that would only be a parser change, peg revisions would still
>> apply to whole paths for command semantics.
>>
> So you're proposing changing the parsing rule from "the last @" to
> "the last @ that isn't followed by any os.path.sep"?  That's forward
> compatible if we constrain ourselves never to use slash or backslash in
> a pegrev specifier, and backwards compatible in that every non-erroneous
> usage will continue to work, so it's something we can consider.
>
> How would that affect scheme://user@hostname/ URLs?

I haven't tested that yet.

> What about typos such as «foo@HEA/D»?

This won't work currently, but you're right, it _could_ work with the
proposed change. So my assumption that we'd only add more errors is wrong.

>> I'm aware that doing this would change the meaning of some forms of
>> commands, but for now I'm fairly sure we'd only produce new errors, not
>> silently behave differently. From the current set of tests, it seems to
>> follow that this inconsistency only arises with targets whose basename
>> starts with '@' or '.@' (and probably '..@'), the latter because '.'
>> (and '..') are removed by canonicalization.
> I don't follow what you mean by "this"; do you refer to forbidding
> slashes in pegrevs or to something else?  (The former wouldn't seem to
> address all the concerns you described.)
>
> Cheers,
>
> Daniel
>
> P.S.  Other languages also have examples of usage errors that aren't
> syntax errors.  For example, in C the tokens «+», «-», and «*» can be
> either unary or binary operators, so «double hypotenuse(double a, double b)
> { return sqrt(*a + b*b); }» is a syntax error but «return sqrt( + b*b)»
> is not.

I really don't see how this comparison is valid. C operator syntax and
precedence is precisely defined. Our usage of @ in paths may be
well-defined too, but its interaction with aggressive canonicalization
creates unintentional side effects in the implementation.

-- Brane

Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1846237 - /subversion/trunk/subversion/tests/cmdline/pegrev_parse_tests.py

Branko Čibej
On 11.11.2018 07:47, Branko Čibej wrote:

> Our usage of @ in paths may be
> well-defined too, but its interaction with aggressive canonicalization
> creates unintentional side effects in the implementation.


A really educational example of this is svn_cl__copy. It goes out of its
way to ignore a specific kind of peg-revision parsing error. I'm having
a really hard time trying to convince myself that can be the right thing
to do. If one particular command has to try to fix parsing errors in
order to work correctly, then I'd say that the parser is just wrong.

-- Brane

Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1846237 - /subversion/trunk/subversion/tests/cmdline/pegrev_parse_tests.py

Daniel Shahaf-2
In reply to this post by Branko Čibej
Branko Čibej wrote on Sun, Nov 11, 2018 at 07:47:18 +0100:

> On 11.11.2018 07:29, Daniel Shahaf wrote:
> > Branko Čibej wrote on Sat, Nov 10, 2018 at 15:12:48 +0100:
> >> On 10.11.2018 01:31, Daniel Shahaf wrote:
> >>> Branko Čibej wrote on Fri, 09 Nov 2018 12:56 +0100:
> >>>> On 09.11.2018 12:54, [hidden email] wrote:
> > The correct syntaxes for removing "./@" and "E/@", of course, are
> > «svn rm E/@@» and «svn rm @@» respectively.  
> >
> > I think that «svn rm E/@» should _not_ remove "E/@", for two reasons:
>
> I didn't say it should. I said it should error out just like 'svn rm @'
> does.
>

It wasn't clear to me what you were proposing.

It still isn't, actually.  Have I overlooked a commit or email where you
spelled out what the new algorithm would be?  None of the emails in this
thread spells out an algorithm.

> > Whatever we do, we must have a documented syntax that means "recursively
> > delete E/ and everything thereunder" and works regardless of what
> > filenames E/ contains.
>
> Yes, it's called 'svn rm E'.

That won't work in the general case:

% /bin/mkdir foo@bar foo@bar@ foo@bar/@
% svn add .
% svn ci
% <how do I 'svn rm' foo@bar now?>

Secondly, the incumbent escaping syntax is independent of the subcommand,
filename, and tree contents: to run «svn verb» on the file called
${foo} on disk, one runs «svn verb -- "${foo}@"», no exceptions.
I think this property should be preserved by the new escaping rules.
(Not necessarily this specific syntax, but the
independence/no-exceptions aspect.)

> > 2. In order to preserve the property that «svn ${subcommand} -- ${target}@»
> > is parsed in exactly the same way regardless of the values of
> > ${subcommand} and ${target} and regardless of the tree contents.
>
> I do think that this rule was not sufficiently thought out.
>
> > IIRC «svn up @» used to be the same as «svn up ./», so I think we
> > shouldn't make it mean "update the file './@'" because that would be
> > a silent incompatibility for some users.
>
> Again, I didn't say it should. I said it should be an error because
> './@' is ambiguous.

As a point of fact, it is not ambiguous.  "./@" means {path_or_url="./",
peg=""}.

Terminology aside, the problem you have identified here is that if
a user has a file called "@" and forgets to escape it, he doesn't get
a syntax error.  That we can address.

> >   I don't remember what the last
> > minor line with the non-error meaning was, though.
> >
> >> Because otherwise the behaviour of the client depends on whether you're
> >> removing things from the current directory or not.
> > To be more precise, the question is whether one spells the command
> > «svn rm @» or «svn rm ./@».
>
> or 'svn rm .@' or 'svn rm './.@'. I'd argue that premature
> canonicalization is a bad thing.
>
>

What change, if any, do you propose?

> >> This becomes especially error-prone in the case when
> >> E/@ exists:
> >>
> >> $ svn rm E/@
> >> D         E
> >> D         E/@
> >>
> >>
> >> I'm pretty sure we should not remove a parent directory if the child
> >> might have been the intended target.
> >>
> > I see that there's an argument that we should require some option here,
> > à la --force-log, but frankly, I don't think files literally called "@"
> > or "@HEAD" or "@{2018-01-01}" and so on are common enough to worry
> > about.
>
> Perhaps not, but then there's SVN-4530, so I'd say they're common enough.

Fair enough.

> >> I'm aware that doing this would change the meaning of some forms of
> >> commands, but for now I'm fairly sure we'd only produce new errors, not
> >> silently behave differently. From the current set of tests, it seems to
> >> follow that this inconsistency only arises with targets whose basename
> >> starts with '@' or '.@' (and probably '..@'), the latter because '.'
> >> (and '..') are removed by canonicalization.
> > I don't follow what you mean by "this"; do you refer to forbidding
> > slashes in pegrevs or to something else?  (The former wouldn't seem to
> > address all the concerns you described.)
> >
> > Cheers,
> >
> > Daniel
> >
> > P.S.  Other languages also have examples of usage errors that aren't
> > syntax errors.  For example, in C the tokens «+», «-», and «*» can be
> > either unary or binary operators, so «double hypotenuse(double a, double b)
> > { return sqrt(*a + b*b); }» is a syntax error but «return sqrt( + b*b)»
> > is not.
>
> I really don't see how this comparison is valid. C operator syntax and
> precedence is precisely defined. Our usage of @ in paths may be
> well-defined too, but its interaction with aggressive canonicalization
> creates unintentional side effects in the implementation.

The relevance of the example is that it's another case in which a usage
error (programmer forgot to paste «a*a» after the opening parenthesis)
is not a syntax error, in relation to your observation that using «svn
rm E/@» to remove "E/@" isn't a syntax error.
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1846237 - /subversion/trunk/subversion/tests/cmdline/pegrev_parse_tests.py

Branko Čibej
On 11.11.2018 10:54, Daniel Shahaf wrote:

> Branko Čibej wrote on Sun, Nov 11, 2018 at 07:47:18 +0100:
>> On 11.11.2018 07:29, Daniel Shahaf wrote:
>>> Branko Čibej wrote on Sat, Nov 10, 2018 at 15:12:48 +0100:
>>>> On 10.11.2018 01:31, Daniel Shahaf wrote:
>>>>> Branko Čibej wrote on Fri, 09 Nov 2018 12:56 +0100:
>>>>>> On 09.11.2018 12:54, [hidden email] wrote:
>>> The correct syntaxes for removing "./@" and "E/@", of course, are
>>> «svn rm E/@@» and «svn rm @@» respectively.  
>>>
>>> I think that «svn rm E/@» should _not_ remove "E/@", for two reasons:
>> I didn't say it should. I said it should error out just like 'svn rm @'
>> does.
>>
> It wasn't clear to me what you were proposing.
>
> It still isn't, actually.  Have I overlooked a commit or email where you
> spelled out what the new algorithm would be?  None of the emails in this
> thread spells out an algorithm.

That's because I don't know it yet ... since I can't describe precisely
what is wrong with the way we currently do things, but I do have the
feeling that we're doing something wrong. I was hoping this discussion
would clarify things.

>>> Whatever we do, we must have a documented syntax that means "recursively
>>> delete E/ and everything thereunder" and works regardless of what
>>> filenames E/ contains.
>> Yes, it's called 'svn rm E'.
> That won't work in the general case:
>
> % /bin/mkdir foo@bar foo@bar@ foo@bar/@
> % svn add .
> % svn ci
> % <how do I 'svn rm' foo@bar now?>
>
> Secondly, the incumbent escaping syntax is independent of the subcommand,
> filename, and tree contents: to run «svn verb» on the file called
> ${foo} on disk, one runs «svn verb -- "${foo}@"», no exceptions.
> I think this property should be preserved by the new escaping rules.
> (Not necessarily this specific syntax, but the
> independence/no-exceptions aspect.)

Agreed.

Given your example above, this is what works now:

  * to remove foo@bar: 'svn rm foo@bar@' or 'svn rm foo@bar/@'
  * to remove foo@bar@: 'svn rm foo@bar@@' or 'svn rm foo@bar@/@'
  * to remove foo@bar/@: 'svn rm foo@bar/@@' or 'svn rm foo@bar/@/@'

I /think/ I'm objecting to the fact that those extra directory
separators in the second variants have no effect ... but I'm not yet
sure what to do about it.


>>> 2. In order to preserve the property that «svn ${subcommand} -- ${target}@»
>>> is parsed in exactly the same way regardless of the values of
>>> ${subcommand} and ${target} and regardless of the tree contents.
>> I do think that this rule was not sufficiently thought out.
>>
>>> IIRC «svn up @» used to be the same as «svn up ./», so I think we
>>> shouldn't make it mean "update the file './@'" because that would be
>>> a silent incompatibility for some users.
>> Again, I didn't say it should. I said it should be an error because
>> './@' is ambiguous.
> As a point of fact, it is not ambiguous.  "./@" means {path_or_url="./",
> peg=""}.
>
> Terminology aside, the problem you have identified here is that if
> a user has a file called "@" and forgets to escape it, he doesn't get
> a syntax error.  That we can address.
>
>>>   I don't remember what the last
>>> minor line with the non-error meaning was, though.
>>>
>>>> Because otherwise the behaviour of the client depends on whether you're
>>>> removing things from the current directory or not.
>>> To be more precise, the question is whether one spells the command
>>> «svn rm @» or «svn rm ./@».
>> or 'svn rm .@' or 'svn rm './.@'. I'd argue that premature
>> canonicalization is a bad thing.
>>
>>
> What change, if any, do you propose?

As I said, I'm still trying to work this out. For example, one of the
things that's been driving me up the wall is that when the user writes
'foo/.@bar', the error message says a peg revision isn't allowed at
'foo@bar', regardless of whether 'foo/.@bar' exists. Yes, the syntax is
wrong, the user should have typed 'foo/.@bar@' instead, but surely we
can be smart enough to notice that instead of emitting an error about
something the user almost certainly didn't have in mind?


>>>> This becomes especially error-prone in the case when
>>>> E/@ exists:
>>>>
>>>> $ svn rm E/@
>>>> D         E
>>>> D         E/@
>>>>
>>>>
>>>> I'm pretty sure we should not remove a parent directory if the child
>>>> might have been the intended target.
>>>>
>>> I see that there's an argument that we should require some option here,
>>> à la --force-log, but frankly, I don't think files literally called "@"
>>> or "@HEAD" or "@{2018-01-01}" and so on are common enough to worry
>>> about.
>> Perhaps not, but then there's SVN-4530, so I'd say they're common enough.
> Fair enough.
>
>>>> I'm aware that doing this would change the meaning of some forms of
>>>> commands, but for now I'm fairly sure we'd only produce new errors, not
>>>> silently behave differently. From the current set of tests, it seems to
>>>> follow that this inconsistency only arises with targets whose basename
>>>> starts with '@' or '.@' (and probably '..@'), the latter because '.'
>>>> (and '..') are removed by canonicalization.
>>> I don't follow what you mean by "this"; do you refer to forbidding
>>> slashes in pegrevs or to something else?  (The former wouldn't seem to
>>> address all the concerns you described.)
>>>
>>> Cheers,
>>>
>>> Daniel
>>>
>>> P.S.  Other languages also have examples of usage errors that aren't
>>> syntax errors.  For example, in C the tokens «+», «-», and «*» can be
>>> either unary or binary operators, so «double hypotenuse(double a, double b)
>>> { return sqrt(*a + b*b); }» is a syntax error but «return sqrt( + b*b)»
>>> is not.
>> I really don't see how this comparison is valid. C operator syntax and
>> precedence is precisely defined. Our usage of @ in paths may be
>> well-defined too, but its interaction with aggressive canonicalization
>> creates unintentional side effects in the implementation.
> The relevance of the example is that it's another case in which a usage
> error (programmer forgot to paste «a*a» after the opening parenthesis)
> is not a syntax error, in relation to your observation that using «svn
> rm E/@» to remove "E/@" isn't a syntax error.

Ack.

-- Brane
Reply | Threaded
Open this post in threaded view
|

Re: svn commit: r1846237 - /subversion/trunk/subversion/tests/cmdline/pegrev_parse_tests.py

Daniel Shahaf-2
Branko Čibej wrote on Sun, Nov 11, 2018 at 12:02:14 +0100:

> On 11.11.2018 10:54, Daniel Shahaf wrote:
> > It wasn't clear to me what you were proposing.
> >
> > It still isn't, actually.  Have I overlooked a commit or email where you
> > spelled out what the new algorithm would be?  None of the emails in this
> > thread spells out an algorithm.
>
> That's because I don't know it yet ... since I can't describe precisely
> what is wrong with the way we currently do things, but I do have the
> feeling that we're doing something wrong. I was hoping this discussion
> would clarify things.
>

Makes sense.

> > What change, if any, do you propose?
>
> As I said, I'm still trying to work this out. For example, one of the
> things that's been driving me up the wall is that when the user writes
> 'foo/.@bar', the error message says a peg revision isn't allowed at
> 'foo@bar', regardless of whether 'foo/.@bar' exists. Yes, the syntax is
> wrong, the user should have typed 'foo/.@bar@' instead, but surely we
> can be smart enough to notice that instead of emitting an error about
> something the user almost certainly didn't have in mind?

This sounds like two separate issues.

1. When printing an error about a path not existing, and the path
contains an "@" that was parsed as a peg revision marker, stat() the
argument as typed, and if it exists add a hint to the error message.

2. Error messages contain «svn_dirent_local_style(svn_dirent_canonicalize(argv[N]))»
and do not contain argv[N] as typed.  Including the former is indeed
unusual, but it has its upsides: it shows the user how svn parsed the
argument.  For example, when a user intends to run «svn cat ./foo» but
accidentally runs «svn cat ../foo» instead, the error message might
actually be easier to understand with ../foo printed as an abspath
(which aspect is actually orthogonal to canonicalization…).

Cheers,

Daniel