Support character classes in glob authz rules

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Support character classes in glob authz rules

Branko Čibej
TL;DR:

https://issues.apache.org/jira/browse/SVN-4204
https://issues.apache.org/jira/browse/SVN-4795


The longer story:

Even though apr_fnmatch(), which we use for matching glob patterns in
authz rules, supports character classes ([A-Z] etc.), we can't use them
in the glob rules because of the way the config parser works. For
example, the following rule:

[:glob:/**/*.[Dd]oc]
* = rw


will be silently parsed to ":glob:/**/*.[Dd" and will match neither
'x.doc' nor 'x.Doc' but will match 'x.[Dd', which the user almost
certainly does not want. The reason for this is that our config parser,
which we still use for the syntactical part of parsing the authz files,
strictly follows the semantics of Python's ConfigParser and treats the
first ']' it finds as the section terminator, ignoring any remaining
characters to the end of the line.

The proposed patch changes this behaviour as follows:

  * the /last/ ']' in the line is the section terminator;
  * only whitespace is allowed after the terminator to the end of the line.

The proposed change in the parser is only enabled for parsing authz and
global group files, other Subversion configuration files will use the
current semantics.

Comments? Suggestions?

-- Brane

[[[
Change the authz file parser to allow character classes in glob rules.

* subversion/include/private/svn_config_private.h
  (svn_config__parse_stream): Add new parameter 'strict_sections'
   and update the docstring to describe what it does.

* subversion/include/private/svn_string_private.h
  (svn_stringbuf__strip_trailing_whitespace): New.

* subversion/libsvn_repos/authz_parse.c
  (svn_authz__parse): Invoke the new behaviour of svn_config__parse_stream.

* subversion/libsvn_subr/config.c
  (svn_config_parse): Use the old behaviour of svn_config__parse_stream.

* subversion/libsvn_subr/config_file.c: Inclulde private/svn_string_private.h.
  (parse_context_t): New member 'strict_sections'.
  (parse_section_name): Select how sections are parsed depending on the
   value of the 'strict_sections' flag. Update docstring.
  (svn_config__parse_file): Use the old behaviour of svn_config__parse_stream.
  (svn_config__parse_stream): Update implementation; store the new
   'strict_sections' flag in the parser context.

* subversion/libsvn_subr/string.c
  (svn_stringbuf__strip_trailing_whitespace): Implement; extracted from ...
  (svn_stringbuf_strip_whitespace): ... here.

Fixes: SVN-4204, SVN-4795
]]]


authz-glob-character-class.patch.txt (12K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Support character classes in glob authz rules

Julian Foad-5
Branko Čibej wrote:

> https://issues.apache.org/jira/browse/SVN-4204
> https://issues.apache.org/jira/browse/SVN-4795
>
> Even though apr_fnmatch(), which we use for matching glob patterns in
> authz rules, supports character classes ([A-Z] etc.), we can't use them
> in the glob rules because of the way the config parser works. For
> example, the following rule:
>
> [:glob:/**/*.[Dd]oc]
>
> will be silently parsed to ":glob:/**/*.[Dd" [because] our config parser
> strictly follows the semantics of Python's ConfigParser and treats the
> first ']' it finds as the section terminator, ignoring any remaining
> characters to the end of the line.
>
> The proposed patch changes this behaviour as follows:
>
>   * the /last/ ']' in the line is the section terminator;
>   * only whitespace is allowed after the terminator to the end of the line.

Glad to see a proposal.

It makes me uncomfortable to depart from standard parsing. What if users are relying on Python ConfigParser or other compatible parsing as part of their Subversion authz infrastructure?

First I wondered if anything bad could happen if there's a silent change in meaning where a user has written, let's say,

> [:glob:/**/secret1]    # was [:glob:/**/secret2]

It's hard to find any plausible example that would successfully parse and actually match something, but may be possible.

> The proposed change in the parser is only enabled for parsing authz and
> global group files, other Subversion configuration files will use the
> current semantics.

These sorts of quirks tend to make a system hard to maintain in the long run. What if we find another case where we'd like to depart from that parsing? How far would we go?

What alternative solution could we consider?

- Julian
Reply | Threaded
Open this post in threaded view
|

Re: Support character classes in glob authz rules

Stefan Sperling
On Mon, Dec 03, 2018 at 08:15:24AM +0000, Julian Foad wrote:
> Branko Čibej wrote:
> > The proposed change in the parser is only enabled for parsing authz and
> > global group files, other Subversion configuration files will use the
> > current semantics.
>
> These sorts of quirks tend to make a system hard to maintain in the long run. What if we find another case where we'd like to depart from that parsing? How far would we go?
>
> What alternative solution could we consider?

Some sort of escaping mechanism might work.
Reply | Threaded
Open this post in threaded view
|

Re: Support character classes in glob authz rules

Stefan Sperling
On Mon, Dec 03, 2018 at 09:30:53AM +0100, Stefan Sperling wrote:

> On Mon, Dec 03, 2018 at 08:15:24AM +0000, Julian Foad wrote:
> > Branko Čibej wrote:
> > > The proposed change in the parser is only enabled for parsing authz and
> > > global group files, other Subversion configuration files will use the
> > > current semantics.
> >
> > These sorts of quirks tend to make a system hard to maintain in the long run. What if we find another case where we'd like to depart from that parsing? How far would we go?
> >
> > What alternative solution could we consider?
>
> Some sort of escaping mechanism might work.

See also https://issues.apache.org/jira/browse/SVN-4784

It would be nice if a solution to the authz issues would address
that issue as well.
Reply | Threaded
Open this post in threaded view
|

Re: Support character classes in glob authz rules

Branko Čibej
In reply to this post by Julian Foad-5
On 03.12.2018 09:15, Julian Foad wrote:

> Branko Čibej wrote:
>> https://issues.apache.org/jira/browse/SVN-4204
>> https://issues.apache.org/jira/browse/SVN-4795
>>
>> Even though apr_fnmatch(), which we use for matching glob patterns in
>> authz rules, supports character classes ([A-Z] etc.), we can't use them
>> in the glob rules because of the way the config parser works. For
>> example, the following rule:
>>
>> [:glob:/**/*.[Dd]oc]
>>
>> will be silently parsed to ":glob:/**/*.[Dd" [because] our config parser
>> strictly follows the semantics of Python's ConfigParser and treats the
>> first ']' it finds as the section terminator, ignoring any remaining
>> characters to the end of the line.
>>
>> The proposed patch changes this behaviour as follows:
>>
>>   * the /last/ ']' in the line is the section terminator;
>>   * only whitespace is allowed after the terminator to the end of the line.
> Glad to see a proposal.
>
> It makes me uncomfortable to depart from standard parsing. What if users are relying on Python ConfigParser or other compatible parsing as part of their Subversion authz infrastructure?

Then they're not using character classes in glob patterns.

The problem is that ConfigParser does not define an escaping method for
the ], so we can't even adopt that in our parser — and if we did, then
existing rules could silently change their meaning in really nasty ways.

> First I wondered if anything bad could happen if there's a silent change in meaning where a user has written, let's say,
>
>> [:glob:/**/secret1]    # was [:glob:/**/secret2]

Yes, this is the sort of case where the meaning would change.

> It's hard to find any plausible example that would successfully parse and actually match something, but may be possible.

If one stretches "plausible" far enough ...

>> The proposed change in the parser is only enabled for parsing authz and
>> global group files, other Subversion configuration files will use the
>> current semantics.
> These sorts of quirks tend to make a system hard to maintain in the long run. What if we find another case where we'd like to depart from that parsing? How far would we go?
>
> What alternative solution could we consider?

I can't think of a solution that would not depart from ConfigParser
semantics in one way or another. As for "the system" ... well, the fact
that config and authz parsing shares common code is really just an
accident, and certainly an implementation detail.

-- Brane

Reply | Threaded
Open this post in threaded view
|

Re: Support character classes in glob authz rules

Branko Čibej
In reply to this post by Stefan Sperling
On 03.12.2018 09:39, Stefan Sperling wrote:

> On Mon, Dec 03, 2018 at 09:30:53AM +0100, Stefan Sperling wrote:
>> On Mon, Dec 03, 2018 at 08:15:24AM +0000, Julian Foad wrote:
>>> Branko Čibej wrote:
>>>> The proposed change in the parser is only enabled for parsing authz and
>>>> global group files, other Subversion configuration files will use the
>>>> current semantics.
>>> These sorts of quirks tend to make a system hard to maintain in the long run. What if we find another case where we'd like to depart from that parsing? How far would we go?
>>>
>>> What alternative solution could we consider?
>> Some sort of escaping mechanism might work.
> See also https://issues.apache.org/jira/browse/SVN-4784
>
> It would be nice if a solution to the authz issues would address
> that issue as well.

"Some sort of excaping" also silently changes existing rules and, worse,
does it in far more subtle ways than my proposed change.

SVN-4784 is a different issue and should be solved in the auto-props
code; I added a proposal in Jira but haven't had time to bring it to
this list.

-- Brane
Reply | Threaded
Open this post in threaded view
|

Re: Support character classes in glob authz rules

Julian Foad-5
In reply to this post by Branko Čibej
Branko Čibej wrote:
> I can't think of a solution that would not depart from ConfigParser
> semantics in one way or another. As for "the system" ... well, the fact
> that config and authz parsing shares common code is really just an
> accident, and certainly an implementation detail.

OK, that's fair. Your solution does seem to be a backward-compatible way that practically should work well to cope with this case.

I was going to say "this new case", but the square-bracket problem is not new with glob rules: any filename can have square brackets in it, on Unix.

If we treat the authz format as its own format, that does make me wonder if there are any other limitations the current ConfigParser format is applying, that should also be lifted.

For example, if "@harry" is parsed as a reference to group name "harry", then is there also a way to specify a username that is literally "@harry"?

In general, is there a programmatic transformation from arbitrary valid user names and paths to corresponding authz entries?

--
- Julian
Reply | Threaded
Open this post in threaded view
|

Re: Support character classes in glob authz rules

Branko Čibej
On 03.12.2018 10:39, Julian Foad wrote:
> Branko Čibej wrote:
>> I can't think of a solution that would not depart from ConfigParser
>> semantics in one way or another. As for "the system" ... well, the fact
>> that config and authz parsing shares common code is really just an
>> accident, and certainly an implementation detail.
> OK, that's fair. Your solution does seem to be a backward-compatible way that practically should work well to cope with this case.
>
> I was going to say "this new case", but the square-bracket problem is not new with glob rules: any filename can have square brackets in it, on Unix.


Yes, the original issue #4204 predates glob rules.


> If we treat the authz format as its own format, that does make me wonder if there are any other limitations the current ConfigParser format is applying, that should also be lifted.
>
> For example, if "@harry" is parsed as a reference to group name "harry", then is there also a way to specify a username that is literally "@harry"?


Well, first of all, this has nothing to do with the ConfigParser, but
with how our authz infrastructure interprets the access entry selectors.
And no, there's no way to do this.


> In general, is there a programmatic transformation from arbitrary valid user names and paths to corresponding authz entries?

Define "arbitrary valid" and I'll answer that. :)


We've always had the following restictions: usernames can't begin with
'@' or '&' or '~', and similar limitations apply to group and alias
names. Such identifiers have always been rejected at one stage or
another, and this has not been a problem.

Paths in rules are interpreted literally (either as literal paths or as
literal glob patterns), without any excaping mechanism. This hasn't
changed since the inception of authz rules, either. The only really
problematic case is having ']' in the rule, and with glob patterns, this
becomes more likely (or desired).

-- Brane
Reply | Threaded
Open this post in threaded view
|

Re: Support character classes in glob authz rules

Julian Foad-5
Branko Čibej wrote:
>> If we treat the authz format as its own format, that does make me wonder if there are any other limitations the current ConfigParser format is applying, that should also be lifted.
>>
>> For example, if "@harry" is parsed as a reference to group name "harry", then is there also a way to specify a username that is literally "@harry"?
>
> Well, first of all, this has nothing to do with the ConfigParser, but
> with how our authz infrastructure interprets the access entry selectors.

Accepted.

> And no, there's no way to do this.
> > In general, is there a programmatic transformation from arbitrary valid user names and paths to corresponding authz entries?
>
> Define "arbitrary valid" and I'll answer that. :)

Ones that Subversion otherwise accepts, apart from in this context.

> We've always had the following restictions: usernames can't begin with
> '@' or '&' or '~',

There must be also limitations with '#', '*', '=' and whitespace characters, and I don't see a way to find a complete list.

> and similar limitations apply to group and alias
> names. Such identifiers have always been rejected at one stage or
> another, and this has not been a problem.

We haven't heard vocal complaints. It could well have caused headaches for those implementing authz in their systems. (Not because they need to use such usernames, but because they need to defensively program against an incompletely known set of problems.)

Would I be totally off the mark in suggesting an enhancement request for an authz file format that addresses these issues? It seems to me this is the sort of thing "enterprise" users would value.

--
- Julian
Reply | Threaded
Open this post in threaded view
|

Re: Support character classes in glob authz rules

Michael Pilato
In reply to this post by Julian Foad-5
On 12/3/18 3:15 AM, Julian Foad wrote:
> It makes me uncomfortable to depart from standard parsing. What if
> users are relying on Python ConfigParser or other compatible parsing
> as part of their Subversion authz infrastructure?

We needn't keep this hypothetical.  ViewVC is using (a slightly
modified[*]) Python ConfigParser in this way.

-- Mike

[*] By default, ConfigParser (well, really the RawConfigParser it
subclasses) lowercases option names, which can cause username/group
matching to fail.  So ViewVC's code replaces the optionxform method of
ConfigParser with a noop lambda function.  (See
https://docs.python.org/2/library/configparser.html#ConfigParser.RawConfigParser.optionxform
for official docs.)
Reply | Threaded
Open this post in threaded view
|

Re: Support character classes in glob authz rules

Branko Čibej
In reply to this post by Julian Foad-5
On 03.12.2018 11:42, Julian Foad wrote:

> Branko Čibej wrote:
>>> If we treat the authz format as its own format, that does make me wonder if there are any other limitations the current ConfigParser format is applying, that should also be lifted.
>>>
>>> For example, if "@harry" is parsed as a reference to group name "harry", then is there also a way to specify a username that is literally "@harry"?
>> Well, first of all, this has nothing to do with the ConfigParser, but
>> with how our authz infrastructure interprets the access entry selectors.
> Accepted.
>
>> And no, there's no way to do this.
>>> In general, is there a programmatic transformation from arbitrary valid user names and paths to corresponding authz entries?
>> Define "arbitrary valid" and I'll answer that. :)
> Ones that Subversion otherwise accepts, apart from in this context.
>
>> We've always had the following restictions: usernames can't begin with
>> '@' or '&' or '~',
> There must be also limitations with '#', '*', '=' and whitespace characters, and I don't see a way to find a complete list.

True. Also '$' since we have the magic '$authenticated' and '$anonymous'
tokens.

The rules are not explicitly documented anywhere, but they're implied by
the documentation in The Book. With a bit of luck, I'll have these
spelled out on the wiki some day.

>> and similar limitations apply to group and alias
>> names. Such identifiers have always been rejected at one stage or
>> another, and this has not been a problem.
> We haven't heard vocal complaints. It could well have caused headaches for those implementing authz in their systems. (Not because they need to use such usernames, but because they need to defensively program against an incompletely known set of problems.)

Well, realistically, user identifiers are likely to take one of the
following forms, with reasonable variations:

  * usernme
  * name.surname
  * [hidden email]
  * CN=Name Surname,OU=marketing,DC=example,DC=com


While some authentication systems (pam_unix FTW) theoretically allow
funny usernames with leading '@', '&', '$' etc., such names are very,
very unlikely to be used in practice. Our authz parser will accept any
of the forms I listed above.

(The Distinguished Name has to be mapped through an alias because it
containe '=', this is documented in The Book.)

> Would I be totally off the mark in suggesting an enhancement request for an authz file format that addresses these issues? It seems to me this is the sort of thing "enterprise" users would value.

I don't think it's necessary. And until and unless we get an actual bug
report, I'd leave well enough alone. Inventing some escaping mechanism
post-hoc always leads to strange side effects, making some currently
valid identifiers suddenly invalid or interpreted differently.

-- Brane

P.S.: Users can't have ':' in the name of a repository for
repository-specific rules, either; this, too have never been a problem.

Reply | Threaded
Open this post in threaded view
|

Re: Support character classes in glob authz rules

Branko Čibej
In reply to this post by Michael Pilato
On 03.12.2018 16:07, Michael Pilato wrote:

> On 12/3/18 3:15 AM, Julian Foad wrote:
>> It makes me uncomfortable to depart from standard parsing. What if
>> users are relying on Python ConfigParser or other compatible parsing
>> as part of their Subversion authz infrastructure?
> We needn't keep this hypothetical.  ViewVC is using (a slightly
> modified[*]) Python ConfigParser in this way.
>
> -- Mike
>
> [*] By default, ConfigParser (well, really the RawConfigParser it
> subclasses) lowercases option names, which can cause username/group
> matching to fail.  So ViewVC's code replaces the optionxform method of
> ConfigParser with a noop lambda function.  (See
> https://docs.python.org/2/library/configparser.html#ConfigParser.RawConfigParser.optionxform
> for official docs.)


Interesting. Of course, our ConfigParser-like implementation already has
the option to treat section names as case-sensitive, and this option is
used ... that's righjt, by the authz file parser.

-- Brane

Reply | Threaded
Open this post in threaded view
|

Re: Support character classes in glob authz rules

Doug Robinson
Brane:

I just read through this thread.  Your proposal makes a lot of sense.
To me, it's one of those things that should go into a new version (not a patch).
And there should be a nit-picky script to point out "strange stuff" (like the example
that Julian posed).

Cheers.

Doug

On Mon, Dec 3, 2018 at 10:31 AM Branko Čibej <[hidden email]> wrote:
On 03.12.2018 16:07, Michael Pilato wrote:
> On 12/3/18 3:15 AM, Julian Foad wrote:
>> It makes me uncomfortable to depart from standard parsing. What if
>> users are relying on Python ConfigParser or other compatible parsing
>> as part of their Subversion authz infrastructure?
> We needn't keep this hypothetical.  ViewVC is using (a slightly
> modified[*]) Python ConfigParser in this way.
>
> -- Mike
>
> [*] By default, ConfigParser (well, really the RawConfigParser it
> subclasses) lowercases option names, which can cause username/group
> matching to fail.  So ViewVC's code replaces the optionxform method of
> ConfigParser with a noop lambda function.  (See
> https://docs.python.org/2/library/configparser.html#ConfigParser.RawConfigParser.optionxform
> for official docs.)


Interesting. Of course, our ConfigParser-like implementation already has
the option to treat section names as case-sensitive, and this option is
used ... that's righjt, by the authz file parser.

-- Brane



--
DOUGLAS B ROBINSON SENIOR PRODUCT MANAGER

The LIVE DATA Company
Find out more wandisco.com

THIS MESSAGE AND ANY ATTACHMENTS ARE CONFIDENTIAL, PROPRIETARY AND MAY BE PRIVILEGED

If this message was misdirected, WANdisco, Inc. and its subsidiaries, ("WANdisco") does not waive any confidentiality or privilege. If you are not the intended recipient, please notify us immediately and destroy the message without disclosing its contents to anyone. Any distribution, use or copying of this email or the information it contains by other than an intended recipient is unauthorized. The views and opinions expressed in this email message are the author's own and may not reflect the views and opinions of WANdisco, unless the author is authorized by WANdisco to express such views or opinions on its behalf. All email sent to or from this address is subject to electronic storage and review by WANdisco. Although WANdisco operates anti-virus programs, it does not accept responsibility for any damage whatsoever caused by viruses being passed.