Python 3 Bindings Query

Previous Topic Next Topic
 
classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

Python 3 Bindings Query

Troy Curtis Jr
Howdy guys,

It has been quite some time since I was last on here, but as I was looking through Fedora's Python 3 porting status list [1] I saw the familiar Subversion project showed as not yet supporting Python 3.  So I was wondering if you are interested in having me take a shot at getting it working.  For a really small effort, I would have just implemented it and then see if you would take the patch.  However, after spending a bit of time on it this afternoon, I realized it will take a bit more time to get right.  So I wanted to make sure the dev team was interesting in taking in such a patch before I bothered to finish it out.

I've looked around a bit, and other than a note about getting the top-level tests running under Python 3, I haven't seen much mention about implementing this.  I did notice there is a separate pysvn project, which lists Python 3 support, so I specifically wanted to make sure that the built-in python bindings are anticipated to be a supported feature going forward.  I also wanted to know of any partial efforts that might have already been started, or if there were discussions related to the implementation that my searches did not turn up.

Thanks,
Troy Curtis

1: http://fedora.portingdb.xyz/
Reply | Threaded
Open this post in threaded view
|

Re: Python 3 Bindings Query

Daniel Shahaf-2
Troy Curtis Jr wrote on Mon, 16 Oct 2017 01:00 +0000:
> Howdy guys,
>

Welcome back!

> It has been quite some time since I was last on here, but as I was looking

Almost a decade, according to contribulyzer. :-)

> through Fedora's Python 3 porting status list [1] I saw the familiar
> Subversion project showed as not yet supporting Python 3.  So I was
> wondering if you are interested in having me take a shot at getting it
> working.  For a really small effort, I would have just implemented it and
> then see if you would take the patch.  However, after spending a bit of
> time on it this afternoon, I realized it will take a bit more time to get
> right.  So I wanted to make sure the dev team was interesting in taking in
> such a patch before I bothered to finish it out.
>

Thanks for asking.  In general, we'd like to support Python 3.  In 1.9
we dropped Python 2.6 support in order to make it easier to add Python
3.x support.

> I've looked around a bit, and other than a note about getting the top-level
> tests running under Python 3, I haven't seen much mention about
> implementing this.  I did notice there is a separate pysvn project, which
> lists Python 3 support, so I specifically wanted to make sure that the
> built-in python bindings are anticipated to be a supported feature going
> forward.

There hasn't been any suggestion to deprecate swig-py.  Moreover, they
are our favourite bindings for tools/ scripts, so I don't anticipate
them to be deprecated, either.

That said, the bindings see few changes nowadays, and we have always had
few swig-savvy devs around; so any help would be most welcome.

> I also wanted to know of any partial efforts that might have
> already been started, or if there were discussions related to the
> implementation that my searches did not turn up.

There are several separate uses of Python in the source tree.  I recall
patches to build/, tools/, and subversion/tests/cmdline/ that improve
3.x compatibility, but I don't recall any such changes to the bindings.
Note that we have both SWIG bindings at subversion/bindings/swig/python/
and ctypes bindings at subversion/bindings/ctypes-python/.

I take it that of all these, you're interested in the swig-py bindings?
Or are the build system and test suite also within your scope?

You said the patch was going to be a largeish one.  How large/invasive
are we talking about?  (This affects how easy it'd be for us to
review/apply it)

Cheers,

Daniel
Reply | Threaded
Open this post in threaded view
|

Re: Python 3 Bindings Query

James McCoy-3
On Mon, Oct 16, 2017 at 08:36:51PM +0000, Daniel Shahaf wrote:

> Troy Curtis Jr wrote on Mon, 16 Oct 2017 01:00 +0000:
> > through Fedora's Python 3 porting status list [1] I saw the familiar
> > Subversion project showed as not yet supporting Python 3.  So I was
> > wondering if you are interested in having me take a shot at getting it
> > working.  For a really small effort, I would have just implemented it and
> > then see if you would take the patch.  However, after spending a bit of
> > time on it this afternoon, I realized it will take a bit more time to get
> > right.  So I wanted to make sure the dev team was interesting in taking in
> > such a patch before I bothered to finish it out.
> >
>
> Thanks for asking.  In general, we'd like to support Python 3.  In 1.9
> we dropped Python 2.6 support in order to make it easier to add Python
> 3.x support.

Although, that was just in the build/test infrastructure, not the actual
bindings.

> > I also wanted to know of any partial efforts that might have
> > already been started, or if there were discussions related to the
> > implementation that my searches did not turn up.
>
> There are several separate uses of Python in the source tree.  I recall
> patches to build/, tools/, and subversion/tests/cmdline/ that improve
> 3.x compatibility, but I don't recall any such changes to the bindings.
> Note that we have both SWIG bindings at subversion/bindings/swig/python/
> and ctypes bindings at subversion/bindings/ctypes-python/.

It looks like the last attempt (from 8 years ago) to make the bindings
3.x compatible still lives at ^/subversion/branches/python-3-compatibility.

Cheers,
--
James
GPG Key: 4096R/91BF BF4D 6956 BD5D F7B7  2D23 DFE6 91AE 331B A3DB
Reply | Threaded
Open this post in threaded view
|

Re: Python 3 Bindings Query

Troy Curtis Jr
In reply to this post by Daniel Shahaf-2


On Mon, Oct 16, 2017 at 3:36 PM Daniel Shahaf <[hidden email]> wrote:

> It has been quite some time since I was last on here, but as I was looking

Almost a decade, according to contribulyzer. :-)

Yeah, you know it has been a while when your feature shows up in RHEL..2 releases ago (RHEL6)!
 

There hasn't been any suggestion to deprecate swig-py.  Moreover, they
are our favourite bindings for tools/ scripts, so I don't anticipate
them to be deprecated, either.

That said, the bindings see few changes nowadays, and we have always had
few swig-savvy devs around; so any help would be most welcome.

Perfect, that's great to hear.  This is what I assumed, but I wanted to be sure.
 
> I also wanted to know of any partial efforts that might have
> already been started, or if there were discussions related to the
> implementation that my searches did not turn up.

There are several separate uses of Python in the source tree.  I recall
patches to build/, tools/, and subversion/tests/cmdline/ that improve
3.x compatibility, but I don't recall any such changes to the bindings.
Note that we have both SWIG bindings at subversion/bindings/swig/python/
and ctypes bindings at subversion/bindings/ctypes-python/.

I take it that of all these, you're interested in the swig-py bindings?
Or are the build system and test suite also within your scope?

 
Well I certainly don't want to half way do it, so while my initial target was the swig bindings, I'll take a look at the full set.  

You said the patch was going to be a largeish one.  How large/invasive
are we talking about?  (This affects how easy it'd be for us to
review/apply it)

Yes, I suspect the effort will touch a fair number of code, but much of it will be direct substitution, and thus should be reasonable to review.  Plus, it'll certainly be broken up into smaller, more manageable commits.  I'll also be planning to simultaneously build python3 and python2 bindings, since I'll be eyeing future Fedora (and presumably other distro) packaging, which will touch a few front end build pieces.

In fact, to get the conversation going I've attached a patch which gives a sense of the road ahead.  This is where I got to yesterday before deciding that I should probably start talking to the dev team about desires and direction.  I believe that it should consist mostly of replacing various functions deprecated/removed in Py3 with wrappers to consolidate all the conditional compiling into a common location.  Then substituting the use of those functions.

My initial assessment is there there are really only a handful of the deprecated functions in use by the current subversion python bindings.  However, it may make sense to use the py3c project [1], which already provides most (all?) of the necessary shims and compatibility functions.  However, I'm not sure whether this (header-only) dependency is something you really want to pick up or not.  As I don't think there is a wide variety of functions that need wrappers, it may not be worth the new dep.

There is one fairly big decision to be made. 

As you may or may not know, one of the primary user facing changes of Python 3 is the migration of the "string" type to "unicode" types.  Previously in python 2 the "string" type basically represented a sequence of bytes, and when you printed it or did a few other manipulations you assumed/hoped/prayed it was actually in a valid encoding (typically assumed to be ASCII/latin-1/uft8).  Now in Python 3 basically all I/O operation result in a more accurate 'bytes' object, which you then 'decode' into a "unicode" object, explicitly indicating the encoding format. 

In practice, you can almost always use 'utf8', but of course that is 'almost'.  I'm sure there are scenarios when you know that you have some other encoding coming in or going out that you would need to use some other specified encoding, but I suspect that is quite rare.  I believe for this code-set, which will target being compatible with both Py2 and Py3 at the same time, it is perfectly reasonable to assume 'utf8'.  Indeed, this is the same approach the aforementioned py3c project took [2].

So I believe (and it seems the py3c developer agrees) that making the assumption of utf8 encoding is reasonable, especially given that the current py2 is likely making that same same assumption implicitly (though ASCII/latin-1 might be closer to the truth in many py2 scenarios).  There is actually a Py2 unicode object that could be an option to convert to, but the level of changes trying to move over to that for Py2 coupled with the pretty large differences in the necessary unicode conversions, would mean a lot more code churn, for likely little gain.

The questions to decide on are:
1. Are you generally comfortable with the build changes necessary to (optionally) build Py2 and Py3 bindings at the same time?
2. Do you want to pick up 'py3c' as a new dep, or implement the handful of necessary wrappers?
3. Is the assumption of utf8 encoding sufficiently reasonable?
 
My general plan of attack will be:
1. Replace deprecated Py2 swig functions and syntax with Py2/Py3 cross-compatible versions or wrappers.
2. Ensure existing full Py2 support works correctly.
3. Update build to build both, Py2 and Py3, and get Py3 working.
4. Look at the remainder of the python usage to ensure Py3 compliance.

Cheers,

Daniel

That's all I have for now,
Troy Curtis, Jr

example-subversion-py3.patch (17K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Python 3 Bindings Query

Daniel Shahaf-2
Troy Curtis Jr wrote on Tue, Oct 17, 2017 at 03:33:29 +0000:
> On Mon, Oct 16, 2017 at 3:36 PM Daniel Shahaf <[hidden email]>
> wrote:
> > I take it that of all these, you're interested in the swig-py bindings?
> > Or are the build system and test suite also within your scope?
> >
> >
> Well I certainly don't want to half way do it, so while my initial target
> was the swig bindings, I'll take a look at the full set.

Wonderful.

> > You said the patch was going to be a largeish one.  How large/invasive
> > are we talking about?  (This affects how easy it'd be for us to
> > review/apply it)
> >
>
> Yes, I suspect the effort will touch a fair number of code, but much of it
> will be direct substitution, and thus should be reasonable to review.
> Plus, it'll certainly be broken up into smaller, more manageable commits.
> I'll also be planning to simultaneously build python3 and python2 bindings,
> since I'll be eyeing future Fedora (and presumably other distro) packaging,
> which will touch a few front end build pieces.
>
> In fact, to get the conversation going I've attached a patch which gives a
> sense of the road ahead.  This is where I got to yesterday before deciding
Thanks for posting that.  Is that patch ready for commit, or is it
intended just as a proof of concept / basis for discussions?

> that I should probably start talking to the dev team about desires and
> direction.  I believe that it should consist mostly of replacing various
> functions deprecated/removed in Py3 with wrappers to consolidate all the
> conditional compiling into a common location.  Then substituting the use of
> those functions.
>

Makes sense.

> My initial assessment is there there are really only a handful of the

[snip long & detailed analysis that was helpfully summarised to:]

> The questions to decide on are:
> 1. Are you generally comfortable with the build changes necessary to
> (optionally) build Py2 and Py3 bindings at the same time?

Do you mean that a single 'make install' would install two versions of
libsvn_swig_py.so, one compiled against CPython 2 and one compiled
against CPython 3?

Or do you mean, having a (say) configure-time knob that controls whether
bindings would be built against py2 or against py3?

I think the latter would be uncontroversial.  As to the former, it would
involve more moving pieces, so I would be concerned about maintenance
overhead of the new build system code.  (We have few active developers
nowadays.)

> 2. Do you want to pick up 'py3c' as a new dep, or implement the handful of
> necessary wrappers?

I don't know the swig-py bindings well enough to answer that.  I guess
it depends on how much a "handful" would be.

Datapoints: We use 73 distinct CPython API symbols (see attached grep
results).  Of these, only three(?) entry points (all of them PyInt_*)
appear in py3c/compat.h.

> 3. Is the assumption of utf8 encoding sufficiently reasonable?

In Subversion, we assume that argv and stdout are encoded in APR's
"system encoding", but basically everything else — including nearly
every single string parameter to every public API — is required to be in
UTF-8.  svn_cmdline.h contains some of the exceptions.

I'm not sure whether that information answers your question, though.  In
what concrete cases do the bindings have to convert between str and
bytes?  What are the compatibility considerations for user code
(consumers of the swig-py2 bindings that want to upgrade to py3)?

> My general plan of attack will be:
> 1. Replace deprecated Py2 swig functions and syntax with Py2/Py3
> cross-compatible versions or wrappers.
> 2. Ensure existing full Py2 support works correctly.
> 3. Update build to build both, Py2 and Py3, and get Py3 working.
> 4. Look at the remainder of the python usage to ensure Py3 compliance.

At some point we'll want to switch one of the buildbot builders to
use python3.  (See http://subversion.apache.org/buildbot/all)

Could you explain how you — and we — test patches written in step #1,
given that "update the build to support py3" only comes later, in #3?
Is it currently possible to build the bindings against py3 (well, to
_run_ the build against py3 and get compiler errors about missing
symbols)?

Thanks for the very detailed email!

Daniel

P.S. Feel free to join #svn-dev on freenode.

swig-py-uses-of-Py-or-PY.grep (1K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: Python 3 Bindings Query

Troy Curtis Jr



Thanks for posting that.  Is that patch ready for commit, or is it
intended just as a proof of concept / basis for discussions?

Definitely not ready for commit, it is the WIP of what I was hacking on Sunday.  I was just getting a feel for the kinds of things involved, and attached it to give you a sense of what I was talking about.  I even have a FIXME note in there where I *think* there may be a potential use after free issue.  We may be getting away with it because of the delay of the python garbage collector, but I plan on taking a harder look at that one before reaching a conclusion.

> The questions to decide on are:
> 1. Are you generally comfortable with the build changes necessary to
> (optionally) build Py2 and Py3 bindings at the same time?

Do you mean that a single 'make install' would install two versions of
libsvn_swig_py.so, one compiled against CPython 2 and one compiled
against CPython 3?

Or do you mean, having a (say) configure-time knob that controls whether
bindings would be built against py2 or against py3?

I think the latter would be uncontroversial.  As to the former, it would
involve more moving pieces, so I would be concerned about maintenance
overhead of the new build system code.  (We have few active developers
nowadays.)


Right, I mean something like having "--enable-swig-py2" and "--enable-swig-py3" so that Py2 and Py3 could be enabled/disabled independently, with the option of both being built in the same build.  I'll admit it is specifically with a mind towards packaging, since many (all?) distros include bindings for both versions if possible.  I knew that might be a stretch, which is why I asked ;)  I'll plan on implementing the straight forward either-or method first, and putting the dual-compile option in a separate commit that can be dropped if desired.  Oh and I won't forget about the ctypes implementation either.

One point in favor of doing both.  While Py3 is expected to be used for all new projects, and upstream support of Py2 is dropping in 2020 [1], still many existing project are still on Py2 (like for instance Trac [2]).  So in practice, I'd venture to say that both would typically need to be built in any case.
 
> 2. Do you want to pick up 'py3c' as a new dep, or implement the handful of
> necessary wrappers?

I don't know the swig-py bindings well enough to answer that.  I guess
it depends on how much a "handful" would be.

Datapoints: We use 73 distinct CPython API symbols (see attached grep
results).  Of these, only three(?) entry points (all of them PyInt_*)
appear in py3c/compat.h.


I'll default to not pulling it in, I don't think it is really needed.  But I try to default to resisting the "Not Invented Here" tendency.  However, adding a new dep, especially for so little gain, probably doesn't make sense IMHO.  The converse is that while some of the calls might not be currently used, later additions might have to add more and more compat functions, approaching what is provided in py3c.

Note also the PyString_* variants would go to PyStr_* in py3c (or svn_swig_py_*_string*() like in my WIP patch), as well as the PyFile_* calls. 
 
> 3. Is the assumption of utf8 encoding sufficiently reasonable?

In Subversion, we assume that argv and stdout are encoded in APR's
"system encoding", but basically everything else — including nearly
every single string parameter to every public API — is required to be in
UTF-8.  svn_cmdline.h contains some of the exceptions.

I'm not sure whether that information answers your question, though.  In
what concrete cases do the bindings have to convert between str and
bytes?  What are the compatibility considerations for user code
(consumers of the swig-py2 bindings that want to upgrade to py3)?


It answers it well enough.  The implicit assumption of string handling in py2 is one of the things py3 set out to address. Since currently py2 are really making assumptions already, going from 2->3 using utf8 should be reasonable IMHO.
 
> My general plan of attack will be:
> 1. Replace deprecated Py2 swig functions and syntax with Py2/Py3
> cross-compatible versions or wrappers.
> 2. Ensure existing full Py2 support works correctly.
> 3. Update build to build both, Py2 and Py3, and get Py3 working.
> 4. Look at the remainder of the python usage to ensure Py3 compliance.

At some point we'll want to switch one of the buildbot builders to
use python3.  (See http://subversion.apache.org/buildbot/all)

Could you explain how you — and we — test patches written in step #1,
given that "update the build to support py3" only comes later, in #3?
Is it currently possible to build the bindings against py3 (well, to
_run_ the build against py3 and get compiler errors about missing
symbols)?


My main concern on step #1 is the "not breaking existing py2 bindings", laying the ground work for the py3 build changes.  I'll probably send all the patches at the same time, but if they can stand alone, hopefully it'll make review easier.
 
Thanks for the very detailed email!


Err, yeah.  My '--verbose' can't really be disabled... :P


Reply | Threaded
Open this post in threaded view
|

Re: Python 3 Bindings Query

Julian Foad-5
Troy Curtis Jr wrote:

>>> 2. Do you want to pick up 'py3c' as a new dep, or implement the
>>> handful of necessary wrappers?
>>
>> I don't know the swig-py bindings well enough to answer that.  I guess
>> it depends on how much a "handful" would be. [...]
>
> I'll default to not pulling it in, I don't think it is really needed.  
> But I try to default to resisting the "Not Invented Here" tendency.  
> However, adding a new dep, especially for so little gain, probably
> doesn't make sense IMHO.  The converse is that while some of the calls
> might not be currently used, later additions might [...]

My two cents, from the sidelines: use py3c's solutions; I don't mind how.

I second your "resist NIH" attitude. I have little experience of Swig
and none of py3c, but assuming py3c is good I would like us to default
to using py3c solutions, so that our results are more likely to be easy
to integrate with other code and more likely to be familiar to another
Python programmer, use techniques known to have been tested in other
projects, and so on.

How you get the py3c definitions into Subversion -- copying just the
definitions we currently will use, or the whole set of headers, or even
picking up a system-installed version -- I don't mind. Do what seems
best to you.

Hopefully that's more or less what you were thinking anyway; I just
wasn't quite clear what you meant by "adding a new dep... probably
doesn't make sense".

- Julian
Reply | Threaded
Open this post in threaded view
|

Re: Python 3 Bindings Query

Troy Curtis Jr


On Wed, Oct 18, 2017, 4:11 AM Julian Foad <[hidden email]> wrote:
Troy Curtis Jr wrote:
>>> 2. Do you want to pick up 'py3c' as a new dep, or implement the
>>> handful of necessary wrappers?
>>
>> I don't know the swig-py bindings well enough to answer that.  I guess
>> it depends on how much a "handful" would be. [...]
>
> I'll default to not pulling it in, I don't think it is really needed.
> But I try to default to resisting the "Not Invented Here" tendency.
> However, adding a new dep, especially for so little gain, probably
> doesn't make sense IMHO.  The converse is that while some of the calls
> might not be currently used, later additions might [...]

My two cents, from the sidelines: use py3c's solutions; I don't mind how.

I second your "resist NIH" attitude. I have little experience of Swig
and none of py3c, but assuming py3c is good I would like us to default
to using py3c solutions, so that our results are more likely to be easy
to integrate with other code and more likely to be familiar to another
Python programmer, use techniques known to have been tested in other
projects, and so on.

How you get the py3c definitions into Subversion -- copying just the
definitions we currently will use, or the whole set of headers, or even
picking up a system-installed version -- I don't mind. Do what seems
best to you.

Hopefully that's more or less what you were thinking anyway; I just
wasn't quite clear what you meant by "adding a new dep... probably
doesn't make sense".

- Julian

You weren't clear because I kept waffling back and forth about which direction I think it should go. Sorry for the indecision :) 

It is a header only library so it is actually only a build time dep, and appears to be readily available in various Linux distro repositories. I'll take a look at how you are handling deps and start my efforts there. 

Troy
Reply | Threaded
Open this post in threaded view
|

Re: Python 3 Bindings Query

Daniel Shahaf-2
In reply to this post by Troy Curtis Jr
Troy Curtis Jr wrote on Wed, Oct 18, 2017 at 03:49:57 +0000:

> > Thanks for posting that.  Is that patch ready for commit, or is it
> > intended just as a proof of concept / basis for discussions?
> >
>
> Definitely not ready for commit, it is the WIP of what I was hacking on
> Sunday.  I was just getting a feel for the kinds of things involved, and
> attached it to give you a sense of what I was talking about.  I even have a
> FIXME note in there where I *think* there may be a potential use after free
> issue.  We may be getting away with it because of the delay of the python
> garbage collector, but I plan on taking a harder look at that one before
> reaching a conclusion.

Thanks, that'd be great.  To be clear, as a preexisting issue, I would
not consider it a blocker to accepting the py3 compatibility patch; I
would consider the two issues independent.

> > The questions to decide on are:
> > > 1. Are you generally comfortable with the build changes necessary to
> > > (optionally) build Py2 and Py3 bindings at the same time?
> >
> > Do you mean that a single 'make install' would install two versions of
> > libsvn_swig_py.so, one compiled against CPython 2 and one compiled
> > against CPython 3?
> >
> > Or do you mean, having a (say) configure-time knob that controls whether
> > bindings would be built against py2 or against py3?
> >
> > I think the latter would be uncontroversial.  As to the former, it would
> > involve more moving pieces, so I would be concerned about maintenance
> > overhead of the new build system code.  (We have few active developers
> > nowadays.)
> >
> >
> Right, I mean something like having "--enable-swig-py2" and
> "--enable-swig-py3" so that Py2 and Py3 could be enabled/disabled
> independently, with the option of both being built in the same build.  I'll
> admit it is specifically with a mind towards packaging, since many (all?)
> distros include bindings for both versions if possible.  I knew that might
> be a stretch, which is why I asked ;)  I'll plan on implementing the
> straight forward either-or method first, and putting the dual-compile
> option in a separate commit that can be dropped if desired.

Thanks.  I agree it would be nice to be able to build both py2 and py3
in one build.

Re "droppable patch", as you did for this thread, if you find that that
patch would take much time to write then do seek consensus on that
patch's acceptability on dev@ before investing that time: no one here
would want you to spend time writing a patch that wouldn't be
incorporated.

> Oh and I won't forget about the ctypes implementation either.

We generally require feature parity between the RA layers and the
non-{deprecated,experimental} FS layers, but there is no such lockstep
policy for swig-py and ctypes: there is no requirement that patches to
one be accompanied by patches to the other.  Now that I got that
clarification out of the way, I can answer:  Thanks!  We'd welcome
patches to ctypes.

> > > 2. Do you want to pick up 'py3c' as a new dep, or implement the handful
> > of
> > > necessary wrappers?
> >
> > I don't know the swig-py bindings well enough to answer that.  I guess
> > it depends on how much a "handful" would be.
> >
> > Datapoints: We use 73 distinct CPython API symbols (see attached grep
> > results).  Of these, only three(?) entry points (all of them PyInt_*)
> > appear in py3c/compat.h.
> >
> >
> I'll default to not pulling it in, I don't think it is really needed.  But
> I try to default to resisting the "Not Invented Here" tendency.  However,
> adding a new dep, especially for so little gain, probably doesn't make
> sense IMHO.  The converse is that while some of the calls might not be
> currently used, later additions might have to add more and more compat
> functions, approaching what is provided in py3c.
>
> Note also the PyString_* variants would go to PyStr_* in py3c (or
> svn_swig_py_*_string*() like in my WIP patch), as well as the PyFile_*
> calls.

I think we are conducting a balanced and objective decision process
regarding whether to include py3c or to crib or reimplement it.  If that
process results in a decision to crib or reimplement, that'll be okay:
NIH is not an uncontravenable axiom.  Our due diligence to NIH is to
conduct this decision process, not to ensure that this decision process
makes a NIH-compatible decision.

> > > 3. Is the assumption of utf8 encoding sufficiently reasonable?
> >
> > In Subversion, we assume that argv and stdout are encoded in APR's
> > "system encoding", but basically everything else — including nearly
> > every single string parameter to every public API — is required to be in
> > UTF-8.  svn_cmdline.h contains some of the exceptions.
> >
> > I'm not sure whether that information answers your question, though.  In
> > what concrete cases do the bindings have to convert between str and
> > bytes?  What are the compatibility considerations for user code
> > (consumers of the swig-py2 bindings that want to upgrade to py3)?
> >
> >
> It answers it well enough.  The implicit assumption of string handling in
> py2 is one of the things py3 set out to address. Since currently py2 are
> really making assumptions already, going from 2->3 using utf8 should be
> reasonable IMHO.

I don't follow how you reach the conclusion that using utf8 should be
reasonable.  Could you trace your steps?

Our goal, presumably, is for py3's "import svn" to be as much of a
drop-in replacement to py2's "import svn" as possible.  In what cases
does py2 code pass py2 str objects to libsvn_swig_py?  What happens to
those objects in py3?  The whole question of default encoding only
matters if what in py2 was a str object is under py3 a bytes object.
Maybe we should require py3 user code to pass a str object, mooting the
question entirely?

> > > My general plan of attack will be:
> > > 1. Replace deprecated Py2 swig functions and syntax with Py2/Py3
> > > cross-compatible versions or wrappers.
> > > 2. Ensure existing full Py2 support works correctly.
> > > 3. Update build to build both, Py2 and Py3, and get Py3 working.
> > > 4. Look at the remainder of the python usage to ensure Py3 compliance.
> >
> > At some point we'll want to switch one of the buildbot builders to
> > use python3.  (See http://subversion.apache.org/buildbot/all)
> >
> > Could you explain how you — and we — test patches written in step #1,
> > given that "update the build to support py3" only comes later, in #3?
> > Is it currently possible to build the bindings against py3 (well, to
> > _run_ the build against py3 and get compiler errors about missing
> > symbols)?
> >
> >
> My main concern on step #1 is the "not breaking existing py2 bindings",
> laying the ground work for the py3 build changes.  I'll probably send all
> the patches at the same time, but if they can stand alone, hopefully it'll
> make review easier.

I'd expect "Not breaking swig-py2" to be an invariant regardless of
which order swig-py3's development proceeded in.

> > Thanks for the very detailed email!
> >
> >
> Err, yeah.  My '--verbose' can't really be disabled... :P

Heheh :-)

Daniel
Reply | Threaded
Open this post in threaded view
|

Re: Python 3 Bindings Query

Troy Curtis Jr

On Wed, Oct 18, 2017 at 10:11 PM Daniel Shahaf <[hidden email]> wrote:
Troy Curtis Jr wrote on Wed, Oct 18, 2017 at 03:49:57 +0000:

> > > 3. Is the assumption of utf8 encoding sufficiently reasonable?
> >
> > In Subversion, we assume that argv and stdout are encoded in APR's
> > "system encoding", but basically everything else — including nearly
> > every single string parameter to every public API — is required to be in
> > UTF-8.  svn_cmdline.h contains some of the exceptions.
> >
> > I'm not sure whether that information answers your question, though.  In
> > what concrete cases do the bindings have to convert between str and
> > bytes?  What are the compatibility considerations for user code
> > (consumers of the swig-py2 bindings that want to upgrade to py3)?
> >
> >
> It answers it well enough.  The implicit assumption of string handling in
> py2 is one of the things py3 set out to address. Since currently py2 are
> really making assumptions already, going from 2->3 using utf8 should be
> reasonable IMHO.

I don't follow how you reach the conclusion that using utf8 should be
reasonable.  Could you trace your steps?

Our goal, presumably, is for py3's "import svn" to be as much of a
drop-in replacement to py2's "import svn" as possible.  In what cases
does py2 code pass py2 str objects to libsvn_swig_py?  What happens to
those objects in py3?  The whole question of default encoding only
matters if what in py2 was a str object is under py3 a bytes object.
Maybe we should require py3 user code to pass a str object, mooting the
question entirely?


str objects are what would be expected to come into and out of the swig API on both py2 and py3, which is exactly what I am striving to maintain.  It is just that they mean slightly different things depending on the context between the 2 versions.  In py2, it was perfectly reasonable to expect a str object to come out of a raw socket read or a file from disk.  You could then treat it as "bytes" by only doing thing with it that you would do with bytes, such as using other modules such as 'struct' or maybe one of the compression libraries.  Or if the developer expected those sources would provide printable character data, they would treat it as a "string" and do things like splits(), strips(), and formatting operations.  If it turned out that there were bytes inside that were not actually printable, you'd get a decode error during one of the string operations.

Instead with py3 a str object is a unicode object and thus printable.  The bytes->unicode happens explicitly in a decode() operation.  But since that is now an explicit action, a specific encoding must be chosen instead of relying on some default, which in py2 would likely just be ASCII.  If we use UTF8, it will work everywhere ASCII would have worked in py2, and would work in additional places where a decode error would have popped up.

The places where we'd expect this to actually need to happen are less on the Python API side (since it should be str in and out), but more on the binding's usage of the underlying Subversion API.  All the 'char *' will be considered 'bytes' by py3, and need to be decoded to str (unicode), before it is given to the Python API so that the user gets the expected str objects.

So in general I think going with utf8 is the way to go.  However, there a few places I plan on looking carefully at:
1. Anywhere raw data is "streamed in": I'm not sure if this exists in the API or not, I am assuming it does somewhere to manually feed data into the library to be used as content of a commit.

2. Filesystem paths: The general case is there are separate functions for dealing with "the filesystem encoding" whichever it happens to be [1].  So perhaps one concrete question is are paths provided to the API and elsewhere within Subversion assumed to be UTF8?  If so, then the plan to use utf8 works perfectly.  If there are exceptions, than I'll need to be extra careful in the conversion.

Troy

1: https://docs.python.org/3/c-api/unicode.html#file-system-encoding
Reply | Threaded
Open this post in threaded view
|

Re: Python 3 Bindings Query

Daniel Shahaf-2
Troy Curtis Jr wrote on Thu, Oct 19, 2017 at 04:05:57 +0000:
> The places where we'd expect this to actually need to happen are less on
> the Python API side (since it should be str in and out), but more on the
> binding's usage of the underlying Subversion API.  All the 'char *' will be
> considered 'bytes' by py3, and need to be decoded to str (unicode), before
> it is given to the Python API so that the user gets the expected str
> objects.

Ah!  So you're thinking of data going from libsvn_* to user Python code,
not the other way around.  Now it makes sense.  (I was thinking of user
data passed into to the bindings.)

In this case, I think the rule is: if it's NUL terminated, then it's in
UTF-8 (except for some isolated exceptions such as svn_cmdline_*); if
it's a counted-length string, then it's bytes.  For example, a property
hash is an apr_hash_t* mapping const char* to const svn_string_t*,
corresponding to the data model where property names are UTF-8 strings
and property values are opaque binary blobs.

This is supposed to be explicitly documented, by the way.  For example,
svn_path.h states
.
     * All incoming and outgoing paths are non-NULL and in UTF-8, unless
     * otherwise documented.
.
but, apparently, the newer svn_dirent_uri.h doesn't have such a statement.

> So in general I think going with utf8 is the way to go.  However, there a
> few places I plan on looking carefully at:
> 1. Anywhere raw data is "streamed in": I'm not sure if this exists in the
> API or not, I am assuming it does somewhere to manually feed data into the
> library to be used as content of a commit.

Functions that add data, at various layers, are:

- svn_client_add5()
- svn_delta_editor_t
- svn_repos_load_fs6()
- svn_fs_make_file()

Paths in the repository are always in UTF-8.  File contents are treated
as opaque binary blobs and are generally presented as streams (either
svn_stream_t or an svndiff/txdelta stream).

> 2. Filesystem paths: The general case is there are separate functions for
> dealing with "the filesystem encoding" whichever it happens to be [1].  So
> perhaps one concrete question is are paths provided to the API and
> elsewhere within Subversion assumed to be UTF8?

Subversion's functions generally take UTF-8, but there are exceptiosn,
such as *_canonicalize() and *_internal_style().  They're generally
implemented in terms of apr_* functions which expect a different
encoding (see e.g. svn_io_check_file()).

Cheers,

Daniel
Reply | Threaded
Open this post in threaded view
|

Python3 Swig Bindings Branch (was Re: Python 3 Bindings Query)

Troy Curtis Jr
Howdy guys,

I thought I'd point out my first (useful) commit to the ^/subversion/branches/swig-py3 branch, r1813665.  This adds the py3c library to the build and replaces (hopefully all) the various functions that are incompatible between 2 and 3 with the appropriate py3c provided interfaces.  There may be a few missed, which will certainly turn up once I enable the build for Python 3.

It touched a lot of functions a little, so the log was quite lengthy.  While the Python3 portions aren't all in place yet, it is completely usable and I'd appreciate anyone taking a look at what has been done and giving a bit of feedback.

Thanks,
Troy
Reply | Threaded
Open this post in threaded view
|

Python 3 Bindings Build [was: Re: Python 3 Bindings Query]

Branko Čibej
In reply to this post by Troy Curtis Jr
For anyone who wants to build the swig-py3 branch on the Mac:

I created a custom Homebrew tap for the py3c library (unfortunately it's
not "notorious" enough to be accepted into homebrew-core):

$ brew tap digiverse/p3c
$ brew install py3c
$ ./configure ...
...
configure: py3c library configuration via pkg-config
checking for py3c library... yes
...


-- Brane

Reply | Threaded
Open this post in threaded view
|

Re: Python 3 Bindings Build [was: Re: Python 3 Bindings Query]

Branko Čibej
On 30.10.2017 10:31, Branko Čibej wrote:
> For anyone who wants to build the swig-py3 branch on the Mac:
>
> I created a custom Homebrew tap for the py3c library (unfortunately it's
> not "notorious" enough to be accepted into homebrew-core):
>
> $ brew tap digiverse/p3c

^^^ digiverse/py3c ... typosuction.

> $ brew install py3c
> $ ./configure ...
> ...
> configure: py3c library configuration via pkg-config
> checking for py3c library... yes
> ...
>
>
> -- Brane
>