[PATCH]On branch swig-py3: accept both of bytes/str for input char * args

classic Classic list List threaded Threaded
10 messages Options
Reply | Threaded
Open this post in threaded view
|

[PATCH]On branch swig-py3: accept both of bytes/str for input char * args

Yasuhito FUTATSUKI-2
Hi,

I've made a patch swig-py3@1850520, to accept both of bytes and str
objects for input args correspondings char * args of Subversion API
on Python 3 wrapper functions. It is just a interim report, because
I have some points I want to make clear, and/or need to fix.
(Moreover, I've not touched code to convert return value of Python
callback functions yet)


The patch attached modifies 4 kind of input argment translations.

(1) typemap(in) char * (with/without const modifiers); not allow NULL,
     typemap(in) const char * MAY_BE_NULL; allows NULL
These had done by using 'parse=' typemap modifier, however there is no
PyArg_Parse() format unit to convert both of str and bytes in py3.
So I make a function svn_swig_py_string_to_cstring() in swigutil_py.c,
and use it in for new typemap definition.

consideration:
* For py2, my patch code uses svn_swig_py_string_to_cstring()
   - It isn't allow Unicode for input, however 's' and 'z' format units
    PyArg_Parse() Unicode in py2. If it is need to accept Unicode in py2,
    it is need to fix. (use svn_swig_py_string_to_cstring() py3 only, or
    add code to conversion for py2)
   - Difference of TypeError exception message. Pyrg_Parse() reports
    argment by argnum in Python wrapper function. However it can't
    determin in typemap definition code, so my patch code uses argment
    symbol instead.
* For py3, it seems to need more kindly Exception message for
  UnicodeEncodeError, which can be caused if input str contains surrogate
  data (U+DC00 - U+DCFF).
* test case for UnicodeEncodeError is needed

(2) in core.i, typemap for svn_stream_write

consideration:
* As this typemap doesn't seems to accept Unicode on py2, there seems to
  be no regression like described in (1)
* As this typemap only used for svn_stream_write wrapper which have only
  one char * arg, default UnicodeEncodeError message seems to be sufficient.

(3) typemap(in) apr_hash_t * (for various types)
These are using make_string_from_ob() for hash key string conversion,
and typemap(in) apr_hash_t *HASH_CSTRING uses it for hash value
conversion, too. Similarly typemap(in) apr_hash_t *PROPHASH uses
make_svn_strinf_from_ob() for hash value conversion

consideration:
* It seems some of API (e.g. svn_prop_diffs()) allows NULL for hash value,
  but current implementation of conversion function doesn't allows. Isn't
  it needed? (I added test for this case, but disabled until it make clear)
* test case for UnicodeEncodeError is needed (for both of hash key and
  hash value)

(4) typemap(in) apr_array_header_t *STRINGLIST
This typemap is using svn_swig_py_unwrap_string() through the function
svn_swig_py_seq_to_array().

consideration:
* test case for UnicodeEncodeError is needed (for both of hash key and
  hash value)

Thanks,
--
Yasuhito FUTATSUKI

swig-py-char-ptr-input-patch.txt (33K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH]On branch swig-py3: accept both of bytes/str for input char * args

Troy Curtis Jr
On Thu, Jan 10, 2019 at 9:50 AM Yasuhito FUTATSUKI
<[hidden email]> wrote:

>
> Hi,
>
> I've made a patch swig-py3@1850520, to accept both of bytes and str
> objects for input args correspondings char * args of Subversion API
> on Python 3 wrapper functions. It is just a interim report, because
> I have some points I want to make clear, and/or need to fix.
> (Moreover, I've not touched code to convert return value of Python
> callback functions yet)
>
>
> The patch attached modifies 4 kind of input argment translations.
>
> (1) typemap(in) char * (with/without const modifiers); not allow NULL,
>      typemap(in) const char * MAY_BE_NULL; allows NULL
> These had done by using 'parse=' typemap modifier, however there is no
> PyArg_Parse() format unit to convert both of str and bytes in py3.
> So I make a function svn_swig_py_string_to_cstring() in swigutil_py.c,
> and use it in for new typemap definition.
>
> consideration:
> * For py2, my patch code uses svn_swig_py_string_to_cstring()
>    - It isn't allow Unicode for input, however 's' and 'z' format units
>     PyArg_Parse() Unicode in py2. If it is need to accept Unicode in py2,
>     it is need to fix. (use svn_swig_py_string_to_cstring() py3 only, or
>     add code to conversion for py2)


Yes I think you should support Unicode in this case, but it turns out
you are most of the way there. If you just remove the IS_PY3
conditional, it will support unicode!  The "PyBytes_*" and "PyStr_*"
functions are wrappers provided by the py3c library.  The names point
to the concept that they target, and then map it appropriately in Py2
and Py3. So

PyBytes: Sequence of byte values, e.g. "raw data"
   In Py2: str
   In Py3: bytes

PyStr: Character data
   In Py2: Unicode
   In Py3: str

So in Py2, using both PyBytes_* and PyStr_* is not redundant, they
actually do map to different things. At first it can seem a bit
confusing that PyStr means Unicode in Py2 and str in Py3, since the
type in Py2 is called "str". But note in particular that the real
Python API for "String" objects is actually PyString. PyStr is
specifically made up by py3c to represent the concept of "string" and
not necessarily the object "String". PyBytes can just be used as is,
since bytes was added as an alias for str in Py2.7.

>    - Difference of TypeError exception message. Pyrg_Parse() reports
>     argment by argnum in Python wrapper function. However it can't
>     determin in typemap definition code, so my patch code uses argment
>     symbol instead.
> * For py3, it seems to need more kindly Exception message for
>   UnicodeEncodeError, which can be caused if input str contains surrogate
>   data (U+DC00 - U+DCFF).
> * test case for UnicodeEncodeError is needed
>
> (2) in core.i, typemap for svn_stream_write
>
> consideration:
> * As this typemap doesn't seems to accept Unicode on py2, there seems to
>   be no regression like described in (1)
> * As this typemap only used for svn_stream_write wrapper which have only
>   one char * arg, default UnicodeEncodeError message seems to be sufficient.
>
> (3) typemap(in) apr_hash_t * (for various types)
> These are using make_string_from_ob() for hash key string conversion,
> and typemap(in) apr_hash_t *HASH_CSTRING uses it for hash value
> conversion, too. Similarly typemap(in) apr_hash_t *PROPHASH uses
> make_svn_strinf_from_ob() for hash value conversion
>
> consideration:
> * It seems some of API (e.g. svn_prop_diffs()) allows NULL for hash value,
>   but current implementation of conversion function doesn't allows. Isn't
>   it needed? (I added test for this case, but disabled until it make clear)
> * test case for UnicodeEncodeError is needed (for both of hash key and
>   hash value)


Yes this looks like a good catch, the generic conversion function
should handle the NULL/None case.

> (4) typemap(in) apr_array_header_t *STRINGLIST
> This typemap is using svn_swig_py_unwrap_string() through the function
> svn_swig_py_seq_to_array().
>
> consideration:
> * test case for UnicodeEncodeError is needed (for both of hash key and
>   hash value)
>

And here is the start of my patch review:

Don't forget to put a brief description of the over overall change in
the log message here. And as this is such large change, a detail
section is probably also in order as well.

> [In subversion/bindings/swig]
>
> * core.i (%typemap(in) (const char *data, apr_size_t *len): On Python 3,
>  allow Str as well for data argment of svn_stream_write()
>
> * include/svn_global.swg
>  (remove)(%typemap(in) char *, char const *, char * const,
>   char const * const): Move this typemap into include/svn_strings as
>   typemap (in) IN_STRING
>
[snip]

>
> diff --git a/subversion/bindings/swig/core.i b/subversion/bindings/swig/core.i
> index c309d48070..6993ef4c9c 100644
> --- a/subversion/bindings/swig/core.i
> +++ b/subversion/bindings/swig/core.i
> @@ -442,18 +442,31 @@
>  */
>  #ifdef SWIGPYTHON
>  %typemap(in) (const char *data, apr_size_t *len) ($*2_type temp) {
> -    char *tmpdata;
>      Py_ssize_t length;
> -    if (!PyBytes_Check($input)) {
> -        PyErr_SetString(PyExc_TypeError,
> -                        "expecting a bytes object for the buffer");
> -        SWIG_fail;
> +    if (PyBytes_Check($input)) {
> +        if (PyBytes_AsStringAndSize($input, (char **)&$1, &length) == -1) {
> +            SWIG_fail;
> +        }
> +    }
> +%#if IS_PY3
> +    else if (PyStr_Check($input)) {
> +        $1 = PyStr_AsUTF8AndSize($input, &length);
> +        if (PyErr_Occurred()) {
> +            SWIG_fail;
> +        }
>      }
> -    if (PyBytes_AsStringAndSize($input, &tmpdata, &length) == -1) {
> +%#endif

Don't use the IS_PY3 conditional here, and you get Unicode (on Py2)
conversion automatically.

> +    else {
> +        PyErr_SetString(PyExc_TypeError,
> +%#if IS_PY3
> +                        "expecting a bytes or str object for the buffer"
> +%#else
> +                        "expecting a string for the buffer"
> +%#endif

I think it is valid to say "bytes or string" object for both Py2 and
Py3, thus getting rid of another conditional.

> +                       );
>          SWIG_fail;
>      }
>      temp = ($*2_type)length;
> -    $1 = tmpdata;
>      $2 = ($2_ltype)&temp;
>  }
>  #endif

[snip]

> diff --git a/subversion/bindings/swig/include/svn_string.swg b/subversion/bindings/swig/include/svn_string.swg
> index 0fc64ebdcc..8be4c3d746 100644
> --- a/subversion/bindings/swig/include/svn_string.swg
> +++ b/subversion/bindings/swig/include/svn_string.swg
> @@ -251,6 +251,26 @@ typedef struct svn_string_t svn_string_t;
>  }
>  #endif
>
> + /* -----------------------------------------------------------------------
> +   Type: char * (input)
> +*/
> +#ifdef SWIGPYTHON
> +%typemap (in) IN_STRING
> +{
> +    $1 = svn_swig_py_string_to_cstring($input, FALSE, "$symname", "$1_name");
> +    if (PyErr_Occurred()) SWIG_fail;
> +}
> +
> +%typemap (freearg) IN_STRING "";

What does freearg here do? Looking at the docs, it seems this supposed
to be used to free allocated resources once the wrapper function
exits.

> +
> +%apply IN_STRING {
> +    const char *,
> +    char *,
> +    char const *,
> +    char * const,
> +    char const * const
> +};
> +#endif
>  /* -----------------------------------------------------------------------
>     define a way to return a 'const char *'
>  */
> diff --git a/subversion/bindings/swig/include/svn_types.swg b/subversion/bindings/swig/include/svn_types.swg
> index 319f7daa6b..7c933b1ac7 100644
> --- a/subversion/bindings/swig/include/svn_types.swg
> +++ b/subversion/bindings/swig/include/svn_types.swg
> @@ -348,12 +348,8 @@ svn_ ## TYPE ## _swig_rb_closed(VALUE self)
>  #ifdef SWIGPYTHON
>  %typemap(in) const char *MAY_BE_NULL
>  {
> -    if ($input == Py_None) {
> -        $1 = NULL;
> -    } else {
> -        $1 = PyBytes_AsString($input);
> -        if ($1 == NULL) SWIG_fail;
> -    }
> +    $1 = svn_swig_py_string_to_cstring($input, TRUE, "$symname", "$1_name");
> +    if (PyErr_Occurred()) SWIG_fail;
>  }
>  #endif
>
> diff --git a/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c b/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> index 8a4ec631be..58cfec30a3 100644
> --- a/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> +++ b/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> @@ -506,6 +506,32 @@ void svn_swig_py_svn_exception(svn_error_t *error_chain)
>
>  /*** Helper/Conversion Routines ***/
>
> +/* Function to get char * representation of bytes/str object */
> +char *svn_swig_py_string_to_cstring(PyObject *input, int maybe_null,
> +                                    const char * funcsym, const char * argsym)

I think we need to be very careful with the return value here, both
the type and the lifetime. The trouble with using the value returned
directly from the C api below is that they typically return a pointer
to an internal buffer which must not be modified. Returning it as
"char *" is probably not the right idea. Additionally, these are
internal to the "input" object, so if something like:

  PyObject *foo = <create Py String>;
  char *my val = svn_swig_py_string_to_cstring(foo, ...);
  Py_DECREF(foo);

  svn_awesome_api(my_val, ...); /* This could explode */

Of course, we do need to handle the case of an input "char *"
argument, so I think the best option is probably to use
"apr_pstrdup()" to allocate a new string from the application pool.

> +{
> +    char *retval = NULL;
> +    if (PyBytes_Check(input)) {
> +        retval = PyBytes_AsString(input);
> +    }
> +#if IS_PY3
> +    else if (PyStr_Check(input)) {
> +        retval = (char *)PyStr_AsUTF8(input);
> +    }
> +#endif

Remove this IF_PY3 conditional as well.

> +    else if (input != Py_None || ! maybe_null) {
> +        PyErr_Format(PyExc_TypeError,
> +#if IS_PY3
> +                     "%s() argument %s must be bytes or str%s, not %s",
> +#else
> +                     "%s() argument %s must be string%s, not %s",
> +#endif
> +                     funcsym, argsym, maybe_null?" or None":"",
> +                     Py_TYPE(input)->tp_name);
> +    }
> +    return retval;
> +}
> +

[ As far as my review got for now]

I'll continue reviewing over the next few days, that is all the time I
have for tonight!

Troy
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Yasuhito FUTATSUKI-2
In article <CAEZNtZ+SKqiATLwDvzyVf=[hidden email]>
[hidden email] writes:
> On Thu, Jan 10, 2019 at 9:50 AM Yasuhito FUTATSUKI
> <[hidden email]> wrote:

>> The patch attached modifies 4 kind of input argment translations.
>>
>> (1) typemap(in) char * (with/without const modifiers); not allow NULL,
>>      typemap(in) const char * MAY_BE_NULL; allows NULL
>> These had done by using 'parse=' typemap modifier, however there is no
>> PyArg_Parse() format unit to convert both of str and bytes in py3.
>> So I make a function svn_swig_py_string_to_cstring() in swigutil_py.c,
>> and use it in for new typemap definition.
>>
>> consideration:
>> * For py2, my patch code uses svn_swig_py_string_to_cstring()
>>    - It isn't allow Unicode for input, however 's' and 'z' format units
>>     PyArg_Parse() Unicode in py2. If it is need to accept Unicode in py2,
>>     it is need to fix. (use svn_swig_py_string_to_cstring() py3 only, or
>>     add code to conversion for py2)
>
>
> Yes I think you should support Unicode in this case, but it turns out
> you are most of the way there. If you just remove the IS_PY3
> conditional, it will support unicode!  The "PyBytes_*" and "PyStr_*"
> functions are wrappers provided by the py3c library.  The names point
> to the concept that they target, and then map it appropriately in Py2
> and Py3. So
>
> PyBytes: Sequence of byte values, e.g. "raw data"
>    In Py2: str
>    In Py3: bytes
>
> PyStr: Character data
>    In Py2: Unicode
>    In Py3: str

Unfortunately, PyStr in py3c compatibility layer API is the intersection
of PyString in Python 2, and PyUnicode in Python 3, so we must explicitly
use PyUnicode_* for handling Unicode in py2.

 

>> (3) typemap(in) apr_hash_t * (for various types)
>> These are using make_string_from_ob() for hash key string conversion,
>> and typemap(in) apr_hash_t *HASH_CSTRING uses it for hash value
>> conversion, too. Similarly typemap(in) apr_hash_t *PROPHASH uses
>> make_svn_strinf_from_ob() for hash value conversion
>>
>> consideration:
>> * It seems some of API (e.g. svn_prop_diffs()) allows NULL for hash value,
>>   but current implementation of conversion function doesn't allows. Isn't
>>   it needed? (I added test for this case, but disabled until it make clear)
>> * test case for UnicodeEncodeError is needed (for both of hash key and
>>   hash value)
>
>
> Yes this looks like a good catch, the generic conversion function
> should handle the NULL/None case.

Then I'll fix it. Does it need to separate typemap that allow NULL
as hash value and dones not allow?
 
>> (4) typemap(in) apr_array_header_t *STRINGLIST
>> This typemap is using svn_swig_py_unwrap_string() through the function
>> svn_swig_py_seq_to_array().
>>
>> consideration:
>> * test case for UnicodeEncodeError is needed (for both of hash key and
>>   hash value)
apr_array_header_t is't hash, so this was '(for values)' :)

 
> And here is the start of my patch review:
>
> Don't forget to put a brief description of the over overall change in
> the log message here. And as this is such large change, a detail
> section is probably also in order as well.
 
Yes, I see.

> > [In subversion/bindings/swig]
> >
> > * core.i (%typemap(in) (const char *data, apr_size_t *len): On Python 3,
> >  allow Str as well for data argment of svn_stream_write()
> >
> > * include/svn_global.swg
> >  (remove)(%typemap(in) char *, char const *, char * const,
> >   char const * const): Move this typemap into include/svn_strings as
> >   typemap (in) IN_STRING
> >
> [snip]
> >
> > diff --git a/subversion/bindings/swig/core.i b/subversion/bindings/swig/core.i
> > index c309d48070..6993ef4c9c 100644
> > --- a/subversion/bindings/swig/core.i
> > +++ b/subversion/bindings/swig/core.i
> > @@ -442,18 +442,31 @@
> >  */
> >  #ifdef SWIGPYTHON
> >  %typemap(in) (const char *data, apr_size_t *len) ($*2_type temp) {
> > -    char *tmpdata;
> >      Py_ssize_t length;
> > -    if (!PyBytes_Check($input)) {
> > -        PyErr_SetString(PyExc_TypeError,
> > -                        "expecting a bytes object for the buffer");
> > -        SWIG_fail;
> > +    if (PyBytes_Check($input)) {
> > +        if (PyBytes_AsStringAndSize($input, (char **)&$1, &length) == -1) {
> > +            SWIG_fail;
> > +        }
> > +    }
> > +%#if IS_PY3
> > +    else if (PyStr_Check($input)) {
> > +        $1 = PyStr_AsUTF8AndSize($input, &length);
> > +        if (PyErr_Occurred()) {
> > +            SWIG_fail;
> > +        }
> >      }
> > -    if (PyBytes_AsStringAndSize($input, &tmpdata, &length) == -1) {
> > +%#endif
>
> Don't use the IS_PY3 conditional here, and you get Unicode (on Py2)
> conversion automatically.

I'll try to use Unicode_* API here for both of py2 and py3.
 

> > +    else {
> > +        PyErr_SetString(PyExc_TypeError,
> > +%#if IS_PY3
> > +                        "expecting a bytes or str object for the buffer"
> > +%#else
> > +                        "expecting a string for the buffer"
> > +%#endif
>
> I think it is valid to say "bytes or string" object for both Py2 and
> Py3, thus getting rid of another conditional.

I see.
 

> > +                       );
> >          SWIG_fail;
> >      }
> >      temp = ($*2_type)length;
> > -    $1 = tmpdata;
> >      $2 = ($2_ltype)&temp;
> >  }
> >  #endif
>
> [snip]
>
> > diff --git a/subversion/bindings/swig/include/svn_string.swg b/subversion/bindings/swig/include/svn_string.swg
> > index 0fc64ebdcc..8be4c3d746 100644
> > --- a/subversion/bindings/swig/include/svn_string.swg
> > +++ b/subversion/bindings/swig/include/svn_string.swg
> > @@ -251,6 +251,26 @@ typedef struct svn_string_t svn_string_t;
> >  }
> >  #endif
> >
> > + /* -----------------------------------------------------------------------
> > +   Type: char * (input)
> > +*/
> > +#ifdef SWIGPYTHON
> > +%typemap (in) IN_STRING
> > +{
> > +    $1 = svn_swig_py_string_to_cstring($input, FALSE, "$symname", "$1_name");
> > +    if (PyErr_Occurred()) SWIG_fail;
> > +}
> > +
> > +%typemap (freearg) IN_STRING "";
>
> What does freearg here do? Looking at the docs, it seems this supposed
> to be used to free allocated resources once the wrapper function
> exits.

This is the replacement of %typemap(in, parse='s')
and %typemap(in, parse='z') which uses internal char * buffer of
Python object. This typemap also uses py object internal buffer,
so it doesn't need typemap for freearg.  As far as I read the code
SWIG produed for %typemap(in, parse=...), it seems undefine
typemap(freearg) corresponding it. If `%typemap (freearg) IN_STRING "";'
doesn't exist here, pre-defined typemap for char * in swig library
becomes effective, then extra codes are produced.  

 

> > +
> > +%apply IN_STRING {
> > +    const char *,
> > +    char *,
> > +    char const *,
> > +    char * const,
> > +    char const * const
> > +};
> > +#endif
> >  /* -----------------------------------------------------------------------
> >     define a way to return a 'const char *'
> >  */
> > diff --git a/subversion/bindings/swig/include/svn_types.swg b/subversion/bindings/swig/include/svn_types.swg
> > index 319f7daa6b..7c933b1ac7 100644
> > --- a/subversion/bindings/swig/include/svn_types.swg
> > +++ b/subversion/bindings/swig/include/svn_types.swg
> > @@ -348,12 +348,8 @@ svn_ ## TYPE ## _swig_rb_closed(VALUE self)
> >  #ifdef SWIGPYTHON
> >  %typemap(in) const char *MAY_BE_NULL
> >  {
> > -    if ($input == Py_None) {
> > -        $1 = NULL;
> > -    } else {
> > -        $1 = PyBytes_AsString($input);
> > -        if ($1 == NULL) SWIG_fail;
> > -    }
> > +    $1 = svn_swig_py_string_to_cstring($input, TRUE, "$symname", "$1_name");
> > +    if (PyErr_Occurred()) SWIG_fail;
> >  }
> >  #endif
> >
> > diff --git a/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c b/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> > index 8a4ec631be..58cfec30a3 100644
> > --- a/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> > +++ b/subversion/bindings/swig/python/libsvn_swig_py/swigutil_py.c
> > @@ -506,6 +506,32 @@ void svn_swig_py_svn_exception(svn_error_t *error_chain)
> >
> >  /*** Helper/Conversion Routines ***/
> >
> > +/* Function to get char * representation of bytes/str object */
> > +char *svn_swig_py_string_to_cstring(PyObject *input, int maybe_null,
> > +                                    const char * funcsym, const char * argsym)
>
> I think we need to be very careful with the return value here, both
> the type and the lifetime. The trouble with using the value returned
> directly from the C api below is that they typically return a pointer
> to an internal buffer which must not be modified. Returning it as
> "char *" is probably not the right idea.

I see.

>                                          Additionally, these are
> internal to the "input" object, so if something like:
>
>   PyObject *foo = <create Py String>;
>   char *my val = svn_swig_py_string_to_cstring(foo, ...);
>   Py_DECREF(foo);
>
>   svn_awesome_api(my_val, ...); /* This could explode */
>
> Of course, we do need to handle the case of an input "char *"
> argument, so I think the best option is probably to use
> "apr_pstrdup()" to allocate a new string from the application pool.

I don't suppose that this function is called other than
"%typemap(in) IN_STRING" and "%typemap(in) char *MAY_BE_NULL",
so I think the comment for this function is insufficient about it.
(APIs which need to use apr_pstrdup() are already separated
like "%typemap(in) const void *value" in core.i, I think.)
 

> > +{
> > +    char *retval = NULL;
> > +    if (PyBytes_Check(input)) {
> > +        retval = PyBytes_AsString(input);
> > +    }
> > +#if IS_PY3
> > +    else if (PyStr_Check(input)) {
> > +        retval = (char *)PyStr_AsUTF8(input);
> > +    }
> > +#endif
>
> Remove this IF_PY3 conditional as well.
>
> > +    else if (input != Py_None || ! maybe_null) {
> > +        PyErr_Format(PyExc_TypeError,
> > +#if IS_PY3
> > +                     "%s() argument %s must be bytes or str%s, not %s",
> > +#else
> > +                     "%s() argument %s must be string%s, not %s",
> > +#endif
> > +                     funcsym, argsym, maybe_null?" or None":"",
> > +                     Py_TYPE(input)->tp_name);
> > +    }
> > +    return retval;
> > +}
> > +
>
> [ As far as my review got for now]
>
> I'll continue reviewing over the next few days, that is all the time I
> have for tonight!
>
> Troy

Thank you for review. I'll start to fix pointed out.

--
Yasuhito FUTATSUKI
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Yasuhito FUTATSUKI-2
On 1/14/19 3:10 PM, Yasuhito FUTATSUKI wrote:

> In article <CAEZNtZ+SKqiATLwDvzyVf=[hidden email]>
> [hidden email] writes:
>> On Thu, Jan 10, 2019 at 9:50 AM Yasuhito FUTATSUKI
>> <[hidden email]> wrote:
>
>>> The patch attached modifies 4 kind of input argment translations.
>>>
>>> (1) typemap(in) char * (with/without const modifiers); not allow NULL,
>>>       typemap(in) const char * MAY_BE_NULL; allows NULL
>>> These had done by using 'parse=' typemap modifier, however there is no
>>> PyArg_Parse() format unit to convert both of str and bytes in py3.
>>> So I make a function svn_swig_py_string_to_cstring() in swigutil_py.c,
>>> and use it in for new typemap definition.
>>>
>>> consideration:
>>> * For py2, my patch code uses svn_swig_py_string_to_cstring()
>>>     - It isn't allow Unicode for input, however 's' and 'z' format units
>>>      PyArg_Parse() Unicode in py2. If it is need to accept Unicode in py2,
>>>      it is need to fix. (use svn_swig_py_string_to_cstring() py3 only, or
>>>      add code to conversion for py2)
>>
>>
>> Yes I think you should support Unicode in this case, but it turns out
>> you are most of the way there. If you just remove the IS_PY3
>> conditional, it will support unicode!  The "PyBytes_*" and "PyStr_*"
>> functions are wrappers provided by the py3c library.  The names point
>> to the concept that they target, and then map it appropriately in Py2
>> and Py3. So
>>
>> PyBytes: Sequence of byte values, e.g. "raw data"
>>     In Py2: str
>>     In Py3: bytes
>>
>> PyStr: Character data
>>     In Py2: Unicode
>>     In Py3: str
>
> Unfortunately, PyStr in py3c compatibility layer API is the intersection
> of PyString in Python 2, and PyUnicode in Python 3, so we must explicitly
> use PyUnicode_* for handling Unicode in py2.

In py2, it seems there is no C API to return (const) char * buffer
corresponding to Unicode object directly In py3, PyUnicode_AsUTF8() returns
(const) char *, without extra object reference (and py3c uses it as the entity
of PyStr_AsUTF8()).

So, for py2, we should explicitly convert Unicode object into new Bytes object
and hold its reference while API call, then release it after API call
in %typemap(freearg).

Alternatively, revert to use %typemap(in, parse="s"), %typemap(in, parse="z")
in py2 (only).

--
Yasuhito FUTATSUKI
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Yasuhito FUTATSUKI-2
On 1/14/19 6:30 PM, Yasuhito FUTATSUKI wrote:
> On 1/14/19 3:10 PM, Yasuhito FUTATSUKI wrote:
>> In article <CAEZNtZ+SKqiATLwDvzyVf=[hidden email]>
>> [hidden email] writes:
>>> On Thu, Jan 10, 2019 at 9:50 AM Yasuhito FUTATSUKI


>>> PyBytes: Sequence of byte values, e.g. "raw data"
>>>     In Py2: str
>>>     In Py3: bytes
>>>
>>> PyStr: Character data
>>>     In Py2: Unicode
>>>     In Py3: str
>>
>> Unfortunately, PyStr in py3c compatibility layer API is the intersection
>> of PyString in Python 2, and PyUnicode in Python 3, so we must explicitly
>> use PyUnicode_* for handling Unicode in py2.

Aha, both of those are partially correct and partially wrong.

PyStr_Check:
     In Py2: PyString_Check, accept string and its subtype object.
             (not accept unicode object)
     In Py3: PyUnicode_Check, accept string and its subtype object.

PyStr_UTF8:
     In Py2: PyString_AsString, accept both of bytes(str) and unicode.
     In Py3: PyUnicode_AsUTF8, accept str(Unicode object).

and, PyUnicode_Check() is also provided for py2.

So, we can use code fragment like:

         ...
     if (PyBytes_Check(input))
       {
         retval = PyBytes_AsString(input);
       }
     else if (PyUnicode_Check(input))
       {
         retval = (char *)PyStr_AsUTF8(input);
       }
      else
          ...

--
Yasuhito FUTATSUKI
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Yasuhito FUTATSUKI-2
Patch updated.

In article <CAEZNtZ+SKqiATLwDvzyVf=[hidden email]>
[hidden email] writes:
> On Thu, Jan 10, 2019 at 9:50 AM Yasuhito FUTATSUKI
> <[hidden email]> wrote:

>> The patch attached modifies 4 kind of input argment translations.
>>
>> (1) typemap(in) char * (with/without const modifiers); not allow NULL,
>>      typemap(in) const char * MAY_BE_NULL; allows NULL
>> These had done by using 'parse=' typemap modifier, however there is no
>> PyArg_Parse() format unit to convert both of str and bytes in py3.
>> So I make a function svn_swig_py_string_to_cstring() in swigutil_py.c,
>> and use it in for new typemap definition.
>>
>> consideration:
>> * For py2, my patch code uses svn_swig_py_string_to_cstring()
>>    - It isn't allow Unicode for input, however 's' and 'z' format units
>>     PyArg_Parse() Unicode in py2. If it is need to accept Unicode in py2,
>>     it is need to fix. (use svn_swig_py_string_to_cstring() py3 only, or
>>     add code to conversion for py2)
>
>
> Yes I think you should support Unicode in this case, but it turns out
> you are most of the way there. If you just remove the IS_PY3
> conditional, it will support unicode!  The "PyBytes_*" and "PyStr_*"
> functions are wrappers provided by the py3c library.  The names point
> to the concept that they target, and then map it appropriately in Py2
> and Py3. So
>
> PyBytes: Sequence of byte values, e.g. "raw data"
>    In Py2: str
>    In Py3: bytes
>
> PyStr: Character data
>    In Py2: Unicode
>    In Py3: str
Now it uses PyUnicode_Check() and PyStr_UTF8(), then it works in py2 and py3
without IS_PY3 conditional here.

>>    - Difference of TypeError exception message. Pyrg_Parse() reports
>>     argment by argnum in Python wrapper function. However it can't
>>     determin in typemap definition code, so my patch code uses argment
>>     symbol instead.

This is still left, if we treat it as a regression.

>> * For py3, it seems to need more kindly Exception message for
>>   UnicodeEncodeError, which can be caused if input str contains surrogate
>>   data (U+DC00 - U+DCFF).

I reconsidered it and abandoned, because of UnicodeEncodeError structure.
It is enought to analyse why it causes.

> * test case for UnicodeEncodeError is needed

Added.

>> (2) in core.i, typemap for svn_stream_write
>>
>> consideration:
>> * As this typemap doesn't seems to accept Unicode on py2, there seems to
>>   be no regression like described in (1)
>> * As this typemap only used for svn_stream_write wrapper which have only
>>   one char * arg, default UnicodeEncodeError message seems to be sufficient.

Not changed in this patch.

>> (3) typemap(in) apr_hash_t * (for various types)
>> These are using make_string_from_ob() for hash key string conversion,
>> and typemap(in) apr_hash_t *HASH_CSTRING uses it for hash value
>> conversion, too. Similarly typemap(in) apr_hash_t *PROPHASH uses
>> make_svn_strinf_from_ob() for hash value conversion
>>
>> consideration:
>> * It seems some of API (e.g. svn_prop_diffs()) allows NULL for hash value,
>>   but current implementation of conversion function doesn't allows. Isn't
>>   it needed? (I added test for this case, but disabled until it make clear)
>> * test case for UnicodeEncodeError is needed (for both of hash key and
>>   hash value)
>
>
> Yes this looks like a good catch, the generic conversion function
> should handle the NULL/None case.
I've fixed it. To fix this, check key and value separately in
svn_swig_py_*_from_dict().

Test case for UnicodeEncodeError is added, and it detect make_svn_strinf_from_ob()
couldn't handle this case :) Yes, I've fixed it, of course.

>> (4) typemap(in) apr_array_header_t *STRINGLIST
>> This typemap is using svn_swig_py_unwrap_string() through the function
>> svn_swig_py_seq_to_array().
>>
>> consideration:
>> * test case for UnicodeEncodeError is needed (for both of hash key and
>>   hash value)

Test case for UnicodeEncodeError is added in tests/client.py.


> And here is the start of my patch review:

I've intended to fix problems pointed out, except the comment for
svn_swig_py_string_to_cstring(). For svn_swig_py_string_to_cstring(),
I've added a comment for the reason of return type and why it seems to
sufficient not to duplicate string entity.

Thanks,
--
Yasuhito FUTATSUKI

swig-py-char-ptr-input-revised-patch.txt (43K) Download Attachment
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Yasuhito FUTATSUKI-2
On 1/20/19 5:50 PM, Yasuhito FUTATSUKI wrote:
> Patch updated.

I've commited it with modifications below, as r1853592.

* Add support to convert py_file from unicode file name in
  svn_swig_py_make_file()
* Revise test case for svn_stream_write() in python/tests/core.py

Thanks,
--
Yasuhito FUTATSUKI
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Daniel Shahaf-2
Yasuhito FUTATSUKI wrote on Sat, 16 Feb 2019 07:37 +00:00:
> On 1/20/19 5:50 PM, Yasuhito FUTATSUKI wrote:
> > Patch updated.
>
> I've commited it with modifications below, as r1853592.

Thanks!

I was wondering what are the blockers for merging swig-py3 into trunk?
The only item in BRANCH-README is "Fix a compile error on windows with
py3", but I don't even know if that's a current issue or an old one that has
since been fixed.

Cheers,

Daniel
Reply | Threaded
Open this post in threaded view
|

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Yasuhito FUTATSUKI-2
On 2/17/19 9:04 AM, Daniel Shahaf wrote:

> I was wondering what are the blockers for merging swig-py3 into trunk?
> The only item in BRANCH-README is "Fix a compile error on windows with
> py3", but I don't even know if that's a current issue or an old one that has
> since been fixed.

I'm sorry I'm not sure the compile error still exists or not. (I don't
have Windows build environment yet)

The only thing I know about it, my patch committed on r1849784 might break
Windows build because of "PY3" macro for swig which added only Unix/Linux
build. However, it might not be a compile error, but might produce improper
codes under py3 environment, and this might be dissolved on r1853592,
for "PY3" macro in swig typemaps no longer used.


I think there are still some other points need to look over.

* Do scrpts under tools/ work on py3?
* If we want to install py2 bindings and py3 bindings at the same time,
   we must rename libsvn_swig_py library for py3.

Cheers,
--
Yasuhito FUTATSUKI


Reply | Threaded
Open this post in threaded view
|

Re: [PATCH]On branch swig-py3: accept both of bytes/str for inputchar * args

Daniel Shahaf-2
Yasuhito FUTATSUKI wrote on Sun, 17 Feb 2019 06:13 +00:00:

> On 2/17/19 9:04 AM, Daniel Shahaf wrote:
>
> > I was wondering what are the blockers for merging swig-py3 into trunk?
> > The only item in BRANCH-README is "Fix a compile error on windows with
> > py3", but I don't even know if that's a current issue or an old one that has
> > since been fixed.
>
> I'm sorry I'm not sure the compile error still exists or not. (I don't
> have Windows build environment yet)
>

There's no need to apologize; we don't expect every developer to be able
to build on every platform.  If you are able to set up a Windows build
environment that would be very helpful, of course, but if not, a Windows
developer can do that.

If you just need a quick tests run, you can also trigger buildbot builds
of the swig-py3 branch on Windows by joining #svn-dev on IRC and
issueing an appropriate command to buildbot's IRC bot.

> The only thing I know about it, my patch committed on r1849784 might break
> Windows build because of "PY3" macro for swig which added only Unix/Linux
> build. However, it might not be a compile error, but might produce improper
> codes under py3 environment, and this might be dissolved on r1853592,
> for "PY3" macro in swig typemaps no longer used.
>

OK.

>
> I think there are still some other points need to look over.
>
> * Do scrpts under tools/ work on py3?

Good call: the branch should not cause regressions in tools/ scripts
that use the SWIG bindings.  That said, I would assume that Python
scripts that don't use the swig bindings are orthogonal to the branch;
they _should_ be ported to Python 3, but that work can happen on trunk,
before or after the merge.

> * If we want to install py2 bindings and py3 bindings at the same time,
>    we must rename libsvn_swig_py library for py3.

Good question, but I don't know the answer.

Thanks,

Daniel