Re: Fwd: [PATCH] Re: [PATCH] Include offending XML in "Malformed XML" error message

Charles Bailey-2
--On Jun 6, 2005 12:34 PM, Michael W Thelen <[hidden email]> wrote:
>
> It looks like something bad happened to your message, as if something
> stripped out all the newlines.  Would you mind resending the patch?

Sorry; I think I've been Gmailed again; let me try from Mulberry.  Here's
the leading text:

On Wed, 23 Feb 2005, Charles Bailey wrote:
>
> Attached is a patchlet that, when expat fails to parse a hunk of XML,
> appends at least part of the offending hunk to the error message.  It

which led to an exchange regarding the need to make strings
UTF-8-safe.  After too long a hiatus, I posted for comment:

On 4/21/05, Charles Bailey <[hidden email]> wrote:

>
> Well, after umpteen interrupts from the rest of life, I finally got a
> few hours to look at this again.    In checking what was already
> available, I found a handful of "string escaping" functions in various
> places which perform similar tasks (at least one with the comment
> "this should share code with other_string_escaping_routine()").  Since
> I'd have to add yet another such function, I thought I'd try to
> abstract it a bit, with the hope that similar routines could use a
> common base.  I've appended a short proposal at the bottom of this
> message, containing a common "engine" and an example implementation
> for creating a UTF-8-safe version of an arbitrary string.

Julian Foad was kind enough to point out a dumb thinko, but no other
comments were forthcoming, possibly because the core developers were
busy with pre-1.2 cleanup.

So, after another too-long hiatus, here's a patch which implements a
"common" string escaping function, uses it for UTF-8 escaping, and
uses that to sanitize the offending XML, which is then output in the
error message that Jack built^W^Wstarted this thread.

I've interspersed my comments in the code, since there's imho zero
chance that this version of the patch will be
substantially/stylistically suitable for committing.  They're far from
exhaustive, but this message is long enough already.

Conceptual "Log message":
[[[
Add a function that escapes illegal UTF-8 characters, refactoring the
core of the string-escaping routines along the way, and ensure that the
"Malformed XML" error message outputs legal UTF-8.
### Probably best applied as several patches, but collected here for review.

* subversion/libsvn_subr/escape.c:
   New file
   (svn_subr__escape_string): Final-common-path function for escaping
    strings.

* subversion/libsvn_subr/escape_impl.h:
   New file, declaring svn_subr__escape_string and convenience macros.
   ### Logical candidate for consolidation with utf_impl.h, perhaps as
   ### subr_impl.h

* subversion/libsvn_subr/utf.c:
   (fuzzy_escape): Renamed to ascii_fuzzy_escape, and rewritten to use
    svn_subr__escape_string.
   (svn_utf__stringbuf_escape_utf8_fuzzy): New function which escapes
    illegal UTF-8 in a string, returning the escaped string in a
    stringbuf.
   (utf8_escape_mapper): Helper function for
    svn_utf__stringbuf_escape_utf8_fuzzy.

* subversion/libsvn_subr/utf_impl.h:
   Add prototype for svn_utf__stringbuf_escape_utf8_fuzzy.
   (svn_utf__cstring_escape_utf8_fuzzy): Macro implementing variant of
    above that returns NUL-terminated string.

* subversion/libsvn_subr/xml.c:
   (svn_xml_parse): If parse fails, print (sanitized) (part of)
    offending XML with error message.

* subversion/tests/libsvn_subr/utf-test.c:
   (utf_escape): New function testing UTF-8 string-escaping functions.

* subversion/po/de.po, subversion/po/es.po, subversion/po/ja.po,
  subversion/po/ko.po, subversion/po/nb.po, subversion/po/pl.po,
  subversion/po/pt_BR.po, subversion/po/sv.po,
  subversion/po/zh_CN.po, subversion/po/zh_TW.po:
  Courtesy to translators, since I've changed a localized string.
]]]
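
For readers without the attachment, the split described in the log
message (a shared escaping "engine" plus a small per-encoding helper)
can be illustrated roughly as follows.  This is only a sketch under
stated assumptions, not the code from the patch: the names
escape_string, utf8_valid_len and valid_len_fn are hypothetical, it
writes into a caller-supplied buffer instead of an APR pool or
stringbuf, and the "?\NNN" (decimal) escape format is assumed for
illustration.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper type: returns how many bytes at P (at most N are
   available) form one acceptable character, or 0 if P[0] must be
   escaped. */
typedef size_t (*valid_len_fn)(const unsigned char *p, size_t n);

/* Hypothetical UTF-8 helper: length of the well-formed UTF-8 sequence
   starting at P, or 0 if the bytes there are not legal UTF-8. */
size_t utf8_valid_len(const unsigned char *p, size_t n)
{
  size_t need, i;

  if (n == 0)
    return 0;
  if (p[0] < 0x80)
    return 1;                       /* plain ASCII */
  else if (p[0] >= 0xC2 && p[0] <= 0xDF)
    need = 2;
  else if (p[0] >= 0xE0 && p[0] <= 0xEF)
    need = 3;
  else if (p[0] >= 0xF0 && p[0] <= 0xF4)
    need = 4;
  else
    return 0;                       /* stray continuation or bad lead */

  if (n < need)
    return 0;                       /* sequence cut off by end of input */
  for (i = 1; i < need; i++)
    if ((p[i] & 0xC0) != 0x80)
      return 0;                     /* not a continuation byte */

  /* Reject overlong forms, surrogates, and values above U+10FFFF. */
  if ((p[0] == 0xE0 && p[1] < 0xA0) || (p[0] == 0xED && p[1] > 0x9F)
      || (p[0] == 0xF0 && p[1] < 0x90) || (p[0] == 0xF4 && p[1] > 0x8F))
    return 0;
  return need;
}

/* Hypothetical common engine: copies acceptable characters verbatim and
   replaces every other byte with "?\NNN" (NNN = decimal byte value).
   Output is truncated to fit DST and always NUL-terminated when
   DSTLEN > 0.  Returns the number of bytes written, excluding the NUL. */
size_t escape_string(const char *src, size_t len, valid_len_fn valid_len,
                     char *dst, size_t dstlen)
{
  const unsigned char *p = (const unsigned char *) src;
  size_t i = 0, out = 0;

  if (dstlen == 0)
    return 0;
  while (i < len)
    {
      size_t ok = valid_len(p + i, len - i);
      if (ok > 0)
        {
          if (out + ok >= dstlen)
            break;                  /* no room left; stop cleanly */
          memcpy(dst + out, p + i, ok);
          out += ok;
          i += ok;
        }
      else
        {
          if (out + 5 >= dstlen)
            break;                  /* escape needs 5 bytes plus NUL */
          snprintf(dst + out, dstlen - out, "?\\%03u", (unsigned) p[i]);
          out += 5;
          i += 1;
        }
    }
  dst[out] = '\0';
  return out;
}
```

A caller would pass the raw bytes together with utf8_valid_len and an
output buffer; a different helper (say, one that accepts only printable
ASCII) could reuse the same engine, which is the point of factoring out
a common base.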

The patch, with interspersed comments, is appended as an attachment.

--
Regards,
Charles Bailey  < bailey _at_ newman _dot_ upenn _dot_ edu >
Newman Center at the University of Pennsylvania

Attachment: xml_error_20050606.patch (29K)