Re: Fwd: [PATCH] Re: [PATCH] Include offending XML in "Malformed XML" error message
--On Jun 6, 2005 12:34 PM, Michael W Thelen <[hidden email]> wrote:
> It looks like something bad happened to your message, as if something
> stripped out all the newlines. Would you mind resending the patch?
Sorry; I think I've been Gmailed again; let me try from Mulberry. Here's
the leading text:
On Wed, 23 Feb 2005, Charles Bailey wrote:
> Attached is a patchlet that, when expat fails to parse a hunk of XML,
> appends at least part of the offending hunk to the error message. It
which led to an exchange regarding the need to make strings
UTF-8-safe. After too long a hiatus, I posted for comment:
> Well, after umpteen interrupts from the rest of life, I finally got a
> few hours to look at this again. In checking what was already
> available, I found a handful of "string escaping" functions in various
> places which perform similar tasks (at least one with the comment
> "this should share code with other_string_escaping_routine()"). Since
> I'd have to add yet another such function, I thought I'd try to
> abstract it a bit, with the hope that similar routines could use a
> common base. I've appended a short proposal at the bottom of this message,
> containing a common "engine" and an example implementation for
> creating a UTF-8-safe version of an arbitrary string.
Julian Foad was kind enough to point out a dumb thinko, but no other
comments were forthcoming, possibly because the core developers were
busy with pre-1.2 cleanup.
So, after another too-long hiatus, here's a patch which implements a
"common" string escaping function, uses it for UTF-8 escaping, and
uses that to sanitize the offending XML, which is then output in the
error message that Jack built^W^Wstarted this thread.
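
For anyone who wants the shape of the idea without opening the attachment, here is a minimal sketch in plain C of a common escaping "engine" driven by a mapper callback, plus a mapper that passes well-formed UTF-8 through and escapes everything else. It is not the code in the patch: the names sketch_escape_string, sketch_utf8_mapper, escape_mapper_t and ESC_MAX are made up for illustration, the "?\ddd" escape format is an assumption, and it writes into a caller-supplied buffer instead of using pools and stringbufs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical names throughout; this is an illustration, not the patch. */
#define ESC_MAX 8   /* room for "?\ddd" plus the terminating NUL */

/* A mapper looks at the bytes starting at src and sets *consumed to the
   number of input bytes it handled.  It returns 0 if those bytes may be
   copied verbatim, or nonzero after writing a NUL-terminated replacement
   (at most ESC_MAX bytes, NUL included) into esc. */
typedef int (*escape_mapper_t)(const unsigned char *src, size_t remaining,
                               size_t *consumed, char *esc);

/* Common engine: walk the input and let the mapper decide, position by
   position, whether to copy or substitute.  Returns the length written
   (excluding the NUL), or (size_t)-1 if dst is too small. */
size_t sketch_escape_string(const char *src, size_t len,
                            escape_mapper_t mapper,
                            char *dst, size_t dst_size)
{
  const unsigned char *p = (const unsigned char *)src;
  size_t in = 0, out = 0;

  assert(src != NULL && dst != NULL && mapper != NULL);
  while (in < len)
    {
      char esc[ESC_MAX];
      size_t consumed = 0;
      int escaped = mapper(p + in, len - in, &consumed, esc);
      const char *piece = escaped ? esc : (const char *)(p + in);
      size_t piece_len = escaped ? strlen(esc) : consumed;

      assert(consumed > 0 && consumed <= len - in);
      if (piece_len >= dst_size - out)   /* always keep room for the NUL */
        return (size_t)-1;
      memcpy(dst + out, piece, piece_len);
      out += piece_len;
      in += consumed;
    }
  if (out >= dst_size)
    return (size_t)-1;
  dst[out] = '\0';
  return out;
}

/* Length of the well-formed UTF-8 sequence starting at s, or 0 if the
   bytes are not well-formed (overlong forms, surrogates, values above
   U+10FFFF, stray continuation bytes, and truncated sequences). */
static size_t utf8_seq_len(const unsigned char *s, size_t remaining)
{
  size_t need, i;
  unsigned char lo = 0x80, hi = 0xBF;   /* allowed range for 2nd byte */

  if (s[0] < 0x80) return 1;
  if (s[0] >= 0xC2 && s[0] <= 0xDF) need = 2;
  else if (s[0] >= 0xE0 && s[0] <= 0xEF) need = 3;
  else if (s[0] >= 0xF0 && s[0] <= 0xF4) need = 4;
  else return 0;
  if (need > remaining) return 0;

  if (s[0] == 0xE0) lo = 0xA0;          /* reject overlong 3-byte forms */
  else if (s[0] == 0xED) hi = 0x9F;     /* reject UTF-16 surrogates */
  else if (s[0] == 0xF0) lo = 0x90;     /* reject overlong 4-byte forms */
  else if (s[0] == 0xF4) hi = 0x8F;     /* reject values above U+10FFFF */
  if (s[1] < lo || s[1] > hi) return 0;
  for (i = 2; i < need; i++)
    if (s[i] < 0x80 || s[i] > 0xBF) return 0;
  return need;
}

/* Mapper: copy well-formed UTF-8 untouched, escape any other byte
   as "?\ddd" (three decimal digits). */
int sketch_utf8_mapper(const unsigned char *src, size_t remaining,
                       size_t *consumed, char *esc)
{
  size_t n = utf8_seq_len(src, remaining);

  if (n > 0)
    {
      *consumed = n;
      return 0;
    }
  *consumed = 1;
  snprintf(esc, ESC_MAX, "?\\%03u", (unsigned)src[0]);
  return 1;
}
```

The point of splitting things this way is that an ASCII-only escaper or an XML-attribute escaper would only need a different mapper; the copying and bounds checking stay in one place.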
I've interspersed my comments in the code, since there's imho zero
chance that this version of the patch will be
substantially/stylistically suitable for committing. They're far from
exhaustive, but this message is long enough already.
Conceptual "Log message":
Add function that escapes illegal UTF-8 characters, along the way
refactoring the core of the string-escaping routines, and ensure that
the illegal XML error message outputs legal UTF-8.
### Probably best applied as several patches, but collected here for review.
(svn_subr__escape_string): Final-common-path function for escaping
New file, declaring svn_subr__escape_string and convenience macros.
### Logical candidate for consolidation with utf_impl.h, perhaps as
(fuzzy_escape): Renamed to ascii_fuzzy_escape, and rewritten to use
(svn_utf__stringbuf_escape_utf8_fuzzy): New function which escapes
UTF-8 in a string, returning the escaped string in a stringbuf.
(utf8_escape_mapper): Helper function for
Add prototype for svn_utf__stringbuf_escape_utf8_fuzzy.
(svn_utf__cstring_escape_utf8_fuzzy): Macro implementing variant of
above that returns NUL-terminated string.
(svn_xml_parse): If parse fails, print (sanitized) (part of) offending
XML with error message.
(utf_escape): New function testing UTF-8 string-escaping functions.
* subversion/po/de.po, subversion/po/es.po, subversion/po/ja.po,
subversion/po/ko.po, subversion/po/nb.po, subversion/po/pl.po,
Courtesy to translators, since I've changed a localized string.
The patch, with interspersed comments, is appended as an attachment.
Charles Bailey < bailey _at_ newman _dot_ upenn _dot_ edu >
Newman Center at the University of Pennsylvania