[Xml-compile] bytes vs. characters?

Brian Phillips bpphillips+ml at gmail.com
Wed Jul 20 18:15:40 GMT 2011


We ran into an issue recently when upgrading from an ancient version of
XML::Compile::SOAP (v0.78) to a newer version (v2.23) where data that was
previously being encoded correctly to match the UTF-8 character set in the
header of the XML document was being passed through "as-is" without any
encoding.  It looks like this was intentionally removed in the big rewrite
version (v2.00_01) as shown in this gitpan link:
https://github.com/gitpan/XML-Compile-SOAP/commit/763931037231fb8c9bb42d715fe126d2707c4915#L61L123

Disclaimer: the platform I'm working on is only starting to become UTF-8
friendly.  The legacy code all currently assumes ISO-8859-1 for all data
going in or coming out of the system.
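
To make the mismatch concrete, here is a minimal sketch using only the core
Encode module.  The sample string is made up for illustration; it shows that
ISO-8859-1 bytes passed through "as-is" are not valid UTF-8, so they cannot
match a document that declares UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "café" as a legacy system would hold it: ISO-8859-1 bytes,
# where é is the single byte 0xE9.
my $latin1_bytes = "caf\xE9";

# A strict UTF-8 decode fails, because a lone 0xE9 is not a valid
# UTF-8 sequence.  decode() with a CHECK value may modify its
# argument, so work on a copy.
my $copy = $latin1_bytes;
my $valid_utf8 = eval { decode('UTF-8', $copy, Encode::FB_CROAK); 1 } ? 1 : 0;

# The same text properly encoded as UTF-8 uses two bytes for é (0xC3 0xA9).
my $utf8_bytes = encode('UTF-8', decode('ISO-8859-1', $latin1_bytes));

printf "valid as UTF-8: %s\n", $valid_utf8 ? 'yes' : 'no';
printf "latin1 length: %d, utf8 length: %d\n",
    length $latin1_bytes, length $utf8_bytes;
# prints:
#   valid as UTF-8: no
#   latin1 length: 4, utf8 length: 5
```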

We have identified a few different ways that we can address this on our end:
1) Make sure everything is decoded to Perl characters before passing to
XML::Compile::SOAP.  I realize this should be done anyway, but the fact that
this works on its own almost seems coincidental because I can't see any
point at which these strings are encoded into the character set specified in
the SOAP request (which defaults to utf-8).  If Perl changes the internal
storage of its strings to something that's not UTF-8, I'm guessing this
would no longer work.
2) Tell XML::Compile::SOAP to send a different character set header that
would match the byte encoding of the data.  In our case, simply passing an
ISO-8859-1 character set seems to allow things to match up (the body of the
XML document matches the XML declaration).
3) Encode all data sent to XML::Compile::SOAP such that it matches the
character set of the request (UTF-8 unless otherwise specified).
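
Options 1 and 3 could be sketched roughly as follows, again with only the
core Encode module.  The helper names legacy_to_characters and
characters_to_wire are hypothetical, and the sketch deliberately does not
call XML::Compile::SOAP, since which of the two values it expects is exactly
the open question.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper for option 1: turn legacy ISO-8859-1 bytes
# into Perl characters.
sub legacy_to_characters {
    my ($bytes) = @_;
    return decode('ISO-8859-1', $bytes);
}

# Hypothetical helper for option 3: encode characters into the
# character set the request declares (UTF-8 unless told otherwise).
# FB_CROAK makes unencodable characters fatal; LEAVE_SRC keeps
# encode() from modifying its input.
sub characters_to_wire {
    my ($chars, $charset) = @_;
    $charset = 'UTF-8' unless defined $charset;
    return encode($charset, $chars, Encode::FB_CROAK | Encode::LEAVE_SRC);
}

my $chars = legacy_to_characters("Gr\xFC\xDFe");   # "Grüße" as ISO-8859-1 bytes
my $wire  = characters_to_wire($chars);

printf "%d characters, %d bytes on the wire\n", length $chars, length $wire;
# prints: 5 characters, 7 bytes on the wire
```

Passing 'ISO-8859-1' as the second argument to characters_to_wire would
correspond to option 2, where the bytes stay as they are and the declared
character set changes instead.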

It seems that we really should be doing both #1 and #3 with the current
version of XML::Compile::SOAP, although in my opinion #3 should be done by
XML::Compile automatically (i.e. you send in characters and out come
encoded bytes).

My questions:
1) Does XML::Compile::SOAP presume that all input is coming in the form of
bytes since it is passing things through to XML::LibXML without any explicit
encoding?
2) Do you agree that XML::Compile::SOAP, as the portion of the process that
is doing I/O, should be doing some encoding before sending the request over
the wire?
3) Am I missing something here? :-)  (I don't claim to be a Unicode expert!)

Thanks,
Brian Phillips