[Xml-compile] Re: Null characters in strings

Patrick Powell papowell at astart.com
Thu Dec 12 00:06:31 GMT 2013


On 12/10/13 14:43, Mark Overmeer wrote:
> * Patrick Powell (papowell at astart.com) [131210 17:17]:
>> I ran into a very odd problem today.  Due to a cut and paste update
>> of a database field,  a Null (0x00) value was put into a string.
> oops
>
Yeah.  I spent quite a while trying to figure this one out.
>> I ran into some funny problems.   First,  one toolset generated a
>> &#x0; for the null.  However,  this resulted
>> in an error when sent to an XML::SOAP server.  - illegal XMLchar
>> value 0 (I do not have the exact error message).
> Who did complain?  The server?  XML::Compile has a light task: it
> does not look at the content of the strings at all: if anyone
> did encode \x0 into &#x0, then that is libxml2.
The server (implemented using XML::Compile) complained
and generated a text message response.  The error appears to have
originated in the libxml2 library, as you suggested.
>> I then used an XML::Compile based client and looked at the SOAP
>> message that was being sent.  It did not appear to have the null
>> value in the string.
> Well, never trust editors in this case.  Did you use 'vim' with the
> 'l' command?
I actually used Wireshark and looked at the message going out at the
TCP/IP level.  There was no '&#x00;' or similar text in the message,
but there were NULLs in the MySQL text that I used to generate the
message.
>> Question:  Does XML::Compile or its support libraries/facilities
>> delete NULL characters before doing string encoding (i.e. translate
>> & to &amp;)?  Does it do this for other characters?
> libxml2 wrapped by XML::LibXML

Well, apparently libxml2 has nicely removed the 'illegal' control
characters from outgoing XML, but it refuses to handle them in
incoming XML.  I suspect that NULL (0x00) is treated as a 'truly evil'
character because it is used as the end-of-string indicator in C, and
I can envision all sorts of havoc if it were not deleted during the
decoding phase of the XML translation.  Since libxml2 is written in C,
this would make a lot of sense.
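
For anyone who wants to see the incoming-side rejection for themselves,
here is a minimal sketch.  It is in Python and uses the standard
library's expat-based parser, not libxml2, so it only illustrates the
general behaviour of a conforming XML parser.  The helper name
parses_ok and the element name "field" are made up for the example.

```python
import xml.etree.ElementTree as ET


def parses_ok(document: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        # Raised for any well-formedness error, including a character
        # reference to a code point that XML 1.0 does not allow.
        return False


# An ordinary string value parses without complaint.
print(parses_ok("<field>abc</field>"))

# The same value with an encoded null in the middle should be rejected.
print(parses_ok("<field>abc&#x0;def</field>"))
```

A character reference to 0x00 is not allowed by XML 1.0, so a parser
that follows the specification has little choice but to refuse it.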

>
>> Question:  Is there a magic option/flag for XML::Compile so that it
>> will ACCEPT strings with encoded null characters in them?
> No, X::C is agnostic about charsets
> You have to look at the libxml2 lists for your answers... sorry.
Yes.  You have to go into the guts of libxml2 to fix this.  I think
the correct way to do this fix is to have the senders of the data fix
up their databases and remove the junk characters.
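
As a rough idea of what that cleanup could look like on the sending
side, here is a sketch in Python (chosen only for illustration; the
same character class should carry over to a Perl regex).  It removes
everything outside the "Char" production of the XML 1.0 specification.
The function name strip_illegal_xml_chars is hypothetical, and whether
silently dropping the characters is acceptable depends on the data.

```python
import re

# Characters allowed by the XML 1.0 "Char" production:
#   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
# The negated class matches everything else, including NUL (0x00).
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text: str) -> str:
    """Return text with every character that XML 1.0 forbids removed."""
    return _ILLEGAL_XML_CHARS.sub("", text)


# A field value that picked up a null during a cut and paste.
dirty = "Smith\x00, John"
clean = strip_illegal_xml_chars(dirty)
print(repr(clean))
```

Running this over the affected database fields before the SOAP message
is built would keep the junk from ever reaching the XML layer.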


