Posted by callahan09 (24 posts)

This post deals with an XML encoding issue that I found, and I can only attest to the existence of this issue when XML is the delivery format, because I haven't worked with JSON or JSONP.

I noticed that I wasn't getting all of the characters_credits when I parsed the XML for Batman: Harley and Ivy. I was supposed to get 13 characters, but one was missing. After a bit of digging, I discovered that Fawny Cougérre was the character that wasn't coming through. The cause jumped out at me pretty quickly: it's the character with the diacritical mark, that é, that's the culprit. The XML returned by the API is declared as UTF-8 but, as I just learned, it can contain characters that my XML parser would not read until I handled the document as UTF-16.

So, to read the XML without errors when situations like this come up in the results, I made a utility for my ComicVine API library that converts the encoding of the incoming XML to UTF-16. Rather than passing the API query URL straight to an XmlTextReader, I now download the response from that URL into a string, replace encoding='UTF-8' in the XML declaration with encoding='UTF-16', load the string into a MemoryStream, and then wrap the MemoryStream in a StreamReader with the character encoding set to UTF-16. Only then do I pass the StreamReader to the XmlTextReader, and I've got a properly encoded XML document that I can parse with no problem reading results that contain non-ASCII characters.
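
For anyone who wants to try the same idea outside .NET, the steps above can be sketched roughly as follows. This is a Python illustration, not the code from my library: the function name `parse_as_utf16`, the sample document, and the assumption that the downloaded bytes decode cleanly as UTF-8 are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Matches the encoding pseudo-attribute inside the XML declaration only,
# so an ordinary element attribute named "encoding" is left untouched.
_DECL_ENCODING = re.compile(r"""(<\?xml[^>]*?\bencoding\s*=\s*)(["'])[^"']*\2""")


def parse_as_utf16(raw: bytes, source_encoding: str = "utf-8") -> ET.Element:
    """Re-encode an XML response as UTF-16 and parse it.

    Assumption: `raw` really is encoded in `source_encoding`.
    """
    # 1. Download step stand-in: turn the response bytes into a string.
    text = raw.decode(source_encoding)
    # 2. Rewrite the declaration so it agrees with the new encoding.
    text = _DECL_ENCODING.sub(r"\g<1>\g<2>UTF-16\g<2>", text, count=1)
    # 3. Re-encode the string as UTF-16 (Python's codec adds a byte order mark).
    utf16_bytes = text.encode("utf-16")
    # 4. Hand the re-encoded bytes to the XML parser.
    return ET.fromstring(utf16_bytes)


# Hypothetical sample standing in for an API response.
sample = (
    "<?xml version='1.0' encoding='UTF-8'?>"
    "<response><name>Fawny Cougérre</name></response>"
).encode("utf-8")

root = parse_as_utf16(sample)
print(root.find("name").text)
```

The declaration rewrite matters here: if the bytes are UTF-16 but the declaration still says UTF-8, a parser may reject the document because the two disagree.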