LibXML - ParseDoc fails
| ||
I'm loading my XML files from memory, so I put them into a string and then call TXMLDoc.ParseDoc() to parse them. The debug log shows something like this:

Entity: line 1: parser error : Start tag expected, '<' not found
<?xml version="1.0" encoding="utf-8"?>

I don't know where the garbage is coming from. It isn't in my string when I send the string to the debug log. Is LibXML expecting a different string format or something? Or is there a trick to this? |
| ||
Yap. I had this problem a while back. You have to comment out the calls that free the string passed to ParseDoc().

Brucey, you haven't fixed this yet? |
| ||
If you reported this a while back, it's probably my fault. I haven't updated for quite a while. I'll check for an updated version on his site. Thanks for the tip.

EDIT: Nope. There's a change to TXMLReader to do with string cleanup, but I don't think that gets called from ParseDoc. I think I see your suggested fix though. He frees the string before calling txmldoc._Create(). I think I need to store the result of that call, free the string, and then return the doc. |
| ||
Ok, well I've made the fix, but I'm still getting the same error, so I guess the fix wasn't necessary. I went and updated to 1.14 from the SVN just in case, but that didn't fix anything either. I guess it must be a problem with _xmlConvertMaxToUTF8(text).toCString() because that's what prepares the string for parsing. |
| ||
Where did your XML come from? Is it possible you have a BOM in the first few bytes? Those bytes (if it is a BOM) would be: EF BB BF |
| ||
Could be an issue related to my conversion of the string from Max 16-bit format to UTF-8, which mangles the BOM. |
| ||
Snap :-p

SuperStrict
Framework bah.libxml
Import brl.standardio

Const BOM_UTF8:String = Chr(239) + Chr(187) + Chr(191)
Local s:String = "<?xml version=~q1.0~q?><root/>"
Local doc:TxmlDoc = TxmlDoc.ParseDoc(BOM_UTF8 + s)

...
Entity: line 1: parser error : Start tag expected, '<' not found
<?xml version="1.0"?><root/> |
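Until the module copes with this itself, one possible workaround is to strip the BOM from the string before handing it to ParseDoc(). This is only a rough sketch: StripBOM is a hypothetical helper, not part of bah.libxml, and it assumes the BOM arrived either as the three raw bytes EF BB BF (one character per byte, as in the repro) or as the single decoded character U+FEFF.

```blitzmax
SuperStrict
Framework bah.libxml
Import brl.standardio

' Hypothetical helper (not part of bah.libxml): drop a leading UTF-8 BOM.
Function StripBOM:String(text:String)
	' Case 1: the BOM was read byte-by-byte, giving characters 239, 187, 191.
	If text.length >= 3 And text[0] = 239 And text[1] = 187 And text[2] = 191 Then
		Return text[3..]
	End If
	' Case 2: the BOM was already decoded into the single character U+FEFF.
	If text.length >= 1 And text[0] = $FEFF Then
		Return text[1..]
	End If
	' No BOM found: return the string unchanged.
	Return text
End Function

Const BOM_UTF8:String = Chr(239) + Chr(187) + Chr(191)
Local s:String = BOM_UTF8 + "<?xml version=~q1.0~q?><root/>"

' Parse the cleaned string instead of the raw one.
Local doc:TxmlDoc = TxmlDoc.parseDoc(StripBOM(s))
If doc Then
	Print "Parsed without the BOM"
End If
```

The check looks at character codes rather than comparing substrings, so it behaves the same whichever way the data was streamed into the string.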
| ||
Fix committed (rev 436).

Apologies for the delay in providing a fix...

Btw, TxmlDoc.parseFile() supports "Incbin::....." filenames, if you want to skip the loading into a string first. This should also handle the BOM bytes, as libxml tests for those already. The issue above was because of the byte-conversion I'm performing on the string to get it into UTF-8. |
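For reference, the Incbin route mentioned above might look something like the sketch below. The file name data.xml is made up for illustration and has to exist next to the source at compile time. The lowercase "incbin::" prefix is the usual BlitzMax spelling and is assumed here to be what parseFile() accepts.

```blitzmax
SuperStrict
Framework bah.libxml
Import brl.standardio

' Embed the file in the executable at compile time.
' "data.xml" is a hypothetical file name used only for this example.
Incbin "data.xml"

' Let parseFile() read the embedded data directly, with no intermediate string.
Local doc:TxmlDoc = TxmlDoc.parseFile("incbin::data.xml")
If doc Then
	Print "Parsed embedded XML"
Else
	Print "Could not parse embedded XML"
End If
```

This avoids the string conversion step entirely, so any BOM in the file is left for libxml to deal with.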
| ||
Thanks for that, Brucey. I'm actually reading the XML from an encrypted archive, so I stream it into a string, and parse it from there. |