File Corruption?

description

When I open a file and save it (with or without editing it), XML Notepad adds three characters to the beginning of the file: hex EF,BB,BF.
 
This doesn't make XML parsers happy.
 
Version is 2.5.2798.17141. I did a repair just to make sure. Does the same thing on two machines.
 
What's going on?

comments

fraka113 wrote Mar 21, 2008 at 3:41 PM

There is a reason why they are there.....
Snipped from http://en.wikipedia.org/wiki/Utf8#UTF-8_derivations
"Windows
Although not part of the standard, many Windows programs (including Windows Notepad) use the byte sequence EF BB BF at the beginning of a file to indicate that the file is encoded using UTF-8. This is the Byte Order Mark U+FEFF encoded in UTF-8, which appears as the ISO-8859-1 characters "ï»¿" in most text editors and web browsers not prepared to handle UTF-8."
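
The relationship in that quoted passage can be checked directly. Here is a minimal sketch in Python (the language is an assumption; nothing in this thread depends on it), where `has_utf8_bom` is a hypothetical helper name:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the data starts with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


# U+FEFF encoded as UTF-8 gives the three bytes reported in the description.
bom_bytes = "\ufeff".encode("utf-8")
print(bom_bytes.hex())

# Decoding those same bytes as ISO-8859-1 shows what an editor that is
# not prepared for UTF-8 would display at the start of the file.
print(bom_bytes.decode("iso-8859-1"))
```

To inspect a real file, read it in binary mode and pass the first few bytes to `has_utf8_bom`.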

clovett wrote Sep 26, 2008 at 9:39 AM

Actually, the UTF-8 byte order mark is part of the XML standard. See http://www.w3.org/TR/2006/REC-xml-20060816/ and in particular the following paragraph: "Entities encoded in UTF-16 MUST and entities encoded in UTF-8 MAY begin with the Byte Order Mark described by Annex H of [ISO/IEC 10646:2000], section 2.4 of [Unicode], and section 2.7 of [Unicode3] (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF). This is an encoding signature, not part of either the markup or the character data of the XML document. XML processors MUST be able to use this character to differentiate between UTF-8 and UTF-16 encoded documents." So any other XML parser that doesn't handle this is non-standard.
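
As a rough illustration of a parser treating the signature as an encoding hint rather than content, here is a sketch using Python's standard `xml.etree.ElementTree`. This is a stand-in only; it is not the parser in XML Notepad or in .NET, and the sample document and the `parse_bytes` helper are made up for the example.

```python
import codecs
import xml.etree.ElementTree as ET

# Hypothetical sample document, used only for this illustration.
XML_BODY = b'<?xml version="1.0" encoding="utf-8"?><root><item>x</item></root>'


def parse_bytes(data: bytes) -> ET.Element:
    # Passing raw bytes lets the parser do its own encoding detection,
    # which includes recognizing a leading UTF-8 signature.
    return ET.fromstring(data)


with_bom = parse_bytes(codecs.BOM_UTF8 + XML_BODY)
without_bom = parse_bytes(XML_BODY)

# Both inputs yield the same root element; the signature is not part of
# the markup or the character data.
print(with_bom.tag, without_bom.tag)
```

The key assumption is that the parser receives the original bytes. If some layer in between has already decoded them with the wrong encoding, the parser never gets the chance to recognize the signature.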

AFranklin wrote Oct 13, 2008 at 8:29 PM

While the byte order mark may be legal for UTF-8, it doesn't seem to be compatible with web services developed with .NET 2.0. Maybe the problem is with the built-in parser in .NET web services, but it just seems odd that you absolutely cannot submit UTF-8 data edited with Microsoft XML Notepad to a Microsoft .NET web service. Inserting the byte order mark invariably causes the web service to return, "Server was unable to process request. ---> Data at the root level is invalid. Line 1, position 1."

RandyJean wrote May 19, 2009 at 6:53 PM

I have had the error from .NET 2.0 web services, too. I found I can scrub the characters in the old DOS editor. It hasn't been a big problem, as it only comes up in testing, not production, but it's still an annoyance.
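
For anyone scrubbing the characters by script rather than by hand, here is a cautious sketch in Python. The `strip_utf8_bom` helper is hypothetical. It removes only the three leading signature bytes, and only after confirming that the rest of the data really is valid UTF-8, so a file in some other encoding fails loudly instead of being silently altered.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature (hypothetical helper)."""
    if not data.startswith(codecs.BOM_UTF8):
        return data  # no signature present, nothing to remove

    payload = data[len(codecs.BOM_UTF8):]
    # Raises UnicodeDecodeError if the remainder is not valid UTF-8;
    # in that case the signature should not be removed blindly.
    payload.decode("utf-8")
    return payload


sample = codecs.BOM_UTF8 + b"<root/>"
print(strip_utf8_bom(sample))
```

It is safer to write the result to a copy used for the test submission than to overwrite the original file, and to do this only for documents that declare UTF-8 or carry no encoding declaration at all.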

clovett wrote Wed at 10:41 PM

Do not delete the UTF-8 byte order mark. You can cause data corruption doing that.