Message 2652

Author lch
Recipients junkblocker, shamilbi
Date 2007-01-06.02:51:05
Content
I am having this exact problem with Mercurial 0.9.3, so it's not fixed.  But!  I
believe I can shed a whole lot of light on the subject.


The specific problem:
If you check in files with the DOS-style 0D 0A ("\r\n") EOL convention, modify
those files, and run "hg diff" *on Windows*, the diff will contain 0D 0D 0A
("\r\r\n") at the end of every line copied out of those files.
Lines of the output created entirely by Mercurial (e.g. "diff -r abcdabcdabcd -r
cdefcdefcdef Foo/bar.txt"), and lines copied from files that have the UNIX-style
0A ("\n") EOL convention, have proper 0D 0A line endings.


To reproduce:
1. Use a Windows machine.
2. Install stock Mercurial 0.9.3 from http://mercurial.berkwood.com/ .  I did
*not* configure any EOL conversion, and I assume none is being done.
3. Create a repository (hg init) and check in a file with the DOS-style (0D 0A)
EOL convention (hg add, hg ci).
4. Modify that file.
5. hg diff > x

If you examine the file "x" in the hex mode of an editor (or similar), you'll see
0D 0D 0A at the end of every line copied from your file.
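Instead of eyeballing a hex view, a small checker along these lines can count the doubled carriage returns directly. This is a minimal sketch, not part of Mercurial; `count_doubled_cr` and `check_file` are hypothetical helper names, and "x" is simply the redirect target from step 5.

```python
def count_doubled_cr(data: bytes) -> int:
    # Count the 0D 0D 0A ("\r\r\n") sequences produced by the bug.
    return data.count(b"\r\r\n")


def check_file(path: str) -> int:
    # Read in binary mode so Python does no line-ending translation of its own.
    with open(path, "rb") as f:
        return count_doubled_cr(f.read())
```

After reproducing the problem, check_file("x") should report a nonzero count; once the diff output is written correctly it should report 0.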

Note that you also get the same behavior with other commands that produce a
diff, like "hg incoming -p" and "hg outgoing -p".


The cause:
This is almost certainly because Mercurial preserves the original EOL characters
from the file, while stdout is open in "text" mode.  When Mercurial calls
sys.stdout.write(line) and "line" contains a 0A, the C runtime automatically
inserts a 0D before it, so a line that already ends in 0D 0A comes out as
0D 0D 0A.
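The translation can be simulated on any platform with Python's io.TextIOWrapper: with newline="\r\n" it rewrites every "\n" written as "\r\n", which is roughly what a text-mode stdout does on Windows. This is an illustrative sketch under that assumption, not Mercurial's actual output path; `write_text_mode` is a hypothetical helper.

```python
import io


def write_text_mode(lines, newline="\r\n"):
    # Hypothetical helper: simulate a Windows text-mode stdout in memory.
    raw = io.BytesIO()
    # With newline="\r\n", every "\n" written is translated to "\r\n",
    # the same translation text mode applies on output.
    out = io.TextIOWrapper(raw, encoding="latin-1", newline=newline)
    for line in lines:
        out.write(line)  # written as-is, original EOL characters included
    out.detach()  # flush and release the BytesIO without closing it
    return raw.getvalue()
```

For example, write_text_mode(["diff -r abcdabcdabcd Foo/bar.txt\n", "-old line\r\n"]) returns b"diff -r abcdabcdabcd Foo/bar.txt\r\n-old line\r\r\n": the Mercurial-generated header gets a proper 0D 0A, while the line copied from a DOS-style file gets 0D 0D 0A.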


The solution:
If Mercurial were written in C, the most straightforward solution would be to
use _setmode() to manage the text/binary mode of stdout.  Like so:
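#include <stdio.h>  /* stdout, _fileno, fflush */
#include <io.h>     /* _setmode */
#include <fcntl.h>  /* _O_BINARY, _O_TEXT */
fflush(stdout); _setmode(_fileno(stdout), _O_BINARY);  /* stop 0A -> 0D 0A translation */
/* print lines of diff */
fflush(stdout); _setmode(_fileno(stdout), _O_TEXT);    /* restore text mode */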

Since Mercurial is written in lovely Python, I'm not sure what the best course
of action is.  Perhaps call string = string.replace("\r\n", "\n") when running
on Windows?  You could do it at any stage of the process, though doing it when
reading the file in for the diff would save a little memory and speed up diffs
ever so slightly.
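A minimal sketch of the string.replace() idea, assuming the diff is emitted line by line through a text-mode stream; `normalize_eol` and `emit_through_text_mode` are hypothetical names, and the text-mode stream is again simulated with io.TextIOWrapper rather than the real Windows stdout.

```python
import io


def normalize_eol(text: str) -> str:
    # Collapse DOS-style "\r\n" to "\n" so that text mode, which turns
    # every "\n" into "\r\n", emits exactly one 0D 0A per line.
    return text.replace("\r\n", "\n")


def emit_through_text_mode(lines) -> bytes:
    # Hypothetical helper: write normalized lines through a simulated
    # Windows text-mode stream and return the raw bytes produced.
    raw = io.BytesIO()
    out = io.TextIOWrapper(raw, encoding="latin-1", newline="\r\n")
    for line in lines:
        out.write(normalize_eol(line))
    out.detach()  # flush and release the BytesIO without closing it
    return raw.getvalue()
```

The closer Python counterpart to the C _setmode() approach is msvcrt.setmode(sys.stdout.fileno(), os.O_BINARY), which exists only on Windows; flushing sys.stdout before switching modes keeps translated and untranslated output from mixing. Either way, the normalization should be limited to Windows as suggested above, because elsewhere no translation happens and stripping the 0D would change the line endings recorded in the diff.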
History
Date                 User  Action  Args
2007-01-06 02:51:06  lch   set     messageid: <1168051866.73.0.0414409957465.issue250@selenic.com>
2007-01-06 02:51:06  lch   set     recipients: + junkblocker, shamilbi
2007-01-06 02:51:06  lch   link    issue250 messages
2007-01-06 02:51:05  lch   create