Issue250

Title: hg diff produces extra newlines on win32
Priority: bug
Status: resolved
Superseder:
Nosy List: dim, jeanluc, junkblocker, lch, mpm, pmezard, shamilbi, tangguo77, tksoh, tonfa, wsorenson
Assigned To: pmezard
Topics: windows

Created on 2006-05-17.01:39:10 by shamilbi, last changed 2008-01-14.20:33:53 by pmezard.

Files
File name Uploaded Type
diff-crlf.diff pmezard, 2007-02-19.10:46:44 text/plain
diffoutput.txt tangguo77, 2008-01-12.22:58:06 text/plain
fixdiff.c tangguo77, 2008-01-08.21:17:30 text/plain
test-diff-newlines.diff pmezard, 2007-02-19.10:53:22 text/plain
test-pmezard.diff pmezard, 2008-01-12.23:13:55 application/octet-stream
Messages
msg4892 (view) Author: pmezard Date: 2008-01-14.20:33:53
Binary installers work too, closing the issue again.
msg4886 (view) Author: tangguo77 Date: 2008-01-13.04:05:16
My fault. The fix is working.
When I tried to inspect the diff output, I typed
python c:\hg\hg diff | gvim -
and looked for ^M characters in the display. It turns out that when vim
detects mixed "\n" and "\r\n" line endings, it automatically converts "\n" to
"\r\n" and then displays ^M. The diff output I attached to my previous report
was generated by typing the :w command in vim instead of running
python c:\hg\hg diff > out.txt. So it only appeared that the bug was not fixed.
Sorry for the confusion.

With this fix, the header part of the diff file has "\n" line endings. The
context part keeps whatever line endings the original file has, mostly "\r\n"
in a Windows environment. This will confuse some editors, like vim. But I think
it is fine, since GNU patch.exe can handle such a file with the "--binary"
option, and hg import can also handle the diff output.
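
One way to check the endings without an editor's own newline handling getting in the way is to count them from the raw bytes. The following is a minimal sketch, assuming the diff was redirected to a file; the names `count_line_endings` and `count_in_file` are illustrative only and not part of Mercurial.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR CR LF, CR LF, and bare LF line endings in raw bytes."""
    crcrlf = data.count(b"\r\r\n")
    # Every CR CR LF also contains one CR LF, so subtract those.
    crlf = data.count(b"\r\n") - crcrlf
    # Every LF that is not part of one of the above is a bare LF.
    lf = data.count(b"\n") - crlf - crcrlf
    return {"crcrlf": crcrlf, "crlf": crlf, "lf": lf}


def count_in_file(path: str) -> dict:
    # Binary mode ("rb") keeps Python from translating newlines on read.
    with open(path, "rb") as fp:
        return count_line_endings(fp.read())
```

A diff produced with the fix should report zero for `crcrlf`, with `lf` covering the header lines and `crlf` covering lines copied from CRLF files.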
msg4884 (view) Author: pmezard Date: 2008-01-12.23:13:55
Here is my test scenario, WinXP SP2, python25+pywin32, python build of crew-tip
with VS 2003 (but not py2exe):

C:\dev\mercurial>hg init test
C:\dev\mercurial>cd test
C:\dev\mercurial\test>echo a > a
C:\dev\mercurial\test>hg ci -Am t
adding a
C:\dev\mercurial\test>echo b >> a
C:\dev\mercurial\test>echo c >> a
C:\dev\mercurial\test>hg diff > test-pmezard.diff

test-pmezard.diff is attached. All code lines end in CRLF as expected. Maybe
this is related to the py2exe build. The binary stream magic is embedded in the
root hg script.
msg4883 (view) Author: tangguo77 Date: 2008-01-12.22:58:06
Here is the diff output attached.
msg4882 (view) Author: tangguo77 Date: 2008-01-12.22:56:24
a3fe91b4f6eb does not seem to fix it.

I did a pull from http://hg.intevation.org/mercurial/crew.
It has a3fe91b4f6eb.

Then I did the following:
1. python setup.py build -c mingw32
2. python setup.py build -c mingw32 py2exe -b 1
3. In the test folder, create a file test1.txt containing a single line, test1.txt
4. Check in test1.txt
5. Add two lines, test2.txt and test3.txt, to test1.txt
6. python c:\hg\hg diff
7. The output is below. The first 4 lines have the line ending "\r\n". The
context lines have the line ending "\r\r\n".
8. "python c:\hg\hg version" shows Mercurial Distributed SCM (version a76395713691)

diff -r 3c5415e417d3 test1.txt
--- a/test1.txt	Sat Jan 12 13:51:54 2008 -0800
+++ b/test1.txt	Sat Jan 12 14:48:45 2008 -0800
@@ -1,1 +1,3 @@ test1.txt
 test1.txt

+test2.txt

+test3.txt
msg4878 (view) Author: pmezard Date: 2008-01-12.15:05:27
tangguo77: maybe you can try a binary installer built after a3fe91b4f6eb and
tell us if it solves your issue ?
msg4845 (view) Author: tangguo77 Date: 2008-01-08.21:17:30
This is a little C program I use in a Windows environment to fix "hg diff"
and "hg qdiff" output before it can be used by the unixutils patch.exe command.
Hopefully we won't need it anymore in the 0.9.6 hg release.
msg4313 (view) Author: pmezard Date: 2007-11-10.21:22:34
Should be fixed in crew by a3fe91b4f6eb
msg4310 (view) Author: jeanluc Date: 2007-11-10.17:14:28
This bug still appears in 0.9.5. It's problematic for Windows users because the
Windows binary distribution can't be patched (and somewhat ironic since this is 
the only platform it's a problem for).

One workaround is to take the result of 'hg diff' and run it through the 
following Perl one-liner.

perl -pe "s/\015//" import.patch > import.corrected.patch

Because we're looking for 0D 0D 0A sequences, it would be preferable to use a 
sed sequence like s/\015\015\012/\015\012/, but stdio handling appears to be 
getting in the way; that sed pattern never matches for me. 

If anyone comes up with a better solution, please post it here.
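
The same repair can be done on raw bytes, which sidesteps the stdio translation that keeps the sed pattern from matching. This is a minimal sketch in Python, not part of Mercurial; the function names are made up for illustration, and it only touches 0D 0D 0A sequences.

```python
def collapse_crcrlf(data: bytes) -> bytes:
    """Replace each CR CR LF sequence with CR LF, leaving other bytes alone."""
    return data.replace(b"\r\r\n", b"\r\n")


def fix_patch_file(src: str, dst: str) -> None:
    # Binary mode ("rb"/"wb") keeps Python from translating newlines,
    # so the CR CR LF sequences are visible exactly as stored on disk.
    with open(src, "rb") as fin:
        data = fin.read()
    with open(dst, "wb") as fout:
        fout.write(collapse_crcrlf(data))
```

For example, `fix_patch_file("import.patch", "import.corrected.patch")` would mirror the file names used in the one-liner above.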
msg4173 (view) Author: wsorenson Date: 2007-10-24.14:58:18
this is also an issue with hg cat
msg3518 (view) Author: mpm Date: 2007-07-16.22:48:55
Demoting to bug.
msg2785 (view) Author: lch Date: 2007-02-19.22:30:45
It looks like other work is being done on this issue in parallel; see this
recent posting from the Mercurial developer mailing list:

http://www.selenic.com/pipermail/mercurial-devel/2007-February/000900.html

Maybe you all could work together?  :)
msg2784 (view) Author: nriley Date: 2007-02-19.11:54:07
This patch fixes 'hg diff' output for me but not 'hg export', where I still get
the CR-CR-LF line endings.

However, it seems this bug should be fixed at the same time as a related one in
'hg import'.  Without hacking mercurial.patch.patch to include --binary I can
import a "correct" patch (i.e., one with CR-LF line endings) but the output uses
LF endings.  With --binary, I can't import a "correct" patch any more, but a
CR-CR-LF one will import as just CR-LF and work fine.
msg2783 (view) Author: pmezard Date: 2007-02-19.10:53:22
Add test for this issue.
msg2781 (view) Author: pmezard Date: 2007-02-19.10:46:44
Here is a patch:
- util.set_binary() takes an argument to enable/disable binary mode
- a binaryfile() method returns a ui-based stream in binary mode to hide
set_binary calls and force people to clean up.
- patch.diff() uses binaryfile() as fp instead of ui.
msg2685 (view) Author: lch Date: 2007-01-09.23:04:43
You're welcome, but my name's actually Larry.  :)  And thanks for letting me
know you could use setmode() on stdio in Python; another tool for my toolbox.
msg2682 (view) Author: pmezard Date: 2007-01-09.07:25:45
> If Mercurial were written in C, the most straightforward solution would be 
> to use _setmode() to manage the text/binary mode of stdout.  Like so:
> _setmode(_fileno(stdout), _O_BINARY);
> // print lines of diff
> _setmode(_fileno(stdout), _O_TEXT);

Thank you Lee, I did not know that, you just saved my day.

You can do that in lovely python also, setmode() is available through the msvcrt
module and it works fine on stdio streams.
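
A minimal sketch of that idea, assuming a Windows-only code path guarded by `sys.platform`; the helper name `binary_stdout` is hypothetical and is not the function used in Mercurial. It reflects the Python 2 era behaviour discussed in this thread. On Python 3, as far as I know, newline translation happens in the `sys.stdout` text layer instead, so writing bytes to `sys.stdout.buffer` is the more usual route there.

```python
import contextlib
import os
import sys


@contextlib.contextmanager
def binary_stdout():
    """Temporarily put stdout in binary mode on Windows; no-op elsewhere."""
    if sys.platform != "win32":
        # Other platforms do no CR LF translation on stdout.
        yield
        return
    import msvcrt  # only available on Windows

    fd = sys.stdout.fileno()
    sys.stdout.flush()
    # Switch the C runtime descriptor to binary mode: no LF -> CR LF rewrite.
    msvcrt.setmode(fd, os.O_BINARY)
    try:
        yield
    finally:
        sys.stdout.flush()
        # Restore text mode, mirroring the C snippet quoted above.
        msvcrt.setmode(fd, os.O_TEXT)
```

The diff lines would then be written inside a `with binary_stdout():` block, so the mode is restored even if writing fails.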
msg2652 (view) Author: lch Date: 2007-01-06.02:51:05
I am having this exact problem with Mercurial 0.9.3, so it's not fixed.  But!  I
believe I can shed a whole lot of light on the subject.


The specific problem:
If you check in files with the DOS-style 0D 0A ("\r\n") EOL convention, and you
modify those files, and you run "hg diff" *on Windows*, the diff will have 0D 0D 0A
("\r\r\n") at the ends of all lines copied out of the aforementioned files.
Lines of the output created entirely by Mercurial (e.g. "diff -r abcdabcdabcd -r
cdefcdefcdef Foo/bar.txt"), and lines copied from files that have the UNIX-style
0A ("\n") EOL convention, have proper 0D 0A line endings.


To reproduce:
1. Use a Windows machine.
2. Install stock Mercurial 0.9.3 from http://mercurial.berkwood.com/ .  In
particular, I did *not* tell it to do any EOL conversion, and I assume it isn't
doing any.
3. Create a repository (hg init) and check in a file with DOS-style (0D 0A) EOL
convention (hg add, hg ci).
4. Modify that file.
5. hg diff > x

If you examine the file "x" in the hex mode of an editor (or whatnot) you'll see
0D 0D 0A at the end of every line copied from your file.

Note that you also get the same behavior with other commands that produce a
diff, like "hg incoming -p" and "hg outgoing -p".


The cause:
This is almost certainly because Mercurial is preserving the original EOL
characters from the file, and stdout is open in "text" mode.  When you
sys.stdout.write(line) and "line" contains an 0A, the C runtime automatically
prefixes it with an 0D, so a line that already ends in 0D 0A comes out as
0D 0D 0A.
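
The effect can be reproduced on any platform with a small simulation. This sketch assumes that an `io.TextIOWrapper` opened with `newline="\r\n"` is a fair stand-in for a Windows text-mode stdout; the helper name `write_through_text_mode` is made up for illustration.

```python
import io


def write_through_text_mode(line: str) -> bytes:
    """Return the bytes produced when `line` passes through an LF -> CR LF layer."""
    raw = io.BytesIO()
    # newline="\r\n" makes TextIOWrapper rewrite every LF as CR LF on write,
    # which mimics a text-mode stdout on Windows.
    stream = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")
    stream.write(line)
    stream.flush()
    return raw.getvalue()


# A header line generated with a bare LF gets a proper CR LF ending.
header = write_through_text_mode("diff -r abcdabcdabcd Foo/bar.txt\n")
# A line copied from a CR LF file already has a CR, so it ends in CR CR LF.
copied = write_through_text_mode("+some text\r\n")
```

Here `header` ends in `b"\r\n"` while `copied` ends in `b"\r\r\n"`, matching the hex dump described in the reproduction steps.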


The solution:
If Mercurial were written in C, the most straightforward solution would be to
use _setmode() to manage the text/binary mode of stdout.  Like so:
_setmode(_fileno(stdout), _O_BINARY);
// print lines of diff
_setmode(_fileno(stdout), _O_TEXT);

Since Mercurial is written in lovely Python, I'm not sure what the best course
of action is.  Perhaps call string = string.replace("\r\n", "\n") when running
on Windows?  You could do it at any stage of the process, though if done when
reading the file in to do the diff you'd save a little memory and speed up diffs
ever so slightly.
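
A minimal sketch of the replace-based alternative, again using an `io.TextIOWrapper` with `newline="\r\n"` as an assumed stand-in for a Windows text-mode stdout; `write_normalized` is a hypothetical helper, not Mercurial code.

```python
import io


def write_normalized(lines) -> bytes:
    """Strip the CR from CR LF before writing through an LF -> CR LF layer."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")
    for line in lines:
        # After this replace, every line ends in a bare LF, so the text
        # layer adds exactly one CR per line.
        stream.write(line.replace("\r\n", "\n"))
    stream.flush()
    return raw.getvalue()
```

The trade-off is that the output no longer records which EOL convention the source file used: lines from LF files and lines from CR LF files come out identical.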
msg1397 (view) Author: shamilbi Date: 2006-05-22.16:16:30
the problem was resolved when i copied Mercurial.ini from
"E:\bin\mercurial\0.9\" to "C:\Documents and Settings\$USER\", thanks to vok for
his question about "hg debugconfig".
msg1396 (view) Author: shamilbi Date: 2006-05-22.16:05:25
>Which windows version are you running on
winXP SP2
>Are you using the installer version?
yes, 0.8.1-4334be196f8d and 0.9
>hg debugconfig
msg1363 (view) Author: vok Date: 2006-05-19.15:36:44
I don't have this problem.
1. Which windows version are you running on
2. Are you using the installer version?
3. What is the output, if you run 'hg debugconfig'?
msg1293 (view) Author: shamilbi Date: 2006-05-17.04:46:51
it behaves well on linux
msg1290 (view) Author: shamilbi Date: 2006-05-17.01:39:10
example:
+ aaaa

- bbb

+ cccc

instead of:
+ aaa
- bbb
+ ccc

such behavior is only in versions 0.8.1 and 0.9 but not in 0.8
History
Date User Action Args
2008-01-14 20:33:53 pmezard set status: chatting -> resolved
nosy: mpm, tonfa, tksoh, junkblocker, shamilbi, pmezard, lch, dim, wsorenson, jeanluc, tangguo77
messages: + msg4892
2008-01-13 04:05:17 tangguo77 set nosy: mpm, tonfa, tksoh, junkblocker, shamilbi, pmezard, lch, dim, wsorenson, jeanluc, tangguo77
messages: + msg4886
2008-01-12 23:13:56 pmezard set files: + test-pmezard.diff
nosy: mpm, tonfa, tksoh, junkblocker, shamilbi, pmezard, lch, dim, wsorenson, jeanluc, tangguo77
messages: + msg4884
2008-01-12 22:58:07 tangguo77 set files: + diffoutput.txt
nosy: mpm, tonfa, tksoh, junkblocker, shamilbi, pmezard, lch, dim, wsorenson, jeanluc, tangguo77
messages: + msg4883
2008-01-12 22:56:28 tangguo77 set nosy: mpm, tonfa, tksoh, junkblocker, shamilbi, pmezard, lch, dim, wsorenson, jeanluc, tangguo77
messages: + msg4882
2008-01-12 15:05:28 pmezard set nosy: mpm, tonfa, tksoh, junkblocker, shamilbi, pmezard, lch, dim, wsorenson, jeanluc, tangguo77
messages: + msg4878
2008-01-08 21:17:31 tangguo77 set files: + fixdiff.c
nosy: + tangguo77
status: resolved -> chatting
messages: + msg4845
2008-01-08 21:14:25 tangguo77 set nosy: mpm, tonfa, tksoh, junkblocker, shamilbi, pmezard, lch, dim, wsorenson, jeanluc
2007-12-02 20:54:22 mpm set status: testing -> resolved
nosy: mpm, tonfa, tksoh, junkblocker, shamilbi, pmezard, lch, dim, wsorenson, jeanluc
2007-11-10 21:22:34 pmezard set status: chatting -> testing
nosy: mpm, tonfa, tksoh, junkblocker, shamilbi, pmezard, lch, dim, wsorenson, jeanluc
messages: + msg4313
2007-11-10 17:14:28 jeanluc set nosy: + jeanluc
messages: + msg4310
2007-11-09 23:00:52 dim set nosy: + dim
2007-10-24 14:58:18 wsorenson set nosy: + wsorenson
messages: + msg4173
2007-10-24 08:45:43 tksoh set nosy: + tksoh
2007-07-16 22:48:55 mpm set priority: urgent -> bug
nosy: + mpm
messages: + msg3518
2007-06-22 22:05:05 mpm set nosy: tonfa, junkblocker, shamilbi, pmezard, lch
assignedto: pmezard
2007-02-19 22:30:48 lch set nosy: tonfa, junkblocker, shamilbi, pmezard, lch
messages: + msg2785
2007-02-19 11:54:10 nriley set nosy: tonfa, junkblocker, shamilbi, pmezard, lch
messages: + msg2784
2007-02-19 10:53:23 pmezard set files: + test-diff-newlines.diff
nosy: tonfa, junkblocker, shamilbi, pmezard, lch
messages: + msg2783
2007-02-19 10:52:33 pmezard set nosy: tonfa, junkblocker, shamilbi, pmezard, lch
messages: - msg2782
2007-02-19 10:52:18 pmezard set nosy: tonfa, junkblocker, shamilbi, pmezard, lch
messages: + msg2782
2007-02-19 10:46:44 pmezard set files: + diff-crlf.diff
nosy: tonfa, junkblocker, shamilbi, pmezard, lch
messages: + msg2781
2007-01-09 23:04:44 lch set nosy: tonfa, junkblocker, shamilbi, pmezard, lch
messages: + msg2685
2007-01-09 13:44:02 tonfa set nosy: + tonfa
2007-01-09 07:25:47 pmezard set nosy: + pmezard
messages: + msg2682
2007-01-06 02:51:58 lch set nosy: + lch
2007-01-06 02:51:06 lch set status: resolved -> chatting
nosy: junkblocker, shamilbi
messages: + msg2652
2006-05-22 16:16:30 shamilbi set status: chatting -> resolved
nosy: junkblocker, shamilbi
messages: + msg1397
2006-05-22 16:05:25 shamilbi set nosy: junkblocker, shamilbi
messages: + msg1396
2006-05-22 16:05:04 shamilbi set nosy: junkblocker, shamilbi
messages: - msg1395
2006-05-22 15:46:49 shamilbi set nosy: junkblocker, shamilbi
messages: + msg1395
2006-05-19 15:36:44 vok set nosy: junkblocker, shamilbi
messages: + msg1363
2006-05-17 04:46:51 shamilbi set status: unread -> chatting
nosy: junkblocker, shamilbi
messages: + msg1293
2006-05-17 02:47:02 shamilbi set topic: + windows
nosy: junkblocker, shamilbi
2006-05-17 02:43:30 shamilbi set nosy: junkblocker, shamilbi
title: hd diff produces extra newlines on win32 -> hg diff produces extra newlines on win32
2006-05-17 02:35:25 junkblocker set nosy: + junkblocker
2006-05-17 01:39:10 shamilbi create