diff_file_diff3 spins


Chia-liang Kao
Hi,

Doing a merge with the following files:

$ ab -clkao- [~] wc /tmp/bad-*
       0       0       0 /tmp/bad-empty.csv
   62946   62946  880992 /tmp/bad-local.csv
   68633   68633 1029132 /tmp/bad-new.csv
  131579  131579 1910124 total

causes diff_file_diff3 to spin.

I'm testing with Perl:

use SVN::Core;
my $diff = SVN::Core::diff_file_diff3
    ('/tmp/bad-empty.csv', '/tmp/bad-local.csv', '/tmp/bad-new.csv');
warn $diff;

the files are in http://wagner.elixus.org/~clkao/diff_file_diff3.tgz

Cheers,
CLK



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: diff_file_diff3 spins

Chia-liang Kao
Chia-liang Kao <clkao <at> clkao.org> writes:
> Doing a merge with the following files:
>
> $ ab -clkao- [~] wc /tmp/bad-*
>        0       0       0 /tmp/bad-empty.csv
>    62946   62946  880992 /tmp/bad-local.csv
>    68633   68633 1029132 /tmp/bad-new.csv
>   131579  131579 1910124 total
>
> causes diff_file_diff3 to spin.

Actually it finishes after 3 minutes:

$ ab -clkao- [/tmp/diff_file_diff3] time perl diff.t
_p_svn_diff_t=SCALAR(0x804d23c) at diff.t line 4.
158.366u 1.264s 3:00.50 88.4%   10+16118k 0+0io 0pf+0w

But with /usr/bin/diff3:

$ ab -clkao- [/tmp/diff_file_diff3] time diff3 bad-empty.csv bad-local.csv bad-new.csv > /dev/null
0.455u 0.055s 0:00.64 78.1%     48+9124k 0+0io 0pf+0w

Cheers,
CLK




Re: diff_file_diff3 spins

Philip Martin
Chia-liang Kao <[hidden email]> writes:

> Chia-liang Kao <clkao <at> clkao.org> writes:
>> Doing a merge with the following files:
>>
>> $ ab -clkao- [~] wc /tmp/bad-*
>>        0       0       0 /tmp/bad-empty.csv
>>    62946   62946  880992 /tmp/bad-local.csv
>>    68633   68633 1029132 /tmp/bad-new.csv
>>   131579  131579 1910124 total
>>
>> causes diff_file_diff3 to spin.
>
> Actually it finishes after 3 minutes:
>
> $ ab -clkao- [/tmp/diff_file_diff3] time perl diff.t
> _p_svn_diff_t=SCALAR(0x804d23c) at diff.t line 4.
> 158.366u 1.264s 3:00.50 88.4%   10+16118k 0+0io 0pf+0w
>
> But with /usr/bin/diff3:
>
> $ ab -clkao- [/tmp/diff_file_diff3] time diff3 bad-empty.csv bad-local.csv bad-new.csv > /dev/null
> 0.455u 0.055s 0:00.64 78.1%     48+9124k 0+0io 0pf+0w

This is likely to be because GNU diff uses a number of heuristics to
shortcut the full diff algorithm and Subversion's diff implementation
doesn't do this.  Without the heuristics GNU diff is likely to take
much longer.  diff3 doesn't appear to allow one to disable the
heuristics but diff has the --minimal switch, try

$ diff --minimal bad-local.csv bad-new.csv
$ diff bad-local.csv bad-new.csv

to confirm that the heuristics make the difference.
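
To illustrate the trade-off described above, here is a minimal sketch in Python of a greedy shortest-edit-script search with an optional cost cap. It is not the code of GNU diff or of Subversion's diff library; the function name edit_cost, the max_cost parameter, and the crude "replace everything" fallback are all assumptions made for illustration. The point is only that the exact search does work that grows with the size of the difference, while a cap bounds the work at the price of a result that is no longer guaranteed minimal.

```python
def edit_cost(a, b, max_cost=None):
    """Return (cost, proven_minimal) for turning sequence a into sequence b.

    cost counts deleted plus inserted elements. Without max_cost the
    search is exact. With max_cost (a hypothetical knob, not a real
    diff option) the search gives up once the cost would exceed it and
    falls back to "delete all of a, insert all of b".
    """
    n, m = len(a), len(b)
    # furthest[k] is the largest x reached so far on diagonal k = x - y.
    furthest = {1: 0}
    for d in range(n + m + 1):
        if max_cost is not None and d > max_cost:
            # Heuristic bail-out: bounded work, possibly non-minimal answer.
            return n + m, False
        for k in range(-d, d + 1, 2):
            # Extend from whichever neighbouring diagonal reaches further.
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]        # step down: insert b[y]
            else:
                x = furthest[k - 1] + 1    # step right: delete a[x]
            y = x - k
            # Follow matching elements for free.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d, True
    return n + m, True  # not reached: the cost never exceeds n + m


if __name__ == "__main__":
    print(edit_cost("abcabba", "cbabac"))              # exact search
    print(edit_cost("abcabba", "cbabac", max_cost=2))  # capped search
```

With line lists read from two files in place of the strings, the uncapped call corresponds to the exhaustive behaviour, and the capped call to a (much cruder) version of a shortcut. Real heuristics are considerably more refined than this all-or-nothing fallback.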

--
Philip Martin


Re: diff_file_diff3 spins

Chia-liang Kao
On Mon, Apr 25, 2005 at 03:28:27PM +0100, Philip Martin wrote:

> > $ ab -clkao- [/tmp/diff_file_diff3] time perl diff.t
> > _p_svn_diff_t=SCALAR(0x804d23c) at diff.t line 4.
> > 158.366u 1.264s 3:00.50 88.4%   10+16118k 0+0io 0pf+0w
> >
> > But with /usr/bin/diff3:
> >
> > $ ab -clkao- [/tmp/diff_file_diff3] time diff3 bad-empty.csv bad-local.csv bad-new.csv > /dev/null
> > 0.455u 0.055s 0:00.64 78.1%     48+9124k 0+0io 0pf+0w
>
> This is likely to be because GNU diff uses a number of heuristics to
> shortcut the full diff algorithm and Subversion's diff implementation
> doesn't do this.  Without the heuristics GNU diff is likely to take
> much longer.  diff3 doesn't appear to allow one to disable the
> heuristics but diff has the --minimal switch, try
>
> $ diff --minimal bad-local.csv bad-new.csv

81.712u 0.288s 1:41.44 80.8%    56+9008k 0+0io 0pf+0w

> $ diff bad-local.csv bad-new.csv

$ ab -clkao- [/tmp/diff_file_diff3] time diff bad-local.csv bad-new.csv > /dev/null
0.102u 0.007s 0:00.11 90.9%     61+8630k 0+0io 0pf+0w

You are right.  So is there any reason that we shouldn't have such
heuristics?

Cheers,
CLK


Re: diff_file_diff3 spins

C. Michael Pilato
Chia-liang Kao <[hidden email]> writes:

> You are right.  So is there any reason that we shouldn't have such
> heuristics?

We actually have an open issue for this.  But Sander Striker (who
wrote our diff library) claims that there's no way to gracefully
wedge in these heuristics -- you basically have to rewrite the
entirety of the diff library's internals.  Which is fine, of course,
but at the time I asked him, he estimated it would take him a couple
of weeks of coding effort to bang it out.


Re: diff_file_diff3 spins

Ben Collins-Sussman
On Apr 25, 2005, at 10:37 AM, Chia-liang Kao wrote:
>
> You are right.  So is there any reason that we shouldn't have such
> heuristics?
>

Issue 1966, filed 9 months ago.

