diff Performance Tradeoffs
GNU diff runs quite efficiently; however, in some circumstances you can cause it to run faster or produce a more compact set of changes. There are several ways that you can affect the performance of GNU diff by changing the way it compares files.
Performance has more than one dimension. These options improve one aspect of performance at the cost of another, or they improve performance in some cases while hurting it in others.
The way that GNU diff determines which lines have changed always comes up with a near-minimal set of differences. Usually it is good enough for practical purposes. If the diff output is large, you might want diff to use a modified algorithm that sometimes produces a smaller set of differences. The `-d' or `--minimal' option does this; however, it can also cause diff to run more slowly than usual, so it is not the default behavior.
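As a rough illustration of the tradeoff described above, the following sketch compares the size of the default output with the size of the `-d' output. The file names old.txt and new.txt and the sample contents are made up for this example; on inputs this small the two outputs will most likely be the same size, and a difference is only to be expected on large inputs.

```shell
# Hypothetical sample files; substitute your own inputs.
tmpdir=$(mktemp -d)
printf '%s\n' a b c d e f g > "$tmpdir/old.txt"
printf '%s\n' a x c d y f g > "$tmpdir/new.txt"

# diff exits with status 1 when the files differ; `|| true` keeps the
# script going if it runs under `set -e` with pipefail enabled.
default_lines=$(diff "$tmpdir/old.txt" "$tmpdir/new.txt" | wc -l | tr -d ' ') || true
minimal_lines=$(diff -d "$tmpdir/old.txt" "$tmpdir/new.txt" | wc -l | tr -d ' ') || true

echo "default: $default_lines output lines, -d: $minimal_lines output lines"
```

Prefixing each diff invocation with `time' on your own files gives a feel for how much slower `-d' is in practice.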
When the files you are comparing are large and have small groups of changes scattered throughout them, you can use the `-H' or `--speed-large-files' option to make a different modification to the algorithm that diff uses. If the input files have a constant small density of changes, this option speeds up the comparisons without changing the output. If not, diff might produce a larger set of differences; however, the output will still be correct.
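The sketch below shows one way to check whether `-H' changes the output for a given pair of files. The generated inputs (20000 numbered lines with every 1000th line altered) and the file names are assumptions chosen to mimic small, evenly scattered changes; they are not taken from the manual.

```shell
# Hypothetical inputs: a large file with small, evenly scattered changes.
tmpdir=$(mktemp -d)
seq 1 20000 > "$tmpdir/old.txt"
awk 'NR % 1000 == 0 { print "changed " $0; next } { print }' \
    "$tmpdir/old.txt" > "$tmpdir/new.txt"

# diff exits with status 1 when the files differ, hence `|| true`.
diff "$tmpdir/old.txt" "$tmpdir/new.txt" > "$tmpdir/default.diff" || true
diff -H "$tmpdir/old.txt" "$tmpdir/new.txt" > "$tmpdir/fast.diff" || true

# cmp -s prints nothing and succeeds only if both outputs are byte-identical.
if cmp -s "$tmpdir/default.diff" "$tmpdir/fast.diff"; then
  same_output=yes
else
  same_output=no
fi
echo "output identical with and without -H: $same_output"
```

If the two outputs differ on your own data, both are still valid descriptions of the changes; the `-H' one is merely allowed to be larger.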
Normally diff discards the prefix and suffix that are common to both files before it attempts to find a minimal set of differences. This makes diff run faster, but occasionally it may produce non-minimal output. The `--horizon-lines=LINES' option prevents diff from discarding the last LINES lines of the prefix and the first LINES lines of the suffix. This gives diff further opportunities to find a minimal output.
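A minimal usage sketch for the option follows. The value 2 and the sample files are arbitrary choices for illustration; with inputs this small the option is not expected to change the result, so the example only shows the invocation and how to capture the exit status.

```shell
# Hypothetical sample files that share a common prefix and suffix.
tmpdir=$(mktemp -d)
printf '%s\n' one two three four five six > "$tmpdir/old.txt"
printf '%s\n' one two THREE four five six > "$tmpdir/new.txt"

# Keep 2 lines of the common prefix and suffix available to the comparison.
# diff exits with status 1 when the files differ, so capture it explicitly.
status=0
diff --horizon-lines=2 "$tmpdir/old.txt" "$tmpdir/new.txt" \
    > "$tmpdir/out.diff" || status=$?

echo "diff exit status: $status"
cat "$tmpdir/out.diff"
```

The option can be combined with `-d' when a smaller output matters more than speed.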