Here are some ideas for improving GNU diff and patch. The GNU project
has identified some improvements as potential programming projects for
volunteers. You can also help by reporting any bugs that you find.
If you are a programmer and would like to contribute something to the GNU project, please consider volunteering for one of these projects. If you are seriously contemplating work, please write to `gnu@prep.ai.mit.edu' to coordinate with other volunteers.
diff and patch
One should be able to use GNU diff to generate a patch from any pair of
directory trees, and given the patch and a copy of one such tree, use
patch to generate a faithful copy of the other. Unfortunately, some
changes to directory trees cannot be expressed using current patch
formats; also, patch does not handle some of the existing formats.
These shortcomings motivate the following suggested projects.
diff and patch do not handle some changes to directory structure. For
example, suppose one directory tree contains a directory named `D' with
some subsidiary files, and another contains a file with the same name
`D'. `diff -r' does not output enough information for patch to
transform the directory subtree into the file.
There should be a way to specify that a file has been deleted without
having to include its entire contents in the patch file. There should
also be a way to tell patch that a file was renamed, even if there is
no way for diff to generate such information.
These problems can be fixed by extending the diff output format to
represent changes in directory structure, and extending patch to
understand these extensions.
Some files are neither directories nor regular files: they are unusual
files like symbolic links, device special files, named pipes, and
sockets. Currently, diff treats symbolic links like regular files; it
treats other special files like regular files if they are specified at
the top level, but simply reports their presence when comparing
directories. This means that patch cannot represent changes to such
files. For example, if you change which file a symbolic link points to,
diff outputs the difference between the two files, instead of the
change to the symbolic link.
diff should optionally report changes to special files specially, and
patch should be extended to understand these extensions.
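
As an illustration of what reporting special files "specially" could
mean, the following Python sketch classifies a file without following
symbolic links and reports a changed link target rather than a content
difference. It is only a sketch: the function names and the wording of
the messages are invented for this example and are not part of diff or
patch.

```python
import os
import stat


def file_kind(path):
    """Classify PATH without following symbolic links."""
    mode = os.lstat(path).st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISREG(mode):
        return "regular file"
    if stat.S_ISFIFO(mode):
        return "named pipe"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "device special file"
    return "unknown"


def describe_special_change(old, new):
    """Return a message for a change in file type or link target, else None.

    The message format is hypothetical; no existing patch format uses it.
    """
    kind_old, kind_new = file_kind(old), file_kind(new)
    if kind_old != kind_new:
        return f"type changed: {kind_old} -> {kind_new}"
    if kind_old == "symbolic link":
        target_old, target_new = os.readlink(old), os.readlink(new)
        if target_old != target_new:
            return f"symbolic link target changed: {target_old} -> {target_new}"
    return None
```

Because the sketch uses lstat and readlink, it compares the links
themselves and never opens the files they point to.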
When a file name contains an unusual character like a newline or white
space, `diff -r' generates a patch that patch cannot parse. The problem
is with the format of diff output, not just with patch, because with
odd enough file names one can cause diff to generate a patch that is
syntactically correct but patches the wrong files. The format of diff
output should be extended to handle all possible file names.
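
One conceivable way to make every file name representable is to quote
names so that the quoted form never contains white space or a newline.
The Python sketch below shows such an encoding and its inverse. The
encoding, and the names quote_name and unquote_name, are assumptions
made for illustration; they do not describe any format that diff or
patch is known to accept.

```python
def quote_name(name: bytes) -> str:
    """Encode a file name as a double-quoted string with backslash escapes.

    Hypothetical encoding: white space, control characters and non-ASCII
    bytes become three-digit octal escapes.
    """
    simple = {0x5C: "\\\\", 0x22: '\\"', 0x0A: "\\n", 0x09: "\\t"}
    out = []
    for byte in name:
        if byte in simple:
            out.append(simple[byte])
        elif 0x20 < byte < 0x7F:
            out.append(chr(byte))
        else:
            out.append("\\%03o" % byte)
    return '"' + "".join(out) + '"'


def unquote_name(text: str) -> bytes:
    """Invert quote_name; raise ValueError on malformed input."""
    if len(text) < 2 or text[0] != '"' or text[-1] != '"':
        raise ValueError("not a quoted name")
    body = text[1:-1]
    simple = {"\\": 0x5C, '"': 0x22, "n": 0x0A, "t": 0x09}
    out = bytearray()
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ord(ch))
            i += 1
        elif body[i + 1:i + 2] in simple:
            out.append(simple[body[i + 1]])
            i += 2
        else:
            # Three octal digits; int() rejects anything else.
            out.append(int(body[i + 1:i + 4], 8))
            i += 4
    return bytes(out)
```

The point of the round trip is that a parser can find the end of a
quoted name unambiguously, whatever bytes the original name contains.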
GNU diff can analyze files with arbitrarily long lines and files that
end in incomplete lines. However, patch cannot patch such files. The
internal limits of patch on line lengths should be removed, and patch
should be extended to parse diff reports of incomplete lines.
diff operates by reading both files into memory. This method fails if
the files are too large, and diff should have a fallback.
One way to do this is to scan the files sequentially to compute hash codes of the lines and put the lines in equivalence classes based only on hash code. Then compare the files normally. This does produce some false matches.
Then scan the two files sequentially again, checking each match to see whether it is real. When a match is not real, mark both the "matching" lines as changed. Then build an edit script as usual.
The output routines would have to be changed to scan the files sequentially looking for the text to print.
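
The two-pass method described above can be sketched in Python as
follows. This is a minimal illustration under stated assumptions, not
the way diff is implemented: difflib.SequenceMatcher stands in for the
normal comparison step, zlib.crc32 is an arbitrary choice of hash, and
the function names and the hash_line parameter are invented for the
example. Only hash codes and byte offsets are kept in memory.

```python
import difflib
import zlib


def line_hashes(path, hash_line=zlib.crc32):
    """Pass 1: scan PATH once, keeping a hash code and byte offset per line."""
    hashes, offsets = [], []
    pos = 0
    with open(path, "rb") as f:
        for line in f:
            hashes.append(hash_line(line))
            offsets.append(pos)
            pos += len(line)
    return hashes, offsets


def read_line(f, offset):
    """Read the single line that starts at byte OFFSET."""
    f.seek(offset)
    return f.readline()


def changed_lines(path_a, path_b, hash_line=zlib.crc32):
    """Return two lists of flags; True means that line is changed."""
    hashes_a, offsets_a = line_hashes(path_a, hash_line)
    hashes_b, offsets_b = line_hashes(path_b, hash_line)
    changed_a = [True] * len(hashes_a)
    changed_b = [True] * len(hashes_b)

    # Compare the sequences of hash codes; this may yield false matches.
    matcher = difflib.SequenceMatcher(None, hashes_a, hashes_b, autojunk=False)

    # Pass 2: check each match against the real text. Matches arrive in
    # increasing order, so both files are read front to back.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        for block in matcher.get_matching_blocks():
            for k in range(block.size):
                i, j = block.a + k, block.b + k
                if read_line(fa, offsets_a[i]) == read_line(fb, offsets_b[j]):
                    changed_a[i] = changed_b[j] = False
                # A false match leaves both lines marked as changed.
    return changed_a, changed_b
```

Passing a deliberately weak hash such as len as hash_line forces
collisions and exercises the verification pass. An edit script would
then be built from the two lists of flags, and the output step would
scan the files again to find the text to print.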
It would be nice to have a feature for specifying two strings, one in from-file and one in to-file, which should be considered to match. Thus, if the two strings are `foo' and `bar', then if two lines differ only in that `foo' in file 1 corresponds to `bar' in file 2, the lines are treated as identical.
It is not clear how general this feature can or should be, or what syntax should be used for it.
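
Under the narrowest reading of this idea, two lines match when they are
identical, or when replacing every occurrence of the first string in
the from-file line yields the to-file line. The Python sketch below
implements only that reading. The function name and its argument order
are hypothetical, and the open questions about generality and syntax
remain open.

```python
def lines_match(from_line, to_line, from_string, to_string):
    """Return True if the lines are identical, or differ only in that
    FROM_STRING in the from-file line corresponds to TO_STRING in the
    to-file line.

    Assumption: every occurrence of FROM_STRING must be replaced.
    """
    if from_line == to_line:
        return True
    return from_line.replace(from_string, to_string) == to_line
```

For example, with the strings `foo' and `bar', the lines `x = foo(1)'
and `x = bar(1)' match, but the same pair in the opposite order does
not.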
If you think you have found a bug in GNU cmp, diff, diff3, sdiff, or
patch, please report it by electronic mail to
`bug-gnu-utils@prep.ai.mit.edu'. Send as precise a description of the
problem as you can, including sample input files that produce the bug,
if applicable.
Because Larry Wall has not released a new version of patch since mid
1988 and the GNU version of patch has been changed since then, please
send bug reports for patch by electronic mail to both
`bug-gnu-utils@prep.ai.mit.edu' and `lwall@netlabs.com'.