Most existing BibTeX bibliography files have haphazardly chosen, unsystematic citation labels that are very likely to conflict with labels in other bibliography files; biblabel and citesub provide an automatic way to rectify this.
To avoid confusion between labels with common prefixes, such as Smith80 and Smith80a, citesub checks for leading context of a left brace, quote, comma, whitespace, or beginning of line and trailing context of a right brace, comma, quote, percent, whitespace, or end of line so as to match these styles:
@Book{Smith:1980:ABC,
crossref = "Smith:1980:ABC",
crossref = {Smith:1980:ABC},
\cite{Smith:1980:ABC}
\cite{Smith:1980:ABC,Jones:1994:DEF}
\cite{%
Smith:1980:ABC,%
Jones:1994:DEF%
}
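The context rule described above can be illustrated with a regular expression. This is only a sketch of the idea, not citesub's actual code: the function name replace_label is hypothetical, and "quote" is assumed to mean the double-quote character.

```python
import re

# Leading context: left brace, quote, comma, whitespace, or beginning of line.
LEAD = r'(?:^|(?<=[{",\s]))'
# Trailing context: right brace, comma, quote, percent, whitespace, or end of line.
TRAIL = r'(?=[},"%\s]|$)'


def replace_label(line, old, new):
    """Replace the label `old` by `new` only where both contexts match."""
    pattern = LEAD + re.escape(old) + TRAIL
    # A function replacement avoids any special treatment of backslashes in `new`.
    return re.sub(pattern, lambda m: new, line, flags=re.MULTILINE)


if __name__ == "__main__":
    # Smith80 is replaced, but the longer label Smith80a is left alone.
    print(replace_label(r"\cite{Smith80,Smith80a}", "Smith80", "Smith:1980:ABC"))
```

Because the trailing context must be a delimiter, a label that is merely a prefix of a longer label is never rewritten.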
Although one might expect that simple application of standard software tools like the UNIX awk(1) and sed(1) utilities could do the string substitution job, this is not the case. For one thing, the required context sensitivity complicates the regular-expression patterns that are needed. For another, most UNIX sed(1) implementations have a built-in limit of about 100 substitutions, which is far too few for typical bibliographies. Finally, simple application of awk(1) and sed(1) involves matching every input line against every substitution pattern, which results in quadratic run-time behavior that proves impossibly slow for large bibliographies.
citesub provides an efficient solution whose run time is essentially proportional to the size of the input files, and independent of the number of substitutions to be carried out. This is achieved by tokenizing the input lines, and then looking up each token in a constant-access time (hash) table of substitutions. An initial prototype programmed in the awk language led to a final version in C that ran about 50 times faster, processing about 4000 input lines/sec on an entry-level Sun SPARCstation LX workstation.
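The tokenize-and-look-up approach can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the C implementation: the names TOKEN and substitute are hypothetical, a Python dict stands in for the hash table, and a token is assumed to be any maximal run of letters, digits, and the punctuation characters : - + / . ' _ (so the context test here is looser than the one citesub applies).

```python
import re

# Assumed token definition: a maximal run of characters that may occur in a label.
TOKEN = re.compile(r"[A-Za-z0-9:\-+/.'_]+")


def substitute(lines, table):
    """Rewrite each line, replacing every token found in `table`.

    Each token costs one dictionary lookup, so the total work grows with
    the size of the input and not with the number of substitutions.
    """
    out = []
    for line in lines:
        # Tokens absent from the table are copied through unchanged.
        out.append(TOKEN.sub(lambda m: table.get(m.group(0), m.group(0)), line))
    return out


if __name__ == "__main__":
    table = {"Smith80": "Smith:1980:ABC", "Smith80a": "Smith:1980:XYZ"}
    for line in substitute([r"\cite{Smith80,Smith80a}"], table):
        print(line)
```

Since whole tokens are looked up, Smith80 and Smith80a are distinct keys and cannot be confused with each other.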
If this option is omitted, then the substitution filename will be derived from that of the first input file by replacing its extension by .sub. Thus, the commands
citesub -f foo.sub foo.bib >foo.bib-new
and
citesub foo.bib >foo.bib-new
are equivalent.
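The default filename rule can be sketched as follows. The function name default_sub_file is hypothetical, and the handling of an input file with no extension (here, .sub is simply appended) is an assumption rather than documented behavior.

```python
import os


def default_sub_file(first_input):
    """Derive the substitution filename by replacing the extension with .sub."""
    root, _ext = os.path.splitext(first_input)
    return root + ".sub"


if __name__ == "__main__":
    print(default_sub_file("foo.bib"))
```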
If the substitution file is named "-", then citesub follows the common UNIX convention and interprets it to mean standard input, allowing the substitutions to be provided from a pipeline, such as
biblabel foo.bib | citesub -f - >foo.new
Nelson H. F. Beebe, Bibliography prettyprinting and syntax checking, TUGboat 14(3), 222, October (1993) and TUGboat 14(4), 395--419, December (1993).
Citation labels must contain only these characters:
ABCDEFGHIJKLMNOPQRSTUVWXYZ
abcdefghijklmnopqrstuvwxyz
0123456789
:-+/.'_
citesub will continue processing, but since only input tokens containing the above set of characters are candidates for substitution, such erroneous labels will not be substituted.
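A label can be checked against this character set before it is placed in a substitution file. The sketch below is illustrative only; is_valid_label is a hypothetical helper, not part of citesub, and rejecting the empty string is an assumption.

```python
import re

# Letters, digits, and the punctuation characters : - + / . ' _
VALID_LABEL = re.compile(r"[A-Za-z0-9:\-+/.'_]+")


def is_valid_label(label):
    """Return True if every character of `label` is in the permitted set."""
    return VALID_LABEL.fullmatch(label) is not None


if __name__ == "__main__":
    for label in ("Smith:1980:ABC", "Smith&Jones"):
        print(label, is_valid_label(label))
```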
Nelson H. F. Beebe, Ph.D.
Center for Scientific Computing
Department of Mathematics
University of Utah
Salt Lake City, UT 84112
Tel: +1 801 581 5254
FAX: +1 801 581 4148
Email: <beebe@math.utah.edu>