BIBSORT(1)                USER COMMANDS                BIBSORT(1)

NAME
     bibsort - sort a BibTeX bibliography file

SYNOPSIS
     bibsort [optional sort(1) switches] < infile >outfile

DESCRIPTION
     bibsort filters a BibTeX bibliography, or bibliography
     fragment, on its standard input, printing on standard output
     a sorted bibliography. Sorting is by BibTeX tag name, or by
     @String macro name, and letter case is ignored in the
     sorting.

     If no command-line switches are provided for sort(1), then
     -f is supplied to cause letter case to be ignored. If you
     also want to remove duplicate entries, you could specify the
     switches -f -u.

     The input stream is conceptually divided into four parts,
     any of which may be absent.

     1.   Introductory material such as comments, file headers,
          and edit logs that are ignored by BibTeX. No line in
          this part begins with an at-sign, ``@''.

     2.   Preamble material delineated by ``@Preamble{'' and a
          matching closing ``}'', intended to be processed by
          TeX. Normally, there is only one such entry in a
          bibliography file, although BibTeX, and bibsort, permit
          more than one.

     3.   Macro definitions of the form ``@String{...}''. A
          single macro definition may span multiple lines, and
          there are usually several such definitions.

     4.   Bibliography entries such as ``@Article{...}'',
          ``@Book{...}'', ``@Proceedings{...}'', and so on. For
          bibsort, any line that begins with an ``@'' immediately
          followed by letters and digits and an open brace is
          considered to be such an entry.

     The order of these parts is preserved in the output stream.
     Part 1 will be unchanged, but parts 2--4 will be sorted
     within themselves.

     The sort key of ``@Preamble'' entries is their initial line,
     of ``@String'' entries, the macro name, and of all BibTeX
     entries, the citation tag between the open curly brace and
     the trailing comma.
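
     As a rough illustration of how a citation tag can serve as
     a sort key, the sketch below sorts a stream that contains
     only part 4 entries. It is not the bibsort script itself:
     the function name sort_entries is hypothetical, any lines
     before the first entry are simply dropped, and the use of
     control character 001 as a temporary stand-in for newline is
     an assumption made for this example.

```shell
#!/bin/sh
# sort_entries: hypothetical helper for illustration, not part of bibsort.
# Reads BibTeX entries on stdin and writes them sorted by citation tag,
# ignoring letter case. Assumes tags contain no blanks.
sort_entries() {
    awk '
        # A line that starts with @letters{ opens a new entry.
        /^@[A-Za-z][A-Za-z0-9]*[{]/ {
            if (entry != "") print key " " entry
            key = $0
            sub(/^@[A-Za-z0-9]*[{]/, "", key)   # drop the leading @Type{
            sub(/,.*$/, "", key)                # keep the text before the comma
            entry = $0
            next
        }
        # Other lines join the current entry; \001 stands in for newline.
        entry != "" { entry = entry "\001" $0 }
        END { if (entry != "") print key " " entry }
    ' |
    LC_ALL=C sort -f -k 1,1 |    # case-insensitive sort on the key field
    cut -d ' ' -f 2- |           # remove the key that was prepended
    tr '\001' '\n'               # unfold each entry back into lines
}
```

     It could be run as sort_entries < refs.bib > sorted.bib,
     where both file names are placeholders.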

     bibsort will correctly handle UNIX files with LF line
     terminators, as well as IBM PC DOS files with CR LF line
     terminators; the essential requirement is that input lines
     be delineated by LF characters.

CAVEATS
     BibTeX has loose syntactical requirements that the current
     simple implementation of bibsort does not support. In
     particular, outer parentheses may not be used in place of
     braces following ``@keyword'' patterns, nor may there be
     leading or embedded whitespace.

     If you have such a file, you can use bibclean(1) to
     prettyprint it into a form that bibsort can handle
     successfully.

     The user must be aware that sorting a bibliography is not
     without peril, for at least these reasons:

     1.   BibTeX has a requirement that entry tags given in
          crossref = tag pairs in a bibliography entry must refer
          to entries defined later, rather than earlier, in the
          bibliography file. This regrettable implementation
          limitation of the current (pre-1.0) BibTeX prevents
          arbitrary ordering of entries when crossref values are
          present.

     2.   If the BibTeX file contains interspersed commentary
          between ``@keyword{...}'' entries, this material will
          be considered part of the preceding entry, and will be
          sorted with it. Leading commentary is more common, and
          will be moved elsewhere in the file. This is normally
          not a problem for the part 1 material before the
          ``@Preamble'', since it is kept together at the
          beginning of the output stream.

     3.   Some kinds of bibliography files should be kept in a
          different order than alphabetically by tags. A good
          example is a bibliography file with the contents of a
          journal, for which publication order is likely more
          suitable.

     While a much more sophisticated implementation of bibsort
     could deal with the first point, solving the second one
     requires human intelligence and natural language
     understanding that computers lack.
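
     The first peril can at least be detected mechanically after
     sorting. The sketch below is not part of bibsort, and the
     function name check_crossrefs is hypothetical. It assumes
     that each crossref assignment sits on a line of its own,
     that tags are compared exactly as written, and that entries
     start in the form bibsort itself requires. It reports every
     crossref whose target appears earlier in the input than the
     entry that refers to it.

```shell
#!/bin/sh
# check_crossrefs: hypothetical helper for illustration, not part of bibsort.
# Reads a BibTeX file on stdin and prints one line for each crossref whose
# target entry was already defined earlier in the input.
check_crossrefs() {
    awk '
        # Entry start: remember its tag as the current and as a seen tag.
        # (@String and @Preamble lines also match; they only add harmless
        # extra names to the table.)
        /^@[A-Za-z][A-Za-z0-9]*[{]/ {
            tag = $0
            sub(/^@[A-Za-z0-9]*[{]/, "", tag)
            sub(/,.*$/, "", tag)
            seen[tag] = 1
            next
        }
        # Matches lines such as:  crossref = "Tag",   or   crossref = {Tag},
        tolower($0) ~ /^[ \t]*crossref[ \t]*=/ {
            target = $0
            sub(/^[^=]*=[ \t]*/, "", target)   # drop through the equals sign
            gsub(/[{}", \t]/, "", target)      # drop delimiters and blanks
            if (target in seen)
                print tag ": crossref to earlier entry " target
        }
    '
}
```

     For example, bibsort < refs.bib | check_crossrefs would list
     the entries that need to be moved by hand; refs.bib is a
     placeholder name.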

     bibsort uses ASCII control characters 001 through 007 for
     temporary modifications of the input stream. If any of these
     are already present in the input, they will be altered on
     output. This is unlikely to be a problem, because those
     characters have no printable representation, nor are they
     conventionally used to mark line or page boundaries in text
     files.

PROGRAMMING NOTES
     Some text editors permit application of an arbitrary filter
     command to a region of text. For example, in GNU emacs(1),
     the command C-u M-x shell-command-on-region, or
     equivalently, C-u M-|, can be used to run bibsort on a
     region of the buffer that is devoid of cross references and
     other material that cannot be safely sorted.

     Some implementations of BibTeX editing support in GNU
     emacs(1) have a sort-bibtex-entries command that is
     functionally similar to bibsort. However, the file size that
     can be processed by emacs(1) is limited, while bibsort can
     be used on arbitrarily large files, since it acts as a
     filter, processing a small amount of data at a time. The
     sort stage needs the entire data stream, but fortunately,
     the UNIX sort(1) command is clever enough to deal with very
     large inputs.

     The current implementation of bibsort follows the UNIX
     tradition of combining simple already-available tools. A
     six-stage pipeline of egrep(1), nawk(1), sort(1), and tr(1)
     accomplishes the job in one pass with about 70 lines of
     shell script, 60 of which are a nawk(1) program for
     insertion of sort keys. bibsort was written and tested on
     several large bibliographies in a couple of hours. By
     contrast, bibtex(1) is more than 11 000 lines of code and
     documentation, and bibclean(1) is about 1500 lines long.

BUGS
     bibsort may fail on some UNIX systems if their sort(1)
     implementations cannot handle very long lines, because for
     sorting purposes, each complete bibliography entry is
     temporarily folded into a single line.
     You may be able to overcome this problem by adding a -znnnnn
     switch to the sort(1) command (passed via the command line
     to bibsort) to increase the maximum line size to some larger
     value of nnnnn bytes.

SEE ALSO
     bibclean(1), bibtex(1), egrep(1), emacs(1), nawk(1),
     sort(1), tr(1).

AUTHOR
     Nelson H. F. Beebe, Ph.D.
     Center for Scientific Computing
     Department of Mathematics
     University of Utah
     Salt Lake City, UT 84112
     Tel: (801) 581-5254
     FAX: (801) 581-4148
     Email:

Version 0.00          Last change: 13 October 1992