If no input file extension is given, .ps is supplied automatically.
The output file will be given the same base name as the input file, with its file extension set to one of .arff, .html, or .txt, according to the first command-line argument.
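The naming rule above can be sketched in a few lines of Python. This is only an illustration, not prescript's own code: the function name `resolve_names` is hypothetical, and the sketch takes the output extension directly because the exact spelling of the first command-line argument is not given here. It also assumes the output file is placed alongside the input file.

```python
import os

# The three output extensions named in the text above.
ALLOWED_EXTENSIONS = {".arff", ".html", ".txt"}


def resolve_names(input_name, output_ext):
    """Return (input_file, output_file) under the naming rule described above.

    Hypothetical helper for illustration; not part of prescript.
    """
    if output_ext not in ALLOWED_EXTENSIONS:
        raise ValueError("unsupported output extension: " + output_ext)
    base, ext = os.path.splitext(input_name)
    if not ext:
        # No input extension given, so .ps is supplied automatically.
        input_name = base + ".ps"
    # Same base name as the input, with the extension replaced.
    return input_name, base + output_ext


print(resolve_names("report", ".txt"))
print(resolve_names("paper.ps", ".html"))
```

With these assumptions, an input named `report` is read from `report.ps` and written to `report.txt`.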
prescript uses a PostScript interpreter, normally gs(1), to execute the PostScript program, so that even text that is generated programmatically, rather than being explicitly present in PostScript strings, can be extracted. Particular attention is paid to heuristic recognition of word breaks (which PostScript sadly lacks any convention for marking), to reconstruction of words hyphenated at line breaks, to preservation of paragraph breaks, and to recognition of TeX ligatures.
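To make the idea of reconstructing hyphenated words concrete, here is a deliberately simple sketch in Python. It is not prescript's actual heuristic, which is not described here. It assumes the extracted text is already available as a list of lines, and it treats a trailing hyphen followed by a line starting with a lowercase letter as a split word. The function name `rejoin_hyphenated` is hypothetical.

```python
def rejoin_hyphenated(lines):
    """Join words that were split by a hyphen at the end of a line.

    Simplified illustration only: a real tool would also need to keep
    genuine hyphens (as in "full-text") and preserve paragraph breaks.
    Blank lines are dropped in this sketch.
    """
    result = []
    for line in lines:
        text = line.strip()
        if result and result[-1].endswith("-") and text[:1].islower():
            # Move the first word of this line up to complete the split word.
            head, _, rest = text.partition(" ")
            result[-1] = result[-1][:-1] + head
            text = rest
        if text:
            result.append(text)
    return result


print(rejoin_hyphenated(["the interp-", "reter runs the pro-", "gram"]))
```

A heuristic this naive would wrongly merge a compound such as "full-" / "text" broken at a line end, which is one reason a practical extractor needs more careful rules.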
prescript is believed to be superior to all previous utilities for this purpose (see the SEE ALSO section below).
prescript is a product of the New Zealand Digital Library Project. It has been used to extract text from a 32 GB archive of more than 32,000 computer science technical reports for use in a full-text indexing system.
Craig G. Nevill-Manning, Todd Reed, and Ian H. Witten, "Extracting Text from PostScript", Software: Practice and Experience 28(5), 481-491 (1998).
David J. Miller, "Prescript: Programme Structure and Functional Description", New Zealand Digital Library Project Technical Report, March 4, 1998. URL: http://www.nzdl.org/cgi-bin/gw?c=cstr&a=page&p=Prescript&z=x-Dw2aww