Original version:
Fri Jul 18 16:32:04 1997
Last updates:
Sat Jun 13 14:03:05 1998
Thu Oct 02 00:22:12 2003
Fri Nov 12 15:55:21 2004
Writing HTML files without a validating parser is like trying to write computer programs without a compiler: don't do it! Fortunately, help is readily available on the Internet.
James Clark <jjc@jclark.com> is developing a new implementation of a suite of SGML parser tools, called SP. These include:
nsgmls: sgmls-compatible validating SGML parser.
spam
sgmlnorm
spent
Besides being a complete redesign of the earlier successful sgmls implementation, the new programs are designed for the future: they support extended character sets, such as Unicode, and various multi-byte encodings used in East Asian languages.
The new code is written almost entirely in C++ (just over 50K lines at version 1.0.1, or 2.5 times the size of Don Knuth's TeX or Metafont), and requires template support, a relatively new feature of C++ which is not yet widely available.
WARNING: To build these programs, you will need about 50MB of disk space, unless you remove the default -g compiler option. Doing so reduces the executable sizes from almost 10MB each to about 1.5MB (on a Sun SPARC Solaris 2.3 system). Alternatively, you can build them, then run the UNIX strip command on the executables to remove debug symbols.
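The second approach, stripping after the build, can be sketched as a small shell function. This is only an illustration, not part of the SP distribution: the function name strip_sp_programs and the default directory /usr/local/bin are assumptions, and the program names are the four listed earlier.

```shell
# Hypothetical helper (not part of SP): strip debug symbols from the SP
# executables found in a directory, reporting the size change of each.
# The default directory is an assumption; pass your own as the first argument.
strip_sp_programs() {
    dir=${1:-/usr/local/bin}
    for prog in nsgmls spam sgmlnorm spent; do
        file="$dir/$prog"
        if [ -f "$file" ]; then
            before=$(wc -c < "$file" | tr -d ' ')
            strip "$file" || return 1
            after=$(wc -c < "$file" | tr -d ' ')
            echo "$prog: $before -> $after bytes"
        else
            echo "$prog: not found in $dir, skipped"
        fi
    done
}
```

Calling strip_sp_programs with the directory that holds the freshly built executables prints one line per program, so the space saving is visible at a glance.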
The SP programs can be compiled and built using recent releases of GNU g++ and libg++ (2.7.1 or later: patches to gcc 2.7.0 are included in the SP distribution). g++ itself is built as part of the GNU gcc compiler installation; although that installation takes a few hours, and requires about 120MB of disk space to be able to run the validation tests before installation, it is straightforward, and should be problem free on most current UNIX systems. The GNU compiler suite has also been built on IBM PC MS-DOS and DEC OpenVMS systems, although those versions usually lag behind.
WARNING: With at least libg++ 2.7.1, there is an installation problem that has been reported to the developers: make install does not install libio.a, libiostream.a, and librx.a. libiostream.a is required for building SP, and most other C++ programs. To remedy this, I did the following steps manually in the libg++ directory:
(cd librx; make install)
cp libio/libio*.a /usr/local/lib
Unfortunately, I only discovered this problem after having built libg++ on 8 systems, and then having deleted the build trees after the make install, so I had to do it all over again, sigh...
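A quick check before deleting a libg++ build tree would have avoided the rebuild. The following is a minimal sketch, not something shipped with libg++ or SP: the function name check_gxx_libs is hypothetical, and /usr/local/lib is assumed as the default only because it is the directory used in the manual steps above.

```shell
# Hypothetical check (not part of libg++ or SP): report which of the three
# libraries named above are missing from an install directory.
# Returns 0 if all are present, 1 otherwise.
check_gxx_libs() {
    libdir=${1:-/usr/local/lib}
    missing=0
    for lib in libio.a libiostream.a librx.a; do
        if [ ! -f "$libdir/$lib" ]; then
            echo "missing: $libdir/$lib"
            missing=1
        fi
    done
    return $missing
}
```

Because the function returns a nonzero status when anything is missing, it can guard the cleanup step, for example by removing the build tree only when check_gxx_libs succeeds.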
The SP distribution site has binaries for IBM PC DOS, Intel 386 Linux, Intel 386 Windows NT 3.5, Sun SunOS 4.1.3 and Sun Solaris 2.3, so if you have such a system, you may not need to build any of the SP code from scratch, or to install g++. Binaries are also available for the previous version (0.4) for DEC Alpha OSF/1 3.x and IBM PC OS/2 systems.
Just as with sgmls, lengthy command lines are needed to run these programs successfully. To facilitate their use, I've prepared simple UNIX shell scripts html-ncheck and html-spam to hide the complexity, so that only the HTML files need to be provided on the script command lines.
If you have installed the html-check distribution, and you want to use html-spam, you need to add these lines to the end of the HTML catalog file, /usr/local/lib/html-check/lib/catalog:
-- Added at the suggestion of James Clark <jjc@jclark.com> --
-- so that spam -p doesn't output the contents of html.decl --
SGMLDECL html.decl
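Appending those catalog lines can be scripted so that running it twice does no harm. This is a minimal sketch under stated assumptions: the function name add_sgmldecl is hypothetical, and the test for an existing entry simply looks for a line beginning with SGMLDECL.

```shell
# Hypothetical helper: append the SGMLDECL entry to an SGML catalog file
# unless a line beginning with SGMLDECL is already there.
add_sgmldecl() {
    catalog=$1
    if grep -q '^SGMLDECL[[:space:]]' "$catalog"; then
        echo "SGMLDECL already present in $catalog"
    else
        cat >> "$catalog" <<'EOF'
-- Added at the suggestion of James Clark <jjc@jclark.com> --
-- so that spam -p doesn't output the contents of html.decl --
SGMLDECL html.decl
EOF
        echo "SGMLDECL added to $catalog"
    fi
}
```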
Without this change, the contents of html.decl are copied to the output if the -p option is included in the spam invocation in html-spam; omitting -p and including html.decl doesn't help, because the <!DOCTYPE ... > line is then lost.
I have successfully built sp-1.0.1 with g++ (gcc 2.7.1 [13-Nov-1995] and libg++ [15-Nov-1995]) on these systems:
using the command
make && make check && make install
On a few of these, minor problems cropped up and were solved; they are discussed further below.
I also made unsuccessful attempts to build SP with native C++ compilers on Hewlett-Packard HP-UX 10.0.1 and Silicon Graphics IRIX 5.3, with a command line like
make CXX=CC CXXFLAGS=-O DEFINES='-DANSI_CLASS_INST $(XDEFINES)'
Numerous compiler errors quickly led to my abandoning the effort.
Compilation with native Sun Solaris 2.3 CC looked initially promising, but linking failed with errors about differing sizes of particular symbols, and with many missing functions arising from template instantiation. This linking problem is just what I found with SP 0.4 on the IBM RS/6000 AIX 3.2.5 systems too.
The make step completed successfully, but the make check failed with a shell script error
./dotest
sh: bad substitution
I simply switched shells from sh to GNU bash, instead of fiddling with the dotest script:
bash < dotest
The test completed successfully, and make install worked as expected.
Mail from Michael Riedmann <Michael_Riedmann@hp.com> at Hewlett-Packard GmbH in Böblingen, Germany on 12 May 1998 reported a successful build of SP version 1.3 on HP-UX 10.20 with g++ version 2.7.2.3, after installing HP patch PHKL_8693 to fix a problem with a non-ANSI extern struct declaration in /usr/include/sys/time.h.
Once the missing libiostream.a problem (see above) was solved, I was able to complete the first successful installation of SP on the IBM RS/6000. I was previously completely unable to get version 0.4 to build successfully with either g++ or native xlC.
I also tried a build with the native C++ compiler, using
make CXX=xlC WARN= DEFINES='-DANSI_CLASS_INST $(XDEFINES)' -i
This may be close to working: here are the compilation errors produced:
sp-1.0.1/entmg: xlC -ansi -I. -I./../lib -I./../entmgr -DANSI_CLASS_INST -c \
    ExtendEntityManager.C
"ExtendEntityManager.C", line 34.1: 1540-251: (S) The previous declaration of "memmove" did not have a linkage specification.
sp-1.0.1/app: xlC -ansi -I. -I./../lib -I./../entmgr -I./../parser -I./../xentmgr \
    -DANSI_CLASS_INST -c LineOutputCodingSystem.C
"LineOutputCodingSystem.C", line 17.1: 1540-293: (W) "LineEncoder::output(const Char*,size_t,streambuf*)" hides the virtual function "Encoder::output(Char*,size_t,streambuf*)".
sp-1.0.1/nsgmls: xlC -ansi -I. -I./../lib -I./../entmgr -I./../parser -I./../xentmgr \
    -I./../app -DANSI_CLASS_INST -c nsgmls.C
"nsgmls.C", line 77.1: 1540-055: (S) "char**" cannot be converted to "const char**".
"nsgmls.C", line 77.1: 1540-306: (I) The previous message applies to argument 2 of function "getopt(int,const char**,const char*)".
sp-1.0.1/spam: xlC -ansi -I. -I./../lib -I./../entmgr -I./../parser -I./../xentmgr \
    -I./../app -DANSI_CLASS_INST -c spam.C
"spam.C", line 101.1: 1540-055: (S) "char**" cannot be converted to "const char**".
"spam.C", line 101.1: 1540-306: (I) The previous message applies to argument 2 of function "getopt(int,const char**,const char*)".
sp-1.0.1/sgmlnorm: xlC -ansi -I. -I./../lib -I./../entmgr -I./../xentmgr -I./../app \
    -I./../api -DANSI_CLASS_INST -c sgmlnorm.C
"sgmlnorm.C", line 43.1: 1540-055: (S) "char**" cannot be converted to "const char**".
"sgmlnorm.C", line 43.1: 1540-306: (I) The previous message applies to argument 2 of function "getopt(int,const char**,const char*)".
sp-1.0.1/spent: xlC -ansi -I. -I./../lib -I./../entmgr -I./../xentmgr -I./../app -DANSI_CLASS_INST -c spent.C
"spent.C", line 54.1: 1540-055: (S) "char* const*" cannot be converted to "const char**".
"spent.C", line 54.1: 1540-306: (I) The previous message applies to argument 2 of function "getopt(int,const char**,const char*)".
All of the errors about getopt() arise from confusion between const char** and char* const*. The DEC Alpha OSF/1 3.x, Hewlett-Packard HP-UX 10.x, Silicon Graphics IRIX 5.x, and Sun Solaris 2.x header files stdlib.h have the latter, while the IBM RS/6000 stdlib.h file has the former.
As an experiment, I therefore temporarily modified the file spent/spent.C to add a type cast (const char**) to the second argument of getopt(): compilation was then successful, but after adding a needed -L/usr/local/lib search path to the LIBS variable in the Makefile, linking failed with massive numbers of unresolved external names generated from templates. This is the same problem that existed with both g++ 2.6.3 and xlC with sp 0.4, and I therefore abandoned further attempts with the xlC compiler.
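One way to see which of the two getopt() forms a given system declares is to search its header for the prototype. The function below is a minimal sketch: the name show_getopt_decl and the default path /usr/include/stdlib.h are assumptions, and the pattern only catches declarations written on a single line. On systems that declare getopt() in some other header, pass that file as the argument.

```shell
# Hypothetical helper: print the getopt() declaration found in a header so
# that the type of its second argument can be inspected by eye.
# The default header path is an assumption; other systems may declare
# getopt() in a different header file.
show_getopt_decl() {
    header=${1:-/usr/include/stdlib.h}
    # -n prefixes each match with its line number; the exit status is
    # nonzero when no single-line declaration is found.
    grep -n 'getopt[[:space:]]*(' "$header"
}
```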
I modified the top-level SP Makefile to set RANLIB=ranlib. The build of SP then completed successfully, and make check passed all of the validation tests.
On Sun SunOS 4.1.3, the Makefile needs to have comment markers removed to generate the lines
LIBOBJS = strerror.o memmove.o
LIBS = -liostream -lg++ -L/usr/lang/SC1.0/ansi_lib -lansi
Without the -lansi, function strtoul was not resolved from the C or C++ libraries. The Makefile comments
# On SunOS 4, using libg++ 2.6, uncomment this.
# libg++ is needed for strtoul which is used by libiostream.
# LIBS=-liostream -lg++
incorrectly imply that strtoul can be found in libg++.a, but that is not the case. However, the function can be found in the library for the SunOS 4.x half-ANSI acc compiler.