HTML-PRETTY 1 "04 December 1997" "Version 1.00" [section 5 of 14]
html-pretty
groups HTML tags into collections of
style classes.
Tags within a single style class receive similar formatting. The
built-in style-class tag-membership lists recognize a large collection
of tags from multiple grammar levels, and multiple browser vendor
extensions, but
all
of the built-in lists can be modified by the user, as
described in the
-extend-style
and
-stylefile
options above, and the
STYLE FILES
section below.
The subsections which follow catalog the origins of the recognized
tags, and then describe the available style classes in alphabetical
order, listing their default tag members, and briefly sketching how
the tags are formatted.
Tags that are not explicitly named in these subsections, or in style
files that are read at run time, are treated as normal text, and have
no effect on the indentation or line breaking, other than their
contribution toward the line length limit.
Tags in the style classes
doctype,
line-break,
math,
plaintext,
short,
standalone,
and
standalone-nocheck
make up the set of HTML tags with SGML content
EMPTY,
which means that end tags for them are
forbidden.
html-pretty
will issue warnings about such end tags, but will leave their deletion
to a human.
HTML 2.0 contains the following 49 tags:
A,
ADDRESS,
B,
BASE,
BLOCKQUOTE,
BODY,
BR,
CITE,
CODE,
DD,
DIR,
DL,
DT,
EM,
FORM,
H1,
H2,
H3,
H4,
H5,
H6,
HEAD,
HR,
HTML,
I,
IMG,
INPUT,
ISINDEX,
KBD,
LI,
LINK,
LISTING,
MENU,
META,
NEXTID,
OL,
OPTION,
P,
PLAINTEXT,
PRE,
SAMP,
SELECT,
STRONG,
TEXTAREA,
TITLE,
TT,
UL,
VAR,
and
XMP.
HTML 3.0 augments the 2.0 grammar with 53 additional tags:
ABBREV,
ABOVE,
ACRONYM,
ARRAY,
ATOP,
AU,
BAR,
BELOW,
BIG,
BOX,
BQ,
BT,
CAPTION,
CHOOSE,
CREDIT,
DDOT,
DEL,
DFN,
DIV,
DOT,
FIG,
HAT,
INS,
ITEM,
LANG,
LEFT,
LH,
MATH,
NOTE,
OF,
OVER,
OVERLAY,
PERSON,
PRE,
Q,
RIGHT,
ROOT,
ROW,
S,
SMALL,
SQRT,
STYLE,
SUB,
SUP,
T,
TAB,
TABLE,
TD,
TH,
TILDE,
TR,
U,
and
VEC.
These tags are identified by their occurrence in the
html.dtd
and
html-3.dtd
document type definition files in lines like these:
<!ENTITY % font " TT | B | I ">
<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE ">
<!ELEMENT (%font;|%phrase) - - (%text)+>
<!ELEMENT XMP - - %literal>
ENTITY declarations define text string substitutions, and ELEMENT
declarations define the tags recognized by the grammar.
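As an illustration only, the following Python sketch shows one way tag names could be pulled out of DTD lines like those above. It is not the procedure used to build the html-pretty tag lists; the function name element_names and the regular expressions are assumptions made for this example. It expands parameter-entity references one level deep and silently drops references it has not seen defined (such as %text and %literal here).

```python
import re

# Sample lines copied from the DTD excerpt above.
DTD_LINES = [
    '<!ENTITY % font " TT | B | I ">',
    '<!ENTITY % phrase "EM | STRONG | CODE | SAMP | KBD | VAR | CITE ">',
    '<!ELEMENT (%font;|%phrase) - - (%text)+>',
    '<!ELEMENT XMP - - %literal>',
]

# <!ENTITY % name "replacement text">
ENTITY_RE = re.compile(r'<!ENTITY\s+%\s+(\S+)\s+"([^"]*)"\s*>')
# <!ELEMENT name ...>  or  <!ELEMENT (name group) ...>
ELEMENT_RE = re.compile(r'<!ELEMENT\s+(\([^)]*\)|\S+)')
# %name; with the semicolon optional, as in "%phrase)" above
REF_RE = re.compile(r'%([A-Za-z][\w.-]*);?')


def element_names(lines):
    """Return the sorted tag names declared by the ELEMENT lines."""
    entities = {}
    names = set()
    for line in lines:
        m = ENTITY_RE.match(line)
        if m:
            entities[m.group(1)] = m.group(2)
            continue
        m = ELEMENT_RE.match(line)
        if m:
            # Expand entity references once; undefined ones become empty.
            group = REF_RE.sub(lambda r: entities.get(r.group(1), ''),
                               m.group(1))
            names.update(re.findall(r'[A-Za-z][A-Za-z0-9]*', group))
    return sorted(names)


print(element_names(DTD_LINES))
```

A real DTD spreads declarations over several lines and nests entity references, so a production tool would need a proper SGML declaration parser rather than line-at-a-time regular expressions.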
HTML 3.2 introduced these 21 new tags:
APPLET,
AREA,
BASEFONT,
BIG,
CAPTION,
CENTER,
DFN,
DIV,
FILE,
FONT,
MAP,
NUMBER,
PARAM,
SCRIPT,
SMALL,
STRIKE,
STYLE,
SUB,
SUP,
TABLE,
and
U.
Proposed HTML 4.0 introduces these 15 new tags:
BDO,
BUTTON,
COL,
COLGROUP,
FIELDSET,
FRAMESET,
IFRAME,
LABEL,
NOFRAMES,
NOSCRIPT,
OBJECT,
SPAN,
TBODY,
TFOOT,
and
THEAD.
The HTML grammar permits certain end tags to be omitted, when their
implied position can be determined from the grammatical context. In
HTML 3.0, this includes the following tags:
DD,
DT,
ITEM,
LH,
LI,
OF,
OPTION,
P,
ROW,
STYLE,
and
TR.
The HTML 3.2 grammar permits these end tags to be omitted:
DD,
DT,
INPUT,
LI,
OPTION,
and
P.
The proposed HTML 4.0 grammars permit these end tags to be omitted:
COLGROUP,
DD,
DT,
LI,
OPTION,
P,
PARAM,
TFOOT,
THEAD,
and
TR.
In all grammar versions, these tags
never
have end tags:
AREA,
BASE,
BR,
FRAME,
HR,
IMG,
INPUT,
ISINDEX,
LINK,
META,
NEXTID,
PARAM,
and
PLAINTEXT.
In version 3.0,
BASEFONT
takes an optional end tag,
but in succeeding versions, it must
not
have an end tag.
In version 3.0,
STYLE
takes an optional end tag,
but in succeeding versions, it
must
have an end tag.
Supporting the tag-omission feature requires the ability to parse a
complete SGML grammar, which requires a great deal more code than
html-pretty
provides. Consequently,
html-pretty
does not support optional end tags; based on typical usage, they are
expected to be always present, or always absent, according to the
rules given below. Omitted end tags can be automatically supplied by
an SGML tag normalizer, such as
sgmlnorm(1),
spam(1),
or
html-spam(1).
html-pretty
will warn about end tags that should not be present, based on the tags'
membership in those style classes that are known not to have end tags.
However, it will not delete them from the output stream, because human
judgment may be called for. See the
HTML GRAMMAR CONSTRAINTS
section below for further details.
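The warn-but-do-not-delete policy described above can be pictured with a short Python sketch. This is a hypothetical illustration, not html-pretty's own code: the function name forbidden_end_tags is invented here, and the tag set is simply the list given earlier of tags that never have end tags in any grammar version.

```python
import re

# Tags listed above as never having end tags in any grammar version.
NO_END_TAG = {
    "AREA", "BASE", "BR", "FRAME", "HR", "IMG", "INPUT",
    "ISINDEX", "LINK", "META", "NEXTID", "PARAM", "PLAINTEXT",
}

END_TAG_RE = re.compile(r'</([A-Za-z][A-Za-z0-9]*)\s*>')


def forbidden_end_tags(text):
    """Return (line number, tag name) pairs for end tags that should
    not be present.  The input text itself is left untouched."""
    warnings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for m in END_TAG_RE.finditer(line):
            name = m.group(1).upper()
            if name in NO_END_TAG:
                warnings.append((lineno, name))
    return warnings


sample = "<P>first<BR></BR>\n<IMG SRC=x></IMG></P>"
for lineno, name in forbidden_end_tags(sample):
    print(f"line {lineno}: end tag </{name}> should not be present")
```

The sketch only reports; deciding whether a stray end tag is a typing slip or a sign of misplaced content is left to the reader, which matches the reasoning given above for not deleting such tags automatically.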
The following HTML tag name must occur only once, with a begin/end
pair, often with substantial amounts of intervening text:
BODY.
The begin/end tags are prettyprinted on separate lines, with their
enclosed text indented one level. However, the BODY environment must
occur after the HEAD environment, and one level inside the HTML
environment.
html-pretty
will supply this environment if needed, unless the
-brief
option has suppressed it.
Short HTML comments are output inline, like normal text. Long ones,
and ones with embedded angle brackets, are prettyprinted on separate
lines. Their internal form is preserved exactly, without any line
wrapping, since they will often contain specially-formatted material.
Any whitespace between the final "--" and the closing angle bracket
will be eliminated, when possible.
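A minimal Python sketch of the two comment rules above follows. It is an assumption-laden illustration: the names tidy_comment_close and comment_goes_inline are invented, and the 40-character limit is a placeholder, since the length at which a comment counts as long is not stated here.

```python
import re

# Hypothetical cut-off; the real notion of a "short" comment is not stated.
INLINE_LIMIT = 40


def tidy_comment_close(comment):
    """Drop whitespace between the final '--' and the closing '>'."""
    return re.sub(r'--\s+>\Z', '-->', comment)


def comment_goes_inline(comment, limit=INLINE_LIMIT):
    """Guess whether a comment would be set inline with running text:
    short, and with no angle brackets inside its body."""
    body = tidy_comment_close(comment)[4:-3]   # strip '<!--' and '-->'
    has_brackets = '<' in body or '>' in body
    return len(comment) <= limit and not has_brackets


print(tidy_comment_close('<!-- note --  >'))
print(comment_goes_inline('<!-- short -->'))
print(comment_goes_inline('<!-- <B>old markup</B> -->'))
```

Only the closing delimiter is touched; the comment body is returned exactly as given, in keeping with the rule that the internal form of a comment is preserved.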
The following HTML tag name occurs only once, and should normally be
the first non-comment tag in a file:
!DOCTYPE.
html-pretty
will supply this tag if needed, unless the
-brief
option has suppressed it. Strictly, this is not a tag, but rather a
markup declaration, but
html-pretty
treats it as a special tag, and outputs it verbatim while checking for
proper embedded comment balance. For more details, see the
COMMENTS IN HTML AND SGML
section below.
These HTML tag names occur in begin/end pairs, usually with smaller
amounts of enclosed material. They appear inline in the running text,
and do not alter indentation:
ACRONYM,
B,
BIG,
BLINK,
BT,
CODE,
DFN,
EM,
I,
KBD,
Q,
REV,
S,
SAMP,
SMALL,
STRIKE,
STRONG,
T,
TT,
U,
and
VAR.
This HTML tag occurs in begin/end pairs, which are prettyprinted on
separate lines with enclosed text indented one level:
HEAD.
However, this tag pair must occur only once in a file, and then only
inside an HTML environment, and before the BODY environment.
html-pretty
will supply this environment if needed, unless the
-brief
option has suppressed it.
This HTML tag occurs in begin/end pairs, which are prettyprinted on
separate lines with enclosed text indented one level:
HTML.
However, this tag pair must occur only once in a file, and then only
at the outermost level.
html-pretty
will supply this environment if needed, unless the
-brief
option has suppressed it.
Tags in this class are treated as ordinary text, with no additional
spacing requirements, or checks for enclosing environments.
Neither the default built-in style, nor any of the standard
grammar-level-specific style files, use this class. It is provided to
permit transparent handling of tags that may be added in future
versions of the HTML grammars.
There is a difference in the handling of a member of this class,
compared to that for a tag which is not defined in any class. The
latter may result in warnings if the
-unknown-tag-warning
option has been selected, is allowed only in the BODY environment,
and may cause a paragraph to end. A tag in the
inline
style class may occur in either the HEAD or BODY environments, never
raises unknown-tag warnings, and does not end a paragraph.
This HTML tag marks an explicit line break, and has no matching end
tag; preceding space is deleted, and a newline follows:
BR.
This HTML tag has no matching end tag; it appears alone on
a separate line:
LINK.
There is normally at least one LINK tag, in the HEAD environment, and
html-pretty
will supply one automatically if none is present in the input stream.
These HTML tag names occur in begin/end pairs, and delimit lists.
They appear on separate lines, with their enclosed text indented two
levels:
DIR,
DL,
MENU,
OL,
and
UL.
This HTML tag marks the title of a list:
LH.
The begin/end tags are output on separate lines, indented one level from
the enclosing list.
These HTML tags mark the beginning of list items, and have matching
end tags which are supplied if they are absent. They are output on
separate lines, indented one level from the enclosing list:
DD,
DT,
and
LI.
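Supplying absent list-item end tags can be sketched in Python with one stack entry per open list. This is a hypothetical illustration of the idea, not html-pretty's implementation; the function name supply_item_end_tags and the simple tokenizer are assumptions, and the list and item tag sets are the ones named above.

```python
import re

LIST_TAGS = {"DIR", "DL", "MENU", "OL", "UL"}
ITEM_TAGS = {"DD", "DT", "LI"}

TOKEN_RE = re.compile(r'(<[^>]*>)')
TAG_RE = re.compile(r'<(/?)([A-Za-z][A-Za-z0-9]*)')


def supply_item_end_tags(html):
    """Return the token list with missing </DD>, </DT>, </LI> added."""
    out = []
    open_items = []   # one entry per open list: its open item, or None
    for token in TOKEN_RE.split(html):
        token = token.strip()
        if not token:
            continue
        m = TAG_RE.match(token)
        name = m.group(2).upper() if m else None
        is_end = bool(m and m.group(1))
        if name in LIST_TAGS:
            if not is_end:
                open_items.append(None)
            elif open_items:
                pending = open_items.pop()
                if pending:                 # item still open at list end
                    out.append(f"</{pending}>")
        elif name in ITEM_TAGS and open_items:
            if is_end:
                open_items[-1] = None       # explicit end tag was present
            else:
                if open_items[-1]:          # previous item never closed
                    out.append(f"</{open_items[-1]}>")
                open_items[-1] = name
        out.append(token)
    return out


print(supply_item_end_tags("<UL><LI>a<LI>b<UL><LI>c</UL></UL>"))
```

Because each nested list gets its own stack entry, an item in an outer list stays open while an inner list is processed, and is closed only when the next outer item or the outer list's end tag arrives.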
The following SGML markup declarations are also treated like special
tags, and output verbatim while checking for proper embedded comment
balance:
!ATTLIST,
!ELEMENT,
!ENTITY,
!NOTATION,
!SGML,
!SHORTREF,
and
!USEMAP.
However,
html-pretty
does no further checking about where these `tags' are legal.
Generally, they do not occur in HTML files, but are found mainly in
DTD files.
These HTML tag names occur only inside a
MATH
environment, and appear inline, without end tags, and without
affecting indentation:
ATOP,
CHOOSE,
LEFT,
OF,
OVER,
RIGHT,
and
TAB.
These HTML tag names occur only inside a
MATH
environment, with begin/end pairs, and appear inline, without
affecting indentation:
ABOVE,
BAR,
BELOW,
BOX,
DDOT,
DOT,
HAT,
ROOT,
ROW,
SQRT,
SUB,
SUP,
TILDE,
and
VEC.
The following HTML tag names occur in begin/end pairs
(<TAG> and </TAG>),
often with substantial amounts of intervening text:
A,
ABBREV,
ABSTRACT,
ADDED,
ADDRESS,
APPLET,
ARG,
AROW,
ARRAY,
AU,
BDO,
BLOCKQUOTE,
BQ,
BUTTON,
CAPTION,
CENTER,
CITE,
CMD,
COLGROUP,
CREDIT,
DEL,
DIV,
DIV1,
DIV2,
DIV3,
DIV4,
DIV5,
DIV6,
FIELDSET,
FIG,
FN,
FONT,
FOOTNOTE,
FORM,
FRAMESET,
HIDE,
IFRAME,
INS,
LABEL,
LANG,
MAP,
MARGIN,
MATH,
MESSAGE,
NOFRAMES,
NOSCRIPT,
NOTE,
OBJECT,
OPTION,
PERSON,
QUOTE,
REMOVED,
SELECT,
SPAN,
STYLE,
TABLE,
TBODY,
TD,
TEXTAREA,
TFOOT,
TH,
THEAD,
and
TR.
They are prettyprinted on separate lines, with their enclosed text
indented one level.
This HTML tag occurs in begin/end pairs, which are prettyprinted on
separate lines with enclosed text indented one level:
P.
However, paragraphing is tracked, empty paragraphs are discarded, and
when new tags are encountered which are known to be illegal inside a
paragraph, any open paragraph is automatically closed. Thus,
old-style HTML files with omitted </P> tags will usually get them
added. Unlike most word processors and many typesetting systems,
blank lines in the SGML and HTML input stream do
not
imply a paragraph break; only the
<P>
tag does.
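The paragraph rules above (automatic closing, discarding of empty paragraphs, and blank lines that do not break paragraphs) can be sketched in Python as follows. This is a rough illustration under stated assumptions: close_paragraphs is an invented name, and ENDS_PARAGRAPH is a small hypothetical subset of the tags that are illegal inside a paragraph, not the program's actual list.

```python
import re

# Hypothetical subset of tags that may not appear inside a paragraph.
ENDS_PARAGRAPH = {"H1", "H2", "H3", "H4", "H5", "H6",
                  "UL", "OL", "DL", "PRE", "TABLE", "HR"}

TOKEN_RE = re.compile(r'(<[^>]*>)')
TAG_RE = re.compile(r'<(/?)([A-Za-z][A-Za-z0-9]*)')


def close_paragraphs(html):
    """Return the token list with </P> supplied and empty <P> dropped."""
    out = []
    in_para = False      # is a <P> currently open?
    para_empty = True    # has the open paragraph received any content?

    def end_para():
        nonlocal in_para, para_empty
        if in_para:
            if para_empty:
                out.pop()            # discard the lone <P>
            else:
                out.append("</P>")
        in_para = False
        para_empty = True

    for token in TOKEN_RE.split(html):
        token = token.strip()
        if not token:
            continue                 # whitespace alone never breaks a paragraph
        m = TAG_RE.match(token)
        name = m.group(2).upper() if m else None
        is_end = bool(m and m.group(1))
        if name == "P":
            end_para()
            if not is_end:
                out.append("<P>")
                in_para = True
            continue
        if name in ENDS_PARAGRAPH:
            end_para()
        elif in_para:
            para_empty = False
        out.append(token)
    end_para()                       # close a paragraph left open at end of input
    return out


print(close_paragraphs("<P>one\n\ntwo<P><P>three<H1>Title</H1>"))
```

In the sample input the blank line between "one" and "two" stays inside a single paragraph, the second <P> is dropped because nothing follows it before the third, and the <H1> tag forces the last paragraph closed.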
The HTML tag
PLAINTEXT
marks the beginning of verbatim text that continues to end-of-file; it
appears on a separate line. Although some HTML viewers will terminate
the plaintext environment on reaching a matching end tag,
</PLAINTEXT>, that practice is now considered erroneous.
html-pretty
will warn about this aberrant environment, and recommend using
<PRE> ... </PRE>
instead.
These HTML tags occur in begin/end pairs, which are prettyprinted on
separate lines with enclosed text indented one level:
H1,
H2,
H3,
H4,
H5,
and
H6.
However, they must be logically ordered: H1 before H2 ... before H6,
with no intermediate header levels omitted, and they must appear
at the first level inside the BODY environment.
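The no-skipped-levels rule can be checked with a few lines of Python. The sketch below is an illustration only, with an invented function name; it does not test the separate requirement that headers sit at the first level inside BODY, and it reads the rule as allowing a return to a shallower level at any time.

```python
import re

# Matches begin tags <H1> ... <H6> only; </H1> and <HR> do not match.
HEADING_RE = re.compile(r'<H([1-6])\b', re.IGNORECASE)


def skipped_heading_levels(html):
    """Return (previous level, new level) pairs wherever a header jumps
    more than one level deeper than the header before it.  A previous
    level of 0 means no header had been seen yet."""
    problems = []
    previous = 0
    for m in HEADING_RE.finditer(html):
        level = int(m.group(1))
        if level > previous + 1:
            problems.append((previous, level))
        previous = level
    return problems


print(skipped_heading_levels("<H1>a</H1><H2>b</H2><H4>c</H4><H2>d</H2>"))
```

For the sample input the only problem reported is the jump from H2 to H4; the later return from H4 to H2 is accepted.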
This HTML tag has no matching end tag; it appears alone on a separate
line:
ITEM.
However, tags in this class can be used only inside a BODY
environment, and consequently,
html-pretty
will automatically end any open HEAD environment, and start a BODY
environment, if needed.
These HTML tags have no matching end tag; they appear alone on
separate lines:
CHANGED,
HR,
IMG,
INPUT,
RENDER,
STYLES,
and
WBR.
However, they may appear only inside the BODY environment, and outside
a paragraph, and consequently,
html-pretty
will automatically end any open HEAD and P environments, and start a
BODY environment, if needed.
These HTML tags have no matching end tag; they may appear in either
the HEAD or the BODY environment, and they appear alone on separate
lines:
BASE,
ISINDEX,
META,
and
NEXTID.
As the class name implies, they are not checked against rules that
might restrict their placement with respect to other environments.
This HTML tag occurs in begin/end pairs, which are prettyprinted on
separate lines with enclosed text indented one level:
TITLE.
However, this tag pair is restricted to occurring only in the HEAD
environment, and should normally only be given once.
html-pretty
will supply this environment if needed, unless the
-brief
option has suppressed it, and will warn about multiple occurrences.
These HTML tags appear in begin/end pairs, delimit preformatted, or
verbatim, text, and may occur only in the BODY environment:
LISTING,
NOBR,
PRE,
and
XMP.
The beginning and ending tags are output on separate lines, with no
indentation, and with the enclosed material copied exactly as it
appeared in the input stream.
These HTML tags appear in begin/end pairs, delimit preformatted, or
verbatim, text, and may occur in either HEAD or BODY environments:
SCRIPT
and
STYLE.
The beginning and ending tags are output on separate lines, with no
indentation, and with the enclosed material copied exactly as it
appeared in the input stream.
The
SCRIPT
and
STYLE
environments are not strictly verbatim environments, but since they
contain material in one of several different scripting (Java,
JavaScript, Tcl, VBScript, ...) or style-sheet (CSS, ...)
languages, there is no reasonable way for
html-pretty
to reformat their contents, so they are included in this style class
to prevent such reformatting.