At the heart of Supercite is a regular expression interpreting engine
called Regi. Regi operates by interpreting a data structure
called a Regi-frame (or just frame), which is a list of
Regi-entries (or just entry). Each entry contains a predicate,
typically a regular expression, which is matched against a line of text
in the current buffer. If the predicate matches, an associated
expression is evaluated. In this way, an entire region of text
can be transformed in an awk-like manner. Regi is used
throughout Supercite, from mail header information extraction, to header
nuking, to citing text.
While the details of Regi are discussed below (see section Using Regi), only those who wish to customize certain aspects of Supercite need concern themselves with it. It is important to understand, though, that any conceivable citation style that can be described by a regular expression can be recognized by Supercite. This leads to some interesting applications. For example, if you regularly receive email from a co-worker who uses an uncommon citation style (say, one that employs a `|' or `}' character at the front of the line), Supercite can recognize this and coerce the citation to your preferred style, for consistency. In theory, Supercite could also recognize such things as uuencoded messages or C code and cite or fill those differently than normal text. None of this is currently part of Supercite, but contributions are welcome!
Regi works by interpreting frames with the function regi-interpret.
A frame is a list of arbitrary size where each element is an entry of
the following form:
(pred func [negate-p [case-fold-search]])
Regi starts with the first entry in a frame, evaluating the pred
of that entry against the beginning of the line that `point' is on.
If the pred evaluates to true (or false if the optional negate-p is
non-nil), then the func for that entry is evaluated. How processing
continues is determined by the return value for func, and is described
below. If pred was false,
the next entry in the frame is checked until all entries have been
matched against the current line. If no entry matches, `point' is
moved forward one line and the frame is reset to the first entry.
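The entry form and the line-by-line loop described above can be sketched with a small frame. This is an illustration only: the names my-regi-quoted-count, my-regi-count-frame, and my-regi-count-quoted are hypothetical and not part of Supercite, and the call assumes that regi-interpret accepts a frame followed by optional start and end positions, as it does in the regi.el shipped with Emacs.

```elisp
(require 'regi)

(defvar my-regi-quoted-count 0
  "Hypothetical counter, used only by this example.")

(defvar my-regi-count-frame
  '(;; Runs once, before any line is examined.
    (begin (setq my-regi-quoted-count 0))
    ;; pred is a regexp string; func is a form that Regi evaluates.
    ;; The form returns nil explicitly so Regi moves to the next line.
    ("^[ \t]*>" (progn
                  (setq my-regi-quoted-count (1+ my-regi-quoted-count))
                  nil))
    ;; Catch-all entry: matches every remaining line and does nothing.
    (t nil))
  "Hypothetical frame that counts lines starting with `>'.")

(defun my-regi-count-quoted (start end)
  "Count the quoted lines between START and END (hypothetical helper)."
  (interactive "r")
  (regi-interpret my-regi-count-frame start end)
  (message "%d quoted lines" my-regi-quoted-count))
```

Because func is a form stored in quoted data, it communicates through the special variable declared with defvar rather than through any lexical state.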
pred can be a string, a variable, a list, or one of the following
symbols: t, begin, end, or every. If pred is a string, or a variable
or list that evaluates to a string, it is interpreted as a regular
expression. This regexp is matched against the current line, from the
beginning, using looking-at. This match folds case if the optional
case-fold-search is non-nil. If pred is not a string, or does not
evaluate to a string, it is interpreted as a binary value (nil or
non-nil).
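The different kinds of pred can be laid side by side in one frame. The sketch below is data only; my-tag-string, my-header-regexp, and my-pred-shapes-frame are hypothetical names, and every func is nil as a stand-in for real work.

```elisp
(defvar my-tag-string ">>"
  "Hypothetical prefix string, used only by this example.")

(defvar my-header-regexp "^subject:"
  "Hypothetical regexp held in a variable, used only by this example.")

(defvar my-pred-shapes-frame
  '(;; String: used directly as a regexp with `looking-at'.
    ("^[ \t]*$" nil)
    ;; Variable that evaluates to a string.  The fourth element is
    ;; non-nil, so the match folds case and also accepts "Subject:".
    (my-header-regexp nil nil t)
    ;; List: evaluated first; the resulting string is the regexp.
    ((concat "^" (regexp-quote my-tag-string)) nil)
    ;; negate-p is non-nil: the entry applies to lines that do NOT
    ;; begin with ">".
    ("^>" nil t)
    ;; Not a string: treated as a plain true/false value.
    (t nil))
  "Hypothetical frame showing the shapes a pred can take.")
```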
Four special symbol values for pred are recognized:
t
begin
end
every
Note that negate-p and case-fold-search are ignored if pred is one of these special symbols. Only the first occurrence of each symbol in a frame is used; any duplicates are ignored. Also note that for performance reasons, the entries associated with these symbols are removed from the frame during the main interpreting loop.
Your func can return certain values which control continued Regi
processing. By default, if your func returns nil (as it should be
careful to do explicitly), Regi will reset the frame to the first
entry, and advance `point' to the beginning of the next line. If a
list is returned from your function, it can contain any combination of
the following elements:
continue
abort: any end entry is still processed.
(frame . newframe)
(step . step)
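A frame that uses these return values might look like the sketch below. It reflects one reading of regi.el: continue lets later entries see the same line, abort stops interpretation, and (step . N) moves N lines instead of one. The names my-regi-seen and my-regi-control-frame are hypothetical.

```elisp
(require 'regi)

(defvar my-regi-seen nil
  "Hypothetical list of tags, used only by this example.")

(defvar my-regi-control-frame
  '((begin (setq my-regi-seen nil))
    ;; Stop interpreting at a signature separator.
    ("^-- $" '(abort))
    ;; Record the match, then let the remaining entries see this line.
    ("^>" (progn (push 'quoted my-regi-seen) '(continue)))
    ;; Step two lines forward, skipping the line after the marker.
    ("^skip" '((step . 2)))
    ;; Everything else: record it and return nil explicitly.
    (t (progn (push 'line my-regi-seen) nil)))
  "Hypothetical frame exercising continue, abort, and step.")
```

Each func is a form, so a literal return list has to be quoted inside the frame, as in '(abort). Returning a non-nil value that is not a list is best avoided, which is why the last entry ends in nil.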
During execution of your func, the following variables will be temporarily bound to some useful information:
curline: the current line that Regi is looking-at, as a string.
curframe
curentry
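As a small illustration of these bindings, the hypothetical frame below collects the text of every non-blank line through curline. It assumes that curline is visible to an evaluated func form as a dynamically bound variable, which is how regi.el appears to arrange it; my-regi-lines and my-regi-collect-frame are not part of Supercite.

```elisp
(require 'regi)

(defvar my-regi-lines nil
  "Hypothetical list of collected lines, used only by this example.")

(defvar my-regi-collect-frame
  '((begin (setq my-regi-lines nil))
    ;; Blank lines are skipped.
    ("^[ \t]*$" nil)
    ;; `curline' holds the text of the line being examined.
    (t (progn (push curline my-regi-lines) nil))
    ;; Restore buffer order once the whole region has been processed.
    (end (setq my-regi-lines (nreverse my-regi-lines))))
  "Hypothetical frame that gathers non-blank lines via `curline'.")
```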
As mentioned earlier, Supercite uses various frames to perform
certain jobs such as mail header information extraction and mail header
nuking. However, these frames are not available for you to customize,
except through abstract interfaces such as sc-nuke-mail-header, et al.
However, the citation frames Supercite uses provide a lot of customizing
power and are thus available to you to change to suit your needs. The
workhorse of citation is the frame contained in the variable
sc-default-cite-frame. This frame recognizes many situations,
such as blank lines, which it interprets as paragraph separators. It
also recognizes previously cited nested and non-nested citations in the
original message. By default it will coerce non-nested citations into
your preferred citation style, and it will add a level of citation to
nested citations. It will also simply cite uncited lines in your
preferred style.
In a similar vein, there are default frames for unciting and
reciting, contained in the variables sc-default-uncite-frame and
sc-default-recite-frame, respectively.
As mentioned earlier (see section Recognizing Citations), citations are
recognized through the values of the regular expressions
sc-citation-root-regexp, et al. To recognize odd styles, you
could modify these variables, or you could modify the default citing
frame. Alternatively, you could set up association lists of frames for
recognizing specific alternative forms.
For each of the actions -- citing, unciting, and reciting -- an alist is
consulted to find the frame to use (sc-cite-frame-alist,
sc-uncite-frame-alist, and sc-recite-frame-alist, respectively).
These variables can contain alists of the form:
((infokey (regexp . frame) (regexp . frame) ...)
 (infokey (regexp . frame) (regexp . frame) ...)
 (...))
Where infokey is a key suitable for sc-mail-field, regexp is a
regular expression which is string-match'd against the value of the
sc-mail-field key, and frame is the frame to use if a match occurred.
frame can be a variable containing a frame or a frame in-lined.
When Supercite is about to cite, uncite, or recite a region, it consults the appropriate alist and attempts to find a frame to use. If one is not found from the alist, then the appropriate default frame is used.
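For instance, to select a special citing frame for mail from one correspondent, a setup along the following lines might be used. This is a sketch under assumptions: the info key "from" is taken to name the From: header, the address is a placeholder, and my-coworker-cite-frame is a hypothetical variable that starts as a copy of the default frame and would still need entries for that person's citation prefix.

```elisp
(require 'supercite)

;; Hypothetical frame variable.  It begins as the stock citing frame
;; and is meant to be edited to recognize the unusual citation style.
(defvar my-coworker-cite-frame sc-default-cite-frame
  "Hypothetical citing frame for one correspondent's messages.")

;; When the value of the "from" info key matches the regexp, the frame
;; named here is used; otherwise the default citing frame applies.
(setq sc-cite-frame-alist
      '(("from" ("coworker@example\\.org" . my-coworker-cite-frame))))
```

The same shape applies to sc-uncite-frame-alist and sc-recite-frame-alist.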