The Obstack
class is a simple rewrite of the C obstack macros and
functions provided in the GNU CC compiler source distribution.
Obstacks provide a simple method of creating and maintaining a string table, optimized for the very frequent task of building strings character-by-character, and sometimes keeping them, and sometimes not. They seem especially useful in any parsing application. One of the test files demonstrates usage.
A brief summary:
grow
finish
copy
copy
is always equivalent to first grow, then finish.
free
The other functions are less commonly needed:
blank
alloc
blank
, but it wraps up the object and returns its starting
address.
chunk_size, base, next_free, alignment_mask, size, room
grow_fast
blank_fast
blank
, but without checking if there is enough room.
shrink(int n)
contains(void* addr)
Here is a lightly edited version of the original C documentation:
These functions operate a stack of objects. Each object starts life small, and may grow to maturity. (Consider building a word syllable by syllable.) An object can move while it is growing. Once it has been "finished" it never changes address again. So the "top of the stack" is typically an immature growing object, while the rest of the stack is of mature, fixed size and fixed address objects.
These routines grab large chunks of memory, using the GNU C++ new
operator. On occasion, they free chunks, via delete
.
Each independent stack is represented by a Obstack.
One motivation for this package is the problem of growing char strings in symbol tables. Unless you are a "fascist pig with a read-only mind" [Gosper's immortal quote from HAKMEM item 154, out of context] you would not like to put any arbitrary upper limit on the length of your symbols.
In practice this often means you will build many short symbols and a
few long symbols. At the time you are reading a symbol you don't know
how long it is. One traditional method is to read a symbol into a
buffer, realloc()
ating the buffer every time you try to read a
symbol that is longer than the buffer. This is beaut, but you still will
want to copy the symbol from the buffer to a more permanent
symbol-table entry say about half the time.
With obstacks, you can work differently. Use one obstack for all symbol names. As you read a symbol, grow the name in the obstack gradually. When the name is complete, finalize it. Then, if the symbol exists already, free the newly read name.
The way we do this is to take a large chunk, allocating memory from low addresses. When you want to build a symbol in the chunk you just add chars above the current "high water mark" in the chunk. When you have finished adding chars, because you got to the end of the symbol, you know how long the chars are, and you can create a new object. Mostly the chars will not burst over the highest address of the chunk, because you would typically expect a chunk to be (say) 100 times as long as an average object.
In case that isn't clear, when we have enough chars to make up the object, they are already contiguous in the chunk (guaranteed) so we just point to it where it lies. No moving of chars is needed and this is the second win: potentially long strings need never be explicitly shuffled. Once an object is formed, it does not change its address during its lifetime.
When the chars burst over a chunk boundary, we allocate a larger chunk, and then copy the partly formed object from the end of the old chunk to the beginning of the new larger chunk. We then carry on accreting characters to the end of the object as we normally would.
A special version of grow is provided to add a single char at a time to a growing object.
Summary:
The obstack data structure is used in many places in the GNU C++ compiler.
Differences from the the GNU C version
init
and begin
macros are replaced by constructors.
grow
, grow0
, etc.
new
operator.
This restricts flexibility by a little, but maintains compatibility
with usual C++ conventions.
terminator
, and then calls
finish()
. This enables the normal invocation of finish(0)
to
wrap up a string being grown character-by-character.
s
after computing its length.