Tom Lord's Hackery

Towards Data Structures for Tla 2.0

draft A

tla 1.x uses ./src/tla/libawk to provide a bunch of "awk-like" data structures. libawk is good, as far as it goes, but it isn't suitable for a thoroughly librified libarch.

problems with the current libawk:

bogus error handling -- libawk makes little attempt to propagate errors to callers in an orderly way: it assumes it is running in a one-shot (short-lived) process and is free to exit on error.

leaky abstraction barrier -- programs using the current libawk too often wind up referring to libawk strings as t_uchar * (which is incompatible with the Unicode plans) or, even worse, explicitly freeing, allocating, or modifying supposed-to-be-opaque fields of libawk data structures. The API isn't quite a clean abstraction.

missing functionality -- 2.0 needs Unicode support which would be hard to retrofit onto libawk. While working on 1.x, I sorely missed some minor generalizations of libawk such as number valued list and table entries.

awkward memory management (no pun intended) -- Programs must explicitly free all libawk data structures allocated as "stack locals". It is easy to flub this, at least along some execution paths -- the result would be a memory-leaking 2.0 library.

Here is what I plan to take the place of libawk in tla 2.0.

Namespaces

The central data structure used by libarch in tla-2.0 is called a namespace.

Roughly speaking, a namespace is a kind of dictionary: a dynamically modifiable collection of named variables. Programs create and delete variables within a namespace. Programs read and write the values of variables in a namespace.

However, namespaces have considerably more structure than your average dictionary:

Namespaces at a Glance

A namespace is a data structure which maps variable names to locations. What is a "variable name"? What is a "location"?

A location is mutable storage for a single scalar value. The set of scalar values includes strings, numbers, symbols, booleans, and the value nil. A location works similarly to a C variable or structure field of scalar type: programs can read the value stored there; programs can store a new value there.

        Need a picture here.
        A "location" pictured as a box, 
        containing a scalar.
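
Until there is a picture, a C sketch may help. This is only an illustration of the idea, not the planned representation: the names `scalar_type`, `struct location`, and the helper functions are hypothetical, and the string and symbol payloads are simplified to a borrowed pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the kinds of scalar a location can hold. */
enum scalar_type
{
  SCALAR_NIL,
  SCALAR_NUMBER,
  SCALAR_STRING,
  SCALAR_SYMBOL,
  SCALAR_BOOLEAN
};

/* A location: mutable storage for exactly one scalar value. */
struct location
{
  enum scalar_type type;
  union
  {
    long number;          /* SCALAR_NUMBER */
    const char * text;    /* SCALAR_STRING or SCALAR_SYMBOL (not owned here) */
    int boolean;          /* SCALAR_BOOLEAN, always 0 or 1 */
  } v;
};

/* Store a number, replacing whatever the location held before. */
static void
location_store_number (struct location * loc, long n)
{
  assert (loc != NULL);
  loc->type = SCALAR_NUMBER;
  loc->v.number = n;
}

/* Store nil, replacing whatever the location held before. */
static void
location_store_nil (struct location * loc)
{
  assert (loc != NULL);
  loc->type = SCALAR_NIL;
  loc->v.text = NULL;
}

/* Return 1 if the location currently holds a number, else 0. */
static int
location_is_number (const struct location * loc)
{
  return loc != NULL && loc->type == SCALAR_NUMBER;
}
```

The tag travels with the storage, so reading a location always starts by asking what kind of scalar is there.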

    

A variable name is comprised of at least a scope number and identifier. The scope number is a small integer, the identifier is a string (constrained to conform to "identifier name syntax").

A namespace contains 1 or more dynamically allocated scopes. Each scope is a disjoint namespace: the same identifier may be bound to two distinct locations, in two different scopes.

Simple Variables

A simple variable name is comprised of only a scope number and identifier. Given just the identifier naming a simple variable, and its scope, programs can find its location and therefore both read and modify the value of the variable.

	Need a picture here.
        A scope, containing only simple variables, 
        pictured as a 2-col table with variable names
        in the left column, scalar values in the right.

        A namespace pictured (for now) as an array
        of such scopes.

      

Non-Simple Variables: Lists and Tables

Some identifiers, however, are bound to more complex variables.

A list variable is an identifier bound within a namespace scope to a dynamically resizable 1-d array of locations.

To find a location within a list variable, a program must supply a scope, an identifier name, and an integer list offset.

Similarly, a table variable is an identifier bound (within a given scope) to a dynamically resizable array of rows, each row being a dynamically resizable array of columns, each column within a row being a single location.

To find a location within a table variable, a program must supply a scope, an identifier name, an integer row offset, and an integer column offset.

	Picture

        Picture scopes as a table:

	     name:		binding:

	     simple_var		[ 42 ]

	     list_var		size=2
				[ "hello" ]
                                [ "world" ] 

	     table_var		n_rows=2
				row[0]= [ "hello" ] [ "world" ] 
				row[1]= [ "hello" ] [ "sailor" ] [14]


        A composite of several of those forming a namespace.

	Some variable names with arrows pointing to the addressed
        location.

	E.g.   "simple_var" points to the box around 42
		"table_var[1][1]" points to the box around "sailor"




      

Fancy Tricks With Scopes

The namespace data structure provides somewhat efficient operations to:

Clear a scope -- Remove all bindings from an indicated scope. (All names in the scope are, in effect, simple variables whose current value is nil.)

Push a scope -- Clear the scope but, first, remember the old values on a stack.

Pop a scope -- The inverse of pushing a scope.

There are some "standard scopes", intended to be used with these operations, used to implement a simple (albeit heavyweight) "calling convention" based on namespaces. The standard scopes are:

     environment_scope
     global_scope
     params_scope
     locals_scope
     returns_scope

      

Overview of Using Namespaces

The namespace data structure will be used in tla-2.0 to provide a uniform and completely "reflective" API to libarch.

Using tla-1.x, a "client program" of libarch has little choice but to run tla as a subprocess. To invoke a libarch entry point today, a program has to build an argv array of the parameters, fork and exec the command, wait for the command and collect any return parameters.

Using namespaces, a 2.0 client will do something very similar, but considerably easier in the details:

To run a command, a client can: (1) allocate a namespace; (2) initialize the namespace by setting variables to reflect parameters to the command (and the name of the command to run); (3) call arch_run; (4) read back exit status and results from the namespace; (5) free the namespace.

(There is also the possibility of namespaces persisting across multiple libarch invocations, of course.)

Function: {`alloc_namespace'}

Prototype:

      ssize_t alloc_namespace (void);

    

Description:

Normally, return a small positive integer: the "namespace descriptor" for a newly allocated namespace.

Upon a recoverable allocation failure (a retry might succeed if the allocator permits it), return 0.

Upon catastrophic failure, return a value less than 0. Here and elsewhere, a catastrophic failure (usually indicated by a return value less than 0) indicates that most calls into libarch are no longer safe. Callers receiving a "catastrophic error" return value should, presumably, arrange to make an emergency exit from their process as quickly as possible.
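
A caller might fold this three-way convention into a small helper. The following is a sketch only: `enum call_outcome`, `classify_descriptor`, `alloc_with_retry`, and the stand-in allocator `flaky_alloc` are hypothetical names, and `long` stands in for `ssize_t` to keep the example portable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for the three outcomes described above. */
enum call_outcome
{
  CALL_OK,            /* positive result: a usable descriptor */
  CALL_RETRYABLE,     /* 0: recoverable failure, a retry might succeed */
  CALL_CATASTROPHIC   /* negative: stop calling into the library */
};

/* Classify the return value of a descriptor-returning call such as
 * alloc_namespace, following the convention in the text. */
static enum call_outcome
classify_descriptor (long rc)
{
  if (rc > 0)
    return CALL_OK;
  if (rc == 0)
    return CALL_RETRYABLE;
  return CALL_CATASTROPHIC;
}

/* Call `alloc' up to `max_tries' times, retrying only on the
 * recoverable (0) result.  Returns the last value `alloc' returned,
 * or 0 if `max_tries' is not positive. */
static long
alloc_with_retry (long (*alloc) (void), int max_tries)
{
  long rc = 0;
  int i;

  assert (alloc != NULL);
  for (i = 0; i < max_tries; ++i)
    {
      rc = alloc ();
      if (classify_descriptor (rc) != CALL_RETRYABLE)
        break;
    }
  return rc;
}

/* Stand-in allocator for illustration: fails once, then returns 1. */
static long
flaky_alloc (void)
{
  static int calls = 0;

  return (calls++ == 0) ? 0 : 1;
}
```

A negative result is passed straight back to the caller, who is expected to bail out rather than retry.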

Function: {`free_namespace'}

Prototype:

      int free_namespace (ssize_t nspace);

    

Description:

nspace should be a descriptor previously returned by alloc_namespace.

Free the indicated namespace and release all associated resources.

Normally, return 0.

Return a negative value upon a catastrophic error.

Should not, currently, return a positive value.

Scopes

Each namespace data structure can contain multiple, disjoint identifier name mappings at once. Each disjoint mapping is called a namespace scope.

In other words, a namespace contains N scopes. Every identifier name can be bound to a variable in each of those N scopes. Two scopes can contain separate variables for a single name.

Function: {`namespace_create_scope'}

Prototype:

      ssize_t namespace_create_scope (ssize_t nspace);

    

Description:

Return a positive integer which serves as the name for a new disjoint identifier mapping within nspace, a namespace previously returned by alloc_namespace.

Return -1 for catastrophic failures and 0 for potentially transient failures (such as some kind of allocation failure).

There is no corresponding function to release a previously allocated scope. Programs are not expected to create large numbers of scopes.

Standard Scopes

This section defines {`namespace_environment'}, {`namespace_globals'}, {`namespace_locals'}, {`namespace_params'}, {`namespace_returns'}.

The namespace library provides some standard, built-in scopes. The integer identifiers for these scopes are the same in all namespace instances:

Prototypes:

      ssize_t namespace_environment (void);
      ssize_t namespace_globals (void);
      ssize_t namespace_locals (void);
      ssize_t namespace_params (void);
      ssize_t namespace_returns (void);

    

Description:

Return the scope number for each of the 5 standard scopes.

Scope Lists

Every scope represents a mapping from identifiers to variables. In fact, scopes have additional structure beyond that.

Let's call a simple mapping from identifiers to variables a symbol table.

A scope then has two parts: a list of symbol tables and a current offset into that list.

In pseudo-code, we might declare a scope data structure this way:

      struct scope
      {
        int current_list_pos;
        list_of<struct symbol_table> symbtabs;
      }

    

If a user asks for the variable named X in scope S, then X is looked up in the symbol table S.symbtabs[S.current_list_pos].
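
Rendered in plain C, the lookup rule might look like the sketch below. Everything here is hypothetical scaffolding: fixed-size arrays stand in for dynamic lists, a variable is reduced to a single `long`, and `scope_lookup` and `scope_bind` are illustrative names, not proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_BINDINGS 8
#define MAX_SYMBTABS 4

/* One identifier bound to one value; a real variable would be richer. */
struct binding
{
  const char * name;
  long value;
};

/* A symbol table: a simple mapping from identifiers to variables. */
struct symbol_table
{
  int n_bindings;
  struct binding bindings[MAX_BINDINGS];
};

/* A scope: a list of symbol tables plus the current position. */
struct scope
{
  int current_list_pos;
  int n_symbtabs;
  struct symbol_table symbtabs[MAX_SYMBTABS];
};

/* Look `name' up in S.symbtabs[S.current_list_pos] only.
 * Returns a pointer to the binding, or NULL if `name' is unbound. */
static struct binding *
scope_lookup (struct scope * s, const char * name)
{
  struct symbol_table * t;
  int i;

  assert (s != NULL && name != NULL);
  assert (s->current_list_pos >= 0 && s->current_list_pos < s->n_symbtabs);
  t = &s->symbtabs[s->current_list_pos];
  for (i = 0; i < t->n_bindings; ++i)
    if (!strcmp (t->bindings[i].name, name))
      return &t->bindings[i];
  return NULL;
}

/* Bind `name' to `value' in the current symbol table of `s'.
 * Returns 0 on success, 1 if the table is full. */
static int
scope_bind (struct scope * s, const char * name, long value)
{
  struct binding * b = scope_lookup (s, name);
  struct symbol_table * t = &s->symbtabs[s->current_list_pos];

  if (b != NULL)
    {
      b->value = value;
      return 0;
    }
  if (t->n_bindings == MAX_BINDINGS)
    return 1;
  t->bindings[t->n_bindings].name = name;
  t->bindings[t->n_bindings].value = value;
  t->n_bindings++;
  return 0;
}
```

Changing `current_list_pos` is enough to make the same identifier resolve to a different variable, since the other symbol tables are never consulted.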

Using Scope Lists as Stacks

Function: {`namespace_push_scope'}

      int namespace_push_scope (ssize_t nspace, ssize_t scope);

      

Allocate a new symbol table and append it to the symbtabs list of the indicated scope. Set the current_list_pos of that scope to point to this newly appended symbol table. The new symbol table is initially empty (no identifiers bound to variables).

Function: {`namespace_pop_scope'}

      int namespace_pop_scope (ssize_t nspace, ssize_t scope);

      

Discard the last element of the symbtabs list of the indicated scope. Set the current_list_pos pointer of the scope to point to the new last element of symbtabs.

If this operation would otherwise leave the symbtabs list empty, instead, the list is reinitialized to contain a single symbol table, initially containing no bindings.
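
The push and pop rules, including the "never leave the list empty" case, can be sketched as follows. This is a toy model under stated assumptions: a symbol table is reduced to a count of bindings, and `scope_init`, `scope_push`, and `scope_pop` are hypothetical names operating on a bare struct rather than on a namespace descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in symbol table: only a count of bindings is tracked here. */
struct symbol_table
{
  int n_bindings;
};

struct scope
{
  size_t current_list_pos;
  size_t n_symbtabs;
  struct symbol_table * symbtabs;   /* heap array of n_symbtabs tables */
};

/* Initialize a scope holding one empty symbol table.
 * Returns 0 on success, 1 on allocation failure. */
static int
scope_init (struct scope * s)
{
  assert (s != NULL);
  s->symbtabs = calloc (1, sizeof *s->symbtabs);
  if (!s->symbtabs)
    return 1;
  s->n_symbtabs = 1;
  s->current_list_pos = 0;
  return 0;
}

/* Append an empty symbol table and make it current.
 * Returns 0 on success, 1 on allocation failure (scope unchanged). */
static int
scope_push (struct scope * s)
{
  struct symbol_table * grown;

  grown = realloc (s->symbtabs, (s->n_symbtabs + 1) * sizeof *grown);
  if (!grown)
    return 1;
  s->symbtabs = grown;
  s->symbtabs[s->n_symbtabs].n_bindings = 0;
  s->current_list_pos = s->n_symbtabs;
  s->n_symbtabs++;
  return 0;
}

/* Discard the last symbol table and make the new last one current.
 * If that would leave the list empty, reset the one remaining table
 * to contain no bindings instead. */
static void
scope_pop (struct scope * s)
{
  if (s->n_symbtabs > 1)
    s->n_symbtabs--;
  else
    s->symbtabs[0].n_bindings = 0;
  s->current_list_pos = s->n_symbtabs - 1;
}
```

Popping the last remaining table behaves like clearing the scope, which matches the rule that the list is reinitialized rather than emptied.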

Randomly Accessing Scope Lists

Function: {`namespace_n_scope_elements'}

      ssize_t namespace_n_scope_elements (ssize_t nspace, 
                                          ssize_t scope);

      

Return the number of elements in the symbtabs list of the indicated scope.

Function: {`namespace_set_symbtab'}

      ssize_t namespace_set_symbtab (ssize_t nspace, 
                                     ssize_t scope,
                                     ssize_t symbtab_list_pos);

      

Change the current_list_pos field of the indicated scope. I.e., change which symbol table in the symbtabs list is used, by default, to look up variable names.

Variables, Indexes, and Locations

So. Namespaces contain scopes. Each scope is a dynamic list of symbol tables plus a "current symbol table" index. Each symbol table maps identifiers to variables. Please make sure you have absorbed enough from the preceding sections to understand the description in this paragraph before continuing.

We're left with at least two questions: What are identifiers? and What are variables?.

Identifiers

Identifiers are represented as ASCII strings, beginning with an alphabetic character, containing only alphabetic, numeric, and underscore characters.
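
The syntax rule is simple enough to check by hand. Here is a minimal sketch of such a check; `is_valid_identifier` is a hypothetical helper, and it takes a plain `const char *` for convenience rather than the `t_uchar *` used by the namespace APIs.

```c
#include <assert.h>
#include <stddef.h>

/* ASCII-only character tests.  <ctype.h> is avoided because its
 * results depend on the current locale. */
static int
is_ascii_alpha (unsigned char c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

static int
is_ascii_digit (unsigned char c)
{
  return c >= '0' && c <= '9';
}

/* Return 1 if `name' is a 0-terminated string that begins with an
 * alphabetic character and contains only alphabetic, numeric, and
 * underscore characters; return 0 otherwise (including for NULL and
 * for the empty string). */
static int
is_valid_identifier (const char * name)
{
  const unsigned char * p = (const unsigned char *) name;
  size_t i;

  if (p == NULL || !is_ascii_alpha (p[0]))
    return 0;
  for (i = 1; p[i] != 0; ++i)
    if (!is_ascii_alpha (p[i]) && !is_ascii_digit (p[i]) && p[i] != '_')
      return 0;
  return 1;
}
```

Note that, read literally, the rule rejects a leading underscore: `_x` is not an identifier, while `x_` is.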

In namespace APIs, identifiers are usually passed as t_uchar * pointers to 0-terminated strings.

Variables and Locations: Singletons, Lists, and Tables

Namespace variables are containers for one or more mutable locations.

Each location holds a scalar value. A scalar value can be a number, (immutable) string, symbol, boolean, or the nil value.

Singleton variables consist of just a single location. They hold a single scalar value. To access the scalar value stored in a singleton variable, you need only the variable's name.

List variables consist of a dynamically sized ordered collection of locations. New locations can be prepended to, appended upon, or inserted into the list. Locations can be deleted, too, from arbitrary positions within the list. To access a scalar value stored in a list variable, you need both the variable's name and an integer list element index.

Table variables consist of a "dynamically sized ordered collection of list of locations" (whew!). In plainer English, a table variable is a resizable list of rows, and each row is a resizable list of columns. Each element of a column is a separate location, containing some scalar value. To access a scalar value stored in a table variable, you need the variable's name, an integer row index, and an integer column index.
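
To make the row-and-column addressing concrete, here is a toy table variable in C. It is a sketch under simplifying assumptions: rows and columns have a fixed capacity instead of being truly resizable, a location holds only a `long`, and all the names (`struct table_variable`, `table_location`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ROWS 4
#define MAX_COLS 4

/* A location, reduced to a single number for this sketch. */
struct location
{
  long value;
};

/* One row: a list of locations.  Rows may differ in length. */
struct row
{
  size_t n_cols;
  struct location cols[MAX_COLS];
};

/* A table variable: a list of rows. */
struct table_variable
{
  size_t n_rows;
  struct row rows[MAX_ROWS];
};

/* Make `t' an empty table. */
static void
table_init (struct table_variable * t)
{
  assert (t != NULL);
  t->n_rows = 0;
}

/* Append an empty row.  Returns 0 on success, 1 if the table is full. */
static int
table_add_row (struct table_variable * t)
{
  if (t->n_rows == MAX_ROWS)
    return 1;
  t->rows[t->n_rows].n_cols = 0;
  t->n_rows++;
  return 0;
}

/* Append a location holding `value' to row `row'.
 * Returns 0 on success, 1 if the row is missing or full. */
static int
table_row_append (struct table_variable * t, size_t row, long value)
{
  struct row * r;

  if (row >= t->n_rows)
    return 1;
  r = &t->rows[row];
  if (r->n_cols == MAX_COLS)
    return 1;
  r->cols[r->n_cols].value = value;
  r->n_cols++;
  return 0;
}

/* Return the location at (row, col), or NULL if it does not exist. */
static struct location *
table_location (struct table_variable * t, size_t row, size_t col)
{
  if (t == NULL || row >= t->n_rows || col >= t->rows[row].n_cols)
    return NULL;
  return &t->rows[row].cols[col];
}
```

Because each row carries its own length, a position such as row 0, column 2 can be missing even though row 1, column 2 exists.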

Lists and Tables Not Values

Don't make the mistake of thinking that a list variable is a variable whose value is a list.

There is no such thing as a value which is a list: all values in namespaces are immutable scalars. Lists, by contrast, can be modified and are composite: each contains N locations, and each location contains a separate scalar.

Think instead that some variables happen to be list structured (or array structured or whatever) -- instead of consisting of a single location, they happen to consist of a modifiable list of locations. The list in this equation is part of the variable -- not part of the value stored in the variable.

Got it?

Note: Please pay special attention to the function `namespace_copy', documented below. Understanding its semantics is vital to understanding how to use namespaces effectively.

Function: {`namespace_rename'}

Prototype:

      int namespace_rename (ssize_t nspace,
                            t_uchar * old_name,
                            ssize_t old_scope,
                            t_uchar * new_name,
                            ssize_t new_scope);

      

Description:

Change the name and scope of a variable. If the old and new names or scopes differ, the old name becomes (in effect) a singleton variable bound to nil and the new name is bound to the variable formerly bound to the old name.

Function: {`namespace_copy'}

Prototype:

      int namespace_copy (t_uchar * to_name,
                          ssize_t to_scope,
                          ssize_t nspace,
                          t_uchar * from_name,
                          ssize_t from_scope);

      

Description:

If the from variable is a singleton variable, then make the to variable a singleton variable containing an equal scalar value.

If the from variable is a list or table variable, then the to variable is made to be a reference to that same list or table. By reference, I mean that modifications made to either variable are visible as modifications to both -- they refer to the same underlying list or table.

Although two variables can refer to the same list or table, nevertheless, each list or table specifically "belongs" to one variable in particular. If that variable is destroyed or converted to some other kind of variable, then the list or table is destroyed. When that happens, all other variables that refer to the same list or table are implicitly converted into singleton variables, containing the value nil.

In other words, if you namespace_copy variable A to variable B, and A was a list variable at the time, then:

1. modifications to the A list affect the B list and vice versa.

2. if A is destroyed or is converted to some other kind of variable, then B becomes a singleton variable, initialized to the value nil.

3. if B is destroyed or is converted to some other kind of variable, on the other hand, A is unaffected.

In effect, A has been copied to B by reference with the caveat that, using our namespace interfaces, the representation of references is "safe" (e.g., can't result in de-referencing invalid pointers).
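
One way such "safe" references could be realized is with generation-checked handles into a pool of lists. The sketch below is an assumption about a possible implementation, not a description of the planned one: every name in it is hypothetical, and a dangling reference is converted to nil lazily, on its next use, rather than at the moment the owner is destroyed.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_SIZE 4
#define LIST_CAP 4

enum var_kind { VAR_NIL, VAR_NUMBER, VAR_LIST };

/* A pooled list.  `generation' is bumped whenever the slot is freed,
 * which invalidates every handle made before that. */
struct list_slot
{
  int in_use;
  unsigned generation;
  size_t n;
  long items[LIST_CAP];
};

struct variable
{
  enum var_kind kind;
  long number;          /* meaningful when kind == VAR_NUMBER */
  size_t slot;          /* meaningful when kind == VAR_LIST */
  unsigned generation;  /* slot generation when the handle was made */
  int owns_list;        /* 1 for the owning variable, 0 for a reference */
};

static struct list_slot pool[POOL_SIZE];

/* Return the list `v' refers to, or NULL.  A reference whose owner is
 * gone is converted, on the spot, to a nil singleton. */
static struct list_slot *
var_list (struct variable * v)
{
  if (v->kind != VAR_LIST)
    return NULL;
  if (!pool[v->slot].in_use || pool[v->slot].generation != v->generation)
    {
      v->kind = VAR_NIL;
      return NULL;
    }
  return &pool[v->slot];
}

/* Make `v' (assumed nil) the owner of a new empty list.
 * Returns 0 on success, 1 if the pool is exhausted. */
static int
var_make_list (struct variable * v)
{
  size_t i;

  for (i = 0; i < POOL_SIZE; ++i)
    if (!pool[i].in_use)
      {
        pool[i].in_use = 1;
        pool[i].n = 0;
        v->kind = VAR_LIST;
        v->slot = i;
        v->generation = pool[i].generation;
        v->owns_list = 1;
        return 0;
      }
  return 1;
}

/* Turn `v' into a nil singleton.  Only the owner frees the list. */
static void
var_destroy (struct variable * v)
{
  struct list_slot * l = var_list (v);

  if (l != NULL && v->owns_list)
    {
      l->in_use = 0;
      l->generation++;
    }
  v->kind = VAR_NIL;
}

/* Copy `from' to `to': scalars by value, lists by reference. */
static void
var_copy (struct variable * to, struct variable * from)
{
  if (to == from)
    return;
  var_destroy (to);
  (void) var_list (from);   /* a dangling `from' becomes nil first */
  *to = *from;
  to->owns_list = 0;        /* a copy never owns the list */
}
```

With this scheme, destroying the owner never leaves a pointer dangling: the reference simply fails its generation check and reads as nil.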

Namespace "Addresses" (aka Indexes)

Locations within a namespace are analogous to byte locations within the memory of a general purpose computer: they can contain a simple "scalar" value, and they have an address.

Namespace location addresses are the topic of this section.

To avoid confusion over the word "address", the actual name we use for namespace location addresses is namespace indexes.

* Type {`t_namespace_index'}

Prototype:

      typedef <unspecified> t_namespace_index;

        

Description:

The type of address-like namespace indexes.

A namespace index functions similarly to an address: given a namespace and a namespace index, a unique (although possibly non-existent) location is referred to.

Given an index (and its namespace), a program can read and write the contents of the named location --- in that way, an index functions similarly to a pointer.

Unlike pointers, namespace indexes are reliably bounds checked. If your program has bugs, dereferencing or changing the location named by an index might return unexpected data or store data in an unintended part of the namespace --- but at least the namespace data structure will remain internally consistent. You won't wind up dereferencing an invalid C pointer, for example.

* Function {`namespace_index'}

Prototype:

      int namespace_index (t_namespace_index * index_ret,
                           ssize_t nspace,
                           t_uchar * var_name,
                           ssize_t scope);

        

Description:

Fill in *index_ret with an index that refers to the singleton location bound to var_name in the indicated scope.

Normally, return 0.

Upon catastrophic error, return a value less than 0.

* Function {`namespace_list_index'}

Prototype:

      int namespace_list_index (t_namespace_index * index_ret,
                                ssize_t nspace,
                                t_uchar * var_name,
                                ssize_t scope,
                                ssize_t list_pos);

        

Description:

Fill in *index_ret with an index that refers to the list element location bound to var_name in the indicated scope, at list offset list_pos.

Normally, return 0.

Upon catastrophic error, return a value less than 0.

* Function {`namespace_array_index'}

Prototype:

      int namespace_array_index (t_namespace_index * index_ret,
                                 ssize_t nspace,
                                 t_uchar * var_name,
                                 ssize_t scope,
                                 ssize_t row,
                                 ssize_t col);

        

Description:

Fill in *index_ret with an index that refers to the array element location bound to var_name in the indicated scope, at array position row, col.

Normally, return 0.

Upon catastrophic error, return a value less than 0.

Setting and Getting Scalars Stored in Locations

Namespace indexes give us a way to translate location names within a namespace into a form of "address" for the indicated location. The functions in this section let you read or write the scalar stored in a given location.

Scalar values may be numbers, strings, symbols, booleans, or the value nil.

The Value nil

* Function: {`namespace_is_nil'}

Prototype:

      int namespace_is_nil (ssize_t nspace, 
                            t_namespace_index index);

        

Description:

Return 1 if the indicated location exists and contains nil, 0 otherwise.

Return a value less than 0 upon catastrophic error.

* Function: {`namespace_set_to_nil'}

Prototype:

      int namespace_set_to_nil (ssize_t nspace, 
                                t_namespace_index index);

        

Description:

Store nil in the location indicated by index.

If the indicated location does not currently exist, return 1, otherwise return 0.

(Except) return a value less than 0 for catastrophic errors.

Number Values

* Function: {`namespace_is_number'}

Prototype:

      int namespace_is_number (ssize_t nspace, 
                               t_namespace_index index);

        

Description:

Return 1 if the indicated location exists and contains a number, 0 otherwise.

Return a value less than 0 upon catastrophic error.

* Function: {`namespace_set_to_int32'}

Prototype:

      int namespace_set_to_int32 (ssize_t nspace, 
                                  t_namespace_index index,
                                  t_int32 new_value);

        

Description:

Store new_value in the location indicated by index.

If the indicated location does not currently exist, return 1, otherwise return 0.

(Except) return a value less than 0 for catastrophic errors.

* Function: {`namespace_get_int32'}

Prototype:

      int namespace_get_int32 (t_int32 * n_ret,
                               ssize_t nspace, 
                               t_namespace_index index);

        

Description:

Retrieve the value stored in the location addressed by index, presuming that that location exists and contains a number representable as a 32-bit integer. Return 0 in this case.

If the location does not exist or contains a non-number, return a value greater than 0.

Upon catastrophic error, return a value less than 0.
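
The 0 / positive / negative convention for typed getters can be illustrated on a toy store of locations. This is a sketch, not the real function: `struct store` and `store_get_int32` are hypothetical, an index is just an array offset, `int32_t` stands in for `t_int32`, and two choices are assumptions of the sketch (an out-of-range number is reported with a positive code, and a NULL argument is treated as the catastrophic case).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum scalar_type { SCALAR_NIL, SCALAR_NUMBER, SCALAR_BOOLEAN };

struct location
{
  enum scalar_type type;
  long long number;     /* meaningful when type == SCALAR_NUMBER */
};

/* Toy store: an index is just an offset into this array. */
struct store
{
  size_t n_locations;
  struct location locations[4];
};

/* Returns 0 and fills *n_ret on success; a value greater than 0 if
 * the location does not exist, holds a non-number, or holds a number
 * that does not fit in 32 bits; a value less than 0 if an argument
 * is NULL.  *n_ret is left untouched unless 0 is returned. */
static int
store_get_int32 (int32_t * n_ret, const struct store * st, size_t index)
{
  const struct location * loc;

  if (n_ret == NULL || st == NULL)
    return -1;
  if (index >= st->n_locations)
    return 1;
  loc = &st->locations[index];
  if (loc->type != SCALAR_NUMBER)
    return 1;
  if (loc->number < INT32_MIN || loc->number > INT32_MAX)
    return 1;
  *n_ret = (int32_t) loc->number;
  return 0;
}
```

A caller can therefore test for `!= 0` when it only cares whether a value came back, and for `< 0` when it must decide whether to keep calling into the library at all.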

Boolean Values

* Function: {`namespace_is_boolean'}

Prototype:

      int namespace_is_boolean (ssize_t nspace, 
                                t_namespace_index index);

        

Description:

Return 1 if the indicated location exists and contains a boolean, 0 otherwise.

Return a value less than 0 upon catastrophic error.

* Function: {`namespace_set_to_boolean'}

Prototype:

      int namespace_set_to_boolean (ssize_t nspace,
                                    t_namespace_index index,
                                    int new_value);

        

Description:

Store !!new_value in the location indicated by index.

If the indicated location does not currently exist, return 1, otherwise return 0.

(Except) return a value less than 0 for catastrophic errors.

* Function: {`namespace_get_boolean'}

Prototype:

      int namespace_get_boolean (int * bool_ret,
                                 ssize_t nspace, 
                                 t_namespace_index index);

        

Description:

Retrieve the 0-or-1 value stored in the location addressed by index, presuming that that location exists and contains a boolean. Return 0 in this case.

If the location does not exist or contains a non-boolean, return a value greater than 0.

Upon catastrophic error, return a value less than 0.

Symbol Values

* Function: {`namespace_is_symbol'}

Prototype:

      int namespace_is_symbol (ssize_t nspace, 
                               t_namespace_index index);

        

Description:

Return 1 if the indicated location exists and contains a symbol, 0 otherwise.

Return a value less than 0 upon catastrophic error.

* Function: {`namespace_set_to_symbol'}

Prototype:

      int namespace_set_to_symbol (ssize_t nspace, 
                                   t_namespace_index index,
                                   t_uchar * symbol);

        

Description:

Store symbol in the location indicated by index.

symbol should be a string returned by identifier_intern (in libhackerlab). It is an undetected error if it is not. Therefore, most programs should stick to namespace_set_to_symbol_str.

If the indicated location does not currently exist, return 1, otherwise return 0.

(Except) return a value less than 0 for catastrophic errors.

* Function: {`namespace_set_to_symbol_str'}

Prototype:

      int namespace_set_to_symbol_str (ssize_t nspace,
                                       t_namespace_index index,
                                       t_uchar * symbol_name);

        

Description:

Intern the symbol named by 0-terminated symbol_name and store the resulting symbol in the location indicated by index.

If the indicated location does not currently exist, return 1, otherwise return 0.

(Except) return a value less than 0 for catastrophic errors.

* Function: {`namespace_get_symbol'}

Prototype:

      int namespace_get_symbol (t_uchar * identifier_ret,
                                ssize_t nspace,
                                t_namespace_index index);

        

Description:

Retrieve the symbol value stored in the location addressed by index, presuming that that location exists and contains a symbol. Return 0 in this case.

If the location does not exist or contains a non-symbol, return a value greater than 0.

Upon catastrophic error, return a value less than 0.

String Values

* Function: {`namespace_is_string'}

Prototype:

      int namespace_is_string (ssize_t nspace, 
                               t_namespace_index index);

        

Description:

Return 1 if the indicated location exists and contains a string, 0 otherwise.

Return a value less than 0 upon catastrophic error.

* Function: {`namespace_set_to_string_str'}

Prototype:

      int namespace_set_to_string_str (ssize_t nspace,
                                       t_namespace_index index,
                                       t_uchar * str);

        

Description:

Store a copy of the 0-terminated string str in the location indicated by index.

If the indicated location does not currently exist, return 1, otherwise return 0.

(Except) return a value less than 0 for catastrophic errors.

* Function: {`namespace_get_string_str_n'}

Prototype:

      int namespace_get_string_str_n (t_uchar * str_ret,
                                      ssize_t * len_ret,
                                      ssize_t nspace,
                                      t_namespace_index index);

        

Description:

Retrieve the string value stored in the location addressed by index, presuming that that location exists and contains a string. Return 0 in this case.

If the location does not exist or contains a non-string, return a value greater than 0.

Upon catastrophic error, return a value less than 0.

Namespace Buffers

libhackerlab provides the module hackerlab/buffers -- a data structure for editable strings supporting "markers".

In particular, hackerlab/buffers/buffers.h provides for "buffer sessions" -- flat namespaces of explicitly allocated and freed buffers.

Every namespace has an associated buffer session:

Function: {`namespace_buffer_session'}

Prototype:

        ssize_t namespace_buffer_session (ssize_t nspace);

    

Description:

Return the buffer session id associated with the indicated namespace or a value less than 0 upon error.

Return values less than 0 do not signal catastrophic errors. This function can not result in a catastrophic error.

Namespace Graphs

libhackerlab provides the module hackerlab/graphs/ -- a data structure for editable directed graphs.

The namespace data structure permits programs to allocate graphs which, if not otherwise freed, are guaranteed to be freed when the namespace itself is freed:

Function: {`namespace_alloc_digraph'}, {`namespace_free_digraph'}

Prototypes:

      ssize_t namespace_alloc_digraph (ssize_t nspace);
      int namespace_free_digraph (ssize_t nspace, ssize_t digraph);

    

Description:

Allocate (or free) a digraph associated with namespace nspace.

Such graphs are automatically freed, if they have not already been explicitly freed, when the namespace is freed.

Namespace Descriptors and Subprocesses

function prototypes not provided

Similarly, namespaces provide for certain file descriptors to be automatically closed and for certain subprocesses to be killed and reaped when a namespace is freed.

Virtual Threads

Recall that, within a namespace, a scope consists of symbtabs, a list of symbol tables and current_list_pos, an index into the symbol table list.

Operations such as namespace_push_scope allow us to use scopes as a kind of "call frame stack". A function can save part of its caller's bindings, install its own, then later restore the caller's bindings (for example).

The gist is that within each scope, there can be multiple symbol tables, and which symbol table is current can change over time.

We can usefully repeat that abstraction at the next higher level. Instead of just saving and restoring individual symbol tables (aka, independent collections of bindings), we can instead save and restore entire sets of scopes.

A namespace thread is a data structure for holding a saved set of scopes. Programs can move the current values of any selected subset of a namespace's scopes to a thread object. In the namespace, each moved scope is replaced by an empty scope, containing no bindings. Programs can also restore the values of scopes from a thread: that discards the scopes replaced by those being restored and it leaves the thread object "empty".

Function: {`namespace_alloc_thread'}, {`namespace_free_thread'}

Prototypes:

      ssize_t namespace_alloc_thread (ssize_t nspace);
      int namespace_free_thread (ssize_t nspace, ssize_t thread);

    

Description:

Allocate (or free) a namespace thread within nspace.

Function: {`namespace_freeze'}, {`namespace_thaw'}

Prototypes:

      int namespace_freeze (ssize_t nspace,
                            ssize_t thread,
                            int n_scopes,
                            ssize_t * scope_v);

      int namespace_thaw (ssize_t nspace, ssize_t thread);

    

Description:

Save (or restore) the indicated scopes in a namespace thread.
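
The move-out and move-back behaviour can be modelled on a toy namespace. What follows is a sketch under heavy simplification: a scope's contents are reduced to a binding count, the namespace has a fixed number of scopes, and `toy_freeze` and `toy_thaw` are hypothetical names working on bare structs rather than descriptors. Skipping a scope the thread already holds is a choice made for this sketch only.

```c
#include <assert.h>
#include <stddef.h>

#define N_SCOPES 5

/* Stand-in for a scope's contents: just a count of bindings. */
struct scope
{
  int n_bindings;
};

struct toy_namespace
{
  struct scope scopes[N_SCOPES];
};

/* A thread holds saved scopes; saved[i] says whether slot i is filled. */
struct toy_thread
{
  int saved[N_SCOPES];
  struct scope scopes[N_SCOPES];
};

/* Move the listed scopes into the thread, leaving empty scopes behind.
 * A scope the thread already holds is skipped.  Returns 0 on success,
 * 1 if any scope number is out of range (nothing is moved then). */
static int
toy_freeze (struct toy_namespace * ns, struct toy_thread * th,
            int n_scopes, const size_t * scope_v)
{
  int i;

  assert (ns != NULL && th != NULL);
  for (i = 0; i < n_scopes; ++i)
    if (scope_v[i] >= N_SCOPES)
      return 1;
  for (i = 0; i < n_scopes; ++i)
    {
      size_t s = scope_v[i];

      if (th->saved[s])
        continue;
      th->scopes[s] = ns->scopes[s];
      th->saved[s] = 1;
      ns->scopes[s].n_bindings = 0;
    }
  return 0;
}

/* Restore every saved scope, discarding whatever the namespace holds
 * in its place, and leave the thread empty. */
static void
toy_thaw (struct toy_namespace * ns, struct toy_thread * th)
{
  size_t s;

  assert (ns != NULL && th != NULL);
  for (s = 0; s < N_SCOPES; ++s)
    if (th->saved[s])
      {
        ns->scopes[s] = th->scopes[s];
        th->saved[s] = 0;
        th->scopes[s].n_bindings = 0;
      }
}
```

Scopes that were not named in the freeze are untouched by both operations, which is what lets a thread carry only part of a namespace's state.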

Error Codes

not written yet

notes:

rbcollins and I talked about

     struct error_code
     {
       ssize_t error_class;
       ssize_t error_index_in_class;
     };

  

The APIs above assume single integer error codes, divided into negative and positive codes.

The APIs are returning error_index_in_class. If the caller knows what class of errors the callee can produce (and we expect callees, by convention, to produce only one class of error each) then the caller can form the complete struct error_code.

If the caller doesn't know the error class, then it is significant if the error code is non-0 and sometimes significant if a non-0 code is positive or negative.
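
As a rough sketch of how a caller might apply these notes: the struct below copies the one above (with `long` standing in for `ssize_t`), while `make_error_code`, `is_error`, and `is_catastrophic` are hypothetical helpers. The tests apply to the int-returning functions, where 0 means success.

```c
#include <assert.h>

struct error_code
{
  long error_class;
  long error_index_in_class;
};

/* Form the complete error code from a callee's non-0 return value and
 * the one error class the caller knows that callee produces. */
static struct error_code
make_error_code (long known_class, long rc)
{
  struct error_code e;

  assert (rc != 0);     /* 0 is success, not an error */
  e.error_class = known_class;
  e.error_index_in_class = rc;
  return e;
}

/* Without knowing the class, only these two facts are available. */
static int
is_error (long rc)
{
  return rc != 0;
}

static int
is_catastrophic (long rc)
{
  return rc < 0;
}
```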

Basic Namespace Utils

The earlier sections have built up quite a bit of structure in namespaces.

This section describes a set of "namespace utility functions" that can be built on the above. It would be tedious to make a complete list of all desirable utility functions, so here are just a few samples to illustrate:

List Operations

Namespace variables can be list variables. Given that, it's convenient to have functions like:


    int namespace_list_append (t_namespace_index append_to_list,
                               ssize_t nspace,
                               t_namespace_index append_from_list);

    

which, if both indexed variables are lists, appends a copy of the from_list to the to_list.

Relational Operations

Given two variables which are tables:


    int namespace_join (t_namespace_index output_table,
                        ssize_t nspace,
                        ssize_t join_column,
                        t_scalar_comparison_fn (*cmp)(),
                        void * cmp_rock,
                        t_namespace_index table_a,
                        t_namespace_index table_b,
                        t_join_field_spec output_field,
                        ...);


    

and so forth.

E.g., basic string/list/table ops.

form of int fn (output_var_specs, nspace, input_var_specs + params);

Entry Points and Calling Conventions

A thoroughly "librified" libarch should include provisions which make its entry points easily accessible to the run-time environments of scripting languages (and the like).

Such access to entry points generally requires:

1. A facility for finding and invoking entry points by symbolic name. Many of the most convenient ways to make libarch entry points available as functions in a scripting language involve having the ability to look up the list of (symbolic names for) available entry points at run time, and to be able to invoke an entry point given only its name.

2. A facility for mashalling parameters and collecting return values from entry points, using a generic mechanism. If every entry point in libarch has its own C function type, then calling those entry points from a scripting language involves a lot of (programming) work. Each such entry point must be "wrapped", either by hand or using a tool such as Swig. It is simpler if there is a generic way to collect the arguments for or return values from a libarch entry point; in other words, if scripting languages binding to libarch can get by with a single, generic wrapper that works for all entry points rather than N+1 wrappers, one for each separate entry point.

3. Useful invariants and error handling. libarch entry points need to make reasonably strong and universal guarantees. For example, absent a catastrophic error, they should neither leak resources nor ever leave the internal namespace data structures in an inconsistent state.

How can we do that?

Just What is an Entry Point?

For simplicity, I take the view that libarch in tla-2.0 should function as a kind of extra-fancy Turing machine. Recall that a Turing machine has two parts: a finite state machine defining the computational steps the machine can take, and an "infinite tape" which serves as the "memory" for the computation run by the Turing machine.

In libarch's case, I regard a namespace data structure as taking the place of our "infinite tape". Namespaces are similar to Turing tapes in many ways: they have a simple topological structure and are divided up into locations, each of which contains a scalar value.

Namespaces are different from Turing tapes in some of their arbitrary details. For example, namespaces divide their storage into "scopes" and each scope has a list of symbol tables. That's much more complicated than Turing's 1-D tape but the added complexity also adds realism: symbol tables are easier to program than the 1-D tape, even if they are logically equivalent; scopes are cheap to implement and handy, even if on a 1-D tape they would be absurdly expensive to simulate. A namespace is a 1-D tape modified in response to a bunch of pragmatic considerations.

If a namespace takes the place of the infinite tape, then the entry points in libarch take the place of the finite state machine.

Indeed, although libarch may include some static data as a performance optimization, from the perspective of its API, libarch in 2.0 will be completely "stateless" --- all persistent state between libarch calls will be stored in a namespace, not in libarch itself.

A collection of stateless C entry points is, indeed, a form of finite state machine.

A Single Entry Point

libarch can get by with a single entry point (although doing so is not literally proposed).

* Function: {`arch_run'}

Prototype:

      int arch_run (ssize_t nspace,
                    int (*poll)(void *),
                    void * poll_rock);

        

Description:

Perform a single libarch state transition. Usually this means invoking the command selected by the current state of the namespace nspace.

As a side effect, the indicated namespace is modified to reflect the results of the state transition.

Normally return 0.

Returns a value less than 0 upon catastrophic error.

Returns a value greater than 0 upon recoverable error (such as there being no currently defined transition).

The poll parameter may be 0 or a user-supplied "poll function". arch_run is free to (not required to) periodically call poll. If poll returns non-0, arch_run will attempt to return to its caller as quickly as possible, even if that entails returning a (recoverable) error.
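The poll convention can be sketched in isolation. What follows is only a rough illustration, not libarch code: toy_long_operation is a hypothetical stand-in for a long-running entry point, and budget_poll and struct poll_budget are invented names for a client-side poll function that asks for an early return after a fixed number of polls. The polling interval of 10 steps is likewise an arbitrary assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical client-side state, handed to the poll function as its rock. */
struct poll_budget
{
  int polls_remaining;   /* polls still allowed before asking for a return */
  int times_called;      /* bookkeeping, for illustration only */
};

/* A poll function of the shape arch_run expects: int (*)(void *).
 * Returns non-0 to ask the callee to return as soon as it can.
 */
static int
budget_poll (void * rock)
{
  struct poll_budget * budget = (struct poll_budget *)rock;

  assert (budget != NULL);
  budget->times_called += 1;
  if (budget->polls_remaining <= 0)
    return 1;
  budget->polls_remaining -= 1;
  return 0;
}

/* Hypothetical stand-in for a long-running entry point.  It performs
 * `steps' units of work and polls before every 10th unit.  Returns 0
 * when all the work was done and 1 (a recoverable error) when the poll
 * function asked it to stop early.  `*steps_done' reports progress.
 */
static int
toy_long_operation (int steps,
                    int (*poll)(void *),
                    void * poll_rock,
                    int * steps_done)
{
  int i;

  assert (steps_done != NULL);
  for (i = 0; i < steps; ++i)
    {
      if (poll && (i % 10 == 0) && poll (poll_rock))
        {
          *steps_done = i;
          return 1;
        }
      /* ... one unit of real work would go here ... */
    }
  *steps_done = steps;
  return 0;
}
```

A client that passes 0 for poll simply lets the operation run to completion. A client that passes budget_poll with a small budget gets control back early, along with a greater-than-0 return value.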

Provided that a client program suitably modifies the namespace nspace before each call to arch_run, that can be a complete interface. (I'm assuming, of course, that the client has the separate namespace interface available to set up parameters before calling arch_run and read back results after arch_run returns.)

The arch_run Calling Convention

Upon a call to arch_run:

The Parameter Variable argv

The namespace variable named "argv" in the standard scope namespace_params() must be initialized much like an argv you would pass to exec(2):

The namespace argv variable must be a list variable.

argv[0] must be the symbolic name of a libarch "state transition" entry point. Roughly, this should correspond to a tla 1.X subcommand name.

argv[1..n] may contain a list of options and arguments to the entry point named in argv[0].

The Standard Error Buffer

The namespace variable named "stderrbuf" in the standard scope namespace_globals() may contain a non-negative integer. If so, that integer is taken to be the buffer id of the "standard error buffer", allocated from the namespace's buffer set.

Within libarch, code that wants to generate an error message should prepend that message to the "stderrbuf". If the buffer is not empty before prepending the new message, libarch code should first prepend "\n\n---\n\n" to the buffer.
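The prepend-with-separator rule is easy to get subtly wrong, so here is a minimal sketch of it. The namespace buffer API is not specified in this draft, so the sketch assumes a plain heap-allocated string in place of a real buffer; struct toy_buffer, toy_buffer_prepend, and toy_report_error are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the standard error buffer: a heap string. */
struct toy_buffer
{
  char * text;   /* NUL-terminated contents, or NULL when empty */
};

#define TOY_ERR_SEPARATOR "\n\n---\n\n"

/* Put `prefix' in front of the current contents.  Returns 0 on success,
 * -1 if memory could not be allocated (the buffer is then unchanged).
 */
static int
toy_buffer_prepend (struct toy_buffer * buf, const char * prefix)
{
  size_t prefix_len;
  size_t old_len;
  char * joined;

  assert (buf != NULL && prefix != NULL);
  prefix_len = strlen (prefix);
  old_len = buf->text ? strlen (buf->text) : 0;
  joined = (char *)malloc (prefix_len + old_len + 1);
  if (!joined)
    return -1;
  memcpy (joined, prefix, prefix_len);
  if (old_len)
    memcpy (joined + prefix_len, buf->text, old_len);
  joined[prefix_len + old_len] = 0;
  free (buf->text);
  buf->text = joined;
  return 0;
}

/* Report an error following the convention above: if the buffer already
 * has content, prepend the separator first; then prepend the message.
 * The newest message therefore always ends up at the front.
 */
static int
toy_report_error (struct toy_buffer * errbuf, const char * message)
{
  assert (errbuf != NULL && message != NULL);
  if (errbuf->text && errbuf->text[0])
    {
      if (toy_buffer_prepend (errbuf, TOY_ERR_SEPARATOR) < 0)
        return -1;
    }
  return toy_buffer_prepend (errbuf, message);
}
```

Reporting "first" and then "second" leaves the buffer holding "second", the separator, and then "first", so a client reading from the start of the buffer sees the most recent error before the earlier ones.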

The Standard Output Buffer

Similarly, the namespace variable named "stdoutbuf" in the standard scope namespace_globals() may contain a non-negative integer. If so, that integer is taken to be the buffer id of the "standard output buffer", allocated from the namespace's buffer set.

libarch code can generate "normal output" by appending to this buffer.

The Standard Input Buffer

Similarly, the namespace variable named "stdinbuf" in the standard scope namespace_globals() may contain a non-negative integer. If so, that integer is taken to be the buffer id of the "standard input buffer", allocated from the namespace's buffer set.

libarch code can read "default input" by consuming the contents of this buffer.

The Return Variables retv and status

The namespace variable "retv" in the standard scope namespace_returns() is used symmetrically to "argv".

Upon return from arch_run, "retv" will be a list variable, containing the 0 or more "returned values" from the entry point (regarded as a function call).

Upon return, the variable "status", also in the namespace_returns() scope, will be set to an integer value: the same integer returned from arch_run.
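To make the argv/retv/status convention concrete, here is a toy model of it, and of the single generic wrapper a scripting-language binding might build on top. Nothing here is real libarch API: struct toy_namespace stands in for a namespace with fixed-size string lists, toy_run stands in for arch_run, "first-arg" is an invented command that returns its first argument, and toy_call is a hypothetical wrapper.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_VALUES 8

/* Toy namespace: just the three variables the convention talks about. */
struct toy_namespace
{
  const char * argv[TOY_MAX_VALUES];
  int argc;
  const char * retv[TOY_MAX_VALUES];
  int retc;
  int status;
};

/* Invented command: returns argv[1] as its one return value.
 * Returns 2 (a recoverable error) if called without an argument.
 */
static int
toy_first_arg (struct toy_namespace * ns)
{
  if (ns->argc < 2)
    return 2;
  ns->retv[0] = ns->argv[1];
  ns->retc = 1;
  return 0;
}

/* Stand-in for arch_run: selects the command named by argv[0], runs it,
 * and mirrors the returned integer into the status variable.
 */
static int
toy_run (struct toy_namespace * ns)
{
  int result;

  assert (ns != NULL);
  ns->retc = 0;
  if (ns->argc < 1 || 0 != strcmp (ns->argv[0], "first-arg"))
    result = 1;                 /* no transition defined for this state */
  else
    result = toy_first_arg (ns);
  ns->status = result;
  return result;
}

/* The one generic wrapper: marshal arguments in, run, marshal the first
 * return value out.  It works unchanged for any command name.
 */
static int
toy_call (struct toy_namespace * ns,
          const char * const * args,
          int n_args,
          const char ** result_ret)
{
  int i;
  int status;

  assert (ns != NULL && result_ret != NULL);
  if (n_args < 0 || n_args > TOY_MAX_VALUES)
    return -1;
  for (i = 0; i < n_args; ++i)
    ns->argv[i] = args[i];
  ns->argc = n_args;
  status = toy_run (ns);
  *result_ret = (status == 0 && ns->retc > 0) ? ns->retv[0] : NULL;
  return status;
}
```

The point of the sketch is that toy_call never needs to know which command it is invoking, which is the property that lets a binding get by with one wrapper for every entry point.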

Callee-Preserves Locals

Upon return, libarch will not have changed any variable values in the standard scope namespace_locals().

If libarch wants to use the locals scope internally, it will generally do so by "pushing" (namespace_push_scope) that scope on entry to arch_run and popping the scope, to return the caller's bindings, before return.
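The push-on-entry, pop-before-return discipline can be illustrated with a toy scope. This is a sketch under stated assumptions, not the namespace implementation: it assumes that pushing a scope gives the callee a fresh, empty set of bindings, and every name in it (struct toy_scope, toy_push_scope, toy_pop_scope, toy_set, toy_get, toy_callee) is hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TOY_MAX_FRAMES 8
#define TOY_MAX_VARS 8

struct toy_binding
{
  const char * name;
  int value;
};

struct toy_frame
{
  struct toy_binding vars[TOY_MAX_VARS];
  int n_vars;
};

/* Toy model of one standard scope (such as the locals scope). */
struct toy_scope
{
  struct toy_frame frames[TOY_MAX_FRAMES];
  int depth;                    /* index of the current frame */
};

static void
toy_scope_init (struct toy_scope * s)
{
  assert (s != NULL);
  s->depth = 0;
  s->frames[0].n_vars = 0;
}

/* Hide the current bindings behind a fresh, empty frame. */
static int
toy_push_scope (struct toy_scope * s)
{
  if (s->depth + 1 >= TOY_MAX_FRAMES)
    return -1;
  s->depth += 1;
  s->frames[s->depth].n_vars = 0;
  return 0;
}

/* Discard the current frame, exposing the previous bindings again. */
static int
toy_pop_scope (struct toy_scope * s)
{
  if (s->depth == 0)
    return -1;
  s->depth -= 1;
  return 0;
}

/* Bind `name' to `value' in the current frame only. */
static int
toy_set (struct toy_scope * s, const char * name, int value)
{
  struct toy_frame * frame = &s->frames[s->depth];
  int i;

  for (i = 0; i < frame->n_vars; ++i)
    if (0 == strcmp (frame->vars[i].name, name))
      {
        frame->vars[i].value = value;
        return 0;
      }
  if (frame->n_vars == TOY_MAX_VARS)
    return -1;
  frame->vars[frame->n_vars].name = name;
  frame->vars[frame->n_vars].value = value;
  frame->n_vars += 1;
  return 0;
}

/* Look `name' up in the current frame.  Returns 0 if found, 1 if not. */
static int
toy_get (const struct toy_scope * s, const char * name, int * value_ret)
{
  const struct toy_frame * frame = &s->frames[s->depth];
  int i;

  for (i = 0; i < frame->n_vars; ++i)
    if (0 == strcmp (frame->vars[i].name, name))
      {
        *value_ret = frame->vars[i].value;
        return 0;
      }
  return 1;
}

/* Hypothetical callee: uses "tmp" as scratch without disturbing the
 * caller's own "tmp".
 */
static int
toy_callee (struct toy_scope * locals)
{
  int status;

  if (toy_push_scope (locals) < 0)
    return -1;
  status = toy_set (locals, "tmp", 99);
  if (toy_pop_scope (locals) < 0)
    return -1;
  return status;
}
```

If the caller binds "tmp" to 1 and then calls toy_callee, it still reads 1 afterwards, even though the callee stored 99 under the same name while it ran.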

Callee-Preserves Parameters

The namespace_params() scope is preserved similarly to the locals scope.

Registering Commands

The 0 element of the namespace variable "argv" in the scope namespace_params() contains the name of the command to be invoked by arch_run.

How is that name translated into an actual choice of which code to run?

* Function: {`arch_register_command'}

Prototype:

        int arch_register_command (t_uchar * name, 
                                   int (*fn) (ssize_t nspace, void * rock),
                                   void * fn_rock);

        

Description:

Record that fn (which will be passed fn_rock as its rock argument) implements the libarch entry point of the indicated name.

* Listing Commands

Not Illustrated: functions for listing the available commands and perhaps conventions for linking them to help messages and into help categories.
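For a sense of how small the machinery behind registration, lookup by name, and listing can be, here is a rough sketch. It is not the proposed implementation: the fixed-size table, the rejection of duplicate names, the use of long in place of ssize_t, and all of the toy_ names are assumptions made for illustration. The registry stores the name pointer without copying it, so the sketch also assumes names are string literals or otherwise long-lived.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX_COMMANDS 32

/* Same shape as the fn argument of arch_register_command, with long
 * standing in for ssize_t to keep the sketch within standard C.
 */
typedef int (*toy_command_fn) (long nspace, void * rock);

struct toy_command
{
  const char * name;
  toy_command_fn fn;
  void * rock;
};

static struct toy_command toy_commands[TOY_MAX_COMMANDS];
static int toy_n_commands = 0;

/* Returns 0 on success, -1 on bad arguments, duplicate name, or full table. */
static int
toy_register_command (const char * name, toy_command_fn fn, void * fn_rock)
{
  int i;

  if (!name || !fn)
    return -1;
  for (i = 0; i < toy_n_commands; ++i)
    if (0 == strcmp (toy_commands[i].name, name))
      return -1;
  if (toy_n_commands == TOY_MAX_COMMANDS)
    return -1;
  toy_commands[toy_n_commands].name = name;
  toy_commands[toy_n_commands].fn = fn;
  toy_commands[toy_n_commands].rock = fn_rock;
  toy_n_commands += 1;
  return 0;
}

/* Run the command called `name'.  Returns the command's own result, or
 * 1 (a recoverable error) if no such command is registered.
 */
static int
toy_run_command (const char * name, long nspace)
{
  int i;

  for (i = 0; i < toy_n_commands; ++i)
    if (0 == strcmp (toy_commands[i].name, name))
      return toy_commands[i].fn (nspace, toy_commands[i].rock);
  return 1;
}

/* Listing: how many commands exist, and the name of the i'th one. */
static int
toy_command_count (void)
{
  return toy_n_commands;
}

static const char *
toy_command_name (int i)
{
  if (i < 0 || i >= toy_n_commands)
    return NULL;
  return toy_commands[i].name;
}

/* Example command: records which namespace it was run against. */
static int
toy_example_fn (long nspace, void * rock)
{
  long * seen = (long *)rock;

  if (seen)
    *seen = nspace;
  return 0;
}
```

A scripting-language binding could walk toy_command_count and toy_command_name at start-up to define one function per command, each of which calls toy_run_command with its own name.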

arch_run Illustrated

If your module defines a new libarch entry point (say, "my-id") then during initialization, you'll need something like:

arch_run Initialization Illustrated


    if (0 != arch_register_command ("my-id",
                                    my_id_fn,
                                    (void *)0))
      ... uh-oh, catastrophic initialization error ...;


      

arch_run Client Interface Illustrated

A libarch client can call your new entry point in a style reminiscent of using fork and exec:


    ssize_t namespace;
    t_namespace_index argv0;
    t_namespace_index retv0;
    t_uchar * my_id;
    ssize_t my_id_len;

    namespace = alloc_namespace ();
    if (namespace < 0)
      ... catastrophic error ...;

    if (0 != namespace_list_index (&argv0,
                                   namespace,
                                   "argv", namespace_params(),
                                   0))
      ... catastrophic error ...;

    if (0 != namespace_set_to_string (argv0,
                                      namespace,
                                      "my-id"))
      ... catastrophic error ...;

    /* No poll function is supplied, hence the two 0 arguments. */
    if (0 != arch_run (namespace, 0, (void *)0))
      ... some kind of error during the run of `my-id' ...;

    if (0 != namespace_list_index (&retv0,
                                   namespace,
                                   "retv", namespace_returns(),
                                   0))
      ... catastrophic error ...;

    if (0 != namespace_get_string_str_n (&my_id, &my_id_len,
                                         namespace, retv0))
      ... catastrophic error ...;

    /* We just called `my-id' with no parameters and got back
     * the string value `my_id', of length `my_id_len'.
     *
     * That string pointer remains valid until the namespace
     * binding of `"retv"' changes, for any reason.
     */

      

arch_run Internal Interfaces Illustrated

Finally, here is what your implementation of my-id might look like:


    int
    my_id_fn (ssize_t nspace, void * rock)
    {
      t_uchar * id_string;
      int answer;

      if (1 != namespace_list_length (nspace, "argv", namespace_params()))
        {
          ... `my-id' was called with bogus parameters ...;
          ... spew an error message to `stderrbuf' ...;
          ... then return with a non-0 exit code: ...;

          return arch_return_from_run (2);
        }

      id_string = low_level_call_to_compute_my_id ();
      if (!id_string)
        {
          ... spew an error message to `stderrbuf' ...;
          return arch_return_from_run (1);
        }

      answer = namespace_list_set_to_string (nspace,
                                             "retv",
                                             namespace_returns(),
                                             0,
                                             id_string);
      free (id_string);

      return answer;
    }


      

my-id turns out to be a particularly simple example. A more complicated entry point might need to use namespace local variables. That would involve calling namespace_push_scope to save the scope namespace_locals() on entry, and calling namespace_pop_scope to restore that scope before returning.

Copyright

Copyright (C) 2004 Tom Lord

This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

See the file COPYING for further information about the copyright and warranty status of this work.