[ << ] | [ < ] | [ Up ] | [ > ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
9.2.3 Cheaper Pairs
However, there is yet another issue to confront. Most Scheme heaps
contain more pairs than any other type of object; Jonathan Rees said at
one point that pairs occupy 45% of the heap in his Scheme
implementation, Scheme 48. However, our representation above spends
three SCM
-sized words per pair — one for the type, and two for
the CAR and CDR. Is there any way to represent pairs using
only two words?
Let us refine the convention we established earlier. Let us assert that:
-
If the bottom three bits of an
SCM
value are#b000
, then it is a pointer, as before. -
If the bottom three bits are
#b001
, then the upper bits are an integer. This is a bit more restrictive than before. -
If the bottom two bits are
#b010
, then the value, with the bottom three bits masked out, is the address of a pair.
Here is the new C code:
enum type { string, vector, ... }; typedef struct value *SCM; struct value { enum type type; union { struct { int length; char *elts; } string; struct { int length; SCM *elts; } vector; ... } value; }; struct pair { SCM car, cdr; }; #define POINTER_P(x) (((int) (x) & 7) == 0) #define INTEGER_P(x) (((int) (x) & 7) == 1) #define GET_INTEGER(x) ((int) (x) >> 3) #define MAKE_INTEGER(x) ((SCM) (((x) << 3) | 1)) #define PAIR_P(x) (((int) (x) & 7) == 2) #define GET_PAIR(x) ((struct pair *) ((int) (x) & ~7))
Notice that enum type
and struct value
now only contain
provisions for vectors and strings; both integers and pairs have become
special cases. The code above also assumes that an int
is large
enough to hold a pointer, which isn’t generally true.
Our list of examples is now as follows:
-
To test if x is an integer, we can write
INTEGER_P (x)
; this is as before. -
To find its value, we can write
GET_INTEGER (x)
, as before. -
To test if x is a vector, we can write:
POINTER_P (x) && x->type == vector
We must still make sure that x is a pointer to a
struct value
before dereferencing it to find its type. -
If we know x is a vector, we can write
x->value.vector.elts[0]
to refer to its first element, as before. -
We can write
PAIR_P (x)
to determine if x is a pair, and then writeGET_PAIR (x)->car
to refer to its car.
This change in representation reduces our heap size by 15%. It also
makes it cheaper to decide if a value is a pair, because no memory
references are necessary; it suffices to check the bottom two bits of
the SCM
value. This may be significant when traversing lists, a
common activity in a Scheme system.
Again, most real Scheme systems use a slightly different implementation;
for example, if GET_PAIR subtracts off the low bits of x
, instead
of masking them off, the optimizer will often be able to combine that
subtraction with the addition of the offset of the structure member we
are referencing, making a modified pointer as fast to use as an
unmodified pointer.
[ << ] | [ < ] | [ Up ] | [ > ] | [ >> ] | [Top] | [Contents] | [Index] | [ ? ] |
This document was generated on April 20, 2013 using texi2html 5.0.