Home Contact Projects

SilvIR: Evaluation Semantics, Draft 12021-02-21

This post assumes you've read the previous one in the series.

This post might have a higher "difficulty of read to usefulness of reading" ratio than the others in the series... I'll try to precisely specify the evaluation semantics of the IR defined in the previous draft here. If you're averse to hundreds of lines of Haskell, feel free to skim/skip it, but some questions about how features interact with each other are answered here, either directly or by being able to play around with these definitions in the REPL.

Alternatively, if you learn better by seeing an example, you can see one that uses concrete syntax here.

This post is a Literate Haskell file; if you clone the repo backing this site and run this post (content/silvir/EvaluationSemantics.lhs) under GHCi, you can get a live version of these semantics you can play with interactively. This is definitely not anything approaching a proper efficient interpreter for SilvIR; rather, it's meant to act as a precise specification of the semantics of the language, which as a side benefit allows executing SilvIR programs.

Imports

Haskell makes us put the language pragmas and imports at the top of the file, before we get to any code, so here they are.

We use Seqs instead of lists for efficiency, so the OverloadedLists language extension is convenient to allow us to use familiar square-brackets-and-commas notations for Seq literals. This also allows us to write Maps as if they were alists (i.e. lists of key-value pairs).

{-# LANGUAGE OverloadedLists #-}

Similarly, we use Text instead of String, and OverloadedStrings lets us do that.

{-# LANGUAGE OverloadedStrings #-}

Even though Haskell is usually a lazy language, we're trying to be precise in a semantics, and relying on Haskell's laziness is a little too weird and implicit (and hard to do, besides). To prove that we're not playing any tricks with laziness, we use strict evaluation.

{-# LANGUAGE Strict #-}

We make this a proper module, because we want it to be easier to import these semantics.

module EvaluationSemantics where

None of the first group should be too spooky, just some additional data structures.

import Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import Data.Sequence (Seq(..))
import qualified Data.Sequence as Seq
import qualified Data.Set as Set
import Data.Text (Text)
import qualified Data.Text as Text

We use the toList method from Data.Foldable to convert Seqs into lists when necessary.

import Data.Foldable (toList)

We also use the State monad, to describe evaluation statefully.

import Control.Monad.State.Strict

Translating the Grammar

Before we can look at the evaluation rules, we need to write down our grammar in Haskell syntax.

I tend to like to create newtypes for numbers and strings, to avoid type confusion and increase the odds of "if it compiles it works." When reading the type definitions, just note that any type that ends with Name is isomorphic to strings.

Note that there's no namespacing mechanism; all of these except for LocalNames should be a fully-qualified name, e.g. (MkGlobalName "core:append") rather than (MkGlobalName "append").

newtype AttrName = MkAttrName Text
  deriving (Eq, Ord, Read, Show)

newtype ForeignFunctionName = MkForeignFunctionName Text
  deriving (Eq, Ord, Read, Show)

newtype GlobalName = MkGlobalName Text
  deriving (Eq, Ord, Read, Show)

newtype LocalName = MkLocalName Text
  deriving (Eq, Ord, Read, Show)

newtype NTName = MkNTName Text
  deriving (Eq, Ord, Read, Show)

newtype ProdName = MkProdName Text
  deriving (Eq, Ord, Read, Show)

newtype RecordFieldName = MkRecordFieldName Text
  deriving (Eq, Ord, Read, Show)

Most of the nonterminals follow directly from the grammar.

data Literal
  = L'Int Integer
  | L'String Text
  deriving (Eq, Read, Show)

data PrimPattern
  = P'Var LocalName
  | P'Lit Literal
  | P'Any
  deriving (Read, Show)

data Pattern
  = P'Record (Map RecordFieldName PrimPattern)
  | P'TreeOrTerm ProdName (Seq PrimPattern)
  | P'Prim PrimPattern
  deriving (Read, Show)

data IsChildDecorable
  = ChildIsDecorable
  | ChildIsntDecorable
  deriving (Read, Show)

type Priority = Int

data TopLevelItem
  = T'GlobalDecl GlobalName Expr
  | T'ProdDecl ProdName NTName
  | T'DefaultProdBodyDecl NTName Priority LocalName Expr
  | T'ProdBodyDecl ProdName Priority LocalName Expr
  deriving (Read, Show)

We make two deviations from the grammar. The first is tiny; programs are represented as linear sequences of TopLevelItems instead of as sets. This is for bad reasons; Set requires that the element type is orderable, which we may not want to impose on our AST nodes. However, the order TopLevelItems are in shouldn't matter to anything.

We do retain combining (now with append instead of union) two programs as the means of composing grammars.

type Program = Seq TopLevelItem

Our larger deviation from the grammar is in using A-normal form. Since we're now paying attention to order of evaluation, having an AST that makes it explicit is handy.

A-normal form essentially works by splitting Expr into two halves:

The restriction is then enforced that function calls (and similar constructs) must have AExprs for arguments. This makes it much easier to ensure that side effects aren't accidentally being duplicated, reordered, or removed when performing program transformations, since it's always known that AExprs can be moved in this way. This is exactly why in the previous post we noted that foreign function calls are split into pure and impure versions.

data AExpr
  = A'Local LocalName
  | A'Global GlobalName
  | A'Lit Literal
  | A'Lam (Seq LocalName) Expr
  | A'PureForeign ForeignFunctionName (Seq AExpr)
  | A'MakeRecord (Map RecordFieldName AExpr)
  | A'GetRecordMember RecordFieldName AExpr
  | A'Cons ProdName (Seq (IsChildDecorable, AExpr))
  | A'GetChild Int AExpr
  | A'GetAttr AttrName AExpr
  | A'Undecorate AExpr
  deriving (Read, Show)

data Expr
  = E'A AExpr
  | E'Call AExpr (Seq AExpr)
  | E'Let LocalName Expr Expr
  | E'LetRec (Map LocalName Expr) Expr
  | E'Error AExpr
  | E'Thunk Expr
  | E'Force AExpr
  | E'Case AExpr (Seq (Pattern, Expr))
  | E'ImpureForeign ForeignFunctionName (Seq AExpr)
  | E'SetAttr AttrName AExpr AExpr Expr
  | E'CombineAttr AttrName AExpr AExpr AExpr Expr
  | E'Decorate AExpr AExpr
  deriving (Read, Show)

We now have a working copy of the grammar as Haskell data structures.

Evaluation

Since we're writing these semantics down in a pure language, we need some way to talk about effects, and more generally the sorts of things we're assuming exist in the environment. The actual definitions of all these primitives will come later.

We evaluate within a monad M, which provides a bunch of useful operations in the environment.

newtype M a = ...

instance Monad M where
  ...

These environment operations include providing lookups for both global variables and production bodies in a fixed global environment, as well as a mutable local environment:

lookupGlobal :: GlobalName -> M Value

lookupProdBodies :: ProdName -> M (Seq (Priority, LocalName, Expr))

newtype LocalEnv = ...
emptyLocalEnv :: LocalEnv
lookupLocal :: LocalName -> LocalEnv -> Value
updateLocals :: Seq (LocalName, Value) -> LocalEnv -> LocalEnv

getLocalEnv :: M LocalEnv
putLocalEnv :: LocalEnv -> M ()

Thunks and trees are both reference objects; that is to say, if one stores them (e.g. closes over them with a lambda), it's possible to observe mutations made elsewhere in the program. To model this, we create (very limited) pointers. The environment then contains heaps we can allocate from, and lets us read from and write to pointers.

Note that there's no notion of pointer arithmetic, nor of freeing; if an implementation doesn't have an explicit notion of pointers, but has mutable data (e.g. JavaScript), plain values are probably fine. Also note that when memory is allocated, it starts in an uninitialized state; reading from it in that state is undefined behavior, but it should be impossible to do so.

The actual definitions of both Thunk and Tree are below.

newtype Pointer a = ...
  deriving Show

allocThunk :: M (Pointer Thunk)
readThunk :: Pointer Thunk -> M Thunk
writeThunk :: Pointer Thunk -> Thunk -> M ()

allocTree :: M (Pointer Tree)
readTree :: Pointer Tree -> M Tree
writeTree :: Pointer Tree -> Tree -> M ()

We also have foreign function calls:

callForeignFunction :: ForeignFunctionName -> Seq Value -> M Value

Our abstract machine has a notion of the current context, aka the continuation stack, aka the current continuation. This will be explained in detail before it's relevant, though.

Finally, we add a helper alias for error, for notating undefined behavior. The idea of these reference semantics are, any exception is undefined behavior, and ones that are the result of calls to ub are especially so.

ub :: String -> a
ub msg = error ("Undefined behavior: " <> msg)

Values

In this section, we'll look at the Value, Closure, Thunk, and Tree types.

data Value

We store closures as a separate type for boring reasons; basically, we want to be able to write a custom Show instance for them, since LocalEnvs aren't Show.

  = V'Closure Closure

Values representable in source code can be stored as themselves.

  | V'Lit Literal

As mentioned above, thunks are reference objects, so they're represented in Value simply as their address.

  | V'Thunk (Pointer Thunk)

Records can simply be stored as a map.

  | V'Record (Map RecordFieldName Value)

Terms are plain data too. This is a change from earlier drafts (drafts before this one weren't numbered), where terms were a special case of trees.

The overhead of storing IsChildDecorable flags (even though they should be statically known) is something I'm considering changing. It'd be fairly easy, and arguably a bit cleaner, to instead declare these as part of T'ProdDecl, which would ensure both the number of values that appear as arguments and their decorability is fixed in the global environment.

  | V'Term ProdName (Seq (IsChildDecorable, Value))

Trees are again reference objects.

  | V'Tree (Pointer Tree)

We derive Show on all these types for convenience.

  deriving Show

Closures store the environment they're closing over, as well as an argument list and a body expression. The complexity of the Show instance can be ignored; this is essentially the instance that deriving Show would generate, except with an _ instead of the LocalEnv value.

data Closure = MkClosure LocalEnv (Seq LocalName) Expr

instance Show Closure where
  showsPrec d (MkClosure _ vars body) = showParen (d > 10) $
    showString "MkClosure _ " . showsPrec 11 vars
             . showString " " . showsPrec 11 body

Thunks are defined almost exactly as one would expect, with a case for forced thunks and one for unforced thunks, the latter closing over the environment. Again we use a custom instance to ignore the LocalEnv.

data Thunk
  = Thunk'Forced Value
  | Thunk'Unforced LocalEnv Expr

instance Show Thunk where
  showsPrec d (Thunk'Forced value) = showParen (d > 10) $
    showString "Thunk'Forced " . showsPrec 11 value
  showsPrec d (Thunk'Unforced _ expr) = showParen (d > 10) $
    showString "Thunk'Unforced _ " . showsPrec 11 expr

Trees are also fairly straightforward; this is another case, though, where carrying the IsChildDecorable flags around makes everything uglier...

data Tree = MkTree
  { tree'prod :: ProdName
  , tree'children :: Seq (IsChildDecorable, Value)
  , tree'attrs :: Map AttrName Value
  } deriving Show

We also define some helper functions that expect a value to be a certain type, and trigger undefined behavior if it is not. These are used in places where type errors are undefined behavior (essentially everywhere a certain type is required).

asClosure :: Value -> Closure
asClosure (V'Closure fields) = fields
asClosure value = ub ("Type error: expected a closure, found " <> show value)

asRecord :: Value -> Map RecordFieldName Value
asRecord (V'Record fields) = fields
asRecord value = ub ("Type error: expected a record, found " <> show value)

asString :: Value -> Text
asString (V'Lit (L'String str)) = str
asString value = ub ("Type error: expected a string, found " <> show value)

asTerm :: Value -> (ProdName, Seq (IsChildDecorable, Value))
asTerm (V'Term prodName children) = (prodName, children)
asTerm value = ub ("Type error: expected a term, found " <> show value)

asThunk :: Value -> Pointer Thunk
asThunk (V'Thunk ptr) = ptr
asThunk value = ub ("Type error: expected a thunk, found " <> show value)

asTree :: Value -> Pointer Tree
asTree (V'Tree ptr) = ptr
asTree value = ub ("Type error: expected a tree, found " <> show value)

AExprs

How all the above pieces fit together should hopefully become a bit clearer with some examples of their use. Because they're simpler, we'll first look at evaluating AExprs. Recall from above, AExprs are defined by evaluating to a value without side effects or nontermination, and can't manipulate the current continuation.

We could therefore implement evalAExpr without the full M monad, and instead by passing in the local environment, global environment, and the structure that underlies readTree. However, this would complicate the signature significantly, and expose some implementation details that are best explained later.

evalAExpr :: AExpr -> M Value

Local variables accesses are a nice, simple case to get us started. We simply get the current local environment, and perform a lookup there. It's undefined behavior for the binding to be missing, so we don't need to handle that case (it will manifest as an exception).

evalAExpr (A'Local name) = do
  env <- getLocalEnv
  pure (lookupLocal name env)

Global variables are, if anything, simpler.

evalAExpr (A'Global name) = lookupGlobal name

Literals are very simple as well.

evalAExpr (A'Lit lit) = pure (V'Lit lit)

Lambdas are slightly more complicated, since they close over the local environment.

We also include a check that variables don't have duplicated names, by storing variables in a set and checking the cardinality. This is UB to write down, not just to execute (since it can mess with an optimizer), so really this check "ought to be" earlier in the process.

evalAExpr (A'Lam vars body) = do
  env <- getLocalEnv

  let varsSet = Set.fromList (toList vars)
  unless (length varsSet == length vars) $
    ub "Duplicated variables in lambda"

  let clo = MkClosure env vars body
  pure (V'Closure clo)

Foreign functions, like globals, kinda get a cop-out for now, where the complexity is hidden in M. Possibly of note is the use of traverse to evaluate each subexpression. (If you're not familiar, it basically acts as an effectful map.) Since we're using it on a Seq of arguments, it will evaluate them left-to-right, but this isn't actually necessary since they're AExprs. A real implementation can choose to evaluate them in any order it pleases.

evalAExpr (A'PureForeign func args) = do
  argValues <- traverse evalAExpr args
  callForeignFunction func argValues

Constructing records is pretty simple. It again leans on traverse, but this time on Maps. By the magic of typeclasses, this does exactly the right thing, but now the fact that we can run evalAExpr across the subexpressions in an arbitrary order is more important, since Map doesn't necessarily preserve ordering from the source language.

evalAExpr (A'MakeRecord fields) = do
  fieldValues <- traverse evalAExpr fields
  pure (V'Record fieldValues)

We now start to use our type-testing helpers from above, this time to check that a value is a record. Also note that missing fields are UB; records are intended to be monomorphic, initially for typeclass dictionaries. It might make sense to require declaring their types globally as well?

evalAExpr (A'GetRecordMember fieldName expr) = do
  value <- evalAExpr expr
  let record = asRecord value
  case Map.lookup fieldName record of
    Just fieldValue -> pure fieldValue
    Nothing -> ub ("Missing record field " <> show fieldName <> " in " <>
                   show value)

Constructing terms is pretty similar to constructing records. This time though, there's a bit of extra "code smell," where two traverse calls are needed. This first traverses across the Seq of children, the second targets the Expr in the pair.

evalAExpr (A'Cons prodName children) = do
  childValues <- traverse (traverse evalAExpr) children
  pure (V'Term prodName childValues)

To get children off a tree or term, we first get the Seq of children, then index into it. It's a bit longer than any case we've looked at so far, but shouldn't have anything too shocking in it.

evalAExpr (A'GetChild n expr) = do
  value <- evalAExpr expr

  children <- case value of
    V'Term _ children -> pure children
    V'Tree ptr -> do
      tree <- readTree ptr
      pure (tree'children tree)
    _ -> ub ("Type error: expected a term or tree, found " <> show value)

  case Seq.lookup n children of
    Just (_, child) -> pure child
    Nothing -> ub ("No child #" <> show n <> " of " <> show value)

We're getting close to the finish now; pulling attributes off a tree is pretty simple too. It looks approximately like pulling values off a record, except with an extra dereference.

evalAExpr (A'GetAttr attrName expr) = do
  value <- evalAExpr expr
  let ptr = asTree value
  tree <- readTree ptr
  let attrs = tree'attrs tree
  case Map.lookup attrName attrs of
    Just attrValue -> pure attrValue
    Nothing -> ub ("Missing attribute " <> show attrName <> " on " <>
                   show tree <> " at " <> show ptr)

Our last AExprs uses a helper function. I expect this would either be present in the runtime, and operate generically on trees of any production, or a monomorphized function could be generated for each production in a vtable-like thing.

evalAExpr (A'Undecorate expr) = do
  value <- evalAExpr expr
  undecorate value

To undecorate a tree, we go through each of its children, and undecorate each of them.

undecorate :: Value -> M Value
undecorate value = do
  let ptr = asTree value
  tree <- readTree ptr
  let prodName = tree'prod tree
  children <- traverse undecorateChild (tree'children tree)
  pure (V'Term prodName children)

To undecorate a child, we check whether it could have been decorated in the first place. If so, we perform the decoration process recursively. If not, we return the same value.

undecorateChild :: (IsChildDecorable, Value) -> M (IsChildDecorable, Value)
undecorateChild (ChildIsDecorable, value) = do
  newChild <- undecorate value
  pure (ChildIsDecorable, newChild)
undecorateChild (ChildIsntDecorable, value) = pure (ChildIsntDecorable, value)

Hopefully seeing all this written out gave a bit of a taste for how the primitives are used with each other. Now would be an excellent time to stand up and get a cup of tea.

The Current Context

In keeping with making control flow as explicit as possible, we keep an explicit representation of continuations. They are stored in a stack, which one could imagine as mimicking the machine stack. Explicitly modelling the stack this way lets us also see what operations can plausibly be made to perform tail-call optimization.

We assume (and later define) operations in the environment for adding entries to the stack, and for popping the top entry and applying a value to it.

continue :: Value -> M Value
pushCont :: Continuation -> M ()

Our continuation type is shaped by the needs of the Expr type.

data Continuation

Our first necessary entry is for E'Let. After the expression being bound is evaluated, evaluation needs to continue to the body expression.

If you squint, you'll note that this looks equivalent to Closure, but with a single variable rather than a Seq of them. This makes total sense, if you think about let x = a in b as being equivalent to (\x -> b) a.

  = C'Let LocalEnv LocalName Expr

We also need a continuation for writing back to thunks. We'll see below how this fits into lazy evaluation.

  | C'WritebackToThunk (Pointer Thunk)

We also need a continuation that sets an attribute on a tree, after which evaluation continues to another expression. This is currently only used in the semantics of E'CombineAttr; we'll see how it's used in more detail when we get to E'CombineAttr.

  | C'SetAttr AttrName (Pointer Tree) LocalEnv Expr

Lastly, we need a continuation at tree-construction time to help us conduct the tree-traversal that occurs at the time. This will be made more clear once we actually discuss what happens at tree-construction time.

  | C'InitializeChildren (Pointer Tree) Int

Exprs

Now that we have our full toolbox, we can start looking at evaluating Exprs. We first define an apply helper, to apply functions to arguments.

This should look fairly normal to those familiar with eval/apply-style interpreters; we ensure that the function is a closure with the appropriate number of arguments, construct an environment with the closed-over bindings, plus additional bindings for the arguments, then evaluate the body. Note that this is tail-recursive, both in the semantics, and more importantly in that it doesn't touch the continuation stack.

apply :: Value -> Seq Value -> M Value
apply func args = do
  let MkClosure env argNames body = asClosure func

  unless (length argNames == length args) $
    ub ("Bad arity in function call: calling " <> show func <> " with args " <>
        show args)

  let argPairs = Seq.zip argNames args
  let env' = updateLocals argPairs env

  putLocalEnv env'
  evalExpr body

Once we have this helper, we can define actual evaluation of Exprs.

evalExpr :: Expr -> M Value

First off, in the case that an expression is simply an embedded AExpr, we can immediately evaluate this to a value and apply it to the current continuation.

evalExpr (E'A expr) = do
  value <- evalAExpr expr
  continue value

Next, we can look at function calls. Since these have the function and arguments as AExprs, we can immediately evaluate them without caring about order. Then, we can simply call out to our apply helper.

evalExpr (E'Call func args) = do
  funcValue <- evalAExpr func
  argValues <- traverse evalAExpr args
  apply funcValue argValues

Our next expression to evaluate is E'Let. To do this, we first create the continuation (as described above) that will cause control flow to continue to the body of the let after the bound expression finishes evaluating.

We then push the continuation and evaluate the bound expression, safe in the knowledge that control flow will work out.

evalExpr (E'Let name bound body) = do
  env <- getLocalEnv
  let cont = C'Let env name body
  pushCont cont
  evalExpr bound

Letrecs usually require a bit of trickery to implement, and it's no different here. Since we're embracing laziness elsewhere, we can use it to make our lives easy here. Any values that are letrecced with each other are automatically put behind thunks.

This is where it comes in handy that thunks start uninitialized; we start by allocating the required number of thunks without needing to provide values. (The double traverse here serves the same purpose as above; the outer traverse is traversing through the list of bindings, the inner one is selecting the right-hand side of the tuple.)

We then construct an environment with bindings for the thunks created, and use that to initialize each of the thunks.

Finally, we set that as the current environment, and call the body expression. Note that unlike E'Let, this doesn't need to push to the continuation stack.

evalExpr (E'LetRec bindings body) = do
  env <- getLocalEnv
  let addThunkToBinding (name, expr) = do
        thunkPtr <- allocThunk
        pure (name, expr, thunkPtr)
  let bindings' = Seq.fromList (Map.toList bindings)
  bindingsWithThunks <- traverse addThunkToBinding bindings'

  let getNameAndValue (name, _, thunkPtr) = (name, V'Thunk thunkPtr)
  let env' = updateLocals (fmap getNameAndValue bindingsWithThunks) env

  forM_ bindingsWithThunks $ \(_, expr, thunkPtr) -> do
    let thunk = Thunk'Unforced env' expr
    writeThunk thunkPtr thunk

  putLocalEnv env'
  evalExpr body

Errors are fairly simple, we just evaluate the argument, then raise an exception in the interpreter. If there were some "catch" mechanism, they'd have to unwind the continuation stack. But there isn't, so they don't.

evalExpr (E'Error msg) = do
  msgValue <- evalAExpr msg
  let msgStr = asString msgValue
  error (Text.unpack msgStr)

Thunking a single expression is also pretty easy; we've already seen the more complicated case above, in E'LetRec.

Perhaps notable is that this is the first Expr other than E'A we've seen that calls continue. Recall the discussion above about whether E'Thunk should be an AExpr or not; this similarity is essentially why it's a bit of a tricky choice.

evalExpr (E'Thunk body) = do
  env <- getLocalEnv
  thunkPtr <- allocThunk
  let thunk = Thunk'Unforced env body
  writeThunk thunkPtr thunk
  continue (V'Thunk thunkPtr)

Forcing a thunk is a bit interesting. We firstly evaluate the argument, check that it's a pointer to a thunk, and dereference it, all as we might expect.

In the case where the thunk is already forced, we can simply call the continuation with the value.

In the case where the thunk isn't yet forced, we of course have to go off and evaluate it. Before we do, we push a C'WritebackToThunk continuation to the stack, with the pointer to writeback to in it. When the expression contained within the thunk finishes evaluating, it'll get passed to this continuation, which will write back a Thunk'Forced to the pointer, then pass the value to the next continuation.

evalExpr (E'Force thunkExpr) = do
  thunkValue <- evalAExpr thunkExpr
  let thunkPtr = asThunk thunkValue
  thunk <- readThunk thunkPtr
  case thunk of
    Thunk'Forced value -> continue value
    Thunk'Unforced env expr -> do
      pushCont (C'WritebackToThunk thunkPtr)
      putLocalEnv env
      evalExpr expr

To make pattern-matching easier to define, we'll rely on a helper, which we'll define later:

patternMatch :: Pattern -> Value -> LocalEnv -> Maybe LocalEnv

This attempts to match a value against a pattern, returning Just a LocalEnv with any additional bindings the pattern introduced present if the pattern matched, and Nothing if the pattern failed to match.

We evaluate the scrutinee, then check patterns against it one-by-one. If the pattern matches, we use the new bindings, and start evaluating the body expression. Otherwise, we continue to the next pattern.

It's undefined behavior to have a match with no matching cases, since including a wildcard case is cheap and easy.

evalExpr (E'Case scrutinee pats) = do
  scrutineeValue <- evalAExpr scrutinee
  env <- getLocalEnv
  let matchLoop Empty =
        ub ("All patterns (among " <> show pats <>
            ") failed to match against value " <> show scrutineeValue)
      matchLoop ((hPat, hBody) :<| t) = do
        matchEnv <- patternMatch hPat scrutineeValue env
        case matchEnv of
          Just env' -> do
            putLocalEnv env'
            evalExpr hBody
          Nothing -> matchLoop t
  matchLoop pats

Impure foreign functions are naturally almost identical to pure foreign functions, the main difference being that they need to continue with their value rather than simply returning it, since they aren't AExprs (by definition).

evalExpr (E'ImpureForeign func args) = do
  argValues <- traverse evalAExpr args
  value <- callForeignFunction func argValues
  continue value

Now we're finally at the tree-construction parts of the AST. First up is E'SetAttr, which has pretty simple semantics. Understanding it will be important to understanding E'CombineAttr, though, so close attention is warranted.

First, we get the value we're planning to set for the attribute, and the tree itself. Since E'SetAttr is defined to overwrite any previous value, we don't need to check for that. Instead, we can more-or-less blindly overwrite the old value, if there was one. We then write back the tree with the modified attribute, and resume normal evaluation at the andThen expression.

evalExpr (E'SetAttr attrName treeExpr attrExpr andThen) = do
  attrValue <- evalAExpr attrExpr

  treeValue <- evalAExpr treeExpr
  let treePtr = asTree treeValue
  tree <- readTree treePtr
  let attrs = tree'attrs tree

  let attrs' = Map.insert attrName attrValue attrs
  let tree' = tree { tree'attrs = attrs' }

  writeTree treePtr tree'

  evalExpr andThen

As promised, E'CombineAttr is similar to, but more complex than, E'SetAttr. It starts the same, but checks whether the attribute already existed. If it didn't, it proceeds identically to E'SetAttr.

If it did, it uses the function combineExpr evaluates to to produce a new value. This is done by pushing a C'SetAttr continuation to the stack. When this continuation is provided with a value, it will set the attribute to that value, then start evaluating the andThen expression.

The function combineExpr evaluates to is then called, in an approximately identical manner to the way E'Call does.

evalExpr (E'CombineAttr attrName treeExpr attrExpr combineExpr andThen) = do
  attrValue <- evalAExpr attrExpr

  treeValue <- evalAExpr treeExpr
  let treePtr = asTree treeValue
  tree <- readTree treePtr
  let attrs = tree'attrs tree

  case Map.lookup attrName attrs of
    Just oldAttrValue -> do
      env <- getLocalEnv
      pushCont (C'SetAttr attrName treePtr env andThen)

      combineValue <- evalAExpr combineExpr
      let MkClosure cloEnv argNames body = asClosure combineValue
      unless (length argNames == 2) $
        ub ("Bad arity in function call: calling " <> show combineExpr <>
            " when combining attribute values " <> show oldAttrValue <>
            " and " <> show attrValue <> " for attribute " <> show attrName <>
            " on tree " <> show treePtr)

      let argPairs = Seq.zip argNames [oldAttrValue, attrValue]
      let cloEnv' = updateLocals argPairs cloEnv

      putLocalEnv cloEnv'
      evalExpr body
    Nothing -> do
      let attrs' = Map.insert attrName attrValue attrs
      let tree' = tree { tree'attrs = attrs' }
      writeTree treePtr tree'
      evalExpr andThen

Finally, we get to the most complicated Expr, E'Decorate.

After evaluating the expression corresponding to the term we want to decorate, we allocate a tree corresponding to the term. The work of doing this is done in a helper function that we'll examine shortly.

We swap out the (empty) set of attributes on the freshly-allocated tree with the set of attributes provided to the E'Decorate expression.

Finally, we start performing the initialization process with another helper, which we'll examine soon too.

evalExpr (E'Decorate termExpr inhSetter) = do
  termValue <- evalAExpr termExpr
  inhSetterValue <- evalAExpr inhSetter

  treePtr <- allocateTreeForTerm termValue
  tree <- apply inhSetterValue [V'Tree treePtr]
  let treePtr' = asTree tree
  initializeTree treePtr'

Our helper to allocate trees for terms is pretty straightforward; for each decorable child, recurse; then allocate a tree with an empty map of attributes and return its address.

allocateTreeForTerm :: Value -> M (Pointer Tree)
allocateTreeForTerm value = do
  let helper (ChildIsDecorable, child) = do
        childTreePtr <- allocateTreeForTerm child
        pure (ChildIsDecorable, V'Tree childTreePtr)
      helper (ChildIsntDecorable, child) = pure (ChildIsntDecorable, child)

  let (prodName, children) = asTerm value
  childTrees <- traverse helper children

  let tree = MkTree { tree'prod = prodName, tree'children = childTrees
                    , tree'attrs = Map.empty }
  treePtr <- allocTree
  writeTree treePtr tree

  pure treePtr

This is the helper that runs production bodies at tree-construction time. This is actually mostly "wired up" with continuations; we push the relevant ones to the continuation stack, and call it with the uninitialized tree. We'll examine what all the continuations do in detail next, but for now the process is approximately:

Note that we're able to push C'Let continuations to arrange for function calls to occur. The (let x = y in z) = ((\x. z) y) equality comes in handy again!

initializeTree :: Pointer Tree -> M Value
initializeTree treePtr = do
  tree <- readTree treePtr
  prodBodiesUnsorted <- lookupProdBodies (tree'prod tree)
  let prodBodies = Seq.sortOn (\(priority, _, _) -> -priority) prodBodiesUnsorted
  let (nonNegProdBodies, negProdBodies) =
        Seq.spanl (\(priority, _, _) -> priority >= 0) prodBodies

  forM_ nonNegProdBodies $ \(_, name, body) ->
    pushCont (C'Let emptyLocalEnv name body)
  pushCont (C'InitializeChildren treePtr 0)
  forM_ negProdBodies $ \(_, name, body) ->
    pushCont (C'Let emptyLocalEnv name body)
  continue (V'Tree treePtr)

Continuations, in more depth

Continuations are run with another helper:

continue' :: Value -> Continuation -> M Value

The C'Let continuation is the simplest one. It restores a saved environment, adds a binding for the value passed to the continuation, and then begins evaluating an expression. It's used as noted above, to both handle E'Let expressions, and to schedule production bodies to run.

continue' value (C'Let env name body) = do
  putLocalEnv (updateLocals [(name, value)] env)
  evalExpr body

The C'WritebackToThunk continuation simply writes back the value to the given thunk, then contines with the same value.

continue' value (C'WritebackToThunk thunkPtr) = do
  writeThunk thunkPtr (Thunk'Forced value)
  continue value

The C'SetAttr continuation takes a value, uses it to set an attribute on a tree, then continues evaluation at a specified Expr.

continue' value (C'SetAttr attrName treePtr env andThen) = do
  tree <- readTree treePtr
  let attrs = tree'attrs tree
  let attrs' = Map.insert attrName value attrs
  let tree' = tree { tree'attrs = attrs' }
  writeTree treePtr tree'

  putLocalEnv env
  evalExpr andThen

The C'InitializeChildren continuation is what allows the initialization portion of tree-construction to recurse down the tree. If the index it contains is an in-bounds chuild number for the tree, it puts itself back on the stack and recurses the tree initialization process down to the child. Once it gets out of bounds, it continues initialization with the productions with non-negative priority.

continue' value (C'InitializeChildren treePtr n) = do
  tree <- readTree treePtr
  case Seq.lookup n (tree'children tree) of
    Just (ChildIsDecorable, child) -> do
      pushCont (C'InitializeChildren treePtr (n + 1))
      initializeTree (asTree child)
    Just (ChildIsntDecorable, _) -> continue' value (C'InitializeChildren treePtr (n + 1))
    Nothing -> continue (V'Tree treePtr)

Patterns

Pattern matching is defined with a patternMatch function that, which takes a pattern, a value it's matching against, and an environment to elaborate. It requires the evaluation context in order to traverse the pointer necessary to match against a tree. (Pattern matching against trees might be removed in the future, especially if backends make A'Undecorate cheaper than a tree traversal.)

For ease of reading, the check for this is not implemented here, but it's undefined behavior to have a single pattern bind the same variable in multiple places.

patternMatch :: Pattern -> Value -> LocalEnv -> M (Maybe LocalEnv)

The first and most complicated pattern is for records. (Thankfully, that's still not too complicated!)

We iterate through each pattern, accumulating the match from each one. I think this'd be a one-liner with Sufficiently Aggressive Use of Monads...

patternMatch (P'Record pats) (V'Record vals) env =
  let helper [] env' = pure env'
      helper ((field, pat):t) env' =
        case Map.lookup field vals of
          Just val -> case patternMatchPrim pat val env' of
                        Just env'' -> helper t env''
                        Nothing -> Nothing
          Nothing -> Nothing
   in pure (helper (Map.toList pats) env)

When a term is matched against a tree-or-term pattern, it does the reasonable thing; it checks that the production names match, and if they do, it checks the fields.

patternMatch (P'TreeOrTerm patProdName pats) (V'Term prodName children) env =
  if patProdName == prodName
    then pure (patternMatchPrimAll (Seq.zip pats (fmap snd children)) env)
    else pure Nothing

When matching a tree, it does the same thing after dereferencing the tree pointer.

patternMatch (P'TreeOrTerm patProdName pats) (V'Tree treePtr) env = do
  tree <- readTree treePtr
  let children = fmap snd (tree'children tree)
  if patProdName == tree'prod tree
    then pure (patternMatchPrimAll (Seq.zip pats children) env)
    else pure Nothing

When a primitive pattern is used, it is matched against.

patternMatch (P'Prim pat) value env = pure (patternMatchPrim pat value env)

If no pattern matched, we fail.

patternMatch _ _ _ = pure Nothing

This is the function used for matching against primitive patterns. These are patterns that can't recur, essentially. This function behaves the same as patternMatch, but doesn't run under M since it doesn't need to inspect trees.

patternMatchPrim :: PrimPattern -> Value -> LocalEnv -> Maybe LocalEnv

P'Var will bind to any value, and will create a new binding in the returned environment.

patternMatchPrim (P'Var name) value env = Just (updateLocals [(name, value)] env)

P'Lit checks the literal for equality against the value, and continues as expected if it is equal.

patternMatchPrim (P'Lit patLit) (V'Lit lit) env =
  if patLit == lit
    then Just env
    else Nothing

If the wildcard is used, we always succeed.

patternMatchPrim P'Any _ env = Just env

If no pattern matched, we fail.

patternMatchPrim _ _ _ = Nothing

Lastly, we have a helper that tries to match a sequence of primitive patterns. It's pretty much as one might expect.

patternMatchPrimAll :: Seq (PrimPattern, Value) -> LocalEnv -> Maybe LocalEnv
patternMatchPrimAll [] env = Just env
patternMatchPrimAll ((pat, value) :<| tl) env =
  case patternMatchPrim pat value env of
    Just env' -> patternMatchPrimAll tl env'
    Nothing -> Nothing

TopLevelItems

The last piece for the semantics themselves is the handling of top-level items. Frankly, so much of the work is done by the primitive functions that this section isn't really useful, but... here it is. These functions all just introduce bindings in the environment, with some amount of error checking.

defineGlobal :: GlobalName -> Value -> M ()
declareProd :: ProdName -> NTName -> M ()
defineProdBody :: ProdName -> Priority -> LocalName -> Expr -> M ()
defineDefaultProdBody :: NTName -> Priority -> LocalName -> Expr -> M ()

Our defineTopLevelItem function takes a TopLevelItem and dispatches to the appropriate one of these helpers.

defineTopLevelItem :: TopLevelItem -> M ()

The only nontrivial case is for global declarations; it simply creates a thunk for the value, then inserts it into the environment. (Globals can't be immediately evaluated, since there's no obvious evaluation order.)

defineTopLevelItem (T'GlobalDecl name expr) = do
  thunkPtr <- allocThunk
  let thunk = Thunk'Unforced emptyLocalEnv expr
  writeThunk thunkPtr thunk
  defineGlobal name (V'Thunk thunkPtr)

These all are trivial.

defineTopLevelItem (T'ProdDecl prod nt) = declareProd prod nt

defineTopLevelItem (T'DefaultProdBodyDecl nt priority top body) =
  defineDefaultProdBody nt priority top body

defineTopLevelItem (T'ProdBodyDecl prod priority top body) =
  defineProdBody prod priority top body

Foreign Functions

Foreign functions are definitely backend-specific, but we define some in our semantics that we use in our example:

callForeignFunction :: ForeignFunctionName -> Seq Value -> M Value
callForeignFunction (MkForeignFunctionName "strcat") [V'Lit (L'String a), V'Lit (L'String b)] =
  pure (V'Lit (L'String (a <> b)))
callForeignFunction (MkForeignFunctionName "streq") [V'Lit (L'String a), V'Lit (L'String b)] =
  if a == b
    then pure (V'Term (MkProdName "true") [])
    else pure (V'Term (MkProdName "false") [])
callForeignFunction func args = ub ("Unknown foreign function, type error, or bad arity: " <>
                                    show func <> " with args " <> show args)

M and CESK

We'll now go over our monad, and the actual "bedrock" of the interpreter. If you don't feel like you need additional explanation for this, feel free to skip to the next heading.

Our semantics are expressed as a CESK machine. I'll quickly introduce it here, but it was introduced in A Calculus for Assignments in Higher-Order Languages, and has a much more formal description there. Matt Might also has a useful blogpost, Writing an interpreter, CESK-style.

The CESK machine can be thought of as being reached via elaboration from a simpler semantics. If you have a simple call-by-value lambda calculus, whose semantics are written in terms of substitution, you kind of have a "C machine." If instead of using substitutions to define beta-reduction, you add an environment to the state (to avoid needing to define capture-avoiding substitution, perhaps), you get a "CE machine". Describing non-trivial control flow is made much easier by having an explicit representation of the evaluation context as a continuation, which gets you to the more well-known CEK machine. Finally, adding mutable state via a store to support reference objects, we get the full CESK.

Looking at the above, we see that our semantics has the four components of CESK:

As noted, the Control is implicitly represented by evalExpr calls; the other three components are stored in an InterpState record.

interpState'localEnv, interpState'globals, interpState'prods, interpState'prodBodies, and interpState'defaultProdBodies make up the environment; the latter four correspond directly to TopLevelItems defined by defineTopLevelItem.

The store is represented by a heap for thunks and a heap for trees, in interpState'thunks and interpState'trees. (This is heap in the colloquial, "malloc allocates heap memory" sense.)

Lastly, the continuation is represented by the stack interpState'conts.

data InterpState = MkInterpState
  { interpState'localEnv :: LocalEnv
  , interpState'globals :: Map GlobalName Value
  , interpState'prods :: Map ProdName NTName
  , interpState'prodBodies :: Map ProdName (Seq (Priority, LocalName, Expr))
  , interpState'defaultProdBodies :: Map NTName (Seq (Priority, LocalName, Expr))
  , interpState'thunks :: Heap Thunk
  , interpState'trees :: Heap Tree
  , interpState'conts :: [Continuation]
  }

Our monad is then simply the state monad, carrying this state around.

newtype M a = MkM (State InterpState a)

unM :: M a -> State InterpState a
unM (MkM x) = x

Its instances for the typeclasses are boilerplatey (and with {-# LANGUAGE GeneralizedNewtypeDeriving #-}, can be completely autogenerated).

instance Functor M where
  fmap f = MkM . fmap f . unM
instance Applicative M where
  pure = MkM . pure
  (<*>) (MkM f) (MkM x) = MkM (f <*> x)
instance Monad M where
  (>>=) (MkM x) f = MkM (unM . f =<< x)

And, we can define a function to run a computation in M, that provides a default state.

runM :: M a -> a
runM (MkM x) = evalState x (MkInterpState
  { interpState'localEnv = MkLocalEnv Map.empty
  , interpState'globals = Map.empty
  , interpState'prods = Map.empty
  , interpState'prodBodies = Map.empty
  , interpState'defaultProdBodies = Map.empty
  , interpState'thunks = MkHeap Empty
  , interpState'trees = MkHeap Empty
  , interpState'conts = []
  })

We can now show the actual definitions of all the helpers that were simply assumed above.

First up is looking up and defining globals in the environment. The MkM constructor can be used to wrap an action in the state monad; this is done with gets and modify, which respectively result in the types:

gets :: (InterpState -> a) -> State InterpState a
modify :: (InterpState -> InterpState) -> State InterpState ()

Using these, the global variable bindings are pretty simple; a map lookup to look them up, and a map insertion to add one, after checking that it's not already defined.

lookupGlobal :: GlobalName -> M Value
lookupGlobal name = do
  globalEnv <- MkM (gets interpState'globals)
  case Map.lookup name globalEnv of
    Just value -> pure value
    Nothing -> ub ("Access to undefined global " <> show name)

defineGlobal :: GlobalName -> Value -> M ()
defineGlobal name value = do
  globalEnv <- MkM (gets interpState'globals)
  when (Map.member name globalEnv) $
    ub ("Global " <> show name <> " already defined")
  MkM (modify (\st -> st { interpState'globals = Map.insert name value globalEnv }))

lookupProdBodies is a bit more complicated; it finds all the production bodies not only for the given production, but also the default production bodies for its nonterminal. It first looks up the nonterminal for the production (giving it an opportunity to crash if the production wasn't declared). Then, it looks up both sets of production bodies, and returns their concatenation.

lookupProdBodies :: ProdName -> M (Seq (Priority, LocalName, Expr))
lookupProdBodies prod = do
  prods <- MkM (gets interpState'prods)
  case Map.lookup prod prods of
    Just nt -> do
      allProdBodies <- MkM (gets interpState'prodBodies)
      allDefaultProdBodies <- MkM (gets interpState'defaultProdBodies)
      let orNil x = case x of
                      Just value -> value
                      Nothing -> []
      let prodBodies = orNil (Map.lookup prod allProdBodies)
      let defaultProdBodies = orNil (Map.lookup nt allDefaultProdBodies)
      pure (prodBodies <> defaultProdBodies)
    Nothing -> ub ("Production " <> show prod <> " is not declared")

Productions are declared simply by putting the name of the nonterminal they belong to into the environment. If more metadata is kept about them in the future, this would be stored here too.

declareProd :: ProdName -> NTName -> M ()
declareProd prod nt = do
  prods <- MkM (gets interpState'prods)
  when (Map.member prod prods) $
    ub ("Production " <> show prod <> " is already declared")
  MkM (modify (\st -> st { interpState'prods = Map.insert prod nt prods }))

Production bodies (including default production bodies) are also fairly simple insertions into the environment. When defining production bodies, there's a check that the production has been declared, but this might go away in the future; it's not really in keeping with "a program is an unordered set of TopLevelItems" (though, it's still well-defined if one imagines all T'ProdDecls are processed first).

defineProdBody :: ProdName -> Priority -> LocalName -> Expr -> M ()
defineProdBody prod priority top body = do
  prods <- MkM (gets interpState'prods)
  unless (Map.member prod prods) $
    ub ("Production " <> show prod <> " is not declared")
  let addBody = Map.insertWith (<>) prod [(priority, top, body)]
  MkM (modify (\st -> st { interpState'prodBodies = addBody (interpState'prodBodies st) }))

defineDefaultProdBody :: NTName -> Priority -> LocalName -> Expr -> M ()
defineDefaultProdBody nt priority top body = do
  let addBody = Map.insertWith (<>) nt [(priority, top, body)]
  MkM (modify (\st -> st
    { interpState'defaultProdBodies = addBody (interpState'defaultProdBodies st) }))

We define a newtype for the local environment, since we're passing them around (e.g. in closures). It is a trivial wrapper around a map, though, with only a few operations exposed.

newtype LocalEnv = MkLocalEnv (Map LocalName Value)

emptyLocalEnv :: LocalEnv
emptyLocalEnv = MkLocalEnv Map.empty

lookupLocal :: LocalName -> LocalEnv -> Value
lookupLocal name (MkLocalEnv env) =
  case Map.lookup name env of
    Just value -> value
    Nothing -> ub ("Undefined local " <> show name)

updateLocals :: Seq (LocalName, Value) -> LocalEnv -> LocalEnv
updateLocals bindings (MkLocalEnv env) =
  MkLocalEnv (Map.union (Map.fromList (toList bindings)) env)

getLocalEnv and putLocalEnv just take references to and set the LocalEnv in the state, and with them defined we've defined the whole interface to the environment.

getLocalEnv :: M LocalEnv
getLocalEnv = MkM (gets interpState'localEnv)

putLocalEnv :: LocalEnv -> M ()
putLocalEnv env = MkM (modify (\st -> st { interpState'localEnv = env }))

Heaps are simply sequences of possibly-initialized cells. Since this is intended to be an executable semantics, not a "real" interpreter, garbage collection isn't performed. In a real implementation, the heap would probably simply be mutable memory allocated from a garbage collector.

We allocate space in the heap simply by appending an extra entry to the heap, in the Nothing (uninitialized) state, returning its index. We can then read and write to indices passed via "pointers". It's checked that memory is initialized before reading it, thought it shouldn't be possible for a program to encounter uninitialized memory, except as a bug in the implementation.

newtype Heap a = MkHeap (Seq (Maybe a))

newtype Pointer a = MkPointer Int
  deriving (Eq, Show)

heapAlloc :: Heap a -> (Pointer a, Heap a)
heapAlloc (MkHeap vals) = (MkPointer (length vals), MkHeap (vals :|> Nothing))

heapRead :: Heap a -> Pointer a -> a
heapRead (MkHeap vals) ptr =
  let MkPointer n = ptr in
  case Seq.lookup n vals of
    Just (Just x) -> x
    Just Nothing -> ub ("Access to uninitialized value at " <> show ptr)
    Nothing -> ub ("Out of bounds read at " <> show ptr)

heapWrite :: Heap a -> Pointer a -> a -> Heap a
heapWrite (MkHeap vals) (MkPointer n) x = MkHeap (Seq.update n (Just x) vals)

We expose the heaps with the previously-mentioned functions, starting with allocThunk. It's a pretty simple wrapper that uses the heap in the state to call heapAlloc.

allocThunk :: M (Pointer Thunk)
allocThunk = do
  heap <- MkM (gets interpState'thunks)
  let (ptr, heap') = heapAlloc heap
  MkM (modify (\st -> st { interpState'thunks = heap' }))
  pure ptr

readThunk and writeThunk are similarly simple wrappers.

readThunk :: Pointer Thunk -> M Thunk
readThunk ptr = do
  heap <- MkM (gets interpState'thunks)
  pure (heapRead heap ptr)

writeThunk :: Pointer Thunk -> Thunk -> M ()
writeThunk ptr thunk = do
  heap <- MkM (gets interpState'thunks)
  let heap' = heapWrite heap ptr thunk
  MkM (modify (\st -> st { interpState'thunks = heap' }))

The tree equivalents are the same, except of course in that they access the tree heap instead of the thunk heap.

allocTree :: M (Pointer Tree)
allocTree = do
  heap <- MkM (gets interpState'trees)
  let (ptr, heap') = heapAlloc heap
  MkM (modify (\st -> st { interpState'trees = heap' }))
  pure ptr

readTree :: Pointer Tree -> M Tree
readTree ptr = do
  heap <- MkM (gets interpState'trees)
  pure (heapRead heap ptr)

writeTree :: Pointer Tree -> Tree -> M ()
writeTree ptr tree = do
  heap <- MkM (gets interpState'trees)
  let heap' = heapWrite heap ptr tree
  MkM (modify (\st -> st { interpState'trees = heap' }))

Lastly, we get the the K/continuation part of the state. Adding a new contination to the stack is simple:

pushCont :: Continuation -> M ()
pushCont c = MkM (modify (\st -> st { interpState'conts = (c : interpState'conts st) }))

Continuing to the top continuation is slightly more complex. If the continuation stack is empty, we simply return the value; this is how a call to evalExpr eventually returns. If the continuation stack is non-empty, we pop a continuation from it and call continue' from above with the value and continuation.

continue :: Value -> M Value
continue value = do
  conts <- MkM (gets interpState'conts)
  case conts of
    [] -> pure value
    (h:t) -> do
      MkM (modify (\st -> st { interpState'conts = t }))
      continue' value h

At this point, we've made it through a complete and executable version of the semantics, which should be complete enough (not including foreign functions) to run arbitrary Silver programs once translated.

Example Program

Next, we look at how a simple Silver program might get lowered to SilvIR.

This is a super-simplified grammar, since this post is already really long. (A tea break before reading this section may be justified.) In this section, I'll go over a piece of Silver, and then write the corresponding SilvIR.

We're going to define a simple language of addition, integer literals, let expressions, and variables. These things are called arith:Exprs. In this program, we're just checking a fixed expression for errors.

arithGrammar :: Seq TopLevelItem
arithGrammar = [

The grammar, attribute declaration, and nonterminal declarations generate no code. For simplicity's sake, we're only passing down a list of names, rather than a map of bindings or something similar. If boundVars were an autocopy attribute, it'd need to invoke some machinery that we'll see in the next post; for simplicity, it's a standard inherited attribute.

grammar arith;
inherited attribute boundVars::[String];
synthesized attribute errors::[String];
nonterminal Expr with boundVars, errors;

We first define the production for integer literals.

abstract production lit
top::Expr ::= n::Integer
{
  top.errors = [];
}

This generates a T'ProdDecl to declare the production, plus a T'ProdBodyDecl for the equation.

  T'ProdDecl (MkProdName "arith:lit") (MkNTName "arith:Expr"),
  T'ProdBodyDecl (MkProdName "arith:lit") 0 (MkLocalName "top")
    (E'Let (MkLocalName "tmp")
      (E'Thunk (E'A (A'Cons (MkProdName "nil") [])))
    (E'SetAttr (MkAttrName "arith:errors") (A'Local (MkLocalName "top"))
      (A'Local (MkLocalName "tmp"))
      (E'A (A'Local (MkLocalName "top"))))),

Next, we define variables, in a fairly straightforward way. There's a bit of complication in translating this, since booleans haven't been defined! I'm not totally decided on how booleans should work in SilvIR; the possibilities I've thought about are:

Similarly, lists in Silver have some special behavior (see AppendCell, for instance). This isn't implemented here, and I'd somewhat wish for it to be an optimization implemented by different SilvIR implementations, rather than part of the semantics themselves. On the other hand, it's a fairly essential optimization for Silver performance, and lists would need to be built in to perform it, even if the optimization itself weren't part of the semantics. So, this is another open question.

abstract production var
top::Expr ::= name::String
{
  top.errors = if member(name, top.boundVars)
                 then []
                 else ["Unbound variable " ++ name];
}

The translation here gets a bit long, so we'll step through it piece by piece. First, we define a the arith:var production, and the production body for its equation.

  T'ProdDecl (MkProdName "arith:var") (MkNTName "arith:Expr"),
  T'ProdBodyDecl (MkProdName "arith:var") 0 (MkLocalName "top")

The errors attribute is behind a thunk, so we create it behind a let-binding.

    (E'Let (MkLocalName "errors")
      (E'Thunk

We use the global core:member, so we need to force it.

        (E'Let (MkLocalName "member")
          (E'Force (A'Global (MkGlobalName "core:member")))

Then we can call it; for simplicity, I defined core:member as taking an unthunked item to search for, while lists are always behind thunks. We call it with our 0th child (name) and our boundVars attribute.

        (E'Let (MkLocalName "nameIsBound")
          (E'Call (A'Local (MkLocalName "member"))
            [ A'GetChild 0 (A'Local (MkLocalName "top"))
            , A'GetAttr (MkAttrName "arith:boundVars") (A'Local (MkLocalName "top"))
            ])

We then pattern-match on the result; this is translated from the if in the Silver snippet. Since I'm trying out booleans-as-datatypes, they get a tag equal to the strings true or false to represent those values.

        (E'Case (A'Local (MkLocalName "nameIsBound"))

When the result of calling core:member is true, our thunk evaluates to nil.

          [ ( P'TreeOrTerm (MkProdName "true") []
            , E'A (A'Cons (MkProdName "nil") [])
            )

When the result of calling core:member is false, we create a new single-element list for the error. This requires creating a thunk around nil for the tail, and calling the strcat foreign function to concatenate strings to make the error in the head. (The name strcat is subject to change; string_append would probably be more consistent with Silver, I suppose.)

          , ( P'TreeOrTerm (MkProdName "false") []
            , E'Let (MkLocalName "tmp_nil")
                (E'Thunk (E'A (A'Cons (MkProdName "nil") [])))
              (E'A (A'Cons (MkProdName "cons")
                [ ( ChildIsntDecorable
                  , A'PureForeign (MkForeignFunctionName "strcat")
                      [ A'Lit (L'String "Unbound variable ")
                      , A'GetChild 0 (A'Local (MkLocalName "top"))
                      ]
                  )
                , ( ChildIsDecorable
                  , A'Local (MkLocalName "tmp_nil")
                  )
                ]))
            )
          ]))))

Finally, we set the thunk we created onto top as the arith:errors attribute.

    (E'SetAttr (MkAttrName "arith:errors") (A'Local (MkLocalName "top"))
      (A'Local (MkLocalName "errors"))
      (E'A (A'Local (MkLocalName "top"))))),

The add production will be our first chance to see inherited attributes being passed down.

abstract production add
top::Expr ::= l::Expr  r::Expr
{
  l.boundVars = top.boundVars;
  r.boundVars = top.boundVars;
  top.errors = l.errors ++ r.errors;
}

We define the production, and one production body per equation. This is not unlike translating each equation to an aspect production, and makes the frontend's job somewhat easier. The SilvIR generic optimizer will compose these functions with each other if they have the same priority, so there's no worry about the performance impact this would have. (And for our purposes here, it makes it easier to read.)

  T'ProdDecl (MkProdName "arith:add") (MkNTName "arith:Expr"),

The inherited equations are simple, and I do omit some amount of plausible optimization here in favor of simplicity.

For each child, we create a thunk that simply wraps top's arith:boundVars attribute, and set that attribute on the child.

  T'ProdBodyDecl (MkProdName "arith:add") 0 (MkLocalName "top")
    (E'Let (MkLocalName "tmp")
      (E'Thunk (E'Force (A'GetAttr (MkAttrName "arith:boundVars") (A'Local (MkLocalName "top")))))
    (E'SetAttr (MkAttrName "arith:boundVars") (A'GetChild 0 (A'Local (MkLocalName "top")))
      (A'Local (MkLocalName "tmp"))
      (E'A (A'Local (MkLocalName "top"))))),
  T'ProdBodyDecl (MkProdName "arith:add") 0 (MkLocalName "top")
    (E'Let (MkLocalName "tmp")
      (E'Thunk (E'Force (A'GetAttr (MkAttrName "arith:boundVars") (A'Local (MkLocalName "top")))))
    (E'SetAttr (MkAttrName "arith:boundVars") (A'GetChild 1 (A'Local (MkLocalName "top")))
      (A'Local (MkLocalName "tmp"))
      (E'A (A'Local (MkLocalName "top"))))),

To perform the equation for the arith:errors attribute, we call core:append with the errors from our children.

  T'ProdBodyDecl (MkProdName "arith:add") 0 (MkLocalName "top")
    (E'Let (MkLocalName "tmp")
      (E'Thunk
        (E'Let (MkLocalName "append")
          (E'Force (A'Global (MkGlobalName "core:append")))
        (E'Let (MkLocalName "thunk")
          (E'Call (A'Local (MkLocalName "append"))
            [ A'GetAttr (MkAttrName "arith:errors") (A'GetChild 0 (A'Local (MkLocalName "top")))
            , A'GetAttr (MkAttrName "arith:errors") (A'GetChild 1 (A'Local (MkLocalName "top")))
            ])
        (E'Force (A'Local (MkLocalName "thunk"))))))
    (E'SetAttr (MkAttrName "arith:errors") (A'Local (MkLocalName "top"))
      (A'Local (MkLocalName "tmp"))
      (E'A (A'Local (MkLocalName "top"))))),

As our last production, we have let bindings.

abstract production let
top::Expr ::= name::String  bound::Expr  body::Expr
{
  bound.boundVars = top.boundVars;
  body.boundVars = name :: top.boundVars;
  top.errors = bound.errors ++ body.errors;
}

As usual, we declare the production. We also pass our arith:boundVars down to the bound child.

  T'ProdDecl (MkProdName "arith:let") (MkNTName "arith:Expr"),
  T'ProdBodyDecl (MkProdName "arith:let") 0 (MkLocalName "top")
    (E'Let (MkLocalName "tmp")
      (E'Thunk (E'Force (A'GetAttr (MkAttrName "arith:boundVars") (A'Local (MkLocalName "top")))))
    (E'SetAttr (MkAttrName "arith:boundVars") (A'GetChild 1 (A'Local (MkLocalName "top")))
      (A'Local (MkLocalName "tmp"))
      (E'A (A'Local (MkLocalName "top"))))),

Passing arith:boundVars to body requires making a cons, but is otherwise the same.

  T'ProdBodyDecl (MkProdName "arith:let") 0 (MkLocalName "top")
    (E'Let (MkLocalName "tmp")
      (E'Thunk
        (E'Let (MkLocalName "topBoundVars")
          (E'Thunk (E'Force (A'GetAttr (MkAttrName "arith:boundVars") (A'Local (MkLocalName "top")))))
        (E'A (A'Cons (MkProdName "cons")
          [ ( ChildIsntDecorable
            , A'GetChild 0 (A'Local (MkLocalName "top"))
            )
          , ( ChildIsDecorable
            , A'Local (MkLocalName "topBoundVars")
            )
          ]))))
    (E'SetAttr (MkAttrName "arith:boundVars") (A'GetChild 2 (A'Local (MkLocalName "top")))
      (A'Local (MkLocalName "tmp"))
      (E'A (A'Local (MkLocalName "top"))))),

The arith:errors equation is the same as arith:add's arith:errors equation.

  T'ProdBodyDecl (MkProdName "arith:let") 0 (MkLocalName "top")
    (E'Let (MkLocalName "tmp")
      (E'Thunk
        (E'Let (MkLocalName "append")
          (E'Force (A'Global (MkGlobalName "core:append")))
        (E'Let (MkLocalName "thunk")
          (E'Call (A'Local (MkLocalName "append"))
            [ A'GetAttr (MkAttrName "arith:errors") (A'GetChild 1 (A'Local (MkLocalName "top")))
            , A'GetAttr (MkAttrName "arith:errors") (A'GetChild 2 (A'Local (MkLocalName "top")))
            ])
        (E'Force (A'Local (MkLocalName "thunk"))))))
    (E'SetAttr (MkAttrName "arith:errors") (A'Local (MkLocalName "top"))
      (A'Local (MkLocalName "tmp"))
      (E'A (A'Local (MkLocalName "top"))))),

Next, we need to implement the core:append and core:member functions. These definitions aren't too illuminating to SilvIR's attribute-grammar-relevant parts, so they're fine to skip if you don't care.

function append
[a] ::= l::[a]  r::[a]
{
  return case l of
  | [] => r
  | h::t => h::append(t, r)
  end;
}

function member
Boolean ::= x::String  xs::[String]
{
  return case xs of
  | [] => false
  | h::t =>
      if x == h
        then true
        else member(x, t)
  end;
}

Nothing here should be too surprising; the only real complication is the need to maintain thunks appropriately.

  T'GlobalDecl (MkGlobalName "core:append")
    (E'A (A'Lam [MkLocalName "l", MkLocalName "r"]
      (E'Let (MkLocalName "tmp") (E'Force (A'Local (MkLocalName "l")))
      (E'Case (A'Local (MkLocalName "tmp"))
        [ ( P'TreeOrTerm (MkProdName "nil") []
          , E'A (A'Local (MkLocalName "r"))
          )
        , ( P'TreeOrTerm (MkProdName "cons") [P'Var (MkLocalName "h"), P'Var (MkLocalName "t")]
          , E'Thunk
              (E'Let (MkLocalName "append")
                (E'Force (A'Global (MkGlobalName "core:append")))
              (E'Let (MkLocalName "rest")
                (E'Call (A'Local (MkLocalName "append"))
                  [ A'Local (MkLocalName "t")
                  , A'Local (MkLocalName "r")
                  ])
              (E'A (A'Cons (MkProdName "cons")
                [ ( ChildIsntDecorable
                  , A'Local (MkLocalName "h")
                  )
                , ( ChildIsDecorable
                  , A'Local (MkLocalName "rest")
                  )
                ]))))
          )
        ])))),

  T'GlobalDecl (MkGlobalName "core:member")
    (E'A (A'Lam [MkLocalName "x", MkLocalName "xs"]
      (E'Let (MkLocalName "tmp") (E'Force (A'Local (MkLocalName "xs")))
      (E'Case (A'Local (MkLocalName "tmp"))
        [ ( P'TreeOrTerm (MkProdName "nil") []
          , E'A (A'Cons (MkProdName "false") [])
          )
        , ( P'TreeOrTerm (MkProdName "cons") [P'Var (MkLocalName "h"), P'Var (MkLocalName "t")]
          , E'Let (MkLocalName "eq")
              (E'A (A'PureForeign (MkForeignFunctionName "streq")
                [ A'Local (MkLocalName "x")
                , A'Local (MkLocalName "h")
                ]))
            (E'Case (A'Local (MkLocalName "eq"))
              [ ( P'TreeOrTerm (MkProdName "true") []
                , E'A (A'Cons (MkProdName "true") [])
                )
              , ( P'TreeOrTerm (MkProdName "false") []
                , E'Let (MkLocalName "member")
                    (E'Force (A'Global (MkGlobalName "core:member")))
                  (E'Call (A'Local (MkLocalName "member"))
                    [ A'Local (MkLocalName "x")
                    , A'Local (MkLocalName "t")
                    ])
                )
              ])
          )
        ])))),

The main function is ill-typed for Silver, but we haven't imposed a type for main, and this type is more convenient for trying out SilvIR here.

function main
[String] ::=
{
  return (decorate add(let("y", lit(2), add(var("x"), var("y"))), var("z")) with {
    boundVars = [];
  }).errors;
}

We break up the term being constructed into a bunch of let-bindings, for readability. The names here are a bit cryptic, but they intend to abbreviate the term.

  T'GlobalDecl (MkGlobalName "arith:main")
    (E'A (A'Lam []
      (E'Let (MkLocalName "tmp_l2")
        (E'A (A'Cons (MkProdName "arith:lit")
          [(ChildIsntDecorable, A'Lit (L'Int 2))]))
      (E'Let (MkLocalName "tmp_vx")
        (E'A (A'Cons (MkProdName "arith:var")
          [(ChildIsntDecorable, A'Lit (L'String "x"))]))
      (E'Let (MkLocalName "tmp_vy")
        (E'A (A'Cons (MkProdName "arith:var")
          [(ChildIsntDecorable, A'Lit (L'String "y"))]))
      (E'Let (MkLocalName "tmp_avxvy")
        (E'A (A'Cons (MkProdName "arith:add")
          [ (ChildIsDecorable, A'Local (MkLocalName "tmp_vx"))
          , (ChildIsDecorable, A'Local (MkLocalName "tmp_vy"))
          ]))
      (E'Let (MkLocalName "tmp_lyl2avxvy")
        (E'A (A'Cons (MkProdName "arith:let")
          [ (ChildIsntDecorable, A'Lit (L'String "y"))
          , (ChildIsDecorable, A'Local (MkLocalName "tmp_l2"))
          , (ChildIsDecorable, A'Local (MkLocalName "tmp_avxvy"))
          ]))
      (E'Let (MkLocalName "tmp_vz")
        (E'A (A'Cons (MkProdName "arith:var")
          [(ChildIsntDecorable, A'Lit (L'String "z"))]))
      (E'Let (MkLocalName "term")
        (E'A (A'Cons (MkProdName "arith:add")
          [ (ChildIsDecorable, A'Local (MkLocalName "tmp_lyl2avxvy"))
          , (ChildIsDecorable, A'Local (MkLocalName "tmp_vz"))
          ]))

Next, we create a thunk for the boundVars attribute we're setting in a decorate expression.

      (E'Let (MkLocalName "tmp_nil")
        (E'Thunk (E'A (A'Cons (MkProdName "nil") [])))

Finally, we construct a tree, and return the result of plucking arith:errors off of it.

      (E'Let (MkLocalName "tree")
        (E'Decorate (A'Local (MkLocalName "term"))
          (A'Lam [MkLocalName "top"]
            (E'SetAttr (MkAttrName "arith:boundVars") (A'Local (MkLocalName "top"))
              (A'Local (MkLocalName "tmp_nil"))
              (E'A (A'Local (MkLocalName "top"))))))
      (E'A (A'GetAttr (MkAttrName "arith:errors") (A'Local (MkLocalName "tree")))))))))))))))]

Lastly, we define a helper for testing this program, which simply converts a value to a Haskell list of Haskell strings, so it can be easily compared and printed. Since SilvIR lists are thunked, it needs to be able to force thunks, and hence runs in M.

asStringList :: Value -> M [String]
asStringList thunkValue = do
  let thunkPtr = asThunk thunkValue
  thunk <- readThunk thunkPtr
  value <- case thunk of
    Thunk'Forced value -> pure value
    Thunk'Unforced env expr -> do
      putLocalEnv env
      evalExpr expr
  case asTerm value of
    (MkProdName "nil", []) -> pure []
    (MkProdName "cons", [(_, V'Lit (L'String h)), (_, t)]) ->
      (Text.unpack h :) <$> asStringList t
    _ -> error "not a string list"

We can then try evaluating this, expecting to get an error for x being undefined, and one for z being undefined.

main :: IO ()
main = putStrLn . runM $ do
  let expr = E'Let (MkLocalName "main") (E'Force (A'Global (MkGlobalName "arith:main")))
        (E'Call (A'Local (MkLocalName "main")) [])
  forM_ arithGrammar defineTopLevelItem
  value <- evalExpr expr
  valueStrs <- asStringList value
  unless (valueStrs == ["Unbound variable x", "Unbound variable z"]) $ do
    error "Unexpected result"
  pure (show valueStrs)

Remarks

This was a really long post, but hopefully resulted in a strong understanding of the semantics of each SilvIR construct, as well how to translate basic Silver down to SilvIR. Furthermore, as noted up top, this also serves as an artifact to allow experimentation with SilvIR, both by tweaking the semantics and by using them as an interpreter.

In the next post, we'll look at how to translate some of the more "troublesome" features of Silver into SilvIR.