|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
About - Documentation - Download - Links | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Guide to compile-time meta-programmingCompile-Time Meta-Programming (CTMP) can be thought of as being a more powerful cousin of macros; formally, it is said to allow the user of a programming language a mechanism to interact with the compiler. Most commonly CTMP is used to allow the construction of arbitrary program fragments by user code. In essence Converge provides a mechanism to allow its concrete syntax to describe abstract syntax trees - conventionally called ITree's in Converge - which can then be then spliced into a source file.First examplesAs a first example, consider this program which, at compile-time, computes the 30th entry in the Fibonacci sequence (computationally, a fairly expensive operation) and assigns it to the func fib(n): if n == 0: return 0 elif n == 1: return 1 else: return fib(n - 1) + fib(n - 2) fib30 := $<CEI::lift(fib(30))>At run-time the variable fib30 will have the value of 514229 directly assigned to it, negating the need to repeatedly calculate this particular entry in the Fibonacci sequence. The key feature of CTMP are splice annotations $<...> . A splice evaluates the expression within at compile-time (and before VM instruction generation), replacing the splice annotation itself with the AST resulting from its evaluation. This is achieved by creating a temporary module containing the splice expression in a function, compiling the temporary module into bytecode, injecting it into the running VM, and then evaluating the function therein. Since a splice must return an ITree, we need to convert a normal integer object into its ITree integer equivalent: in this example we liftthe integer with the CEI::lift function.
The following program is a slightly more involved example of CTMP, trivially adopted from its Template Haskell cousin. func expand_power(n, x): if n == 0: return [| 1 |] else: return [| $c{x} * $c{expand_power(n - 1, x)} |] func mk_power(n): return [| func (&x): return $c{expand_power(n, [| &x |])} |] power3 := $<mk_power(3)> When the above example has been compiled into VM instructions, power3 := func (x): return x * x * x * 1 Quasi-quoted expressions Differences from other approachesAs the above examples show, by using the splicing and quasi-quoting mechanisms, users can generate code at compile-time. Many readers will be familiar with other macro and macro-esque approaches, and a brief comparison can be enlightening.
Major featuresThe three major features of compile-time meta-programming are splicing, quasi-quoting, and insertion.SplicingThe key part of thepowersprogram is the splice annotation in the line power3 := $<mk_power(3)> . The splice tells the compiler to evaluate the expression between the chevrons at compile-time, and to include the result of that evaluation in the module for ultimate bytecode generation. In order to perform this evaluation, the compiler creates a temporary or dummymodule which contains all the necessary definitions up to, but excluding, the definition the splice annotation is a part of; to this temporary module a new splice function (conventionally called $$splice$$ ) is added which contains a single expression return splice expr . This temporary module is compiled to bytecode and injected into the running VM, whereupon the splice function is called. Thus the splice function `sees' all the necessary definitions prior to it in the module, and can call them freely -- there are no other limits on the splice expression. The splice function must return a valid ITree which the compiler uses in place of the splice annotation.
Splice annotations within a file are executed strictly in order from top to bottom. There are three forms of splice:
Default splices
$<<...>> , $c<<...>> , and $p<<...>> .
Quasi-quotingQuasi-quotes allow ITree's to be built using Converge's normal concrete syntax. Essentially a quasi-quoted expression evaluates to the ITree which represents the expression inside it. For example, whilst the raw Converge expression4 + 2 prints 6 when evaluated, [| 4 + 2 |] evaluates to an ITree which prints out as 4 + 2 . Thus the quasi-quote mechanism constructs an ITree directly from the users' input - the exact nature of the ITree is of immaterial to the casual ITree user, who need not know that the resulting ITree is structured along the lines of add(int(4), int(2)).
The quasi-quotes mechanism can be used to express ITree's of single expressions - in which case an individual ITree is returned - or a multi-line sequence of expressions - in which case a list of ITree's is returned. Note that Converge's splicing and quasi-quote mechanisms cancel each other out: InsertionInsertions within quasi-quotes work very differently to splices (which occur outside quasi-quotes): the insertion expression itself does not force compile-time evaluation. Instead the insertion expression is essentially copied as-is into the code that the quasi-quotes transforms to. For example, the quasi-quoted expression[| ${x} + 2 |] leads to an ITree along the lines of add(x, int(2)) -- the variable x in this case would need to contain a valid ITree. As this example shows, since splice annotations within quasi-quotes do not cause a change of meta-level, variable references do not cause staging concerns.
There are three two forms of insertion. Default insertions This feature completes the cancelling out relationship between splicing and quasi-quoting: Scoping rulesCompile-time meta-programming requires additions to Converge's scoping rules.Scoping and splicingEvaluating a splice expression leads to a new func g(...): return ... func f(x): $<g(x)>Note the resulting error message: Error: Line 21, column 3: Unknown variable 'x'might initially seem somewhat confusing, since the user can clearly see x . However it makes rather more sense when one realises that the message is the result of compiling the following temporary module:
func g(...): return ... func $$splice$$(...): return g(x)Clearly when compiling this, there is no x variable in scope.
Scoping and quasi-quotesThe quasi-quote mechanism can be used to surround any Converge expression to allow the easy construction of ITree's. Quasi-quoting an expression has two important properties: it introduces a new scope (in identical fashion that a function defines a new scope), and it fully respects lexical scoping. Thus quasi-quotes allow users to easily avoid unintended variable capture.Lexical scopingConsider the following contrived example of moduleA :
func x(): return 4 func y(): return [| x() * 2 |]and module B :
import A, Sys func x(): return 2 func main(): Sys::println($<A::y()>)The quasi-quotes mechanisms ensures that since the reference to x in the quasi-quoted expression in A::y refers lexically to A::x , that running module B prints out 8 . This example shows one of the reasons why Converge needs to be able to statically determine namespaces: since the reference of x in A.y is lexically resolved to the function A::x , the quasi-quotes mechanism can replace the simple reference with an original name that always evaluates to the slot x within the specific module A wherever it is spliced into, even if A is not in scope (or a different A is in scope) in the splice location.
HygieneVariable capture is when splicing in an ITree could cause variables with the same name to inadvertently overwrite (or func f(): return [| x := 4 |] func g(): x := 10 $c<f()> y := xWhat might one expect the value of y in function g to be after the value of x is assigned to it? A naive splicing of f() into g would mean that the x within [| x := 4 |] would be captured by the x already in g -- y would end with the value 4 . If this was the case, using the quasi-quote mechanism could potentially cause all sorts of unexpected interactions and problems. In order to solve this problem, not only is Converge able to statically determine namespaces, but variables can be alpha-renamed (essentially meaning that x can be changed consistently to y in a given scope) without affecting the programs semantics. This is a significant deviation from the Python heritage. The quasi-quotes mechanism determines all bound variables in a quasi-quoted expression, and preemptively alpha-renames each bound variable to a name which is invalid in the normal concrete syntax. In so doing, Converge guarantees that the user can not inadvertently cause variable clashes. All references to the variable within the quasi-quotes are updated similarly. Thus the x within [| x := 4 |] will not cause variable capture to occur, and the variable y in function g will be set to 10 .
There is one exception: top-level definitions (all of which are assignments to a variable, although syntactic sugar generally obscures this fact) can not be alpha-renamed since this could lead to run-time Dynamic scopingSometimes the quasi-quote mechanisms automatic alpha-renaming of variables is not what is needed. For example consider a functionswap(x, y) which should swap the values of two variables. In such a case, we want the result of the splice to capture the variables in the spliced environment. The following definition of swap expects to be passed two ITree's representing variables:
func swap(x, y): return [| temp := $c{x} $c{x} := $c{y} $c{y} := temp |]It is initially tempting to try and use this function as follows: a := 10 b := 20 $c<swap([| a |], [| b |])>However this causes a staging error since the a and b in the quasi-quotes do not refer to a bound variable either inside or outside the quasi-quotes when the splice expression is evaluated (the latter would be invalid anyway, due to Converge's lifting rules; see later}). swap needs to be passed ITree's representing variable names, not references to variables. ITree's representing variable names can be constructed by the idiom CEI::ivar(x) where x is a string representing a variable name. When such a variable name is spliced into a quasi-quotes it will not be renamed, thereby allowing dynamic scoping. A correct call to swap thus looks as follows:
$c<swap(CEI::ivar(a), CEI::ivar("b"))>In this case, the variable names constructed by the CEI interface are first spliced into the quasi-quotes in the swap function. The resulting ITree from the quasi-quotes is then spliced in place of the swap call, and the variable names dynamically capture the a and b variables.
Dynamic scoping also tends to be useful when a quasi-quoted function is created piecemeal with many separate quasi-quote expressions. In such a case, variable references can only be resolved successfully when all the resulting ITree's are spliced together since references to the function's parameters and so on will not be determined until that point. Since it is highly tedious to continually write $c<swap([| &a |], [| &b |])> Forward references and splicingWe saw earlier that when a splice annotation outside quasi-quotes is encountered, a temporary module is created which contains all the definitions up to, but excluding, the definition holding the splice annotation. This is a very useful feature since compile-time functions used only in one module can be kept in that module. However this introduces a real problem involving forward references (i.e. a reference to a definition defined later in the file). If a splice annotation is encountered and compiles a subset of the module, then some definitions involved in forward references may not be included: thus the temporary module will fail to compile, leading to the entire module not compiling. Worse still, the user is likely to be presented with a highly confusing error telling them that a particular reference is undefined when, as far as they are concerned, the definition is staring at them within their text editor.Consider the following contrived example: func f1(): return [| 7 |] func f2(): x := f4() func f3(): return $<f1()> func f4(): passIf f2 is included in the temporary module created when evaluating the splice annotation in f3 , then the forward reference to f4 will be unresolvable.
Converge's solution to this problem is to include only the minimal needed subset of definitions in the temporary module; thus most forward references do not raise a compile-time error. When a splice annotation is encountered, the Converge compiler does not immediately create a temporary module. First it calculates the splice expressions' free variables; any previously encountered definition which has a name in the set of free variables is added to a set of definitions to include. These definitions themselves then have their free variables calculated, and again any previously encountered definition which has a name in the set of free variables is added to the set of definitions to include. This last step is repeated until an iteration adds no new definitions to the set. At this point, Converge then goes back in order over all previously encountered definitions, and if the definition is in the list of definitions to include, it is added to the temporary module (this last stage ensures that definitions are not reordered in the temporary module). Note also that free variables which genuinely do not refer to any definitions (i.e. a mistake on the part of the programmer) will pass through this scheme unmolested and will raise an appropriate error when the temporary module is compiled. Using this method, the temporary module that is created and evaluated for the example looks as follows: func f1(): return [| 7 |] func $$splice$$(): return f1()There are thus no unresolvable forward references in this example. Notice that Converge's approach to the forward reference problem is not a completely general solution since some forward references (particularly those to definitions beyond a splice site) are inherently unresolvable. Converge's approach is intended to significantly reduce the problem to the point that any unresolvable references are the result of programmer error. Variable renamingWhen generating code, it is often the case that references to variables outside generated are made. In certain situations, intentional dynamic scoping can interfere with this. Converge thus allows variables to be declared as being renamed in a scope. When a variable is renamed, the previous name is removed from the scope, and the renamed name is implicitly nonlocal to the scope. For example in the following code, callingf will result in the outer x being incremented by 2; the x defined in f is local to f itself.
x := 2 func f(): rename x as y x := 3 y += 2The rename declaration is typically only used in quasi-quoted code. Note that the rename and nonlocal declarations must be declared before any expressions in a function body.
The
At various points when compile-time meta-programming, one needs to interact with the compiler. The |
|
||
Location | Example(s) | Description |
|
||
Block-level expression | x := ... $<...> f(x) |
A single IExpr, or list of IExpr's, can be spliced or inserted when an insert appears as a block-level expression (i.e. it is not a sub-expression). |
Embeddedexpression |
x := $<...> + 2 |
An IExpr can be spliced or inserted anywhere a normal expression would be expected. |
Top-level definitions | import X $<...> |
A single IClass_Def, IFunc_Def, IAssignment, or a list containing objects of these types can be spliced at the top-level of a file (since quasi-quotes can not appear at the top-level of a file, this particular type of splicing does not have an insertion equivalent). |
Class name | class $<...>(...): ... |
An IVar can be spliced or inserted for a class name. |
Class fields | class C: $<...> |
A single IAssignment or IFunc_Def, or a list containing objects of these types, can be spliced or inserted when an insert appears as a class field. |
Function name | func $<...>(...): ... |
An IVar can be spliced or inserted for a function name. |
Function parameter(s) | func f($<...>): ... |
A single IParam, or list of IParams, can be spliced or inserted into a functions parameters list. |
Rename(s) | rename $<...> |
A single IRename, or list of IRenames, can be spliced or inserted into a list of renames. |
Module lookup | Module::$<...> |
An IVar can be spliced or inserted for a definition to be looked-up in a module. |
Slot lookup | o.$<...> |
A string can be spliced or inserted for a slot name to be looked-up in an object. |
Assignment target(s) | $<...> := ... |
An IVar, IGet, ISlot_Lookup, or a list containing objects of these types, can be spliced or inserted for the targets of an assignment. |
|
name := "..." [| func $c{CEI::ivar(name)}(): .. |]Note that the resulting function's name will be dynamically scoped. Since we can splice lists of ITrees in at various locations, it then becomes trivial to generate multiple such functions. Let's take a simple example of a function which should generate a list of ITree functions which should print the name of the function when executed:
func pfuncs(names): funcs := [] for name := names.iter(): funcs.append([| func $c{CEI::ivar(name)}(): Sys::println(${CEI::istring(name)}) |]) return funcsWe can then use this function in a module as follows:
$c<pfuncs(["dog", "cat", "mouse"])> func main(): dog() cat()Running this module will print out
dogand
cat.
[<file name>, <char offset>]
; src infos are lists of these tuples. The special variable __SRC_INFO__
returns src infos that pinpoint the first character of the variable name.
When an ITree is built with quasi-quotes, it automatically picks up the src infos of the file and location of the quasi-quote. Extra src infos can be added using the following syntax:
[<expr>| ... |]Calling ITree creation functions in the CEI module automatically adds src infos relating the caller to the ITree. i.e.
CEI::ix(...)
is roughly equivalent to ITree::Ix.new(..., __SRC_INFOS__)
.
$<<...>>
where expr
should evaluate to a function (the DSL implementation function). Note that, as with normal splices, the default action of a DSL splice is to ensure that no variable capture can occur; both capturing DSL splices $c<<...>>
and pragma DSL splices $p<<...>>
are available.
The DSL implementation function is called at compile-time with a string representing the DSL block, and is expected to return an AST which will replace the DSL block in the same way as a normal splice. Compile-time meta-programming is thus the mechanism which facilitates embedding DSLs. Colloquially one uses the DSL implementation function to talk about the DSL block as being `an expr block'.
An example of a DSL block and DSL implementation function is as follows:
func dslim(dsl_block, src_infos): return ... $<<dslim>>: ...Typically the DSL implementation uses the
CEI::dsl_parse
convenience function to parse the DSL block.
CEI::embeddable_itree
function can be used to solve this problem. It takes in an ITree, and returns a copy with all free and bound variables to fresh names; usefully it also returns a list of IRenames which can be spliced in at the outermost-embedding level to ensure that the capturing. A typical use of this is as follows:
renames, embeddable_cit := CEI::embeddable_itree(cit) return [| func () { rename $c{renames} $c{embeddable_cit} }() |]