Languages

The Prototype Pattern and Extensible Compilers

Posted in Languages on November 1st, 2008 by Lorenz Pretterhofer – Comments Off

During some of the early design work on the Mention programming language, I recognized that a fixed parser would be unable to deal with a great number of interesting language problems. Things like embedded SQL and other DSL language, or perhaps even a unit notation.

All of these things could be added into a syntax, I believe, with a carefully constructed extensible parser. Each extension would become a libraries (modules) adding or replacing function points in the existing parser later reusing some of them as part of the extended syntax. (The actual parser would be based on Parsec and other similar combinator parsers. OMeta is also related to this parser architecture.)

The problem with adding new syntaxes to a language isn’t just the parser implementation however. Most of the really tricky problems actually come from the rest of the tool chain. Things like the compiler, debugger, profilers, editor, and a bunch of other potential tools. It had become clear to me that I wasn’t just working on an extensible parser, but rather an extensible language, and that meant an extensible compiler architecture.

Basically I thinking about a completely extensible intermediate language, which would power a set of semi-extensible libraries for implementing the bulk of the tool-chain. These libraries will take the leg-work out of processing the source code, and will allow the tools to manipulate or interact with the code in its intermediate form (this includes editors and analytical tools). The only problem I saw was that I had absolutely no idea how to construct such an intermediate language, little own how to write tools and libraries to interact with it.

Anyway, recently I run into the latest (at the time) Steve Yegge post, with a somewhat inspiring discussion about the Prototype Pattern (The Universal Design Pattern). After musing about pattern (again) for a couple of days, it finally hit me, I had the architecture all wrong. This isn’t just some extensible compiler or parser…this was the Self programming language…redesigned, implemented and tailored for the task of compiler writing. The AST is simply a specialized variation of the Prototype pattern, or more succinctly, a compiler is just a constrained dialect of Self, designed for compiler tasks.

So this my proposal is this, use the prototype pattern for Mentions AST and build a framework of libraries and tools designed to aid in the maintenance and development of an AST and the compilers, parsers, debugger, et cetera, which depend on it.

There are already a few pitfalls that I can imagine coming out of this system. For one, a good namespace or typing system will probably be required to prevent name conflicts. That is, each extension or collection of extensions would add only private collections of attributes, while a smaller group of shared public attribute interfaces would be used to facilitate cooperation between extensions. And then there’s still the issue of expected behavior, which may be complicated by the potential for different combinations of compiler extensions to inadvertently interact in unexpected ways.

In any case, I believe that this will be one of the key features of the core Mention programming language and it may even make some of the more complicated features like type systems easier to implement and experiment with later on. At the very least, you can expect to see some follow up material on this when I start to get further into the implementation of the Mention.

– Lorenz

Read Transparency

Posted in Languages on October 22nd, 2008 by Lorenz Pretterhofer – Comments Off

Over the past few months I’ve been putting together a collection of requirements, of sorts, for a new class of dynamic functional programming languages, a language that could provide powerful dynamic features on par with Smalltalk or Self (and JavaScript etc.), without losing some important features like laziness or strong static typing.

One of the requirements I keep stumbling across is the need for referentially transparent functions that are capable of interacting directly with mutable data structures. Yeah, the kind that, by definition, are not referentially transparent.

At first I considered using compiler attributes to mark a piece of code that interacts with side effects vs. code that doesn’t. Something like the IO/ST monads from Haskell but general enough any function can interact with mutable variables, not just ones returning a monadic value. Unfortunately any code that interacts with mutable variables may behave radically differently when not constrained by the full monadic infrastructure or some equivalent like strictness (Scheme and ML).

I believe the only useful solution is to define a new property of the code–read transparency–which allows us to recognize code that is referentially transparent on immutable data while also referentially transparent towards unchanging mutable variables.

A function is read transparent if it does not cause new side effects (either by IO, mutable state or otherwise), and the function produces equivalent result between any two calls provided the state of all arguments is equivalent (or the arguments are both immutable and the same values).

The above definition depends on the equivalence property of values, which I define along the lines of “any two data structures are equivalent if they are functionally the same value, or they only contain mutable variables containing equivalent state.” Given a good equivalence for all primitives this allows us to easily check for all functions which could safely be called in our new semi-functional programs.

The original motivating example for this concept was to allow functions like show in Haskell to work with any arbitrary data structure, mutable or not. This would greatly simplify programs like interpreters, which could use many of the same functions as pure code, without having to use somewhat cumbersome combinators like the lifting functions and unboxing functions (readIORef, etc.).

An important thing to remember about this property is that, while many previously referentially transparent functions would now only be read transparent, when used only with pure functional data, they are in fact promoted automatically to fully referentially transparent functions. That is, they remain effectively referentially transparent if you don’t care about mutable state, or to put it another way–the functions become more flexible, not less so. Only when the implementation of a function relies on language features that prevent mutable data from being used would the more restrictive referential transparency be required (like lazily evaluated results, incorporating the read operation).

We don’t actually have to stop there either. Just because a piece of code isn’t read transparent, that doesn’t mean that it suffers from all variations of side-effects. There are still yet more shades of grey which we can use to limit the damage caused by bugs and side-effects in our software. A property like write transparency for code that reads and writes to mutable data structures but performs no IO could be equally useful, just as the ST monad in Haskell, which cannot perform IO is useful for working with mutable state.

Unfortunately while our little definition is a rather interesting take on the mutable state issue, there are some rather serious caveats we still have to consider. The biggest problem we face here is that, while a read transparent function can read from mutable variables without any reprimands, it cannot use the unread mutable variable in the result, unless it uses it, still boxed. That is, if we returned a result lazily, the result might be different depending on when the result was used, definitely not something we want, and the primary reason that mutable state in Haskell is limited to the ST monad et cetera.

More concretely, we can return new mutable variables (since the result would always be equivalent), use the results from reading a mutable variable so long as we read it strictly before the function returns, and finally we can return the mutable variable verbatim as part of the result (something Haskell can do also). If any language implemented mutable variables unboxed however, they would not be usable in last context however, and you would always have to read from them strictly, leading to somewhat less flexibility in the language.

One last thing I’d like to add. The above definition is strikingly similar to something that might be rigorously defined and explored, but for some reason I haven’t done so in this post. Well, for all intents and purposes, I wouldn’t know where to start, at least not yet anyway. If and when I apply this property in an actual language implementation however, you can be sure I’ll have a rigorous writeup of it and it will likely be part of the languages spec too.

– Lorenz

The Obvious Truth

Posted in Languages on March 7th, 2008 by Lorenz Pretterhofer – 1 Comment

One of the most important qualities of a good programming language is the first class function. This critical language feature makes possible a whole class of behavioural and data abstraction techniques. Without it we could not incorporate many modern language like continuations, generators, or callbacks.

There are of course many workarounds to any of these features. These workarounds range from articles on simulating continuations in Object-Oriented languages and callbacks can be replaced in most cases by the, somewhat more complicated, event or messaging passing mechanisms. Generators however, really are just un-useful without proper first class functions.

A generator is simply a function that sends a number, possibly infinite, of values to an independent language construct, which can then take advantage of these values, one at a time. The proverbial construct associated with generators is the for loop. The loop invokes the generator, a value function within the generator produces each value, and the value function itself calls a function parameter containing the code body of the loop itself. There are usually some extra mechanisms to handle premature termination of the loop (among other things).

I digress however, since what I really want to discuss here is the language elements left behind. Features such as environment management, the function dispatch mechanisms, exception handling, memory management, concrete syntax, and so on. What I really want to see is language designers moving towards a more unified system, providing programmers, or at least library implementers, with abstractions for these features, rather than fixed semantics that limit the expressivity of the language.

See despite so many languages taking great pride in providing first class functions, most actually just inherited these features due to the popularisation of functional programming techniques. The real hidden gem here, I feel, is reifying language semantics however. This can allow programmers to selectively enhance or extend their favourite language, without losing the built-in compiler or interpreter, possibly at the cost of optimisations complexity.

My favourite example here is the environment, an element of language design that has historically encountered limitless debate. Yet, for reasons that I cannot comprehend, even languages promoting expressivity over implementation consistently refuse to provide language level abstractions over these features.

I’m not saying that most programming requires such levels of expressivity, and quite the opposite, most programming tasks will never require it. But there are some tasks that would benefit greatly from the presence of such features. Rather than re-implementing these, it should be possible to just tap into the abstractions provided by the programming language.

One style of programming that would definitely benefit from these extensions is programming language research. It would be much simpler to test out new environment models, construct prototype virtual machines and interpreters, and even mess around with less common things like dispatch algorithms or meta object protocols, if instead of building entirely new environment abstractions, the existing ones could be utilized and re-engineered in a controlled way. This requires both abstractions limiting the direct manipulation of the host language implementation, otherwise you really just re-implementing the host language, and by providing a preferable more powerful interface than the one likely used in the host languages compilers or interpreter.

Using Scheme as an example, it would be trivial to add some intermediate abstractions for managing the environment. Scheme already implements something very close to first class environments, and in-fact many lisps do actually provide procedures for inspecting and controlling where evaluation takes place, but it’s really not quite there yet.

The reason I singled out Scheme in this case, even comparatively to other dialects of Lisp, is because the larger dialects typically include an impressive continuations facility. This combines with the hygienic macro feature, producing extremely impressive language extension capabilities. Unfortunately, you still need to generate raw code, when writing domain specific languages if you wish to allow fragments of host language code to be embedded into the DSL. In my case these fragments correspond to actions and predicates in a parsing language I’ve been implementing.

Perhaps I should attempt a Scheme dialect with such features, and just maybe I’ll even end up with something new?

– Lorenz