Read Transparency
Posted in Languages on October 22nd, 2008 by Lorenz Pretterhofer
Over the past few months I’ve been putting together a collection of requirements, of sorts, for a new class of dynamic functional programming languages: languages that could provide powerful dynamic features on par with Smalltalk or Self (and JavaScript etc.) without losing important features like laziness or strong static typing.
One of the requirements I keep stumbling across is the need for referentially transparent functions that are capable of interacting directly with mutable data structures. Yeah, the kind that, by definition, are not referentially transparent.
At first I considered using compiler attributes to distinguish code that interacts with side effects from code that doesn’t. Something like the IO/ST monads from Haskell, but general enough that any function can interact with mutable variables, not just ones returning a monadic value. Unfortunately any code that interacts with mutable variables may behave radically differently when not constrained by the full monadic infrastructure or some equivalent like strictness (Scheme and ML).
I believe the only useful solution is to define a new property of the code, read transparency, which allows us to recognize code that is referentially transparent on immutable data while also referentially transparent towards unchanging mutable variables.
The above definition depends on the equivalence property of values, which I define along the lines of “any two data structures are equivalent if they are functionally the same value, or they only contain mutable variables containing equivalent state.” Given a good equivalence for all primitives this allows us to easily check for all functions which could safely be called in our new semi-functional programs.
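To make the equivalence idea a little more concrete, here is a minimal sketch in present-day Haskell. The Value type and the equivalent function are hypothetical names invented for illustration, and IORef stands in for the mutable variables of the imagined language. Because this is ordinary Haskell, the check has to live in IO; it also assumes acyclic structures, since a cell that refers back to itself would make it loop.

```haskell
import Data.IORef (IORef, newIORef, readIORef)

-- Hypothetical value type: immutable literals and pairs, plus mutable cells.
data Value
  = Lit Int
  | Pair Value Value
  | Cell (IORef Value)

-- Equivalence that looks through mutable cells at their current contents.
-- This differs from the Eq instance of IORef, which compares cell identity
-- rather than the state stored inside.
equivalent :: Value -> Value -> IO Bool
equivalent (Lit a) (Lit b) = pure (a == b)
equivalent (Pair a1 a2) (Pair b1 b2) = do
  firstOk <- equivalent a1 b1
  if firstOk then equivalent a2 b2 else pure False
equivalent (Cell r1) (Cell r2) = do
  v1 <- readIORef r1
  v2 <- readIORef r2
  equivalent v1 v2
equivalent _ _ = pure False

-- Two distinct cells holding the same state are equivalent;
-- cells holding different state are not.
demo :: IO (Bool, Bool)
demo = do
  a <- newIORef (Lit 1)
  b <- newIORef (Lit 1)
  c <- newIORef (Lit 2)
  sameState <- equivalent (Pair (Lit 0) (Cell a)) (Pair (Lit 0) (Cell b))
  diffState <- equivalent (Cell a) (Cell c)
  pure (sameState, diffState)
```

Running demo yields (True, False): the first pair of structures differ only in which cells they hold, not in the state those cells contain.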
The original motivating example for this concept was to allow functions like show in Haskell to work with any arbitrary data structure, mutable or not. This would greatly simplify programs like interpreters, which could use many of the same functions as pure code, without having to use somewhat cumbersome combinators like the lifting functions and unboxing functions (readIORef, etc.).
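For comparison, this is roughly what the situation looks like in Haskell today. The Tree and MTree types and the showMTree function are made up for this illustration. The pure tree gets show for free, while the tree with mutable children needs a hand-written traversal in IO, because base provides no Show instance for IORef.

```haskell
import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- Pure tree: deriving Show is all that is needed.
data Tree = Leaf | Node Tree Int Tree
  deriving Show

-- The same shape with mutable children (hypothetical type).
data MTree = MLeaf | MNode (IORef MTree) Int (IORef MTree)

-- Rendering has to live in IO and read every cell explicitly.
showMTree :: MTree -> IO String
showMTree MLeaf = pure "Leaf"
showMTree (MNode l x r) = do
  ls <- readIORef l >>= showMTree
  rs <- readIORef r >>= showMTree
  pure ("Node (" ++ ls ++ ") " ++ show x ++ " (" ++ rs ++ ")")

-- The pure case is a one-liner.
pureRendering :: String
pureRendering = show (Node Leaf 1 Leaf)

-- Render a mutable tree, mutate one child, and render it again.
demo :: IO (String, String)
demo = do
  l <- newIORef MLeaf
  r <- newIORef MLeaf
  let t = MNode l 1 r
  before <- showMTree t
  l2 <- newIORef MLeaf
  r2 <- newIORef MLeaf
  writeIORef l (MNode l2 0 r2)
  after <- showMTree t
  pure (before, after)
```

A read transparent show, as imagined in this post, would let the second case be written like the first, provided nothing mutates the tree while it is being rendered.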
An important thing to remember about this property is that, while many previously referentially transparent functions would now only be read transparent, they are automatically promoted to fully referentially transparent functions when used only with pure functional data. That is, they remain effectively referentially transparent if you don’t care about mutable state, or to put it another way, the functions become more flexible, not less so. Only when the implementation of a function relies on language features that prevent mutable data from being used (such as lazily evaluated results that incorporate a read operation) would the more restrictive referential transparency be required.
We don’t actually have to stop there either. Just because a piece of code isn’t read transparent doesn’t mean that it suffers from every variety of side effect. There are yet more shades of grey which we can use to limit the damage caused by bugs and side effects in our software. A property like write transparency, for code that reads and writes mutable data structures but performs no IO, could be equally useful, just as the ST monad in Haskell, which cannot perform IO, is useful for working with mutable state.
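As a reminder of what the ST monad already gives us, here is a small sketch using the standard Control.Monad.ST and Data.STRef modules. The function name sumST is my own. It is only the closest existing analogue to the write transparency idea, not an implementation of it.

```haskell
import Control.Monad.ST (runST)
import Data.STRef (newSTRef, readSTRef, modifySTRef')

-- Sums a list with a local mutable accumulator. runST keeps the mutation
-- from escaping and rules out IO, so from the outside sumST is an ordinary
-- pure function.
sumST :: [Int] -> Int
sumST xs = runST $ do
  acc <- newSTRef 0
  mapM_ (\x -> modifySTRef' acc (+ x)) xs
  readSTRef acc
```

For example, sumST [1..10] evaluates to 55, and the caller never sees the accumulator.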
Unfortunately, while our little definition is a rather interesting take on the mutable state issue, there are some rather serious caveats we still have to consider. The biggest problem is that, while a read transparent function can read from mutable variables without any reprimand, it cannot use an unread mutable variable in the result unless the variable stays boxed. That is, if we returned a result lazily, the result might differ depending on when it was used. That is definitely not something we want, and it is the primary reason that mutable state in Haskell is limited to the ST monad et cetera.
More concretely, we can return new mutable variables (since the result would always be equivalent), use the results of reading a mutable variable so long as we read it strictly before the function returns, and finally return the mutable variable verbatim as part of the result (something Haskell can also do). If a language implemented mutable variables unboxed, however, they would not be usable in that last context, and you would always have to read from them strictly, leading to somewhat less flexibility in the language.
One last thing I’d like to add. The above definition is strikingly similar to something that might be rigorously defined and explored, but for some reason I haven’t done so in this post. Well, for all intents and purposes, I wouldn’t know where to start, at least not yet anyway. If and when I apply this property in an actual language implementation, however, you can be sure I’ll have a rigorous writeup of it, and it will likely be part of the language’s spec too.
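The hazard can be reproduced in today’s Haskell by deliberately stepping outside the usual guarantees. The sketch below uses unsafeInterleaveIO from System.IO.Unsafe to simulate a lazily deferred read. The names strictDemo and lazyDemo are mine, and this only approximates what a lazy read would look like in the imagined language.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- The read happens before the write, so the old value is captured.
strictDemo :: IO Int
strictDemo = do
  ref <- newIORef (1 :: Int)
  v <- readIORef ref
  writeIORef ref 2
  pure v

-- The read is deferred until v is demanded, which is after the write.
lazyDemo :: IO Int
lazyDemo = do
  ref <- newIORef (1 :: Int)
  v <- unsafeInterleaveIO (readIORef ref)  -- not performed yet
  writeIORef ref 2
  pure v                                   -- forced later by the caller
```

Here strictDemo produces 1 while lazyDemo produces 2, even though the two differ only in when the read is performed. That timing dependence is exactly what a read transparent function must avoid.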
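The three permitted cases can be sketched with IORef, again in ordinary Haskell, so the first two still have to live in IO, whereas the imagined language would let them be plain functions. All of the function names here are hypothetical.

```haskell
import Data.IORef (IORef, newIORef, readIORef)

-- Case 1: return a freshly created mutable variable.
freshCounter :: IO (IORef Int)
freshCounter = newIORef 0

-- Case 2: read strictly before returning, then use the value.
snapshotPlusOne :: IORef Int -> IO Int
snapshotPlusOne ref = do
  v <- readIORef ref
  let result = v + 1
  result `seq` pure result  -- force the result before handing it back

-- Case 3: return the variable itself, still boxed, as part of the result.
-- No read takes place, so this one is already a pure function.
tagged :: String -> IORef Int -> (String, IORef Int)
tagged label ref = (label, ref)
```

The third case only works because the IORef is a boxed reference that can be passed around without being read, which is the flexibility an unboxed representation would give up.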
– Lorenz