Archive for June, 2008

Scheme Parser Combinators

Posted in Scheme on June 10th, 2008 by Lorenz Pretterhofer – 2 Comments

I’ve had my MMeta parsing library on hold for a while now, but I think its finally time to reinvestigate the idea.

There are actually two reasons for doing this… The first is rather obvious… there is still a vacuum of good parser libraries for Scheme atm. But the other might be less so (uh… unless you read the title)… Scheme Parser Combinators.

After investigating parser combinators for a little while now, a short while ago I had the good fortune to stumble on a post by Gilad Bracha about parser combinators (this time its in Smalltalk).

While not everyone reading this will be able to easily follow the Smalltalk code involved in the post, its possible one of the best descriptions of what a combinator parser is, and how they tend to get written.

My goal here is to replace the forreign syntax of OMeta with a Scheme native syntax using plain old functions and namespace bindings. And of course there’s the odd macro to help with readability (combinators can occasionally get a little out of hand without proper currying, or an unusually light weight syntax for writing lambdas/closures).

What this means is that my parsing solution will now look a bit more like a set of constructors nested between each other being assigned to a variable. Its not quite as pretty as the Haskell variation which uses the parsers directly (thanks to lazy evaluation and currying), but it does work in a reasonably straight forward way.

(define basic-char
  (λ (state)
    (aif rc eof-object? (read-char state)
         fail
         rc)))

(define (char c)
  (λ (state)
    (let ([rc (read-char state)])
      (cond ((eof-object? rc) fail)
            ((char=? rc c) rc)
            (else fail)))))

In the above snipped we define the two fundamental character parsers… They essentially define the naive variation of a character parser and a selective character parser.

What I want you to do is note the second parser’s definition. This parser is not technically a parser at all, and is what we generally refer to as a combinator. In reality its not really a combinator since it doesn’t accept a function as an argument… but it is higher order since it does return one, and that’s they key to making these things work.

All you need to drive a parser combinator is to be capable of returning functions until you have a function with all but one argument… plain and simple currying. Of course because we’re using Scheme and not Haskell, currying for us is not implicit, so rather than doing all of it by hand we’d rather just have combinators which have not been parameterized yet, and actual parsers which only accept a single state argument (a Scheme input port currently).

This does make some of the parser we might implement a little more complicated, since some of our functions will return a parser, and some of our functions will be the parser… but overall it tends to work quite effectively once you understand the difference between the two. It might also help to note that currying is used to prepare functions early, but rather it tends to be on the fly. Occasionally you may even return a parser immediately before running the very same parser (within the same top level expression).

Since I’m looking predominantly at backtracking parsing still lets now investigate a useful practical example to introduce some more combinators and the basic style that you use when writing parser with this stuff…

(define a-or-b
  (many1 (mor (char #a)
              (char #b))))

Ok then… what do we have here. (char #a) and the equivalent for #b are simple parsers which only accept the plain and simple input of their respective character, or they fail.

This might be a good point to explain what the fail value actually is… its not a value for a start. Actually its a clever little MzScheme macro (not R5RS unfortunately), which allows you to either write just fail and it will expand to (make-fail-type ()), or you could write (fail "Did not find character #c!")… which is what we’ll be using when we cover error handling in a later post. Of course there’s also a function for testing it called fail?.

The mor here is of course the or branch (I may come up with a better name for this at some point), which in our case performs any stream rewinding necessary and checks each path until one of them succeeds or the entire construct fails. The parser returned by mor here will actually parse a single #a or a single #b or it will fail.

And finally the many1 accepts a parser which it will run over and over again collecting the results in to a list based accumulator. If it doesn’t see at least one result it will return a fail result, otherwise it will return the list of results. Interestingly this parser that many1 accepts must fail at least once before many1 itself will stop, which means that many1 must always rewind the stream at least a little before returning a successful result.

And of course we use the standard define to set a-or-b…

To finish up this post, our completed parser can be evaluated just like a hand-written parser in Scheme can…

(a-or-b (open-input-string "aabb"))

This will of course return (#a #a #b #b).

Of course this is really just a brief introduction on Parser Combinators and they’re basic operation… in the next post for MMeta I’ll be discussing some actual usage of the parser and of course, attached will be the actual source files for the parser ready to go.

– Lorenz

FoNC Wiki

Posted in Uncategorized on June 6th, 2008 by Lorenz Pretterhofer – Comments Off

If your’re interested in post-Smalltalk research, or more specifically the implementation of such technologies then you might be interested in keeping an eye out on the work being done by Ian Piumarta, et al.

Well, the good news is that a wiki has been setup for increased public communication about the project.

You can check it out over at…linky

–Lorenz

Cygwin…Mingw…’.a’…WTH!!?

Posted in Uncategorized on June 4th, 2008 by Lorenz Pretterhofer – Comments Off

Recently I decided to pick up an apparently fantastic book on Erlang, written by Joe Armstrong (Programming Erlang). Now personally, I’ve never been particularly impressed by the man himself…too much of that functional programmer arrogance if you ask me. For some reason however, the hype got the better of me, especially after stumbling on a conversation between Armstrong and a couple of other functional programming ‘nuts’…uh…enthusiasts.

Perhaps of interest, is that I’m also quite a fan of functional programming myself, but I’d rather think of myself as avoiding a particularly insidious case of accidental complexity rather than using a fundamentally better language. I should explain… While functional programming languages does reduce a lot of the complexity accrued due to side effects and shared memory; they don’t deal with some of the other almost equally dangerous forms of accidental complexity.

Note: The actual problem I have with these functional language guys is the assumption that it’s a natural way of thinking about problems. Its not…it’s just, plain and simple, another way of learning how to solve problems. And its generally not one of the easiest to learn (basic anyone?)

I’m talking about compatibility of course, which brings me to the point of this post.

Cygwin…Mingw…’.a’…WTH!!?

After learning Erlang, and believing (perhaps even jumping the gun slightly), that I would be up to a real challenge, I realized that one thing was missing. A decent flexible, portable, Erlang code only FFI interface.

My challenge…write a real-time, computer game using Erlang. The likely candidate would be a space combat sim, within the vain of a Freespace2 style game (graphics and gameplay wise…hopefully the story will actually give our hero an actual name this time:)

Unfortunately the existing FFI in Erlang is either to use a network port to a C program linked to the candidate library, or to use a “linked-in driver”, attached to the VM and talking the VM’s data-structures in order to facilitate communication. Damn.

Talking to VM’s in case you’ve never tried it (although that seems unlikely if your reading my blog ;), is always a major pita. You effectively skewered between two of the most awkward types of interfaces known to programming kind, and worse yet, your only weapon of choice is what C gives you…or worse…a code generator (SWIG be damned).

No… What I need is something like the Pythoneers, imo brilliant, library…CTypes. And for C interfacing, the ctypes library uses a C level library called libffi (sort of anyway).

While this libffi based library of mine will no doubt be the topic of a future post, it does bring up a rather gruesome topic that I’ve been avoiding for quite some type now…Cygwin and Mingw.

After many hours of searching the web I’ve finally found what appears to be the solution.

Cygwin, as we all know is just a *nix/POSIX subsystem running over the windows COFF binaries (afaik), which allows us to blend windows and bash tasks during our development process (like using Windows compatible GNU/Make makefiles).

Mingw is where I got confused. You can find it in the installer for Cygwin, which put me under the assumption that it was more or less the default for this stuff…it isn’t.

Mingw is simply a cross-compiling target for GCC, which targets COFF binaries and the win32 C library (I can’t remember which one…I believe its the one in C:\windows however).

This means that to compile a binary for windows…one that doesn’t depend on cygwin.dll, or infact any of the cygwin dependent libraries you simply need to tell GCC to use the Mingw target…uh…sure…that sounds easy.

Actually, it turns out that this process is so involved sometimes (cross-compiling) that the mingw guys (or is that the Cygwin guys?) decided to make it easy. All you have to do, as mere mortals, is pass –mno-cygwin to GCC (“CFLAGS=’-mno-cygwin’ ./configure;make), and that’s it!

Fantastic…now back to the real challenge…using libffi to create my “Erlang C Types” library…

– Lorenz