magicfoodhand

Rigging Things Together

rust, rigz

As my first real project since becoming self-employed, I figured I'd start with something easy: a custom programming language. How hard could it be? It's an interpreted language after all; all you need to do is lex, parse the tokens into an AST, then run the AST. Piece of cake.

Another Language?

Of course. It seems like a fun project, but I also have a few projects in mind that I want to use it for. The first I call polc (pronounced pol-sea), a policy as code tool; I've tried Sentinel and Rego but they're not quite what I'm looking for. My goal is something more declarative. I was mainly inspired by Ruby and Terraform when creating this language. When asking myself how I'd want to create a Terraform alternative, just as a thought exercise, I settled on everything being a function call. I'd been thinking about the idea for about a year and, after a few failed attempts, here we are.

Rigz 1.0

The name rigz was chosen because initially the parsing was done in Rust and the runtime was written in Zig. That split was a nightmare and was quickly scrapped. Rigz was meant to be a purely functional language (no expressions or assignments, only functions), and that's what v1 turned into. Using Rust and a Module trait, I figured this would lead to an easy-to-use, powerful language that could be customized for the user's needs: a DSL that could be used for anything.
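To make the Module trait idea concrete, here is a minimal sketch of what such a trait and a runtime that dispatches to it could look like. This is not the actual rigz code: the `Value` enum, the `Module` method names, the `Std` module, and the `Runtime` struct are all assumptions made up for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical value type; the real rigz value type is not shown in this post.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    None,
    Number(f64),
    String(String),
}

/// Hypothetical Module trait: a named collection of functions the runtime can call.
pub trait Module {
    fn name(&self) -> &str;
    /// Call `function` with `args`; returns an error message if it is unknown.
    fn call(&self, function: &str, args: Vec<Value>) -> Result<Value, String>;
}

/// Toy standard library module with a single `puts` function.
pub struct Std;

impl Module for Std {
    fn name(&self) -> &str {
        "std"
    }

    fn call(&self, function: &str, args: Vec<Value>) -> Result<Value, String> {
        match function {
            "puts" => {
                for arg in &args {
                    println!("{:?}", arg);
                }
                Ok(Value::None)
            }
            other => Err(format!("std has no function `{other}`")),
        }
    }
}

/// Minimal runtime: looks a module up by name and forwards the call to it.
#[derive(Default)]
pub struct Runtime {
    modules: HashMap<String, Box<dyn Module>>,
}

impl Runtime {
    pub fn register(&mut self, module: Box<dyn Module>) {
        self.modules.insert(module.name().to_string(), module);
    }

    pub fn call(&self, module: &str, function: &str, args: Vec<Value>) -> Result<Value, String> {
        self.modules
            .get(module)
            .ok_or_else(|| format!("unknown module `{module}`"))?
            .call(function, args)
    }
}
```

With something like this, a call such as std.puts 'Hello World' would boil down to registering `Std` on a `Runtime` and calling `runtime.call("std", "puts", ...)`; everything the language can do lives behind the trait.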

Problem 1. Cluelessness + ChatGPT

Before this I hadn't worked with Zig at all and had hardly written any Rust, barely understanding the borrow checker. I'd been able to pick up dynamic languages using only the standard library and had worked with C, so surely I could do the same with these two systems languages. I'd learn them by writing a parser and runtime; it'd be easy. Additionally, ChatGPT seemed to do a good job generating a base parser using pest, so I would only need to learn Zig anyway.

While it's true you can send enough messages to GPT-4 to get a base parser working for a simple grammar, once the grammar is a couple hundred lines long and the accompanying Rust is twice that, you start to run into hallucinations or issues with the way the grammar is structured for pest. I would've spent far less time if I had read through the documentation before I got started; with pest_derive, converting tokens to elements in an AST is trivial (especially with a little bit of recursion). Waiting for GPT and then reviewing its code was also slower than typing it myself. Copilot and IntelliJ AI were worse for this task: without the context to help them during massive refactors, they generated garbage.

Zig seems like a fantastic language and it has a lot of concepts that interest me; that being said, I'd prefer to wait a bit longer. The only aspect of the language I found myself disliking immediately was that unused variables are compile-time errors rather than warnings. An unused variable is a sign I'm prototyping and want to check that the rest of the code compiles or still runs. It's not difficult to _ = everything you aren't using, but it's one more step that blocks me more than it helps me, at least right now. GPT was useless here. That's fine, I don't want to use any of them anymore, but with the degradation of Google and a rapidly evolving language it was frustrating to find things that used to work or documentation that wasn't clear to me. I don't blame the docs; when you're rushing to build a ton of components you barely understand in a language you don't know, there's no one to blame except the one typing the keys. I also ran into issues with the language server in IntelliJ; I'm sure I'd have a better experience using neovim or even VS Code here instead.

Problem 2: No Plan

For this part it would help to look at a rigz snippet:

std.puts 'Hello World' # std is the module for standard lib

# these would all be defined in a custom module
allow {
  variables {
    account = :valid_account
  }
}

Since the plan was to use a Module trait, users (or most likely I) would have to create all of the modules. My first thought was to use dynamic libraries here: users would define a function and be able to call it from rigz. I had most of this framework set up with the Zig interop, but it was a nightmare. Modules would have to be built on the system they were going to run on, the language wouldn't work in the browser anymore (via WASM), and modules would have to be pulled from somewhere (git clone). I could go on about all the reasons this was a terrible idea, but I only realized it after running into an annoying issue. This was the point where I pivoted to Lua: I decided that the stdlib module would use mlua, so it would work anywhere again while giving users the power they'd want. This wasn't too difficult to implement but led to a very convoluted runtime and project; the runtime itself didn't depend on Lua, and the stdlib could be optionally loaded (from another crate).

The above snippet also requires a way to declare aliases to module functions. What if two modules declare the same function? Last wins? That's what ended up happening, but unfortunately this meant that all modules had to be loaded before the runtime could run, and I'd have to write all the modules loaded by the base CLI, build a better standard lib, or leverage Lua more.
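The "last wins" behaviour is easy to picture as a map from bare function names to the module that owns them. The sketch below is an assumption about the shape of that table, not the rigz implementation; the `Aliases` type and the module names in the usage note are hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical alias table: maps a bare function name to the module that owns it.
#[derive(Default)]
pub struct Aliases {
    owner: HashMap<String, String>,
}

impl Aliases {
    /// Register every function a module exports. `HashMap::insert` overwrites
    /// an existing key, so a module loaded later silently shadows an earlier one.
    pub fn load_module(&mut self, module: &str, functions: &[&str]) {
        for function in functions {
            self.owner.insert(function.to_string(), module.to_string());
        }
    }

    /// Resolve a bare call such as `allow` to a qualified `module.function` name.
    pub fn resolve(&self, function: &str) -> Option<String> {
        self.owner
            .get(function)
            .map(|module| format!("{module}.{function}"))
    }
}
```

If a hypothetical `policy` module exports `allow` and `variables`, and a second module loaded afterwards also exports `allow`, then `resolve("allow")` points at the second module. Because resolution depends on load order, every module has to be loaded before any call can be resolved.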

Lua is a very simple language that is easy to understand, but once again I didn't have any experience writing it prior to this (outside of copy-pasted neovim snippets). Being able to read a language doesn't mean you can automatically write it, much to my chagrin. This meant I would now be learning Lua and wiring up a mix of Rust/Lua interop to write modules for everything I'd want to do. Fun. This was a lot of work just to call Lua with a fancy syntax, but alas I continued, and you can use rigz today. However, at this point I've decided it's time for a full rewrite.

Mini Detour - tree-sitter

As a spoiled modern developer I can't write in a language that doesn't at least have syntax highlighting (an LSP would be better but it's just lex + parse + run, right?). Luckily I only had to write some JS to generate a fast syntax highlighter using tree-sitter, and adding it to neovim was again trivial. If I hadn't been so worn down from all the tumultuousness prior, this would've put a bit more pep in my step.

Moving On

Rigz 1.0 taught me first and foremost that I didn't know Rust, or enough about the options I had for making an interpreted language, so it was time to learn more about both.

Deciding on a VM

The Rust Book is still a great resource, and it looks like there's now an interactive version as well. Crafting Interpreters and Structure and Interpretation of Computer Programs were great resources for this task, although I'll admit to mostly skimming them. After a quick review and a few aha moments I started to see how I could improve things. When I started to add more language features on top of the modules, I didn't like how much I had to change and how much more I'd need to add to make writing rigz pleasant. This is when I realized that rigz needed to be able to do anything a scripting language could, and that a VM could simplify the process: a lot of language features are just syntax sugar, so building a base set of instructions would make things simpler as they evolved.

I started with the basics of a register VM and a builder that converted everything into an array of bytes for the instructions. This was extremely simple, but I really didn't like two implications of this VM. First, I was going to convert my AST into an array of byte instructions, and all values would be converted to bytes and then converted back whenever they were in use. Second, without a GC this would be a never-free language and VM; that was the plan, but it didn't sit well with me.
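As a rough illustration of that first design, here is a sketch of a builder that emits a flat byte array and a register VM that decodes it again. The opcodes, their encoding, and the `Builder` and `run` names are assumptions for this example and are not taken from the real project.

```rust
// Hypothetical opcodes for a tiny register VM.
const LOAD: u8 = 0; // LOAD reg, value (i64 as 8 little-endian bytes)
const ADD: u8 = 1; // ADD dst, lhs, rhs
const HALT: u8 = 2; // HALT reg, returns the value in that register

/// Builder that flattens instructions into one byte array.
#[derive(Default)]
pub struct Builder {
    bytes: Vec<u8>,
}

impl Builder {
    pub fn load(mut self, reg: u8, value: i64) -> Self {
        self.bytes.push(LOAD);
        self.bytes.push(reg);
        // The value itself is serialized into the instruction stream.
        self.bytes.extend_from_slice(&value.to_le_bytes());
        self
    }

    pub fn add(mut self, dst: u8, lhs: u8, rhs: u8) -> Self {
        self.bytes.extend_from_slice(&[ADD, dst, lhs, rhs]);
        self
    }

    pub fn halt(mut self, reg: u8) -> Vec<u8> {
        self.bytes.extend_from_slice(&[HALT, reg]);
        self.bytes
    }
}

/// Read one byte and advance the program counter, failing on a truncated program.
fn byte(program: &[u8], pc: &mut usize) -> Result<u8, String> {
    let b = *program
        .get(*pc)
        .ok_or_else(|| "unexpected end of program".to_string())?;
    *pc += 1;
    Ok(b)
}

pub fn run(program: &[u8]) -> Result<i64, String> {
    let mut registers = [0i64; 256];
    let mut pc = 0usize;
    loop {
        match byte(program, &mut pc)? {
            LOAD => {
                let reg = byte(program, &mut pc)? as usize;
                let raw = program
                    .get(pc..pc + 8)
                    .ok_or_else(|| "truncated LOAD operand".to_string())?;
                // The value is converted back from bytes before it can be used.
                let mut buf = [0u8; 8];
                buf.copy_from_slice(raw);
                registers[reg] = i64::from_le_bytes(buf);
                pc += 8;
            }
            ADD => {
                let dst = byte(program, &mut pc)? as usize;
                let lhs = byte(program, &mut pc)? as usize;
                let rhs = byte(program, &mut pc)? as usize;
                registers[dst] = registers[lhs].wrapping_add(registers[rhs]);
            }
            HALT => {
                let reg = byte(program, &mut pc)? as usize;
                return Ok(registers[reg]);
            }
            other => return Err(format!("unknown opcode {other}")),
        }
    }
}
```

For example, `run(&Builder::default().load(0, 40).load(1, 2).add(2, 0, 1).halt(2))` adds the two loaded values. The round trip in `load` and `LOAD` is cheap for an integer, but every richer value (strings, lists, maps) would need the same serialize and deserialize step, which is the cost described above.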

The second problem was relatively simple to solve: I'd move instructions into frames along with all their variables, and once a frame was evaluated it would be invalidated, or cleared if it needed to be reused. While it's not a perfect solution, it is what I'm going to use moving forward; it's essentially region-based memory management. The first problem led me to explore a lot of different ways this VM could work: stack based, an actor VM with actix, and a few different ways to make a register VM that would work without byte copying. Ultimately I knew I didn't want a stack-based VM; while much simpler, it's what almost every other interpreted language VM uses (shout-out to Lua for its register VM). Here's where that VM ended up - https://crates.io/crates/fn_vm, and now I throw it all away to build another one (I may write another blog post on the issues I had with this version, but for some applications it may be exactly what you need). The next one will be a simple hybrid: an enum for instructions in a register-based VM with call frames.
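A minimal sketch of that hybrid, under my own assumptions about its shape: instructions are a plain Rust enum (so values never round-trip through bytes), each frame owns its registers, and a frame's registers are dropped in one go once it has been evaluated. The `Instruction`, `Frame`, and `Vm` types here are illustrative and are not the fn_vm or rigz API.

```rust
/// Hypothetical instruction set; operands are ordinary Rust values.
#[derive(Debug, Clone)]
pub enum Instruction {
    Load(usize, i64),                  // register <- constant
    Add(usize, usize, usize),          // dst <- lhs + rhs
    Call { frame: usize, dst: usize }, // run another frame, store its result in dst
    Ret(usize),                        // return the value held in a register
}

/// A frame owns its instructions and its registers (its "region").
#[derive(Debug, Default)]
pub struct Frame {
    pub instructions: Vec<Instruction>,
    pub registers: Vec<i64>,
}

impl Frame {
    pub fn new(instructions: Vec<Instruction>) -> Self {
        Frame { instructions, registers: Vec::new() }
    }

    fn set(&mut self, reg: usize, value: i64) {
        if reg >= self.registers.len() {
            self.registers.resize(reg + 1, 0);
        }
        self.registers[reg] = value;
    }

    fn get(&self, reg: usize) -> i64 {
        self.registers.get(reg).copied().unwrap_or(0)
    }
}

pub struct Vm {
    pub frames: Vec<Frame>,
}

impl Vm {
    /// Evaluate a frame, then clear all of its registers at once,
    /// whether evaluation succeeded or failed.
    pub fn run(&mut self, frame: usize) -> Result<i64, String> {
        let result = self.execute(frame);
        if let Some(f) = self.frames.get_mut(frame) {
            f.registers.clear();
        }
        result
    }

    fn execute(&mut self, frame: usize) -> Result<i64, String> {
        let len = self
            .frames
            .get(frame)
            .ok_or_else(|| format!("no frame {frame}"))?
            .instructions
            .len();
        for pc in 0..len {
            let instruction = self.frames[frame].instructions[pc].clone();
            match instruction {
                Instruction::Load(reg, value) => self.frames[frame].set(reg, value),
                Instruction::Add(dst, lhs, rhs) => {
                    let current = &mut self.frames[frame];
                    let sum = current.get(lhs).wrapping_add(current.get(rhs));
                    current.set(dst, sum);
                }
                Instruction::Call { frame: callee, dst } => {
                    let value = self.run(callee)?;
                    self.frames[frame].set(dst, value);
                }
                Instruction::Ret(reg) => return Ok(self.frames[frame].get(reg)),
            }
        }
        Err(format!("frame {frame} ended without Ret"))
    }
}
```

This sketch keeps one register set per frame, so a frame that calls itself would clobber its own registers; a real implementation would need a register set per call. It is only meant to show how clearing a whole frame at once can stand in for a GC.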

Rigz 2.0 will be built as part of the 10 Day Challenge. I'm looking forward to us starting to build with rigz after the 14th. See you on stream!