magicfoodhand

A Year of Rust

rust

This won't be a blog post about the latest features in Rigz; for that you can check out Rigz v0.5.0. Instead, this will explore the challenges faced up until now, some aspects that have made this project painful, and some aspects of Rust that made it perfect for Rigz.

Writing Code the Rust Way

This section will focus on some of the issues I have with Rust now that I've been using it almost daily for over a year. I'm going to skip the issues I've had with Async Rust because I don't think I've had enough time to fully dig in yet, but that will probably be its own blog post eventually.

Rust is not Simple

Let's be honest, Rust is not a simple programming language. If you're going to dig into Rust, start with the Rust book, because you've got to understand a lot of concepts to be productive in a real project. The borrow checker will make a lot of sense once you've used it for a while (and I've talked about the one remaining gripe I have with it in another post), and while it's pretty straightforward, I'd argue it's simpler working with pointers directly. I think a good example of something that should be simple can be shown in the broadcast function of the Rigz VM, bear with me:

macro_rules! broadcast {
    ($args:expr, $ex: expr) => {
        let args: Vec<Value> = $args.collect();
        $ex.map(|(id, p)| (id, p.send(args.clone())))
            .map(|(id, r)| match r {
                Ok(_) => Value::Number((id as i64).into()),
                Err(e) => e.into(),
            })
            .collect::<Vec<_>>()
    };
    (message: $args:expr, $ex: expr) => {
        let message = $args.next().unwrap().to_string();
        broadcast! {
            $args,
            $ex.filter(|(_, p)| match p.lifecycle() {
                Some(Lifecycle::On(e)) => e.event == message,
                _ => false,
            })
        }
    };
}

fn broadcast(&mut self, args: BroadcastArgs) -> Result<(), VMError> {
    let (all, args) = match args {
        BroadcastArgs::Args(a) => (false, a),
        BroadcastArgs::All(a) => (true, a),
    };

    let mut args = self
        .resolve_args(args)
        .into_iter()
        .map(|v| v.borrow().clone());

    let values = if all {
        broadcast! { args, self.processes.iter().enumerate() }
    } else {
        broadcast! { message: args, self
            .processes
            .iter()
            .enumerate() }
    };
    self.store_value(values.into());
    Ok(())
}

These lines are actually relatively simple: if the broadcast targets all processes, send the arguments to every process; otherwise match on the processes that respond to this message and pass along the remaining arguments. The first question I'd ask is why this needs a macro; macros in Rust use their own syntax, and it's effectively like learning a small second language (we'll talk about procedural macros a bit later on). Ideally I could write something like this instead of a macro:

fn broadcast(&mut self, args: BroadcastArgs) -> Result<(), VMError> {
    let (all, args) = match args {
        BroadcastArgs::Args(a) => (false, a),
        BroadcastArgs::All(a) => (true, a),
    };

    let mut args = self
        .resolve_args(args)
        .into_iter()
        .map(|v| v.borrow().clone());

    let broadcast = |send_all: bool| -> Vec<Value> {
        let enumerated_processes = self.processes.iter().enumerate();
        let processes = if send_all {
            enumerated_processes
        } else {
            let message = args.next().unwrap().to_string();
            enumerated_processes.filter(|(_, p)| match p.lifecycle() {
                Some(Lifecycle::On(e)) => e.event == message,
                _ => false,
            })
        };
        let args: Vec<Value> = args.collect();
        processes.map(|(id, p)| (id, p.send(args.clone())))
            .map(|(id, r)| match r {
                Ok(_) => Value::Number((id as i64).into()),
                Err(e) => e.into(),
            })
            .collect()
    };

    self.store_value(broadcast(all).into());
    Ok(())
}

If you've used Rust before you already know why this won't work, even though the equivalent would work in a lot of other languages. The error in Rust is:

error[E0308]: `if` and `else` have incompatible types
--> crates/vm/src/vm/runner.rs:277:17
|
273 | let processes = if send_all {
| ______________________________-
274 | | enumerated_processes
| | -------------------- expected because of this
275 | | } else {
276 | | let message = args.next().unwrap().to_string();
277 | |/ enumerated_processes.filter(|(_, p)| match p.lifecycle() {
278 | || Some(Lifecycle::On(e)) => e.event == message,
279 | || _ => false,
280 | || })
| ||__________________^ expected `Enumerate<Iter<'_, Process<'_>>>`, found `Filter<Enumerate<Iter<'_, Process<'_>>>, {closure@runner.rs:277:45}>`
281 | | };
| |______________- `if` and `else` have incompatible types
|
= note: expected struct `Enumerate<std::slice::Iter<'_, _>>`
found struct `Filter<Enumerate<std::slice::Iter<'_, _>>, {closure@crates/vm/src/vm/runner.rs:277:45: 277:53}>`

Yes, that's obviously true: a Filter<Enumerate<Iter>> is not an Enumerate<Iter>. But anyone reading the code can see the intent and know that even though they're not the same type, this could be valid. Unless I box the iterator as a trait object or create my own wrapper type to unify these into one type, there is no way to remove this duplication without a macro. Duplication is not the end of the world, and sometimes it can make a project simpler overall, but I'd rather make that decision myself instead of the compiler forcing it onto me. A great question is, "Should this be how broadcast works? The All option seems like it wouldn't do what people would expect." That's a fantastic observation, but that's how I want it to work for now. As I start using it more it may change.
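For comparison, here is a minimal sketch of the boxing route, which trades the macro for a heap allocation and dynamic dispatch. The Process struct and the targets function are hypothetical stand-ins I made up for illustration; they are not the Rigz types, and the real broadcast does more than collect ids.

```rust
/// Hypothetical stand-in for a Rigz process: only the event it listens for.
struct Process {
    event: Option<String>,
}

/// Returns the ids of the processes that would receive a broadcast.
/// Boxing both branches as `dyn Iterator` gives the `if`/`else` one type.
fn targets(processes: &[Process], send_all: bool, message: &str) -> Vec<usize> {
    let enumerated = processes.iter().enumerate();
    let selected: Box<dyn Iterator<Item = (usize, &Process)> + '_> = if send_all {
        // Every process is a target.
        Box::new(enumerated)
    } else {
        // Only processes listening for this exact event are targets.
        Box::new(enumerated.filter(move |(_, p)| p.event.as_deref() == Some(message)))
    };
    selected.map(|(id, _)| id).collect()
}
```

Calling targets(&processes, false, "stop") would return only the ids of processes whose event is "stop". Whether the allocation is acceptable in a VM hot path is a separate question, which is part of why the macro is still a reasonable choice.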

Debugging Rust can be a Pain

This is the biggest negative of Rust in my opinion. I use IntelliJ IDEs (specifically RustRover for Rust) or VSCode for most of my development. The first thing I set up when working in any language is the ability to debug it; I'm not a fan of print-style debugging when I can step through the code instead. While Rust has a great debugger, it's not very helpful when working with opaque types like HashMap or Vec. There may be a way to improve the experience, but out of the box it's useless in RustRover (LLDB is a lot better; at least it doesn't occasionally crash when I try to explore an element, and it shows string values, unlike GDB). I can see how many elements a collection contains and that's about it: no exploring inner elements, and I'm left adding print statements when I'm really stuck. When I compare this to the debugging experience in Python, JavaScript, Ruby, Java, Kotlin, or Swift, I'm left feeling a bit frustrated.

match expressions & Traits

I'm a huge fan of Rust's enum type (the Result type is fantastic), and using match to handle every possibility is nice to have. That being said, it's possible to achieve the same thing in any language that supports dynamic dispatch for interfaces. In Java I'd create a base interface, call a method on it, and use Jackson's polymorphic deserialization to make sure anything passed in from the outside is mapped correctly for the given type. I'd still argue that this is one area where Rust is simpler than the alternative: use serde to generate the Serialize/Deserialize trait implementations for you and handle every variant when you need to. In Rust, you want enums and match; the problem arises if you try to do things the Java way in Rust with traits. Consider this enum:

#[derive(Clone, Debug, Default, Serialize, Deserialize)]
#[serde(untagged)]
pub enum Value {
    #[default]
    None,
    Bool(bool),
    Number(Number),
    String(String),
    List(Vec<Value>),
    Map(IndexMap<Value, Value>),
    Range(ValueRange),
    Error(VMError),
    Tuple(Vec<Value>),
    Type(RigzType),
}

First of all, ignore the issue where you can never deserialize a value into a Tuple (it will always turn into a List because I'm using serde's untagged attribute); that's beside the point for now. What if I want this to instead be a trait, like a Java interface? That's going to be a pain. If the concrete type isn't known at compile time (as a generic argument), I'll have to wrap it in something that has a known size, like a Box, which forces it onto the heap, but that's not the worst issue. You can also use impl Trait if it's a function argument or return value, but storing it in a struct or enum requires the trait to be wrapped. More importantly, how do you handle adding two of them together? How do you require them to be de/serializable? Naively you'd think something like this should work:

pub trait ValueTrait: Clone + Debug + Serialize + for <'a> Deserialize<'a> {
    fn rigz_type(&self) -> RigzType;

    fn cast(&self, rigz_type: RigzType) -> Result<Box<dyn ValueTrait>, VMError>;

    fn cast_self(&self, other: Box<dyn ValueTrait>) -> Result<Self, VMError> {
        let rt = self.rigz_type();
        other.cast(rt)
    }

    fn add(&self, other: Box<dyn ValueTrait>) -> Result<Box<dyn ValueTrait>, VMError>;
}

impl ValueTrait for Number {
    fn rigz_type(&self) -> RigzType {
        RigzType::Number
    }

    fn cast(&self, rigz_type: RigzType) -> Result<Box<dyn ValueTrait>, VMError> {
        // ...
    }

    fn add(&self, other: Box<dyn ValueTrait>) -> Result<Box<dyn ValueTrait>, VMError> {
        let other = self.cast_self(other)?;
        Ok(Box::new(self + &other))
    }
}

This won't work for three reasons:

  1. The other.cast call in cast_self isn't known to return Self; it returns a Box<dyn ValueTrait>, which may not hold the same type as self, so Rust won't allow it. This could be mitigated by not including a default implementation in the base trait, but the goal here is mirroring the way you'd do things outside of Rust.
  2 & 3. The trait can't be made into an object if you require Clone + Serialize + Deserialize. The second reason is that Clone and Deserialize require Self: Sized, which a trait object can't satisfy (cloning can be recovered with a crate like dyn_clone). The third is that the serialization traits have methods with generic type parameters (crates like erased_serde or typetag can handle these problems, but figuring out how to handle it yourself is an uphill battle). Here's an example of the error:
error[E0038]: the trait `ValueTrait` cannot be made into an object
--> crates/vm/src/value/mod.rs:51:26
|
51 | let other = self.cast_self(other)?;
| ^^^^^^^^^ `ValueTrait` cannot be made into an object
|
note: for a trait to be "dyn-compatible" it needs to allow building a vtable to allow the call to be resolvable dynamically; for more information visit <https://doc.rust-lang.org/reference/items/traits.html#object-safety>
--> crates/vm/src/value/mod.rs:28:23
|
28 | pub trait ValueTrait: Clone + Debug + Serialize + for <'a> Deserialize<'a> {
| ---------- ^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^^ ...because it requires `Self: Sized`
| | |
| | ...because it requires `Self: Sized`
| this trait cannot be made into an object...
|
::: /home/mitch/.cargo/registry/src/index.crates.io-6f17d22bba15001f/serde-1.0.217/src/ser/mod.rs:256:8
|
256 | fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
| ^^^^^^^^^ ...because method `serialize` has generic type parameters
= help: consider moving `serialize` to another trait
= help: only type `number::Number` is seen to implement the trait in this crate, consider using it directly instead
= note: `ValueTrait` can be implemented in other crates; if you want to support your users passing their own types here, you can't refer to a specific type

In Rust, typetag will handle de/serialization for trait objects using an enum-style tagged representation, and luckily dyn_clone is very easy to use. But similar to the first issue I mentioned, even when you know that the code is doing what you want, you may have to bend over backwards for the compiler and do things the Rust way. In Java you'd cast, check the type, and probably throw an exception. Is that simpler than using enums and match? Definitely not, and this is an extreme example of the compiler getting in your way because you're doing something that you probably shouldn't in Rust.
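To make the Clone half of the problem concrete, here is a minimal std-only sketch of the hand-rolled pattern that, as far as I understand it, crates like dyn_clone automate. The clone_box method and the Int and Text types are hypothetical names for illustration; this is not the dyn_clone API and these are not Rigz types.

```rust
use std::fmt::Debug;

/// A dyn-compatible trait: no `Clone` supertrait and no generic methods.
trait ValueTrait: Debug {
    fn type_name(&self) -> &'static str;

    /// Cloning goes through a method that returns a box, so the trait
    /// itself never has to require `Self: Sized`.
    fn clone_box(&self) -> Box<dyn ValueTrait>;
}

#[derive(Clone, Debug)]
struct Int(i64);

#[derive(Clone, Debug)]
struct Text(String);

impl ValueTrait for Int {
    fn type_name(&self) -> &'static str {
        "Int"
    }

    fn clone_box(&self) -> Box<dyn ValueTrait> {
        Box::new(self.clone())
    }
}

impl ValueTrait for Text {
    fn type_name(&self) -> &'static str {
        "Text"
    }

    fn clone_box(&self) -> Box<dyn ValueTrait> {
        Box::new(self.clone())
    }
}

/// With `clone_box` available, the boxed trait object can implement `Clone`.
impl Clone for Box<dyn ValueTrait> {
    fn clone(&self) -> Self {
        self.clone_box()
    }
}
```

With this in place a Vec<Box<dyn ValueTrait>> can be cloned like any other Vec. Every implementor has to repeat the same clone_box body, which is exactly the kind of boilerplate a helper crate or the enum approach avoids.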

Perks of Rust

We've spent a long time exploring things about Rust that make this project painful, so if it's so painful, why haven't I switched to something else? While it's true I've fully rewritten this project at least three times, Rust is still the right choice for now, even though Rigz would be much further along if I had used Java, Go, Zig, or even C instead. The perks are just really nice to have.

Tree Structures

The parser and CLI are where Rust really shines. Once you parse an AST you've got to walk it to produce VM instructions, and the convenience of match is fantastic here. Even if I didn't have a dedicated VM and simply walked the tree to run the language, Rust would still be excellent here. The CLI is similar: use clap to generate the commands and then match on the enum of possible arguments (Run, Test, Debug, etc).
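As a rough illustration of the tree walk, here is a minimal sketch that compiles a tiny expression tree into stack-machine instructions. The Expr and Instruction enums and the compile function are hypothetical and far smaller than anything in Rigz; I'm also assuming a stack machine purely to keep the example short.

```rust
/// Hypothetical miniature AST; a real language AST is far richer.
enum Expr {
    Number(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

/// Hypothetical stack-machine instructions.
#[derive(Debug, PartialEq)]
enum Instruction {
    Push(i64),
    Add,
    Mul,
}

/// Walks the tree depth first, emitting both operands before their operator.
/// The match is exhaustive, so adding a new Expr variant is a compile error
/// here until it is handled.
fn compile(expr: &Expr, out: &mut Vec<Instruction>) {
    match expr {
        Expr::Number(n) => out.push(Instruction::Push(*n)),
        Expr::Add(lhs, rhs) => {
            compile(lhs, out);
            compile(rhs, out);
            out.push(Instruction::Add);
        }
        Expr::Mul(lhs, rhs) => {
            compile(lhs, out);
            compile(rhs, out);
            out.push(Instruction::Mul);
        }
    }
}
```

Compiling the tree for 1 + 2 * 3 yields Push(1), Push(2), Push(3), Mul, Add, since each operator is emitted after both of its operands.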

From/Into

The From trait is probably my favorite (implementing it gives you the matching Into implementation for free, and it is generally preferred over implementing Into directly). It allows you to write code like this:

let v: Value = vec![1, 2, 3].into();
let v = Value::from(vec![1, 2, 3]);

impl<T: Into<Value>> From<Vec<T>> for Value {
    #[inline]
    fn from(value: Vec<T>) -> Self {
        Value::List(value.into_iter().map(|v| v.into()).collect())
    }
}

The only drawback is that the IDE will usually take you to the base definition of the trait, not the specific implementation being used, but as long as you keep the implementations near the resulting type it's not hard to track down. Finding where the traits are used can be a little challenging (especially for From/Into), but generally they're write and forget.

Operation Traits

These traits are for when you want to make a custom type and use operators to call a function (PartialEq ==, Add +, AddAssign +=, Index [], etc). This is the reason I'm most hesitant to switch to something else: they're incredibly convenient, and in Rust these can be inlined by the compiler where they're used. Even though functions like add, eq, or index aren't that much to type, I think that intent is shown better by using the operators directly. If you've ever seen this chart or listened to Grace Hopper talk about nanoseconds, it really makes you consider if what you're writing really is as fast as it can be. Rust's inlining is pretty impressive, and while this is not as fast as it could be, it balances the convenience of writing it with speed well. Eventually the VM may be rewritten in C or Zig; for now we'll stick with Rust as a middle ground.
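Here is a minimal sketch of what implementing a couple of these traits looks like. The Number enum below is a hypothetical, simplified type for illustration (I'm assuming the real Rigz Number is richer), and using wrapping addition for integers is my own choice to keep the example panic-free.

```rust
use std::ops::{Add, AddAssign};

/// Hypothetical simplified number type, not the Rigz `Number`.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Number {
    Int(i64),
    Float(f64),
}

impl Number {
    fn to_float(self) -> f64 {
        match self {
            Number::Int(i) => i as f64,
            Number::Float(f) => f,
        }
    }
}

impl Add for Number {
    type Output = Number;

    /// Int + Int stays an Int; any other combination is promoted to Float.
    #[inline]
    fn add(self, rhs: Number) -> Number {
        match (self, rhs) {
            (Number::Int(a), Number::Int(b)) => Number::Int(a.wrapping_add(b)),
            (a, b) => Number::Float(a.to_float() + b.to_float()),
        }
    }
}

impl AddAssign for Number {
    /// `+=` reuses the `Add` implementation above.
    #[inline]
    fn add_assign(&mut self, rhs: Number) {
        *self = *self + rhs;
    }
}
```

With these in place, Number::Int(2) + Number::Int(3) reads the same as ordinary arithmetic, and the derived PartialEq supplies ==.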

Macros

Maybe it's from Ruby being the first dynamic language I learned, but I've always liked meta-programming. During this year's Advent of Code it was incredibly helpful. I had one goal: create an abstraction that would let me run any day or part. This meant that I had to accept a String input and return an output for each part. I was able to create macros for all the benchmarking, testing, and integration tests; they helped me move confidently. Here's an example of the test macro for the test input on day 1:

daily_tests!(
r#"3 4
4 3
2 5
1 3
3 9
3 3"#
= 11 & 31
);

While it's definitely possible to get something like this working in other languages, the convenience of Rust macros is tough to beat.
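To give a feel for how little it takes, here is a minimal sketch of a declarative macro that accepts the same input = part1 & part2 shape. This is my guess at one way to write it, not the macro from my Advent of Code repo verbatim; the parse, part1, and part2 functions and the generated test names are assumptions made for the example.

```rust
/// Splits each line into a (left, right) pair of numbers; malformed lines are skipped.
fn parse(input: &str) -> (Vec<i64>, Vec<i64>) {
    input
        .lines()
        .filter_map(|line| {
            let mut nums = line.split_whitespace().map(|n| n.parse::<i64>().ok());
            Some((nums.next()??, nums.next()??))
        })
        .unzip()
}

/// Sum of absolute differences between the two sorted columns.
fn part1(input: &str) -> i64 {
    let (mut left, mut right) = parse(input);
    left.sort_unstable();
    right.sort_unstable();
    left.iter().zip(&right).map(|(l, r)| (l - r).abs()).sum()
}

/// Each left value multiplied by how often it appears in the right column.
fn part2(input: &str) -> i64 {
    let (left, right) = parse(input);
    left.iter()
        .map(|l| l * right.iter().filter(|r| *r == l).count() as i64)
        .sum()
}

/// Hypothetical sketch: expands to one #[test] function per part.
/// `literal` fragments are used because an `expr` fragment cannot be
/// followed by `=` in a macro_rules pattern.
macro_rules! daily_tests {
    ($input:literal = $p1:literal & $p2:literal) => {
        #[test]
        fn example_part1() {
            assert_eq!(part1($input), $p1);
        }

        #[test]
        fn example_part2() {
            assert_eq!(part2($input), $p2);
        }
    };
}

daily_tests!(
    r#"3 4
4 3
2 5
1 3
3 9
3 3"#
    = 11 & 31
);
```

Running cargo test would then pick up example_part1 and example_part2 like any hand-written test. Since the generated function names are fixed, this sketch only supports one invocation per module.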

Procedural Macros

While these are a pain, they allow you to do some truly incredible stuff. I talked about this a bit in Rigz is Live but wanted to do a deeper dive on the gains here. They're much more challenging to get right than ordinary macros; they are effectively a much more advanced version of the language used for standard macros, maybe even like learning a third language (with regular macros being the second), but the trade-off is you can do just about anything you want as long as you generate valid code. In Rigz they are used to parse a definition, then generate a trait that can be implemented so that it can be called by the VM.

For example, here's the Random module definition using derive_module:

derive_module! {
r#"trait Random
fn next_int -> Int
fn next_float -> Float
fn next_bool(percent: Float = 0.5) -> Bool
end"#

}

impl RigzRandom for RandomModule {
    fn next_int(&self) -> i64 {
        rand::random()
    }

    fn next_float(&self) -> f64 {
        rand::random()
    }

    fn next_bool(&self, percent: f64) -> bool {
        let mut rng = rand::thread_rng();
        rng.gen_bool(percent)
    }
}

While this is a trivial example, derive_module! handles a lot of the common operations I'd be doing with any module. To show the savings, here's the generated code:

#[derive(Copy, Clone, Debug)]
pub struct RandomModule;

trait RigzRandom {
    fn next_int(&self) -> i64;
    fn next_float(&self) -> f64;
    fn next_bool(&self, percent: f64) -> bool;
}

impl<'vm> Module<'vm> for RandomModule {
    #[inline]
    fn name(&self) -> &'static str { "Random" }
    fn call(&self, function: &'vm str, args: RigzArgs) -> Result<Value, VMError> {
        match function {
            "next_int" => {
                let result = self.next_int();
                Ok(result.into())
            }
            "next_bool" => {
                let [ percent ] = args.take()?;
                let percent = percent.borrow().to_float()?;
                let result = self.next_bool(percent);
                Ok(result.into())
            }
            "next_float" => {
                let result = self.next_float();
                Ok(result.into())
            }
            _ => Err(VMError::InvalidModuleFunction(format!("Function {function} does not exist")))
        }
    }
    #[inline]
    fn trait_definition(&self) -> &'static str { "trait Random\n fn next_int -> Int\n fn next_float -> Float\n fn next_bool(percent: Float = 0.5) -> Bool\n end" }
}

impl<'a> ParsedModule<'a> for RandomModule {
    #[inline]
    fn module_definition(&self) -> ModuleTraitDefinition<'static> {
        ModuleTraitDefinition {
            auto_import: false,
            definition: TraitDefinition {
                name: "Random",
functions: vec![FunctionDeclaration::Declaration { name: "next_int", type_definition: FunctionSignature { arguments: vec![], return_type: FunctionType { mutable: false, rigz_type: RigzType::Int }, self_type: None, var_args_start: None, arg_type: ArgType::Positional } }, FunctionDeclaration::Declaration { name: "next_float", type_definition: FunctionSignature { arguments: vec![], return_type: FunctionType { mutable: false, rigz_type: RigzType::Float }, self_type: None, var_args_start: None, arg_type: ArgType::Positional } }, FunctionDeclaration::Declaration { name: "next_bool", type_definition: FunctionSignature { arguments: vec![FunctionArgument { name: "percent", default: Some(Value::Number(Number::Float(0.5f64))), function_type: FunctionType { mutable: false, rigz_type: RigzType::Float }, var_arg: false, rest: false }, ], return_type: FunctionType { mutable: false, rigz_type: RigzType::Bool }, self_type: None, var_args_start: None, arg_type: ArgType::Positional } }, ],
            },
        }
    }
}

This handles generating a match statement for the correct function and arity/typing, converts arguments and return values to the correct type, and returns a static representation of the parsed input so I don't need to parse it multiple times (and so I know at compile time if a change broke an existing module). Getting this working was a nightmare and there are definitely edge cases that need to be handled, but with 13 modules (so far) this has saved a lot of repetition and potential errors I could make. It's by far the aspect of Rigz that I'm most proud of, even though I'll probably be the only one to ever use it.

The only drawback I see with macros is the inability to tell at a glance whether something is a standard macro or a procedural one. You can usually go to the definition, though, and I'd argue it doesn't really matter from the caller's point of view.

Conclusion

While there are definitely some aspects of Rust that have made this project difficult, with over 15,000 lines of code so far it has been a great experience and I'm looking forward to the next 15k.