interblog

Augmented Index Assignments

I am currently working on a compiler, and even have a number of drafts written up for this very blog that concern it. None of them are in a publishable state, so this very well may be the first blog post on this site. Whoops.

I've made a handful of posts on Tumblr that are kind of off-the-cuff, but I'm going to try documenting the issues I've run into here because you get nice code formatting, and it's more central to what I'm working on.

What we talk about when we talk about assignment

I'm going to be using Python highlighting, but all lines here are going to be using my language's syntax. It should be almost, if not completely, identical to Python's.

In all programming languages you will have a variety of ways to assign one value to some place in memory. Usually that's slapping some kind of thing on the left hand side of an equals sign, evaluating the expression on the right hand side of the equals sign, and then shoving that resulting value into whatever the left hand side represents. In my language, there are currently three ways of assigning a value.

The simplest example is the humble assignment statement:

foo = "bar"

A little more complex is assigning a member on the right hand side of a .:

# easy
foo.bar = "baz"
# harder
foo().bar = "baz"

And finally there is the index assign, usually for assigning a value in a dict or an array:

# simple
foo[0] = 1
# harder
foo[bar] = baz
# hardest
foo.bar[baz] = boop

That's not all for assignments, though. This doesn't cover "augmented assignments", which are effectively syntactic sugar:

# this:
foo += 1
# is the same as saying:
foo = foo + 1

But consider a more complex assignment statement. The naive implementation would look like this:

# this:
foo.bar()[a() + b()] += 1
# should be the same as saying:
foo.bar()[a() + b()] = foo.bar()[a() + b()] + 1

But this can introduce hidden bugs. What if bar() modifies the state of foo? What if the a or b functions alter global state? Each of these names appears only once in the statement, so a user should be able to safely assume that each one gets called exactly once.
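Here's a quick demonstration of the double evaluation in plain Python, which shares the naive desugaring's behavior for this statement (the class and the side effects are made up for illustration):

```python
calls = []

class Box:
    def __init__(self):
        self.items = {3: 10}
    def bar(self):
        calls.append("bar")
        return self.items

def a():
    calls.append("a")
    return 1

def b():
    calls.append("b")
    return 2

foo = Box()
# the naive desugaring: every sub-expression gets evaluated twice
foo.bar()[a() + b()] = foo.bar()[a() + b()] + 1

assert calls.count("bar") == 2
assert calls.count("a") == 2
assert calls.count("b") == 2
assert foo.items[3] == 11
```

If bar() returned a fresh object each time, or a() and b() bumped a counter, the two evaluations would observably disagree with what the user wrote.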

So instead, we need to desugar to something like this:

# this:
foo.bar()[a() + b()] += 1
# should actually compile to this:
temp_base = foo.bar()
temp_index = a() + b()
temp_base[temp_index] = temp_base[temp_index] + 1

We just reserve temporary variables. Easy. Well, kinda easy.
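To see that the temporaries fix the problem, here's the same check in plain Python (names made up for illustration):

```python
calls = []

def a():
    calls.append("a")
    return 1

def b():
    calls.append("b")
    return 2

table = {3: 10}

# the temp-based desugaring evaluates each sub-expression exactly once
temp_base = table
temp_index = a() + b()
temp_base[temp_index] = temp_base[temp_index] + 1

assert calls == ["a", "b"]
assert table[3] == 11
```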

Let's compile this thang

One nice thing about waxing poetic about high-level abstractions is that you don't have to really put in the brainpower to think about specific implementation details. However, we still want this stupid thing to be implemented, so the implementation details matter.

So how would we compile this? Let's start simple again, and work our way up.

foo = "bar"

Should:

  1. Push the constant "bar" to the stack
  2. Put it into the appropriate location in memory.

Note

I am being intentionally vague on how a variable is stored in memory. It depends on whether it's a global or a local, and if it's a local, whether it's being declared for the first time or if it's already been declared. For sake of example and keeping things simple, we are going to assume that we're assigning to a global.

When compiled down, bytecode should probably look something like this:

PUSH_CONSTANT "bar"
STORE_GLOBAL "foo"
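As a sanity check, a toy interpreter for just these two opcodes might look like this (a sketch, not the real VM):

```python
def run(program):
    # a minimal stack VM: a value stack plus a dict of globals
    stack, globals_ = [], {}
    for op, arg in program:
        if op == "PUSH_CONSTANT":
            stack.append(arg)
        elif op == "STORE_GLOBAL":
            globals_[arg] = stack.pop()
    return globals_

assert run([("PUSH_CONSTANT", "bar"),
            ("STORE_GLOBAL", "foo")]) == {"foo": "bar"}
```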

Let's see what an index assignment would compile down to. Index assignments themselves are actually sugar for a function call:

# this
foo[0] = "bar"
# desugars into this
foo.__index_assign__(0, "bar")

This way, users can define their own types that override index assignment. This statement should:

  1. Push the foo variable to the stack
  2. Get its __index_assign__ method
  3. Push the constant 0 to the stack
  4. Push the constant "bar" to the stack
  5. Call the function on the stack 2 arguments back

These steps should compile down to:

LOAD_GLOBAL "foo"
GET_ATTR "__index_assign__"
PUSH_CONSTANT 0
PUSH_CONSTANT "bar"
CALL 2

It's a little more complex, but it's effectively just a function call, so it's still straightforward.
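Since __index_assign__ is just a method, the desugaring can be mimicked in plain Python (Python's real hook is __setitem__, so this Row class is purely illustrative):

```python
class Row:
    def __init__(self):
        self.cells = {}
    # our language's index-assignment hook, called explicitly here
    def __index_assign__(self, index, value):
        self.cells[index] = value

foo = Row()
# what `foo[0] = "bar"` desugars into:
foo.__index_assign__(0, "bar")
assert foo.cells[0] == "bar"
```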

Let's go through the gnarly one next:

# this
foo.bar()[a() + b()] += 1
# desugars to this
temp_base = foo.bar()
temp_index = a() + b()
temp_base.__index_assign__(temp_index, temp_base[temp_index] + 1)

Breaking this one down into individual steps:

  1. Push the foo variable to the stack.
  2. Get the bar attribute.
  3. Call with 0 arguments to get foo.bar().
  4. Store this result in a temporary variable, call it temp_base.
  5. Load a.
  6. Call with 0 arguments to get a().
  7. Get the __add__ attribute from the result of a() (this is how we overload the + operator, and it's how builtin types handle addition as well).
  8. Load b.
  9. Call with 0 arguments to get b().
  10. Call with 1 argument to get a() + b().
  11. Store this result in a temporary variable, call it temp_index.
  12. Load temp_base.
  13. Get the __index_assign__ attribute.
  14. Load temp_index - we have now set up the function and the first argument. We now need to evaluate the value we're storing.
  15. Load temp_base again.
  16. Get the __index__ attribute.
  17. Load temp_index again.
  18. Call with 1 argument to get the value temp_base[temp_index].
  19. Get the __add__ attribute.
  20. Push the constant 1 to the stack.
  21. Call with 1 argument. This gives us the result that we will be storing.
  22. Call with 2 arguments. The value we just computed is on top of the stack, with temp_index beneath it and the __index_assign__ method beneath that.

Whew.
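Translating steps 12 through 22 into plain Python, with a toy class standing in for whatever temp_base happens to be (we only ever call these hooks explicitly, so the names are illustrative):

```python
class Grid:
    def __init__(self):
        self.cells = {5: 100}
    # stand-in for the language's index-read hook
    def __index__(self, index):
        return self.cells[index]
    # stand-in for the language's index-assignment hook
    def __index_assign__(self, index, value):
        self.cells[index] = value

temp_base = Grid()
temp_index = 5
# steps 12 through 22, as one expression:
temp_base.__index_assign__(temp_index, temp_base.__index__(temp_index) + 1)
assert temp_base.cells[5] == 101
```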

I want to direct your attention to steps 4 and 11, where we "store this in a temporary variable". These two steps are doing a LOT of heavy lifting. Currently in the VM, there are two ways to store variables: globals, which are tracked in a special list held by the VM, and locals, which are held on the stack and reserved by the compiler. It is probably safe to say that these temporary values look and act a lot like locals instead of globals, and it is probably going to be a lot more convenient to store them on the stack.

For storing temporary values on the stack, we have a couple of options:

  1. Use the already-existing local scope mechanisms to assign values
  2. Re-jigger the DUP instruction (which duplicates the top stack value) to have the ability to look back N items, and duplicate that value instead.

I'm going to go into both options, and this is where the blog post goes from "this is how I do things in the compiler and the VM" to more of a "this is probably what I want to do" scratchpad brainstorm session, because I haven't quite figured out what I'm going to do yet.

Use the already-existing local scope mechanisms

The pieces are already there, so why not use them?

I'm a little worried that the scoping rules might cause difficult-to-track bugs if we start using them in a way they technically weren't originally intended for. Hypothetically, weird bugs like these shouldn't happen. You introduce a new local scope, store the temporary variables, and then pop the scope (which would also pop the temporaries). We would use a name like "<temporary value>" so we can't collide with a valid variable name a user might write. If there's a block or a function definition inside one of those expressions (not saying this would be good code, but it's still valid), it creates its own scope, so it won't bleed into our temporary scope.

In the implementation (which is in Rust), it might look like this:

fn visit_index_assign_stmt(&mut self, stmt: &IndexAssignStmt) -> Result<(), CompileError> {
    // .. setup code here
    self.enter_scope();
    let temp_base_local = self.insert_local("<temporary assign base value>")?;
    let temp_index_local = self.insert_local("<temporary assign index value>")?;
    self.compile_expr(&stmt.expr)?;
    self.compile_expr(&stmt.index)?;
    // .. compile more stuff, and then eventually we need to retrieve the temporary values:
    self.emit(stmt.line, Op::GetLocal(temp_base_local));
    // .. finish compiling everything, and then clean up the scope
    self.exit_scope();
    Ok(())
}

Ultimately, the compiled bytecode will look like this:

LOAD_GLOBAL "foo"
GET_ATTR "bar"
CALL 0  # foo.bar()
# STORE_LOCAL "temp_base" - we are actually just leaving it on the stack here
LOAD_GLOBAL "a"
CALL 0  # a()
GET_ATTR "__add__"
LOAD_GLOBAL "b"
CALL 0  # b()
CALL 1  # a().__add__(b())
# STORE_LOCAL "temp_index" - also just leave this on the stack
LOAD_LOCAL "temp_base"
GET_ATTR "__index_assign__"
LOAD_LOCAL "temp_index"
LOAD_LOCAL "temp_base"
GET_ATTR "__index__"
LOAD_LOCAL "temp_index"
CALL 1  # temp_base[temp_index]
GET_ATTR "__add__"
PUSH_CONSTANT 1
CALL 1  # temp_base[temp_index].__add__(1)
CALL 2  # temp_base.__index_assign__(temp_index, ..)
POP  # pop result from index assign, which should be nil
POP  # pop temp_index
POP  # pop temp_base

I do think this solution is a little more elegant, but it does leave itself open to hidden bugs if the compiler's execution state isn't exactly what we think it is. I'm always worried that a minor change to a system like how local variables get allocated on the stack will suddenly break something seemingly unrelated, like index assignments.

Re-jigger the DUP instruction

The DUP instruction has always felt like a bit of a hack for when the stack machine abstraction breaks down. I will use it when I need a temporary value stored on the stack, or if I need to do some kind of repeated action on something that I just pushed to the stack (like building a Map).

The problem with using a DUP instruction is that we need to modify how it works. If we're dealing with multiple temporary values on the stack, we need a way to skip over values to get to the temporary we need:

temporary #1    # top of stack
temporary #0
# how do we get to temporary 0?

So we would need to modify how DUP works to get at a temporary value. We can do this by giving it an argument: the number of items on the stack to look back. Thus, DUP 0 will duplicate the top of the stack, DUP 1 will skip over the top item and duplicate the one beneath it, DUP 2 will skip over the top two items and duplicate that one, etc.

Using this new definition, the compiled bytecode would look like this:

LOAD_GLOBAL "foo"
GET_ATTR "bar"
CALL 0  # foo.bar()
LOAD_GLOBAL "a"
CALL 0  # a()
# Leave this value on the stack. This is the "base" value.
GET_ATTR "__add__"
LOAD_GLOBAL "b"
CALL 0  # b()
CALL 1  # a().__add__(b())
# Leave this value on the stack. This is the "index" value.
DUP 1  # load the base value
GET_ATTR "__index_assign__"
DUP 1  # load the index (have to skip over base.__index_assign__)
DUP 3  # load the base value again
GET_ATTR "__index__"
DUP 1  # load the index again (have to skip over base.__index__)
CALL 1  # temp_base[temp_index]
GET_ATTR "__add__"
PUSH_CONSTANT 1
CALL 1  # temp_base[temp_index].__add__(1)
CALL 2  # temp_base.__index_assign__(temp_index, ..)
POP  # pop result from index assign, which should be nil
POP  # pop index
POP  # pop base

This is not an unreasonable design. However, keeping track of how many stack items to look back on each DUP gets cumbersome really quickly. On top of that, modifying an existing instruction that already feels like a hack feels like a double-hack. I'm not a huge fan of this.

Conclusion

Both solutions have their positives and their drawbacks. However, I ultimately decided to go with the first solution, using local variables. I don't want to have to change an entire instruction to make one specific piece of syntax work. Additionally, it is difficult to maintain a mental model of the distance between the top of the stack and the item you want to duplicate while values are constantly being pushed and popped. If you need to change the generated code later, you have to go through ALL of your DUP instructions to make sure you haven't forgotten something and introduced an off-by-one error. Essentially, we are offloading that mental model onto the compiler rather than my brain, and I think the generated bytecode is going to benefit because of it.