WebAssembly From Scratch

Setting the stage

Before diving into wat, let's come up with a pseudo programming language that we can use to describe the wat code we will write. This pseudo language is not real and not meant to be implemented. It exists solely to force us to think like a compiler: tracking types, stack values, and memory layout explicitly.

This language will be simple: it will contain a few basic types, functions, for/while loops, if statements, and global/local variables. The syntax is based on TypeScript and Rust and is no more complex than C.

Types

WAT builtin types
- Integer types: i32, i64 -- Can be treated as either signed or unsigned. For example, there are 2 division operations for i32, i32.div_u and i32.div_s for unsigned and signed division. There are also special load/store operations for loading/storing 8/16 bit values as well.
- Float types: f32, f64 -- Standard float types.
All other types are just made up by us and we will have to track those in our head similar to how a compiler would track them.
- Boolean -- In WASM, there is no boolean type. We will use an i32: false will be 0 and anything else will be true. I chose that setup because this mirrors WASM's behavior exactly.
- Structs -- We will track the struct members, offsets, etc. by hand when using them. Accurately tracking structs by hand will help you understand a little bit about what a compiler is doing behind the scenes.
- Pointers -- Pointers are just index offsets to our shared memory between WASM and JS.
- Strings -- We want to have string literals, e.g. log("Hello World!\n"). When we learn about structs we can talk more about how to handle strings.

Variables

let variable: i32 = 0;

Variables are declared with 'let,' and all variables will be mutable and will have their types declared in a TS-like syntax.

Functions

fn do_nothing(): void {}

fn add32(a: i32, b: i32): i32 {
  return a + b;
}

Functions are declared with the 'fn' keyword, and have a return type of T, or void when the function returns nothing. The parameters are typed, just like in TS as well.

Control Flow

if (x > 10) {
  log("Hello World");
} else {
  log("X is not high enough");
}

while (x < 10) {
  x++;
}

for (let i: i32 = 0; i < 10; i++) {
  log("Message!");
}

If statements, while loops, and for loops work just as you would expect in most other languages. Nothing fancy.

Structs

struct vec3 {
  x: f32,
  y: f32,
  z: f32,
};

struct string {
  len: i32,
  ptr: i32,
}

Structs are a 1-to-1 representation of the memory. A vec3 above would occupy the space of 3 f32's and no more or less. A string is made of a length and a pointer to the characters, and pointers in WASM are just indices into the block of memory created by the WebAssembly.Memory class. In wat, we use i32 as the pointer type, so for a string the pointer is just an index offset into that memory buffer.

Now that we have a common language we will be able to first write out our code in it and then convert it into wat. Once you have done this enough, writing wat will be second nature.

WASM types documentation

Stack based virtual machines

The WASM VM is a stack based machine where instructions do not take named arguments, but instead operate on the implicit operand stack. Instructions push values onto the stack, pop values off the stack, and sometimes push a result back. There are no registers you move values between and no expression trees in the traditional sense. The stack is the only place values live while a function is executing.

This can feel a bit backwards coming from a register based model. In a register based model you usually think in terms of loading values into named locations, performing operations on those locations, and storing the results somewhere explicit. With WASM start thinking about producing and consuming values, with only one place for them to go to and come from: the stack.

This is why WASM instructions are so small and specific. An instruction like i32.add does not say what to add. It simply assumes that the top two values on the stack are both i32 values. It pops them, adds them, and pushes the result back. If the types do not line up, the module is invalid and will not even instantiate.

This execution model is directly reflected in the WAT syntax. When you see something like:

(i32.add
  (local.get $a)
  (local.get $b)
)

what you are really seeing is a compact way of saying:

Push the value of $a onto the stack
Push the value of $b onto the stack
Pop two i32 values, add them, and push the result

The nested form hides the stack, but the stack is still there. The linear form makes it explicit:

(local.get $a)
(local.get $b)
(i32.add)

Both forms describe the same sequence of stack operations. There is no semantic difference between them. One is just easier to read once you are comfortable with it.

This matters for this article and the rest of the series because almost everything in WASM follows this pattern. Arithmetic, comparisons, function calls, control flow, and even memory loads and stores are all expressed as stack operations. An if does not check a boolean variable. It consumes an i32 value from the stack. A call does not pass arguments by name. It consumes values from the stack in order. A function does not return by assigning to a return slot. It leaves a value on the stack.

While learning WAT, thinking in terms of the stack is a top priority. If you read it as a precise description of stack operations, it becomes easier to manipulate and see how the higher level concepts arise from the low level ones.
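To make the push/pop order concrete, here is a minimal sketch of an operand stack in TypeScript. It is a toy model, not how an engine works internally: the StackInstr type and the runStack function are hypothetical helpers invented for this illustration, and only three instructions are modelled.

```typescript
// A toy operand stack, only meant to illustrate push/pop order.
// StackInstr and runStack are hypothetical helpers, not a WASM API.
type StackInstr =
  | { op: "local.get"; local: string }
  | { op: "i32.const"; value: number }
  | { op: "i32.add" };

function runStack(
  program: StackInstr[],
  locals: Record<string, number>
): number[] {
  const stack: number[] = [];
  const pop = (): number => {
    const v = stack.pop();
    if (v === undefined) throw new Error("stack underflow");
    return v;
  };

  for (const instr of program) {
    switch (instr.op) {
      case "local.get": {
        const v = locals[instr.local];
        if (v === undefined) throw new Error("unknown local " + instr.local);
        stack.push(v); // push the local's value
        break;
      }
      case "i32.const":
        stack.push(instr.value | 0); // push a 32-bit constant
        break;
      case "i32.add": {
        const b = pop(); // the top of the stack is the second operand
        const a = pop();
        stack.push((a + b) | 0); // wrap to 32 bits, as i32.add does
        break;
      }
    }
  }
  return stack;
}

// (local.get $a) (local.get $b) (i32.add) with $a = 11 and $b = 73
const stackAfter = runStack(
  [
    { op: "local.get", local: "$a" },
    { op: "local.get", local: "$b" },
    { op: "i32.add" },
  ],
  { $a: 11, $b: 73 }
);
// stackAfter holds a single value: 84
```

The only state the instructions share is the stack array, which is the point: every instruction either pushes, pops, or does both.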

The multiple ways of writing wat

There are a few places in wat where there are multiple ways of writing out the code. Let's first look at writing expressions that take arguments. When I say take arguments what I really mean is that the expressions will utilize values on the stack by popping them off.

In the first part we created a function that adds two numbers. Let's look at it again:

(module
  (func $add32 (export "add32") (param $a i32) (param $b i32) (result i32)
    (return (i32.add (local.get $a) (local.get $b)))
  )
)

Let's home in on the call to i32.add. This is a binary operation that adds two i32 numbers together. It does this by popping the top two values off of the stack, adding them together, and then pushing the sum to the stack for the return expression. We could write this another way that explicitly shows this stack-based behavior:

(module
  (func $add32 (export "add32") (param $a i32) (param $b i32) (result i32)
    (local.get $a)
    (local.get $b)
    (i32.add)
    (return)
  )
)

In this example we first push $a on the stack with local.get which loads a local value on to the stack. Then we push $b on to the stack, then we call i32.add which pops $a and $b, adds them together and then pushes the sum to the stack. Return signifies the end of the function and the function's return value is whatever is left on the stack. This version really shows how WASM is stack based and is a 1-to-1 representation of each line of WASM in binary form.
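As a rough illustration of that 1-to-1 mapping, the sketch below turns the linear form of $add32 into the bytes of its instruction sequence. The opcode values (0x20 for local.get, 0x6a for i32.add, 0x0f for return, 0x0b for end) come from the WebAssembly binary format; the BodyInstr type and encodeBody function are hypothetical helpers, and the sketch assumes local indices small enough to fit in a single byte.

```typescript
// Each linear-form instruction becomes one opcode byte, followed by its
// immediate operand if it has one. BodyInstr and encodeBody are
// hypothetical helpers for this illustration only.
type BodyInstr =
  | { op: "local.get"; index: number } // index 0 is $a, index 1 is $b
  | { op: "i32.add" }
  | { op: "return" }
  | { op: "end" };

const OPCODE: Record<BodyInstr["op"], number> = {
  "local.get": 0x20,
  "i32.add": 0x6a,
  "return": 0x0f,
  "end": 0x0b,
};

function encodeBody(instrs: BodyInstr[]): number[] {
  const bytes: number[] = [];
  for (const instr of instrs) {
    bytes.push(OPCODE[instr.op]);
    if (instr.op === "local.get") {
      // Assumption: indices below 128 encode as a single LEB128 byte.
      if (instr.index < 0 || instr.index > 127) {
        throw new RangeError("this sketch only handles one-byte indices");
      }
      bytes.push(instr.index);
    }
  }
  return bytes;
}

const add32Body = encodeBody([
  { op: "local.get", index: 0 }, // (local.get $a)
  { op: "local.get", index: 1 }, // (local.get $b)
  { op: "i32.add" },             // (i32.add)
  { op: "return" },              // (return)
  { op: "end" },                 // closes the function body
]);
// add32Body is [0x20, 0x00, 0x20, 0x01, 0x6a, 0x0f, 0x0b]
```

This covers only the instruction bytes; a real module also needs the surrounding sections (types, exports, and so on) that the assembler generates for you.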

When we take a look at blocks, loops and if/else we will take a look at some other alternative syntax that will allow for us to lower the amount of parentheses we have to use.

When do you use each of the forms?

Nested form: readable, compiler-like, good for documentation
Linear form: exact VM behavior, good for learning a new concept or sketching out an idea.

I will use the nested form throughout due to its compactness and legibility.

WASM Binary Docs

Jump into functions

Let's jump right into functions with a simple collision checker that checks if two rectangles are overlapping and returns true or false. Each rectangle will be represented by its position <x, y> and dimensions <width, height>. For now we will just pass in each of these numbers (x, y, w, h) separately for each rectangle.

fn collided(aX: f32, aY: f32, aW: f32, aH: f32, bX: f32, bY: f32, bW: f32, bH: f32): bool {
  let xOverlap = (aX < bX + bW) && (aX + aW > bX);
  let yOverlap = (aY < bY + bH) && (aY + aH > bY);
  return xOverlap && yOverlap;
}

All this checks is whether the rectangles overlap on the X axis and on the Y axis. If they overlap on both, then the rectangles are overlapping. Let's take a look at the full wat function:

(func $collided (export "collided")
  (param $a_x f32) (param $a_y f32) (param $a_w f32) (param $a_h f32)
  (param $b_x f32) (param $b_y f32) (param $b_w f32) (param $b_h f32)
  (result i32)

  (local $x_overlap i32)
  (local $y_overlap i32)

  (local.set $x_overlap
    (i32.and
      (f32.lt (local.get $a_x) (f32.add (local.get $b_x) (local.get $b_w)))
      (f32.gt (f32.add (local.get $a_x) (local.get $a_w)) (local.get $b_x))
    )
  )
  (local.set $y_overlap
    (i32.and
      (f32.lt (local.get $a_y) (f32.add (local.get $b_y) (local.get $b_h)))
      (f32.gt (f32.add (local.get $a_y) (local.get $a_h)) (local.get $b_y))
    )
  )

  (i32.and (local.get $x_overlap) (local.get $y_overlap))
)

Let's break this down starting with the function signature. First let's see them side-by-side:

fn collided(aX: f32, aY: f32, aW: f32, aH: f32, bX: f32, bY: f32, bW: f32, bH: f32): bool

(func $collided (export "collided") (param $a_x f32) (param $a_y f32) (param $a_w f32) (param $a_h f32) (param $b_x f32) (param $b_y f32) (param $b_w f32) (param $b_h f32) (result i32) ... )

It is pretty much 1-for-1 here. I added underscores to the wat code to make it easier to spot the variables due to the sea of parentheses.

I want to quickly highlight the export expression here. The name "collided" is the name you would use outside of this wat module, for example from TS, where our instance's exports will have a method called 'collided'. The dollar-sign-prefixed name is for use within the wat code itself. If you wanted to call collided from another function in wat, you would use the call expression: (call $collided) with both rectangles' data (all eight f32 values, in parameter order) on the stack.

One thing to note here is that the result type is i32 and not bool. A bool type doesn't exist in wat, so we need to mentally cast it to i32. Comparisons like f32.lt and f32.gt produce i32 values, where 0 represents false and 1 represents true. i32.and is a bitwise AND, but because both of its operands here are always 0 or 1 it behaves exactly like a logical AND. From the VM’s point of view, this function returns an integer. From our pseudo language point of view, we interpret that integer as a boolean.
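One detail is worth seeing in action: i32.and works on bits, so it only acts as a logical AND when both operands are already 0 or 1. The sketch below mimics that in TypeScript. The helper names (f32Lt, i32And, normalizeBool) are made up for this example and simply mirror what the corresponding instructions compute.

```typescript
// Mirrors f32.lt: the result is an i32 that is always 0 or 1.
function f32Lt(a: number, b: number): number {
  return a < b ? 1 : 0;
}

// Mirrors i32.and: a bitwise AND of two 32-bit integers.
function i32And(a: number, b: number): number {
  return (a & b) | 0;
}

// Collapses any non-zero i32 to 1, like (i32.ne (value) (i32.const 0)).
function normalizeBool(v: number): number {
  return v !== 0 ? 1 : 0;
}

// Comparison results are 0 or 1, so the bitwise AND is also a logical AND.
const bothTrue = i32And(f32Lt(1, 2), f32Lt(3, 4)); // 1
const oneFalse = i32And(f32Lt(1, 2), f32Lt(4, 3)); // 0

// Two "truthy" values that share no bits AND to 0.
const surprising = i32And(2, 1); // 0, even though both are non-zero

// Normalizing first restores the logical meaning.
const normalized = i32And(normalizeBool(2), normalizeBool(1)); // 1
```

In the collided function every operand of i32.and comes straight from a comparison, so no normalization is needed there.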

Moving down into the body of the function we will see how it maps to the execution model discussed earlier. First are the local declarations

(local $x_overlap i32)
(local $y_overlap i32)

Locals must be declared at the top of WASM functions. You cannot introduce a new local later on in the function body so anything created in a loop or nested loop, etc. must be created at the top. Parameters are also locals, they are just implicitly declared before any explicit locals. All locals have fixed types and those types never change.

After the locals have been declared we can write the actual function logic. The first line sets the x_overlap variable.

let xOverlap = (aX < bX + bW) && (aX + aW > bX);

(local.set $x_overlap
  (i32.and
    (f32.lt (local.get $a_x) (f32.add (local.get $b_x) (local.get $b_w)))
    (f32.gt (f32.add (local.get $a_x) (local.get $a_w)) (local.get $b_x))
  )
)

Let's take a look at the wat in chunks starting with local.set. It needs a value on the stack that matches the local variable's type along with the name of the variable. This expression uses the following forms:

Nested - (local.set $<variable_name> (<value expression>) )
Linear - (<expression that leaves a value on the stack>) (local.set $<variable_name>)

In the case of x_overlap, our value expression is the call to i32.and, which pops two i32 values off the stack and pushes their bitwise AND. Since both operands are comparison results (0 or 1), this acts as a logical AND. It takes the form (i32.and (<expression>) (<expression>) ).

Let's look at the left and right sides of that i32.and. They are both binary comparison operations: each pops two f32s from the stack, compares them, and pushes an i32 onto the stack as a Boolean value. The same pattern repeats with the f32.add calls inside each comparison.

Let's take a look at the linear form of this, just to really show the difference

;; aX < bX + bW
(local.get $a_x) ;; push $a_x on the stack
(local.get $b_x) ;; push $b_x on the stack
(local.get $b_w) ;; push $b_w on the stack
(f32.add)        ;; pop last 2 off stack and push(b_x + b_w)
;; stack has [$a_x, b_x + b_w]
(f32.lt)         ;; pop 2 and push($a_x < b_x + b_w)
(local.get $a_x)
(local.get $a_w)
(f32.add)        ;; pop 2 and push (a_x + a_w)
(local.get $b_x)
(f32.gt)         ;; pop 2 and push (a_x + a_w > b_x)
;; stack has [$a_x < b_x + b_w, a_x + a_w > b_x]
(i32.and) ;; ands the 2 items on the stack and pushes the result to the stack

That might look inefficient if you are thinking in terms of a high level language, but it matches the rules of the VM exactly. It also reinforces that WAT maps one-to-one onto WASM instructions.

The same pattern is repeated for $y_overlap. Once both locals are set, the function finishes with this expression:

(i32.and (local.get $x_overlap) (local.get $y_overlap))

There is no explicit return instruction here. The result of i32.and is left on the stack. Because the function declares a single return value, that value becomes the return value automatically when the function exits. This is a common pattern in WAT and keeps the code a little cleaner once you are comfortable reading it.

At this point the function is complete. All parameters were consumed through local.get, all intermediate values lived on the operand stack, and the final result was produced by leaving a value on the stack at the end of the function.

This example is deliberately a bit verbose for what it does. It forces you to see how comparisons, logical operations, locals, and return values all work together in a stack based VM. As we move on to control flow and memory access, the same rules apply. The syntax changes slightly, but the execution model does not.

Let's now go back and update our TypeScript code to load and test out our new function.

1. Run npm run build:wasm

2. Update TypeScript

type TestModule = {
  add32: (a: number, b: number) => number;
  collided: (
    aX: number,
    aY: number,
    aW: number,
    aH: number,
    bX: number,
    bY: number,
    bW: number,
    bH: number
  ) => number;
};

async function main() {
  ...
  console.log(testModule.collided(0, 0, 10, 10, 5, 5, 10, 10) === 1);
}
main();

3. Run the frontend and check

You should see true printed in the dev console. If you see errors instead, make sure you compiled the WASM binary without errors (step #1) and that you are loading the binary WASM file correctly (from the last article).

And that is a simple overview of how functions work. I would advise you to write a few more functions, compile the binary, update the TestModule type and test out your functions.
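When testing, it can help to have a plain TypeScript version of the same logic to compare the WASM export against. The sketch below follows the WAT version of collided; collidedRef is a hypothetical name, and Math.fround is used to round each sum to 32-bit float precision the way f32.add would. It assumes the inputs are already exactly representable as f32.

```typescript
// Reference implementation of collided() for comparing against the
// WASM export. collidedRef is a hypothetical helper name.
function collidedRef(
  aX: number, aY: number, aW: number, aH: number,
  bX: number, bY: number, bW: number, bH: number
): number {
  // Math.fround rounds to f32, mirroring the result of f32.add.
  const xOverlap = aX < Math.fround(bX + bW) && Math.fround(aX + aW) > bX;
  const yOverlap = aY < Math.fround(bY + bH) && Math.fround(aY + aH) > bY;
  return xOverlap && yOverlap ? 1 : 0; // i32 result: 1 = true, 0 = false
}

const overlapping = collidedRef(0, 0, 10, 10, 5, 5, 10, 10); // 1
const farApart = collidedRef(0, 0, 10, 10, 20, 20, 5, 5);    // 0
const edgesTouch = collidedRef(0, 0, 10, 10, 10, 0, 5, 5);   // 0
```

The last case shows that rectangles whose edges only touch are not counted as colliding, because the comparisons are strict.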

Control Flow

So far everything we have written executes in a straight line: Values go on the stack, operations consume them, results come back. Now we are going to look at branches in the code as well as loops.

In WASM, control flow exists, but it looks a little different than what you might be used to. There is no instruction pointer you can move around freely and all control flow is structured. This restriction is what allows the browser to validate and compile modules quickly and safely.

When I say that structured control flow allows the browser to validate and compile modules quickly and safely, I am not talking about a vague “security benefit” in the abstract. I am talking about very concrete properties of the WebAssembly validation and compilation pipeline.

WebAssembly is designed so that a module can be fully validated in a single linear pass before it ever runs. That validation step proves several things up front:

Every control construct (block, loop, if) is well-formed and properly nested
Every branch target is known and valid
The operand stack has a known shape and types at every program point
No instruction can jump into the middle of another construct
No instruction can observe or corrupt the VM state in an unexpected way

In a traditional assembly language, you have arbitrary jumps and can branch to any address. That means the compiler has to make the assumption that any execution could arrive at almost any instruction with almost any register state.

Let’s start by updating our collided function to introduce a simple early escape. This is not meant to be a real optimization. It is just a concrete excuse to introduce if.

Early exit with if

Here is the updated pseudo code:

fn collided(
  aX: f32, aY: f32, aW: f32, aH: f32,
  bX: f32, bY: f32, bW: f32, bH: f32
): bool {
  let xOverlap = (aX < bX + bW) && (aX + aW > bX);
  if (!xOverlap) {
    return false;
  }

  let yOverlap = (aY < bY + bH) && (aY + aH > bY);
  return yOverlap;
}

If the rectangles do not overlap on the X axis, there is no reason to compute the Y overlap. We return early. In WAT, this becomes:

(func $collided (export "collided")
  (param $a_x f32) (param $a_y f32) (param $a_w f32) (param $a_h f32)
  (param $b_x f32) (param $b_y f32) (param $b_w f32) (param $b_h f32)
  (result i32)

  (local $x_overlap i32)
  (local $y_overlap i32)

  ;; x overlap
  (local.set $x_overlap
    (i32.and
      (f32.lt (local.get $a_x) (f32.add (local.get $b_x) (local.get $b_w)))
      (f32.gt (f32.add (local.get $a_x) (local.get $a_w)) (local.get $b_x))
    )
  )

  ;; if (!x_overlap) return 0
  (if
    (i32.eqz (local.get $x_overlap))
    (then
      (return (i32.const 0))
    )
  )

  ;; y overlap
  (local.set $y_overlap
    (i32.and
      (f32.lt (local.get $a_y) (f32.add (local.get $b_y) (local.get $b_h)))
      (f32.gt (f32.add (local.get $a_y) (local.get $a_h)) (local.get $b_y))
    )
  )

  (local.get $y_overlap)
)

First, if in WASM consumes a value from the stack. There is no condition expression attached to it like in a high level language. Whatever is on the stack is the condition. Zero means false. Non-zero means true. That is why we use i32.eqz. It takes an i32 value, compares it to zero, and produces a new i32 that is 1 if the value was zero and 0 otherwise. This gives us the logical NOT we need.

Second, return is explicit here. When you are inside control flow, relying on “last value on the stack” becomes harder to reason about, so being explicit is often clearer. return immediately exits the function, regardless of how deeply nested you are.

If, else-if and else patterns

(if
  (condition)
  (then
    ;; true branch
  )
  (else
    ;; false branch
  )
)

This is the full if/else pattern. Both branches must produce the same stack effect at the point where control flow rejoins, meaning that if the then branch leaves an i32 on the stack, the else must also leave an i32 on the stack. If it is only used for control flow, neither branch needs to leave anything. You can also explicitly provide a result type, just like we do with functions.

(if (result type)
  (condition)
  (then
    ;; true branch
  )
  (else
    ;; false branch
  )
)

There is no dedicated else if instruction. You build it by nesting if blocks:

(if
  (cond_a)
  (then
    ;; case A
  )
  (else
    (if
      (cond_b)
      (then
        ;; case B
      )
      (else
        ;; default
      )
    )
  )
)

Loops

Loops in WASM are also structured. There are no arbitrary jumps. Everything is built out of block, loop, and branch instructions.

Let’s start with a simple pseudo function:

fn accumulate(n: i32): i32 {
  let i = 0;
  let v = 0;
  while (i < n) {
    v += 11;
    i += 1;
  }
  return v;
}

And in WAT:

(func $accumulate (export "accumulate") (param $n i32) (result i32)
  (local $i i32)
  (local $v i32)

  (local.set $i (i32.const 0))
  (local.set $v (i32.const 0))

  (block $exit
    (loop $loop
      ;; if (i >= n) break
      (br_if $exit
        (i32.ge_s (local.get $i) (local.get $n))
      )

      ;; body
      (local.set $v
        (i32.add (local.get $v) (i32.const 11))
      )
      (local.set $i
        (i32.add (local.get $i) (i32.const 1))
      )

      (br $loop)
    )
  )

  (local.get $v)
)

This introduces block, loop, br and br_if.

loop - Marks the top of the loop. It isn't really the loop itself, but a label to jump back to; branching to it restarts the loop body.
block - Acts as a marked exit point for our loop to flow out into; branching to it jumps to the end of the block.
br - Branches to the label of an enclosing loop, block or if. This is our unconditional jump/goto command.
br_if - Just like br, but it first pops an i32 off of the stack and only jumps if that value is non-zero (true).

That is really it. With those four instructions we can design the skeleton of any type of loop: while, for, and do-while can all be built from them.

while (<condition>) { <body> }

(block $exit
  (loop $loop
    (br_if $exit <condition>)
    <body>
    (br $loop)
  )
)

do { <body> } while (<condition>)

(block $exit
  (loop $loop
    <body>
    (br_if $loop <condition>)
  )
)

for (<init>; <condition>; <update>) { <body> }

<init>
(block $exit
  (loop $loop
    (br_if $exit <condition>)
    <body>
    <update>
    (br $loop)
  )
)

There is no special syntax for these constructs. They are all just different arrangements of the same few primitives. This is one of the places where WASM feels restrictive compared to native assembly, but that restriction is intentional. Because all control flow is structured, the engine can always know where branches can go and what the stack looks like at those points. This makes validation fast and guarantees that execution cannot jump into the middle of nowhere.
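If the block/loop/br shape still feels unfamiliar, it may help to see it written with TypeScript's labeled statements, which behave in a similar way: breaking out of a labeled block jumps to its end, and continuing a labeled loop jumps back to its top. This is only an analogy for the $accumulate function shown earlier, with the label names chosen to match the WAT labels.

```typescript
// Mirrors the block/loop/br_if/br skeleton of $accumulate.
function accumulate(n: number): number {
  let i = 0;
  let v = 0;

  exit: {
    loop: while (true) {
      // (br_if $exit (i32.ge_s (local.get $i) (local.get $n)))
      if (i >= n) break exit;

      // body
      v = (v + 11) | 0; // | 0 keeps the value in 32-bit integer range
      i = (i + 1) | 0;

      // (br $loop)
      continue loop;
    }
  }

  return v;
}

const threeRounds = accumulate(3); // 33
const zeroRounds = accumulate(0);  // 0
```

One difference to keep in mind: a WASM loop does not repeat on its own. Without the final br back to its label, execution simply falls out of the bottom of the loop.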

With control flow in place, we now have enough tools to write non-trivial functions. The next step is to stop pretending that everything fits in locals and start working with memory directly. That is where WASM really starts to feel different from high level languages.

Structs, arrays and pointers

So far, we have been working exclusively with locals and function parameters. Real programs need to use memory dynamically and won't live entirely on the stack. To be able to create objects, arrays and dynamic containers we need to understand how we can interact with memory and start thinking about structs instead of objects.

In WebAssembly all memory is exposed as a single, flat, contiguous array of bytes. It is up to us to structure and organize that memory as we see fit. The landscape is ours to mold to our liking. Every “structure” is just a convention, every “pointer” is just an integer and every “array” is just repeated data laid out sequentially.

Linear Memory

A WASM module can define one or more linear memories, but in practice you will almost always use exactly one:

(memory (export "memory") 1)

This declares a memory with an initial size of one page. A page is 64 KiB (65,536 bytes). Memory is addressed byte by byte, starting at offset zero.

From inside WASM, memory is accessed using load and store instructions:

i32.load
i32.store
f32.load
f32.store
and more

Every load and store takes an address, and that address is just an i32. There is no built-in pointer type. The instruction reads or writes bytes starting at that offset.

If you take one thing away from this section, it should be this: memory in WASM is dumb. All structure comes from how you choose to use it.
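The sketch below shows the same idea from the host side. To keep it self-contained, a plain ArrayBuffer stands in for one page of linear memory (in the real setup this would be the buffer of the exported memory), and DataView calls play the role of i32.store and i32.load. The variable names are made up for the example. WASM memory is little-endian, which is why every DataView call passes true as its last argument.

```typescript
// An ArrayBuffer standing in for one 64 KiB page of linear memory.
const PAGE_SIZE = 64 * 1024;
const pageBuffer = new ArrayBuffer(1 * PAGE_SIZE);
const pageView = new DataView(pageBuffer);

// Roughly (i32.store (i32.const 16) (i32.const 0x11223344))
pageView.setInt32(16, 0x11223344, true); // true = little-endian

// Roughly (i32.load (i32.const 16))
const loadedValue = pageView.getInt32(16, true);

// The same four bytes viewed one at a time: least significant byte first.
const storedBytes = Array.from(new Uint8Array(pageBuffer, 16, 4));
// storedBytes is [0x44, 0x33, 0x22, 0x11]

// Nothing stops us from reading part of that value as something else.
const lowHalf = pageView.getInt16(16, true); // 0x3344
```

The memory has no idea that an i32 was written at offset 16. Reading two of those bytes as a 16-bit value works just as well, which is what "memory is dumb" means in practice.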

Structs are layouts

When we say “struct” in this series, we are not talking about objects with methods, constructors, or identity. We are talking about layouts. We should throw any notion of "object" out the window when dealing with WASM. The struct is the fundamental organization block for any type other than a number. From the struct we can build any other type.

Here is a struct definition in our pseudo language:

struct vec3 {
  x: f32,
  y: f32,
  z: f32,
};

This does not exist in WASM. What exists is the idea that:

x is at offset 0
y is at offset 4
z is at offset 8

That is the entire definition and if you have a pointer p that refers to a vec3, then:

p + 0 is x
p + 4 is y
p + 8 is z

There is no padding unless you add it yourself and there is no alignment respected unless you respect it yourself. The layout is the entirety of the type and this is pretty much how C works once you strip away the syntax.
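Tracking offsets by hand gets tedious, so here is a small sketch of the bookkeeping a compiler would do for a packed layout like the one described above. The FieldType, PackedLayout, and packedLayout names are invented for this example; the function simply places each field directly after the previous one, with no padding or alignment.

```typescript
type FieldType = "i32" | "i64" | "f32" | "f64";

// Size in bytes of each WASM number type.
const SIZE_OF: Record<FieldType, number> = { i32: 4, i64: 8, f32: 4, f64: 8 };

interface PackedLayout {
  offsets: Record<string, number>; // field name -> byte offset
  size: number;                    // total size of the struct in bytes
}

// Lays fields out back to back, in declaration order, with no padding.
function packedLayout(fields: Array<[string, FieldType]>): PackedLayout {
  const offsets: Record<string, number> = {};
  let offset = 0;
  for (const [fieldName, fieldType] of fields) {
    offsets[fieldName] = offset;
    offset += SIZE_OF[fieldType];
  }
  return { offsets, size: offset };
}

const vec3Layout = packedLayout([
  ["x", "f32"],
  ["y", "f32"],
  ["z", "f32"],
]);
// vec3Layout.offsets: x = 0, y = 4, z = 8; vec3Layout.size = 12

const stringLayout = packedLayout([
  ["len", "i32"],
  ["ptr", "i32"],
]);
// stringLayout.offsets: len = 0, ptr = 4; stringLayout.size = 8
```

A layout that mixes 4-byte and 8-byte fields would come out unaligned with this scheme, which WASM permits; adding alignment would be a decision you make yourself.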

Pointers

A pointer in WASM is an i32 that represents a byte offset into linear memory. There is nothing special about pointer values: you can add to them, subtract from them, store them in locals, pass them to functions, and return them. In WASM you can think of them as an index into an array of bytes. For example, if a function takes a pointer to a vec3, its signature might look like this:

(param $vec_ptr i32)

When you see that, you should read it as “an index into memory where a vec3 layout begins”. Every load and store that uses that pointer is just arithmetic plus a memory instruction:

(f32.load (local.get $vec_ptr))        ;; x
(f32.load (i32.add (local.get $vec_ptr) (i32.const 4)))  ;; y
(f32.load (i32.add (local.get $vec_ptr) (i32.const 8)))  ;; z

Nothing enforces that this pointer actually points to a vec3. That is entirely on you.
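The host can follow the same pointer with the same arithmetic. The sketch below reads and writes a vec3 through a DataView, using the offsets 0, 4 and 8. The Vec3 interface and the readVec3/writeVec3 helpers are hypothetical, and a small ArrayBuffer stands in for the module's memory.

```typescript
interface Vec3 {
  x: number;
  y: number;
  z: number;
}

// Writes three f32 fields starting at ptr (little-endian, like WASM).
function writeVec3(mem: DataView, ptr: number, v: Vec3): void {
  mem.setFloat32(ptr + 0, v.x, true);
  mem.setFloat32(ptr + 4, v.y, true);
  mem.setFloat32(ptr + 8, v.z, true);
}

// Reads them back; mirrors the three f32.load expressions.
function readVec3(mem: DataView, ptr: number): Vec3 {
  return {
    x: mem.getFloat32(ptr + 0, true),
    y: mem.getFloat32(ptr + 4, true),
    z: mem.getFloat32(ptr + 8, true),
  };
}

const vecMemory = new DataView(new ArrayBuffer(64));
writeVec3(vecMemory, 16, { x: 1.5, y: -2, z: 0.25 });

const sameVec = readVec3(vecMemory, 16);  // { x: 1.5, y: -2, z: 0.25 }
const shifted = readVec3(vecMemory, 20);  // { x: -2, y: 0.25, z: 0 }
```

The second read uses a pointer that is off by one field. It still succeeds and returns plausible numbers, which is exactly the kind of mistake nothing will catch for you.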

Arrays

An array is just a sequence of values laid out back to back in memory, whether those values are structs or basic numbers.

For example, an array of i32 values:

element 0 at base + 0
element 1 at base + 4
element 2 at base + 8
and so on

Indexing into an array is just multiplication and addition:

address = base + index * sizeof(element)

In WAT:

(i32.load
  (i32.add
    (local.get $base)
    (i32.mul (local.get $index) (i32.const 4))
  )
)

There is no length unless you store it somewhere yourself. If you want safety, you build it explicitly.
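As a sketch of what "building it explicitly" could look like, the helper below computes the element address with the formula above and checks the index against a length that the caller has to supply. The names elementAddress and loadI32At are made up for this example, and an ArrayBuffer stands in for linear memory.

```typescript
// address = base + index * sizeof(element)
function elementAddress(base: number, index: number, elemSize: number): number {
  return base + index * elemSize;
}

// Loads element `index` of an i32 array. The length is not stored
// anywhere in memory here; the caller has to know it.
function loadI32At(
  mem: DataView,
  base: number,
  length: number,
  index: number
): number {
  if (!Number.isInteger(index) || index < 0 || index >= length) {
    throw new RangeError("index " + index + " is outside [0, " + length + ")");
  }
  return mem.getInt32(elementAddress(base, index, 4), true);
}

// An array of three i32 values starting at byte offset 8.
const arrayMemory = new DataView(new ArrayBuffer(64));
const arrayBase = 8;
[10, 20, 30].forEach((value, i) => {
  arrayMemory.setInt32(elementAddress(arrayBase, i, 4), value, true);
});

const secondElement = loadI32At(arrayMemory, arrayBase, 3, 1); // 20
const thirdAddress = elementAddress(arrayBase, 2, 4);          // 16
```

A common alternative is to store the length in memory next to the data, as the string struct in this article does.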

Data segments

So far, all of this assumes that memory already contains something useful. To actually put data into memory at module load time, WASM provides data segments.

A data segment copies raw bytes into memory at a fixed offset:

(data (i32.const 0) "Hello World!\n")

This writes those bytes starting at offset 0. You can also use data segments to initialize structs:

(data (i32.const 32)
  ;; len = 13
  "\0d\00\00\00"
  ;; ptr = 0
  "\00\00\00\00"
)

It's just bytes! Whether those bytes represent a string, a struct, or garbage depends entirely on how you interpret them later.

Data segments are one of the cleanest ways to demonstrate that data layout is the program. There is no separation between “definition” and “instance”. The bytes are the truth.
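Writing those escaped bytes by hand is error-prone, so a tiny helper can generate them. The sketch below converts an i32 into the four little-endian bytes in the backslash-hex notation used inside WAT strings. The i32ToWatBytes name is invented for this example.

```typescript
// Formats an i32 as four little-endian bytes in WAT string escapes,
// e.g. 13 becomes \0d\00\00\00.
function i32ToWatBytes(value: number): string {
  const bytes = new Uint8Array(4);
  new DataView(bytes.buffer).setInt32(0, value, true); // little-endian
  return Array.from(bytes, (b) => {
    const hex = b.toString(16);
    return "\\" + (hex.length < 2 ? "0" + hex : hex);
  }).join("");
}

const lenThirteen = i32ToWatBytes(13);  // \0d\00\00\00
const ptrZero = i32ToWatBytes(0);       // \00\00\00\00
const bigValue = i32ToWatBytes(0x1234); // \34\12\00\00
```

The first two results match the len and ptr lines of the struct data segment shown above.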

Linked List demo

To tie all of this together, let’s build something concrete: a linked list.

Pseudo definition:

struct node {
  value: i32,
  next: i32, // pointer to next node or 0 for null
};

Layout:

value at offset 0
next at offset 4

Let’s place two nodes in memory:

(memory (export "memory") 1) ;; at the top after (module

(data (i32.const 0)
  ;; node 1
  "\01\00\00\00" ;; value = 1
  "\08\00\00\00" ;; next = 8

  ;; node 2
  "\02\00\00\00" ;; value = 2
  "\00\00\00\00" ;; next = 0
)

(func $sum_list (export "sum_list") (param $head i32) (result i32)
  (local $sum i32)
  (local $curr i32)

  ;; sum = 0
  (local.set $sum (i32.const 0))

  ;; curr = head
  (local.set $curr (local.get $head))

  ;; do { ... } while(!(curr == 0))
  (block $exit
    (loop $loop

      ;; sum += curr.value
      (local.set $sum
        (i32.add
          (local.get $sum)
          (i32.load (local.get $curr))
        )
      )

      ;; curr = curr.next
      (local.set $curr
        (i32.load
          (i32.add
            (local.get $curr)
            (i32.const 4) ;; curr.next is curr with an offset of 4
          )
        )
      )

      ;; break if curr is 0 (no cycles)
      (br_if $exit (i32.eqz (local.get $curr)))

      (br $loop)
    )
  )

  (local.get $sum)
)

No magic to be found, no hidden allocations or object lifetimes or any type of runtime. It is just a block of memory and integers.

Now for the TypeScript side:

type TestModule = {
  add32: (a: number, b: number) => number;
  collided: (
    aX: number,
    aY: number,
    aW: number,
    aH: number,
    bX: number,
    bY: number,
    bW: number,
    bH: number
  ) => number;

  sum_list: (ptr: number) => number;
};

async function main() {
  const { instance } = await WebAssembly.instantiateStreaming(
    fetch("wasm/test.wasm")
  );

  const testModule = instance.exports as TestModule;
  console.log(testModule.add32(11, 73));
  console.log(testModule.collided(0, 0, 10, 10, 5, 5, 10, 10) === 1);
  console.log(testModule.sum_list(0));
}
main();

With that addition, you should see "3" printed last in the console.
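To convince yourself that the result really comes from nothing but bytes and offsets, the same walk can be sketched in plain TypeScript over a byte array that matches the data segment. The sumList function here is a hypothetical host-side mirror of $sum_list, not part of the module.

```typescript
// Host-side mirror of $sum_list: value at offset 0, next at offset 4.
function sumList(mem: DataView, head: number): number {
  let sum = 0;
  let curr = head;
  do {
    sum = (sum + mem.getInt32(curr, true)) | 0; // sum += curr.value
    curr = mem.getInt32(curr + 4, true);        // curr = curr.next
  } while (curr !== 0);                         // 0 is our null marker
  return sum;
}

// The same bytes as the data segment at offset 0.
const listBytes = new Uint8Array([
  0x01, 0x00, 0x00, 0x00, // node 1: value = 1
  0x08, 0x00, 0x00, 0x00, // node 1: next = 8
  0x02, 0x00, 0x00, 0x00, // node 2: value = 2
  0x00, 0x00, 0x00, 0x00, // node 2: next = 0
]);

const listSum = sumList(new DataView(listBytes.buffer), 0); // 3
```

Note that the first node lives at offset 0, which is also the value used for null. That is why the loop is a do-while: the check has to come after the first load, otherwise the list would look empty. Like the WAT version, this assumes the list has no cycles.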

Once you are comfortable with this model, structs, arrays, strings, trees, graphs, and custom allocators all become variations on the same theme. This is where the WASM model comes into focus as a very small, very predictable machine.

Strings

Strings are a good final topic for this article because they force us to use everything we have learned so far: structs as layouts, pointers as offsets, data segments, and the idea that WASM itself does not understand higher-level concepts.

WebAssembly has no string type and it doesn't even know what text is. If we want strings, we have to decide what a string means in memory and then be consistent about it.

For this series, we will use a very simple string layout:

struct string {
  len: i32,
  ptr: i32,
}

This describes exactly two things:

len: the number of bytes in the string
ptr: a pointer to the first byte of the string data in linear memory

There is no null terminator and there's no encoding metadata. We will assume UTF-8 and trust ourselves not to lie. As with every other struct so far, this is not a language feature, but a convention of ours.

String Layout

Let’s start by removing the linked-list example we just worked on and then we will place a raw string into memory using a data segment:

(data (i32.const 0) "Hello from WASM!\n")

This writes the raw bytes of the string starting at offset 0.

Next, we define the string struct itself somewhere else in memory. The string has length 17 and points to offset 0:

(data (i32.const 32)
  "\11\00\00\00" ;; len = 17
  "\00\00\00\00" ;; ptr = 0
)

At this point, memory looks like this conceptually:

[0..16] -> string bytes
[32..35] -> length
[36..39] -> pointer to string bytes

Nothing links these together except our agreement that offset 32 is a string struct.
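The layout can be double-checked from the host side before touching the module. The sketch below rebuilds the same bytes in a Uint8Array, which stands in for linear memory, and confirms that the message is 17 bytes long and that the struct at offset 32 describes it. All variable names are made up for the example; TextEncoder and TextDecoder are the standard encoding APIs available in browsers and Node.

```typescript
const messageText = "Hello from WASM!\n";
const messageBytes = new TextEncoder().encode(messageText); // UTF-8 bytes

// A small buffer standing in for linear memory.
const stringMemory = new Uint8Array(64);
const stringView = new DataView(stringMemory.buffer);

// (data (i32.const 0) "Hello from WASM!\n")
stringMemory.set(messageBytes, 0);

// (data (i32.const 32) len ptr)
stringView.setInt32(32, messageBytes.length, true); // len
stringView.setInt32(36, 0, true);                   // ptr

// Read the struct back the way a consumer would.
const lenField = stringView.getInt32(32, true); // 17
const ptrField = stringView.getInt32(36, true); // 0
const decodedText = new TextDecoder("utf-8").decode(
  stringMemory.subarray(ptrField, ptrField + lenField)
);
// decodedText equals messageText, and stringMemory[32] is 0x11
```

The length is a byte count, not a character count. For this ASCII message the two are the same, but a string with multi-byte UTF-8 characters would need a larger len than its character count suggests.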

Passing strings out of WASM

WASM can't print to the console by itself. If we want to log a string, we need to hand the string to the host environment and let it interpret the bytes. We will do that by exporting a function that returns a pointer to a string struct and writing a log function in TypeScript that reads that struct out of the exported memory.

Here is the function in WAT:

(func $get_message (export "get_message")
  (result i32)
  (i32.const 32) ;; pointer to string struct
)

Now we write a log function in TypeScript that understands our string layout.

function logString(memory: WebAssembly.Memory, strPtr: number) {
  const mem = new DataView(memory.buffer);

  const len = mem.getInt32(strPtr, true);
  const ptr = mem.getInt32(strPtr + 4, true);

  const bytes = new Uint8Array(memory.buffer, ptr, len);
  const text = new TextDecoder("utf-8").decode(bytes);

  console.log(text);
}

This should look familiar by now. We do exactly what WASM is doing:

Treat the pointer as a byte offset
Read fields based on known offsets
Interpret the bytes according to our agreed layout

Nothing here is magic. If the layout changes, this code must change with it.

Putting it all together

Let's update the main.ts file:

type TestModule = {
  add32: (a: number, b: number) => number;
  collided: (
    aX: number,
    aY: number,
    aW: number,
    aH: number,
    bX: number,
    bY: number,
    bW: number,
    bH: number
  ) => number;

  memory: WebAssembly.Memory; // Memory exported from WAT
  get_message: () => number;
};

function logString(memory: WebAssembly.Memory, strPtr: number) {
  const mem = new DataView(memory.buffer);

  const len = mem.getInt32(strPtr, true);
  const ptr = mem.getInt32(strPtr + 4, true);

  const bytes = new Uint8Array(memory.buffer, ptr, len);
  const text = new TextDecoder("utf-8").decode(bytes);

  console.log(text);
}

async function main() {
  const { instance } = await WebAssembly.instantiateStreaming(
    fetch("wasm/test.wasm")
  );

  const testModule = instance.exports as TestModule;
  console.log(testModule.add32(11, 73));
  console.log(testModule.collided(0, 0, 10, 10, 5, 5, 10, 10) === 1);

  const strPtr = testModule.get_message();
  logString(testModule.memory, strPtr);
}
main();

When this runs, JavaScript reads the string struct, follows the pointer, decodes the bytes, and prints the message. From WASM’s point of view, it returned an integer. From JavaScript’s point of view, that integer described how to find a string.

This string example is intentionally simple, but it captures the core idea behind all WASM interop. WASM doesn't pass objects or strings, it passes numbers. Everything else is a contract layered on top of linear memory.

Wrapping Up

At this point, we have covered the core pieces needed to read and write real WebAssembly by hand. We built a shared pseudo language to reason about intent, mapped that intent onto a stack-based virtual machine, and walked through functions, control flow, memory, structs, pointers, and strings. The key takeaway is that none of these concepts are abstract or magical in WASM, they are all explicit and mechanical.

In the next article, we will start leaning on this foundation to build more interesting behavior and explore how these low-level pieces scale into larger programs. From here on out, we are no longer learning new rules — we are just applying the same ones more deliberately.

Next Part

In the next article I will go into detail about memory management techniques. We will build the foundation of a WASM application framework that we will use throughout the series.

Patrick Burris

Software Developer

WebAssembly From Scratch - Part 2
