CTF Team at the University of British Columbia

[b01lers CTF 2022] Pact

25 Apr 2022 by Robert Xiao

Overview

pact is a two-part challenge based around reversing a bytecode-based virtual machine architecture called “pactvm”. The first challenge is to reverse a predefined flag-checker program written for the Pact VM. The second challenge is to automatically reverse engineer a series of randomly-generated Pact programs.

Part 1 - Bubble Wrap

Problem Description

  • Solves: 9
  • Score: 423
  • Category: rev

You look stressed, here’s some bubble wrap to play with.

Author: jrm0xE

Difficulty: Hard

Attachments:

  • bubblewrap.pactb 7.67 KB MD5: b418fffa6f292bc1c8cd3884bdab4066
  • pactvm 46.26 KB MD5: 578c21735cd31e8537b9063268eef5f4

Introduction

We’re given a stripped Linux 64-bit binary called pactvm. When run without arguments, it provides this usage message:

Usage ./pactvm input.pactb

Giving it the bubblewrap.pactb file produces this prompt:

Flag Validator:

If we type in some text, we get Try again :(. So, we have to give this program the correct flag as input.

Browsing bubblewrap.pactb in a hex editor, we can spot the strings Flag Validator: , Congrats! and Try again :(, along with several random 10-character strings like cdarzowkky, bldbefsarc, xpklorelln, nwlrbbmqbh, etc.

Reversing pactvm

The main task is to reverse pactvm and figure out the file format of the pactb files, as well as how the virtual machine works. I used Ghidra for reversing. All function names given below, aside from standard libc functions, are guesses based on functionality.

The program is not obfuscated, merely stripped, so reversing proceeds relatively smoothly. There is quite a lot of functionality to reverse; for example, the program allocates and deallocates memory almost exclusively through the function at 401bf0 (35 cross-references), which takes a memory address, old size, and new size, and performs the functions of free (if the new size is zero), malloc (if the address and old size are zero), or realloc (if the address, old size, and new size are all nonzero). This function also does a bunch of other custom-allocation stuff that I did not fully reverse or understand.

The main VM loop is the function at 404bc0, which contains a 45-way switch statement for each of the opcodes in the VM architecture. The VM is structured as a stack machine: most opcodes pop values off of the stack, perform some operations on those values, and then (optionally) push new values onto the stack, using the functions stack_push (407990) and stack_pop (408380).

Stack values are 16-byte tagged union structures, with a 64-bit type tag and a 64-bit value. Handily, one of the VM opcodes, 0x09, calls print_value (403790) which prints a human-readable representation of a stack value, which made it easy to figure out the VM’s type system.

There are two types of tagged-union structures in the VM: “objects” and “values”. Values live on the stack, and have the following tags:

  • 0: bool, LSB of value indicates true or false
  • 1: nil
  • 2: char, LSB of value is a signed character
  • 3: int, value is a 64-bit signed int
  • 4: float, value is a double
  • 5: object, value is a pointer to an object structure

Objects are heap-allocated via 401bf0, and are variable-length with a common header { int type; object *next; }. Type tags are as follows, per print_object (402c90) and reversing the places where the object type is used:

  • 0: function, structure contains argument count, bytecode, constants, function name, etc.
  • 1: closure, structure a reference to the code, and an array of “upvals”, which are references to values in the closure’s context
  • 2: native_fn, structure contains a function pointer to a VM-callable function implemented in C
  • 3: string, structure contains a length, char * pointer, and a 32-bit hash. This structure is used extensively throughout the VM, e.g. as hash table keys
  • 4: upvalue. Did not bother reversing this structure because no VM programs use upvalues
  • 5: class, structure contains a name and a hash table of class variables
  • 6: instance, structure contains a reference to the class and a hash table of instance variables
  • 7: array, structure contains a length, capacity, and a pointer to a value array
  • 8: method, structure contains a value (presumably the class instance on which a method is called?), and a pointer to a closure.

With the basic data types of the VM reversed, we can identify the basic design of the VM. It’s a strict stack machine, with no registers (apart from a program counter). There are two global hash tables: one hash table which stores interned strings (presumably to reduce allocation of string objects), and one hash table which stores global variables. There’s also a separate stack which holds call frames.

The pactb file format starts with an array of 14 strings which are treated as “imports” of the 14 built-in native functions (403a80, 403ac0, …). Effectively, this gives these 14 functions a unique (obfuscated) name, which helps to obfuscate the pactb (these are the cdarzowkky, bldbefsarc, etc. strings seen in the file). These 14 functions are actually the following:

  • 0: clock, gets the current clock() value divided by 1000000
  • 1: append, appends a value to an array
  • 2: remove, removes a value from an array
  • 3: input, prints an optional string prompt and then reads a line of text from the user
  • 4: len, obtains the length of a string or array
  • 5: range, works like the Python 2.x range function, producing an array of integers in a given range
  • 6: typeof, produces a string representing the type of the value (bool, nil, char, etc.)
  • 7: chr, obtains a char for the given int
  • 8: ord, obtains an int from the given char or first character of the given string
  • 9: int, obtains an int from the given char or float
  • 10: bytes, turns an array of chars into a string
  • 11: bytearray, turns a string into an array of chars
  • 12: zeros, creates an array of zero ints of the given length
  • 13: exit, calls the C exit function.

After the assignment of native functions to names, the pactb contains a serialized function object representing the script. This contains the number of argument, number of upvals, bytecode array, line number table (with one line number for each byte of the bytecode), an array of constants (serialized values), and a function name. The line numbers are used by print_error (407a00), which prints a backtrace by walking the call frame stack and printing out line numbers and function names.

The constants will be values of different types, and can also include nested function objects corresponding to functions defined within the script.

So, with all of the file format understood, and the general structure of the VM reversed, the only thing left to do is to reverse each of the instructions. There are quite a lot, and some are somewhat hard to understand, so instead of reversing all of them, I just reversed the ones that are used by bubblewrap and the programs from Search and Rescue. The final disassembler looks like this:

import struct
from ctypes import c_char
from dataclasses import dataclass
from typing import Any, List, Optional

@dataclass
class Code:
    nargs: int
    nupvals: int
    bytecode: bytes
    linenos: List[int]
    consts: List[Any]
    name: Optional[bytes]

    @classmethod
    def from_file(cls, f: "FileReader") -> "Code":
        nargs = f.read_u32()
        nupvals = f.read_u32()
        codelen = f.read_u32()
        bytecode = f.read_bytes(codelen)
        linenos = [f.read_u32() for _ in range(codelen)]

        nconsts = f.read_u32()
        consts = []
        for _ in range(nconsts):
            val = f.read_val()
            consts.append(val)
        named = f.read_u8()
        if named:
            name = f.read_obj()
        else:
            name = None
        return Code(nargs, nupvals, bytecode, linenos, consts, name)

class FileReader:
    def __init__(self, f):
        self.f = f

    def read_fmt(self, fmt):
        return struct.unpack(fmt, self.f.read(struct.calcsize(fmt)))

    def read_u32(self):
        return self.read_fmt("<I")[0]

    def read_u8(self):
        return self.read_fmt("<B")[0]

    def read_u64(self):
        return self.read_fmt("<Q")[0]

    def read_double(self):
        return self.read_fmt("<d")[0]

    def read_bytes(self, n):
        start = self.f.tell()
        res = self.f.read(n)
        assert len(res) == n, "Failed to read %d bytes at %d" % (n, start)
        return res

    def read_str(self):
        sz = self.read_u32()
        return self.read_bytes(sz).decode("latin1")

    def read_obj(self) -> Any:
        type = self.read_u32()
        if type == 3:
            return self.read_str()
        elif type == 0:
            return Code.from_file(self)
        else:
            raise Exception("Invalid object type %d" % type)

    def read_val(self) -> Any:
        type = self.read_u32()
        if type == 0:
            return bool(self.read_u8())
        elif type == 1:
            self.read_u64()
            return None
        elif type == 2:
            return c_char(self.read_u8())
        elif type == 3:
            return self.read_u64()
        elif type == 4:
            return self.read_double()
        elif type == 5:
            return self.read_obj()
        else:
            raise Exception("Invalid value type %d" % type)

def disas(natives, code: Code):
    if code.name:
        print(f"{code.name}:")
    print(f"[{code.nargs} args, {code.nupvals} upvals]")
    pc = 0
    while pc < len(code.bytecode):
        print(f"    0x{pc:04x}:", end=' ')
        opcode = code.bytecode[pc]
        pc += 1
        if opcode == 0:
            print("return")
        elif opcode == 1:
            disp, = struct.unpack(">H", code.bytecode[pc:pc+2])
            pc += 2
            dest = pc + disp
            print(f"if not(top()): goto(0x{dest:04x})")
        elif opcode == 2:
            disp, = struct.unpack(">H", code.bytecode[pc:pc+2])
            pc += 2
            dest = pc + disp
            print(f"goto(0x{dest:04x})")
        elif opcode == 3:
            disp, = struct.unpack(">H", code.bytecode[pc:pc+2])
            pc += 2
            dest = pc - disp
            print(f"goto(0x{dest:04x})")
        elif opcode == 4:
            count = code.bytecode[pc]
            pc += 1
            print(f"args = popn({count}); pop()(*args)")
        elif opcode == 7:
            idx = code.bytecode[pc]
            pc += 1
            subcode = code.consts[idx]
            assert isinstance(subcode, Code)
            assert subcode.nupvals == 0, "upvals not supported"
            print(f"push(new_closure(<Code {subcode.name or id(subcode)}>))")
        elif opcode == 9:
            print(f"print(pop())")
        elif opcode == 10:
            print(f"push(not pop())")
        elif opcode == 11:
            print(f"push(-pop())")
        elif opcode == 12:
            print(f"push(pop() + pop())")
        elif opcode == 13:
            print(f"a, b = popn(2); push(a - b)")
        elif opcode == 16:
            idx = code.bytecode[pc]
            pc += 1
            obj = code.consts[idx]
            print(f"push({obj!r})")
        elif opcode == 17:
            print(f"push(None)")
        elif opcode == 20:
            print(f"pop()")
        elif opcode == 21:
            idx = code.bytecode[pc]
            pc += 1
            print(f"push(frame[{idx}])")
        elif opcode == 22:
            idx = code.bytecode[pc]
            pc += 1
            print(f"frame[{idx}] = top()")
        elif opcode == 25:
            idx = code.bytecode[pc]
            pc += 1
            name = code.consts[idx]
            if name in natives:
                name = natives[name]
            print(f"push({name})")
        elif opcode == 26:
            idx = code.bytecode[pc]
            pc += 1
            name = code.consts[idx]
            print(f"{name} = top()")
        elif opcode == 27:
            idx = code.bytecode[pc]
            pc += 1
            name = code.consts[idx]
            print(f"{name} = pop()")
        elif opcode == 28:
            idx = code.bytecode[pc]
            pc += 1
            name = code.consts[idx]
            print(f"tbl, obj = popn(2); tbl.{name} = obj; push(obj)")
        elif opcode == 29:
            idx = code.bytecode[pc]
            pc += 1
            name = code.consts[idx]
            print(f"obj = pop(); push(obj.{name}))")
        elif opcode == 31:
            print(f"push(pop() == pop())")
        elif opcode == 32:
            print(f"push(pop() < pop())")
        elif opcode == 33:
            print(f"push(pop() > pop())")
        elif opcode == 34:
            idx = code.bytecode[pc]
            pc += 1
            name = code.consts[idx]
            print(f"push(new_class({name!r}))")
        elif opcode == 36:
            idx = code.bytecode[pc]
            pc += 1
            name = code.consts[idx]
            print(f"tbl, obj = popn(2); tbl.{name} = obj; push(obj)")
        elif opcode == 37:
            count = code.bytecode[pc]
            pc += 1
            print(f"push(new_array(popn({count})))")
        elif opcode == 38:
            print(f"arr, idx = popn(2); push(arr[idx])")
        elif opcode == 39:
            print(f"arr, idx, obj = popn(3); arr[idx] = obj; push(obj)")
        elif opcode == 40:
            print(f"push(pop() ^ pop())")
        elif opcode == 42:
            print(f"push(pop() & pop())")
        elif opcode == 44:
            print(f"a, b = popn(2); push(a >> b )")
        else:
            raise Exception("unknown opcode %d" % opcode)

real_natives = [
    "clock",
    "append",
    "remove",
    "input",
    "len",
    "range",
    "typeof",
    "chr",
    "ord",
    "int",
    "bytes",
    "bytearray",
    "zeros",
    "exit",
]

def read_pactb(fn):
    f = FileReader(open(fn, "rb"))
    # natives table
    num_natives = f.read_u32()
    assert num_natives == 14
    natives = [f.read_obj() for _ in range(num_natives)]
    natives = dict(zip(natives, real_natives))
    program = f.read_val()

    disas(natives, program)
    for const in program.consts:
        if isinstance(const, Code):
            disas(natives, const)

import sys
read_pactb(sys.argv[1])

I deliberately chose to use a Pythonic syntax for the VM instructions, instead of the usual mnemonics, in order to make the semantics crystal-clear and make reversing easier.

Reversing bubblewrap

The disassembled bubblewrap.pactb looks like this:

[0 args, 0 upvals]
    0x0000: push(new_closure(<Code nwlrbbmqbh>))
    0x0002: nwlrbbmqbh = pop()
    0x0004: push(nwlrbbmqbh)
    0x0006: args = popn(0); pop()(*args)
    0x0008: push(0)
    0x000a: push(pop() == pop())
    0x000b: if not(top()): goto(0x001c)
    0x000e: pop()
    0x000f: push('Congrats!')
    0x0011: print(pop())
    0x0012: push(exit)
    0x0014: push(0)
    0x0016: args = popn(1); pop()(*args)
    0x0018: pop()
    0x0019: goto(0x0027)
    0x001c: pop()
    0x001d: push('Try again :(')
    0x001f: print(pop())
    0x0020: push(exit)
    0x0022: push(1)
    0x0024: args = popn(1); pop()(*args)
    0x0026: pop()
    0x0027: push(None)
    0x0028: return
nwlrbbmqbh:
[0 args, 0 upvals]
    0x0000: push(bytearray)
    0x0002: push(input)
    0x0004: push('Flag Validator: ')
    0x0006: args = popn(1); pop()(*args)
    0x0008: args = popn(1); pop()(*args)
    0x000a: push(len)
    0x000c: push(frame[1])
    0x000e: args = popn(1); pop()(*args)
    0x0010: push(36)
    0x0012: push(pop() == pop())
    0x0013: push(not pop())
    0x0014: if not(top()): goto(0x001e)
    0x0017: pop()
    0x0018: push(1)
    0x001a: return
    0x001b: goto(0x001f)
    0x001e: pop()
    0x001f: push(range)
    0x0021: push(len)
    0x0023: push(frame[1])
    0x0025: args = popn(1); pop()(*args)
    0x0027: args = popn(1); pop()(*args)
    0x0029: push(0)
    0x002b: push(frame[3])
    0x002d: push(len)
    0x002f: push(frame[1])
    0x0031: args = popn(1); pop()(*args)
    0x0033: push(pop() > pop())
    0x0034: if not(top()): goto(0x0058)
    0x0037: pop()
    0x0038: goto(0x0046)
    0x003b: push(frame[3])
    0x003d: push(1)
    0x003f: push(pop() + pop())
    0x0040: frame[3] = top()
    0x0042: pop()
    0x0043: goto(0x002b)
    0x0046: push(frame[2])
    0x0048: push(frame[3])
    0x004a: push(int)
    0x004c: push(frame[1])
    0x004e: push(frame[3])
    0x0050: arr, idx = popn(2); push(arr[idx])
    0x0051: args = popn(1); pop()(*args)
    0x0053: arr, idx, obj = popn(3); arr[idx] = obj; push(obj)
    0x0054: pop()
    0x0055: goto(0x003b)
    0x0058: pop()
    0x0059: pop()
    0x005a: push(frame[2])
    0x005c: push(1)
    0x005e: arr, idx = popn(2); push(arr[idx])
    0x005f: push(frame[2])
    0x0061: push(15)
    0x0063: arr, idx = popn(2); push(arr[idx])
    0x0064: a, b = popn(2); push(a - b)
    0x0065: push(17)
    0x0067: push(-pop())
    0x0068: push(pop() == pop())
    0x0069: push(not pop())
    0x006a: if not(top()): goto(0x0074)
    0x006d: pop()
    0x006e: push(1)
    0x0070: return
    0x0071: goto(0x0075)
    0x0074: pop()
    0x0075: push(frame[2])
    0x0077: push(25)
    0x0079: arr, idx = popn(2); push(arr[idx])
    0x007a: push(frame[2])
    0x007c: push(22)
    0x007e: arr, idx = popn(2); push(arr[idx])
    0x007f: push(pop() ^ pop())
    0x0080: push(43)
    0x0082: push(pop() == pop())
    0x0083: push(not pop())
    0x0084: if not(top()): goto(0x008e)
    0x0087: pop()
    0x0088: push(1)
    0x008a: return
...
    0x03ee: push(frame[2])
    0x03f0: push(0)
    0x03f2: arr, idx = popn(2); push(arr[idx])
    0x03f3: push(frame[2])
    0x03f5: push(21)
    0x03f7: arr, idx = popn(2); push(arr[idx])
    0x03f8: push(pop() + pop())
    0x03f9: push(198)
    0x03fb: push(pop() == pop())
    0x03fc: push(not pop())
    0x03fd: if not(top()): goto(0x0407)
    0x0400: pop()
    0x0401: push(1)
    0x0403: return
    0x0404: goto(0x0408)
    0x0407: pop()
    0x0408: push(0)
    0x040a: return
    0x040b: push(None)
    0x040c: return

The script itself is simply the following code:

def nwlrbbmqbh():
    ...

if nwlrbbmqbh() == 0:
    print("Congrats!")
    exit(0)
else:
    print("Try again :(")
    exit(1)

The main nwlrbbmqbh validation function looks like this:

def nwlrbbmqbh():
    frame1 = bytearray(input("Flag Validator: "))
    if len(frame1) != 36:
        return 1

    frame2 = range(len(frame1))
    for frame3 in range(len(frame1)):
        frame2[frame3] = int(frame1[frame3])

    if frame2[1] - frame2[15] != -17:
        return 1
    if frame2[25] ^ frame2[22] != 43:
        return 1
    ...
    if frame2[0] + frame2[21] != 198:
        return 1
    return 0

Aha, it’s just a very long sequence of constraints on the flag. We can just solve this with Z3. I used a bunch of ugly regexes to turn the disassembly into the correct expressions:

\w+: push\(frame\[2\]\)\n    \w+: push\((\d+)\)\n    \w+: arr, idx = popn\(2\); push\(arr\[idx\]\)
=>
input[\1]

    \w+: push\(pop\(\) == pop\(\)\)\n    \w+: push\(not pop\(\)\)\n    \w+: if not\(top\(\)\): goto\(\w+\)\n    \w+: pop\(\)\n    \w+: push\(1\)\n    \w+: return\n    \w+: goto\(\w+\)\n    \w+: pop\(\)\n
=>
\n

(\w+): push\((\d+)\)\n    \w+: push\(-pop\(\)\)
=>
\1: push(-\2)

input\[(\d+)\]\n    input\[(\d+)\]\n    \w+: a, b = popn\(2\); push\(a - b\)\n    \w+: push\((-?\d+)\)\n
=>
s.add(input[\1] - input[\2] == \3)

input\[(\d+)\]\n    input\[(\d+)\]\n    \w+: push\(pop\(\) ([+^]) pop\(\)\)\n    \w+: push\((-?\d+)\)\n
=>
s.add(input[\1] \3 input[\2] == \4)

This produces a nice list of Z3-compatible expressions, which we can wrap into a quick script:

from z3 import BitVec, Solver

s = Solver()

input = [BitVec("b%d" % i, 8) for i in range(36)]

# regex-extracted expressions:
s.add(input[1] - input[15] == -17)
s.add(input[25] ^ input[22] == 43)
s.add(input[19] - input[14] == 8)
s.add(input[29] - input[34] == -5)
s.add(input[23] + input[21] == 219)
s.add(input[24] + input[12] == 200)
s.add(input[35] ^ input[25] == 9)
s.add(input[14] ^ input[27] == 62)
s.add(input[22] + input[8] == 190)
s.add(input[3] + input[26] == 206)
s.add(input[32] ^ input[34] == 50)
s.add(input[21] ^ input[23] == 19)
s.add(input[7] + input[10] == 212)
s.add(input[2] + input[10] == 227)
s.add(input[17] - input[35] == -10)
s.add(input[5] + input[18] == 199)
s.add(input[15] ^ input[1] == 23)
s.add(input[30] ^ input[31] == 26)
s.add(input[18] - input[10] == -9)
s.add(input[9] - input[19] == 16)
s.add(input[31] + input[8] == 210)
s.add(input[4] - input[26] == 19)
s.add(input[10] - input[9] == -10)
s.add(input[13] + input[5] == 212)
s.add(input[6] ^ input[13] == 1)
s.add(input[28] ^ input[20] == 17)
s.add(input[34] - input[30] == 4)
s.add(input[11] ^ input[2] == 1)
s.add(input[16] + input[11] == 222)
s.add(input[8] ^ input[18] == 57)
s.add(input[20] ^ input[0] == 7)
s.add(input[27] ^ input[28] == 43)
s.add(input[26] - input[17] == -11)
s.add(input[12] ^ input[31] == 44)
s.add(input[33] - input[8] == 23)
s.add(input[0] + input[21] == 198)

# extra flag format constraints
s.add(input[0] == ord('b'))
s.add(input[1] == ord('c'))

s.check()
m = s.model()
print(bytes([m[p].as_long() for p in input]))

Note that we needed to constrain the first two bytes using the flag format to get a meaningful result. When we run this program, we get the flag:

bctf{are_you_satisfied_with_this_vm}

Part 2 - Search and Rescue

Problem Description

  • Solves: 1
  • Score: 500
  • Category: rev

Search for the flag through the forest. The server provides hex dumps of files runnable with the vm from bubble_wrap.

nc ctf.b01lers.com 9304

Author: jrm0xE

Difficulty: Hard

Attachments:

  • pactvm 46.26 KB MD5: 578c21735cd31e8537b9063268eef5f4

Note that this is the same pactvm binary as the bubblewrap challenge.

Introduction

When we connect to the given server, we’re given an xxd hexdump followed by a prompt:

00000000: 0e00 0000 0300 0000 0a00 0000 6279 6e65  ............byne
00000010: 6364 7967 6778 0300 0000 0a00 0000 7870  cdyggx........xp
00000020: 6b6c 6f72 656c 6c6e 0300 0000 0a00 0000  klorelln........
00000030: 6d70 6170 7166 776b 686f 0300 0000 0a00  mpapqfwkho......
00000040: 0000 706b 6d63 6f71 686e 776e 0300 0000  ..pkmcoqhnwn....
00000050: 0a00 0000 6b75 6577 6873 716d 6762 0300  ....kuewhsqmgb..
00000060: 0000 0a00 0000 6275 7163 6c6a 6a69 7673  ......buqcljjivs
...
00001fb0: 0000 0300 0000 0a00 0000 6d6c 6e6f 7a6a  ..........mlnozj
00001fc0: 6b70 7170 0300 0000 0100 0000 0000 0000  kpqp............
00001fd0: 0500 0000 0300 0000 0a00 0000 6d6c 6e6f  ............mlno
00001fe0: 7a6a 6b70 7170 0300 0000 0000 0000 0000  zjkpqp..........
00001ff0: 0000 00                                  ...
> 

We can enter something, after which the connection just closes. Connecting again gives us a slightly different binary. Saving the hexdump, using xxd -r to reverse it back into a binary, and running it with pactvm produces the same > prompt.

Reversing the Program

The disassembly of a sample program is quite long:

[0 args, 0 upvals]
    0x0000: push(new_class('nwlrbbmqbh'))
    0x0002: nwlrbbmqbh = pop()
    0x0004: push(nwlrbbmqbh)
    0x0006: push(new_closure(<Code init>))
    0x0008: tbl, obj = popn(2); tbl.init = obj; push(obj)
    0x000a: pop()
    0x000b: push(new_closure(<Code hiddqscdxr>))
    0x000d: hiddqscdxr = pop()
    0x000f: push(new_closure(<Code jmowfrxsjy>))
    0x0011: jmowfrxsjy = pop()
    0x0013: push(new_closure(<Code bldbefsarc>))
    0x0015: bldbefsarc = pop()
    0x0017: push(bldbefsarc)
    0x0019: args = popn(0); pop()(*args)
    0x001b: q = pop()
    0x001d: push('MJZt0FWGkaqpSGqMphfCpDqRgW')
    0x001f: k = pop()
    0x0021: push(input)
    0x0023: push('> ')
    0x0025: args = popn(1); pop()(*args)
    0x0027: path = pop()
    0x0029: push(jmowfrxsjy)
    0x002b: push(q)
    0x002d: args = popn(1); pop()(*args)
    0x002f: root = pop()
    0x0031: push(root)
    0x0033: cur = pop()
    0x0035: push(new_array(popn(0)))
    0x0037: c = pop()
    0x0039: push(0)
    0x003b: push(frame[1])
    0x003d: push(len)
    0x003f: push(path)
    0x0041: args = popn(1); pop()(*args)
    0x0043: push(pop() > pop())
    0x0044: if not(top()): goto(0x00cf)
    0x0047: pop()
    0x0048: goto(0x0056)
    0x004b: push(frame[1])
    0x004d: push(1)
    0x004f: push(pop() + pop())
    0x0050: frame[1] = top()
    0x0052: pop()
    0x0053: goto(0x003b)
    0x0056: push(ord)
    0x0058: push(path)
    0x005a: push(frame[1])
    0x005c: arr, idx = popn(2); push(arr[idx])
    0x005d: args = popn(1); pop()(*args)
    0x005f: push(0)
    0x0061: push(frame[3])
    0x0063: push(8)
    0x0065: push(pop() > pop())
    0x0066: if not(top()): goto(0x00c9)
    0x0069: pop()
    0x006a: goto(0x0078)
    0x006d: push(frame[3])
    0x006f: push(1)
    0x0071: push(pop() + pop())
    0x0072: frame[3] = top()
    0x0074: pop()
    0x0075: goto(0x0061)
    0x0078: push(cur)
    0x007a: obj = pop(); push(obj.l))
    0x007c: push(None)
    0x007d: push(pop() == pop())
    0x007e: if not(top()): goto(0x0088)
    0x0081: pop()
    0x0082: push(cur)
    0x0084: obj = pop(); push(obj.r))
    0x0086: push(None)
    0x0087: push(pop() == pop())
    0x0088: if not(top()): goto(0x009f)
    0x008b: pop()
    0x008c: push(append)
    0x008e: push(c)
    0x0090: push(cur)
    0x0092: obj = pop(); push(obj.v))
    0x0094: args = popn(2); pop()(*args)
    0x0096: pop()
    0x0097: push(root)
    0x0099: cur = top()
    0x009b: pop()
    0x009c: goto(0x00a0)
    0x009f: pop()
    0x00a0: push(frame[2])
    0x00a2: push(1)
    0x00a4: push(pop() & pop())
    0x00a5: push(1)
    0x00a7: push(pop() == pop())
    0x00a8: if not(top()): goto(0x00b6)
    0x00ab: pop()
    0x00ac: push(cur)
    0x00ae: obj = pop(); push(obj.r))
    0x00b0: cur = top()
    0x00b2: pop()
    0x00b3: goto(0x00be)
    0x00b6: pop()
    0x00b7: push(cur)
    0x00b9: obj = pop(); push(obj.l))
    0x00bb: cur = top()
    0x00bd: pop()
    0x00be: push(frame[2])
    0x00c0: push(1)
    0x00c2: a, b = popn(2); push(a >> b )
    0x00c3: frame[2] = top()
    0x00c5: pop()
    0x00c6: goto(0x006d)
    0x00c9: pop()
    0x00ca: pop()
    0x00cb: pop()
    0x00cc: goto(0x004b)
    0x00cf: pop()
    0x00d0: pop()
    0x00d1: push(len)
    0x00d3: push(c)
    0x00d5: args = popn(1); pop()(*args)
    0x00d7: push(len)
    0x00d9: push(k)
    0x00db: args = popn(1); pop()(*args)
    0x00dd: push(pop() == pop())
    0x00de: push(not pop())
    0x00df: if not(top()): goto(0x00ed)
    0x00e2: pop()
    0x00e3: push(exit)
    0x00e5: push(1)
    0x00e7: args = popn(1); pop()(*args)
    0x00e9: pop()
    0x00ea: goto(0x00ee)
    0x00ed: pop()
    0x00ee: push(0)
    0x00f0: push(frame[1])
    0x00f2: push(len)
    0x00f4: push(k)
    0x00f6: args = popn(1); pop()(*args)
    0x00f8: push(pop() > pop())
    0x00f9: if not(top()): goto(0x0135)
    0x00fc: pop()
    0x00fd: goto(0x010b)
    0x0100: push(frame[1])
    0x0102: push(1)
    0x0104: push(pop() + pop())
    0x0105: frame[1] = top()
    0x0107: pop()
    0x0108: goto(0x00f0)
    0x010b: push(c)
    0x010d: push(frame[1])
    0x010f: arr, idx = popn(2); push(arr[idx])
    0x0110: push(k)
    0x0112: push(frame[1])
    0x0114: arr, idx = popn(2); push(arr[idx])
    0x0115: push(pop() == pop())
    0x0116: push(not pop())
    0x0117: if not(top()): goto(0x0131)
    0x011a: pop()
    0x011b: push(c)
    0x011d: push(frame[1])
    0x011f: arr, idx = popn(2); push(arr[idx])
    0x0120: print(pop())
    0x0121: push(k)
    0x0123: push(frame[1])
    0x0125: arr, idx = popn(2); push(arr[idx])
    0x0126: print(pop())
    0x0127: push(exit)
    0x0129: push(1)
    0x012b: args = popn(1); pop()(*args)
    0x012d: pop()
    0x012e: goto(0x0132)
    0x0131: pop()
    0x0132: goto(0x0100)
    0x0135: pop()
    0x0136: pop()
    0x0137: push(exit)
    0x0139: push(0)
    0x013b: args = popn(1); pop()(*args)
    0x013d: pop()
    0x013e: push(None)
    0x013f: return
init:
[1 args, 0 upvals]
    0x0000: push(frame[0])
    0x0002: push(None)
    0x0003: tbl, obj = popn(2); tbl.l = obj; push(obj)
    0x0005: pop()
    0x0006: push(frame[0])
    0x0008: push(None)
    0x0009: tbl, obj = popn(2); tbl.r = obj; push(obj)
    0x000b: pop()
    0x000c: push(frame[0])
    0x000e: push(frame[1])
    0x0010: tbl, obj = popn(2); tbl.v = obj; push(obj)
    0x0012: pop()
    0x0013: push(frame[0])
    0x0015: push(0)
    0x0017: tbl, obj = popn(2); tbl.f = obj; push(obj)
    0x0019: pop()
    0x001a: push(frame[0])
    0x001c: return
hiddqscdxr:
[2 args, 0 upvals]
    0x0000: push(nwlrbbmqbh)
    0x0002: push(frame[2])
    0x0004: args = popn(1); pop()(*args)
    0x0006: push(frame[3])
    0x0008: push(frame[2])
    0x000a: obj = pop(); push(obj.f))
    0x000c: tbl, obj = popn(2); tbl.f = obj; push(obj)
    0x000e: pop()
    0x000f: push(frame[1])
    0x0011: push(None)
    0x0012: push(pop() == pop())
    0x0013: if not(top()): goto(0x001d)
    0x0016: pop()
    0x0017: push(frame[3])
    0x0019: return
    0x001a: goto(0x001e)
    0x001d: pop()
    0x001e: push(frame[1])
    0x0020: push(frame[4])
    0x0022: obj = pop(); push(obj.r))
    0x0024: push(None)
    0x0025: push(pop() == pop())
    0x0026: push(not pop())
    0x0027: if not(top()): goto(0x0036)
    0x002a: pop()
    0x002b: push(frame[4])
    0x002d: obj = pop(); push(obj.r))
    0x002f: obj = pop(); push(obj.f))
    0x0031: push(frame[3])
    0x0033: obj = pop(); push(obj.f))
    0x0035: push(pop() > pop())
    0x0036: if not(top()): goto(0x0044)
    0x0039: pop()
    0x003a: push(frame[4])
    0x003c: obj = pop(); push(obj.r))
    0x003e: frame[4] = top()
    0x0040: pop()
    0x0041: goto(0x0020)
    0x0044: pop()
    0x0045: push(frame[3])
    0x0047: push(frame[4])
    0x0049: obj = pop(); push(obj.r))
    0x004b: tbl, obj = popn(2); tbl.r = obj; push(obj)
    0x004d: pop()
    0x004e: push(frame[4])
    0x0050: push(frame[3])
    0x0052: tbl, obj = popn(2); tbl.r = obj; push(obj)
    0x0054: pop()
    0x0055: push(frame[1])
    0x0057: return
    0x0058: push(None)
    0x0059: return
jmowfrxsjy:
[1 args, 0 upvals]
    0x0000: push(None)
    0x0001: push(0)
    0x0003: push(frame[3])
    0x0005: push(len)
    0x0007: push(frame[1])
    0x0009: args = popn(1); pop()(*args)
    0x000b: push(pop() > pop())
    0x000c: if not(top()): goto(0x0052)
    0x000f: pop()
    0x0010: goto(0x001e)
    0x0013: push(frame[3])
    0x0015: push(1)
    0x0017: push(pop() + pop())
    0x0018: frame[3] = top()
    0x001a: pop()
    0x001b: goto(0x0003)
    0x001e: push(frame[1])
    0x0020: push(frame[3])
    0x0022: arr, idx = popn(2); push(arr[idx])
    0x0023: push(0)
    0x0025: push(pop() == pop())
    0x0026: push(not pop())
    0x0027: if not(top()): goto(0x004e)
    0x002a: pop()
    0x002b: push(nwlrbbmqbh)
    0x002d: push(chr)
    0x002f: push(frame[3])
    0x0031: args = popn(1); pop()(*args)
    0x0033: args = popn(1); pop()(*args)
    0x0035: push(frame[4])
    0x0037: push(frame[1])
    0x0039: push(frame[3])
    0x003b: arr, idx = popn(2); push(arr[idx])
    0x003c: tbl, obj = popn(2); tbl.f = obj; push(obj)
    0x003e: pop()
    0x003f: push(hiddqscdxr)
    0x0041: push(frame[2])
    0x0043: push(frame[4])
    0x0045: args = popn(2); pop()(*args)
    0x0047: frame[2] = top()
    0x0049: pop()
    0x004a: pop()
    0x004b: goto(0x004f)
    0x004e: pop()
    0x004f: goto(0x0013)
    0x0052: pop()
    0x0053: pop()
    0x0054: push(frame[2])
    0x0056: push(None)
    0x0057: push(pop() == pop())
    0x0058: push(not pop())
    0x0059: if not(top()): goto(0x0064)
    0x005c: pop()
    0x005d: push(frame[2])
    0x005f: obj = pop(); push(obj.r))
    0x0061: push(None)
    0x0062: push(pop() == pop())
    0x0063: push(not pop())
    0x0064: if not(top()): goto(0x00b1)
    0x0067: pop()
    0x0068: push(frame[2])
    0x006a: obj = pop(); push(obj.v))
    0x006c: push(frame[2])
    0x006e: obj = pop(); push(obj.r))
    0x0070: frame[2] = top()
    0x0072: pop()
    0x0073: push(frame[2])
    0x0075: obj = pop(); push(obj.v))
    0x0077: push(frame[2])
    0x0079: obj = pop(); push(obj.r))
    0x007b: frame[2] = top()
    0x007d: pop()
    0x007e: push(nwlrbbmqbh)
    0x0080: push(0)
    0x0082: args = popn(1); pop()(*args)
    0x0084: push(frame[5])
    0x0086: push(frame[3])
    0x0088: obj = pop(); push(obj.f))
    0x008a: push(frame[4])
    0x008c: obj = pop(); push(obj.f))
    0x008e: push(pop() + pop())
    0x008f: tbl, obj = popn(2); tbl.f = obj; push(obj)
    0x0091: pop()
    0x0092: push(frame[5])
    0x0094: push(frame[3])
    0x0096: tbl, obj = popn(2); tbl.l = obj; push(obj)
    0x0098: pop()
    0x0099: push(frame[5])
    0x009b: push(frame[4])
    0x009d: tbl, obj = popn(2); tbl.r = obj; push(obj)
    0x009f: pop()
    0x00a0: push(hiddqscdxr)
    0x00a2: push(frame[2])
    0x00a4: push(frame[5])
    0x00a6: args = popn(2); pop()(*args)
    0x00a8: frame[2] = top()
    0x00aa: pop()
    0x00ab: pop()
    0x00ac: pop()
    0x00ad: pop()
    0x00ae: goto(0x0054)
    0x00b1: pop()
    0x00b2: push(frame[2])
    0x00b4: obj = pop(); push(obj.v))
    0x00b6: return
    0x00b7: push(None)
    0x00b8: return
bldbefsarc:
[0 args, 0 upvals]
    0x0000: push(zeros)
    0x0002: push(256)
    0x0004: args = popn(1); pop()(*args)
    0x0006: push(18)
    0x0008: push(19)
    0x000a: push(20)
    0x000c: push(14)
    0x000e: push(11)
    0x0010: push(21)
    0x0012: push(17)
    0x0014: push(13)
    0x0016: push(18)
    0x0018: push(15)
    0x001a: push(new_array(popn(10)))
    0x001c: push(14)
    0x001e: push(10)
    0x0020: push(16)
    0x0022: push(14)
    0x0024: push(17)
    0x0026: push(17)
    0x0028: push(15)
    0x002a: push(14)
    0x002c: push(19)
    0x002e: push(17)
    0x0030: push(24)
    0x0032: push(15)
    0x0034: push(8)
    0x0036: push(19)
    0x0038: push(16)
    0x003a: push(14)
    0x003c: push(14)
    0x003e: push(25)
    0x0040: push(9)
    0x0042: push(12)
    0x0044: push(17)
    0x0046: push(10)
    0x0048: push(8)
    0x004a: push(15)
    0x004c: push(11)
    0x004e: push(21)
    0x0050: push(new_array(popn(26)))
    0x0052: push(11)
    0x0054: push(13)
    0x0056: push(11)
    0x0058: push(17)
    0x005a: push(16)
    0x005c: push(16)
    0x005e: push(14)
    0x0060: push(22)
    0x0062: push(14)
    0x0064: push(18)
    0x0066: push(19)
    0x0068: push(15)
    0x006a: push(18)
    0x006c: push(17)
    0x006e: push(16)
    0x0070: push(18)
    0x0072: push(8)
    0x0074: push(13)
    0x0076: push(12)
    0x0078: push(19)
    0x007a: push(17)
    0x007c: push(19)
    0x007e: push(18)
    0x0080: push(12)
    0x0082: push(12)
    0x0084: push(17)
    0x0086: push(new_array(popn(26)))
    0x0088: push(ord)
    0x008a: push('0')
    0x008c: args = popn(1); pop()(*args)
    0x008e: push(frame[5])
    0x0090: push(ord)
    0x0092: push('9')
    0x0094: args = popn(1); pop()(*args)
    0x0096: push(pop() < pop())
    0x0097: push(not pop())
    0x0098: if not(top()): goto(0x00bf)
    0x009b: pop()
    0x009c: goto(0x00aa)
    0x009f: push(frame[5])
    0x00a1: push(1)
    0x00a3: push(pop() + pop())
    0x00a4: frame[5] = top()
    0x00a6: pop()
    0x00a7: goto(0x008e)
    0x00aa: push(frame[1])
    0x00ac: push(frame[5])
    0x00ae: push(frame[2])
    0x00b0: push(frame[5])
    0x00b2: push(ord)
    0x00b4: push('0')
    0x00b6: args = popn(1); pop()(*args)
    0x00b8: a, b = popn(2); push(a - b)
    0x00b9: arr, idx = popn(2); push(arr[idx])
    0x00ba: arr, idx, obj = popn(3); arr[idx] = obj; push(obj)
    0x00bb: pop()
    0x00bc: goto(0x009f)
    0x00bf: pop()
    0x00c0: pop()
    0x00c1: push(ord)
    0x00c3: push('a')
    0x00c5: args = popn(1); pop()(*args)
    0x00c7: push(frame[5])
    0x00c9: push(ord)
    0x00cb: push('z')
    0x00cd: args = popn(1); pop()(*args)
    0x00cf: push(pop() < pop())
    0x00d0: push(not pop())
    0x00d1: if not(top()): goto(0x00f8)
    0x00d4: pop()
    0x00d5: goto(0x00e3)
    0x00d8: push(frame[5])
    0x00da: push(1)
    0x00dc: push(pop() + pop())
    0x00dd: frame[5] = top()
    0x00df: pop()
    0x00e0: goto(0x00c7)
    0x00e3: push(frame[1])
    0x00e5: push(frame[5])
    0x00e7: push(frame[3])
    0x00e9: push(frame[5])
    0x00eb: push(ord)
    0x00ed: push('a')
    0x00ef: args = popn(1); pop()(*args)
    0x00f1: a, b = popn(2); push(a - b)
    0x00f2: arr, idx = popn(2); push(arr[idx])
    0x00f3: arr, idx, obj = popn(3); arr[idx] = obj; push(obj)
    0x00f4: pop()
    0x00f5: goto(0x00d8)
    0x00f8: pop()
    0x00f9: pop()
    0x00fa: push(ord)
    0x00fc: push('A')
    0x00fe: args = popn(1); pop()(*args)
    0x0100: push(frame[5])
    0x0102: push(ord)
    0x0104: push('Z')
    0x0106: args = popn(1); pop()(*args)
    0x0108: push(pop() < pop())
    0x0109: push(not pop())
    0x010a: if not(top()): goto(0x0131)
    0x010d: pop()
    0x010e: goto(0x011c)
    0x0111: push(frame[5])
    0x0113: push(1)
    0x0115: push(pop() + pop())
    0x0116: frame[5] = top()
    0x0118: pop()
    0x0119: goto(0x0100)
    0x011c: push(frame[1])
    0x011e: push(frame[5])
    0x0120: push(frame[4])
    0x0122: push(frame[5])
    0x0124: push(ord)
    0x0126: push('A')
    0x0128: args = popn(1); pop()(*args)
    0x012a: a, b = popn(2); push(a - b)
    0x012b: arr, idx = popn(2); push(arr[idx])
    0x012c: arr, idx, obj = popn(3); arr[idx] = obj; push(obj)
    0x012d: pop()
    0x012e: goto(0x0111)
    0x0131: pop()
    0x0132: pop()
    0x0133: push(frame[1])
    0x0135: push(ord)
    0x0137: push('{')
    0x0139: args = popn(1); pop()(*args)
    0x013b: push(13)
    0x013d: arr, idx, obj = popn(3); arr[idx] = obj; push(obj)
    0x013e: pop()
    0x013f: push(frame[1])
    0x0141: push(ord)
    0x0143: push('}')
    0x0145: args = popn(1); pop()(*args)
    0x0147: push(19)
    0x0149: arr, idx, obj = popn(3); arr[idx] = obj; push(obj)
    0x014a: pop()
    0x014b: push(frame[1])
    0x014d: push(ord)
    0x014f: push('_')
    0x0151: args = popn(1); pop()(*args)
    0x0153: push(9)
    0x0155: arr, idx, obj = popn(3); arr[idx] = obj; push(obj)
    0x0156: pop()
    0x0157: push(frame[1])
    0x0159: return
    0x015a: push(None)
    0x015b: return

It’s a lot of code. Luckily, after reversing bubblewrap, we have a general feel for the structure of the code. This program uses classes, specifically a class called nwlrbbmqbh, along with a constructor function called init which creates instance variables l, r, v, and f. Combined with the problem description (“Search for the flag through the forest.”), I made a guess that this class represented a binary tree node with children l and r, and value v.

Reversing this program took some time, but eventually resulted in the following Python implementation:

append = list.append

class nwlrbbmqbh:
    def __init__(self, arg):
        self.l = None
        self.r = None
        self.v = arg
        self.f = 0

def main():
    q = bldbefsarc()
    k = 'MJZt0FWGkaqpSGqMphfCpDqRgW'
    path = input('> ')
    root = jmowfrxsjy(q)
    cur = root
    c = []
    for f1 in range(len(path)):
        val = ord(path[f1])
        for f3 in range(8):
            if cur.l is None and cur.r is None:
                append(c, cur.v)
                cur = root
            if (val & 1) == 1:
                cur = cur.r
            else:
                cur = cur.l
            val >>= 1

    if len(c) != len(k):
        exit(1)

    for f1 in range(len(k)):
        if c[f1] != k[f1]:
            print(c[f1])
            print(k[f1])
            exit(1)

    exit(0)

def hiddqscdxr(f1, f2):
    f3 = nwlrbbmqbh(f2)
    f3.f = f2.f
    if f1 is None:
        return f3

    f4 = f1
    while f4.r is not None and f3.f > f4.r.f:
        f4 = f4.r

    f3.r = f4.r
    f4.r = f3
    return f1

def jmowfrxsjy(f1):
    f2 = None
    for f3 in range(len(f1)):
        if f1[f3] != 0:
            f4 = nwlrbbmqbh(chr(f3))
            f4.f = f1[f3]
            f2 = hiddqscdxr(f2, f4)
    while f2 is not None and f2.r is not None:
        f3 = f2.v
        f2 = f2.r
        f4 = f2.v
        f2 = f2.r
        f5 = nwlrbbmqbh(0)
        f5.f = f3.f + f4.f
        f5.l = f3
        f5.r = f4
        f2 = hiddqscdxr(f2, f5)
    return f2.v

def bldbefsarc():
    f1 = [0] * 256
    f2 = [18, 19, 20, 14, 11, 21, 17, 13, 18, 15]
    f3 = [14, 10, 16, 14, 17, 17, 15, 14, 19, 17, 24, 15, 8, 19, 16, 14, 14, 25, 9, 12, 17, 10, 8, 15, 11, 21]
    f4 = [11, 13, 11, 17, 16, 16, 14, 22, 14, 18, 19, 15, 18, 17, 16, 18, 8, 13, 12, 19, 17, 19, 18, 12, 12, 17]
    f5 = ord('0')
    while f5 <= ord('9'):
        f1[f5] = f2[f5 - ord('0')]
        f5 += 1

    f5 = ord('a')
    while f5 <= ord('z'):
        f1[f5] = f3[f5 - ord('a')]
        f5 += 1

    f5 = ord('A')
    while f5 <= ord('Z'):
        f1[f5] = f4[f5 - ord('A')]
        f5 += 1

    f1[ord('{')] = 13
    f1[ord('}')] = 19
    f1[ord('_')] = 9
    return f1

The program, in essence, does is the following:

  • bldbefsarc initializes an array of 256 ints, one per byte, in which each int is a “frequency count” (or zero if the int is not present)
  • jmowfrxsjy converts the frequency count array into a Huffman code binary tree. It does this by using hiddqscdxr to insert all of the non-zero frequency counts into a sorted linked list (formatted as a tree with only right children), then repeatedly taking the lowest two counts and joining them into a single tree node. The result is a binary tree which efficiently encodes strings with the provided frequency distribution. (There’s a bug in the sorting algorithm which results in the first element being out of order, so we have to use this specific Huffman tree implementation to attain bug compatibility).
  • main (artificial name for the script “function”) then takes your input, treats it as a bit string, and repeatedly reads Huffman codes from it. It then compares the resulting decoded characters with the expected output.

The server’s randomly-generated binaries only differ in the frequency table in bldbefsarc and the expected output in main.

Armed with the knowledge of this implementation, reversing it is easy: we just have to take the desired output and encode it using the same Huffman code tree into a bitstring. As it turns out, once you get the input correct, the server sends another random program and expects another input. You have to successfully pass five rounds of this to get the flag.

One extra detail was that the bitstring decoded by main must be a multiple of 8 bits in length. However, the Huffman code for the final expected character might not end at a byte boundary, so we have to pad the final byte out to 8 bits. If we choose the wrong padding bits, however, we might end up accidentally adding a valid Huffman code and producing an extra character. Lazily, I decided to just bruteforce the correct final character by calling the pactvm program with every possibility, although this did require patching the program to fix a bug in the native input function(): by default, it reads input from fd 1 (stdout), and I had to patch it to read from fd 0 (stdin) instead.

Aside - Line Numbers

The bytecode comes with line numbers, which made me curious to know if you could use line numbers to infer the syntax of the programming language that compiles down to pactb. In fact, it turns out you can, to some extent! The line numbers tell us that there’s a for-range syntax, and that there are likely braces (or at least explicit “end” markers) at the end of blocks.

The following disassembly, presented in very mangled C/Python-ish syntax, presents each line of code on the line indicated in the bytecode (some line breaks were added/removed to make line numbers correspond). Some bytecode corresponds to things like automatic nil returns and loop jumps, and corresponds to close-braces in the code below.

class nwlrbbmqbh {
    def init(arg) {
        this.l = nil
        this.r = nil
        this.v = arg
        this.f = 0
        return this
    }
}
def hiddqscdxr(f1, f2) {
    f3 = nwlrbbmqbh(f2)
    f3.f = f2.f
    if f1 == nil {
        return f3
    }

    f4 = f1
    while f4.r != nil and f3.f > f4.r.f {
        f4 = f4.r
    }
    f3.r = f4.r
    f4.r = f3
    return f1
}

def jmowfrxsjy(f1) {
    f2 = nil
    for f3 = 0; f3 < len(f1); f3++ {
        if f1[f3] != 0 {
            f4 = nwlrbbmqbh(chr(f3))
            f4.f = f1[f3]
            f2 = hiddqscdxr(f2, f4)
        }
    }
    while f2 != nil and f2.r != nil {
        f3 = f2.v
        f2 = f2.r
        f4 = f2.v
        f2 = f2.r
        f5 = nwlrbbmqbh(0)
        f5.f = f3.f + f4.f
        f5.l = f3
        f5.r = f4
        f2 = hiddqscdxr(f2, f5)
    }
    return f2.v
}

def bldbefsarc() {
    f1 = zeros(256)

    f2 = [18, 19, 20, 14, 11, 21, 17, 13, 18, 15]
    f3 = [14, 10, 16, 14, 17, 17, 15, 14, 19, 17, 24, 15, 8, 19, 16, 14, 14, 25, 9, 12, 17, 10, 8, 15, 11, 21]
    f4 = [11, 13, 11, 17, 16, 16, 14, 22, 14, 18, 19, 15, 18, 17, 16, 18, 8, 13, 12, 19, 17, 19, 18, 12, 12, 17]

    for f5 = ord('0'); f5 <= ord('9'); f5++ {
        f1[f5] = f2[f5 - ord('0')]
    }

    for f5 = ord('a'); f5 <= ord('z'); f5++ {
        f1[f5] = f3[f5 - ord('a')]
    }

    for f5 = ord('A'); f5 <= ord('Z'); f5++ {
        f1[f5] = f4[f5 - ord('A')]
    }
    f1[ord('{')] = 13
    f1[ord('}')] = 19
    f1[ord('_')] = 9
    return f1
}

q = bldbefsarc()
k = 'MJZt0FWGkaqpSGqMphfCpDqRgW'
path = input('> ')
root = jmowfrxsjy(q)
cur = root
c = []

for f1 = 0; f1 < len(path); f1++ {
    val = ord(path[f1])
    for f3 = 0; f3 < 8; f3++ {
        if cur.l == nil and cur.r == nil {
            append(c, cur.v)
            cur = root
        }
        if (val & 1) == 1 {
            cur = cur.r
        } else {
            cur = cur.l
        }
        val >>= 1
    }
}

if len(c) != len(k) {
    exit(1)
}
for f1 = 0; f1 < len(k); f1++ {
    if c[f1] != k[f1] {
        print(c[f1])
        print(k[f1])
        exit(1)
    }
}
exit(0)

Solution

All we need to do is extract out the new frequency table and expected output in each iteration and build the expected input. Here’s the script to do that; attach this to the Python code above:

def get_input(q, k):
    root = jmowfrxsjy(q)
    lookup = {}
    def visit(cur, path):
        if cur.l:
            visit(cur.l, path + "0")
        if cur.r:
            visit(cur.r, path + "1")
        if cur.l is None and cur.r is None:
            lookup[cur.v] = path
    visit(root, "")

    fullpath = "".join(lookup[c] for c in k)
    out = bytearray()
    for i in range(0, len(fullpath) - 8, 8):
        ch = 0
        for j in range(8):
            ch |= int(fullpath[i+j]) << j
        out.append(ch)
    return out

from pwn import *
import subprocess
import re
import sys
import ast

# context.update(log_level='debug')

s = remote('ctf.b01lers.com', 9304)
for i in range(5):
    hexdump = s.recvuntil(b"\n> ", drop=True)
    with open('test.xxd', 'wb') as f:
        f.write(hexdump)
    subprocess.check_call(["xxd", "-r", "test.xxd", "test.bin"])
    disas = subprocess.check_output([sys.executable, "disas.py", "test.bin"])
    with open('test.txt', 'wb') as f:
        f.write(disas)
    disas = disas.decode()

    k = re.findall(r"push\(([^)]+)\)\n.+k = pop\(\)", disas)[0]
    print(k)
    k = ast.literal_eval(k)
    _, x = disas.split("bldbefsarc:\n", 1)
    pushes = [int(c) for c in re.findall(r"push\((\d+)\)", x)]
    assert pushes[0] == 256
    digits = pushes[1:11]
    lcase = pushes[11:37]
    ucase = pushes[37:63]
    assert pushes[63] == 1
    assert pushes[64] == 1
    assert pushes[65] == 1
    lbrace = pushes[66]
    rbrace = pushes[67]
    uscore = pushes[68]
    assert len(pushes) == 69

    q = [0] * 256
    f5 = ord('0')
    while f5 <= ord('9'):
        q[f5] = digits[f5 - ord('0')]
        f5 += 1

    f5 = ord('a')
    while f5 <= ord('z'):
        q[f5] = lcase[f5 - ord('a')]
        f5 += 1

    f5 = ord('A')
    while f5 <= ord('Z'):
        q[f5] = ucase[f5 - ord('A')]
        f5 += 1

    q[ord('{')] = lbrace
    q[ord('}')] = rbrace
    q[ord('_')] = uscore

    poss = get_input(q, k)
    log.info("Prefix: %s", poss)
    # The requirement for the trailing bits is that they cannot produce a valid
    # path; otherwise, an extra character would be appended. We could work out
    # what paths would work, but it'll be easier to just bruteforce it.
    for last in range(32, 128):
        proc = subprocess.Popen(["../pactvm.patched", "test.bin"], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        stdout, stderr = proc.communicate(poss + bytes([last, 10]))
        if proc.returncode == 0:
            password = poss + bytes([last])
            log.info("Valid password: %s", password)
            break

    s.sendline(password)

s.interactive()

This quickly chews through all five iterations and spits out our flag:

bctf{search_the_forest_through_the_trees}