JCCIDC
Joint Cognitive Control
& Intelligent Design Company

Building a Compiler from Scratch

TypeScript · Node.js · Recursive Descent Parser · AST · Bytecode Generation · 2025

Overview

A production-grade compiler, written in TypeScript, for a domain-specific scripting language used in an embedded hardware automation environment. It is built as a five-stage pipeline targeting a constrained bytecode runtime, with full language coverage, actionable error messages, and deep integration with IDE tooling.

The Challenge

The target language runs in a constrained embedded environment with limited memory, strict frame timing, and unusual control-flow constructs. The goal was a modern compiler pipeline that slots into the existing ecosystem while delivering a better developer experience.

Architecture

Source Code (.gpc)
      |
  [Scanner]  -- Tokenization (keywords, literals, operators)
      |
  [Parser]   -- Recursive descent, AST construction
      |
  [Analyzer] -- Semantic analysis, scope resolution, type checking
      |
  [Compiler] -- AST to intermediate representation
      |
  [Generator] -- IR to bytecode assembly + raw binary
      |
Output (.bin)
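
As a rough illustration of how stages like these can compose, the sketch below models each stage as a function from one representation to the next and stops at the first stage that reports diagnostics. Every name here (Stage, StageResult, pipe, the "E001" code, the toy stages) is hypothetical and not taken from the actual compiler.

```typescript
// Hypothetical stage signatures; the real compiler's types are not shown in this case study.
type Diagnostic = { code: string; message: string; line: number };

type StageResult<T> =
  | { ok: true; value: T }
  | { ok: false; diagnostics: Diagnostic[] };

type Stage<I, O> = (input: I) => StageResult<O>;

// Chain two stages, stopping at the first one that reports diagnostics.
function pipe<A, B, C>(first: Stage<A, B>, second: Stage<B, C>): Stage<A, C> {
  return (input: A): StageResult<C> => {
    const r = first(input);
    return r.ok ? second(r.value) : r;
  };
}

// Toy stand-in for the scanner: split on whitespace, reject empty input.
const scan: Stage<string, string[]> = (src) =>
  src.trim().length === 0
    ? { ok: false, diagnostics: [{ code: "E001", message: "empty source", line: 1 }] }
    : { ok: true, value: src.trim().split(/\s+/) };

// Toy stand-in for a later stage: it only counts the tokens it receives.
const countTokens: Stage<string[], number> = (tokens) => ({ ok: true, value: tokens.length });

const toyPipeline = pipe(scan, countTokens);
```

Because each stage has a single input and output type, it can be tested in isolation by calling it directly with a hand-built input.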

Key Technical Decisions

Recursive Descent over Parser Generators — Chose hand-written recursive descent for full control over error messages and recovery. The language has unusual constructs (combo blocks, hardware-specific keywords) that would fight a generated parser.
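
To make the error-recovery argument concrete, here is a minimal sketch of a hand-written recursive descent parser with panic-mode recovery. It assumes a toy grammar (IDENT "=" NUMBER ";") and pre-split string tokens; neither reflects the real language, and all names are hypothetical.

```typescript
// Hypothetical toy grammar:  program := { IDENT "=" NUMBER ";" }
type Assign = { kind: "assign"; name: string; value: number };
type ParseError = { kind: "error"; message: string; tokenIndex: number };

function parseProgram(tokens: string[]): { nodes: Assign[]; errors: ParseError[] } {
  const nodes: Assign[] = [];
  const errors: ParseError[] = [];
  let pos = 0;

  // Build an error that says what the parser wanted and what it saw instead.
  const fail = (what: string): ParseError => ({
    kind: "error",
    message: `expected ${what}, found ${tokens[pos] ?? "end of input"}`,
    tokenIndex: pos,
  });

  // Consume the current token if it matches; otherwise leave the position unchanged.
  const accept = (pattern: RegExp): string | undefined => {
    const tok = tokens[pos];
    if (tok !== undefined && pattern.test(tok)) {
      pos++;
      return tok;
    }
    return undefined;
  };

  // statement := IDENT "=" NUMBER ";"
  function parseAssign(): Assign | ParseError {
    const name = accept(/^[A-Za-z_]\w*$/);
    if (name === undefined) return fail("identifier");
    if (accept(/^=$/) === undefined) return fail("'='");
    const num = accept(/^\d+$/);
    if (num === undefined) return fail("number");
    if (accept(/^;$/) === undefined) return fail("';'");
    return { kind: "assign", name, value: Number(num) };
  }

  // Panic-mode recovery: skip to just past the next ';' so later statements still parse.
  function synchronize(): void {
    while (pos < tokens.length && tokens[pos] !== ";") pos++;
    if (pos < tokens.length) pos++;
  }

  while (pos < tokens.length) {
    const result = parseAssign();
    if (result.kind === "error") {
      errors.push(result);
      synchronize();
    } else {
      nodes.push(result);
    }
  }
  return { nodes, errors };
}
```

With hand-written code, both the wording of each message and the synchronization point are ordinary code paths that can be tuned per construct, which is the kind of control the paragraph above refers to.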

Two-Pass Semantic Analysis — First pass collects all declarations (functions, defines, data sections). Second pass resolves references, validates types, and checks constraints. This allows forward references without requiring declaration order.
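
The two passes can be sketched as follows, assuming a heavily simplified program made only of named functions that call other functions by name. The FunctionDecl shape and the message wording are hypothetical.

```typescript
// Hypothetical, heavily simplified AST: top-level functions that call other functions by name.
type FunctionDecl = { name: string; calls: string[] };

function analyze(program: FunctionDecl[]): string[] {
  const errors: string[] = [];

  // Pass 1: collect every declaration before looking at any body.
  const declared = new Set<string>();
  for (const fn of program) {
    if (declared.has(fn.name)) errors.push(`duplicate function '${fn.name}'`);
    declared.add(fn.name);
  }

  // Pass 2: resolve references against the complete symbol table.
  for (const fn of program) {
    for (const callee of fn.calls) {
      if (!declared.has(callee)) {
        errors.push(`'${fn.name}' calls undefined function '${callee}'`);
      }
    }
  }
  return errors;
}
```

Because the symbol table is complete before any reference is checked, a function declared at the bottom of a script can be called from the top without a diagnostic.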

Constrained Bytecode Generation — The target runtime imposes strict memory and timing envelopes. The code generator budgets instruction counts, aligns data sections, and emits output that slots cleanly into the existing ecosystem toolchain.
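
A minimal sketch of what budgeting and alignment can look like in a code generator is shown below. The alignment of 4 bytes and the budget of 8 instructions are made-up placeholders, and the flat "code, padding, data" layout is an assumption; the real runtime's limits and binary format are not described in this case study.

```typescript
// All limits below are made-up placeholders, not the real runtime's values.
const ALIGNMENT = 4; // assumed data-section alignment in bytes
const MAX_INSTRUCTIONS = 8; // assumed instruction budget, kept tiny for the example

// Round `offset` up to the next multiple of `alignment`.
function alignUp(offset: number, alignment: number): number {
  return Math.ceil(offset / alignment) * alignment;
}

type Layout = { codeBytes: number; dataOffset: number; totalBytes: number };

// Check the instruction budget and compute where the data section starts.
function layout(instructions: Uint8Array[], data: Uint8Array): Layout {
  if (instructions.length > MAX_INSTRUCTIONS) {
    throw new Error(`instruction budget exceeded: ${instructions.length} > ${MAX_INSTRUCTIONS}`);
  }
  const codeBytes = instructions.reduce((sum, ins) => sum + ins.length, 0);
  const dataOffset = alignUp(codeBytes, ALIGNMENT);
  return { codeBytes, dataOffset, totalBytes: dataOffset + data.length };
}

// Write the code, zero padding, and data into one output buffer.
function emit(instructions: Uint8Array[], data: Uint8Array): Uint8Array {
  const { dataOffset, totalBytes } = layout(instructions, data);
  const out = new Uint8Array(totalBytes); // zero-filled, so padding bytes are already 0
  let cursor = 0;
  for (const ins of instructions) {
    out.set(ins, cursor);
    cursor += ins.length;
  }
  out.set(data, dataOffset);
  return out;
}
```

Failing at layout time, before any bytes are written, keeps an over-budget script from producing a binary at all.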

Compiler Statistics

Metric               Value
Opcodes implemented  61 (full language surface)
Language features    All (functions, combos, data sections, defines, remaps)
Scanner tokens       45+ token types
AST node types       30+
Error codes          41 with human-readable messages
Test coverage        Golden-file regression suite + runtime acceptance tests

Verification Strategy

The test suite runs real-world scripts through the full pipeline and asserts that the generated output interoperates correctly with the runtime under the expected memory and timing envelope.
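
The golden-file half of such a suite usually comes down to a byte-for-byte comparison between fresh compiler output and a stored reference binary. The helper below is a hypothetical sketch of that comparison; it takes both buffers as arguments, so reading the golden file from disk and running the compiler are left out.

```typescript
// Hypothetical helper: compare compiler output against a stored golden binary.
type GoldenResult =
  | { match: true }
  | { match: false; offset: number; expected: number | undefined; actual: number | undefined };

function compareToGolden(actual: Uint8Array, golden: Uint8Array): GoldenResult {
  const length = Math.max(actual.length, golden.length);
  for (let i = 0; i < length; i++) {
    if (actual[i] !== golden[i]) {
      // Report the first differing byte; `undefined` means one buffer ended early.
      return { match: false, offset: i, expected: golden[i], actual: actual[i] };
    }
  }
  return { match: true };
}
```

Reporting the first differing offset, instead of a bare pass or fail, makes a regression much quicker to trace back to the instruction that changed.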

Key Learnings

  • Compiler pipelines reward discipline — every pass should have one job and one invariant, and stages should be testable in isolation
  • Error messages are a product feature — users see compiler errors more than they see working code
  • The scanner is the simplest stage but has the most edge cases (string escaping, numeric formats, comment nesting)
  • Forward references make users happy but make the compiler author's life harder — worth the trade
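
Two of the scanner edge cases named above, string escaping and numeric formats, can be illustrated with a short sketch. The escape set (\n, \t, \\, \") and the decimal and 0x-prefixed hexadecimal formats are assumptions chosen for the example; the real language's lexical rules may differ.

```typescript
// Hypothetical literal scanners; the real language's escape set and numeric formats may differ.
const ESCAPES = new Map<string, string>([
  ["n", "\n"],
  ["t", "\t"],
  ["\\", "\\"],
  ['"', '"'],
]);

// Scan a double-quoted string starting at `start`.
// Returns the decoded value and the index just past the closing quote.
function scanString(src: string, start: number): { value: string; end: number } {
  if (src.charAt(start) !== '"') throw new Error(`expected '"' at ${start}`);
  let value = "";
  let i = start + 1;
  while (i < src.length) {
    const ch = src.charAt(i);
    if (ch === '"') return { value, end: i + 1 };
    if (ch === "\\") {
      // A backslash must be followed by a known escape character.
      const decoded = ESCAPES.get(src.charAt(i + 1));
      if (decoded === undefined) throw new Error(`invalid escape at ${i}`);
      value += decoded;
      i += 2;
    } else {
      value += ch;
      i++;
    }
  }
  throw new Error(`unterminated string starting at ${start}`);
}

// Scan a decimal or 0x-prefixed hexadecimal integer starting at `start`.
function scanNumber(src: string, start: number): { value: number; end: number } {
  const match = /^(0[xX][0-9a-fA-F]+|\d+)/.exec(src.slice(start));
  if (match === null) throw new Error(`expected number at ${start}`);
  const text = match[0];
  return { value: Number(text), end: start + text.length };
}
```

Each branch here (an escaped quote, a trailing backslash, a missing closing quote, a hex prefix) is a separate test case, which is why the simplest stage tends to accumulate the longest list of tests.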