Very Generally, How To Write Your First Compiler

24 September 2024

Writing a compiler can be a fulfilling first multi-folder coding project, for several reasons:

  • Compilers are probably the most ubiquitous software in the world, so it’s satisfying to having some sense of their inner workings.
  • There are plenty of in-depth resources on the general techniques of their implementation, so there’s always something to look at when one’s lost 1.
  • In spite of this, depending on the implementer’s goals, it is still possible to get tons of practice in making design decisions, which is good practice in getting used to open-endedness, which is a feature of most problems, coding and otherwise.

If you’re looking for a step-by-step, technically deep guide on implementing a compiler, this is not a blog post for you. The meat of this very short blog post is meant to be a very abstract overview of the general steps you might go through in working on a compiler, speaking from my experience having worked on compilers through class projects and research during my bachelor’s and master’s programs.

  1. Define valid program syntax, or what valid programs look like.
  2. Define what your compiler is expected to return when given a valid program.
    • You may not be able to write down exactly how to get to the expected output given an input in one sitting (because, that would be writing the compiler).
    • But you can specify the form your output takes, as well as its function.
      • “Form” here can be assembly, an executable file, or another program that a different compiler/interpreter can run.
      • “Function” here asks a more abstract question to answer: What should a valid input program’s valid output “do”? (More formally, you can call this the language’s operational semantics.)
  3. Realize a parser for your language.
    • A parser will accept programs that look like valid programs without taking into consideration the meaning or semantics of individual components of the program.
    • For example, a parser won’t check if your program reads from a variable not defined yet, but can check for imbalanced parentheses.
    • One can handwrite a parser or generate it using tools like lex or yacc. Handwritten parsers require more effort but generated parsers require learning a separate tool and adapting to how said tool handles errors.
  4. Write a semantic analyzer for your language, for catching semantic errors.
    • Between this and the previous step, a couple different representations of your program may emerge.
    • You can operate directly on the program text to check for syntax and semantic errors but compilers frequently create specialized representations for each.
  5. Write code for representing your program’s control flow, lower it closer to the target language, and optimize your code.
    • Many compiler optimizations rely on control flow, or knowing the possible orders in which a program’s parts can execute. For example, a variable may be assigned to but never read, or maybe that particular assignment is never used because of an assignment later down the line, but it is necessary to check all possible paths of program execution to know this.
    • Transforming your program representation into one that reflects control flow helps enable these optimization.
    • For ease of writing, it may be better to modularize these optimizations, i.e. write each optimization separately while keeping the program representation used between them the same.
    • Careful though, as individually correct optimizations may become incorrect when applied together due to mismatched or incorrectly handled assumptions.
  6. Generate target language code, test, and repeat all until satisfied.

The textbooks in the footnotes certainly do a much more detailed explanation of how to write a compiler, as well as tried-and-tested ways to represent the intermediate structures therein, so I highly recommend checking out said resources before implementing. Writing a compiler can be a fun and fulfilling way to learn more about how compilers work and to practice writing, testing, and rewriting code. It is hoped that this blog post makes the task look less intimidating.

  1. For example, a quick DuckDuckGo search gave me: Appel’s Modern Compiler Implementation in Java/ML/C; Cooper/Turzon’s Engineering a Compiler;Aho/Sethi/Ullman’s Compilers: Principles, Tools, and Techniques; Muchnick’s Advanced Compiler Design and Implementation. Reference books are expensive, but there are ways around directly purchasing them (e.g., local libraries, online or in-person friends and acquaintances). Also, there are far more resources than listed, both specifically for compilers and for adjacent topics like introductory discrete math and automata theory.