Zig/Internals

From Gentoo Wiki

This article describes how Zig compiles Zig source code into an executable.[1]

Tokenizer

Zig splits the input buffer into tokens of type Token, which is defined in this file:

FILE lib/std/zig/tokenizer.zig
pub const Token = struct {
    tag: Tag,
    loc: Loc,

    pub const Loc = struct {
        start: usize,
        end: usize,
    };
    ...
};

pub const Tokenizer = struct {
    buffer: [:0]const u8,
    index: usize,
    ...
    const State = enum {
        start,
        expect_newline,
        identifier,
        builtin,
        string_literal,
        ...
    };

    /// After this returns invalid, it will reset on the next newline, returning tokens starting from there.
    /// An eof token will always be returned at the end.
    pub fn next(self: *Tokenizer) Token {
        ...
    }
    ...
};

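The tag field of Token identifies the kind of token, for example a keyword or a doc comment. The loc field gives the token's position in the Tokenizer's buffer field as a half-open range: start is the index of the token's first byte and end is one past its last byte, so the byte at end is not part of the token.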

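The buffer field of Tokenizer holds the contents of the input file as a null-terminated byte slice ([:0]const u8). The index field is the position at which the next call to next() resumes scanning. The State enum records which state of the deterministic finite automaton (DFA) next() is currently in while it builds a Token.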
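To make the roles of buffer, index and State concrete, here is a minimal sketch of a DFA tokenizer in the same spirit. Everything in it is hypothetical: MiniTokenizer, its three-state State enum and its small Tag enum are invented for illustration and only recognize identifiers, decimal integers, "=" and ";". The real Tokenizer handles far more and differs in detail. The sketch assumes the null sentinel at buffer[buffer.len] is used to detect the end of input, which the [:0]const u8 type makes safe to read.

```zig
const std = @import("std");

// Hypothetical, simplified token kinds; the real Token.Tag has many more.
const Tag = enum { identifier, number_literal, equal, semicolon, invalid, eof };

// Half-open byte range [start, end) into the buffer, like Token.Loc.
const Loc = struct { start: usize, end: usize };

const Token = struct { tag: Tag, loc: Loc };

// Hypothetical tokenizer: buffer is the input, index is where next() resumes.
const MiniTokenizer = struct {
    buffer: [:0]const u8,
    index: usize = 0,

    // The DFA states this sketch needs.
    const State = enum { start, identifier, number };

    fn next(self: *MiniTokenizer) Token {
        var state: State = .start;
        var result = Token{ .tag = .eof, .loc = .{ .start = self.index, .end = self.index } };
        // The continue expression advances index; a break leaves index where it is.
        while (true) : (self.index += 1) {
            // Reading buffer[buffer.len] is allowed and yields the 0 sentinel.
            const c = self.buffer[self.index];
            switch (state) {
                .start => switch (c) {
                    0 => break, // sentinel: treated as end of input, tag stays .eof
                    ' ', '\t', '\n' => result.loc.start = self.index + 1, // skip whitespace
                    'a'...'z', 'A'...'Z', '_' => {
                        state = .identifier;
                        result.tag = .identifier;
                    },
                    '0'...'9' => {
                        state = .number;
                        result.tag = .number_literal;
                    },
                    '=' => {
                        result.tag = .equal;
                        self.index += 1; // consume the single-byte token
                        break;
                    },
                    ';' => {
                        result.tag = .semicolon;
                        self.index += 1;
                        break;
                    },
                    else => {
                        result.tag = .invalid;
                        self.index += 1;
                        break;
                    },
                },
                .identifier => switch (c) {
                    'a'...'z', 'A'...'Z', '_', '0'...'9' => {},
                    else => break, // c does not belong to the identifier
                },
                .number => switch (c) {
                    '0'...'9' => {},
                    else => break,
                },
            }
        }
        result.loc.end = self.index;
        return result;
    }
};

pub fn main() void {
    var t = MiniTokenizer{ .buffer = "x = 42;" };
    while (true) {
        const tok = t.next();
        std.debug.print("{s} [{d}..{d})\n", .{ @tagName(tok.tag), tok.loc.start, tok.loc.end });
        if (tok.tag == .eof) break;
    }
}
```

Each call to next() starts in the start state, moves loc.start past any whitespace, and stops as soon as a byte cannot extend the current token, leaving index on that byte so the following call picks up from there.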

Tokens are produced by next(), which is called repeatedly until it returns the eof token.
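The calling loop can be sketched against the standard library, which re-exports this tokenizer as std.zig.Tokenizer. The sketch assumes a Zig release whose Tokenizer.init takes a [:0]const u8 buffer, matching the excerpt above; tag names and internals have changed between releases, so treat it as an illustration rather than a stable API reference. The helper countTokens is hypothetical and only shows the stopping condition.

```zig
const std = @import("std");

/// Hypothetical helper: counts the tokens in source, not including eof.
fn countTokens(source: [:0]const u8) usize {
    var tokenizer = std.zig.Tokenizer.init(source);
    var count: usize = 0;
    // next() always ends with an eof token, so this loop terminates.
    while (tokenizer.next().tag != .eof) count += 1;
    return count;
}

pub fn main() void {
    const source: [:0]const u8 = "const x = 42;";
    var tokenizer = std.zig.Tokenizer.init(source);
    while (true) {
        const tok = tokenizer.next();
        if (tok.tag == .eof) break;
        // loc.end is exclusive, so this slice is exactly the token's text.
        const text = source[tok.loc.start..tok.loc.end];
        std.debug.print("{s} \"{s}\"\n", .{ @tagName(tok.tag), text });
    }
    std.debug.print("{d} tokens\n", .{countTokens(source)});
}
```

The eof check is the only thing that ends the loop; slicing the buffer with loc recovers each token's source text without copying.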

References