ABI Cafe

Not sure if your compilers have matching ABIs? Then put them through the ultimate compatibility crucible and pair them up on a shift at The ABI Cafe! Find out if your one true pairing fastcalls for each other or are just another slowburn disaster. (Maid outfits optional but recommended.)

Quickstart

To run ABI Cafe, just checkout the repository and cargo run!

(cargo install TBD...)

What Is This

ABI Cafe automates testing that two languages/compilers agree on their ABIs.

ABI Cafe is essentially an ABI fuzzer, which:

If they agree, great!

If they don't agree, even better, we just learned something! We then try to diagnose why they disagreed, and generate a minimized version that a human can inspect and report!

Now do this a bajillion times and suddenly we're learning a whole lot! Alternatively, you can hand-craft any type or function signature you're interested in, and explore its interoperability between different toolchains.

ABI Cafe is purely descriptive. It has no preconceived notion of what should work, and it doesn't trust any damn thing anyone says about it. We don't analyze assembly or metadata, and we'll gleefully create programs riddled with Undefined Behaviour. We're here to learn not lecture.

This design is based on a fundamental belief that ABIs exist only through sheer force of will. The spec if often "read GCC's source code", and damn if that ain't an error-prone process. Also GCC doesn't even know you exist, and you're only going to keep interoperating with them if you check and maintain your work. So here's a tool for checking and maintaining your work!

Choose Your Own Adventure

usage

To run ABI Cafe, just checkout the repository and cargo run!

(Working on a prebuilt solution, a few last blockers to resolve before shipping that.)

While ABI Cafe isn't a For Reals Fuzzer (yet?), it accomplishes a similar goal through the magic of procedural generation and combinatorics. These docs serve to describe the N layers of combinatorics we use to turn a grain of sand into a mountain of broken compilers.

When you run abi-cafe we will end up running the cross-product of all of these settings, typically resulting in thousands of function calls. See the subsections for details!

You can also run --help to get information on all the supported features.

As Part Of Your Testsuite

We're still cleaning up the details of this usecase to make it nicer. If you would like to use abi-cafe in your testsuite, please let us know what you'd need/want!

For now, we can at least gesture to these two examples:

tests

ABI Cafe tests are defined are KDLScript header files describing an interface, for which we'll generate and build many different implementations (callees) and users (callers). KDLScript was purpose-made for ABI Cafe, with a custom syntax that avoids us tying our hands to any particular language/semantics. The syntax will feel fairly familiar to Rust programmers.

Each function in the test is regarded as a "subtest" that can be individually passed/failed.

Adding Tests

The default suite of tests can be found in /include/tests/, which is statically embedded in abi-cafe's binary. You don't need to register the test anywhere, we will just try to parse every file in the tests directory.

There are two kinds of tests: .kdl ("normal") and .procgen.kdl ("procgen").

Procgen tests are sugar for normal tests, where you just define a type with the same name of the file (so MetersU32.procgen.kdl is expected to define a type named MetersU32), and we generate a battery of types/functions that stress test that the ABI handles that type properly.

We recommend preferring procgen tests, because they're simpler to write and will probably have better coverage than if you tried to manually define all the functions.

Suggested Examples:

  • simple.kdl - a little example of a "normal" test with explicitly defined functions to test
  • SimpleStruct.procgen.kdl - similar to simple.kdl, but procgen
  • MetersU32.procgen.kdl - an example of a "pun type", where different languages use different definitions
  • IntrusiveList.procgen.kdl - an example of how we can procgen tests for self-referential types and tagged unions
  • i8.procgen.kdl - ok this one isn't instructive it's just funny that it can be a blank file because i8 is builtin so all the info needed is in the filename

Test Rules (Expectations)

ABI Cafe test expectations are statically defined in ABI Cafe's code. There is a region specifically reserved where patchfiles can be applied to the codebase to allow updates to happen.

The benefit of this approach is that you get the full expressivity of actual code to specify when a test should fail and why, but having proper runtime test expectation files is a good idea.

We call test expectations TestRules, which have two settings:

  • TestRunMode: up to what "phase" should we run the test. These are, in increasing order:
    • Skip: don't run the test at all (marked as skipped)
    • Generate: generate the source code
    • Build: compile the source code into staticlibs
    • Link: link the staticlibs into a dylib
    • Run: load and run the dylib's main function
    • Check: check the output of the dylib for agreement
  • TestCheckMode: to what level of correctness should the test be graded?
    • Pass(TestRunMode): The test must succesfully complete this phase, after that whatever
    • Fail(TestRunMode): The test must fail at this exact phase (indicates failing is "correct")
    • Busted(TestRunMode): Same as Fail, but indicates this is a bug/flaw that should eventually be fixed, and not the desired result longterm.
    • Random: The test is flakey and random but we want to run it anyway, so accept whatever result we get as ok.

The default TestRule is of course to run everything and expect everything to work (run: Check, check: Pass(Check)).

--tests

By default we will run all known tests. Passing the names of tests (the filename without the extension(s)) to --tests will instead make us run only those tests (unlike the cargo test harness this isn't a fuzzy/substring match (but it could be if someone wants to implement that)).

--add-tests

While it's ideal for tests to be upstreamed into ABI Cafe's codebase where everyone can benefit from them, you can also add your own custom tests that are read at runtime (instead of baked into the binary) by passing a path to a directory containing them to --add-tests.

--disable-builtin-tests

If, for whatever reason, you want all the builtin tests to go away, you can pass --disable-builtin-tests to do so. Presumably you'll want to use --add-tests as well if you do.

calling conventions

A calling convention is as close as ABI Cafe ever gets to referring to "An ABI" directly, but they're still pretty abstract, since a single calling convention can mean different things on different platforms.

By default, for each test we will generate a copy of it for every known calling convention (changing the convention of all functions declared by that test).

Each Toolchain may claim to support a particular set of calling conventions (and may use knowledge of the target platform to adjust their decisions). Refusing to support a convention will result in those tests getting marked as "skipped" and omitted from the final report.

If two Toolchains claim to support a calling convention on a platform, it is assumed that they want to have compatible ABIs, and it's our goal to identify what does and doesn't work.

--conventions

All of the following conventions are enabled by default, and only these conventions are supported.

Universal Conventions:

  • c: the platform's default C convention (extern "C")
  • rust: the platform's default Rust convention (extern "Rust")

Windows Conventions:

  • cdecl
  • fastcall
  • stdcall
  • vectorcall

(There exists some code for other weird conventions rustc supports, but they aren't really wired up properly and it's not clear if they serve any purpose.)

lang reprs

Lang reprs abstractly describe an interop target for the layout of structs and enums and the like. These currently exactly match the "lang reprs" in KDLScript.

For each test we will generate a copy of it for every enabled lang repr (changing the definitions of all types which don't specify an explicit repr).

--reprs

All of the following reprs are enabled by default, and only these reprs are supported.

  • c: layout structs in a C-compatible way (repr(C))
  • rust: layout structs in a Rust-compatible way (repr(Rust))

toolchains

Toolchains refer to a specific compiler (or configuration of a compiler). The entire purpose of ABI Cafe is to take two compilers and pair them up, checking that code built with one can properly call into code built by the other.

Within ABI Cafe, each Toolchain also comes with a code generation backend, which can take a header file describing some types and functions and generate either an implementation of those functions, or a caller of those functions.

--toolchains

The following toolchains are available, but only "rustc" and "cc" are enabled by default.

  • rustc - uses the rustc on your PATH
  • cc - gets the "system" C compiler via the CC crate (supports msvc on windows)
  • gcc - explicitly run the gcc on your PATH (probably less reliable than cc)
  • clang - explicitly run the clang on your PATH (probably less reliable than cc)
  • msvc (incomplete)

You can also add custom rustc codegen backends as new toolchain (inheriting all the behaviour of the rustc toolchain) with --rust-codegen-backend=mytoolchain:path/to/codegen_backend. Where mytoolchain is a custom id for referring to it in --pairs and test output.

--pairs

By default, we will look at the enabled toolchains and pair them with themselves and all the default "pairer" toolchains (if they're enabled). The default pairer toolchains are "rustc" and "cc".

With the default toolchains enabled, that means we will test:

  • rustc_calls_rustc
  • cc_calls_cc
  • rustc_calls_cc
  • cc_calls_rustc

Adding A Toolchain

Adding a toolchain has two levels of difficulty:

  • Easier: Adding a new compiler or mode for an existing language
  • Harder: Adding a brand new language (and its compiler)

The easier case is probably "adding some settings to an existing Toolchain" while the harder case is probably "writing a code generator for a language (with the help of abi-cafe's libraries)".

(Although if you want to add a C++ backend, probably you want it to be a variant of the C toolchain, and not a whole new one, since, a lot of overlap there?)

Adding A New Compiler Or Mode For An Existing Language

All the work you'll probably want to do is in src/toolchains/mod.rs. Looking at how gcc is implemented as a variant of CcToolchain (src/toolchains/c.rs) is probably informative.

At a minimum you will need to change toolchains::create_toolchains to create and register your Toolchain.

Toolchain::compile_caller and Toolchain::compile_callee will likely need to be changed to select your compiler, or use the compiler flags for your mode.

Toolchain::generate_caller and Toolchain::generate_callee may also need to be modified to generate source code that is compatible with your new compiler/mode. For instance when adding a

To test your new toolchain out you can first make sure it works with itself by running:

cargo run -- --toolchains=mytoolchain

(where "mytoolchain" is the id you registered in create_toolchains)

Once more confident you can pair it up with other compilers by running:

cargo run -- --toolchains=mytoolchain,rustc,cc

Adding A New Language (And Its Compiler)

In addition to the things you need to do in the previous section, you now need to specify how to generate source code for your language given a header file describing some types and functions.

I have good news, bad news, and okay news.

The good news is we have several libraries and utilities for helping with this.

The bad news is that no matter what you're going to need to create like a thousand lines of business logic for specifying the syntax of the language.

The okay news is that a lot of this can be accomplished without too much pain by copying one of the existing Toolchains and just editing it incrementally, aggressively returning UnimplementedError to indicate unsupported features, or things you just haven't gotten to yet. This is totally fine, all the backends have places where they give up!

As a codegen backend you will need to answer 3 major questions:

  • declare: How do you declare types in this language?
  • init: How do you initialize values in this language?
  • write: How do you access and print fields of values in this language?

This largely amounts to writing recursive functions which match on a type definition and loop over all the fields of that type, or handle primitive types as the base case.

For reference, when the author of ABI Cafe rewrote the codebase, the C backend was recreated mostly from scratch in a day by copying the Rust implementation and changing Rust syntax to C syntax.

In doing this, you will have 3 major allies (assuming you match the idioms of the other codgen backends):

  • state.types (TypedProgram) is the type system of the KDLScript Compiler, which gives you type ids for interning state, and handles computing facts about the type definitions
  • state.vals(ValueTree) has the values and enum variants your program should use
  • f (Fivemat) is an indent-aware Write implementation for creating pretty formatted code

value generators

When generating the source for a program to test, we need to pick values which will be passed to all the functions. These values are baked into the program, with both sides statically knowing which values they should have, and the test harness also having that information available for checking and diagnostics.

As later sections will explain, this system also allows us to do magic like "generate only the necessary branches of match statements", "generate random-depthed values of self-referential types", "derive debug on c-like untagged unions", and "compare the values of type puns with different fields".

--gen-vals

By default we only set --gen-vals=graffiti, as this mode produces the most useful diagnostic information.

The possible values are

  • graffiti: prefers patterning the bytes of values in a way that helps you identify which byte of which field each recorded value was.
  • randomN (random1, random37, ...): seeds an RNG with N to make random (repeatable) values with

graffiti values

The grafitti pattern stores two indices in each byte: the high nibble contains the index of the field that the byte belongs to (mod 16), and the low nibble contains the index of the byte (mod 16).

For instance graffiti values should look something like:

let array_of_points = [
    Point { x: 0x0102_0304, y: 0x1112_1314 },
    Point { x: 0x2122_2324, y: 0x3132_3334 },
]
let bytes = [0x40, 0x50, 0x60, 0x70];

some_func(array_of_points, bytes);

The benefit of this system is that when there's an ABI mismatch if you see something like:

mismatch in some_func val 2 (array_of_points[1].x: u32)
expect: [21, 22, 23, 24]
caller: [21, 22, 23, 24]
callee: [23, 24, 31, 32]

You can pretty clearly see that the callee got half its bytes from val 2, and half of its bytes from val 3, indicating some kind of alignment/padding disagreement.

The Value Tree

The value tree is the solution that was created to address the various problems sketched out in this article on ABI Cafe.

Value Tree Problem Statement

There are several problems we need ABI Cafe to solve regarding values:

  • We need to be able to pick values for fields in a deterministic/repeatable way so we can reliably reproduce any issues found
  • We would like the test harness to know what values the programs should have when checking them (at very least to produce better diagnostics)
  • Given a type with "cases" like a tagged union, untagged union, or c-like enum, we need to be able to pick the case it should have, and deal with the fact that this changes the number/names of the fields
  • If we want to report the values of an untagged union, we find ourselves needing to derive(Debug) on an untagged union, which is impossible (nothing in the value itself specifies its case)
  • Self-referential types may have an arbitrary number of values (like a linked list)
  • Type puns may introduce completely different shapes/paths for the same logical values ((u32, u32) vs [u32; 2])

The value tree is able to handle all these cases!

Value Tree Solution

Essentially the key insight here is that no matter how complicated a type is, an instance of that type has a tree structure (which syntax like x.y[2].3 is a path on). A traversal of that tree then gives a linear list of (paths to) fields and types, for which we need to generate values.

(Note: intrusive values which contain pointers to themselves aren't treelike, but we simply don't support those because they aren't interesting to us. When we refer to self-referential types, we're talking about types which can contain other instances of the same type, and not literally the same instance.)

When traversing a struct, there isn't anything special to do: we can just recursively traverse each of its fields. When traversing a tagged or untagged union we apply a simple trick: we introduce an additional artificial leaf field representing the case (tag) of the union.

Since we're already assuming we have a way to deterministically generate pseudorandom values, this gives us a way to uniformly select one of the cases of the union for each instance of the type. This also naturally handles all valid self-referential types, because they inevitably need to contain something that looks like Option<Self>. Each time we encounter that Option, we are essentially flipping a coin as to whether we should add another layer of Self, or finish.

These artificial tag fields also give us a way to "derive(Debug) on an untagged union" -- because we're emitting the print statements totally inline for each instance, the code generator can consult the value tree for which case each union has, and only emit the code for accessing that case.

This ends up being nice even for tagged unions, because it lets us emit only a minimal if let instead of a full match. We even get an internal entry for "which case, semantically, did the program think the tagged union was in", which the program can use to report back that information back to to the harness (as a u32 value, reporting u32::MAX in the else branch of the if-let to indicate "not the one we expected").

Also, we don't actually care about any of the internal fields, we only care about the leaf (primitive) fields, which have actual logical bytes we can inspect and compare (or the cases of our unions). So, as long as two value tree traversals produce lists with the same lengths, we can compare them, allowing us to handle puns like (u32, u32) vs [u32; 2] or even more complex cases readily.

Value Tree Compromises

The only kind of pun we really can't handle is one like u64 vs (u32, u32), because the counts desync. In the current implementation when traversing a pun, we assert that all blocks of the pun produce the same length list. This is a bit sad, but honestly, that was always going to be a semantic nightmare.

For various reasons it's tempting to want to "statically" number the leaf fields, such that x.y[2].3 can be referred to by a stable index, regardless of the value generation strategy. This is theoretically possible even with unions, because you can traverse even the cases that aren't selected and still number them.

However the notion of a static numbering completely falls apart once you introduce self-referential types, as there is no static bound on the size of a self-referential subtree (and trying to artifically bound it is more work than it's worth). I also vaguely recall this completely breaking my brain to think about in the context of type puns, so, no big loss. We can just make value numbering value-generation-scoped, and get much the same benefit.

The same logic that allows for type puns to be handled would also allow function puns to be handled (fn(x: u32, y: u32) vs fn(point: (u32, u32))). We currently don't allow for this, in an attempt to make diagnostics better. In the future we might lift this restriction.

value selectors

When generating the source for a program to test, we want a way to identify which of the values of the function arguments we actually care about writing somewhere. This is specifically relevant when generating a minimized test case.

Let's say we generate a program with a dozen functions, and specifically we find a mismatch in function3 on arg2.field2[7].y, and want to now generate a minimized program that focuses on only that one value. What is safe to throw out?

It's hopefully safe to throw out all the other functions, because that kind of spooky action at a distance isn't something we're concerned with. (There is a concern that removing random garbage on the stack from previous function calls could change the values captured if something is reading ~padding bytes. We try to avoid repetitive values to minimize the chance of this mattering.)

Once we throw away functions, we can also throw away any type definitions that aren't used by the remaining function. (God help your poor compiler if this substantially changes the result.)

Now we have function arguments we don't actually care about but removing those is expected to change the ABI of the function, and will likely make the mismatch disappear. Changing the values of the arguments is also something we want to avoid here, as this may change which variants unions take on and has a chance of introducing unfortunate new coincidental value matches.

What we can hopefully do is remove all the code that writes the other arguments/values to output. So in an ideal world the final result is something like this:

struct SomeComplicatedType {
    /* omitted for brevity of example */
}

fn function3(arg0: f32, arg1: u32, arg2: SomeComplicatedType, arg3: bool) {
    println!("{:?}", arg2.field2[7].y);
}

(This is the callee, the caller ends up being uglier because it needs to still initialize all those values and pass them in.)

--select-vals

This CLI flag is reserved but not yet implemented. Its semantics are however used internally when regenerating a failed test, as described above.

When filtering a test you currently get 3 levels of granularity:

  • functions: all or one
  • arguments: all or one
  • values (fields): all or one

The default is for all levels to be set to "all", because we want to check everything.

When abi-cafe detects an error, it will regenerate the test with all levels set to "one", so that it can highlight only the one field that matters.

value writers

When generating the source for a program to test, we want the program to write the values of the function arguments somewhere for validation: callbacks, prints, asserts, etc.

--write-vals

This isn't a setting you typically want to mess with in normal usage, since the default ("harness") is the only one that is machine-checkable. All the others are intended for minimizing/exporting the test for human inspection (see --minimize-vals below).

The supported writers are:

  • harness: send values to the abi-cafe harness with callbacks
  • print: print the values to stdout
  • assert: assert the values have their expected value
  • noop: disable all writes (see also the less blunt value selectors)

--minimize-vals

This takes the same values as write-vals, but is specifically the writer used when a test has failed and we want to regenerate the test with a minimized human readable output.

The default is "print".

Trophy Case

Has ABI Cafe helped you find/fix a bug in your compiler? We'd love to hear!

KDLScript

KDLScript, the KDL-based programming language!

KDLScript ("Cuddle Script") is a "fake" scripting language that actually just exists to declare type/function signatures without tying ourselves to any particular language's semantics. It exists to be used by ABI Cafe.

Basically, KDLScript is a header format we can make as weird as we want for our own usecase:

struct "Point" {
    x "f32"
    y "f32"
}

enum "ScaleMode" {
    Width
    Height
}

fn "print" {
    inputs { _ "Point"; }
}

fn "scale" {
    inputs { _ "Point"; factor "f32"; scalemode "ScaleMode"; }
    outputs { _ "Point"; }
}

fn "sum" {
    inputs { _ "&[Point; 4]"; }
    outputs { _ "Point"; }
}

Ultimately the syntax and concepts are heavily borrowed from Rust, for a few reasons:

  • The author is very comfortable with Rust
  • This (and ABI Cafe) were originally created to find bugs in rustc
  • Rust is genuinely just a solid language for interfaces! (Better than C/C++)

The ultimate goal of this is to test that languages can properly communicate over FFI by declaring the types/interface once and generating the Rust/C/C++/... versions of the program (both caller and callee) and then linking them into various combinations like "Rust calls C++" to check that the values are passed correctly.

Quickstart

kdl-script is both a library and a CLI application. The CLI is just for funsies.

The main entry point to the library is Compiler::compile_path or Compiler::compile_string, which will produce a TypedProgram. See the types module docs for how to use that.

The CLI application can be invoked as kdl-script path/to/program.kdl to run a KDLScript program.

attributes

KDLScript Attributes start with @ and apply to the next item (function or type) that follows them. There are currently 3 major classes of attributes:

  • repr attrs
    • lang reprs
      • @repr "rust" - use rust's native struct layout
      • @repr "c" - use C-compatible struct layout
    • primitive reprs - for any enums, use the given primitive as its type
      • @repr "u8"
      • @repr "f32"
      • ...
    • transparent repr - equivalent of rust's repr(transparent)
      • @repr "transparent"
  • modifier attrs
    • @align 16 - align to N
    • @packed - pack fields to eliminate padding
  • passthrough attrs
    • @ "literally anything here"

The significance of repr attributes is that providing any explicit repr attribute is considered an opt-out from the default automatic repr all user-defined types receive.

When we generate tests we will typically generate both a repr(rust) version and a repr(C) version. In these versions any user-defined type gets (an equivalent of) those attributes applied to it.

This means that applying @align 16 still leaves a struct eligible to have the rust layout and c layout tested, while applying @repr "u8" to a tagged union does not (if you want to test repr(C, u8), you need to set @repr "C" "u8").

functions

Functions are where the Actually Useful library version of KDLScript and the Just A Meme application version of KDLScript diverge. This difference is configured by the eval feature.

As a library, KDLScript only has function signature declarations, and it's the responsibility of the ABI Cafe backend using KDLScript to figure out what the body should be.

As a CLI binary, KDLScript actually lets you fill in the body with some hot garbage I hacked up.

function signatures

Here is a fairly complicated/contrived example function signature:

fn "my_func" {
    inputs {
        x "u32"
        y "[&MyType; 3]"
        _ "&bool"
    }
    outputs {
        _ "ErrorCode"
    }
}

Functions can have arbitrarily many inputs and outputs with either named or "positional" (_) names which will get autonaming like arg0 and out0.

Currently there is no meaning ascribed to multiple outputs, every backend refuses to implement them. Note that "returning a tuple" or any other composite is still one output. We would need to like, support Go or something to make this a meaningful expression.

Named args could be the equivalent of Swift named args, where the inner and outer name can vary, but the outer name is like, part of the function name itself (and/or ABI)?

Varargs support is also TBD but has a sketch.

Outparams

not implemented distracting ramblings about outparams As discussed in the section on "Reference Types", references in outputs are sugar for out-params, which should appear after the inputs and before outputs. So the above would lower to something like the following in Rust (values chosen arbitrarily here, and we wouldn't use asserts in practice, but instead record the values for comparison):
fn my_func(
    x: u32,
    y: [&MyType; 3],
    arg2: &bool,
    out1: &mut ErrorCode,
) -> bool {
    // Check the inputs are what we expect...
    assert_eq!(x, 5);
    assert_eq!(y[0].val, 8);
    assert_eq!(y[1].val, 9);
    assert_eq!(y[2].val, 10);
    assert_eq!(*arg2, true);

    // Return outputs
    *out1 = ErrorCode::Bad;
    return true;
}


fn my_func_caller() {
    // Setup the inputs
    let x = 5;
    let y_0 = MyType { val: 8 };
    let y_1 = MyType { val: 9 };
    let y_2 = MyType { val: 10 };
    let y = [&y_0, &y_1, &y_1];
    let arg2 = false;

    // Setup outparams
    let mut out1 = ErrorCode::default();

    // Do the call
    let out0 = my_func(x, y, &arg2, &mut out1);

    // Checkout outputs
    assert_eq!(out0, true);
    assert_eq!(*out1, ErrorCode::Bad);
}

God writing that sucked ass, and it wasn't even the "proper" value checking! This is why I built all this complicated crap to automate it!

Update: actually even automating this was miserable, and also outparams aren't really substantial ABI-wise right now, so I'm not sure I'll ever implement outparams. It's more complexity than it's worth!

KDLScript function bodies

The kdl-script compiler does technically

The evaluator has not at all kept up with the type system, so it can only handle some really simply stuff. You can run the examples/simple.kdl. All the other examples will just dump type information and decl order as they don't define main.

> cargo run examples/simple.kdl

{
  y: 22
  x: 11
}
33

Is executing the following kdl document:

struct "Point" {
    x "f64"
    y "f64"
}

fn "main" {
    outputs { _ "f64"; }

    let "pt1" "Point" {
        x 1.0
        y 2.0
    }
    let "pt2" "Point" {
        x 10.0
        y 20.0
    }

    let "sum" "add:" "pt1" "pt2"
    print "sum"

    return "+:" "sum.x" "sum.y"
}

fn "add" {
    inputs { a "Point"; b "Point"; }
    outputs { _ "Point"; }

    return "Point" {
        x "+:" "a.x" "b.x"
        y "+:" "a.y" "b.y"
    }
}

Why Did You Make KDL Documents Executable???

To spite parsers.

Ok more seriously because I needed the parser and type-system for abi-cafe but it's a ton of work so I'm self-motivated by wrapping it in the guise of a scripting language because it's funny and I could make more incremental progress. This in fact worked, because as of the publishing of this book, abi-cafe was rewritten to use kdl-script!

types

The following kinds of types exist in KDLScript.

All of these types can be combined together as you expect, and self-referential types do in fact work!

We do not currently support generics.

primitive types

There are various builtin primitives in KDLScript, such as:

  • integers - fixed width integers
    • i8, i16, i32, i64, i128, i256
    • u8, u16, u32, u64, u128, u256
  • floats - fixed with floating point numbers
    • f16, f32, f64, f128
  • bool- your old pal the boolean
  • ptr - an opaque pointer (void*), used when you're interested in the address as a value (unlike &T)

The lowering of these to Rust is pretty direct, since we're reusing Rust's naming scheme.

The lowering of these to C uses uint8_t and friends for the integers, and then the usual types for the rest.

In the future there will probably be language-specific primitives like c_long...?

struct types

A KDLScript struct type is just what you expect! This definition:

struct "Point" {
    x "f32"
    y "f32"
}

(or struct "Point" { x "f32"; y "f32"; })

is equivalent to this Rust:

#![allow(unused)]
fn main() {
struct Point {
    x: f32,
    y: f32,
}
}

and this C:

typedef struct Point {
    float x;
    float y;
} Point;

Attributes And Layouts

The various KDLScript attributes can be applied to structs to specify how they should be laid out, like so:

@repr "transparent"
struct "MetersU32" {
    _ "u32"
}

If no explicit @repr attribute is applied (the default, which is recommended), the struct will be eligible for repr combinatorics. Basically, we'll generate a version of the test where it's set to #[repr(C)] and a version where it's set to #[repr(Rust)], improving your test coverage.

It's up to each compiler / language to implement these attributes however they see fit. But for instance we would expect Rust backends to support both layouts, and C backends to bail on the Rust repr, producing twice as many rust-calls-rust test cases.

Note that repr(transparent) is not currently eligible for repr combinatorics. If you want to test that, set it explicitly.
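A sketch of the two Rust variants such test generation could produce (the type names here are hypothetical, not actual abi-cafe output):

```rust
// One generated variant pins the C layout...
#[repr(C)]
struct PointReprC { x: f32, y: f32 }

// ...and the other leaves Rust free to pick its default layout.
struct PointReprRust { x: f32, y: f32 }

fn main() {
    // repr(C) layout is fully specified: two f32s back to back.
    assert_eq!(std::mem::size_of::<PointReprC>(), 8);
    // repr(Rust) makes no such promise, but it still has to fit both fields.
    assert!(std::mem::size_of::<PointReprRust>() >= 8);
}
```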

Tuple Structs

As a convenience, you can omit the names of the fields by calling them _, and we'll make up names like field0 and field1 for you:

struct "Point" {
    _ "f32"
    _ "f32"
}

If all fields have the names omitted, then languages like Rust can emit a "tuple struct". So the above example can/should be emitted like this:

struct Point(f32, f32);

Generic Structs

Generic structs are not supported.

union types

A KDLScript union type is a C-like untagged union. For rust-like tagged unions, see tagged types.

This definition:

union "FloatOrInt" {
    a "f32"
    b "u32"
}

is equivalent to this Rust:

#![allow(unused)]
fn main() {
union FloatOrInt {
    a: f32,
    b: u32,
}
}

and this C:

typedef union FloatOrInt {
    float a;
    uint32_t b;
} FloatOrInt;
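As a sketch of how a test harness might poke at this union from the Rust side (reading a union field requires unsafe):

```rust
union FloatOrInt {
    a: f32,
    b: u32,
}

fn main() {
    // Write through one field...
    let v = FloatOrInt { a: 1.0 };
    // ...and read the same bytes back through the other.
    let bits = unsafe { v.b };
    assert_eq!(bits, 1.0_f32.to_bits()); // the bit pattern of 1.0f
}
```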

enum types

A KDLScript enum type is a C-like enum with no nested fields. For a Rust-like enum (tagged union), see tagged types.

This definition:

enum "IoError" {
    FileNotFound
    FileClosed
    FightMe
}

is equivalent to this Rust:

#![allow(unused)]
fn main() {
enum IoError {
    FileNotFound,
    FileClosed,
    FightMe,
}
}

and this C:

typedef enum IoError {
    FileNotFound,
    FileClosed,
    FightMe,
} IoError;

(There are like 3 ways we could lower this concept to C, it's an eternal struggle/argument, I know.)

Attributes And Layouts

The various KDLScript attributes can be applied to enums to specify how they should be laid out, like so:

@repr "u32"
enum "MyEnum" {
    Case1
    Case2
}

If no explicit @repr attribute is applied (the default, which is recommended), the enum will be eligible for repr combinatorics. Basically, we'll generate a version of the test where it's set to #[repr(C)] and a version where it's set to #[repr(Rust)], improving your test coverage.

It's up to each compiler / language to implement these attributes however they see fit. But for instance we would expect Rust backends to support both layouts, and C backends to bail on the Rust repr, producing twice as many rust-calls-rust test cases.

Note that repr(u32) and friends are not currently eligible for repr combinatorics. If you want to test that, set it explicitly.

Explicit Tag Values

⚠️ This feature exists in the KDLScript parser but isn't fully implemented yet.

You can give enum variants an integer value (currently limited to i64 range):

enum "IoError" {
    FileNotFound -1
    FileClosed
    FightMe 4
}

It's up to each compiler / language to implement these however they see fit.
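One plausible Rust lowering, as a speculative sketch (again, the feature isn't fully implemented, so this is not what abi-cafe actually emits):

```rust
#[repr(i64)]
enum IoError {
    FileNotFound = -1,
    FileClosed,     // no explicit value: previous + 1 = 0
    FightMe = 4,
}

fn main() {
    assert_eq!(IoError::FileNotFound as i64, -1);
    assert_eq!(IoError::FileClosed as i64, 0);
    assert_eq!(IoError::FightMe as i64, 4);
}
```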

Value Initialization And Analysis

When initializing an instance of an enum, we will uniformly select a random variant to use (deterministically).

When checking the value of an enum, we will just check its bytes. In the future we may instead check it semantically with a match/switch.
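A sketch of what "just check its bytes" could look like in Rust (the helper here is hypothetical, not abi-cafe's actual machinery):

```rust
#[repr(u32)]
#[derive(Clone, Copy)]
enum MyEnum {
    Case1,
    Case2,
}

// View any Copy value's memory as raw bytes.
fn bytes_of<T: Copy>(v: &T) -> &[u8] {
    unsafe {
        std::slice::from_raw_parts(v as *const T as *const u8, std::mem::size_of::<T>())
    }
}

fn main() {
    let caller_wrote = MyEnum::Case2;
    let callee_read = MyEnum::Case2;
    // Byte-for-byte agreement, no match/switch needed.
    assert_eq!(bytes_of(&caller_wrote), bytes_of(&callee_read));
}
```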

tagged types

A KDLScript tagged type is the equivalent of Rust's enum: a tagged union where variants have fields, which has no "obvious" C/C++ analog. Variant bodies may either be missing (indicating no payload) or have the syntax of a struct body. For C-like enums, see enum types. For C-like untagged unions, see union types.

This definition:

tagged "MyOptionU32" {
    None
    Some { _ "u32"; }
    FileNotFound {
        path "[u8; 100]"
        error_code "i64"
    }
}

Is equivalent to this Rust:

#![allow(unused)]
fn main() {
enum MyOptionU32 {
    None,
    Some(u32),
    FileNotFound {
        path: [u8; 100],
        error_code: i64,
    }
}
}

We may one day implement the C(++) equivalents to this definition, which do genuinely exist. Backends could theoretically detect when a tagged union is equivalent to a C-like enum, but that kinda defeats the purpose of making them separate concepts for backend simplicity.

Attributes And Layouts

The various KDLScript attributes can be applied to tagged unions to specify how they should be laid out, like so:

@repr "u8"
tagged "MyOptionU32" {
    Some { _ "u32"; }
    None
}

If no explicit @repr attribute is applied (the default, which is recommended), the tagged union will be eligible for repr combinatorics. Basically, we'll generate a version of the test where it's set to #[repr(C)] and a version where it's set to #[repr(Rust)], improving your test coverage.

It's up to each compiler / language to implement these attributes however they see fit. But for instance we would expect Rust backends to support both layouts, and C backends to bail on the Rust repr, producing twice as many rust-calls-rust test cases.

Note that repr(u32) and friends are not currently eligible for repr combinatorics. If you want to test that, set it explicitly.

Tuple Variants

As a convenience (and as shown liberally above), you can omit the names of the fields by calling them _, and we'll make up names like field0 and field1 for you.

If all fields of a variant have the names omitted, then languages like Rust can emit a "tuple variant".

Explicit Tag Values

Tagged unions currently do not support explicit tag values, unlike enums.

Generic Tagged Unions

Generic tagged unions are not supported.

alias

A KDLScript alias type is just what you expect! For superpowered ifdefy aliases, see pun types.

alias "MetersU32" "u32"

is equivalent to this Rust:

#![allow(unused)]
fn main() {
type MetersU32 = u32;
}

and this C:

typedef uint32_t MetersU32;

Note that the ordering matches Rust's type Alias = RealType; syntax and not C/C++'s backwards-ass typedef syntax (yes I know why the C syntax is like that, it's very cute).

Attributes And Layouts

The various KDLScript attributes can be applied to aliases, but nothing currently respects them, because, what the fuck?

Generic Aliases

Generic aliases are not supported.

I'm Normal And Can Be Trusted With Codegen

The abi-cafe codegen backends will go out of their way to "remember" that a type alias exists and use it when the alias was specified there. So for instance given this definition:

enum "ComplexLongName" {
    A
    B
}

alias "Clean" "ComplexLongName"

struct "Enums" {
    x "Clean"
    y "ComplexLongName"
}

The Rust backend should initialize an instance of Enums as follows:

#![allow(unused)]
fn main() {
let temp = Enums { x: Clean::A, y: ComplexLongName::B };
}

Is this important?

No.

Am I happy the section is longer than the actual description of alias?

Yes.

I will fight every compiler that doesn't work like this. Preserve my fucking aliases in diagnostics and code refactors, cowards. Yes I will accept longer compile times to get this. Who wouldn't? People who are also cowards, that's who.

pun types

A KDLScript pun type is the equivalent of an ifdef/cfg'd type, allowing us to declare that two wildly different declarations in different languages should in fact have the same layout and/or ABI. A pun type contains "selector blocks" which are sequentially matched on much like CSS. The first one to match wins. When lowering to a specific backend/config if no selector matches, then compilation fails.

Here is an example that claims that a Rust repr(transparent) newtype of a u32 should match the ABI of a uint32_t in C/C++:

pun "MetersU32" {
    lang "rust" {
        @repr "transparent"
        struct "MetersU32" {
            _ "u32"
        }
    }

    lang "c" "cpp" {
        alias "MetersU32" "u32"
    }
}
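On the Rust side of that pun, the claim being made is mechanically checkable: a repr(transparent) newtype has exactly the layout of its single field. A minimal sketch:

```rust
#[repr(transparent)]
struct MetersU32(u32);

fn main() {
    // The newtype and its field agree on size and alignment,
    // which is what lets it pun against C's uint32_t.
    assert_eq!(std::mem::size_of::<MetersU32>(), std::mem::size_of::<u32>());
    assert_eq!(std::mem::align_of::<MetersU32>(), std::mem::align_of::<u32>());
}
```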

Because of this design, the typechecker does not "desugar" pun types to their underlying type when computing type ids. This means [MetersU32; 4] will not be considered the same type as [u32; 4]... because it's not! This is fine because type equality is just an optimization for our transpiler usecase. Typeids mostly exist to deal with type name resolution.

Pun resolving is done as a second step when lowering the abstract TypedProgram to a more backend-concrete DefinitionGraph.

(alias also isn't desugared and has the same "problem" but this is less "fundamental" and more "I want the backend to actually emit a type alias and use the alias", just like the source KDLScript program says!)

The currently supported selector blocks are:

  • lang "lang1" "lang2" ... - matches any of the languages
  • default - always matches

Potentially Supported In The Future:

  • compiler "compiler1" "compiler2" ...
  • cpu ...
  • os ...
  • triple ...
  • any { selector1; selector2; }
  • all { selector1; selector2; }
  • not { selector; }

reference types

The "value" of a KDLScript reference type &T is its pointee for the purposes of abi-cafe. In this regard it's similar to C++ references or Rust references, where most operations automagically talk about the pointee and not the pointer. Using a reference type lets you test that something can properly be passed-by-reference, as opposed to passed-by-value.

Reference types may appear in other composite types, indicating that the caller is responsible for allocating variables for each one and then storing pointers to them in the composite type.

Currently theoretical and probably will never be implemented: When used in the outputs of a function, a reference type is sugar for an out-param that the caller is responsible for allocating and the callee is responsible for initializing. Out-params should appear after all normal inputs but before varargs.
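In Rust terms, the distinction the reference type draws is roughly this (a sketch; abi-cafe generates something more elaborate):

```rust
struct Point { x: f32, y: f32 }

// "Point" as an input: the value itself crosses the call boundary.
fn by_value(p: Point) -> f32 { p.x + p.y }

// "&Point" as an input: only a pointer to caller-owned storage crosses.
fn by_reference(p: &Point) -> f32 { p.x + p.y }

fn main() {
    let pt = Point { x: 1.0, y: 2.0 };
    assert_eq!(by_reference(&pt), 3.0);
    assert_eq!(by_value(pt), 3.0);
}
```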

array types

KDLScript array types like [u32; 4] have the layout/repr you would expect from languages like C and Rust.

But there's a problem with passing them by-value: C is supremely fucking weird about passing arrays by value if they're not wrapped in a struct.

This is actually sugar for pass-by-reference (and largely decays into uint32_t*):

void blah(uint32_t array[4]);

And this doesn't even compile:

uint32_t[4] blah(); // invalid syntax
uint32_t blah()[4]; // valid syntax, but still disallowed

To avoid trying to squish weird square pegs in round holes, passing an array by-value like this in KDLScript should indeed mean passing it by-value! C/C++ backends should simply refuse to lower such a KDLScript program and produce an error. Rust backends are free to lower it in the obvious way. If you want to test the C way, use this:

fn "blah" {
    inputs { _ "&[u32; 4]"; }
}

NOT THIS:

fn "blah" {
    inputs { _ "[u32; 4]"; }
}
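A sketch of the Rust lowering of both forms, which shows why Rust backends have no trouble with either:

```rust
// Bare "[u32; 4]": genuinely passed by value (Rust is fine with this).
fn by_value(array: [u32; 4]) -> u32 {
    array.iter().sum()
}

// "&[u32; 4]": passed by reference, matching what C's decayed form means.
fn by_reference(array: &[u32; 4]) -> u32 {
    array.iter().sum()
}

fn main() {
    let arr = [1, 2, 3, 4];
    assert_eq!(by_value(arr), 10);
    assert_eq!(by_reference(&arr), 10);
}
```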

tuples

KDLScript only has the empty tuple () currently implemented.