
Why Zig is Frustrating To Learn

Joseph Montañez
#howto #zig


This was written for Zig 0.13. I cannot promise it will keep working in future versions.

Zig is very promising, and the tooling is far better than what languages like Swift offer outside the Apple ecosystem. Zig is also low level, so basic constructs like a "String" don't exist. The most obvious issue aside: Zig is changing all the time, which makes content written just a year ago too old to apply to the current release. That in itself is frustrating, but it's not my frustrating part. My frustrating part is that it doesn't feel intuitive. Zig forces you to learn a lot upfront, so it's not a language you can learn piecemeal over time. This gives the impression that it is hard to learn, and coming from other languages that is true: Zig is harder to learn than other languages because of the upfront costs and the differences from other languages.

Now before I jump into the topic at hand, I need to bring this up. The "Documentation" is a language reference. It's not there to hold your hand and string together a coherent guide. Nor is it there to explain all the technical aspects of those features.

Strings

OK, enough talk, let's start with the first thing you'll do in Zig: create a "string".

const foo = "Hello";

Such a simple, harmless visual of assigning a string literal to foo. First, everything in Zig has a type, but the very word "type" is a problem here. Before we even understand what foo really is, we have to know the difference between a Type Definition and a Primitive Type. You cannot just say "type" in Zig; on its own it doesn't make sense. So, what is the type definition of foo?

const foo: *const [5:0]u8 = "Hello";

*const [5:0]u8 is visually noisy for what is a "string literal", but remember Zig has no construct of a string. The best way I can explain how "Hello" resolves to this type definition is to look at the ", the double quotes. They are syntactic sugar for the resulting type definition. "Hello" is literally &[5:0]u8{ 'H', 'e', 'l', 'l', 'o' }
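If you'd rather not take my word for the type, you can ask the compiler. This is a minimal sketch, assuming Zig 0.13 and nothing beyond std.debug.print and the @TypeOf / @typeName builtins:

```zig
const std = @import("std");

pub fn main() void {
    const foo = "Hello";

    // Ask the compiler what type it inferred for the literal.
    std.debug.print("{s}\n", .{@typeName(@TypeOf(foo))}); // *const [5:0]u8

    // len counts the 5 letters only; the sentinel byte sits right after them.
    std.debug.print("len = {d}, byte after the end = {d}\n", .{ foo.len, foo[foo.len] });
}
```

Indexing at foo.len is only allowed here because the type carries the :0 sentinel.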

OK, let's break down the type definition.

  • *const - A constant pointer, this points to the letter 'H'

const foo: *const [5:0]u8 = "Hello";
           ^-----------------|

If you don't know what the point of a pointer is, it's to tell your code where the "string" starts. On top of this, the pointer identifier, *, is like an interface or trait. It gives foo a pointer trait/interface. So you have access to foo.* to dereference, or in simpler terms to go directly to the first byte 'H'. The const part reads just like the const on the variable, as if it only meant you cannot have * point anywhere else. What it actually means is that you cannot change any value behind the pointer. To me, this feels broken, so let me show you why... Let's change 'H' to 'W'!

const foo: *const [5:0]u8 = &[5:0]u8{ 'H', 'e', 'l', 'l', 'o' };
foo.*[0] = 'W';

This looks like valid Zig. We are not changing foo, nor are we changing the * pointer. But the compiler will tell you:

error: cannot assign to constant
    foo.*[0] = 'W';
    ~~~~~~^~~

So wait, the const foo must obviously be the reason, right? So let's change const foo to var foo:

var foo: *const [5:0]u8 = &[5:0]u8{ 'H', 'e', 'l', 'l', 'o' };
foo.*[0] = 'W';

error: local variable is never mutated
    var foo: *const [5:0]u8 = &[5:0]u8{ 'H', 'e', 'l', 'l', 'o' };
        ^~~
main.zig:11:9: note: consider using 'const'

So wait, we are not mutating foo? Correct. To mutate foo, you need to reassign it completely, i.e.:

var foo: *const [5:0]u8 = &[5:0]u8{ 'H', 'e', 'l', 'l', 'o' };
foo = "Wello"; // <-- NOW we are mutating foo.

So then the const applies not just to the pointer, but also to the other part of the type definition? My answer is... I think so, because this next thought experiment literally breaks. Let me introduce you to @constCast, which removes const from a pointer.

const foo: *[5:0]u8 = @constCast(&[5:0]u8{ 'H', 'e', 'l', 'l', 'o' });
foo.*[0] = 'W';

OK, so here foo is still const, but we've removed const from the pointer, so now we clearly must be able to modify 'H' into 'W', right? Yeah... but...:

Segmentation fault at address 0xdc26a3
C:\projects\zig-windows-x86_64-0.13.0\lib\std\start.zig:363:53: 0xd424fc in WinStartup (main.exe.obj)
    std.os.windows.ntdll.RtlExitUserProcess(callMain());
                                                    ^
???:?:?: 0x7ff8aa62257c in ??? (KERNEL32.DLL)
???:?:?: 0x7ff8abb2af07 in ??? (ntdll.dll)

This seemingly proves that the *const also applies to the rest of the type definition. Now doing this was obviously stupid, and probably a waste of time, as this is not how you'd ever interact with a "string" in Zig, which doesn't have strings. So the bigger question is: if we are not working with a string, what are we working with!?

  • [5:0] - A Sentinel-Terminated array...

So let me back up a little. There is a software term, C-Strings, which you might see in other languages when you need to pass, say, a Swift string down to a C based function. A C-String is normally associated with a null-terminated string. So when you see "Hello" it's not just 5 bytes long, it's actually 6 bytes, as the last byte is null, or 0. This is a problem in many other programming languages, where you might see functions marked as binary-safe, multi-byte or "UTF8" safe because those "strings" can have null bytes in between data/words. A binary-safe function normally needs you to provide the length of the string, or else it has no choice but to treat it as a "null terminated" string. So the 0 in 5:0 literally means null byte terminated. Now, we have to talk about the [ array ] part.

Just like * gives foo a pointer trait/interface, [] gives foo an array Indexable trait/interface. In other programming languages we see this as an ArrayAccess interface. To be clear, this only gives access to a single array feature in Zig: accessing an array's value by its index. So why can't I write foo[0] instead of foo.*[0]? Well... you can. It's why I talk about these as traits or interfaces. You can do both foo[0] and foo.*[0].
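To see that both spellings read the same byte, here is a small sketch (assuming Zig 0.13; the names via_pointer and via_deref are only for illustration):

```zig
const std = @import("std");

pub fn main() void {
    const foo: *const [5:0]u8 = "Hello";

    const via_pointer = foo[0]; // index straight through the pointer
    const via_deref = foo.*[0]; // dereference first, then index the array

    std.debug.print("{c} {c}\n", .{ via_pointer, via_deref }); // H H
}
```

Reading works either way; writing through either form is still rejected because the data behind the pointer is const.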

I also brought up that the [] annotation only provides the indexable array feature. Arrays generally have other features like iteration. To understand that [] only means indexable and not iterable, I will show you this:

// Not valid Zig, just a demonstration
const mall = [] - Slice
const stall = [*] - Pointer Array
const wall = [5] - Array
const hall = [_] - Array (length inferred)

With each one of those examples you can request an index, i.e. stall[0]. That does not mean you can also iterate over all of them. For example, [*] points to an array, and you might naturally ask how that is different from *. The difference is that you can access indices if you mark it as [*] instead. Now the question: how do you get a pointer array from foo? Let's try....

const foo: *const [5:0]u8 = "Hello";

const foo_p_arr = &foo[0]; // Nope, it creates *const u8
const foo_p_arr = &foo.*[0]; // Nope, it also creates *const u8
const foo_p_arr: [*]u8 = @ptrCast(&foo.*[0]); // Nope, @ptrCast cannot discard const.
const foo_p_arr: [*]u8 = @constCast(&foo); // Nope, this points at the pointer foo, not at the bytes

// And The Winning Solution...
// (the type keeps the :0 sentinel so that {s} knows where the bytes end)
const foo_p_arr: [*:0]u8 = @constCast(&foo.*);

std.debug.print("Single: {c} Word: {s} \n", .{ foo_p_arr[0], foo_p_arr });
// Prints: "Single: H Word: Hello"

OK, why go into this point at all? Why talk about something I'll not need right now? Well... it's absolutely important because it drives home that [] means indexable AND it's also a fundamental part of slices, which we need in order to work with "strings".

const foo: *const [5:0]u8 = "Hello";
const foo_slice: [:0]const u8 = foo[0..5]; // the literal is const, so the slice must be too
const foo_p_arr: [*:0]const u8 = foo_slice.ptr;

std.debug.print("Single: {c} Word: {s} \n", .{ foo_p_arr[0], foo_p_arr });

foo_slice.ptr is the pointer array that the slice of foo uses. OK, but WHY are you showing all this nonsense!? OK... step back, and let's try the same thing but without the :0 sentinel value stuff.

const foo: *const [5]u8 = "Hello";
const foo_slice: []const u8 = foo[0..5];
const foo_p_arr: [*]const u8 = foo_slice.ptr;
std.debug.print("Single: {c} Word: {s} \n", .{ foo_p_arr[0], foo_p_arr });

YES! It didn't have to be that hard now, did it!?

error: invalid type given to std.mem.span: [*]const u8
                .Many => if (ptr_info.sentinel == null) @compileError("invalid type given to std.mem.span: " ++ @typeName(T)),
                                                        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
note: called from here
pub fn span(ptr: anytype) Span(@TypeOf(ptr)) {
                          ~~~~^~~~~~~~~~~~~~

And this is why you also have to understand Sentinel-Terminated Arrays/Slices. Zig is frustrating to learn because of the upfront costs of doing the most basic things. You'll start trying to interact with these types and the compiler errors won't help you understand the problem. Even if you read the docu.. ermm, Language Reference, these concepts are disjointed, and you may not even know what you need to look up, since some of these compiler errors don't point obviously at the specific topic. But I digress, it's not the point of the language reference to help you here. Let's review where this takes us for a working solution.

const std = @import("std");

pub fn main() !void {
  // var foo = [5:0]u8{ 'H', 'e', 'l', 'l', 'o' };
  var foo = "Hello".*;
  foo[0] = 'W';

  std.debug.print("{s}", .{foo}); // Prints "Wello"
}

Now if you have been keeping up, you would look at that and say, that doesn't work. We tried foo.*[0] = 'W'; already and it either failed to compile or crashed the program. Well, the trick is the assignment. Assigning foo.*, or better yet "Hello".*, to a var creates a mutable copy. We never really have direct access to the original string literal "Hello". This is still technically less efficient than writing out [5:0]u8{ 'H', 'e', 'l', 'l', 'o' }, as that form makes no copy.
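To convince yourself that the literal is left alone, here is a short sketch (assuming Zig 0.13; original and copy are illustrative names) that prints both after the change:

```zig
const std = @import("std");

pub fn main() void {
    const original = "Hello"; // pointer to the read-only literal
    var copy = original.*; // [5:0]u8 copied into a local, mutable array
    copy[0] = 'W';

    std.debug.print("{s} {s}\n", .{ original, copy }); // Hello Wello
}
```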

Try/Catch

Let's get user input! Finally, a real program.

const std = @import("std");

pub fn main() !void {
    const output = std.io.getStdOut().writer();
    const input = std.io.getStdIn().reader();

    //-- Lets ask for their name
    var name = [_]u8{0} ** 30;
    try output.print("Please enter your name (Max 30 characters)\n", .{});

    // readUntilDelimiter returns a slice of name holding what was read
    const name_slice = try input.readUntilDelimiter(&name, '\n');

    var msg_str = [_]u8{0} ** 100;
    const msg_slice = try std.fmt.bufPrint(&msg_str, "Hello {s}", .{name_slice});
    try output.print("{s}\n", .{msg_slice});
}

This shows many other concepts. First, what is the exclamation point in !void? What is [_]u8{0} ** 30 doing? Why is there a try without a catch? Let's first touch on variable initialization.

A variable must always be initialized in Zig, with undefined and null allowing you to bypass this. However, I would argue, especially with undefined, never use it unless you know you need to. This is why I have var name = [_]u8{0} ** 30; instead of var name: [30]u8 = undefined;. They are not the same thing but still do the same job. For example, if you print the name while it's undefined, you'll get a weird character displayed repeatedly.

var msg_str: [30]u8 = undefined;
msg_str[10] = 'h';
std.debug.print("{s}\n", .{msg_str});
// Prints: ¬¬¬¬¬¬¬¬¬¬h¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬¬ ...

The [_]u8{0} ** 30 part simply takes 0, an empty character, and repeats it 30 times to fill up the entire array with a blank string. The [_] part is a comptime feature that I will not get into, but here it's the same as [30] because the length is inferred to be "30". Lastly, ** is how you multiply an array, just like ++ is how you add arrays. Since a "string literal" is just syntactic sugar, "Hello" ++ " " ++ "World" is how you add strings, because they are arrays, except we are not there yet.
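A quick sketch of both operators, assuming Zig 0.13 and compile-time known operands (zeros and greeting are illustrative names):

```zig
const std = @import("std");

pub fn main() void {
    const zeros = [_]u8{0} ** 4; // same as [4]u8{ 0, 0, 0, 0 }
    const greeting = "Hello" ++ " " ++ "World"; // joined at compile time

    std.debug.print("{d} zero bytes, then \"{s}\" ({d} bytes)\n", .{ zeros.len, greeting, greeting.len });
}
```

Both ** and ++ only work on values known at compile time, which is why they cannot join a string the user types in.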

Now, the !void. In most other languages this would mean the function returns nothing, void, but may also throw. However, you do not throw in Zig, you simply return the error as a value. So technically the ! makes a union, an error union, and at comptime (just ignore this part for now) the compiler works out the associated error set, meaning all possible errors returned from the other functions you use with try or catch, plus errors that you manually return. For example error{StreamTooLong}!void. Tracking by hand which functions return which errors is a nightmare, so it's easier to just use !void. Another part to this is that in Zig, you can literally return a switch of type definitions.

Lastly, try and catch? This is a part of Zig I do not like; it's not try and catch like in any other programming language I have used. First, try will stop the execution of your function and immediately return if an error is hit. With catch, what happens depends on the form you use. There is catch {}, catch switch, catch <type>.

If you want to control this, you do not use try or catch {}; you can use if instead, or catch <type>. On top of that, try and catch are more like redirection in F#, OCaml, etc. The better visual of this is:

// Not Zig but a demonstration of what makes sense to me
const name_slice: []u8 = try <| input.readUntilDelimiter(&name, '\n');
const name_slice: []u8 = input.readUntilDelimiter(&name, '\n') |> catch{};
// <| and |> are the pipe operators in F#

Here you are taking the output and redirecting it to try. If try does not receive an error from readUntilDelimiter, it forwards the result. If an error is returned, it immediately returns with the error, stopping the rest of the code in the function from executing. This is why you cannot use try AND catch together: there would be nothing left to catch if you used try. Now the reverse also holds for catch:

// try input.readUntilDelimiter(...) is shorthand for catch |err| return err; here we return without the error
const name_slice: []u8 = input.readUntilDelimiter(&name, '\n') catch {
    return;
};

In this example we throw away the error, and are still forced to return, because the catch block has to either produce a value for name_slice or leave the function. What you can return is the error or nothing (void), hence the !void on the outer function fn main() !void. In fact, you can write catch |err| return err and it's the same as try. try is syntactic sugar.
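To make the sugar concrete, here is a small sketch assuming Zig 0.13. The functions half, withTry and withCatch are hypothetical helpers made up for this example; the point is only that the two bodies behave the same:

```zig
const std = @import("std");

// Hypothetical helper for this sketch: fails on odd numbers.
fn half(n: u32) error{Odd}!u32 {
    if (n % 2 != 0) return error.Odd;
    return n / 2;
}

fn withTry(n: u32) error{Odd}!u32 {
    const h = try half(n);
    return h + 1;
}

fn withCatch(n: u32) error{Odd}!u32 {
    const h = half(n) catch |err| return err; // what try expands to
    return h + 1;
}

pub fn main() void {
    // Both functions agree on success (3) and on failure (fallback 0).
    std.debug.print("{d} {d}\n", .{ withTry(4) catch 0, withCatch(4) catch 0 });
    std.debug.print("{d} {d}\n", .{ withTry(3) catch 0, withCatch(3) catch 0 });
}
```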

Great, but I didn't need to know any of this just to capture user input, you're making this harder than it needs to be. Ah, well, that is where this plays a critical role. What if the user enters more than 30 characters as their name? The program returns the error from main and halts. We really don't want this...

Now that we have all this context, let's fix this. We cannot use try or a bare catch here, because we want to change what happens when the user causes an error: we want to tell them. Let's review what we need to change:

// Before: try returns from main as soon as the read fails
var name = [_]u8{0} ** 30;
try output.print("Please enter your name (Max 30 characters)\n", .{});
const name_slice: []u8 = try input.readUntilDelimiter(&name, '\n');

// After: if/else unwraps the result so we decide what happens on an error
var name = [_]u8{0} ** 30;
try output.print("Please enter your name (Max 30 characters)\n", .{});
var name_slice: []u8 = name[0..0]; // start with an empty slice of the buffer

if (input.readUntilDelimiter(&name, '\n')) |slice| {
    name_slice = slice;
} else |_| {
    try output.print("Your name is too long!\n", .{});
    return;
}

Here we have more control now. It's like a catch {}/catch switch, but we can decide to continue or to exit the function; we are not forced to only exit. OK, but minor issue, why the fuck are we doing |slice| { and else |_| {? Ah! Good observation. Coming from other languages, this would be like unwrapping an optional, or nullable variable. Think bigger: yes, you do this with optional values in Zig, but remember ! is a union. Well, what do you think ?c_int does? ? is syntactic sugar, and while it is not the same, you can think of it as error{Null}!c_int. And so you are not unwrapping an optional, you're unwrapping a union. Let's think about this a bit then... If our main function can return !void, can we... do the same with our variables? Yes-ish, but let's try!:

const name_slice: ![]u8 = input.readUntilDelimiter(&name, '\n');

error: expected type expression, found '!'
    const name_slice3: ![]u8 = input.readUntilDelimiter(&name, '\n');

So that logic didn't pan out, and it feels inconsistent. However, the idea is still valid, we just cannot use the implicit comptime version. Instead, the code needs to be more explicit, so the following does work:

const slice: anyerror![]u8 = input.readUntilDelimiter(&name, '\n');
// Or
const slice: error{
    InputOutput, AccessDenied, BrokenPipe, SystemResources,
    OperationAborted, WouldBlock, ConnectionResetByPeer,
    Unexpected, IsDir, ConnectionTimedOut, NotOpenForReading,
    SocketNotConnected, EndOfStream, StreamTooLong,
}![]u8 = input.readUntilDelimiter(&name, '\n');

Now a problem with doing this is that it forces Error Unions to be handled manually. The error and the payload both live inside slice, but fields like slice.anyerror / slice.err are not directly accessible to you. So we didn't have to use catch or if right away... but we can still only use catch or if (while too) to filter / access the union values, meaning we cannot explicitly unwrap. So let's explore how we'd work with these values:

// Here we explicitly ignore anyerror and only capture a slice, with a fallback
// empty slice if there was an error. The fallback "" is a constant, so the
// result is a []const u8.
const slice_err_u: anyerror![]u8 = input.readUntilDelimiter(&name, '\n');
const slice: []const u8 = slice_err_u catch "";
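The capture syntax is the same for optionals and error unions, which is easier to see side by side. A minimal sketch, assuming Zig 0.13; parseDigit is a hypothetical helper invented for this example:

```zig
const std = @import("std");

// Hypothetical helper for this sketch.
fn parseDigit(c: u8) error{NotADigit}!u8 {
    if (c < '0' or c > '9') return error.NotADigit;
    return c - '0';
}

pub fn main() void {
    // Optional: else takes no capture, there is only "null".
    const maybe: ?u8 = null;
    if (maybe) |value| {
        std.debug.print("optional holds {d}\n", .{value});
    } else {
        std.debug.print("optional is null\n", .{});
    }

    // Error union: else captures which error it was.
    if (parseDigit('x')) |value| {
        std.debug.print("digit {d}\n", .{value});
    } else |err| {
        std.debug.print("failed with {s}\n", .{@errorName(err)});
    }

    // catch with a fallback value unwraps in a single expression.
    const digit = parseDigit('7') catch 0;
    std.debug.print("digit {d}\n", .{digit});
}
```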

OK, back to why you should understand this. You're giving me more information than I need! Well, the code speaks for itself:

var name = [_]u8{0} ** 30;
try output.print("Please enter your name (Max 30 characters)\n", .{});

var name_slice: []u8 = name[0..0]; // empty slice of the buffer to start with
if (input.readUntilDelimiter(&name, '\n')) |slice| {
    name_slice = slice;
} else |err| {
    switch (err) {
        error.StreamTooLong => {
            try output.print("Sorry the name you entered is too long!\n", .{});
        },
        else => {
            try output.print("Did you press control+c on us?\n", .{});
            try output.print(":( Goodbye!\n", .{});
            return;
        },
    }
}

Here you see we unwrap the union, and then use a switch statement to further filter out the other possible errors. But this feels verbose, and we can take what we learned and distill it down to just this:

const std = @import("std");

pub fn main() !void {
    const output = std.io.getStdOut().writer();
    const input = std.io.getStdIn().reader();

    //-- Lets ask for their name
    var name = [_]u8{0} ** 30;
    try output.print("Please enter your name (Max 30 characters)\n", .{});

    // Filter errors down to just StreamTooLong; anything else is returned from main
    const result: error{StreamTooLong}![]u8 = input.readUntilDelimiter(&name, '\n') catch |err| switch (err) {
        error.StreamTooLong => error.StreamTooLong,
        else => return err,
    };
    const name_slice = result catch {
        try output.print("Sorry the name you entered is too long!\n", .{});
        return;
    };

    // Generate the greeting message
    var msg_str = [_]u8{0} ** 100;
    const msg_slice = try std.fmt.bufPrint(&msg_str, "Hello {s}", .{name_slice});
    try output.print("{s}\n", .{msg_slice});
}

That's it... Did I need to know all of this?

  • String Literals
  • Const/Var
  • Variable Initialization
  • Pointers
  • Arrays
  • Indexable
  • Sentinel-Terminated Arrays & Slices
  • Try/Catch/If
  • Error Unions

Overall, Zig is interesting as a language, but coming from other languages it's hard to accept that it has deeper concepts you need to understand upfront, or else you just start guessing at what the problem is.