Cstructs

misterfish · 1 April 2021 13:29

As part of the message router design, some structures are specified as ‘Cstructs’, which are basically a way of tightly packing bytes together with the layout of a ‘struct’ in C, but in OCaml.

My concern is that it’s not the right time in the project to introduce them: they break type safety, introduce run-time errors and require a lot of mental effort to properly use. Thoughts?

And, can what I’ve written below be improved?

Specifically:

Given this specification from the doc:

typedef uint8_t Curve25519PubKey[32];
typedef uint8_t EdDSASig[64];
typedef uint8_t Blake3Hash[32];

enum MessageType {
  UNICAST = 0,
  MULTICAST = 1,
  CBOR = 2
};

struct MessageHeader {
  uint16_t size;
  uint16_t type;
};

struct UnicastMessage {
  struct MessageHeader header;
  uint32_t flags;
  uint32_t ttl;
  Curve25519PubKey source;
  Curve25519PubKey destination;
  Curve25519PubKey via;
  EdDSASig signature;
};

struct MulticastMessage {
  struct MessageHeader header;
  uint32_t flags;
  uint32_t ttl;
  Blake3Hash seen[4];
  Curve25519PubKey group;
  Curve25519PubKey via;
  EdDSASig signature;
};

This is how I got it working using OCaml’s Cstructs (skipping the enum for now).

  [%%cstruct
  type unicast_message = {
    message_header_size: uint16_t;
    message_header_type: uint16_t;
    flags: uint32_t;
    ttl: uint32_t;
    source: uint8_t [@len 32];        (* Curve25519PubKey *)
    destination: uint8_t [@len 32];   (* Curve25519PubKey *)
    via: uint8_t [@len 32];           (* Curve25519PubKey *)
    signature: uint8_t [@len 64];     (* EdDSASig *)
  } [@@big_endian]]
  
  [%%cstruct
  type multicast_message = {
    message_header_size: uint16_t;
    message_header_type: uint16_t;
    flags: uint32_t;
    ttl: uint32_t;
    seen: uint8_t [@len 128];         (* Blake3Hash[4] *)
    group: uint8_t [@len 32];         (* Curve25519PubKey *)
    via: uint8_t [@len 32];           (* Curve25519PubKey *)
    signature: uint8_t [@len 64];     (* EdDSASig *)
  } [@@big_endian]]

It seems that:

You can’t embed a struct (like message_header) inside another one (unicast_message / multicast_message), leading to repetition.
You can’t create custom types (like curve_25519_pubkey) and use them in a struct, so you have to write it out each time as an array of integers.
Multidimensional arrays like seen are even trickier: they have to be flattened (128 uint8s, in this case).
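
For comparison, here is a minimal sketch of the same multicast layout written by hand with only the standard library's Bytes module (OCaml 4.08 or later), not with the cstruct library. It only illustrates how the header offsets could be written once and reused, how a key could get its own type, and how seen[i] maps onto the flat 128 bytes. The module and function names (Pub_key, Header, Multicast, get_seen) are made up for this example and do not come from the design document.

```ocaml
(* Stdlib-only sketch; this does not use the cstruct library.
   All module and function names here are hypothetical. *)

(* A distinct type for 32-byte public keys. *)
module Pub_key : sig
  type t
  val length : int
  val of_string : string -> t option
  val to_string : t -> string
end = struct
  type t = string
  let length = 32
  let of_string s = if String.length s = length then Some s else None
  let to_string s = s
end

(* Header layout, written once and shared by both message kinds. *)
module Header = struct
  let sizeof = 4
  let get_size buf = Bytes.get_uint16_be buf 0
  let get_type buf = Bytes.get_uint16_be buf 2
end

module Multicast = struct
  let hash_len = 32
  let seen_count = 4
  let off_flags = Header.sizeof                       (* 4 *)
  let off_ttl = off_flags + 4                         (* 8 *)
  let off_seen = off_ttl + 4                          (* 12 *)
  let off_group = off_seen + (seen_count * hash_len)  (* 140 *)
  let off_via = off_group + Pub_key.length            (* 172 *)
  let off_signature = off_via + Pub_key.length        (* 204 *)
  let sizeof = off_signature + 64                     (* 268 *)

  (* seen[i] is the i-th 32-byte slice of the flat 128-byte field. *)
  let get_seen buf i =
    if i < 0 || i >= seen_count then invalid_arg "Multicast.get_seen";
    Bytes.sub_string buf (off_seen + (i * hash_len)) hash_len

  let get_group buf =
    Pub_key.of_string (Bytes.sub_string buf off_group Pub_key.length)
end
```

The accessors copy the bytes they return, so this gives up the zero-copy property; it is only meant to make the offsets explicit.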

tg-x · 1 April 2021 15:48

Thanks for looking into this.

Reasons to use a cstruct header (most low-level internet protocols work this way):

It provides framing in stream-oriented protocols like TCP by specifying the exact length of the message: the size field in the header counts the header itself plus the payload that directly follows it.
It’s simple and fast to parse in most languages (zero-copy).
If a single header field needs to be adjusted, it can be updated without re-serializing the whole data structure.
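
To make the first and last points concrete, here is a rough sketch using only the standard library's Bytes module (OCaml 4.08 or later). It assumes the header layout from the specification above (big-endian uint16 size, then uint16 type) and a ttl field at byte offset 8. The function names split_frames and decrement_ttl are invented for this example, and it copies each frame for simplicity rather than returning zero-copy views.

```ocaml
(* Stdlib-only sketch of length-prefixed framing; names are hypothetical. *)

let header_len = 4   (* uint16 size + uint16 type, big-endian *)
let ttl_offset = 8   (* header (4 bytes) + flags (4 bytes) *)

(* Split a receive buffer into complete messages plus the unconsumed tail.
   The size field counts the header itself plus the payload. *)
let split_frames (buf : bytes) : bytes list * bytes =
  let len = Bytes.length buf in
  let rec go pos acc =
    let remaining = len - pos in
    if remaining < header_len then
      (List.rev acc, Bytes.sub buf pos remaining)
    else
      let size = Bytes.get_uint16_be buf pos in
      if size < header_len then
        invalid_arg "split_frames: size smaller than header"
      else if remaining < size then
        (* Incomplete message: keep it for the next read. *)
        (List.rev acc, Bytes.sub buf pos remaining)
      else go (pos + size) (Bytes.sub buf pos size :: acc)
  in
  go 0 []

(* Update a single field in place; nothing else is re-serialized.
   Raises Invalid_argument if the message is shorter than 12 bytes. *)
let decrement_ttl (msg : bytes) : unit =
  let ttl = Bytes.get_int32_be msg ttl_offset in
  if ttl <> 0l then Bytes.set_int32_be msg ttl_offset (Int32.pred ttl)
```

A caller would append newly received bytes to the returned tail and call split_frames again on the result.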

CBOR has its advantages as well: it may be easier to deal with one kind of serialization in applications, and it can provide an easier way of interacting with the data, though possibly at some cost in performance.
In any case it’s worth considering what a purely CBOR protocol would look like.

CBOR does not have a way to indicate the message size or add framing,
instead CBOR-based protocols can be implemented with streaming parsers, see:
https://tools.ietf.org/html/rfc7049#section-3.1

It’s important here to ensure the security of the parsing and to avoid resource exhaustion attacks, like endless indefinite-length objects or excessive fixed-sized objects.
One measure against this is to not support indefinite-length arrays/objects,
another is to limit fixed sizes as well.
There should also be a maximum message size, limited to the size of a TCP/UDP packet (64K).
The application payload should be represented as a binary string with fixed size,
and parsed only when needed.
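
As a rough illustration of these checks, the sketch below decodes only the head of a CBOR data item (major type plus argument) and refuses indefinite lengths and oversized declared lengths before any payload is read. It uses only the standard library (OCaml 4.08 or later). The limits max_message_size and max_items and the name read_head are assumptions made for this example, not values from the design document, and 8-byte arguments are rejected outright to keep it short, which also rejects valid 64-bit integers and doubles.

```ocaml
(* Stdlib-only sketch; limits and names are assumptions for illustration. *)

let max_message_size = 65_535  (* assumed cap for byte/text string lengths *)
let max_items = 1_024          (* assumed cap for array/map item counts *)

type head = { major : int; arg : int; next : int }

(* Decode the head of the data item at [pos]: the major type (top 3 bits),
   its argument, and the offset of the first byte after the head. *)
let read_head (buf : bytes) (pos : int) : (head, string) result =
  let len = Bytes.length buf in
  if pos < 0 || pos >= len then Error "truncated input"
  else
    let initial = Bytes.get_uint8 buf pos in
    let major = initial lsr 5 and info = initial land 0x1f in
    (* Read an n-byte argument that follows the initial byte. *)
    let take n read =
      if pos + 1 + n > len then Error "truncated input"
      else Ok (read buf (pos + 1), pos + 1 + n)
    in
    let arg =
      match info with
      | i when i < 24 -> Ok (i, pos + 1)
      | 24 -> take 1 Bytes.get_uint8
      | 25 -> take 2 Bytes.get_uint16_be
      | 26 ->
        Result.bind (take 4 Bytes.get_int32_be) (fun (v, next) ->
            if Int32.compare v 0l < 0 then Error "argument too large"
            else Ok (Int32.to_int v, next))
      | 27 -> Error "8-byte arguments not accepted in this sketch"
      | 31 -> Error "indefinite length not accepted"
      | _ -> Error "reserved additional information value"
    in
    Result.bind arg (fun (arg, next) ->
        match major with
        | (2 | 3) when arg > max_message_size ->
          Error "string exceeds maximum size"
        | (4 | 5) when arg > max_items -> Error "too many items"
        | _ -> Ok { major; arg; next })
```

The caller would still need to check that a declared string length fits in the bytes actually received before slicing out the payload.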

Any further insights / considerations regarding how to make secure/performant CBOR-based protocols are welcome. cc @emery

We could try the first implementation the CBOR way if we find a good solution for these issues, especially if it would make it considerably easier.

tg-x · 13 April 2021 07:57

@misterfish I made a CBOR version:

It’s equivalent to the C struct version, now updated in the design document as well.

This needs a streaming parser. Here’s an example that uses Angstrom.

misterfish · 13 April 2021 13:16

Thanks.

I have a partially implemented version of a streaming parser with Angstrom and Lwt – I’ll check your noise library as well.

tg-x · 13 April 2021 20:13

Never mind that example; it uses an older version of Angstrom.
They have since improved the API, and now there’s a choice of a Buffered or Unbuffered interface,
with Buffered being the easier to use.