Message Routing Design

The document is available at:

Shape Up meeting with @misterfish @tg-x @how on 2021-01-29T09:00:00Z.

This document seems to describe the “Message Router”. What I am missing is the context. Why is this “Message Router” needed?

The title of the document is “Software Architecture Design”? What problem does this software architecture design solve?

I have worked out the message routing design in more detail for both unicast and multicast messages,
and the message passing interface between services.

I’m in the process of documenting these in detail
and posted a work-in-progress update in the git repo.
It still needs further refinements and a couple more message type definitions.

the message router is necessary to route unicast and multicast messages in a message passing architecture, and is the component responsible for maintaining outgoing connections to remote nodes

i have addressed this in the design document

I have published an update to the design document with more details and protocol specifications,
addressing questions that were raised.

Here are some reading/editorial notes…

Abstract

We present a design for a multicast message routing infrastructure to be used together with decentralized publish-subscribe protocols in a two-tier P2P system, where core and edge networks run separate P2P protocol instances, and core nodes provide store-and-forward message delivery service to edge nodes.

This is unclear. A “protocol instance” does not seem to parse in my mind. Do you mean that they run on different protocols or that their networks are distinct? If the latter, then it could be rephrased:

We present a design for a multicast message routing infrastructure to be used together with decentralized publish-subscribe protocols in a two-tier P2P system: a core network runs permanently and core nodes provide store-and-forward message delivery to edge nodes running on the intermittent edge network.

This sentence clarifies the main difference between core and edge IMO. But since the Design Overview section mentions

both the core and edge networks run distinct P2P dissemination protocols

then I suppose it would read better as:

We present a design for a multicast message routing infrastructure to be used together with decentralized publish-subscribe protocols in a two-tier P2P system: a core network runs permanently and core nodes provide store-and-forward message delivery to edge nodes running on the intermittent edge network. Both the core and edge networks run distinct P2P dissemination protocols tailored to their connectivity.

Also, it would be useful to settle on a single way to mention publish-subscribe: I’ve seen all kinds of notations: publish/subscribe, publish-subscribe, pubsub, PubSub, pub/sub, etc. Since you’re using pub/sub later in the document maybe change publish-subscribe to publish/subscribe (pub/sub) in the Abstract for consistency.

Maybe the Abstract could be expanded a bit with a second paragraph to indicate the benefits of the multicast message routing protocol.

Introduction

DREAM uses a pub/sub service for the decentralized synchronization of mergeable, replicated databases with eventual consistency properties. Replication is done using operation-based CRDTs where operations are transmitted via pub/sub to subscribers who replicate the database locally.

Maybe: for the decentralized and uncoordinated synchronization of…

Add an introduction paragraph about DREAM to specify where this comes in.

Related Work

IP multicast was designed as

Add a reference to IP multicast (RFC 1112).

since IP multicast is not routed by the internet backbone due to scalability and security limitations.

[citation needed] :slight_smile:

Application-layer publish/subscribe systems

Can you add their limitations/constraints compared to your approach?

Broker-based approaches decouple publishers and subscribers and take over responsibility for delivering messages from publishers.

Why is it undesirable? Actually it’s not clear whether this is a wanted property or not, since you do not relate it to your approach at this point. I recommend having a look at the related work section of the DMC specification, which brings a nice approach: what → why is it related → limitations/differences. You’re missing the last part.

The next paragraph starts with the limitation (‘always-on connectivity’). I would simply drop “real-life” and “actual”.

Design Overview

The incentives for core nodes to provide such a service are based on a traditional service provider model

This paragraph is very informative. It begs for exploring the business incentives for ISPs to run core nodes. This should be part of work in Cycle 6-8 when we start thinking about future sustainability.

This is made possible by relying on public key addressing

Maybe cryptographic public key addressing

I’m eager to write a publication titled Post-DNS Internet subtitled For an open Internet without kings. Hmmm KING should be an interesting acronym to deploy. Kill Internet Network Goons.

Public key addressing makes it possible for multicast routers to forward only signed messages sent by authorized senders, without having to maintain group membership state for the whole group on each message router. This is important to prevent denial of service attacks trying to overload the multicast dissemination infrastructure.

Excellent feature! Is it too early to mention ERIS?
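
The quoted mechanism lends itself to a small illustration. The following C sketch is not from the design document: every name (`AuthorizedSenders`, `should_forward`), the 32-byte key size, and the `signature_valid` flag (a stand-in for real signature verification, e.g. Ed25519) are assumptions made for illustration only. It shows how a router can forward signed multicast messages while storing just the authorized sender keys, not the whole group membership:

```c
#include <stdbool.h>
#include <string.h>

#define KEY_LEN 32            /* assumed key size, e.g. an Ed25519 public key */
#define MAX_SENDERS 8

typedef struct {
    unsigned char sender_key[KEY_LEN];
    bool signature_valid;     /* stand-in for real signature verification */
} Message;

typedef struct {
    unsigned char keys[MAX_SENDERS][KEY_LEN];
    int count;                /* only the authorized senders, not the group */
} AuthorizedSenders;

/* Forward a multicast message only if it is signed by an authorized
 * sender; no per-group membership state is needed on the router. */
bool should_forward(const AuthorizedSenders *auth, const Message *msg)
{
    if (!msg->signature_valid)
        return false;
    for (int i = 0; i < auth->count; i++)
        if (memcmp(auth->keys[i], msg->sender_key, KEY_LEN) == 0)
            return true;
    return false;
}
```

The point of the sketch: the per-router state is proportional to the number of authorized senders rather than the group size, which is what limits the denial-of-service surface.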

Software Architecture

This architecture offers fault isolation between components as well as establishes clear security boundaries, provided the services are run in securely sandboxed environments.

I can imagine it refers to the GNUnet paper, but is there a more precise reference that we can use to understand the scope of the problem and what is meant by “securely sandbox[ing] [an] environment”? I guess that would help bridge with SHRUTHI since you mention unikernels in the following paragraph, so I think it’s a proposed solution to that problem.

Services and Applications

Services and applications can be run in various ways: as lightweight VMs when running the system on dedicated servers, while end-user devices would run sandboxed processes instead.

It seems that in any case the resource requirements are pretty low. Can you make an estimate of how it compares to other approaches? For example, I would expect that it takes much less storage space than a blockchain, and probably less processor power as well on modern processors that implement cryptographic primitives. I bet this consideration can wait for actual implementations where we can test large deployments on the European test bed.

Authentication and Encryption

Messages are addressed using public keys and authenticated with a signature. Services authenticate each other using public keys that they either have stored in their configuration or learned via P2P protocols, without the use of a public key infrastructure and certificate authorities.

  1. I’m curious about this signature and compatibility with the ones from RDF-Signify
  2. Any reference for this feature? Still GNUnet? More?

End-to-end Security for Groups

The first reference seems like a required reading for the whole team.

This group encryption scheme relies on a causal broadcast primitive that the underlying p2p pub/sub infrastructure provides, and implements a decentralized membership management protocol.

This begs for a reference to another section…

This whole section sounds like a real improvement over existing approaches. I think it would be nice to dedicate an article to this, cramming it with references and comparing with other approaches, including ways for others to hop on the bandwagon.

Message Router

The message router service is responsible for public key-based routing of unicast and multicast messages between services.

:+1:

Unicast & Multicast Messages

For example, the pub/sub service uses multicast to deliver messages to locally subscribed applications, while the peer discovery service sends updates about discovered peers that the message router and pub/sub service subscribe to.

Still using Multicast? Maybe sends multicast updates.

multicast messages are signed by the private key that belongs to the group’s public key address.

This is confusing. Is it the private key part of the group’s key pair whose address is the public key? (Not sure if this makes more sense…)

Decentralized publish-subscribe is implemented by the pub/sub service on each node. It is responsible for the dissemination of multicast messages published in a topic according to a P2P dissemination protocol.

This is the first time the term topic is mentioned, without prior context or definition.

In order to implement P2P dissemination, the pub/sub service forwards each message received in a topic to a number of directly connected remote subscribers according to the P2P protocol used

This whole paragraph is a bit confusing since there are different protocols involved. I suppose it depends on whether the considered node is part of the core or edge network: the dissemination protocol then works differently. However, leaving this ambiguity to the external reference makes me wonder whether another approach to explaining this would be beneficial to the reader.

UPSYCLE: Ubiquitous Publish-Subscribe Infrastructure for Collaboration on Edge Networks shows the complexity of the dissemination. We probably need an in-depth look into this in order to come up with a simpler explanation. Maybe another article along the way. I can imagine this part will be a bump in the road for developers: why so many cases? Can’t we just… The topic needs a bit more expansion.

it may also provide service to remote nodes based on a mutual agreement.

Second time there’s a mention of “a mutual agreement”. Is it the same agreement as the previous mention? Is it a legal agreement like a contract or a technical one like a handshake? Is it something to be configured or automated? What does it entail in terms of implementation?

Connection Establishment

We use TLS 1.3 with mutual public key authentication.

Is this part of the ‘mutual agreement’?
Does this refer to https://tools.ietf.org/html/rfc8446#ref-CHHSV17 or more generally to RFC8446 client-server mutual authentication? A reference or precision would be helpful.

If the connection is closed, the message router queues messages for a limited amount of time, and delivers them when the connection is back up.

What happens if the connection does not come back up in time? Are messages discarded?
What makes a good “limited amount of time”? This may be useful for implementation.
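
For implementers, the queueing behaviour in the quoted sentence could look something like this sketch. The names, the capacity, and the 60-second TTL are invented for illustration; the document leaves the actual limits unspecified, which is exactly the question above:

```c
#include <stdbool.h>
#include <stdint.h>

#define QUEUE_CAP 16              /* illustrative capacity */
#define QUEUE_TTL_SECONDS 60      /* illustrative "limited amount of time" */

typedef struct {
    uint64_t enqueued_at[QUEUE_CAP];  /* timestamps of queued messages */
    int count;
} OfflineQueue;

/* Queue a message while the connection is down. */
bool queue_push(OfflineQueue *q, uint64_t now)
{
    if (q->count >= QUEUE_CAP)
        return false;             /* full: drop or apply back-pressure */
    q->enqueued_at[q->count++] = now;
    return true;
}

/* On reconnect, deliver what is still fresh and drop the rest.
 * Returns the number of messages that were still deliverable. */
int queue_flush(OfflineQueue *q, uint64_t now)
{
    int delivered = 0;
    for (int i = 0; i < q->count; i++)
        if (now - q->enqueued_at[i] <= QUEUE_TTL_SECONDS)
            delivered++;          /* expired entries are silently dropped */
    q->count = 0;
    return delivered;
}
```

In this sketch expired messages are silently dropped on reconnect; whether that, or a notification to the sender, is the right behaviour is one of the open questions.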

Message Header

Great to have this section!

typedef uint8_t Blake3Hash[32];

Any reason to choose Blake3Hash over Blake2b that’s used in ERIS? Should ERIS upgrade or the Message Router downgrade?

enum MessageType {
 UNICAST = 0,
 MULTICAST = 1,
 CBOR = 2
};

I’m a bit surprised here. I dig UNICAST vs. MULTICAST, as they’re alternative delivery mechanisms, but it seems that CBOR is a different beast entirely. Can someone explain? Let me try intuitively. A CBOR-serialized message would simply be a serialized UNICAST or MULTICAST message. :thinking:

Next section mentions

For application messages in the payload, we define a third message type, CBOR.

Now I’m lost.

Message Payload

The message payload either contains another encapsulated unicast or multicast message, or an application-level message that services and applications send to each other.

Can an encapsulated uni/multi-cast message be itself encapsulated? How far down the turtle stack?

OK, I think I’m a bit burned for today. I’ll look into the rest of the document later. It’s a lot to eat at once. I wish we could have it in chunks instead of all at once. But it’s quite impressive @tg-x. I’m eager to see this document with examples as they come. @misterfish, @arie, your turn. :slight_smile:

Thanks for your comments.

This is explained here, but also without citations; if you know a citable source, let me know.

perhaps referencing some relevant section of the rfc should be sufficient.

One project I know of in this space is librecast (also NGI0-funded)
that is trying to address some multicast deployment issues, but they don’t have much documentation either.

That would be indeed interesting/relevant.

Mutual agreement in the sense that the service provider agrees to provide service to a specific client, as opposed to open service for all.
In the technical sense it’s just the matter of authorizing the client’s public key by the provider.
In the non-technical sense any kind of agreement, be that a commercial service provider, community service provider, friends sharing their servers, etc.

yes, good point, limitations are mainly the lack of public key addressing and thus limited routing possibilities that require more trust between nodes

it’s not undesirable, I’ll clarify

  1. i’m not familiar with rdf-signify, any pointers?
  2. not necessarily gnunet but a generic property of public-key addressing

How would you mention ERIS in this context?

This is related to the referenced DSGM paper.
Or what section do you mean? inside this document?

Yes, I’m going to start implementing this in NGI0,
and later we could evaluate and research it further

that is to differentiate the naming on different layers, perhaps it warrants a glossary somewhere…
pub/sub topics correspond to multicast groups, but these are different layers

I mean using client authentication as well in addition to server-only authentication:
https://tools.ietf.org/html/rfc8446#section-4.6.2

This is to ensure that both parties learn each other’s long-term public key, and thus peer identity,
during the connection establishment.

That’s on a different level, but yes, any kind of authorization for using services is then based on the public key of the requestor, which is established using TLS client authentication.

This is to say: in the routing header only UNICAST and MULTICAST are valid,
while the payload could also contain an application payload using CBOR encoding, in addition to encapsulated unicast & multicast messages.
will clarify.

In the initial design a single level of encapsulation is considered, but this approach allows the possibility for later extension.
In the future we may want to explore multi-hop source routing with onion encryption, as some literature for p2p search & discovery suggest.
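
Based on these clarifications (header types restricted to UNICAST/MULTICAST, CBOR only as a payload type, and a single level of encapsulation for now), a validity check could be sketched as follows. The function names and the depth constant are illustrative assumptions, not part of the specification:

```c
#include <stdbool.h>

enum MessageType { UNICAST = 0, MULTICAST = 1, CBOR = 2 };

#define MAX_ENCAPSULATION_DEPTH 1   /* single level, per the initial design */

/* In the routing header, only UNICAST and MULTICAST are valid. */
bool valid_header_type(enum MessageType t)
{
    return t == UNICAST || t == MULTICAST;
}

/* A payload is either an application message (CBOR), or one
 * encapsulated unicast/multicast message, never nested deeper. */
bool valid_payload_type(enum MessageType t, int depth)
{
    if (t == CBOR)
        return true;
    return valid_header_type(t) && depth < MAX_ENCAPSULATION_DEPTH;
}
```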

Blake3 is quite recent, and seems much faster. We should use this in the long run for both I think. We would need to add blake3 support to digestif (a binding to the portable C implementation), which should be easy, but if we’re out of time we can use blake2b in the initial prototype.

I’ll address these and the rest in an update to the design doc.

our turn to provide feedback you mean?

This document is changing a lot. It would help me to know which parts are stable before trying to give feedback.

that was a major update that you can give feedback on already
(and you already did about the cbor parts which was useful)
next I’m going to work on clarifications and refinements

Some initial notes and comments.

A decentralized, asynchronous publish/subscribe system is a crucial component of any decentralized communication infrastructure.

[citation needed]

Why? And why not some other primitive such as message passing or UNIX sockets?

Software architecture
both local and remote, communicate with each other via message passing.
such a model with message passing between services has been used by the GNUnet P2P framework

I think GNUnet is a counter-example to the proposed architecture. IMHO GNUnet does not work and I think a lot of it has to do with the architecture (message passing between individual services).

I’m sold on the encapsulation, isolation of components and the actor model. But what you describe (and what I think GNUnet is) sounds more like “microservice-hell”. But I am very happy to be proven wrong on this.

Unikernels are specialized, single-address-space machine images constructed by library operating systems that can be run as lightweight virtual machines or as sandboxed processes

How many services (each running in a Unikernel) are you thinking of running on a single physical machine? 10 or 10’000? Is running a Unikernel really so lightweight?

Unicast & multicast messages

Unicast messages are sent between local services and between services of directly connected remote nodes. The message router is responsible for forwarding incoming unicast messages to their destination according to its routing table.

Nice. Ok, that helps me understand the difference between message passing and pub/sub. Unicast messages correspond to message passing and multicast corresponds to pub/sub. Is that about right?

That’s referring to the p2p network context, not the local system.
I was referring to the many uses of pub/sub in general, and that placed in a decentralized context.
I’ll clarify this in the text.

can you elaborate?
what makes you think message passing makes it not work?
the actor model relies on message passing as well.

erlang/otp is another example of a system relying on message passing extensively.

as for gnunet, issues I see are e.g. the low-level cstruct interface for encoding messages, which makes things more complicated than necessary, especially from languages that are not C,
but this is not the fault of message passing per se.
another issue there is reinventing many things in a non-standardized way,
e.g. the encrypted transport between two nodes, or a systemd-like service supervisor, etc.
(to be fair, gnunet had these before the standardized or more widely used alternatives existed)
and non-technical issues include the academic environment and development model it originated from, with components developed via short-term student work, resulting in many abandoned parts

at what point do you think it becomes microservice-hell?
it does not necessarily have to be “micro” services; I’m not trying to define the granularity here,
we can figure out what makes sense to package together in one unikernel.
in any case we do have a number of self-contained high-level services
(e.g. peer sampling, pub/sub, kv store, dmc)

depends on the context: whether it’s providing service to only a handful of users
(e.g. a small vm or a low-power device)
or intends to serve as many users as possible (server in a datacentre).
I don’t have first-hand numbers on scalability,
but hannes gave a talk (slides attached there) about mirage and resource usage recently with some examples

@dvn any thoughts / opinions on this?

yes

Since you can also run “unikernels” as POSIX binaries, I assume 10,000 is not unreasonable. Scalability when running as VMs is likely very dependent on the hypervisor backend.

Does this need to be answered now though? I personally don’t think so. I think the more important question to answer is “What benefits do Unikernels provide which make them worth the additional effort?”

Questions from DREAM Cycle 2 Meeting 2/3 2021-03-03.

I addressed these in the design doc; here’s a summary based on that.

If there are other questions elsewhere that I should answer, please link them here.

1-to-many pub-sub services

The issue with MQTT is that it is simply a one-to-many dissemination protocol,
while we also need routing capabilities to deliver messages coming from different group members.
Public key addressing and routing of signed messages makes it possible to route messages to other group members in the network, both locally and via the p2p pub/sub protocols.

lan peer-discovery

The reason is the request-response model these protocols operate by,
which is designed with a handful of services on the network in mind that need to be discovered instantaneously.
What we need in a p2p model instead is the ability to discover peers over time, but not necessarily all at once; thus periodic announcements spread over time will be more resource-efficient, especially in larger networks. Just think of thousands of nodes having to reply to a query all at once if it were using DNS-SD over mDNS or SSDP over HTTPU (HTTP over UDP).
In addition, by reusing the same CBOR-based message passing interface we use in other parts of the system, we reduce complexity and potential vulnerabilities, and make implementation easier since we’re time-constrained.
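
As an illustration of spreading announcements over time, here is a deterministic sketch; the 300-second base interval, the derivation of a per-peer offset from an id, and all names are assumptions for illustration (a real implementation would likely use random jitter):

```c
#include <stdint.h>

#define BASE_INTERVAL_SECONDS 300   /* illustrative announcement period */

/* Derive a per-peer offset in [0, BASE_INTERVAL_SECONDS) from some
 * stable peer identifier (here a 64-bit id standing in for a
 * public key hash). */
uint64_t announce_offset(uint64_t peer_id)
{
    return peer_id % BASE_INTERVAL_SECONDS;
}

/* Next announcement: the next base-interval boundary, shifted by this
 * peer's offset, so peers never all announce at the same instant. */
uint64_t next_announcement(uint64_t now, uint64_t peer_id)
{
    uint64_t cycle = (now / BASE_INTERVAL_SECONDS + 1) * BASE_INTERVAL_SECONDS;
    return cycle + announce_offset(peer_id);
}
```

Contrast this with the request-response model above, where every node answers the same query at the same moment.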

An aspect that will need further research later is potential blocking or filtering of multicast traffic on WLAN networks, where multicast is problematic and different equipment vendors have different ways of dealing with it.

gossip-based clustering

I clarified the separation more precisely in this document:
the message router is the task for DREAM and the pub/sub service for NGI0,
with basic pub/sub functionality already usable on a LAN by using multicast messages routed by the message router.

Yes I think that’s more important to focus on.
What would you add to the already mentioned security & isolation benefits?

I’m not questioning message passing itself. I’m questioning the efficiency of message passing between components.

In the mentioned examples, components are very different things:

  • Erlang/OTP: Components are light-weight co-routines (in a single Unix process and much lighter than threads). The Erlang VM is optimized to start massive amounts of them and make message passing between them efficient.
  • GNUnet: Processes. Starting up processes is less efficient and message passing is not super fast.
  • What you are proposing: Unikernels. I guess starting up is at best as efficient as processes and at worst much worse. Ditto message passing.

There are of course the microkernels, where processes do communicate via message passing. But afaiu, making message passing between processes efficient to the point that the microkernel architecture is usable is the major technical challenge of microkernels. It’s not easy.

This is especially true when the number of processes is dynamic and the system is not implemented in Erlang/OTP (or similar implementations of the Actor model with co-routines). I admit that this is a very biased opinion.

This focuses on performance, which need not be a primary objective at this moment. But I think the same arguments can be made for complexity and ergonomics.

If you want to do something like this in OCaml why not use Functional Reactive Programming? There are libraries for that (React / Erratique).

Agree.

Maybe a more critical inquiry would be to not only look for more supporting arguments but also consider and compare alternatives.

I think these are both relevant questions in our context. P.S.: is a small structure and intends to remain so; we won’t be able to maintain software that requires lots of people to understand all the parts for it to be functional. Since we’re not in an “academic environment and development model”, it appears that we must understand and agree on what model we can follow – both in terms of desire and capacity – not to erect a wall before us as we go.

As I tend to follow Einstein’s approach to « do the simplest thing that can possibly work, but not simpler », and since the Actor Model seems to be quite standardized these days, especially in the context of openEngiadina and IN COMMON, I need to understand 1) where it’s relevant to keep using or interoperating with this model, and 2) if it’s not compatible, why another approach is required. Especially when we think about micro-services, and I can see a relation to unikernels here, it seems that “hell” breaks loose when additional complexity is met with another layer of indirection through new micro-services – which ends up with some parts being less well maintained than others. I also capture the idea of “reusing the same CBOR-based message passing interface”: is this interface only useful internally, or can it be used to create interoperability with other existing systems that do not use public key addressing and routing – or in other words, can this PK addressing and routing be put on top of foreign systems to make them compatible with UPSYCLE? (Not all the work should be coming from our side; maybe there are ways for others to bridge to us, as previously suggested with regard to XMPP or ActivityPub. It’s worth repeating here that the point is that by the end of the project people can have something to play with, and not just a bunch of unrelated components. The trade-off is always between what we can do within a year, and what needs to be done so the full system works.)

Back to the CBOR-based message passing interface: is it something that can be extracted from UPSYCLE in a way that existing systems can adopt it for their own use? I try to imagine a “blind service” that just passes packets around until some component can deal with them.

Talking about blindness: can we consider core nodes as blind in the sense that their only concern is to serve packets to connections they know about (exiting to edge nodes), and pass the rest on to other core nodes? Might this notion of blindness help describe the P2P network?

I think that this discussion is going somewhere. I’m not sure yet there’s agreement on which way to proceed, but at least having these first answers in context starts to make sense as a basis for more explanation work.

@tg-x apart from the questions from the previous meetings, there’s a higher-level set of questions laid out in https://dream.public.cat/t/design-patterns/211. There are a lot of them, so I encourage everyone to take them a handful at a time, not to saturate anyone’s already stressed-up mindspace. What I find interesting about these questions is that they bridge technical, legal, social and individual aspects, and provide a nice framework to come up with comparative narratives.

I’m sorry if my comments come as off-topic in the technical context. I’m trying my best to figure out all the routes available before us.

surely, the in-process message passing that erlang/OTP does is the fastest.
(as a side-note, there’s an early implementation of OCaml on BEAM in progress
that may be interesting to look at once it’s more stable)

as for services running as unikernels or processes: when the granularity of services is very small and too many tiny components have to communicate via message passing (e.g. CBOR), that indeed incurs quite a bit of communication overhead (de/serialization and transport).
but it’s also possible to design these services in a way that packages together components that need to talk to each other frequently to minimize overhead, and static resource allocation can be used in the form of pre-defined unikernel instances; I’d go for this approach initially in any case.
I’ve indeed seen some extremely dynamic proposals where unikernels would be spun up per request, but I don’t think we need to follow this approach.

the unikernel approach I find interesting enough to explore further,
but the software architecture does not actually depend on it; it merely makes it possible.
I also like the library OS approach that Mirage follows, where components are developed as reusable libraries, which I find valuable as it makes it possible to compose the system in different ways depending on the requirements of the deployment

in any case I do not think this document needs to go in depth about unikernels,
i just wanted to mention the possibility of running these services as unikernels.

actually there are two different approaches to interoperability that we can consider:

  1. via the pub/sub interface we provide, an application or protocol has access to a p2p pub/sub infrastructure
  2. the message router itself could use other existing public-key addressed transports (instead of TLS over IP), provided by other p2p systems (cjdns, yggdrasil, etc), and this would make it useful for providing pub/sub functionality to those systems

That’s exactly what the message router does: it’s a generic component to pass around unicast and multicast messages. The pub/sub service (upsycle) uses this, and other components could use it later (e.g. p2p search & discovery protocols)

the purpose of gnunet was to do p2p research and provide an experimentation framework for researchers, especially those working closely together around its primary author, since it was not very accessible to outsiders for a long time due to the lack of documentation and lack of community efforts.

in our case of a small structure intending to develop useful/usable software,
if there’s emphasis on producing documentation and making it accessible and understandable to developers and users, that would be a good start to ensure sustainability of the project.
