D1.1 – DROMEDAR Data Model Specification

pukkamustard · 6 January 2021 10:44

tl;dr: 2021-03-31 – DREAM releases D1.1, the Dromedar Data Model Specification.

Introduction

As part of DREAM we are researching and developing data models that enable peer-to-peer group collaboration.

Research Problem

The premise is that currently common ways of digitally capturing information are not well-suited for decentralized systems. Problems include:

Information is only stored in plain text / natural language. Relevant information cannot be extracted and used efficiently.
Information is stored in application-specific serializations that are not usable beyond their original scope (e.g. an application-specific JSON schema).
It is hard to reference content: either pieces of content do not have globally identifiable names or they are tied to centralized naming schemes (e.g. DNS).
Conflict resolution is difficult: dynamically changing content (mutable content) requires either a centralized service or a considerable amount of out-of-band coordination to prevent conflicts.

Research Result

The main outcome of the research being conducted within DREAM is a specification for mutable containers that address all of these issues: Distributed Mutable Containers (DMC).

Distributed Mutable Containers are distributed data structures that can hold references to content while allowing replicas of the data structures to diverge and merge without conflict by using Commutative Replicated Data Types (CRDTs). DMC enables consistent referencing of mutable content or mutable collections of content, allowing a wide range of applications such as decentralized geographical information systems.

Distributed Mutable Containers enable decentralized applications. Containers are distributed and can be replicated over many different transport mechanisms. Container replicas can be mutated locally and their state may diverge. DMC does not impose any restrictions on how local replicas may be mutated and does not require any out-of-band coordination. Replicas can always be merged back to a consistent state.

DMC solves all four research problems:

DMC containers can be searched without prior extraction
RDF is used so that any implementation can be created on top of it
ERIS encoding provides content-addressed globally unique identifiers regardless of DNS
CRDTs provide automated conflict resolution for mutable content

DMC uses four magic ingredients:

Resource Description Framework

The Resource Description Framework (RDF): A simple, expressive and widely used graph-based data model that enables interlinking of diverse sets of data.

DMC uses RDF to describe container definitions, operations and container state. DMC is especially well-suited for applications that also use RDF, but not limited to such.

Content-addressing with ERIS and RDF-Signify

Content-addressing: By using a cryptographic hash of the content itself as identifier of the content we can enable robust referencing of arbitrary content. Our contributions include a method for making RDF content-addressable as well as a robust and censorship-resistant scheme for content-addressing.
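To make the idea concrete, here is a minimal sketch of content-addressing in Python. It is not ERIS: ERIS additionally splits content into blocks and encrypts them, which this sketch omits. The `urn:example:blake2b:` identifier scheme, the `content_id` function and the `ContentStore` class are hypothetical names chosen for illustration only.

```python
import hashlib


def content_id(content: bytes) -> str:
    """Derive an identifier from the content itself.

    The URN prefix is a made-up scheme for this sketch, not the ERIS URN format.
    """
    digest = hashlib.blake2b(content, digest_size=32).hexdigest()
    return f"urn:example:blake2b:{digest}"


class ContentStore:
    """A toy in-memory store keyed by content identifier (hypothetical)."""

    def __init__(self):
        self._blocks = {}

    def put(self, content: bytes) -> str:
        # The same content always yields the same identifier.
        cid = content_id(content)
        self._blocks[cid] = content
        return cid

    def get(self, cid: str) -> bytes:
        content = self._blocks[cid]
        # Verify on read: tampered content no longer matches its identifier.
        if content_id(content) != cid:
            raise ValueError("content does not match identifier")
        return content
```

Because the identifier is derived from the content, any peer holding the bytes can serve them, and the reader can check integrity without trusting the source or a naming authority.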

By using the Encoding for Robust Immutable Storage (ERIS), most of the data that constitutes a DMC container can be transported over insecure media without losing confidentiality or censorship resistance. Only a small amount of data needs to be transmitted securely for a peer to be able to reconstruct the state of a container.

DMC also allows control over containers to be delegated by attaching additional authorized keys. This allows collective control over a container.

RDF Signify is a simple RDF vocabulary (a single class and three predicates) that describes how the Ed25519 [RFC8032] algorithm can be used for signing and verifying content.

RDF Signify cannot sign messages directly, but it can be used to sign identifiers of content-addressed content. In particular, it can sign identifiers of content-addressed RDF that is encoded with ERIS.

We believe that RDF Signify is a significantly simpler approach than what is proposed in the context of Linked Data Proofs (previously Linked Data Signatures), allowing much easier implementation and wider adoption.

Commutative Replicated Data Types (CRDTs)

Commutative Replicated Data Types (CRDTs): Distributed data-structures that ensure conflict-free merging of replicas.

This conflict-free mergeability is made possible by Commutative Replicated Data Types (CRDTs). A downside is that DMC only provides two container types (set and register) with very specific semantics. However, we argue that these two basic container types are sufficient for building interesting applications, especially when using a graph-based data model.
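The conflict-free merge can be illustrated with a generic operation-based set in Python. This is a sketch under assumptions, not the semantics defined in the DMC specification: the `Add` and `Remove` operation types, the `op_id` field, and the `merge` and `members` functions are hypothetical. A replica is modelled as a set of operations, merging is set union, and the visible state is derived from the operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Add:
    op_id: str    # unique identifier of this operation (hypothetical)
    element: str  # reference to the content being added


@dataclass(frozen=True)
class Remove:
    target: str   # op_id of the Add operation being removed


def merge(a: frozenset, b: frozenset) -> frozenset:
    """Merge two replicas. Set union is commutative, associative and idempotent."""
    return a | b


def members(ops: frozenset) -> set:
    """Derive the visible set: elements whose Add has not been removed."""
    removed = {op.target for op in ops if isinstance(op, Remove)}
    return {
        op.element
        for op in ops
        if isinstance(op, Add) and op.op_id not in removed
    }


# Two replicas diverge from a common state without coordination...
base = frozenset({Add("a1", "cat.jpg")})
alice = base | {Add("a2", "dog.jpg")}
bob = base | {Remove("a1")}

# ...and merge to the same state regardless of merge order.
assert merge(alice, bob) == merge(bob, alice)
print(members(merge(alice, bob)))  # {'dog.jpg'}
```

Since merging never has to choose between competing operations, no replica needs to know the order in which the others applied their changes.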

Using CRDTs also eliminates the necessity of specifying any dependencies between mutating operations or revisions of container states. This enables DMC to forget operations that no longer contribute to the current state, thereby improving efficiency and allowing removed content to be irrevocably deleted (the right to be forgotten).

Datalog

Datalog: A query language based on Logic Programming that allows rich interaction and usage of data.

Datalog is a declarative logic programming language. It has a very simple syntax and semantics that make it useful as a query language for interacting with large amounts of data.

We will use Datalog as a specification language for describing the semantics of container state. This allows a very concise description that can be directly used in existing, efficient implementations of Datalog. Furthermore, certain aspects of Datalog semantics (monotonicity) make it perfectly adapted for reasoning about distributed systems.
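As a rough illustration of why monotonicity matters, the following Python sketch evaluates a small Datalog-style program by naive bottom-up iteration. The program and the `reachable` function are hypothetical examples, not taken from the DMC specification. The two rules, written in Datalog notation in the comments, derive which nodes can reach which others through `link` facts.

```python
def reachable(links: set) -> set:
    """Naive bottom-up evaluation of:

        reachable(X, Y) :- link(X, Y).
        reachable(X, Z) :- reachable(X, Y), link(Y, Z).

    Facts are (source, target) tuples. Iterate until no new facts appear.
    """
    derived = set(links)  # first rule: every link is reachable
    while True:
        # Second rule: extend each derived pair by one more link.
        new = {
            (x, z)
            for (x, y) in derived
            for (y2, z) in links
            if y == y2
        }
        if new <= derived:  # fixed point: nothing new was derived
            return derived
        derived |= new


small = {("a", "b"), ("b", "c")}
larger = small | {("c", "d")}

# Monotonicity: learning more facts never retracts an earlier conclusion.
assert reachable(small) <= reachable(larger)
```

In a distributed setting this means a replica can answer queries from the facts it already has, and facts arriving later from other peers only add to the result.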

Datalog can also be used as an application-level query language (e.g. for answering queries like “all posts about cats posted by people that I have sent a message to in the last year”). Such a query can be combined with container state resolution queries into a single query. This enables a DMC implementation that uses Datalog to provide not only primitive containers but also a rich interface for efficiently interacting with data.

Previous Work

DREAM research on Distributed Mutable Containers builds on prior work done within the openEngiadina project, especially related to ERIS and Content-Addressed RDF.

This research stands on the shoulders of giants: the related work section of the DMC specification compares inspirational approaches to DREAM’s. The follow-up prerequisites section describes the underlying techniques.

Outlook

Next, we intend to develop an OCaml implementation of the DMC specification. This will allow us to explore applications and gain further insight. This work will be available as part of the Alpha software release, due at the end of June 2021. You may follow updates in the related software repositories.

We welcome feedback and criticism! Our forum is open for friendly cooperation among DREAM Catchers. Do not hesitate to contact us and contribute to the code: our research is made to improve the digital commons.