D1.1 – DROMEDAR Data Model Specification

pukkamustard · 6 January 2021 10:44

tl;dr: 2021-03-31 – DREAM releases D1.1, the Dromedar Data Model Specification.

Introduction

As part of DREAM we are researching and developing data models that enable peer-to-peer group collaboration.

Research Problem

The premise is that currently common ways of digitally capturing information are not well-suited for decentralized systems. Problems include:

Information is only stored in plain text / natural language. Relevant information cannot be extracted and used efficiently.
Information is stored in application-specific serializations that are not usable beyond their original scope (e.g. an application-specific JSON schema).
It is hard to reference content: either pieces of content do not have globally identifiable names or they are tied to centralized naming schemes (e.g. DNS).
Conflict resolution is difficult: dynamically changing content (mutable content) requires either a centralized service or a considerable amount of out-of-band coordination to prevent conflicts.

Research Result

The main outcome of the research being conducted within DREAM is a specification for mutable containers that address all of these issues: Distributed Mutable Containers (DMC).

Distributed Mutable Containers are distributed data structures that can hold references to content while allowing replicas of the data structures to diverge and merge without conflict by using Commutative Replicated Data Types (CRDTs). DMC enables consistent referencing of mutable content or mutable collections of content, allowing a wide range of applications such as decentralized geographical information systems.

Distributed Mutable Containers enable decentralized applications. Containers are distributed and can be replicated over many different transport mechanisms. Container replicas can be mutated locally and their state may diverge. DMC does not impose any restrictions on how local replicas may be mutated and does not require any out-of-band coordination. Replicas can always be merged back to a consistent state.

DMC solves all four research problems:

DMC containers can be searched without prior extraction
RDF is used so that any implementation can be created on top of it
ERIS encoding provides content-addressed globally unique identifiers regardless of DNS
CRDTs provide automated conflict resolution for mutable content

DMC uses four magic ingredients:

Resource Description Framework

The Resource Description Framework (RDF): A simple, expressive and widely used graph-based data model that enables interlinking of diverse sets of data.

DMC uses RDF to describe container definitions, operations and container state. DMC is especially well-suited for applications that also use RDF, but not limited to such.

Content-addressing with ERIS and RDF-Signify

Content-addressing: By using a cryptographic hash of the content itself as identifier of the content we can enable robust referencing of arbitrary content. Our contributions include a method for making RDF content-addressable as well as a robust and censorship-resistant scheme for content-addressing.
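
The basic idea of content-addressing can be sketched in a few lines of Python. This is only an illustration and not the ERIS encoding: the helpers `content_id`, `put` and `get` and the `urn:example:` prefix are hypothetical, and BLAKE2b is used here simply because it ships with the standard library.

```python
import hashlib

def content_id(content: bytes) -> str:
    """Derive an identifier from the content itself (hypothetical helper)."""
    digest = hashlib.blake2b(content, digest_size=32).hexdigest()
    return "urn:example:blake2b:" + digest

# A toy content-addressed store: the key is computed from the value.
store = {}

def put(content: bytes) -> str:
    ident = content_id(content)
    store[ident] = content
    return ident

def get(ident: str) -> bytes:
    content = store[ident]
    # Whoever receives the content can check it against the identifier,
    # so the store itself does not need to be trusted.
    if content_id(content) != ident:
        raise ValueError("content does not match its identifier")
    return content
```

Storing the same bytes twice yields the same identifier, and any change to the bytes yields a different one, which is what makes such references robust and independent of a naming authority.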

By using the Encoding for Robust Immutable Storage (ERIS), most of the data that constitutes a DMC container can be transported over insecure media without losing confidentiality or censorship resistance. Only a small amount of data needs to be transmitted securely for a peer to be able to reconstruct the state of a container.

DMC also allows control over containers to be delegated by attaching additional authorized keys. This allows collective control over a container.

RDF Signify is a simple RDF vocabulary (a single class and three predicates) that describes how the Ed25519 [RFC8032] algorithm can be used for signing and verifying content.

RDF Signify cannot sign messages directly, but can be used to sign identifiers of content-addressed content. In particular it can sign identifiers of content-addressed RDF that is encoded with ERIS.

We believe that RDF Signify is a significantly simpler approach than what is proposed in the context of Linked Data Proofs (previously Linked Data Signatures), allowing much easier implementation and wider adoption.

Commutative Replicated Data Types (CRDTs)

Commutative Replicated Data Types (CRDTs): Distributed data-structures that ensure conflict-free merging of replicas.

This conflict-free merge-ability is made possible by Commutative Replicated Data Types (CRDTs). A downside is that DMC only provides two container types (set and register) with very specific semantics. However, we argue that these two basic container types are sufficient for building interesting applications, especially when using a graph-based data model.
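
To make the idea of conflict-free merging concrete, here is a minimal Python sketch of a set and a register whose merge is commutative. It is a generic textbook-style illustration (a two-phase set and a last-writer-wins register), not the container semantics defined in the DMC specification; the class names `TwoPhaseSet` and `LWWRegister` and all their methods are assumptions made for this example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TwoPhaseSet:
    """Set replica: tracks everything ever added and everything ever removed."""
    added: frozenset = frozenset()
    removed: frozenset = frozenset()

    def add(self, x):
        return TwoPhaseSet(self.added | {x}, self.removed)

    def remove(self, x):
        return TwoPhaseSet(self.added, self.removed | {x})

    def merge(self, other):
        # Set union is commutative, associative and idempotent,
        # so replicas converge regardless of merge order.
        return TwoPhaseSet(self.added | other.added,
                           self.removed | other.removed)

    def value(self):
        return self.added - self.removed

@dataclass(frozen=True)
class LWWRegister:
    """Register replica: the write with the highest stamp wins."""
    stamp: tuple = (0, "")   # (logical timestamp, replica id), assumed unique per write
    content: object = None

    def write(self, content, timestamp, replica):
        return LWWRegister((timestamp, replica), content)

    def merge(self, other):
        return self if self.stamp >= other.stamp else other
```

Two replicas that start from the same state can add and remove elements independently; calling `a.merge(b)` or `b.merge(a)` gives the same result, so no coordination is needed before merging.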

Using CRDTs also eliminates the necessity of specifying any dependencies between mutating operations or revisions of container states. This enables DMC to forget operations that no longer contribute to the current state, thereby improving efficiency and allowing removed content to be irrevocably deleted (the right to be forgotten).

Datalog

Datalog: A query language based on Logic Programming that allows rich interaction and usage of data.

Datalog is a declarative logic programming language. Its very simple syntax and semantics make it useful as a query language for interacting with large amounts of data.

We will use Datalog as a specification language for describing the semantics of container state. This allows a very concise description that can be directly used in existing, efficient implementations of Datalog. Furthermore, certain aspects of Datalog semantics (monotonicity) make it perfectly adapted for reasoning about distributed systems.

Datalog can also be used as an application-level query language (e.g. for answering queries like “all posts about cats posted by people that I have sent a message to in the last year”). Such a query can be combined with container state resolution queries into a single query. This enables a DMC implementation that uses Datalog to provide not only primitive containers but also a rich interface for efficiently interacting with data.
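
As a rough illustration of how such a query behaves, the following Python sketch evaluates a single non-recursive Datalog-style rule as a join over a set of facts. The predicates `sent_message`, `posted` and `about`, the function `interesting_posts` and the sample facts are all hypothetical, and the "in the last year" constraint is left out for brevity; a real implementation would use a Datalog engine rather than hand-written joins.

```python
def interesting_posts(facts, me):
    """Evaluate, by hand, the rule:

    interesting(Post) :- sent_message(me, P), posted(P, Post), about(Post, cats).
    """
    contacted = {o for (p, s, o) in facts if p == "sent_message" and s == me}
    posts = {o for (p, s, o) in facts if p == "posted" and s in contacted}
    return {post for post in posts if ("about", post, "cats") in facts}

# Facts are (predicate, subject, object) tuples, similar in shape to RDF triples.
facts = {
    ("sent_message", "alice", "bob"),
    ("posted", "bob", "post1"),
    ("posted", "bob", "post2"),
    ("posted", "carol", "post3"),
    ("about", "post1", "cats"),
    ("about", "post2", "dogs"),
    ("about", "post3", "cats"),
}

before = interesting_posts(facts, "alice")
# Monotonicity: learning a new fact can only add answers, never retract them.
after = interesting_posts(facts | {("sent_message", "alice", "carol")}, "alice")
```

Because the rule contains no negation, `after` is always a superset of `before`. This is the property that makes such rules convenient for replicas that receive facts in different orders.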

Previous Work

DREAM research on Distributed Mutable Containers builds on prior work done within the openEngiadina project, especially related to ERIS and Content-Addressed RDF.

This research stands on the shoulders of giants: the related work section of the DMC specification compares inspirational approaches to DREAM’s. The subsequent prerequisites section describes the underlying techniques.

Outlook

Next, we intend to develop an OCaml implementation of the DMC specification. This will allow us to explore applications and gain further insight. This work will be available as part of the Alpha software release, due at the end of June 2021. You may follow updates in the related software repositories.

We welcome feedback and criticism! Our forum is open for friendly cooperation among DREAM Catchers. Do not hesitate to contact us and contribute to the code: our research is made to improve the digital commons.

how · 31 March 2021 13:24

9 posts were split to a new topic: D1.1 discussion about Discourse

pukkamustard · 30 March 2021 06:40

Deadline is Wednesday 2021-03-31 (tomorrow). Here is a list of TODOs for the deadline:

Release DMC 0.2.0 (@pukkamustard)
Release RDF Signify 0.2.0 (@pukkamustard)
Final touches to top post of this topic

@how How does the incommon-ns play in with the deadline? Can you add TODOs if necessary?

how · 30 March 2021 07:00

@pukkamustard, sorry for taking so long in moving things on this. I don’t think we can deliver D1.1 on time as I was expecting it, but I will do everything I can to ensure it can evolve in a printable way over the next couple of cycles. I think it’s important that the design documents can form a continuous reading that can help users and implementors grasp all the concepts.

One piece of required work is to explain how those components work together. E.g., how to replace HTTP Signatures with the RDF Signify vocabulary, and to figure out a road map towards v1.0 of this vocabulary (i.e., what’s missing to drop-in replace HTTP Signatures, including negotiations with the ActivityPub community or standardization). The vocabulary is so small that it does not signify how to use it.

how · 30 March 2021 07:03

I want to write an overview for policy makers to complement the highly technical documents.
Unfortunately I have conflicting agendas, with NGI0 requiring me to complete other documents. I don’t want to rush it. However, the page is already online, so it’s OK. Like with the website, we can update it as we go. It would be good to have a live talk to fix the points for inclusion in the final deliverable.

pukkamustard · 30 March 2021 07:40

I am not happy with this. Completing a deliverable creates space to move on and continue with other things. I want to deliver D1.1, take a week off and move on.

Suggestion: We deliver D1.1 as follows:

Release DMC 0.2.0
I’ll add some touches to RDF Signify and release 0.2.0
Add reference to WIP INCOMMON-NS in top post of this topic

And be done with it.

Yes, and this was also the plan. But it’s good to have an initial version done and be able to concentrate on other things.

I strongly believe that the way to help people (not only implementors) grasp the concepts is to implement them and make them tangible [1]. IMO there is little point in spending more time on documentation now. I don’t want to talk about abstract ideas, I want to show stuff. For that I need a clear head to focus on implementation.

[1] Habitat Chronicles: You can't tell people anything
[2] If you can't tell people anything, can you show them? -- Dustycloud Brainstorms

Already started: Content-addressing and signatures - openEngiadina - SocialHub

I’m Mumble’able this morning.

how · 30 March 2021 08:14

Yes we deliver D1.1, just not as I was expecting it to be. I’ll make some coffee and be on the mumble.

how · 31 March 2021 09:39

I pushed SSH deployment for dream.public.cat, since the CI deployment has pending issues.

https://gitlab.com/public.dream/dream.public.cat#ssh-deployment

SSH deployment works well for static HTML. If you need to push some changes I can add your SSH public key.

how · 31 March 2021 10:20

@pukkamustard I made a pass at the first post, please tell me what you think, if there are other points you’d like to add…

how · 31 March 2021 11:19

I remember a discussion about this and a specific binary encoding that was searchable, but I lost the reference and did not find it in the spec. Since it is a response to one of the stated problems, it would be nice to have it.

Apart from this I think the document is good as it is. I updated the CSS to have more legible links and blockquotes.

https://dream.public.cat/pub/dream-data-spec

Please tell me what you think.

Next is the announcement.

pukkamustard · 31 March 2021 13:06

Made some small edits to the section “Previous Work”.

Otherwise, lgtm.

pukkamustard · 31 March 2021 13:11

It’s not in the spec. I currently would classify it as future work for “Content-addressable RDF” (that doc should also be updated eventually)…

how · 31 March 2021 13:47

3 posts were merged into an existing topic: Pages for dream.public.cat