DMC: Distributed Mutable Containers

Hello,

As part of the next milestone in the openEngiadina NGI0 project I am drafting an initial version of DROMEDAR: https://gitlab.com/openengiadina/dromedar. I hope to make this directly usable for DREAM.

The abstract may serve as a rough sketch of the idea:

DROMEDAR is a specification of mutable containers that are capable of holding references to arbitrary content. Containers are based on Commutative Replicated Data Types (CRDTs). This allows container state to diverge and merge without conflict. Containers are mutated with operations that are cryptographically signed by authorized keys. We present an RDF vocabulary for describing containers, state and operations as well as the semantics for computing the current known state.

The two types of containers that I am working on are:

  • Set with multiple unique members (based on the OR-Set CRDT). This could be used for ActivityPub collections (e.g. an inbox is a set containing Activities)
  • Register that holds a single member (based on the LWW-Register CRDT). This would be used for things like user profiles or ontologies - there is always only one current member (a rough sketch of the register semantics follows below the list).
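
To give a rough idea of the register semantics, here is an illustrative Python sketch (not the normative semantics or vocabulary from the spec; the tie-break on the operation identifier is only there to make the sketch deterministic):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WriteOp:
    """A single write operation on a LWW-Register (illustrative only)."""
    value: str      # reference to the member, e.g. a URN of the content
    timestamp: int  # timestamp chosen by the writer
    op_id: str      # identifier of the operation, used only as a tie-breaker

def register_value(ops: set[WriteOp]) -> str | None:
    """The current member is the value of the write with the highest
    timestamp; ties are broken on the operation identifier so that all
    replicas converge to the same member."""
    if not ops:
        return None
    latest = max(ops, key=lambda op: (op.timestamp, op.op_id))
    return latest.value
```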

In this form the specification is compatible with ActivityPub. I.e. ActivityPub can be used as a protocol for making changes to the containers and also as the PubSub protocol for sharing changes. The plan is to implement this using ActivityPub in the openEngiadina software (CPub and GeoPub). My hope is that eventually ActivityPub can just be replaced with UPSYCLE.

Does this fit with your conception of what DROMEDAR should be? Is the scope chosen properly?

I also think it is compatible with the Linked Data Platform (LDP - what Solid uses). There was a recent discussion on the Solid forum on how CRDTs might be used: https://forum.solidproject.org/t/application-of-crdts-to-solid/3321

Some more hack-typing away: https://openengiadina.gitlab.io/dmc/

I renamed it to “Distributed Mutable Containers”. I think that name is slightly more descriptive. I also want to avoid confusion in case the common understanding of DROMEDAR is something else and we end up confusing terms.

I’m trying to have a halfway OK version this week and hit a milestone for the NGI0 project.


Initial version is out: https://openengiadina.gitlab.io/dmc/

Thank you @pukkamustard for your diligence!

I’d be interested in @tg-x’s comments on puk’s approach to DROMEDAR. I guess it’s important to have confirmation sooner rather than later :slight_smile:

Ah, I did not see https://dream.public.cat/t/dromedar-overview/58 prior to posting this.

Requesting feedback for draft version 0.2.0, which will be part of deliverable D1.1:

Todos


All right, just finished reviewing the document. It was very nice to read, with a simple structure that takes the reader along on the trip. You have a knack for structuring complexity in an appealing way!
I committed a single typo fix for the whole doc, and noted the following along the way…


However, IPNS (and IPFS to a large extent) is currently not practically usable due to performance issues.

This sentence will require some follow-up. Maybe rephrase, or link to a relevant document about the performance issues, maybe the “IPNS is very slow” issue on their GitHub. Rephrasing in a way that makes it a non-issue (whether or not IPNS is slow) would be helpful. E.g., are there other advantages of DMC over IPFS that can be highlighted?

In particular we use the content-addressable grouping of RDF triples into fragment graphs.

I guess fragment graphs should link to Content-addressable RDF

There are two sub-classes of conflict-free replicated data types: Convergent Replicated Data Types (state based) and Commutative Replicated Data Types (operation based). We will use Commutative Replicated Data Types.

The last sentence could start with: “We will use the latter…” and provide an argument why the former is not useful to our case. Or it can simply be “We will use the latter.” just to avoid repetition.

WARNING: The URN of ERIS encoded content is not yet finalized.

Maybe add a link to current work / discussion.

A key Key is an authorized key for a container Container if and only if the authorizedKey(Container, Key) holds.

This section begs for an answer to the question: how do you remove a key, or: how do you make authorizedKey(Container, Key) not hold? From my memory of our previous discussions about it, key removal is not yet solved. Maybe a link to the current discussion would be useful, even if it only states that this is future work – or impossible.

Operation transport and synchronization are beyond the scope of this document, we assume that the available operations are given in an RDF graph.

Maybe add a link to the relevant documents (e.g., D1.2 at https://dream.public.cat/pub/dream-pubsub-spec or D1.4 https://dream.public.cat/pub/specifications, or a specific topic).

When removing an element we must specify the element and operation that added the element

This begs for an answer to: how do you keep track of operations, and who can remove operations? Maybe a footnote and link to reference?

It can be shown that the operations (add and remove) on an OR-Set commute.

Is this coming from Shapiro et al.? #citation-needed :wink:

A concrete application is user profiles.

Maybe add: where only users have access to their own profiles.

Thinking about this timestamp issue: usually clocks on the Internet are synchronized, and only a bad clock or a malicious operation would allow this DoS. How simple would it be to add some ‘relative clock deviation’ limit that would constrain timestamps to a range between time(last operation) and time(now + 1 minute), or something along those lines? I understand there’s a difficulty here given that two containers could merge operations months or years apart. But it’s only to understand the kind of counter-measure that could be implemented, and at what layer. E.g., (contrived example) given a register that maintains the name of the last active agent on a set, the timestamp would certainly match the time of the last operation in the set; in this case, any deviation from this time into the future is suspect. I guess something could be said about the extent of the DoS risk, and ways to mitigate it.
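
Just to convey the idea, the kind of local acceptance check I have in mind would be something like this (a purely hypothetical sketch, nothing from the spec):

```python
import time

MAX_FUTURE_DRIFT = 60  # seconds an operation timestamp may lie in the future

def acceptable_timestamp(op_timestamp: int, last_op_timestamp: int) -> bool:
    """Accept an operation only if its timestamp falls between the last
    known operation and the local clock plus a small drift allowance."""
    return last_op_timestamp <= op_timestamp <= int(time.time()) + MAX_FUTURE_DRIFT
```

As said above, operations merged long after the fact would violate the lower bound, so such a check could at best be advisory.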

NOTE: Soufflé has only recently added the aggregation functionality required in the registerValue predicate (not included in version 2.0.2).

Maybe link to the relevant MR, maybe Introduce Aggregate Scoping, Witnesses, Multi-leveled Injected Variables by rdowavic · Pull Request #1693 · souffle-lang/souffle · GitHub?

The set of objects and set of block references only stores read capabilities and references to blocks.

Is it a single set(objects, block references) or two sets? “only stores” would be “only store” in the latter case.

==== Garbage collection

What happens if implementations do not run GC, or: how does a lack of GC affect the running program or the network synchronization?
I can answer partly: When an implementation does not garbage collect removed blocks, the right to be forgotten cannot be respected. In other words: implementations seeking GDPR-compliance MUST implement GC.

This synchronization procedure can be implemented over protocols such as HTTP or CoAP.

Just reading this, my mind wants a reference. Implemented how? Is there a section below talking about this? Where is the documentation? Maybe it’s just “below we’re discussing such implementations…”

However, it seems necessary to still be able to run the synchronization procedure as described above when replicas disconnect and reconnect from the publish-subscribe system.

Why?

In this section we define a CBOR serialization of replica state. This is a simple way of writing out the state of a replica to a file and transporting “out-of-band” (e.g. as an e-mail attachment or on a USB disk).

Note that this serialization is not suitable as working state representation. Implementations should use more efficient state storage such as key-value stores.

This passage is a bit mysterious. Maybe reformulate to introduce the CBOR serialization as a means to perform out-of-band transport, and possibly describe the limits of this approach that can be solved by other means. Maybe there can be a section in D1.2 that covers this aspect.
The mention of caching below further calls for clarification of when CBOR serialization is useful, and the cases where it can be problematic – and what to use then.


I :heart: the conclusion.


All right, just finished reviewing the document. It was very nice to read, with a simple structure that takes the reader along on the trip. You have a knack for structuring complexity in an appealing way!

Thank you very much for the kind words as well as the excellent
comments.

I committed a single typo fix for the whole doc, and noted the
following along the way…

Typo fix is merged. Notes are answered in-line below and with a
commit.

However, IPNS (and IPFS to a large extent) is currently not
practically usable due to performance issues.

This sentence will require some follow-up. Maybe rephrase, or link to a relevant document about the performance issues, maybe the “IPNS is very slow” issue on their GitHub. Rephrasing in a way that makes it a non-issue (whether or not IPNS is slow) would be helpful. E.g., are there other advantages of DMC over IPFS that can be highlighted?

I have added an explanation of how IPNS works and that it is similar to a DMC register. The note on IPFS performance is removed. In its place I have highlighted the ability of DMC to allow state to diverge and authority to be delegated.

In particular we use the content-addressable grouping of RDF
triples into fragment graphs.

I guess fragment graphs should link to
Content-addressable RDF

I’ve moved the square bracket reference to the
“Content-addressable RDF”
doc just after fragment_graph.

I’m trying to keep in-line links to a minimum. I feel that they can break the flow of reading and can create a lot of dependencies on documents that are not explicitly listed in the bibliography. Also, the fragment URL (#org75555) is not stable.

There are two sub-classes of conflict-free replicated data
types: Convergent Replicated Data Types (state based) and
Commutative Replicated Data Types (operation based). We will
use Commutative Replicated Data Types.

The last sentence could start with: “We will use the latter…”
and provide an argument why the former is not useful to our
case. Or it can simply be “We will use the latter.” just to
avoid repetition.

Used the latter formulation and added a short explanation of why (operations can be represented as content-addressed, immutable entities).
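
To illustrate what I mean by content-addressed: the identifier of an operation is derived from its content. A rough Python sketch (in the spec this would go through the ERIS encoding of the serialized operation, not the bare hash shown here):

```python
import hashlib
import json

def operation_id(operation: dict) -> str:
    """Derive the identifier of an operation from its content.
    Illustration only: a real implementation would use the ERIS
    encoding of the serialized operation rather than a plain SHA-256."""
    canonical = json.dumps(operation, sort_keys=True, separators=(",", ":"))
    return "sha-256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```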

WARNING: The URN of ERIS encoded content is not yet finalized.

Maybe add a link to current work / discussion.

Added link to the “Future work” section.

A key Key is an authorized key for a container Container
if and only if the authorizedKey(Container, Key) holds.

This section begs for an answer to the question: how do you remove a key, or: how do you make authorizedKey(Container, Key) not hold? From my memory of our previous discussions about it, key removal is not yet solved. Maybe a link to the current discussion would be useful, even if it only states that this is future work – or impossible.

I’ve added section “4.6.1 Revocation” on this. Also added some
notes on Vegvisir (Section 2.5) and how they do revocation.

I think I also have a clearer idea of the trade-offs after writing
it down. I will formulate a concrete proposal.

Operation transport and synchronization are beyond the scope of
this document, we assume that the available operations are
given in an RDF graph.

Maybe add a link to the relevant documents (e.g., D1.2 at
https://dream.public.cat/pub/dream-pubsub-spec or D1.4
https://dream.public.cat/pub/specifications, or a specific
topic).

The sentence was a leftover from version 0.1.0; in 0.2.0 this is in scope and discussed in section 6. I’ve added a link.

When removing an element we must specify the element and
operation that added the element

This begs for an answer to: how do you keep track of operations,
and who can remove operations? Maybe a footnote and link to
reference?

Hm, keeping track of operations is a key functionality of DMC implementations. I imagine that an end-user UI would make this transparent: removing an element creates a remove operation that references the operation(s) that added the element.

Remove operations are just operations, so the discussion on
“authorized
operations” in section 4.7 is applicable.
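
To illustrate, a minimal sketch of this observed-remove behaviour (illustrative Python only, not the vocabulary or serialization from the spec):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AddOp:
    op_id: str    # content-address of the add operation
    element: str  # reference to the added element

@dataclass(frozen=True)
class RemoveOp:
    op_id: str
    element: str             # the element being removed
    removes: frozenset[str]  # op_ids of the add operations being undone

def members(adds: set[AddOp], removes: set[RemoveOp]) -> set[str]:
    """OR-Set semantics: an element is a member as long as at least one
    of the operations that added it has not been removed."""
    removed_adds = {op_id for r in removes for op_id in r.removes}
    return {a.element for a in adds if a.op_id not in removed_adds}
```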

It can be shown that the operations (add and remove) on an
OR-Set commute.

Is this coming from Shapiro et al.? #citation-needed :wink:

It is. I’ve put the statement and the reference to Shapiro et al. in the same paragraph.

A concrete application is user profiles.

Maybe add: where only users have access to their own profiles.

Added “where only users are authorized to mutate the profile”.

Thinking about this timestamp issue: usually clocks on the Internet are synchronized, and only a bad clock or a malicious operation would allow this DoS. How simple would it be to add some ‘relative clock deviation’ limit that would constrain timestamps to a range between time(last operation) and time(now + 1 minute), or something along those lines?

I think the problem is that malicious actors can simply choose to not abide by this constraint and there is no way to enforce such a constraint.

I understand there’s a difficulty here given that two containers could merge operations months or years apart. But it’s only to understand the kind of counter-measure that could be implemented, and at what layer. E.g., (contrived example) given a register that maintains the name of the last active agent on a set, the timestamp would certainly match the time of the last operation in the set; in this case, any deviation from this time into the future is suspect. I guess something could be said about the extent of the DoS risk, and ways to mitigate it.

Maybe something like Vector Clocks could be used to mitigate the risk… but I’m not sure if this can prevent malicious actors in all cases…

NOTE: Soufflé has only recently added the aggregation
functionality required in the registerValue predicate (not
included in version 2.0.2).

Maybe link to the relevant MR, maybe
Introduce Aggregate Scoping, Witnesses, Multi-leveled Injected Variables by rdowavic · Pull Request #1693 · souffle-lang/souffle · GitHub?

Added link. Excellent PR finding skills!

The set of objects and set of block references only stores read
capabilities and references to blocks.

Is it a single set(objects, block references) or two sets? “only
stores” would be “only store” in the latter case.

Two sets. Great catch! Fixed and renamed “set of objects” and “set
of block references” to “replica objects” and “replica block
references” respectively.

==== Garbage collection

What happens if implementations do not run GC, or: how does a
lack of GC affect the running program or the network
synchronization?
I can answer partly: When an implementation does not garbage
collect removed blocks, the right to be forgotten cannot be
respected. In other words: implementations seeking
GDPR-compliance MUST implement GC.

Added a note that implementations MAY maintain a set of garbage-collected block references. This set of previously deleted content can be used to prevent forgotten content from reappearing locally because of synchronization with a non-forgetting implementation. This allows implementations to be locally GDPR compliant.
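
For illustration, such a set of forgotten block references could be consulted when storing blocks received during synchronization (a rough sketch with hypothetical names, not from the spec):

```python
def store_received_block(ref: str, block: bytes,
                         blocks: dict[str, bytes],
                         forgotten: set[str]) -> None:
    """Drop blocks that were garbage collected locally so that forgotten
    content does not reappear via a non-forgetting peer."""
    if ref in forgotten:
        return
    blocks[ref] = block
```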

This synchronization procedure can be implemented over
protocols such as HTTP or CoAP.

Just reading this, my mind wants a reference. Implemented how?
Is there a section below talking about this? Where is the
documentation? Maybe it’s just “below we’re discussing such
implementations…”

Added a note that this will follow with implementations. This is
something I believe needs to be implemented before it can be
exactly specified.

However, it seems necessary to still be able to run the
synchronization procedure as described above when replicas
disconnect and reconnect from the publish-subscribe system.

Why?

In the case of MQTT, if a client is not connected it will not receive the messages it missed out on once it reconnects. So replica A disconnects, lots of stuff happens between the other replicas connected to the MQTT broker, and then replica A reconnects. It somehow needs to learn about what happened in the time it was disconnected.

Core XMPP also does not do “offline delivery”; in the case of XMPP there are extensions that allow it. Still, when a replica connects for the first time it needs a way of requesting the currently known state… which is basically synchronization.

I’m curious how this would work with UPSYCLE…

In this section we define a CBOR serialization of replica
state. This is a simple way of writing out the state of a
replica to a file and transporting “out-of-band” (e.g. as an
e-mail attachment or on a USB disk).

Note that this serialization is not suitable as working state
representation. Implementations should use more efficient state
storage such as key-value stores.

This passage is a bit mysterious. Maybe reformulate to introduce the CBOR serialization as a means to perform out-of-band transport, and possibly describe the limits of this approach that can be solved by other means. Maybe there can be a section in D1.2 that covers this aspect. The mention of caching below further calls for clarification of when CBOR serialization is useful, and the cases where it can be problematic – and what to use then.

Reformulated slightly to make it clear that the CBOR serialization is only suitable for out-of-band transport.
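
To give an idea of the intended use, a rough sketch of writing out such a state dump with the Python cbor2 library (the field names are placeholders, not the serialization defined in the document):

```python
import cbor2  # third-party CBOR library (pip install cbor2)

def dump_replica_state(path: str,
                       objects: set[str],
                       block_refs: set[str],
                       blocks: dict[str, bytes]) -> None:
    """Write replica state to a file for out-of-band transport,
    e.g. as an e-mail attachment or on a USB disk."""
    state = {
        "objects": sorted(objects),              # read capabilities
        "block-references": sorted(block_refs),  # references to blocks
        "blocks": {ref: blocks[ref] for ref in sorted(blocks)},
    }
    with open(path, "wb") as f:
        cbor2.dump(state, f)
```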

I :heart: the conclusion.

I :heart: your comments. Thanks!

Just pushed changes described here.
