All right, just finished reviewing the document. It was very
nice to read, with a simple structure that takes the reader
along for the journey. You have a knack for structuring
complexity in an appealing way!
Thank you very much for the kind words as well as the excellent
comments.
I committed a single typo fix for the whole doc, and noted the
following along the way…
Typo fix is merged. Notes are answered in-line below and with a
commit.
However, IPNS (and IPFS to a large extent) is currently not
practically usable due to performance issues.
This sentence will require some follow-up. Maybe rephrase or
link to a relevant document about the performance issues, e.g.
the "IPNS is very slow" issue on their GitHub. Rephrasing in a
way that makes it a non-issue (whether or not IPNS is slow)
would be helpful. E.g., are there other advantages of DMC over
IPFS that can be highlighted?
I have added an explanation of how IPNS works and that it is
similar to a DMC register. The note on IPFS performance is
removed. In its place I have highlighted the ability of DMC to
allow state to diverge and authority to be delegated.
In particular we use the content-addressable grouping of RDF
triples into fragment graphs.
I guess "fragment graphs" should link to
"Content-addressable RDF".
I’ve moved the square bracket reference to the
“Content-addressable RDF”
doc just after fragment_graph.
I’m trying to keep in-line links to a minimum. I feel that they
can break the flow of reading and create a lot of dependencies
on documents that are not explicitly listed in the bibliography.
Also the fragment URL (#org75555) is not stable.
There are two sub-classes of conflict-free replicated data
types: Convergent Replicated Data Types (state based) and
Commutative Replicated Data Types (operation based). We will
use Commutative Replicated Data Types.
The last sentence could start with: “We will use the latter…”
and provide an argument why the former is not useful to our
case. Or it can simply be “We will use the latter.” just to
avoid repetition.
Used the latter formulation and added a short explanation of why
(operations can be represented as content-addressed, immutable
entities).
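For intuition, here is the gist of "content-addressed operations" as a minimal Python sketch (the encoding is hypothetical; the spec itself uses ERIS-encoded RDF, not JSON):

```python
import hashlib
import json

def operation_id(op: dict) -> str:
    """Derive a content address by hashing a canonical encoding of
    the operation. Any deterministic encoding works; this JSON-based
    one is for illustration only."""
    canonical = json.dumps(op, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Equal operations always hash to the same id, and changing an
# operation changes its id, which is what makes operations behave
# as immutable entities.
print(operation_id({"type": "add", "element": "alice"}))
```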
WARNING: The URN of ERIS encoded content is not yet finalized.
Maybe add a link to current work / discussion.
Added link to the “Future work” section.
A key Key
is an authorized key for a container Container
if and only if the predicate authorizedKey(Container, Key)
holds.
This section begs for an answer to the question: how do you
remove a key, or: how do you make authorizedKey(Container, Key)
not hold? From my memory of our previous discussions
about it, key removal is not yet solved. Maybe a link to the
current discussion would be useful, even if it states that this
is future work, or impossible.
I’ve added section “4.6.1 Revocation” on this. Also added some
notes on Vegvisir (Section 2.5) and how they do revocation.
I think I also have a clearer idea of the trade-offs after writing
it down. I will formulate a concrete proposal.
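To give a rough feel for the trade-off, here is one naive reading in Python (hypothetical names; this is not the proposal, just a sketch of the authorization chain):

```python
def authorized_key(root_key: str, key: str,
                   delegations: dict[str, str],
                   revoked: set[str]) -> bool:
    """Naive check: a key is authorized if it is the root key, or if
    it was delegated by an authorized key and has not been revoked.
    The hard part, ordering a revocation against concurrent
    operations, is deliberately ignored here."""
    if key in revoked:
        return False
    if key == root_key:
        return True
    delegator = delegations.get(key)
    return delegator is not None and authorized_key(
        root_key, delegator, delegations, revoked)

print(authorized_key("root", "k1", {"k1": "root"}, set()))   # True
print(authorized_key("root", "k1", {"k1": "root"}, {"k1"}))  # False
```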
Operation transport and synchronization are beyond the scope of
this document; we assume that the available operations are
given in an RDF graph.
Maybe add a link to the relevant documents (e.g., D1.2 at
https://dream.public.cat/pub/dream-pubsub-spec or D1.4
https://dream.public.cat/pub/specifications, or a specific
topic).
The sentence was a leftover from version 0.1.0; in 0.2.0 this is
in scope and discussed in section 6. I’ve added a link.
When removing an element we must specify the element and
operation that added the element
This begs for an answer to: how do you keep track of operations,
and who can remove operations? Maybe a footnote and link to
reference?
Hm, keeping track of operations is a key functionality of DMC
implementations. I imagine that an end-user UI would make this
transparent: removing an element creates a remove operation that
references the operation(s) that added the element.
Remove operations are just operations, so the discussion on
“authorized operations” in section 4.7 applies.
It can be shown that the operations (add and remove) on an
OR-Set commute.
Is this coming from Shapiro et al.? #citation-needed
It is. I’ve put the statement and the reference to Shapiro et al.
in the same paragraph.
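For readers who want to see the commutativity concretely, here is a toy OR-Set in Python (unrelated to the spec’s RDF encoding; tags stand in for content addresses of add operations):

```python
import uuid

class ORSet:
    """Toy observed-remove set: every add carries a unique tag, and a
    remove names the tags of the adds it has observed."""
    def __init__(self):
        self.adds = {}        # tag -> element
        self.removed = set()  # tags covered by some remove

    def add(self, element):
        tag = uuid.uuid4().hex  # in DMC: the add operation's address
        self.adds[tag] = element
        return tag

    def remove(self, tags):
        self.removed |= set(tags)

    def elements(self):
        return {e for t, e in self.adds.items() if t not in self.removed}

s = ORSet()
tag = s.add("alice")
s.remove({tag})      # the remove references the add it observed
print(s.elements())  # set(); applying the two ops in either order
                     # yields the same state, i.e. they commute
```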
A concrete application is user profiles.
Maybe add: where only users have access to their own profiles.
Added “where only users are authorized to mutate the profile”.
Thinking about this timestamp issue. Usually clocks on the
Internet are synchronized, and only a bad clock or a malicious
operation would allow this DoS. How simple would it be to add
some ‘relative clock deviation’ limit that would constrain
timestamps to a range between time(last operation) and
time(now + 1 minute), or something along those lines?
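Concretely, the local check I have in mind would be something like this sketch (names hypothetical):

```python
import time

MAX_SKEW = 60  # tolerated forward clock deviation, in seconds

def timestamp_plausible(op_ts: float, last_op_ts: float) -> bool:
    """Accept a timestamp only if it falls between the last known
    operation and now + MAX_SKEW. A replica can apply this locally,
    but it cannot force remote replicas to apply the same rule."""
    return last_op_ts <= op_ts <= time.time() + MAX_SKEW
```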
I think the problem is that malicious actors can simply choose
not to abide by this constraint, and there is no way to enforce
it.
I understand there’s a difficulty here given that two
containers could merge operations months or years apart. But
it’s only to understand the kind of counter-measure that could
be implemented, and at what layer. E.g. (contrived example),
given a register that maintains the name of the last active
agent on a set, the timestamp would certainly match the time of
the last operation in the set; in this case, any deviation from
this time into the future is suspect. I guess something could
be said about the extent of the DoS risk, and ways to mitigate
it.
Maybe something like Vector Clocks could be used to mitigate the
risk… but I’m not sure if this can prevent malicious actors in
all cases…
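For reference, the causal comparison a vector clock gives you looks roughly like this (sketch only; a malicious replica can still forge its clock entries, which is exactly my doubt above):

```python
def happened_before(a: dict, b: dict) -> bool:
    """True if clock a causally precedes clock b: no component of a
    exceeds the corresponding component of b, and at least one is
    strictly smaller. Replicas absent from a clock count as 0."""
    keys = set(a) | set(b)
    return (all(a.get(k, 0) <= b.get(k, 0) for k in keys)
            and any(a.get(k, 0) < b.get(k, 0) for k in keys))

print(happened_before({"A": 1}, {"A": 2, "B": 1}))  # True
print(happened_before({"A": 1}, {"B": 1}))          # False: concurrent
```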
NOTE: Soufflé has only recently added the aggregation
functionality required in the registerValue
predicate (not
included in version 2.0.2).
Maybe link to the relevant MR, maybe “Introduce Aggregate
Scoping, Witnesses, Multi-leveled Injected Variables” by
rdowavic (souffle-lang/souffle#1693 on GitHub)?
Added link. Excellent PR finding skills!
The set of objects and set of block references only stores read
capabilities and references to blocks.
Is it a single set(objects, block references) or two sets? “only
stores” would be “only store” in the latter case.
Two sets. Great catch! Fixed and renamed “set of objects” and “set
of block references” to “replica objects” and “replica block
references” respectively.
==== Garbage collection
What happens if implementations do not run GC, or: how does a
lack of GC affect the running program or the network
synchronization?
I can answer partly: When an implementation does not garbage
collect removed blocks, the right to be forgotten cannot be
respected. In other words: implementations seeking
GDPR-compliance MUST implement GC.
Added a note that implementations MAY maintain a set of garbage
collected block references. This set of previously deleted
content can be used to prevent forgotten content from reappearing
locally due to synchronization with a non-forgetting
implementation. This allows implementations to be locally GDPR
compliant.
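Roughly, the idea in code (hypothetical names, not the spec’s data model):

```python
class Replica:
    def __init__(self):
        self.blocks = {}        # block reference -> block content
        self.forgotten = set()  # references of garbage collected blocks

    def receive_block(self, ref: str, content: bytes):
        """Drop incoming blocks that were previously forgotten, so that
        syncing with a non-forgetting peer cannot resurrect them."""
        if ref not in self.forgotten:
            self.blocks[ref] = content

    def forget(self, ref: str):
        """Garbage collect a block and remember that it was removed."""
        self.blocks.pop(ref, None)
        self.forgotten.add(ref)
```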
This synchronization procedure can be implemented over
protocols such as HTTP or CoAP.
Just reading this, my mind wants a reference. Implemented how?
Is there a section below talking about this? Where is the
documentation? Maybe it’s just “below we’re discussing such
implementations…”
Added a note that this will follow with implementations. This is
something I believe needs to be implemented before it can be
exactly specified.
However, it seems necessary to still be able to run the
synchronization procedure as described above when replicas
disconnect and reconnect from the publish-subscribe system.
Why?
In the case of MQTT, if a client is not connected it will not
receive the messages it missed when it reconnects. So replica A
disconnects, lots of stuff happens between other replicas
connected to the MQTT broker, and then replica A reconnects. It
somehow needs to learn about what happened in the time it was
disconnected.
Core XMPP also does not do “offline delivery”; in the case of
XMPP there are extensions that allow offline delivery. Still,
when a replica connects for the first time it needs a way of
requesting the currently known state… which is basically
synchronization.
I’m curious how this would work with UPSCYLE…
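In code terms the requirement is simply that (re)connecting triggers the synchronization procedure described above (a sketch; `synchronize` stands in for that procedure):

```python
class PubSubReplica:
    """Sketch: live operations arrive over publish-subscribe, but a
    (re)connect falls back to explicit synchronization, since brokers
    like MQTT do not replay messages missed while offline."""

    def on_message(self, operation):
        self.apply(operation)

    def on_connect(self, peer):
        # Catch up on everything that happened while disconnected.
        self.synchronize(peer)

    def apply(self, operation): ...
    def synchronize(self, peer): ...
```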
In this section we define a CBOR serialization of replica
state. This is a simple way of writing out the state of a
replica to a file and transporting it “out-of-band” (e.g. as an
e-mail attachment or on a USB disk).
Note that this serialization is not suitable as a working state
representation. Implementations should use more efficient state
storage such as key-value stores.
This passage is a bit mysterious. Maybe reformulate to introduce
CBOR serialization as a means to perform out-of-band transport,
and eventually describe the limits of this approach that can be
solved by other means. Maybe there can be a section in D1.2 that
covers this aspect.
The mention of caching below further calls for clarification of
when CBOR serialization is useful, the cases where it can be
problematic, and what to use then.
Reformulated slightly to make it clear that CBOR serialization is
only suitable for out-of-band transport.
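To make the intended use concrete, an out-of-band export/import could look like this (sketch using the Python cbor2 library; the actual field layout is whatever the spec defines, not the dict shown here):

```python
import cbor2

def export_replica(state: dict, path: str):
    """Write replica state to a file for out-of-band transport,
    e.g. as an e-mail attachment or on a USB disk."""
    with open(path, "wb") as f:
        cbor2.dump(state, f)

def import_replica(path: str) -> dict:
    with open(path, "rb") as f:
        return cbor2.load(f)

# Fine for one-off transport; for working state a key-value store
# is more appropriate, as the doc notes.
export_replica({"objects": [], "block_references": []}, "replica.cbor")
print(import_replica("replica.cbor"))
```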
I ♥ the conclusion.
I ♥ your comments. Thanks!
Just pushed changes described here.