Data sync - next steps and considerations

protocol

#1

Here’s an update/brain dump of data sync next steps and some considerations. For the previous post, please see Introducing a data sync layer.

Briefly on reliability

Let’s first recap and look at what data sync provides. It provides a way to reliably sync or replicate data between various peers. What do we mean by reliable? Essentially keep things working even when system crashes/behaves badly. E.g.:

  • deal with lost messages (acks with retransmission)
  • deal with out-of-order messages (use a DAG)
  • deal with duplicate messages (messages immutable and have same id)
  • message integrity (id should be hash of message)
  • (flow control (receive buffer))

This is very similar to what TCP provides over IP. Notice that reliability in a p2p network should not be impacted by something like Chaos Unicorn Day. This should be a non-event.

Our current transport layer uses pub sub and relies on mailservers to deal with normal failure (receiver offline). With pubsub, you generally want to avoid back chatter. This means the underlying transport protocol is simple, but by default it removes the possibility to coordinate between sender and receiver. In general, we lack end to reliability semantics, and we are fragile to several classes of failure, which is the same as being unreliable.

Considerations

What follows are design considerations that are being looked into, and requires further thought.

1) Framing

Currently it is assumed that everything will use sync mode. This seems false, as a lot of use cases might have different desired properties. The proper way to solve this is to use framing, similar to how IP multiplexes to TCP/UDP/ICMP. E.g. you could imagine something like:

| <header, mode={sync,nosync}|> | <payload> |

Where e.g. pubsub like messages could be provided for without requiring data sync semantics. Additionally, it might be interesting to have a basic form of control/session management, similar to ICMP, which enables pings etc.

The current spec also encodes the header in protobuf (https://github.com/status-im/status-research/blob/master/data_sync/sync.proto#L14), which may not be desirable, especially as order isn’t guaranteed so it isn’t really a proper “header”. Specific alternatives remain to be investigated and suggestions are welcome!

2) Piggybacking and acks

When a node receives a message it is assumed to send an ACK straight away. That might not be desirable for a few reasons. For example, it leaks information about when the user is online, which might be undesirable. It also might be wasteful.

Techniques for handling this needs to be investigating more. For example, you could piggyback ACKs in the header when you reply. It might also be possible to ACK using less information, so ACKing one message implies ACKing previous messages as well.

Also consider what graceful degradation looks like for end users.

Related: ZMQ reliability patterns http://zguide.zeromq.org/page:all, Exactly-Once Delivery problem (reliable/distributed queues).

Also see efficiency considerations below.

3) Semi-trusted nodes - discovery and routing

While data sync provides reliability and eventual consistency, the latency might be high for this. This is especially true if a data sync group only contains mobile nodes, which are mostly-offline. To mitigate this, the idea of semi-trusted nodes is considered. A semi-trusted node would not be able read the content of the messages, but it would know message ids within a scope, as well as some nodes participating in the sync scope. An example of a would be a friend’s desktop, where high-availability wouldn’t be a requirement, but it’d have higher availability than two mobiles nodes. In the following example topology:

Sync group: 0xdeadbeef (some scope)
Nodes: A (mobile), B (mobile), C (desktop), D (desktop).
Edges: A<>B, A<>C, B<>D, C<>D

a) How does A initially discover and choose C?
One idea would be to push hardware-based Status Nodes (sell?) and make it simple to scan QR code to get started; this would make the journey of Status nodes and any random node the same. But this would likely be a prohibitive cost for normal user onboarding. See also: Network incentivisation - First draft.

b) How does C and D connect with each other?
C and D don’t have access to the contents of the messages, they can’t see events such as “node X joined”. Additionally, how does C and D know which messages are most relevant if they can’t see the DAG? This may or may not be a problem as the DAG grows.

c) How privacy-preserving should these semi-trusted nodes be?
While the underlying transport privacy layer would be responsible for safeguarding metadata, this is useful to keep in mind. If it really is your own desktop at home it isn’t a big deal, but if you end up with people using a lot of the same nodes, this might be a problem.

Additionally:

  • Single Use Reply Blocks (SURBs) might mitigate this somewhat, but this is making assumption about the transport privacy layer which seems undesirable at this stage. The best way to think of it is rather as an enhancement.

  • See Bloom filter discussion in efficiency considerations below (but at the end of day you have to ACK actual messages to get provable reliability).

  • See base layer routing solutions like BGP and how they generalize.

  • Swarm design is worth looking into more and seeing how to leverage it here, without relying on it

While these concerns are slightly orthogonal to the problem of data sync, solving this is very useful for end user experience.

4) Efficiency: Set reconciliation (bandwidth) and minimizing latency

This is more a consideration of performance, which is more relevant after the thing actually works. But it’s worth keeping in mind as it might inform the design.

With many nodes acting in a sync group, and mostly-offline behavior, this would by default lead to a lot of communication overhead. There are many techniques for dealing with these, for example leveraging (normal or “inverted”) bloom filters. Efficient Set Reconciliation problem is related here.

Special care must be taken to basic operations that will occur frequently. E.g. message set will grow, and what type of messages a bloom filter it’ll trigger. Tribler’s Dispersy is relevant here, though it deals with static big files, which is a slightly different problem.

Aside from message growth overhead, there’s also the replication factor. E.g. if you have a group chat with 100 people you might just want to share with a subset, e.g. the receiver and K other nodes that have proven to be useful (desktop, etc).

Most naive way, N node sync group all sync everything with each other would be (n*(n-1))/2 or O(n^2) communications over network, or O(n) from an individual peer’s point of view. Notice that N here corresponds to participant’s in a sync group, and isn’t the same thing as a Whisper topic. Additionally, if nodes are offline most of the time this requires resending, and due to Byzantine faults (losing ACKs etc), we need to resend information about M messages. Ideally this should be O(1). Additionally, if we select a subset of nodes to share with in a sync group (say, sender and K others), we can likely get O(logn) scaling behavior (for node, Onlogn for network) for arbitrary number of messages.

Regarding latency, the desired user story is that you come online and can quickly send out a message equivalent to “what did I miss” to a few nodes (which might double as an OFFER message bloom filter), and have them send something back.

5) Does it make sense to have public chats (as currently designed) at all?

This is pure pub sub, and means there’s no explicit way to see who is participating in a sync scope. Alternative would be treat like a big public group chat, which is essentially what all other major clients do. There are two main properties we want to maintain here:

  • scalability - most likely requires symmetric-key encryption when N is large (>20/100/1000)
  • coordination - should be able take a deterministic hash of a dapp and every
    client and user understand where to chat

There might still be use cases for broadcast based communication, but these be better modeled as analog radio as opposed to having reliability as a first class citizen, due to back-chatter of pubsub.

This can probably be achieved by know dapp name -> know topic -> know sym key. This topic can be, but doesn’t have to be, the same as Whisper topic. What’s needed are semantics for letting other peers know about each other.

6) How does this interface with a mixnet-based design?

The data sync layer should be generic enough to work both with something like Whisper, as well as PSS, Tor, Mixnet. These differ in their topology and semantics. Especially for https://github.com/w3f/messaging we need to understand exactly how data sync layer will interface with it, and any special considerations there might be.

Next steps

Right now there’s a proof of concept simulation (https://github.com/status-im/status-research/tree/master/data_sync). The general plan of action is to extend this to be an end-to-end toy example implementation, initially for 1:1 chat. This will allow realistic experimentation of the data sync layer without disturbing current Core work. Once this implementation and, more importantly, the accompanying spec is deemed acceptable, Core can use this as a guide for implementing data sync in the app.

The current method for implementing this end-to-end toy implementation is to use Whisper via geth and web3.py. Progress for this can be seen here: https://github.com/status-im/status-research/pull/2/files. At some point, this Python-based prototype might be replaced with an implementation in Nim (a la Stratus) as this would bring us closer to other research efforts.

In parallel with this (and informed by), research will continue on above considerations.


Thoughts on any of this or specific designs you think are worth looking into are welcome!


Data Sync Research Log
#2

Quick little screencast demo to show 1:1 data sync PoC using Whisper as a transport.

Running two geth nodes with Whisper to show-case sync with retransmission and acks of data. Simplistic interactive 1:1 chat. This PoC doesn’t: include any additional sync nodes; try to optimize latency or bandwidth; client doesn’t encode a DAG.


#3

What is sync mode vs nosync mode? Nosync mode can be used to just inform about something without triggering syncing that package with other nodes?

Regarding implementation:

  • I would strongly advise that we implement it first within a single process but using concurrent programming technics. This implementation should solve cases like receiving the same package multiple times, out of order etc. and should be well-covered with unit tests.
  • Next, we should have a transport interface with multiple transports tested, for example, direct TCP connection between peers and local Whisper network.
  • Peers discovery should be a totally separate problem but passing messages by nodes knowing only message IDs should be part of this implementation.
  • From day one have useful metrics published so that there are numbers to compare, e.g. it may turn out that the first naive implementation was better than what we ended up later which is wrong.

One more important thing to discuss here is, I believe, how we treat identity and devices. For example, I can be a part of a group chat but I can have two or three devices. How is it gonna look like underneath? If we want to select K nodes from all participants of the group chat, we should probably select K different identities. There are definitely more problems related to this.


#4

how data sync works when receiver is offline?
e.g MESSAGE was sent, and sender waits for an ACK. does it expect to receive an ACK from receiver’s mail server in such case?