Minimal Viable Data Sync Research Log


#1

Inspired by Pedro’s daily updates on the road to Nix and by a few others, I have decided to start a log too (it may be a little more irregular than daily). Thanks to @oskarth’s work on the data sync layer, we can piece together various useful components to start building a minimal viable data sync layer (MVDS), which should already provide noticeable improvements to Status clients.

We already have experiments based on the Bramble Synchronization Protocol (BSP), which we can use as a foundation for a more production-ready data sync layer.

Requirements

There are certain requirements we use as guidelines for the current design of MVDS:

  • SHOULD support messages that don’t need replication (framing, flag)
  • MUST be possible to make mobile-friendly, this MAY be done with helper services such as IPFS/Swarm
  • MUST work reliably without helper services, though possibly with higher latency
  • MUST be resilient to individual nodes failing/having low uptime
  • MUST provide immutable/stable message ids
  • SHOULD be able to provide causal consistency
  • MUST be able to sync data reliably between devices, by which we mean MUST be able to deal with out-of-order, dropped, duplicate and delayed messages
  • SHOULD be designed to be agnostic to whatever transport is running underneath
  • SHOULD allow for privacy preserving features such as exploding messages
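Two of the requirements above, stable message IDs and tolerance of duplicate or out-of-order delivery, work well together if the ID is derived from the message content rather than from the transport. The Go sketch below assumes, in the style of BSP, that the ID is a SHA-256 hash over the group ID, timestamp and body; the `Message` fields and the `Store` type are illustrative, not the MVDS wire format.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
)

// Message is an illustrative record; the field names are assumptions.
type Message struct {
	GroupID   []byte
	Timestamp int64
	Body      []byte
}

// MessageID is derived from content only, so every node computes the
// same ID for the same message regardless of how it arrived.
type MessageID [32]byte

// ID hashes the group ID (length-prefixed), the timestamp and the body.
// The length prefix and fixed-width timestamp keep field boundaries unambiguous.
func (m Message) ID() MessageID {
	h := sha256.New()
	var buf [8]byte
	binary.BigEndian.PutUint64(buf[:], uint64(len(m.GroupID)))
	h.Write(buf[:])
	h.Write(m.GroupID)
	binary.BigEndian.PutUint64(buf[:], uint64(m.Timestamp))
	h.Write(buf[:])
	h.Write(m.Body)
	var id MessageID
	copy(id[:], h.Sum(nil))
	return id
}

// Store keeps received messages keyed by ID.
type Store struct {
	messages map[MessageID]Message
}

func NewStore() *Store {
	return &Store{messages: make(map[MessageID]Message)}
}

// Add stores m and reports whether it was new; duplicates are dropped.
func (s *Store) Add(m Message) bool {
	id := m.ID()
	if _, ok := s.messages[id]; ok {
		return false
	}
	s.messages[id] = m
	return true
}
```

Because the ID depends only on content, a message that arrives twice or late maps to the same key, so deduplication is a map lookup regardless of arrival order.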

Additionally, there are requirements we should keep in the back of our minds while designing the protocol, but which are not required for this minimal version, such as:

  • Economic Incentives
  • Mobile-Friendliness

What’s next?

The current goals of MVDS are to write a specification that compiles @oskarth’s prior research on the subject along with important new aspects, and to create a proof-of-concept implementation for core to use.


#2

Does this mean that syncing could be computationally difficult for a mobile phone? If so, how does it depend on the machine: does a mobile device just subscribe to a more capable node and trust it?

I’m trying to see why this is a SHOULD and not a MUST.


#3

You’re right, it definitely should be a MUST.


#4

22 April

Today I spent most of my time thinking about a proposed change to Status that would create partitioned whisper topics. This allows for a drastic reduction in bandwidth; however, the proposed upgrade path creates an incompatibility between old and new clients, which is something we do not want. Simply changing the number of partitions also introduces incompatibilities.

One solution to these compatibility breaks is to negotiate partitions. When a client has a specific topic it would like to use, it sends a request message to the peer specifying the desired topic. The peer responds with an OK message, and communication then continues on that topic. To keep full downward compatibility, we treat any message other than OK as a refusal to join the topic and continue the conversation on the default topic. This means that old clients, which do not understand the request, simply ignore it. The same mechanism also works when changing the number of partitions.

A successful channel negotiation would look as follows:

MESSAGE ON CHANNEL ABC (A -> B): USE TOPIC XYZ
MESSAGE ON CHANNEL ABC (B -> A): OK
MESSAGE ON CHANNEL XYZ (A -> B): HELLO
MESSAGE ON CHANNEL XYZ (B -> A): HELLO

An unsuccessful negotiation with an older client would look as follows:

MESSAGE ON CHANNEL ABC (A -> B): USE TOPIC XYZ
MESSAGE ON CHANNEL ABC (B -> A): HELLO
MESSAGE ON CHANNEL ABC (A -> B): HELLO
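The negotiation rule, where only an explicit OK moves the conversation and anything else is read as a refusal, can be sketched as a small state machine. This is a hypothetical illustration: `Negotiator`, `Msg` and the `Kind` values are not part of any existing client.

```go
package main

// Topic is a 4-byte whisper topic.
type Topic [4]byte

// Kind distinguishes the messages used in the traces above.
type Kind int

const (
	UseTopic Kind = iota // "USE TOPIC XYZ"
	OK                   // acceptance of a proposal
	Hello                // any ordinary chat message
)

// Msg is a hypothetical wire message.
type Msg struct {
	Kind  Kind
	Topic Topic // only meaningful for UseTopic
}

// Negotiator tracks the topic one chat is using.
type Negotiator struct {
	Current Topic  // topic we send and listen on
	pending *Topic // proposal awaiting a reply, if any
}

// Propose remembers the desired topic and returns the request to send
// on the current (default) topic.
func (n *Negotiator) Propose(t Topic) Msg {
	n.pending = &t
	return Msg{Kind: UseTopic, Topic: t}
}

// OnReply handles the first reply after a proposal. Only an explicit OK
// switches topics; anything else, such as a HELLO from an old client that
// ignored the request, counts as a refusal and we stay where we are.
func (n *Negotiator) OnReply(m Msg) {
	if n.pending == nil {
		return
	}
	if m.Kind == OK {
		n.Current = *n.pending
	}
	n.pending = nil
}
```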

For simplicity, this specification assumes that negotiation happens on client upgrades. Here is a description in Go-like pseudocode:

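// imports: fmt, math/big, github.com/ethereum/go-ethereum/crypto

var topic [4]byte // the chat's current whisper topic

const N = 5000    // number of partitions
var pubkey []byte // current account's public key

// Interpret the first 16 bytes of the key as an integer and map it onto one of N partitions.
partition := new(big.Int).Mod(new(big.Int).SetBytes(pubkey[:16]), big.NewInt(N))
topicString := fmt.Sprintf("%d-discovery", partition)
topicHash := crypto.Keccak256([]byte(topicString))

var newTopic [4]byte
copy(newTopic[:], topicHash[:4]) // a whisper topic is 4 bytes

if newTopic != topic {
    negotiate(newTopic)
}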


#5

At some point we will have to evaluate how all these bandwidth-saving and reliability-increasing measures affect darkness and anonymity, and whether it is still worth using whisper over pss after all.


#6

Thanks for the description,
we discussed a few approaches earlier, and considered this one as well.

The main reason we decided not to go forward with this approach is that multiple devices are currently not explicitly managed (i.e. peer A does not know how many devices peer B has).
To handle multiple devices for 1-to-1 messages (in group chats devices are explicitly managed, as encryption is device-to-device), we just piggyback on whisper, which supports multicast messages by public key. This means that if I add a device (or recover my account), I will start receiving messages straight away, but if any topic negotiation has happened, the new device (or the device running an older version) will stop receiving messages.

This might add noise in the form of bug reports, as these behaviors are difficult to explain to users, which might result in decreased confidence in the app.

For example, say B has 2 devices: B1, a newer mobile version with protocol negotiation, and B2, an older desktop version.

MESSAGE ON CHANNEL ABC (A -> B1): USE TOPIC XYZ
MESSAGE ON CHANNEL ABC (B1 -> A): OK
MESSAGE ON CHANNEL XYZ (A -> B1): HELLO
MESSAGE ON CHANNEL XYZ (B1 -> A): HELLO 

Here B2 will not receive any message.

Another problematic scenario:

MESSAGE ON CHANNEL ABC (A -> B1): USE TOPIC XYZ
MESSAGE ON CHANNEL ABC (B1 -> A): OK
MESSAGE ON CHANNEL XYZ (A -> B1): HELLO
B1 REINSTALL APPLICATION
MESSAGE ON CHANNEL XYZ (B1 -> A): HELLO 

The last message will no longer be received by B1, and we’d need a way to reset the protocol, but B1 won’t be notified, as it won’t be listening on the topic.

Some of these problems have been solved by using device-to-device encryption, where managing this information is necessary. In that case protocol negotiation can be done, as devices are explicitly managed between peers (it is up to the user owning B1 and B2 to pair their devices, at which point we take care of communicating this to other peers).

As a side note, you would generally piggyback protocol negotiation on the actual messages, given that mobile devices are often offline, so that you can start communicating from the first message, something like:

MESSAGE ON CHANNEL ABC (A -> B1): USE TOPIC XYZ, HELLO
MESSAGE ON CHANNEL ABC (B1 -> A): OK // This one can go already on XYZ
MESSAGE ON CHANNEL XYZ (A -> B1): HELLO 2 // This one can go on topic only if the previous one was received
MESSAGE ON CHANNEL XYZ (B1 -> A): OK, HELLO 2 // OK is only included if the previous messages were not received, or not sent

But that’s just an optimization.
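A rough sketch of the piggybacking idea from the sender's side, assuming a hypothetical `Envelope` that can carry a proposal, an OK and a chat body at once. The sender keeps attaching the proposal on the old topic until the peer's OK arrives, and only then moves to the new topic, so no chat message waits on a separate round trip.

```go
package main

// Topic is a 4-byte whisper topic.
type Topic [4]byte

// Envelope is a hypothetical wire format that lets a topic proposal and an
// OK ride along with an ordinary chat message.
type Envelope struct {
	Topic    Topic  // topic the envelope is sent on
	UseTopic *Topic // proposed topic, if any
	OK       bool   // accepts the peer's proposal
	Body     string // chat content
}

// Sender moves a chat from Old to New without a separate handshake.
type Sender struct {
	Old, New  Topic
	confirmed bool // set once the peer's OK has been seen
}

// Wrap prepares the next outgoing message. Until the peer confirms, the
// message goes out on the old topic with the proposal attached, so it is
// delivered whether or not the peer supports negotiation.
func (s *Sender) Wrap(body string) Envelope {
	if s.confirmed {
		return Envelope{Topic: s.New, Body: body}
	}
	proposal := s.New
	return Envelope{Topic: s.Old, UseTopic: &proposal, Body: body}
}

// Receive processes an incoming envelope; an OK confirms the switch.
func (s *Sender) Receive(e Envelope) {
	if e.OK {
		s.confirmed = true
	}
}
```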

Hope this is helpful.

@yenda some stepping stones (calculating the number of collisions, bandwidth, etc.) have already been taken, which might help with a more thorough analysis: https://github.com/status-im/status-react/blob/d05ff8a1b86a50735aa0df74ced925d9b321c860/test/cljs/status_im/test/transport/partitioned_topic.cljs#L27


#7

9 May

I spent roughly the past 3 weeks expanding the Minimal Viable Data Sync specification and writing a basic implementation in Go. The basic specification can be found here; it is based on the Bramble Synchronization Protocol and @oskarth’s implementation of it.

We have made one optimization: a Payload message that packs all the other message types into one. This lets us send all m pending messages to a peer at every tick as a single packet rather than as m separate packets.
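A minimal sketch of the Payload idea, assuming records for a peer are queued between ticks and drained into one payload per peer. `Payload`, `Outbox` and the field names are illustrative; the actual MVDS payload is a protocol buffer.

```go
package main

// MessageID identifies a message (for example a 32-byte hash).
type MessageID [32]byte

// PeerID identifies a peer; kept opaque here.
type PeerID string

// Payload bundles every record type pending for one peer, so a single
// transport packet is sent per peer per tick.
type Payload struct {
	Acks     []MessageID
	Offers   []MessageID
	Requests []MessageID
	Messages [][]byte
}

func (p *Payload) empty() bool {
	return len(p.Acks)+len(p.Offers)+len(p.Requests)+len(p.Messages) == 0
}

// Outbox accumulates records per peer between ticks.
type Outbox struct {
	pending map[PeerID]*Payload
}

func NewOutbox() *Outbox {
	return &Outbox{pending: make(map[PeerID]*Payload)}
}

func (o *Outbox) payloadFor(p PeerID) *Payload {
	if o.pending[p] == nil {
		o.pending[p] = &Payload{}
	}
	return o.pending[p]
}

func (o *Outbox) Ack(p PeerID, id MessageID) {
	q := o.payloadFor(p)
	q.Acks = append(q.Acks, id)
}

func (o *Outbox) Offer(p PeerID, id MessageID) {
	q := o.payloadFor(p)
	q.Offers = append(q.Offers, id)
}

func (o *Outbox) Request(p PeerID, id MessageID) {
	q := o.payloadFor(p)
	q.Requests = append(q.Requests, id)
}

func (o *Outbox) Message(p PeerID, body []byte) {
	q := o.payloadFor(p)
	q.Messages = append(q.Messages, body)
}

// Tick drains the outbox and returns at most one Payload per peer.
func (o *Outbox) Tick() map[PeerID]Payload {
	out := make(map[PeerID]Payload, len(o.pending))
	for peer, p := range o.pending {
		if !p.empty() {
			out[peer] = *p
		}
	}
	o.pending = make(map[PeerID]*Payload)
	return out
}
```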

Something we need to figure out at a later stage is how to handle larger group chats, because the way OFFERs and ACKs currently work is not ideal. Currently, every client stores a message and offers it to all of its peers in a group until each of them has acked it. However, a peer only acks the message to the first peer that offered it, not to any subsequent ones, so those keep offering a message that has already been acked. This is fine for small chats but becomes wasteful for larger group chats. A solution I have proposed, but not yet tested, is to treat an ACK as a message broadcast to everyone in the group: instead of an ACK going to one peer, every peer sees it and records it.
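The difference between the current ACK handling and the proposed broadcast ACK can be shown from the point of view of a node `y` that is still offering a message. In this hypothetical sketch, with direct ACKs `y` only learns about ACKs addressed to itself, while with broadcast ACKs it records every ACK in the group and stops offering to that peer. It only illustrates the untested proposal above.

```go
package main

type MessageID [32]byte
type PeerID string

// Ack is an ACK record sent by From, nominally addressed to To.
type Ack struct {
	From, To PeerID
	ID       MessageID
}

// OfferState tracks, per message, which peers still need an OFFER.
type OfferState struct {
	pending map[MessageID]map[PeerID]bool
}

func NewOfferState() *OfferState {
	return &OfferState{pending: make(map[MessageID]map[PeerID]bool)}
}

// Track starts offering id to each of peers.
func (s *OfferState) Track(id MessageID, peers []PeerID) {
	set := make(map[PeerID]bool, len(peers))
	for _, p := range peers {
		set[p] = true
	}
	s.pending[id] = set
}

// Observe processes an ACK seen by node self. With direct ACKs only ACKs
// addressed to self count; with broadcast ACKs every ACK in the group stops
// further OFFERs to the peer that sent it.
func (s *OfferState) Observe(self PeerID, a Ack, broadcast bool) {
	if !broadcast && a.To != self {
		return
	}
	delete(s.pending[a.ID], a.From) // no-op if id is unknown
}

// Remaining is the number of peers still being offered id.
func (s *OfferState) Remaining(id MessageID) int {
	return len(s.pending[id])
}
```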

The Go implementation can be found here; it was written as a basic, reusable implementation so that we can build it into the status console client. That work has started and can be found in this pull request. The refactoring is rather large, as it requires changing not only how messages are sent but also how we store and generally handle them; for example, IDs need to be handled completely differently, as we can no longer use the ID whisper gives us.


#8

11 May

In anticipation of using MVDS in the status console client, 2 changes were made that should make it more suitable.

The first change alters how groups are handled in the code. Previously each node could only handle one group; this has been changed so that multiple groups are supported, making it easier to use in an actual messaging application. The code can be found here.

The next change was rather simple but pretty important: it changes the PeerID from a byte array to an ECDSA public key, which makes it compatible with whisper. The code can be found here.

Both of these changes drastically reduce the friction of integrating MVDS into the console client, slowly transitioning MVDS from POC code to an actual framework.

One interesting thing that became apparent to me, and which might be fun to experiment with, is that the data sync layer handles group chats and 1:1 chats the same way: a 1:1 chat is simply a group chat containing only one peer. It might be interesting to see whether it would make sense, and be technically possible, to let 1:1 chats transition into group chats.
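Since a 1:1 chat is just a group with one peer, a node that keeps a peer set per group gets both multi-group support and the 1:1-to-group transition almost for free. A hypothetical sketch (the `Node` type and its methods are not the MVDS API):

```go
package main

type GroupID string
type PeerID string

// Node keeps a peer set per group, so one node can take part in many
// groups at once.
type Node struct {
	groups map[GroupID]map[PeerID]struct{}
}

func NewNode() *Node {
	return &Node{groups: make(map[GroupID]map[PeerID]struct{})}
}

// AddPeer adds p to group g, creating the group on first use.
func (n *Node) AddPeer(g GroupID, p PeerID) {
	if n.groups[g] == nil {
		n.groups[g] = make(map[PeerID]struct{})
	}
	n.groups[g][p] = struct{}{}
}

// Peers returns how many peers group g has.
func (n *Node) Peers(g GroupID) int {
	return len(n.groups[g])
}

// IsOneToOne reports whether g currently behaves like a 1:1 chat.
func (n *Node) IsOneToOne(g GroupID) bool {
	return n.Peers(g) == 1
}
```

Adding a second peer to a 1:1 chat is then just another `AddPeer` call; the sync logic does not need to change.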


#9

definitely should design for this


#10

Thanks for the update.

The next change was rather simple but pretty important, it changes the PeerID from a byte array to an ECDSA Public Key, this makes it compatible with whisper. The code can be found here.

Interesting. Instinctively I would prefer to keep the ID as a []byte and leave the choice of encoding to other layers (in our specific usage we would just pass the bytes of the compressed public key, for example, but someone else reusing this might pass an encoded UUID), so that it stays agnostic of the other layers in this regard.

Unless we are planning to actually use the public key as a public key (and not just as an identifier, as it is now), it seems like []byte would be less committal and does not tie us to a specific ID scheme. What do you think?

Also, just to understand: what are the differences between BSP and MVDS? Or should they basically be the same for the initial implementation?
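One way to keep the identifier opaque, as suggested here, is to treat `PeerID` as raw bytes and offer a helper that encodes a public key in its compressed form. The sketch uses the standard library's P-256 curve only so that it is self-contained; Status keys are secp256k1, which Go's standard library does not provide, so a real integration would need a different curve implementation. `FromPublicKey`, `FromBytes` and `NewIdentity` are hypothetical helpers.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
)

// PeerID is an opaque identifier: the data sync layer only stores and
// compares it, and the caller decides what the bytes mean.
type PeerID string

// FromPublicKey encodes a public key as its compressed point
// (1 prefix byte + X coordinate).
func FromPublicKey(pub *ecdsa.PublicKey) PeerID {
	return PeerID(elliptic.MarshalCompressed(pub.Curve, pub.X, pub.Y))
}

// FromBytes accepts any other scheme, e.g. an encoded UUID.
func FromBytes(b []byte) PeerID {
	return PeerID(b)
}

// NewIdentity generates a key pair (P-256 here, as a stand-in) and
// returns it together with its PeerID.
func NewIdentity() (*ecdsa.PrivateKey, PeerID, error) {
	priv, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, "", err
	}
	return priv, FromPublicKey(&priv.PublicKey), nil
}
```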


#11

So basically, right now it is identical to BSP except for how payloads look. This is only the first iteration of MVDS, however, and more optimizations will come once we have been able to test everything more extensively.


#12

22 May

The goal for MVDS was to integrate it into the status console client for end-to-end testing. With this pull request, that becomes possible.

Thanks to the already modular architecture of the console client, not much needed to change: I created a new data-sync-compatible protocol adapter that can be given to the messenger. Currently the implementation uses whisper as its underlying transport. However, because transport and data sync logic are separated, changing the transport layer at a later stage should not be too complex.

To test data sync, run the console client with the -ds option. (Due to the SQLite change I currently cannot run it, but will update once possible.)

Handshakes

With data sync, clients need to know their peers in order to send messages. This is easy for 1:1 chats but less so for group chats, because we currently have no form of handshake indicating that x joined a chat. This needs to be solved so that clients can add peers whenever a new one joins the chat. A simple handshake, just indicating that I have joined a chat, is good enough for this.
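A sketch of how such a handshake might be handled on the receiving side, assuming a hypothetical `Join` record that simply names the group and the sender; on receipt, the sender is added to the peers that data sync shares with.

```go
package main

type GroupID string
type PeerID string

// Join is a hypothetical handshake record: From announces it has joined Group.
type Join struct {
	Group GroupID
	From  PeerID
}

// Membership is the per-group peer list that data sync sends to.
type Membership struct {
	peers map[GroupID]map[PeerID]bool
}

func NewMembership() *Membership {
	return &Membership{peers: make(map[GroupID]map[PeerID]bool)}
}

// OnJoin records the sender as a peer of the group and reports whether it
// was previously unknown (a client could start syncing with it then).
func (m *Membership) OnJoin(j Join) bool {
	g := m.peers[j.Group]
	if g == nil {
		g = make(map[PeerID]bool)
		m.peers[j.Group] = g
	}
	if g[j.From] {
		return false
	}
	g[j.From] = true
	return true
}

// Has reports whether p is a known peer of group.
func (m *Membership) Has(group GroupID, p PeerID) bool {
	return m.peers[group][p]
}
```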

Downward Compatibility

One of the problems we need to resolve is downward compatibility: how do we still talk to people running clients that do not use data sync? An ugly solution that may work in the beginning is simply to send messages twice, once using the data sync protocol and once without. This is of course not clean, but it will work in the early stages. Protocol negotiation will probably help here too.
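The "send twice" workaround could look roughly like this. `Transport`, `DualSender`, the `Encode` hook and the in-memory `Recorder` are hypothetical; how new clients ignore the legacy duplicate is left out.

```go
package main

// Transport is a hypothetical send interface over whisper.
type Transport interface {
	Send(topic string, data []byte) error
}

// DualSender sends every chat message twice: once as a legacy message for
// clients without data sync, once wrapped for data sync.
type DualSender struct {
	T      Transport
	Encode func(body []byte) []byte // wraps a body as a data sync record
}

func (d DualSender) Send(topic string, body []byte) error {
	// Legacy copy first, so old clients receive the message.
	if err := d.T.Send(topic, body); err != nil {
		return err
	}
	// Data sync copy, which new clients track and acknowledge.
	return d.T.Send(topic, d.Encode(body))
}

// Recorder is an in-memory Transport for trying the sketch out.
type Recorder struct {
	Sent [][]byte
}

func (r *Recorder) Send(topic string, data []byte) error {
	r.Sent = append(r.Sent, data)
	return nil
}
```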

Epoch

In the Minimal Viable Spec we defined send_time for retransmission. This pull request renames it to epoch, which makes more sense because our code currently uses epochs and will most likely continue to do so.


#13

27 May

End to End Tests

One of the biggest milestones this week was getting an end-to-end test running successfully between @oskarth and me using the status console client. We now know that OFFER and REQUEST messages are passed correctly between 2 nodes. However, we never received a message after the requests; both Oskar and I believe this is due to the status client rather than data sync, as we have a simulation that works. There are multiple issues with the client that need resolving, including further investigation into why these messages aren’t received. I believe they are being sent, but our logs were not verbose enough in the last tests, which has since been changed.

Refactoring

MVDS has gone through a lot of refactoring to reduce the likelihood of race conditions. Thanks to @igor’s help, certain issues were highlighted and resolved. These changes include:

These changes are fairly temporary, as a larger pull request that changes the entire payload model is currently open; it can be found here. Its main goal is to move payload creation from an event that occurs every second to something that happens the instant we receive messages. This means we no longer need to track an entire sync state and have fewer chances for race conditions to occur.
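A sketch of what reacting to a payload on arrival could look like, using simplified string IDs. `Node.OnPayload` is hypothetical: it acks received messages, requests offered ones it lacks, and answers requests with the message bodies, building the response immediately instead of on a timer.

```go
package main

// MessageID uses strings here to keep the sketch short.
type MessageID string

// Record is a message record: its ID and body.
type Record struct {
	ID   MessageID
	Body []byte
}

// Payload groups the record types exchanged between two peers.
type Payload struct {
	Acks     []MessageID
	Offers   []MessageID
	Requests []MessageID
	Messages []Record
}

// Node holds the messages this node has stored.
type Node struct {
	store map[MessageID][]byte
}

// OnPayload builds the response to an incoming payload as soon as it
// arrives, rather than deriving it from sync state on a periodic tick.
func (n *Node) OnPayload(in Payload) Payload {
	var out Payload
	// Store new messages and ACK every message received.
	for _, r := range in.Messages {
		if _, ok := n.store[r.ID]; !ok {
			n.store[r.ID] = r.Body
		}
		out.Acks = append(out.Acks, r.ID)
	}
	// REQUEST offered messages we do not have yet.
	for _, id := range in.Offers {
		if _, ok := n.store[id]; !ok {
			out.Requests = append(out.Requests, id)
		}
	}
	// Answer REQUESTs straight away with the MESSAGE itself.
	for _, id := range in.Requests {
		if body, ok := n.store[id]; ok {
			out.Messages = append(out.Messages, Record{ID: id, Body: body})
		}
	}
	return out
}
```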

Changes to spec

One thing in the BSP spec that we decided to change is that MESSAGEs, in addition to OFFERs, are only sent after the specified send_time. We changed send_time to apply only to OFFERs, not MESSAGEs, because when nodes are mostly offline we should deliver a MESSAGE to a specific node as soon as it requests it.

A later change to send_time, or in our case send_epoch, will be to reduce it again after a certain epoch is reached. This ensures that if I have not been online for 2 days or so, I don’t need to wait another 2 days for messages.
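A sketch of per-record retransmission scheduling with exponential backoff, plus one possible reading of reducing the send epoch again: once the gap would exceed a cap, the backoff counter resets. `MaxBackoff`, its value and the reset rule are assumptions. The same state could be kept for any record type that needs retransmission.

```go
package main

// MaxBackoff caps the retransmission gap, in epochs (assumed value).
const MaxBackoff = 64

// Retransmission is the state kept per record and peer.
type Retransmission struct {
	Count     uint  // backoff exponent, grows with each send
	SendEpoch int64 // earliest epoch at which to resend
}

// Next schedules the following attempt after a send at epoch now. The gap
// doubles each time; once it would exceed MaxBackoff the counter resets,
// so a peer that was offline for a long time is not kept waiting as long.
func (r *Retransmission) Next(now int64) {
	r.Count++
	gap := int64(1) << r.Count
	if gap > MaxBackoff {
		r.Count = 0
		gap = 1
	}
	r.SendEpoch = now + gap
}

// Due reports whether the record should be resent at epoch now.
func (r Retransmission) Due(now int64) bool {
	return now >= r.SendEpoch
}
```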


#14

5 June

Node Simulation

To enable more dynamic testing of MVDS, the node simulation was enhanced with multiple configuration options. These include:

  • communicating: number of nodes sending messages
  • interactive: number of nodes using INTERACTIVE mode; the rest use BATCH
  • interval: seconds between messages
  • node: number of nodes
  • offline: percentage of time a node is offline
  • sharing: number of nodes each node shares with
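For reference, the options above could map onto a configuration struct like the following. Field names, types and the per-tick offline roll are assumptions for illustration, not the simulation's actual code.

```go
package main

import "math/rand"

// SimConfig mirrors the simulation options (hypothetical field names).
type SimConfig struct {
	Nodes         int     // node: number of nodes
	Communicating int     // communicating: nodes sending messages
	Interactive   int     // interactive: nodes in INTERACTIVE mode, the rest use BATCH
	Interval      int     // interval: seconds between messages
	Offline       float64 // offline: percentage of time a node is offline, 0 to 100
	Sharing       int     // sharing: peers each node shares with
}

// Mode assigns INTERACTIVE to the first Interactive nodes and BATCH to the rest.
func (c SimConfig) Mode(node int) string {
	if node < c.Interactive {
		return "INTERACTIVE"
	}
	return "BATCH"
}

// IsOffline decides for a single tick whether a node is offline, so that
// over many ticks it is offline roughly Offline percent of the time.
func (c SimConfig) IsOffline(r *rand.Rand) bool {
	return r.Float64()*100 < c.Offline
}

// NewRNG returns a seeded generator so simulation runs are reproducible.
func NewRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}
```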

Request Retransmission

Adding retransmission to REQUEST messages was something we had considered previously. There is now an open pull request that retransmits REQUESTs the same way OFFER messages are retransmitted. It can be found here.

Batch Mode

BSP documents multiple modes for sending messages: BATCH and INTERACTIVE. Until now we have only supported INTERACTIVE mode, in which receiving a message is a 3-step process (OFFER, REQUEST, MESSAGE). We now also support BATCH mode, a 1-step process: no OFFER messages are sent, and messages are sent to the peer immediately.
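The difference between the two modes lies in what is first put on the wire for a new message. A hypothetical sketch (`Share` and `Outgoing` are illustrative names):

```go
package main

// Mode selects how new messages are first shared with a peer.
type Mode int

const (
	Interactive Mode = iota // OFFER -> REQUEST -> MESSAGE
	Batch                   // MESSAGE only
)

type MessageID string

// Outgoing is what the sender initially puts on the wire.
type Outgoing struct {
	Offers   []MessageID
	Messages []MessageID
}

// Share decides how a new message is first sent to a peer. In INTERACTIVE
// mode only its ID is offered and the body follows once the peer requests
// it; in BATCH mode the message itself is sent straight away.
func Share(mode Mode, id MessageID) Outgoing {
	if mode == Batch {
		return Outgoing{Messages: []MessageID{id}}
	}
	return Outgoing{Offers: []MessageID{id}}
}
```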

Peer Inefficiency

The way we previously sent messages to peers was a little inefficient, and we have changed this: now only the peers in a specific group are iterated over to decide whether to send a message.

E2E Simulation

Thanks to a bug fix, we were finally able to test MVDS end-to-end using the status console client. This means we can now move on to the next phase: running better tests and investigating how well MVDS actually performs.


#15

24 June

Simplified Protocol Buffers

To simplify the protocol buffers, all sub-messages have been removed from the Payload type. Acks, Offers and Requests are now simple arrays rather than messages that contain arrays.

Renaming Messages -> Records

We used the term Message in many sections of our spec. We decided to reclassify the types Message, Ack, Offer and Request as records, to better differentiate between whisper messages and what is being sent in MVDS.

Sharing Simplification

We used to differentiate between the peers we are aware of and the ones we share with. To simplify MVDS, these were merged into one: we now automatically share with every peer we know.

Message Retransmission

There was a minor flaw in BATCH mode that showed up when the peer was offline: Message records were never retransmitted, so they would never arrive. This has been changed so that these records are now retransmitted as well.

Specification

The specification is going through its final changes. We have added new graphs to make it more explanatory and changed some of its language, thanks to the people who helped review it.