25 March, 2019
Plan of action: go through specific applications and describe them along relevant dimensions. Timeboxed. At the end, do brief comparison. E.g. something like: Briar, Matrix, Scuttlebutt, Tox, Swarm.
Desired artifacts:
- Rough descriptions of similar solutions
- Rough comparison of above
- Better formulated research questions
Briar - BSP
Briar synchronizes data in a group context among a subset of devices called peers.
- Since you only connect to devices you have done a key exchange with, it is a friend-to-friend (f2f) network.
- Each message is immutable and (depending on the client) includes message dependencies. This means we build up a DAG of history, and it thus provides causal consistency.
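As a rough illustration of the message-dependency idea (a hypothetical Python sketch of mine, not Briar's actual wire format): each message embeds the IDs of the messages it depends on, so the stored messages form a content-addressed DAG.

```python
import hashlib
import json

def message_id(body: dict) -> str:
    """Content-address a message by hashing a canonical encoding of it."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

def make_message(author: str, content: str, deps: list) -> dict:
    """deps holds the IDs of the messages this one causally depends on,
    which is what links the messages into a DAG."""
    body = {"author": author, "content": content, "deps": sorted(deps)}
    return {"id": message_id(body), **body}

root = make_message("alice", "hello", deps=[])
reply = make_message("bob", "hi back", deps=[root["id"]])
```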
- It's not an unstructured network nor a super-peer network. Since peers have direct connections with each other, you might classify it as a direct p2p network, or possibly a structured p2p network, since A talking to B means A knows B has its own log of data. It is not anything like a DHT though, so perhaps this notion makes less sense for f2f networks.
- BSP requires a transport security protocol to communicate between devices.
- Peer discovery: syncing up initially requires meeting in person. To know how to reach a device, Briar keeps track of transport properties. For short distances it uses BT/LAN discovery beacons, and for long distances it uses Tor hidden services.
- Look into the Tor hidden services interface more, both in general and in Briar. E.g. see onion addresses.
- Assumptions: a Briar node runs on a mobile device, but it runs in the background. This means currently mainly Android is supported, and there are some battery considerations, though fewer than in an unstructured network.
- Briar and problems with battery consumption. Additionally, there are some resources on Tor-specific battery work that I can't find right now.
- Briar and problems with running on iOS due to background restrictions.
- Briar practices asynchronous, optimistic replication. This means you append to your own log, and then try to sync up with other nodes in some group context. Since each message is immutable and hashed, conflicts are likely to be rare.
- Conflict resolution: unlikely to be a problem due to hashes, but in case it is, it is presumed validation will take care of it in a straightforward manner.
- Briar has weak network assumptions, and it can sync over pretty much any medium, including USB sticks, BT, and Tor. It only requires the ability to securely transport something from device A to device B.
- Replication data: for individual objects, it is single-master and 'static data', since only the device writing a message can write it, and it doesn't change after that. No other node can change a message. This is the minimum object being replicated and is essentially a simple file. However, if we look at the whole thing being replicated in a group context, it is really a DAG. This DAG can be updated by all the members, and since it is a DAG it is a form of CRDT. This means it might be more apt to classify it as a dynamic-data, multi-master setup.
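A sketch of why the DAG of immutable messages behaves like a (state-based) CRDT: each replica holds a set of content-addressed messages, and merging two replicas is just set union, which is commutative, associative, and idempotent. (My own illustration, with fake hashes.)

```python
# Each message is immutable and keyed by its content hash (fake IDs here).
a = {"h1": {"deps": []}, "h2": {"deps": ["h1"]}}  # replica A's messages
b = {"h1": {"deps": []}, "h3": {"deps": ["h1"]}}  # replica B's messages

def merge(x: dict, y: dict) -> dict:
    """State-based merge is set union over content-addressed messages:
    identical messages collapse, concurrent ones simply coexist in the DAG."""
    return {**x, **y}

assert merge(a, b) == merge(b, a)  # concurrent writes commute
assert merge(a, a) == a            # merging is idempotent
```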
- Exactly how resolution works in case of group membership conflicts needs to be studied more carefully. The semantics are fairly coarse, but well specified through a finite state machine. E.g. see [private group sharing client](https://code.briarproject.org/briar/briar/wikis/Private-Group-Sharing-Client).
- Since all devices participating in a group sync context are interested in the messages of that group, it is a form of passive replication. No other helper nodes are active by default in Briar, i.e. there's no active replication involved. If there were, these nodes would have to choose between seeing the encrypted messages or seeing the shape of the DAG, since the DAG is encoded inside a message through message dependencies. This is different from, say, a linear log, where the contents could be encrypted but the sequence number would be transparent.
- There are some plans to have a "relayer" node that actively replicates for you, to get better latency guarantees for offline inboxing.
- Full vs partial replication: within a group, the set of peers for a single node is a subset of the devices within that group. Messages are thus partially replicated by default, but it is likely to settle on full replication, as an individual node is interested in all the messages of a group context. The contract, however, is a partial copy of the graph, so it is a form of partial replication.
- Pull/push calls: there are different synchronization modes, i.e. interactive and batch mode. These are coordination-less and up to the client. I.e. you can either request specific messages or offer messages to someone else. This can either be done in one go (send all messages) or more efficiently by offering message IDs and then waiting for a specific request. There doesn't appear to be any specific "request all most recent messages" API call.
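A toy sketch of the two modes (hypothetical message names of mine, not BSP's actual record types):

```python
# Batch mode: push everything you have in the group context in one go.
def batch_sync(my_messages: dict) -> list:
    return [{"type": "MESSAGES", "payload": list(my_messages.values())}]

# Interactive mode: offer IDs first; the peer requests only what it lacks.
def offer(my_messages: dict) -> dict:
    return {"type": "OFFER", "ids": list(my_messages)}

def request(offer_msg: dict, my_messages: dict) -> dict:
    missing = [i for i in offer_msg["ids"] if i not in my_messages]
    return {"type": "REQUEST", "ids": missing}
```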
- To ensure causal consistency, a message is only delivered once all its message dependencies have been delivered. This ensures Monotonic Writes (MW).
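A minimal sketch of dependency-gated delivery (my own illustration, not Briar code): incoming messages are buffered until all of their dependencies have been delivered, which yields a causal delivery order.

```python
def deliver_when_ready(incoming: list) -> list:
    """Deliver messages only after all of their deps have been delivered."""
    delivered: set = set()
    order: list = []
    pending = list(incoming)
    progress = True
    while pending and progress:
        progress = False
        for msg in list(pending):
            if all(d in delivered for d in msg["deps"]):
                delivered.add(msg["id"])
                order.append(msg["id"])
                pending.remove(msg)
                progress = True
    return order  # anything left in pending is awaiting missing deps

msgs = [{"id": "h2", "deps": ["h1"]}, {"id": "h1", "deps": []}]
assert deliver_when_ready(msgs) == ["h1", "h2"]
```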
- Additionally: it is possible to delete/unshare messages (while retaining some information to fill in the partial graph).
- TBD: how exactly it ensures the session properties above: Writes Follow Reads (WFR), Monotonic Reads (MR), Read Your Writes (RYW), and Monotonic Writes (MW).
Matrix
Matrix can be looked at in a few different ways. It provides a suite of protocols; by default an end user uses the client-server API, and each server uses the server-to-server API for data synchronization.
- This is a form of federated or super-peer architecture, since a homeserver has special capabilities. This means individual endpoints don't need high availability.
- It is also possible to do pure P2P by hosting your own homeserver. We won't look into this here, as it isn't currently widely practiced and has its own challenges.
- Federation: homeservers can send multiple types of messages, not all of which need to be synced. This is a form of framing. They distinguish between Persistent Data Units (PDUs), Ephemeral Data Units (EDUs), and queries. PDUs record the history of messages and the state of a room. EDUs don't necessarily need to be replied to. Queries are simple req/resp calls to get a snapshot of state.
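A rough sketch of this framing (the field shapes are illustrative, not the exact federation schema):

```python
from dataclasses import dataclass, field

@dataclass
class PDU:
    """Persistent Data Unit: part of replicated room history, so it names
    its causal parents (and auth events); signatures/hashes elided here."""
    room_id: str
    sender: str
    content: dict = field(default_factory=dict)
    prev_events: list = field(default_factory=list)
    auth_events: list = field(default_factory=list)

@dataclass
class EDU:
    """Ephemeral Data Unit: e.g. presence or typing notifications;
    not stored in room history and not necessarily replied to."""
    edu_type: str
    content: dict = field(default_factory=dict)
```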
- Like Briar, Matrix uses DAGs in PDUs. Their equivalent of a group context is a room.
- Additionally, Matrix separates prev_events from auth_events. Auth events are special events that contain previous room state etc. In Briar, this separation is a bit more general and the semantics are encoded in a state machine for each specific data sync client.
- PDU validation: checks that events are valid and that signatures, hashes, and various auth rules match.
- Soft failure / ban evasion prevention: this exists to prevent users from evading bans by attaching events to an older part of the DAG. Such events may be valid against their own auth events, but a federating homeserver also checks whether the event passes auth checks against the current room state. If it doesn't, the homeserver soft-fails it and doesn't propagate it. A similar construct does not appear in Briar, since there's no such federation construct and there's no global notion of the current state.
- EDUs are used for things such as presence notifications.
- There's room state that contains e.g. the room name, and they have techniques for room state resolution, i.e. a form of conflict resolution. This may look different depending on which room version is used (analogous to the client ID for a data sync client in Briar).
- Once a homeserver joins a room, it gets new events from the other homeservers in that room. Users might request history from before the homeserver was part of the room. Homeservers can get previous history from each other with the /backfill or /get_missing_events APIs.
- The above implies homeservers practice partial replication, i.e. a node doesn't need to have all state to function.
- Since Matrix is a form of federated/super-peer architecture, we can also look at it as a client-server architecture. A client authenticates with a server in some fashion.
- Client discovery of homeservers happens out-of-band, for example via a known URL.
- TBD: how does failure work if a user's homeserver fails? Can you seamlessly be connected to several homeservers?
- TBD: homeserver discovery? It appears that you need to know a hostname for homeservers to connect. It is not clear to me if this can happen in-band, e.g. through some form of propagation.
- Conversation history is an eventually (causally) consistent event graph that is 'linearized' into an event stream for the end user.
- There are two types of room events: state events and message events. State events describe metadata and overwrite each other. Message events are transient one-off events. Applications can also add their own event types.
- To me it seems like this mostly amounts to putting Briar's data sync clients into the data sync layer itself. E.g. message/state events are different event types with different semantics. The main difference is the soft ban / intermediation that servers can do to check current auth state. It isn't clear what this would look like if every user ran their own homeserver, since this enforcement is up to each node. This makes moderation and group membership an interesting topic, and the state can possibly look different for different people. But assuming there's a well-encoded rule set and clients are well-behaving, this should be fine.
- To sync, a client simply calls GET /sync (sketched below). This returns the most recent message events and state for each room; it is thus a pull API. It also gives you pointers for fetching only the newest events, as well as for going further back in the event stream, i.e. the prev_batch and next_batch fields.
- To get new events, Matrix uses HTTP long polling. There's thus a persistent connection to a homeserver, which is assumed to have high uptime.
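A minimal client sync loop, assuming the r0 client-server API as described above (auth setup and error handling elided; the homeserver URL and access token are placeholders):

```python
import requests  # assumes the `requests` library is installed

BASE = "https://example.org/_matrix/client/r0"      # hypothetical homeserver
TOKEN = {"Authorization": "Bearer <access_token>"}  # placeholder

def sync_loop():
    since = None
    while True:
        params = {"timeout": 30000}           # long poll: hold for ~30s
        if since:
            params["since"] = since           # only events after this batch
        resp = requests.get(f"{BASE}/sync", params=params, headers=TOKEN).json()
        since = resp["next_batch"]            # token for the next pull
        for room_id, room in resp.get("rooms", {}).get("join", {}).items():
            for event in room["timeline"]["events"]:
                print(room_id, event.get("type"))
```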
- TBD: how is the partial graph compacted into an event stream?
- For a client to send events, it sends a PUT request to its homeserver. This means the event is acknowledged once the homeserver receives it, and the homeserver assumes responsibility for delivering it. This is true both for state events and message events.
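For example (a sketch against the r0 client-server API; the client-chosen transaction ID makes retries of the same PUT idempotent):

```python
import json
import uuid
import requests

BASE = "https://example.org/_matrix/client/r0"      # hypothetical homeserver
TOKEN = {"Authorization": "Bearer <access_token>"}  # placeholder

def send_message(room_id: str, body: str) -> str:
    txn_id = uuid.uuid4().hex  # client-chosen; safe to resend with same ID
    url = f"{BASE}/rooms/{room_id}/send/m.room.message/{txn_id}"
    content = {"msgtype": "m.text", "body": body}
    resp = requests.put(url, data=json.dumps(content), headers=TOKEN)
    return resp.json()["event_id"]  # the ACK: homeserver took responsibility
```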
- TBD: if a homeserver goes down, is it possible to resend messages via a different homeserver?
- TBD: why exactly do they make such a big distinction between prev_events and auth_events? Is it simply because of the soft failure / ban evasion case?
- They also support redaction of events, which strips off a lot of common keys.
- For room creation in the client-server API they use POST requests.
- TBD: in the case of redactions/deletes, how does the hash not change? Doesn't it break the immutability invariant? Same question for Briar.
- Homeservers can query each other's public rooms through a public room directory, which is a form of discovery mechanism.
- In the client-server API, end-to-end encryption is optional. Key exchange happens out of band, and a device's public key is uploaded to a homeserver.
- Multi-device: a user has a user ID which can be queried through the homeserver, and this contains a list of all device identity keys.
- For E2EE, it is only the content of the message that is encrypted from what I can tell, not event types or who it is sent to. This means the DAG is transparent to the homeserver, even though the homeserver isn't an end-user endpoint.
- TBD / out of scope: they use the Olm crypto ratchet for E2EE. It isn't clear to me how it differs from the Double Ratchet.
- TBD / out of scope: for group chat they use the Megolm variant, which is more scalable. It isn't clear to me exactly how it is more scalable in terms of e.g. algorithmic complexity.
- It is multi-master, since many people can write to a room. Depending on whether it is a message event or a room state change, different conflict resolution might be required.
- From a client's point of view towards a homeserver it is synchronous (the PUT request acts as a confirmation). Between homeservers it is optimistic and asynchronous.
- Like Briar, it provides causal consistency through a DAG.
- TBD: the exact semantics of the conflict resolution algorithm for room state.
- TBD: treating each user as running their own homeserver, how does that change things? Probably requires looking into more detailed GitHub issues / planning docs / code, as right now the details are a bit hazy.
- Note: clients can send messages to a room without knowing who is participating in it; this is a form of pubsub.
- Partial replication, since a homeserver doesn't need all events to start being used.
- Server discovery: ad hoc, e.g. see https://matrix.to/#/#matrix-spec:matrix.org
- TBD: look into relevant existing WIP proposals here: https://matrix.org/docs/spec/proposals
- Example: WebSockets as an alternative transport: https://github.com/matrix-org/matrix-doc/blob/master/drafts/websockets.rst
- Transactions: homeservers sync PDUs/EDUs in transactions of some limited size (~50 PDUs / ~100 EDUs). These are synced in an HTTP PUT, which gives ACK semantics (200 OK). Additionally, errors can be returned with the 200, such as not being allowed to send messages to a room, wrong version, etc.
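Roughly like this (a federation API sketch; real transactions must be signed with the origin server's key, which is omitted here):

```python
import json
import time
import requests

def send_transaction(origin: str, dest: str, pdus: list, edus: list) -> dict:
    """Push a batch of events to a peer homeserver; note the 200 response
    body can still carry per-PDU errors."""
    txn_id = str(int(time.time() * 1000))  # opaque, unique per destination
    url = f"https://{dest}/_matrix/federation/v1/send/{txn_id}"
    body = {
        "origin": origin,
        "origin_server_ts": int(time.time() * 1000),
        "pdus": pdus[:50],    # spec-limited batch sizes
        "edus": edus[:100],
    }
    return requests.put(url, data=json.dumps(body)).json()
```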
- Room version 3 is the latest spec for rooms. In Briar this would be a specific data sync client. It specifies things like room state resolution (actually in the room v2 spec). It uses things like reverse topological power ordering.
- List of super peers aka homeservers: https://www.hello-matrix.net/public_servers.php
Secure Scuttlebutt
SSB: a decentralized secure gossip platform.
- Discovery: locally it broadcasts UDP beacons for discovery.
- There's a concept of pubs: SSB peers that are publicly available and that you can be invited to. This acts as a form of super-peer (?).
- Each user has a feed, which is a list of all messages posted by an identity. This is an append-only log. Each message has a pointer to the previous one, which gives a form of causal consistency.
- Each message has a link to the previous message via a hash, an author (a public key identifying the feed it should appear in), and a sequence number for its position in the feed. Also a timestamp, hash, and content. Fields are in a specific order.
- A message ID is the hash of the message including its signature.
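A rough sketch of this structure (illustrative only: real SSB prescribes a canonical JSON encoding and Ed25519 signatures, which are just placeholders here):

```python
import hashlib
import json

def append_to_feed(feed: list, author: str, content: dict) -> dict:
    """Append a message that names its predecessor by hash, keeping the
    feed a signed, append-only log."""
    prev = feed[-1]["id"] if feed else None
    msg = {
        "previous": prev,           # hash link to the previous message
        "author": author,           # public key of the feed owner
        "sequence": len(feed) + 1,  # position in the feed
        "content": content,
    }
    msg["signature"] = "<sig over canonical encoding>"  # placeholder
    # The message ID hashes the message *including* its signature:
    msg["id"] = hashlib.sha256(
        json.dumps(msg, sort_keys=True).encode()).hexdigest()
    feed.append(msg)
    return msg

feed: list = []
append_to_feed(feed, "@alice.ed25519", {"type": "post", "text": "hello"})
```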
- An SSB client maintains a set of feeds it is interested in.
- When peers connect, they ask each other whether the feeds they care about have any new messages.
- There's a concept of blobs, for attachments etc. They can be linked from messages. Peers signal with 'want' and 'have' (cf. request and offer in Briar).
- A feed can follow another feed; this means it is interested in messages from that feed. To follow, you post a special message on your own feed. This message is signed by the author, and its content includes the identity of the contact you want to follow.
- Each feed publicly announces which feeds it is following. This means clients can arrange feeds into a graph of who follows whom.
- By doing this, we have several layers of a social graph: a user's own feed; the feeds it explicitly follows (visible in the UI); feeds 2 hops away, which the client fetches and stores (but doesn't display); and feeds 3 hops away, which clients see mentioned but don't store. Layer 0: write; layer 1: read; layer 2: store/fetch; layer 3: aware.
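A small sketch of how a client could derive these layers from the announced follow graph (my own illustration):

```python
from collections import deque

def hops_from(me: str, follows: dict, max_hops: int = 3) -> dict:
    """BFS over the follow graph; the hop count decides the layer
    (0: write, 1: read, 2: store/fetch, 3: aware)."""
    dist = {me: 0}
    queue = deque([me])
    while queue:
        feed = queue.popleft()
        if dist[feed] == max_hops:
            continue  # don't look past the awareness horizon
        for other in follows.get(feed, set()):
            if other not in dist:
                dist[other] = dist[feed] + 1
                queue.append(other)
    return dist

graph = {"me": {"ann"}, "ann": {"bob"}, "bob": {"cat"}}
assert hops_from("me", graph) == {"me": 0, "ann": 1, "bob": 2, "cat": 3}
```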
- Clients can choose how they want to display these messages.
- A pub is a publicly accessible SSB node; it serves a social and a technical purpose: socially it is a gathering point for new users, and technically it has a stable IP and allows incoming TCP connections. Joining a pub means you follow it and it follows you back.
- A lot of emphasis is on content discovery, which happens automatically when you join a pub, since you are two hops away from the other users of that pub. After following a pub a user has discovered new feeds and doesn't need to keep following the pub for feed visibility, though the pub helps with replication and accepting incoming TCP.
- Any user can follow a pub, but to get a pub to follow you back, you need an invite code. Invite codes can work in different ways (e.g. single use, etc.).
- Similar to Briar, SSB is offline by default and deals with short-distance discovery well through LAN beacons, similar to BitTorrent local peer discovery (BEP 14) or UDP-based SSDP.
- It is a form of private p2p, but group-based, as you can see more than one hop.
- TBD: what happens if a pub doesn't follow you back?
- TBD: can this be a setting? I.e. access control, up to each node, for how visible they want to be.
- Private messages are encrypted and posted to your own feed like normal messages, but E2EE.
- Note: I like their protocol guide, which links to specific parts of various implementations.
- TBD: privacy preservation seems unclear. Threat model? Options?
- TBD: how does replication and gossiping work in more detail?
- Peer connections: once an SSB peer has discovered the IP/port/pubkey of a peer it can connect to via TCP, it asks for updates and exchanges messages.
- A handshake and connection are used to create an encrypted channel, with a JSON-based RPC protocol on top. E.g. createHistoryStream, which asks a peer for the list of messages for a specific feed.
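Conceptually, the request looks something like this (a simplified sketch: the real protocol is muxrpc over a secret-handshake channel, and the exact argument names may differ):

```python
# Hypothetical, simplified shape of a request sent once the encrypted
# channel is up (real SSB uses muxrpc framing, not plain JSON objects):
request = {
    "name": ["createHistoryStream"],
    "type": "source",                # a streaming response
    "args": [{
        "id": "@alice.ed25519",      # which feed we want
        "seq": 12,                   # we already have messages 1..11
        "live": False,               # don't keep the stream open
    }],
}
# The peer replies with a stream of messages with sequence >= 12; we verify
# the hash chain and signatures before appending to our local copy.
```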
- Scuttlebot behaves just like a Kappa Architecture DB. In the background, it syncs with known peers. Peers do not have to be trusted, and can share logs and files on behalf of other peers, as each log is an unforgeable append-only message feed. This means Scuttlebots comprise a global gossip-protocol mesh without any host dependencies.
- A key difference from other p2p architectures is that it doesn't use singletons such as a DHT. Instead it operates on a human user network, which is a form of group-based private p2p. Also see [SSB on sybil attacks](https://ssbc.github.io/docs/articles/design-challenge-sybil-attack.html).
- A pub is just a normal helper client, and anyone can run one. Thus it doesn't act so much as a super peer, but it helps with active replication for other peers, if you have a static IP.
- All data is eventually consistent, and causally consistent: within one log there's a total order, while globally there's a partial order, similar to Briar and Matrix.
- TBD: "There's a proposal to use signed pings to measure the 'freshness' of a feed, but this could only be used in small groups of interested peers." Not clear where this proposal lives.
- TBD: documentation for the SSB replication API. https://ssbc.github.io/docs/ is a dead link, and https://scuttlebot.io/apis/scuttlebot/replicate.html doesn't say a lot. Can ask in their chat. Also see https://github.com/ssbc/ssb-db/issues/148
- Privacy preservation: https://ssbc.github.io/docs/ssb/end-to-end-encryption.html "private-box" proposal (?). For hidden recipients.
- It appears to be partially replicated, since you can choose how much of the partial graph you want to sync.
- Unlike Briar and Matrix, there's no strict notion of a room with specific semantics. Instead everything is your log plus the other logs you sync, with fluid discovery/gossiping of interests.
- It is optimistic async, since you write to your own log locally and sync later.
- It is single-master, since you only write to your own growing log, and sync other people's logs separately.
- It is either a form of direct or structured network, since data locality is in a specific peer, or possibly two hops away.
- TBD: exactly how the policy for sharing is decided. I.e. why do my messages end up at a pub, and how do they get to another pub and then to recipient B? Gossip-based, but more details would be useful on this type of active replication.
- The log/feed structure is not encrypted; only the content is. The main idea here is that it's a private network and you trust your friends (and your friends' friends?).
Misc:
- Added minor to people, also a bunch of papers in Zotero (see https://www.zotero.org/groups/2306124/secure_messaging/items for latest).
- Created https://github.com/status-im/specs/ to capture current spec and future requests
Next
- Continue with other real-world applications, such as Tox, Swarm, IPFS, (Whisper, Status), (Bittorrent, Git, Tribler).
- Create a table comparing these.
Collecting descriptions per stack first instead, since (a) there's usually very little agreement on what terms to use, and (b) these tools are generally too new to have been studied in any academic surveys.
Also look into more:
- Local peer discovery: how to generalize it and make it work for us, and what's needed (RFP). This relates to the transport properties component, and would be amazing for conferences.
- Bittorrent hidden services.
- Spend more dedicated time looking into protocol upgradability.
- /specs repo outline and rough TODOs to get this work started.