Core sync follow-up - Accurate message status


#1

Context: Follow-up discussion to the Core weekly sync about communicating message sent/received status in the UI
Objective: Understand UI requirements to accurately inform people of message send/receive status.

@igor @cammellos @dmitryn @andmironov @anna @pedro @roman @sergii @jakubgs @adam
Please correct any misunderstandings about the different message states, and add alternative proposals.

We want to make sure people sending a message using Status can rely on that message arriving at its intended recipient. The intended recipient can be anyone in a public, group or 1:1 chat. Alongside continuous work on message reliability, we want to accurately inform people of the status of messages sent and received. There are situations where something goes wrong somewhere between tapping [send] and the recipient [opening a chat]. Below is a list of use cases that may lead to messages not being sent or received. All are by design: a consequence of using Whisper and a network of nodes acting as mailservers, combined with efficient use of bandwidth.

Use case 1a

  • Connection is dropped for a few seconds or minutes after tapping Send in a 1:1 chat / Public chat / Group chat. The client is still trying to connect while the user has the app open.

Result

  • Message does not get ‘sent’ caption
  • Client will attempt sending again when connection is back

Proposal: Add caption ‘Sending’, ‘Pending’ or other indicator to show that client is still attempting to send.

NOTE: This was part of the original proposal, but back in April of last year it was left for a future iteration.

Use case 1b

  • Message is sent when client reconnects

Result

  • Message gets ‘Sent’ caption
  • Recipient sees message appear at bottom of thread. Given the timeframe it is likely, not guaranteed, that the message still makes sense in context.

Use case 2a

  • Connection is dropped for a significant amount of time (?) after tapping Send in a 1:1 chat / Public chat / Group chat. Status is no longer trying to connect.

Result

  • We cannot guarantee that the message has been sent. It may have been sent, but the client has not received a confirmation. It may not have been sent at all.
  • Message gets error note stating [Not sent, tap for options]. This is not accurate. The message might have been sent; we simply don’t know if it has been sent or received.

Proposal: Send a ‘soft confirmation’ (@cammellos can you expand on this?)
Proposal: Update the error note to indicate that this is not an error, but rather that there is no guarantee the message was sent to all users. Give the user control to send again.

Use case 2b

  • Message is sent when user taps ‘Tap to send again’.

Result

  • Message is sent again; recipient client removes duplicate and shows the message only once
  • Recipient sees message appear at bottom of thread
  • Message on sender client gets ‘sent’ caption

Proposal: Add caption ‘delayed’ or similar to message both on recipient and sender side to indicate that the message may appear out of context.
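
To make the duplicate removal in use case 2b concrete, here is a minimal sketch of the recipient side. It assumes the resent copy carries the same message ID as the original; the `Thread` class and its fields are hypothetical names for illustration, not the actual client code.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    """Recipient-side chat thread that ignores resent copies of a message."""

    messages: list = field(default_factory=list)
    seen_ids: set = field(default_factory=set)

    def receive(self, message_id: str, text: str) -> bool:
        """Append the message unless its ID was already seen.

        Returns True if the message was added, False if it was a duplicate.
        Assumption: a resend reuses the ID of the original message.
        """
        if message_id in self.seen_ids:
            return False
        self.seen_ids.add(message_id)
        self.messages.append(text)
        return True


thread = Thread()
thread.receive("msg-1", "hello")
thread.receive("msg-1", "hello")  # resent copy, dropped by the ID check
```

With this shape the sender can safely tap ‘Tap to send again’ even if the first attempt did arrive, since the recipient shows the message only once.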

Related issues




#2

In terms of soft confirmation:

Currently in develop builds we wait for confirmation from a mailserver.
This provides good guarantees, but it relies on the fact that you are connected to a mailserver, which might not always be the case.

Effectively it means that if you are not connected to a mailserver, “Message not confirmed. Tap for more options” will be shown for all the messages sent while not connected, although the message is likely sent to the chat.

In release builds we have a weaker and less reliable way to confirm messages (we only check that the message could be written to the network socket), which provides few guarantees.

It could be possible to combine the two methods and have 2 indicators, so that if you are not connected to a mailserver you see, for example, one tick (the message has been dispatched to the network, but we are not sure if anyone has received or will receive it), and two ticks once the message has been confirmed by a mailserver.

In any case I would say that we should retry messages for say at least a minute if online & not confirmed? That sounds non-controversial to me.
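
A rough sketch of what "retry for at least a minute if not confirmed" could look like. The function name, the 5-second interval and the injected `send` / `is_confirmed` / `now` / `sleep` callables are all assumptions made for illustration; the online check is left out to keep the sketch short.

```python
from typing import Callable


def send_with_retry(
    send: Callable[[], None],
    is_confirmed: Callable[[], bool],
    now: Callable[[], float],
    sleep: Callable[[float], None],
    retry_window_s: float = 60.0,  # "at least a minute" from the post above
    interval_s: float = 5.0,       # assumed spacing between attempts
) -> bool:
    """Resend until confirmed or until the retry window has elapsed.

    Returns True if a confirmation arrived within the window.
    """
    deadline = now() + retry_window_s
    while True:
        send()
        if is_confirmed():
            return True
        # Stop once the next attempt would fall after the deadline.
        if now() + interval_s > deadline:
            return False
        sleep(interval_s)
```

In a real client the clock and sleep would be `time.monotonic` and `time.sleep` (or their async equivalents); they are passed in here so the loop can be exercised with a fake clock.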


#3

Agreed, from what I gather that means we differentiate 5 states for the sender (cc @maciej):

  1. Pending/Sending

  2. Message sent, but not confirmed yet > when dispatched > 1 tick

  3. Message sent, but not confirmed for over 1 minute > when dispatched, but no confirmation from mailserver > 1 tick + ‘Tap to try again’

  4. Message sent and confirmed > when dispatched and confirmed by mailserver > 2 ticks

  5. Message sent and confirmed, but delayed (sender and receiver side) > when dispatched and confirmed after x (5?) minutes
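
For discussion, the five sender states above could be derived from three timestamps roughly as follows. This is only a sketch: the field names are hypothetical, the 5-minute threshold is the open "x (5?)" value from state 5, and measuring the delay from the moment the user tapped send is an assumption.

```python
from enum import Enum, auto
from typing import Optional


class SenderState(Enum):
    PENDING = auto()            # 1. Pending/Sending
    SENT = auto()               # 2. dispatched, 1 tick
    SENT_UNCONFIRMED = auto()   # 3. 1 tick + 'Tap to try again'
    CONFIRMED = auto()          # 4. confirmed by mailserver, 2 ticks
    CONFIRMED_DELAYED = auto()  # 5. confirmed, but delayed


UNCONFIRMED_AFTER_S = 60   # "over 1 minute" from state 3
DELAYED_AFTER_S = 5 * 60   # "x (5?) minutes" from state 5, still undecided


def sender_state(
    created_at: float,
    dispatched_at: Optional[float],
    confirmed_at: Optional[float],
    now: float,
) -> SenderState:
    """Map timestamps (in seconds) to one of the five sender states."""
    if dispatched_at is None:
        return SenderState.PENDING
    if confirmed_at is None:
        if now - dispatched_at > UNCONFIRMED_AFTER_S:
            return SenderState.SENT_UNCONFIRMED
        return SenderState.SENT
    # Assumption: "delayed" is measured from when the user tapped send.
    if confirmed_at - created_at > DELAYED_AFTER_S:
        return SenderState.CONFIRMED_DELAYED
    return SenderState.CONFIRMED
```

Deriving the state from timestamps rather than storing it means a late confirmation moves a message from state 3 to state 4 or 5 without any extra bookkeeping.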


#4

To get actual end-to-end guarantees for reliability, we probably need something like a data sync layer. Once we have that, we’d know when the receiver actually has the message. Until we have some form of end-to-end acknowledgements (an intermediary mailserver doesn’t count in a p2p context), I’d be careful about being overconfident with our feedback, as this will lead to a lack of trust when things fail. Which they inevitably will.


#5

We don’t really need a data sync layer for e2e reliability; a simple ack between recipient and sender would do, and it can be implemented already (I believe we had it before). That is not to say a data sync layer would not be useful.
Currently it is possibly too expensive, as we are already struggling with bandwidth consumption, but we might revisit this once we use the partitioned topic.

Also, even with a data sync layer, we still face the same problem, i.e. “what kind of guarantees can we make about the delivery of a message if the recipient is offline?” (which is what we are solving here, and as we don’t have e2e acks, we assume the recipient is always offline)


#6

I would consider acks with retransmission for multiple participants and sending more than one message as ‘something like a data sync layer’. I agree that mailserver confirmation is a useful bit of information, but I want to make sure we don’t rely too much on it, since it isn’t end to end. Hence two ticks, and having mailserver failure as part of the design model.


#7

Yes, I agree, currently with two ticks it would be:

  1. Write to socket
  2. Mailserver delivery

If we add more guarantees (i.e. e2e), this can easily be changed to

  1. Mailserver delivery
  2. Recipient delivery

all of which is transparent to the user (the actions taken might not be identical of course)
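
As an illustration of why the swap would be cheap, the tick meaning can be kept as data and the UI only counts how many tick events a message has reached. The event names and the `tick_count` helper are hypothetical, just a sketch of the idea.

```python
# Each tuple lists, in order, the event that earns the 1st and the 2nd tick.
TICKS_TODAY = ("socket_write", "mailserver_delivery")
TICKS_WITH_E2E = ("mailserver_delivery", "recipient_delivery")


def tick_count(reached: set, tick_events: tuple) -> int:
    """Count ticks: a tick is shown only if its event and all earlier
    tick events in the mapping have been reached."""
    count = 0
    for event in tick_events:
        if event not in reached:
            break
        count += 1
    return count


reached = {"socket_write", "mailserver_delivery"}
ticks_now = tick_count(reached, TICKS_TODAY)        # two ticks today
ticks_later = tick_count(reached, TICKS_WITH_E2E)   # one tick once e2e exists
```

Note that the same delivery progress shows fewer ticks under the second mapping, which is exactly the change in meaning that the sender would not see explained.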


#8

I think I’d prefer to see two ticks being reserved for end-to-end guarantees, otherwise we are going to run into issues and people will be pissed and lose trust in the reliability of Status. Let’s call it what it is and communicate expectations properly.

Write to socket should be error, first tick can be mailserver delivery. Second tick end to end.


#9

I think you are advocating for implementing acks first and postpone this discussion until we have them implemented? (This is another discussion to be had, we haven’t talked about re-introducing acks yet)

Also socket write is not an error, just a necessary condition for the message to be delivered, and we are showing the message as not sent if that errors (as we know that it did not leave the node).


#10

I’m advocating for not using a UI element (second tick) unless we back it up with end to end reliability guarantees. This is on UX grounds, as otherwise the correspondence to reality in p2p networks is poor.


#11

ok,
then we are back to the drawing board:
currently we have 1 tick (socket write) in release.
in dev mode we have 1 tick (mailserver delivery), but it causes issues when not connected to a mailserver (in my opinion we can’t go live with it)

2 ticks solution is only acceptable if adding e2e delivery,
any other option so we can still go live with mailserver confirmation? (other than adding e2e delivery, which will have to be implemented)


#12

So it seems there should be one more state.

Our real states are:
(1) message didn’t leave the device
(2) message left the device to some node
(3) message was written to a mailserver
(4) only for 1:1/group chats message was received by the recipient (and that is disabled in the app now).

To be honest with our users, we need to account for all the states, and errors can happen at any level.
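
One possible way to model this: keep the four real states as an ordered enum and track separately the step at which an error happened, so the UI can report both. All names below are hypothetical and only meant as a sketch.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Delivery(IntEnum):
    NOT_LEFT_DEVICE = 1
    LEFT_TO_NODE = 2
    WRITTEN_TO_MAILSERVER = 3
    RECEIVED_BY_RECIPIENT = 4  # 1:1/group chats only, disabled in the app now


@dataclass
class MessageStatus:
    reached: Delivery = Delivery.NOT_LEFT_DEVICE
    failed_at: Optional[Delivery] = None  # step that errored, if any

    def advance(self, to: Delivery) -> None:
        # States only move forward; a stale update never downgrades.
        if to > self.reached:
            self.reached = to
        # A failure is cleared once that step has been reached after all.
        if self.failed_at is not None and self.reached >= self.failed_at:
            self.failed_at = None

    def fail(self, step: Delivery) -> None:
        self.failed_at = step

    def describe(self) -> str:
        if self.failed_at is None:
            return self.reached.name
        return f"{self.reached.name} (failed at {self.failed_at.name})"
```

Keeping the error separate from the furthest state reached avoids a message like "Not sent" when all we know is that one later step was not confirmed.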