Core OKR Q2 scoring

NOTE: This topic is continuously updated as new information comes to light

Context

At the offsite in Bangkok we collectively decided on a set of OKRs and their priorities. They can be found here: https://docs.google.com/spreadsheets/d/1BhWKyjkpxhavkqtk9VYB3EHNDIzQtUlkwlg7_lQNAws/edit#gid=0

As we are almost at the end of the quarter, this is a good time to do a quick review of our OKRs and see where we stand, so we can improve both execution and planning.

(Mid-term discussion: https://docs.google.com/document/d/1ujwOCoOw35CjPGQT59oy18oWxk2iExv76USEmLdZKTw/edit#heading=h.pij0lz9nnvrf)

How are OKRs scored?

On a scale from 0 to 1, where 0 means no progress and 1 means completed. Since these are stretch goals, a score of around 0.7 is the sweet spot.

An additional dimension that is useful to capture is confidence: how likely are we to achieve the goal, on a scale from 0 to 100%.

For us, each swarm/team/individual with the most context simply suggests a score with a rationale. We then use rough consensus to arrive at a preliminary score.

Individual KR scores are then aggregated at the objective level, and then at the whole-team level.

OKRs

Score: 0.50
Confidence: 70%
Comment: Not weighted by priority and not final; P0 and P1 ~0.61 – Oskar

Raw

O0 0.6 / 80%
O1 0.63 / 90%
O2 0.47 / 70%
O3 0.1 / 50%
O4 0.7 / 70%
O5 0.5 / 60%
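The overall score and confidence above are consistent with a plain, unweighted arithmetic mean of the objective numbers (as the comment notes, a priority-weighted roll-up would give a different figure, ~0.61 for P0/P1). A minimal sketch of that aggregation, assuming simple means:

```python
# Raw objective scores and confidences from above.
# Assumption: the overall roll-up is an unweighted arithmetic mean.
objectives = {
    "O0": (0.60, 0.80),
    "O1": (0.63, 0.90),
    "O2": (0.47, 0.70),
    "O3": (0.10, 0.50),
    "O4": (0.70, 0.70),
    "O5": (0.50, 0.60),
}

scores = [score for score, _ in objectives.values()]
confidences = [conf for _, conf in objectives.values()]

overall_score = sum(scores) / len(scores)                 # matches the 0.50 above
overall_confidence = sum(confidences) / len(confidences)  # matches the 70% above

print(f"Score: {overall_score:.2f}, Confidence: {overall_confidence:.0%}")
```

A priority-weighted version would multiply each score by its objective's weight and divide by the sum of weights instead.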

Messaging is reliable

Note: Pinged Chad and Pedro, likely have most context

Score: 0.6
Confidence: 80%
Comment: (See below)

Send/delivered ratio >99%
Score: 0.7
Confidence: 70%
Comment: Current metrics point to around 98% – Pedro

More than 95% of 20+ people surveyed trust Status for messaging
Score: 0.6
Confidence: 80%
Comment: From June 5th survey, ~75% of 30 users found sending/receiving messages moderately to very reliable. There is bound to be a lag between quality being delivered (which has been increasing) and user opinion in polls. – Pedro

0 message reliability Instabug reports
Score: 0.5
Confidence: 90%
Comment: We’ve had 2-3 Instabug reports related to chat functionality – Pedro

Beta is launched successfully

Score: 0.63
Confidence: 90%
Comment: (See below)

Note: Pinged Chad and Rachel for product metrics and Adam and Jakub re cluster metrics.

5k daily active users
Score: 0.5
Confidence: 90%
Comment: Our peak DAU is < 400 users. – Rachel. I’d use a geometric mean for this (10->100 ~ 100->1000) so with that heuristic we are more than 50% there IMO (~10->300 ~ 300->500) – Oskar
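Oskar's geometric-mean heuristic amounts to measuring progress on a log scale rather than a linear one. A sketch under that reading, where the baseline of 10 DAU is an assumption taken from his "10->100 ~ 100->1000" anchor:

```python
import math

def log_progress(current: float, target: float, baseline: float = 10) -> float:
    """Progress toward a growth target on a log scale,
    so 10->100 counts the same as 100->1000."""
    if current <= baseline:
        return 0.0
    return math.log(current / baseline) / math.log(target / baseline)

# Peak DAU of ~400 against the 5k target: just under 0.6 on a log scale,
# versus 400/5000 = 0.08 on a linear scale.
print(round(log_progress(400, 5000), 2))
```

This is why the KR is scored around 0.5 despite peak DAU being well under 10% of the target in linear terms.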

More than 80% of users retained 7 days after recovering an account
Score: 0
Confidence: 90%
Comment: ~15% recurring users retained after day 7. Nearly 0% of first-time users retained. – Rachel

Cluster can handle 500 concurrent users
Score: 0.9
Confidence: 95%
Comment: It should already be able to do that, but some stress tests are in order. As before, scaling vertically and horizontally with the current setup is easy. – Jakub

More than 20% of users send a transaction
Score: 0.5
Confidence: 95%
Comment: Last 30 days DTU/DAU x 100 = ~10%

More than 20% open at least 1 Dapp
Score: 1
Confidence: 100%
Comment: Last 30 days DDU/DAU x 100 = 42%

More than 99% cluster ~uptime~ availability
Score: 0.9
Confidence: 90%
Comment: I already commented before that the word “uptime” is confusing in this context: uptime is the availability of a specific server, while what is meant here is cluster “availability”. With a 2-DC setup and multiple hosts of each type, our availability prospects are good, though a 3rd DC would be recommended. – Jakub

SNT is a powerful utility in Status

Score: 0.47
Confidence: 70%
Comment: (See below)

2x launched SNT use cases
Score: 0
Confidence: 100%
Comment: 0 launched SNT use cases – Rachel

2x demo’s/proof of concepts using SNT
Score: 0.4
Confidence: 50%
Comment: ENS registration will be on testnet soon. – Rachel

2x Fleshed out description of the utility
Score: 1
Confidence: 100%
Comment: Tribute to Talk, paid mail nodes, usernames and voting DApp all have thorough write-ups. – Rachel

Status is used everyday internally

Note: Pinged Chad

Score: 0.1
Confidence: 50%
Comment: (See below)

80% of core contributors use Status (mobile or desktop) every workday
Score: 0.1
Confidence: 30%
Comment: Not much outside of testing – Chad

10% more usage of Status Desktop than Slack
Score: 0
Confidence: 70%
Comment: Essentially zero right now – Oskar

Performance significantly improves

Note: Pinged Igor

Score: 0.7
Confidence: 70%
Comment: (See below)

Reduce data consumption to <10Mb/day
Score: 0.7
Confidence: 50%
Comment: Need to re-check this one – Igor.

Reduce power consumption to <120% of Telegram/Skype
Score: 0.7
Confidence: 90%
Comment: We are at 2x worse than the goal now, starting at >~600%. – Igor

UI interaction time <100ms
Score: 0.8
Confidence: 60%
Comment: Most of the common issues are fixed; some scenarios aren’t great and there is room for improvement, but the UI is for sure much more responsive. – Igor

Implement continuous delivery

Note: Pinged Jakub, Anton, Igor

Score: 0.5
Confidence: 60%
Comment: (See below)

100% of iOS and Android releases are automated
Score: 0.5
Confidence: 30%
Comment: Xcode is a nightmare. I have very little confidence in this. (0) – Jakub. Basic Jenkins jobs to release with Fastlane; the changelog is manual. (0.7) – Chad

More than 80% automated test coverage
Score: 1
Confidence: 100%
Comment: 80% of the “Functional tests for nightly build” suite is covered in TestRail – Anton

Get nightly to two sigma reliability
Score: 0
Confidence: 30%
Comment: Replacing Artifactory might have a good effect on this, but the rest is up to devs and their merging policy. – Jakub. Testing the last 60 builds randomly shows 30% success – Oskar
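"Two sigma reliability" here presumably means the nightly succeeds about as often as two-sigma coverage of a normal distribution, i.e. ~95.45% of builds. A rough check against the 60-build sample above (the success count of 18 is an assumed round number matching the ~30% observed rate):

```python
import math

# Two-sigma coverage of a normal distribution, ~0.9545.
TWO_SIGMA = math.erf(2 / math.sqrt(2))

builds_checked = 60   # size of the random sample of recent nightlies
successes = 18        # assumed count for the ~30% observed success rate
observed = successes / builds_checked

# How many of 60 builds would need to pass to meet two sigma: 58.
needed = math.ceil(TWO_SIGMA * builds_checked)
meets_goal = observed >= TWO_SIGMA

print(f"observed {observed:.0%}, need {needed}/{builds_checked} passing: {meets_goal}")
```

At 30% observed success versus a ~95% bar, a score of 0 follows directly.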

TODOs

  • [x] Messaging is reliable
    – [x] Send/delivered ratio >99%
    – [x] More than 95% of 20+ people surveyed trust Status for messaging
    – [x] 0 message reliability Instabug reports

  • [x] Beta is launched successfully
    – [x] 5k daily active users
    – [x] More than 80% of users retained 7 days after recovering an account
    – [x] Cluster can handle 500 concurrent users
    – [x] More than 20% of users send a transaction
    – [x] More than 20% open at least 1 Dapp
    – [x] More than 99% cluster uptime

  • [x] SNT is a powerful utility in Status
    – [x] 2x launched SNT use cases
    – [x] 2x demo’s/proof of concepts using SNT
    – [x] 2x Fleshed out description of the utility

  • [x] Status is used everyday internally
    – [x] 80% of core contributors use Status (mobile or desktop) every workday
    – [x] 10% more usage of Status Desktop than Slack

  • [x] Performance significantly improves
    – [x] Reduce data consumption to <10Mb/day
    – [x] Reduce power consumption to <120% of Telegram/Skype
    – [x] UI interaction time <100ms

  • [x] Implement continuous delivery
    – [x] 100% of iOS and Android releases are automated
    – [x] More than 80% automated test coverage
    – [x] Get nightly to two sigma reliability

  • [x] All preliminary scores set

  • [ ] Sanity check with everyone

  • [ ] Final Core OKR scores


For Q3 OKRs, please go here: Decentralised OKRs for 2018 Q3

  • If you know something, please edit or comment
  • If you disagree with a score, please speak up
  • Uncertainty in scoring is OK. Estimates are fine.

My take on it

Performance significantly improves

Score: 7/10
Confidence: 8/10
Comment: XXX

Reduce data consumption to <10Mb/day
Score: 7/10
Confidence: 5/10
Comment: Need to re-check this one.

Reduce power consumption to <120% of Telegram/Skype
Score: 7/10
Confidence: 9/10
Comment: We are at 2x worse than the goal now.

UI interaction time <100ms
Score: 8/10
Confidence: 6/10
Comment: Most of the common issues are fixed; some scenarios aren’t great and there is room for improvement, but the UI is for sure much more responsive.


Let’s see:


Beta is launched successfully

5k daily active users
Score: 0.4
Confidence: 95%
Comment: This is just a matter of scaling when we see lack of resources, and that is already trivial in the current setup.

Cluster can handle 500 concurrent users
Score: 0.9
Confidence: 95%
Comment: It should already be able to do that, but some stress tests are in order. As before, scaling vertically and horizontally with the current setup is easy.

More than 99% cluster uptime availability
Score: 0.9
Confidence: 90%
Comment: I already commented before that the word “uptime” is confusing in this context: uptime is the availability of a specific server, while what is meant here is cluster “availability”. With a 2-DC setup and multiple hosts of each type, our availability prospects are good, though a 3rd DC would be recommended.


Implement continuous delivery

100% of iOS and Android releases are automated
Score: ?
Confidence: 10%
Comment: Xcode is a nightmare. I have very little confidence in this.

More than 80% automated test coverage
Score: ?
Confidence: ?
Comment: I don’t write the tests, so can’t really tell.

Get nightly to two sigma reliability
Score: ?
Confidence: 30%
Comment: Replacing Artifactory might have a good effect on this, but the rest is up to devs and their merging policy.


Just a rough estimate.


Messaging is reliable

Send/delivered ratio >99%
Score: 0.7
Confidence: 70%
Comment: Current metrics point to around 98%

More than 95% of 20+ people surveyed trust Status for messaging
Score: 0.6
Confidence: 80%
Comment: From June 5th survey, ~75% of 30 users found sending/receiving messages moderately to very reliable. There is bound to be a lag between quality being delivered (which has been increasing) and user opinion in polls.

0 message reliability Instabug reports
Score: 0.5
Confidence: 90%
Comment: We’ve had 2-3 Instabug reports related to chat functionality


Beta is launched successfully

5k daily active users
Score: 0.1
Confidence: 90%
Comment: Our peak DAU is < 400 users.

More than 80% of users retained 7 days after recovering an account
Score: 0
Confidence: 90%
Comment: ~15% recurring users retained after day 7. Nearly 0% of first-time users retained.

More than 20% of users send a transaction
Score: 0.5
Confidence: 95%
Comment: Last 30 days DTU/DAU x 100 = ~10%

More than 20% open at least 1 DApp
Score: 1
Confidence: 100%
Comment: Last 30 days DDU/DAU x 100 = ~42%




SNT is a powerful utility in Status

2x launched SNT use cases
Score: 0
Confidence: 100%
Comment: 0 launched SNT use cases

2x demo’s/proof of concepts using SNT
Score: 0.5
Confidence: 90%
Comment: ENS registration will be on testnet soon.

2x Fleshed out description of the utility
Score: 1
Confidence: 100%
Comment: Tribute to Talk, paid mail nodes, usernames and voting DApp all have thorough write-ups.


My assessment is not qualitatively different from @rachel’s.

I would score the 2x demo’s/proof of concepts using SNT as 0.3 though, as it has technically not been deployed as of today.

Also, @oskarth, can we update the TODOs in the OP to reflect this? The first x is in the wrong place.

Thanks all!

Updated state. The main thing I changed was DAU as I think this is better reflected with geometric scale rather than with arithmetic (10->100 same as 100->1000).

Still missing better numbers on:

  • [ ] 80% of core contributors use Status (mobile or desktop) every workday
  • [ ] 100% of iOS and Android releases are automated
  • [ ] More than 80% automated test coverage
  • [ ] Get nightly to two sigma reliability

Pinged Anton (automated tests) and Chad (usage and release automation) about preliminary scores. Tested “Get nightly to two sigma reliability” in the meantime here: https://github.com/status-im/status-react/issues/2878#issuecomment-399969332 – scoring 0.

While we are waiting for these, I’m going to preliminarily score them 0 so we can calculate a basic overall score. This will likely be updated. EDIT: Got some preliminary scores.

UPDATE:

Preliminary Q2 OKR scores done, still waiting for some numbers.

TLDR
Score: 0.5
Confidence: 70%
Comment: Not weighted by priority and not final; P0 and P1 ~0.61

  • More than 80% automated test coverage:

80% of the “Functional tests for nightly build” suite is covered: https://ethstatus.testrail.net/index.php?/suites/view/42&group_by=cases:custom_automation&group_order=asc

We reached that coverage in Q2, so the score is 1.

Let’s call the preliminary results final, as there was no follow-up for a month and we are well into Q3.