Deployment of Status hosted full-nodes and analysis API

infrastructure

#1

All,

I came across a neat repo that gives a docker-compose script for hosting a full Ethereum node (parity) plus an analysis API for querying it. Harry (security lead) from MyCrypto brought it to my attention.

The repo can be found at cyber-drop/ethereum-analytical-db.

More specifically it gives the following capability:

  • grafana dashboard
  • SQL queries
  • API
  • use in Jupyter (or observable) notebooks
  • full node access

I was thinking we at Status could use this for various things in-house, as well as give the community access to a Status hosted full-node, including:

  • contract monitoring solutions (because no one else is building them apparently)
  • @barry -like economic modeling
  • financial reports of various kinds (@Dani)
  • fallback for the status app to query (@igor)

In any event, @jakubgs has priced out how much this thing would cost:

  • single instance (slightly less powerful than their spec machine): ~$160 USD/month
  • multiple hosts: ~$200 USD/month

Thoughts?


#2

How would it compare to using something like Ethereum in Bigquery?: https://cloud.google.com/blog/products/data-analytics/ethereum-bigquery-public-dataset-smart-contract-analytics


#3
  1. it’s ours, and if we build anything off of it, we are not subject to their changes, only ours.
  2. it allows us to serve the community through a Status hosted implementation (both full-node api and SQL query), which is valuable to some aspect of the community.

I’m not saying this is a sure-fire deal, just gauging interest within Status.


#4

Interesting. What kills it is the RAM needed, otherwise I would be able to run it. https://github.com/cyber-drop/ethereum_analytical_db/issues/1

They already run this, maybe we can pay them for monthly access of their instance?


#5

I asked them if they allow access. We’ll see how that goes.

Purchasing access from them is no different, in terms of reliance, from using any other 3rd party service, just saying.

Any reasoning regarding that stuff goes out the window, but I’m not throwing the baby out with the bathwater.


#6

Tbh when you say per month costs, you’re probably thinking cloud, in which case again you’re depending on someone else. I would vote for no cloud for anything, ever.


#7

What is the problem with using a service for some analytic queries?

For a small organization like ours I would say the real cost is mostly the time it will take to set up and maintain our own service, plus the resources it takes from other swarms, increasing their development times.

If the concern is data quality, I would posit paying for 2 or 3 separate services and sanity checking the data against each other might actually be cheaper than running our own service.
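To make the cross-checking idea concrete, here is a minimal sketch. Everything in it is hypothetical: `cross_check`, the `min_agree` parameter and the stand-in fetchers are my own illustration, not any provider's API. In practice each fetcher would wrap a real call to one service (for example a balance or block-hash lookup).

```python
from collections import Counter
from typing import Callable, Dict, Hashable, Optional, Tuple


def cross_check(
    fetchers: Dict[str, Callable[[], Hashable]],
    min_agree: int = 2,
) -> Tuple[Optional[Hashable], Dict[str, Hashable]]:
    """Ask every provider the same question and compare the answers.

    Returns (consensus, dissent). If at least `min_agree` providers give
    the same answer, consensus is that answer and dissent maps each
    disagreeing provider to what it returned. Otherwise consensus is None
    and dissent holds every provider's answer.
    """
    if not fetchers:
        raise ValueError("need at least one provider")
    answers = {name: fetch() for name, fetch in fetchers.items()}
    value, votes = Counter(answers.values()).most_common(1)[0]
    if votes < min_agree:
        return None, answers
    dissent = {name: got for name, got in answers.items() if got != value}
    return value, dissent


if __name__ == "__main__":
    # Stand-in fetchers (assumption): real ones would query each service.
    providers = {
        "service_a": lambda: 1000,
        "service_b": lambda: 1000,
        "service_c": lambda: 999,
    }
    consensus, dissent = cross_check(providers)
    print("consensus:", consensus, "disagreeing:", dissent)
```

With three services, a single bad answer shows up in `dissent` while the other two still give a usable result. With only two, any disagreement leaves you without a consensus.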


#8

The question is why do we need analytics of this type?

  • contract monitoring solutions (because no one else is building them apparently)

There are plenty of solutions out there for this. Alethio just released their monitor, there’s Ethtective, and there are various other subscriber hook services for both address and contract transactions.

  • @barry -like economic modeling

Possibly a valid use case, but arguably it does not justify a cost of 160 USD per month when you need one-off access to fire off some queries and play with the results. Why not pay for compute time only when it is needed, rather than keep it alive just for this?

  • financial reports of various kinds

The Etherscan API and Infura are sufficient for this, as trust is hardly an issue in accounting of this type.

  • fallback for the status app to query

Incomparably cheaper and more practical solutions exist.


Do we need it to verify past data and make sure some history is true? If so, there’s no alternative to running our own stuff if it’s truth we seek. Do we need it for fun? Then it makes no sense to run our own cluster on a cloud somewhere as buying access from a provider would be just as safe, only (likely) cheaper.

For analytics (archive node) you need a beast of a machine (as stated in the original post) and you better have a business case for it if you want it to be anything other than a drain of funds.

If all we need is a full node and not an archive node, then an initial-cost-only no-maintenance plug-and-play full + whisper + status node that can run anywhere on any bandwidth and with no recognizable electricity cost is, in my opinion, a better thing to pursue organization-wide.

Don’t get me wrong, I’m all for everyone running nodes at home and in offices and heck, even plugged in secretly into smart benches, I don’t care, but this just seems a little wasteful without a clearly defined purpose. Feels a little like GAS (gear acquisition syndrome).


#9

Quality feedback everyone.

As a note, I feel strongly that Status should provide at the very least an archive node. We want to be a pillar of this community, and that (imo) is a part of it. How we do that is subject to opinions, but cloud seems like the most reliable way.

Going the route of everyone running what @Bruno mentions is also something we should all be pushing. I’d also agree there isn’t much business case for this, which is why I was querying the rest of you to see if there was, as I thought the project was cool.


#10

Note that if we do run an archive node on cloud and make it publicly accessible, it is likely not going to cost 160 per month but much more once people start connecting and hitting it for data.


#11

shameless self-promotion time!

for the #BUIDL week, I wanted to work on improving the UX of deploying Status nodes by having a simple web UI there, helping to enable/disable services, tweak config and display QR codes.


#12

Let’s ask @jakubgs to set it up on one of the clouds ☁️? It doesn’t seem to be a hell of a job; if it is one instance, all we need is the specs for this VM.


#13

Ooof that sounds awesome! A web UI that lets you configure the various assets would be stellar, and adding some resource tracking with stuff like netdata (thanks for that tip btw, can’t get enough of those graphs now!) would be a huge UX boost.