Refactoring configuration in status-go


#1

TLDR; This proposal is about unifying passing configuration to status-go. Each environment (cli, Docker, status-react) would use a JSON blob to start a node and manage its services. For the CLI, it would mean no flags except for one -c/--config. Moreover, the provided configuration would be final and would not be implicitly updated.


The current state of configuration in status-go is less than ideal. It has a long history and accumulated a lot of tech debt. I think it’s high time to discuss its future form and refactor it.

I have a list of the most frustrating things about the current configuration in status-go:

  • defaults are scattered over many places; in case of a CLI, some defaults are defined for flags and some are in params package,
  • provided config as a JSON blob is not final; there is an “update phase” which checks if e.g. bootnodes are empty while ClusterConfig is enabled and if that’s the case, the config is implicitly changed,
  • it’s called NodeConfig while it encapsulates other areas as well,
  • it’s not flatten; NodeConfig has other structs embedded like ClusterConfig that can be nil.

The another matter that needs clarification is how the config is utilized. This is derived from the fact how status-go is utilized:

  • as a CLI program (statusd binary) and the config is built from the provided flags or a config file (rarely used),
  • as a Docker container and in this case the config is also built from the provided flags,
  • as a library statically linked with status-react where the config is built using GenerateConfig binding and JSON object manipulation in ObjC or Java.

Proposal

This proposal is about unifying the way the configuration is provided and used in the mentioned environments. The config object should always be created from a JSON blob and it must be complete and final, i.e. there are no implicit updates and if some required values are missing, an error is returned.

CLI

In the CLI, we would ideally remove all flags and only accept JSON files:

statusd -c config/fleet-eth.beta.json -c path/to/my/config.json

As you can see, it’s possible to provide multiple JSON files. They will be merged according to the occurrence order. The reason is that some configuration values rarely change and can be versioned. A fleet configuration is a good example. At the same time, it should be possible to easily experiment and provide a custom config file.

We may consider keeping some very popular flags like -networkid or -datadir but that raises a question which flags are popular and should be kept. Alternative is to do use https://github.com/kelseyhightower/envconfig instead where each configuration value can be overridden with a env variable (it will be possible because the configuration struct will be flat).

Docker

If we remove flags, how can we pass a custom configuration file to a docker container? One option is to dynamically create a JSON file in Ansible based on some vars, upload it on a host machine and mount as a volume. Is this solution ok? cc @jakubgs

status-react

status-react will pass the config to start the node in the same way as today. However, the question is whether we should keep GenerateConfig binding or it should be removed. In my opinion, we should remove it and encourage usage of ValidateConfig binding instead. In this way, we would be able to remove most of the JSON manipulation from Objective-C and Java code and keep the configuration as plain JSON files. cc @yenda

Also, we will get rid of a situation that some values are defined in status-go and some in status-react (the current situation with bootnodes and Whisper nodes vs MailServer nodes) as all values would need to be defined on status-react side.

Issues

  1. We still need to have a way to calculate some configuration values. For example, if the network ID is provided and LES enabled, we can safely calculate the chain genesis which is required by LES. ValidateConfig should return an error if these values are missing and it’s not possible to calculate them.

#2

I’m all for this configuration overhaul.
To answer your question Adam, yes, providing the configuration via JSON generated trhough Ansible by mounting it through a docker volume is how configuration is supposed to be provided for docker conteiners in general. It would also make it easier to manage, for example version, in order to debug issues caused by config changes.


#3

I definitely like the proposal to reduce implicitness and number of configuration sources. I’m not sure about “complete and final” part, because as config grows, the burden of keeping config complete and final will rise as well. In this case, the sane defaults will be an absolute must.

So it seems like the best approach would be to have something like a NewDefaultConfig() call, which will return ready to use default config, with methods for changing config before starting the node. In this case, called will only need to update fields it needs, while passing a complete config object.
This also plays well with gomobile. For example:

config := statusgo.NewDefaultConfig()
config.SetNetworkID(3)
config.SetWhisperEnabled(true)
node, err := statusgo.StartNode(config)

Does it align with the proposal?

And I strongly agree that we need to keep a few (as little as possible) command line options, following the Pareto principle – 80% of use cases should not require passing complete configuration.

I would also separate serialization question (JSON) from having a single source of config. JSON is relevant only for some of the cases (again, gomobile would not need it if we update status-react to use it).


#4

I see it this way that there will be a single, flat (without nested structs) struct Config.

Now, what would NewDefaultConfig() return? Would it be just a zero-value struct or would it have some specific “default” values for some fields? I am not against providing a way to generate a config representation with some fields filled in, however, it will introduce some uncertainty.

From the code perspective, of course it will be possible to change Config field values. I would even add methods like Config.UpdateWithJSON(data []byte).

By “complete and final”, I meant removing methods like NodeConfig.updateConfig() which is called after calling LoadNodeConfig(json string). When a config is passed as JSON in order to start a node, it should not be modified. Some values can be calculated (e.g. chain genesis based on the network id) but the config itself should be identical to the one passed in the param. If the config is incomplete, well, let’s return an error instead of trying to fix it on-the-fly because it brings confusion and ambiguity. This also means removing fields like MailServerPasswordFile.


#5

Absolutely. It’s even embarrassing we still have this.