The road to Nix, a functional package manager to rule them all


#145

Is this issue Nix-related? I don’t recall seeing this Detox.framework anywhere…


#146

Right! I just grepped the source and couldn’t find it. Sorry, it must be a local problem on my end. Will investigate.


#147

Daily progress report:

  • After fixing the issues from the previous progress report, I’ve run into a build error in :app:mergeReleaseResources, with aapt2 complaining about a missing file or directory. Running the build under strace and reading the sources has been inconclusive so far. However:
    • taking the temporary nix-build directory and opening a pure Nix shell in it, I was able to re-run the exact same gradle assembleRelease command successfully. This would seem to point to differences between the two environments;
    • comparing the environment variables of both environments didn’t reveal any significant differences;
    • running nix-build without --pure didn’t help either.
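One way to do that comparison is to dump `env` in both places (inside the failing nix-build builder and inside the working pure nix-shell) and diff the two dumps. A minimal Python sketch, assuming each dump is plain `KEY=value` lines as printed by `env`; how the dumps are captured is left out, and the helper names are hypothetical:

```python
from typing import Dict, Optional, Tuple


def parse_env(text: str) -> Dict[str, str]:
    """Parse `env` output (one KEY=value per line) into a dict.

    Lines without '=' (e.g. continuation lines of multi-line values) are
    skipped; multi-line values are not reassembled.
    """
    env: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key:
            env[key] = value
    return env


def diff_env(
    a: Dict[str, str], b: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {name: (value_in_a, value_in_b)} for every variable that differs.

    A variable missing on one side is reported as None on that side.
    """
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }
```

Some variables (temporary directories, for instance) will legitimately differ between the two environments, so the output still needs a manual read.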

#148

Daily progress report:

  • Finally found what was causing aapt2 to mysteriously fail with ENOENT: since aapt2 is downloaded from a non-Nix source (a Maven repository on the Internet), the binary points to a dynamic linker (ELF interpreter) path that doesn’t exist inside the Nix sandbox. The solution was to use patchelf --set-interpreter to point it at the interpreter provided by Nix. This allowed the gradle assembleRelease command to run successfully to completion.
  • Will now clean up test code from the existing Nix recipes, and work on replacing the existing Jenkins logic with a call to the Nix build recipe.
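To check which interpreter a downloaded binary expects, the `PT_INTERP` program header of the ELF file can be read directly (`readelf -l` shows the same information). A minimal Python sketch, assuming a 64-bit little-endian ELF such as an x86_64 Linux aapt2; the helper names are hypothetical. `patchelf_command` only builds the `patchelf --set-interpreter` invocation; in nixpkgs the Nix-provided loader path is commonly read from `$NIX_CC/nix-support/dynamic-linker`.

```python
import os
import struct
from typing import List, Optional

PT_INTERP = 3  # program header type whose segment holds the loader path


def read_interpreter(data: bytes) -> Optional[str]:
    """Return the PT_INTERP path of an ELF image, or None if it has none.

    Assumption: only 64-bit little-endian ELF (e.g. x86_64 Linux) is handled.
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:  # EI_CLASS == ELFCLASS64, EI_DATA == ELFDATA2LSB
        raise ValueError("only 64-bit little-endian ELF is handled here")
    # e_phoff is at 0x20; e_phentsize and e_phnum follow at 0x36 and 0x38.
    (phoff,) = struct.unpack_from("<Q", data, 0x20)
    phentsize, phnum = struct.unpack_from("<HH", data, 0x36)
    for i in range(phnum):
        # Leading Elf64_Phdr fields: p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz.
        p_type, _, p_offset, _, _, p_filesz = struct.unpack_from(
            "<IIQQQQ", data, phoff + i * phentsize
        )
        if p_type == PT_INTERP:
            # The segment is a NUL-terminated path string.
            raw = data[p_offset:p_offset + p_filesz]
            return raw.split(b"\x00", 1)[0].decode()
    return None  # no PT_INTERP: statically linked


def needs_patching(data: bytes) -> bool:
    """True if the binary asks for a loader that doesn't exist on this system."""
    interp = read_interpreter(data)
    return interp is not None and not os.path.exists(interp)


def patchelf_command(binary: str, interpreter: str) -> List[str]:
    """Build the patchelf invocation that points `binary` at `interpreter`."""
    return ["patchelf", "--set-interpreter", interpreter, binary]
```

Inside the sandbox, `needs_patching` on the downloaded aapt2 would be expected to return True, which matches the ENOENT seen when the kernel cannot find the requested loader.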

#149

Daily progress report:

  • Started replacing the Jenkins logic with calls to nix-build, and added logic to clean up derivations once they’re built, to avoid filling up the disk.

#150

Daily progress report:

  • Fighting with Groovy/Jenkins to get the Android keystore file recognized in the Nix script.
  • Cleaning up the branch.

#151

Daily progress report:

  • Managed to tame Groovy/Jenkins and get the first successful end-to-end sandboxed Nix build of the Android app!
  • The recent rebase seems to have introduced new differences in the taoensso/timbre library (a single difference in the taoensso.timbre.*config* macro invocation), as well as in the BuildID of a couple of the RN native libraries (libglog_init.so and libreactnativejni.so). These results are based on comparing builds done on two different CI servers.
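Comparing two builds entry by entry, rather than as whole files, narrows a difference down to specific files such as libglog_init.so. A minimal Python sketch using the standard `zipfile` module; `diff_apks` is a hypothetical helper, and since it compares uncompressed contents only, it ignores zip metadata such as timestamps, so an empty result does not by itself prove the two .apk files are byte-identical:

```python
import zipfile
from typing import Iterable, List


def diff_apks(apk_a, apk_b, ignore_prefixes: Iterable[str] = ()) -> List[str]:
    """Return the sorted names of entries that differ between two APKs.

    An entry differs if it exists in only one archive or if its uncompressed
    bytes differ. Entries starting with one of `ignore_prefixes` are skipped.
    Zip metadata (timestamps, compression level, ordering) is not compared.
    """
    ignore = tuple(ignore_prefixes)
    with zipfile.ZipFile(apk_a) as za, zipfile.ZipFile(apk_b) as zb:
        names_a, names_b = set(za.namelist()), set(zb.namelist())
        differing = []
        for name in sorted(names_a | names_b):
            if ignore and name.startswith(ignore):
                continue
            if name not in names_a or name not in names_b:
                differing.append(name)  # present in only one archive
            elif za.read(name) != zb.read(name):
                differing.append(name)  # same name, different contents
        return differing
```

Passing `ignore_prefixes=["META-INF/"]` skips the signature files, which differ whenever the two builds are signed with different certificates.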

#152

Daily progress update:

  • Removed the BuildID of the RN native libraries;
  • With the help of @yenda, fixed the Timbre library fork so that it generates reproducible macro expansions;
  • The last remaining file (which I had hoped would no longer have differences once we fixed the index.android.bundle issue) is resources.arsc at the root of the APK. This is a known source of non-determinism. The fact that we’re on an old version of the Google Android Gradle plugin doesn’t help either, so I’ll upgrade the plugin to at least 3.4.0 (which also implies upgrading Gradle to at least 5.1.1) and see if that helps.

#153

What an epic ending to a difficult week! Today marks the first time we have generated two binary-identical .apk files on two separate CI machines:

diff -s /home/pedro/Downloads/StatusIm-190628-104059-743f20-manual.apk.zip /home/pedro/Downloads/StatusIm-190628-102026-743f20-manual.apk.zip
Files /home/pedro/Downloads/StatusIm-190628-104059-743f20-manual.apk.zip and /home/pedro/Downloads/StatusIm-190628-102026-743f20-manual.apk.zip are identical

Daily progress update:

  • After upgrading to Gradle 5.1.1 and the Android Gradle plugin to 3.4.1, I started running into build failures where the :app:mergeReleaseResources task could not find aapt2. After some investigation, it turned out that :react-native-android:packageReleaseResources was deleting the file from the cache because, for some reason, it considered it stale. It looks like the way we carry the ~/.gradle folder over from one Nix expression to the release-android expression doesn’t convince Gradle that the cached files are up to date, so it considers them stale and triggers a rebuild. For now, I worked around it by disabling Gradle caching for Nix builds.
  • The next step will be to guarantee identical builds between my dev machine and the CI servers, since the build environment is currently configured slightly differently on each (5 files, such as /res/layout/abc_action_mode_close_item_material.xml, differ by a single byte, at the second-to-last position).
  • Even though we’ve reached the goal of reproducible builds, there is still a lot of work left to do, namely:
    • The issue with ClojureScript minification is still open. For now, I’ve just disabled some aspects of minification, but the goal is to keep the optimization level at :advanced instead of :simple;
    • I haven’t yet made sure that the rest of the development environment still works after these changes (e.g. that developers can run make startdev-android-* successfully, and ideally reuse the same local Nix Maven repository instead of downloading dependencies into another local cache);
    • Cleaning up and documentation;
    • Testing other, more compatible ways of reaching the same outcome (e.g. using yarn2nix instead of node2nix in order to keep compatibility with yarn.lock);
    • Moving some of the new Nix infrastructure out of status-react, since it doesn’t need to live there (e.g. the tool that computes a local Nix Maven repository, which could be open-sourced).
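Once a differing file has been identified, listing the exact byte offsets shows whether the difference is, for instance, the single byte at the second-to-last position described above. A minimal Python sketch; `differing_offsets` is a hypothetical helper:

```python
from typing import List


def differing_offsets(a: bytes, b: bytes, limit: int = 16) -> List[int]:
    """Return up to `limit` byte offsets at which `a` and `b` differ.

    If the lengths differ, the length of the shorter input is also reported,
    since every byte from there on is missing in one of the two files.
    """
    offsets = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(a) != len(b):
        offsets.append(min(len(a), len(b)))
    return offsets[:limit]
```

For a file of length `n`, an offset of `n - 2` is the second-to-last byte. Decoding both files (e.g. with apktool) then makes it easier to see what that byte represents.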

#154

Daily progress report:

  • I believe I found the reason for the slightly different XML files between local and CI builds, thanks to apktool (which, by the way, is available as a Nix package): the differing byte was in a resource ID, because the local build had an extra identifier coming from .env (DEV_BUILD=1). The solution is to accept a variant argument in release-android.nix that defaults to the same value as the CI server (nightly).
  • Now that build differences across machines are resolved, I’ll re-enable :advanced CLJS minification in the Google Closure Compiler and try to get that to build deterministically.

#155

Managed to fix the minification issue once I found a compiler option for that: :stable-names true.


#156

I guess we should try publishing the app to F-Droid to see if the store is happy with the deterministic build.


#157

Daily progress update:

  • Yesterday I tried installing the resulting app on a device, but got a loader error regarding libgojni.so. It turns out that the fix that removes the .gnu.version_d section renders the library unloadable, so a different approach will need to be devised. After a preliminary investigation, this is my understanding of the situation so far:
    • gomobile creates a temporary directory of the form $TMP/gomobile-work-xxxxxxx/, where xxxxxxx is a random numeric suffix generated by ioutil.TempDir. There are a few approaches we could use to fix this:

      • Patch the strings of the resulting library in the string table (or even remove them altogether);
      • Patch the sources of gomobile so that it uses a known directory.
    • go build is leaving a build path in the string table (visible with readelf -V), which ends up affecting a field in the .gnu.version_d section (which is why simply patching the string table isn’t enough). Normally this should not happen due to this logic in Go, so I’ll need to understand where things are going wrong.

      Truncated output of readelf -V:

      Version definition section '.gnu.version_d' contains 1 entry:
        Addr: 0x0000000000015c7c  Offset: 0x015c7c  Link: 3 (.dynstr)
        000000: Rev: 1  Flags: BASE  Index: 1  Cnt: 1  Name: /build/go-build203396378/b001/exe/a.out
      

Running go build with the -x flag to print all commands, we can see the source of this path:

mkdir -p $WORK/b001/exe/
cd .
/nix/store/9642xkwmps282cy96s1sadis877dcr74-go-1.11.5/share/go/pkg/tool/linux_amd64/link -o $WORK/b001/exe/a.out -importcfg $WORK/b001/importcfg.link -installsuffix shared -buildmode=c-shared -buildid=d6aCMiKF8dVHGgjqvwOI/Qx9Fub1et8xxoKugc2ST/ZiA8cufNSF4DIqGIOmys/d6aCMiKF8dVHGgjqvwOI -s -extld=/nix/store/61sbaavrlq3hdq7rx8m2wkcibz6qyvfj-ndk-bundle-19.2.5345600/libexec/android-sdk/ndk-bundle/toolchains/llvm/prebuilt/linux-x86_64/bin/x86_64-linux-android21-clang $WORK/b001/_pkg_.a
/nix/store/9642xkwmps282cy96s1sadis877dcr74-go-1.11.5/share/go/pkg/tool/linux_amd64/buildid -w $WORK/b001/exe/a.out # internal
mkdir -p /build/gomobile-work/android/src/main/jniLibs/x86_64/
mv $WORK/b001/_cgo_install.h /build/gomobile-work/android/src/main/jniLibs/x86_64/libgojni.h
mv $WORK/b001/exe/a.out /build/gomobile-work/android/src/main/jniLibs/x86_64/libgojni.so

We can also see that the Go toolchain leverages the NDK clang compiler’s ability to generate reproducible paths, by passing it -fdebug-prefix-map=$WORK/b070=/tmp/go-build.

For future reference, here is where the a.out name is decided, and here is where the path gets computed.
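A quick way to check whether a built library still embeds these randomly named directories is to scan its bytes for them, much like `strings libgojni.so | grep go-build`. A minimal Python sketch; the two patterns are assumptions based on the paths seen above (`go-build` followed by digits, and `gomobile-work-` followed by a random suffix, matched loosely as hex digits so decimal suffixes are covered too), and `find_leaked_paths` is a hypothetical helper:

```python
import re
from typing import List

# Assumed patterns for randomly named temp dirs: `go build` work dirs such as
# go-build203396378, and gomobile work dirs with a random suffix. A fixed
# name like "gomobile-work/" is deliberately not matched.
LEAKY_DIR = re.compile(rb"go-build\d+|gomobile-work-[0-9a-f]+")


def find_leaked_paths(data: bytes) -> List[str]:
    """Return the distinct random-suffixed directory names in `data`, in order of appearance."""
    found: List[str] = []
    for match in LEAKY_DIR.finditer(data):
        name = match.group().decode("ascii")
        if name not in found:
            found.append(name)
    return found
```

An empty result for the contents of libgojni.so only says these particular paths are gone; it says nothing about other sources of non-determinism.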


#158

Daily progress update:

  • After trying several approaches to fixing the non-determinism in the generation of libgojni.so, I opted for patching the Go compiler suite. Although that sounds heavy-handed, it’s a one-line change in Nix that patches a source file in the Go toolchain so that it takes its temporary build location from an environment variable we provide. If that variable isn’t set, it simply falls back to the existing behavior. I’ll be opening a bug report at https://github.com/golang/go/issues/, but for the time being, we already reap the benefits with little downside.
  • After the libgojni.so fix, I tested deploying the CI build to an Android phone and the app worked fine.
  • I conducted several tests building the same commit on different machines. We’re able to get identical APKs across CI servers (where we provide the official Status signing certificate), but they won’t be identical to APKs built on a developer machine, since those are signed with the developer’s certificate (resulting in differences in /META-INF/CERT.RSA, /META-INF/CERT.SF, /META-INF/MANIFEST.MF and /classes2.dex). Of course, this assumes we build with the same arguments (notably build-number and build-type).
  • The rest of the day will be spent fixing CI builds for the other platforms due to the changes in this branch.
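The actual fix is a patch to Go source applied from Nix; the Python sketch below only mirrors the logic described above: use a fixed work directory when an environment variable is set, otherwise fall back to a randomly named temporary directory. The variable name `DETERMINISTIC_WORK_DIR` and the helper `make_work_dir` are hypothetical; this thread doesn’t name the variable the real patch uses.

```python
import os
import tempfile

# Hypothetical variable name: the thread doesn't say which one the real patch uses.
WORK_DIR_ENV = "DETERMINISTIC_WORK_DIR"


def make_work_dir(prefix: str = "go-build") -> str:
    """Return the directory to use for intermediate build files.

    With WORK_DIR_ENV set, a fixed, caller-controlled path is used (and
    created if needed), so any path baked into the output is the same on
    every build. Otherwise fall back to a randomly named temporary directory,
    i.e. the pre-patch behavior being mirrored here.
    """
    fixed = os.environ.get(WORK_DIR_ENV)
    if fixed:
        os.makedirs(fixed, exist_ok=True)
        return fixed
    return tempfile.mkdtemp(prefix=prefix)
```

Keeping the fallback means builds outside Nix behave exactly as before, which is what keeps the downside of carrying such a patch small.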

#159

Something I’ll be putting on my watch queue: https://www.gophercon.co.uk/videos/2017/building-go-with-bazel/


#160

Daily progress update:

  • Fixed all CI builds, although iOS is still having issues in fastlane;
  • Split unrelated changes into a separate, smaller PR to make the upcoming PR smaller.

#161

Daily progress update:

  • Fixed iOS build, all PR builds are now green.
  • Merged yesterday’s spin-off PR.
  • Testing other workflows to make sure nothing was broken by the PR branch.

#162

Daily progress update:

  • Started looking into migrating from node2nix to yarn2nix for mobile builds. node2nix was initially used for the Reproducible Builds branch due to perceived better extensibility and sophistication. However, now that hindsight is 20/20 and we’re aware of all the pitfalls, I’ve confirmed that yarn2nix is enough for our needs, and brings the following advantages:
    • we get to keep the mobile_files/yarn.lock file everyone is already used to;
    • yarn2nix doesn’t require generating a Nix expression from the package.json file. It is smart enough to generate it on the fly from the yarn.lock file (which is already deterministic anyway), which greatly simplifies ongoing maintenance;
    • we don’t need to add special patching for realm, since all yarn2nix does is leverage yarn’s offline cache support.

With this in mind, I’ll fold the yarn2nix work into the upcoming PR, even if that costs 1-2 extra days of work.


#163

The Reproducible Android builds PR is now ready for review: https://github.com/status-im/status-react/pull/8549


#164

The Reproducible Android builds PR has now been merged. We officially have reproducible Android builds! :tada: