Continuous Learning

This is where I share and save my findings for you and future me.

Bazel build system and Rust

I will write more posts for each step as I learn Bazel together with Rust, but don't wait for the next post; it could take a while.

A build system is a tool for automating the build process of software. There are multiple well-known build systems which have been used for decades; make is one of the older ones. Then we have CMake, MSBuild, Gradle, Ant, Maven, and the list goes on.

What all build systems have in common is a process for compiling source code into binary code, packaging the binaries, and running automated tests.

The most common way to use a build system is in a continuous integration chain which is triggered when a change is committed to the source or a new release is made.

But developers still need to be able to build and execute their code locally on their desktops, for a shorter round trip to a binary they can test against in their own environment.

Bazel's way of defining builds lets both the continuous integration chain and the developer's local compilation take advantage of each other, through a shared cache of all builds and their dependencies. Developers iterate many times, changing their code and compiling it until they are satisfied; each of these changes gets its own build which can be tracked and cached. So when the developer commits their code to the central repository, the build cache is also shared with the continuous integration chain, which can jump directly to testing without rebuilding everything.

This also means a developer can pull the whole source tree and build it without rebuilding anything, because everything is already in the cache.
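
As a minimal sketch of what this looks like in practice (the cache path and URL below are placeholders, not part of this project), Bazel can be pointed at a local disk cache or a shared remote cache with standard flags:

# Local developer build, reusing and filling a local disk cache
bazel build //... --disk_cache=$HOME/.cache/bazel-disk

# CI build (or a teammate) pointing at a shared remote cache
bazel build //... --remote_cache=https://cache.example.com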

Why not cargo

Cargo is the build system and package manager for Rust and is widely used by the Rust community; it is the de facto standard in the Rust sphere. Through it, it is easy to manage packages for different libraries, or crates as they are called in Rust.

So why not just continue to use Cargo, you might argue.

For most projects out there, Cargo will be sufficient and there is no need to use Bazel or any build system other than Cargo.

But the moment you have multiple teams in charge of different software components which have to work together in a bigger system, Cargo is outgrown pretty quickly. It can also be the case that different parts of the system are written in different programming languages. Bazel handles different programming languages through rules, which are the sets of actions Bazel needs to perform to build for a specific programming language or toolchain; a sketch of what that can look like is shown below.
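
As an illustration only (the target and file names here are hypothetical and not part of this project), a BUILD file can mix languages, for example a Rust binary depending on a C++ library built by Bazel's C++ rules:

load("@io_bazel_rules_rust//rust:rust.bzl", "rust_binary")

# Hypothetical C++ library built with Bazel's built-in C++ rules
cc_library(
    name = "native_math",
    srcs = ["native_math.cc"],
    hdrs = ["native_math.h"],
)

# Hypothetical Rust binary; rules_rust allows cc_library targets in deps,
# so the C++ code can be linked in and called through FFI
rust_binary(
    name = "mixed_app",
    srcs = ["src/main.rs"],
    deps = [":native_math"],
)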

A simple example

The following is a minimal example of a simple Rust project, consisting of a binary, a library and external crates, all handled by Bazel.

You can view and clone the following project from GitHub:
Anders-Linden / rust_bazel_hello_world

git clone https://github.com/Anders-Linden/rust_bazel_hello_world.git

or just continue to follow along below and set up the project manually.

Folder structure

mkdir -p hello_world/src
mkdir -p hello_world/utilities/src
mkdir hello_world/assets

Workspace

The WORKSPACE file marks the directory Bazel considers the root path of the workspace; in this file it is stated which rules this project is going to be built with.

This example uses the Rust rules (rules_rust).

cat <<EOF > hello_world/WORKSPACE.bazel
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "io_bazel_rules_rust",
    sha256 = "618cba29165b7a893960de7bc48510b0fb182b21a4286e1d3dbacfef89ace906",
    strip_prefix = "rules_rust-5998baf9016eca24fafbad60e15f4125dd1c5f46",
    urls = [
        # Master branch as of 2020-09-24
        "https://github.com/bazelbuild/rules_rust/archive/5998baf9016eca24fafbad60e15f4125dd1c5f46.tar.gz",
    ],
)


load("@io_bazel_rules_rust//rust:repositories.bzl", "rust_repositories")

rust_repositories(version = "1.47.0", edition="2018")


load("@io_bazel_rules_rust//:workspace.bzl", "rust_workspace")

rust_workspace()

load("//cargo:crates.bzl", "hello_cargo_library_fetch_remote_crates")

hello_cargo_library_fetch_remote_crates()
EOF

Build files

For each binary, library or other artifact there is a BUILD file, which defines the source files and their dependencies.

Binary build file

cat <<EOF > hello_world/BUILD.bazel
load("@io_bazel_rules_rust//rust:rust.bzl", "rust_binary")

rust_binary(
    name = "hello_world",
    srcs = ["src/main.rs"],
    data = ["assets/hello_world.txt"],
    deps = ["//utilities",
            "//cargo:log",
            "//cargo:env_logger"],
)
EOF

Library build file

A library is built separately and then linked into a binary, and the same code can be reused in multiple binaries.

cat <<EOF > hello_world/utilities/BUILD.bazel
package(default_visibility = ["//visibility:public"])

load("@io_bazel_rules_rust//rust:rust.bzl", "rust_library")

rust_library(
    name = "utilities",
    edition = "2018",
    srcs = [
        "src/lib.rs",
    ],
)
EOF

Source files

These are the ordinary Rust source files for the project.

Library source

cat <<EOF > hello_world/utilities/src/lib.rs
use std::fs::File;
use std::io::BufReader;

pub fn open_input(file_path: &str) -> std::io::BufReader<std::fs::File> {
	let file = match File::open(file_path) {
		Err(why) => panic!("couldn't open, {}", why),
		Ok(file) => file,
	};
	BufReader::new(file)
}
EOF

Binary source

cat <<EOF > hello_world/src/main.rs
use std::io::prelude::*;
extern crate utilities as utils;

use log;

fn main() {
    env_logger::init();
    log::info!("Starting");
    for line in utils::open_input("./assets/hello_world.txt").lines() {
        log::info!("{}", line.unwrap());
    }
}
EOF

Asset file

Just a simple text file which is going to be read and parsed.

cat <<EOF > hello_world/assets/hello_world.txt
hello, this is first row
and this is the second row
EOF

Cargo file

The Cargo file is used by cargo-raze to generate an environment and the BUILD files for each external crate.

Cargo-raze has different ways of handling crates:

  • Vendoring mode
  • Remote Dependency Mode

Then there are different ways of handling how crates are built, depending on the complexity of the crate:

  • Simple crates
  • Unconventional Crates
  • Crates that need system libraries
  • Crates that supply useful binaries
  • Crates that only provide binaries

Read the documentation on GitHub: cargo-raze

cat <<EOF > hello_world/Cargo.toml
[package]
name = "hello_world"
version = "0.0.1"
edition = "2018"

[dependencies]
log = {version = "0.4.11", features = ["std"]}
env_logger = "0.8.1"

[[bin]]
name = "hello_world"
path = "src/main.rs"

[raze]
workspace_path = "//cargo"
target = "x86_64-apple-darwin"
output_buildfile_suffix = "BUILD.bazel"
gen_workspace_prefix = "hello_cargo_library"
genmode = "Remote"
default_gen_buildrs = true
EOF

Go inside the workspace path

cd hello_world

The Cargo lockfile contains the exact information for each dependency used in the Rust project.

cargo generate-lockfile

The command for generating the Bazel BUILD files for the external crates:

cargo raze

Command for building and running the application.

bazel run :hello_world

Web Key Directory

I think everyone has that moment from time to time when you feel like you have been sleeping under a rock for a year or two; that happened to me last night. Web Key Directory were the secret words which made me realize that I have been blind, or just had too many other things to do, the last year.

Share your public key with friends and strangers with ease. This article describes what Web Key Directory is for GnuPG.

Clients and e-mail providers that have added support for Web Key Directory, also known as WKD, are listed at GnuPG WKD. GnuPG is an implementation of OpenPGP, a system for signing and encrypting messages, documents and files, and for authenticating logins to different systems with your keys.

Background

There are two different encryption models for how you can unlock an encryption: symmetric and asymmetric. Each has its own applications, and they can also be combined for different purposes.

In simple words, symmetric encryption is when both sender and receiver must have the same key to encrypt and decrypt a message, and anyone who gets their hands on this shared key will also be able to read the messages.

Asymmetric encryption is harder to grasp, because the sender does not need to share any secret with the receiver. The only thing the sender needs is a public key from the receiver, which is not enough to decrypt any message but is enough to encrypt with; only the receiver's private key, which is never shared with anyone, can decrypt it. The sender's own private key is instead used to sign messages, so the receiver can verify who sent them.
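
As a quick sketch with GnuPG (the address and file name below are placeholders): the sender encrypts with the receiver's public key and optionally signs with their own private key, and the receiver decrypts with their private key:

# Encrypt message.txt so only friend@example.com can read it,
# and sign it with your own private key so the receiver can verify the sender
gpg --encrypt --sign --recipient friend@example.com message.txt

# On the receiving side: decrypt with the private key and verify the signature
gpg --decrypt message.txt.gpg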

This means that every time you want to exchange encrypted messages with someone new, you need to ask for their public key. To make this easier there are directories, like the old-fashioned yellow pages but digital: OpenPGP keyservers, where people can choose to publish their contact information together with a public key. For many years these servers have been useful, but about a year ago someone decided to exploit a known vulnerability which made some of these keyservers crash. They have become very critical infrastructure for many open source projects which keep their public keys there, so that users of their software can verify that the software they are about to install was really released by the correct team.

So one solution to this is Web Key Directory, which is a super simple way to spread out the centralization: instead of one central point of failure, every domain connected to an e-mail address can serve its own keys. It uses traditional HTTP with SSL (HTTPS) and a folder structure with your public key stored in a file. The URL for reaching your public key will look like the one below.

example.com/.well-known/openpgpkey/hu/<hash e-mail prefix>
├──────────┘├─────────┘├─────────┘├──┘├───────────────────┘
│           │          │          │   │
│           │          │          │   └ sha1 + z-Base-32 
│           │          │          └ Human use folder
│           │          └ openpgpkey folder, RFC Draft (1) 
│           └ RFC8615 Well-known folder for metadata
└ Domain name same as your e-mail

(1): https://tools.ietf.org/html/draft-koch-openpgp-webkey-service-08

If you cannot use your primary, naked domain, create a subdomain named openpgpkey and point it to an HTTP server which can handle SSL and serve your .well-known folder safely. See below how the URL should look.

openpgpkey.example.org/.well-known/openpgpkey/example.org/hu/<hash e-mail prefix>

Files needed in the root of your web server for the domain

.
└── .well-known
    └── openpgpkey
        ├── hu
        │   └── im4cc8qhazwkfsi65a8us1bc5gzk1o4p
        └── policy

3 directories, 2 files

The policy file is there for clients to check for Web Key Directory support. It is also used for different policy flags; by default it should just be an empty file.
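
A minimal way to create that structure in the web root of the server (assuming a plain static web server; the key file itself is exported and copied in the steps below):

mkdir -p .well-known/openpgpkey/hu
touch .well-known/openpgpkey/policy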

How to

To list your keys

gpg --list-keys --with-wkd-hash

pub   rsa4096/0xFB12FB1BCB8D0713 2019-09-26 [SC]
      Key fingerprint = 9D1A 01CF C4C2 5B90 DA81  55D8 FB12 FB1B CB8D 0713
uid                   [ultimate] hello <hello@example.com>
                      im4cc8qhazwkfsi65a8us1bc5gzk1o4p@example.com
sub   rsa4096/0x002C57D3FC48ABFB 2019-09-26 [E]

In this example the SHA-1 + z-Base-32 hash of hello is im4cc8qhazwkfsi65a8us1bc5gzk1o4p, which will be the name of the file you place on your web server.

To export your public key, where im4cc8qhazwkfsi65a8us1bc5gzk1o4p is the SHA-1 + z-Base-32 hash of your e-mail prefix, execute the following command.

gpg --output im4cc8qhazwkfsi65a8us1bc5gzk1o4p \
    --export hello@example.com

Copy the file to the .well-known folder on your web server and place it under openpgpkey/hu.

Test your configuration at:
https://metacode.biz/openpgp/web-key-directory

To download public keys with Web Key Directory:
gpg --auto-key-locate clear,nodefault,wkd --locate-keys hello@example.com

Git rebase when you understand why

I often find that people have a hard time understanding Git; they get stuck between the local repository and the remote repository, where the local one has diverged, and suddenly they end up in a merge conflict.

The conceptual idea of what Git is, and what should be stored in Git, is most of the time the main issue. You cannot always blame the users when organizations force inexperienced developers to use a version control system to store various things, without a thought about what should be stored in Git.

In organizations there is a mixture of people who do not always have a background in software development; they will more or less use Git as if it were a network file system with some extra steps, a place where you put things to share with your peers.

What experienced developers do really badly is explain Git in Git terms, which only makes the situation more confusing. The moment you try to explain Git you start pouring out words like clone, commit, stage, branch, merge, merge conflict, resolve, push, pull, rebase.

When you instead start from the conceptual idea of how Git sees the world, as a tree of snapshots where each snapshot points to a parent snapshot, and how you as a user can change these links between the snapshots, then you can also explain why certain commands need to be executed.

stolen from xkcd

Otherwise it will end up as XKCD #1597 explains, which you can see above. So if you try to explain Git to someone, please make sure you know what is happening with the graph Git is based on: a DAG (Directed Acyclic Graph), which in simple words is a graph without loops; if you follow the relationships from one node you will never end up at the node you started from. In Git every node in the graph is a commit, a snapshot of the code you are working with.

The snapshots can be stored on your computer or on other computers, and how you choose to transfer the state of your DAG is up to you. One way is to format patches with git format-patch, give them to your friend, and then use git am to apply them; or you can send them by e-mail with git send-email (you can find a good tutorial on git send-email here). The most common way is to set up a remote with git remote add <name> <URL> and use the commands push and pull.
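
For example (the file name is a placeholder; format-patch picks it from the commit subject), sharing the latest commit as a patch and applying it on the other side:

# On your machine: turn the latest commit into a patch file
git format-patch -1 HEAD        # writes something like 0001-my-change.patch

# On your friend's machine: apply the patch as a new commit
git am 0001-my-change.patch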

The way it is designed, the only commit you really have to care about is the latest one, because it always points to a parent commit, and the parent commit points to its parent, all the way back to the "initial commit", so you have the full history.
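
You can look at these parent pointers yourself:

# Show the raw commit object behind HEAD: its tree, parent(s), author and message
git cat-file -p HEAD

# Draw the DAG of all branches as ASCII art
git log --graph --oneline --all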

A commit can have many children, which means the history is branched. You can change the parent of a commit with the command rebase; this is useful if your local commits are behind the remote tree, which is common if you have teammates who are faster than you at writing and committing code to the repository. A simple solution is the command below:

git pull origin --rebase

That is how you can, most of the time, avoid merging every time there is a new commit on the remote. What Git does is apply your changes as new commits on top of the latest commit from the remote origin. This also means that you can drop commits you are not proud of: run git pull origin --rebase=interactive (or plain git rebase -i) and you get options for each commit. You can also rebase onto another branch with git rebase a_branch_name; see the sketch below.
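
A minimal sketch of rewriting your own local history with interactive rebase (the commit count 3 is arbitrary):

# Open an editor listing the last three commits; change 'pick' to
# 'drop', 'squash' or 'reword' to rewrite that part of the history
git rebase -i HEAD~3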

Want to play with rebase? Follow this tutorial.