Running Garage in a FreeBSD Jail

My work at $DAYJOB involves a lot of data pipeline work these days, particularly designing new and better pipelines that help us serve users with better data (more advanced processing, better cartographic decisions, etc.).

If you hear someone complain about Rust compile times, you should assign that person a task involving large datasets hosted in S3 buckets ;) Lately, my biggest iteration-time constraint has been moving that data around, not the compiler. To work around that, I set up my own S3 cluster at home using Garage, a lightweight S3-compatible server.

Garage configuration

My homelab uses FreeBSD jails for most things I run, and this is no exception. There's already a package available, so installing this is pretty easy. To keep things simple, I have all of my services defined in a public git repository, which I've mentioned before. Each directory gets a Makejail file, one or more support files, and a justfile for consistent automation.

Garage is pretty simple: you have a config file with a few settings and a data directory, and that's pretty much it.

Since I prefer to have as much of my jail configurations checked into git as possible, this posed an interesting dilemma. The official documentation for Garage has you executing a shell script to write secrets into the config file. That's not ideal if you want to check your config into version control.

Digging around the docs, I found that all of the secrets come in variants like rpc_secret and rpc_secret_file. This let me keep a deterministic config:

metadata_dir = "/var/db/garage/meta"
data_dir = "/var/db/garage/data"
db_engine = "sqlite"

# Just a humble single node
replication_factor = 1

# Loopback-only RPC
rpc_bind_addr = "127.0.0.1:3901"
rpc_public_addr = "127.0.0.1:3901"
rpc_secret_file = "/usr/local/etc/garage/rpc.secret"

[s3_api]
s3_region = "garage"
api_bind_addr = "0.0.0.0:3900"
root_domain = ".s3.garage.localhost"

[s3_web]
bind_addr = "0.0.0.0:3902"
root_domain = ".web.garage.localhost"
index = "index.html"

[admin]
api_bind_addr = "0.0.0.0:3903"
admin_token_file = "/usr/local/etc/garage/admin.token"
metrics_token_file = "/usr/local/etc/garage/metrics.token"

Not much to it, really; it's more or less like the quickstart, except for the secret files and a more useful data directory than /tmp. I left the domain settings as defaults from the quickstart. This will not be usable over the internet, but that's okay; I've already firewalled it off and it's just going over the LAN.
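Since the whole point of the _file variants is that garage.toml can live in git, a small guard can catch an inline secret before it gets committed. This is a minimal sketch, not something from the Garage docs: the function name check_no_inline_secrets is made up for illustration, and it only looks for the three key names used in the config above.

```shell
#!/bin/sh
# Hypothetical pre-commit guard: fail if a Garage config sets a secret inline
# (rpc_secret, admin_token, metrics_token) instead of via the *_file variant.
check_no_inline_secrets() {
    config="$1"
    # Match the bare key followed by "=", e.g. `rpc_secret = "..."`.
    # `rpc_secret_file = ...` does not match because "_file" sits before the "=".
    if grep -E -n '^[[:space:]]*(rpc_secret|admin_token|metrics_token)[[:space:]]*=' "$config"; then
        echo "inline secret found in $config" >&2
        return 1
    fi
    return 0
}
```

Something like `check_no_inline_secrets garage.toml || exit 1` in a git hook or a justfile recipe would be one way to wire it in.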

Makejail

There's still a question of what actually goes in those secrets. Well, I decided that maybe their idea of random credentials wasn't so bad after all. I don't really need (or even necessarily want) to admin this over anything but ssh, and the CLI automatically gets secrets from the files in your config (I discovered this by accident when running service garage status; it needs the RPC secret!).

INCLUDE options/options.makejail
INCLUDE options/network.makejail

INCLUDE ../latest-ports.makejail

# Install
PKG garage
SYSRC garage_enable=YES

# Generate random secrets
# Credential rotation ftw ;)
CMD install -d -m 0750 -o garage -g garage /usr/local/etc/garage
CMD sh -c 'umask 077 && openssl rand -hex 32 > /usr/local/etc/garage/rpc.secret'
CMD chown garage:garage /usr/local/etc/garage/rpc.secret

CMD sh -c 'umask 077 && openssl rand -hex 32 > /usr/local/etc/garage/admin.token'
CMD chown garage:garage /usr/local/etc/garage/admin.token

CMD sh -c 'umask 077 && openssl rand -hex 32 > /usr/local/etc/garage/metrics.token'
CMD chown garage:garage /usr/local/etc/garage/metrics.token

# Persistent data directory: plain nullfs mount (probably a more sophisticated way, but I'm lazy)
ARG storageroot
CMD --local mkdir -p "${storageroot}"

CMD mkdir -p /var/db/garage
MOUNT "${storageroot}" /var/db/garage
# The pkg creates the garage user inside the jail.
# NB: With nullfs, chowning inside the jail also affects the host directory.
CMD chown -R garage:garage /var/db/garage

# Copy config toml
COPY garage.toml /usr/local/etc/garage.toml

SERVICE garage start

STAGE stop

SERVICE garage stop

So that's the Makejail: randomized creds and a directory setup. The tunables for the service are pretty minimal. You can find the full set in Ports here.
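If you want to try the secret-generation step outside of a Makejail (say, on a scratch machine), the three CMD lines boil down to something like the function below. This is only a sketch: gen_secret is a hypothetical helper name, it assumes openssl is on the PATH, and it skips the chown since that needs root and the garage user.

```shell
#!/bin/sh
# Sketch of the Makejail's secret-generation step as a reusable function.
# gen_secret is a hypothetical name; assumes openssl is installed.
gen_secret() {
    path="$1"
    # Run in a subshell so the restrictive umask does not leak to the caller.
    # Note: umask only affects newly created files; an existing file keeps
    # whatever mode it already had.
    (
        umask 077
        openssl rand -hex 32 > "$path"
    )
}
```

Calling `gen_secret ./rpc.secret` should leave a file containing 64 hex characters that only the owner can read or write.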

Next, I run my just freshjail recipe to create the jail. If you're not familiar with AppJail, poke around my repo and check their docs + manpages. From here, I'll assume a running jail with networking and firewall rules configured.

Cluster setup

Now it's time to set up our new cluster (of one node :P). By default, Garage nodes come unconfigured, and will not store any files until you give them some parameters!

First, get the node ID from garage status (this persists across jail re-creation as long as you don't delete the data dir).

appjail cmd jexec storage service garage status

Then, assign space like so (-z is a physical zone name, and -c is capacity):

appjail cmd jexec storage garage layout assign -z zone1 -c 500G NODE_ID
appjail cmd jexec storage garage layout apply --version 1

You can follow the official docs for a multi-node setup, but this is the basic idea of how you'd manage it with jails. I use AppJail, but other tools are largely similar with some sort of jexec concept.
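Typing the jexec prefix gets old quickly, so a tiny wrapper can help. The sketch below is my own convenience idea rather than anything AppJail or Garage ships: gj and JAIL_RUNNER are made-up names, and the default runner is simply the AppJail invocation used in this post, with "storage" as the jail name.

```shell
#!/bin/sh
# Hypothetical wrapper so garage subcommands don't need the full jexec prefix.
# JAIL_RUNNER defaults to the AppJail invocation from this post; override it
# for other jail managers, or set it to "echo" for a dry run.
gj() {
    # Intentionally unquoted: the runner is a command plus its arguments,
    # so it needs to be word-split by the shell.
    ${JAIL_RUNNER:-appjail cmd jexec storage} garage "$@"
}
```

With that defined, `gj layout assign -z zone1 -c 500G NODE_ID` runs the same command as above, and setting `JAIL_RUNNER=echo` first makes `gj bucket list` just print `garage bucket list` instead of running anything.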

Create your first bucket

You can create buckets like this:

appjail cmd jexec storage garage bucket create foo-bucket

Similarly, you can use garage bucket list and garage bucket info <bucket> to get info on the buckets.

Create an API key

Finally, you'll need to create an API key to actually read or write files from the bucket! You can do this with jexec-prefixed garage invocations, as usual:

# Create a key pair (save this somewhere safe)
appjail cmd jexec storage garage key create foo-key
# Give it access (in this case, full ownership) of a bucket
appjail cmd jexec storage garage bucket allow --read --write --owner foo-bucket --key foo-key

And that's pretty much it! An internet-facing production setup would obviously have a few more steps, but for my use case, I just wanted to accelerate development times without having to talk to AWS servers across the ocean.
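On the client side, most S3 tooling just needs the endpoint, region, and key pair. Here is a minimal sketch using the standard AWS environment variables. The hostname garage.lan is a placeholder for wherever your jail is reachable on the LAN, and the key values are placeholders for the output of garage key create. As far as I know, AWS_ENDPOINT_URL is only honored by reasonably recent AWS CLI and SDK releases; older tools need an explicit --endpoint-url flag instead.

```shell
#!/bin/sh
# Client-side environment for talking to the Garage S3 API.
# GARAGE_HOST is a hypothetical LAN hostname; the key values are placeholders
# for the key ID and secret printed by `garage key create`.
GARAGE_HOST="${GARAGE_HOST:-garage.lan}"

export AWS_ACCESS_KEY_ID="${AWS_ACCESS_KEY_ID:-REPLACE_WITH_KEY_ID}"
export AWS_SECRET_ACCESS_KEY="${AWS_SECRET_ACCESS_KEY:-REPLACE_WITH_SECRET_KEY}"
# Must match s3_region in garage.toml.
export AWS_DEFAULT_REGION="garage"
# Port 3900 is the s3_api bind port from garage.toml.
export AWS_ENDPOINT_URL="http://${GARAGE_HOST}:3900"
```

After sourcing this, something like `aws s3 ls s3://foo-bucket` should talk to the homelab instead of AWS.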

This turned out to be well worth the investment for me! I rcloned some large buckets to my homelab overnight, and in the morning, my large query ran literally 10x faster than over the internet! (And if I had 10G, it'd be even faster!)
