Running Garage in a FreeBSD Jail
My work at $DAYJOB involves a lot of data pipeline work these days,
particularly the design of new and better ones,
which help us serve users with better data (more advanced processing, better cartographic decisions, etc.).
If you hear someone complain about Rust compile times, you should assign that person a task involving large datasets hosted in S3 buckets ;) Lately, one of my biggest iteration time constraints has been waiting on S3, not the compiler. To work around that, I set up my own S3 cluster at home using Garage, a lightweight S3-compatible server.
Garage configuration
My homelab uses FreeBSD jails for most things I run, and this is no exception.
There's already a package available, so installing this is pretty easy.
To keep things simple, I have all of my services defined in a public git repository,
which I've mentioned before.
Each directory gets a Makejail file, one or more support files, and a justfile for consistent automation.
Garage is pretty simple: you have a config file with a few settings and a data directory, and that's pretty much it.
Since I prefer to have as much of my jail configuration checked into git as possible, this posed an interesting dilemma. The official documentation for Garage has you executing a shell script to write secrets into the config file. That's not ideal if you want to check your config into version control.
Digging around the docs, I found that all of the secrets come in variants like rpc_secret and rpc_secret_file.
This let me keep a deterministic config:
metadata_dir = "/var/db/garage/meta"
data_dir = "/var/db/garage/data"
db_engine = "sqlite"
# Just a humble single node
replication_factor = 1
# Loopback-only RPC
rpc_bind_addr = "127.0.0.1:3901"
rpc_public_addr = "127.0.0.1:3901"
rpc_secret_file = "/usr/local/etc/garage/rpc.secret"
[ s3_api ]
s3_region = "garage"
api_bind_addr = "0.0.0.0:3900"
root_domain = ".s3.garage.localhost"
[ s3_web ]
bind_addr = "0.0.0.0:3902"
root_domain = ".web.garage.localhost"
index = "index.html"
[ admin ]
api_bind_addr = "0.0.0.0:3903"
admin_token_file = "/usr/local/etc/garage/admin.token"
metrics_token_file = "/usr/local/etc/garage/metrics.token"
Not much to it, really; it's more or less like the quickstart,
except for the secret files and a more useful data directory than /tmp.
I left the domain settings as defaults from the quickstart.
This will not be usable over the internet,
but that's okay; I've already firewalled it off and it's just going over the LAN.
Makejail
There's still a question of what actually goes in those secrets.
Well, I decided that maybe their idea of random credentials wasn't so bad after all.
I don't really need (or even necessarily want) to admin this over anything but ssh,
and the CLI automatically gets secrets from the files in your config
(I discovered this by accident when running service garage status; it needs the RPC secret!).
INCLUDE options/options.makejail
INCLUDE options/network.makejail
INCLUDE ../latest-ports.makejail
# Install
PKG garage
SYSRC garage_enable=YES
# Generate random secrets
# Credential rotation ftw ;)
CMD install -d -m 0750 -o garage -g garage /usr/local/etc/garage
CMD sh -c 'umask 077 && openssl rand -hex 32 > /usr/local/etc/garage/rpc.secret'
CMD chown garage:garage /usr/local/etc/garage/rpc.secret
CMD sh -c 'umask 077 && openssl rand -hex 32 > /usr/local/etc/garage/admin.token'
CMD chown garage:garage /usr/local/etc/garage/admin.token
CMD sh -c 'umask 077 && openssl rand -hex 32 > /usr/local/etc/garage/metrics.token'
CMD chown garage:garage /usr/local/etc/garage/metrics.token
# Persistent data directory: plain nullfs mount (there's probably a more sophisticated way, but I'm lazy)
ARG storageroot
CMD --local mkdir -p "${storageroot}"
CMD mkdir -p /var/db/garage
MOUNT "${storageroot}" /var/db/garage
# The pkg creates the garage user inside the jail.
# NB: With nullfs, chowning inside the jail also affects the host directory.
CMD chown -R garage:garage /var/db/garage
# Copy config toml
COPY garage.toml /usr/local/etc/garage.toml
SERVICE garage start
STAGE stop
SERVICE garage stop
So that's the Makejail: randomized creds and a directory setup. The tunables for the service are pretty minimal. You can find the full set in Ports here.
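If you want to try the secret generation step outside the jail first, here's a minimal sketch of the same idea in plain sh. The write_secret helper and the temp directory are made up for illustration; they aren't part of Garage or AppJail. The only real trick is setting the umask before the file is created, so the secret is never readable by anyone but its owner, even briefly.

```shell
#!/bin/sh
# Sketch only: write_secret is a hypothetical helper, not part of Garage or AppJail.
# It mirrors the Makejail step: set a restrictive umask, then write 32 random
# bytes as hex into the destination file.
write_secret() {
    # $1 = destination path
    (
        umask 077    # subshell, so the caller's umask is left untouched
        openssl rand -hex 32 > "$1"
    )
}

# Scratch directory for the demo (not the real /usr/local/etc/garage)
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/garage-secret.XXXXXX")
write_secret "$tmpdir/rpc.secret"

# 32 random bytes encode to 64 hex characters
secret_len=$(tr -d '\n' < "$tmpdir/rpc.secret" | wc -c | tr -d ' ')
echo "secret length: $secret_len"

# The mode should be owner read/write only
ls -l "$tmpdir/rpc.secret"
```

Inside the jail you'd still follow this with the chown to the garage user, since the file is created by root.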
Next, I run my just freshjail recipe
to create the jail.
If you're not familiar with AppJail, poke around my repo and check their docs + manpages.
From here, I'll assume a running jail with networking and firewall rules configured.
Cluster setup
Now it's time to set up our new cluster (of one node :P). By default, Garage nodes come unconfigured, and will not store any files until you give them some parameters!
First, get the node ID from garage status (this persists across jail re-creation as long as you don't delete the data dir).
appjail cmd jexec storage service garage status
Then, assign space like so (-z is a physical zone name, and -c is capacity):
appjail cmd jexec storage garage layout assign -z zone1 -c 500G NODE_ID
appjail cmd jexec storage garage layout apply --version 1
You can follow the official docs for a multi-node setup,
but this is the basic idea of how you'd manage it with jails.
I use AppJail, but other tools are largely similar with some sort of jexec concept.
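If you'd rather script the node ID step than copy it out of the status table by hand, one option is the garage node id subcommand, which (as I understand it) prints the full identifier in the form of a hex ID followed by @ and the RPC address. The layout commands only want the hex part. The node_id_only helper below is hypothetical, and the sample ID is made up; treat this as a sketch rather than the official way to do it.

```shell
#!/bin/sh
# Sketch only: node_id_only is a hypothetical helper.
# It strips everything from the first "@" onward, leaving the bare hex node ID.
node_id_only() {
    printf '%s\n' "${1%%@*}"
}

# Made-up example value in "<hex id>@<host>:<port>" form, not a real node
full_id="0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef@127.0.0.1:3901"
node_id_only "$full_id"

# Assumed usage against the jail from this post (untested here):
#   NODE_ID=$(node_id_only "$(appjail cmd jexec storage garage node id)")
#   appjail cmd jexec storage garage layout assign -z zone1 -c 500G "$NODE_ID"
```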
Create your first bucket
You can create buckets like this:
appjail cmd jexec storage garage bucket create foo-bucket
Similarly, you can use garage bucket list and garage bucket info <bucket>
to get info on the buckets.
Create an API key
Finally, you'll need to create an API key to actually read or write files from the bucket! You can do this with jexec-prefixed garage invocations, as usual:
# Create a key pair (save this somewhere safe)
appjail cmd jexec storage garage key create foo-key
# Give it access (in this case, full ownership) of a bucket
appjail cmd jexec storage garage bucket allow --read --write --owner foo-bucket --key foo-key
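To actually use the key from another machine, you point an S3 client at the jail. Since I used rclone, here's a rough sketch of configuring a remote purely through rclone's RCLONE_CONFIG_<REMOTE>_<OPTION> environment variables. The remote name (garage), the address 192.0.2.10, and the key values are all placeholders I made up; substitute the key ID and secret printed by garage key create and your jail's real address. The region matches s3_region from the config above, and port 3900 is the S3 API port.

```shell
#!/bin/sh
# Sketch only: every value below is a placeholder, not a real credential or host.
GARAGE_ENDPOINT="http://192.0.2.10:3900"   # hypothetical jail address + S3 API port

# rclone reads remote definitions from RCLONE_CONFIG_<REMOTE>_<OPTION> variables,
# so this defines a remote called "garage" without touching rclone.conf.
export RCLONE_CONFIG_GARAGE_TYPE="s3"
export RCLONE_CONFIG_GARAGE_PROVIDER="Other"
export RCLONE_CONFIG_GARAGE_ACCESS_KEY_ID="REPLACE_WITH_KEY_ID"
export RCLONE_CONFIG_GARAGE_SECRET_ACCESS_KEY="REPLACE_WITH_SECRET_KEY"
export RCLONE_CONFIG_GARAGE_REGION="garage"              # matches s3_region
export RCLONE_CONFIG_GARAGE_ENDPOINT="$GARAGE_ENDPOINT"
export RCLONE_CONFIG_GARAGE_FORCE_PATH_STYLE="true"      # no wildcard DNS needed

# With those set, commands like these should work (not run here):
#   rclone lsd garage:
#   rclone copy ./some-local-dir garage:foo-bucket/some/prefix
```

Path-style addressing keeps things simple on a LAN, since the vhost-style root_domain from the quickstart won't resolve anywhere without extra DNS setup.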
And that's pretty much it! An internet-facing production setup would obviously have a few more steps, but for my use case, I just wanted to accelerate development times without having to talk to AWS servers across the ocean.
This turned out to be well worth the investment for me! I rcloned some large buckets to my homelab overnight, and in the morning, my large query ran literally 10x faster than over the internet! (And if I had 10G, it'd be even faster!)