{
  "version": "https://jsonfeed.org/version/1",
  "title": "Ian's Digital Garden",
  "home_page_url": "https://ianwwagner.com/",
  "feed_url": "https://ianwwagner.com//tag-rust.json",
  "description": "",
  "items": [
    {
      "id": "https://ianwwagner.com//reqwest-0-13-upgrade-and-webpki.html",
      "url": "https://ianwwagner.com//reqwest-0-13-upgrade-and-webpki.html",
      "title": "reqwest 0.13 Upgrade and WebPKI",
      "content_html": "<p>In case you missed the <a href=\"https://seanmonstar.com/blog/reqwest-v013-rustls-default/\">announcement</a>,\nthe <code>reqwest</code> crate has a new and very important release out!\n<code>reqwest</code> is an opinionated, high-level HTTP client for Rust,\nand the main feature of this release is that <a href=\"https://rustls.dev/\"><code>rustls</code></a>\nis now the default TLS backend.\nRead the excellent blog posts from Sean and others on why <code>rustls</code>\nis safer and often faster than native TLS.\nIt's also a lot more convenient most of the time!</p>\n<h1><a href=\"#changes-to-certificate-verification\" aria-hidden=\"true\" class=\"anchor\" id=\"changes-to-certificate-verification\"></a>Changes to certificate verification</h1>\n<p>This post is about one of the more mundane parts of the release.\nPreviously there were a lot of somewhat confusing features related to certificate verification.\nThese have been condensed down to a smaller number of feature flags.\nThe summary of these changes took a bit to &quot;click&quot; for me, so here's a rephrasing in my own words.</p>\n<ul>\n<li>By default, it uses the <a href=\"https://docs.rs/rustls-platform-verifier/latest/rustls_platform_verifier/\">native platform verifier</a>,\nwhich looks for root certificates in your system store, and inherits systemwide revocations and explicit trust settings\nin addition to the &quot;baseline&quot; root CAs trusted by your OS.</li>\n<li>The feature flag to enable WebPKI bundling of roots is gone.\nWebPKI is a bundle of CA root certificates trusted and curated by Mozilla.\nIt's a reasonably standard set, and most other trust stores look pretty similar.</li>\n<li>You can merge in your own <em>additionally</em> trusted root certificates using <a href=\"https://docs.rs/reqwest/latest/reqwest/struct.ClientBuilder.html#method.tls_certs_merge\"><code>tls_certs_merge</code></a>.</li>\n<li>You can be extra exclusive and use <a 
href=\"https://docs.rs/reqwest/latest/reqwest/struct.ClientBuilder.html#method.tls_certs_only\"><code>tls_certs_only</code></a>\nto limit verification to only the certificates you specify.</li>\n</ul>\n<p>The documentation and release notes also mention that <code>tls_certs_merge</code> is not always supported.\nI frankly have no idea what conditions cause this to be supported or not.\nBut <code>tls_certs_only</code> apparently can't fail. ¯\\_(ツ)_/¯</p>\n<h1><a href=\"#what-this-means-for-containerized-applications\" aria-hidden=\"true\" class=\"anchor\" id=\"what-this-means-for-containerized-applications\"></a>What this means for containerized applications</h1>\n<p>The reason I'm interested in this is mostly because at <code>$DAYJOB</code>, just about everything is deployed in containers.\nFor reasons that I don't fully understand (something about image size maybe??),\nthe popular container images like <code>debian:trixie-slim</code> <strong>do not include any root CAs</strong>.\nYou have to <code>apt-get install</code> them yourself.\nThis is to say that most TLS applications will straight up break in the out-of-the-box config.</p>\n<p>Previously I had seen this solved in two ways.\nThe first is to install the certs from your distribution's package manager like so:</p>\n<pre><code class=\"language-dockerfile\">RUN apt-get update \\\n &amp;&amp; apt-get install -y --no-install-recommends ca-certificates \\\n &amp;&amp; rm -rf /var/lib/apt/lists/*\n</code></pre>\n<p>The second is to add the WebPKI roots to your cargo dependencies.\nThis actually requires some manual work; adding the crate isn't enough.\nYou then have to add all of the roots (e.g. 
via <code>tls_certs_merge</code> or <code>tls_certs_only</code>).</p>\n<h1><a href=\"#which-approach-is-better\" aria-hidden=\"true\" class=\"anchor\" id=\"which-approach-is-better\"></a>Which approach is better?</h1>\n<p>The net result is <em>approximately</em> the same, but not entirely.\nThe system-level approach is more flexible.\nPresumably you would get updates in some cases without having to rebuild your application\n(though you do <em>not</em> get these automatically; the certs are only loaded once on app startup\nby <code>rustls_platform_verifier</code>!).\nPresumably you would also get any, say, enterprise-level trust, distrust, CRLs, etc.\nthat are dictated by your corporate IT department.</p>\n<p>The WebPKI approach on the other hand is baked at build time.\nThe <a href=\"https://docs.rs/webpki-root-certs/latest/webpki_root_certs/\">crate</a>\nhas a pretty strong, if slightly obtuse warning about this:</p>\n<blockquote>\n<p>This library is suitable for use in applications that can always be recompiled and instantly deployed. For applications that are deployed to end-users and cannot be recompiled, or which need certification before deployment, consider a library that uses the platform native certificate verifier such as <code>rustls-platform-verifier</code>. This has the additional benefit of supporting OS provided CA constraints and revocation data.</p>\n</blockquote>\n<p>Attempting to read between the lines, past that &quot;instantly deployed&quot; jargon,\nI think they are really just saying &quot;if you use this, certs are baked at compile time and you <em>never</em> get automatic updates. 
Be careful with that.&quot;</p>\n<p>So it's clear to me you shouldn't ship, say, a static binary to users with certs baked like this.\nBut I'm building server-side software.\nAnd as of February 2026, people look at you funny if you don't deploy using containers.\nI <em>can</em> deploy sufficiently instantly,\nthough to be honest I would have no idea <em>when</em> I should.\nMost apps get deployed frequently enough that I would assume this just doesn't matter,\nand so I'm not sure the warning as-written does much to help a lot of the Rust devs I know.</p>\n<h1><a href=\"#conclusion\" aria-hidden=\"true\" class=\"anchor\" id=\"conclusion\"></a>Conclusion</h1>\n<p>My conclusion is that if you're deploying containerized apps, there is approximately no functional difference.\nYour container is a static image anyway.\nContainers don't typically run background tasks of any sort.\nAnd even if they did, the library won't reload the trust store while the application is running.\nSo it's functionally the same (modulo any minor differences between WebPKI and Debian, which should be minimal).\nSimilarly, unless you work for a large enterprise / government,\nyou probably don't have a mandated, hand-picked set of CAs and CRLs.\nSo again, there really is no difference here as far as I can tell.</p>\n<p>In spite of that, I decided to switch away from using WebPKI in one of our containers that I upgraded.\nThe reason is that structuring this way\n(provided that the sources are copied from a previous layer!)\nensures that every image build always has the latest certs from Debian.\n<code>cargo build</code> is a lot more deterministic,\nand will use whatever you have in the lockfile unless you explicitly run <code>cargo update</code>.</p>\n<p>And even though I'm fortunate to not have an IT apparatus dictating cert policy today,\nyou never know... 
this approach seems more flexible and creates a &quot;pit of success&quot;\nrather than a landmine where the trust store may not see an update for a year\ndespite regular rebuilds.</p>\n<p>In other words, I think Sean made the right choice, and you should <em>probably</em> delegate to the system,\nunless you have a particular reason to do otherwise.</p>\n<p>Hope this helps; I wrote this because I didn't understand the tradeoffs initially,\nand had some trouble parsing the existing writing on the subject.</p>\n",
      "summary": "",
      "date_published": "2026-02-13T00:00:00-00:00",
      "image": "",
      "authors": [
        {
          "name": "Ian Wagner",
          "url": "https://fosstodon.org/@ianthetechie",
          "avatar": "media/avi.jpeg"
        }
      ],
      "tags": [
        "rust",
        "cryptography"
      ],
      "language": "en"
    },
    {
      "id": "https://ianwwagner.com//even-safer-rust-with-miri.html",
      "url": "https://ianwwagner.com//even-safer-rust-with-miri.html",
      "title": "Even Safer Rust with Miri",
      "content_html": "<p>Recently some of the Miri contributors published a <a href=\"https://plf.inf.ethz.ch/research/popl26-miri.html\">paper that was accepted to POPL</a>.\nI've been using Rust professionally for about 7 years now,\nand while I'd <em>heard of</em> Miri several times over the years,\nI think there's a widespread lack of knowledge about what it does, and why anyone should care.\nI only recently started using it myself, so I'm writing this post to share\nwhat Miri is, why you should care, and how you can get started easily.</p>\n<h1><a href=\"#what-is-miri\" aria-hidden=\"true\" class=\"anchor\" id=\"what-is-miri\"></a>What is Miri?</h1>\n<p>Miri is an interpreter for Rust's mid-level intermediate representation (MIR; hence the acronym).\nThat's how I first remember seeing it described years ago,\nand that's what the GitHub project description still says.</p>\n<p>The latest README is a bit more helpful though: it's a tool for detecting <em>undefined behavior</em> (UB) in Rust code.\nIn other words, it helps you identify code that's unsafe or unsound.\nWhile it would be a bug to hit such behaviors in safe Rust,\nif you're using <code>unsafe</code> (or anything in your dependency chain does!),\nthen this is a real concern!\nMiri has in fact even found soundness bugs in the Rust standard library,\nso even enforcing <code>#![forbid(unsafe_code)]</code> across your entire dependency tree won't fully protect you.</p>\n<h1><a href=\"#what-is-ub-and-why-is-it-bad\" aria-hidden=\"true\" class=\"anchor\" id=\"what-is-ub-and-why-is-it-bad\"></a>What is UB (and why is it bad)?</h1>\n<p>I think to understand why Miri matters,\nwe first need to understand why UB is bad.\nThis is not something that most professional programmers have a great understanding of (myself included).</p>\n<p>In the abstract, UB can mean &quot;anything that isn't specified&quot;, or something like that...\nBut that's not very helpful!\nAnd it doesn't really explain the stakes if we don't avoid it.\nThe Rust Reference has a <a 
href=\"https://doc.rust-lang.org/reference/behavior-considered-undefined.html\">list</a>\nof behaviors that are considered to be undefined in Rust,\nbut they note that this list is not exhaustive.</p>\n<p>When searching for a better understanding,\nI've seen people online make statements like\n&quot;UB means your program can do literally anything at this point, like launch nuclear missiles.&quot;\nWhile this is technically true, this isn't particularly helpful to most readers.\nI want something more concrete...</p>\n<p>The authors of the paper put UB's consequences in terms which really &quot;clicked&quot; for me\nusing a logical equivalence, which I'll quote here:</p>\n<blockquote>\n<p>Furthermore, Undefined Behavior is a massive security problem. Around 70% of critical security vulnerabilities are caused by memory safety violations [38, 18, 32], and all of these memory safety violations are instances of Undefined Behavior. After all, if the attacker overflows a buffer to eventually execute their own code, this is not something that the program does because the C or C++ specification says so—the specification just says that doing out-of-bounds writes (or overwriting the vtable, or calling a function pointer that does not actually point to a function, or doing any of the other typical first steps of an exploit chain) is Undefined Behavior, and executing the attacker’s code is just how Undefined Behavior happens to play out in this particular case.</p>\n</blockquote>\n<p>I never made this connection on my own.\nI equate UB most often with things like data races between threads,\nwhere you can have unexpected update visibility without atomics or locks.\nOr maybe torn reads of shared memory that's not properly synchronized.\nBut this is a new way of looking at it that makes the stakes more clear,\nespecially if you're doing anything with pointers.</p>\n<p>Another connection I never made previously is that UB is relative to a very specific context.\nHere's another 
quote from the paper:</p>\n<blockquote>\n<p>The standard random number crate used across the Rust ecosystem performed an unaligned memory access. Interestingly, the programmers seemed to have been aware that alignment is a problem in this case: there were dedicated code paths for x86 and for other architectures. Other architectures used read_unaligned, but the x86 code path had a comment saying that x86 allows unaligned reads, so we do not need to use this (potentially slower) operation. Unfortunately, this is a misconception: even though x86 allows unaligned accesses, Rust does not, no matter the target architecture—and this can be relevant for optimizations.</p>\n</blockquote>\n<p>This is REALLY interesting to me!\nIt makes sense in retrospect, but it's not exactly obvious.\nLanguages are free to define their own semantics in addition to or independently of hardware.\nI suspect Rust's specification here is somehow related to its concept of allocations\n(which the paper goes into more detail about).</p>\n<p>It is obviously not &quot;undefined&quot; what the hardware will do when given a sequence of instructions.\nBut it <em>is</em> undefined in Rust, which controls how those instructions are generated.\nAnd here the Rust Reference is explicit in calling this UB.\n(NOTE: I don't actually know what the &quot;failure modes&quot; are here, but you can imagine they could be very bad\nsince it could enable the compiler to make a bad assumption that leads to a program correctness or memory safety vulnerability.)</p>\n<p>I actually encountered the same confusion re: what the CPU guarantees vs what Rust guarantees for unaligned reads in <a href=\"https://github.com/stadiamaps/valinor/blob/5e75b2b8267cee2a57d4f22fcc5605728e0cf76e/valhalla-graphtile/src/graph_tile.rs#L857\">one of my own projects</a>,\nas a previous version of this function didn't account for alignment.\nI addressed the issue by using the native zerocopy <a 
href=\"https://docs.rs/zerocopy/latest/zerocopy/byteorder/struct.U32.html\"><code>U32</code></a> type,\nwhich is something I'd have needed to do anyways to ensure correctness regardless of CPU endianness.\n(If you need to do something like this at a lower level for some reason, there's a <a href=\"https://doc.rust-lang.org/std/ptr/fn.read_unaligned.html\"><code>read_unaligned</code> function in <code>std::ptr</code></a>).</p>\n<p>TL;DR - UB is both a correctness and a security issue, so it's really bad!</p>\n<h1><a href=\"#using-miri-for-great-good\" aria-hidden=\"true\" class=\"anchor\" id=\"using-miri-for-great-good\"></a>Using Miri for great good</h1>\n<p>One of the reasons I write pretty much everything that I can in Rust is because\nit naturally results in more correct and maintainable software.\nThis is a result of the language guarantees of safe Rust,\nthe powerful type system,\nand the whole ecosystem of excellent tooling.\nIt's a real <a href=\"https://blog.codinghorror.com/falling-into-the-pit-of-success/\">pit of success</a> situation.</p>\n<p>While you can run a program under Miri as a one-shot test,\nthis isn't a practical approach to ensuring correctness long-term.\nMiri is a <em>complementary</em> tool to existing things that you should be doing already.\nAutomated testing is the most obvious one,\nbut fuzzing and other strategies may also be relevant for you.</p>\n<p>If you're already running automated tests in CI, adding Miri is easy.\nHere's an example of how I use it in GitHub actions:</p>\n<pre><code class=\"language-yaml\">steps:\n    - uses: actions/checkout@v4\n    - uses: taiki-e/install-action@nextest\n\n    - name: Build workspace\n      run: cargo build --verbose\n\n    - name: Run tests\n      run: cargo nextest run --no-fail-fast\n\n    - name: Run doc tests (not currently supported by nextest https://github.com/nextest-rs/nextest/issues/16)\n      run: cargo test --doc\n\n    - name: Install big-endian toolchain (s390x)\n      run: 
rustup target add s390x-unknown-linux-gnu\n\n    - name: Install s390x cross toolchain and QEMU (Ubuntu only)\n      run: sudo apt-get update &amp;&amp; sudo apt-get install -y gcc-s390x-linux-gnu g++-s390x-linux-gnu libc6-dev-s390x-cross qemu-user-static\n\n    - name: Run tests (big-endian s390x)\n      run: cargo nextest run --no-fail-fast --target s390x-unknown-linux-gnu\n\n    - name: Install Miri\n      run: rustup +nightly component add miri\n\n    - name: Run tests in Miri\n      run: cargo +nightly miri nextest run --no-fail-fast\n      env:\n        RUST_BACKTRACE: 1\n        MIRIFLAGS: -Zmiri-disable-isolation\n\n    - name: Run doc tests in Miri\n      run: cargo +nightly miri test --doc\n      env:\n        RUST_BACKTRACE: 1\n        MIRIFLAGS: -Zmiri-disable-isolation\n\n    - name: Install nightly big-endian toolchain (s390x)\n      run: rustup +nightly target add s390x-unknown-linux-gnu\n\n    - name: Run tests in Miri (big-endian s390x)\n      run: cargo +nightly miri nextest run --no-fail-fast --target s390x-unknown-linux-gnu\n      env:\n        RUST_BACKTRACE: 1\n        MIRIFLAGS: -Zmiri-disable-isolation\n</code></pre>\n<p>I know that's a bit longer than what you'll find in the README,\nbut I wanted to highlight my usage in a more complex codebase\nsince these examples are less common.\n(NOTE: I assume an Ubuntu runner here, since Linux has the best support for Miri right now.)\nSome things to highlight:</p>\n<ul>\n<li>I use <a href=\"https://nexte.st/\">nextest</a>, which is significantly faster for large suites. (NOTE: It <a href=\"https://github.com/nextest-rs/nextest/issues/16\">does not support doc tests</a> at the time of this writing).</li>\n<li>I pass some <code>MIRIFLAGS</code> to disable host isolation for my tests, since they require direct filesystem access. You may not need this for your project, but I do for mine.</li>\n<li>Partly because I can, and partly because big-endian CPUs do still exist, I run tests under two targets. 
Miri is capable of doing this with target flags, which is REALLY cool, and the <code>s390x-unknown-linux-gnu</code> target is the &quot;big-endian target of choice&quot; from the Miri authors. This requires a few dependencies and flags.</li>\n<li>Note that cargo doc tests <a href=\"https://github.com/rust-lang/cargo/issues/6460\">do not support building for alternate targets</a>.</li>\n</ul>\n<p>Hopefully you learned something from this post.\nI'm pretty sure I wrote my first line of unsafe Rust less than a year ago\n(after using it professionally for over 6 years prior),\nso even if you don't need this today, file it away for later.\nAs I said at the start, I'm still not an expert,\nso if you spot any errors, please reach out to me on Mastodon!</p>\n",
      "summary": "",
      "date_published": "2026-01-07T00:00:00-00:00",
      "image": "",
      "authors": [
        {
          "name": "Ian Wagner",
          "url": "https://fosstodon.org/@ianthetechie",
          "avatar": "media/avi.jpeg"
        }
      ],
      "tags": [
        "rust",
        "software-reliability"
      ],
      "language": "en"
    },
    {
      "id": "https://ianwwagner.com//const-assertions.html",
      "url": "https://ianwwagner.com//const-assertions.html",
      "title": "Const Assertions",
      "content_html": "<p>I'm currently working on a <a href=\"https://github.com/stadiamaps/valinor\">project</a> which involves a lot of lower level\ndata structures.\nBy lower level, I mean code where things like layout, bit positions, and exact sizes are important.\nAs such, I have a number of pedantic lints enabled.</p>\n<p>One of the lints I use is <a href=\"https://rust-lang.github.io/rust-clippy/master/index.html#cast_precision_loss\"><code>cast_precision_loss</code></a>.\nFor example, casting from <code>usize</code> to <code>f32</code> using the <code>as</code> keyword is not guaranteed to be exact,\nsince <code>f32</code> can only represent integers exactly up to 2^24 (its significand is limited to 24 bits of precision).\nAbove this you can have precision loss.</p>\n<p>This lint is pedantic because it can generate false positives where you <em>know</em> the input can't ever exceed some threshold.\nBut wouldn't it be nice if we could go from &quot;knowing&quot; we can safely disable a lint to actually <em>proving</em> it?</p>\n<p>The first thing that came to mind was runtime assertions, but this is kind of ugly.\nIt requires that we actually exercise the code at runtime, for one.\nWe <em>should</em> be able to cover this in unit tests, but even if we do that,\nan assertion isn't as good as a compile time guarantee.</p>\n<h1><a href=\"#const\" aria-hidden=\"true\" class=\"anchor\" id=\"const\"></a><code>const</code></h1>\n<p>One thing I didn't mention, and the reason I &quot;know&quot; that suppressing the lint would be fine, is that I'm using a <code>const</code> declaration.\nHere's a look at what that's like:</p>\n<pre><code class=\"language-rust\">pub const BUCKET_SIZE_MINUTES: u32 = 5;\npub const BUCKETS_PER_WEEK: usize = (7 * 24 * 60) as usize / BUCKET_SIZE_MINUTES as usize;\n</code></pre>\n<p>This isn't the same as a <code>static</code> or a <code>let</code> binding.\n<code>const</code> expressions are actually evaluated at compile time.\n(Well, most of the 
time... there's a funny edge case where <code>const</code> blocks which can <em>never</em> be executed at runtime\n<a href=\"https://doc.rust-lang.org/reference/expressions/block-expr.html#const-blocks\">are not guaranteed to be evaluated</a>.)</p>\n<p>You can't do everything in <code>const</code> contexts, but you can do quite a lot, including many kinds of math.\nNot all math; some things like square root and trigonometry are not yet usable in <code>const</code> contexts\nsince they are not reproducible across architectures (and sometimes even on the same machine, it seems).</p>\n<h1><a href=\"#assert-in-a-const-block\" aria-hidden=\"true\" class=\"anchor\" id=\"assert-in-a-const-block\"></a><code>assert!</code> in a <code>const</code> block</h1>\n<p>And now for the cool trick!\nI want to do a division here, and to do so, I need to ensure the types match.\nThis involves casting a <code>usize</code> to <code>f32</code>,\nwhich can cause truncation as noted above.</p>\n<p>But since <code>BUCKETS_PER_WEEK</code> is a constant value,\nwe can actually do an assertion against it <em>in our <code>const</code> context</em>.\nThis lets us safely suppress the lint, while ensuring we'll get a compile-time error if this ever changes!\nThis has no runtime overhead.</p>\n<pre><code class=\"language-rust\">#[allow(clippy::cast_precision_loss, reason = &quot;BUCKETS_PER_WEEK is always &lt; 2^24&quot;)]\nconst PI_BUCKET_CONST: f32 = {\n    // Asserts the invariant; panics at compile time if violated\n    assert!(BUCKETS_PER_WEEK &lt; 2usize.pow(24));\n    // Computes the value\n    std::f32::consts::PI / BUCKETS_PER_WEEK as f32\n};\n</code></pre>\n<p>This is all possible in stable Rust at the time of this writing (tested on 1.89).\nI saw some older crates out there which appeared to do this,\nbut as far as I can tell, they are no longer necessary.</p>\n<p>Here's a <a 
href=\"https://play.rust-lang.org/?version=stable&amp;mode=debug&amp;edition=2024&amp;gist=dd294501c156f8d67f72a21f7dea27c4\">Rust Playground</a>\npreloaded with the sample code\nwhere you can verify that changing <code>BUCKETS_PER_WEEK</code> to a disallowed value causes a compile-time error.</p>\n",
      "summary": "",
      "date_published": "2025-10-07T00:00:00-00:00",
      "image": "",
      "authors": [
        {
          "name": "Ian Wagner",
          "url": "https://fosstodon.org/@ianthetechie",
          "avatar": "media/avi.jpeg"
        }
      ],
      "tags": [
        "rust"
      ],
      "language": "en"
    },
    {
      "id": "https://ianwwagner.com//optimizing-rust-builds-with-target-flags.html",
      "url": "https://ianwwagner.com//optimizing-rust-builds-with-target-flags.html",
      "title": "Optimizing Rust Builds with Target Flags",
      "content_html": "<p>Recently I've been doing some work using <a href=\"https://datafusion.apache.org/\">Apache DataFusion</a> for some high-throughput data pipelines.\nOne of the interesting things I noticed on the user guide was the suggestion to set\n<code>RUSTFLAGS='-C target-cpu=native'</code>.\nThis is actually a pretty common optimization (which I periodically forget about and rediscover),\nso I thought I'd do a quick writeup on this.</p>\n<h1><a href=\"#background-cpu-features\" aria-hidden=\"true\" class=\"anchor\" id=\"background-cpu-features\"></a>Background: CPU features</h1>\n<p>A compiler translates your &quot;idiomatic&quot; code into low-level instructions.\nModern optimizing compilers are pretty good at figuring out ways to cleverly rewrite your code\nto make it faster, while still being functionally equivalent at execution time.\nThe instructions may be reordered from what your simple mental model expects,\nand they may even bear no resemblance to what you wrote.\nThis includes rewriting some loop-like (or iterator) patterns into &quot;vectorized&quot; code using SIMD instructions\nthat perform some operation on multiple values at once.</p>\n<p>Special instruction families like this often vary within a single architecture,\nwhich may be surprising at first.\nThe compiler can be configured to enable (or disable!) 
specific &quot;features&quot;,\noptimizing for compatibility or speed.</p>\n<p>In <code>rustc</code>, each <em>target triple</em> has a default set of CPU features enabled.\nIn the case of my work laptop, that's <code>aarch64-apple-darwin</code>.\nSince this architecture doesn't have a lot of variation among chips,\nthe compiler can make some pretty good assumptions about what's available.\n(In fact, for my specific CPU, the M1 Max, it's perfect!)\nBut we'll soon see this is not the case for the most common target: x86_64 Linux.</p>\n<h1><a href=\"#checking-available-features\" aria-hidden=\"true\" class=\"anchor\" id=\"checking-available-features\"></a>Checking available features</h1>\n<p>To figure out what features we could theoretically enable,\nwe need some CPU info from the machine we intend to deploy on.\nThe canonical way of checking CPU features on Linux is probably to <code>cat /proc/cpuinfo</code>.\nThis gives a lot more output than you probably need, though.\nHelpfully, <code>rustc</code> includes a simple command that shows you the config\nfor the native CPU capabilities: <code>rustc --print=cfg -C target-cpu=native</code>.\nHere's what it looks like on one Linux 
machine:</p>\n<pre><code>debug_assertions\npanic=&quot;unwind&quot;\ntarget_abi=&quot;&quot;\ntarget_arch=&quot;x86_64&quot;\ntarget_endian=&quot;little&quot;\ntarget_env=&quot;gnu&quot;\ntarget_family=&quot;unix&quot;\ntarget_feature=&quot;adx&quot;\ntarget_feature=&quot;aes&quot;\ntarget_feature=&quot;avx&quot;\ntarget_feature=&quot;avx2&quot;\ntarget_feature=&quot;bmi1&quot;\ntarget_feature=&quot;bmi2&quot;\ntarget_feature=&quot;cmpxchg16b&quot;\ntarget_feature=&quot;f16c&quot;\ntarget_feature=&quot;fma&quot;\ntarget_feature=&quot;fxsr&quot;\ntarget_feature=&quot;lzcnt&quot;\ntarget_feature=&quot;movbe&quot;\ntarget_feature=&quot;pclmulqdq&quot;\ntarget_feature=&quot;popcnt&quot;\ntarget_feature=&quot;rdrand&quot;\ntarget_feature=&quot;rdseed&quot;\ntarget_feature=&quot;sse&quot;\ntarget_feature=&quot;sse2&quot;\ntarget_feature=&quot;sse3&quot;\ntarget_feature=&quot;sse4.1&quot;\ntarget_feature=&quot;sse4.2&quot;\ntarget_feature=&quot;ssse3&quot;\ntarget_feature=&quot;xsave&quot;\ntarget_feature=&quot;xsavec&quot;\ntarget_feature=&quot;xsaveopt&quot;\ntarget_feature=&quot;xsaves&quot;\ntarget_has_atomic=&quot;16&quot;\ntarget_has_atomic=&quot;32&quot;\ntarget_has_atomic=&quot;64&quot;\ntarget_has_atomic=&quot;8&quot;\ntarget_has_atomic=&quot;ptr&quot;\ntarget_os=&quot;linux&quot;\ntarget_pointer_width=&quot;64&quot;\ntarget_vendor=&quot;unknown&quot;\nunix\n</code></pre>\n<p><del>Aside: I'm not quite sure why, but this isn't a 1:1 match with <code>/proc/cpuinfo</code> on this box!\nIt definitely does support some AVX512 instructions,\nbut those don't show up in the native CPU options.\nIf anyone knows why, let me know!</del></p>\n<p><strong>UPDATE:</strong> Shortly after publishing this post, Rust 1.89 was released.\n<a href=\"https://github.com/rust-lang/rust/pull/138940\">This PR</a> linked in the release notes caught my eye.\nApparently the target features for AVX512 were not actually stable at the time of writing,\nbut they are now.\nRe-running the above 
command with version 1.89 of rustc now includes the AVX512 instructions.</p>\n<h1><a href=\"#checking-the-default-features\" aria-hidden=\"true\" class=\"anchor\" id=\"checking-the-default-features\"></a>Checking the default features</h1>\n<p>Perhaps the more interesting question which motivates this investigation is\nwhat the <em>defaults</em> are.\nYou can get this with <code>rustc --print cfg</code>.\nThis shows what you get when you run <code>cargo build</code> without any special configuration.\nHere's the output for the same machine:</p>\n<pre><code>debug_assertions\npanic=&quot;unwind&quot;\ntarget_abi=&quot;&quot;\ntarget_arch=&quot;x86_64&quot;\ntarget_endian=&quot;little&quot;\ntarget_env=&quot;gnu&quot;\ntarget_family=&quot;unix&quot;\ntarget_feature=&quot;fxsr&quot;\ntarget_feature=&quot;sse&quot;\ntarget_feature=&quot;sse2&quot;\ntarget_has_atomic=&quot;16&quot;\ntarget_has_atomic=&quot;32&quot;\ntarget_has_atomic=&quot;64&quot;\ntarget_has_atomic=&quot;8&quot;\ntarget_has_atomic=&quot;ptr&quot;\ntarget_os=&quot;linux&quot;\ntarget_pointer_width=&quot;64&quot;\ntarget_vendor=&quot;unknown&quot;\nunix\n</code></pre>\n<p>Well, that's disappointing, isn't it?\nBy default, you'd only get up to SSE2, which is over 20 years old by now!\nThis is a consequence of the diversity of the <code>x86_64</code> architecture.\nIf you want your binary to run <em>everywhere</em>, this is the price you'd have to pay.</p>\n<h1><a href=\"#enabling-features-individually\" aria-hidden=\"true\" class=\"anchor\" id=\"enabling-features-individually\"></a>Enabling features individually</h1>\n<p>While <code>-C target-cpu=native</code> will usually make your code faster on the build machine,\na lot of modern software is built by a CI pipeline on cheap runners, but deployed elsewhere.\nTo reliably target a specific set of features, use the <code>target-feature</code> flag.\nThis lets you specifically enable features you know will be available on the machine running the code.\nHere's 
an example of <code>RUSTFLAGS</code> that incorporates all of the above features.\nThis should enable builds to proceed from <em>any</em> other x86_64 Linux machine while producing a binary\nthat supports the exact features of the deployment machine.</p>\n<pre><code class=\"language-shell\">RUSTFLAGS=&quot;-C target-feature=+adx,+aes,+avx,+avx2,+bmi1,+bmi2,+cmpxchg16b,+f16c,+fma,+fxsr,+lzcnt,+movbe,+pclmulqdq,+popcnt,+rdrand,+rdseed,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+ssse3,+xsave,+xsavec,+xsaveopt,+xsaves&quot;\n</code></pre>\n<h1><a href=\"#enabling-features-by-x86-microarchitecture-level\" aria-hidden=\"true\" class=\"anchor\" id=\"enabling-features-by-x86-microarchitecture-level\"></a>Enabling features by x86 microarchitecture level</h1>\n<p>A few days after writing this, I accidentally stumbled upon something else when working out target flags\nfor a program I knew would have wider support across several datacenters.\nIt sure would be nice if there were some &quot;groups&quot; of commonly supported features, right?</p>\n<p>Turns out this exists, and it was staring right at me in the CPU list: microarchitecture levels!\nIf you list out all the available target CPUs via <code>rustc --print target-cpus</code> on a typical x86_64 Linux box,\nyou'll see that your default target CPU is <code>x86-64</code>.\nThis means it will run on all x86_64 CPUs, and as we discussed above, this doesn't give much of a baseline.\nBut there are 4 versions in total, going up to <code>x86-64-v4</code>.\nIt turns out that AMD, Intel, Red Hat, and SUSE got together in 2020 to define these,\nand came up with some levels which are specifically designed for our use case of optimizing compilers!\nYou can find the <a href=\"https://en.wikipedia.org/wiki/X86-64\">full list of supported features by level on Wikipedia</a>\n(search for &quot;microarchitecture levels&quot;).</p>\n<p><code>rustc --print target-cpus</code> will also tell you which <em>specific</em> CPU you're on.\nYou can use this 
info to find which &quot;level&quot; you support.\nBut a more direct way to map to level support is to run <code>/lib64/ld-linux-x86-64.so.2 --help</code>.\nThanks, internet!\nYou'll get some output like this on a modern CPU:</p>\n<pre><code>Subdirectories of glibc-hwcaps directories, in priority order:\n  x86-64-v4 (supported, searched)\n  x86-64-v3 (supported, searched)\n  x86-64-v2 (supported, searched)\n</code></pre>\n<p>And if you run on slightly older hardware, you might get something like this:</p>\n<pre><code>Subdirectories of glibc-hwcaps directories, in priority order:\n  x86-64-v4\n  x86-64-v3 (supported, searched)\n  x86-64-v2 (supported, searched)\n</code></pre>\n<p>This should help if you're trying to aim for broader distribution rather than enabling specific features for some known host.\nThe line to target an x86_64 microarch level is a lot shorter.\nFor example:</p>\n<pre><code>RUSTFLAGS=&quot;-C target-cpu=x86-64-v3&quot;\n</code></pre>\n<p><strong>NOTE:</strong> As mentioned above, Rust 1.89 was released shortly after this post.\nThis incidentally brings support for AVX512 CPU features in the <code>x86-64-v4</code> target CPU,\nwhich were previously marked unstable.</p>\n<h1><a href=\"#dont-forget-to-measure\" aria-hidden=\"true\" class=\"anchor\" id=\"dont-forget-to-measure\"></a>Don't forget to measure!</h1>\n<p>Enabling CPU features doesn't always make things faster.\nIn fact, in some cases, it can even do the opposite!\nThis <a href=\"https://internals.rust-lang.org/t/slower-code-with-c-target-cpu-native/17315\">thread</a>\nhas some interesting anecdotes.</p>\n<h1><a href=\"#summary-of-helpful-commands\" aria-hidden=\"true\" class=\"anchor\" id=\"summary-of-helpful-commands\"></a>Summary of helpful commands</h1>\n<p>In conclusion, here's a quick reference of the useful commands we covered:</p>\n<ul>\n<li><code>rustc --print cfg</code> - Shows the compiler configuration that your toolchain will use by default.</li>\n<li><code>rustc --print=cfg 
-C target-cpu=native</code> - List the configuration if you were to specifically target your CPU. Use this to see the delta between the defaults and the features supported for a specific CPU.</li>\n<li><code>rustc --print target-cpus</code> - List all known target CPUs. This also tells you what your current CPU is and what the default CPU is for your current toolchain.</li>\n<li><code>/lib64/ld-linux-x86-64.so.2 --help</code> - Specifically for x86_64 Linux users, will show you what microarchitecture levels your CPU supports.</li>\n<li><code>rustc --print target-features</code> - List <em>all available</em> target features with a short description. You can scope to a specific CPU with <code>-C target-cpu=</code>. Useful mostly to see what you're missing, I guess.</li>\n</ul>\n",
      "summary": "",
      "date_published": "2025-07-28T00:00:00-00:00",
      "image": "",
      "authors": [
        {
          "name": "Ian Wagner",
          "url": "https://fosstodon.org/@ianthetechie",
          "avatar": "media/avi.jpeg"
        }
      ],
      "tags": [
        "rust",
        "devops"
      ],
      "language": "en"
    },
    {
      "id": "https://ianwwagner.com//ownership-benefits-beyond-memory-safety.html",
      "url": "https://ianwwagner.com//ownership-benefits-beyond-memory-safety.html",
      "title": "Ownership Benefits Beyond Memory Safety",
      "content_html": "<p>Rust's ownership system is well-known for the ways it enforces memory safety guarantees.\nFor example, you can't use some value after it's been freed.\nFurther, it also ensures that mutability is explicit,\nand it enforces some extra rules that make <em>most</em> data races impossible.\nBut the ownership system has benefits beyond this which don't get as much press.</p>\n<p>Let's look at a fairly common design pattern: the builder.\nA builder typically takes zero or a few arguments to create.\nIn Rust, it's often implemented as a <code>struct</code> that implements the <code>Default</code> trait.\nThen, you progressively &quot;chain&quot; method invocations to ergonomically\nspecify how to build the thing you want.\nFor example:</p>\n<pre><code class=\"language-rust\">let client = reqwest::Client::builder()\n    .user_agent(APP_USER_AGENT)\n    .timeout(Duration::from_secs(3))\n    .build()?;\n</code></pre>\n<p>This pattern is useful because it avoids bloated constructors with dozens of arguments.\nIt also lets you encode <strong>fallibility</strong> into the process:\nsome combinations of arguments may be invalid.</p>\n<p>If you look at the signature for the <code>timeout</code> function,\nyou'll find it takes <code>self</code> as its first parameter and returns a value of the same type.\nThe key thing to note is that a non-reference <code>self</code> parameter\nwill &quot;consume&quot; the receiver!\nSince it takes ownership, you can't hold on to a reference to the original value!</p>\n<p>This prevents a whole class of subtle bugs.\nPython, for example, doesn't prevent you from modifying inputs,\nand it's not always clear if a function/method is supposed to return a new value,\nwhether that value has the same contents as the original reference (which is still valid!)\nor if it's completely fresh,\nand so on.</p>\n<p>A few other languages, mostly in the purely functional tradition (Haskell comes to mind)\nalso have a similar 
property.\nThey don't use a concept of &quot;ownership&quot; but rather remove mutability from the language.\nRust makes what I consider to be a nice compromise\nwhich retains most of the benefits while being easier to use.</p>\n<p>In summary, the borrow checker is a powerful ally,\nand you can leverage it to make truly better APIs,\nsaving hours of debugging in the future.</p>\n",
      "summary": "",
      "date_published": "2025-05-31T00:00:00-00:00",
      "image": "",
      "authors": [
        {
          "name": "Ian Wagner",
          "url": "https://fosstodon.org/@ianthetechie",
          "avatar": "media/avi.jpeg"
        }
      ],
      "tags": [
        "rust",
        "functional programming"
      ],
      "language": "en"
    },
    {
      "id": "https://ianwwagner.com//unicode-normalization.html",
      "url": "https://ianwwagner.com//unicode-normalization.html",
      "title": "Unicode Normalization",
      "content_html": "<p>Today I ran into an <a href=\"https://www.openstreetmap.org/node/9317391311/history/2\">amusingly named place</a>,\nthanks to some sharp eyes on the OpenStreetMap US Slack.\nThe name of this restaurant is listed as &quot;𝐊𝐄𝐁𝐀𝐁 𝐊𝐈𝐍𝐆 𝐘𝐀𝐍𝐆𝐎𝐍&quot;.\nThat isn't some font trickery; it's a bunch of Unicode math symbols\ncleverly used to emphasize the name.\n(Amusingly, this does not actually show up properly on most maps, but that's another story for another post).</p>\n<p>I was immediately curious how well the geocoder I spent the last few months building handles this.</p>\n<p><figure><img src=\"media/kebab-king-duplicates.png\" alt=\"A screenshot of a search result list showing two copies of the Kebab King Yangon, one in plain ASCII and the other using the math symbols\" /></figure></p>\n<p>Well, at least it found the place, despite the very SEO-unfriendly name!\nBut what's up with the second search result?</p>\n<p>That's a consequence of us pulling in data from multiple sources.\nIn this case, the second result comes from the <a href=\"https://opensource.foursquare.com/os-places/\">Foursquare OS Places</a> dataset.\nIt seems that either the Placemaker validators decided to clean this up,\nor the Foursquare user who added the place didn't have that key on their phone keyboard.</p>\n<p>One of the things our geocoder needs to do when combining sources is deduplicating results.\n(Beyond that, it needs to decide which results to keep, but that's a much longer post!)\nWe use a bunch of factors to make that decision, but one of them is roughly\n&quot;does this place have the same name,&quot; where <em>same</em> is a bit fuzzy.</p>\n<p>One of the ways we can do this is normalizing away things like punctuation and diacritics.\nThese are quite frequently inconsistent across datasets, so two nearby results with similar enough names\nare <em>probably</em> the same place.\nFortunately, Unicode provides a few standardized transformations into 
canonical forms\nthat make this easier.</p>\n<h1><a href=\"#composed-and-decomposed-characters\" aria-hidden=\"true\" class=\"anchor\" id=\"composed-and-decomposed-characters\"></a>Composed and decomposed characters</h1>\n<p>What we think of as a &quot;character&quot; does not necessarily have a single representation in Unicode.\nFor example, there are multiple ways of encoding &quot;서울&quot; which will look the same when rendered,\nbut have a different binary representation.\nThe Korean writing system is perhaps a less familiar case for many,\nbut characters with diacritical marks such as accents work the same way.\nThey can be either &quot;composed&quot; or &quot;decomposed&quot; into the component parts\nat the binary level.</p>\n<p>This composition and decomposition transform is useful for (at least) two reasons:</p>\n<ol>\n<li>It gives us a consistent form that allows for easy string comparison when multiple valid encodings exist.</li>\n<li>It lets us strip away parts that we don't want to consider in a comparison, like diacritics.</li>\n</ol>\n<p>I use the <a href=\"https://docs.rs/unicode-normalization/latest/unicode_normalization/\"><code>unicode_normalization</code></a> crate\nto do this &quot;decompose and filter&quot; operation.\nSpecifically, the <a href=\"https://docs.rs/unicode-normalization/latest/unicode_normalization/trait.UnicodeNormalization.html\"><code>UnicodeNormalization</code> trait</a>,\nwhich has helpers that work on most string-like types.</p>\n<h1><a href=\"#normalization-forms\" aria-hidden=\"true\" class=\"anchor\" id=\"normalization-forms\"></a>Normalization forms</h1>\n<p>You might notice there are four confusingly named methods in the trait:\n<code>nfd</code>, <code>nfkd</code>, <code>nfc</code>, and <code>nfkc</code>.\nThe <code>nf</code> stands for &quot;normalization form&quot;.\nThese functions <em>normalize</em> your strings.\n<code>c</code> and <code>d</code> stand for composition and decomposition.\nThe composed form 
is, roughly, the more compact form,\nwhereas the decomposed form is the version where you separate the base from the modifiers,\nthe <a href=\"https://en.wikipedia.org/wiki/List_of_Hangul_jamo\">jamo</a> from the syllables, etc.</p>\n<p>We were already decomposing strings so that we could remove the diacritics, using form NFD.\nThis works great for diacritics and even Hangul,\nbut 𝐊𝐄𝐁𝐀𝐁 𝐊𝐈𝐍𝐆 𝐘𝐀𝐍𝐆𝐎𝐍 shows that we were missing something.</p>\n<p>That something is the <code>k</code>, which stands for &quot;compatibility.&quot;\nYou can refer to <a href=\"https://www.unicode.org/reports/tr15/#Canon_Compat_Equivalence\">Unicode Standard Annex #15</a>\nfor a full definition,\nbut the intuition is that <em>compatibility</em> equivalence of two characters\nis a bit more permissive than the stricter <em>canonical</em> equivalence.\nBy reducing two characters (or strings) to their canonical form,\nyou will be able to tell if they represent the same &quot;thing&quot; with the same visual appearance,\nbehavior, semantic meaning, etc.\nCompatibility equivalence is a weaker form.</p>\n<p>Compatibility equivalence is extremely useful in our quest for determining whether two nearby place names\nare a fuzzy match.\nIt reduces things like ligatures, superscripts, and width variations into a standard form.\nIn the case of &quot;𝐊𝐄𝐁𝐀𝐁 𝐊𝐈𝐍𝐆 𝐘𝐀𝐍𝐆𝐎𝐍,&quot; compatibility decomposition transforms it into the ASCII\n&quot;KEBAB KING YANGON.&quot;\nAnd now we can correctly coalesce the available information into a single search result.</p>\n<p>Hopefully this shines a light on one small corner of the complexities of Unicode!</p>\n",
      "summary": "",
      "date_published": "2025-05-09T00:00:00-00:00",
      "image": "media/kebab-king-duplicates.png",
      "authors": [
        {
          "name": "Ian Wagner",
          "url": "https://fosstodon.org/@ianthetechie",
          "avatar": "media/avi.jpeg"
        }
      ],
      "tags": [
        "unicode",
        "rust"
      ],
      "language": "en"
    },
    {
      "id": "https://ianwwagner.com//databases-as-an-alternative-to-application-logging.html",
      "url": "https://ianwwagner.com//databases-as-an-alternative-to-application-logging.html",
      "title": "Databases as an Alternative to Application Logging",
      "content_html": "<p>In my <a href=\"https://stadiamaps.com/\">work</a>, I've been doing a lot of ETL pipeline design recently for our geocoding system.\nThe system processes on the order of a billion records per job,\nand failures are part of the process.\nWe want to log these.</p>\n<p>Most applications start by dumping logs to <code>stderr</code>.\nUntil they overflow their terminal scrollback buffer.\nThe next step is usually text files.\nBut getting insights from 10k+ lines of text with <code>grep</code> is a chore.\nIt may even be impossible unless you've taken extra care with how your logs are formatted.</p>\n<p>In this post we'll explore some approaches to doing application logging better.</p>\n<h1><a href=\"#structured-logging\" aria-hidden=\"true\" class=\"anchor\" id=\"structured-logging\"></a>Structured logging</h1>\n<p>My first introduction to logs with a structural element was probably Logcat for Android.\nLogcat lets you filter the fire hose of Android logs down to a specific application,\nand can even refine the scope further if you learn how to use it.\nLogcat is a useful tool, but fundamentally all it can do is <em>filter</em> logs from a stream\nand it has most of the same drawbacks as grepping plain text files.</p>\n<p>Larger systems often benefit from something like the <code>tracing</code> crate,\nwhich integrates with services like <code>journald</code> and Grafana Loki.\nThis is a great fit for a long-running <em>service</em>,\nbut is total overkill for an application that does some important stuff ™\nand exits.\nLike our ETL pipeline example.</p>\n<p>(Aside: I have a love/hate relationship with <code>journalctl</code>.\nI mostly interact with it through Ctrl+R in my shell history,\nwhich is problematic when connecting to a new server.\nBut it does have the benefit of being a nearly ubiquitous local structured logging system!)</p>\n<h1><a href=\"#databases-for-application-logs\" aria-hidden=\"true\" class=\"anchor\" 
id=\"databases-for-application-logs\"></a>Databases for application logs</h1>\n<p>Using a database as an application log can be a brilliant level up for many applications\nbecause you can actually <em>query</em> your logs with ease.\nI'll give a few examples, and then show some crazy cool stuff you can do with that.</p>\n<p>One type of failure we frequently encounter is metadata that looks like a URL where it shouldn't be.\nFor example, the name of a shop being <code>http://spam.example.com/</code>,\nor having a URL in an address or phone number field.\nIn this case, we usually drop the record, but we also want to log it so we can clean up the source data.\nSome other common failures are missing required fields, data in the wrong format, and the like.</p>\n<h2><a href=\"#a-good-schema-enables-analytics\" aria-hidden=\"true\" class=\"anchor\" id=\"a-good-schema-enables-analytics\"></a>A good schema enables analytics</h2>\n<p>Rather than logging these to <code>stderr</code> or some plain text files, we write to a DuckDB database.\nThis has a few benefits beyond the obvious.\nFirst, using a database forces you to come up with a schema.\nAnd just like using a language with types, this forces you to clarify your thinking a bit upfront.\nIn our case, we log things like the original data source, an ID, a log level (warn, error, info, etc.),\na failure code, and additional details.</p>\n<p>From here, we can do meaningful <em>analytical</em> queries like\n&quot;how many records were dropped due to invalid geographic coordinates&quot;\nor &quot;how many records were rejected due to metadata mismatches&quot;\n(ex: claiming to be a US address but appearing in North Korea).</p>\n<h2><a href=\"#cross-dataset-joins-anyone\" aria-hidden=\"true\" class=\"anchor\" id=\"cross-dataset-joins-anyone\"></a>Cross-dataset joins, anyone?</h2>\n<p>If this query uncovers a lot of rejected records from one data source,\nwouldn't it be nice if we could look at a sample?\nWe have the IDs right 
there in the log, and the data source identifier, after all.\nBut since we're in DuckDB rather than a plain text file,\nwe can pretty much effortlessly join on the data files!\n(This assumes that your data is in some halfway sane format like JSON, CSV, Parquet, or even another database).</p>\n<p>We can even take this one step further and compare logs across imports!\nWhat's up with that spike in errors compared to last month's release from that data source?</p>\n<p>These are the sort of insights which are almost trivial to uncover when your log is a database.</p>\n<h1><a href=\"#practical-bits\" aria-hidden=\"true\" class=\"anchor\" id=\"practical-bits\"></a>Practical bits</h1>\n<p>Now that I've described all the awesome things you can do,\nlet's get down to the practical questions like how you'd do this in your app.\nMy goals for the code were to make it easy to use and impossible to get wrong at the use site.\nFortunately that's pretty easy in Rust!</p>\n<pre><code class=\"language-rust\">#[derive(Clone)]\npub struct ImportLogger {\n    pool: Pool&lt;DuckdbConnectionManager&gt;,\n    // Implementation detail for our case: we have multiple ETL importers that share code AND logs.\n    // If you have any such attributes that will remain fixed over the life of a logger instance,\n    // consider storing them as struct fields so each event is easier to log.\n    importer_name: String,\n}\n</code></pre>\n<p>Pretty standard struct setup using DuckDB and <a href=\"https://github.com/sfackler/r2d2\"><code>r2d2</code></a> for connection pooling.\nWe put this in a shared logging crate in a workspace containing multiple importers.\nThe <code>importer_name</code> is a field that will get emitted with every log,\nand doesn't change for a logger instance.\nIf your logging has any such attributes (ex: a component name),\nstoring them as struct fields makes each log invocation easier!</p>\n<div class=\"markdown-alert markdown-alert-note\">\n<p 
class=\"markdown-alert-title\">Note</p>\n<p>At the time of this writing, I couldn't find any async connection pool integrations for DuckDB.\nIf anyone knows of one (or wants to add it to <a href=\"https://github.com/djc/bb8\"><code>bb8</code></a>), let me know!</p>\n</div>\n<pre><code class=\"language-rust\">pub fn new(config: ImportLogConfig, importer_name: String) -&gt; anyhow::Result&lt;ImportLogger&gt; {\n    let manager = DuckdbConnectionManager::file(config.import_log_path)?;\n    let pool = Pool::new(manager)?;\n\n    pool.get()?.execute_batch(include_str!(&quot;schema.sql&quot;))?;\n\n    Ok(Self {\n        pool,\n        importer_name,\n    })\n}\n</code></pre>\n<p>The constructor isn't anything special; it sets up a DuckDB connection to a file-backed database\nbased on our configuration.\nIt also initializes the schema from a file.\nThe schema file lives in the source tree, but the lovely <a href=\"https://doc.rust-lang.org/std/macro.include_str.html\"><code>include_str!</code></a>\nmacro bakes it into a static string at compile time (so we can still distribute a single binary).</p>\n<pre><code class=\"language-rust\">pub fn log(&amp;self, level: Level, source: &amp;str, id: Option&lt;&amp;str&gt;, code: &amp;str, reason: &amp;str) {\n    log::log!(level, &quot;{code}\\t{source}\\t{id:?}\\t{reason}&quot;);\n    let conn = match self.pool.get() {\n        Ok(conn) =&gt; conn,\n        Err(e) =&gt; {\n            log::error!(&quot;failed to get connection: {}&quot;, e);\n            return;\n        }\n    };\n    match conn.execute(\n        &quot;INSERT INTO logs VALUES (current_timestamp, ?, ?, ?, ?, ?, ?)&quot;,\n        params![level.as_str(), self.importer_name, source, id, code, reason],\n    ) {\n        Ok(_) =&gt; (),\n        Err(e) =&gt; log::error!(&quot;Failed to insert log entry: {}&quot;, e),\n    }\n}\n</code></pre>\n<p>And now the meat of the logging!\nThe <code>log</code> method does what you'd expect.\nThe signature is a reflection of 
the schema:\nwhat you need to log, what you may optionally log, and what type of data you're logging.</p>\n<p>For our use case, we decided to additionally log via the <code>log</code> crate.\nThis way, we can see critical errors on the console as the job is running.</p>\n<p>And that's pretty much it!\nIt took significantly more time to write this post than to actually write the code.\nSomeone could probably write a macro-based crate to generate these sorts of loggers if they had some spare time ;)</p>\n<h2><a href=\"#bonus-filter_log\" aria-hidden=\"true\" class=\"anchor\" id=\"bonus-filter_log\"></a>Bonus: <code>filter_log</code></h2>\n<p>We have a pretty common pattern in our codebase,\nwhere most operations / pipeline stages yield results,\nand we want to chain these together.\nWhen it succeeds, we pass the result on to the next stage.\nOtherwise, we want to log what went wrong.</p>\n<p>We called this <code>filter_log</code> because it usually shows up in <code>filter_map</code> over streams\nand as such yields an <code>Option&lt;T&gt;</code>.</p>\n<p>This was extremely easy to add to our logging struct,\nand saves loads of boilerplate!</p>\n<pre><code class=\"language-rust\">/// Converts a result to an option, logging the failure if the result is an `Err` variant.\npub fn filter_log&lt;T, E: Debug&gt;(\n    &amp;self,\n    level: Level,\n    source: &amp;str,\n    id: Option&lt;&amp;str&gt;,\n    code: &amp;str,\n    result: Result&lt;T, E&gt;,\n) -&gt; Option&lt;T&gt; {\n    match result {\n        Ok(result) =&gt; Some(result),\n        Err(err) =&gt; {\n            self.log(level, source, id, code, &amp;format!(&quot;{:?}&quot;, err));\n            None\n        }\n    }\n}\n</code></pre>\n<h1><a href=\"#conclusion\" aria-hidden=\"true\" class=\"anchor\" id=\"conclusion\"></a>Conclusion</h1>\n<p>The concept of logging to a database is not at all original with me.\nMany enterprise services log extensively to special database tables.\nBut I think the 
technique is rarely applied to applications.</p>\n<p>Hopefully this post convinced you to give it a try in the next situation where it makes sense.</p>\n",
      "summary": "",
      "date_published": "2025-01-13T00:00:00-00:00",
      "image": "",
      "authors": [
        {
          "name": "Ian Wagner",
          "url": "https://fosstodon.org/@ianthetechie",
          "avatar": "media/avi.jpeg"
        }
      ],
      "tags": [
        "software-engineering",
        "duckdb",
        "databases",
        "rust"
      ],
      "language": "en"
    },
    {
      "id": "https://ianwwagner.com//the-rust-toolchain-toml-file.html",
      "url": "https://ianwwagner.com//the-rust-toolchain-toml-file.html",
      "title": "The rust-toolchain.toml file",
      "content_html": "<p>This isn't so much a TIL as a quick PSA.\nIf you're a Rust developer and need to ensure specific things about your toolchain,\nthe <code>rust-toolchain.toml</code> file is a real gem!</p>\n<p>I don't quite remember how, but I accidentally discovered this file a year or two ago.\nSince then, I've spread the good news to at least half a dozen other devs,\nand most of them simply had no idea it existed.\nSo, without further ado...</p>\n<h1><a href=\"#what-does-the-file-do\" aria-hidden=\"true\" class=\"anchor\" id=\"what-does-the-file-do\"></a>What does the file do?</h1>\n<p><code>rust-toolchain.toml</code> is a file that lets you specify certain things about your Rust toolchain.\nFor example, if you need to use nightly rust for a project,\nyou can specify that in your toolchain file.\nIt also lets you specify other cargo components to install\nand specify cross-compilation targets you want to have available.</p>\n<h1><a href=\"#why-would-i-need-this\" aria-hidden=\"true\" class=\"anchor\" id=\"why-would-i-need-this\"></a>Why would I need this?</h1>\n<p>The headline use case in <a href=\"https://rust-lang.github.io/rustup/overrides.html#the-toolchain-file\">The rustup book</a>\nis to pin to a specific release.\nThis is pretty rare in practice I think, unless you need <code>nightly</code>.\nYou can specify channels like <code>nightly</code>, <code>stable</code>, and <code>beta</code> in addition to specific releases.</p>\n<p>The killer use case in my opinion is for easier cross-compilation.\nI do a lot of cross compiling, and codifying all required targets in a single file makes life much easier!</p>\n<p>The best part is that, as long as you're using <code>rustup</code>, everything is automatic!\nFor projects with a large number of collaborators (like an open-source library),\nthis makes it a lot easier to onboard new devs.</p>\n<h2><a href=\"#what-if-im-not-using-rustup\" aria-hidden=\"true\" class=\"anchor\" 
id=\"what-if-im-not-using-rustup\"></a>What if I'm not using rustup?</h2>\n<p>Not everyone uses rustup.\nFor example, some devs I know use nix.\nWhen I asked one of them about how to do this without duplicating work,\nthey suggested <a href=\"https://github.com/nix-community/fenix\">Fenix</a>,\nwhich is able to consume the <code>rust-toolchain.toml</code>.</p>\n<p>If you have suggestions or experiences with other environments,\nlet me know and I'll update this post.\nContact links in the footer.</p>\n<h1><a href=\"#show-me-an-example\" aria-hidden=\"true\" class=\"anchor\" id=\"show-me-an-example\"></a>Show me an example!</h1>\n<p>Here's what the file looks like for a cross-platform mobile library that I maintain:</p>\n<pre><code class=\"language-toml\">[toolchain]\nchannel = &quot;stable&quot;\ntargets = [\n    # iOS\n    &quot;aarch64-apple-ios&quot;,\n    &quot;x86_64-apple-ios&quot;,\n    &quot;aarch64-apple-ios-sim&quot;,\n\n    # Android\n    &quot;armv7-linux-androideabi&quot;,\n    &quot;i686-linux-android&quot;,\n    &quot;aarch64-linux-android&quot;,\n    &quot;x86_64-linux-android&quot;,\n    &quot;x86_64-unknown-linux-gnu&quot;,\n    &quot;x86_64-apple-darwin&quot;,\n    &quot;aarch64-apple-darwin&quot;,\n    &quot;x86_64-pc-windows-gnu&quot;,\n    &quot;x86_64-pc-windows-msvc&quot;,\n\n    # WebAssembly\n    &quot;wasm32-unknown-unknown&quot;\n]\ncomponents = [&quot;clippy&quot;, &quot;rustfmt&quot;]\n</code></pre>\n",
      "summary": "",
      "date_published": "2025-01-13T00:00:00-00:00",
      "image": "",
      "authors": [
        {
          "name": "Ian Wagner",
          "url": "https://fosstodon.org/@ianthetechie",
          "avatar": "media/avi.jpeg"
        }
      ],
      "tags": [
        "rust",
        "cross-compilation"
      ],
      "language": "en"
    },
    {
      "id": "https://ianwwagner.com//conserving-memory-while-streaming-from-duckdb.html",
      "url": "https://ianwwagner.com//conserving-memory-while-streaming-from-duckdb.html",
      "title": "Conserving Memory while Streaming from DuckDB",
      "content_html": "<p>In the weeks since my previous post on <a href=\"working-with-arrow-and-duckdb-in-rust.html\">Working with Arrow and DuckDB in Rust</a>,\nI've found a few gripes that I'd like to address.</p>\n<h1><a href=\"#memory-usage-of-query_arrow-and-stream_arrow\" aria-hidden=\"true\" class=\"anchor\" id=\"memory-usage-of-query_arrow-and-stream_arrow\"></a>Memory usage of <code>query_arrow</code> and <code>stream_arrow</code></h1>\n<p>In the previous post, I used the <code>query_arrow</code> API.\nIt's pretty straightforward and gives you iterator-compatible access to the query results.\nHowever, there's one small problem: its memory consumption scales roughly linearly with your result set.</p>\n<p>This isn't a problem for many uses of DuckDB, but if your datasets are in the tens or hundreds of gigabytes\nand you want to process a large number of rows, the RAM requirements can be excessive.\nThe memory profile of <code>query_arrow</code> seems to be &quot;create all of the <code>RecordBatch</code>es upfront\nand keep them around for as long as you hold the <code>Arrow</code> handle.&quot;</p>\n<div class=\"markdown-alert markdown-alert-note\">\n<p class=\"markdown-alert-title\">Disclaimer</p>\n<p>I have <strong>not</strong> done extensive allocation-level memory profiling as of this writing.\nIt's quite possible that I've missed something, but this seems to be what's happening\nfrom watching Activity Monitor.\nPlease let me know if I've misrepresented anything!</p>\n</div>\n<p>Fortunately, DuckDB also has another API: <a href=\"https://docs.rs/duckdb/latest/duckdb/struct.Statement.html#method.stream_arrow\"><code>stream_arrow</code></a>.\nThis appears to allocate <code>RecordBatch</code>es on demand rather than all at once.\nThere is also some overhead that varies with result size, which I'll revisit later.\nBut overall, profiling indicates that <code>stream_arrow</code> requires significantly less RAM over the life of a large <code>Arrow</code> 
iterator.</p>\n<p>Unfortunately, none of the above information about memory consumption appears to be documented,\nand there are no (serious) code samples demonstrating the use of <code>stream_arrow</code>!</p>\n<div class=\"markdown-alert markdown-alert-note\">\n<p class=\"markdown-alert-title\">Down the rabbit hole...</p>\n<p>Digging into the code in duckdb-rs raises even more questions,\nsince several underlying C functions, like <a href=\"https://duckdb.org/docs/api/c/api.html\"><code>duckdb_execute_prepared_streaming</code></a>\nare marked as deprecated.\nPresumably, alternatives are being developed or the methods are just not stable yet.</p>\n</div>\n<h1><a href=\"#getting-a-schemaref\" aria-hidden=\"true\" class=\"anchor\" id=\"getting-a-schemaref\"></a>Getting a <code>SchemaRef</code></h1>\n<p>The signature of <code>stream_arrow</code> is a bit different from that of <code>query_arrow</code>.\nHere's what it looks like as of crate version 1.1.1:</p>\n<pre><code class=\"language-rust\">pub fn stream_arrow&lt;P: Params&gt;(\n    &amp;mut self,\n    params: P,\n    schema: SchemaRef,\n) -&gt; Result&lt;ArrowStream&lt;'_&gt;&gt;\n</code></pre>\n<p>This looks pretty familiar at first if you've used <code>query_arrow</code>,\nbut there's a new third parameter: <code>schema</code>.\n<code>SchemaRef</code> is just a type alias for <code>Arc&lt;Schema&gt;</code>.\nArrow objects have a schema associated with them,\nso this is a reasonable detail for a low-level API.\nBut DuckDB is fine at inferring this when needed!\nSurely there is a way of getting it from a query, right?\n(After all, <code>query_arrow</code> has to do something similar, but doesn't burden the caller.)</p>\n<p>My first attempt at getting a <code>Schema</code> object was to call the <a href=\"https://docs.rs/duckdb/latest/duckdb/struct.Statement.html#method.schema\"><code>schema()</code></a> method on <code>Statement</code>.\nThe <code>Statement</code> type in duckdb-rs is actually a high-level wrapper around <code>RawStatement</code>,\nand at the time of 
this writing, the schema getter <a href=\"https://github.com/duckdb/duckdb-rs/blob/2bd811e7b1b7398c4f461de4de263e629572dc90/crates/duckdb/src/raw_statement.rs#L212\">hides an <code>unwrap</code></a>.\nThe docs do tell you this (using a somewhat nonstandard heading?),\nbut basically you can't get a schema without executing a query.\nI wish they used the <a href=\"https://cliffle.com/blog/rust-typestate/\">Typestate pattern</a>\nor at least made the result an <code>Option</code>, but alas...</p>\n<p>This leaves developers with three options.</p>\n<ol>\n<li>Construct the schema manually.</li>\n<li>Construct a different <code>Statement</code> that uses the same SQL, but with a <code>LIMIT 0</code> clause at the end.</li>\n<li>Execute the statement, but don't load all the results into RAM.</li>\n</ol>\n<h2><a href=\"#manually-construct-a-schema\" aria-hidden=\"true\" class=\"anchor\" id=\"manually-construct-a-schema\"></a>Manually construct a Schema?</h2>\n<p>Manually constructing the schema is a non-starter for me.\nA program in which hand-written code depends on the contents of a SQL string is a terrible idea\non several levels.\nBesides, DuckDB clearly <em>can</em> infer the schema in <code>query_arrow</code>, so why not here?</p>\n<h2><a href=\"#query-another-nearly-identical-statement\" aria-hidden=\"true\" class=\"anchor\" id=\"query-another-nearly-identical-statement\"></a>Query another, nearly identical statement</h2>\n<p>The second idea is, amusingly, what ChatGPT o1 suggested (after half a dozen prompts;\nit seems like it will just confidently refuse to fetch documentation now,\nand hallucinates new APIs based on its outdated training data).\nThe basic idea is to add <code>LIMIT 0</code> to the end of the original query\nso it's able to get the schema, but doesn't actually return any results.</p>\n<pre><code class=\"language-rust\">fn fetch_schema_for_query(db: &amp;Connection, sql: &amp;str) -&gt; duckdb::Result&lt;SchemaRef&gt; {\n    // Append &quot;LIMIT 0&quot; to the original query, so we don't actually fetch anything\n    // NB: This does NOT handle cases such as the original query ending in a semicolon!\n    let schema_sql = format!(&quot;{} LIMIT 0&quot;, sql);\n\n    let mut statement = db.prepare(&amp;schema_sql)?;\n    let arrow_result = statement.query_arrow([])?;\n\n    Ok(arrow_result.get_schema())\n}\n</code></pre>\n<p>There is nothing fundamentally unsound about this approach.\nBut it requires string manipulation, which is less than ideal.\nThere is also at least one obvious edge case.</p>\n<h2><a href=\"#execute-the-stamement-without-loading-all-results-first\" aria-hidden=\"true\" class=\"anchor\" id=\"execute-the-stamement-without-loading-all-results-first\"></a>Execute the statement without loading all results first</h2>\n<p>The third option is not as straightforward as I expected it to be.\nAt first, I tried the <code>row_count</code> method,\nbut internally this <a href=\"https://github.com/duckdb/duckdb-rs/blob/2bd811e7b1b7398c4f461de4de263e629572dc90/crates/duckdb/src/raw_statement.rs#L79\">just calls a single FFI function</a>.\nThis doesn't actually update the internal <code>schema</code> field.\nYou really <em>do</em> need to run through a more &quot;normal&quot; execution path.</p>\n<p>A solution that <em>seems</em> reasonably clean is to do what the docs say and call <code>stmt.execute()</code>.\nIt's a bit strange to do this on a <code>SELECT</code> query to be honest,\nbut the API does indeed internally mutate the <code>Schema</code> property,\n<em>and</em> returns a row count.\nSo it seems semantically equivalent to a <code>SELECT COUNT(*) FROM (...)</code>\n(and in my case, getting the row count was helpful too).</p>\n<p>In my testing, it <em>appears</em> that this may actually allocate a non-trivial amount of memory,\nwhich may be mildly surprising.\nHowever, the peak amount of memory required during execution is definitely lower overall.\nAny ideas why this is?</p>\n<h1><a href=\"#full-example-using-stream_arrow\" aria-hidden=\"true\" class=\"anchor\" id=\"full-example-using-stream_arrow\"></a>Full example using <code>stream_arrow</code></h1>\n<p>Let's bring what we've learned into a &quot;real&quot; example.</p>\n<pre><code class=\"language-rust\">// let sql = &quot;SELECT * FROM table;&quot;;\nlet mut stmt = conn.prepare(sql)?;\n// Execute the query (so we have a usable schema)\nlet size = stmt.execute([])?;\n// Now we run the &quot;real&quot; query using `stream_arrow`.\n// This returned in a few hundred milliseconds for my dataset.\nlet mut arrow = stmt.stream_arrow([], stmt.schema())?;\n// Iterate over arrow...\n</code></pre>\n<p>When you structure your code like this rather than using the easier <code>query_arrow</code>,\nyou can significantly reduce your memory footprint for large datasets.\nIn my testing, there was no appreciable impact on performance.</p>\n<h1><a href=\"#open-questions\" aria-hidden=\"true\" class=\"anchor\" id=\"open-questions\"></a>Open Questions</h1>\n<p>The above leaves me with a few open questions.\nFirst, with my use case (a dataset of around 12GB of Parquet files), <code>execute</code> took several <em>seconds</em>.\nThe &quot;real&quot; <code>stream_arrow</code> query took a few hundred milliseconds.\nWhat's going on here?\nPerhaps it's doing a scan and/or caching some data initially in a way that makes subsequent queries faster?</p>\n<p>Additionally, the memory profile does have a &quot;spike&quot; which makes me wonder what exactly each step loads into RAM,\nand thus, the memory requirements for working with extremely large datasets.\nIn my testing, adding a <code>WHERE</code> clause that significantly reduces the result set\nDOES reduce the memory footprint.\nThat's somewhat worrying to me, since it implies there is still measurable overhead\nproportional to the dataset size.\nWhat practical limits does this impose on dataset size?</p>\n<div class=\"markdown-alert markdown-alert-note\">\n<p class=\"markdown-alert-title\">Note</p>\n<p>An astute reader may be asking whether the memory profiles of the <code>LIMIT 0</code> and <code>execute</code> approaches are equivalent.\nThe answer appears to be yes.</p>\n</div>\n<p>I've <a href=\"https://github.com/duckdb/duckdb-rs/issues/418\">opened issue #418</a>\nasking for clarification.\nIf any readers have any insights, post them in the issue thread!</p>\n",
      "summary": "",
      "date_published": "2024-12-31T00:00:00-00:00",
      "image": "",
      "authors": [
        {
          "name": "Ian Wagner",
          "url": "https://fosstodon.org/@ianthetechie",
          "avatar": "media/avi.jpeg"
        }
      ],
      "tags": [
        "rust",
        "apache arrow",
        "parquet",
        "duckdb",
        "big data",
        "data engineering"
      ],
      "language": "en"
    },
    {
      "id": "https://ianwwagner.com//how-and-why-to-work-with-arrow-and-duckdb-in-rust.html",
      "url": "https://ianwwagner.com//how-and-why-to-work-with-arrow-and-duckdb-in-rust.html",
      "title": "How (and why) to work with Arrow and DuckDB in Rust",
      "content_html": "<p>My day job involves wrangling a lot of data very fast.\nI've heard a lot of people raving about several technologies like DuckDB,\n(Geo)Parquet, and Apache Arrow recently.\nBut despite being an &quot;early adopter,&quot;\nit took me quite a while to figure out how and why to leverage these practically.</p>\n<p>Last week, a few things &quot;clicked&quot; for me, so I'd like to share what I learned in case it helps you.</p>\n<h1><a href=\"#geoparquet\" aria-hidden=\"true\" class=\"anchor\" id=\"geoparquet\"></a>(Geo)Parquet</h1>\n<p>(Geo)Parquet is quite possibly the best understood tech in the mix.\nIt is not exactly new.\nParquet has been around for quite a while in the big data ecosystem.\nIf you need a refresher, the <a href=\"https://guide.cloudnativegeo.org/geoparquet/\">Cloud-optimized Geospatial Formats Guide</a>\ngives a great high-level overview.</p>\n<p>Here are the stand-out features:</p>\n<ul>\n<li>It has a schema and some data types, unlike CSV (you can even have maps and lists!).</li>\n<li>On disk, values are written in groups per <em>column</em>, rather than writing one row at a time.\nThis makes the data much easier to compress, and lets readers easily skip over data they don't need.</li>\n<li>Statistics at several levels enable &quot;predicate pushdown.&quot; Even though the files are columnar in nature,\nyou can narrow which files and &quot;row groups&quot; within each file have the data you need!</li>\n</ul>\n<p>Practically speaking, Parquet lets you distribute large datasets in <em>one or more</em> files\nwhich will be significantly <em>smaller and faster to query</em> than other familiar formats.</p>\n<h2><a href=\"#why-you-should-care\" aria-hidden=\"true\" class=\"anchor\" id=\"why-you-should-care\"></a>Why you should care</h2>\n<p>The value proposition is clear for big data processing.\nIf you're trying to get a record of all traffic accidents in California,\nor find the hottest restaurants in Paris based on a multi-terabyte dataset,\nparquet provides clear advantages.\nYou can skip row groups within the parquet file or even whole files\nto narrow your search!\nAnd since datasets can be split across files,\nyou can keep adding to the dataset over time, parallelize queries,\nand do other nice things.</p>\n<p>But what if you're not doing these high-level analytical things?\nWhy not just use a more straightforward format like CSV\nthat avoids the need to &quot;rotate&quot; back into rows\nfor non-aggregation use cases?\nHere are a few reasons to like Parquet:</p>\n<ul>\n<li>You actually have a schema! This means less format shifting and validation in your code.</li>\n<li>Operating on row groups turns out to be pretty efficient, even when you're reading the whole dataset.\nCombining batch reads with compression, your processing code will usually get faster.</li>\n<li>It's designed to be readable from object storage.\nThis means you can often process massive datasets from your laptop.\nParquet readers are smart and can skip over data you don't need.\nYou can't do this with CSV.</li>\n</ul>\n<p>The upshot of all this is that it generally gets both <em>easier</em> and <em>faster</em>\nto work with your data...\nprovided that you have the right tools to leverage it.</p>\n<h1><a href=\"#duckdb\" aria-hidden=\"true\" class=\"anchor\" id=\"duckdb\"></a>DuckDB</h1>\n<p>DuckDB describes itself as an in-process, portable, feature-rich, and fast database\nfor analytical workloads.\nDuckDB was the tool that triggered my &quot;lightbulb moment&quot; last week.\nFoursquare, an app which I've used for a decade or more,\nrecently released an <a href=\"https://location.foursquare.com/resources/blog/products/foursquare-open-source-places-a-new-foundational-dataset-for-the-geospatial-community/\">open data set</a>,\nwhich was pretty cool!\nIt was also in Parquet format (just like <a href=\"https://overturemaps.org/\">Overture</a>'s data sets).</p>\n<p>You can't just open up a Parquet file in a text editor or spreadsheet software like you can with a CSV.\nMy friend Oliver released a <a href=\"https://wipfli.github.io/foursquare-os-places-pmtiles/\">web-based demo</a>\na few weeks ago which lets you inspect the data on a map at the point level.\nBut to do more than spot checking, you'll probably want a database that can work with Parquet.\nAnd that's where DuckDB comes in.</p>\n<h2><a href=\"#why-you-should-care-1\" aria-hidden=\"true\" class=\"anchor\" id=\"why-you-should-care-1\"></a>Why you should care</h2>\n<h3><a href=\"#its-embedded\" aria-hidden=\"true\" class=\"anchor\" id=\"its-embedded\"></a>It's embedded</h3>\n<p>I understood the in-process part of DuckDB's value proposition right away.\nIt's similar to SQLite, where you don't have to go through a server\nor over an HTTP connection.\nThis is both simpler to reason about and <a href=\"quadrupling-the-performance-of-a-data-pipeline.html\">is usually quite a bit faster</a>\nthan having to call out to a separate service!</p>\n<p>DuckDB is pretty quick to compile from source.\nYou probably don't need to muck around with this if you're just using the CLI,\nbut I wanted to eventually use it embedded in some Rust code.\nCompiling from source turned out to be the easiest way to get their crate working.\nIt looks for a shared library by default, but I couldn't get this working after a <code>brew</code> install.\nThis was mildly annoying, but on the other hand,\nvendoring the library does make consistent Docker builds easier 🤷🏻‍♂️</p>\n<h3><a href=\"#features-galore\" aria-hidden=\"true\" class=\"anchor\" id=\"features-galore\"></a>Features galore!</h3>\n<p>DuckDB includes a mind-boggling number of features.\nNot in a confusing way; more in a Python stdlib way where just about everything you'd want is already there.\nYou can query a whole directory (or bucket) of CSV files,\na Postgres database, SQLite, or even an OpenStreetMap PBF file 🤯\nYou can even write a SQL query against a glob expression of Parquet files in S3\nas your &quot;table.&quot;\n<strong>That's really cool!</strong>\n(If you've been around the space, you may recognize this concept from\nAWS Athena and others.)</p>\n<h3><a href=\"#speed\" aria-hidden=\"true\" class=\"anchor\" id=\"speed\"></a>Speed</h3>\n<p>Writing a query against a local directory of files is actually really fast!\nIt does a bit of munging upfront, and yes,\nit's not quite as fast as if you'd prepped the data into a clean table,\nbut you actually can run quite efficient queries this way locally!</p>\n<p>When running a query against local data,\nDuckDB will make liberal use of your system memory\n(the default is 80% of system RAM)\nand as many CPUs as you can throw at it.\nBut it will reward you with excellent response times,\ncourtesy of the &quot;vectorized&quot; query engine.\nWhat I've heard of the design reminds me of how array-oriented programming languages like APL\n(or less esoteric libraries like numpy) are often implemented.</p>\n<p>I was able to do some spatial aggregation operations\n(bucketing a filtered list of locations by H3 index)\nin about <strong>10 seconds on a dataset of more than 40 million rows</strong>!\n(The full dataset is over 100 million rows, so I also got to see the selective reading in action.)\nThat piqued my interest, to say the least.\n(Here's the result of that query, visualized).</p>\n<p><figure><img src=\"media/foursquare-os-places-density-2024.png\" alt=\"A map of the world showing heavy density in the US, southern Canada, central Mexico, parts of coastal South America, Europe, Korea, Japan, parts of SE Asia, and Australia\" /></figure></p>\n<h3><a href=\"#that-analytical-thing\" aria-hidden=\"true\" class=\"anchor\" id=\"that-analytical-thing\"></a>That analytical thing...</h3>\n<p>And now for the final buzzword in DuckDB's marketing: analytical.\nDuckDB frequently describes itself as optimized for OLAP (OnLine Analytical Processing) workloads.\nThis is contrasted with OLTP (OnLine Transaction Processing).\n<a href=\"https://en.wikipedia.org/wiki/Online_analytical_processing\">Wikipedia</a> will tell you some differences\nin a lot of sweepingly broad terms, like being used for &quot;business reporting&quot; and read operations\nrather than &quot;transactions.&quot;</p>\n<p>When reaching for a definition, many sources focus on things like <em>aggregation</em> queries\nas a differentiator.\nThis didn't help, since most of my use cases involve slurping most or all of the data set.\nThe DuckDB marketing and docs didn't help clarify things either.</p>\n<p>Let me know on Mastodon if you have a better explanation of what an &quot;analytical&quot; database is 🤣</p>\n<p>I think a better explanation is probably 1) you do mostly <em>read</em> queries,\nand 2) it can execute highly parallel queries.\nSo far, DuckDB has been excellent for both the &quot;aggregate&quot; and the &quot;iterative&quot; use case.\nI assume it's just not the best choice per se if your workload is a lot of single-record writes?</p>\n<h2><a href=\"#how-im-using-duckdb\" aria-hidden=\"true\" class=\"anchor\" id=\"how-im-using-duckdb\"></a>How I'm using DuckDB</h2>\n<p>Embedding DuckDB in a Rust project allowed me to deliver something with a better end-user experience\nthat is easier to maintain,\nand it saved writing hundreds of lines of code in the process.</p>\n<p>Most general-purpose languages like Python and Rust\ndon't have primitives for expressing things like joins across datasets.\nDuckDB, like most database systems, does!\nYes, I <em>could</em> write some code using the <code>parquet</code> crate\nthat would filter across a nested directory tree of 5,000 files.\nBut DuckDB does that out of the box!</p>\n<p>It feels like this is a &quot;regex moment&quot; for data processing.\nJust like you don't (usually) need to hand-roll string processing,\nthere's now little reason to hand-roll data aggregation.</p>\n<p>For the above visualization, I used the Rust DuckDB crate for the data processing,\nconverted the results to JSON,\nand served it up from an Axum web server.\nAll in a <em>single binary</em>!\nThat's a lot nicer than a bash script that executes SQL,\ndumps to a file, and then starts up a Python or Node web server!\nA script like that breaks when you don't have Python or Node installed,\nwhen your OS changes its default shell,\nor when you forget that some awk flag doesn't work on the GNU version,\nand so on.</p>\n<h1><a href=\"#apache-arrow\" aria-hidden=\"true\" class=\"anchor\" id=\"apache-arrow\"></a>Apache Arrow</h1>\n<p>The final thing I want to touch on is <a href=\"https://arrow.apache.org/\">Apache Arrow</a>.\nThis is yet another incredibly useful technology which I've been following for a while,\nbut never quite figured out how to properly use until last week.</p>\n<p>Arrow is a <em>language-independent memory format</em>\nthat's <em>optimized for efficient analytic operations</em> on modern CPUs and GPUs.\nThe core idea is that, rather than having to convert data from one format to another (this implies copying!),\nArrow defines a shared memory format which many systems understand.\nIn practice, this ends up being a bunch of standards which define common representations for different types,\nand libraries for working with them.\nFor example, the <a href=\"https://geoarrow.org/\">GeoArrow</a> spec\nbuilds on the Arrow ecosystem to enable operations on spatial data in a common memory format.\nPretty cool!</p>\n<h2><a href=\"#why-you-should-care-2\" aria-hidden=\"true\" class=\"anchor\" id=\"why-you-should-care-2\"></a>Why you should care</h2>\n<p>It turns out that copying and format shifting data can really eat into your processing times.\nArrow helps you sidestep that by reducing the amount of both you'll need to do,\nand by working on data in groups.</p>\n<h2><a href=\"#how-the-heck-to-use-it\" aria-hidden=\"true\" class=\"anchor\" id=\"how-the-heck-to-use-it\"></a>How the heck to use it?</h2>\n<p>Arrow is mostly hidden from view beneath other libraries.\nSo most of the time, especially if you're writing in a very high level language like Python,\nyou won't even see it.</p>\n<p>But if you're writing something at a slightly lower level,\nit's something you may have to touch for critical sections.\nThe <a href=\"https://docs.rs/duckdb/latest/duckdb/\">DuckDB crate</a>\nincludes an <a href=\"https://docs.rs/duckdb/latest/duckdb/struct.Statement.html#method.query_arrow\">Arrow API</a>\nwhich will give you an iterator over <code>RecordBatch</code>es.\nThis is pretty convenient, since you can use DuckDB to gather all your data\nand just consume the stream of batches!</p>\n<p>So, how do we work with <code>RecordBatch</code>es?\nThe Arrow ecosystem, like Parquet, takes a lot of work to understand,\nand using the low-level libraries directly is difficult.\nEven as a seasoned Rustacean, I found the docs rather obtuse.</p>\n<p>After some searching, I finally found <a href=\"https://docs.rs/serde_arrow/\"><code>serde_arrow</code></a>.\nIt builds on the <code>serde</code> ecosystem with easy-to-use methods that operate on <code>RecordBatch</code>es.\nFinally, something I can use!</p>\n<p>I was initially worried about how performant the shift from columns to rows + any (minimal) <code>serde</code> overhead would be,\nbut this turned out to not be an issue.</p>\n<p>Here's how the code looks:</p>\n<pre><code class=\"language-rust\">serde_arrow::from_record_batch::&lt;Vec&lt;FoursquarePlaceRecord&gt;&gt;(&amp;batch)\n</code></pre>\n<p>A few combinators later and you've got a proper data pipeline!</p>\n<h1><a href=\"#review-what-this-enables\" aria-hidden=\"true\" class=\"anchor\" id=\"review-what-this-enables\"></a>Review: what this enables</h1>\n<p>What this ultimately enabled for me was being able to get a lot closer to &quot;scripting&quot;\na pipeline in Rust.\nMost people turn to Python or JavaScript for tasks like this,\nbut Rust has something to add: strong typing and all the related guarantees <em>which can only come with some level of formalism</em>.\nBut that doesn't necessarily have to get in the way of productivity!</p>\n<p>Hopefully this sparks some ideas for making your next data pipeline both fast and correct.</p>\n",
      "summary": "",
      "date_published": "2024-12-08T00:00:00-00:00",
      "image": "media/foursquare-os-places-density-2024.png",
      "authors": [
        {
          "name": "Ian Wagner",
          "url": "https://fosstodon.org/@ianthetechie",
          "avatar": "media/avi.jpeg"
        }
      ],
      "tags": [
        "rust",
        "apache arrow",
        "parquet",
        "duckdb",
        "big data",
        "data engineering",
        "gis"
      ],
      "language": "en"
    },
    {
      "id": "https://ianwwagner.com//quadrupling-the-performance-of-a-data-pipeline.html",
      "url": "https://ianwwagner.com//quadrupling-the-performance-of-a-data-pipeline.html",
      "title": "Quadrupling the Performance of a Data Pipeline",
      "content_html": "<p>Over the past two weeks, I've been focused on optimizing some data pipelines.\nI inherited some old ones which seemed especially slow,\nand I finally hit a limit where an overhaul made sense.\nThe pipelines process and generate data on the order of hundreds of gigabytes,\nrequiring correlation and conflation across several datasets.</p>\n<p>The pipelines in question happened to be written in Node.js,\nwhich I will do my absolute best not to pick on too much throughout.\nNode is actually a perfectly fine solution for certain problems,\nbut was being used especially badly in this case.\nThe rewritten pipeline, using Rust, clocked in at 4x faster than the original.\nBut as we'll soon see, the choice of language wasn't even the main factor in the sluggishness.</p>\n<p>So, let's get into it...</p>\n<h1><a href=\"#problem-1-doing-cpu-bound-work-on-a-single-thread\" aria-hidden=\"true\" class=\"anchor\" id=\"problem-1-doing-cpu-bound-work-on-a-single-thread\"></a>Problem 1: Doing CPU-bound work on a single thread</h1>\n<p>Node.js made a splash in the early 2010s,\nand I can remember a few years where it was the hot new thing to write everything in.\nOne of the selling points was its ability to handle thousands (or tens of thousands)\nof connections with ease, all from JavaScript!\nThe key to this performance is <strong>async I/O</strong>.\nModern operating systems are insanely good at this, and Node made it <em>really</em> easy to tap into it.\nThis was novel to a lot of developers at the time, but it's pretty standard now\nfor building I/O heavy apps.</p>\n<p><strong>Node performs well as long as you're dealing with I/O-bound workloads</strong>,\nbut the magic fades if your workload requires a lot of CPU work.\nBy default, Node is single-threaded.\nYou need to bring in <code>libuv</code>, worker threads (Node 10 or so), or something similar\nto access <em>parallel</em> processing from JavaScript.\nI've only seen a handful of Node programs use these,\nand the pipelines in question were not among them.</p>\n<h2><a href=\"#going-through-the-skeleton\" aria-hidden=\"true\" class=\"anchor\" id=\"going-through-the-skeleton\"></a>Going through the skeleton</h2>\n<p>If you ingest data files (CSV and the like) record-by-record in a naïve way,\nyou'll just read one record at a time, process, insert to the database, and so on in a loop.\nThe original pipeline code was fortunately not quite this bad (it did have batching at least),\nbut had some room for improvement.</p>\n<p>The ingestion phase, where you're just reading data from CSV, Parquet, etc.,\nmaps naturally to Rust's <a href=\"https://rust-lang.github.io/async-book/05_streams/01_chapter.html\">streams</a>\n(the cousin of futures).\nThe original Node code was actually fine at this stage,\nif a bit less elegant.\nBut the Rust structure we settled on is worth a closer look.</p>\n<pre><code class=\"language-rust\">fn csv_record_stream&lt;'a, S: AsyncRead + Unpin + Send + 'a, T: TryFrom&lt;StringRecord&gt;&gt;(\n    stream: S,\n    delimiter: u8,\n) -&gt; impl Stream&lt;Item = T&gt; + 'a\nwhere\n    &lt;T as TryFrom&lt;StringRecord&gt;&gt;::Error: Debug,\n{\n    let reader = AsyncReaderBuilder::new()\n        .delimiter(delimiter)\n        // Other config elided...\n        .create_reader(stream);\n    reader.into_records().filter_map(|res| async move {\n        let Ok(record) = res else {\n            log::error!(&quot;Error reading from the record stream: {:?}&quot;, res);\n            return None;\n        };\n\n        match T::try_from(record) {\n            Ok(parsed) =&gt; Some(parsed),\n            Err(e) =&gt; {\n                log::error!(&quot;Error parsing record: {:?}.&quot;, e);\n                None\n            }\n        }\n    })\n}\n</code></pre>\n<p>It starts off dense, but the concept is simple.\nWe'll take an async reader,\nconfigure a CSV reader to pull records from it,\nand map them to another data type using <code>TryFrom</code>.\nIf there are any errors, we just drop them from the stream and log an error.\nThis usually isn't a reason to stop processing for our use case.</p>\n<p>You should <em>not</em> be putting expensive code in your <code>TryFrom</code> implementation.\nBut really quick things like verifying that you have the right number of fields,\nor that a field contains an integer or is non-blank are usually fair game.</p>\n<p>Rust's trait system really shines here.\nOur code can turn <em>any</em> CSV(-like) file\ninto an arbitrary record type.\nAnd the same techniques can apply to just about any other data format too.</p>\n<h2><a href=\"#how-to-use-tokio-for-cpu-bound-operations\" aria-hidden=\"true\" class=\"anchor\" id=\"how-to-use-tokio-for-cpu-bound-operations\"></a>How to use Tokio for CPU-bound operations?</h2>\n<p>Now that we've done the light format shifting and discarded some obviously invalid records,\nlet's turn to the heavier processing.</p>\n<pre><code class=\"language-rust\">let available_parallelism = std::thread::available_parallelism()?.get();\n// let record_pipeline = csv_record_stream(...);\nrecord_pipeline\n    .chunks(500)  // Batch the work (your optimal size may vary)\n    .for_each_concurrent(available_parallelism, |chunk| {\n        // Clone your database connection pool or whatnot before `move`\n        // Every app is different, but this is a pretty common pattern\n        // for sqlx, Elasticsearch, hyper, and more which use Arcs and cheap clones for pools.\n        let db_pool = db_pool.clone();\n        async move {\n            // Process your records using a blocking threadpool\n            let documents = tokio::task::spawn_blocking(move || {\n                // Do the heavy work here!\n                chunk\n                    .into_iter()\n                    .map(do_heavy_work)\n                    .collect()\n            })\n            .await\n            .expect(&quot;Problem spawning a blocking task&quot;);\n\n            // Insert processed data to your database\n            db_pool.bulk_insert(documents).await.expect(&quot;You probably need an error handling strategy here...&quot;);\n        }\n    })\n    .await;\n</code></pre>\n<p>We used the <a href=\"https://docs.rs/futures/latest/futures/stream/trait.StreamExt.html#method.chunks\"><code>chunks</code></a>\nadaptor to pull hundreds of items at a time for more efficient processing in batches.\nThen, we used <a href=\"https://docs.rs/futures/latest/futures/stream/trait.StreamExt.html#method.for_each_concurrent\"><code>for_each_concurrent</code></a>\nin conjunction with <a href=\"https://docs.rs/tokio/latest/tokio/task/fn.spawn_blocking.html\"><code>spawn_blocking</code></a>\nto introduce parallel processing.</p>\n<p>Note that neither <code>chunks</code> nor even <code>for_each_concurrent</code> implies any amount of <em>parallelism</em>\non its own.\n<code>spawn_blocking</code> is the only thing that can actually create a new thread of execution!\nChunking simply splits the work into batches (most workloads like this tend to benefit from batching).\nAnd <code>for_each_concurrent</code> allows for <em>concurrent</em> operations over multiple batches.\nBut <code>spawn_blocking</code> is what enables computation in a background thread.\nIf you don't use <code>spawn_blocking</code>,\nyou'll end up blocking Tokio's async workers,\nand your performance will tank.\nJust like the old Node.js code.</p>\n<p>The astute reader may point out that using <code>spawn_blocking</code> like this\nis not universally accepted as a solution.\nTokio is (relatively) optimized for non-blocking workloads, so some claim that you should avoid this pattern.\nBut my experience, having done this for 5+ years in production code serving over 2 billion requests/month,\nis that Tokio can be a great scheduler for heavier tasks too!</p>\n<p>One thing that's often overlooked in these discussions\nis that not all &quot;long-running operations&quot; are the same.\nOne category consists of graphics event loops,\nlong-running continuous computations,\nor other things that may not have an obvious &quot;end.&quot;\nBut some tasks <em>can</em> be expected to complete within some period of time\nthat's longer than a blink.</p>\n<p>In the case of the former (&quot;long-lived&quot; tasks), spawning a dedicated thread often makes sense.\nIn the latter scenario though, Tokio tasks with <code>spawn_blocking</code> can be a great choice.</p>\n<p>For our workload, we were doing a lot of the latter sort of operation.\nOne helpful rule of thumb I've seen is that if your task takes longer than tens of microseconds,\nyou should move it off the Tokio worker threads.\nUsing <code>chunks</code> and <code>spawn_blocking</code> avoids this death by a thousand cuts.\nIn our case, the parallelism resulted in a VERY clear speedup.</p>\n<h1><a href=\"#problem-2-premature-optimization-rather-than-backpressure\" aria-hidden=\"true\" class=\"anchor\" id=\"problem-2-premature-optimization-rather-than-backpressure\"></a>Problem 2: Premature optimization rather than backpressure</h1>\n<p>The original data pipeline was very careful to not overload the data store.\nPerhaps a bit too careful!\nThis may have been necessary at some point in the distant past,\nbut most data stores, from vanilla databases to multi-node clustered storage,\nhave some level of natural backpressure built in.\nThe Node implementation was essentially limiting the amount of work in-flight that hadn't been flushed.</p>\n<p>This premature optimization and the numerous micro-pauses it introduced\nwere another death by a thousand cuts problem.\nDropping the artificial limits approximately doubled throughput.\nIt turned out that our database was able to process 2-4x more records than under the previous implementation.</p>\n<p><strong>TL;DR</strong> — set a reasonable concurrency, let the server tell you when it's chugging (usually via slower response times),\nand let your async runtime handle the rest!</p>\n<h1><a href=\"#problem-3-serde-round-trips\" aria-hidden=\"true\" class=\"anchor\" id=\"problem-3-serde-round-trips\"></a>Problem 3: Serde round-trips</h1>\n<p>Serde, or serialization + deserialization, can be a silent killer.\nAnd unless you're tracking things carefully, you often won't notice!</p>\n<p>I recently listened to <a href=\"https://www.recodingamerica.us/\">Recoding America</a> at the recommendation of a friend.\nOne of the anecdotes made me want to laugh and cry at the same time.\nEngineers had designed a major improvement to GPS, but the rollout was delayed\ndue to a performance problem that rendered it unusable.</p>\n<p>The project was overseen by Raytheon, a US government contractor.\nAnd they couldn't deliver because some arcane federal guidance (not even a regulation proper)\n&quot;recommends&quot; an &quot;Enterprise Service Bus&quot; in the architecture.\nThe startupper in me dies when I hear such things.\nThe &quot;recommendation&quot; boils down to a data exchange medium where one &quot;service&quot; writes data and another consumes it.\nThink message queues like you may have used before.</p>\n<p>This is fine (even necessary) for some applications,\nbut positively crippling for others.\nIn the case of the new positioning system,\nwhich was heavily dependent on timing,\nthis was a wildly inefficient architecture.\nEven worse, the guidelines stated that it should be encrypted.</p>\n<p>This wasn't even &quot;bad&quot; guidance, but in the context of the problem,\nwhich depended on rapid exchange of time-sensitive messages,\nit was a horrendously bad fit.</p>\n<p>In our data pipeline, I discovered a situation that, in retrospect, bears a humorous resemblance to this one.\nThe pipeline was set up using a microservice architecture,\nwhich I'm sure sounded like a good idea at the time,\nbut it introduced some truly obscene overhead.\nAll services involved were capable of working with data in the same format,\nbut the Node.js implementation was split into multiple services with HTTP and JSON round trips in the middle!\nDouble whammy!</p>\n<p>The new data pipeline simply imports the &quot;service&quot; as a crate,\nand gets rid of all the overhead by keeping everything in-process.\nIf you do really need to have a microservice architecture (ex: to scale another service up independently),\nthen other communication + data exchange formats may improve your performance.\nBut if it's possible to keep everything in-process, your overhead is roughly zero.\nThat's hard to beat!</p>\n<h1><a href=\"#conclusion\" aria-hidden=\"true\" class=\"anchor\" id=\"conclusion\"></a>Conclusion</h1>\n<p>In the end, the new pipeline was 4x the speed of the old.\nI happened to rewrite it in Rust, but Rust itself wasn't the source of all the speedups:\nunderstanding the architecture was.\nYou could achieve similar results in Node.js or Python,\nbut Rust makes it significantly easier to reason about the architecture and correctness of your code.\nThis is especially important when it comes to parallelizing sections of a pipeline,\nwhere Rust's type system will save you from the most common mistakes.</p>\n<p>These and other non-performance-related reasons to use Rust will be the subject of a future blog post (or two).</p>\n",
      "summary": "",
      "date_published": "2024-11-29T00:00:00-00:00",
      "image": "",
      "authors": [
        {
          "name": "Ian Wagner",
          "url": "https://fosstodon.org/@ianthetechie",
          "avatar": "media/avi.jpeg"
        }
      ],
      "tags": [
        "algorithms",
        "rust",
        "elasticsearch",
        "nodejs",
        "data engineering",
        "gis"
      ],
      "language": "en"
    }
  ]
}