In my work, I've recently been doing a lot of ETL pipeline design for our geocoding system.
The system processes on the order of a billion records per job,
and failures are part of the process.
We wan...
In the weeks since my previous post, "Working with Arrow and DuckDB in Rust,"
I've found a few gripes that I'd like to address.
Memory usage of query_arrow and stream_arrow
In the previous post, I use...
How (and why) to work with Arrow and DuckDB in Rust
My day job involves wrangling a lot of data very fast.
Recently, I've heard a lot of people raving about technologies like DuckDB,
(Geo)Parquet, and Apache Arrow.
But despite being an "ear...