{
  "version": "https://jsonfeed.org/version/1",
  "title": "Ian's Digital Garden",
  "home_page_url": "https://ianwwagner.com/",
  "feed_url": "https://ianwwagner.com//tag-ogr2ogr.json",
  "description": "",
  "items": [
    {
      "id": "https://ianwwagner.com//searching-for-tiger-features.html",
      "url": "https://ianwwagner.com//searching-for-tiger-features.html",
      "title": "Searching for TIGER Features",
      "content_html": "<p>Today I had a rather peculiar need to search through features from TIGER\nmatching specific attributes.\nThese files are not CSV or JSON, but rather ESRI Shapefiles.\nShapefiles are a binary format which have long outlived their welcome\naccording to many in the industry, but they still persist today.</p>\n<h1><a href=\"#context\" aria-hidden=\"true\" class=\"anchor\" id=\"context\"></a>Context</h1>\n<p>Yeah, so this post probably isn't interesting to very many people,\nbut here's a bit of context in case you don't know what's going on and you're still reading.\nTIGER is a geospatial dataset published by the US government.\nThere's far more to this dataset than fits in this TIL post,\nbut my interest in it lies in finding addresses.\nSpecifically, <em>guessing</em> at where an address might be.</p>\n<p>When you type an address into your maps app,\nthey might not actually have the exact address in their database.\nThis happens more than you might imagine,\nbut you can usually get a pretty good guess of where the address is\nvia a process called interpolation.\nThe basic idea is that you take address data from multiple sources and use that to make a better guess.</p>\n<p>Some of the input to this is existing address points.\nBut there's one really interesting form of data that brings us to today's TIL:\naddress ranges.\nOne of the TIGER datasets is a set of lines (for the roads.\nEach segment is annotated with info letting us know the range of house numbers on each side of the road.</p>\n<p>I happen to use this data for my day job at Stadia Maps,\nwhere I was investigating a data issue today related to our geocoder and TIGER data.</p>\n<h1><a href=\"#getting-the-data\" aria-hidden=\"true\" class=\"anchor\" id=\"getting-the-data\"></a>Getting the data</h1>\n<p>In case you find yourself in a similar situation,\nyou may notice that the data from the government is sitting in an FTP directory,\nwhich contains a bunch of confusingly named ZIP 
files.\nThe data that I'm interested in (address features)\nhas names like <code>tl_2024_48485_addrfeat.zip</code>.</p>\n<p>The year might be familiar, but what's that other number?\nThat's a FIPS code for the county whose data is contained in the archive.\nYou can find a <a href=\"https://transition.fcc.gov/oet/info/maps/census/fips/fips.txt\">list here</a>.\nThis is somewhat interesting in itself, since the first 2 digits are a state code:\nTexas, in this case.\nThe full number identifies a county: Wichita County, in this case.\nYou can suck down the entire dataset, just one file, or anything in-between\nfrom the <a href=\"https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html\">Census website</a>.</p>\n<h1><a href=\"#searching-for-features\" aria-hidden=\"true\" class=\"anchor\" id=\"searching-for-features\"></a>Searching for features</h1>\n<p>So, now you have a directory full of ZIP files,\neach of which has a bunch of files necessary to interpret the shapefile.\nIsn't GIS lovely?</p>\n<p>The following script will let you write a simple &quot;WHERE&quot; clause,\nfiltering the data exactly as it comes from the Census Bureau!</p>\n<pre><code class=\"language-bash\">#!/bin/bash\n# $1: directory of TIGER ZIP files; $2: WHERE clause passed to ogr2ogr\nset -e;\n\nfind &quot;$1&quot; -type f -iname &quot;*.zip&quot; -print0 |\\\n  while IFS= read -r -d $'\\0' filename; do\n\n    # Quote the path so filenames containing spaces are not word-split\n    filtered_json=$(ogr2ogr -f GeoJSON -t_srs crs:84 -where &quot;$2&quot; /vsistdout/ &quot;/vsizip/$filename&quot;);\n    # Check if the filtered GeoJSON has any features\n    feature_count=$(echo &quot;$filtered_json&quot; | jq '.features | length')\n\n    if [ &quot;$feature_count&quot; -gt 0 ]; then\n      # echo filename to stderr\n      &gt;&amp;2 echo &quot;$(date -u) Match(es) found in $filename&quot;;\n      echo &quot;$filtered_json&quot;;\n    fi\n\n  done;\n</code></pre>\n<p>You can run it like so:</p>\n<pre><code class=\"language-shell\">./find-tiger-features.sh $HOME/Downloads/tiger-2021/ &quot;TFIDL = 213297979 OR TFIDR = 
213297979&quot;\n</code></pre>\n<p>This ends up being a LOT easier and faster than QGIS in my experience\nif you want to search for specific known attributes.\nEspecially if you don't know the specific area that you're looking for.\nI was surprised that no such tool for things like ID lookups existed already!</p>\n<p>Note that this isn't exactly &quot;fast&quot; by typical data processing workload standards.\nIt takes around 10 minutes to run on my laptop.\nBut it's a lot faster than the alternatives in many circumstances,\nespecially if you don't know exactly which file the data is in!</p>\n<p>For details on the fields available,\nrefer to the technical documentation on the <a href=\"https://www.census.gov/geographies/mapping-files/time-series/geo/tiger-line-file.html\">Census Bureau website</a>.</p>\n",
      "summary": "",
      "date_published": "2024-11-09T00:00:00-00:00",
      "image": "",
      "authors": [
        {
          "name": "Ian Wagner",
          "url": "https://fosstodon.org/@ianthetechie",
          "avatar": "media/avi.jpeg"
        }
      ],
      "tags": [
        "gis",
        "shell",
        "ogr2ogr"
      ],
      "language": "en"
    }
  ]
}