Zarr for Effective Weather Data Management: Wet Dog Weather’s Perspective

Written by Steve Gifford

April 8, 2024

At Wet Dog Weather, our data management consists primarily of moving data around, particularly weather data. We get new data sets every few minutes or hours and process them for rapid display.

Once you have all that weather data neatly organized and easily accessible, the temptation is to do something with it. We’re not strong enough to avoid that temptation, and customers are asking for it.

 

Exploring Our Data Management Approach

We mostly do visuals, but we’re also spending time on data queries, which calls for a slightly different approach to data management.

Right now, our storage is oriented toward rapid, scalable display. When we get a data set, we chop it into pieces, make a pyramid for it, park it in blob storage, and publish the metadata in our own format.
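To make the "chop it into pieces, make a pyramid" step concrete, here is a minimal sketch of how tile keys for such a pyramid could be enumerated. The tile size, the halving rule between levels, and the `dataset/timestamp/level/row/col` key layout are all assumptions made for illustration; they are not our actual format.

```python
import math


def pyramid_levels(width, height, tile=256):
    """Return (level, cols, rows) from full resolution (level 0) down to one tile.

    Assumes each level halves the resolution of the one before it.
    """
    levels = []
    level, w, h = 0, width, height
    while True:
        cols = math.ceil(w / tile)
        rows = math.ceil(h / tile)
        levels.append((level, cols, rows))
        if cols == 1 and rows == 1:
            break
        # Next level: half the resolution, rounded up.
        w = math.ceil(w / 2)
        h = math.ceil(h / 2)
        level += 1
    return levels


def tile_keys(dataset, timestamp, width, height, tile=256):
    """List hypothetical blob-storage keys for every tile in the pyramid."""
    keys = []
    for level, cols, rows in pyramid_levels(width, height, tile):
        for r in range(rows):
            for c in range(cols):
                keys.append(f"{dataset}/{timestamp}/{level}/{r}/{c}")
    return keys
```

A client that knows the grid size and tile size can work out which keys it needs for a given view without asking the server to enumerate them, which is roughly what makes this kind of layout friendly to blob storage.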

When a web or mobile app wants to display radar, for instance, it will hit our service to see what’s available. We return a highly constrained version of what we have, and the app picks the data slices it needs and starts fetching.

We have our own weird cloud-native formats to make that efficient. It’s nothing we’d bother to write up, but we’re familiar enough with the issues to appreciate Zarr.

Zarr for Weather Data Storage

We’re hardly the first to use Zarr for weather data. The Met Office has been using it for years, and our partner Zeus AI introduced us to it.

Without getting into the details, we consume some data via cloud blob storage using Zarr. Getting notifications right required a bit of tweaking, but once that’s sorted out, it’s surprisingly easy to process and store data in Zarr.

We typically go through a download step for our internal data processing, and hiding that via Zarr was very compelling. Most of our processing is very chunkable as well.

Zarr for Weather Queries and Data Processing

Right now, at least for processing and querying, we store big flat files once we rip the variables out of their input. Formats like GRIB2 or NetCDF aren’t great for blob-storage-style access, so we typically have a download step when processing or performing data queries.

By switching to Zarr internally, we can write simple wrappers for things like point or route queries. We can pair those with access to the ‘raw’ data for much heavier data processing. Then, we can use our standard tools for fast or slow data processing as needed.
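As a rough illustration of why point and route queries suit a chunked store, the sketch below maps a location to a grid cell and then to the chunk that holds it. It assumes a regular lat/lon grid and a 2D chunk shape; the function names and parameters are hypothetical, and no Zarr library calls are used. A real wrapper would hand the resulting indices to the array store and add bounds checking.

```python
def grid_index(lat, lon, lat0, lon0, dlat, dlon):
    """Nearest (row, col) on a regular grid whose first cell center is (lat0, lon0).

    Bounds checking is omitted to keep the sketch short.
    """
    row = round((lat - lat0) / dlat)
    col = round((lon - lon0) / dlon)
    return row, col


def chunk_for_cell(row, col, chunk_rows, chunk_cols):
    """Return ((chunk_row, chunk_col), (row_in_chunk, col_in_chunk))."""
    chunk = (row // chunk_rows, col // chunk_cols)
    offset = (row % chunk_rows, col % chunk_cols)
    return chunk, offset


def chunks_for_route(points, lat0, lon0, dlat, dlon, chunk_rows, chunk_cols):
    """Distinct chunks touched by a route, in the order they are first visited."""
    seen = []
    for lat, lon in points:
        row, col = grid_index(lat, lon, lat0, lon0, dlat, dlon)
        chunk, _ = chunk_for_cell(row, col, chunk_rows, chunk_cols)
        if chunk not in seen:
            seen.append(chunk)
    return seen
```

A point query touches one chunk, and a route query touches only the chunks along the path, so neither needs the download step that a whole-file format would require.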

These are all tricks you might use in your cloud data processing system. It’s nice to have them standardized so we can share them directly with customers.