Want To Enhance Your Weather Data Workflows? Ask Python

Written by Steve Gifford

April 23, 2024

Open source is integral to meteorology, particularly for data processing. There are free packages to read everything from GRIB to NEXRAD messages, and even up-and-coming formats like Zarr are built around open source.

We can credit NOAA for that open attitude in large part. Giving data away for free sets the expectation that you should at least be able to read the data without paying for an expensive tool. Moreover, there’s a whole ecosystem of data processing revolving around Python, which is also open source.

Python at the Core of Our Weather Data Workflows

In my career, I’ve set up several weather data processing pipelines. The basics weren’t even in question for Wet Dog Weather: we run our weather data workflows in Python.

We do visualization in a variety of other languages, because Python isn’t suited to real-time work on mobile devices or in web browsers. How that stack works is a topic for another day.

We use Python for data import, simple and complex data processing, and dynamic data queries. Most of our back end is written in Python, and here are a few of our favorite packages that may interest you.

Data Import Packages

1. pygrib

We use this one to, you guessed it, read GRIB2. Once you’ve made peace with the format, it’s not so bad. That said, the industry should move from GRIB2 to netCDF, the next format on our list.

2. netCDF4 

Lots of data comes in netCDF, and it’s usually better that way. Usually. It’s something you have to get your head around, but once you do, it’s pleasant to use.

3. MetPy 

MetPy is more of a package for meteorological data calculations, but it’s got a few importers. We love the one for NEXRAD messages!

4. Zarr

Zarr is the new kid on the block and is more of a general array data storage format. Well, netCDF is the same, for that matter. It’s great for storing and transferring data in a cloud-native way and works well with NumPy, xarray, and similar packages.

Data Processing Packages

Our own weather data processing is relatively simple compared to our customers’. We’re aiming for visualization right now and data query in the future. As such, most of our work involves extraction, metadata, and reorganization.

Unsurprisingly, our go-to packages for that work are NumPy and SciPy. There’s an art to making those fast, and we spend a fair bit of time optimizing for speed and memory use, particularly the latter.
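To make the memory point concrete, here is a minimal sketch of one common trick: packing a floating-point field into 8-bit values with a scale and offset before it heads off to visualization. It is an illustration, not our production code. The function names, the value range, and the choice to reserve 255 for missing data are all assumptions made for the example.

```python
import numpy as np

def pack_to_uint8(field, vmin, vmax):
    """Quantize a float field to uint8 over [vmin, vmax].

    Levels 0..254 carry data; 255 is reserved for missing values (an assumption).
    """
    scale = (vmax - vmin) / 254.0
    packed = np.full(field.shape, 255, dtype=np.uint8)
    valid = np.isfinite(field)
    # Clip to the range, then round to the nearest of the 255 data levels.
    levels = np.rint((np.clip(field[valid], vmin, vmax) - vmin) / scale)
    packed[valid] = levels.astype(np.uint8)
    return packed, scale, vmin

def unpack_from_uint8(packed, scale, offset):
    """Recover approximate float32 values; the missing marker becomes NaN."""
    out = packed.astype(np.float32) * np.float32(scale) + np.float32(offset)
    out[packed == 255] = np.nan
    return out

# Hypothetical temperature grid in kelvin with one missing cell.
rng = np.random.default_rng(0)
temps = rng.uniform(230.0, 320.0, size=(512, 512))
temps[0, 0] = np.nan

packed, scale, offset = pack_to_uint8(temps, 230.0, 320.0)
restored = unpack_from_uint8(packed, scale, offset)

print(temps.nbytes // packed.nbytes)  # float64 vs uint8 storage ratio
```

The trade-off is precision: each value can be off by up to half a quantization step, which is fine for a color-mapped display but probably not for a data query.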

Radar Advection

Okay, sometimes we get more ambitious. There’s an old joke in 3D that you should write your own quaternion library and then throw it out and use someone else’s. We have a story like that.

When preparing to write a radar advection system for a client, we worked through the steps and packages and then realized pySTEPS already did what we were going to do!  

Doesn’t advection feel like something we’ll soon do with a random ML algorithm? Yeah, it kind of does. But for now, pySTEPS does an excellent job. It preps meteorological data sets for OpenCV to do the optical flow. Honestly, that’s a lot of work, and it has several interesting pre- and post-processing steps, too.
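If advection is new to you, the core idea fits in a few lines of NumPy. The sketch below is not how pySTEPS does it. It assumes a single, uniform motion vector for the whole image (a real system estimates a per-pixel motion field from optical flow) and uses nearest-pixel lookup instead of interpolation. The function name and parameters are hypothetical.

```python
import numpy as np

def advect(field, u, v, steps=1, fill=0.0):
    """Move `field` by a uniform motion of (u, v) pixels per step.

    u is columns per step (positive moves right), v is rows per step
    (positive moves down). Each output pixel looks backward along the
    motion vector to find where its value came from.
    """
    rows, cols = np.indices(field.shape)
    src_rows = np.rint(rows - v * steps).astype(int)
    src_cols = np.rint(cols - u * steps).astype(int)
    # Pixels whose source lies outside the grid get the fill value.
    inside = (
        (src_rows >= 0) & (src_rows < field.shape[0])
        & (src_cols >= 0) & (src_cols < field.shape[1])
    )
    out = np.full(field.shape, fill, dtype=field.dtype)
    out[inside] = field[src_rows[inside], src_cols[inside]]
    return out

# A single 40 dBZ cell, pushed forward two time steps.
radar = np.zeros((6, 6))
radar[1, 1] = 40.0
forecast = advect(radar, u=2, v=1, steps=2)

print(np.argwhere(forecast > 0))  # where the cell ended up
```

The backward lookup is the important design choice: every output pixel gets exactly one value, so the forecast has no holes the way a forward "push" of pixels can.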

Other Libraries

We haven’t mentioned any of the rogues’ gallery of geospatial libraries we use, the text format libraries, the multiprocessing support, all the libraries we use in the REST API services, or how we organize them with conda. The list goes on for quite a while.

That’s the beauty of the Python community right now. There are a ton of great libraries that you can use in various weird and wonderful ways, especially for weather data workflows.