AWS Lambda for Weather Data Processing: A Real-World Implementation

Written by Steve Gifford



Data



October 24, 2024

When processing weather data, timing is everything. At Wet Dog Weather, we rely on AWS Lambda for weather data processing to handle an intricate dance of forecasts, observations, and radar data that arrives in unpredictable bursts. Our journey with this technology has taught us valuable lessons about handling time-sensitive meteorological information at scale.

This data is bursty. Some of it comes in every 3 minutes, some hourly, and some every 6 hours. Some of it needs to be processed immediately, while other data sets can wait a bit, but never too long. This is precisely why we chose AWS Lambda: it scales instantly with our needs.

What is AWS Lambda

Introduced in 2014, AWS Lambda is an “event-driven, serverless Function as a Service” addition to AWS. It started as a fairly minimal “run some code based on a trigger” service and has grown into something much more complex since.

That’s precisely what we said the last time we discussed Lambda, but this time, we will specifically discuss our importers.

Lambda Containers for Importers

Our implementation of AWS Lambda for weather data processing required some creative solutions, mainly regarding containers. We use a lot of libraries to read data, so it’s not easy for us to stay under the size limit for zip-packaged Python code in Lambda (250 MB unzipped). Thus, we use container images, which can be much larger and make it far easier to bundle complex libraries.

We talked a bit about our containers in the previous blog post. Suffice it to say they’re large and pull in a lot of strange stuff. We organize it with Conda, and they work reliably, which is kind of the point of containers.

Simple Lambda Importers

After a few years of using AWS Lambda, we’ve learned to do one thing per invocation. Keep it short and simple, and make it repeatable, and it’ll work well.

Data import fits nicely into that category. In our importers, we open the files in question and rip out the variables we want. We store each variable individually in Zarr on AWS S3 and then exit.
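To make the "one thing per invocation" idea concrete, here is a minimal sketch of how such an importer's handler might be laid out. It is not our production code: the variable names, the output prefix, and the target path layout are all hypothetical, and the actual file reading and Zarr writing is left as a comment. The only real-world detail it relies on is the shape of an S3 event notification, where object keys arrive URL-encoded.

```python
import urllib.parse

# Hypothetical configuration, for illustration only.
WANTED_VARIABLES = ("t2m", "u10", "v10")
OUTPUT_PREFIX = "s3://example-weather-zarr"


def zarr_target(source_key, variable):
    """Build a deterministic per-variable Zarr path from the source key.

    The same input always maps to the same output, which keeps the
    import repeatable if an invocation is retried.
    """
    return f"{OUTPUT_PREFIX}/{source_key}/{variable}.zarr"


def plan_import(event):
    """Turn an S3 notification event into (source, variable, target) tasks."""
    tasks = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        for variable in WANTED_VARIABLES:
            tasks.append(
                (f"s3://{bucket}/{key}", variable, zarr_target(key, variable))
            )
    return tasks


def handler(event, context):
    """Lambda entry point: do one small job, then exit."""
    tasks = plan_import(event)
    # A real importer would open each source file here and write each
    # variable to its target store; that part is omitted in this sketch.
    return {"planned": len(tasks)}
```

Because the target path depends only on the source key and the variable, running the same event twice plans exactly the same writes, which is what makes a short, retryable invocation safe.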

That’s easy within the AWS Lambda time limit and memory constraints. Latency is nice and low, and for GFS, the invocations look like this.

GFS runs every six hours, so we get a burst right around then. How many we process at once will vary a bit depending on how they appear on AWS.

How We Use AWS Lambda for Weather Data Processing

That’s just the import step, and we keep our importers super simple. The next step is using AWS Lambda to prep data for Terrier. Terrier is what draws the pretty pictures, and it wants data in its own form.

Once we’ve got our variable data in Zarr, we apply a general-purpose pipeline to turn it into the various forms needed by Terrier and legacy display toolkits. These can operate on the variables and time slices individually; thus, this pipeline stage processes data from all our importers.
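Since each variable and time slice can be handled on its own, the work splits naturally into many small tasks. The sketch below shows one way that fan-out could be expressed. The store path, variable names, and payload fields are hypothetical, and how each payload gets delivered to a Lambda invocation is deliberately left out.

```python
from itertools import product


def fan_out(store, variables, time_indices):
    """Create one small, independent task per (variable, time slice).

    Each payload is self-contained, so tasks can run in any order and
    in parallel, one per invocation.
    """
    return [
        {"store": store, "variable": variable, "time_index": time_index}
        for variable, time_index in product(variables, time_indices)
    ]


if __name__ == "__main__":
    # Hypothetical store and variables: 2 variables x 3 time slices = 6 tasks.
    for task in fan_out("s3://example-weather-zarr/gfs", ["t2m", "u10"], range(3)):
        print(task)
```

Keeping the payload down to a store, a variable, and a time index is what lets one pipeline stage serve every importer: the stage does not need to know where the data originally came from.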

With radar, the 1-hr, 3-hr, and 6-hr data sets produce much more noise. Still, it’s very bursty.

AWS Lambda Advantages

The real power of AWS Lambda for weather data processing becomes apparent when dealing with these burst patterns. The beauty of using an on-demand system like AWS Lambda is that we can scale up quickly when we get a whole mess of data. That might be every 6 hours or every hour. For the 3-minute data, it may be a wash.

This works well for tasks that take less than a minute and don’t use too much memory. Latency stays low, and costs are pretty good for what we’re doing. It’s also straightforward to maintain.

AWS Lambda Disadvantages

We’ve found anything that takes over a couple of minutes is more problematic, especially if it uses more memory. For us, at least, there’s a sweet spot for AWS Lambda: relatively low-memory operations that take less than a minute.

Initially, we ran longer jobs to process visual data for legacy map toolkits, basically data support for WMTS and WMS. That step involved gdalwarp, which does a complete map reprojection of image data. That ballooned our costs, so we’ve moved it off AWS Lambda.
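Some rough arithmetic shows why long, memory-hungry jobs get expensive. Lambda compute is billed by duration multiplied by configured memory (GB-seconds), so cost grows with both at once. In the sketch below, the per-GB-second rate is an assumed illustrative figure and both job sizes are invented for the comparison. They are not our actual numbers, and per-request charges are ignored.

```python
# Assumed illustrative rate in USD per GB-second; check current AWS pricing.
PRICE_PER_GB_SECOND = 0.0000166667


def invocation_cost(memory_mb, duration_s, price=PRICE_PER_GB_SECOND):
    """Estimate the compute cost of one invocation, ignoring request fees."""
    return (memory_mb / 1024) * duration_s * price


if __name__ == "__main__":
    # Hypothetical job sizes, chosen only to show the scaling.
    small = invocation_cost(memory_mb=512, duration_s=30)
    large = invocation_cost(memory_mb=8192, duration_s=600)
    print(f"small job: ${small:.6f}")
    print(f"large job: ${large:.6f}")
    print(f"ratio:     {large / small:.0f}x")
```

With these made-up sizes, 16 times the memory and 20 times the duration make each invocation 320 times as expensive, and that multiplies again by the number of invocations in a burst.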

Future-Proofing Our AWS Lambda for Weather Data Processing

While AWS Lambda for weather data processing isn’t perfect for every scenario, it has proven invaluable for our specific needs. Its combination of low latency, cost-effectiveness, and scalability makes it ideal for processing weather data that arrives in bursts. As Lambda’s capabilities and our requirements evolve, we’ll continue to refine our approach, but for now, this architecture serves as a robust foundation for our weather data processing needs.

Terrier is where we guarantee low latency, and Lambda helps us with that. Legacy data display can wait a bit.
