Smarter Query Design with OGC Services and AWS

Written by Steve Gifford

October 16, 2025

I work a lot with open standards and on AWS. I’ve been thinking about how some of the standards we depend on result in open-ended queries, and how I wish they were more like some of the AWS services we use to build those services. That’s especially true for OGC services, which tend to return far more than you actually need.

Open-Ended Queries

An open-ended query is when you hit an endpoint to ask a service for a result that may be big or small or somewhere in between. You don’t know in advance. Open Geospatial Consortium (OGC) services can generate big results, and that’s my beef.

Let me give you an example with three of the OGC services we implement:

Each of them follows a similar design with their GetCapabilities call. The basics are simple: you ask the service what it has, and it returns a list. In XML. And it’s huge.

Back in the day, when we were buying big Sun servers to run our one service, it sort of made sense. Here was a beefy machine telling you what it had available several times per second. Heck, it probably had that information in the enormous (for the time) memory you paid to have shoved into the thing. If you were smart, you would have three of these beasts running in parallel for redundancy.

That’s not really how things work now, or how they’ve worked for much of the last 20 years. We like to build services from smaller units that can do small things quickly.

The Capabilities approach was well-intentioned, if bad, so what’s the alternative?

Pagination and Breaking Things Up

The alternative is to make the caller do some work. Let’s look to AWS’s DynamoDB as inspiration. It’s hardly the only one to make the caller pay a bit for a big query; it’s just the one I’ve been working with lately.

Their trick is the LastEvaluatedKey and pagination. Let’s say you want to scan an entire table. You can do that, but it’ll play out like so:

  • Ask for the first batch of data
  • Get your data back, along with a LastEvaluatedKey
  • For the next batch of data, send back the LastEvaluatedKey
  • Lather, rinse, repeat
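Here's a minimal sketch of that loop in Python. The scan_all function assumes a client whose scan() call accepts TableName, Limit, and ExclusiveStartKey and returns Items plus an optional LastEvaluatedKey, which is the shape the boto3 DynamoDB client uses. FakeDynamoClient is a made-up in-memory stand-in so the example runs without AWS credentials; it is not part of any library.

```python
from typing import Any, Dict, Iterator, List, Optional


def scan_all(client: Any, table_name: str, page_size: int = 100) -> Iterator[Dict[str, Any]]:
    """Yield every item in a table, one page at a time."""
    start_key: Optional[Dict[str, Any]] = None
    while True:
        kwargs: Dict[str, Any] = {"TableName": table_name, "Limit": page_size}
        if start_key is not None:
            # Send back the key from the previous page to resume where it stopped.
            kwargs["ExclusiveStartKey"] = start_key
        page = client.scan(**kwargs)
        yield from page.get("Items", [])
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            # No LastEvaluatedKey in the response means there is nothing left.
            break


class FakeDynamoClient:
    """Hypothetical stand-in that mimics the scan() response shape."""

    def __init__(self, items: List[Dict[str, Any]]) -> None:
        self._items = items
        self.calls = 0

    def scan(
        self,
        TableName: str,
        Limit: int,
        ExclusiveStartKey: Optional[Dict[str, Any]] = None,
    ) -> Dict[str, Any]:
        self.calls += 1
        start = 0
        if ExclusiveStartKey is not None:
            # Items are stored in id order, so the key doubles as an offset.
            start = int(ExclusiveStartKey["id"]["N"]) + 1
        chunk = self._items[start:start + Limit]
        response: Dict[str, Any] = {"Items": chunk}
        if start + Limit < len(self._items):
            response["LastEvaluatedKey"] = {"id": chunk[-1]["id"]}
        return response


if __name__ == "__main__":
    fake = FakeDynamoClient([{"id": {"N": str(i)}} for i in range(250)])
    total = sum(1 for _ in scan_all(fake, "demo-table"))
    print(total, "items in", fake.calls, "round trips")
```

The point is the round trips: the caller pays for each page, so the cost of a big request is visible on the client side.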

If you want the full contents of that table, you can get it, but DynamoDB makes you do the work. It also points you in the right direction, which is to:

  • Make exact queries so you’re not wasting time
  • Structure your tables to support that

In other words, something like DynamoDB instills good habits in the developer. The amount of work you’re asking for matches what the back end is doing. That’s something OGC services could really benefit from—teaching developers to be more precise rather than returning an entire catalog every time.
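As a rough illustration of an exact query, the sketch below builds the parameters for a DynamoDB Query call instead of a Scan. The table layout is purely an assumption for the example: a hypothetical table with a partition key named layer and a numeric sort key named zoom. The function only assembles the request, so it runs without AWS access.

```python
from typing import Any, Dict


def build_tile_query(table_name: str, layer: str, min_zoom: int, max_zoom: int) -> Dict[str, Any]:
    """Build Query parameters for one layer and a range of zoom levels.

    Assumes a hypothetical table keyed on (layer, zoom). The #name
    placeholders sidestep any clash with DynamoDB reserved words.
    """
    return {
        "TableName": table_name,
        # Partition key equality plus a range on the sort key.
        "KeyConditionExpression": "#layer = :layer AND #zoom BETWEEN :lo AND :hi",
        "ExpressionAttributeNames": {"#layer": "layer", "#zoom": "zoom"},
        "ExpressionAttributeValues": {
            ":layer": {"S": layer},
            ":lo": {"N": str(min_zoom)},
            ":hi": {"N": str(max_zoom)},
        },
    }


if __name__ == "__main__":
    params = build_tile_query("tiles", "roads", 3, 5)
    print(params["KeyConditionExpression"])
```

With boto3 this would be passed along as client.query(**params). The back end only touches the items under that one partition key, which is the "work matches the request" property described above.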

Bad Habits

Most packages that support WMTS and WMS don’t even parse the Capabilities return. It tends to be used by a developer once to figure out what a service offers. Then they just hit the exact endpoint they want for the data they need.

That’s pretty smart, and it’s the right outcome from the client side. Why burden your web app with fetching 20MB of XML if it just needs one reference? Better to hardwire it.
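For a sense of what hardwiring looks like, here is a small sketch that builds a WMTS GetTile request using the key-value parameters from the WMTS 1.0.0 standard. The base URL, layer name, and tile matrix set below are placeholders, not a real service, and the sketch assumes the service identifies its tile matrices by zoom level, which is common but not guaranteed.

```python
from urllib.parse import urlencode


def wmts_tile_url(
    base_url: str,
    layer: str,
    tile_matrix_set: str,
    tile_matrix: str,
    row: int,
    col: int,
    style: str = "default",
    fmt: str = "image/png",
) -> str:
    """Build a WMTS 1.0.0 KVP GetTile URL without fetching Capabilities."""
    params = {
        "SERVICE": "WMTS",
        "REQUEST": "GetTile",
        "VERSION": "1.0.0",
        "LAYER": layer,
        "STYLE": style,
        "FORMAT": fmt,
        "TILEMATRIXSET": tile_matrix_set,
        "TILEMATRIX": tile_matrix,
        "TILEROW": str(row),
        "TILECOL": str(col),
    }
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # All values here are hypothetical; a developer would read them
    # out of the Capabilities document once and then hardwire them.
    print(wmts_tile_url("https://example.com/wmts", "roads", "WebMercator", "5", 12, 9))
```

The layer and matrix set names come out of the Capabilities document once, during development, and never need to be fetched again at runtime.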

That’s good for the client, but the service must be prepared to compile that 20MB of XML at a moment’s notice. If you have a lot of data, this can get enormous, and it’s unlikely you can return it in under a second, at least not without a significant amount of work.

As a result, the service has to be a bit over-provisioned, or you have to come up with complex approaches like pre-caching the index data. It’s fixable, it’s just… kind of silly. Going through a lot of work to fill out a return that’s not used in the normal course of business.
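To make the pre-caching idea concrete, here is one possible sketch, not a description of how any particular service does it. The class name, the build callback, and the time-to-live are all assumptions for illustration. The expensive document is built once, served from memory, and rebuilt either on a schedule through refresh() or lazily when it expires.

```python
import time
from typing import Callable, Optional


class CachedCapabilities:
    """Hypothetical cache around an expensive Capabilities builder."""

    def __init__(
        self,
        build: Callable[[], bytes],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._build = build
        self._ttl = ttl_seconds
        self._clock = clock
        self._body: Optional[bytes] = None
        self._expires_at = 0.0

    def refresh(self) -> bytes:
        """Rebuild the document now, e.g. from a timer or after a data change."""
        self._body = self._build()
        self._expires_at = self._clock() + self._ttl
        return self._body

    def get(self) -> bytes:
        """Serve the cached document, rebuilding only if missing or expired."""
        if self._body is None or self._clock() >= self._expires_at:
            return self.refresh()
        return self._body


if __name__ == "__main__":
    cache = CachedCapabilities(lambda: b"<Capabilities/>", ttl_seconds=300)
    cache.refresh()  # warm the cache before any request arrives
    print(len(cache.get()), "bytes served from memory")
```

Even in this small form, it is extra machinery that exists only to serve a response most clients never read.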

I have a real issue with returning data that the user isn’t going to use. That’s why our visual displays are so fast: we don’t.

It’s Not All Bad in Open Geospatial

Thankfully, the world has moved away from open-ended queries in other areas. Cloud-native geospatial formats like Cloud Optimized GeoTIFF or Zarr put the onus back on developers to think about what they need.

With Zarr, for example, you can accidentally request 100GB of data, but it’s obvious when you do. Your Python client will lock up while it fetches that. It’s better for the client app to feel the pain immediately rather than wait for a service timeout.
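Because a chunked format exposes its layout, the client can work out what a request will cost before making it. The sketch below is plain arithmetic and does not use the zarr library; the array shape, chunk shape, and item size are made-up numbers. It estimates the uncompressed size of a selection and how many chunks it touches. The actual transfer will differ, since chunks are usually compressed and are fetched whole.

```python
from typing import Sequence, Tuple


def estimate_request(
    selection: Sequence[Tuple[int, int]],
    chunks: Sequence[int],
    itemsize: int,
) -> Tuple[int, int]:
    """Return (uncompressed bytes selected, number of chunks touched).

    selection holds one half-open (start, stop) range per dimension,
    with stop > start; chunks is the chunk shape; itemsize is bytes
    per element.
    """
    n_elements = 1
    n_chunks = 1
    for (start, stop), chunk in zip(selection, chunks):
        n_elements *= stop - start
        first_chunk = start // chunk
        last_chunk = (stop - 1) // chunk
        n_chunks *= last_chunk - first_chunk + 1
    return n_elements * itemsize, n_chunks


if __name__ == "__main__":
    # Hypothetical cube: a year of daily 4096 x 4096 float32 grids,
    # stored as one 512 x 512 tile per chunk.
    size, count = estimate_request(
        selection=[(0, 365), (0, 4096), (0, 4096)],
        chunks=(1, 512, 512),
        itemsize=4,
    )
    print(f"{size / 1e9:.1f} GB across {count} chunks")
```

For that hypothetical cube, the whole year comes to about 24.5 GB across 23,360 chunks, which is the kind of number worth seeing before hitting enter.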

Looking Ahead

We’re not abandoning the Capabilities approach for OGC standards. It is what it is, but we reserve the right to complain about it. It can be made to work, it’s just a bit… dumb.

When you’re picking a solution, though, I’d look more toward newer cloud-native approaches that take service design into account. They work well, scale nicely, and teach you good query habits rather than bad. And if your stack still relies heavily on OGC services, make good use of the shortcuts that work around these giant returns. Then at least your apps will be more performant.