Schedule for Cloud-Native Geo Event

This event took place on April 19 and 20, 2022. The sessions are all available online in this YouTube playlist, and you can click on the session titles below to go straight to the recordings.

14:00 UTC April 19

10:00 EDT

Chris Holmes (Planet & Radiant Earth) and Nadine Alameh (OGC).

Join Chris Holmes and Nadine Alameh for the welcome and kickoff of the event, followed by an overview of the Cloud-Native Geospatial movement and its potential to help geospatial break out of its niche to have the impact we all dream of.

14:40 UTC

10:40 EDT

Cloud-Optimized GeoTIFF (COG) Overview & Lightning Talks

Moderators: Chris Holmes, Planet & Leo Thomas, Development Seed

Jed Sundwall, Amazon

An overview of what a Cloud-Optimized GeoTIFF is and how it works, its origin story, and highlights of tools and workflows it is used in.
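The partial-read idea at the heart of COG can be shown in a few lines. A reader takes the tile index from the file's header and fetches only the byte ranges of the internal tiles that intersect the requested area; the sketch below uses made-up offsets and a hypothetical `tiles_for_window` helper, not a real file.

```python
# Illustration of how a COG reader turns a pixel window into HTTP range
# requests. Tile offsets and sizes would normally come from the TIFF header
# (TileOffsets / TileByteCounts tags); here they are invented.

TILE_SIZE = 256  # COGs store pixels in fixed-size internal tiles

def tiles_for_window(col_min, row_min, col_max, row_max):
    """Return the (tile_x, tile_y) indices covering a pixel window."""
    tx0, ty0 = col_min // TILE_SIZE, row_min // TILE_SIZE
    tx1, ty1 = (col_max - 1) // TILE_SIZE, (row_max - 1) // TILE_SIZE
    return [(tx, ty) for ty in range(ty0, ty1 + 1)
                     for tx in range(tx0, tx1 + 1)]

def range_header(offset, size):
    """HTTP Range header value fetching `size` bytes starting at `offset`."""
    return f"bytes={offset}-{offset + size - 1}"

# A 300x300-pixel window starting at (200, 200) touches four 256px tiles:
needed = tiles_for_window(200, 200, 500, 500)
```

Only those four tiles are requested over HTTP, which is why a small crop of a huge COG is fast even from a browser.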

The talk will cover three apps built to lower the barrier to using COGs. Each app focuses on a specific task connected to the COG data format: the first lets users view a COG on top of a map, the second validates COGs, and the third creates a COG from a "regular" GeoTIFF.

The main idea behind the talk is to highlight the value of simple solutions built on top of the technologies and specs we are developing. Simple solutions help people become more informed and involved users of them.

Vincent Sarago, Development Seed

Everything about TiTiler, what it does, what it doesn't do, how to use it.

OpenLayers has recently added support for working with GeoTIFF sources. This is part of broader support for rendering arbitrary data tiles with WebGL. This talk will highlight the new functionality, demonstrating how to manipulate and interact with multi-band COG data - running band math expressions, applying color maps, and combining data from multiple sources on your GPU.

Cogger is an open-source tool designed and used at Airbus to efficiently create a COG from an existing tiled GeoTIFF. Compared to other COG creation tools, cogger is usually orders of magnitude faster, as it only needs to reshuffle the source bytes instead of going through a full image decompression/recompression step.

This talk will present the different usages of the tool, from simply converting a single tiled GeoTIFF with overviews to a COG, to more advanced usage combining individually created overviews in order to fine-tune content and compression.

How I used Cloud Optimized GeoTIFFs to make a geospatial art/music project that was presented at Data Through Design, an exhibition for NYC's Open Data Week 2022. People will come away with an overview of the kind of data processing that goes into a project like this, as well as inspiration for the kinds of creative work you can do by applying these Cloud-Native and other geospatial methods.


You can view the project, called Counterpoints, here: www.iammansi.com/counterpoints

Daniel Dufour, GeoSurge, LLC

GeoBlaze is a blazing fast raster analysis engine written in pure JavaScript. It takes advantage of cloud optimization. When a user calculates statistics on an area within a GeoTIFF, it only reads the parts (tiles) of the GeoTIFF that cover that area. Because it runs in JavaScript, it can run completely client-side in the web browser without need for a server.

This talk will show how to create spatially aggregated time series from multiple COGs. People will learn to do serverless spatio-temporal analysis with AWS Step Functions and AWS Lambda. I want to enlighten other people to combine Cloud Native Geospatial with serverless architectures to create cost-efficient and powerful applications.
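The map/reduce shape behind such a serverless time series can be sketched without any AWS machinery: each "Lambda invocation" reduces one scene's pixel window to a statistic, and a final step assembles the date-ordered series. Function names are hypothetical, and plain lists stand in for COG windows.

```python
# Sketch of a serverless spatio-temporal aggregation: per-scene reduction
# (what each Lambda would do) followed by a combine step (what a final
# Step Functions state would do).

def scene_mean(pixels):
    """Per-scene reduction a single worker would perform on its COG window."""
    valid = [p for p in pixels if p is not None]  # skip nodata pixels
    return sum(valid) / len(valid) if valid else None

def aggregate_time_series(scenes):
    """Combine per-scene results into a date-sorted (date, mean) series."""
    return sorted((date, scene_mean(pixels)) for date, pixels in scenes)

series = aggregate_time_series([
    ("2022-04-20", [1.0, 3.0, None]),
    ("2022-04-19", [2.0, 2.0]),
])
```

Because each scene is reduced independently, the per-scene step parallelizes trivially, which is what makes the Lambda fan-out cost-efficient.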

16:00 UTC

12:00 EDT

Vincent Sarago, Development Seed

Recognize COG / Create COG / Use COG, and learn about the gotchas to make sure you don't use bad COGs

17:00 UTC (Tuesday)

13:00 EDT

Organizational Perspectives 1

Moderators: Nadine Alameh, OGC & Chris Holmes, Planet

Planet has been a key contributor to a number of cloud-native geospatial efforts. Learn why Planet contributes, where those contributions have been happening, and what's next.

Gilberto Camara, National Institute for Space Research (INPE), Brazil

This lightning talk will focus on the organisational adoption of cloud-native geospatial computing by the Brazilian National Institute for Space Research (INPE). Among its missions, INPE is responsible for official statistics and maps of land use change in Brazil, including deforestation and degradation in the Amazon rain forest area. The discussion will present the rationale and benefits for INPE to adopt cloud solutions for big Earth observation data analysis; it will also highlight the importance of standards such as STAC for institutional adoption of cloud-based solutions in Earth observation.

The Image Hub project at Anglo American (a large mining and minerals company) aims to provide company-wide storage, search, visualisation and processing of image data – ranging from satellite and aerial imagery to images of geological cores and rock faces. We are building Image Hub on a foundation of Cloud Native Geospatial technologies including STAC (through stac-fastapi), COGs, TiTiler, Dask and JupyterHub, all hosted on the Azure cloud platform.


In this talk, we will discuss how we have combined these tools to produce a valuable system for Anglo American. We will discuss the benefits of an open-source approach (and how we argued for this internally), the amazing responsiveness we’ve seen from the community, and some of the challenges we are currently tackling.

This talk describes how RPS Group is helping NOAA design a new cloud infrastructure to support the Integrated Ocean Observing System (IOOS). They have been prototyping ideas using STAC and Zarr and will ultimately make technical recommendations on how NOAA can use these technologies to build a robust infrastructure that better serves their customers.

This lightning talk will describe the Microsoft Planetary Computer and how it leverages cloud-optimized data formats and the STAC specification to enable cloud native workflows on Azure.


This lightning talk will describe how various organizations are leveraging cloud-native geospatial solutions to fully automate the process of deriving insights.

Marc Pfister, Maxar

What is Maxar's ARD delivery format, and why did we choose it?

18:00 UTC

14:00 EDT

SpatioTemporal Asset Catalog (STAC) Overview & Lightning Talks

Moderator: Renee Pieschke, Radiant Earth

Matt Hanson, Element 84

This talk will describe what STAC is, what it is used for, and give an overview of the tools, datasets and broader ecosystem around STAC. It will touch on the core STAC specification, STAC extensions as well as STAC API.
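At the API level, STAC item search is just a small JSON document POSTed to a `/search` endpoint. A minimal sketch of that request body follows; the collection id and helper function are illustrative, not tied to any particular STAC API.

```python
# What a STAC API item search looks like on the wire: a plain JSON body
# sent to the API's /search endpoint. Collection name is hypothetical.
import json

def build_search(collections, bbox, start, end, limit=10):
    """Assemble a STAC API item-search request body."""
    return {
        "collections": collections,
        "bbox": bbox,                  # [west, south, east, north]
        "datetime": f"{start}/{end}",  # closed interval
        "limit": limit,
    }

body = build_search(["sentinel-2-l2a"], [5.0, 51.0, 6.0, 52.0],
                    "2022-04-01T00:00:00Z", "2022-04-20T00:00:00Z")
payload = json.dumps(body)  # ready to POST to {api}/search
```

The response is a GeoJSON FeatureCollection of items, each linking to its assets (often COGs), which is what the tools in the rest of this session build on.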

Matthias Mohr, Radiant Earth Foundation

STAC Browser is a browser-based tool to browse through and visualize static STAC catalogs and STAC APIs (including any referenced COGs). It's been recently rewritten from scratch and the new version 3 is soon to be released (beta). This talk will show the current state of STAC Browser with a highlight on new functionality that was not present in the previous versions.

Jupyter STAC UI

Darren Wiens, Sparkgeo

This talk will introduce stac-nb, a Jupyter Notebook/Lab user interface for querying any STAC API.

In this talk, I'll give an overview of the stactools ecosystem. I'll call out command-line functionality provided by stactools, and provide a quick tour of useful stactools packages. I'll also highlight issues and features where the community can help contribute to make stactools better!

A reflection on my experience of using STAC & AWS S3 for hosting EO data and making it available to OGC API server/client implementations. I will outline what I learnt and what was useful for someone relatively new to STAC.

Kitware offers ResonantGeoData: a Django web application for visualizing and searching geospatial datasets. We added support for interacting with these datasets via the STAC API. This lightning talk will discuss how we made a schema crosswalk from our current API and data models to comply with the STAC API.

Marco Wolsza, Friedrich Schiller University Jena

A new Analysis Ready Data (ARD) product for Copernicus Sentinel-1 is currently being developed: the Sentinel-1 Normalised Radar Backscatter Product (S1-NRB) [1].

This high-quality product is designed to be compliant with the CEOS ARD for Land (CARD4L) NRB specification [2] and aims to be a flexible and easily exploitable data source, also suited for non-expert users. By implementing innovative technologies and standards like Cloud Optimized GeoTIFF (COG) [3], Limited Error Raster Compression (LERC) [4], and the provision of extensive SpatioTemporal Asset Catalogue (STAC) [5] metadata, the accessibility and ease of use of the product can be maximized, especially in cloud computing infrastructures like data cubes, while keeping the data volume as low as possible.


This talk will give a quick overview of how STAC has so far been implemented in the S1-NRB product and prototype processor [6], and how it enables users to easily explore stacks of S1-NRB datasets as on-the-fly data cubes using open source libraries such as StackSTAC [7] or odc-stac [8].


[1] ESA, Sentinel-1 ARD Normalised Radar Backscatter (NRB) Product, 2022, https://sentinel.esa.int/web/sentinel/sentinel-1-ard-normalised-radar-backscatter-nrb-product

[2] CEOS, Analysis Ready Data For Land: Normalized Radar Backscatter, Version 5.5, 2021, https://ceos.org/ard/files/PFS/NRB/v5.5/CARD4L-PFS_NRB_v5.5.pdf

[3] Cloud Optimized GeoTIFF (COG), https://www.cogeo.org

[4] Limited Error Raster Compression (LERC), https://github.com/esri/lerc

[5] SpatioTemporal Asset Catalogs (STAC), https://stacspec.org

[6] S1-NRB prototype processor, https://github.com/SAR-ARD/S1_NRB

[7] StackSTAC, https://github.com/gjoseph92/stackstac

[8] odc-stac, https://github.com/opendatacube/odc-stac
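The "stacks of datasets as on-the-fly data cubes" idea can be miniaturized: libraries like StackSTAC and odc-stac derive the labelled axes of a (time, band, y, x) cube from a list of STAC items. The toy below uses minimal item dicts as stand-ins; real items carry full geospatial metadata.

```python
# A toy version of the first step stackstac/odc-stac perform: turn STAC
# search results into the labelled axes of a data cube.

def cube_axes(items, bands):
    """Sorted, de-duplicated time axis plus the requested band axis."""
    times = sorted({item["properties"]["datetime"] for item in items})
    return {"time": times, "band": list(bands)}

items = [
    {"properties": {"datetime": "2022-04-20T10:00:00Z"}},
    {"properties": {"datetime": "2022-04-19T10:00:00Z"}},
    {"properties": {"datetime": "2022-04-19T10:00:00Z"}},  # duplicate scene
]
axes = cube_axes(items, ["VV", "VH"])
```

The actual pixel reads are then deferred: each cube cell lazily maps to a windowed read of the corresponding COG asset.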



19:00 UTC

15:00 EDT

Jeff Albrecht, Arturo & Jon Healy, SparkGeo

Learn to build your own customized STAC API using stac-fastapi. Note that this tutorial will get into coding and thus requires at least some level of working software knowledge to get value out of it, but all are welcome.

20:00 UTC

16:00 EDT

Hosted by Aimee Barciauskas (Development Seed) and Chris Holmes (Planet)

This is an open forum to give space to anyone new to any aspect of the whole 'Cloud-Native Geo' world to ask questions and learn. There is no set agenda, so show up with your questions. We'll likely stay away from super technical topics, to make this as friendly as possible to beginners.


21:00 UTC

17:00 EDT

In this hands-on tutorial, we'll cover the basics of cloud-native geospatial data analysis at scale. We'll learn about STAC for discovery and searching a large catalog of data assets, and tools for accessing data in a cloud-native way. Note that this tutorial will get into coding and thus requires at least some level of working software knowledge to get value out of it, but all are welcome and you will get to see some cool capabilities.


We'll use the Microsoft Planetary Computer as a source for all the data. Users will be able to log into a JupyterHub deployment in the same Azure region as the data, granting fast access to the data. We'll learn the basics of using Dask for analyzing geospatial data at scale, including distributing computations on a cluster of machines.
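The core trick behind Dask's distributed aggregations can be shown with no Dask at all: each worker reduces its chunk to a (sum, count) pair, and the partials combine exactly into the global mean. Plain lists stand in for array chunks here.

```python
# The shape of a distributed mean: per-chunk partial reductions plus a
# combine step, which is how Dask computes mean() across many workers.

def partial(chunk):
    """Per-worker reduction: one chunk -> (sum, count)."""
    return (sum(chunk), len(chunk))

def combine(partials):
    """Driver-side combine: merge partials into the overall mean."""
    partials = list(partials)  # allow any iterable, including generators
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

chunks = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
mean = combine(partial(c) for c in chunks)  # 21 / 6 = 3.5
```

Because (sum, count) pairs merge associatively, it does not matter which machine reduces which chunk or in what order, which is what lets the cluster scale out.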

22:00 UTC

18:00 EDT

Lightning Talks

Moderator: Dean Hintz, Safe Software

Presentation of work helping the US Army Corps of Engineers (USACE) Civil Works Business Intelligence Program (CWBI) develop new open-source, cloud-native solutions for dynamic translation of spatio-temporal vector and raster waterways and climate data. Attendees will learn about the architectures, methodologies, and technologies implemented, along with lessons learned and recommendations for different applications and future research.

ArcGIS provides a deep set of raster data management, analysis, and visualization tools for imagery, terrain, and multidimensional data. ArcGIS makes it easy to use cloud-optimized data formats and cloud-native workflows across the toolset, APIs, applications, and devices. In this lightning talk, we'll highlight the main interfaces between ArcGIS, COG, STAC, and Zarr.

In this talk, you will learn about the deep learning frameworks and libraries that support cloud-native geospatial data formats for training and deploying machine learning models.


By understanding the current state of cloud-native geospatial data in machine learning, you'll be well-positioned to avoid common challenges and successfully build valuable models.


With this knowledge, you'll also be empowered to take part in shaping the future of advanced machine learning in the geospatial community.

The Growing STAC Ecosystem in Python

Nathan Zimmerman, Azavea

STAC has been on people's minds for a few years now. Recent advances in tooling are starting to bring its promise to life. In this talk, I'll discuss a few Python libraries that enable sophisticated workflows for serving catalogs and searching through their contents, as well as a complementary project that makes serving image assets from STAC sources a breeze.

At Kitware, we recently developed ResonantGeoData: a Django web application for visualizing and searching geospatial datasets, focusing on large geospatial images. We have worked to bring our existing tools for tile serving of pyramidal TIFFs to Django and work within GeoDjango’s offerings for quickly deploying geospatial image serving web applications. This lightning talk will cover the limitations of working with COGs for tile serving in GeoDjango and overview an approach we have taken to address some of those limitations.

Hub is a cloud-first format for storing large-scale datasets optimized for deep learning. We will discuss the future of handing off aerial image data to compute and visualizing datasets at scale in-browser.


References

- Hub - https://github.com/activeloopai/hub

- Visualization - https://app.activeloop.ai

How WeatherLayers uses STAC & COG to integrate weather visualization into web map applications

Grega Milcinski, Sinergise/Sentinel Hub

Handling variability of COG and ZARR within Sentinel Hub


Sentinel Hub is a cloud-native processing service for Earth Observation data. It is tightly integrated with various open and commercial collections (Sentinel, Landsat, PlanetScope and others). We have also added a "bring your own data" option for the COG and Zarr formats. While implementing support for these two, we realized that there is not just one "COG" or one "Zarr": there are plenty. We will share our experience working with both.


docs.sentinel-hub.com/api/latest/api/byoc/

docs.sentinel-hub.com/api/latest/api/zarr/


Break 23:00 UTC April 19th - 02:00 UTC April 20th (Wednesday)

02:00 UTC April 20th

22:00 EDT April 19th

Organizational Perspectives 2

Moderator: Alex Leith, Geoscience Australia

Daniel Silk, Toitū Te Whenua Land Information New Zealand

Toitū Te Whenua LINZ has a strategic intent to become cloud-native across our geospatial data ecosystem. The Basemaps team has developed aerial imagery map services that use COGs for scalable serverless delivery, along with STAC metadata for operational needs and attribution. Basemaps is open source and the map services are free to use, and we've also contributed to GDAL to improve COG creation along the way.

Pavel Dorovskoy, CarbonMapper, Inc.

I will talk about CarbonMapper's goals and how the STAC ecosystem contributes to our pipeline, data delivery, and services.

In 2019 Geoscience Australia announced the creation of the ambitious Digital Earth Africa (DE Africa) initiative, modelled on the success of Digital Earth Australia. DE Africa’s mission is to produce decision-ready products and to harness and increase the capacity of Earth observation users across the African continent. This is achieved by developing a platform on the public cloud that makes petabytes of Earth observation (EO) data accessible in Africa for free.


This talk will explore how cloud native geospatial technologies, such as the Cloud Optimised GeoTIFF data storage format and Spatio-Temporal Asset Catalog metadata standard, enable a small team of people to organize, share and analyse these petabytes of data. We’ll discuss how we work with the global EO community to develop standards that enable federation and interoperability. And we’ll demonstrate how DE Africa has been able to engage with people and organisations across Africa to help build capacity, from enabling individuals to run scientific analyses through to national space agencies setting up their own platform.

Using STAC to create reproducible ML Workflows on Radiant MLHub

The Cloud-Optimized GeoTIFF (COG) format is widely adopted in the industry for interactive analytics and dynamic maps. However, it's also a core enabler of less explored but much more powerful workflows, including real-time AI. We will demonstrate how COG helps disaster response by allowing damaged infrastructure to be detected at city scale in minutes or even seconds.

Cyrus Nikko A. Pante, Philippine Space Agency (PhilSA)

The DIWATA Image Browser is a QGIS plugin that allows users to browse and download optical images captured by the DIWATA-2 satellite, a technology demonstration satellite owned by the Philippines. This plugin utilizes STAC and COG to index, visualize, and distribute satellite images. We will demonstrate how the QGIS environment communicates with the STAC endpoint to search and retrieve desired STAC items. The plugin speeds up the workflow by letting users load the images straight into the QGIS environment, bypassing data preparation and making the data ready for analysis.

The Charter is a worldwide collaboration through which satellite data are made available for the benefit of disaster management. By combining Earth observation assets from different space agencies, the Charter allows resources and expertise to be coordinated for rapid response to major disaster situations, thereby helping civil protection authorities and the international humanitarian community.

Terradue was selected to design and operate a new online service to visualize and manipulate the satellite acquisitions at full resolution. After several months of development, a new portal named ESA Charter Mapper was officially opened in September 2021 to support Charter operations, and in particular the product screening activities. Behind the portal is a cloud-native platform integrating the latest state-of-the-art technologies for seamless visualization and manipulation of satellite imagery directly from a web browser.


Break 02:00 UTC - 09:00 UTC April 20th (Wednesday)

09:00 UTC April 20th

05:00 EDT April 20th

Lightning Talks

Moderator: Sanket Verma, Zarr Community Manager

Björn Harrtell, Septima P/S

History and purpose of the cloud optimized vector format FlatGeobuf

Cloud-optimized formats promise seamless, efficient consumption of geodata using web standards. Solutions ultimately depend on a vendor, and object storage quirks or differences in feature sets stand in the way of applications that “just work”.


This lightning talk overviews what I learned about each cloud in designing and deploying the open PMTiles archive specification for raster or vector tile pyramids, but with lessons applicable to other formats like COG and FlatGeobuf. Topics include: latency trade-offs of Byte Serving, compression, security, cost planning, CORS, and HTTP/2.
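The Byte Serving pattern PMTiles (and COG) rely on is simple enough to sketch: a single archive object plus a directory of (offset, length) entries per tile, so any tile is one HTTP range request away. The directory values below are made up.

```python
# Single-file tile archives over HTTP: a directory maps tile coordinates to
# byte extents, and a client fetches tiles with Range requests. No tile
# server required, just static object storage.

directory = {  # (z, x, y) -> (byte offset, byte length) in the archive
    (0, 0, 0): (127, 2048),
    (1, 0, 1): (2175, 1024),
}

def tile_request(z, x, y):
    """Return the header that fetches one tile from the archive."""
    offset, length = directory[(z, x, y)]
    return {"Range": f"bytes={offset}-{offset + length - 1}"}

hdr = tile_request(0, 0, 0)
```

The cloud-specific quirks the talk covers (CORS, compression, HTTP/2) all sit around this one request shape.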

Some crops, such as corn and soy, are difficult to differentiate in a single snapshot of time due to the similarity of their reflectance, aka spectral signatures. In this talk, we use temporal analysis with Planet COGs to investigate the change in the spectral signatures over the growth cycle and identify how these temporal signatures can be used to differentiate difficult crops.

EODAG usage demonstration for searching and accessing data from various data providers (STAC or not) into Xarray Datasets

Matthias Mohr and Edzer Pebesma, WWU Münster

The open standards, open source geospatial, and open science communities still have a very limited answer to the question of how researchers active in applied domains such as agriculture, ecology, hydrology, oceanography, or land use planning can benefit from the large amounts of open Earth Observation (EO) data currently available. Solutions are very much tied to platforms operated and controlled by big tech (Earth Engine, Planetary Computer), or to particular programming languages, software stacks, and/or file formats (xarray, Pangeo, ODC, GeoPySpark/GeoTrellis). The openEO initiative provides an API and a set of processes that separate the "what" from the "how": users specify what they want to compute, and back-end processing engines decide how to do it.


The openEO API is OpenAPI compliant, and has client interfaces for Python, R, and JavaScript, and in addition graphical user interfaces running in the browser or in QGIS. The underlying data model is that of a data cube view: image collections or vector data may be stored as they are, but are analysed as if they were laid out as a raster or vector data cube, e.g. for raster with dimensions x, y, band and time, or for vector with dimensions geometry, band and time. Because openEO assumes that imagery is described as STAC collections and the implementation is composed of open source components, it is relatively easy to set it up and compute on infrastructure where imagery is available through a STAC interface.


Having a single interface to carry out computations on back-ends with different architectures makes it possible to compare results across implementations and to verify that EO processing is reproducible. So far, over 100 processes have been defined, and user-defined functions written in Python or R extend this ad infinitum. Since the initiative is designed as open science, all users and developers are invited to engage. We will present what openEO is and how it is coupled with and uses other cloud-native technologies.
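The "what, not how" separation is concrete: an openEO request is a process graph, a JSON document naming processes and their arguments that any back-end can execute. A minimal hand-built graph computing a temporal mean is sketched below; the collection id and extents are hypothetical.

```python
# A minimal openEO-style process graph: load a collection, then reduce the
# time dimension with a mean. Nodes reference each other with "from_node",
# and "result": True marks the graph's output.

process_graph = {
    "load": {
        "process_id": "load_collection",
        "arguments": {
            "id": "SENTINEL2_L2A",  # hypothetical collection id
            "spatial_extent": {"west": 5.0, "south": 51.0,
                               "east": 6.0, "north": 52.0},
            "temporal_extent": ["2022-04-01", "2022-04-20"],
        },
    },
    "mean": {
        "process_id": "reduce_dimension",
        "arguments": {
            "data": {"from_node": "load"},  # wires the two nodes together
            "dimension": "t",
            "reducer": {"process_graph": {
                "m": {"process_id": "mean",
                      "arguments": {"data": {"from_parameter": "data"}},
                      "result": True},
            }},
        },
        "result": True,
    },
}
```

Because the graph carries no execution details, the same document can be submitted to any compliant back-end, which is what enables the cross-implementation comparisons mentioned above.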

I will talk about the Xpublish tool, a flexible FastAPI web application for serving Xarray data in Zarr and other formats. I'll briefly cover how Xpublish works and how it can be thought of as a flexible on-demand tile server for derived datasets.

Fabian Schindler, EOX IT Services GmbH

How to effectively use geotiff.js in Browsers

10:00 UTC

06:00 EDT

This tutorial will explain how STAC/COG perfectly fits tools like Coiled to scale satellite data processing.

Note that this tutorial will get into coding and thus requires at least some level of working software knowledge to get value out of it, but all are welcome.


The challenge in NetCarbon’s solution is deploying Earth observation insights at scale, while being able to shift between cloud providers or on-premise architectures if needed. The best tool for us so far is Pangeo.


An example of our Pangeo usage will be shown in the following three steps:


1) Connection to satellite data / Extract

2) Processing satellite data at scale / Transform

3) Saving the data within a data warehouse / Load


First, some of the building blocks for searching satellite data via STAC will be shown. The stackstac package will then be used to convert STAC items into an xarray, allowing researchers and companies to create their own data cubes with all the metadata inside.


The second part of the presentation will cover the computation layer. Algorithms such as filtering by cloud cover, applying a cloud mask, computing the land surface temperature, and applying an interpolation will be run. Land surface temperature is one of the inputs needed for the NetCarbon algorithm. The result of these steps is a Dask computation graph, which will be run at scale in the cloud using Dask and Coiled.


To conclude, the output of the processing part (spatial and temporal mean of the land surface temperature) will be displayed within a notebook, and finally the data will be loaded into a data warehouse (Google BigQuery).


All the steps will be demonstrated in a reproducible notebook.
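The interpolation step in this pipeline can be shown in miniature: cloud-masked observations leave gaps in a pixel's time series, which are filled linearly from the nearest valid neighbours. The function below is an illustrative stand-in for what the real Dask-backed computation would do per pixel.

```python
# Filling cloud-masked gaps in a pixel's time series by linear
# interpolation between the nearest valid observations.

def interpolate_gaps(series):
    """Linearly fill interior None gaps in a regularly spaced series."""
    out = list(series)
    valid = [i for i, v in enumerate(out) if v is not None]
    for a, b in zip(valid, valid[1:]):
        step = (out[b] - out[a]) / (b - a)
        for i in range(a + 1, b):
            out[i] = out[a] + step * (i - a)
    return out

# Two scenes lost to cloud between day 0 and day 3, one more at day 4:
filled = interpolate_gaps([10.0, None, None, 16.0, None, 20.0])
```

In the real pipeline this runs lazily across the whole cube, one more node in the Dask graph that Coiled then executes at scale.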

11:00 UTC

07:00 EDT

Organizational Perspectives 3

Moderator: Joana Simoes, OGC

Challenges of Migrating an In-House Catalog of Satellite Imagery to Cloud-Based Environments and Techniques (STAC, COG, etc.)

Rhys Evans, CEDA (Centre for Environmental Data Analysis)

The Centre for Environmental Data Analysis (CEDA) is a UK national data centre for atmospheric and earth observation research. We host over 15 Petabytes of atmospheric and earth observation data from a range of different sources, including aircraft campaigns, satellites, automatic weather stations, and climate models, amongst many more. Although there are data standards, these standards vary through time and across different domains. This leads to a heterogeneous archive in which there are many interpretations of how to structure datasets, describe metadata, and format the actual data files. Such an archive, containing 100s of millions of individual files, presents a significant indexing and cataloguing challenge.


Our existing work aims to produce a service that allows consistent and performant search across the entire archive. This standard would need to be flexible enough to cover the different data formats within the archive, while being specific enough to allow faceted and free-text search. Additionally, given the size and growth rate of the CEDA archive, it would need to be scalable. We are developing a solution built on the SpatioTemporal Asset Catalog (STAC) standard with an Elasticsearch backend. This has led to the creation of generators to build the STAC indexes, an Elasticsearch-backed API, and web and Python clients. Our prototype may have applications across any environmental data domain, as well as providing a possible next-generation Earth System Grid Federation (ESGF) search service.

UP42 provides access to a variety of data sources, which brings a variety of API definitions at every data ordering step, search in particular. At UP42 we decided to use the STAC spec as the unification layer between data hosts and the backend that serves the Public API. The talk highlights the value of using a single specification across the industry.

The Global Wind Atlas helps wind energy specialists from around the globe assess the wind energy potential of any given location on earth. More than 30,000 wind energy seekers per month visit globalwindatlas.info to find wind resource hot spots anywhere on earth. The web interface is designed to make it very easy for users to switch between layers and regions. COGs are a crucial part of the interface; this talk will explain how we used COGs to:

  • serve global raster files with 250m resolution in a mixed approach (pre-cooked tiles + dynamic tiling using TiTiler)

  • apply dynamic & highly interactive colour scaling to the raster layers directly in the interface by letting TiTiler serve TIFF map tiles instead of PNG map tiles

  • clip custom areas from global COG files on-the-fly (user-request-based)
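The dynamic half of the mixed approach above is driven by URL construction on the client side: each map tile is a request into TiTiler, with the source COG passed as a query parameter and the extension selecting TIFF instead of PNG output. The host and exact route below are hypothetical; check your TiTiler deployment's documentation for the precise paths it exposes.

```python
# Building a TiTiler-style dynamic tile URL for a remote COG. The raster
# source is just a query parameter, so swapping layers means swapping URLs.
from urllib.parse import urlencode

def dynamic_tile_url(endpoint, cog_url, z, x, y, fmt="tif"):
    """Compose a dynamic tile URL (hypothetical endpoint layout)."""
    query = urlencode({"url": cog_url})
    return f"{endpoint}/cog/tiles/{z}/{x}/{y}.{fmt}?{query}"

url = dynamic_tile_url("https://tiles.example.com",
                       "https://data.example.com/wind-speed.tif", 3, 4, 2)
```

Requesting TIFF tiles rather than PNG is what lets the browser re-colour the data interactively: the raw values arrive client-side, so colour scaling needs no round trip to the server.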


General info about Global Wind Atlas:

The Global Wind Atlas is a free, web-based application developed to help policymakers and investors identify potential high-wind areas for wind power generation virtually anywhere in the world, and perform preliminary calculations. The tool facilitates online queries and provides freely downloadable datasets based on the latest input data and modeling methodologies.

The Global Wind Atlas is the product of a partnership between the Department of Wind Energy at the Technical University of Denmark (DTU Wind Energy), Vortex and the World Bank Group (consisting of The World Bank and the International Finance Corporation, or IFC). Work on GWA was primarily funded by the Energy Sector Management Assistance Program (ESMAP). The GWA interface has been reimagined and developed from the bottom up by Nazka Mapps (www.nazka.be)

Basile Goussard, NetCarbon

NetCarbon will talk about how we managed to connect to multiple satellite data sources thanks to STAC and COG

Our use of stackstac and stac-fastapi at Satellite Vu gives us consistent access to our flight trial data for repeatable experimentation, integrating data from multiple STAC data sources. We will also mention our public static STAC for access to sample data from our flight trials.

Tom Kralidis, Meteorological Service of Canada

This talk will discuss the use of STAC as part of weather/climate/water workflows at the Meteorological Service of Canada, and considerations as part of UN World Meteorological Organization (WMO) future data exchange activities.

12:00 UTC

08:00 EDT

Ryan Abernathey, Columbia University & Sanket Verma, Zarr Community Manager

This tutorial will give a live hands-on introduction to working with Zarr in the cloud using Python. Participants will be encouraged to follow along in a Binder notebook. We will introduce the core objects in the zarr-python library (storage, arrays, and groups) and explain how to create and manipulate Zarr data directly. We will see how zarr-python leverages fsspec to provide interoperability with all major cloud object storage protocols. We will then shift gears to accessing Zarr via Xarray and explore some real-world use cases using public data from the Coupled Model Intercomparison Project (CMIP6) stored in S3. Materials will be posted at github.com/zarr-developers/tutorials
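Before the tutorial, it helps to see how little machinery the format itself needs: a Zarr array is metadata plus one storage object per chunk, named by its grid index. A stdlib-only sketch of the v2-style chunk-key layout (dot-separated indices) shows why object stores fit so naturally.

```python
# Enumerating the chunk objects a Zarr v2 array occupies in a store: one
# key per chunk, named by its position in the chunk grid.
from itertools import product
from math import ceil

def chunk_keys(shape, chunks):
    """Keys of every chunk object for an array of `shape` with `chunks`."""
    grid = [ceil(s / c) for s, c in zip(shape, chunks)]
    return [".".join(map(str, idx)) for idx in product(*map(range, grid))]

keys = chunk_keys(shape=(4, 6), chunks=(2, 3))
```

Each key maps directly to an object in S3 (or a file on disk), so reading one chunk is one GET, and writes to different chunks never contend, which is what makes parallel cloud access cheap.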

13:00 UTC

09:00 EDT

Zarr Overview & Lightning Talks

Moderator: Sanket Verma, Zarr Community Manager

Sanket Verma, Zarr Community Manager & Ryan Abernathey, Columbia University

Zarr is a cloud-native format for the storage of large multidimensional arrays. This overview will present the motivation for Zarr, the core Zarr specification, and the implementations of the Zarr protocol. We will focus on the Zarr Python ecosystem and how Zarr interoperates with NumPy, Dask, Xarray, and Filesystem Spec. We will also present some new developments in Zarr and review the ongoing effort to standardize the representation of geospatial raster data in Zarr.

Data service for weather data based on zarr

Nestor Tarin Burriel, MeteoSwiss

Gridefix is a data store and data retrieval API for gridded meteorological data built around zarr and xarray. Data stored in Gridefix are used for training statistical and machine learning methods for enhancing the quality of weather forecasts. The software consists of distinct modules providing data storage, import, catalogue, and retrieval.


This talk will focus on the architecture, using Zarr together with PostgreSQL as the metadata store and MinIO as the chunk store.

Visualizing Zarr data in web maps

Kata Martin, CarbonPlan

I will talk about the @carbonplan/maps library we built to visualize Zarr chunks as map "tiles" in interactive web maps. I will cover how the library integrates with Zarr and briefly demo maps we've built with the library.

CMIP6 Zarr Example Calculations Across 100s of CPU cores using Dask on AWS

Ethan Fahy, Amazon Web Services (AWS)

We'll perform example calculations with the CMIP6 Zarr-formatted dataset from the AWS Open Data Program in a Jupyter notebook. We'll use Xarray and Dask running on Amazon EKS to parallelize these calculations across hundreds of CPU cores and show how much faster they can be. When the calculations are done, we will automatically scale our CPU cores back down to demonstrate that with cloud computing you only pay for what you use.

Exploring Cloud backends for the new OGC Environmental Retrieval API

Shane Mill, NOAA - NWS - MDL

1. Use of Pangeo Technologies within an AWS Cloud Implementation of the OGC - EDR-API

- The use of Pangeo technologies provides optimized trimming and slicing of geospatial data. Xarray, Dask, Zarr, and Cloud Optimized GeoTIFF are the technologies that we have focused on.

- There is great synergy and growing support between the OGC and ESIP and the underlying cloud technologies to support APIs.

- The EDR-API specification allows great flexibility in how to configure an individual collection's backend, which doesn't necessarily need to use Pangeo.


2. Assessing the OGC EDR-API Performance with Big Data Program S3 Buckets

- It is difficult for users to interface with Big Data Program S3 Buckets that aren't Cloud Optimized.

- Through the ESIP NOAA Cloud Pathfinder Project, we are investigating cloud technologies and the interaction with BDP Buckets regardless of backend data formats. Zarr, NetCDF, Cloud Optimized GeoTIFF, and HDF are being considered.

- Through the ESIP NOAA Cloud Pathfinder Project, we are interacting with the National Water Model archive within a Big Data Program S3 Bucket in the form of Zarr, and we have found that there is a low barrier when interacting directly with Zarr.

- An important feature of the EDR-API is its alignment with CoverageJSON, which can be used both as a data format and for the visualization of backend datastores.

- Pangeo and the use of Dask provide a lot of versatility to horizontally and vertically scale the computation of data queries. For example, a Dask LocalCluster as well as a Dask SlurmCluster using AWS ParallelCluster have been prototyped with the EDR-API implementation.


3. Incorporating OGC APIs as Building Blocks - Integrating OGC API - Processes and OGC EDR-API

- We have created an interactive client that demonstrates the use of multiple OGC APIs as building blocks.

- There has been a buildout of Computational and Derived Parameter classes of Processes.

- The goal is to provide some decision support mechanism for users while at the same time reducing provider bandwidth utilization.

- Cloud Optimized GeoTIFF is one of several output formats supported with the OGC API Building Block client.

14:00 UTC

10:00 EDT

Charles Stern, Lamont-Doherty Earth Observatory (LDEO) and Rachel Wegener, Department of Atmospheric and Ocean Science, University of Maryland

This tutorial will present a hands-on introduction to Pangeo Forge, a new open source library and cloud platform for the production of Analysis-Ready, Cloud-Optimized (ARCO) data from legacy data formats. The tutorial will focus on the most common use case in our community: extracting NetCDF files from a data provider server and transforming them into a single Zarr archive in the cloud. We will also discuss other possible use cases, including application of Pangeo Forge to the production of COGs. The target audience of the tutorial is data users who appreciate working with ARCO data formats but don't necessarily know how to generate them themselves. Familiarity with basic Python coding is expected.
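The core transformation can be pictured as rechunking: many per-file arrays of uneven length are concatenated and rewritten as uniform chunks in one store, regardless of where the original file boundaries fell. A toy sketch of that step (a plain dict stands in for a cloud object store; this is not the Pangeo Forge API):

```python
# Toy sketch of the NetCDF-to-Zarr rechunking idea: concatenate
# per-file 1-D arrays along a dimension and write fixed-size chunks
# keyed by chunk index, as a Zarr store would be laid out.

def build_store(file_arrays, chunk_len):
    """Combine per-file arrays and write uniform chunks to a dict."""
    combined = [x for arr in file_arrays for x in arr]
    store = {}
    for i in range(0, len(combined), chunk_len):
        store[f"data/{i // chunk_len}"] = combined[i:i + chunk_len]
    return store
```

Note that chunk 1 in the example below spans two source files — the whole point of the transformation is that downstream readers never need to know the original file layout.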


The tutorial material is online here: https://pangeo-forge.readthedocs.io/en/latest/introduction_tutorial/index.html

15:00 UTC

11:00 EDT

Lightning Talks: Cloud-Native Vector

Moderator: Joana Simoes, OGC

Spatial Cloud Data Warehouses: The race towards a fully native geospatial cloud (10 Minutes)

Javier de la Torre, Carto


GeoParquet: a columnar format for geospatial vector data using Apache Parquet

Joris Van den Bossche, Voltron Data / GeoPandas

This lightning talk will give an introduction about Apache Parquet, how to store geospatial vector data in it with the GeoParquet specification, and why you would want to use this.
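In GeoParquet, geometries live in an ordinary Parquet binary column encoded as Well-Known Binary (WKB), which is what makes the format readable by generic Parquet tooling. A minimal sketch of that encoding for the Point case, using only the stdlib:

```python
# WKB encoding of a Point, as stored in a GeoParquet geometry column:
# one byte-order byte (1 = little-endian), a uint32 geometry type
# (1 = Point), then the x and y coordinates as float64.
import struct


def point_to_wkb(x, y):
    return struct.pack("<BIdd", 1, 1, x, y)


def wkb_to_point(wkb):
    _, gtype, x, y = struct.unpack("<BIdd", wkb)
    assert gtype == 1, "only Point is handled in this sketch"
    return (x, y)
```

A WKB Point is always 21 bytes (1 + 4 + 8 + 8); real implementations such as GeoPandas' `to_parquet` handle the full geometry type hierarchy.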

Spatial indexing in GeoParquet format

Eugene Cheipesh, Azavea

Update on the design of GeoParquet format, how it relates to spatial indexing of large vector datasets, and benchmarks and prototypes explored to date.
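One family of candidate orderings for spatially sorting rows is space-filling curves: quantize x/y, interleave their bits into a single key, and sort by it so nearby features land in nearby row groups. A stdlib sketch of the Z-order (Morton) variant, shown as illustration rather than as anything the GeoParquet spec has adopted:

```python
# Z-order (Morton) key: interleave the bits of quantized x and y so
# that sorting by the key clusters spatially nearby features together.

def morton_key(x, y, bits=16):
    """Interleave the low `bits` bits of two non-negative ints."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # x bits -> even positions
        key |= ((y >> i) & 1) << (2 * i + 1)  # y bits -> odd positions
    return key
```

Sorting a dataset by `morton_key` before writing means a bounding-box query can skip whole row groups using only their key ranges, which is the property the benchmarks in talks like this one try to measure.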

Potential of GeoParquet and GeoArrow on the Web

Kyle Barron, Foursquare

The new GeoParquet and GeoArrow specifications under discussion bring great promise to using Cloud-Native vector data from the Web. This talk will discuss the state of GeoParquet and GeoArrow and how to get involved.

GeoParquet Support in Apache Sedona

Mo Sarwat, Wherobots, Inc.

Learn about Apache Sedona, an open-source software for processing geospatial data at scale. We will also describe our plans to support the GeoParquet format in Sedona.

Cloud native formats for planet-scale vector tilesets

Markus Tremmel

Over the last few months I have worked on COMTiles (https://github.com/mactrem/com-tiles), a case study of a streamable, read-optimized, cloud-native archive format focused on the visualization of planet-scale tilesets in the browser. One main design goal of COMTiles is to minimize the number of HTTP GET range requests needed to download parts of the index, for both performance and cost reasons. With a COMTiles archive, most of the time only one additional pre-fetch per zoom level is needed before accessing the map tiles for the current viewport, which is barely (or not at all) noticeable to the user (see https://www.youtube.com/watch?v=5StxZbfvMUw). Three different approaches have been evaluated and will be presented in this talk: ordering the index on space-filling curves (Hilbert, Z-order, row-major), packing tile pyramids into directories, and aggregating the index in fragments. COMTiles is already used successfully in some projects, significantly reducing hosting costs because no tile server or backend (database, server) is needed.


Proposed structure of the talk

- Basic concepts of COMTiles, e.g. how to structure the index (pyramid of directories vs space-filling curves vs fragments) and how to define a tileset as part of the metadata based on the "OGC Two Dimensional Tile Matrix Set" draft

- Concept of tile request batching to reduce the number of requests by up to 90%

- Comparison of existing cloud native (vector) formats regarding the visualization of large geospatial datasets in the browser (PMTiles, FlatGeobuf, Tapalcatl, Cotar, COMTiles)

- Advantages regarding browser based visualization of streamable archive formats (like COMTiles or PMTiles) over directly hosting the map tiles in the cloud and over extending existing chunked formats like Zarr or TileDB for geospatial support
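What makes a tile archive "streamable" without a server is that the client can compute the byte range of any index entry itself and fetch it with a single HTTP GET Range request. A stdlib sketch of that arithmetic for a row-major index with fixed-size entries — the entry size and layout here are assumptions for illustration, not the actual COMTiles (or PMTiles) spec:

```python
# Client-side byte-range math for a serverless tile archive: with a
# row-major index of fixed-size entries per zoom level, the byte range
# of the entry for tile (zoom, x, y) is a pure function of the tile
# address, so one Range request suffices to locate the tile data.

ENTRY_SIZE = 8  # hypothetical bytes per index entry


def level_offset(zoom):
    """Number of index entries in all zoom levels below `zoom`."""
    return sum(4 ** z for z in range(zoom))


def index_byte_range(zoom, x, y):
    """Inclusive byte range (as in an HTTP Range header) of the
    index entry for tile (zoom, x, y)."""
    tiles_per_row = 2 ** zoom
    entry = level_offset(zoom) + y * tiles_per_row + x
    start = entry * ENTRY_SIZE
    return start, start + ENTRY_SIZE - 1
```

The batching idea mentioned above falls out of the same math: entries for neighboring tiles are adjacent (or nearly adjacent, with space-filling-curve ordering), so one wider Range request can cover a whole viewport's worth of index entries.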

Cloud native vector data with TileDB

Norman Barker, TileDB

Querying and accessing vector data with TileDB arrays

16:00 UTC

12:00 EDT

Lightning Talks

Moderator: Alex Mandel, Development Seed

How realtime geospatial data can work

Tom Macwright, Placemark

Collaboration in maps has many forms, from a geodatabase on a shared drive to web-native tools that update instantly. This talk will cover the cutting edge of collaboration technologies like CRDTs that are being used in many fields, and how they work well - and poorly - with geospatial data. Can realtime geospatial data be efficient? Can it be decentralized? This talk will reference Placemark, a geospatial platform in development, but approach the topic from the side of the technology and of where this field might be going.

QGIS STAC API Browser

Samweli Mwakisambwe, Kartoza

Since the development of STAC started, the STAC ecosystem has lacked a way to use STAC data in desktop software. Recently, through a collaboration between Kartoza and Microsoft, a QGIS (a desktop GIS application) plugin called “STAC API Browser” was developed to bridge the gap between QGIS users and STAC data.


Now, using the “STAC API Browser”, users can access, download, analyze, and use a vast amount of imagery data offered by various STAC providers, such as Microsoft Planetary Computer.


The aim of this talk is to introduce the “STAC API Browser” plugin, give a guide on how to use the plugin inside QGIS, showcase cool things that the plugin supports and how users/developers can collaborate on the plugin project.

Cloud-ready GeoServer

Andrea Aime, GeoSolutions

We will discuss GeoServer support for cloud technologies, in particular blob storage, COG, STAC, FlatGeoBuf, and the geoserver-cloud project.

Cloud Native Geo at Arturo

Jeff Albrecht, Arturo

How cloud native geospatial technology has transformed Arturo's tech and product over the past several years.

GeoRasterLayer: Cloud-Optimized GeoTIFF Visualization

Daniel Dufour, GeoSurge, LLC.

GeoRasterLayer is a LeafletJS plugin that visualizes GeoTIFFs. It takes full advantage of cloud-optimization by only reading the parts of a GeoTIFF (tiles) that fall within the bounds of a web map. It can run completely independent of a server by directly accessing a Cloud-Optimized GeoTIFF stored on S3 and other static hosting providers.
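The cloud-optimized trick GeoRasterLayer exploits is that a COG's pixels are stored as internal tiles (commonly 256x256 or 512x512), so a viewer only needs the tiles whose pixel extent intersects the current map window. A stdlib sketch of that tile selection (GeoRasterLayer itself is JavaScript; this is the same arithmetic in Python for illustration):

```python
# Given a pixel window into a tiled GeoTIFF, compute which internal
# tiles must be read. Everything outside these tiles is never fetched,
# which is what makes COG reads over HTTP cheap.

def tiles_for_window(col_off, row_off, width, height, tile_size=256):
    """Return (tile_col, tile_row) pairs covering a pixel window."""
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]
```

A 300x200 window starting at pixel (300, 100), for example, touches only four of the image's tiles, no matter how large the full raster is.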

stac-rs: a Rust implementation of the STAC specification

Peter Gadomski, Element 84

A quick tour through my Rust implementation of the STAC spec. Includes:

- Architecture decisions

- Where stac-rs could fit in to the STAC tooling ecosystem

- Roadmap

Using Leafmap to Visualize COG and STAC with Minimal Coding

Qiusheng Wu, University of Tennessee, Knoxville

Leafmap is a Python package for interactive mapping and geospatial analysis with minimal coding in a Jupyter environment. It provides an interactive graphical user interface for working with vector and raster datasets. Built upon localtileserver and titiler, leafmap supports visualizing any local raster datasets and Cloud Optimized GeoTIFFs (COG). Users can also use leafmap to discover a STAC catalog (e.g., Microsoft Planetary Computer), filter data, and visualize data interactively without coding. This presentation highlights the use of leafmap and the Jupyter ecosystem for visualizing COG and STAC.

STAC Discovery Tool: Data-driven data selection

Mykola Kozyr, UP42

The talk highlights the flexibility of STAC and the values it brings by integrating into Location Business Intelligence tools. The STAC Discovery tool is just a first step towards making data-driven decisions about which data can and should be used for a given use case.

The end of the talk will showcase the concept of a scalable platform that enables discovering data over multiple data sources.

17:00 UTC

13:00 EDT

This tutorial will explore the use of the STAC API Filter Extension to query a catalog with CQL2. No software development experience is required, but at least a basic understanding of STAC and STAC API is assumed.

18:00 UTC

14:00 EDT

Cloud-Optimized Point Clouds (COPC) and Various Talks

Moderator: Matt Hanson, Element 84

An overview of what COPC is, why it was developed, the problem it solves, and pointers to tools people can use to go forward to build their own solutions.

STAC Video Extension Introduction

Darren Wiens, Sparkgeo

This talk will introduce potential users to the STAC video extension.

STAC and Google Earth Engine

Kurt Schwehr, Google Earth Engine

This talk will share Google Earth Engine's support of STAC and describe the experience of switching from a custom proto/yaml catalog format to STAC built on Jsonnet.

Defining STAC in an uncompromising way

Serhii Hulko, UP42

The current STAC specification is flexible, sometimes too flexible. This brings huge benefits in terms of being able to cover a variety of data catalogs. At the same time, developing integrations on top of such a flexible specification can be difficult. This talk focuses on the idea of a stricter definition of STAC.

Building Communities leveraging Cloud-Native Geospatial Open Data on AWS

Mike Jeffe, Amazon Web Services (AWS)

This talk will focus on how users can find public STAC-enabled geospatial data on AWS, talk about some customer examples, and highlight a few exciting opportunities ahead.

Geocube and MultiDataset COG

Vincent Varoquaux, Airbus DS Geo SA

Geocube is a cloud-native geospatial database optimized to deliver timeseries of images thanks to the MultiDataset COG, a modified version of the COG that stores several images efficiently.

Cloud-Native Geospatial and the Open Geospatial Consortium

Scott Simmons, OGC

This talk will share all that the OGC is doing in support of cloud-native geospatial, giving a status update of the official adoption of key cloud-native geospatial standards like COG and Zarr.

I3S: A Web Optimized Format for Large 3D Data

Matt Berra, Esri

Esri has been collaborating with the OGC community on Indexed 3D Scene Layers (I3S) since 2017. As a format that was specifically designed for the rapid streaming of large caches of 3D data over the web, the I3S dataset could integrate as part of a cloud-native focused geospatial solution.

Using Cloud-Optimized GeoTIFFs to Support Scalable, On-Demand Drone Solutions at Airspace Link

Quercus Hamlin, Airspace Link

Airspace Link is creating the digital infrastructure required to support the safe and legal use of drones in communities at scale, advancing the growth of drone operations, drone service providers, drone manufacturers, package delivery, and air taxi deployment in the future. These scalable, on-demand services are built on an extensive repository of geospatial data managed in ArcGIS Online and the Azure stack. Cloud-Optimized GeoTIFFs and the open-source Python package rasterio form the foundation of Airspace Link’s scalable raster data analytics.


By using COGs, we can aggregate raster data such as land use, elevation, and population density into h3 hexagon data in seconds, supporting diverse analytical calculations including on-demand suitability, attitude and elevation data transformations, and population density distribution forecasts. COG infrastructure also enables sub-second delivery of point-based elevation telemetry. Overall, COGs have proven an exceptional addition to our stack, and we are well positioned to integrate other cloud-native geospatial formats in the future.
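Point-based elevation lookups against a COG come down to inverting the raster's affine geotransform: map the query coordinate to a pixel (row, col), then read only the internal tile containing that pixel. A stdlib sketch of the coordinate step, restricted to the common north-up case (the function name is illustrative; rasterio's `index()` does this for arbitrary affines):

```python
# Map a coordinate to the pixel that contains it, for a north-up
# raster described by its origin and pixel sizes (pixel_h is negative
# for north-up rasters, following the GDAL geotransform convention).

def coord_to_pixel(x, y, origin_x, origin_y, pixel_w, pixel_h):
    """Return (row, col) of the pixel containing map coordinate (x, y)."""
    col = int((x - origin_x) / pixel_w)
    row = int((y - origin_y) / pixel_h)
    return row, col
```

Because the answer is a single pixel, the subsequent COG read touches one tile at most, which is what makes sub-second per-point telemetry feasible over plain HTTP.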

Visualize, Download, and Use ESRI 10m Global Land Use Dataset in QGIS using QGIS Actions and COGs

Abdul Raheem Siddiqui, Dewberry

I will walk through the following article, which I published at medium.com/p/71667c623311.

Cloud native formats and the OGC API

Jerome Jacovella-St-Louis, Ecere

How cloud-native formats (COG, COPC, Zarr, GeoParquet...) can serve both as an efficient backend for OGC API implementations (either as remote or local data sources), enabling support for accessing an Area of Interest and Resolution of Interest (downsampled overviews), and as an output format for the data delivery end-points (e.g., /coverage, /map, /items).