Anna Petrasova, Vaclav Petras
NCSU GeoForAll Lab
at the
Center for Geospatial Analytics
NC State University
Tool-level parallelization
r.neighbors input=elevation output=elevation_smoothed size=15 nprocs=4
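The same tool-level parallelization from Python (a minimal sketch using grass.script; parameters as in the command above, with nprocs passed like any other parameter):

import grass.script as gs

# r.neighbors uses up to 4 threads internally
gs.run_command("r.neighbors", input="elevation",
               output="elevation_smoothed", size=15, nprocs=4)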
Workflow-level parallelization
r.grow.distance input=roads distance=dist_to_roads &
r.grow.distance input=water distance=dist_to_water &
r.grow.distance input=forest distance=dist_to_forest &
#pragma omp parallel if(threaded) private(row, col, i)
{
    int t_id = 0;
#if defined(_OPENMP)
    t_id = omp_get_thread_num();
#endif
    /* each thread reads from and writes to its own buffers */
    struct input *in = inputs[t_id];
    DCELL *val = values[t_id];
    ...
More tools coming in GRASS GIS 8.3:
Authors of OpenMP code: Hofierka et al. 2017 and Aaron Saw Min Sern (GSoC 2021)
from multiprocessing import Pool

def compute(value):
    # do something with the value
    pass

with Pool(processes=4) as pool:
    # get() blocks until all tasks finish (the pool is terminated on exit)
    pool.map_async(compute, range(0, 10)).get()
Used by GRASS tools and addons calling other GRASS tools:
from multiprocessing import Pool

import grass.script as gs

def inundate(level):
    name = f"inundation_{level}"
    gs.run_command("r.lake", elevation="elevation", lake=name,
                   seed="seed", water_level=level)
    return name

with Pool(processes=4) as pool:
    maps = pool.map_async(inundate, range(0, 10)).get()
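Continuing the example, the collected names can feed an aggregation step, e.g., counting in how many scenarios each cell is flooded (a sketch; the r.series output name is a placeholder):

# maps is the list of inundation rasters returned by the pool above
gs.run_command("r.series", input=maps, output="inundation_count", method="count")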
Run tasks in the background:
r.grow.distance input=roads distance=dist_to_roads &
r.grow.distance input=water distance=dist_to_water &
r.grow.distance input=forest distance=dist_to_forest &
Run tasks with GNU Parallel (or alternatives):
echo r.grow.distance input=roads distance=dist_to_roads > jobs.txt
echo r.grow.distance input=water distance=dist_to_water >> jobs.txt
echo r.grow.distance input=forest distance=dist_to_forest >> jobs.txt
parallel --jobs 3 < jobs.txt
Run "hybrid" tasks:
r.neighbors input=forest output=forest_percentage size=37 nprocs=4 &
r.neighbors input=wetland output=wetland_percentage size=37 nprocs=4 &
from grass.pygrass.modules.grid import GridModule

grd = GridModule("v.to.rast", input="roads", output="roads",
                 use="val", processes=4)
grd.run()
r.mapcalc.tiled "log_dist = if (dist == 0, 0, log(dist))" nprocs=4
\[\mbox{parallel efficiency} = \frac{\mbox{serial processing time}}{N \times \mbox{parallel processing time with N cores}} \]
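Equivalently, since speedup is serial time over parallel time, efficiency is speedup divided by the number of cores; e.g., for r.mfilter in the benchmark table below:

\[\mbox{parallel efficiency} = \frac{\mbox{speedup}}{N} = \frac{11.2}{12} \approx 93\,\% \]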
➨ Use more cores for r.neighbors and r.mfilter with large window sizes.
(derived with the grass.benchmark library; see the sketch below)
➨ Use 4 cores to get most speed improvements with high parallel efficiency.
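A minimal benchmarking sketch (assuming the grass.benchmark API as in GRASS 8.2; tool parameters follow the earlier r.neighbors example):

from grass.benchmark import benchmark_nprocs, nprocs_plot
from grass.pygrass.modules import Module

# run r.neighbors repeatedly with 1 up to 8 threads and record run times
result = benchmark_nprocs(
    Module("r.neighbors", input="elevation", output="elevation_smoothed",
           size=15, run_=False),
    label="r.neighbors size 15",
    max_nprocs=8,
    repeat=3,
)
nprocs_plot([result])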
Set a different computational region for each parallel task:

import os
from multiprocessing import Pool

import grass.script as gs

def viewshed(point):
    x, y, cat = point
    name = f"viewshed_{cat}"
    # sets the region by modifying the global environment of the process
    os.environ["GRASS_REGION"] = gs.region_env(e=x + 300, w=x - 300,
                                               n=y + 300, s=y - 300,
                                               align="elevation")
    gs.run_command("r.viewshed", input="elevation", output=name,
                   coordinates=(x, y), max_distance=300)
    return name

# viewpoints = [(x1, y1, category1), (x2, y2, category2), ...]
with Pool(processes=4) as pool:
    maps = pool.map_async(viewshed, viewpoints).get()
Better: modify a copy of the environment and pass it to the tool, leaving the global environment untouched:

import os
from multiprocessing import Pool

import grass.script as gs

def viewshed(point):
    x, y, cat = point
    name = f"viewshed_{cat}"
    # modify a copy of the environment instead of os.environ itself
    env = os.environ.copy()
    env["GRASS_REGION"] = gs.region_env(e=x + 300, w=x - 300,
                                        n=y + 300, s=y - 300,
                                        align="elevation")
    gs.run_command("r.viewshed", input="elevation", output=name,
                   coordinates=(x, y), max_distance=300, env=env)
    return name

# viewpoints = [(x1, y1, category1), (x2, y2, category2), ...]
with Pool(processes=4) as pool:
    maps = pool.map_async(viewshed, viewpoints).get()
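Continuing the example, the per-point viewsheds can afterwards be combined into a single raster, e.g., with r.patch (a sketch; the output name is a placeholder, and the region must cover all viewpoints):

# maps is the list of viewshed rasters returned by the pool above
gs.run_command("r.patch", input=maps, output="viewsheds")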
Overhead from merging data and I/O:

| process (12 cores, 2 billion cells) | run time (MM:SS) | speedup | efficiency |
|---|---|---|---|
| rasterization with GridModule | 00:35 | 1.9 | 16 % |
| r.mapcalc.tiled (simple expression) | 00:39 | 1.8 | 15 % |
➨ Use for large data
➨ Use for long-running, in-memory computations
➨ Use r.mapcalc.tiled for complex raster algebra expressions
➨ Slice data horizontally to reduce I/O overhead (see the sketch below)
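A horizontal-slicing sketch with GridModule (assuming its width and height parameters; the height value is a placeholder):

from grass.pygrass.modules.grid import GridModule

import grass.script as gs

# tiles spanning all columns of the region are horizontal slices
cols = gs.region()["cols"]
grd = GridModule("v.to.rast", input="roads", output="roads",
                 use="val", width=cols, height=1000, processes=4)
grd.run()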
Run module in existing mapset:
grass ~/grassdata/US_albers/visibility --exec r.viewshed input=elevation ...
Run module in a newly created mapset:
grass -c ~/grassdata/US_albers/visibility --exec r.viewshed input=elevation ...
Run Python script in existing mapset:
grass ~/grassdata/US_albers/visibility --exec python viewshed_script.py
Run Python script in a temporary mapset:
grass --tmp-mapset ~/grassdata/US_albers/ --exec python viewshed_script.py
Generate commands:
jobs.sh
grass ~/grassdata/nc_spm_08_grass7/analysis1 --exec python myscript.py 1
grass ~/grassdata/nc_spm_08_grass7/analysis2 --exec python myscript.py 2
grass ~/grassdata/nc_spm_08_grass7/analysis3 --exec python myscript.py 3
grass ~/grassdata/nc_spm_08_grass7/analysis4 --exec python myscript.py 4
grass ~/grassdata/nc_spm_08_grass7/analysis5 --exec python myscript.py 5
grass ~/grassdata/nc_spm_08_grass7/analysis6 --exec python myscript.py 6
...
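Such a jobs file can be generated with a short script; a minimal sketch (the number of analyses is a placeholder):

# write one 'grass ... --exec' line per analysis into jobs.sh
with open("jobs.sh", "w") as f:
    for i in range(1, 7):  # extend the range as needed
        f.write(f"grass ~/grassdata/nc_spm_08_grass7/analysis{i}"
                f" --exec python myscript.py {i}\n")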
Run in parallel:
parallel --jobs 8 < jobs.sh
➨ Do not use a mask in parallel within the same mapset:

from multiprocessing import Pool

import grass.script as gs

def process(category):
    # all parallel processes share the single MASK raster of the mapset
    gs.run_command("r.mask", raster=f"mask_{category}")
    # do some other stuff

with Pool(processes=4) as pool:
    pool.map_async(process, range(0, 10)).get()
This will be addressed in GRASS 8.3 with PR 2390 and PR 2392
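Until then, one workaround is to run each task in its own mapset (as above), or to emulate the mask with raster algebra so that nothing is shared (a sketch; raster names are placeholders):

import grass.script as gs

def process(category):
    # per-process "mask": keep cells only where mask_{category} is non-null
    gs.mapcalc(f"masked_{category} = if(mask_{category}, input_raster, null())")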
➨ Do not reclassify the same raster in parallel with r.reclass:
r.reclass creates a virtual raster that records backlinks by editing the underlying raster file.
Instead, use e.g., r.recode to create an independent copy (see the sketch below).
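A sketch of such a copy with r.recode (rules use the old_low:old_high:new_low:new_high format; names and values are placeholders):

import grass.script as gs

# "1:5:10:10" recodes values 1 through 5 to 10, writing a full new raster
gs.write_command("r.recode", input="zones", output="zones_recoded",
                 rules="-", stdin="1:5:10:10")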
Parallelized workflow for Southeast US at 30 m (2 billion cells)
➨ github.com/petrasovaa/parallel-GRASS-FUTURES-notebook
Scaled up to Contiguous US (16 billion cells) on institutional HPC
r.futures.devpressure (internally uses r.mfilter)
| window size | cores | run time (HH:MM) | serial time |
|---|---|---|---|
| 61 x 61 | 32 | 4:20 | 4.9 days |
| tool | cores | run time (MM:SS) | serial time (MM:SS) |
|---|---|---|---|
| r.slope.aspect | 12 | 11:54 | 36:50 |
| r.mapcalc.tiled | 12 | 09:44 | 20:33 |
50 states x 50 stochastic runs => 2500 cores
Distribute tasks across nodes using MPI (Message Passing Interface):
jobs.txt
grass ~/grassdata/albers/state_1 --exec r.futures.simulation subregions=state_1 ...
grass ~/grassdata/albers/state_2 --exec r.futures.simulation subregions=state_2 ...
grass ~/grassdata/albers/state_3 --exec r.futures.simulation subregions=state_3 ...
submit.sh
mpiexec python -m mpi4py -m pynodelauncher jobs.txt
github.com/ncsu-landscape-dynamics/pynodelauncher
texasproud.com/how-big-is-texas-its-huge
12 cores, 2 billion cells (Southeast US at 30 m resolution), System76 Thelio with Ubuntu 20.04
| tool | window size | run time (MM:SS) | speedup | efficiency |
|---|---|---|---|---|
| r.slope.aspect | 3 x 3 | 00:32 | 2.7 | 22.6 % |
| r.neighbors | 37 x 37 | 09:16 | 10.1 | 83.8 % |
| r.mfilter | 61 x 61 | 48:33 | 11.2 | 93.4 % |