Anna Petrasova, Vaclav Petras
NCSU GeoForAll Lab
at the
Center for Geospatial Analytics
NC State University
Tool-level parallelization
r.neighbors input=elevation output=elevation_smoothed size=15 nprocs=4
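The same tool-level parallelization from Python (a minimal sketch using grass.script; parameters as in the command above, with nprocs passed like any other parameter):

import grass.script as gs

# r.neighbors uses up to 4 threads internally
gs.run_command("r.neighbors", input="elevation",
               output="elevation_smoothed", size=15, nprocs=4)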
Workflow-level parallelization
r.grow.distance input=roads distance=dist_to_roads &
r.grow.distance input=water distance=dist_to_water &
r.grow.distance input=forest distance=dist_to_forest &
#pragma omp parallel if(threaded) private(row, col, i)
{
    int t_id = 0;
#if defined(_OPENMP)
    t_id = omp_get_thread_num();
#endif
    /* each thread reads from and writes to its own buffers */
    struct input *in = inputs[t_id];
    DCELL *val = values[t_id];
    ...
More tools coming in GRASS GIS 8.3:
Authors of OpenMP code: Hofierka et al. 2017 and Aaron Saw Min Sern (GSoC 2021)
from multiprocessing import Pool

def compute(value):
    # do something with the value
    pass

with Pool(processes=4) as pool:
    # get() blocks until all tasks finish (the pool is terminated on exit)
    pool.map_async(compute, range(0, 10)).get()
Used by GRASS tools and addons calling other GRASS tools:
from multiprocessing import Pool

import grass.script as gs

def inundate(level):
    name = f"inundation_{level}"
    gs.run_command("r.lake", elevation="elevation", lake=name,
                   seed="seed", water_level=level)
    return name

with Pool(processes=4) as pool:
    maps = pool.map_async(inundate, range(0, 10)).get()
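Continuing the example, the collected names can feed an aggregation step, e.g., counting in how many scenarios each cell is flooded (a sketch; the r.series output name is a placeholder):

# maps is the list of inundation rasters returned by the pool above
gs.run_command("r.series", input=maps, output="inundation_count", method="count")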
Run tasks in the background:
r.grow.distance input=roads distance=dist_to_roads &
r.grow.distance input=water distance=dist_to_water &
r.grow.distance input=forest distance=dist_to_forest &
Run tasks with GNU Parallel (or alternatives):
echo r.grow.distance input=roads distance=dist_to_roads > jobs.txt
echo r.grow.distance input=water distance=dist_to_water >> jobs.txt
echo r.grow.distance input=forest distance=dist_to_forest >> jobs.txt
parallel --jobs 3 < jobs.txt
Run "hybrid" tasks:
r.neighbors input=forest output=forest_percentage size=37 nprocs=4 &
r.neighbors input=wetland output=wetland_percentage size=37 nprocs=4 &
from grass.pygrass.modules.grid import GridModule

grd = GridModule("v.to.rast", input="roads", output="roads",
                 use="val", processes=4)
grd.run()
r.mapcalc.tiled "log_dist = if (dist == 0, 0, log(dist))" nprocs=4
\[\mbox{parallel efficiency} = \frac{\mbox{serial processing time}}{N \times \mbox{parallel processing time with N cores}} \]
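Equivalently, since speedup is serial time over parallel time, efficiency is speedup divided by the number of cores; e.g., for r.mfilter in the benchmark table below:

\[\mbox{parallel efficiency} = \frac{\mbox{speedup}}{N} = \frac{11.2}{12} \approx 93\,\% \]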
➨ Use more cores for r.neighbors and r.mfilter with large window sizes.
(derived with the grass.benchmark library; see the sketch below)
➨ Use 4 cores to get most speed improvements with high parallel efficiency.
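A minimal benchmarking sketch (assuming the grass.benchmark API as in GRASS 8.2; tool parameters follow the earlier r.neighbors example):

from grass.benchmark import benchmark_nprocs, nprocs_plot
from grass.pygrass.modules import Module

# run r.neighbors repeatedly with 1 up to 8 threads and record run times
result = benchmark_nprocs(
    Module("r.neighbors", input="elevation", output="elevation_smoothed",
           size=15, run_=False),
    label="r.neighbors size 15",
    max_nprocs=8,
    repeat=3,
)
nprocs_plot([result])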
Set a different computational region for each parallel task:

import os
from multiprocessing import Pool

import grass.script as gs

def viewshed(point):
    x, y, cat = point
    name = f"viewshed_{cat}"
    # sets the region by modifying the global environment of the process
    os.environ["GRASS_REGION"] = gs.region_env(e=x + 300, w=x - 300,
                                               n=y + 300, s=y - 300,
                                               align="elevation")
    gs.run_command("r.viewshed", input="elevation", output=name,
                   coordinates=(x, y), max_distance=300)
    return name

# viewpoints = [(x1, y1, category1), (x2, y2, category2), ...]
with Pool(processes=4) as pool:
    maps = pool.map_async(viewshed, viewpoints).get()
Better: modify a copy of the environment and pass it to the tool, leaving the global environment untouched:

import os
from multiprocessing import Pool

import grass.script as gs

def viewshed(point):
    x, y, cat = point
    name = f"viewshed_{cat}"
    # modify a copy of the environment instead of os.environ itself
    env = os.environ.copy()
    env["GRASS_REGION"] = gs.region_env(e=x + 300, w=x - 300,
                                        n=y + 300, s=y - 300,
                                        align="elevation")
    gs.run_command("r.viewshed", input="elevation", output=name,
                   coordinates=(x, y), max_distance=300, env=env)
    return name

# viewpoints = [(x1, y1, category1), (x2, y2, category2), ...]
with Pool(processes=4) as pool:
    maps = pool.map_async(viewshed, viewpoints).get()
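Continuing the example, the per-point viewsheds can afterwards be combined into a single raster, e.g., with r.patch (a sketch; the output name is a placeholder, and the region must cover all viewpoints):

# maps is the list of viewshed rasters returned by the pool above
gs.run_command("r.patch", input=maps, output="viewsheds")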
Overhead from merging data and I/O:

| process (12 cores, 2 billion cells) | run time (MM:SS) | speedup | efficiency |
|---|---|---|---|
| rasterization with GridModule | 00:35 | 1.9 | 16 % |
| r.mapcalc.tiled (simple expression) | 00:39 | 1.8 | 15 % |
➨ Use for large data
➨ Use for long-running, in-memory computations
➨ Use r.mapcalc.tiled for complex raster algebra expressions
➨ Slice data horizontally to reduce I/O overhead (see the sketch below)
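A horizontal-slicing sketch with GridModule (assuming its width and height parameters; the height value is a placeholder):

from grass.pygrass.modules.grid import GridModule

import grass.script as gs

# tiles spanning all columns of the region are horizontal slices
cols = gs.region()["cols"]
grd = GridModule("v.to.rast", input="roads", output="roads",
                 use="val", width=cols, height=1000, processes=4)
grd.run()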
Run module in existing mapset:
grass ~/grassdata/US_albers/visibility --exec r.viewshed input=elevation ...
Run module in a newly created mapset:
grass -c ~/grassdata/US_albers/visibility --exec r.viewshed input=elevation ...
Run Python script in existing mapset:
grass ~/grassdata/US_albers/visibility --exec python viewshed_script.py
Run Python script in a temporary mapset:
grass --tmp-mapset ~/grassdata/US_albers/ --exec python viewshed_script.py
Generate commands:
jobs.sh
grass ~/grassdata/nc_spm_08_grass7/analysis1 --exec python myscript.py 1
grass ~/grassdata/nc_spm_08_grass7/analysis2 --exec python myscript.py 2
grass ~/grassdata/nc_spm_08_grass7/analysis3 --exec python myscript.py 3
grass ~/grassdata/nc_spm_08_grass7/analysis4 --exec python myscript.py 4
grass ~/grassdata/nc_spm_08_grass7/analysis5 --exec python myscript.py 5
grass ~/grassdata/nc_spm_08_grass7/analysis6 --exec python myscript.py 6
...
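Such a jobs file can be generated with a short script; a minimal sketch (the number of analyses is a placeholder):

# write one 'grass ... --exec' line per analysis into jobs.sh
with open("jobs.sh", "w") as f:
    for i in range(1, 7):  # extend the range as needed
        f.write(f"grass ~/grassdata/nc_spm_08_grass7/analysis{i}"
                f" --exec python myscript.py {i}\n")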
Run in parallel:
parallel --jobs 8 < jobs.sh
➨ Do not use a mask in parallel within the same mapset:

from multiprocessing import Pool

import grass.script as gs

def process(category):
    # all parallel processes share the single MASK raster of the mapset
    gs.run_command("r.mask", raster=f"mask_{category}")
    # do some other stuff

with Pool(processes=4) as pool:
    pool.map_async(process, range(0, 10)).get()
This will be addressed in GRASS 8.3 with PR 2390 and PR 2392
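Until then, one workaround is to run each task in its own mapset (as above), or to emulate the mask with raster algebra so that nothing is shared (a sketch; raster names are placeholders):

import grass.script as gs

def process(category):
    # per-process "mask": keep cells only where mask_{category} is non-null
    gs.mapcalc(f"masked_{category} = if(mask_{category}, input_raster, null())")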
➨ Do not reclassify the same raster in parallel with r.reclass:
r.reclass creates a virtual raster that records backlinks by editing the underlying raster file.
Instead, use e.g., r.recode to create an independent copy (see the sketch below).
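A sketch of such a copy with r.recode (rules use the old_low:old_high:new_low:new_high format; names and values are placeholders):

import grass.script as gs

# "1:5:10:10" recodes values 1 through 5 to 10, writing a full new raster
gs.write_command("r.recode", input="zones", output="zones_recoded",
                 rules="-", stdin="1:5:10:10")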
Parallelized workflow for Southeast US at 30 m (2 billion cells)
➨ github.com/petrasovaa/parallel-GRASS-FUTURES-notebook
Scaled up to Contiguous US (16 billion cells) on institutional HPC
r.futures.devpressure (internally uses r.mfilter)
| window size | cores | run time (HH:MM) | serial time |
|---|---|---|---|
| 61 x 61 | 32 | 4:20 | 4.9 days |
| tool | cores | run time (MM:SS) | serial time (MM:SS) |
|---|---|---|---|
| r.slope.aspect | 12 | 11:54 | 36:50 |
| r.mapcalc.tiled | 12 | 09:44 | 20:33 |
50 states x 50 stochastic runs => 2500 cores
Distribute tasks across nodes using MPI (Message Passing Interface):
jobs.txt
grass ~/grassdata/albers/state_1 --exec r.futures.simulation subregions=state_1 ...
grass ~/grassdata/albers/state_2 --exec r.futures.simulation subregions=state_2 ...
grass ~/grassdata/albers/state_3 --exec r.futures.simulation subregions=state_3 ...
submit.sh
mpiexec python -m mpi4py -m pynodelauncher jobs.txt
github.com/ncsu-landscape-dynamics/pynodelauncher
texasproud.com/how-big-is-texas-its-huge
12 cores, 2 billion cells (Southeast US at 30 m resolution), System76 Thelio with Ubuntu 20.04
| tool | window size | run time (MM:SS) | speedup | efficiency |
|---|---|---|---|---|
| r.slope.aspect | 3 x 3 | 00:32 | 2.7 | 22.6 % |
| r.neighbors | 37 x 37 | 09:16 | 10.1 | 83.8 % |
| r.mfilter | 61 x 61 | 48:33 | 11.2 | 93.4 % |