Anna Petrasova, Vaclav Petras
NCSU GeoForAll Lab
at the
Center for Geospatial Analytics
NC State University
Open-source desktop GIS
Processing backend in QGIS
Image source: baharmon.github.io/grass-in-qgis
Geospatial data analytics tool in RStudio
Image source: https://veroandreo.github.io/grass_opengeohub2021/presentation.html
Geovisualization and data analytics tool in a Python notebook
Geoprocessing engine running in HPC environment
Geospatial platform for developing custom models
Cloud geoprocessing backend
Image source: neteler.gitlab.io/actinia-introduction/
r.neighbors input=elevation output=elevation_smoothed size=15 nprocs=4
Using 4 cores captures most of the speedup while keeping parallel efficiency high.
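The same call can be made from Python; a minimal sketch, assuming an active GRASS session that contains the elevation raster:
import grass.script as gs
# nprocs is passed like any other tool option
gs.run_command("r.neighbors", input="elevation", output="elevation_smoothed",
               size=15, nprocs=4)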
Run multiple independent tasks in the background
r.grow.distance input=roads distance=dist_to_roads &
r.grow.distance input=water distance=dist_to_water &
r.grow.distance input=forest distance=dist_to_forest &
Run tasks with GNU Parallel (or alternatives)
echo r.grow.distance input=roads distance=dist_to_roads > jobs.txt
echo r.grow.distance input=water distance=dist_to_water >> jobs.txt
echo r.grow.distance input=forest distance=dist_to_forest >> jobs.txt
parallel --jobs 3 < jobs.txt
Run "hybrid" tasks:
r.neighbors input=forest output=forest_percentage size=37 nprocs=4 &
r.neighbors input=wetland output=wetland_percentage size=37 nprocs=4 &
from grass.pygrass.modules.grid import GridModule
# split the current region into tiles, run the tool on each tile in parallel,
# and patch the results back together
grd = GridModule("v.to.rast", input="roads", output="roads",
                 use="val", processes=4)
grd.run()
r.mapcalc.tiled "log_dist = if (dist == 0, 0, log(dist))" nprocs=4
from multiprocessing import Pool
def compute(value):
    # do something with the value
    ...
with Pool(processes=4) as pool:
    pool.map(compute, range(0, 10))
Used in r.sun.daily, r.in.usgs, t.rast.what, r.viewshed.exposure, ...
from multiprocessing import Pool
import grass.script as gs
def viewshed(point):
    x, y, cat = point
    name = f"viewshed_{cat}"
    gs.run_command("r.viewshed", input="elevation", output=name,
                   coordinates=(x, y))
    return name
# viewpoints = [(x1, y1, category1), (x2, y2, category2), ...]
with Pool(processes=4) as pool:
    maps = pool.map(viewshed, viewpoints)
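Continuing the example above with one possible follow-up (not part of the original slide): since visible cells are non-null in the r.viewshed output, the returned maps can be combined into a per-cell count of viewpoints with r.series:
# maps and gs come from the example above
gs.run_command("r.series", input=maps, output="visible_from_count",
               method="count")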
import os
from multiprocessing import Pool
import grass.script as gs
def viewshed(point):
    x, y, cat = point
    name = f"viewshed_{cat}"
    os.environ["GRASS_REGION"] = gs.region_env(e=x + 300, w=x - 300,
                                               n=y + 300, s=y - 300,
                                               align="elevation")
    gs.run_command("r.viewshed", input="elevation", output=name,
                   coordinates=(x, y), max_distance=300)
    return name
# viewpoints = [(x1, y1, category1), (x2, y2, category2), ...]
with Pool(processes=4) as pool:
    maps = pool.map_async(viewshed, viewpoints).get()
from tqdm import tqdm
from multiprocessing import Pool
import grass.script as gs
def viewshed(point):
    x, y, cat = point
    name = f"viewshed_{cat}"
    gs.run_command("r.viewshed", input="elevation", output=name,
                   coordinates=(x, y))
    return name
# viewpoints = [(x1, y1, category1), (x2, y2, category2), ...]
with Pool(processes=4) as pool:
    maps = list(tqdm(pool.imap(viewshed, viewpoints),
                     total=len(viewpoints)))
#pragma omp parallel if(threaded) private(row, col, i)
{
    int t_id = 0;
#if defined(_OPENMP)
    t_id = omp_get_thread_num();
#endif
    struct input *in = inputs[t_id];
    DCELL *val = values[t_id];
    ...
from grass.benchmark import benchmark_nprocs, nprocs_plot
from grass.pygrass.modules import Module
module = Module("r.neighbors", input="DEM", size=9, output="benchmark",
                run_=False, overwrite=True)
result = benchmark_nprocs(module, label="size 9", max_nprocs=12, repeat=3)
nprocs_plot([result], filename="benchmark.svg")
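To compare how the window size affects scaling, the same benchmarking functions can be run in a loop; a sketch, assuming the same DEM raster and illustrative window sizes:
from grass.benchmark import benchmark_nprocs, nprocs_plot
from grass.pygrass.modules import Module
results = []
for size in (9, 15, 31):
    module = Module("r.neighbors", input="DEM", size=size, output="benchmark",
                    run_=False, overwrite=True)
    results.append(benchmark_nprocs(module, label=f"size {size}",
                                    max_nprocs=12, repeat=3))
nprocs_plot(results, filename="benchmark_sizes.svg")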
Focal (neighborhood) operations on a raster
\[ \text{parallel efficiency} = \frac{\text{serial processing time}}{N \times \text{parallel processing time with } N \text{ cores}} \]
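As a quick worked example of the formula, using the r.slope.aspect row from the table later in this section (37 min serial, 12 min on 12 cores):
# parallel efficiency = serial time / (N * parallel time with N cores)
serial_time = 37      # minutes, serial run
parallel_time = 12    # minutes with 12 cores
n_cores = 12
print(serial_time / (n_cores * parallel_time))  # about 0.26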
Use more cores for r.neighbors with large window sizes.
Run module in existing mapset:
grass ~/grassdata/US_albers/visibility --exec r.viewshed input=elevation ...
Run module in a newly created mapset:
grass -c ~/grassdata/US_albers/visibility --exec r.viewshed input=elevation ...
Run Python script in existing mapset:
grass ~/grassdata/US_albers/visibility --exec python viewshed_script.py
Run Python script in a temporary mapset:
grass --tmp-mapset ~/grassdata/US_albers/ --exec python viewshed_script.py
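A minimal sketch of what viewshed_script.py could contain (names and coordinates are illustrative); because --exec starts the script inside a GRASS session, it can call tools directly:
# viewshed_script.py (illustrative): runs inside the session set up by --exec
import grass.script as gs
gs.run_command("r.viewshed", input="elevation", output="viewshed",
               coordinates=(637000, 222000))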
Generate commands:
jobs.sh
grass ~/grassdata/nc_spm_08_grass7/analysis1 --exec python myscript.py 1
grass ~/grassdata/nc_spm_08_grass7/analysis2 --exec python myscript.py 2
grass ~/grassdata/nc_spm_08_grass7/analysis3 --exec python myscript.py 3
grass ~/grassdata/nc_spm_08_grass7/analysis4 --exec python myscript.py 4
grass ~/grassdata/nc_spm_08_grass7/analysis5 --exec python myscript.py 5
grass ~/grassdata/nc_spm_08_grass7/analysis6 --exec python myscript.py 6
...
Run in parallel:
parallel --jobs 8 < jobs.sh
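One way (among others) to generate jobs.sh is with a short script; a sketch, assuming mapsets named analysis1, analysis2, ... and an assumed number of jobs:
n_jobs = 100  # assumed number of mapsets/jobs
with open("jobs.sh", "w") as jobs:
    for i in range(1, n_jobs + 1):
        jobs.write(f"grass ~/grassdata/nc_spm_08_grass7/analysis{i}"
                   f" --exec python myscript.py {i}\n")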
| GRASS tool     | window size | cores | serial time | with OpenMP |
|----------------|-------------|-------|-------------|-------------|
| r.slope.aspect | 3 x 3       | 12    | 37 min      | 12 min      |
| r.mfilter      | 61 x 61     | 32    | 4.9 days    | 4.3 h       |
jobs.txt
grass state_1 --exec r.futures.simulation subregions=state_1 ...
grass state_2 --exec r.futures.simulation subregions=state_2 ...
grass state_3 --exec r.futures.simulation subregions=state_3 ...
texasproud.com/how-big-is-texas-its-huge
petrasovaa.github.io/FUTURES-CONUS-talk/foss4gNA2023.html