When processing large batches of satellite imagery for spectral index calculation, I kept running into the same bottleneck: `gdal_calc.py` was taking longer than I'd like for routine operations like NDVI and EVI calculation across hundreds of Sentinel-2 tiles.

Don't get me wrong: `gdal_calc.py` is an excellent general-purpose tool that handles virtually any raster calculation you can throw at it. Its ability to evaluate arbitrary mathematical expressions makes it invaluable for ad-hoc analysis and complex custom formulas. But when you're calculating the same spectral indices repeatedly on large datasets, that flexibility comes with a performance cost.
Working with Sentinel-2 data from the Parc Agrari del Baix Llobregat project, I was processing thousands of 10,980×10,980 pixel images to generate time series of vegetation indices. A typical NDVI calculation on a single tile was taking around 12 seconds with `gdal_calc.py`:
```bash
# Typical gdal_calc.py command
time gdal_calc.py -A B08.jp2 -B B04.jp2 --calc="(A-B)/(A+B)" --outfile=ndvi.tif
# real 0m12.47s
```
When you're processing hundreds of tiles across multiple years, those seconds add up quickly. The math was straightforward, `(NIR-RED)/(NIR+RED)`, but the general-purpose nature of `gdal_calc.py` meant parsing expressions at runtime and handling arbitrary formulas rather than leveraging the optimizations possible for known calculations.
It's worth noting that gdal_calc.py uses NumPy for the actual mathematical operations, which is already quite efficient for array computations. The performance bottleneck wasn't the math itself, but rather the overhead from expression parsing, suboptimal I/O patterns, and missing opportunities for specialization when repeatedly calculating the same indices.
I realized that most of my work involved calculating well-known spectral indices: NDVI, EVI, SAVI, NDWI, and a handful of others. These aren't arbitrary mathematical expressions—they're standardized formulas with known inputs and outputs.
This presented an opportunity to build something more focused. Instead of a tool that could calculate any formula, what if I built one that calculated spectral indices really well?
The key insights were:

- Most of my workload used a small set of standardized indices, not arbitrary expressions.
- A formula that is known ahead of time doesn't need to be parsed at runtime.
- Per-pixel index math parallelizes cleanly across chunks, and I/O can be tuned for it.
I chose Rust for the implementation, mainly for its performance characteristics and excellent parallelization libraries. The core idea was straightforward: instead of parsing `(A-B)/(A+B)` every time, implement each spectral index with its own optimized calculation function that processes pixel data in parallel chunks.
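As a rough illustration of the approach (a minimal sketch, not the actual raster-calc code), a dedicated NDVI kernel might look like this, assuming the `rayon` crate for parallelism and flat `f32` band buffers:

```rust
use rayon::prelude::*;

/// NDVI = (NIR - RED) / (NIR + RED), computed over flat pixel buffers.
/// Chunks are processed in parallel; division by zero yields the nodata value.
fn ndvi(nir: &[f32], red: &[f32], out: &mut [f32], nodata: f32) {
    const CHUNK: usize = 64 * 1024; // chunk size is a tunable assumption

    out.par_chunks_mut(CHUNK)
        .zip(nir.par_chunks(CHUNK).zip(red.par_chunks(CHUNK)))
        .for_each(|(o, (n, r))| {
            for i in 0..o.len() {
                let sum = n[i] + r[i];
                o[i] = if sum == 0.0 { nodata } else { (n[i] - r[i]) / sum };
            }
        });
}
```

Because the formula is fixed at compile time, there is no expression parsing at all, and the compiler is free to vectorize the inner loop.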
Each index gets its own implementation that knows exactly what computation to perform, how to handle edge cases like division by zero, and what the expected input and output formats should be. The tool supports the most common spectral indices with dedicated implementations for each:

- NDVI: `(NIR-RED)/(NIR+RED)` - vegetation health
- EVI: `2.5*(NIR-RED)/(NIR+6*RED-7.5*BLUE+1)` - enhanced vegetation signal
- SAVI: `((NIR-RED)/(NIR+RED+L))*(1+L)` - soil-adjusted vegetation
- NDWI: `(GREEN-NIR)/(GREEN+NIR)` - water content

The results were encouraging. Testing with the same Sentinel-2 tile:
| Implementation | NDVI (float32) | NDVI (int16) |
|---|---|---|
| gdal_calc.py | 12.47s | 6.97s |
| raster-calc | 2.86s | 2.69s |
| Speedup | 4.4x | 2.6x |
The performance gain comes from several optimizations:

- No runtime expression parsing: each index is a dedicated, compiled code path.
- Parallel processing of pixel chunks across available cores.
- I/O patterns tuned to the fixed band-in, index-out workload rather than arbitrary expressions.
One feature that proved particularly useful was batch processing. Instead of running individual commands, you can define multiple operations in a JSON configuration:
```json
{
  "global": {
    "compress": "DEFLATE",
    "threads": 12
  },
  "operations": [
    {
      "type": "ndi",
      "params": {"a": "B08.jp2", "b": "B04.jp2"},
      "output": "ndvi.tif"
    },
    {
      "type": "evi",
      "params": {"a": "B08.jp2", "b": "B04.jp2", "c": "B02.jp2"},
      "output": "evi.tif"
    }
  ]
}
```
The batch mode includes dataset caching, so if multiple indices use the same input bands, the files are only read once. This provided additional speedups when calculating multiple indices from the same imagery.
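The caching idea is simple; here is a minimal sketch (illustrative only, with `read_band` as a hypothetical placeholder for the actual JP2 decoding):

```rust
use std::collections::HashMap;

/// Caches decoded band buffers by path, so each input file is read once
/// per batch even when several operations reference the same band.
struct BandCache {
    bands: HashMap<String, Vec<f32>>,
}

impl BandCache {
    fn get_or_load(&mut self, path: &str) -> &[f32] {
        self.bands
            .entry(path.to_string())
            .or_insert_with(|| read_band(path))
    }
}

// Hypothetical reader: in practice this would decode the band via GDAL bindings.
fn read_band(path: &str) -> Vec<f32> {
    unimplemented!("decode {path} into a pixel buffer")
}
```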
Building a specialized tool meant making some trade-offs:

What you gain:

- Substantially faster execution for the supported indices (4x+ in these tests)
- Batch processing with dataset caching
- Built-in handling of scaling, nodata, and output formats for each known index

What you lose:

- The ability to evaluate arbitrary expressions from the command line
- Easy prototyping: adding a new index means writing code, not a new formula string
I still use `gdal_calc.py` regularly: it's perfect for custom calculations, prototyping new indices, or one-off mathematical operations. But when I'm processing production datasets with standard spectral indices, the specialized tool has become my go-to solution.
The performance difference becomes significant when processing large batches of imagery or calculating multiple indices from the same source data.
A few technical details that proved important:
Input scaling: Sentinel-2 L2A data comes pre-scaled by 10,000, which affects indices with constants (like EVI, SAVI). The tool handles this automatically:
```bash
# For L2A data, scaling is applied automatically for indices that need it
raster-calc evi -a B08_L2A.jp2 -b B04_L2A.jp2 -c B02_L2A.jp2 --input-scale-factor 10000
```
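The reason this matters: EVI's constants assume reflectance in the [0, 1] range, so L2A digital numbers must be divided by the scale factor before the formula is applied. A sketch of the computation (illustrative, not the tool's actual code):

```rust
/// EVI = 2.5 * (NIR - RED) / (NIR + 6*RED - 7.5*BLUE + 1); the "+1" term
/// assumes reflectance in [0, 1], while L2A stores reflectance * 10000.
fn evi(nir_dn: f32, red_dn: f32, blue_dn: f32, scale: f32) -> f32 {
    let (nir, red, blue) = (nir_dn / scale, red_dn / scale, blue_dn / scale);
    2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
}
```

Pure ratios like NDVI are unaffected by a constant scale factor, which is why only indices with additive constants need this handling.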
Output formats: Both floating-point and fixed-point integer outputs are supported, with the latter providing smaller file sizes and faster processing.
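For example, NDVI values in [-1, 1] can be stored as scaled integers; a sketch of the conversion (the scale factor is an assumed convention here, not necessarily what raster-calc uses):

```rust
/// Store a float index value as int16 by fixed-point scaling,
/// halving storage compared to float32 output.
fn to_fixed_i16(value: f32, scale: f32) -> i16 {
    (value * scale).round().clamp(i16::MIN as f32, i16::MAX as f32) as i16
}
```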
Parallel tuning: The tool automatically detects optimal thread counts, but can be configured for different system configurations.
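With rayon, the detect-with-override pattern might look like this (a sketch under that assumption):

```rust
/// Build the global thread pool: honor an explicit thread count if given,
/// otherwise fall back to the number of available logical cores.
fn init_threads(requested: Option<usize>) {
    let n = requested.unwrap_or_else(|| {
        std::thread::available_parallelism().map(|p| p.get()).unwrap_or(1)
    });
    rayon::ThreadPoolBuilder::new()
        .num_threads(n)
        .build_global()
        .expect("thread pool should only be initialized once");
}
```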
Building raster-calc taught me about the value of specialized tools in a workflow. General-purpose solutions like `gdal_calc.py` are essential for flexibility and experimentation, but when you have well-defined, repeated operations, there's significant value in optimization.
The 4x performance improvement wasn't just about using a faster language—it came from understanding the specific problem domain and optimizing for it. Sometimes the best tool for the job is the one you build yourself, focused on solving your particular bottleneck.
For routine spectral index calculation on large datasets, the specialized approach has proven worthwhile.
For everything else, I'm thankful that tools like `gdal_calc.py` exist to handle the infinite variety of spatial analysis needs.
The tool is available as open source at https://github.com/ominiverdi/raster-calc, and the performance gains have made a noticeable difference in processing workflows that involve large-scale vegetation monitoring and agricultural assessment.
Special thanks to Inicola for his guidance on the Rust implementation and optimization techniques.