When processing large batches of satellite imagery for spectral index calculation, I kept running into the same bottleneck: `gdal_calc.py` was taking longer than I'd like for routine operations like NDVI and EVI calculation across hundreds of Sentinel-2 tiles.

Don't get me wrong: `gdal_calc.py` is an excellent general-purpose tool that handles virtually any raster calculation you can throw at it. Its ability to evaluate arbitrary mathematical expressions makes it invaluable for ad-hoc analysis and complex custom formulas. But when you're calculating the same spectral indices repeatedly on large datasets, that flexibility comes with a performance cost.
Working with Sentinel-2 data from the Parc Agrari del Baix Llobregat project, I was processing thousands of 10,980×10,980 pixel images to generate time series of vegetation indices. A typical NDVI calculation on a single tile was taking around 12 seconds with `gdal_calc.py`:
```bash
# Typical gdal_calc.py command
time gdal_calc.py -A B08.jp2 -B B04.jp2 --calc="(A-B)/(A+B)" --outfile=ndvi.tif
# real 0m12.47s
```
When you're processing hundreds of tiles across multiple years, those seconds add up quickly. The math was straightforward, `(NIR-RED)/(NIR+RED)`, but the general-purpose nature of `gdal_calc.py` meant parsing expressions at runtime and handling arbitrary formulas rather than leveraging the optimizations possible for known calculations.
It's worth noting that gdal_calc.py uses NumPy for the actual mathematical operations, which is already quite efficient for array computations. The performance bottleneck wasn't the math itself, but rather the overhead from expression parsing, suboptimal I/O patterns, and missing opportunities for specialization when repeatedly calculating the same indices.
I realized that most of my work involved calculating well-known spectral indices: NDVI, EVI, SAVI, NDWI, and a handful of others. These aren't arbitrary mathematical expressions—they're standardized formulas with known inputs and outputs.
This presented an opportunity to build something more focused. Instead of a tool that could calculate any formula, what if I built one that calculated spectral indices really well?
The key insights were:

- Most of my workload used a small set of standardized indices, not arbitrary expressions.
- A formula that is known ahead of time doesn't need to be parsed at runtime.
- Per-pixel index math parallelizes cleanly across chunks, and I/O can be tuned for it.
I chose Rust for the implementation, mainly for its performance characteristics and excellent parallelization libraries. The core idea was straightforward: instead of parsing `(A-B)/(A+B)` every time, implement each spectral index with its own optimized calculation function that processes pixel data in parallel chunks.
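As a rough illustration of the approach (a minimal sketch, not the actual raster-calc code), a dedicated NDVI kernel might look like this, assuming the `rayon` crate for parallelism and flat `f32` band buffers:

```rust
use rayon::prelude::*;

/// NDVI = (NIR - RED) / (NIR + RED), computed over flat pixel buffers.
/// Chunks are processed in parallel; division by zero yields the nodata value.
fn ndvi(nir: &[f32], red: &[f32], out: &mut [f32], nodata: f32) {
    const CHUNK: usize = 64 * 1024; // chunk size is a tunable assumption

    out.par_chunks_mut(CHUNK)
        .zip(nir.par_chunks(CHUNK).zip(red.par_chunks(CHUNK)))
        .for_each(|(o, (n, r))| {
            for i in 0..o.len() {
                let sum = n[i] + r[i];
                o[i] = if sum == 0.0 { nodata } else { (n[i] - r[i]) / sum };
            }
        });
}
```

Because the formula is fixed at compile time, there is no expression parsing at all, and the compiler is free to vectorize the inner loop.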
Each index gets its own implementation that knows exactly what computation to perform, how to handle edge cases like division by zero, and what the expected input and output formats should be. The tool supports the most common spectral indices with dedicated implementations for each:

- NDVI: `(NIR-RED)/(NIR+RED)` - vegetation health
- EVI: `2.5*(NIR-RED)/(NIR+6*RED-7.5*BLUE+1)` - enhanced vegetation signal
- SAVI: `((NIR-RED)/(NIR+RED+L))*(1+L)` - soil-adjusted vegetation
- NDWI: `(GREEN-NIR)/(GREEN+NIR)` - water content

The results were encouraging. Testing with the same Sentinel-2 tile:
| Implementation | NDVI (float32) | NDVI (int16) |
|---|---|---|
| gdal_calc.py | 12.47s | 6.97s |
| raster-calc | 2.86s | 2.69s |
| Speedup | 4.4x | 2.6x |
The performance gain comes from several optimizations:

- No runtime expression parsing: each index is a dedicated, compiled code path.
- Parallel processing of pixel chunks across available cores.
- I/O patterns tuned to the fixed band-in, index-out workload rather than arbitrary expressions.
One feature that proved particularly useful was batch processing. Instead of running individual commands, you can define multiple operations in a JSON configuration:
```json
{
  "global": {
    "compress": "DEFLATE",
    "threads": 12
  },
  "operations": [
    {
      "type": "ndi",
      "params": {"a": "B08.jp2", "b": "B04.jp2"},
      "output": "ndvi.tif"
    },
    {
      "type": "evi",
      "params": {"a": "B08.jp2", "b": "B04.jp2", "c": "B02.jp2"},
      "output": "evi.tif"
    }
  ]
}
```
The batch mode includes dataset caching, so if multiple indices use the same input bands, the files are only read once. This provided additional speedups when calculating multiple indices from the same imagery.
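The caching idea is simple; here is a minimal sketch (illustrative only, with `read_band` as a hypothetical placeholder for the actual JP2 decoding):

```rust
use std::collections::HashMap;

/// Caches decoded band buffers by path, so each input file is read once
/// per batch even when several operations reference the same band.
struct BandCache {
    bands: HashMap<String, Vec<f32>>,
}

impl BandCache {
    fn get_or_load(&mut self, path: &str) -> &[f32] {
        self.bands
            .entry(path.to_string())
            .or_insert_with(|| read_band(path))
    }
}

// Hypothetical reader: in practice this would decode the band via GDAL bindings.
fn read_band(path: &str) -> Vec<f32> {
    unimplemented!("decode {path} into a pixel buffer")
}
```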
Building a specialized tool meant making some trade-offs:

What you gain:

- Substantially faster execution for the supported indices (4x+ in these tests)
- Batch processing with dataset caching
- Built-in handling of scaling, nodata, and output formats for each known index

What you lose:

- The ability to evaluate arbitrary expressions from the command line
- Easy prototyping: adding a new index means writing code, not a new formula string
I still use `gdal_calc.py` regularly: it's perfect for custom calculations, prototyping new indices, or one-off mathematical operations. But when I'm processing production datasets with standard spectral indices, the specialized tool has become my go-to solution.
The performance difference becomes significant when processing large batches of imagery or calculating multiple indices from the same source data.
A few technical details that proved important:
Input scaling: Sentinel-2 L2A data comes pre-scaled by 10,000, which affects indices with constants (like EVI, SAVI). The tool handles this automatically:
```bash
# For L2A data, scaling is applied automatically for indices that need it
raster-calc evi -a B08_L2A.jp2 -b B04_L2A.jp2 -c B02_L2A.jp2 --input-scale-factor 10000
```
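The reason this matters: EVI's constants assume reflectance in the [0, 1] range, so L2A digital numbers must be divided by the scale factor before the formula is applied. A sketch of the computation (illustrative, not the tool's actual code):

```rust
/// EVI = 2.5 * (NIR - RED) / (NIR + 6*RED - 7.5*BLUE + 1); the "+1" term
/// assumes reflectance in [0, 1], while L2A stores reflectance * 10000.
fn evi(nir_dn: f32, red_dn: f32, blue_dn: f32, scale: f32) -> f32 {
    let (nir, red, blue) = (nir_dn / scale, red_dn / scale, blue_dn / scale);
    2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)
}
```

Pure ratios like NDVI are unaffected by a constant scale factor, which is why only indices with additive constants need this handling.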
Output formats: Both floating-point and fixed-point integer outputs are supported, with the latter providing smaller file sizes and faster processing.
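For example, NDVI values in [-1, 1] can be stored as scaled integers; a sketch of the conversion (the scale factor is an assumed convention here, not necessarily what raster-calc uses):

```rust
/// Store a float index value as int16 by fixed-point scaling,
/// halving storage compared to float32 output.
fn to_fixed_i16(value: f32, scale: f32) -> i16 {
    (value * scale).round().clamp(i16::MIN as f32, i16::MAX as f32) as i16
}
```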
Parallel tuning: The tool automatically detects optimal thread counts, but can be configured for different system configurations.
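With rayon, the detect-with-override pattern might look like this (a sketch under that assumption):

```rust
/// Build the global thread pool: honor an explicit thread count if given,
/// otherwise fall back to the number of available logical cores.
fn init_threads(requested: Option<usize>) {
    let n = requested.unwrap_or_else(|| {
        std::thread::available_parallelism().map(|p| p.get()).unwrap_or(1)
    });
    rayon::ThreadPoolBuilder::new()
        .num_threads(n)
        .build_global()
        .expect("thread pool should only be initialized once");
}
```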
Building raster-calc taught me about the value of specialized tools in a workflow. General-purpose solutions like `gdal_calc.py` are essential for flexibility and experimentation, but when you have well-defined, repeated operations, there's significant value in optimization.
The 4x performance improvement wasn't just about using a faster language—it came from understanding the specific problem domain and optimizing for it. Sometimes the best tool for the job is the one you build yourself, focused on solving your particular bottleneck.
For routine spectral index calculation on large datasets, the specialized approach has proven worthwhile.
For everything else, I'm thankful that tools like `gdal_calc.py` exist to handle the infinite variety of spatial analysis needs.
The tool is available as open source at https://github.com/ominiverdi/raster-calc, and the performance gains have made a noticeable difference in processing workflows that involve large-scale vegetation monitoring and agricultural assessment.
Special thanks to Inicola for his guidance on the Rust implementation and optimization techniques.