Getting started#
Sertit utils is a set of functions for geo analysis. Blablabla
Open a raster image#
from sertit import AnyPath, rasters
from sertit.unistra import unistra_s3
with unistra_s3():
pth = AnyPath("s3://sertit-sertit-utils-ci/tutorials/getting_started/image.tif")
r = rasters.read(pth)
For the moment, let ignore the unistra_s3() call since you don’t need it to read local images, we’ll talk about it later.
What happened in this code ?
We initialize a geotiff file with AnyPath
and then read it thanks to rasters.read
. What is the return type of this function ?
r[0:2, :, :]
<xarray.DataArray 'image' (band: 2, y: 1497, x: 2224)> Size: 27MB dask.array<getitem, shape=(2, 1497, 2224), dtype=float32, chunksize=(1, 122, 2224), chunktype=numpy.ndarray> Coordinates: * band (band) int64 16B 1 2 * x (x) float64 18kB 3.487e+05 3.487e+05 ... 3.931e+05 3.931e+05 * y (y) float64 12kB 3.965e+06 3.965e+06 ... 3.935e+06 3.935e+06 spatial_ref int64 8B 0 Attributes: AREA_OR_POINT: Area scale_factor: 1.0 add_offset: 0.0
r[0:3, ::10, ::10].plot.imshow(robust=True)
<matplotlib.image.AxesImage at 0x7f04fc2f7850>

r.data
|
The return type in xarray.DataArray
.
xarray
is a python library to work with labelled multi-dimensional arrays.
Actually, sertit
use rioxarray to open your raster. rioxarray
extends xarray
with the rio
accessor. Nice. What does it mean ? Simply that you can read and process geographic images thanks to rioxarray
while xarray
only understand “classical” images.
Then what is a geographic DataArray ? It’s a containers with data, coordinates and attributes. Let’s print them all.
# Print x and y coordinates
print("X coordinates:\n", r.x)
print("\n\nY ccordinates: ", r.y)
X coordinates:
<xarray.DataArray 'x' (x: 2224)> Size: 18kB
array([348659.997752, 348679.993255, 348699.988759, ..., 393070.011241,
393090.006745, 393110.002248])
Coordinates:
* x (x) float64 18kB 3.487e+05 3.487e+05 ... 3.931e+05 3.931e+05
spatial_ref int64 8B 0
Y ccordinates: <xarray.DataArray 'y' (y: 1497)> Size: 12kB
array([3964760.00334, 3964740.01002, 3964720.0167 , ..., 3934889.9833 ,
3934869.98998, 3934849.99666])
Coordinates:
* y (y) float64 12kB 3.965e+06 3.965e+06 ... 3.935e+06 3.935e+06
spatial_ref int64 8B 0
Coordinates are themselfs DataArray ! What about the data and spatial_ref ? Spatial ref can be accessed with r.spatial_ref
. For data, it’s a little bit tricky because rioxarray
does not print it by default. But you can access it with the data
attribute. data
is a dask array containing the real value of our pixels in the image. Our data array only contains one band.
# print the value of the pixels in the image
r.data
|
# The spatial reference is stored in the coordinates of our raster and not in attributes !
r.spatial_ref
<xarray.DataArray 'spatial_ref' ()> Size: 8B array(0) Coordinates: spatial_ref int64 8B 0 Attributes: spatial_ref: PROJCS["WGS 84 / UTM zone 11N",GEOGCS["WGS 84",DATUM["WGS_... crs_wkt: PROJCS["WGS 84 / UTM zone 11N",GEOGCS["WGS 84",DATUM["WGS_... GeoTransform: 348650.0 19.9955035971223 0.0 3964770.0 0.0 -19.9933199732...
You probably noticed that coordinates rasters contain coordinates in the CRS of the image. In GeoTiff, coordinated are stored in a matrice whose origin is (0,0). Thanks to spatial_ref and to the vector GeoTransform, rioxarray is able to convert the coordinates centered in (0,0) to coordinates in CRS image.
# So finally what are the attributes of our array ?
r.attrs
# Ok not really interisting...
{'AREA_OR_POINT': 'Area', 'scale_factor': 1.0, 'add_offset': 0.0}
Crop the image#
Sertit library offers a ready to use function to crop a raster by an AOI. First we use vectors from sertit which can read vectors from any type of data (shapefile, kml…) and return a GeoDtaFrame.
from sertit import vectors
with unistra_s3():
aoi_path = AnyPath("s3://sertit-sertit-utils-ci/tutorials/getting_started/aoi.shp")
aoi = vectors.read(aoi_path)
aoi
id | geometry | |
---|---|---|
0 | NaN | POLYGON ((366868.619 3952756.559, 376331.609 3... |
aoi.explore()
print(type(aoi))
<class 'geopandas.geodataframe.GeoDataFrame'>
We can use any functions from geopandas, for example esimate_crs
print("Estimate CRS is: ", aoi.estimate_utm_crs())
Estimate CRS is: EPSG:32611
crop_image = rasters.crop(r, aoi)
crop_image[0:3, ::10, ::10].plot.imshow(robust=True)
<matplotlib.image.AxesImage at 0x7f04d039d750>

Then we can write the output thanks to rasters.write
. This function is powerful enough to handle compression automatically and write big images (see notebook about processing big images).
rasters.write(crop_image, "crop_image.tif")
Reprojection#
Custom process#
Sometimes sertit utils does not contain the function you need. You can use odc-geo (for reporjection for example) or rioxarray. But it could be not enough again. It’s where apply_func comes in handy.
Let’s say we want to classify our raster with the following condition:
If pixel < 6.7 : pixel=1
If 6.7 <= pixel < 11.2 : pixel=2
If 11.2 <= pixel < 22.4 : pixel=3
If 22.4 <= pixel < 33.6 : pixel=4
If pixel >= 33.6 : pixel=5
We can use xarray to proceed. Never convert your array to numpy array !
import xarray
from sertit import AnyPath, rasters
from sertit.unistra import unistra_s3
with unistra_s3():
pth = AnyPath(
"s3://sertit-sertit-utils-ci/tutorials/getting_started/MeanSoilLoss.tif"
)
r = rasters.read(pth)
conditions = [
(r.data < 6.7),
(r.data >= 6.7) & (r.data < 11.2),
(r.data >= 11.2) & (r.data < 22.4),
(r.data >= 22.4) & (r.data < 33.6),
(r.data >= 33.6),
]
for i, condition in enumerate(conditions):
r.data = xarray.where(condition, i + 1, r.data)
# dask array processes are lazy. The cell above does not perform anything.
# We can explicitely call compute() to load the result in memory
# but some methods implicitely call it (plot, write)
# r = r.compute()
r
<xarray.DataArray 'MeanSoilLoss' (band: 1, y: 1820, x: 1819)> Size: 26MB dask.array<where, shape=(1, 1820, 1819), dtype=float64, chunksize=(1, 1820, 1819), chunktype=numpy.ndarray> Coordinates: * band (band) int64 8B 1 * x (x) float64 15kB 3.624e+05 3.625e+05 ... 3.806e+05 3.806e+05 * y (y) float64 15kB 5.381e+06 5.381e+06 ... 5.363e+06 5.363e+06 spatial_ref int64 8B 0 Attributes: AREA_OR_POINT: Area scale_factor: 1.0 add_offset: 0.0
# The plot method actually calls compute()
r.plot()
<matplotlib.collections.QuadMesh at 0x7f04cdb3d050>
