pygeobase

This is the documentation of pygeobase.

The pygeobase package implements base class definitions for the I/O interface used in pytesmo, pynetCF, and other packages.

Usage

The abstract base classes in this package provide a consistent interface for reading various kinds of data.

Image datasets

When we talk about an image dataset, we generally mean a dataset that can be represented by one or several two-dimensional arrays. Such a dataset might consist of multiple layers or bands, and it can have implicit or explicit geolocation information attached. In the simplest case, all data points of an image refer to the same time. But we can also envision an image with a reference timestamp plus a layer holding the exact timestamp of each observation.
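As a minimal illustration using plain numpy (not pygeobase's own classes), such an image could be held in memory as a reference timestamp, coordinate arrays, and a dictionary of equally shaped layers, one of which optionally stores the exact observation time of each pixel. The layer names `sm` and `obs_time`, and storing times as seconds relative to the reference timestamp, are assumptions made for this sketch.

```python
from datetime import datetime, timedelta

import numpy as np

# Reference timestamp shared by the whole image.
ref_time = datetime(2015, 1, 1)

# Explicit geolocation: one longitude/latitude value per pixel (2 x 3 image).
lon, lat = np.meshgrid([10.0, 10.1, 10.2], [45.1, 45.0])

# Hypothetical layers/bands; every layer has the same 2-D shape as lon/lat.
layers = {
    # A measured variable, e.g. soil moisture.
    "sm": np.array([[0.21, 0.25, 0.30],
                    [0.18, 0.22, 0.27]]),
    # Optional layer with the exact observation time of each pixel,
    # stored as seconds relative to the reference timestamp.
    "obs_time": np.array([[0, 60, 120],
                          [3600, 3660, 3720]]),
}

# Recover the absolute observation time of a single pixel.
row, col = 1, 2
exact_time = ref_time + timedelta(seconds=int(layers["obs_time"][row, col]))
print(exact_time)  # 2015-01-01 01:02:00
```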

pygeobase.io_base.ImageBase implements the reading and writing of a single file, whereas pygeobase.io_base.MultiTemporalImageBase is responsible for building the filename for a reference timestamp and using the ImageBase class for the I/O. In this way, any number of underlying file formats can be supported.
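The division of labour can be sketched in plain Python. `SingleFileReader` and `MultiFileDataset` below are hypothetical stand-ins, not pygeobase's implementation: the per-file class only knows how to read one file, while the multi-file class turns a timestamp into a filename and delegates the actual I/O. Supporting another file format would then only mean passing a different reader class.

```python
from datetime import datetime


class SingleFileReader:
    """Hypothetical stand-in for an ImageBase subclass: reads exactly one file."""

    def __init__(self, filename):
        self.filename = filename

    def read(self, timestamp=None):
        # A real reader would parse self.filename here.
        return {"source": self.filename, "timestamp": timestamp}


class MultiFileDataset:
    """Hypothetical stand-in for MultiTemporalImageBase: maps timestamps to files."""

    def __init__(self, ioclass, build_filename):
        self.ioclass = ioclass                # class used for single-file I/O
        self.build_filename = build_filename  # callable: datetime -> filename

    def read(self, timestamp):
        filename = self.build_filename(timestamp)
        # Delegate the format-specific work to the single-file class.
        return self.ioclass(filename).read(timestamp=timestamp)


ds = MultiFileDataset(SingleFileReader,
                      lambda ts: ts.strftime("dataset_%Y-%m-%d.dat"))
img = ds.read(datetime(2015, 1, 2))
print(img["source"])  # dataset_2015-01-02.dat
```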

Figure 1 shows pygeobase.io_base.ImageBase, the abstract base class for implementing a reader for a single image linked to one file on disk. Its read, write, flush and close methods have to be implemented. For reading a dataset, it is generally enough to implement the read and close methods and to use dummy methods for write and flush. The read method must return a pygeobase.object_base.Image instance.

Figure 1
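A minimal sketch of this contract using Python's `abc` module. `ImageReaderBase`, `ReadOnlyReader` and `Incomplete` are hypothetical names that mirror, but are not, pygeobase's classes. Because all four methods are abstract, a subclass can only be instantiated once it provides each of them, which is why a read-only reader still supplies dummy `write` and `flush` methods.

```python
import abc


class ImageReaderBase(abc.ABC):
    """Hypothetical abstract base mirroring the read/write/flush/close contract."""

    def __init__(self, filename, mode="r"):
        self.filename = filename
        self.mode = mode

    @abc.abstractmethod
    def read(self, timestamp=None):
        """Return the image stored in self.filename."""

    @abc.abstractmethod
    def write(self, data):
        """Write an image to self.filename."""

    @abc.abstractmethod
    def flush(self):
        """Flush pending writes to disk."""

    @abc.abstractmethod
    def close(self):
        """Release the underlying file handle."""


class ReadOnlyReader(ImageReaderBase):
    """Implements read/close for real; write and flush are dummies."""

    def read(self, timestamp=None):
        return {"filename": self.filename, "timestamp": timestamp}

    def write(self, data):
        raise NotImplementedError("this dataset is read-only")

    def flush(self):
        pass  # nothing is ever written, so there is nothing to flush

    def close(self):
        pass  # this sketch keeps no open file handle


class Incomplete(ImageReaderBase):
    """Only implements read, so it cannot be instantiated."""

    def read(self, timestamp=None):
        return None


def can_instantiate(cls):
    try:
        cls("x.dat")
    except TypeError:  # raised while abstract methods remain unimplemented
        return False
    return True


print(can_instantiate(ReadOnlyReader))  # True
print(can_instantiate(Incomplete))      # False
```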

An implementation of ImageBase can then be used with pygeobase.io_base.MultiTemporalImageBase. This class models a dataset consisting of several files on disk, each linked to a reference timestamp from which its filename can be built. pygeobase.io_base.MultiTemporalImageBase can be used directly once it is configured correctly, which means setting fname_templ and datetime_format so that they match the dataset's filenames. If the individual files are stored in subfolders, e.g. by month or day, the keyword subpath_templ can be used to specify this. Please see pygeobase.io_base.MultiTemporalImageBase for detailed information about each keyword.

Example for implementing a new image dataset

Let’s imagine we have a daily dataset stored on a global regular grid with a resolution of 0.1 degrees. The folder structure and filenames of the dataset are, for example:

  • /2015/01/dataset_2015-01-01.dat
  • /2015/01/dataset_2015-01-02.dat
  • ...
  • /2015/02/dataset_2015-02-01.dat

For simplicity’s sake, let’s assume that the .dat files are just pickled Python dictionaries.
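To try the reader out, a small sample file in this layout can be produced with the standard library and numpy. `write_sample_file` is a hypothetical helper for testing only; the variable name `sm`, the tiny array size and the temporary root directory are assumptions, and a real 0.1 degree global field would hold 3600 x 1800 values per variable.

```python
import os
import pickle
import tempfile
from datetime import datetime

import numpy as np


def write_sample_file(root_path, timestamp, n_points=4):
    """Pickle a small dict of arrays to <root>/<YYYY>/<MM>/dataset_<YYYY-MM-DD>.dat."""
    folder = os.path.join(root_path, timestamp.strftime("%Y"),
                          timestamp.strftime("%m"))
    os.makedirs(folder, exist_ok=True)
    filename = os.path.join(
        folder, "dataset_{}.dat".format(timestamp.strftime("%Y-%m-%d")))
    # Hypothetical content: one variable with one value per grid point.
    data = {"sm": np.linspace(0.1, 0.4, n_points)}
    # pickle.dump needs a file object opened in binary mode, not a filename.
    with open(filename, "wb") as f:
        pickle.dump(data, f)
    return filename


root = tempfile.mkdtemp()
path = write_sample_file(root, datetime(2015, 1, 1))
print(os.path.relpath(path, root))  # 2015/01/dataset_2015-01-01.dat on POSIX

with open(path, "rb") as f:
    loaded = pickle.load(f)
print(loaded["sm"].shape)  # (4,)
```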

We could now write a new class based on pygeobase.io_base.ImageBase that reads one of these files:

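from pygeobase.object_base import Image
from pygeobase.io_base import ImageBase
import pygeogrids.grids as grids
import pickle


class PickleImg(ImageBase):
    """Reads one pickled dictionary of data arrays as an Image."""

    def __init__(self, filename, mode='r'):
        super(PickleImg, self).__init__(filename, mode=mode)
        # Global regular grid with 0.1 degree resolution, matching the dataset.
        self.grid = grids.genreg_grid(0.1, 0.1)

    def read(self, timestamp=None, **kwargs):
        # pickle.load expects an open binary file, not a filename.
        with open(self.filename, 'rb') as f:
            data = pickle.load(f)
        metadata = {'Type': 'pickle'}

        return Image(self.grid.arrlon,
                     self.grid.arrlat,
                     data,
                     metadata,
                     timestamp)

    def write(self, data):
        # Writing is not supported for this read-only dataset.
        raise NotImplementedError()

    def flush(self):
        pass

    def close(self):
        # Nothing to close: read() opens and closes the file itself.
        pass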

This new class PickleImg reads a pickled dictionary of data from the given filename. To represent the longitude and latitude of each data point, the arrlon and arrlat attributes of a pygeogrids.grids.BasicGrid object are used here, but plain numpy arrays would also do.
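As an alternative to pygeogrids, here is a sketch of building the coordinate arrays with plain numpy for the 0.1 degree global grid of this example. Placing the coordinates at the cell centres and flattening them into 1-D arrays are assumptions of this sketch; the point ordering is not guaranteed to match what `pygeogrids.grids.genreg_grid` produces, so the pickled data would have to be stored in the same order as these arrays.

```python
import numpy as np


def regular_lonlat(resolution=0.1):
    """Return flattened cell-centre longitudes/latitudes of a global regular grid."""
    n_lon = int(round(360 / resolution))
    n_lat = int(round(180 / resolution))
    # Cell centres, e.g. -179.95 ... 179.95 for a 0.1 degree grid.
    # linspace avoids the off-by-one risk of np.arange with float steps.
    lons = np.linspace(-180 + resolution / 2, 180 - resolution / 2, n_lon)
    lats = np.linspace(-90 + resolution / 2, 90 - resolution / 2, n_lat)
    lon2d, lat2d = np.meshgrid(lons, lats)
    return lon2d.ravel(), lat2d.ravel()


arrlon, arrlat = regular_lonlat(0.1)
print(arrlon.size)  # 6480000
```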

The next code snippet shows how this newly written class can be used in an implementation of pygeobase.io_base.MultiTemporalImageBase:

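from pygeobase.io_base import MultiTemporalImageBase


class PickleDs(MultiTemporalImageBase):

    def __init__(self, root_path):
        # One strftime template per folder level: <root>/<YYYY>/<MM>/
        sub_path = ['%Y', '%m']
        # {datetime} is replaced by the timestamp formatted with datetime_format.
        fname_templ = "dataset_{datetime}.dat"
        datetime_format = "%Y-%m-%d"

        super(PickleDs, self).__init__(root_path, PickleImg,
                                       fname_templ=fname_templ,
                                       datetime_format=datetime_format,
                                       subpath_templ=sub_path)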

The sub_path variable is a list of strftime format strings, one per folder level, from which the path to the file is built for a given Python datetime object. fname_templ specifies the filename template, in which {datetime} is substituted by the string built from the timestamp with datetime_format, again using the strftime syntax. There are more options to customize how the filename is built from a given Python datetime; please see the pygeobase.io_base.MultiTemporalImageBase documentation.
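The path resolution described above can be approximated with `datetime.strftime` and `str.format`. `build_path` is a hypothetical simplification for illustration, not the code used by MultiTemporalImageBase (which offers further options), and the root folder `/data` is made up; the sketch only shows how the three settings of PickleDs combine.

```python
import os
from datetime import datetime


def build_path(root_path, timestamp, fname_templ, datetime_format,
               subpath_templ=None):
    """Resolve the file path for one timestamp (simplified, hypothetical)."""
    # Each subpath template becomes one folder level, e.g. '%Y' -> '2015'.
    subdirs = [timestamp.strftime(s) for s in (subpath_templ or [])]
    # {datetime} in the filename template is replaced by the formatted timestamp.
    fname = fname_templ.format(datetime=timestamp.strftime(datetime_format))
    return os.path.join(root_path, *subdirs, fname)


path = build_path("/data", datetime(2015, 2, 1),
                  fname_templ="dataset_{datetime}.dat",
                  datetime_format="%Y-%m-%d",
                  subpath_templ=["%Y", "%m"])
print(path)  # /data/2015/02/dataset_2015-02-01.dat on POSIX
```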

Please see the Module Index for more details.
