On 08.04.2016 at 02:21, alex wrote:
That is one aspect, yes. However, so far I've considered only operations on one
or two bands at a time. Operations on more than two can of course be decomposed
into a series of operations with at most two at a time.
That will often be possible, but not always. For instance, how would you
decompose the following function? (Without resorting to complete ugliness like
c * a + !c * b.)

int conditional(bool c, int a, int b) { return c ? a : b; }

As an assignment that would be

d = c ? a : b;

If all variables are spatial, that's four bands. To save space, we'll modify bands in place. If we do not want to change any of the bands a, b and c, we'll begin by making d a copy of b:

d = b;

and then use a conditional

d = a if c;

In the case where c is in fact a simple function of a (like a > 50) this is already possible in the current API. I admit that this kind of operation is often needed, and it should be possible.

One problem with functions of multiple bands is the curse of dimensionality related to templates. The datatype of a band is one of 7 values. If we want to support all of them, then for three bands there are 7x7x7 variations. Even now, with a small number of methods for two bands, the object file of just that source file is ~3.5 MB. However, I think that in conditional functions like the above, c could be limited to the byte datatype.


Also, would you expect the user to break down all his/her functions into
functions of two or fewer arguments?

In fact, more or less yes. If one can devise the mathematical function she needs, then she probably can also think of a way to do it with a series of simple operations. If any mathematical operation involving n variables can be broken down into a series of operations involving at most two variables, then I'd be happy with that set of operations. It is in fact an interesting theoretical question what the simplest such set of operations would be.


I agree that supporting overloading is important and it is more than just
syntactic sugar since it can delegate some tasks to the compiler, which may
create temporaries as needed etc.
To be fair, in my project it is just syntactic sugar: "a + b" is just a short way to write
"apply(std::plus<>{}, a, b)".

I've written map algebra overload code in Perl with the underlying code in C. It was possible to write a set of simple methods, which the Perl compiler would use to compute any algebraic expression. Some of the simple methods would create a result, which Perl would use as an input to other methods and then discard. That would happen invisibly to the user.


If you want to support this, it seems to be best practice to use expression
templates (https://en.wikipedia.org/wiki/Expression_templates), because that
would make it possible to create complex compositions of operations without
having to create temporary datasets (e.g. for the sum of band1 and band2 in the
example above).

Yes, I agree that avoiding temporaries is a worthy goal, and from what I
understand of expression templates, that has been the motivation for them.

However, expression templates seem to be very C++ centric and do their
work at compile time. The C API and high-level language bindings are
very important in GDAL. Therefore expression templates might not be the
best solution in this case. However, perhaps the idea can be exploited
somehow.
I would think that you can get the principle to work in an object-oriented fashion as well,
where the template-based function "apply" looks like this:

auto apply(Function&& function, Args&&... args)

Couldn't the object-oriented version for two arguments be something like this:

GDALRange* apply(GDALValue* (*function)(GDALValue*, GDALValue*), GDALArgument* arg1, GDALArgument* arg2)

Assuming that all ranges derive from GDALRange and all values from GDALValue,
and both of these derive from GDALArgument, you can then sort out the function
composition at runtime instead of compile time.

I know which approach I would prefer, but I am not convinced it is not possible.

[snip]
You seem to be applying an iterator that iterates over blocks, rather than over
pixels. How can that work when my_average, band1, band2 and band3 have
different block sizes? Wouldn't your life be a lot easier if you iterated over
pixels in the same order for all bands?

I agree that it takes a bit of haggling to be able to work with more
than one band at the same time when one has to maintain a separate cache
for each band and each band has different block sizes. The situation is
even worse for focal methods since one focal region can be on four
blocks of one band in the worst case.
The worst-case can be a lot worse, consider a focal window with a radius of 500 
pixels.

It is, however, in my opinion an efficient approach when it is not feasible to
load all raster data into memory.
Iterating pixel-by-pixel does not imply having to load all raster data into 
memory. It does imply loading multiple blocks into memory, i.e. a whole row of 
blocks.

The approach is also strongly supported by the basic design of GDAL, and the
fundamental algorithm is presented in the GDAL documentation (I gave the link
in an earlier email).
I didn't see that email and just trawled through the whole thread, can you give 
it again?

http://www.gdal.org/classGDALRasterBand.html#ad80cecc562fd48f5783677de625360ac

It is a simple algorithm, but it demonstrates the two-level approach.


The solution I already have seems to work well for the one and two band
case even with 3x3 focal region and as far as I can see it could be
extended to cases involving more bands.

I'll add the disclaimer that a solution should first be found for the curse of dimensionality I mentioned above. It could be a simple one, like copying all blocks to the same datatype first.

The macros and functions I've written make cell (pixel) access possible with
only a few lines of code, but they could probably be improved. That would be
desirable especially if more than two bands are present.

If you are interested, I have a solution, albeit in C++11
(https://github.com/ahhz/map_algebra ).

I remember your email and I've taken a look at your code. I have nothing
against using modern tools but I think GDAL is a bit conservative on
them and portability is a big issue. However, I believe the map algebra
implementation I'm writing will be an add-on or plugin for GDAL, and not
something that will be tightly integrated into it.

[snip]
2. Focal operations: operations that calculate an indicator for each pixel based
on the values in surrounding pixels
I think the key challenge here is to allow users to specify the function to be
applied on the surrounding pixels. In my opinion a Map Algebra library should
support all kinds of moving window analysis, and not just a number of built-in
types of analysis. This can be implemented as yet another range that allows
iteration over the pixels in the same order as the raster bands, but instead of
returning pixel values, it returns neighbourhood / window summaries.

The hydrological algorithm would benefit a lot from random access to cells in
an extended neighborhood, but so far I've written only code that uses linear
access: block by block, and pixel by pixel within blocks.


I think I need to read up on your fundamental algorithm to understand this. Is 
linear access line-by-line? That would ease my worries because line-by-line is 
independent of block size, right?

It really depends on the block size, which is something GDAL core and drivers handle.


[snip]
3. Zonal operations: operations that calculate indicators for groups (zones) of
pixels.
This seems more trivial than the other two components of map algebra and I
haven't given it much attention, but I think it should be supported.

One complication, which I worked on just today, seems to stem from one
requirement that I have: I want the API to be free of templates, and thus
there is a need for generic classes for, e.g., numbers (GDAL supports
bands whose data is integers or floats of various lengths). Thus, for
example, a simple zonal method that returns which zones share a
boundary ends up returning a rather complex data structure.
I understand zonal methods to be operating on pixels within zones and not on 
the relationship between zones. What kind of methods do you have in mind?

I admit that it is perhaps not a "pure" zonal method. For example, think about the question: "What are the neighboring countries of each country?".


4. Global operations
I don't think these need to be explicitly supported, because there is no
genericity to exploit.

I find that I often need methods which return, for example, the histogram
of values in a band.

Me too. But I don't feel it urgently needs special library support.


In my mind it goes logically into this API, but that's just a matter of taste.



In summary:

What I am really trying to say is that developing map algebra functionality on
top of GDAL or as part of GDAL seems very worthwhile and feasible. However, I
would aim for functionality that supports map algebra generically, rather
than implementing specific map algebra tools. The foundation of such
functionality in my opinion must be a GDALRange class that facilitates iteration
over pixels in a uniform order. If I were you, I would first push an RFC to
create a GDALRange, and only once that is done and dusted, consider the map
algebra on top of it.

The RFC procedure of GDAL is open to everybody and the deciding body is
the GDAL PSC, to which I do not belong. As I wrote, the API I'm
suggesting is essentially an add-on and thus leaves, even if adopted,
room for other solutions.
I thought the idea of discussing this on the mailing list was to look for
consensus, or at least to strengthen ideas.

Sure.

Ari


Kind regards, Alex
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev

