Jason,

Are you constrained to retaining your data in an ArcGIS-compatible format? If so, and if you do not have ArcSDE, then what follows may not be much help.

Otherwise, I think it likely that you will find using a DBMS as your data repository advantageous for many reasons. Apart from the built-in indexing and index-based operations, it is *very* much easier to share data between users: there is a single copy, and all users have effective access to it. Until the File Geodatabase format is published (later this year?) and someone makes the effort to build an OGR interface, the DBMS route is probably the best route to compatibility. We happen to be a corporate Oracle site, but PostgreSQL is pretty similar. ESRI supports PostgreSQL with ArcSDE, so it is possible to retain ArcGIS compatibility this way.

Many years ago, I had a Simula class for performing many of these basic spatial operations. Now, however, my data is all in Oracle, so I can use the Oracle functions and no longer have to worry about building and rebuilding indexes, etc. The exception is USER_SDO_GEOM_METADATA, which, unfortunately, OGR writes to only at table creation and does not update afterwards. Frankly, life (and maintenance) is much easier now, and, certainly with Oracle, I think there have been performance gains.

Just my ha'pence-worth.

Peter

Mateusz Loskot wrote:
Jason Roberts wrote:
Mateusz,

I'm not an expert in this area, but I think that big performance gains can be obtained by using a spatial index.

Yes, likely true.

For example, consider a situation where you want to clip out a study region from the full-resolution GSHHS shoreline database, a polygon layer. The shoreline polygons have very large, complicated geometries. It would be expensive to loop over every polygon, loading its full geometry and calling GEOS. Instead, you would use the spatial index to isolate the polygons that are likely to overlap with the study region, then loop over just those.
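A minimal sketch of the filter-then-refine pattern described above, written in C++ because OGR and GEOS are C++ libraries. It assumes each feature's envelope can be read without loading the full geometry. `Envelope`, `filterAndRefine` and the `exactIntersects` callback are hypothetical names. The callback stands in for "load the full geometry and call GEOS". A real spatial index (R-tree, quadtree) would also avoid the linear scan over envelopes that this sketch still performs.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Axis-aligned bounding box (envelope) of a geometry.
struct Envelope {
    double minX, minY, maxX, maxY;

    bool intersects(const Envelope& o) const {
        return minX <= o.maxX && o.minX <= maxX &&
               minY <= o.maxY && o.minY <= maxY;
    }
};

// Filter-then-refine selection (hypothetical helper, not an OGR/GEOS API).
// Filter: a cheap envelope comparison rejects features that cannot overlap.
// Refine: the expensive exact test (stand-in for loading the full geometry
// and calling GEOS) runs only on the surviving candidates.
std::vector<std::size_t> filterAndRefine(
        const std::vector<Envelope>& featureEnvelopes,
        const Envelope& studyRegion,
        const std::function<bool(std::size_t)>& exactIntersects) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < featureEnvelopes.size(); ++i) {
        if (!featureEnvelopes[i].intersects(studyRegion)) {
            continue;  // rejected without touching the full geometry
        }
        if (exactIntersects(i)) {
            hits.push_back(i);
        }
    }
    return hits;
}
```

Only features whose envelopes touch the study region ever reach the expensive exact test. For a small study region clipped from a global shoreline layer, that should be a small fraction of the polygons.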

GEOS, like JTS, provides support for various spatial indexes.
It is possible to index data and optimise it in this manner, as you
mention. In fact, GEOS uses indexes internally in various operations.
The problem is that such an index is not persistent or serialised
anywhere; it exists in memory only. In fact, there are many more
problems than this one.
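To illustrate an index that lives only in memory, here is a hypothetical uniform-grid index in C++. It is not how GEOS implements its indexes (GEOS ships, for example, an STR-packed R-tree and a quadtree). It only shows the point being made: nothing is persisted, so the structure has to be rebuilt from the features' envelopes on every run. `GridIndex` and `Envelope` are illustrative names, and the cell size is assumed to be positive.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <unordered_set>
#include <utility>
#include <vector>

struct Envelope {
    double minX, minY, maxX, maxY;
};

// Hypothetical in-memory uniform-grid index. Each feature id is registered in
// every grid cell its envelope overlaps. Nothing is written to disk, so the
// index must be rebuilt from the envelopes each time the program runs.
class GridIndex {
public:
    explicit GridIndex(double cellSize) : cellSize_(cellSize) {}  // cellSize > 0

    void insert(std::size_t id, const Envelope& e) {
        forEachCell(e, [&](const Cell& c) { cells_[c].push_back(id); });
    }

    // Sorted, de-duplicated ids of features sharing a cell with the query.
    // These are only candidates; an exact geometric test must follow.
    std::vector<std::size_t> query(const Envelope& q) const {
        std::unordered_set<std::size_t> seen;
        std::vector<std::size_t> out;
        forEachCell(q, [&](const Cell& c) {
            auto it = cells_.find(c);
            if (it == cells_.end()) return;
            for (std::size_t id : it->second) {
                if (seen.insert(id).second) out.push_back(id);
            }
        });
        std::sort(out.begin(), out.end());
        return out;
    }

private:
    using Cell = std::pair<long long, long long>;  // (column, row)

    long long cellOf(double v) const {
        return static_cast<long long>(std::floor(v / cellSize_));
    }

    // Calls f for every cell covered by the envelope.
    template <typename F>
    void forEachCell(const Envelope& e, F&& f) const {
        const long long x0 = cellOf(e.minX), x1 = cellOf(e.maxX);
        const long long y0 = cellOf(e.minY), y1 = cellOf(e.maxY);
        for (long long x = x0; x <= x1; ++x) {
            for (long long y = y0; y <= y1; ++y) {
                f(Cell{x, y});
            }
        }
    }

    double cellSize_;
    std::map<Cell, std::vector<std::size_t>> cells_;
};
```

Building the index costs one pass over every envelope. Paying that cost on every run is what a persistent, database-side index avoids.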

BTW, PostGIS in effect provides exactly that: a serialised, persistent index.

OGR does not provide any spatial indexing layer common to various
vector datasets. For many simple formats it performs brute-force
selection.

The alternative is to try to divide the tasks:
1. Query features from the data source using the data source's own
spatial index capability.
2. With only the relevant features selected, apply the geometric processing.

I did it that way, actually.

If OGR takes advantage of spatial indexes internally (e.g. if the data source drivers can tell the core about these indexes, and the core can use them when OGRLayer::SetSpatialFilter is called), then many scenarios could be efficiently implemented by just OGR and GEOS alone.

The problem with OGR and GEOS is the cost of translating OGR geometry
into GEOS geometry. It can be a bottleneck.

However, if such processing functionality were built into OGR,
that would make sense, but I still see limitations:

Let's brainstorm a bit and assume it implements the operation:

OGRLayer OGR::SymDifference(OGRLayer layer1, OGRLayer layer2);

Depending on the data source, OGR could exploit its capabilities.
If both layers sit in the same PostGIS (or other spatial)
database, OGR just delegates the processing to PostGIS
where ST_SymDifference is executed and OGR only grabs the
results and generates OGRLayer.
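The dispatch rule in this brainstorm can be sketched in a few lines. Everything here is hypothetical: `LayerSource`, `Plan` and `planSymDifference` are not OGR types or functions. The sketch assumes that "same database" can be decided by comparing connection strings, which a real implementation would need to do more carefully.

```cpp
#include <string>

// Hypothetical description of where a layer lives; this is not an OGR type.
struct LayerSource {
    std::string connection;  // datasource name, e.g. a database connection string
    bool spatialDatabase;    // true if the backend can run ST_SymDifference itself
};

enum class Plan { PushDownToDatabase, ProcessLocally };

// Only when both layers live in the same spatial database can the whole
// operation be delegated (e.g. as one SQL statement using ST_SymDifference).
// Otherwise the features have to be fetched and processed on the client,
// e.g. with GEOS.
Plan planSymDifference(const LayerSource& a, const LayerSource& b) {
    const bool sameDatabase = a.spatialDatabase && b.spatialDatabase &&
                              a.connection == b.connection;
    return sameDatabase ? Plan::PushDownToDatabase : Plan::ProcessLocally;
}
```

The easy case reduces to a single comparison. The mixed Shapefile/Oracle case always ends up as `ProcessLocally`, and that is where the hard questions start.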

What if layer1 is a Shapefile and layer2 is Oracle table?
Let's assume Shapefile has .qix file with spatial index
and Oracle has its own index. What does OGR do?

Loads the .qix into memory, then grabs layer2 and decides which features
to select from layer1?
Loads the whole Shapefile into memory and uses the Oracle index to select
features from layer2 "masked" by layer1?
How does it calculate the cost of transferring which layer in which
direction, etc.?
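A toy illustration of why this choice is hard. Even the crudest model of the two options above, one that counts only bytes moved, needs selectivity estimates that are difficult to know in advance. The struct, field names and cost formulas are hypothetical assumptions, not anything OGR provides. The model ignores CPU time, latency and memory limits.

```cpp
#include <cstdint>

// Hypothetical inputs a planner would need. Obtaining them, especially the
// selectivities, is the hard part.
struct TransferEstimates {
    std::uint64_t qixBytes;        // the Shapefile's .qix spatial index
    std::uint64_t shapefileBytes;  // all Shapefile geometries
    std::uint64_t oracleBytes;     // all Oracle table geometries
    double shapefileSelectivity;   // fraction of Shapefile features hit by layer2
    double oracleSelectivity;      // fraction of Oracle rows hit by layer1
};

enum class Strategy { LoadQixStreamOracle, LoadShapefileQueryOracle };

// Toy cost model: counts only bytes moved.
Strategy chooseStrategy(const TransferEstimates& e) {
    // Option 1: load the .qix, pull every Oracle feature, then read only the
    // Shapefile features the index selects.
    const double option1 = double(e.qixBytes) + double(e.oracleBytes) +
                           e.shapefileSelectivity * double(e.shapefileBytes);
    // Option 2: load the whole Shapefile and let Oracle's index return only
    // the rows it masks.
    const double option2 = double(e.shapefileBytes) +
                           e.oracleSelectivity * double(e.oracleBytes);
    return option1 <= option2 ? Strategy::LoadQixStreamOracle
                              : Strategy::LoadShapefileQueryOracle;
}
```

The answer flips depending on the relative sizes of the two layers and on how selective the spatial overlap is. Those are exactly the quantities a generic OGR operation would have to guess.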

Certainly, it depends on the number of elements, which algorithm is used,
the direction in which the algorithm is applied (which layer is the subject
and which the object), and much more.

There are plenty of combinations, and my point is that if performance (not
only in terms of speed, but of any resource) is critical, it would be
extremely difficult to provide an efficient implementation of such
features in OGR with guaranteed, or even determinable, complexity.
Without these guarantees, I see little use in such a solution.

Given that, depending on your needs, write a specialised application using
available tools like OGR and GEOS, one that is optimised for the specifics
of the datasets, the type of processing, system requirements, etc.

If not, then your suggestion may be as fast as any other. For example, the idea of loading the features into PostGIS or SpatiaLite will require loading all of the full geometries, passing them to another database system, etc., etc. It may be that shuffling all of the data around will be hugely expensive, and that just using OGR functions with simple approaches, like calling GEOS from nested loops, will be faster than shuffling the data to a system that implements a more efficient approach once the data gets there.

It's never "just using". Performance is usually a concern with large
datasets. Large datasets are unlikely to be stored in a simple
format, but rather in proper spatial data storage, like PostGIS.
It nicely combines all the elements necessary to perform geometrical
processing in a usable and optimised form, with an index.

Is that basically what you are saying?

It is.

Best regards,

--
--------------------------------------------------------------------------------
Peter J Halls, GIS Advisor, University of York
Telephone: 01904 433806     Fax: 01904 433740
Snail mail: Computing Service, University of York, Heslington, York YO10 5DD
This message has the status of a private and personal communication
--------------------------------------------------------------------------------
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev
