Tobias,
please file an issue about that at https://github.com/OSGeo/gdal/issues/new
We can likely increase the limit and make it runtime configurable.
Even
On 13/05/2022 at 14:30, Schmetzer, Tobias wrote:
Hello,
thanks for that helpful analysis and the hints! So, as I understand it, the
planet.pbf file is read in its entirety before any spatial or key-wise
restrictions are applied to narrow down the data that needs to be processed.
Of course using a 1°x1° area in a planet file doesn’t make much sense
but this tiny area was just a test run on the huge file. In the end I
need to scan a way larger spatial area.
For now I am restricted to non-Java-based tools on the Windows
platform (Java was abandoned years ago by our IT department because of
vulnerabilities), so I cannot use the versatile Osmosis tool.
I was already considering looping over all continents, which are
also supplied by some OSM partners, but clipping the planet file as
suggested will probably be more efficient, as the data source needs to
be read only once and this seems to be the main time-consuming
factor, provided the required area does not itself exceed 32768 keys.
I could imagine the following improvements to GDAL's OSM extraction
algorithm, which could be discussed based on this experience:
1. Improve the error message: “Too many different keys in file” ->
“Total number of keys in data source file exceeds the defined maximum
of [DEFINITION]. \nNote: All keys are read in before any other
boundary conditions are considered. You may consider clipping or
splitting the data source file.”
2. Make the current limit of 32768 a definition (#define) and enlarge it.
3. Have the algorithm read in only the features of the given area (this
only makes sense if .pbf files contain spatial indexes).
For number 1 and 2 I can create a PR. For number 3 I could create a
feature request.
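Points 1 and 2 above could look roughly like the following C++ sketch. It is not GDAL code: the environment variable name OSM_MAX_KEYS_SKETCH and the functions ParseMaxKeys, GetMaxKeys and TooManyKeysMessage are hypothetical names chosen for illustration, and a real patch would presumably use GDAL's own configuration-option mechanism rather than std::getenv. Only the default of 32768 and the message wording come from this thread.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Default taken from the hard-coded value quoted in this thread.
constexpr long kDefaultMaxKeys = 32768;

// Hypothetical variable name, used only in this sketch.
constexpr const char* kMaxKeysEnvVar = "OSM_MAX_KEYS_SKETCH";

// Parses a user-supplied limit. Falls back to the default when the text is
// missing, empty, not a plain decimal number, or not positive.
long ParseMaxKeys(const char* text) {
    if (text == nullptr || *text == '\0') return kDefaultMaxKeys;
    char* end = nullptr;
    const long value = std::strtol(text, &end, 10);
    if (*end != '\0' || value <= 0) return kDefaultMaxKeys;
    return value;
}

// Reads the limit at run time (stand-in for a driver configuration option).
long GetMaxKeys() {
    return ParseMaxKeys(std::getenv(kMaxKeysEnvVar));
}

// Builds the more explicit error message proposed in point 1.
std::string TooManyKeysMessage(long maxKeys) {
    assert(maxKeys > 0);
    return "Total number of keys in data source file exceeds the defined "
           "maximum of " + std::to_string(maxKeys) + ".\n"
           "Note: All keys are read in before any other boundary conditions "
           "are considered. You may consider clipping or splitting the data "
           "source file.";
}
```

With this shape, the check quoted from the driver would compare nNextKeyIndex against GetMaxKeys() instead of the literal 32768 and report TooManyKeysMessage(GetMaxKeys()) on failure.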
Any opinions?
Tobias Schmetzer
*From:* Rahkonen Jukka [mailto:jukka.rahko...@maanmittauslaitos.fi]
*Sent:* Friday, 13 May 2022 10:58
*To:* Schmetzer, Tobias <tobias.schmet...@zae-bayern.de>;
gdal-dev@lists.osgeo.org
*Subject:* Re: OSM extract: Too many different keys in file
Hi,
The error comes from
https://github.com/OSGeo/gdal/blob/master/ogr/ogrsf_frmts/osm/ogrosmdatasource.cpp#L2067
and it happens before your SQL, when GDAL is reading the data in from the
huge planet.pbf file.
if( nNextKeyIndex >= 32768 ) /* somewhat arbitrary */
The error means that there are more than 32768 distinct keys in the planet
file. Maybe that hard-coded limit could be raised, but if you only need,
for example, a 1 by 1 degree area, I believe there are much better
tools than GDAL for extracting a subset. I would recommend trying, for
example, osmosis
https://wiki.openstreetmap.org/wiki/Osmosis/Examples#Breaking_OSM_file_into_several_bounding_boxes
or osmconvert
https://wiki.openstreetmap.org/wiki/Osmconvert#Clipping_based_on_a_Polygon
The cropped .pbf file probably has fewer than 32768 distinct keys, so
GDAL can handle it. You would also save a great deal of time.
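To illustrate why the full planet file trips the limit while a cropped extract probably would not, here is a minimal C++ sketch of a key index with a fixed cap. It is not GDAL's implementation; the class KeyIndex and its members are hypothetical names for illustration. The only point is that, as described above, every distinct key seen while reading consumes a slot, whatever a later -select or -where asks for.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical illustration, not GDAL code: assigns a small integer index
// to each distinct tag key encountered while reading a file.
class KeyIndex {
public:
    explicit KeyIndex(std::size_t maxKeys) : maxKeys_(maxKeys) {
        assert(maxKeys > 0);
    }

    // Returns true and sets idx if the key is already known or could be
    // added; returns false if adding it would exceed the cap (the situation
    // reported as "Too many different keys in file").
    bool GetOrAdd(const std::string& key, std::size_t& idx) {
        const auto it = index_.find(key);
        if (it != index_.end()) {
            idx = it->second;
            return true;
        }
        if (index_.size() >= maxKeys_) {
            return false;
        }
        idx = index_.size();
        index_.emplace(key, idx);
        return true;
    }

    // Number of distinct keys registered so far.
    std::size_t Size() const { return index_.size(); }

private:
    std::size_t maxKeys_;
    std::unordered_map<std::string, std::size_t> index_;
};
```

Repeated keys reuse their existing index, so only the number of distinct keys in the input matters, not the number of features. A smaller extract is likely to contain fewer distinct keys and so stay under the cap.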
-Jukka Rahkonen-
*From:* gdal-dev <gdal-dev-boun...@lists.osgeo.org> *On Behalf Of* Schmetzer, Tobias
*Sent:* Friday, 13 May 2022 10:47
*To:* gdal-dev@lists.osgeo.org
*Subject:* [gdal-dev] OSM extract: Too many different keys in file
Dear GDAL dev team,
I am not sure if I am following a wrong approach, if there is an issue
with the osm driver, the distributed OSM file or if the error message
is just ambiguous and could be improved.
I used ogr2ogr to select 12 keys to be extracted as polygons, along
with around 40 conditions. The command had worked well on
a tiny OSM file covering the city of Munich, so I tested it on a small
sample area of 1°x1° in the global planet OSM file:
ogr2ogr -spat 10 45 11 46 -f gpkg c:\daten\osm_planet\1x1.gpkg
c:\daten\osm_planet\planet-220502.osm.pbf multipolygons -select
"name,aeroway,amenity,building,historic,landuse,leisure,military,office,tourism,shop,landuse"
-where @ogr2ogr_condition.txt
The first 70% were reached after one hour but then the process slowed
down and after 19 hours I got an error message:
0...10...20...30...40...50...60...70...80...90.ERROR 1: Too many different keys in file
If this is because one or more features exceed the maximum number of
keys that can be handled, is the file officially distributed by OSM
wrong, or too large to be processed by ogr2ogr, or what is the matter?
I tried to read the relevant source file where the error message is
raised, but it is too cryptic for me.
Content of ogr2ogr_condition.txt for the sake of completeness:
historic is null and
(
office is not null or
building='hotel' or
building='hospital' or
building='apartments' or
building='barracks' or
building='dormitory' or
building='warehouse' or
building='monastery' or
building='public' or
building='hangar' or
tourism='guest_house' or
tourism='apartment' or
tourism='hostel' or
tourism='museum' or
tourism='gallery' or
tourism='motel' or
tourism='hotel' or
amenity='university' or
amenity='research_institute' or
amenity='social_facility' or
amenity='school' or
amenity='kindergarten' or
amenity='kindergarden' or
amenity='exhibition centre' or
amenity='student_accommodation' or
amenity='library' or
amenity='clinic' or
amenity='hospital' or
amenity='public_building' or
amenity='concert_hall' or
amenity='prison' or
amenity='theatre' or
amenity='courthouse' or
aeroway='terminal' or
shop='mall' or
military='base' or
military='barracks' or
military='office' or
landuse='education' or
landuse='commercial' or
landuse='industrial'
)
I’d be grateful for any hints and would be glad to contribute to
improving the error message if appropriate.
Kind regards, Tobias Schmetzer
ZAE Bayern
Tobias Schmetzer, Dipl. Ing.
Wissenschaftlicher Mitarbeiter Systementwicklung | Scientific Staff
Member Systems Engineering
Bereich Energiespeicherung| Division Energy Storage
Walther-Meißner-Str. 6
85748 Garching
Tel.: +49 89 329442-65
Fax: +49 89 329442-12
tobias.schmet...@zae-bayern.de
http://www.zae-bayern.de
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
--
http://www.spatialys.com
My software is free, but my time generally not.