Hi,

osmconvert is not Java-based (it is written in C): http://m.m.i24.cc/osmconvert.c
The OSM PBF format does not support indexes; see
https://dev.openstreetmap.narkive.com/Z8cvwP7Y/osm-indexing-of-pbf-files

-Jukka Rahkonen-

From: gdal-dev <gdal-dev-boun...@lists.osgeo.org> On Behalf Of Schmetzer, Tobias
Sent: Friday, 13 May 2022 15:30
To: gdal-dev@lists.osgeo.org
Subject: Re: [gdal-dev] OSM extract: Too many different keys in file


Hello,



Thanks for the helpful analysis and hints! So I understand that the planet.pbf 
file is read in its entirety before any spatial or key-wise restrictions are 
applied to narrow down the data to be processed.



Of course, extracting a 1°x1° area from a planet file doesn't make much sense, 
but this tiny area was just a test run on the huge file. In the end I need to 
scan a much larger area.



For now I am restricted to non-Java tools on the Windows platform (our IT 
department abandoned Java years ago because of security vulnerabilities), so I 
cannot use the versatile Osmosis tool.

I was already considering looping over the continent extracts, which some OSM 
partners also provide, but clipping the planet file as suggested will probably 
be more efficient: the data source needs to be read only once, and reading 
seems to be the main time-consuming factor. This assumes the required area does 
not itself exceed 32768 keys.



I could imagine the following improvements to GDAL's OSM extraction algorithm, 
which could be discussed based on this experience:

1. Improve the error message: "Too many different keys in file" -> "Total 
number of keys in data source file exceeds the defined maximum of [DEFINITION]. 
\nNote: All keys are read in before any other boundary conditions are 
considered. You may consider clipping or splitting the data source file."

2. Turn the current limit of 32768 into a named constant (#define) and enlarge 
it.

3. Have the algorithm read in only the features of the given area (this only 
makes sense if .pbf files contain spatial indexes).



For items 1 and 2 I can create a PR. For item 3 I could open a feature 
request.
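
To make items 1 and 2 concrete, here is a minimal sketch of the idea in Python. It is not GDAL's code (the driver is C++); MAX_KEYS, KeyRegistry and TooManyKeysError are hypothetical names used only for illustration. It shows a named limit and an error message that reports the limit and suggests clipping.

```python
MAX_KEYS = 32768  # hypothetical named constant mirroring the hard-coded limit


class TooManyKeysError(RuntimeError):
    """Raised when more distinct keys are seen than the limit allows."""


class KeyRegistry:
    """Assigns a small integer index to every distinct tag key (illustration only)."""

    def __init__(self, max_keys=MAX_KEYS):
        self.max_keys = max_keys
        self._index = {}

    def index_of(self, key):
        # Known keys reuse their index; only new keys count against the limit.
        if key in self._index:
            return self._index[key]
        if len(self._index) >= self.max_keys:
            raise TooManyKeysError(
                "Total number of keys in data source file exceeds the "
                f"defined maximum of {self.max_keys}.\n"
                "Note: All keys are read in before any other boundary "
                "conditions are considered. You may consider clipping or "
                "splitting the data source file."
            )
        self._index[key] = len(self._index)
        return self._index[key]
```

With the limit in one named place, enlarging it later is a one-line change, and the message always reports the value actually in effect.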



Any opinions?





Tobias Schmetzer



From: Rahkonen Jukka [mailto:jukka.rahko...@maanmittauslaitos.fi]
Sent: Friday, 13 May 2022 10:58
To: Schmetzer, Tobias <tobias.schmet...@zae-bayern.de>; gdal-dev@lists.osgeo.org
Subject: Re: OSM extract: Too many different keys in file



Hi,



The error comes from 
https://github.com/OSGeo/gdal/blob/master/ogr/ogrsf_frmts/osm/ogrosmdatasource.cpp#L2067
and it happens before your SQL is applied, while GDAL is reading the data from 
the huge planet.pbf file.



if( nNextKeyIndex >= 32768 ) /* somewhat arbitrary */



The error means that there are more than 32768 distinct keys in the planet 
file. Maybe that hard-coded limit could be enlarged, but if you need, for 
example, a 1 by 1 degree area, I believe there are much better tools than GDAL 
for extracting a subset. I would recommend trying, for example, osmosis:
https://wiki.openstreetmap.org/wiki/Osmosis/Examples#Breaking_OSM_file_into_several_bounding_boxes
or osmconvert:
https://wiki.openstreetmap.org/wiki/Osmconvert#Clipping_based_on_a_Polygon

The cropped .pbf file probably has fewer than 32768 distinct keys, so GDAL can 
handle it. You would also save a lot of time.
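
As a rough sketch of the clipping step, the following Python helper builds an osmconvert call for a bounding box. It assumes the executable is on the PATH under the name osmconvert (Windows builds may be named differently) and uses osmconvert's -b=minlon,minlat,maxlon,maxlat and -o= options; the function names and file paths are hypothetical.

```python
import subprocess


def osmconvert_clip_args(src, dst, min_lon, min_lat, max_lon, max_lat):
    """Build the osmconvert argument list for a bounding-box clip."""
    # osmconvert expects the box as min lon, min lat, max lon, max lat,
    # the same order as ogr2ogr's -spat xmin ymin xmax ymax.
    bbox = f"{min_lon},{min_lat},{max_lon},{max_lat}"
    return ["osmconvert", src, f"-b={bbox}", f"-o={dst}"]


def clip(src, dst, min_lon, min_lat, max_lon, max_lat):
    """Run osmconvert and raise CalledProcessError if it fails."""
    args = osmconvert_clip_args(src, dst, min_lon, min_lat, max_lon, max_lat)
    subprocess.run(args, check=True)
```

For the test area in this thread one would call clip("planet-220502.osm.pbf", "clip.osm.pbf", 10, 45, 11, 46) and then point ogr2ogr at the much smaller clip.osm.pbf.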



-Jukka Rahkonen-





From: gdal-dev <gdal-dev-boun...@lists.osgeo.org> On Behalf Of Schmetzer, Tobias
Sent: Friday, 13 May 2022 10:47
To: gdal-dev@lists.osgeo.org
Subject: [gdal-dev] OSM extract: Too many different keys in file



Dear GDAL dev team,



I am not sure whether I am following the wrong approach, whether there is an 
issue with the OSM driver or the distributed OSM file, or whether the error 
message is just ambiguous and could be improved.



I used ogr2ogr to select 12 keys to be extracted as polygons, along with 
around 40 conditions. The approach had worked well on a tiny OSM file covering 
the city of Munich, so I tested it on a small sample area of 1°x1° in the 
global planet OSM file:



ogr2ogr -spat 10 45 11 46 -f gpkg c:\daten\osm_planet\1x1.gpkg c:\daten\osm_planet\planet-220502.osm.pbf multipolygons -select "name,aeroway,amenity,building,historic,landuse,leisure,military,office,tourism,shop,landuse" -where @ogr2ogr_condition.txt



The first 70% was reached after one hour, but then the process slowed down, and 
after 19 hours I got an error message:

0...10...20...30...40...50...60...70...80...90.ERROR 1: Too many different keys in file



If this is because one or more features exceed the maximum number of keys 
allowed, is the file officially distributed by OSM wrong, or too large to be 
processed by ogr2ogr, or what is the matter? I tried to read the relevant 
source code file where the error message occurs, but it is too cryptic for me.





Content of ogr2ogr_condition.txt for the sake of completeness:

historic is null and
(
                office is not null or
                building='hotel' or
                building='hospital' or
                building='apartments' or
                building='barracks' or
                building='dormitory' or
                building='warehouse' or
                building='monastery' or
                building='public' or
                building='hangar' or

                tourism='guest_house' or
                tourism='apartment' or
                tourism='hostel' or
                tourism='museum' or
                tourism='gallery' or
                tourism='motel' or
                tourism='hotel' or

                amenity='university' or
                amenity='research_institute' or
                amenity='social_facility' or
                amenity='school' or
                amenity='kindergarten' or
                amenity='kindergarden' or
                amenity='exhibition centre' or
                amenity='student_accommodation' or
                amenity='library' or
                amenity='clinic' or
                amenity='hospital' or
                amenity='public_building' or
                amenity='concert_hall' or
                amenity='prison' or
                amenity='theatre' or
                amenity='courthouse' or

                aeroway='terminal' or

                shop='mall' or
                military='base' or
                military='barracks' or
                military='office' or

                landuse='education' or
                landuse='commercial' or
                landuse='industrial'
)
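
A condition file of this shape could also be generated rather than maintained by hand. The sketch below is one possible way to do that in Python; build_where and its parameters are hypothetical helpers, not part of GDAL, and the example mapping covers only a few of the tags listed above.

```python
def quote(value):
    """Quote a value as an SQL string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_where(exact, not_null=(), is_null=()):
    """Build a WHERE clause from tag filters.

    exact:    mapping of key -> list of accepted values (OR-ed together)
    not_null: keys that match whenever they are present (OR-ed in as well)
    is_null:  keys that must be absent (AND-ed in front of the OR group)
    """
    alternatives = [f"{key} is not null" for key in not_null]
    for key, values in exact.items():
        alternatives += [f"{key}={quote(value)}" for value in values]
    required = [f"{key} is null" for key in is_null]
    return " and ".join(required + ["(" + " or ".join(alternatives) + ")"])


if __name__ == "__main__":
    print(build_where(
        {"building": ["hotel", "hospital"], "shop": ["mall"]},
        not_null=["office"],
        is_null=["historic"],
    ))
```

The printed clause can be written to ogr2ogr_condition.txt and passed with -where @ogr2ogr_condition.txt as in the command above.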



I'd be grateful for any hints and glad to contribute to improving the error 
message if appropriate.



Kind regards, Tobias Schmetzer



ZAE Bayern

Tobias Schmetzer, Dipl. Ing.

Wissenschaftlicher Mitarbeiter Systementwicklung | Scientific Staff Member 
Systems Engineering

Bereich Energiespeicherung| Division Energy Storage



Walther-Meißner-Str. 6

85748 Garching



Tel.: +49 89 329442-65

Fax: +49 89 329442-12

tobias.schmet...@zae-bayern.de

http://www.zae-bayern.de





ZAE Bayern - Bayerisches Zentrum für Angewandte Energieforschung e. V.

Vorstand/Board:

Prof. Dr. Hartmut Spliethoff (Vorsitzender/Chairman),

Prof. Dr. Vladimir Dyakonov

Sitz/Registered Office: Würzburg

Registergericht/Register Court: Amtsgericht Würzburg

Registernummer/Register Number: VR 1386



Any declarations of intent, such as quotations, orders, applications and 
contracts, are legally binding for ZAE Bayern only if expressed in a written 
and duly signed form. This e-mail is intended solely for use by the 
recipient(s) named above. Any unauthorised disclosure, use or dissemination, 
whether in whole or in part, is prohibited. If you have received this e-mail in 
error, please notify the sender immediately and delete this e-mail.


_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev
