Can something such as tail -n +3 (which skips the first two lines) be part of the pipeline? The three text files are being combined into one stream:

- Line 1: CRS/SRID from the .prj
- Line 2: types from the .csvt
- Line 3 to the end: from the .csv

That is great in some ways, as the SRID does not go missing and the header info is at the head. In my test, line 3 to the end was well formed, with the renamed geometry column, but I am testing on Windows 10 with GDAL 3.6.

I do not know whether /vsizip/ is allowed or works as output, i.e. all three text files written as one streamed zip file, with just the CSV file extracted later in the process.

Moving to a one-file spatial format, as mentioned above, might help. It is just that a GeoCSV dataset is a combination of three files. Maybe a many-to-one-back-to-many scenario might help. There are several multi-file spatial formats that would need to be zipped so that you could stream just one thing.

I hope that makes sense.

On Fri, May 5, 2023 at 2:58 AM Rahkonen Jukka <jukka.rahko...@maanmittauslaitos.fi> wrote:

> Hi,
>
> Have you considered outputting GeoJSONSeq (https://gdal.org/drivers/vector/geojsonseq.html) instead of CSV, which to my mind is a workaround as a geodata format? Maybe JSON could handle your newlines as well.
>
> -Jukka Rahkonen-
>
> From: gdal-dev <gdal-dev-boun...@lists.osgeo.org> On behalf of Moises Calzado via gdal-dev
> Sent: Friday, 5 May 2023 12:32
> To: gdal-dev@lists.osgeo.org
> Subject: Re: [gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks inside columns
>
> Hi Even!
>
> I've just created the two issues:
>
> - https://github.com/OSGeo/gdal/issues/7699
> - https://github.com/OSGeo/gdal/issues/7700
>
> Robert, as I explained before, we need the `/vsistdout/` driver as we're processing the file in streaming mode, so we can't save the result to the storage.
>
> Unfortunately, the problem arises when using that driver.
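
A rough illustration of the GeoJSONSeq suggestion quoted above: in a newline-delimited JSON stream each feature sits on one physical line, and a line break inside a string value is written as the JSON escape \n, so it cannot be mistaken for a record separator. The sketch below assumes the stream arrives as text with one feature per line, optionally prefixed with the RS (0x1E) character; the function name `iter_features` and the sample feature are made up for illustration and are not part of GDAL.

```python
import io
import json

RS = "\x1e"  # optional record separator used by GeoJSON text sequences


def iter_features(stream):
    """Yield one GeoJSON feature (as a dict) per non-empty line of the stream."""
    for line in stream:
        line = line.strip().lstrip(RS)
        if line:
            yield json.loads(line)


# Hypothetical sample: the description contains line breaks, which JSON
# stores as the two-character escape \n, so the feature stays on one line.
sample = io.StringIO(
    '{"type": "Feature", '
    '"properties": {"id": "2", "descriptio": "this is\\nmy string\\n"}, '
    '"geometry": {"type": "Point", "coordinates": [1.0, 2.0]}}\n'
)
features = list(iter_features(sample))
```

In a pipeline the same function would read from `sys.stdin` instead of the sample. Whether `-f GeoJSONSeq` can write to `/vsistdout/` in a given GDAL version is something to verify before relying on it.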
>
> On Thu, 4 May 2023 at 15:39, Even Rouault (<even.roua...@spatialys.com>) wrote:
>
> Moises,
>
> please file 2 issues in the GitHub issue tracker:
>
> - one about /vsistdout/, where .csvt and .prj content shouldn't be emitted
> - one about decoupling the layer GEOMETRY_NAME creation option from CREATE_CSVT=YES
>
> Even
>
> On 04/05/2023 at 13:58, Moises Calzado via gdal-dev wrote:
>
> Hi Robert!
>
> I think that we're losing sight of the main issue that we reported: the problem is related to line breaks in the output generated while using /vsistdout/ and the CREATE_CSVT=YES option.
>
> Even pointed out that it works as expected without that flag, but when it's used the generated output is not okay, as the "Fields with embedded line breaks must be quoted" rule is not followed.
>
> IMHO, although the generated output is not a CSV itself, we should be able to delete the first two lines (projection info and types) and deal with the rest of the content as a CSV.
>
> What we're doing is streaming the output of the /vsistdout/ driver to another process that performs some steps on the resulting CSV. In all other cases it works correctly, as the output of the ogr2ogr execution is a valid CSV once the first two lines are deleted, but in the case reported in my first email it's not.
>
> The CREATE_CSVT=YES option is mandatory for us because, for the moment, it's required in order to use the GEOMETRY_NAME=geom one, so we don't have any workaround.
>
> Just wanted to confirm if that's expected for you (generating an output that is not a valid CSV in the end)!
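
To make the "delete the first two lines and treat the rest as CSV" step concrete, here is a minimal sketch of what the receiving process could do. It assumes the stream has exactly two leading lines (.prj, then .csvt) before the CSV header; the function name and the sample content are hypothetical. Rows whose field count differs from the header's are collected separately, since that is the usual symptom of an embedded line break that was not quoted.

```python
import csv
import io


def read_csv_after_header_lines(stream, skip=2):
    """Skip the leading .prj/.csvt lines, then parse the rest as CSV.

    Returns (header, good_rows, bad_rows). A row is "bad" when its field
    count differs from the header's.
    """
    for _ in range(skip):
        next(stream, None)
    reader = csv.reader(stream)
    header = next(reader)
    good, bad = [], []
    for row in reader:
        (good if len(row) == len(header) else bad).append(row)
    return header, good, bad


# Hypothetical combined output; the first two lines stand in for the
# .prj and .csvt content.
quoted = (
    "EPSG:4326\n"
    "WKT,String,String\n"
    "geom,id,descriptio\n"
    '"POINT (1 2)","1",first row\n'
    '"POINT (3 4)","2","this is\nmy string"\n'
)
# Same data, but with the line break left unquoted.
unquoted = quoted.replace('"this is\nmy string"', "this is\nmy string")

header, good, bad = read_csv_after_header_lines(io.StringIO(quoted))
_, good_u, bad_u = read_csv_after_header_lines(io.StringIO(unquoted))
```

With the quoted sample the multi-line value comes back as a single field; with the unquoted one the continuation line shows up as a separate, short row in `bad_u`.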
>
> On Wed, 3 May 2023 at 21:05, Robert Hewlett (<rob.h...@gmail.com>) wrote:
>
> Hi,
>
> I just tested with: GDAL 3.6.4, released 2023/04/17
>
> Using ogr2ogr as follows:
>
> ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES
>
> I get three files but no geometry.
>
> ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT
>
> I get three files with the geometry as WKT, with the column name WKT:
>
> WKT,id,poi_name,poi_types
> "POINT (508878.602179846 5433913.2763688)","1",crescent,"4"
> "POINT (517836.918121302 5447702.01715829)","2",Tynehead Regional Park,"1"
>
> ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom
>
> I get three files with the geometry as WKT, but the column is called geom:
>
> geom,id,poi_name,poi_types
> "POINT (508878.602179846 5433913.2763688)","1",crescent,"4"
> "POINT (517836.918121302 5447702.01715829)","2",Tynehead Regional Park,"1"
>
> What does
>
> ogr2ogr --version
>
> report back?
>
> On Wed, May 3, 2023 at 9:38 AM Robert Hewlett <rob.h...@gmail.com> wrote:
>
> Hi,
>
> Not to start a controversy, but it feels like the standard hints at three files. Did the standard change?
>
> If it is three files, which works for me in QGIS and geopandas (i.e. data lands where it is supposed to), then more layer creation options are needed to handle the SRID/CRS:
>
> CREATE_PRJ=YES/NO
>
> or -t_srs and/or -s_srs triggers the .prj file being created.
>
> Just saying 😊.
>
> In the meantime, would a short Python script help parse the one file into three?
>
> On Wed, May 3, 2023 at 9:16 AM Moises Calzado via gdal-dev <gdal-dev@lists.osgeo.org> wrote:
>
> Hi Robert,
>
> Yes, we're getting one with all the info!
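
The short Python script offered above could look roughly like the sketch below. It assumes the layout described in this thread (line 1 is the .prj content, line 2 the .csvt content, everything after that the CSV) and that the first two items each fit on one line. The function names are illustrative only, and the sample reuses the small GeoCSV example from earlier in the thread.

```python
import io
from pathlib import Path


def split_geocsv_stream(stream):
    """Split a combined text stream into (prj, csvt, csv_text)."""
    prj = stream.readline().rstrip("\r\n")
    csvt = stream.readline().rstrip("\r\n")
    csv_text = stream.read()  # left untouched, embedded line breaks included
    return prj, csvt, csv_text


def write_geocsv(basename, prj, csvt, csv_text):
    """Write basename.prj, basename.csvt and basename.csv."""
    Path(f"{basename}.prj").write_text(prj + "\n", encoding="utf-8")
    Path(f"{basename}.csvt").write_text(csvt + "\n", encoding="utf-8")
    # newline="" keeps the CSV's own line endings as they arrived
    with open(f"{basename}.csv", "w", encoding="utf-8", newline="") as f:
        f.write(csv_text)


sample = io.StringIO(
    "EPSG:26910\n"
    "Integer,Integer,WKT\n"
    "line_id,point_id,geom\n"
    '1,1,"POINT(1000 1000)"\n'
)
prj, csvt, csv_text = split_geocsv_stream(sample)
```

Calling `write_geocsv("poi_out", prj, csvt, csv_text)` would then recreate the three-file dataset on disk. Splitting only on the first two line ends means the script never has to interpret the CSV body itself.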
>
> On Wed, 3 May 2023 at 18:14, Robert Hewlett (<rob.h...@gmail.com>) wrote:
>
> Just to clarify, instead of getting three files you are getting one with all the info: types, projection, data?
>
> https://giswiki.hsr.ch/GeoCSV
>
> On Wed, May 3, 2023 at 8:57 AM Moises Calzado via gdal-dev <gdal-dev@lists.osgeo.org> wrote:
>
> We're also specifying GEOM_POSSIBLE_NAMES, so it would be great if, with that option, we could use GEOMETRY_NAME without using the CREATE_CSVT=YES option.
>
> Regarding emitting the .prj and .csvt in /vsistdout/ mode, that's why I'm saying that there is an issue while generating the resulting CSV.
>
> The way we see it is that when using the /vsistdout/ mode, the result is a CSV file with the .prj information in the first line and the .csvt in the second line. We handle the result by deleting the first two lines and using the rest of the content as a CSV, which should be equal to the result obtained when using ogr2ogr without the CREATE_CSVT=YES option.
>
> We're probably missing something, but as we see it, the generated CSV should be a valid one. Does that make sense?
>
> Thanks so much for your help!
>
> On Wed, 3 May 2023 at 15:10, Robert Hewlett (<rob.h...@gmail.com>) wrote:
>
> The .csvt and .prj help to make a proper GeoCSV dataset. That helps with QGIS and geopandas. The column name that I use in the CSV is usually geom, and WKT shows up in the .csvt file, which seems to be a one-line file that hints at the data types in the CSV file.
>
> I hope that makes sense.
>
> CSVT
>
> Integer, Integer,WKT
>
> CSV
>
> line_id,point_id,geom
> 1,1,"POINT(1000 1000)"
>
> PRJ
>
> EPSG:26910
>
> On Wed, May 3, 2023, 05:23 Moises Calzado via gdal-dev <gdal-dev@lists.osgeo.org> wrote:
>
> Hi Even,
>
> Thanks so much for taking a look into that one!
>
> I have one doubt regarding the CSVT content, as we're not really using it, but it's required when using the GEOMETRY_NAME layer creation option, as can be checked in the CSV driver documentation:
>
> - GEOMETRY_NAME=name (Starting with GDAL 2.1): Name of geometry column. Only used if GEOMETRY=AS_WKT and CREATE_CSVT=YES. Defaults to WKT
>
> We really need this flag, as we are processing files that contain geometries with different column names, and we always want the same geometry name in the generated output. Are we losing something when using that flag to avoid this problem?
>
> In my humble opinion, generating an invalid CSV when using -lco CREATE_CSVT=YES looks like a bug to me, as I can't see the reason why strings containing line breaks can't be quoted.
>
> Could you please shed some light on this?
>
> Looking forward to your reply,
> Regards.
>
> On Wed, 3 May 2023 at 14:00, Even Rouault (<even.roua...@spatialys.com>) wrote:
>
> you didn't post to the list
>
> On 03/05/2023 at 13:49, Moises Calzado wrote:
>
> [snip: identical to the message quoted directly above]
>
> On Sat, 29 Apr 2023 at 15:44, Even Rouault (<even.roua...@spatialys.com>) wrote:
>
> Moises,
>
> as far as I can see with your example, the CSV driver behaves "properly" in reading and writing field values with line breaks.
>
> It follows the "Fields with embedded line breaks must be quoted" rule of https://en.wikipedia.org/wiki/Comma-separated_values
>
> $ ogr2ogr out.csv /vsizip/dataframe.zip
>
> $ cat out.csv
> id,descriptio
> "1",This is my third row
> "2","this is
> my string
> "
> "3",This is my third row
>
> $ ogrinfo out.csv -al
> INFO: Open of `out.csv'
>       using driver `CSV' successful.
>
> Layer name: out
> Geometry: None
> Feature Count: 3
> Layer SRS WKT:
> (unknown)
> id: String (0.0)
> descriptio: String (0.0)
> OGRFeature(out):1
>   id (String) = 1
>   descriptio (String) = This is my third row
>
> OGRFeature(out):2
>   id (String) = 2
>   descriptio (String) = this is
> my string
>
>
> OGRFeature(out):3
>   id (String) = 3
>   descriptio (String) = This is my third row
>
> But in your example, using /vsistdout/ and -lco CREATE_CSVT=YES is going to result in an invalid CSV file which will mix both the .csvt and .csv content
>
> Even
>
> On 24/04/2023 at 13:34, Moises Calzado via gdal-dev wrote:
>
> Hello!
>
> We're trying to convert a Shapefile into a CSV using ogr2ogr and we're having some issues while dealing with some columns that contain line breaks inside their values. If we have a line with the following string, ogr2ogr detects that the line break is a new line and it returns two lines.
>
> "this is my \n value"
>
> That's the command that we're executing:
>
> ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/ /vsizip/shapefile.zip -simplify 0.00001 -dim XY -t_srs EPSG:4326 -lco GEOMETRY=AS_WKT -lco GEOMETRY_NAME=geom -lco CREATE_CSVT=YES > result.csv
>
> Is this an expected behaviour, or is there any way to avoid this?
>
> Sharing an example Shapefile so that you can try to reproduce that behaviour:
> https://drive.google.com/file/d/1gFqfTP02KTFoavJyyO-Ix05YwZB2tS24/view?usp=sharing
>
> Thanks so much in advance,
> Regards.
>
> --
> Moises Calzado
> Support Engineer
> +34671264286 | mcalz...@carto.com | CARTO <https://www.carto.com/>
> <https://spatial-data-science-conference.com/2023/london/>
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.

_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
https://lists.osgeo.org/mailman/listinfo/gdal-dev