Hi,

Have you considered outputting GeoJSONSeq 
https://gdal.org/drivers/vector/geojsonseq.html instead of CSV, which in my 
mind is only a workaround as a geodata format? JSON could probably handle your 
newlines as well.
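The reason JSON avoids the problem is that a JSON encoder escapes a line break inside a string value as `\n`. Every feature therefore still fits on one physical line, which is what a newline-delimited GeoJSON sequence relies on. The sketch below uses only Python's standard `json` module with a made-up feature. It is not GDAL's GeoJSONSeq driver code, and the helper name `to_geojsonseq_line` is hypothetical.

```python
import json


def to_geojsonseq_line(feature: dict) -> str:
    """Serialize one GeoJSON Feature as a single newline-delimited record.

    json.dumps always escapes control characters such as a line break
    inside strings, so the returned text never contains a raw newline.
    """
    return json.dumps(feature, ensure_ascii=False, separators=(",", ":"))


# Hypothetical feature whose attribute contains embedded line breaks.
feature = {
    "type": "Feature",
    "properties": {"id": "2", "descriptio": "this is\nmy string\n"},
    "geometry": {"type": "Point", "coordinates": [1.0, 2.0]},
}

line = to_geojsonseq_line(feature)
print(line)
# Parsing the record back restores the original string, line breaks included.
assert json.loads(line)["properties"]["descriptio"] == "this is\nmy string\n"
```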

-Jukka Rahkonen-

From: gdal-dev <gdal-dev-boun...@lists.osgeo.org> On Behalf Of Moises Calzado 
via gdal-dev
Sent: Friday, 5 May 2023 12:32
To: gdal-dev@lists.osgeo.org
Subject: Re: [gdal-dev] Ogr2ogr CSV driver not handling correctly line breaks 
inside columns

Hi Even!

I've just created the two issues:
- https://github.com/OSGeo/gdal/issues/7699
- https://github.com/OSGeo/gdal/issues/7700

Robert, as I explained before, we need the `/vsistdout/` output because we're 
processing the file in streaming mode, so we can't save the result to storage.
Unfortunately, the problem arises precisely when writing to `/vsistdout/`.

On Thu, 4 May 2023 at 15:39, Even Rouault 
<even.roua...@spatialys.com> wrote:

Moises,

please file 2 issues in the GitHub issue tracker:

- one about /vsistdout/ where .csvt and .prj content shouldn't be emitted

- one about decoupling the GEOMETRY_NAME layer creation option from 
CREATE_CSVT=YES

Even
On 04/05/2023 at 13:58, Moises Calzado via gdal-dev wrote:
Hi Robert!

I think we're losing sight of the main issue we reported: the problem is 
related to line breaks in the output generated when using /vsistdout/ together 
with the CREATE_CSVT=YES option.

Even pointed out that without that flag it works as expected, but when the flag 
is used the generated output is not correct, as the "Fields with embedded line 
breaks must be quoted" rule is not followed.
IMHO, although the generated output is not a CSV itself, we should be able to 
delete the first two lines (projection info and types) and treat the rest of 
the content as a CSV.

What we're doing is streaming the /vsistdout/ output to another process that 
performs some steps on the resulting CSV. In all other cases it works 
correctly, as the output of the ogr2ogr execution is a valid CSV once the first 
two lines are deleted, but in the case reported in my first email it's not.
The CREATE_CSVT=YES option is mandatory for us because, for the moment, it's 
required in order to use GEOMETRY_NAME=geom, so we don't have any workaround.

Just wanted to confirm whether that's expected on your side (generating output 
that is not a valid CSV in the end)!

On Wed, 3 May 2023 at 21:05, Robert Hewlett 
<rob.h...@gmail.com> wrote:
Hi,

I just tested with : GDAL 3.6.4, released 2023/04/17

Using the ogr2ogr as follows:
ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES
I get three files but no geometry

ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT
I get three files, with the geometry as WKT in a column named WKT

WKT,id,poi_name,poi_types
"POINT (508878.602179846 5433913.2763688)","1",crescent,"4"
"POINT (517836.918121302 5447702.01715829)","2",Tynehead Regional Park,"1"

ogr2ogr -f CSV poi_out.csv poi.shp -lco CREATE_CSVT=YES -lco GEOMETRY=AS_WKT 
-lco GEOMETRY_NAME=geom
I get three files with the geometry as WKT, but in a column called geom
geom,id,poi_name,poi_types
"POINT (508878.602179846 5433913.2763688)","1",crescent,"4"
"POINT (517836.918121302 5447702.01715829)","2",Tynehead Regional Park,"1"

What does `ogr2ogr --version` report?



On Wed, May 3, 2023 at 9:38 AM Robert Hewlett 
<rob.h...@gmail.com> wrote:
Hi,

Not to start a controversy, but it feels like the standard hints at three 
files. Did the standard change?

If it is three files, which works for me in QGIS and geopandas (i.e. data lands 
where it is supposed to), then more layer creation options are needed to handle 
the SRID/CRS:

CREATE_PRJ=YES/NO
or -t_srs and/or -s_srs trigger creation of the .prj file.

Just saying 😊.

In the meantime, would a short Python script that splits the one file into 
three help?
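Robert's suggestion could look roughly like this in Python. This is a minimal sketch. It assumes, as described in this thread, that the /vsistdout/ stream holds the .prj content on line 1, the .csvt content on line 2 and the CSV data after that. The helper names are hypothetical. Splitting the stream this way cannot repair line breaks that were written without quotes.

```python
def split_vsistdout_output(text: str) -> tuple[str, str, str]:
    """Split mixed ogr2ogr /vsistdout/ output into (prj, csvt, csv_body).

    Assumption (from this thread): line 1 is the .prj content, line 2 the
    .csvt content, and everything after that is the CSV body.
    """
    parts = text.split("\n", 2)
    if len(parts) < 3:
        raise ValueError("expected a .prj line, a .csvt line and CSV data")
    prj, csvt, body = parts
    # Tolerate CRLF line endings on the two header lines.
    return prj.rstrip("\r"), csvt.rstrip("\r"), body


def write_three_files(text: str, basename: str) -> None:
    """Write <basename>.prj, <basename>.csvt and <basename>.csv."""
    prj, csvt, body = split_vsistdout_output(text)
    outputs = ((".prj", prj + "\n"), (".csvt", csvt + "\n"), (".csv", body))
    for suffix, content in outputs:
        # newline="" leaves the CSV body's line endings exactly as received.
        with open(basename + suffix, "w", encoding="utf-8", newline="") as f:
            f.write(content)
```

For example, `write_three_files(sys.stdin.read(), "poi_out")` at the end of a pipe would produce poi_out.prj, poi_out.csvt and poi_out.csv.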


On Wed, May 3, 2023 at 9:16 AM Moises Calzado via gdal-dev 
<gdal-dev@lists.osgeo.org> wrote:
Hi Robert,

Yes, we're getting one with all the info!

On Wed, 3 May 2023 at 18:14, Robert Hewlett 
<rob.h...@gmail.com> wrote:
Just to clarify, instead of getting three files you are getting one with all 
the info: types, projection, data?
https://giswiki.hsr.ch/GeoCSV

On Wed, May 3, 2023 at 8:57 AM Moises Calzado via gdal-dev 
<gdal-dev@lists.osgeo.org> wrote:
We're also specifying the GEOM_POSSIBLE_NAMES open option, so it would be great 
if, with that option, we could use GEOMETRY_NAME without CREATE_CSVT=YES.

Regarding emitting the .prj and .csvt content in /vsistdout/ mode, that's why 
I'm saying that there is an issue while generating the resulting CSV.
The way we see it, when using /vsistdout/ the result is a CSV file with the 
.prj information in the first line and the .csvt content in the second line. 
We're handling the result by deleting the first two lines and using the rest of 
the content as a CSV, which should be equal to the result obtained when running 
ogr2ogr without the CREATE_CSVT=YES option.
We're probably missing something, but as we see it, the generated CSV should be 
a valid one. Does that make sense?
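The check described here (strip the first two lines, then treat the rest as CSV) can be automated. After parsing, every record should have as many fields as the header. `csv.reader` already joins quoted fields that span several lines, so a mismatch usually points at a line break written without quotes. This is a minimal heuristic sketch with a hypothetical function name. It will not catch an unquoted break that happens to produce the right field count.

```python
import csv
import io


def find_malformed_records(csv_text: str) -> list[int]:
    """Return 1-based data record numbers whose field count differs from the header."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    header = next(reader, None)
    if header is None:
        return []
    # csv.reader already joins quoted fields spanning several lines, so a
    # mismatch here usually means a line break was written without quotes.
    return [n for n, row in enumerate(reader, start=1) if len(row) != len(header)]
```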

Thanks so much for your help!

On Wed, 3 May 2023 at 15:10, Robert Hewlett 
<rob.h...@gmail.com> wrote:
The .csvt and .prj files help to make a proper GeoCSV dataset; they help with 
QGIS and geopandas. The column name that I use in the CSV is usually geom, and 
WKT shows up in the .csvt file, which seems to be a one-line file that hints at 
the data types in the CSV file.

I hope that makes sense.

CSVT
Integer, Integer,WKT

CSV
line_id,point_id,geom
1,1,"POINT(1000 1000)"

PRJ
EPSG:26910
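This Python sketch shows how a consumer might use the .csvt hints. It reads the CSV and converts values according to the one-line .csvt file from the example above. The conversion rules are a simplified assumption. They cover only Integer, Integer64 and Real, with an optional width in parentheses. Other types, including WKT, are returned as strings. This is not how QGIS, geopandas or GDAL implement it, and the helper names are hypothetical.

```python
import csv
import io


def parse_csvt(csvt_line: str) -> list[str]:
    """Split a one-line .csvt file into type names, ignoring surrounding spaces."""
    return [t.strip() for t in next(csv.reader([csvt_line]))]


def convert(value, csvt_type: str):
    """Convert one CSV value according to its .csvt type (simplified)."""
    base = csvt_type.split("(", 1)[0]  # e.g. "Integer(10)" -> "Integer"
    if not value:  # empty string, or None for a missing trailing field
        return None
    if base in ("Integer", "Integer64"):
        return int(value)
    if base == "Real":
        return float(value)
    return value  # WKT, String, dates, ... are kept as text here


def typed_rows(csv_text: str, csvt_line: str):
    """Yield one dict per CSV record, with values converted using the .csvt types."""
    types = parse_csvt(csvt_line)
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    for row in reader:
        yield {name: convert(value, t) for (name, value), t in zip(row.items(), types)}
```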




On Wed, May 3, 2023, 05:23 Moises Calzado via gdal-dev 
<gdal-dev@lists.osgeo.org> wrote:
Hi Even,

Thanks so much for taking a look into that one!

I have one question regarding the CSVT content: we're not really using it, but 
it's required when using the GEOMETRY_NAME layer creation option, as can be 
checked in the CSV driver documentation:

- GEOMETRY_NAME=name (Starting with GDAL 2.1): Name of geometry column. 
Only used if GEOMETRY=AS_WKT and CREATE_CSVT=YES. Defaults to WKT.
We really need this flag as we are processing files that contain geometries 
with different column names, and we always want the same geometry name in the 
generated output. Is there something we're missing about that flag that would 
avoid this problem?
In my humble opinion, generating an invalid CSV when using -lco 
CREATE_CSVT=YES looks like a bug to me, as I can't see any reason why strings 
containing line breaks can't be quoted.

Could you please shed some light on this?

Looking forward to your reply,
Regards.

On Wed, 3 May 2023 at 14:00, Even Rouault 
<even.roua...@spatialys.com> wrote:

You didn't post to the list.
On 03/05/2023 at 13:49, Moises Calzado wrote:

On Sat, 29 Apr 2023 at 15:44, Even Rouault 
<even.roua...@spatialys.com> wrote:

Moises,

as far as I can see with your example, the CSV driver behaves "properly" in 
reading and writing of field values with line breaks.

It follows the "Fields with embedded line breaks must be quoted" rule of 
https://en.wikipedia.org/wiki/Comma-separated_values
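Python's standard `csv` module follows the same rule, which can be checked directly. With `QUOTE_MINIMAL`, only the fields that need quoting are quoted (here, the one containing line breaks), and reading the text back restores the original values. This is a minimal sketch using the sample values from the example below. Unlike GDAL's output, it does not quote the `id` values, which is equally valid CSV.

```python
import csv
import io


def write_rows(rows: list[list[str]]) -> str:
    """Serialize rows as CSV; QUOTE_MINIMAL quotes only fields that need it."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()


rows = [
    ["id", "descriptio"],
    ["1", "This is my third row"],
    ["2", "this is\nmy string\n"],
    ["3", "This is my third row"],
]
text = write_rows(rows)
print(text, end="")
# Round trip: csv.reader reassembles the quoted multi-line field.
assert list(csv.reader(io.StringIO(text, newline=""))) == rows
```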

$ ogr2ogr out.csv /vsizip/dataframe.zip

$ cat out.csv
id,descriptio
"1",This is my third row
"2","this is
my string
"
"3",This is my third row

$ ogrinfo out.csv -al
INFO: Open of `out.csv'
      using driver `CSV' successful.

Layer name: out
Geometry: None
Feature Count: 3
Layer SRS WKT:
(unknown)
id: String (0.0)
descriptio: String (0.0)
OGRFeature(out):1
  id (String) = 1
  descriptio (String) = This is my third row

OGRFeature(out):2
  id (String) = 2
  descriptio (String) = this is
my string


OGRFeature(out):3
  id (String) = 3
  descriptio (String) = This is my third row

But in your example, using /vsistdout/ together with -lco CREATE_CSVT=YES is 
going to result in an invalid CSV file that mixes both the .csvt and .csv 
content.

Even
On 24/04/2023 at 13:34, Moises Calzado via gdal-dev wrote:
Hello!

We're trying to convert a Shapefile into a CSV using ogr2ogr and we're having 
some issues with columns that contain line breaks inside their values. If a 
value contains the following string, ogr2ogr treats the line break as a new 
line and outputs two lines.

"this is my \n value"

That's the command that we're executing:

ogr2ogr -f CSV -skipfailures -makevalid /vsistdout/ /vsizip/shapefile.zip 
-simplify 0.00001 -dim XY -t_srs EPSG:4326 -lco GEOMETRY=AS_WKT -lco 
GEOMETRY_NAME=geom -lco CREATE_CSVT=YES > result.csv
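The streaming setup described in this thread could be sketched in Python with `subprocess`. The sketch reads ogr2ogr's standard output incrementally and hands it to `csv.reader`, which reassembles quoted fields that span several lines. It rests on several assumptions. ogr2ogr must be on PATH and the output is treated as UTF-8. As described elsewhere in this thread, with CREATE_CSVT=YES the first two lines are taken to be the .prj and .csvt content. The helper names are hypothetical, and the sketch cannot repair line breaks that were written without quotes.

```python
import csv
import subprocess
from typing import Iterator


def build_ogr2ogr_args(src: str, *, create_csvt: bool) -> list[str]:
    """Rebuild the ogr2ogr command from this message as an argument list."""
    args = [
        "ogr2ogr", "-f", "CSV", "-skipfailures", "-makevalid",
        "/vsistdout/", src,
        "-simplify", "0.00001", "-dim", "XY", "-t_srs", "EPSG:4326",
        "-lco", "GEOMETRY=AS_WKT", "-lco", "GEOMETRY_NAME=geom",
    ]
    if create_csvt:
        args += ["-lco", "CREATE_CSVT=YES"]
    return args


def stream_records(src: str, *, create_csvt: bool = False) -> Iterator[list[str]]:
    """Run ogr2ogr and yield parsed CSV records (header row first) as they arrive."""
    args = build_ogr2ogr_args(src, create_csvt=create_csvt)
    with subprocess.Popen(args, stdout=subprocess.PIPE, text=True, encoding="utf-8") as proc:
        if create_csvt:
            # Assumed layout: line 1 = .prj content, line 2 = .csvt content.
            next(proc.stdout, None)
            next(proc.stdout, None)
        # csv.reader joins quoted fields that span several physical lines.
        yield from csv.reader(proc.stdout)
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, args)
```

For example, `for record in stream_records("/vsizip/shapefile.zip"): ...` would process features one at a time without writing anything to disk.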

Is this expected behaviour, or is there any way to avoid it?
Here is an example Shapefile so that you can try to reproduce the behaviour: 
https://drive.google.com/file/d/1gFqfTP02KTFoavJyyO-Ix05YwZB2tS24/view?usp=sharing

Thanks so much in advance,
Regards.

--
Moises Calzado

Support Engineer

+34671264286 | mcalz...@carto.com | 
CARTO <https://www.carto.com/>


_______________________________________________

gdal-dev mailing list

gdal-dev@lists.osgeo.org

https://lists.osgeo.org/mailman/listinfo/gdal-dev

--

http://www.spatialys.com

My software is free, but my time generally not.

