Joaquim

Thanks for the notice. I was getting very worried that the many small 
allocations (a few per value) and finalizers were somehow incredibly expensive. 
Apparently they do have a cost, but just a reasonable one.

-erik

> On Jun 24, 2025, at 16:28, Joaquim Manuel Freire Luís <jl...@ualg.pt> wrote:
> 
> To finish this. The problem was in the GMT.jl wrapper.
> Short history.
> Julia 1.9 started to crash with some GMT.jl tests
> Opened an issue (https://github.com/JuliaLang/julia/issues/47003) but got 
> little to no help
> Found a patch for the situation (that seemed innocuous)
> https://github.com/GenericMappingTools/GMT.jl/blob/master/src/gdal.jl#L1953
>  
> but that patch ended up causing this extreme slowdown. Since Julia doesn’t 
> crash anymore, I removed that patch and now
>  
> julia> @time gdalread("/vsigzip//vsicurl/https://bulk.meteostat.net/v2/hourly/2022/08554.csv.gz");
>   1.007115 seconds (130.95 k allocations: 3.804 MiB)
>  
> Thanks for the discussion, which helped a lot in figuring out the problem.
>  
> Joaquim
>  
> From: gdal-dev <gdal-dev-boun...@lists.osgeo.org> On Behalf Of Joaquim Manuel 
> Freire Luís via gdal-dev
> Sent: Tuesday, June 24, 2025 6:19 PM
> To: Erik Schnetter <schnet...@gmail.com>; gdal-dev@lists.osgeo.org
> Subject: Re: [gdal-dev] Read a /vsigzip/ csv.gz all at once
>  
> Even
>  
> In case you are spending any time on this, please do not. At this time I’m 
> persuaded that this is a Julia wrapper(s) issue but have no time to 
> investigate it much more right now.
>  
> From: Joaquim Manuel Freire Luís <jl...@ualg.pt>
> Sent: Tuesday, June 24, 2025 5:52 PM
> To: Joaquim Manuel Freire Luís <jl...@ualg.pt>; Erik Schnetter 
> <schnet...@gmail.com>; gdal-dev@lists.osgeo.org
> Subject: RE: [gdal-dev] Read a /vsigzip/ csv.gz all at once
>  
> Erik, BINGO.
>  
> Since, in this case, I know that the field type is string, if I replace the 
> calls to getfield(…) with
>  
> OGR_F_GetFieldAsString(f.ptr, k)
>  
> I get these timings (on a local file)
>  
> julia> @time gdalread("/vsigzip/C:/TMP/.meteostat/cache/hourly/2025/08554.csv.gz");
>   0.046272 seconds (56.04 k allocations: 1.751 MiB)
>  
> and, for a remote one
>  
> julia> @time gdalread("/vsigzip//vsicurl/https://bulk.meteostat.net/v2/hourly/2022/08554.csv.gz");
>   1.191215 seconds (113.43 k allocations: 3.537 MiB)
>  
>  
> From: gdal-dev <gdal-dev-boun...@lists.osgeo.org> On Behalf Of Joaquim Manuel 
> Freire Luís via gdal-dev
> Sent: Tuesday, June 24, 2025 5:33 PM
> To: Erik Schnetter <schnet...@gmail.com>; gdal-dev@lists.osgeo.org
> Subject: Re: [gdal-dev] Read a /vsigzip/ csv.gz all at once
>  
> To complement what Erik said, here’s the ‘getfield’ function (this code was 
> taken from ArchGDAL, so we are talking about the same thing)
>  
> https://github.com/GenericMappingTools/GMT.jl/blob/master/src/gdal.jl#L2161
>  
> From: gdal-dev <gdal-dev-boun...@lists.osgeo.org> On Behalf Of Erik Schnetter 
> via gdal-dev
> Sent: Tuesday, June 24, 2025 5:21 PM
> To: gdal-dev@lists.osgeo.org
> Subject: Re: [gdal-dev] Read a /vsigzip/ csv.gz all at once
>  
> The Julia wrapper (ArchGDAL.jl) for `getfield` calls `OGR_FD_GetFieldDefn` 
> and several related functions (to get the type of the field etc.). Are these 
> possibly expensive operations in GDAL?
>  
> Any C function in GDAL can easily be called from Julia. Which C function 
> would get all fields at once? I assume that e.g. `OGR_F_GetFieldAsDoubleList` 
> would not work; this would be for values that are themselves lists?
>  
> The Julia code for `getfield` spends quite a bit of work to find out the type 
> of the field. This includes a bit of reference counting, allocating small 
> structures on the heap, registering finalizers for them etc. This could be 
> avoided by adding a Julia wrapper that calls `getfield` repeatedly (even from 
> Julia, calling C has no overhead by itself) for a range of integers. This 
> would avoid the additional overhead having to do with handling types, and the 
> Julia/GDAL reference counting. Even, is that what you had in mind?
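
A minimal sketch of the batch wrapper idea described above, assuming all fields are read as strings. `FakeFeature`, `get_string`, and `get_all_strings` are hypothetical names, the sample values are made up, and `get_string` is only a stand-in for a direct C call; none of this is ArchGDAL or GMT.jl code.

```julia
# Hypothetical stand-in for a feature: a plain vector of strings.
# In a real wrapper this would be a pointer passed to a GDAL C function.
struct FakeFeature
    values::Vector{String}
end

# Hypothetical stand-in for a direct, type-specific C call
# (zero-based index, as in the GDAL C API).
get_string(f::FakeFeature, k::Integer) = f.values[k + 1]

# Read fields 0:(n-1) of one feature in a single call, without looking up
# the field type or creating helper objects for each value.
function get_all_strings(f::FakeFeature, n::Integer)
    out = Vector{String}(undef, n)
    for k in 0:(n - 1)
        out[k + 1] = get_string(f, k)
    end
    return out
end

# Made-up sample values, for illustration only.
feature = FakeFeature(["2022-01-01", "0", "11.2"])
fields = get_all_strings(feature, 3)
```

The point of the design is that the per-value work shrinks to one call and one array store; any type lookup would be done once per layer rather than once per value.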
>  
> -erik
>  
> On Jun 24, 2025, at 11:01, Even Rouault via gdal-dev 
> <gdal-dev@lists.osgeo.org> wrote:
>  
> Hi,
> 
> I don't know anything about Julia but I'd suspect that there must be 
> something particularly slow in the way it interacts with C. For comparison,
> 
>   time python3 swig/python/gdal-utils/osgeo_utils/samples/ogrinfo.py /vsigzip//vsicurl/https://bulk.meteostat.net/v2/hourly/2022/08554.csv.gz -al > /dev/null
> 
> which does essentially your loop, and also prints on stdout, runs in 1.5 
> seconds (compared to native ogrinfo, which runs in 0.7 s). Perhaps you could 
> write a Julia wrapper to get all fields of a feature at once and return 
> whatever dictionary or equivalent data structure is idiomatic (and 
> efficient) in Julia? Also, are you sure your Julia wrapper is built with 
> optimization enabled?
> 
> Even
> 
> Le 24/06/2025 à 16:33, Joaquim Manuel Freire Luís via gdal-dev a écrit :
> Hi,
>  
> I’m trying to read files like 
> https://bulk.meteostat.net/v2/hourly/2022/08554.csv.gz
>  
> in my Julia wrapper. The point is that, although I’m kind of succeeding, the 
> whole operation is very slow.
> What I’m doing (code not committed yet so can’t post a link) is to read like 
> this
>  
> layer = getlayer(dataset, 0)
> for f in layer
>     for k = 1:Gdal.nfield(f)
>         Gdal.getfield(f, k-1)
> …
>  
> This works but it’s extremely slow because each “getfield” takes about 1e-4 
> seconds and the file has ~8 k rows, each with 13 fields. That amounts to > 10 
> sec.
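
As a quick check of that estimate, the arithmetic can be written out in Julia. The row count, field count, and per-call time are the rounded figures quoted above, not new measurements, and the variable names are only illustrative.

```julia
# Rounded figures from the message above (approximations, not measurements).
rows = 8_000             # "~8 k rows"
fields_per_row = 13      # fields in each row
seconds_per_call = 1e-4  # approximate cost of one getfield call

calls = rows * fields_per_row             # total number of getfield calls
total_seconds = calls * seconds_per_call  # estimated time spent in getfield

println("calls = ", calls, ", estimated time = ", total_seconds, " s")
```

This gives roughly 10 seconds for about a hundred thousand calls, consistent with the "> 10 sec" figure.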
>  
> I’ve searched but couldn’t find a way to read the entire file at once (which 
> takes 1e-2 seconds if I read it, locally, with a gzip wrapper) and return it 
> as a single string array that I could parse later.
>  
> Is that possible? 
>  
> Thanks
>  
> Joaquim
>  
> 
> _______________________________________________
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
> -- 
> http://www.spatialys.com
> My software is free, but my time generally not.
