Joaquim,

Thanks for the notice. I was getting very worried that the many small allocations (a few per value) and finalizers were somehow incredibly expensive. Apparently they do have a cost, but just a reasonable one.

-erik

> On Jun 24, 2025, at 16:28, Joaquim Manuel Freire Luís <jl...@ualg.pt> wrote:
>
> To finish this. The problem was in the GMT.jl wrapper.
> Short history:
> Julia 1.9 started to crash with some GMT.jl tests.
> Opened an issue (https://github.com/JuliaLang/julia/issues/47003) but got
> little to no help.
> Found a patch for the situation (that seemed innocuous)
> https://github.com/GenericMappingTools/GMT.jl/blob/master/src/gdal.jl#L1953
> but that patch ended up causing this extreme slowdown. Since Julia doesn’t
> crash anymore, I removed that patch and now
>
> julia> @time gdalread("/vsigzip//vsicurl/https://bulk.meteostat.net/v2/hourly/2022/08554.csv.gz");
>   1.007115 seconds (130.95 k allocations: 3.804 MiB)
>
> Thanks for the discussion, which helped a lot in figuring out the problem.
>
> Joaquim
>
> From: gdal-dev <gdal-dev-boun...@lists.osgeo.org> On Behalf Of Joaquim Manuel Freire Luís via gdal-dev
> Sent: Tuesday, June 24, 2025 6:19 PM
> To: Erik Schnetter <schnet...@gmail.com>; gdal-dev@lists.osgeo.org
> Subject: Re: [gdal-dev] Read a /vsigzip/ csv.gz all at once
>
> Even,
>
> In case you are spending any time on this, please do not. At this time I’m
> persuaded that this is a Julia wrapper(s) issue but have no time to
> investigate it much more right now.
>
> From: Joaquim Manuel Freire Luís <jl...@ualg.pt>
> Sent: Tuesday, June 24, 2025 5:52 PM
> To: Joaquim Manuel Freire Luís <jl...@ualg.pt>; Erik Schnetter <schnet...@gmail.com>; gdal-dev@lists.osgeo.org
> Subject: RE: [gdal-dev] Read a /vsigzip/ csv.gz all at once
>
> Erik, BINGO.
>
> Since, in this case, I know that the field type is string, if I replace the
> calls to getfield(…) with
>
> OGR_F_GetFieldAsString(f.ptr, k)
>
> I get these timings (on a local file)
>
> julia> @time gdalread("/vsigzip/C:/TMP/.meteostat/cache/hourly/2025/08554.csv.gz");
>   0.046272 seconds (56.04 k allocations: 1.751 MiB)
>
> and, for a remote one
>
> julia> @time gdalread("/vsigzip//vsicurl/https://bulk.meteostat.net/v2/hourly/2022/08554.csv.gz");
>   1.191215 seconds (113.43 k allocations: 3.537 MiB)
>
> From: gdal-dev <gdal-dev-boun...@lists.osgeo.org> On Behalf Of Joaquim Manuel Freire Luís via gdal-dev
> Sent: Tuesday, June 24, 2025 5:33 PM
> To: Erik Schnetter <schnet...@gmail.com>; gdal-dev@lists.osgeo.org
> Subject: Re: [gdal-dev] Read a /vsigzip/ csv.gz all at once
>
> To complement what Erik said, here’s the ‘getfield’ function (this code was
> taken from ArchGDAL, so we are talking about the same thing)
>
> https://github.com/GenericMappingTools/GMT.jl/blob/master/src/gdal.jl#L2161
>
> From: gdal-dev <gdal-dev-boun...@lists.osgeo.org> On Behalf Of Erik Schnetter via gdal-dev
> Sent: Tuesday, June 24, 2025 5:21 PM
> To: gdal-dev@lists.osgeo.org
> Subject: Re: [gdal-dev] Read a /vsigzip/ csv.gz all at once
>
> The Julia wrapper (ArchGDAL.jl) for `getfield` calls `OGR_FD_GetFieldDefn`
> and several related functions (to get the type of the field etc.). Are these
> possibly expensive operations in GDAL?
>
> Any C function in GDAL can easily be called from Julia. Which C function
> would get all fields at once? I assume that e.g. `OGR_F_GetFieldAsDoubleList`
> would not work; this would be for values that are themselves lists?
>
> The Julia code for `getfield` spends quite a bit of work to find out the type
> of the field. This includes a bit of reference counting, allocating small
> structures on the heap, registering finalizers for them etc. This could be
> avoided by adding a Julia wrapper that calls `getfield` repeatedly (even from
> Julia, calling C has no overhead by itself) for a range of integers. This
> would avoid the additional overhead having to do with handling types, and the
> Julia/GDAL reference counting. Even, is that what you had in mind?
>
> -erik
>
> On Jun 24, 2025, at 11:01, Even Rouault via gdal-dev <gdal-dev@lists.osgeo.org> wrote:
>
> Hi,
>
> I don't know anything about Julia but I'd suspect that there must be
> something particularly slow in the way it interacts with C. For comparison,
>
> time python3 swig/python/gdal-utils/osgeo_utils/samples/ogrinfo.py /vsigzip//vsicurl/https://bulk.meteostat.net/v2/hourly/2022/08554.csv.gz -al > /dev/null
>
> that does essentially your loop, and also prints on stdout, runs in 1.5
> seconds (compared to native ogrinfo that runs in 0.7 s). Perhaps you could
> write a Julia wrapper to get all fields of a feature at once and return
> whatever dictionary or equivalent data structure is idiomatic (and
> efficient) in Julia? Also, are you sure your Julia wrapper is built with
> optimization enabled?
>
> Even
>
> On 24/06/2025 at 16:33, Joaquim Manuel Freire Luís via gdal-dev wrote:
>
> Hi,
>
> I’m trying to read files like
> https://bulk.meteostat.net/v2/hourly/2022/08554.csv.gz
> in my Julia wrapper. The point is that, although I’m kind of succeeding, the
> whole operation is very slow.
> What I’m doing (code not committed yet so can’t post a link) is to read like
> this
>
> layer = getlayer(dataset, 0)
> for f in layer
>     for k = 1:Gdal.nfield(f)
>         Gdal.getfield(f, k-1)
>         …
>
> This works but it’s extremely slow because each “getfield” takes about 1e-4
> seconds and the file has ~8 k rows, each with 13 fields. That amounts to
> more than 10 sec.
>
> I’ve searched but couldn’t find a way to read the entire file at once (which
> takes 1e-2 seconds if I read it, locally, with a gzip wrapper) and return it
> as a single string array that I could parse later.
>
> Is that possible?
>
> Thanks
>
> Joaquim
>
> _______________________________________________
> gdal-dev mailing list
> gdal-dev@lists.osgeo.org
> https://lists.osgeo.org/mailman/listinfo/gdal-dev
>
> --
> http://www.spatialys.com
> My software is free, but my time generally not.
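
For readers of the archive, the direct-call approach discussed above (skipping the generic `getfield` and asking GDAL for every field of a feature as a string) might look roughly like the sketch below. It is not the GMT.jl or ArchGDAL code. The function name `fields_as_strings` is made up for illustration, `libgdal` is assumed to come from the GDAL_jll package, and `feature` is assumed to be a valid raw OGRFeatureH pointer (the `f.ptr` in Joaquim's snippet). The only GDAL C API calls used are `OGR_F_GetFieldCount` and `OGR_F_GetFieldAsString`.

```julia
using GDAL_jll: libgdal   # assumption: libgdal is provided by GDAL_jll

# Hypothetical helper: return all fields of one feature as Strings,
# without looking up the field type and without creating wrapper objects.
function fields_as_strings(feature::Ptr{Cvoid})
    n = ccall((:OGR_F_GetFieldCount, libgdal), Cint, (Ptr{Cvoid},), feature)
    out = Vector{String}(undef, n)
    for k in 1:n
        # GDAL field indices are 0-based. The returned char* is owned by
        # GDAL, so copy it into a Julia String right away.
        p = ccall((:OGR_F_GetFieldAsString, libgdal), Cstring,
                  (Ptr{Cvoid}, Cint), feature, k - 1)
        out[k] = unsafe_string(p)
    end
    return out
end
```

Inside the layer loop this would be called as `fields_as_strings(f.ptr)`, assuming `f.ptr` holds the feature handle. It only suits cases like this CSV, where every field can be taken as a string; numeric columns would still have to be parsed afterwards.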