[GitHub] [arrow-adbc] nbenn opened a new issue, #1142: r/adbcdrivermanager: behavior of `depth` argument in `adbc_connection_get_objects()`
nbenn opened a new issue, #1142: URL: https://github.com/apache/arrow-adbc/issues/1142

Going by the docs, I expected that passing `depth = 4L` would return schema info all the way down to columns.

```r
library(adbcdrivermanager)

db <- adbc_database_init(adbcsqlite::adbcsqlite(), uri = ":memory:")
con <- adbc_connection_init(db)
write_adbc(datasets::swiss, con, "swiss")

res <- nanoarrow::convert_array_stream(
  adbc_connection_get_objects(con, 4L)
)
res[["catalog_db_schemas"]][[1L]][["db_schema_tables"]][[1L]]
#>   table_name table_type table_columns table_constraints
#> 1      swiss      table          NULL              NULL
```

If I use the default depth of `0L`, I get the expected info.

@paleolimbot Is `3L` the largest "sensible" value? Maybe a warning for larger values would be helpful? Or maybe the docs could be a bit clearer about this? As things stand, it might be a bit misleading.
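For reference, a minimal sketch of the workaround mentioned in the report, continuing from the `con` set up in the repro above: with the default depth of `0L` the listing comes back fully nested, so column information can be pulled out of the `table_columns` list-column.

```r
# Continuing from the repro above (assumes `con` is still open).
# The default depth (0L) returns the fully nested listing, including columns.
res_all <- nanoarrow::convert_array_stream(
  adbc_connection_get_objects(con, 0L)
)
res_all[["catalog_db_schemas"]][[1L]][["db_schema_tables"]][[1L]][["table_columns"]]
```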
[GitHub] [arrow-testing] alamb merged pull request #94: Update nested_redords.avro to support nullable records
alamb merged PR #94: URL: https://github.com/apache/arrow-testing/pull/94
[GitHub] [arrow-testing] alamb commented on pull request #94: Update nested_redords.avro to support nullable records
alamb commented on PR #94: URL: https://github.com/apache/arrow-testing/pull/94#issuecomment-1742042330

Thank you @sarutak
[GitHub] [arrow-testing] sarutak opened a new pull request, #95: Add xz,zstd,bzip2,snappy variant of alltypes_plain.avro
sarutak opened a new pull request, #95: URL: https://github.com/apache/arrow-testing/pull/95

This PR proposes to add xz, zstd, bzip2, and snappy variants of `alltypes_plain.avro`. This change is needed for [this PR](https://github.com/apache/arrow-datafusion/pull/7718). The contents are the same as the existing `alltypes_plain.avro`. Represented as JSON, the content is as follows.

```
{"bigint_col":0,"bool_col":true,"date_string_col":[48,51,47,48,49,47,48,57],"double_col":0.0,"float_col":0.0,"id":4,"int_col":0,"smallint_col":0,"string_col":[48],"timestamp_col":12358656,"tinyint_col":0}
{"bigint_col":10,"bool_col":false,"date_string_col":[48,51,47,48,49,47,48,57],"double_col":10.1,"float_col":1.10023841858,"id":5,"int_col":1,"smallint_col":1,"string_col":[49],"timestamp_col":123586566000,"tinyint_col":1}
{"bigint_col":0,"bool_col":true,"date_string_col":[48,52,47,48,49,47,48,57],"double_col":0.0,"float_col":0.0,"id":6,"int_col":0,"smallint_col":0,"string_col":[48],"timestamp_col":12385440,"tinyint_col":0}
{"bigint_col":10,"bool_col":false,"date_string_col":[48,52,47,48,49,47,48,57],"double_col":10.1,"float_col":1.10023841858,"id":7,"int_col":1,"smallint_col":1,"string_col":[49],"timestamp_col":123854406000,"tinyint_col":1}
{"bigint_col":0,"bool_col":true,"date_string_col":[48,50,47,48,49,47,48,57],"double_col":0.0,"float_col":0.0,"id":2,"int_col":0,"smallint_col":0,"string_col":[48],"timestamp_col":12334464,"tinyint_col":0}
{"bigint_col":10,"bool_col":false,"date_string_col":[48,50,47,48,49,47,48,57],"double_col":10.1,"float_col":1.10023841858,"id":3,"int_col":1,"smallint_col":1,"string_col":[49],"timestamp_col":123344646000,"tinyint_col":1}
{"bigint_col":0,"bool_col":true,"date_string_col":[48,49,47,48,49,47,48,57],"double_col":0.0,"float_col":0.0,"id":0,"int_col":0,"smallint_col":0,"string_col":[48],"timestamp_col":12307680,"tinyint_col":0}
{"bigint_col":10,"bool_col":false,"date_string_col":[48,49,47,48,49,47,48,57],"double_col":10.1,"float_col":1.10023841858,"id":1,"int_col":1,"smallint_col":1,"string_col":[49],"timestamp_col":123076806000,"tinyint_col":1}
```
[GitHub] [arrow-testing] alamb merged pull request #95: Add xz,zstd,bzip2,snappy variant of alltypes_plain.avro
alamb merged PR #95: URL: https://github.com/apache/arrow-testing/pull/95
[GitHub] [arrow] tschaub opened a new issue, #37968: [Go][Parquet] Panic reading records from Overture Parquet file
tschaub opened a new issue, #37968: URL: https://github.com/apache/arrow/issues/37968

### Describe the bug, including details regarding any error messages, version, and platform.

I'm running into an issue using a record reader to read Parquet data from https://github.com/OvertureMaps/data. Here is a test that demonstrates the panic:

```go
// Imports assumed for this repro (testify for assertions, Arrow Go v14
// parquet/pqarrow packages); they were not spelled out in the original report.
import (
	"context"
	"io"
	"os"
	"testing"

	"github.com/apache/arrow/go/v14/arrow/memory"
	"github.com/apache/arrow/go/v14/parquet/file"
	"github.com/apache/arrow/go/v14/parquet/pqarrow"
	"github.com/stretchr/testify/assert"
	"github.com/stretchr/testify/require"
)

func TestOvertureRead(t *testing.T) {
	reader, err := os.Open("testdata/overture.parquet")
	require.NoError(t, err)

	fileReader, err := file.NewParquetReader(reader)
	require.NoError(t, err)

	arrowReader, err := pqarrow.NewFileReader(fileReader, pqarrow.ArrowReadProperties{BatchSize: 1024}, memory.DefaultAllocator)
	require.NoError(t, err)

	recordReader, err := arrowReader.GetRecordReader(context.Background(), nil, nil)
	require.NoError(t, err)

	rowsRead := int64(0)
	for {
		rec, err := recordReader.Read()
		if err == io.EOF {
			break
		}
		require.NoError(t, err)
		rowsRead += rec.NumRows()
	}

	assert.Equal(t, fileReader.NumRows(), rowsRead)
}
```

The `testdata/overture.parquet` file is from https://storage.googleapis.com/open-geodata/ch/20230725_211237_00132_5p54t_3b7d7eb3-dd9c-442a-a9b9-404dc936c5d9

Here is the output:

```bash
# go test -timeout 30s -run ^TestOvertureRead$ github.com/apache/arrow/go/v14/parquet/pqarrow
panic: runtime error: slice bounds out of range [:160] with capacity 0

goroutine 99 [running]:
github.com/apache/arrow/go/v14/parquet/internal/encoding.(*PlainByteArrayDecoder).DecodeSpaced(0x0?, {0x0?, 0x140005ffce8?, 0x105304140?}, 0x105f2c2d8?, {0x14000402dc0?, 0x5ffc01?, 0x401?}, 0x700010400?)
	/Users/tim/projects/arrow/go/parquet/internal/encoding/byte_array_decoder.go:83 +0x130
github.com/apache/arrow/go/v14/parquet/file.(*byteArrayRecordReader).ReadValuesSpaced(0x140005b4900, 0x0, 0x800?)
	/Users/tim/projects/arrow/go/parquet/file/record_reader.go:841 +0x134
github.com/apache/arrow/go/v14/parquet/file.(*recordReader).ReadRecordData(0x140005c55c0, 0x400)
	/Users/tim/projects/arrow/go/parquet/file/record_reader.go:548 +0x288
github.com/apache/arrow/go/v14/parquet/file.(*recordReader).ReadRecords(0x140005c55c0, 0x400)
	/Users/tim/projects/arrow/go/parquet/file/record_reader.go:632 +0x32c
github.com/apache/arrow/go/v14/parquet/pqarrow.(*leafReader).LoadBatch(0x140005c5620, 0x400)
	/Users/tim/projects/arrow/go/parquet/pqarrow/column_readers.go:104 +0xd8
github.com/apache/arrow/go/v14/parquet/pqarrow.(*structReader).LoadBatch.func1()
	/Users/tim/projects/arrow/go/parquet/pqarrow/column_readers.go:242 +0x30
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/Users/tim/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75 +0x58
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 97
	/Users/tim/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:72 +0x98
FAIL	github.com/apache/arrow/go/v14/parquet/pqarrow	0.546s
FAIL
```

This is using the latest commit from this repo (a381c05d596cddd341437de6b277520345f9bb8e). It appears that the issue is due to the encoding of the `geometry` column (a `BYTE_ARRAY`). I'll try to dig more to narrow down the issue.

### Component(s)

Go, Parquet
[GitHub] [arrow] amoeba opened a new issue, #37969: [R] segfault when writing to ParquetFileWriter after closing
amoeba opened a new issue, #37969: URL: https://github.com/apache/arrow/issues/37969

### Describe the bug, including details regarding any error messages, version, and platform.

Writing to a closed writer causes a segfault rather than an error. I ran into this while testing something unrelated, evaluating portions of a larger script in a REPL. Writing to a closed output stream errors as expected, so the key here is `writer$Close()` and the subsequent `writer$WriteTable()` call:

```r
library(arrow)

outfile <- tempfile(fileext = ".parquet")
sink <- FileOutputStream$create(outfile)

my_schema <- schema(letters = string())
writer <- ParquetFileWriter$create(
  schema = my_schema,
  sink,
  properties = ParquetWriterProperties$create(
    column_names = names(my_schema),
    compression = arrow:::default_parquet_compression()
  )
)

tbl_arrow <- as_arrow_table(data.frame(letters = LETTERS), schema = my_schema)
writer$WriteTable(tbl_arrow, chunk_size = 1)

writer$Close()
sink$close()

tbl_arrow <- as_arrow_table(data.frame(letters = LETTERS), schema = my_schema)
writer$WriteTable(tbl_arrow, chunk_size = 1)
```

Result:

```
*** caught segfault ***
address 0x0, cause 'invalid permissions'

Traceback:
 1: parquet___arrow___FileWriter__WriteTable(self, table, chunk_size)
 2: writer$WriteTable(tbl_arrow, chunk_size = 1)
An irrecoverable exception occurred. R is aborting now ...
fish: Job 1, 'Rscript arrow_memorypool_crashe…' terminated by signal SIGSEGV (Address boundary error)
```

- OS/arch: macOS 14.0 (Sonoma), aarch64 (M2)
- R: 4.3.1
- arrow version: 13.0.0.1

### Component(s)

R
[GitHub] [arrow] kou opened a new issue, #37971: [CI][Java] java-nightly cache has 8.6 GB
kou opened a new issue, #37971: URL: https://github.com/apache/arrow/issues/37971

### Describe the enhancement requested

https://github.com/apache/arrow/actions/caches

> java-nightly-6371112382
> 8.6 GB cached hours ago

We can use up to 10 GB of cache storage in apache/arrow. If the java-nightly cache alone uses 8.6 GB, other caches will be evicted soon.

The java-nightly cache was introduced by GH-13839.

### Component(s)

Continuous Integration, Java
[GitHub] [arrow-adbc] matquant14 opened a new issue, #1143: Returning Snowflake query id
matquant14 opened a new issue, #1143: URL: https://github.com/apache/arrow-adbc/issues/1143

I'm starting to explore the ADBC Snowflake driver for Python. Is there a way for the ADBC cursor to return the Snowflake query id after executing a query, like the cursor from the Snowflake Python connector does? Or do I have to run `SELECT LAST_QUERY_ID()` after I execute my SQL query? I'm not seeing anything in the documentation or in the code.
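In case it helps, here is a rough sketch of the `SELECT LAST_QUERY_ID()` workaround mentioned in the question, assuming the driver's Python DB-API bindings (`adbc_driver_snowflake.dbapi`); the connection DSN and table name are placeholders, and whether a dedicated cursor attribute exists is exactly what this issue asks.

```python
# Sketch of the workaround, not a confirmed ADBC feature.
import adbc_driver_snowflake.dbapi

conn = adbc_driver_snowflake.dbapi.connect(
    "user:password@account/database/schema"  # placeholder Snowflake DSN
)
cur = conn.cursor()

# Run the query of interest.
cur.execute("SELECT * FROM my_table")  # hypothetical table
cur.fetchall()

# LAST_QUERY_ID() returns the id of the most recent statement in the same
# session, so it must be run on the same connection before anything else.
cur.execute("SELECT LAST_QUERY_ID()")
(query_id,) = cur.fetchone()
print(query_id)

cur.close()
conn.close()
```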