[GitHub] [arrow-adbc] nbenn opened a new issue, #1142: r/adbcdrivermanager: behavior of `depth` argument in `adbc_connection_get_objects()`

2023-10-01 Thread via GitHub


nbenn opened a new issue, #1142:
URL: https://github.com/apache/arrow-adbc/issues/1142

   Going by the docs, I was expecting that passing `depth = 4L` would give me back schema info all the way down to columns.
   
   ```r
   library(adbcdrivermanager)
   db <- adbc_database_init(adbcsqlite::adbcsqlite(), uri = ":memory:")
   con <- adbc_connection_init(db)
   write_adbc(datasets::swiss, con, "swiss")
   
   res <- nanoarrow::convert_array_stream(
     adbc_connection_get_objects(con, 4L)
   )
   
   res[["catalog_db_schemas"]][[1L]][["db_schema_tables"]][[1L]]
   #>   table_name table_type table_columns table_constraints
   #> 1      swiss      table          NULL              NULL
   ```
   
   If I use the default depth of `0L`, I get the expected info.
   
   @paleolimbot Is `3L` the largest "sensible" value? Maybe a warning for larger values would be helpful? Or maybe the docs could be a bit clearer about this? As things stand, it might be a bit misleading.
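
   For reference, the call that does return column metadata on my end (a minimal sketch reusing the in-memory SQLite setup from above; the drill-down into `table_columns` is just how I read the converted result, not something taken from the docs):

   ```r
   library(adbcdrivermanager)
   db <- adbc_database_init(adbcsqlite::adbcsqlite(), uri = ":memory:")
   con <- adbc_connection_init(db)
   write_adbc(datasets::swiss, con, "swiss")

   # depth = 0L (the default) asks for all levels, so tables come back
   # with their columns populated
   res <- nanoarrow::convert_array_stream(
     adbc_connection_get_objects(con, 0L)
   )

   tables <- res[["catalog_db_schemas"]][[1L]][["db_schema_tables"]][[1L]]
   tables[["table_columns"]][[1L]][["column_name"]]
   ```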


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-testing] alamb merged pull request #94: Update nested_records.avro to support nullable records

2023-10-01 Thread via GitHub


alamb merged PR #94:
URL: https://github.com/apache/arrow-testing/pull/94


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-testing] alamb commented on pull request #94: Update nested_records.avro to support nullable records

2023-10-01 Thread via GitHub


alamb commented on PR #94:
URL: https://github.com/apache/arrow-testing/pull/94#issuecomment-1742042330

   Thank you @sarutak 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-testing] sarutak opened a new pull request, #95: Add xz,zstd,bzip2,snappy variant of alltypes_plain.avro

2023-10-01 Thread via GitHub


sarutak opened a new pull request, #95:
URL: https://github.com/apache/arrow-testing/pull/95

   This PR proposes to add xz, zstd, bzip2, and snappy variants of `alltypes_plain.avro`.
   This change is necessary for [this PR](https://github.com/apache/arrow-datafusion/pull/7718).
   
   The contents are the same as the existing `alltypes_plain.avro`.
   The content, represented as JSON, is as follows:
   ```
   {"bigint_col":0,"bool_col":true,"date_string_col":[48,51,47,48,49,47,48,57],"double_col":0.0,"float_col":0.0,"id":4,"int_col":0,"smallint_col":0,"string_col":[48],"timestamp_col":12358656,"tinyint_col":0}
   {"bigint_col":10,"bool_col":false,"date_string_col":[48,51,47,48,49,47,48,57],"double_col":10.1,"float_col":1.10023841858,"id":5,"int_col":1,"smallint_col":1,"string_col":[49],"timestamp_col":123586566000,"tinyint_col":1}
   {"bigint_col":0,"bool_col":true,"date_string_col":[48,52,47,48,49,47,48,57],"double_col":0.0,"float_col":0.0,"id":6,"int_col":0,"smallint_col":0,"string_col":[48],"timestamp_col":12385440,"tinyint_col":0}
   {"bigint_col":10,"bool_col":false,"date_string_col":[48,52,47,48,49,47,48,57],"double_col":10.1,"float_col":1.10023841858,"id":7,"int_col":1,"smallint_col":1,"string_col":[49],"timestamp_col":123854406000,"tinyint_col":1}
   {"bigint_col":0,"bool_col":true,"date_string_col":[48,50,47,48,49,47,48,57],"double_col":0.0,"float_col":0.0,"id":2,"int_col":0,"smallint_col":0,"string_col":[48],"timestamp_col":12334464,"tinyint_col":0}
   {"bigint_col":10,"bool_col":false,"date_string_col":[48,50,47,48,49,47,48,57],"double_col":10.1,"float_col":1.10023841858,"id":3,"int_col":1,"smallint_col":1,"string_col":[49],"timestamp_col":123344646000,"tinyint_col":1}
   {"bigint_col":0,"bool_col":true,"date_string_col":[48,49,47,48,49,47,48,57],"double_col":0.0,"float_col":0.0,"id":0,"int_col":0,"smallint_col":0,"string_col":[48],"timestamp_col":12307680,"tinyint_col":0}
   {"bigint_col":10,"bool_col":false,"date_string_col":[48,49,47,48,49,47,48,57],"double_col":10.1,"float_col":1.10023841858,"id":1,"int_col":1,"smallint_col":1,"string_col":[49],"timestamp_col":123076806000,"tinyint_col":1}
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-testing] alamb merged pull request #95: Add xz,zstd,bzip2,snappy variant of alltypes_plain.avro

2023-10-01 Thread via GitHub


alamb merged PR #95:
URL: https://github.com/apache/arrow-testing/pull/95


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] tschaub opened a new issue, #37968: [Go][Parquet] Panic reading records from Overture Parquet file

2023-10-01 Thread via GitHub


tschaub opened a new issue, #37968:
URL: https://github.com/apache/arrow/issues/37968

   ### Describe the bug, including details regarding any error messages, 
version, and platform.
   
   I'm running into an issue using a record reader to read Parquet data from 
https://github.com/OvertureMaps/data.
   
   Here is a test that demonstrates the panic:
   ```go
   package pqarrow_test

   import (
       "context"
       "io"
       "os"
       "testing"

       "github.com/apache/arrow/go/v14/arrow/memory"
       "github.com/apache/arrow/go/v14/parquet/file"
       "github.com/apache/arrow/go/v14/parquet/pqarrow"
       "github.com/stretchr/testify/assert"
       "github.com/stretchr/testify/require"
   )

   func TestOvertureRead(t *testing.T) {
       reader, err := os.Open("testdata/overture.parquet")
       require.NoError(t, err)

       fileReader, err := file.NewParquetReader(reader)
       require.NoError(t, err)

       arrowReader, err := pqarrow.NewFileReader(fileReader, pqarrow.ArrowReadProperties{BatchSize: 1024}, memory.DefaultAllocator)
       require.NoError(t, err)

       recordReader, err := arrowReader.GetRecordReader(context.Background(), nil, nil)
       require.NoError(t, err)

       rowsRead := int64(0)
       for {
           rec, err := recordReader.Read()
           if err == io.EOF {
               break
           }
           require.NoError(t, err)
           rowsRead += rec.NumRows()
       }

       assert.Equal(t, fileReader.NumRows(), rowsRead)
   }
   ```
   
   The `testdata/overture.parquet` file is from 
https://storage.googleapis.com/open-geodata/ch/20230725_211237_00132_5p54t_3b7d7eb3-dd9c-442a-a9b9-404dc936c5d9
   
   Here is the output
   ```bash
   # go test -timeout 30s -run ^TestOvertureRead$ github.com/apache/arrow/go/v14/parquet/pqarrow

   panic: runtime error: slice bounds out of range [:160] with capacity 0

   goroutine 99 [running]:
   github.com/apache/arrow/go/v14/parquet/internal/encoding.(*PlainByteArrayDecoder).DecodeSpaced(0x0?, {0x0?, 0x140005ffce8?, 0x105304140?}, 0x105f2c2d8?, {0x14000402dc0?, 0x5ffc01?, 0x401?}, 0x700010400?)
           /Users/tim/projects/arrow/go/parquet/internal/encoding/byte_array_decoder.go:83 +0x130
   github.com/apache/arrow/go/v14/parquet/file.(*byteArrayRecordReader).ReadValuesSpaced(0x140005b4900, 0x0, 0x800?)
           /Users/tim/projects/arrow/go/parquet/file/record_reader.go:841 +0x134
   github.com/apache/arrow/go/v14/parquet/file.(*recordReader).ReadRecordData(0x140005c55c0, 0x400)
           /Users/tim/projects/arrow/go/parquet/file/record_reader.go:548 +0x288
   github.com/apache/arrow/go/v14/parquet/file.(*recordReader).ReadRecords(0x140005c55c0, 0x400)
           /Users/tim/projects/arrow/go/parquet/file/record_reader.go:632 +0x32c
   github.com/apache/arrow/go/v14/parquet/pqarrow.(*leafReader).LoadBatch(0x140005c5620, 0x400)
           /Users/tim/projects/arrow/go/parquet/pqarrow/column_readers.go:104 +0xd8
   github.com/apache/arrow/go/v14/parquet/pqarrow.(*structReader).LoadBatch.func1()
           /Users/tim/projects/arrow/go/parquet/pqarrow/column_readers.go:242 +0x30
   golang.org/x/sync/errgroup.(*Group).Go.func1()
           /Users/tim/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:75 +0x58
   created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 97
           /Users/tim/go/pkg/mod/golang.org/x/sync@v0.3.0/errgroup/errgroup.go:72 +0x98
   FAIL    github.com/apache/arrow/go/v14/parquet/pqarrow  0.546s
   FAIL
   ```
   
   This is using the latest commit from this repo 
(a381c05d596cddd341437de6b277520345f9bb8e).  It appears that the issue is due 
to the encoding of the `geometry` column (a `BYTE_ARRAY`).  I'll try to dig 
more to narrow down the issue.
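
   As a quick cross-check (a sketch only, assuming the same `testdata/overture.parquet` file and the arrow R package are at hand; not something run as part of this report), pointing the C++-backed reader at the file would help tell whether the file itself or the Go decode path is at fault:

   ```r
   library(arrow)

   # if the C++-backed Parquet reader handles the file cleanly, the file is
   # likely fine and the problem sits in the Go byte-array decode path
   tbl <- read_parquet("testdata/overture.parquet", as_data_frame = FALSE)
   tbl$num_rows
   ```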
   
   ### Component(s)
   
   Go, Parquet


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] amoeba opened a new issue, #37969: [R] segfault when writing to ParquetFileWriter after closing

2023-10-01 Thread via GitHub


amoeba opened a new issue, #37969:
URL: https://github.com/apache/arrow/issues/37969

   ### Describe the bug, including details regarding any error messages, 
version, and platform.
   
   Writing to a closed writer causes a segfault rather than an error. I ran into this while testing something unrelated, evaluating portions of a larger script in a REPL. Writing to a closed output stream errors as expected, so the key here is `writer$Close()` and the subsequent `writer$WriteTable()` call:
   
   ```r
   library(arrow)
   
   outfile <- tempfile(fileext = ".parquet")
   sink <- FileOutputStream$create(outfile)
   
   my_schema <- schema(letters = string())
   writer <- ParquetFileWriter$create(
     schema = my_schema,
     sink,
     properties = ParquetWriterProperties$create(
       column_names = names(my_schema),
       compression = arrow:::default_parquet_compression()
     )
   )
   tbl_arrow <- as_arrow_table(data.frame(letters=LETTERS), schema = my_schema)
   writer$WriteTable(tbl_arrow, chunk_size = 1)
   
   writer$Close()
   sink$close()
   
   tbl_arrow <- as_arrow_table(data.frame(letters=LETTERS), schema = my_schema)
   writer$WriteTable(tbl_arrow, chunk_size = 1)
   ```
   
   Result:
   
   ```
*** caught segfault ***
   address 0x0, cause 'invalid permissions'
   
   Traceback:
1: parquet___arrow___FileWriter__WriteTable(self, table, chunk_size)
2: writer$WriteTable(tbl_arrow, chunk_size = 1)
   An irrecoverable exception occurred. R is aborting now ...
   fish: Job 1, 'Rscript arrow_memorypool_crashe…' terminated by signal SIGSEGV (Address boundary error)
   ```
   
   - OS/arch: macOS 14.0 (Sonoma), aarch64 (M2)
   - R: 4.3.1
   - arrow version: 13.0.0.1
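
   Until this is handled, a defensive workaround (a sketch only; `make_safe_writer` below is a hypothetical helper, not part of the arrow API) is to track the closed state on the R side so that a late `WriteTable()` raises an ordinary R error instead of reaching the native writer:

   ```r
   # hypothetical helper: wrap an open ParquetFileWriter so that writing
   # after close fails with an R error rather than a crash
   make_safe_writer <- function(writer) {
     closed <- FALSE
     list(
       write_table = function(tbl, chunk_size = 1L) {
         if (closed) stop("writer has already been closed")
         writer$WriteTable(tbl, chunk_size = chunk_size)
       },
       close = function() {
         writer$Close()
         closed <<- TRUE
       }
     )
   }

   safe <- make_safe_writer(writer)
   safe$write_table(tbl_arrow, chunk_size = 1)
   safe$close()
   safe$write_table(tbl_arrow, chunk_size = 1)  # error, not a segfault
   ```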
   
   ### Component(s)
   
   R


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow] kou opened a new issue, #37971: [CI][Java] java-nightly cache has 8.6 GB

2023-10-01 Thread via GitHub


kou opened a new issue, #37971:
URL: https://github.com/apache/arrow/issues/37971

   ### Describe the enhancement requested
   
   https://github.com/apache/arrow/actions/caches
   
   > java-nightly-6371112382 
   > 8.6 GB cached hours ago
   
   We can use up to 10 GB of cache in apache/arrow. If the java-nightly cache uses 8.6 GB on its own, other caches will be evicted soon.
   
   The java-nightly cache was introduced by GH-13839.
   
   ### Component(s)
   
   Continuous Integration, Java


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [arrow-adbc] matquant14 opened a new issue, #1143: Returning Snowflake query id

2023-10-01 Thread via GitHub


matquant14 opened a new issue, #1143:
URL: https://github.com/apache/arrow-adbc/issues/1143

   I'm starting to explore the ADBC Snowflake driver for Python. Is there a way for the ADBC cursor to return the Snowflake query ID after executing a query, like the cursor from the Snowflake Python connector does? Or do I have to run `SELECT LAST_QUERY_ID()` after I execute my SQL query? I'm not seeing anything in the documentation or in the code.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org