venkat-oss opened a new issue, #209:
URL: https://github.com/apache/arrow-go/issues/209

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Hi @zeroshade, I came across the closed issue 
[#38616](https://github.com/apache/arrow/issues/38616) and can still 
reproduce it when writing Arrow data to a Parquet file with pqarrow.
   
   Here is the code that writes the Parquet file. It is based on one of your 
[examples](https://voltrondata.com/blog/make-data-files-easier-to-work-with-golang-arrow):
   
   ```go
   arrChan := make(chan arrow.Record, 10)
   
   go func(ch <-chan arrow.Record) {
       first_rec := <-ch
       f, err := os.OpenFile("./test.parquet", os.O_CREATE|os.O_WRONLY, 0644)
       if err != nil {
           panic(err)
       }
       defer f.Close()
       // ...
       // we'll use the default writer properties, but you could easily pass
       // properties to customize the writer
       props := parquet.NewWriterProperties()
       writer, err := pqarrow.NewFileWriter(first_rec.Schema(), f, props,
           pqarrow.DefaultWriterProps())
       if err != nil {
           panic(err)
       }
       defer writer.Close()
       fmt.Println("here")
   
       if err := writer.Write(first_rec); err != nil {
           fmt.Println(err)
           panic(err)
       }
       // first_rec.Release()
   
       for rec := range ch {
           if err := writer.Write(rec); err != nil {
               panic(err)
           }
           // rec.Release()
       }
   }(arrChan)
   ```
   
   The Arrow records are released outside this function.
   
   This code writes a test.parquet file. When I read it with DuckDB, I 
get this error:
   
   > Error: Invalid Input Error: Failed to cast value: Type UINT32 with value 
4294967295 can't be cast because the value is out of range for the destination 
type UINT16
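
To make the error concrete, here is a minimal standard-library sketch of the range check a reader has to apply. The helper name `fitsUint16` is hypothetical and is not part of DuckDB or arrow-go; the snippet only shows that the reported value is the maximum `uint32` and cannot fit a 16-bit unsigned column. It does not identify where the value comes from.

```go
package main

import (
	"fmt"
	"math"
)

// fitsUint16 reports whether v can be represented as a uint16.
// Hypothetical helper for illustration only.
func fitsUint16(v uint32) bool {
	return v <= math.MaxUint16
}

func main() {
	// Value taken from the DuckDB error message above.
	const reported uint32 = 4294967295

	fmt.Println(reported == math.MaxUint32) // true
	fmt.Println(fitsUint16(reported))       // false
	fmt.Println(fitsUint16(0))              // true
}
```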
   
   Here is the output from the 
[parquet-cli](https://formulae.brew.sh/formula/parquet-cli) tool, similar to 
what is shown in [#38616](https://github.com/apache/arrow/issues/38616):
   
   ```shell
   $ parquet pages test.parquet
   
   Column: id
   --------------------------------------------------------------------------------
     page   type  enc  count   avg size   size       rows     nulls   min / max
     0-D    dict  _ _  1       4.00 B     4 B       
     0-1    data  _ R  1       3.00 B     3 B                 0       "0" / "0"
   
   
   Column: resource.id
   --------------------------------------------------------------------------------
     page   type  enc  count   avg size   size       rows     nulls   min / max
     0-D    dict  _ _  1       4.00 B     4 B       
     0-1    data  _ R  1       9.00 B     9 B                 0       "4294967295" / "0"
   
   
   Column: resource.schema_url
   --------------------------------------------------------------------------------
     page   type  enc  count   avg size   size       rows     nulls   min / max
     0-D    dict  _ _  1       43.00 B    43 B      
     0-1    data  _ R  1       9.00 B     9 B                 0       "https://opentelemetry.io/..." / "https://opentelemetry.io/..."
   
   
   Column: scope.id
   --------------------------------------------------------------------------------
     page   type  enc  count   avg size   size       rows     nulls   min / max
     0-D    dict  _ _  1       4.00 B     4 B       
     0-1    data  _ R  1       9.00 B     9 B                 0       "4294967295" / "0"
   
   
   Column: metric_type
   --------------------------------------------------------------------------------
     page   type  enc  count   avg size   size       rows     nulls   min / max
     0-D    dict  _ _  1       4.00 B     4 B       
     0-1    data  _ R  1       3.00 B     3 B                 0       "1" / "1"
   
   
   Column: name
   --------------------------------------------------------------------------------
     page   type  enc  count   avg size   size       rows     nulls   min / max
     0-D    dict  _ _  1       7.00 B     7 B       
     0-1    data  _ R  1       3.00 B     3 B                         "gen" / "gen"
   
   ``` 
   
   ```shell
   $ parquet meta test.parquet
   
   File path:  test.parquet
   Created by: parquet-go version 18.0.0-SNAPSHOT
   Properties: (none)
   Schema:
   message schema {
     required int32 id (INTEGER(16,false));
     required group resource {
       optional int32 id (INTEGER(16,false));
       optional binary schema_url (STRING);
     }
     required group scope {
       optional int32 id (INTEGER(16,false));
     }
     required int32 metric_type (INTEGER(8,false));
     required binary name (STRING);
   }
   
   
   Row group 0:  count: 1  464.00 B records  start: 4  total(compressed): 464 B total(uncompressed):464 B
   --------------------------------------------------------------------------------
                        type      encodings count     avg size   nulls   min / max
   id                   INT32     _ _ R     1         56.00 B    0       "0" / "0"
   resource.id          INT32     _ _ R     1         62.00 B    0       "4294967295" / "0"
   resource.schema_url  BINARY    _ _ R     1         171.00 B   0       "https://opentelemetry.io/..." / "https://opentelemetry.io/..."
   scope.id             INT32     _ _ R     1         62.00 B    0       "4294967295" / "0"
   metric_type          INT32     _ _ R     1         56.00 B    0       "1" / "1"
   name                 BINARY    _ _ R     1         57.00 B            "gen" / "gen"
   
   ``` 
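
One observation about the `resource.id` and `scope.id` statistics, offered as arithmetic only and not as a diagnosis: the pair `"4294967295" / "0"` has a minimum larger than its maximum under unsigned ordering, but is correctly ordered if the same 32-bit patterns are read as signed values. The sketch below uses only the Go standard library; `asSigned` is a hypothetical helper, and whether any signed comparison actually occurs in the writer is an assumption that has not been verified here.

```go
package main

import "fmt"

// asSigned reinterprets the bits of an unsigned 32-bit value as an int32.
// Hypothetical helper for illustration only.
func asSigned(v uint32) int32 {
	return int32(v)
}

func main() {
	// min / max as printed by parquet-cli for resource.id and scope.id.
	const statMin, statMax uint32 = 4294967295, 0

	// Unsigned ordering: min > max, so the pair looks inconsistent.
	fmt.Println(statMin <= statMax) // false

	// Signed view of the same bits: -1 and 0, which are correctly ordered.
	fmt.Println(asSigned(statMin), asSigned(statMax))  // -1 0
	fmt.Println(asSigned(statMin) <= asSigned(statMax)) // true
}
```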
   
   I hope these reproduction details are sufficient. If anything is 
missing, please let me know and I will provide it as soon as possible. 
Thank you!
   
   ### Component(s)
   
   Parquet

