darcysaum-toast opened a new issue, #131:
URL: https://github.com/apache/arrow-java/issues/131

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   Hello Arrow team,
   
   I believe the Avro adapter fails on Avro union types nested in records and arrays. The allocator tries to allocate more and more memory until it reaches the configured maximum. For example, if you use `-Darrow.vector.max_allocation_bytes=8589934592` to raise the allocator's limit to 8 GiB, it will try to allocate all of it for an empty array field that contains a nested union field.
   
   `Memory required for vector capacity 508400 is (2097152), which is more than max allowed (1048576)`
   
   I believe this is because the `while` loop [here](https://github.com/apache/arrow/blob/main/java/adapter/avro/src/main/java/org/apache/arrow/adapter/avro/consumers/AvroStructConsumer.java#L69-L75) never terminates: `UnionVector::getValueCapacity` always returns 0 when the union is nested in either a record or an array.
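   To make the suspected failure mode concrete, here is a small self-contained Java simulation. It does not use Arrow: `FakeVector`, `ensureCapacity`, and the 1 MiB `MAX_ALLOCATION_BYTES` limit are hypothetical stand-ins, and the loop only mirrors the shape of the linked `while` loop (grow until `getValueCapacity()` reaches the target), assuming each reallocation doubles the buffer. A vector that reports its real capacity stops immediately, while one that always reports 0 keeps doubling until it hits the limit.

   ```java
   // Hypothetical simulation of the suspected failure mode; not Arrow code.
   // FakeVector, ensureCapacity and MAX_ALLOCATION_BYTES are illustrative names.
   public class CapacityLoopDemo {

       // Illustrative stand-in for the allocator's maximum allocation size (1 MiB).
       static final long MAX_ALLOCATION_BYTES = 1L << 20;

       // Minimal stand-in for a vector that reports its capacity in values.
       static final class FakeVector {
           private long allocatedBytes = 4096;         // current buffer size in bytes
           private final int bytesPerValue;            // e.g. 4 for a 32-bit int
           private final boolean reportsZeroCapacity;  // mimics the reported union behavior

           FakeVector(int bytesPerValue, boolean reportsZeroCapacity) {
               this.bytesPerValue = bytesPerValue;
               this.reportsZeroCapacity = reportsZeroCapacity;
           }

           int getValueCapacity() {
               return reportsZeroCapacity ? 0 : (int) (allocatedBytes / bytesPerValue);
           }

           // Doubles the buffer, refusing to grow past MAX_ALLOCATION_BYTES.
           void reAlloc() {
               long newSize = allocatedBytes * 2;
               if (newSize > MAX_ALLOCATION_BYTES) {
                   throw new IllegalStateException("Memory required (" + newSize
                           + ") is more than max allowed (" + MAX_ALLOCATION_BYTES + ")");
               }
               allocatedBytes = newSize;
           }

           long allocatedBytes() {
               return allocatedBytes;
           }
       }

       // Same shape as the linked loop: keep growing until capacity reaches the target.
       // If getValueCapacity() never increases, only the allocation limit stops it.
       static void ensureCapacity(FakeVector vector, int targetCapacity) {
           while (vector.getValueCapacity() < targetCapacity) {
               vector.reAlloc();
           }
       }

       public static void main(String[] args) {
           FakeVector intLike = new FakeVector(4, false);
           ensureCapacity(intLike, 1);
           System.out.println("int-like vector: " + intLike.allocatedBytes() + " bytes");

           FakeVector unionLike = new FakeVector(4, true);
           try {
               ensureCapacity(unionLike, 1);
           } catch (IllegalStateException e) {
               System.out.println("union-like vector: " + e.getMessage());
           }
       }
   }
   ```

   Running `main` prints `int-like vector: 4096 bytes` followed by `union-like vector: Memory required (2097152) is more than max allowed (1048576)`. The vector that reports a real capacity is never reallocated, while the one stuck at 0 doubles its buffer eight times before the ninth attempt exceeds the limit.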
   
   I have a branch [here](https://github.com/darcysaum-toast/arrow/tree/avro-nested-union) with a [failing test](https://github.com/darcysaum-toast/arrow/blob/avro-nested-union/java/adapter/avro/src/test/java/org/apache/arrow/adapter/avro/AvroToArrowIteratorTest.java#L171):
 
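   ```java
     // Test method in AvroToArrowIteratorTest; getSchema(...) and convert(...)
     // are helpers from the Avro adapter's test sources.
     @Test
     public void testNestedUnion() throws Exception {
       // Top-level record whose field f0 is a nested record holding a union field.
       Schema schema = getSchema("test_nested_union.avsc");
       Schema topChildSchema = schema.getField("f0").schema();
       GenericRecord parent = new GenericData.Record(schema);
       GenericRecord topChild = new GenericData.Record(topChildSchema);

       // Populate the ["null", "int"] union with a non-null int value.
       topChild.put("fNestedUnion", 1);
       parent.put("f0", topChild);

       List<VectorSchemaRoot> roots = new ArrayList<>();
       // Converting this single record fails with the allocation error above:
       // the nested union vector keeps growing until the configured limit.
       try (AvroToArrowVectorIterator iterator = convert(schema, singletonList(parent))) {
         while (iterator.hasNext()) {
           roots.add(iterator.next());
           System.out.println(roots.get(roots.size() - 1).contentToTSVString());
         }
       }
     }
   ```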
   
   The [schema](https://github.com/darcysaum-toast/arrow/blob/avro-nested-union/java/adapter/avro/src/test/resources/schema/test_nested_union.avsc) is as follows:
   ```json
   {
     "namespace": "org.apache.arrow.avro",
     "type": "record",
     "name": "testArrayOfRecords",
     "fields": [
       {
         "name": "f0",
         "type": {
           "type": "record",
           "name": "NestedRecord",
           "fields": [
             {
               "name": "fNestedUnion",
               "type": ["null", "int"],
               "default": null
             }
           ]
         }
       }
     ]
   }
   ```
   
   Changing the type of field `fNestedUnion` from `["null", "int"]` to `"int"` makes the test pass (and the iterator correctly prints out the Arrow data).
   
   
   ### Component(s)
   
   Java


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
