ravisharda opened a new issue, #207:
URL: https://github.com/apache/arrow-java/issues/207

   ### Describe the usage question you have. Please include as many useful details as possible.
   
   
   I want to read encrypted Parquet files using the Arrow Java API to create an input stream for [DuckDB](https://github.com/duckdb/duckdb).
   
   It works for unencrypted Parquet files, as shown below.
   
   ```java
   import org.apache.arrow.c.ArrowArrayStream;
   import org.apache.arrow.c.Data;
   import org.apache.arrow.dataset.file.FileFormat;
   import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
   import org.apache.arrow.dataset.jni.NativeMemoryPool;
   import org.apache.arrow.dataset.scanner.ScanOptions;
   import org.apache.arrow.dataset.scanner.Scanner;
   import org.apache.arrow.dataset.source.Dataset;
   import org.apache.arrow.dataset.source.DatasetFactory;
   import org.apache.arrow.memory.BufferAllocator;
   import org.apache.arrow.memory.RootAllocator;
   import org.apache.arrow.vector.ipc.ArrowReader;
   import org.duckdb.DuckDBConnection;
   import java.net.URI;
   import java.sql.DriverManager;
   import java.sql.ResultSet;
   import java.sql.Statement;
   
   URI uri = URI.create("file:/path/to/sample.parquet");
   
   ScanOptions options = new ScanOptions(/*batchSize*/ 32768);
   try (BufferAllocator allocator = new RootAllocator();
        ArrowArrayStream stream = ArrowArrayStream.allocateNew(allocator);
        DatasetFactory datasetFactory = new FileSystemDatasetFactory(allocator,
                        NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri.toString());
        Dataset dataset = datasetFactory.finish();
        Scanner scanner = dataset.newScan(options);
        ArrowReader reader = scanner.scanBatches();) {
   
      Data.exportArrayStream(allocator, reader, stream);
      
      Class.forName("org.duckdb.DuckDBDriver");
      try (DuckDBConnection conn = (DuckDBConnection) DriverManager.getConnection("jdbc:duckdb:")) {
            conn.registerArrowStream("testStream", stream);
   
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery("SELECT * FROM testStream")) {
                // Do stuff with the result set...
            }
      }
   }
   ```
   
   However, I haven't found a way to do this for files encrypted using [Parquet modular encryption](https://github.com/apache/parquet-format/blob/master/Encryption.md). The Arrow Python documentation shows some [examples](https://arrow.apache.org/docs/python/parquet.html#parquet-modular-encryption-columnar-encryption), but there are no equivalent examples for Java.
   
   ### Component(s)
   
   Java

