vbekiaris opened a new issue, #437:
URL: https://github.com/apache/iceberg-go/issues/437

   ### Apache Iceberg version
   
   main (development)
   
   ### Please describe the bug 🐞
   
   `Scan.PlanFiles` takes a `context.Context` argument, which creates the 
expectation that this context is used internally when downloading objects from 
S3. However, this is not the case: the context is not propagated through the 
`IO` abstraction, and the implementation instead uses a stored context (the one 
passed earlier to `Catalog.LoadTable`).
   
   This breaks the case where `Table`s are cached across requests (to avoid 
hitting the catalog and downloading/parsing table metadata on each request): 
passing a context with a timeout to `LoadTable` results in "context canceled" 
errors in subsequent requests.
   
   The reproducer below calls `LoadTable` from a separate function that cancels 
its timeout context on return for resource cleanup (as is recommended 
practice). `PlanFiles` then fails with "context canceled".
   
   ```
   import (
        "context"
        "testing"
        "time"

        "github.com/apache/iceberg-go/catalog"
        "github.com/apache/iceberg-go/table"
        "github.com/stretchr/testify/require"
   )

   // GetCatalog is the reporter's own helper and is not shown here.
   func TestLoadTableWithTimeout(t *testing.T) {
        ctx := context.Background()
        cat, err := GetCatalog(ctx)
        require.NoError(t, err)
   
        tbl, err := loadTableWithTimeout(ctx, cat, "db.test_table")
        require.NoError(t, err)
        // The following fails because PlanFiles does not really use the
        // passed context.
        // error:  Received unexpected error:
        // could not open manifest file: operation error S3: GetObject,
        // context canceled
        _, err = tbl.Scan().PlanFiles(ctx)
        require.NoError(t, err)
   }
   
   // loadTableWithTimeout loads the table under a timeout context and
   // cancels that context on return.
   func loadTableWithTimeout(
        ctx context.Context, cat catalog.Catalog, tblName string,
   ) (*table.Table, error) {
        ctxWithTimeout, cancelFn := context.WithTimeout(ctx, 1*time.Minute)
        defer cancelFn()
        return cat.LoadTable(ctxWithTimeout, catalog.ToIdentifier(tblName), nil)
   }
   ```
   
   We use the Glue catalog, but any implementation that uses `io.LoadFS` 
ultimately stores the context in the `IO` implementation.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


