jedrek-VL opened a new issue, #6713: URL: https://github.com/apache/iceberg/issues/6713
### Apache Iceberg version

1.1.0 (latest release)

### Query engine

Other

### Please describe the bug 🐞

I start the spark/iceberg docker containers (as explained [here](https://tabular.io/blog/rest-catalog-docker/)) and I use the Getting Started notebook to create and populate table `nyc.taxis`. Then, I use PyIceberg to access the data. I can get the list of tables and get some basic info about the table, but when I query it I get the following error:

```
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/xyz/data/data-research/apache-iceberg/src/micro.py", line 16, in <module>
    results = [task.file.file_path for task in scan.plan_files()]
  File "/Users/xyz/data/data-research/apache-iceberg/src/micro.py", line 16, in <listcomp>
    results = [task.file.file_path for task in scan.plan_files()]
  File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/table/__init__.py", line 320, in plan_files
    for manifest_file in snapshot.manifests(io)
  File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/table/snapshots.py", line 116, in manifests
    return list(read_manifest_list(file))
  File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/manifest.py", line 153, in read_manifest_list
    with AvroFile(input_file) as reader:
  File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/avro/file.py", line 133, in __enter__
    self.input_stream = BufferedReader(self.input_file.open())
  File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/io/pyarrow.py", line 153, in open
    input_file = self._filesystem.open_input_file(self._path)
  File "pyarrow/_fs.pyx", line 770, in pyarrow._fs.FileSystem.open_input_file
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: When reading information for key 'wh/nyc/taxis/metadata/snap-6907359110454980554-1-72c446bf-de84-4800-aa24-73b2dd64a259.avro' in bucket 'warehouse': AWS Error UNKNOWN (HTTP status 301) during HeadObject operation: No response body.
```

I would be very grateful if someone could take a look and give me some hints :)

Info about my system:

```
➜  ~ docker images
REPOSITORY                TAG       IMAGE ID       CREATED        SIZE
minio/mc                  latest    8e2b3ca6225f   9 hours ago    139MB
minio/minio               latest    107801c34719   9 hours ago    246MB
tabulario/spark-iceberg   latest    731f180d545e   7 days ago     3.8GB
tabulario/iceberg-rest    0.2.0     d33a2980abc4   13 days ago    442MB
```

```
(venv) ➜  apache-iceberg git:(apache-iceberg) ✗ pip freeze
appnope==0.1.3
asttokens==2.2.1
backcall==0.2.0
certifi==2022.12.7
cfgv==3.3.1
charset-normalizer==2.1.1
click==8.1.3
commonmark==0.9.1
decorator==5.1.1
distlib==0.3.6
executing==1.2.0
filelock==3.9.0
fsspec==2022.10.0
identify==2.5.16
idna==3.4
ipython==8.9.0
jedi==0.18.2
matplotlib-inline==0.1.6
mmhash3==3.0.1
nodeenv==1.7.0
numpy==1.24.1
parso==0.8.3
pexpect==4.8.0
pickleshare==0.7.5
platformdirs==2.6.2
pre-commit==3.0.2
prompt-toolkit==3.0.36
ptyprocess==0.7.0
pure-eval==0.2.2
py4j==0.10.9.5
pyarrow==11.0.0
pydantic==1.10.2
Pygments==2.14.0
pyiceberg==0.2.1
pyspark==3.3.1
PyYAML==6.0
requests==2.28.1
rich==12.6.0
six==1.16.0
stack-data==0.6.2
traitlets==5.8.1
typing_extensions==4.4.0
urllib3==1.26.14
virtualenv==20.17.1
wcwidth==0.2.6
zstandard==0.19.0
```

Code I'm running:

```
import os
from pyiceberg.catalog import load_catalog

os.environ["AWS_ACCESS_KEY_ID"] = "admin"
os.environ["AWS_SECRET_ACCESS_KEY"] = "password"
os.environ["AWS_REGION"] = "us-east-1"

catalog = load_catalog("demo_catalog", uri="http://localhost:8181")
table = catalog.load_table("nyc.taxis")

print(table.identifier)
print(table.location())
print(table.schema())

scan = table.scan(selected_fields=("trip_distance", ))
results = [task.file.file_path for task in scan.plan_files()]
```

Full output:

```
(venv) ➜  apache-iceberg git:(apache-iceberg) ✗ python -m src.micro
('demo_catalog', 'nyc', 'taxis')
s3a://warehouse/wh/nyc/taxis
table {
  1: VendorID: optional long
  2: tpep_pickup_datetime: optional timestamptz
  3: tpep_dropoff_datetime: optional timestamptz
  4: passenger_count: optional double
  5: trip_distance: optional double
  6: RatecodeID: optional double
  7: store_and_fwd_flag: optional string
  8: PULocationID: optional long
  9: DOLocationID: optional long
  10: payment_type: optional long
  11: fare_amount: optional double
  12: extra: optional double
  13: mta_tax: optional double
  14: tip_amount: optional double
  15: tolls_amount: optional double
  16: improvement_surcharge: optional double
  17: total_amount: optional double
  18: congestion_surcharge: optional double
  19: airport_fee: optional double
}
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/Users/xyz/data/data-research/apache-iceberg/src/micro.py", line 16, in <module>
    results = [task.file.file_path for task in scan.plan_files()]
  File "/Users/xyz/data/data-research/apache-iceberg/src/micro.py", line 16, in <listcomp>
    results = [task.file.file_path for task in scan.plan_files()]
  File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/table/__init__.py", line 320, in plan_files
    for manifest_file in snapshot.manifests(io)
  File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/table/snapshots.py", line 116, in manifests
    return list(read_manifest_list(file))
  File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/manifest.py", line 153, in read_manifest_list
    with AvroFile(input_file) as reader:
  File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/avro/file.py", line 133, in __enter__
    self.input_stream = BufferedReader(self.input_file.open())
  File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/io/pyarrow.py", line 153, in open
    input_file = self._filesystem.open_input_file(self._path)
  File "pyarrow/_fs.pyx", line 770, in pyarrow._fs.FileSystem.open_input_file
  File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
OSError: When reading information for key 'wh/nyc/taxis/metadata/snap-1143379398124310344-1-b8503783-03fc-4eed-9290-110e65ddf9a1.avro' in bucket 'warehouse': AWS Error UNKNOWN (HTTP status 301) during HeadObject operation: No response body.
```

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org