jedrek-VL opened a new issue, #6713:
URL: https://github.com/apache/iceberg/issues/6713

   ### Apache Iceberg version
   
   1.1.0 (latest release)
   
   ### Query engine
   
   Other
   
   ### Please describe the bug 🐞
   
   I started the Spark/Iceberg Docker containers (as explained 
[here](https://tabular.io/blog/rest-catalog-docker/)) and used the Getting 
Started notebook to create and populate the table `nyc.taxis`.
   
   Then I used PyIceberg to access the data. Listing the tables and reading 
basic table info works, but planning a scan on the table fails with the 
following error:
   
   ```
   Traceback (most recent call last):
     File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
       return _run_code(code, main_globals, None,
     File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
       exec(code, run_globals)
     File "/Users/xyz/data/data-research/apache-iceberg/src/micro.py", line 16, in <module>
       results = [task.file.file_path for task in scan.plan_files()]
     File "/Users/xyz/data/data-research/apache-iceberg/src/micro.py", line 16, in <listcomp>
       results = [task.file.file_path for task in scan.plan_files()]
     File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/table/__init__.py", line 320, in plan_files
       for manifest_file in snapshot.manifests(io)
     File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/table/snapshots.py", line 116, in manifests
       return list(read_manifest_list(file))
     File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/manifest.py", line 153, in read_manifest_list
       with AvroFile(input_file) as reader:
     File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/avro/file.py", line 133, in __enter__
       self.input_stream = BufferedReader(self.input_file.open())
     File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/io/pyarrow.py", line 153, in open
       input_file = self._filesystem.open_input_file(self._path)
     File "pyarrow/_fs.pyx", line 770, in pyarrow._fs.FileSystem.open_input_file
     File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
   OSError: When reading information for key 'wh/nyc/taxis/metadata/snap-6907359110454980554-1-72c446bf-de84-4800-aa24-73b2dd64a259.avro' in bucket 'warehouse': AWS Error UNKNOWN (HTTP status 301) during HeadObject operation: No response body.
   ```
   
   I would be very grateful if someone could take a look and give me some hints 
:)
   
   Info about my system:
   
   ```
   ➜  ~ docker images
   REPOSITORY                TAG       IMAGE ID       CREATED       SIZE
   minio/mc                  latest    8e2b3ca6225f   9 hours ago   139MB
   minio/minio               latest    107801c34719   9 hours ago   246MB
   tabulario/spark-iceberg   latest    731f180d545e   7 days ago    3.8GB
   tabulario/iceberg-rest    0.2.0     d33a2980abc4   13 days ago   442MB
   ```
   
   ```
   (venv) ➜  apache-iceberg git:(apache-iceberg) ✗ pip freeze
   appnope==0.1.3
   asttokens==2.2.1
   backcall==0.2.0
   certifi==2022.12.7
   cfgv==3.3.1
   charset-normalizer==2.1.1
   click==8.1.3
   commonmark==0.9.1
   decorator==5.1.1
   distlib==0.3.6
   executing==1.2.0
   filelock==3.9.0
   fsspec==2022.10.0
   identify==2.5.16
   idna==3.4
   ipython==8.9.0
   jedi==0.18.2
   matplotlib-inline==0.1.6
   mmhash3==3.0.1
   nodeenv==1.7.0
   numpy==1.24.1
   parso==0.8.3
   pexpect==4.8.0
   pickleshare==0.7.5
   platformdirs==2.6.2
   pre-commit==3.0.2
   prompt-toolkit==3.0.36
   ptyprocess==0.7.0
   pure-eval==0.2.2
   py4j==0.10.9.5
   pyarrow==11.0.0
   pydantic==1.10.2
   Pygments==2.14.0
   pyiceberg==0.2.1
   pyspark==3.3.1
   PyYAML==6.0
   requests==2.28.1
   rich==12.6.0
   six==1.16.0
   stack-data==0.6.2
   traitlets==5.8.1
   typing_extensions==4.4.0
   urllib3==1.26.14
   virtualenv==20.17.1
   wcwidth==0.2.6
   zstandard==0.19.0
   ```
   
   Code I'm running:
   ```
   import os
   
   from pyiceberg.catalog import load_catalog
   
   # Credentials and region used by the docker setup
   os.environ["AWS_ACCESS_KEY_ID"] = "admin"
   os.environ["AWS_SECRET_ACCESS_KEY"] = "password"
   os.environ["AWS_REGION"] = "us-east-1"
   
   # Connect to the REST catalog and load the table
   catalog = load_catalog("demo_catalog", uri="http://localhost:8181")
   table = catalog.load_table("nyc.taxis")
   print(table.identifier)
   print(table.location())
   print(table.schema())
   
   # Plan a scan over a single column; plan_files() is the call that fails
   scan = table.scan(selected_fields=("trip_distance", ))
   results = [task.file.file_path for task in scan.plan_files()]
   ```
   
   
   Full output:
   ```
   (venv) ➜  apache-iceberg git:(apache-iceberg) ✗ python -m src.micro
   ('demo_catalog', 'nyc', 'taxis')
   s3a://warehouse/wh/nyc/taxis
   table {
     1: VendorID: optional long
     2: tpep_pickup_datetime: optional timestamptz
     3: tpep_dropoff_datetime: optional timestamptz
     4: passenger_count: optional double
     5: trip_distance: optional double
     6: RatecodeID: optional double
     7: store_and_fwd_flag: optional string
     8: PULocationID: optional long
     9: DOLocationID: optional long
     10: payment_type: optional long
     11: fare_amount: optional double
     12: extra: optional double
     13: mta_tax: optional double
     14: tip_amount: optional double
     15: tolls_amount: optional double
     16: improvement_surcharge: optional double
     17: total_amount: optional double
     18: congestion_surcharge: optional double
     19: airport_fee: optional double
   }
   Traceback (most recent call last):
     File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
       return _run_code(code, main_globals, None,
     File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/runpy.py", line 86, in _run_code
       exec(code, run_globals)
     File "/Users/xyz/data/data-research/apache-iceberg/src/micro.py", line 16, in <module>
       results = [task.file.file_path for task in scan.plan_files()]
     File "/Users/xyz/data/data-research/apache-iceberg/src/micro.py", line 16, in <listcomp>
       results = [task.file.file_path for task in scan.plan_files()]
     File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/table/__init__.py", line 320, in plan_files
       for manifest_file in snapshot.manifests(io)
     File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/table/snapshots.py", line 116, in manifests
       return list(read_manifest_list(file))
     File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/manifest.py", line 153, in read_manifest_list
       with AvroFile(input_file) as reader:
     File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/avro/file.py", line 133, in __enter__
       self.input_stream = BufferedReader(self.input_file.open())
     File "/Users/xyz/data/data-research/venv/lib/python3.10/site-packages/pyiceberg/io/pyarrow.py", line 153, in open
       input_file = self._filesystem.open_input_file(self._path)
     File "pyarrow/_fs.pyx", line 770, in pyarrow._fs.FileSystem.open_input_file
     File "pyarrow/error.pxi", line 144, in pyarrow.lib.pyarrow_internal_check_status
     File "pyarrow/error.pxi", line 115, in pyarrow.lib.check_status
   OSError: When reading information for key 'wh/nyc/taxis/metadata/snap-1143379398124310344-1-b8503783-03fc-4eed-9290-110e65ddf9a1.avro' in bucket 'warehouse': AWS Error UNKNOWN (HTTP status 301) during HeadObject operation: No response body.
   ```
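
A possible lead, not verified here: an HTTP 301 on `HeadObject` usually means the request reached an S3 endpoint that does not host the bucket. The script above sets credentials and a region but no endpoint, so the reads may be going to AWS instead of the local MinIO container. The sketch below builds catalog properties that name an explicit endpoint. The helper names `minio_catalog_properties` and `load_demo_catalog` are hypothetical, the MinIO address `http://localhost:9000` is an assumption about the docker setup, and it assumes the installed PyIceberg version honors the `s3.endpoint`, `s3.access-key-id`, and `s3.secret-access-key` properties.

```python
from urllib.parse import urlparse


def minio_catalog_properties(rest_uri, s3_endpoint, access_key, secret_key):
    """Build catalog properties that point S3 reads at an explicit endpoint.

    Hypothetical helper; the property names are assumed to be honored by
    the installed PyIceberg version.
    """
    # Reject values that are not full http(s) URLs, e.g. "localhost:9000".
    for url in (rest_uri, s3_endpoint):
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an http(s) URL: {url!r}")
    return {
        "uri": rest_uri,
        "s3.endpoint": s3_endpoint,
        "s3.access-key-id": access_key,
        "s3.secret-access-key": secret_key,
    }


def load_demo_catalog():
    """Load the catalog with the endpoint set (needs the containers running)."""
    # Imported here so the helper above works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    props = minio_catalog_properties(
        "http://localhost:8181",  # REST catalog, as in the script above
        "http://localhost:9000",  # assumed MinIO address, not confirmed
        "admin",
        "password",
    )
    return load_catalog("demo_catalog", **props)
```

With the containers running, `load_demo_catalog().load_table("nyc.taxis")` would replace the `load_catalog` call in the script; whether this removes the 301 is untested.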


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

