timsaucer commented on issue #1394:
URL: https://github.com/apache/datafusion-python/issues/1394#issuecomment-4309073946

   After installing the skill that landed in #1497, this correctly generated
   21 of the 22 TPC-H queries in a brand-new environment, using only the text
   descriptions. The one that failed, Q11, failed with:
   
   - Q11: TypeError: unsupported operand type(s) for *: 'decimal.Decimal' and
     'float'. collect_column("total_value")[0].as_py() returned a
     decimal.Decimal (because ps_supplycost is a decimal column), and
     multiplying it by a Python float (0.0001) raises. The AGENTS.md guide
     covers date-type surprises from as_py() but not decimal-type ones.
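   The Q11 failure can be reproduced with the standard library alone. This is a
   minimal sketch: the Decimal value below is a made-up stand-in for the
   collected total_value scalar, not the actual Q11 result.

```python
from decimal import Decimal

# Made-up stand-in for collect_column("total_value")[0].as_py() on a
# decimal column; the real Q11 total is a different Decimal.
total = Decimal("7664.20")

error_message = None
try:
    threshold = total * 0.0001  # Decimal * float has no implicit conversion
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```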
   
   This is something to resolve, probably when we work on the TPC-H queries
   in PR3.
   
   This was the recommended addition to the skill from my agent:
   
     ▎ Decimal columns: PyArrow surfaces decimal128/decimal256 columns as
     ▎ decimal.Decimal from to_pydict() / to_pylist() / .as_py(). Decimal does
     ▎ not interoperate with Python float: decimal.Decimal("100.0") * 0.0001
     ▎ raises TypeError. Options:
     ▎ - Keep the arithmetic inside DataFusion (e.g., a follow-up .select() with
     ▎   col("x") * lit(0.0001)) and never collect the scalar to Python.
     ▎ - Convert with float(scalar) before mixing with floats, or use
     ▎   Decimal(str(0.0001)) to stay in decimal.
     ▎
     ▎ from decimal import Decimal
     ▎ from datafusion import col, lit
     ▎
     ▎ # WRONG: Decimal * float
     ▎ total = df.collect_column("total_value")[0].as_py()   # decimal.Decimal
     ▎ threshold = total * 0.0001                             # TypeError
     ▎
     ▎ # CORRECT: cast to float, or stay in decimal
     ▎ threshold = float(total) * 0.0001
     ▎ threshold = total * Decimal(str(0.0001))
     ▎ # or, better, keep the arithmetic in DataFusion:
     ▎ threshold_df = df.select((col("total_value") * lit(0.0001)).alias("threshold"))
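   If the value has to come back to Python, one way to follow the
   stay-in-decimal option is a small wrapper around the multiplication. The
   helper below, scale_collected, is hypothetical (it is not part of DataFusion
   or PyArrow); it is a sketch assuming the scalar came from .as_py() and is
   either a Decimal or a float.

```python
from decimal import Decimal


def scale_collected(value, fraction):
    """Multiply a scalar returned by .as_py() by a Python float fraction.

    Hypothetical helper: decimal128/decimal256 columns come back as
    decimal.Decimal, so the float fraction is converted through str()
    (e.g. 0.0001 -> Decimal("0.0001")) to keep the arithmetic in decimal.
    Any other numeric value uses ordinary float arithmetic.
    """
    if isinstance(value, Decimal):
        return value * Decimal(str(fraction))
    return value * fraction


# Decimal input stays Decimal; float input stays float.
print(scale_collected(Decimal("7664.20"), 0.0001))
print(scale_collected(7664.2, 0.0001))
```

   Going through str() matters: Decimal(0.0001) captures the exact binary value
   of the float (a long tail of digits), whereas Decimal(str(0.0001)) is
   Decimal("0.0001"). Casting with float(total) is simpler, but it rounds the
   total to double precision, which can drop digits for very large decimal
   values.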

