Steve Stagg created AVRO-3834:
---------------------------------
Summary: [Python] Incorrect decimal encoding/decoding
Key: AVRO-3834
URL: https://issues.apache.org/jira/browse/AVRO-3834
Project: Apache Avro
Issue Type: Bug
Components: logical types, python
Affects Versions: 1.11.2
Environment: Python 3.10.3, Avro 1.11.2
Reporter: Steve Stagg
When encoding `decimal.Decimal` values using the Python avro library, the
exponent of the value is largely ignored.
As a result, incorrect two's-complement values are calculated, and incorrect
Avro data is produced.
Here's a reasonably compact reproducer:
```python
import avro
import avro.io
import avro.schema
from decimal import Decimal
from io import BytesIO

# Every value fits the schema below (precision 8, scale 4).
TESTS = [
    '314',
    '31',
    '3',
    '3.1',
    '31.4',
    '3.14',
    '3.141',
    '3.1415',
]

if __name__ == '__main__':
    schema_text = '''{
        "type": "bytes",
        "logicalType": "decimal",
        "precision": 8,
        "scale": 4
    }'''
    print(f"AVRO VERSION: {avro.__version__}")
    schema = avro.schema.parse(schema_text)
    writer = avro.io.DatumWriter(schema)
    reader = avro.io.DatumReader(schema)
    for val in TESTS:
        buf = BytesIO()
        val = Decimal(val)
        # Round-trip the value: encode it, rewind the buffer, decode it.
        writer.write(val, avro.io.BinaryEncoder(buf))
        buf.seek(0)
        decoded_val = reader.read(avro.io.BinaryDecoder(buf))
        match = val == decoded_val
        result = 'PASS' if match else 'FAIL'
        print(f'Encoded: {val} -> {buf.getvalue()} -> {decoded_val} {result}')
```
Which outputs:
```
AVRO VERSION: 1.11.2
Encoded: 314 -> b'\x04\x01:' -> 0.0314 FAIL
Encoded: 31 -> b'\x02\x1f' -> 0.0031 FAIL
Encoded: 3 -> b'\x02\x03' -> 0.0003 FAIL
Encoded: 3.1 -> b'\x02\x1f' -> 0.0031 FAIL
Encoded: 31.4 -> b'\x04\x01:' -> 0.0314 FAIL
Encoded: 3.14 -> b'\x04\x01:' -> 0.0314 FAIL
Encoded: 3.141 -> b'\x04\x0cE' -> 0.3141 FAIL
Encoded: 3.1415 -> b'\x04z\xb7' -> 3.1415 PASS
```
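To make the output above easier to read, here is a small sketch that splits each payload into its length prefix and its unscaled integer. It relies on the Avro binary encoding of `bytes` (a zigzag varint length followed by the raw bytes) and on the decimal logical type storing a big-endian two's-complement unscaled value. The helper name `decode_decimal_payload` is mine, not part of the avro library, and it only handles single-byte length prefixes.

```python
from decimal import Decimal


def decode_decimal_payload(payload, scale):
    """Return (unscaled integer, decoded Decimal) for a short Avro bytes payload.

    Hypothetical helper for illustration; only single-byte, non-negative
    length prefixes are handled, which covers the payloads in this report.
    """
    prefix = payload[0]
    if prefix & 0x81:
        raise ValueError("only single-byte, non-negative length prefixes are handled")
    length = prefix >> 1  # zigzag decoding of a non-negative length
    body = payload[1:1 + length]
    # The body is the unscaled value as big-endian two's complement.
    unscaled = int.from_bytes(body, "big", signed=True)
    return unscaled, Decimal(unscaled).scaleb(-scale)


for payload in (b"\x04\x01:", b"\x02\x1f", b"\x04\x0cE", b"\x04z\xb7"):
    print(payload, decode_decimal_payload(payload, scale=4))
```

The payload written for 314, 31.4 and 3.14 carries the same unscaled integer, 314, which a reader using scale 4 turns into 0.0314.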
The problem is that the code here:
https://github.com/apache/avro/blob/5bd2bc7a492a611382cddc5db3b5bf0b1b7d2b83/lang/py/avro/io.py#L468
does not use `exp` to shift the digits; `exp` is only checked to ensure it's not
greater than scale for validation purposes.
If you look at the output, the avro bytes produced for '31.4' and '3.14' are
identical, because the exp is ignored.
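For comparison, a minimal sketch of what I would expect the encoder to compute: shift the value by the schema's scale before taking the two's-complement bytes. This is not the library's code and not a proposed patch; `unscaled_int` and `to_twos_complement` are hypothetical names, and rejecting values with more fractional digits than the scale (rather than rounding them) is my assumption.

```python
from decimal import Decimal


def unscaled_int(value, scale):
    """Return value * 10**scale as an int, refusing to drop digits."""
    shifted = value.scaleb(scale)  # moves the decimal point right by `scale`
    if shifted != shifted.to_integral_value():
        raise ValueError(f"{value} has more than {scale} fractional digits")
    return int(shifted)


def to_twos_complement(n):
    """Encode n as minimal-length big-endian two's-complement bytes."""
    magnitude = n if n >= 0 else ~n
    length = magnitude.bit_length() // 8 + 1  # leaves room for the sign bit
    return n.to_bytes(length, "big", signed=True)


for text in ("314", "31.4", "3.14", "3.1415"):
    n = unscaled_int(Decimal(text), scale=4)
    print(text, n, to_twos_complement(n))
```

With the shift applied, '31.4' and '3.14' map to different unscaled integers (314000 and 31400), so their encoded bytes no longer collide.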