RE: array and struct 64-bit Linux change in behavior Python 3.7 and 2.7

Chris Clark Tue, 03 Dec 2019 21:18:48 -0800

Thanks for all the replies (and apologies for top posting, I have a brain-dead 
email client ☹).

I think the consensus from the various threads is that the docs are either 
lacking or misleading.

I mentioned that this impacts bytes and the problem there is more telling as it 
hard fails (this is how I first discovered this was an issue):

    >>> array.array('L', b'\0\0\0\0')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: string length not a multiple of item size

I don't believe the documentation is accurate in using the word "minimum". 
"Minimum" would suggest that 'L' accepts a 4-byte value at the least, and on 
64-bit Linux it does *not*; it hard fails. If the docs were to say "the sizes 
are native integer types for the platform, the table documents some typical but 
*not* guaranteed sizes", that would be clearer.
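
To illustrate the point, here is a minimal sketch that checks itemsize before 
building an array from bytes. The helper name load_if_fits is made up for this 
example; it is not part of the array module.

```python
import array

def load_if_fits(typecode, data):
    """Return array.array(typecode, data), or None if data cannot hold whole items.

    Hypothetical helper, for illustration only.
    """
    itemsize = array.array(typecode).itemsize  # the platform's C size for this typecode
    if len(data) % itemsize != 0:
        return None  # array.array(typecode, data) would raise ValueError here
    return array.array(typecode, data)

data = b'\0\0\0\0'  # one 32-bit value
for typecode in ('I', 'L'):
    # The printed sizes depend on the platform and build.
    print(typecode, array.array(typecode).itemsize, load_if_fits(typecode, data))
```

Where 'L' is 8 bytes the second line shows None instead of an array, which is 
the same condition that produces the ValueError above.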

For struct - I think the '<' and '=' non-padding docs could benefit from some 
explanation. I'm not sure what yet 😊
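
For reference, struct.calcsize shows the difference between the native mode 
(the default, same as '@') and the standard modes ('=', '<', '>'). This is only 
a sketch of the behavior as I understand it from the struct docs; the native 
numbers vary by platform, so none are shown for them.

```python
import struct

# Native mode ('@', the default): C sizes and C alignment for this platform.
print(struct.calcsize('L'), struct.calcsize('@L'))

# Standard modes: 'L' is always 4 bytes and no padding is ever inserted.
print(struct.calcsize('=L'), struct.calcsize('<L'), struct.calcsize('>L'))

# Alignment padding appears only in native mode, between members.
print(struct.calcsize('@bL'))  # platform dependent
print(struct.calcsize('=bL'))  # 1 + 4 bytes, no padding
```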

I saw a few suggestions on alternatives for size specifications, I'm definitely 
in favor of that (right now I'm probing I and L to determine size before using 
them for real). I don't think a U prefix would work, as array really only 
accepts a single-character specifier. If array were to be updated to use 
multiple-character specifiers, I would recommend matching the struct specifier 
format (which it is close to at the moment).

For my use case I'm seriously thinking about not using array moving forward 
and only using struct. I briefly wondered about ctypes (it has nice names, e.g. 
c_int64, that are unambiguous) but then I remembered it is not available in 
Jython.
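
A struct-only approach could look roughly like the sketch below. The function 
names are made up for this example, and it assumes the file data is unsigned 
32-bit little-endian.

```python
import struct

def unpack_u32_le(data):
    """Decode bytes holding little-endian unsigned 32-bit integers into a list."""
    count, remainder = divmod(len(data), 4)
    if remainder:
        raise ValueError('length is not a multiple of 4 bytes')
    # '<' selects standard sizes, so 'L' is 4 bytes on every platform.
    return list(struct.unpack('<%dL' % count, data))

def pack_u32_le(values):
    """Encode an iterable of ints as little-endian unsigned 32-bit integers."""
    values = list(values)
    return struct.pack('<%dL' % len(values), *values)

raw = pack_u32_le([0x02030405, 1])
print(unpack_u32_le(raw))
```

This trades the compact storage of array for a plain list, which may matter 
for large files.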

With the benefit of hindsight it would have been better if array (and struct) 
used stdint.h types; those types and lengths are explicitly documented.

Regarding Barry's comment, yep size consistency with array is a pain - what I 
implemented as workaround is below (and likely to be my solution going forward):

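    import array

    # Probe for an unsigned array typecode that is 4 bytes on this platform.
    if array.array('L').itemsize == 4:
        FMT_ARRAY_4BYTE = 'L'
    elif array.array('I').itemsize == 4:
        FMT_ARRAY_4BYTE = 'I'
    else:
        raise RuntimeError('no 4-byte unsigned array typecode found')

    # With '<', struct uses standard sizes, so 'L' is 4 bytes everywhere.
    FMT_STRUCT_4BYTE = '<L'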

and then use the constants in array/struct calls where (binary) file IO is 
happening.
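
The same probing idea generalizes to other widths. The sketch below is only an 
illustration; typecode_for_size is a made-up name, not something array 
provides.

```python
import array

def typecode_for_size(size, signed=False):
    """Return an integer array typecode whose itemsize equals size (in bytes).

    Hypothetical helper, for illustration only.
    """
    candidates = 'bhilq' if signed else 'BHILQ'
    for typecode in candidates:
        try:
            if array.array(typecode).itemsize == size:
                return typecode
        except ValueError:
            # Typecode not supported by this interpreter (e.g. 'q' on Python 2.7).
            continue
    raise ValueError('no %d-byte integer typecode on this platform' % size)

code = typecode_for_size(4)
print(code, array.array(code).itemsize)
```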

Any other thoughts before I open doc bugs/PRs?

Thanks!

Chris

-----Original Message-----
From: Richard Damon <[email protected]> 
Sent: Monday, December 2, 2019 5:50 PM
To: [email protected]
Subject: Re: array and struct 64-bit Linux change in behavior Python 3.7 and 2.7

On 12/2/19 4:25 PM, Barry Scott wrote:
>
>> On 2 Dec 2019, at 17:55, Rob Gaddi <[email protected]> wrote:
>>
>> On 12/2/19 9:26 AM, Chris Clark wrote:
>>> Test case:
>>>                import array
>>>                x = array.array('L', [0]) # x.itemsize == 8 rather than 4
>>> This works fine (returns 4) under Windows Python 3.7.3 64-bit build.
>>> Under Ubuntu; Python 2.7.15rc1, 3.6.5, 3.70b3 64-bit this returns 8. 
>>> Documentation at https://docs.python.org/3/library/array.html explicitly 
>>> states 'L' is for size 4.
>>> It impacts all uses of array (e.g. reading from byte strings).
>>> The struct module is a little different:
>>> import struct
>>> x = struct.pack('L', 0)
>>> # len(x) == 8 rather than 4
>>> This can be worked around by using '=L' - which is not well documented - so 
>>> this may be a doc issue.
>>> Wanted to post here for comments before opening a bug at 
>>> https://bugs.python.org/ 
>>> Is anyone seeing this under Debian/Ubuntu?
>>> Chris
>> I'd say not a bug, at least in array.  Reading that array documentation you 
>> linked, 4 is explicitly the MINIMUM size in bytes, not the guaranteed size.
> I'm wondering how useful it is that for array you can read from a file but 
> have no idea how many bytes each item needs.
> If I have a file with int32_t in it I cannot from the docs know how to read 
> that file into an array.
>
>> The struct situation is, as you said, a bit different.  I believe that with 
>> the default native alignment @, you're seeing 4-byte data padded to an 
>> 8-byte alignment, not 8-byte data.  That does seem to go against what the 
>> struct documentation says, "Padding is only automatically added between 
>> successive structure members. No padding is added at the beginning or the 
>> end of the encoded struct."
> The 'L' in struct is documented for 3.7 to use 4 bytes, but in fact uses 8, 
> on fedora 31. Doc bug?
>
> >>> x=struct.pack('L',0x102030405)
> >>> x
> b'\x05\x04\x03\x02\x01\x00\x00\x00'
>
> Given I have exact control with b, h, i, and q but L is not fixed in size I'm 
> not sure how it can be used with certainty across OS and versions.
>
> Barry
>
Actually, you DON'T have exact control with those sizes; it just happens that 
all the platforms you are using have the same size for those types. Welcome to 
the ambiguity in the C type system: the basic types are NOT fixed in size. 'L' 
means 'long' and, as Christian said, that is 8 bytes long on 64-bit Linux. 'L' 
is exactly the right type for interfacing with a routine defined as taking a 
long. The issue is that you don't know what type an int32_t will be (it might 
be int, or it might be long, and long might not be 32 bits; it will be at 
least 32 bits).

Perhaps array could be extended so that it took '4' for a 4 byte integer and 
'8' for an 8 byte integer (maybe 'U4' and 'U8' for unsigned). Might as well 
also allow 1 and 2 for completeness for char and short (but those are currently 
consistent).

--
Richard Damon

-- 
https://mail.python.org/mailman/listinfo/python-list
