[issue46799] ShareableList memory bloat and performance improvement

2022-02-19 Thread Ting-Che Lin


New submission from Ting-Che Lin :

The current implementation of ShareableList keeps an unnecessary list of 
offsets in self._allocated_offsets. This list can have a large memory 
footprint when the list holds many items. In addition, the list is copied 
into each process that needs access to the ShareableList, which sometimes 
negates the benefit of the shared memory. Furthermore, the current 
implementation keeps different metadata in different sections of shared 
memory, so a single __getitem__ call requires multiple struct.unpack_from 
calls. I have attached a prototype that merges the allocated offsets and 
packing formats into a single section of the shared memory. This allows a 
single struct.unpack_from operation to obtain both the allocated offset and 
the packing format. By removing the self._allocated_offsets list and reducing 
the number of struct.unpack_from operations, we can drastically reduce memory 
usage and increase read performance by 10%. When the ShareableList contains 
only integers, memory usage can be cut in half. The attached implementation 
also fixes https://bugs.python.org/issue44170, which causes an error when 
reading some Unicode characters. I am happy to adapt this implementation into 
a proper bugfix/patch if it is deemed reasonable.
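
To illustrate the idea of merged metadata, here is a minimal sketch. It is 
not the attached prototype: the record layout ("<q8s", an 8-byte offset 
followed by an 8-byte NUL-padded format string), the helper names, and the 
use of plain bytearrays in place of a SharedMemory buffer are all assumptions 
made for illustration.

```python
import struct

# Hypothetical per-item metadata record: a signed 64-bit offset followed by
# an 8-byte, NUL-padded struct format string. This layout is an assumption
# for illustration only.
RECORD = struct.Struct("<q8s")


def write_record(buf, index, offset, fmt):
    """Store the offset and packing format of item `index` in one record."""
    RECORD.pack_into(buf, index * RECORD.size, offset, fmt.encode("ascii"))


def read_record(buf, index):
    """Fetch offset and packing format with a single unpack_from call."""
    offset, raw_fmt = RECORD.unpack_from(buf, index * RECORD.size)
    return offset, raw_fmt.rstrip(b"\x00").decode("ascii")


# Plain bytearrays stand in for the shared memory block here.
items = [7, 3.5, b"hello"]
formats = ["q", "d", "8s"]
meta = bytearray(RECORD.size * len(items))
data = bytearray(sum(struct.calcsize("<" + f) for f in formats))

offset = 0
for i, (item, fmt) in enumerate(zip(items, formats)):
    write_record(meta, i, offset, fmt)
    struct.pack_into("<" + fmt, data, offset, item)
    offset += struct.calcsize("<" + fmt)


def get_item(index):
    """One metadata lookup, then one read of the value itself."""
    item_offset, fmt = read_record(meta, index)
    (value,) = struct.unpack_from("<" + fmt, data, item_offset)
    return value
```

In this sketch no per-process Python list of offsets is needed, because the 
offset of every item is read from the buffer on demand.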

--
components: Library (Lib)
files: shareable_list.py
messages: 413544
nosy: davin, pitrou, tcl326
priority: normal
severity: normal
status: open
title: ShareableList memory bloat and performance improvement
type: performance
versions: Python 3.10, Python 3.11, Python 3.9
Added file: https://bugs.python.org/file50632/shareable_list.py

___
Python tracker 
<https://bugs.python.org/issue46799>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue46799] ShareableList memory bloat and performance improvement

2022-02-21 Thread Ting-Che Lin


Change by Ting-Che Lin :


--
keywords: +patch
pull_requests: +29597
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/31467




[issue46799] ShareableList memory bloat and performance improvement

2022-02-27 Thread Ting-Che Lin


Ting-Che Lin  added the comment:

So I wrote a patch for this issue and submitted a PR. While working on the 
patch, I realized that there is another issue in how the size alignment of 
strings and byte arrays is calculated, as seen here: 
https://github.com/python/cpython/blob/3.10/Lib/multiprocessing/shared_memory.py#L303


>>> from multiprocessing.shared_memory import ShareableList
>>> s_list = ShareableList(["12345678"])
>>> s_list.format
'16s'

I changed the calculation from
self._alignment * (len(item) // self._alignment + 1)
to
self._alignment * max(1, (len(item) - 1) // self._alignment + 1)

With the patch, this will give
>>> from multiprocessing.shared_memory import ShareableList
>>> s_list = ShareableList(["12345678"])
>>> s_list.format
'8s'
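
The difference between the two formulas can be checked in isolation. The 
sketch below is a standalone comparison, not code from the patch; the 
function names are made up for illustration, and the alignment of 8 is 
assumed from the '16s' and '8s' results above.

```python
ALIGNMENT = 8  # assumed value, consistent with the '16s' / '8s' results above


def padded_size_current(length, alignment=ALIGNMENT):
    # Formula in the 3.10 source: always adds a full extra block, even when
    # the length is already a multiple of the alignment.
    return alignment * (length // alignment + 1)


def padded_size_patched(length, alignment=ALIGNMENT):
    # Patched formula: rounds up to the next multiple of the alignment,
    # with a minimum of one block so empty items still get space.
    return alignment * max(1, (length - 1) // alignment + 1)


for n in (0, 1, 7, 8, 9, 16):
    print(n, padded_size_current(n), padded_size_patched(n))
```

The two formulas agree except when the length is a nonzero multiple of the 
alignment, where the current one allocates one block more than necessary.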

--




[issue46799] ShareableList memory bloat and performance improvement

2022-03-27 Thread Ting-Che Lin


Ting-Che Lin  added the comment:

A gentle ping to the multiprocessing lib maintainers. Is there anything else I 
can do to move this forward?

--
resolution:  -> remind
