[issue17343] Add a version of str.split which returns an iterator

2021-02-26 Thread Paweł Miech

Paweł Miech  added the comment:

Making str.split return an iterator sounds like an interesting task. I found this 
issue because we recently discussed in our project that str.split returns a list, 
which can increase the memory footprint of some tasks when there is a large 
response to parse. 

Here is a small script, created by my friend Juancarlo Anez, with an iterator 
version of str.split. Compared with the default str.split it uses much less 
memory when run under the memory-profiler tool: 
https://pypi.org/project/memory-profiler/

It produces the following output (the first line, 329, is printed by 
print(len(long_string))):
329
Filename: main.py

Line #    Mem usage    Increment  Occurences   Line Contents

    24   39.020 MiB   39.020 MiB           1   @profile
    25                                         def generate_string():
    26   39.020 MiB    0.000 MiB           1       n = 10
    27   49.648 MiB    4.281 MiB          13       long_string = " ".join([uuid.uuid4().hex.upper() for _ in range(n)])
    28   43.301 MiB   -6.348 MiB           1       print(len(long_string))
    29
    30   43.301 MiB    0.000 MiB           1       z = isplit(long_string)
    31   43.301 MiB    0.000 MiB          11       for line in z:
    32   43.301 MiB    0.000 MiB          10           continue
    33
    34   52.281 MiB    0.297 MiB          11       for line in long_string.split():
    35   52.281 MiB    0.000 MiB          10           continue


You can see that the default str.split uses much more memory: usage rises to 
52.281 MiB on the str.split loop, compared with 43.301 MiB on the isplit loop.
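The attached main.py is not reproduced in this message, so the isplit below is a hypothetical sketch, not Juancarlo Anez's implementation. It is a generator that yields the same pieces as s.split(sep) one at a time, so the full list is never materialized; each yielded piece is still a new string. It assumes no maxsplit support, and being a pure-Python character loop it will be much slower than the C implementation of str.split.

```python
from typing import Iterator, Optional


def isplit(s: str, sep: Optional[str] = None) -> Iterator[str]:
    """Lazily yield the pieces that s.split(sep) would return (no maxsplit)."""
    if sep is None:
        # Whitespace mode: runs of whitespace separate fields, and
        # leading/trailing whitespace produces no empty strings.
        start = None
        for i, ch in enumerate(s):
            if ch.isspace():
                if start is not None:
                    yield s[start:i]
                    start = None
            elif start is None:
                start = i
        if start is not None:
            yield s[start:]
        return
    if not sep:
        # Mirrors str.split(""), which also raises ValueError.
        raise ValueError("empty separator")
    # Explicit separator: consecutive separators produce empty strings.
    start = 0
    while True:
        idx = s.find(sep, start)
        if idx == -1:
            yield s[start:]
            return
        yield s[start:idx]
        start = idx + len(sep)
```

Usage mirrors the profiled loop: for line in isplit(long_string): ... Because it is a generator, the empty-separator ValueError is raised on the first next() call rather than when isplit() is called.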

--
nosy: +Paweł Miech
Added file: https://bugs.python.org/file49837/main.py

___
Python tracker 
<https://bugs.python.org/issue17343>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue41238] Python 3 shelve.DbfilenameShelf is generating 164 times larger files than Python 2.7 when storing dicts

2020-07-08 Thread Paweł Miech

New submission from Paweł Miech :

I'm porting some code from Python 2.7 to Python 3.8. Some of the code uses 
shelve.DbfilenameShelf to store nested dictionaries containing sets. I found 
that, compared with Python 2.7, Python 3.8 shelve generates files that are 
approximately 164 times larger on disk. The Python 3.8 file is 2 027 520 bytes, 
while the Python 2.7 file is 12 288 bytes.

Code sample:
Filename: test_anydbm.py

#!/usr/bin/env python
import datetime
import shelve
import sys
import time
from os import path


def main():
    print(sys.version)
    fname = 'shelf_test_{}'.format(datetime.datetime.now().isoformat())
    bucket = shelve.DbfilenameShelf(fname, "n")
    now = time.time()
    limit = 1000
    key = 'some key > some key > other'
    top_dict = {}
    to_store = {
        1: {
            'page_item_numbers': set(),
            'products_on_page': None
        }
    }
    # Grow the set one item at a time and rewrite the whole value each time.
    for i in range(limit):
        to_store[1]['page_item_numbers'].add(i)
        top_dict[key] = to_store
        bucket[key] = top_dict
    end = time.time()
    db_file = False
    # Some dbm backends append a ".db" suffix to the file name.
    try:
        fsize = path.getsize(fname)
    except Exception as e:
        print("file not found? {}".format(e))
        try:
            fsize = path.getsize(fname + '.db')
            db_file = True
        except Exception as e:
            print("file not found? {}".format(e))
            fsize = None
    print("Stored {} in {} filesize {}".format(limit, end - now, fsize))
    print(fname)
    bucket.close()
    bucket = shelve.DbfilenameShelf(fname, flag="r")
    if db_file:
        fname += '.db'
    print("In file {} {}".format(fname, len(list(bucket.items()))))
    bucket.close()


if __name__ == '__main__':
    main()

Output of running it in a Docker image:

Dockerfile:
FROM python:2-jessie
VOLUME /scripts
CMD scripts/test_anydbm.py

2.7.16 (default, Jul 10 2019, 03:39:20) 
[GCC 4.9.2]
Stored 1000 in 0.0814290046692 filesize 12288
shelf_test_2020-07-08T07:26:23.778769
In file shelf_test_2020-07-08T07:26:23.778769 1


So you can see the file size: 12 288 bytes.

And now running the same thing in Python 3:

Dockerfile:

FROM python:3.8-slim-buster
VOLUME /scripts
CMD scripts/test_anydbm.py

3.8.3 (default, Jun  9 2020, 17:49:41) 
[GCC 8.3.0]
Stored 1000 in 0.02681446075439453 filesize 2027520
shelf_test_2020-07-08T07:27:18.068638
In file shelf_test_2020-07-08T07:27:18.068638 1

Notice the file size: 2 027 520 bytes.

Why is this happening? Is this a bug? If I'd like to fix it, do you have any 
ideas about what causes this?
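One thing worth ruling out is whether the two runs even used the same storage backend. shelve.DbfilenameShelf stores data through dbm.open, which in Python 3.8 picks the first available of dbm.gnu, dbm.ndbm and dbm.dumb, while Python 2.7's anydbm tries dbhash first, then gdbm, dbm and dumbdbm; the two Docker images may therefore write different file formats. The sketch below is a diagnostic under that assumption, not a fix: the helper names shelf_backend_and_size and demo are hypothetical, and dbm.whichdb is the standard-library call that reports which backend wrote a file.

```python
import dbm
import os
import shelve
import tempfile


def shelf_backend_and_size(fname):
    """Return (backend module name, total bytes on disk) for an existing shelf."""
    # e.g. 'dbm.gnu', 'dbm.ndbm' or 'dbm.dumb'; '' if unrecognized, None if unreadable
    backend = dbm.whichdb(fname)
    # Some backends add suffixes (.db, .dat/.dir/.bak), so sum every file
    # whose name is fname itself or fname plus a dotted suffix.
    directory = os.path.dirname(fname) or "."
    base = os.path.basename(fname)
    total = sum(
        os.path.getsize(os.path.join(directory, f))
        for f in os.listdir(directory)
        if f == base or f.startswith(base + ".")
    )
    return backend, total


def demo():
    """Write one value shaped like the test script's and report backend and size."""
    with tempfile.TemporaryDirectory() as d:
        fname = os.path.join(d, "shelf_test")
        with shelve.open(fname, "n") as shelf:
            shelf["some key > some key > other"] = {
                1: {"page_item_numbers": set(range(1000)), "products_on_page": None}
            }
        return shelf_backend_and_size(fname)


print(demo())
```

Running the same check against the files produced in each container would show whether the size difference lines up with a change of backend.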

--
components: Library (Lib)
files: test_anydbm.py
messages: 373284
nosy: Paweł Miech
priority: normal
severity: normal
status: open
title: Python 3 shelve.DbfilenameShelf is generating 164 times larger files than Python 2.7 when storing dicts
type: resource usage
versions: Python 3.8
Added file: https://bugs.python.org/file49304/test_anydbm.py

___
Python tracker 
<https://bugs.python.org/issue41238>
___



[issue41238] Python 3 shelve.DbfilenameShelf is generating 164 times larger files than Python 2.7 when storing dicts

2020-07-08 Thread Paweł Miech

Paweł Miech  added the comment:

OK, so I see this issue involves the way pickle serializes Python set objects. 
An updated script to reproduce it is attached. Apparently, sets become much 
larger when stored with Python 3 pickle.
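To look at the pickle side in isolation, one can measure how large a value shaped like the one in the test script becomes under each pickle protocol. For reference, Python 2.7's shelve defaults to protocol 0 and Python 3.8's shelve defaults to protocol 3 (Python 3.10 switched to pickle.DEFAULT_PROTOCOL), and DbfilenameShelf accepts a protocol= argument to override it. The helper name pickled_sizes is hypothetical, and no particular sizes are claimed here; run it under Python 3 to see them.

```python
import pickle


def pickled_sizes(obj):
    """Map each available pickle protocol to the size in bytes of obj pickled with it."""
    return {
        proto: len(pickle.dumps(obj, protocol=proto))
        for proto in range(pickle.HIGHEST_PROTOCOL + 1)
    }


# Same shape as the stored value: a nested dict holding a set of 1000 ints.
value = {
    "some key > some key > other": {
        1: {"page_item_numbers": set(range(1000)), "products_on_page": None}
    }
}

for proto, size in pickled_sizes(value).items():
    print("protocol", proto, "->", size, "bytes")
```

Comparing these numbers with the on-disk file size helps separate growth that comes from the pickled value itself from growth that comes from how the dbm file stores repeated writes of that value.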

--
Added file: https://bugs.python.org/file49308/test_anydbm.py

___
Python tracker 
<https://bugs.python.org/issue41238>
___