Re: Out of memory while reading excel file
I wrote this:

a = np.zeros((p.max_row, p.max_column), dtype=object)
for y, row in enumerate(p.rows):
    for cell in row:
        print (cell.value)
        a[y] = cell.value
        print (a[y])

For one of the cells, I see

NM_198576.3
['NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3']

These are 50 NM_198576.3 in a[y], and 50 is the number of columns in my
excel file (p.max_column).

The excel file looks like

CHR1 11,202,100 NM_198576.3 PASS 3.08932G|B|C -
. . .

Note that in each row, some cells are '-' or '.' only. I want to read all
cells as strings. Then I will write the matrix to a file and my main code
(java) will process that. I chose openpyxl for reading excel files because
Apache POI (a java package for manipulating excel files) consumes huge
memory even for medium files.

So my python script only transforms an xlsx file to a txt file, keeping the
cell positions and formats.

Any suggestion?

Regards,
Mahmood
--
https://mail.python.org/mailman/listinfo/python-list
Re: Out of memory while reading excel file
Mahmood Naderan via Python-list wrote:

> I wrote this:
>
> a = np.zeros((p.max_row, p.max_column), dtype=object)
> for y, row in enumerate(p.rows):
>     for cell in row:
>         print (cell.value)
>         a[y] = cell.value

In the line above you overwrite the row in the numpy array with the cell
value. In combination with numpy's "broadcasting" you end up with all values
in a row set to the rightmost cell in the spreadsheet row, just like in

>>> import numpy
>>> a = numpy.array([[0, 0, 0]])
>>> a
array([[0, 0, 0]])
>>> for x in 1, 2, 3:
...     a[0] = x
...
>>> a
array([[3, 3, 3]])

The correct code:

for y, row in enumerate(ws.rows):
    a[y] = [cell.value for cell in row]

I think I posted it before ;)

>         print (a[y])
>
>
> For one of the cells, I see
>
> NM_198576.3
> ['NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3']
>
>
> These are 50 NM_198576.3 in a[y] and 50 is the number of columns in my
> excel file (p.max_column)
>
>
>
> The excel file looks like
>
> CHR1 11,202,100 NM_198576.3 PASS 3.08932G|B|C -
> . . .
>
>
>
> Note that in each row, some cells are '-' or '.' only. I want to read all
> cells as string. Then I will write the matrix in a file and my main code
> (java) will process that. I chose openpyxl for reading excel files,
> because Apache POI (a java package for manipulating excel files) consumes
> huge memory even for medium files.
>
> So my python script only transforms an xlsx file to a txt file keeping the
> cell positions and formats.

What kind of text file?

> Any suggestion?

In that case there's no need to load the data into memory. For example, to
convert xlsx to csv:

#!/usr/bin/env python3
from openpyxl import load_workbook
import csv

source = "beta.xlsx"
dest = "gamma.csv"
sheet = 'alpha'

wb = load_workbook(filename=source, read_only=True)
ws = wb[sheet]

with open(dest, "w") as outstream:
    csv.writer(outstream).writerows(
        [cell.value for cell in row]
        for row in ws.rows
    )
Re: Why am I getting a 'sqlite3.OperationalError'?
The ? is indeed for variable substitution, but AFAIK only for field values, not for table names, which is why your first example doesn't work and your second and third examples do work.
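To illustrate the point about `?` placeholders, here is a minimal sketch using an in-memory database. The table name `users`, the column `name`, and the helper `select_all` are made up for illustration and are not taken from the original question; the allow-list approach is just one possible way to handle a variable table name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# A placeholder is fine for a value.
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# A placeholder is not accepted where an identifier (table name) is expected.
error = None
try:
    conn.execute("SELECT name FROM ?", ("users",))
except sqlite3.OperationalError as exc:
    error = str(exc)

# Hypothetical workaround: check the name against a known set, then put it
# into the SQL text yourself. Values still go through placeholders.
ALLOWED_TABLES = {"users"}


def select_all(connection, table):
    if table not in ALLOWED_TABLES:
        raise ValueError("unknown table: %r" % (table,))
    return connection.execute('SELECT name FROM "%s"' % table).fetchall()


rows = select_all(conn, "users")
print(error)
print(rows)
```

Checking the table name against a fixed set before formatting it into the statement avoids building SQL from arbitrary input.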
Re: Out of memory while reading excel file
Thanks. That code is so simple and works. However, there are things to be
considered. With the CSV format, cells in a row are separated by ',' and for
some cells it writes "" around the cell content.

So, if the excel looks like

CHR1 11,232,445

The output file looks like

CHR1,"11,232,445"

Is it possible to use a space as the delimiting character and omit the ""?
I ask because my java code, which has to read the output file, would
otherwise have to do some extra work (using space as the delimiter is the
default and much easier to work with).

I want

a[0][0] = CHR1
a[0][1] = 11,232,445

And both are strings. Is that possible?

Regards,
Mahmood
Re: Out of memory while reading excel file
Excuse me, I changed
csv.writer(outstream)
to
csv.writer(outstream, delimiter =' ')
It puts a space between cells and omits the "" around some content. However,
between two lines there is a new empty line. In other words, the first line is
the first row of the excel file. The second line is empty ("\n") and the third
line is the second row of the excel file.
Any thoughts?
Regards,
Mahmood
Re: Repeatedly crawl website every 1 min
Unless you are authorized, don't do it. It literally costs a lot of money to
the website you are crawling, in CPU and bandwidth. Hundreds of concurrent
requests can even kill a small server (with bad configuration).

Look at the scrapy package; it is great for scraping, but be friendly with
the websites you are crawling.

On 10 May 2017 23:22, wrote:

> Hi Everyone,
>
> Thanks for stopping by. I am working on a feature to crawl website content
> every 1 min. I am curious to know if there is any good open source project
> for this specific scenario.
>
> Specifically, I have many urls, and I want to maintain a thread pool so
> that each thread will repeatedly crawl content from the given url. It could
> be hundreds of threads at the same time.
>
> Your help is greatly appreciated.
>
> ;)
Re: Out of memory while reading excel file
Mahmood Naderan via Python-list wrote:
> Excuse me, I changed
>
> csv.writer(outstream)
>
> to
>
> csv.writer(outstream, delimiter =' ')
>
>
> It puts space between cells and omits "" around some content.
If your data doesn't contain any spaces that's fine. Otherwise you need a
way to distinguish between space as a delimiter and space inside a field,
e.g. by escaping it:
>>> w = csv.writer(sys.stdout, delimiter=" ", quoting=csv.QUOTE_NONE,
...                escapechar="\\")
>>> w.writerow(["a", "b c"])
a b\ c
8
> However,
> between two lines there is a new empty line. In other word, the first line
> is the first row of excel file. The second line is empty ("\n") and the
> third line is the second row of the excel file.
>
> Any thought?
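The csv writer already ends every row with "\r\n". In text mode on Windows
the "\n" is then translated to b"\r\n" once more in the file, so each row
ends with b"\r\r\n", which shows up as an empty line. Python allows you to
override that translation: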
>>> help(open)
Help on built-in function open in module io:

open(...)
    open(file, mode='r', buffering=-1, encoding=None,
         errors=None, newline=None, closefd=True, opener=None) -> file
    object

    newline controls how universal newlines works (it only applies to text
    mode). It can be None, '', '\n', '\r', and '\r\n'. It works as
    follows:

    * On output, if newline is None, any '\n' characters written are
      translated to the system default line separator, os.linesep. If
      newline is '' or '\n', no translation takes place. If newline is any
      of the other legal values, any '\n' characters written are translated
      to the given string.
So you need to specify newline:
with open(dest, "w", newline="") as outstream:
    ...
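Putting the two suggestions from this message together (space delimiter with escaping, and newline=""), a minimal sketch might look like the following. It uses plain lists instead of an openpyxl worksheet so that it runs without the spreadsheet; the helper name `write_space_delimited`, the sample rows and the temporary file are assumptions for illustration only.

```python
import csv
import os
import tempfile


def write_space_delimited(rows, dest):
    """Write each row as one line, fields separated by a single space."""
    # newline="" keeps the text layer from translating the "\n" inside the
    # csv writer's own "\r\n" row terminator a second time.
    with open(dest, "w", newline="") as outstream:
        writer = csv.writer(
            outstream,
            delimiter=" ",
            quoting=csv.QUOTE_NONE,  # never wrap fields in quotes
            escapechar="\\",         # a space inside a field becomes "\ "
        )
        writer.writerows(rows)


# Sample data standing in for the spreadsheet rows.
rows = [["CHR1", "11,232,445"], ["CHR2", "b c"]]
path = os.path.join(tempfile.mkdtemp(), "out.txt")
write_space_delimited(rows, path)

with open(path, "rb") as f:
    data = f.read()
print(data)
```

With a worksheet, `rows` would be replaced by the generator from the earlier script, `([cell.value for cell in row] for row in ws.rows)`. The reader on the Java side then has to treat a backslash followed by a space as a literal space.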
Re: Repeatedly crawl website every 1 min
On Thu, 11 May 2017 12:18 pm, [email protected] wrote:

> Hi Everyone,
>
> Thanks for stopping by. I am working on a feature to crawl website content
> every 1 min. I am curious to know if there is any good open source project
> for this specific scenario.

I agree with Iuri: crawling a website every minute is abuse. Unless it is
your own website, once a month is more appropriate -- and even then, you
should be very careful to restrict the rate at which you make requests.

--
Steve
Emoji: a small, fuzzy, indistinct picture used to replace a clear and
perfectly comprehensible word.
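As a rough illustration of "restrict the rate at which you make requests", here is a small sketch of a per-host throttle. The class name `HostThrottle`, its parameters, and the example URLs are all hypothetical; it is single-threaded as written (a thread pool would need a lock around `wait`), and it does not replace checking robots.txt or getting permission.

```python
import time
from urllib.parse import urlparse


class HostThrottle:
    """Enforce a minimum delay between requests to the same host."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock    # injectable so the logic can be tested
        self._sleep = sleep
        self._next_free = {}   # host -> earliest time of the next request

    def wait(self, url):
        """Sleep until the host of `url` may be contacted; return the delay."""
        host = urlparse(url).netloc
        now = self._clock()
        delay = max(0.0, self._next_free.get(host, now) - now)
        if delay:
            self._sleep(delay)
        self._next_free[host] = now + delay + self.min_interval
        return delay


# Demonstration with a frozen clock and a recording sleep function, so
# nothing actually blocks here.
slept = []
throttle = HostThrottle(60.0, clock=lambda: 100.0, sleep=slept.append)
first = throttle.wait("http://example.com/a")
second = throttle.wait("http://example.com/b")
other = throttle.wait("http://example.org/")
print(first, second, other, slept)
```

In real use the defaults (`time.monotonic` and `time.sleep`) apply, and `throttle.wait(url)` would be called immediately before each request.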
Re: Ten awesome things you are missing out on if you're still using Python 2
On 09/05/17 03:01, Rustom Mody wrote:
> On Monday, May 8, 2017 at 12:48:03 PM UTC+5:30, Steven D'Aprano wrote:
>> http://www.asmeurer.com/python3-presentation/slides.html#1
>
> Nice list thanks!
>
> Do you have a similar list of 10 awesome features of Python that you can't
> use because you refuse to upgrade from Java/C++ ?

Why the upgrade? I use the three languages every day. Each of them has its
own unique strength; just use the right tool for the right job.

> [Context: I've to take a couple of classes for such senior developers and
> wondering what features would give them the most value]

--
Cholo Lennon
Bs.As.
ARG
Re: Out of memory while reading excel file
Thanks a lot for the suggestions. It is now solved.

Regards,
Mahmood
Embedded Python import fails with zip/egg files (v3.6.1)
Hello,

I am having trouble importing python modules on certain machines. On some
machines import works, on some not (all machines are Win7 64bit). Python is
not installed on any of these machines but used embedded. I tried to analyze
the problem but did not succeed, so here is what I found.

First I will use the module xlsxwriter to explain the problem, but it also
happens with python36.zip (when importing for example codecs).

I have a xlsxwriter.egg file which is found by the import mechanism but it
cannot be opened.

Traceback (most recent call last):
  File "Z:\Documents\myscript.py", line 1, in <module>
    import glob, inspect, os, json, base64, xlsxwriter, datetime, string
ModuleNotFoundError: No module named 'xlsxwriter'

When I unzip the egg and create two folders, for code and egg-info, it works,
the module is imported. Again, the very same egg file works fine on other
machines. Tested on win7 with or without python installed, and freshly set up
win7 systems with nothing else installed.

I have the same problem with python36.zip that comes with the embedded
package. When starting python.exe (from
https://www.python.org/ftp/python/3.6.1/python-3.6.1-embed-win32.zip) the
codecs module cannot be imported and python.exe crashes. All paths are
correctly set. When I unzip the python36.zip into the python.exe folder
everything works fine.

What I found interesting is that the disk monitor tool (Procmon.exe) shows
the following detail:

07:59:04,3187854 python.exe 4224 CreateFile
    C:\Users\hansi\Downloads\python-emb\python36.zip SUCCESS
    Desired Access: Read Attributes, Synchronize, Disposition: Open,
    Options: Synchronous IO Non-Alert, Attributes: n/a, ShareMode: Read,
    Write, Delete, AllocationSize: n/a, OpenResult: Opened

07:59:04,3198189 python.exe 4224 CloseFile
    C:\Users\hansi\Downloads\python-emb\python36.zip SUCCESS

07:59:04,3205458 python.exe 4224 CreateFile
    C:\Users\hansi\Downloads\python-emb\python36.zip SUCCESS
    Desired Access: Read Attributes, Synchronize, Disposition: Open,
    Options: Synchronous IO Non-Alert, Open Reparse Point, Attributes: N,
    ShareMode: None, AllocationSize: n/a, OpenResult: Opened

07:59:04,3205860 python.exe 4224 QueryInformationVolume
    C:\Users\hansi\Downloads\python-emb\python36.zip SUCCESS
    VolumeCreationTime: 05.05.2015 12:28:45, VolumeSerialNumber: 36B5-A026,
    SupportsObjects: True, VolumeLabel: OS

07:59:04,3206127 python.exe 4224 QueryAllInformationFile
    C:\Users\hansi\Downloads\python-emb\python36.zip BUFFER OVERFLOW
    CreationTime: 18.04.2017 06:07:23, LastAccessTime: 18.04.2017 06:07:23,
    LastWriteTime: 21.03.2017 09:06:10, ChangeTime: 18.04.2017 06:07:23,
    FileAttributes: N, AllocationSize: 2.228.224, EndOfFile: 2.224.303,
    NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber:
    0x2000a9467, EaSize: 0, Access: Read Attributes, Synchronize, Position:
    0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Word

The interesting line is the one with QueryAllInformationFile and BUFFER
OVERFLOW. On machines where it works the buffer overflow does not happen and
the query is done with QueryBasicInformationFile and not
QueryInformationVolume. Since QueryInformationVolume is most likely only for
folders, maybe there is a problem with that.

Here is the log when it's working:

06:30:39,6650716 python.exe 30176 CreateFile
    C:\Projects\Python\rt_win32\python36.zip SUCCESS
    Desired Access: Read Attributes, Synchronize, Disposition: Open,
    Options: Synchronous IO Non-Alert, Attributes: n/a, ShareMode: Read,
    Write, Delete, AllocationSize: n/a, OpenResult: Opened

06:30:39,6652657 python.exe 30176 QueryBasicInformationFile
    C:\Projects\Python\rt_win32\python36.zip SUCCESS
    CreationTime: 15.02.2017 13:34:03, LastAccessTime: 15.02.2017 13:34:03,
    LastWriteTime: 22.12.2016 23:30:40, ChangeTime: 18.04.2017 06:19:36,
    FileAttributes: A

06:30:39,6673617 python.exe 30176 QueryStandardInformationFile
    C:\Projects\Python\rt_win32\python36.zip SUCCESS
    AllocationSize: 2.240.512, EndOfFile: 2.237.601, NumberOfLinks: 1,
    DeletePending: False, Directory: False

Any help is appreciated!

Thanks,
Herb
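One way to narrow this kind of problem down without Procmon is to check whether the interpreter on the affected machine can import from a freshly created zip archive at all. The sketch below is only a diagnostic idea, not a fix; the archive name `probe.zip` and the module name `zip_probe_module` are made up for the test.

```python
import importlib
import os
import sys
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "probe.zip")

# Build a tiny archive holding a single module.
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("zip_probe_module.py", "ANSWER = 42\n")

# Step 1: can the zipfile module read the archive back?
readable = zipfile.is_zipfile(archive)

# Step 2: can the import system load a module out of it?
sys.path.insert(0, archive)
try:
    importlib.invalidate_caches()
    probe = importlib.import_module("zip_probe_module")
    imported_from = probe.__file__
finally:
    sys.path.remove(archive)

print("readable:", readable)
print("imported from:", imported_from)
```

If this works on a machine where the egg or python36.zip fails, the import machinery itself is fine there and the difference is more likely in the specific file or its location. Running `zipfile.is_zipfile()` on the failing archive would be a natural next check.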
Re: import docx error
[Please keep this on the list so that others can benefit (and so that I can
deal with it via my NNTP client). Further replies will only happen on-list.]

On Wed, May 10, 2017 at 05:14:22PM -0700, somebody wrote:

> I need to go back before John, I guess.

Sorry, I have no idea what that means.

> I have downloaded Anaconda to Cinnamon Mint 18.1 64 bit where Python
> 3.6 exists.
>
> It will not start up.

The anaconda that I know about is the RedHat installer program (which
was originally written in Python, BTW), but I'm guessing that's not
what you're asking about.

> My naive question is: When I go to pypi for example, am I to download
> packages into python or Mint?

I don't understand the question: python is a language, Mint is a Linux
OS distro. If you can't use your distro's package manager to install
the package you're looking for (see below), then here is how you
install a package from pypi:

https://packaging.python.org/installing/#installing-from-pypi

> It seems that I have skipped a step where one creates a folder for
> these files.

I don't know why you would have to create a folder.

If you're running Mint Linux, then your first step is to look to see
if the Mint repositories contain the package you want.

http://packages.linuxmint.com/
http://packages.linuxmint.com/list.php?release=Serena

If Linux Mint doesn't provide the package you want, then the above link
shows how to install packages from pypi.

If what you want isn't on pypi, then Google is your friend:

https://www.google.com/search?q=linux+mint+anaconda

https://docs.continuum.io/anaconda/install-linux
https://www.youtube.com/watch?v=siov5S0Qzdc

Is that the anaconda you're talking about?

Or is it one of these?

https://pypi.python.org/pypi?%3Aaction=search&term=anaconda&submit=search

--
Grant Edwards   grant.b.edwards at gmail.com
Yow! If Robert Di Niro assassinates Walter Slezak, will Jodie Foster
marry Bonzo??
Re: Embedded Python import fails with zip/egg files (v3.6.1)
On Thu, May 11, 2017 at 9:02 PM, Griebel, Herbert wrote:
>
> 07:59:04,3205458 python.exe 4224 CreateFile
> C:\Users\hansi\Downloads\python-emb\python36.zip SUCCESS Desired Access:
> Read Attributes, Synchronize, Disposition: Open, Options: Synchronous IO
> Non-Alert, Open Reparse Point, Attributes: N, ShareMode: None,
> AllocationSize: n/a, OpenResult: Opened
>
> 07:59:04,3205860 python.exe 4224 QueryInformationVolume
> C:\Users\hansi\Downloads\python-emb\python36.zip SUCCESS
> VolumeCreationTime: 05.05.2015 12:28:45, VolumeSerialNumber: 36B5-A026,
> SupportsObjects: True, VolumeLabel: OS
>
> 07:59:04,3206127 python.exe 4224 QueryAllInformationFile
> C:\Users\hansi\Downloads\python-emb\python36.zip BUFFER OVERFLOW
> CreationTime: 18.04.2017 06:07:23, LastAccessTime: 18.04.2017 06:07:23,
> LastWriteTime: 21.03.2017 09:06:10, ChangeTime: 18.04.2017 06:07:23,
> FileAttributes: N, AllocationSize: 2.228.224, EndOfFile: 2.224.303,
> NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber:
> 0x2000a9467, EaSize: 0, Access: Read Attributes, Synchronize, Position:
> 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Word

This looks like a regular Python stat call on Windows. It opens a handle
without following links (i.e. reparse points) and calls
GetFileInformationByHandle. That in turn gets the volume serial number from
the volume information. Then it gets the file information, which includes
the filename. But the FILE_ALL_INFORMATION buffer only has space for a
single character of the name. That's the reason for the buffer overflow
status (STATUS_BUFFER_OVERFLOW, 0x80000005). It's an NTSTATUS warning, not
an error, and it doesn't fail the GetFileInformationByHandle call.
The future is bright for Python
https://medium.com/@trstringer/the-future-is-looking-bright-for-python-95a748a4ef3e

--
Steve
Emoji: a small, fuzzy, indistinct picture used to replace a clear and
perfectly comprehensible word.
Re: import docx error
On 5/10/17, Grant Edwards wrote:
> On 2017-05-10, RRS1 via Python-list wrote:
>
>> I am very new to Python, have only done simple things >>>print("hello
>> world") type things. I've really been looking forward to using Python. I
>> bought Two books, downloaded Python 3.6.1 (32 & 64) and each time I try
>> this:
>>
>>
> import docx
>>
>> I get errors.
>>
>> Traceback (most recent call last):
>> File "", line 1 in
>> ModuleNotFoundError: No module named docx
>
> You need to install the docx module:
>
> https://pypi.python.org/pypi/docx
> https://pypi.python.org/pypi
I am afraid https://pypi.python.org/pypi/python-docx could be what he needs.
With anaconda it would be better to do:
conda install python-docx # but this doesn't work
or
conda install docx # but this doesn't work either
Anaconda has channels. For example cjs14 channel includes docx.
But unfortunately it is only for python2 :(
conda install -c cjs14 python-docx
UnsatisfiableError: The following specifications were found to be in conflict:
- python 3.6*
- python-docx -> python 2.7* -> openssl 1.0.1*
PL.
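Since the name on PyPI (python-docx) and the name used in the import statement (docx) differ, it can help to check from inside the interpreter whether the import name is actually available. A minimal sketch, where the helper name `is_importable` is made up for illustration:

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level `import module_name` would find a module."""
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python; "docx" is only there after installing python-docx.
for name in ("json", "docx"):
    status = "available" if is_importable(name) else "not installed"
    print(name + ": " + status)
```

Running this with the same interpreter that raised the ModuleNotFoundError shows whether the package went into that interpreter's environment or into a different one.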
Re: import docx error
On 5/11/17, Grant Edwards wrote:
> On Wed, May 10, 2017 at 05:14:22PM -0700, somebody wrote:
>> I have downloaded Anaconda to Cinnamon Mint 18.1 64 bit where Python
>> 3.6 exists.
>>
>> It will not start up.
>
> The anaconda that I know about is the RedHat installer program (which
> was originally written in Python, BTW), but I'm guessing that's not
> what you're asking about.
>
>> My naive question is: When I go to pypi for example, am I to download
>> packages into python or Mint?
>
> I don't understand the question: python is a language, Mint is a Linux
> OS distro. If you can't use your distro's package manager to install
> the package you're looking for (see below), then here is how you
> install a package from pypi:
>
> https://packaging.python.org/installing/#installing-from-pypi

Under linux you can use anaconda's python and the distro's python side by
side.

If you use the default installation process, anaconda probably ends up in
the $HOME/anaconda directory.

If you don't change .bashrc then you can start anaconda's virtual
environment with:

source $HOME/anaconda/bin/activate ~/anaconda/

After this command pip will install packages into anaconda's python
environment (without this command it installs into the distro's python
environment).

So the answer to "somebody's" question is probably: it depends.

Under ubuntu 16.04, which could be similar to Mint, I got this:

python -V # by default my python is distro's python
Python 2.7.12
source $HOME/anaconda3/bin/activate anaconda3
(anaconda3) xyz:~$ python -V # now I could use nice features like f-strings from python 3.6 :)
Python 3.6.1 :: Anaconda custom (64-bit)

PL.
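To see from inside Python which of the two interpreters is currently active, and where packages for it would normally be installed, something like the following can be run before and after the `source ... activate` step. This is only a sketch; the function name `describe_interpreter` is made up, and the printed paths depend entirely on the local installation.

```python
import sys
import sysconfig


def describe_interpreter():
    """Collect the details that differ between distro and anaconda python."""
    return {
        "executable": sys.executable,           # which python binary is running
        "version": sys.version.split()[0],      # e.g. 2.7.12 or 3.6.1
        "prefix": sys.prefix,                   # root of the active environment
        "site_packages": sysconfig.get_paths()["purelib"],
    }


for key, value in describe_interpreter().items():
    print("%-14s %s" % (key, value))
```

If `prefix` points into the anaconda directory, pip run through that interpreter (`python -m pip install ...`) installs into anaconda's environment rather than the distro's.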
Re: The future is bright for Python
Steve D'Aprano wrote:
> https://medium.com/@trstringer/the-future-is-looking-bright-for-python-95a748a4ef3e

I hope it doesn't mean that Python users are getting more and more confused!

--
Greg
