Re: Out of memory while reading excel file

2017-05-11 Thread Mahmood Naderan via Python-list
I wrote this:

a = np.zeros((p.max_row, p.max_column), dtype=object)
for y, row in enumerate(p.rows):
    for cell in row:
        print (cell.value)
        a[y] = cell.value
        print (a[y])


For one of the cells, I see

NM_198576.3
['NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3']

 
There are 50 copies of NM_198576.3 in a[y], and 50 is the number of columns in
my Excel file (p.max_column).



The excel file looks like

CHR1 11,202,100 NM_198576.3 PASS 3.08932G|B|C -.   
.   .



Note that in each row, some cells are '-' or '.' only. I want to read all cells 
as strings. Then I will write the matrix to a file and my main code (Java) will 
process that. I chose openpyxl for reading Excel files because Apache POI (a 
Java package for manipulating Excel files) consumes a huge amount of memory even
for medium-sized files.

So my python script only transforms an xlsx file to a txt file keeping the cell 
positions and formats.

Any suggestion?

Regards,
Mahmood
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Out of memory while reading excel file

2017-05-11 Thread Peter Otten
Mahmood Naderan via Python-list wrote:

> I wrote this:
> 
> a = np.zeros((p.max_row, p.max_column), dtype=object)
> for y, row in enumerate(p.rows):
>     for cell in row:
>         print (cell.value)
>         a[y] = cell.value

In the line above you overwrite the row in the numpy array with the cell 
value. In combination with numpy's "broadcasting" you end up with all values 
in a row set to the rightmost cell in the spreadsheet row, just like in 

>>> import numpy
>>> a = numpy.array([[0, 0, 0]])
>>> a
array([[0, 0, 0]])
>>> for x in 1, 2, 3:
...     a[0] = x
... 
>>> a
array([[3, 3, 3]])


The correct code:

for y, row in enumerate(ws.rows):
    a[y] = [cell.value for cell in row]

I think I posted it before ;)

>  print (a[y])
> 
> 
> For one of the cells, I see
> 
> NM_198576.3
> ['NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3'
> 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3' 'NM_198576.3']
> 
>  
> These are 50 NM_198576.3 in a[y] and 50 is the number of columns in my
> excel file (p.max_column)
> 
> 
> 
> The excel file looks like
> 
> CHR1 11,202,100 NM_198576.3 PASS 3.08932G|B|C -   
> .   .   .
> 
> 
> 
> Note that in each row, some cells are '-' or '.' only. I want to read all
> cells as string. Then I will write the matrix in a file and my main code
> (java) will process that. I chose openpyxl for reading excel files,
> because Apache POI (a java package for manipulating excel files) consumes
> huge memory even for medium files.
> 
> So my python script only transforms an xlsx file to a txt file keeping the
> cell positions and formats.

What kind of text file?

> Any suggestion?

In that case there's no need to load the data into memory. For example, to 
convert xlsx to csv:

#!/usr/bin/env python3
from openpyxl import load_workbook
import csv

source = "beta.xlsx"
dest = "gamma.csv"
sheet = 'alpha'

wb = load_workbook(filename=source, read_only=True)
ws = wb[sheet]

with open(dest, "w") as outstream:
    csv.writer(outstream).writerows(
        [cell.value for cell in row]
        for row in ws.rows
    )




Re: Why am I getting a 'sqlite3.OperationalError'?

2017-05-11 Thread Mark Summerfield via Python-list
The ? is indeed for variable substitution, but AFAIK only for field values, not 
for table names, which is why your first example doesn't work and your second 
and third examples do work.
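
The examples being discussed are not reproduced here, so the following is only a minimal sketch of the distinction, using the standard sqlite3 module. The table name, the ALLOWED_TABLES whitelist and the insert_name helper are made up for illustration:

```python
import sqlite3

# Hypothetical whitelist: a table name cannot be a "?" parameter, so it is
# checked against known names before being put into the SQL text.
ALLOWED_TABLES = {"users"}


def insert_name(conn, table, name):
    """Insert name into table; only the value goes through a placeholder."""
    if table not in ALLOWED_TABLES:
        raise ValueError("unknown table: %r" % (table,))
    sql = 'INSERT INTO "{}" (name) VALUES (?)'.format(table)
    conn.execute(sql, (name,))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")

# Works: "?" stands for a field value, and quoting is handled by sqlite3.
insert_name(conn, "users", "O'Brien")

# Fails: "?" in place of a table name is rejected by SQLite.
try:
    conn.execute("SELECT * FROM ?", ("users",))
    placeholder_error = None
except sqlite3.OperationalError as exc:
    placeholder_error = exc
```

Checking the name against a fixed set is one way to keep the string formatting safe; it is a design choice for this sketch, not something sqlite3 requires.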


Re: Out of memory while reading excel file

2017-05-11 Thread Mahmood Naderan via Python-list
Thanks. That code is so simple and works. However, there are things to be 
considered. With the CSV format, cells in a row are separated by ',' and for 
some cells it writes "" around the cell content.

So, if the excel looks like 


CHR1  11,232,445


The output file looks like

CHR1,"11,232,445"


Is it possible to use a space as the delimiting character and omit the ""? I ask 
because my Java code, which has to read the output file, would otherwise have to
do some extra work (using space as the delimiter is the default there and is
much easier to work with). 
I want

a[0][0] = CHR1
a[0][1] = 11,232,445

And both are strings. Is that possible?
 
Regards,
Mahmood


Re: Out of memory while reading excel file

2017-05-11 Thread Mahmood Naderan via Python-list
Excuse me, I changed 

csv.writer(outstream)

to 

csv.writer(outstream, delimiter =' ')


It puts a space between cells and omits the "" around some content. However, 
there is now an empty line between every two lines. In other words, the first
line is the first row of the Excel file, the second line is empty ("\n"), and
the third line is the second row of the Excel file.

Any thought?
 
Regards,
Mahmood


Re: Repeatedly crawl website every 1 min

2017-05-11 Thread Iuri
Unless you are authorized, don't do it. It literally costs the website you
are crawling a lot of money, in CPU and bandwidth.

Hundreds of concurrent requests can even kill a small (badly configured)
server.

Look at the scrapy package; it is great for scraping, but be friendly to the
websites you are crawling.

On 10 May 2017 at 23:22,  wrote:

> Hi Everyone,
>
> Thanks for stopping by. I am working on a feature to crawl website content
> every 1 min. I am curious to know if there is any good open source project
> for this specific scenario.
>
> Specifically, I have many urls, and I want to maintain a thread pool so
> that each thread will repeatedly crawl content from the given url. It could
> be hundreds of threads at the same time.
>
> Your help is greatly appreciated.
>
> ;)


Re: Out of memory while reading excel file

2017-05-11 Thread Peter Otten
Mahmood Naderan via Python-list wrote:

> Excuse me, I changed
> 
> csv.writer(outstream)
> 
> to
> 
> csv.writer(outstream, delimiter =' ')
> 
> 
> It puts space between cells and omits "" around some content. 

If your data doesn't contain any spaces that's fine. Otherwise you need a 
way to distinguish between space as a delimiter and space inside a field,
e.g. by escaping it:

>>> import csv, sys
>>> w = csv.writer(sys.stdout, delimiter=" ", quoting=csv.QUOTE_NONE,
...                escapechar="\\")
>>> w.writerow(["a", "b c"])
a b\ c
8

> However,
> between two lines there is a new empty line. In other word, the first line
> is the first row of excel file. The second line is empty ("\n") and the
> third line is the second row of the excel file.
> 
> Any thought?

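In text mode on Windows, "\n" is translated to b"\r\n" when it is written to
the file. The csv writer already ends each row with "\r\n", so the file ends up
with b"\r\r\n", which shows up as an extra empty line. Python allows you to
override that translation: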

>>> help(open)
Help on built-in function open in module io:

open(...)
    open(file, mode='r', buffering=-1, encoding=None,
         errors=None, newline=None, closefd=True, opener=None) -> file object



newline controls how universal newlines works (it only applies to text
mode). It can be None, '', '\n', '\r', and '\r\n'.  It works as
follows:



* On output, if newline is None, any '\n' characters written are
  translated to the system default line separator, os.linesep. If
  newline is '' or '\n', no translation takes place. If newline is any
  of the other legal values, any '\n' characters written are translated
  to the given string.

So you need to specify newline:

with open(dest, "w", newline="") as outstream:
    ...
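
Putting the pieces of this thread together, a minimal sketch of the space-delimited export might look like this. It uses only the csv module; write_rows_as_text and its arguments are illustrative names, and rows stands in for any iterable of rows, such as the cell values taken from ws.rows:

```python
import csv


def write_rows_as_text(rows, dest):
    """Write rows to dest as space-delimited text, one row per line."""
    # newline="" keeps Python from translating the "\r\n" that the csv
    # writer already emits, which is what caused the empty lines.
    with open(dest, "w", newline="") as outstream:
        writer = csv.writer(
            outstream,
            delimiter=" ",           # space between cells
            quoting=csv.QUOTE_NONE,  # never wrap a cell in quotes
            escapechar="\\",         # spaces inside a cell become "\ "
        )
        writer.writerows(rows)
```

The reader on the other side then has to treat a backslash followed by a space as part of the cell rather than as a delimiter.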




Re: Repeatedly crawl website every 1 min

2017-05-11 Thread Steve D'Aprano
On Thu, 11 May 2017 12:18 pm, [email protected] wrote:

> Hi Everyone,
> 
> Thanks for stopping by. I am working on a feature to crawl website content
> every 1 min. I am curious to know if there is any good open source project
> for this specific scenario.

I agree with Iuri: crawling a website every minute is abuse. Unless it is
your own website, once a month is more appropriate -- and even then, you
should be very careful to restrict the rate at which you make requests.
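
Restricting the request rate can be as simple as enforcing a minimum delay between requests. The following is only a sketch: fetch_politely, its fetch callback and the min_interval default are hypothetical names and values, not part of any crawling library, and it does nothing about robots.txt or getting permission:

```python
import time


def fetch_politely(urls, fetch, min_interval=5.0,
                   clock=time.monotonic, sleep=time.sleep):
    """Call fetch(url) for each URL, at least min_interval seconds apart.

    fetch is whatever function actually downloads a page; clock and sleep
    are injectable so the pacing can be checked without really waiting.
    """
    results = {}
    last_request = None
    for url in urls:
        if last_request is not None:
            # Wait out whatever remains of the interval since the last request.
            wait = min_interval - (clock() - last_request)
            if wait > 0:
                sleep(wait)
        last_request = clock()
        results[url] = fetch(url)
    return results
```

The requests are made one after another on purpose; a pool of hundreds of threads is exactly what this is meant to avoid.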



-- 
Steve
Emoji: a small, fuzzy, indistinct picture used to replace a clear and
perfectly comprehensible word.



Re: Ten awesome things you are missing out on if you're still using Python 2

2017-05-11 Thread Cholo Lennon

On 09/05/17 03:01, Rustom Mody wrote:
> On Monday, May 8, 2017 at 12:48:03 PM UTC+5:30, Steven D'Aprano wrote:
>> http://www.asmeurer.com/python3-presentation/slides.html#1
>
> Nice list thanks!
>
> Do you have a similar list of
> 10 awesome features of Python that you can't use because you refuse to upgrade
> from Java/C++  ?

Why the upgrade? I use the three languages every day. Each of them has 
its own unique strengths; just use the right tool for the right job.

> [Context: I've to take a couple of classes for such senior developers and
> am wondering what features would give them the most value]




--
Cholo Lennon
Bs.As.
ARG


Re: Out of memory while reading excel file

2017-05-11 Thread Mahmood Naderan via Python-list
Thanks a lot for suggestions. It is now solved.

 
Regards,
Mahmood


Embedded Python import fails with zip/egg files (v3.6.1)

2017-05-11 Thread Griebel, Herbert

Hello,

I am having trouble importing python modules on certain machines. On 
some machines import works, on some not (all machines are Win7 64bit).
Python is not installed on any of these machines but used embedded. I 
tried to analyze the problem but did not succeed so here is what I found.


First I will use the module xlsxwriter to explain the problem but it 
also happens with python36.zip (when importing for example codecs).


I have a xlsxwriter.egg file which is found by the import mechanism but 
it cannot be opened.


Traceback (most recent call last):
  File "Z:\Documents\myscript.py", line 1, in <module>
    import glob, inspect, os, json, base64, xlsxwriter, datetime, string
ModuleNotFoundError: No module named 'xlsxwriter'

When I unzip the egg and create two folders, for code and egg-info, it 
works, the module is imported.
Again, the very same egg file works fine on other machines. Tested on 
win7 with or without python installed, and freshly setup win7 systems 
with nothing else installed.


I have the same problem with python36.zip that comes with the embedded 
package.


When starting python.exe (from 
https://www.python.org/ftp/python/3.6.1/python-3.6.1-embed-win32.zip) 
the codecs module cannot be imported and
python.exe crashes. All paths are correctly set. When I unzip the 
python36.zip into the python.exe folder everything works fine.


What I found interesting is that the disk monitor tool (Procmon.exe) 
shows the following detail:


07:59:04,3187854  python.exe  4224  CreateFile
C:\Users\hansi\Downloads\python-emb\python36.zip  SUCCESS  Desired 
Access: Read Attributes, Synchronize, Disposition: Open, Options: 
Synchronous IO Non-Alert, Attributes: n/a, ShareMode: Read, Write, 
Delete, AllocationSize: n/a, OpenResult: Opened
07:59:04,3198189  python.exe  4224  CloseFile
C:\Users\hansi\Downloads\python-emb\python36.zip  SUCCESS
07:59:04,3205458  python.exe  4224  CreateFile
C:\Users\hansi\Downloads\python-emb\python36.zip  SUCCESS  Desired 
Access: Read Attributes, Synchronize, Disposition: Open, Options: 
Synchronous IO Non-Alert, Open Reparse Point, Attributes: N, ShareMode: 
None, AllocationSize: n/a, OpenResult: Opened
07:59:04,3205860  python.exe  4224  QueryInformationVolume
C:\Users\hansi\Downloads\python-emb\python36.zip  SUCCESS
VolumeCreationTime: 05.05.2015 12:28:45, VolumeSerialNumber: 36B5-A026, 
SupportsObjects: True, VolumeLabel: OS
07:59:04,3206127  python.exe  4224  QueryAllInformationFile
C:\Users\hansi\Downloads\python-emb\python36.zip  BUFFER OVERFLOW
CreationTime: 18.04.2017 06:07:23, LastAccessTime: 18.04.2017 06:07:23, 
LastWriteTime: 21.03.2017 09:06:10, ChangeTime: 18.04.2017 06:07:23, 
FileAttributes: N, AllocationSize: 2.228.224, EndOfFile: 2.224.303, 
NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber: 
0x2000a9467, EaSize: 0, Access: Read Attributes, Synchronize, 
Position: 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Word


The interesting line is the one with QueryAllInformationFile and BUFFER 
OVERFLOW. On machines where it works the buffer overflow does not happen 
and the query is done with QueryBasicInformationFile and not 
QueryInformationVolume.
Since QueryInformationVolume is most likely only for folders, maybe 
there is a problem with that.

Here is the log when it's working:

06:30:39,6650716  python.exe  30176  CreateFile
C:\Projects\Python\rt_win32\python36.zip  SUCCESS  Desired Access: 
Read Attributes, Synchronize, Disposition: Open, Options: Synchronous IO 
Non-Alert, Attributes: n/a, ShareMode: Read, Write, Delete, 
AllocationSize: n/a, OpenResult: Opened
06:30:39,6652657  python.exe  30176  QueryBasicInformationFile
C:\Projects\Python\rt_win32\python36.zip  SUCCESS  CreationTime: 
15.02.2017 13:34:03, LastAccessTime: 15.02.2017 13:34:03, LastWriteTime: 
22.12.2016 23:30:40, ChangeTime: 18.04.2017 06:19:36, FileAttributes: A
06:30:39,6673617  python.exe  30176  QueryStandardInformationFile
C:\Projects\Python\rt_win32\python36.zip  SUCCESS  AllocationSize: 
2.240.512, EndOfFile: 2.237.601, NumberOfLinks: 1, DeletePending: False, 
Directory: False


Any help is appreciated!

Thanks,
Herb




Re: import docx error

2017-05-11 Thread Grant Edwards
[Please keep this on the list so that others can benefit (and so that I
can deal with it via my NNTP client).  Further replies will only
happen on-list.]

On Wed, May 10, 2017 at 05:14:22PM -0700, somebody wrote:

> I need to go back before John, I guess.

Sorry, I have no idea what that means.

> I have downloaded Anaconda to Cinnamon Mint 18.1 64 bit where Python
> 3.6 exists.
> 
> It will not start up.

The anaconda that I know about is the RedHat installer program (which
was originally written in Python, BTW), but I'm guessing that's not
what you're asking about.

> My naive question is: When I go to pypi for example, am I to download
> packages into python or Mint?

I don't understand the question: python is a language, Mint is a Linux
OS distro.  If you can't use your distro's package manager to install
the package you're looking for (see below), then here is how you
install a package from pypi:

  https://packaging.python.org/installing/#installing-from-pypi

> It seems that I have skipped a step where one creates a folder for
> these files.

I don't know why you would have to create a folder.

If you're running Mint Linux, then your first step is to look to see
if the Mint repositories contain the package you want.

 http://packages.linuxmint.com/
 http://packages.linuxmint.com/list.php?release=Serena

If Linux Mint doesn't provide the package you want, then the above link
shows how to install packages from pypi.

If what you want isn't on pypi, then Google is your friend:

https://www.google.com/search?q=linux+mint+anaconda

  https://docs.continuum.io/anaconda/install-linux
  https://www.youtube.com/watch?v=siov5S0Qzdc

Is that the anaconda you're talking about?

Or is it one of these?

 https://pypi.python.org/pypi?%3Aaction=search&term=anaconda&submit=search

-- 
Grant Edwards               grant.b.edwards        Yow! If Robert Di Niro
                                  at               assassinates Walter Slezak,
                              gmail.com            will Jodie Foster marry
                                                   Bonzo??



Re: Embedded Python import fails with zip/egg files (v3.6.1)

2017-05-11 Thread eryk sun
On Thu, May 11, 2017 at 9:02 PM, Griebel, Herbert  wrote:
>
> 07:59:04,3205458  python.exe  4224  CreateFile
> C:\Users\hansi\Downloads\python-emb\python36.zip  SUCCESS  Desired Access:
> Read Attributes, Synchronize, Disposition: Open, Options: Synchronous IO
> Non-Alert, Open Reparse Point, Attributes: N, ShareMode: None,
> AllocationSize: n/a, OpenResult: Opened
>
> 07:59:04,3205860  python.exe  4224  QueryInformationVolume
> C:\Users\hansi\Downloads\python-emb\python36.zip  SUCCESS
> VolumeCreationTime: 05.05.2015 12:28:45, VolumeSerialNumber: 36B5-A026,
> SupportsObjects: True, VolumeLabel: OS
>
> 07:59:04,3206127  python.exe  4224  QueryAllInformationFile
> C:\Users\hansi\Downloads\python-emb\python36.zip  BUFFER OVERFLOW
> CreationTime: 18.04.2017 06:07:23, LastAccessTime: 18.04.2017 06:07:23,
> LastWriteTime: 21.03.2017 09:06:10, ChangeTime: 18.04.2017 06:07:23,
> FileAttributes: N, AllocationSize: 2.228.224, EndOfFile: 2.224.303,
> NumberOfLinks: 1, DeletePending: False, Directory: False, IndexNumber:
> 0x2000a9467, EaSize: 0, Access: Read Attributes, Synchronize, Position:
> 0, Mode: Synchronous IO Non-Alert, AlignmentRequirement: Word

This looks like a regular Python stat call on Windows. It opens a
handle without following links (i.e. reparse points) and calls
GetFileInformationByHandle. That in turn gets the volume serial number
from the volume information. Then it gets the file information, which
includes the filename. But the FILE_ALL_INFORMATION buffer only has
space for a single character of the name. That's the reason for the
buffer overflow (0x80000005). It's an NTSTATUS warning, not an error,
and it doesn't fail the GetFileInformationByHandle call.
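
The result of that call can be inspected from Python. This is a minimal sketch using only os.stat; describe_file is an illustrative name, and the comments on what st_dev and st_ino mean on Windows are an assumption about the Windows implementation rather than something shown in the log:

```python
import os


def describe_file(path):
    """Return the os.stat fields that correspond to the queries in the log."""
    st = os.stat(path)
    return {
        "st_dev": st.st_dev,      # assumed: volume serial number on Windows
        "st_ino": st.st_ino,      # assumed: file index on Windows, inode on Unix
        "st_nlink": st.st_nlink,  # NumberOfLinks in the log
        "st_size": st.st_size,    # EndOfFile in the log
    }
```

If this returns normally for python36.zip on an affected machine, the stat step itself is working and the import problem lies elsewhere.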


The future is bright for Python

2017-05-11 Thread Steve D'Aprano
https://medium.com/@trstringer/the-future-is-looking-bright-for-python-95a748a4ef3e




-- 
Steve
Emoji: a small, fuzzy, indistinct picture used to replace a clear and
perfectly comprehensible word.



Re: import docx error

2017-05-11 Thread Pavol Lisy
On 5/10/17, Grant Edwards  wrote:
> On 2017-05-10, RRS1 via Python-list  wrote:
>
>> I am very new to Python, have only done simple things >>>print("hello
>> world") type things.  I've really been looking forward to using Python.  I
>> bought Two books, downloaded Python 3.6.1 (32 & 64) and each time I try
>> this:
>>
>>
> import docx
>>
>> I get errors.
>>
>> Traceback (most recent call last):
>> File "", line 1 in 
>> ModuleNotFoundError: No module named docx
>
> You need to install the docx module:
>
>  https://pypi.python.org/pypi/docx
>  https://pypi.python.org/pypi

I am afraid https://pypi.python.org/pypi/python-docx could be what he needs.

With Anaconda it would be better to do:

conda install python-docx  # but this doesn't work
or
conda install docx  # but this doesn't work either

Anaconda has channels. For example, the cjs14 channel includes docx.

But unfortunately it is only for Python 2 :(

conda install -c cjs14 python-docx
UnsatisfiableError: The following specifications were found to be in conflict:
  - python 3.6*
  - python-docx -> python 2.7* -> openssl 1.0.1*

PL.


Re: import docx error

2017-05-11 Thread Pavol Lisy
On 5/11/17, Grant Edwards  wrote:

> On Wed, May 10, 2017 at 05:14:22PM -0700, somebody wrote:

>> I have downloaded Anaconda to Cinnamon Mint 18.1 64 bit where Python
>> 3.6 exists.
>>
>> It will not start up.
>
> The anaconda that I know about is the RedHat installer program (which
> was originally written in Python, BTW), but I'm guessing that's not
> what you're asking about.
>
>> My naive question is: When I go to pypi for example, am I to download
>> packages into python or Mint?
>
> I don't understand the question: python is a language, Mint is a Linux
> OS distro.  If you can't use your distro's package manager to install
> the package you're looking for (see below), then here is how you
> install a package from pypi:
>
>   https://packaging.python.org/installing/#installing-from-pypi

Under Linux you can use Anaconda's Python and the distro's Python side by side.

If you use the default installation process you will probably get Anaconda
in the $HOME/anaconda directory.

If you don't change .bashrc then you can start Anaconda's virtual
environment with:

source $HOME/anaconda/bin/activate ~/anaconda/

After this command pip will install packages into Anaconda's Python
environment. (Without this command it installs into the distro's Python
environment.)

So the answer to "somebody's" question is probably: it depends.

Under ubuntu 16.04 which could be similar to Mint I got this:

python -V  # by default my python is distro's python
Python 2.7.12

source $HOME/anaconda3/bin/activate anaconda3

(anaconda3) xyz:~$ python -V  # now I can use nice features like f-strings from Python 3.6 :)
Python 3.6.1 :: Anaconda custom (64-bit)
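
The same check can be made from inside Python, which helps when it is unclear which interpreter pip or a script is actually using. A small sketch with the standard sys module only; interpreter_info is an illustrative name:

```python
import sys


def interpreter_info():
    """Report which Python is running and where it is installed."""
    return {
        "executable": sys.executable,  # path of the running interpreter
        "prefix": sys.prefix,          # root of its installation/environment
        "version": "%d.%d.%d" % sys.version_info[:3],
    }


for key, value in sorted(interpreter_info().items()):
    print("%s: %s" % (key, value))
```

After activating the Anaconda environment, the executable and prefix should point into the anaconda directory instead of the distro's locations.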

PL.


Re: The future is bright for Python

2017-05-11 Thread Gregory Ewing

Steve D'Aprano wrote:

> https://medium.com/@trstringer/the-future-is-looking-bright-for-python-95a748a4ef3e


I hope it doesn't mean that Python users are getting
more and more confused!

--
Greg

