On Mon, Nov 4, 2013 at 11:26 AM, Amal Thomas wrote:
> @Dave: thanks.. By the way I am running my code on a server with about
> 100 GB of RAM, but I can't afford for my code to use 4-5 times the size
> of the text file. Now I am using read() / readlines(); these seem to be
> more efficient in memory usage than io.StringIO(f.read()).
On Nov 5, 2013, at 11:12 AM, Alan Gauld wrote:
> On 05/11/13 02:02, Danny Yoo wrote:
>
>> To visualize the sheer scale of the problem, see:
>>
>> http://i.imgur.com/X1Hi1.gif
>>
>> which would normally be funny, except that it's not quite a joke. :P
>
> I think I'm missing something. All I see in Firefox is a vertical red bar.
On Tue, Nov 05, 2013 at 04:12:51PM +0000, Alan Gauld wrote:
> On 05/11/13 02:02, Danny Yoo wrote:
>
> >To visualize the sheer scale of the problem, see:
> >
> >http://i.imgur.com/X1Hi1.gif
> >
> >which would normally be funny, except that it's not quite a joke. :P
>
> I think I'm missing something. All I see in Firefox is a vertical red bar.
On 05/11/13 02:02, Danny Yoo wrote:
To visualize the sheer scale of the problem, see:
http://i.imgur.com/X1Hi1.gif
which would normally be funny, except that it's not quite a joke. :P
I think I'm missing something. All I see in Firefox is
a vertical red bar. And in Chrome I don't even get t
On 5 November 2013 13:20, Amal Thomas wrote:
> On Mon, Nov 4, 2013 at 10:00 PM, Steven D'Aprano
> wrote:
>>
>
>>
>> import os
>> filename = "YOUR FILE NAME HERE"
>> print("File size:", os.stat(filename).st_size)
>> f = open(filename)
>> content = f.read()
>> print("Length of content actually read
On 4 November 2013 17:41, Amal Thomas wrote:
> @Steven: Thank you... My input data is basically AUGC and newlines... I
> would like to know about the bytearray technique. Please suggest some
> links or references. I will go through the profiler and check whether the
> code maintains linearity with the input files.
On Mon, Nov 4, 2013 at 10:00 PM, Steven D'Aprano
wrote:
>
>
> import os
> filename = "YOUR FILE NAME HERE"
> print("File size:", os.stat(filename).st_size)
> f = open(filename)
> content = f.read()
> print("Length of content actually read:", len(content))
> print("Current file position:", f.tell(
Amal Thomas, 04.11.2013 14:55:
> I have checked the execution time both manually and through my code. During
> execution, at the start I stored the initial time (start time) in a variable
> and at the end calculated the time taken to run the code as end time - start
> time. There was a significant difference.
On Mon, Nov 04, 2013 at 06:02:47PM -0800, Danny Yoo wrote:
> To visualize the sheer scale of the problem, see:
>
> http://i.imgur.com/X1Hi1.gif
>
> which would normally be funny, except that it's not quite a joke. :P
Nice visualisation! Was that yours?
> So you want to minimize hard disk
I mostly agree with Alan, but a couple of little quibbles:
On Tue, Nov 05, 2013 at 01:10:39AM +, ALAN GAULD wrote:
> > @Alan: Thanks.. I have checked both ways (reading line by line without
> > loading into RAM, and loading the entire file into RAM and then reading
> > line by line) for file
>
> You _must_ avoid swap at all costs here. You may not understand the
> point, so a little more explanation: touching swap is several orders of
> magnitude more expensive than anything else you are doing in your program.
>
> CPU operations are on the order of nanoseconds. (10^-9)
>
> Disk operations are on the order of milliseconds. (10^-3)
>
>
> > Also as I have mentioned I can't afford to run my code using 4-5 times
> > memory.
> > Total resource available in my server is about 180 GB memory (approx 64
> > GB RAM + 128GB swap).
>
> OK, there is a huge difference between having 100G of RAM and having
> 64G+128G swap.
> swap is basically disk.
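A minimal sketch of one way to watch for that, assuming a Unix server:
report the process's peak resident set size and compare it against physical
RAM (resource.getrusage gives ru_maxrss in kilobytes on Linux, bytes on
macOS):

import resource

# Peak resident set size of the current process so far (Linux: kilobytes).
usage = resource.getrusage(resource.RUSAGE_SELF)
print("peak memory: %.2f GB" % (usage.ru_maxrss / (1024 * 1024)))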
Forwarding to tutor list. Please use Reply All in responses.
From: Amal Thomas
>To: Alan Gauld
>Sent: Monday, 4 November 2013, 17:26
>Subject: Re: [Tutor] Load Entire File into memory
>
>
>
>@Alan: Thanks.. I have checked both ways (reading line by line without
>loading into RAM, and loading the entire file into RAM and then reading
>line by line) for file
On Mon, Nov 4, 2013 at 9:41 AM, Amal Thomas wrote:
> @Steven: Thank you... My input data is basically AUGC and newlines... I
> would like to know about the bytearray technique. Please suggest some
> links or references. I will go through the profiler and check whether the
> code maintains linearity with the input files.
On 4/11/2013 11:26, Amal Thomas wrote:
> @Dave: thanks.. By the way I am running my code on a server with about
> 100 GB of RAM, but I can't afford for my code to use 4-5 times the size
> of the text file. Now I am using read() / readlines(); these seem to be
> more efficient in memory usage than io.StringIO(f.read()).
On Tue, 5 Nov 2013 02:53:41 +1100, Steven D'Aprano
wrote:
Dave, do you have a reference for that? As far as I can tell, read()
will read to EOF unless you open the file in non-blocking mode.
No. I must be just remembering something from another language.
Sorry.
--
DaveA
@Steven: Thank you... My input data is basically AUGC and newlines... I
would like to know about the bytearray technique. Please suggest some
links or references. I will go through the profiler and check whether the
code maintains linearity with the input files.
> > It's probably worth putting so
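A minimal sketch of what a bytearray-based approach could look like; this is
not necessarily the exact technique Steven had in mind (his message is not
shown in full here), and the file name is hypothetical. Reading in binary
mode keeps pure-ASCII data such as AUGC sequences as raw bytes instead of a
decoded str:

from collections import Counter

with open("output.txt", "rb") as f:       # hypothetical file name
    data = bytearray(f.read())            # mutable buffer of raw bytes

counts = Counter()
for line in data.splitlines():            # bytes lines; no str objects built
    counts.update(line)                   # iterating bytes yields integer byte values

print({chr(b): n for b, n in counts.items()})   # e.g. counts for A, U, G, C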
On Mon, Nov 04, 2013 at 04:54:16PM +0000, Alan Gauld wrote:
> On 04/11/13 16:34, Amal Thomas wrote:
> >@Joel: The code runs for weeks; the input file which I have to process is
> >very huge (about 50 GB). So it's not a matter of hours, it's a matter of
> >days and weeks.
>
> OK, but that's not down to reading the file from disk.
On 04/11/13 16:34, Amal Thomas wrote:
@Joel: The code runs for weeks; the input file which I have to process is
very huge (about 50 GB). So it's not a matter of hours, it's a matter of
days and weeks.
OK, but that's not down to reading the file from disk.
Reading a 50G file will only take a few minutes if
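For reference, a minimal sketch of that plain streaming approach, which keeps
memory use small no matter how large the file is (file name hypothetical):

line_count = 0
with open("output.txt") as f:     # hypothetical file name
    for line in f:                # the file object yields one buffered line at a time
        line_count += 1           # real per-line processing would go here
print("lines read:", line_count)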
On Mon, Nov 04, 2013 at 11:27:52AM -0500, Joel Goldstick wrote:
> If you are new to python, why are you so concerned about the speed of
> your code?
Amal is new to Python but he's not new to biology; he's a 4th-year
student. With a 50GB file, I expect he is analysing something to do with
DNA sequencing.
@Steven: Thanks... Right now I can't access the files. I will send you the
output when I can.
--
Please try this little bit of code, replacing the file name with the
actual name of your 50GB data file:
import os
filename = "YOUR FILE NAME HERE"
print("File size:", os.stat(filename).st_size)
f = open(filename)
content = f.read()
print("Length of content actually read:", len(content))
print("Current file position:", f.tell())
@Joel: The code runs for weeks; the input file which I have to process is
very huge (about 50 GB). So it's not a matter of hours, it's a matter of
days and weeks. I was using C++; recently I switched over to Python. I am
trying to optimize my code to get the outputs in less time and in a
memory-efficient way.
On Mo
On Mon, Nov 04, 2013 at 07:00:29PM +0530, Amal Thomas wrote:
> Yes, I have found that loading into RAM and then reading line by line
> saves a huge amount of time, since my text files are very huge.
This is remarkable, and quite frankly incredible. I wonder whether you
are misinterpreting wha
"I am new to python. I am working in computational biology and I have
to deal with text files of huge size. I know how to read line by line
from a text file. I want to know the best method in python3 to load
the entire file into ram and do the operations (since this saves time)."
If you are new to python, why are you so concerned about the speed of your
code?
@Dave: thanks.. By the way I am running my code on a server with about
100 GB of RAM, but I can't afford for my code to use 4-5 times the size
of the text file. Now I am using read() / readlines(); these seem to be
more efficient in memory usage than io.StringIO(f.read()).
On Mon, Nov 4, 2013 at 9:23 P
On Mon, Nov 04, 2013 at 02:48:11PM +0000, Dave Angel wrote:
> Now I understand. Processing line by line is slower because it actually
> reads the whole file. The code you showed earlier:
>
> >I am currently using this method to load my text file:
> > f = open("output.txt")
> > content = io.StringIO(f.read())
On 4/11/2013 09:04, Amal Thomas wrote:
> @William:
> Thanks,
>
> My line size varies from 40 to 550 characters. Please note that the text
> file which I have to process is gigabytes in size (approx. 50 GB). This
> was the code which I used to process it line by line without loading it
> into memory.
Now I understand. Processing line by line is slower because it actually
reads the whole file.
@William:
Thanks,
My line size varies from 40 to 550 characters. Please note that the text
file which I have to process is gigabytes in size (approx. 50 GB). This
was the code which I used to process it line by line without loading it
into memory:
for lines in open('uniqname.txt'):
On Mon, Nov 4, 201
Hi,
@Peter:
I have checked the execution time both manually and through my code. During
execution, at the start I stored the initial time (start time) in a variable
and at the end calculated the time taken to run the code as end time - start
time. There was a significant difference.
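A minimal sketch of that kind of wall-clock measurement, using
time.perf_counter(); the processing step is only a placeholder:

import time

start = time.perf_counter()                 # start time
# ... file-processing code under test goes here ...
elapsed = time.perf_counter() - start       # end time - start time
print("time taken to run the code: %.2f s" % elapsed)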
On Nov 4, 2013, at 8:30 AM, Amal Thomas wrote:
> Yes, I have found that loading into RAM and then reading line by line
> saves a huge amount of time, since my text files are very huge.
>
[huge snip]
> --
> AMAL THOMAS
> Fourth Year Undergraduate Student
> Department of Biotechnology
> II
Amal Thomas wrote:
> Yes, I have found that loading into RAM and then reading line by line
> saves a huge amount of time, since my text files are very huge.
How exactly did you find out? You should only see a speed-up if you iterate
over the data at least twice.
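A minimal sketch of the case Peter describes, where holding the lines in
memory pays off only because the data is traversed more than once (file name
hypothetical):

with open("output.txt") as f:                        # hypothetical file name
    lines = f.readlines()                            # one read from disk

total_chars = sum(len(line) for line in lines)       # first pass over the list
count_a = sum(line.count("A") for line in lines)     # second pass, no extra disk I/O
print(total_chars, count_a)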
Yes, I have found that loading into RAM and then reading line by line
saves a huge amount of time, since my text files are very huge.
On Mon, Nov 4, 2013 at 6:46 PM, Alan Gauld wrote:
> On 04/11/13 13:06, Amal Thomas wrote:
>
> Present code:
>>
>>
>> f = open("output.txt")
>> content = f.read().split('\n')
On 04/11/13 13:06, Amal Thomas wrote:
Present code:
f = open("output.txt")
content = f.read().split('\n')
f.close()
If your objective is to save time, then you should replace this with
f.readlines() which will save you reprocessing the entire file to
remove the newlines.
for lines in content:
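A minimal sketch of the difference being pointed out: read().split('\n')
builds one big string and then scans it again to split it, dropping the
newlines, while readlines() splits as it reads and leaves the trailing '\n'
on each line (file name hypothetical):

# current approach: whole file as one str, then a second pass to split it
with open("output.txt") as f:
    content = f.read().split('\n')

# suggested approach: split while reading; each element keeps its '\n'
with open("output.txt") as f:
    content = f.readlines()

for line in content:
    stripped = line.rstrip('\n')     # strip per line only where it matters
    # ... process stripped ...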
Hi,
Thanks Alan.
Now I have made changes in code :
Present code:
*f = open("output.txt")content=f.read().split('\n') f.close()for lines in
content:*
* *
*content.clear()*
Previous code:
*f = open("output.txt") content=io.StringIO(f.read()) f.close()for lines in
content: *
*content.
On 04/11/13 11:07, Amal Thomas wrote:
I am currently using this method to load my text file:
f = open("output.txt")
content = io.StringIO(f.read())
f.close()
But I have found that this method uses 4 times the size of the text file.
So why not use
f = open("output.txt")
content = f.read()
f.close()
Hi,
I am new to python. I am working in computational biology and I have to
deal with text files of huge size. I know how to read line by line from a
text file. I want to know the best method in python3 to load the entire
file into RAM and do the operations (since this saves time).
I am currently using this method to load my text file:
f = open("output.txt")
content = io.StringIO(f.read())
f.close()
But I have found that this method uses 4 times the size of the text file.
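A minimal sketch of one way to sanity-check that "4 times the size"
observation, comparing the size on disk with the size of the str object
actually held in RAM (sys.getsizeof reports only the object passed to it,
not buffers an io.StringIO wrapper may hold internally; file name
hypothetical):

import os
import sys

filename = "output.txt"              # hypothetical file name
with open(filename) as f:
    text = f.read()

print("file size on disk:", os.stat(filename).st_size)
print("str object in RAM:", sys.getsizeof(text))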