Hi,
I am new to Python. I am working in computational biology and I have to
deal with text files of huge size. I know how to read a text file line by
line. I want to know the best method in Python 3 to load the entire
file into RAM and do the operations there (since this saves time).
I am currently using this method to load my text file:
On 04/11/13 11:07, Amal Thomas wrote:
I am currently using this method to load my text file:
f = open("output.txt")
content=io.StringIO(f.read())
f.close()
But I have found that this method uses 4 times the size of the text file.
So why not use
f = open("output.txt")
content=f.read()
f.close()
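One possible reason the original code wrapped the string in io.StringIO is that a StringIO can be iterated line by line like a file object, while a plain string cannot. A tiny sketch of that difference (made-up example, not code from the thread):

import io

text = "AUGC\nGGAU\n"               # made-up two-line example

# io.StringIO behaves like an in-memory file: iterating yields lines.
for line in io.StringIO(text):
    print(repr(line))               # 'AUGC\n', then 'GGAU\n'

# A plain string from f.read() has to be split explicitly to get lines.
for line in text.splitlines():
    print(repr(line))               # 'AUGC', then 'GGAU' (newlines removed)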
Hi,
Thanks Alan.
Now I have made changes in the code:
Present code:

f = open("output.txt")
content=f.read().split('\n')
f.close()
for lines in content:
    ...
content.clear()

Previous code:

f = open("output.txt")
content=io.StringIO(f.read())
f.close()
for lines in content:
    ...
content.close()
On 04/11/13 13:06, Amal Thomas wrote:
Present code:
f = open("output.txt")
content=f.read().split('\n')
f.close()
If your objective is to save time, then you should replace this with
f.readlines(), which will save you reprocessing the entire file to
remove the newlines.
for lines in content:
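A minimal sketch of that change (readlines() splits while reading, but unlike split('\n') it keeps the trailing newline on each line, so strip it if the old behaviour is needed):

f = open("output.txt")
content = f.readlines()            # list of lines, each still ending in '\n'
f.close()

total = 0
for line in content:
    line = line.rstrip('\n')       # mimic the old split('\n') behaviour
    total += len(line)
print(total)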
Yes, I have found that loading the file into RAM and then reading it line by line
saves a huge amount of time, since my text files are very huge.
On Mon, Nov 4, 2013 at 6:46 PM, Alan Gauld wrote:
> On 04/11/13 13:06, Amal Thomas wrote:
>
> Present code:
>>
>>
>> f = open("output.txt")
>> content=f.read().split('\n')
Amal Thomas wrote:
> Yes, I have found that loading the file into RAM and then reading it line by line
> saves a huge amount of time, since my text files are very huge.
How exactly did you find out? You should only see a speed-up if you iterate
over the data at least twice.
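To illustrate the point about iterating at least twice, here is a hypothetical two-pass task (find the longest line, then count how many lines reach that length). A single pass can stream straight from the file; only when a second pass is needed does keeping the data in memory avoid re-reading the disk:

# One pass: streaming the file is enough, nothing needs to stay in memory.
with open("output.txt") as f:
    longest = max(len(line) for line in f)

# Two passes: load once, then both passes reuse the in-memory list.
with open("output.txt") as f:
    lines = f.readlines()
longest = max(len(line) for line in lines)
count = sum(1 for line in lines if len(line) == longest)
print(longest, count)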
On Nov 4, 2013, at 8:30 AM, Amal Thomas wrote:
> Yes, I have found that loading the file into RAM and then reading it line by line
> saves a huge amount of time, since my text files are very huge.
>
[huge snip]
> --
> AMAL THOMAS
> Fourth Year Undergraduate Student
> Department of Biotechnology
> II
Hi,
@Peter:
I have checked the execution time manually as well as through my
code. During execution, at the start I stored the initial time (start
time) in a variable, and at the end I calculated the time taken to run the
code as end time - start time. There was a significant difference.
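A sketch of that kind of measurement with time.perf_counter(), which is designed for interval timing (the commented line stands in for whatever code is being measured):

import time

start = time.perf_counter()
# ... run the file-processing code being timed here ...
elapsed = time.perf_counter() - start
print("Elapsed seconds:", elapsed)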
@William:
Thanks,
My line size varies from 40 to 550 characters. Please note that the text file
which I have to process is in gigabytes (approx 50 GB). This was the
code which I used to process line by line without loading into memory:

for lines in open('uniqname.txt'):
    ...
On Mon, Nov 4, 201
On 4/11/2013 09:04, Amal Thomas wrote:
> @William:
> Thanks,
>
> My line size varies from 40 to 550 characters. Please note that the text file
> which I have to process is in gigabytes (approx 50 GB). This was the
> code which I used to process line by line without loading into memory:
Now I understand. Processing line by line is slower because it actually
reads the whole file.
On Mon, Nov 04, 2013 at 02:48:11PM +, Dave Angel wrote:
> Now I understand. Processing line by line is slower because it actually
> reads the whole file. The code you showed earlier:
>
> >I am currently using this method to load my text file:
> > f = open("output.txt")
> > content=io.StringIO(f.read())
@Dave: thanks.. By the way, I am running my code on a server with about
100 GB RAM, but I can't afford my code to use 4-5 times the size of the text
file. Now I am using read() / readlines(); these seem to be more
memory-efficient than io.StringIO(f.read()).
On Mon, Nov 4, 2013 at 9:23 P
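One way to check a memory claim like that on a smaller test file is tracemalloc, which reports the peak size of Python allocations. A sketch (the file name is just a placeholder):

import tracemalloc

tracemalloc.start()
with open("output.txt") as f:
    content = f.readlines()       # swap in f.read() or io.StringIO(f.read()) to compare
current, peak = tracemalloc.get_traced_memory()
print("Peak bytes allocated:", peak)
tracemalloc.stop()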
"I am new to python. I am working in computational biology and I have
to deal with text files of huge size. I know how to read a text file line
by line. I want to know the best method in Python 3 to load
the entire file into RAM and do the operations there (since this saves time)."
If you are new to Python, why are you so concerned about the speed of your code?
On Mon, Nov 04, 2013 at 07:00:29PM +0530, Amal Thomas wrote:
> Yes, I have found that loading the file into RAM and then reading it line by line
> saves a huge amount of time, since my text files are very huge.
This is remarkable, and quite frankly incredible. I wonder whether you
are misinterpreting what you are seeing.
@Joel: The code runs for weeks. The input file which I have to process is very
huge (about 50 GB), so it's not a matter of hours; it's a matter of days and
weeks. I was using C++. Recently I switched over to Python. I am trying to
optimize my code to get the outputs in less time and use memory efficiently.
On Mo
@Steven: Thanks... Right now I can't access the files. I will send you the
output when I can.
--
Please try this little bit of code, replacing the file name with the
actual name of your 50GB data file:
import os
filename = "YOUR FILE NAME HERE"
print("File size:", os.stat(filename).st_size)
f
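The snippet is cut off by the archive here; a plausible continuation, assuming the intent was to time a full read of the file (this is a guess, not the original code from the message):

import os
import time

filename = "YOUR FILE NAME HERE"
print("File size:", os.stat(filename).st_size)

start = time.perf_counter()
with open(filename) as f:
    content = f.read()
print("Seconds to read:", time.perf_counter() - start)
print("Characters read:", len(content))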
On Mon, Nov 04, 2013 at 11:27:52AM -0500, Joel Goldstick wrote:
> If you are new to Python, why are you so concerned about the speed of
> your code?
Amal is new to Python but he's not new to biology, he's a 4th year
student. With a 50GB file, I expect he is analysing something to do with
DNA sequencing.
On 04/11/13 16:34, Amal Thomas wrote:
@Joel: The code runs for weeks. The input file which I have to process is
very huge (about 50 GB), so it's not a matter of hours; it's a matter of
days and weeks.
OK, but that's not down to reading the file from disk.
Reading a 50G file will only take a few minutes if
On Mon, Nov 04, 2013 at 04:54:16PM +, Alan Gauld wrote:
> On 04/11/13 16:34, Amal Thomas wrote:
> >@Joel: The code runs for weeks. The input file which I have to process is
> >very huge (about 50 GB), so it's not a matter of hours; it's a matter of
> >days and weeks.
>
> OK, but that's not down to reading the file from disk.
@Steven: Thank you... My input data is basically AUGC and newlines... I
would like to know about the bytearray technique. Please suggest some links
or references. I will go through the profiler and check whether the code
maintains linearity with the input files.
> > It's probably worth putting so
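Since the data is essentially A, U, G, C plus newlines, one bytes-oriented sketch (not from the thread; bytearray is simply the mutable variant of bytes if in-place edits are needed) is to open the file in binary mode, so each line arrives as a compact bytes object rather than a str:

# Count each nucleotide without decoding the file to text.
counts = {b: 0 for b in b"AUGC"}           # keys are the integer byte values
with open("uniqname.txt", "rb") as f:      # binary mode: iteration yields bytes lines
    for line in f:
        for b in line.rstrip(b"\n"):
            if b in counts:
                counts[b] += 1
print({chr(b): n for b, n in counts.items()})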
On Tue, 5 Nov 2013 02:53:41 +1100, Steven D'Aprano
wrote:
Dave, do you have a reference for that? As far as I can tell, read()
will read to EOF unless you open the file in non-blocking mode.
No. I must be just remembering something from another language.
Sorry.
--
DaveA
On 4/11/2013 11:26, Amal Thomas wrote:
> @Dave: thanks.. By the way, I am running my code on a server with about
> 100 GB RAM, but I can't afford my code to use 4-5 times the size of the text
> file. Now I am using read() / readlines(); these seem to be more
> memory-efficient than io.StringIO(f.read()).
On Mon, Nov 4, 2013 at 9:41 AM, Amal Thomas wrote:
> @Steven: Thank you... My input data is basically AUGC and newlines... I
> would like to know about the bytearray technique. Please suggest some links
> or references. I will go through the profiler and check whether the code
> maintains linearity with the input files.
Forwarding to tutor list. Please use Reply All in responses.
From: Amal Thomas
>To: Alan Gauld
>Sent: Monday, 4 November 2013, 17:26
>Subject: Re: [Tutor] Load Entire File into memory
>
>
>
>@Alan: Thanks.. I have checked both ways (reading line by line without
>loading into RAM, and loading the entire file into RAM and then reading
>line by line) for file
>
>
> > Also as I have mentioned, I can't afford to run my code using 4-5 times
> > the memory.
> > Total resource available on my server is about 180 GB memory (approx 64
> > GB RAM + 128 GB swap).
>
> OK, There is a huge difference between having 100G of RAM and having
> 64G+128G swap.
> swap is basically disk.
>
> You _must_ avoid swap at all costs here. You may not understand the
> point, so a little more explanation: touching swap is several orders of
> magnitude more expensive than anything else you are doing in your program.
>
> CPU operations are on the order of nanoseconds. (10^-9)
>
> Dis
I mostly agree with Alan, but a couple of little quibbles:
On Tue, Nov 05, 2013 at 01:10:39AM +, ALAN GAULD wrote:
> >@Alan: Thanks.. I have checked both ways (reading line by line without
> >loading into RAM,
> > and loading the entire file into RAM and then reading line by line) for file
On Mon, Nov 04, 2013 at 06:02:47PM -0800, Danny Yoo wrote:
> To visualize the sheer scale of the problem, see:
>
> http://i.imgur.com/X1Hi1.gif
>
> which would normally be funny, except that it's not quite a joke. :P
Nice visualisation! Was that yours?
> So you want to minimize hard disk
Amal Thomas, 04.11.2013 14:55:
> I have checked the execution time manually as well as through my
> code. During execution, at the start I stored the initial time (start
> time) in a variable, and at the end I calculated the time taken to run the
> code as end time - start time. There was a significant difference.