Hi,
> I have a file which is 2.5 GB.
>
> There are many duplicate lines. I wanted to get rid
> of the duplicates.
First, can you use uniq, which is a standard Unix/Linux command?
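For example, since uniq only removes *adjacent* duplicate lines, you would
normally sort first, something like

    sort mfile | uniq > mfile.uniq

(assuming 'mfile' is the file name from the original post; sort will need
temporary disk space on the order of the file size).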
> I chose to parse to get unique elements.
>
> f1 = open('mfile','r')
> da = f1.read().split('\n')
This reads the whole 2.5 GB file into memory at once, which is where you
will run into trouble.
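A rough sketch of a line-by-line alternative (this assumes the set of
*distinct* lines fits in memory even though the whole file does not;
'mfile' and 'mfile.uniq' are just placeholder names):

    # Read one line at a time instead of slurping the whole file,
    # and write out only the lines we have not seen before.
    seen = set()
    infile = open('mfile')
    outfile = open('mfile.uniq', 'w')
    for line in infile:
        if line not in seen:
            seen.add(line)
            outfile.write(line)
    infile.close()
    outfile.close()

This keeps the original line order and never holds more than the distinct
lines in memory.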
I'd use a database if I were you.
Install, for instance, MySQL or MudBase or something like that and (if
need be) use Python to insert the lines into the database. Storing only
unique lines would then be fairly easy.
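A rough sketch of that idea, using sqlite3 from the Python standard
library instead of MySQL (the file and table names are made up for
illustration):

    import sqlite3

    conn = sqlite3.connect('lines.db')
    # The UNIQUE constraint lets the database reject duplicates for us.
    conn.execute('CREATE TABLE IF NOT EXISTS lines (line TEXT UNIQUE)')

    infile = open('mfile')
    for line in infile:
        # INSERT OR IGNORE silently skips lines that are already stored.
        conn.execute('INSERT OR IGNORE INTO lines (line) VALUES (?)', (line,))
    infile.close()
    conn.commit()

    # Dump the unique lines back out to a new file.
    outfile = open('mfile.uniq', 'w')
    for (line,) in conn.execute('SELECT line FROM lines'):
        outfile.write(line)
    outfile.close()
    conn.close()

Note that this does not preserve the original line order, and the database
file will need roughly as much disk space as the unique lines themselves.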
Another solution (using Python):
If you must use Python, I'd suggest making new
On Wed, 1 Feb 2006, Srinivas Iyyer wrote:
> I have a file which is 2.5 GB.
>
[data cut]
>
> There are many duplicate lines. I wanted to get rid of the duplicates.
Hi Srinivas,
When we deal with files this large, we do have to be careful about issues
like memory usage.
Hi Group,
I have a file which is 2.5 GB:
TRIM54 NM_187841.1 GO:0004984
TRIM54 NM_187841.1 GO:0001584
TRIM54 NM_187841.1 GO:0003674
TRIM54 NM_187841.1 GO:0004985
TRIM54 NM_187841.1 GO:0001584
TRIM54 NM_187841.1 GO:0001653
TRIM54 NM_187841.1 GO:0004984
There are many duplicate lines. I wanted to get rid of the duplicates.