Hi,
> I have a file which is 2.5 GB.
>
> There are many duplicate lines. I wanted to get rid
> of the duplicates.
First, can you use uniq, which is a standard Unix/Linux command?
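For example, since uniq only removes *adjacent* duplicate lines, you would
normally sort first, something like

    sort mfile | uniq > mfile.uniq

(assuming 'mfile' is the file name from the original post; sort will need
temporary disk space on the order of the file size).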
> I chose to parse to get unique elements.
>
> f1 = open('mfile','r')
> da = f1.read().split('\n')
This reads the whole 2.5 GB file into memory at once, which is where you
will run into trouble.
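A rough sketch of a line-by-line alternative (this assumes the set of
*distinct* lines fits in memory even though the whole file does not;
'mfile' and 'mfile.uniq' are just placeholder names):

    # Read one line at a time instead of slurping the whole file,
    # and write out only the lines we have not seen before.
    seen = set()
    infile = open('mfile')
    outfile = open('mfile.uniq', 'w')
    for line in infile:
        if line not in seen:
            seen.add(line)
            outfile.write(line)
    infile.close()
    outfile.close()

This keeps the original line order and never holds more than the distinct
lines in memory.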
I'd use a database if I were you.
Install, for instance, MySQL or MudBase or something like that and (if
need be) use Python to insert the lines into the database. Storing only
unique lines would then be fairly easy.
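A rough sketch of that idea, using sqlite3 from the Python standard
library instead of MySQL (the file and table names are made up for
illustration):

    import sqlite3

    conn = sqlite3.connect('lines.db')
    # The UNIQUE constraint lets the database reject duplicates for us.
    conn.execute('CREATE TABLE IF NOT EXISTS lines (line TEXT UNIQUE)')

    infile = open('mfile')
    for line in infile:
        # INSERT OR IGNORE silently skips lines that are already stored.
        conn.execute('INSERT OR IGNORE INTO lines (line) VALUES (?)', (line,))
    infile.close()
    conn.commit()

    # Dump the unique lines back out to a new file.
    outfile = open('mfile.uniq', 'w')
    for (line,) in conn.execute('SELECT line FROM lines'):
        outfile.write(line)
    outfile.close()
    conn.close()

Note that this does not preserve the original line order, and the database
file will need roughly as much disk space as the unique lines themselves.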
Another solution (using Python):
If you must use Python, I'd suggest making new
On Wed, 1 Feb 2006, Srinivas Iyyer wrote:
> I have a file which is 2.5 GB.
>
[data cut]
>
> There are many duplicate lines. I wanted to get rid of the duplicates.
Hi Srinivas,
When we deal with files this large, we do have to be careful about issues
like memory usage.
Hi Group,
I have a file which is 2.5 GB:
TRIM54 NM_187841.1 GO:0004984
TRIM54 NM_187841.1 GO:0001584
TRIM54 NM_187841.1 GO:0003674
TRIM54 NM_187841.1 GO:0004985
TRIM54 NM_187841.1 GO:0001584
TRIM54 NM_187841.1 GO:0001653
TRIM54 NM_187841.1 GO:0004984
There are many duplicate lines. I wanted to get rid of the duplicates.