[Tutor] File handling Tab separated files

2018-04-19 Thread Niharika Jakhar
Hi
I want to store a file from the BioGRID database (a tab-separated file, big
data) in a data structure (I prefer lists; please let me know if another
would be better), and I am trying to print the objects.
Here’s my code:
class BioGRIDReader:
    def __init__(self, filename):
        with open('filename', 'r') as file_:
            read_data = f.read()
            for i in file_ :
                read_data = (i.split('\t'))
            return (objects[:100])

a = BioGRIDReader
print (a.__init__(test_biogrid.txt))

Here's what the terminal says:
Traceback (most recent call last):
  File "./BioGRIDReader.py", line 23, in <module>
    print (a.__init__(test_biogrid.txt))
NameError: name 'test_biogrid' is not defined

The file named test_biogrid.txt does exist in the same folder as this program.

I am unable to go further with this code. Kindly help me out.


Thanks and regards
NIHARIKA
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
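
A note on the traceback above: test_biogrid.txt is written without quotes,
so Python reads it as a variable name and raises NameError before any file
is opened. The posted code would then hit further problems: open('filename')
looks for a file literally called "filename", f and objects are never
defined, and __init__ cannot return data. The sketch below is one possible
shape for the class, not the only one. It assumes Python 3; the rows
attribute and the temporary sample file are made up for illustration.

```python
import os
import tempfile


class BioGRIDReader:
    """Read a tab-separated file into a list of rows (each row a list of strings)."""

    def __init__(self, filename):
        # __init__ cannot return data, so the rows are kept on the instance.
        self.rows = []
        # Use the parameter itself; open('filename') would look for a file
        # literally called "filename".
        with open(filename, "r") as file_:
            for line in file_:
                self.rows.append(line.rstrip("\n").split("\t"))


# Hypothetical two-line sample written to a temporary file for demonstration.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("1\tA\tB\n2\tC\tD\n")
    sample_path = tmp.name

# Create an instance by calling the class; the file name is passed as a string.
reader = BioGRIDReader(sample_path)
first_rows = reader.rows[:100]
os.remove(sample_path)
```

With the real file, the call would be BioGRIDReader('test_biogrid.txt'),
with the name in quotes.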


Re: [Tutor] File handling Tab separated files

2018-04-19 Thread Niharika Jakhar
Hi again
I tried rewriting the code following all your advice (I hope I have covered
all of it). I have extended the code a little to store the data in the form
of lists, and I am trying to access it.
I also changed the file name to BioGRID.txt

Here's what I wrote (please ignore the indentation; there was no such error,
it's just the e-mail formatting):

import csv
class BioGRIDReader:
    def __init__(self, filename):
        with open('filename', 'rb') as f:
            self.reader = csv.reader(f, delimiter='\t')
            self.object_ = self.reader.split['\n']
            for row in range(len(object_)):
                for r in range(len(row)):
                    if (r[-1] == r[-2]):
                        return (r[2], r[3]) #returns pair of taxon ids






a = BioGRIDReader('BioGRID.txt')
print(a.object[:100])




Here's what the interpreter says:
Traceback (most recent call last):
  File "./BioGRIDReader.py", line 17, in <module>
    a = BioGRIDReader('BioGRID.txt')
  File "./BioGRIDReader.py", line 4, in __init__
    with open('filename', 'rb') as f:
IOError: [Errno 2] No such file or directory: 'filename'





I am extremely sorry if I have repeated the same mistake, but I did
what I understood.
Thanks again for the links. I am not allowed to use other packages and
have to strictly use the Python standard library, but thanks anyway; they
will be of great help in the future. :)


On Thu, Apr 19, 2018 at 4:50 PM, Mats Wichmann  wrote:

> On 04/19/2018 07:57 AM, Wolfgang Maier wrote:
> > On 04/19/2018 10:45 AM, Niharika Jakhar wrote:
> >> Hi
> >> I want to store a file from BioGRID database (tab separated file, big
> >> data) into a data structure(I prefer lists, please let me know if
> >> another would be better) and I am trying to print the objects.
> >> Here’s my code:
> >> class BioGRIDReader:
> >>     def __init__(self, filename):
> >>         with open('filename', 'r') as file_:
> >>             read_data = f.read()
> >>             for i in file_ :
> >>                 read_data = (i.split('\t'))
> >>             return (objects[:100])
> >>
> >> a = BioGRIDReader
> >> print (a.__init__(test_biogrid.txt))
> >>
> >
> > In addition to your immediate problem, which Steven explained already,
> > you will run into more issues with the posted code:
>
> In addition to this low level advice, let me observe that whenever the
> term "big data" is tossed into the discussion, you want to consider
> whether reading it all in to Python's memory into a "simple" data
> structure in one go is what you want to do.  You may want to look into
> the Pandas project (possibly after spending a little more time becoming
> comfortable with Python itself first):
>
> https://pandas.pydata.org/
>
> Pandas has its own file handling code (particularly, a read_csv
> function) which might end up being useful.
>
>
> Also quite by chance, I happen to know there's an existing project to
> interact with the BioGRID web service, have no idea if that would be a
> match for any of your needs.  A quick google to refind it:
>
> https://github.com/arvkevi/biogridpy
>
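
On the point about not reading a big file into memory in one go: with only
the standard library, one option is to process rows as they are parsed
instead of collecting them all first. The sketch below assumes Python 3;
iter_rows, its skip parameter, and the in-memory sample are hypothetical
names chosen for illustration.

```python
import csv
import io


def iter_rows(lines, skip=0):
    """Yield one parsed row at a time, ignoring the first `skip` rows."""
    reader = csv.reader(lines, delimiter="\t")
    for index, row in enumerate(reader):
        if index >= skip:
            yield row


# Hypothetical in-memory sample standing in for a large file.
sample = io.StringIO("header line\n1\tA\tB\n2\tC\tD\n")

row_count = 0
for row in iter_rows(sample, skip=1):
    row_count += 1  # handle each row here; nothing else is kept in memory
```

For a real file, the same function can be given a file object opened with
open(filename, newline=''), which is what the csv documentation recommends.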


Re: [Tutor] File handling Tab separated files

2018-04-25 Thread Niharika Jakhar
hi again


When I print self.organismA inside the for x in self.results: loop, it
shows what it is supposed to.
But when I print it in the function below, it gives some garbage value.

Kindly let me know what is wrong. :)




import functools
import csv
import time
start =time.time()

class BioGRIDReader:
    def __init__(self, filename):
        self.results = []
        self.organisms = {}
        i = 0
        with open(filename) as f:
            for line in csv.reader(f, delimiter = '\t'):
                i += 1
                if i>35:
                    self.results.append(line)
        #print (self.results)
        for x in self.results:
            self.organismA = x[2]
            self.organismB = x[3]
            self.temp = (x[2],)
            self.keys = self.temp
            self.values = [x[:]]
            self.organisms = dict(zip(self.keys, self.values))
            #print (self.organismA)
        #print (self.results[0:34]) #omitted region

    def getMostAbundantTaxonIDs(self,n):
        #print (self.organismA)
        self.temp_ = 0
        self.number_of_interactions = []
        self.interaction_dict = {}
        for x in self.organismA:
            for value in self.organisms:
                if (x in value):
                    self.temp_ += 1
                    self.number_of_interactions.append(self.temp_)
        self.interaction_dict = dict(zip(self.organismA,
                                         self.number_of_interactions))

a = BioGRIDReader("BIOGRID-ALL-3.4.159.tab.txt")
a.getMostAbundantTaxonIDs(5)
end = time.time()
#print(end - start)

Thanking you in advance

Best Regards
NIHARIKA
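
A note on the code above: self.organismA is reassigned on every pass of the
loop, so after __init__ finishes it holds only the value from the last row,
and self.organisms is likewise rebuilt each time. Iterating over that single
string in getMostAbundantTaxonIDs then yields one character at a time, which
is probably the "garbage". One way to count occurrences is
collections.Counter from the standard library. This is only a sketch: it
assumes columns 2 and 3 hold the two taxon IDs, as in the earlier message,
and the function name and sample rows are invented for illustration.

```python
from collections import Counter


def most_abundant_taxon_ids(rows, n):
    """Return the n IDs that occur most often in columns 2 and 3 of rows."""
    counts = Counter()
    for row in rows:
        counts[row[2]] += 1
        if row[3] != row[2]:
            counts[row[3]] += 1  # a same-ID interaction is counted once
    return [taxon_id for taxon_id, _ in counts.most_common(n)]


# Hypothetical rows; only columns 2 and 3 matter here.
rows = [
    ["x", "y", "9606", "9606"],
    ["x", "y", "9606", "10090"],
    ["x", "y", "9606", "559292"],
    ["x", "y", "10090", "9606"],
]
top_two = most_abundant_taxon_ids(rows, 2)
```

Here the counts are kept in one place and built from every row, rather than
from attributes that only remember the last row seen.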

On Fri, Apr 20, 2018 at 11:06 AM, Alan Gauld  wrote:

>
> Use Reply-All or Reply-List to include the mailing list in replies.
>
> On 20/04/18 09:10, Niharika Jakhar wrote:
> > Hi
> >
> > I want to store the data of file into a data structure which has 11
> > objects per line , something like this:
> > 2354 somethin2  23nothing   23214.
> >
> >
> > so I was trying to split the lines using \n and store each line in a
> > list so I have a list of 11 objects, then I need to retrieve the last
> > two positions,
>
> You are using the csv module so you don't need to split the lines, the
> csv reader has already done that for you. It generates a sequence of
> lists of strings, one per line.
>
> So you only need to do something like:
>
> results = []
> with open(filename) as f:
>     for line in csv.reader(f, delimiter='\t'):
>         if line[-1] == line[-2]:
>             results.append((line[2], line[3]))
>
> Let the library do the work.
>
> You can see what the reader is doing by inserting a print(line) call
> instead of the if statement. When using a module for the first time
> don't be afraid to use print to check the input/output values.
> It's better than guessing.
>
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.amazon.com/author/alan_gauld
> Follow my photo-blog on Flickr at:
> http://www.flickr.com/photos/alangauldphotos
>
>
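
The approach described in the reply above can be tried without the real
file. The sketch below assumes Python 3 and feeds csv.reader a small
in-memory sample (the sample values are invented). The pair is wrapped in a
tuple because list.append accepts exactly one argument.

```python
import csv
import io

# Hypothetical tab-separated sample; the last two columns match on the
# first and third lines only.
sample = io.StringIO(
    "a\tb\t9606\t9606\tx\tx\n"
    "a\tb\t9606\t10090\tx\ty\n"
    "a\tb\t4932\t4932\tz\tz\n"
)

results = []
for line in csv.reader(sample, delimiter="\t"):
    # Each line is already a list of strings; no manual splitting is needed.
    if line[-1] == line[-2]:
        results.append((line[2], line[3]))
```

Replacing the if statement with print(line) shows exactly what the reader
produces for each row.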


[Tutor] tab separated file handling

2018-06-13 Thread Niharika Jakhar
hi everyone!
I am working with a tsv file which has NA and empty values.
I have used the csv module to make a list of lists from the data.
I want to remove the rows that contain NA or empty values.

This is what I wrote:


#removes row with NA values
for rows in self.dataline:
    for i in rows:
        if i == 'NA' or i ==  '':
            self.dataline.remove(rows)


This is what the terminal says:

self.dataline.remove(rows)
ValueError: list.remove(x): x not in list


This is what the file looks like:

d23 87 9 NA 67 5 657 NA 76 8 87 78 90 800
er 21 8 908 9008 9 7 5 46 3 5 757 7 5
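
A note on the error above: the first sample row contains NA twice. The row
is removed at the first NA, the inner loop keeps going, and the second NA
triggers remove() again for a row that is no longer in the list, hence the
ValueError. Removing items from a list while looping over it also causes
rows to be skipped. A common alternative is to build a new list instead.
The sketch below assumes the data is a list of lists of strings;
drop_incomplete_rows and the sample rows are made up for illustration.

```python
def drop_incomplete_rows(rows, missing=("NA", "")):
    """Return a new list containing only the rows with no missing values."""
    return [row for row in rows if not any(cell in missing for cell in row)]


# Hypothetical sample modelled on the two lines shown in the message.
dataline = [
    ["d23", "87", "9", "NA", "67", "5", "657", "NA", "76"],
    ["er", "21", "8", "908", "9008", "9", "7", "5", "46"],
    ["x", "", "3"],
]
cleaned = drop_incomplete_rows(dataline)
```

Inside the class this would read something like
self.dataline = drop_incomplete_rows(self.dataline), leaving the original
list untouched while it is being examined.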