[Tutor] File handling Tab separated files
Hi,

I want to store a file from the BioGRID database (tab separated file, big data) in a data structure and then print the objects. I prefer lists; please let me know if another structure would be better.

Here's my code:

    class BioGRIDReader:
        def __init__(self, filename):
            with open('filename', 'r') as file_:
                read_data = f.read()
                for i in file_ :
                    read_data = (i.split('\t'))
                    return (objects[:100])

    a = BioGRIDReader
    print (a.__init__(test_biogrid.txt))

Here's what the terminal says:

    Traceback (most recent call last):
      File "./BioGRIDReader.py", line 23, in <module>
        print (a.__init__(test_biogrid.txt))
    NameError: name 'test_biogrid' is not defined

The file named test_biogrid.txt does exist in the same folder as this program. I am unable to go further with this code. Kindly help me out.

Thanks and regards
NIHARIKA
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
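Note on the code above: the NameError comes from writing test_biogrid.txt without quotes, so Python looks for a variable called test_biogrid and then for its attribute txt. A few other things would fail next: open('filename') opens a file literally named "filename" instead of using the parameter, f is never defined (the handle is called file_), objects is never assigned, __init__ cannot usefully return a value, and the class is never instantiated (no parentheses after BioGRIDReader). What follows is only a minimal sketch of one way to address these, assuming Python 3. The attribute name rows and the small demo file are made up for illustration and are not real BioGRID data.

```python
import os
import tempfile


class BioGRIDReader:
    """Store each line of a tab-separated file as a list of strings."""

    def __init__(self, filename):
        self.rows = []
        # filename is the parameter here, so it must not be quoted.
        with open(filename, 'r') as file_:
            # Iterate line by line instead of calling read() first;
            # read() would consume the whole file and leave nothing to loop over.
            for line in file_:
                self.rows.append(line.rstrip('\n').split('\t'))


# Demo with a tiny made-up file (two lines, three columns).
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('a\tb\tc\n1\t2\t3\n')
    path = tmp.name

reader = BioGRIDReader(path)   # calling the class runs __init__ for you
print(reader.rows[:100])       # slicing is safe even with fewer than 100 rows
os.remove(path)
```

With a real file you would pass the name as a string, for example BioGRIDReader('test_biogrid.txt').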
Re: [Tutor] File handling Tab separated files
Hi again,

I tried rewriting the code following all your advice (I hope I have covered all of it). I have extended the code a little to store the data in the form of lists, and I am trying to access it. I also changed the file name to BioGRID.txt.

Here's what I wrote (please ignore the indentation; there was no such error, it's just the e-mail formatting):

    import csv

    class BioGRIDReader:
        def __init__(self, filename):
            with open('filename', 'rb') as f:
                self.reader = csv.reader(f, delimiter='\t')
                self.object_ = self.reader.split['\n']
                for row in range(len(object_)):
                    for r in range(len(row)):
                        if (r[-1] == r[-2]):
                            return (r[2], r[3]) #returns pair of taxon ids

    a = BioGRIDReader('BioGRID.txt')
    print(a.object[:100])

Here's what the interpreter says:

    Traceback (most recent call last):
      File "./BioGRIDReader.py", line 17, in <module>
        a = BioGRIDReader('BioGRID.txt')
      File "./BioGRIDReader.py", line 4, in __init__
        with open('filename', 'rb') as f:
    IOError: [Errno 2] No such file or directory: 'filename'

I am extremely sorry if I have repeated the same mistake again, but I did what I understood.

Thanks again for the links. I am not allowed to use other packages and have to strictly use the standard Python library. But thanks anyway, it will be of great help in the future. :)

On Thu, Apr 19, 2018 at 4:50 PM, Mats Wichmann wrote:

> On 04/19/2018 07:57 AM, Wolfgang Maier wrote:
> > On 04/19/2018 10:45 AM, Niharika Jakhar wrote:
> >> Hi
> >> I want to store a file from BioGRID database (tab separated file, big
> >> data) into a data structure(I prefer lists, please let me know if
> >> another would be better) and I am trying to print the objects.
> >> Here’s my code:
> >> class BioGRIDReader:
> >>     def __init__(self, filename):
> >>         with open('filename', 'r') as file_:
> >>             read_data = f.read()
> >>             for i in file_ :
> >>                 read_data = (i.split('\t'))
> >>                 return (objects[:100])
> >>
> >> a = BioGRIDReader
> >> print (a.__init__(test_biogrid.txt))
> >>
> >
> > In addition to your immediate problem, which Steven explained already,
> > you will run into more issues with the posted code:
>
> In addition to this low level advice, let me observe that whenever the
> term "big data" is tossed into the discussion, you want to consider
> whether reading it all into Python's memory into a "simple" data
> structure in one go is what you want to do. You may want to look into
> the Pandas project (possibly after spending a little more time becoming
> comfortable with Python itself first):
>
> https://pandas.pydata.org/
>
> Pandas has its own file handling code (particularly, a read_csv
> function) which might end up being useful.
>
> Also quite by chance, I happen to know there's an existing project to
> interact with the BioGRID web service, have no idea if that would be a
> match for any of your needs. A quick google to refind it:
>
> https://github.com/arvkevi/biogridpy
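Note on the second attempt: the IOError is raised because open('filename', 'rb') still asks for a file literally called "filename"; the parameter has to be used without quotes. Also, a csv.reader object has no split method, since it already yields one list of fields per line. Regarding the "big data" remark, the standard library alone is enough to avoid loading everything at once. The following is only a sketch of that idea, assuming Python 3 (where the file is opened in text mode with newline=''). The function name iter_rows and the demo values are invented for illustration.

```python
import csv
import itertools
import os
import tempfile


def iter_rows(filename):
    """Yield one row (a list of strings) at a time from a tab-separated file."""
    # filename is the variable passed in, not the string 'filename'.
    with open(filename, newline='') as f:
        for row in csv.reader(f, delimiter='\t'):
            yield row


# Demo with a small made-up file; the values are not real BioGRID records.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False, newline='') as tmp:
    tmp.write('1\tx\t9606\t9606\n2\ty\t9606\t10090\n3\tz\t4932\t4932\n')
    path = tmp.name

# Take at most the first 100 rows without building the full list in memory.
first_rows = list(itertools.islice(iter_rows(path), 100))
print(first_rows)
os.remove(path)
```

Because iter_rows is a generator, a loop such as "for row in iter_rows(path):" holds only one row in memory at a time, which may matter for a file of this size.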
Re: [Tutor] File handling Tab separated files
Hi again,

When I print self.organismA inside the "for x in self.results:" loop, it shows what it is supposed to. But when I print it in the function below, it gives some garbage value. Kindly let me know what is wrong. :)

    import functools
    import csv
    import time
    start =time.time()

    class BioGRIDReader:
        def __init__(self, filename):
            self.results = []
            self.organisms = {}
            i = 0
            with open(filename) as f:
                for line in csv.reader(f, delimiter = '\t'):
                    i += 1
                    if i>35:
                        self.results.append(line)
            #print (self.results)
            for x in self.results:
                self.organismA = x[2]
                self.organismB = x[3]
                self.temp = (x[2],)
                self.keys = self.temp
                self.values = [x[:]]
                self.organisms = dict(zip(self.keys, self.values))
                #print (self.organismA)
            #print (self.results[0:34])

        #omitted region

        def getMostAbundantTaxonIDs(self,n):
            #print (self.organismA)
            self.temp_ = 0
            self.number_of_interactions = []
            self.interaction_dict = {}
            for x in self.organismA:
                for value in self.organisms:
                    if (x in value):
                        self.temp_ += 1
                        self.number_of_interactions.append(self.temp_)
            self.interaction_dict = dict(zip(self.organismA, self.number_of_interactions))

    a = BioGRIDReader("BIOGRID-ALL-3.4.159.tab.txt")
    a.getMostAbundantTaxonIDs(5)
    end = time.time()
    #print(end - start)

Thanking you in advance
Best Regards
NIHARIKA

On Fri, Apr 20, 2018 at 11:06 AM, Alan Gauld wrote:

> Use Reply-All or Reply-List to include the mailing list in replies.
>
> On 20/04/18 09:10, Niharika Jakhar wrote:
> > Hi
> >
> > I want to store the data of file into a data structure which has 11
> > objects per line , something like this:
> > 2354 somethin2 23nothing 23214.
> >
> > so I was trying to split the lines using \n and storer each line in a
> > list so I have a list of 11 objects, then I need to retrieve the last
> > two position,
>
> You are using the csv module so you don't need to split the lines, the
> csv reader has already done that for you. It generates a sequence of
> lists, one per line.
>
> So you only need to do something like:
>
>     results = []
>     with open(filename) as f:
>         for line in csv.reader(f, delimiter='\t'):
>             if line[-1] == line[-2]:
>                 results.append((line[2], line[3]))
>
> Let the library do the work.
>
> You can see what the reader is doing by inserting a print(line) call
> instead of the if statement. When using a module for the first time
> don't be afraid to use print to check the input/output values.
> It's better than guessing.
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.amazon.com/author/alan_gauld
> Follow my photo-blog on Flickr at:
> http://www.flickr.com/photos/alangauldphotos
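Note on the "garbage value": self.organismA is reassigned on every pass through the loop in __init__, so once the loop ends it holds only the value from the last row, and "for x in self.organismA" then walks over the characters of that one string. The same applies to self.organisms, which is rebuilt from a single row each time. One possible way to count interactions per taxon ID is collections.Counter from the standard library. This is just a sketch under assumptions: columns 2 and 3 are taken to hold the two taxon IDs (as in the posted code), an interaction is counted once per distinct organism it involves, and the function name and demo rows are made up.

```python
from collections import Counter


def most_abundant_taxon_ids(rows, n):
    """Return the n taxon IDs that take part in the most interactions."""
    counts = Counter()
    for row in rows:
        # A set means a self-interaction (same ID twice) is counted once.
        counts.update({row[2], row[3]})
    return [taxon for taxon, _ in counts.most_common(n)]


# Made-up rows in the shape the csv reader produces (lists of strings).
rows = [
    ['i1', 'x', '9606', '9606'],
    ['i2', 'y', '9606', '10090'],
    ['i3', 'z', '4932', '4932'],
    ['i4', 'w', '9606', '4932'],
]
top_two = most_abundant_taxon_ids(rows, 2)
print(top_two)
```

Inside the class, the same function body could serve as getMostAbundantTaxonIDs by looping over self.results instead of a rows argument.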
[Tutor] tab separated file handling
Hi everyone!

I am working with a TSV file which has NA and empty values. I have used the csv package to make a list of lists of the data. I want to remove NA and empty values. This is what I wrote:

    #removes row with NA values
    for rows in self.dataline:
        for i in rows:
            if i == 'NA' or i == '':
                self.dataline.remove(rows)

This is what the terminal says:

    self.dataline.remove(rows)
    ValueError: list.remove(x): x not in list

This is what the file looks like:

    d23 87 9 NA 67 5 657 NA 76 8 87 78 90 800 er 21 8 908 9008 9 7 5 46 3 5 757 7 5
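Note on the ValueError: the inner loop keeps running after a row has been removed, so a row containing two NA (or empty) cells triggers remove() a second time, and by then the row is no longer in the list. Removing items from a list while iterating over it also makes the loop skip the following row. A common alternative is to build a new list that keeps only the complete rows. The sketch below assumes the data is a list of lists of strings, as produced by the csv module; the function name and sample rows are made up.

```python
def drop_incomplete_rows(rows, missing=('NA', '')):
    """Return a new list with only the rows that have no missing cell."""
    return [row for row in rows
            if not any(cell in missing for cell in row)]


# Made-up sample: the first and third rows each contain a missing value.
dataline = [
    ['d23', '87', '9', 'NA'],
    ['76', '8', '87', '78'],
    ['er', '', '8', '908'],
]
cleaned = drop_incomplete_rows(dataline)
print(cleaned)
```

In the class this could be used as self.dataline = drop_incomplete_rows(self.dataline), which replaces the list in one step instead of mutating it during the loop.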