Re: [Tutor] Huge list comprehension
take a look at numpy

and don't necessarily give us the whole code. it becomes too long without
purpose

Abdur-Rahmaan Janhangeer,
Mauritius
abdurrahmaanjanhangeer.wordpress.com

On 6 Jun 2017 03:26, "syed zaidi" wrote:

hi,

I would appreciate it if you could help me by suggesting a quick and
efficient strategy for comparing multiple lists with one principal list.

I have about 125 lists containing about 100,000 numerical entries in each.
My principal list contains about 6 million entries.

I want to compare each small list with the main list and append yes/no or
0/1 in each new list corresponding to each of the 125 lists.

The program is working but it takes ages to process huge files.
Can someone please tell me how I can make this process fast? Right now it
takes around 2 weeks to complete this task.

The code I have written, which is working, is as under:

    sample_name = []
    main_op_list,principal_list = [],[]
    dictionary = {}
    with open("C:/Users/INVINCIBLE/Desktop/T2D_ALL_blastout_batch.txt", 'r') as f:
        reader = csv.reader(f, dialect = 'excel', delimiter='\t')
        list2 = filter(None, reader)
        for i in range(len(list2)):
            col1 = list2[i][0]
            operon = list2[i][1]
            main_op_list.append(operon)
            col1 = col1.strip().split("_")
            sample_name = col1[0]
            if dictionary.get(sample_name):
                dictionary[sample_name].append(operon)
            else:
                dictionary[sample_name] = []
                dictionary[sample_name].append(operon)
    locals().update(dictionary) ## converts dictionary keys to variables
    ##print DLF004
    dict_values = dictionary.values()
    dict_keys = dictionary.keys()
    print dict_keys
    print len(dict_keys)
    main_op_list_np = np.array(main_op_list)

    DLF002_1,DLF004_1,DLF005_1,DLF006_1,DLF007_1,DLF008_1,
    DLF009_1,DLF010_1,DLF012_1,DLF013_1,DLF014_1,DLM001_1,
    DLM002_1,DLM003_1,DLM004_1,DLM005_1,DLM006_1,DLM009_1,
    DLM011_1,DLM012_1,DLM018_1,DOF002_1,DOF003_1 =[],[],[],[],[],[],[],[],[],[]
    ,[],[],[],[],[],[],[],[],[],[],[],[],[]
    DOF004_1,DOF006_1,DOF007_1,DOF008_1,DOF009_1,DOF010_1,
    DOF011_1,DOF012_1,DOF013_1,DOF014_1,DOM001_1,DOM003_1,
    DOM005_1,DOM008_1,DOM010_1,DOM012_1,DOM013_1,DOM014_1,
    DOM015_1,DOM016_1,DOM017_1,DOM018_1,DOM019_1 =[],[],[],[],[],[],[],[],[],[]
    ,[],[],[],[],[],[],[],[],[],[],[],[],[]
    DOM020_1,DOM021_1,DOM022_1,DOM023_1,DOM024_1,DOM025_1,DOM026_1 = [],[],[],[],[],[],[]
    NLF001_1,NLF002_1,NLF005_1,NLF006_1,NLF007_1,NLF008_1,
    NLF009_1,NLF010_1,NLF011_1,NLF012_1,NLF013_1,NLF014_1,
    NLF015_1,NLM001_1,NLM002_1,NLM003_1,NLM004_1,NLM005_1,
    NLM006_1,NLM007_1,NLM008_1,NLM009_1,NLM010_1 =[],[],[],[],[],[],[],[],[],[]
    ,[],[],[],[],[],[],[],[],[],[],[],[],[]
    NLM015_1,NLM016_1,NLM017_1,NLM021_1,NLM022_1,NLM023_1,
    NLM024_1,NLM025_1,NLM026_1,NLM027_1,NLM028_1,NLM029_1,
    NLM031_1,NLM032_1,NOF001_1,NOF002_1,NOF004_1,NOF005_1,
    NOF006_1,NOF007_1,NOF008_1,NOF009_1,NOF010_1 =[],[],[],[],[],[],[],[],[],[]
    ,[],[],[],[],[],[],[],[],[],[],[],[],[]
    NOF011_1,NOF012_1,NOF013_1,NOF014_1,NOM001_1,NOM002_1,
    NOM004_1,NOM005_1,NOM007_1,NOM008_1,NOM009_1,NOM010_1,
    NOM012_1,NOM013_1,NOM015_1,NOM016_1,NOM017_1,NOM018_1,
    NOM019_1,NOM020_1,NOM022_1,NOM023_1,NOM025_1 =[],[],[],[],[],[],[],[],[],[]
    ,[],[],[],[],[],[],[],[],[],[],[],[],[]
    NOM026_1,NOM027_1,NOM028_1,NOM029_1 = [],[],[],[]

    for i in main_op_list_np:
        if i in DLF002: DLF002_1.append('1')
        else:DLF002_1.append('0')
        if i in DLF004: DLF004_1.append('1')
        else:DLF004_1.append('0')
        if i in DLF005: DLF005_1.append('1')
        else:DLF005_1.append('0')
        if i in DLF006: DLF006_1.append('1')
        else:DLF006_1.append('0')
        if i in DLF007: DLF007_1.append('1')
        else:DLF007_1.append('0')
        if i in DLF008: DLF008_1.append('1')
        else:DLF008_1.append('0')
    ##  if main_op_list[i] in DLF009: DLF009_1.append('1')
    ##  else:DLF009_1.append('0')
        if i in DLF010: DLF010_1.append('1')
        else:DLF010_1.append('0')
        if i in DLF012: DLF012_1.append('1')
        else:DLF012_1.append('0')
        if i in DLF013: DLF013_1.append('1')
        else:DLF013_1.append('0')
        if i in DLF014: DLF014_1.append('1')
        else:DLF014_1.append('0')
        if i in DLM001: DLM001_1.append('1')
        else:DLM001_1.append('0')
        if i in DLM002: DLM002_1.append('1')
        else:DLM002_1.append('0')
        if i in DLM003: DLM003_1.append('1')
        else:DLM003_1.append('0')
        if i in DLM004: DLM004_1.append('1')
        else:DLM004_1.append('0')
        if i in DLM005: DLM005_1.append('1')
        else:DLM005_1.append('0')
        if i in DLM006: DLM006_1.append('1')
        else:DLM006_1.append('0')
        if i in DLM009: DLM009_1.append('1')
        else:DLM009_1.append('0')
        if i in DLM011: DLM011_1.append('1')
        else:DLM011_1.append('0')
        if i in DLM012: DLM012_1.append('1')
        else:DLM012_1.append('0')
        if i in DLM018: DLM018_1.append('1')
        else:DLM018_1.append('0')
        if i in DOF002: DOF002_1.append('1')
        else:DOF002_1.append('0')
        if i in DOF003: DOF003_1.append('1')
        else:DOF003_1.append('0')
        if i in DOF004: DOF004_1.append('1')
        else:DOF004_1.app
Re: [Tutor] Huge list comprehension
On 10/06/17 08:35, Abdur-Rahmaan Janhangeer wrote:
> take a look at numpy

It seems he already has, np.array is in his code.
It's just the imports that are missing, I suspect.

> and don't necessarily give us the whole code. it becomes too long without
> purpose

Yes, although in this case it does serve to highlight the problems
with his approach - as highlighted by Peter.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
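
For readers following the thread, here is a minimal sketch of one way to restructure the posted program. It is not necessarily what Peter proposed. It assumes the goal is a '1'/'0' flag per operon per sample, keeps the per-sample operons in a dictionary of sets instead of creating variables with locals().update(), and so turns each membership test into a hash lookup rather than a scan of a long list. The function names and the tiny in-memory rows are hypothetical stand-ins for the tab-separated file, and the sketch is written for Python 3.

```python
from collections import defaultdict


def group_by_sample(rows):
    """Return (operons in file order, {sample name: set of operons})."""
    main_op_list = []
    by_sample = defaultdict(set)
    for row in rows:
        if not row:  # skip empty rows, like filter(None, reader)
            continue
        sample = row[0].strip().split("_")[0]
        operon = row[1]
        main_op_list.append(operon)
        by_sample[sample].add(operon)
    return main_op_list, dict(by_sample)


def presence_flags(main_op_list, by_sample):
    """Return {sample name: list of '1'/'0', one per entry of main_op_list}."""
    return {
        sample: ['1' if op in operons else '0' for op in main_op_list]
        for sample, operons in by_sample.items()
    }


# Hypothetical rows standing in for the rows read by csv.reader.
rows = [
    ["DLF002_a", "op1"],
    ["DLF004_b", "op2"],
    [],
    ["DLF002_c", "op3"],
]
main_ops, samples = group_by_sample(rows)
flags = presence_flags(main_ops, samples)
print(flags["DLF002"])  # ['1', '0', '1']
print(flags["DLF004"])  # ['0', '1', '0']
```

With real data the rows would come from csv.reader(f, delimiter='\t'). The dictionary keyed by sample name also removes the need to type out one result list per sample.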
[Tutor] string reversal using [::-1]
Question: Why does "123"[::-1] result in "321"?

My thinking is that [::-1] is the same as [0:3:-1], that the empty places
default to the start and end index of the string object.
So, if we start from index 0 and decrement the index by 1 until we reach 3,
how many indices should we get? I think we should get an infinite number of
indices (0, -1, -2, -3, ...).

This is my confusion.
I hope my question is clear.

Thanks,
Vikas
Re: [Tutor] Python - help with something most essential
It's really awkward the way you're using Counter here... you're making new
instances in every lambda (which is not great for memory usage), and then
not actually using the Counter functionality:

    return sum(1 for _ in filter(lambda x: Counter(word) == Counter(x.strip()), fileContent))

(the whole point of the Counter() is to get sums, you don't need to do any
of this !!)

I'm not sure that they cared about how you used file.readlines(), I think
the memory comment was a hint about instantiating Counter()s

anyhow, all of this can be much much simpler:

"""
sortedword = sorted(inputWord) # a list of the letters in the word, in alphabetical order
count = 0
with open(filename) as f:
    for word in f.read().split(' '): # iterate over every word in the file
        if sorted(word) == sortedword:
            count +=1
print count
"""

which could in turn probably be written as a one liner.

So even though you cleaned the code up a bit, it's still quite a bit more
complicated than it needs to be, which makes it seem like your fundamentals
are not great either!

On Tue, Jun 6, 2017 at 2:31 AM, Peter Otten <__pete...@web.de> wrote:

> Schtvveer Schvrveve wrote:
>
> > I need someone's help. I am not proficient in Python and I wish to
> > understand something. I was in a job pre-screening process where I was
> > asked to solve a simple problem.
> >
> > The problem was supposed to be solved in Python and it was supposed to
> > take two arguments: filename and word. The program reads the file which is
> > a .txt file containing a bunch of words and counts how many of those words
> > in the file are anagrams of the argument.
> >
> > First I concocted this solution:
> >
> > import sys
> > from collections import Counter
> >
> > def main(args):
> >     filename = args[1]
> >     word = args[2]
> >     print countAnagrams(word, filename)
> >
> > def countAnagrams(word, filename):
> >
> >     fileContent = readFile(filename)
> >
> >     counter = Counter(word)
> >     num_of_anagrams = 0
> >
> >     for i in range(0, len(fileContent)):
> >         if counter == Counter(fileContent[i]):
> >             num_of_anagrams += 1
> >
> >     return num_of_anagrams
> >
> > def readFile(filename):
> >
> >     with open(filename) as f:
> >         content = f.readlines()
> >
> >     content = [x.strip() for x in content]
> >
> >     return content
> >
> > if __name__ == '__main__':
> >     main(sys.argv)
> >
> > Very quickly I received this comment:
> >
> > "Can you adjust your solution a bit so you less loops (as little as
> > possible) and also reduce the memory usage footprint of you program?"
> >
> > I tried to rework the methods into this:
> >
> > def countAnagrams(word, filename):
> >
> >     fileContent = readFile(filename)
> >
> >     return sum(1 for _ in filter(lambda x: Counter(word) ==
> >         Counter(x.strip()), fileContent))
> >
> > def readFile(filename):
> >
> >     with open(filename) as f:
> >         content = f.readlines()
> >
> >     return content
> >
> > And I was rejected. I just wish to understand what I could have done for
> > this to be better?
> >
> > I am a Python beginner, so I'm sure there are things I don't know, but I
> > was a bit surprised at the abruptness of the rejection and I'm worried I'm
> > doing something profoundly wrong.
>
> for i in range(0, len(stuff)):
>     ...
>
> instead of
>
> for item in stuff:
>     ...
>
> and
>
> content = file.readlines() # read the whole file into memory
> process(content)
>
> are pretty much the most obvious indicators that you are a total newbie in
> Python. Looks like they weren't willing to give you the time to iron that
> out on the job even though you knew about lambda, Counter, list
> comprehensions and generator expressions which are not newbie stuff.
>
> When upon their hint you did not address the root cause of the unbounded
> memory consumption they might have come to the conclusion that you were
> reproducing snippets you picked up somewhere and thus were cheating.
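
The two points raised in this exchange (iterate over items directly, and avoid reading the whole file into memory) can be sketched together as follows. This is only an illustration, not the solution the interviewer had in mind. It assumes one word per line, as the original readFile() with strip() suggests, and it is written for Python 3. The name count_anagrams and the sample words are made up, and io.StringIO stands in for an opened file.

```python
import io


def count_anagrams(word, lines):
    """Count lines whose letters are a rearrangement of `word`."""
    target = sorted(word)
    # Iterating over a file object yields one line at a time, so memory
    # use does not grow with the size of the file.
    return sum(1 for line in lines if sorted(line.strip()) == target)


# io.StringIO stands in for open(filename); the words are made up.
sample = io.StringIO("listen\nsilent\nenlist\ngoogle\ninlets\n")
print(count_anagrams("listen", sample))  # 4
```

With a real file the call would be wrapped in "with open(filename) as f:" and f passed as the second argument. Like the Counter comparison in the original post, this counts the word itself when it appears in the file.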
Re: [Tutor] string reversal using [::-1]
On 10/06/17 17:39, Vikas YADAV wrote:
> Question: Why does "123"[::-1] result in "321"?
>
> MY thinking is [::-1] is same as [0:3:-1], that the empty places defaults to
> start and end index of the string object.

Did you try that? You may be surprised.
The wonderful thing about the >>> prompt is that it's
as quick to try it out as to guess.

Look in the archives for a recent post by Steven (29/5,
"Counting a string backwards") that explains slicing,
it may help.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
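
For anyone who wants to run the experiment suggested above, here is a short sketch (Python 3 syntax). It uses slice.indices(), a standard method that reports the concrete start, stop and step Python substitutes for omitted slice bounds on a sequence of a given length. The variable names are only illustrative.

```python
s = "123"

# With a negative step, omitted bounds do not mean 0 and len(s).
# slice.indices() reports the concrete (start, stop, step) Python uses.
start, stop, step = slice(None, None, -1).indices(len(s))
print(start, stop, step)  # 2 -1 -1

# Walking those indices by hand gives the reversed string.
# Here -1 is a stopping bound for range(), not "the last character".
print("".join(s[i] for i in range(start, stop, step)))  # 321

print(repr(s[0:3:-1]))  # '' because counting down from 0 never reaches 3
print(s[::-1])          # 321
```

So [::-1] starts at the last index and stops just before the beginning, while [0:3:-1] yields an empty string rather than an endless run of indices.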