Re: [Tutor] threading for each line in a large file, and doing it right
On 25/04/18 03:26, Evuraan wrote:

> Please consider this situation :
> Each line in "massive_input.txt" need to be churned by the
> "time_intensive_stuff" function, so I am trying to background it.

What kind of "churning" is involved? If it's compute intensive, threading may not be the right answer, but if it's I/O bound then threading is probably OK.

> import threading
>
> def time_intensive_stuff(arg):
>     # some code, some_conditional
>     return (some_conditional)

What exactly do you mean by some_conditional? Is it some kind of big decision tree? Or an if/else network? Or is it dependent on external data (from where? a database? the network?) And you return it, but what is returned: an expression, a boolean result?

It's not clear what the nature of the task is, but that makes a big difference to how best to parallelise the work.

> with open("massive_input.txt") as fobj:
>     for i in fobj:
>         thread_thingy = threading.Thread(target=time_intensive_stuff, args=(i,))
>         thread_thingy.start()
>
> With above code, it still does not feel like it is backgrounding at
> scale,

Can you say why you feel that way? What measurements have you done? What system observations (CPU, memory, network etc.)? What did you expect to see and what did you see?

Also consider that processing a huge number of lines will generate a huge number of subprocesses or threads. There is an overhead to each thread, and your computer may not have enough resources to run them all efficiently. It may be better to batch the lines so each subprocess handles 10, or 50, or 100 lines (whatever makes sense). Put a loop into your time-intensive function to process the list of input values and return a list of outputs. Your external loop then needs an inner loop to create the batches. The number of entries in the batch can be parametrized so that you can experiment to find the most cost-effective size.

> I am sure there is a better pythonic way.

I suspect the issues are not Python specific but are more generally about parallelising large jobs.

> How do I achieve something like this bash snippet below in python:
>
> time_intensive_stuff_in_bash(){
>     # some code
>     :
> }
>
> for i in $(< massive_input.file); do
>     time_intensive_stuff_in_bash i & disown
>     :
> done

It's the same, except in bash you start a whole new process, so instead of using threading you use concurrent.

But did you try this in bash? Was it faster than using Python? I would expect the same issues of too many processes to arise in bash.

HTH
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
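A minimal sketch of the batching idea from the reply above, assuming the work is I/O bound so that threads are a reasonable fit. It uses the standard library's concurrent.futures.ThreadPoolExecutor to cap the number of threads instead of starting one per line. The body of time_intensive_stuff is a placeholder, and the names process_batch, batches and process_lines, as well as the default sizes, are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def time_intensive_stuff(line):
    # Placeholder for the real work; here we only measure the line.
    return len(line.strip())


def process_batch(lines):
    # One task handles a whole batch, spreading the per-task overhead
    # over several lines.
    return [time_intensive_stuff(line) for line in lines]


def batches(iterable, size):
    # Yield successive lists of at most `size` items from any iterable.
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def process_lines(lines, batch_size=50, max_workers=8):
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() hands back the batch results in input order.
        for batch_result in pool.map(process_batch, batches(lines, batch_size)):
            results.extend(batch_result)
    return results


if __name__ == "__main__":
    sample = ["first line\n", "second\n", "third one here\n"]
    print(process_lines(sample, batch_size=2, max_workers=2))
```

With a real file, process_lines(fobj) inside a "with open(...)" block would do. Note that Executor.map submits every batch up front, so for a truly massive file the whole input is read before results start to be collected; batch_size and max_workers are the knobs to experiment with.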
Re: [Tutor] threading for each line in a large file, and doing it right
On 25/04/18 09:27, Alan Gauld via Tutor wrote:

>> for i in $(< massive_input.file); do
>>     time_intensive_stuff_in_bash i & disown
>>     :
>> done
>
> Its the same except in bash you start a whole
> new process so instead of using threading you
> use concurrent.

concurrent -> multiprocessing

doh! sorry

--
Alan G
[Tutor] Calling the same thread multiple times
Hello,

I'm creating a progress bar for applications that can keep track of a download in progress. The progress bar will be on a separate thread and will communicate with the main thread using delegates. I've made the download and the progress bar part; all that remains is the connection between the two of them. For this purpose I tried to simplify the problem, but I can't seem to get it right. Here's what I've got so far...

import threading

def test():
    print(threading.current_thread())

for i in range(5):
    print(threading.current_thread())

t1 = threading.Thread(target = test)
t1.start()
t1.join()

This gives me the output:

<_MainThread(MainThread, started 139983023449408)>
<_MainThread(MainThread, started 139983023449408)>
<_MainThread(MainThread, started 139983023449408)>
<_MainThread(MainThread, started 139983023449408)>
<_MainThread(MainThread, started 139983023449408)>

What I need to do is to call the same thread (Thread-1) multiple times, and the call (of the test function) must be IN the for loop.

I've also tried something like this:

import threading
import queue

def test():
    print(threading.current_thread())
    i = q.get()
    print(i)

q = queue.Queue()
t1 = threading.Thread(target = test)
t1.start()

for i in range(5):
    print(threading.current_thread())
    q.put(i)

t1.join()

The result I'm getting is:

<_MainThread(MainThread, started 140383183029568)>
<_MainThread(MainThread, started 140383183029568)>
0
<_MainThread(MainThread, started 140383183029568)>
<_MainThread(MainThread, started 140383183029568)>
<_MainThread(MainThread, started 140383183029568)>

Any ideas on how to solve this?
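A possible way to get the effect asked for above, offered as a sketch and not as the only answer. A thread cannot be "called" again once its target function has returned, and in the second attempt test() takes a single item from the queue and then exits, which would explain why only 0 is printed. One common pattern is to keep the worker alive in a loop and send it work through the queue, with a sentinel value (None here, an arbitrary choice) to tell it to stop. The names worker, run and seen are made up for illustration.

```python
import queue
import threading


def worker(q, seen):
    # Runs entirely on the worker thread: keep taking items until the
    # sentinel None arrives.
    while True:
        item = q.get()
        if item is None:
            break
        seen.append((threading.current_thread().name, item))


def run(count=5):
    q = queue.Queue()
    seen = []
    t1 = threading.Thread(target=worker, args=(q, seen), name="Thread-1")
    t1.start()
    for i in range(count):
        q.put(i)      # the "call" happens inside the loop, via the queue
    q.put(None)       # tell the worker to finish
    t1.join()
    return seen


if __name__ == "__main__":
    for thread_name, value in run():
        print(thread_name, value)
```

Every item is handled by the same Thread-1, in the order it was put on the queue, while the main thread stays free between put() calls.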
[Tutor] I'm attempting to code the barbershop problem in OS except with 3 barbers instead of one. Can anyone help rewrite my Barber1 and Barber2 classes so it just uses the functions already defined i
class Barber: barberWorkingEvent = Event() def sleep(self): self.barberWorkingEvent.wait() def wakeUp(self): self.barberWorkingEvent.set() def cutHair(self, customer): #Set barber as busy self.barberWorkingEvent.clear() print '{0} is having a haircut from barber\n'.format(customer.name) HairCuttingTime = random.randrange(0, 5) time.sleep(HairCuttingTime) print '{0} is done\n'.format(customer.name) class Barber1: barberWorkingEvent = Event() def sleep(self): self.barberWorkingEvent.wait() def wakeUp(self): self.barberWorkingEvent.set() def cutHair(self, customer): #Set barber as busy self.barberWorkingEvent.clear() print '{0} is having a haircut from barber1\n'.format(customer.name) HairCuttingTime = random.randrange(0, 5) time.sleep(HairCuttingTime) print '{0} is done\n'.format(customer.name) class Barber2: barberWorkingEvent = Event() def sleep(self): self.barberWorkingEvent.wait() def wakeUp(self): self.barberWorkingEvent.set() def cutHair(self, customer): #Set barber as busy self.barberWorkingEvent.clear() print '{0} is having a haircut from barber1\n'.format(customer.name) HairCuttingTime = random.randrange(0, 5) time.sleep(HairCuttingTime) print '{0} is done\n'.format(customer.name)
Re: [Tutor] I'm attempting to code the barbershop problem...
Hi Michael and welcome.

In future, please leave the Subject line as a BRIEF summary, and put the description of your problem in the body of the email.

You said:

> I'm attempting to code the barbershop problem in OS except
> with 3 barbers instead of one. Can anyone help rewrite my Barber1 and
> Barber2 classes so it just uses the functions already defined in the
> original Barber class.

What's the barbershop problem?

Why do you need three classes?

On Wed, Apr 25, 2018 at 11:29:23AM +0100, Michael Solan wrote:

> class Barber: barberWorkingEvent = Event() def sleep(self):
> self.barberWorkingEvent.wait() def wakeUp(self):
> self.barberWorkingEvent.set() def cutHair(self, customer): #Set barber as
> busy self.barberWorkingEvent.clear() print '{0} is having a haircut from
> barber\n'.format(customer.name) HairCuttingTime = random.randrange(0, 5)
> time.sleep(HairCuttingTime) print '{0} is done\n'.format(customer.name)

The formatting here is completely messed up.

If you are posting using Gmail, you need to ensure that your email uses no formatting (no bold, no colours, no automatic indentation etc), or else Gmail will mangle the indentation of your code, as it appears to have done above.

My wild guess is that what you probably want is something like this:

import random
import time

class Barber(object):
    def __init__(self, name):
        self.workingEvent = Event()  # What is Event?
        self.name = name

    def sleep(self):
        self.workingEvent.wait()

    def wakeUp(self):
        self.workingEvent.set()

    def cutHair(self, customer):
        # Set this barber as busy.
        self.workingEvent.clear()
        template = '{0} is having a haircut from barber {1}\n'
        print template.format(customer.name, self.name)
        HairCuttingTime = random.randrange(0, 5)
        time.sleep(HairCuttingTime)
        print '{0} is done\n'.format(customer.name)

tony = Barber('Tony')
fred = Barber('Fred')
george = Barber('George')
# and then what?

Notice that we have *one* class and three instances of that class. I've given them individual names so they're easier to distinguish.

Please ensure you reply on the mailing list.

--
Steve
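For readers on Python 3, here is a sketch of the one-class, three-instances idea, under two assumptions that the thread does not confirm: that Event is threading.Event, and that a customer is any object with a name attribute (the Customer namedtuple below is a hypothetical stand-in). The max_seconds parameter is also an addition, so the haircut time can be shortened when experimenting.

```python
import random
import threading
import time
from collections import namedtuple

# Hypothetical stand-in: the original post never shows the customer class.
Customer = namedtuple("Customer", "name")


class Barber:
    def __init__(self, name):
        self.name = name
        # Assumption: Event is threading.Event. Creating it here gives
        # each barber a separate event; as a class attribute it would be
        # shared by every instance of the class.
        self.workingEvent = threading.Event()

    def sleep(self):
        # Block until somebody wakes this barber up.
        self.workingEvent.wait()

    def wakeUp(self):
        self.workingEvent.set()

    def cutHair(self, customer, max_seconds=5):
        # Mark this barber as busy.
        self.workingEvent.clear()
        print("{0} is having a haircut from barber {1}".format(customer.name, self.name))
        time.sleep(random.uniform(0, max_seconds))
        print("{0} is done".format(customer.name))


tony = Barber("Tony")
fred = Barber("Fred")
george = Barber("George")
```

Waking one barber, for example tony.wakeUp(), leaves the events of fred and george untouched. How customers are queued and handed to a free barber is the actual synchronization problem and is not covered by this sketch.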
Re: [Tutor] I'm attempting to code the barbershop problem...
> What's the barbershop problem?

A classic computer science puzzle, which is essentially a process synchronization problem.

It does help to spell out the problem you are trying to solve, however: we don't have the context the original poster is operating in.
[Tutor] Dict of Dict with lists
Hello everybody,

I'm coming from a Perl background and am trying to parse some Exim logfiles into a data structure of dictionaries. The regex and geoip part works fine, and I'd like to save the email address, the countries (from logins) and the count of logins.

The structure I'd like to have:

result = {
    'f...@bar.de': {
        'Countries': [DE,DK,UK]
        'IP': ['192.168.1.1','172.10.10.10']
        'Count': [12]
    }
    'b...@foo.de': {
        'Countries': [DE,SE,US]
        'IP': ['192.168.1.2','172.10.10.11']
        'Count': [23]
    }
}

I don't have a problem when I do these three separately like this with a one-dimensional dict (snippet):

result = defaultdict(list)

with open('/var/log/exim4/mainlog',encoding="latin-1") as logfile:
    for line in logfile:
        result = pattern.search(line)
        if (result):
            login_ip = result.group("login_ip")
            login_auth = result.group("login_auth")
            response = reader.city(login_ip)
            login_country = response.country.iso_code
            if login_auth in result and login_country in result[login_auth]:
                continue
            else:
                result[login_auth].append(login_country)
        else:
            continue

This checks if the login_country exists within the list of the specific login_auth key, adds it if it doesn't exist, and gives me the results I want. This also works for the IP addresses and the number of logins without any problems.

As I don't want to repeat these loops three times with three different data structures, I'd like to do this in one step. There are two main problems I don't understand right now:

1. How do I check whether a value exists within a list which is the value of a key, which is in turn the value of another key? What I'd like to do:

if login_auth in result and (login_country in result[login_auth][Countries])
    continue

This obviously does not work, and I am not quite sure how to address the values of 'Countries' in the right way. I'd like to check 'Key1:Key2:List' and don't know how to address this.

2. How do I append new values to these lists within the nested dict?
Re: [Tutor] Dict of Dict with lists
On 25/04/18 14:22, Kai Bojens wrote:

> The structure I'd like to have:
>
> result = {
>     'f...@bar.de': {
>         'Countries': [DE,DK,UK]
>         'IP': ['192.168.1.1','172.10.10.10']
>         'Count': [12]
>     }
> }
> ...
> for line in logfile:
>     result = pattern.search(line)

Doesn't this overwrite your data structure? I would strongly advise using another name.

>     if (result):
>         login_ip = result.group("login_ip")
>         login_auth = result.group("login_auth")
>         response = reader.city(login_ip)
>         login_country = response.country.iso_code
>         if login_auth in result and login_country in result[login_auth]:
>             continue
>         else:
>             result[login_auth].append(login_country)
>     else:
>         continue

> 1. How do I check if a value exists within a list which is the value of a key
> which is again a value of a key in my understanding exists? What I like to do:

dic = {'key1':{'key2':[...]}}

if my_value in dic['key1']['key2']:

> if login_auth in result and (login_country in result[login_auth][Countries])
> continue

Should work, provided 'Countries' is quoted as a string key and the if line ends with a colon.

> This obviously does not work and I am not quite sure how to address the values
> of 'Countries' in the right way. I'd like to check 'Key1:Key2:List' and don't
> know how to address this

It should work as you expect.

However, personally I'd use a class to define your data structure and just have a top-level dictionary holding instances of the class. Something like:

class Login:
    def __init__(self, countries, IPs, count):
        self.countries = countries
        self.IPs = IPs
        self.count = count

results = {'f...@bar.de': Login(['DE','DK','UK'],
                                ['192.168.1.1','172.10.10.10'],
                                [12])
          }

if auth in results and (myvalue in results[auth].countries):
    ...

BTW, should count really be a list?

> 2. How do I append new values to these lists within the nested dict?

Same as any other list, just use the append() method:

dic['key1']['key2'].append(value)

or with a class:

results[auth].IPs.append(value)

HTH
--
Alan G
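The membership test and the append discussed above can be combined in one pass over the log. The following is a small sketch, not the original poster's code: it assumes the three values have already been extracted from a log line, keeps the count as a plain integer instead of a one-element list, and uses made-up names (new_entry, record_login) and documentation-only addresses.

```python
from collections import defaultdict


def new_entry():
    # Default record created the first time a login name is seen.
    return {"Countries": [], "IP": [], "Count": 0}


def record_login(result, login_auth, login_country, login_ip):
    entry = result[login_auth]  # defaultdict creates the entry on first access
    if login_country not in entry["Countries"]:
        entry["Countries"].append(login_country)
    if login_ip not in entry["IP"]:
        entry["IP"].append(login_ip)
    entry["Count"] += 1


result = defaultdict(new_entry)
record_login(result, "user@example.org", "DE", "192.0.2.1")
record_login(result, "user@example.org", "DK", "192.0.2.1")
record_login(result, "user@example.org", "DE", "198.51.100.7")

print(dict(result))
```

Because the defaultdict builds the inner dictionary on demand, the "is login_auth already in result" check is no longer needed; only the membership tests on the inner lists remain.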
Re: [Tutor] File handling Tab separated files
Hi again,

When I print self.organismA inside the "for x in self.results:" loop (the commented-out #print (self.organismA) line), it shows what it is supposed to. But when I print it in the function below, it gives some garbage value. Kindly let me know what is wrong. :)

import functools
import csv
import time

start = time.time()

class BioGRIDReader:
    def __init__(self, filename):
        self.results = []
        self.organisms = {}
        i = 0
        with open(filename) as f:
            for line in csv.reader(f, delimiter = '\t'):
                i += 1
                if i>35:
                    self.results.append(line)
        #print (self.results)
        for x in self.results:
            self.organismA = x[2]
            self.organismB = x[3]
            self.temp = (x[2],)
            self.keys = self.temp
            self.values = [x[:]]
            self.organisms = dict(zip(self.keys, self.values))
            #print (self.organismA)
        #print (self.results[0:34])

    #omitted region

    def getMostAbundantTaxonIDs(self,n):
        #print (self.organismA)
        self.temp_ = 0
        self.number_of_interactions = []
        self.interaction_dict = {}
        for x in self.organismA:
            for value in self.organisms:
                if (x in value):
                    self.temp_ += 1
                    self.number_of_interactions.append(self.temp_)
        self.interaction_dict = dict(zip(self.organismA, self.number_of_interactions))

a = BioGRIDReader("BIOGRID-ALL-3.4.159.tab.txt")
a.getMostAbundantTaxonIDs(5)
end = time.time()
#print(end - start)

Thanking you in advance
Best Regards
NIHARIKA

On Fri, Apr 20, 2018 at 11:06 AM, Alan Gauld wrote:
>
> Use Reply-All or Reply-List to include the mailing list in replies.
>
> On 20/04/18 09:10, Niharika Jakhar wrote:
> > Hi
> >
> > I want to store the data of file into a data structure which has 11
> > objects per line , something like this:
> > 2354 somethin2 23nothing 23214.
> >
> >
> > so I was trying to split the lines using \n and storer each line in a
> > list so I have a list of 11 objects, then I need to retrieve the last
> > two position,
>
> You are using the csv module so you don't need to split the lines, the
> csv reader has already done that for you. It generates a sequence of
> lists, one per line.
>
> So you only need to do something like:
>
> results = []
> with open(filename) as f:
>     for line in csv.reader(f, delimiter='\t'):
>         if line[-1] == line[-2]:
>             results.append((line[2], line[3]))
>
> Let the library do the work.
>
> You can see what the reader is doing by inserting a print(line) call
> instead of the if statement. When using a module for the first time
> don't be afraid to use print to check the input/output values.
> Its better than guessing.
>
>
> --
> Alan G
> Author of the Learn to Program web site
> http://www.alan-g.me.uk/
> http://www.amazon.com/author/alan_gauld
> Follow my photo-blog on Flickr at:
> http://www.flickr.com/photos/alangauldphotos
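As far as the posted code shows, self.organismA is overwritten on every pass through the loop, so after __init__ it holds only the string from the last row, and iterating over it in getMostAbundantTaxonIDs walks through its individual characters. A sketch of one alternative follows, with several assumptions taken from the post and not verified against the real file: the first 35 lines are header, columns 2 and 3 hold the two organism IDs, and "most abundant" means the IDs that occur most often in those two columns. The function name and parameters are made up for illustration.

```python
import csv
import io
from collections import Counter


def most_abundant_taxon_ids(fileobj, n, header_lines=35, columns=(2, 3)):
    # Count how often each ID occurs in the chosen columns, skipping
    # the header lines at the top of the file.
    counts = Counter()
    reader = csv.reader(fileobj, delimiter="\t")
    for line_number, row in enumerate(reader, start=1):
        if line_number <= header_lines:
            continue
        for col in columns:
            counts[row[col]] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Tiny in-memory stand-in for the real tab-separated file.
    sample = io.StringIO(
        "h0\th1\th2\th3\n"
        "a\tb\t9606\t9606\n"
        "c\td\t9606\t10090\n"
    )
    print(most_abundant_taxon_ids(sample, 2, header_lines=1))
```

For the real file, the function would be called with an open file object, for example inside a "with open(filename, newline='') as f:" block.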
Re: [Tutor] (no subject)
On 12/31/1969 05:00 PM, wrote:

> Hello everybody,
> I'm coming from a Perl background and try to parse some Exim Logfiles into a
> data structure of dictionaries. The regex and geoip part works fine and I'd
> like to save the email adress, the countries (from logins) and the count of
> logins.
>
> The structure I'd like to have:
>
> result = {
>     'f...@bar.de': {
>         'Countries': [DE,DK,UK]
>         'IP': ['192.168.1.1','172.10.10.10']
>         'Count': [12]
>     }
>     'b...@foo.de': {
>         'Countries': [DE,SE,US]
>         'IP': ['192.168.1.2','172.10.10.11']
>         'Count': [23]
>     }
> }

I presume that's pseudo-code, since it's missing punctuation (commas between elements) and the country codes are not quoted.

>
> I don't have a problem when I do these three seperately like this with a one
> dimensonial dict (snippet):
>
> result = defaultdict(list)
>
> with open('/var/log/exim4/mainlog',encoding="latin-1") as logfile:
>     for line in logfile:
>         result = pattern.search(line)
>         if (result):
>             login_ip = result.group("login_ip")
>             login_auth = result.group("login_auth")
>             response = reader.city(login_ip)
>             login_country = response.country.iso_code
>             if login_auth in result and login_country in result[login_auth]:
>                 continue
>             else:
>                 result[login_auth].append(login_country)
>         else:
>             continue
>
> This checks if the login_country exists within the list of the specific
> login_auth key, adds them if they don't exist and gives me the results I want.
> This also works for the ip addresses and the number of logins without any
> problems.
>
> As I don't want to repeat these loops three times with three different data
> structures I'd like to do this in one step. There are two main problems I
> don't understand right now:
>
> 1. How do I check if a value exists within a list which is the value of a key
> which is again a value of a key in my understanding exists? What I like to do:
>
> if login_auth in result and (login_country in result[login_auth][Countries])
> continue

You don't actually need to check (there's a Python aphorism that goes something like "It's better to ask forgiveness than permission").

You can do:

try:
    result[login_auth]['Countries'].append(login_country)
except KeyError:
    # means there was no entry for login_auth
    # so add one here

That will happily add another instance of a country if it's already there, but there's no problem with going and cleaning the 'Countries' value later (one trick is to take that list, convert it to a set, then, if you want, convert it back to a list if you need unique values).

You're overloading the name result here, so this won't work literally: you default it outside the loop, then also set it to the regex answer... I assume you can figure out how to fix that up.
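The try/except suggestion above can be filled in as follows. This is a sketch under stated assumptions: result is a plain dict (with a defaultdict the KeyError would never be raised), the shape of a fresh entry is a guess based on the structure shown in the question, and the helper names add_country and unique_countries are made up.

```python
def add_country(result, login_auth, login_country):
    # "Ask forgiveness": try the append, create the entry only on failure.
    try:
        result[login_auth]["Countries"].append(login_country)
    except KeyError:
        # There was no entry for login_auth yet, so add one here.
        result[login_auth] = {"Countries": [login_country], "IP": [], "Count": 0}


def unique_countries(result, login_auth):
    # Clean-up step: set() drops duplicates, sorted() gives a stable list.
    return sorted(set(result[login_auth]["Countries"]))


result = {}
for country in ["DE", "DK", "DE", "UK", "DK"]:
    add_country(result, "user@example.org", country)

print(result["user@example.org"]["Countries"])
print(unique_countries(result, "user@example.org"))
```

The duplicates stay in the list until the clean-up step runs, which is the trade-off described in the reply: less checking while reading the log, one pass of tidying afterwards.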
Re: [Tutor] Dict of Dict with lists
On 25/04/2018 –– 18:35:30PM +0100, Alan Gauld via Tutor wrote:

> > ...
> > for line in logfile:
> >     result = pattern.search(line)

> Doesn't this overwrite your data structure?
> I would strongly advise using another name.

You are of course right. I accidentally shortened this name as I was trying to fit my code into the 80-character width of this mail. That was sloppy ;)

> However personally I'd use a class to define tyour data structure and
> just have a top leveldictionary holding instances of the class.

You are right (again). I hadn't thought of using classes, but that's exactly what they were invented for. Thanks for pointing that out.

Thanks for the help!
Re: [Tutor] (no subject)
On 25/04/2018 –– 12:19:28PM -0600, Mats Wichmann wrote:

> I presume that's pseudo-code, since it's missing punctuation (commas
> between elements) and the country codes are not quoted

Yes, that was just a short pseudo-code example of what I wanted to achieve.

> you don't actually need to check (there's a Python aphorism that goes
> something like "It's better to ask forgiveness than permission").

> You can do:

> try:
>     result[login_auth]['Countries'].append(login_country)
> except KeyError:
>     # means there was no entry for login_auth
>     # so add one here

I see. That'd be better indeed. The try/except concept is still rather new to me and I still have to get used to it.

Thanks for your hints! I'm sure that I can work with these suggestions ;)
[Tutor] Async TCP Server
Hi,

I've come up with an idea for a new protocol I want to implement in Python using 3.6 (or maybe 3.7 when that comes out), but I'm somewhat confused about how to do it in an async way.

The way I understand it is that you have a loop that waits for an incoming request and then calls a function/method asynchronously which handles the incoming request. While that is happening the main event loop is still listening for incoming connections.

Is that roughly correct?

The idea is to have a chat application that can at least handle a few hundred clients, if not more in the future. I'm planning on using Python because I am pretty up-to-date with it, but I've never written a network server before.

Also another quick question. Does Python support async database operations? I'm thinking of the psycopg2-binary database driver. That way I can offload the storage in the database while still handling incoming connections.

If I have misunderstood anything, any clarification would be much appreciated.

Simon.
Re: [Tutor] Async TCP Server
On 04/25/2018 05:14 PM, Simon Connah wrote:

> Hi,
>
> I've come up with an idea for a new protocol I want to implement in
> Python using 3.6 (or maybe 3.7 when that comes out), but I'm somewhat
> confused about how to do it in an async way.
>
> The way I understand it is that you have a loop that waits for an
> incoming request and then calls a function/method asynchronously which
> handles the incoming request. While that is happening the main event
> loop is still listening for incoming connections.
>
> Is that roughly correct?
>
> The idea is to have a chat application that can at least handle a few
> hundred clients if not more in the future. I'm planning on using
> Python because I am pretty up-to-date with it, but I've never written
> a network server before.
>
> Also another quick question. Does Python support async database
> operations? I'm thinking of the psycopg2-binary database driver. That
> way I can offload the storage in the database while still handling
> incoming connections.
>
> If I have misunderstood anything, any clarification would be much
> appreciated.
>
> Simon.

How does your idea differ from Twisted?
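The mental model in the question (one loop accepting connections, one handler per client running concurrently) roughly matches how the standard library's asyncio streams work. The following is a minimal sketch, not a chat protocol: it echoes each line back, uses asyncio.run and writer.wait_closed, which need Python 3.7 or later, and binds to port 0 so the operating system picks a free port. The names handle_client and demo and the "echo: " prefix are made up for illustration.

```python
import asyncio


async def handle_client(reader, writer):
    # One coroutine instance runs per connection. While it waits on
    # `await`, the event loop serves other clients and accepts new ones.
    while True:
        data = await reader.readline()
        if not data:          # empty bytes means the client closed
            break
        writer.write(b"echo: " + data)
        await writer.drain()
    writer.close()
    await writer.wait_closed()


async def demo():
    # Start the server on a free local port, then talk to it once.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()

    server.close()
    return reply


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

A blocking database call made inside handle_client would stall every client, so such calls are usually either made through an async-aware driver or pushed to a thread with the event loop's run_in_executor method; which of these fits psycopg2 best is left open here.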