Re: [Tutor] threading for each line in a large file, and doing it right

2018-04-25 Thread Alan Gauld via Tutor
On 25/04/18 03:26, Evuraan wrote:

> Please consider this situation :
> Each line in "massive_input.txt" need to be churned by the
> "time_intensive_stuff" function, so I am trying to background it.

What kind of "churning" is involved?
If it's compute-intensive, threading may not be the right
answer (in CPython the global interpreter lock lets only one
thread execute Python bytecode at a time), but if it's
I/O-bound then threading is probably OK.

> import threading
> 
> def time_intensive_stuff(arg):
>     # some code, some_conditional
>     return (some_conditional)

What exactly do you mean by some_conditional?
Is it some kind of big decision tree? Or if/else network?
Or is it dependent on external data
(from where? a database? network?)

And you return it - but what is returned?
 - an expression, a boolean result?

It's not clear what the nature of the task is, but that
makes a big difference to how best to parallelise the work.

> with open("massive_input.txt") as fobj:
>     for i in fobj:
>         thread_thingy = threading.Thread(target=time_intensive_stuff, args=(i,))
>         thread_thingy.start()
> 
> With above code, it still does not feel like it is backgrounding at
> scale,  

Can you say why you feel that way?
What measurements have you done?
What system observations (CPU, memory, network etc.)?
What did you expect to see and what did you see?

Also consider that processing a huge number of lines
will generate a huge number of subprocesses or
threads. There is an overhead to each thread and
your computer may not have enough resources to
run them all efficiently.

It may be better to batch the lines so each subprocess
handles 10, or 50 or 100 lines (whatever makes sense).
Put a loop into your time intensive function to process
the list of input values and return a list of outputs.

And your external loop needs an inner loop to create
the batches. The number of entries in the batch can
be parametrized so that you can experiment to find
the most cost-effective size.
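
A minimal sketch of that batching idea, assuming the work is I/O-bound. It uses a thread pool from concurrent.futures, which is a swap-in and not what the original code used. time_intensive_stuff is a hypothetical stand-in for the real work, and batches, process_batch, process_lines, batch_size and max_workers are names and values made up for illustration:

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def time_intensive_stuff(line):
    # Hypothetical stand-in for the real per-line work.
    return len(line.strip())


def process_batch(lines):
    # Each task handles a whole batch and returns a list of outputs.
    return [time_intensive_stuff(line) for line in lines]


def batches(iterable, size):
    # Yield successive lists of at most `size` items.
    iterator = iter(iterable)
    while True:
        batch = list(itertools.islice(iterator, size))
        if not batch:
            return
        yield batch


def process_lines(lines, batch_size=50, max_workers=8):
    # A fixed number of worker threads, however many lines there are.
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns the batch results in input order.
        for batch_result in pool.map(process_batch, batches(lines, batch_size)):
            results.extend(batch_result)
    return results
```

It could be called as process_lines(fobj) inside the "with open(...)" block. Note that pool.map() submits every batch up front, so the whole input ends up queued in memory; for a truly massive file the submissions may need throttling as well.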

> I am sure there is a better pythonic way.

I suspect the issues are not Python specific but
are more generally about parallelising large jobs.

> How do I achieve something like this bash snippet below in python:
> 
> time_intensive_stuff_in_bash(){
>     # some code
>     :
> }
> 
> for i in $(< massive_input.file); do
>     time_intensive_stuff_in_bash i & disown
>     :
> done

It's the same except in bash you start a whole
new process so instead of using threading you
use concurrent. But did you try this in bash?
Was it faster than using Python? I would expect
the same issues of too many processes to arise
in bash.

HTH
-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos


___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] threading for each line in a large file, and doing it right

2018-04-25 Thread Alan Gauld via Tutor
On 25/04/18 09:27, Alan Gauld via Tutor wrote:

>> for i in $(< massive_input.file); do
>> time_intensive_stuff_in_bash i & disown
>> :
>> done
> 
> It's the same except in bash you start a whole
> new process so instead of using threading you
> use concurrent. 

concurrent -> multiprocessing
doh!

sorry


[Tutor] Calling the same thread multiple times

2018-04-25 Thread Vagelis
Hello,

I'm creating a progress bar for applications that can keep track of a
download in progress. The progress bar will be on a separate thread and
will communicate with the main thread using delegates.
I've made the download and the progress bar parts; all that remains is the
connection between the two of them.

For this purpose I tried to simplify the problem, but I can't seem to
get it right.

Here's what I've got so far...

import threading

def test():
    print(threading.current_thread())

for i in range(5):

    print(threading.current_thread())

    t1 = threading.Thread(target = test)

    t1.start()
    t1.join()


This gives me the output:

<_MainThread(MainThread, started 139983023449408)>

<_MainThread(MainThread, started 139983023449408)>

<_MainThread(MainThread, started 139983023449408)>

<_MainThread(MainThread, started 139983023449408)>

<_MainThread(MainThread, started 139983023449408)>


What I need to do is to call the same thread (Thread-1) multiple times,
and the call (of the test function) must be IN the for loop.

I've also tried something like this:

import threading
import queue

def test():
    print(threading.current_thread())
    i = q.get()
    print(i)

q = queue.Queue()
t1 = threading.Thread(target = test)
t1.start()

for i in range(5):
    print(threading.current_thread())
    q.put(i)


t1.join()

The result I'm getting is:


<_MainThread(MainThread, started 140383183029568)>
<_MainThread(MainThread, started 140383183029568)>
0
<_MainThread(MainThread, started 140383183029568)>
<_MainThread(MainThread, started 140383183029568)>
<_MainThread(MainThread, started 140383183029568)>

Any ideas on how to solve this?
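
One possible direction, sketched under assumptions: a Thread object can only be started once, and in the second attempt test() returns after a single q.get(), which is why only 0 is printed. A long-lived worker that loops on the queue until it receives a sentinel handles every item in the same thread. The names worker, STOP and seen are made up for this illustration:

```python
import queue
import threading

STOP = object()  # sentinel telling the worker to finish


def worker(q, seen):
    # Runs in one long-lived thread and handles every item put on the queue.
    while True:
        item = q.get()
        if item is STOP:
            break
        seen.append((threading.current_thread().name, item))


q = queue.Queue()
seen = []
t1 = threading.Thread(target=worker, args=(q, seen), name="Thread-1")
t1.start()

for i in range(5):
    q.put(i)  # the "call" is made inside the loop, by handing work to Thread-1

q.put(STOP)
t1.join()
print(seen)
```

For a progress bar, the items on the queue could be progress values sent by the download code, with the worker redrawing the bar for each one.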



[Tutor] I'm attempting to code the barbershop problem in OS except with 3 barbers instead of one. Can anyone help rewrite my Barber1 and Barber2 classes so it just uses the functions already defined i

2018-04-25 Thread Michael Solan
class Barber: barberWorkingEvent = Event() def sleep(self):
self.barberWorkingEvent.wait() def wakeUp(self):
self.barberWorkingEvent.set() def cutHair(self, customer): #Set barber as
busy self.barberWorkingEvent.clear() print '{0} is having a haircut from
barber\n'.format(customer.name) HairCuttingTime = random.randrange(0, 5)
time.sleep(HairCuttingTime) print '{0} is done\n'.format(customer.name)
class Barber1: barberWorkingEvent = Event() def sleep(self):
self.barberWorkingEvent.wait() def wakeUp(self):
self.barberWorkingEvent.set() def cutHair(self, customer): #Set barber as
busy self.barberWorkingEvent.clear() print '{0} is having a haircut from
barber1\n'.format(customer.name) HairCuttingTime = random.randrange(0, 5)
time.sleep(HairCuttingTime) print '{0} is done\n'.format(customer.name)
class Barber2: barberWorkingEvent = Event() def sleep(self):
self.barberWorkingEvent.wait() def wakeUp(self):
self.barberWorkingEvent.set() def cutHair(self, customer): #Set barber as
busy self.barberWorkingEvent.clear() print '{0} is having a haircut from
barber1\n'.format(customer.name) HairCuttingTime = random.randrange(0, 5)
time.sleep(HairCuttingTime) print '{0} is done\n'.format(customer.name)


Re: [Tutor] I'm attempting to code the barbershop problem...

2018-04-25 Thread Steven D'Aprano
Hi Michael and welcome.

In future, please leave the Subject line as a BRIEF summary, and put the 
description of your problem in the body of the email.

You said:

> I'm attempting to code the barbershop problem in OS except
> with 3 barbers instead of one. Can anyone help rewrite my Barber1 and
> Barber2 classes so it just uses the functions already defined in the
> original Barber class.

What's the barbershop problem?

Why do you need three classes?


On Wed, Apr 25, 2018 at 11:29:23AM +0100, Michael Solan wrote:
> class Barber: barberWorkingEvent = Event() def sleep(self):
> self.barberWorkingEvent.wait() def wakeUp(self):
> self.barberWorkingEvent.set() def cutHair(self, customer): #Set barber as
> busy self.barberWorkingEvent.clear() print '{0} is having a haircut from
> barber\n'.format(customer.name) HairCuttingTime = random.randrange(0, 5)
> time.sleep(HairCuttingTime) print '{0} is done\n'.format(customer.name)

The formatting here is completely messed up. If you are posting using 
Gmail, you need to ensure that your email uses no formatting (no bold, 
no colours, no automatic indentation etc), or else Gmail will mangle the 
indentation of your code, as it appears to have done above.

My wild guess is that what you probably want is something like this:

import random
import time

class Barber(object):
    def __init__(self, name):
        self.workingEvent = Event()  # What is Event?
        self.name = name
    def sleep(self):
        self.workingEvent.wait()
    def wakeUp(self):
        self.workingEvent.set()
    def cutHair(self, customer):
        # Set this barber as busy.
        self.workingEvent.clear()
        template = '{0} is having a haircut from barber {1}\n'
        print(template.format(customer.name, self.name))
        HairCuttingTime = random.randrange(0, 5)
        time.sleep(HairCuttingTime)
        print('{0} is done\n'.format(customer.name))


tony = Barber('Tony')
fred = Barber('Fred')
george = Barber('George')

# and then what?



Notice that we have *one* class and three instances of that class. I've 
given them individual names so they're easier to distinguish.
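
As a rough sketch of where this could go, assuming the goal is three barbers serving customers concurrently. This is a guess at the exercise, not Michael's code: it swaps the Event-based sleep/wake-up for a queue.Queue acting as the waiting room (with no limit on chairs, unlike the classic problem), and the names work, run_shop and max_cut_time are invented for illustration:

```python
import queue
import random
import threading
import time


class Barber:
    def __init__(self, name, max_cut_time=0):
        self.name = name
        self.max_cut_time = max_cut_time  # upper bound on haircut time, in seconds
        self.served = []

    def cut_hair(self, customer):
        time.sleep(random.uniform(0, self.max_cut_time))
        self.served.append(customer)

    def work(self, waiting_room):
        # Block ("sleep") until a customer arrives; None means the shop is closing.
        while True:
            customer = waiting_room.get()
            if customer is None:
                break
            self.cut_hair(customer)


def run_shop(barber_names, customers):
    waiting_room = queue.Queue()
    barbers = [Barber(name) for name in barber_names]
    threads = [threading.Thread(target=b.work, args=(waiting_room,))
               for b in barbers]
    for t in threads:
        t.start()
    for customer in customers:
        waiting_room.put(customer)
    for _ in barbers:
        waiting_room.put(None)  # one closing signal per barber
    for t in threads:
        t.join()
    return barbers
```

The same class is used for all three barbers; only the name passed to each instance differs.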


Please ensure you reply on the mailing list.


-- 
Steve


Re: [Tutor] I'm attempting to code the barbershop problem...

2018-04-25 Thread Mats Wichmann

> What's the barbershop problem?

a classic computer science puzzle which is essentially a process
synchronization problem.

it does help to spell out the problem you are trying to solve, however -
we don't have the context the original poster is operating in.



[Tutor] Dict of Dict with lists

2018-04-25 Thread Kai Bojens
Hello everybody,
I'm coming from a Perl background and am trying to parse some Exim logfiles
into a data structure of dictionaries. The regex and geoip part works fine and
I'd like to save the email address, the countries (from logins) and the count
of logins.

The structure I'd like to have:

result = {
    'f...@bar.de': {
        'Countries': [DE,DK,UK]
        'IP': ['192.168.1.1','172.10.10.10']
        'Count': [12]
    }
    'b...@foo.de': {
        'Countries': [DE,SE,US]
        'IP': ['192.168.1.2','172.10.10.11']
        'Count': [23]
    }
}

I don't have a problem when I do these three separately like this with a
one-dimensional dict (snippet):

result = defaultdict(list)

with open('/var/log/exim4/mainlog',encoding="latin-1") as logfile:
    for line in logfile:
        result = pattern.search(line)
        if (result):
            login_ip = result.group("login_ip")
            login_auth =  result.group("login_auth")
            response = reader.city(login_ip)
            login_country = response.country.iso_code
            if login_auth in result and login_country in result[login_auth]:
                continue
            else:
                result[login_auth].append(login_country)
        else:
            continue

This checks if the login_country exists within the list of the specific
login_auth key, adds them if they don't exist and gives me the results I want.
This also works for the ip addresses and the number of logins without any 
problems. 

As I don't want to repeat these loops three times with three different data
structures I'd like to do this in one step. There are two main problems I
don't understand right now:

1. How do I check whether a value exists within a list that is the value of a
key, which is itself the value of another key? What I'd like to do:

 if login_auth in result and (login_country in result[login_auth][Countries])
  continue

This obviously does not work and I am not quite sure how to address the values
of 'Countries' in the right way. I'd like to check 'Key1:Key2:List' and don't
know how to address this.

2. How do I append new values to these lists within the nested dict? 


Re: [Tutor] Dict of Dict with lists

2018-04-25 Thread Alan Gauld via Tutor
On 25/04/18 14:22, Kai Bojens wrote:

> The structure I'd like to have:
> 
> result = {
>     'f...@bar.de': {
>         'Countries': [DE,DK,UK]
>         'IP': ['192.168.1.1','172.10.10.10']
>         'Count': [12]
>     }
> }
> ...
> for line in logfile:
>     result = pattern.search(line)

Doesn't this overwrite your data structure?
I would strongly advise using another name.

>     if (result):
>         login_ip = result.group("login_ip")
>         login_auth =  result.group("login_auth")
>         response = reader.city(login_ip)
>         login_country = response.country.iso_code
>         if login_auth in result and login_country in result[login_auth]:
>             continue
>         else:
>             result[login_auth].append(login_country)
>     else:
>         continue

> 1. How do I check if a value exists within a list which is the value of a key 
> which is again a value of a key in my understanding exists? What I like to do:

dic = {'key1': {'key2': [...]}}

if my_value in dic['key1']['key2']:

>  if login_auth in result and (login_country in result[login_auth][Countries])
>   continue

Should work, as long as 'Countries' is quoted (it is a string key)
and the if line ends with a colon.

> This obviously does not work and I am not quite sure how to address the values
> of 'Countries' in the right way. I'd like to check 'Key1:Key2:List' and don't
> know how to address this

It should work as you expect.
However, personally I'd use a class to define your data structure and
just have a top-level dictionary holding instances of the class.
Something like:

class Login:
    def __init__(self, countries, IPs, count):
        self.countries = countries
        self.IPs = IPs
        self.count = count

results = {'f...@bar.de': Login(['DE', 'DK', 'UK'],
                                ['192.168.1.1', '172.10.10.10'],
                                [12])
          }

if auth in results and (myvalue in results[auth].countries):
    ...

BTW should count really be a list?

> 2. How do I append new values to these lists within the nested dict? 

Same as any other list, just use the append() method:

dic[key1][key2].append(value)

or with a class:

results[auth].IPs.append(value)
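
A small self-contained sketch of the class-based version, under some assumptions: count is kept as a plain integer, the record method and the add_login helper are invented for illustration, and 'user@example.org' is a placeholder address rather than data from the log:

```python
class Login:
    def __init__(self):
        self.countries = []
        self.ips = []
        self.count = 0  # a plain integer rather than a one-element list

    def record(self, country, ip):
        # Store each country and IP only once, but count every login.
        if country not in self.countries:
            self.countries.append(country)
        if ip not in self.ips:
            self.ips.append(ip)
        self.count += 1


results = {}


def add_login(auth, country, ip):
    # setdefault() stores a new Login the first time an address is seen
    # and returns the existing one on later calls.
    results.setdefault(auth, Login()).record(country, ip)


add_login('user@example.org', 'DE', '192.168.1.1')
add_login('user@example.org', 'DK', '172.10.10.10')
add_login('user@example.org', 'DE', '192.168.1.1')
print(results['user@example.org'].countries)
print(results['user@example.org'].count)
```

Inside the log-reading loop, the three values taken from the regex match and the geoip lookup would be passed to add_login, so a single pass fills all three fields.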


HTH


Re: [Tutor] File handling Tab separated files

2018-04-25 Thread Niharika Jakhar
hi again


When I print self.organismA (the commented-out #print) inside the
"for x in self.results:" loop, it shows what it is supposed to.
But when I print it in the function below, it gives some garbage value.

Kindly let me know what is wrong. :)




import functools
import csv
import time
start =time.time()

class BioGRIDReader:
    def __init__(self, filename):
        self.results = []
        self.organisms = {}
        i = 0
        with open(filename) as f:
            for line in csv.reader(f, delimiter = '\t'):
                i += 1
                if i>35:
                    self.results.append(line)
            #print (self.results)
        for x in self.results:
            self.organismA = x[2]
            self.organismB = x[3]
            self.temp = (x[2],)
            self.keys = self.temp
            self.values = [x[:]]
            self.organisms = dict(zip(self.keys, self.values))
            #print (self.organismA)
        #print (self.results[0:34]) #omitted region

    def getMostAbundantTaxonIDs(self,n):
        #print (self.organismA)
        self.temp_ = 0
        self.number_of_interactions = []
        self.interaction_dict = {}
        for x in self.organismA:
            for value in self.organisms:
                if (x in value):
                    self.temp_ += 1
            self.number_of_interactions.append(self.temp_)
        self.interaction_dict = dict(zip(self.organismA,
                                         self.number_of_interactions))



a = BioGRIDReader("BIOGRID-ALL-3.4.159.tab.txt")
a.getMostAbundantTaxonIDs(5)
end = time.time()
#print(end - start)

Thanking you in advance

Best Regards
NIHARIKA

On Fri, Apr 20, 2018 at 11:06 AM, Alan Gauld  wrote:

>
> Use Reply-All or Reply-List to include the mailing list in replies.
>
> On 20/04/18 09:10, Niharika Jakhar wrote:
> > Hi
> >
> > I want to store the data of file into a data structure which has 11
> > objects per line , something like this:
> > 2354 somethin2  23nothing   23214.
> >
> >
> > so I was trying to split the lines using \n and storer each line in a
> > list so I have a list of 11 objects, then I need to retrieve the last
> > two position,
>
> You are using the csv module so you don't need to split the lines, the
> csv reader has already done that for you. It generates a sequence of
> lists of strings, one per line.
>
> So you only need to do something like:
>
> results = []
> with open(filename) as f:
>     for line in csv.reader(f, delimiter='\t'):
>         if line[-1] == line[-2]:
>             results.append((line[2], line[3]))
>
> Let the library do the work.
>
> You can see what the reader is doing by inserting a print(line) call
> instead of the if statement. When using a module for the first time
> don't be afraid to use print to check the input/output values.
> It's better than guessing.
>
>


Re: [Tutor] (no subject)

2018-04-25 Thread Mats Wichmann
On 12/31/1969 05:00 PM,  wrote:
> Hello everybody,
> I'm coming from a Perl background and try to parse some Exim Logfiles into a
> data structure of dictionaries. The regex and geoip part works fine and I'd
> like to save the email adress, the countries (from logins) and the count of
> logins.
> 
> The structure I'd like to have:
> 
> result = {
>     'f...@bar.de': {
>         'Countries': [DE,DK,UK]
>         'IP': ['192.168.1.1','172.10.10.10']
>         'Count': [12]
>     }
>     'b...@foo.de': {
>         'Countries': [DE,SE,US]
>         'IP': ['192.168.1.2','172.10.10.11']
>         'Count': [23]
>     }
> }

I presume that's pseudo-code, since it's missing punctuation (commas
between elements) and the country codes are not quoted

> 
> I don't have a problem when I do these three seperately like this with a one
> dimensonial dict (snippet):
> 
> result = defaultdict(list)
> 
> with open('/var/log/exim4/mainlog',encoding="latin-1") as logfile:
>     for line in logfile:
>         result = pattern.search(line)
>         if (result):
>             login_ip = result.group("login_ip")
>             login_auth =  result.group("login_auth")
>             response = reader.city(login_ip)
>             login_country = response.country.iso_code
>             if login_auth in result and login_country in result[login_auth]:
>                 continue
>             else:
>                 result[login_auth].append(login_country)
>         else:
>             continue
> 
> This checks if the login_country exists within the list of the specific
> login_auth key, adds them if they don't exist and gives me the results I want.
> This also works for the ip addresses and the number of logins without any 
> problems.
>
> As I don't want to repeat these loops three times with three different data
> structures I'd like to do this in one step. There are two main problems I
> don't understand right now:
> 
> 1. How do I check if a value exists within a list which is the value of a key 
> which is again a value of a key in my understanding exists? What I like to do:
> 
>  if login_auth in result and (login_country in result[login_auth][Countries])
>   continue

you don't actually need to check (there's a Python aphorism that goes
something like "It's better to ask forgiveness than permission").

You can do:

try:
    result[login_auth]['Countries'].append(login_country)
except KeyError:
    # means there was no entry for login_auth
    # so add one here
    result[login_auth] = {'Countries': [login_country]}

that will happily add another instance of a country if it's already
there, but there's no problem with going and cleaning the 'Countries'
value later (one trick, if you need unique values, is to take that list,
convert it to a set, then, if you want, convert it back to a list).

you're overloading the name result here so this won't work literally -
you default it outside the loop, then also set it to the regex answer...
I assume you can figure out how to fix that up.
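
a small runnable sketch of that approach, assuming a plain dict named logins (renamed so it doesn't clash with the regex match); add_country and unique_countries are made-up helper names:

```python
def add_country(logins, login_auth, login_country):
    try:
        logins[login_auth]['Countries'].append(login_country)
    except KeyError:
        # No entry for login_auth yet, so create one.
        logins[login_auth] = {'Countries': [login_country]}


def unique_countries(logins):
    # Clean up afterwards: a set drops the duplicates, and sorted() turns it
    # back into a list with a predictable order (sets are unordered).
    return {auth: sorted(set(entry['Countries']))
            for auth, entry in logins.items()}


logins = {}
for auth, country in [('a', 'DE'), ('a', 'DK'), ('a', 'DE'), ('b', 'US')]:
    add_country(logins, auth, country)

print(logins['a']['Countries'])
print(unique_countries(logins))
```

the keys 'a' and 'b' stand in for the login_auth values from the log.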



Re: [Tutor] Dict of Dict with lists

2018-04-25 Thread Kai Bojens
On 25/04/2018 –– 18:35:30PM +0100, Alan Gauld via Tutor wrote:
> > ...
> > for line in logfile:
> > result = pattern.search(line)
 
> Doesn't this overwrite your data structure?
> I would strongly advise using another name.

You are of course right. I accidentally shortened this name as I was trying to
fit my code into 80 characters width of this mail. That was sloppy ;) 
 
> However personally I'd use a class to define your data structure and
> just have a top-level dictionary holding instances of the class.

You are right (again). I hadn't thought of using classes, but that's exactly
what they were invented for. Thanks for pointing that out. 

Thanks for the help!


Re: [Tutor] (no subject)

2018-04-25 Thread Kai Bojens
On 25/04/2018 –– 12:19:28PM -0600, Mats Wichmann wrote:
> I presume that's pseudo-code, since it's missing punctuation (commas
> between elements) and the country codes are not quoted

Yes, that was just a short pseudo code example of what I wanted to achieve. 
 
> you don't actually need to check (there's a Python aphorism that goes
> something like "It's better to ask forgiveness than permission").
 
> You can do:
 
> try:
>     result[login_auth]['Countries'].append(login_country)
> except KeyError:
>     # means there was no entry for login_auth
>     # so add one here

I see. That'd be better indeed. The try/except concept is still rather new to me
and I still have to get used to it. 

Thanks for your hints! I'm sure that I can work with these suggestions ;)


[Tutor] Async TCP Server

2018-04-25 Thread Simon Connah
Hi,

I've come up with an idea for a new protocol I want to implement in
Python using 3.6 (or maybe 3.7 when that comes out), but I'm somewhat
confused about how to do it in an async way.

The way I understand it is that you have a loop that waits for an
incoming request and then calls a function/method asynchronously which
handles the incoming request. While that is happening the main event
loop is still listening for incoming connections.

Is that roughly correct?

The idea is to have a chat application that can at least handle a few
hundred clients if not more in the future. I'm planning on using
Python because I am pretty up-to-date with it, but I've never written
a network server before.

Also another quick question. Does Python support async database
operations? I'm thinking of the psycopg2-binary database driver. That
way I can offload the storage in the database while still handling
incoming connections.

If I have misunderstood anything, any clarification would be much appreciated.

Simon.


Re: [Tutor] Async TCP Server

2018-04-25 Thread Leam Hall

On 04/25/2018 05:14 PM, Simon Connah wrote:

> Hi,
>
> I've come up with an idea for a new protocol I want to implement in
> Python using 3.6 (or maybe 3.7 when that comes out), but I'm somewhat
> confused about how to do it in an async way.
>
> The way I understand it is that you have a loop that waits for an
> incoming request and then calls a function/method asynchronously which
> handles the incoming request. While that is happening the main event
> loop is still listening for incoming connections.
>
> Is that roughly correct?
>
> The idea is to have a chat application that can at least handle a few
> hundred clients if not more in the future. I'm planning on using
> Python because I am pretty up-to-date with it, but I've never written
> a network server before.
>
> Also another quick question. Does Python support async database
> operations? I'm thinking of the psycopg2-binary database driver. That
> way I can offload the storage in the database while still handling
> incoming connections.
>
> If I have misunderstood anything, any clarification would be much appreciated.
>
> Simon.


How does your idea differ from Twisted?
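
For comparison, the pattern described above (an event loop that keeps accepting connections while each request is handled in its own coroutine) can be sketched with the standard library's asyncio. This is only an illustration under assumptions: the line-based "echo:" protocol, the handle_client and demo names and the loopback address are all made up, and the demo connects one client to its own server just to show a round trip:

```python
import asyncio


async def handle_client(reader, writer):
    # One coroutine runs per connection; awaiting here lets the event loop
    # keep accepting and serving other clients in the meantime.
    while True:
        line = await reader.readline()
        if not line:  # empty bytes: the client closed the connection
            break
        writer.write(b"echo: " + line)
        await writer.drain()
    writer.close()


async def demo():
    # Port 0 asks the operating system for any free port.
    server = await asyncio.start_server(handle_client, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()

    server.close()
    await server.wait_closed()
    return reply


loop = asyncio.new_event_loop()
reply = loop.run_until_complete(demo())
loop.close()
print(reply)
```

A real server would leave out the built-in client and keep the loop running, for example with loop.run_forever().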
