[Tutor] How default arg of function works

2018-06-14 Thread Deepak Dixit
I am learning Python and working with functions.

Here is my test program :-

program.py

def test1(nums=[]):
    return nums


def test2(nums=[]):
    nums.append(len(nums))
    return nums

print "Calling test1"
print '=' * 40
print 'test1()', test1()
print 'test1([1,2])', test1([1,2])
print 'test1()', test1()
print 'test1([1,1,1])', test1([1,1,1])
print 'test1()', test1()

print "Calling test2"
print '=' * 40

print 'test2()', test2()
print 'test2([1,2,3])', test2([1,2,3])
print 'test2([1,2])', test2([1,2])
print 'test2()', test2()
print 'test2()', test2()

--

# python program.py


Calling test1
========================================
test1() []
test1([1,2]) [1, 2]
test1() []
test1([1,1,1]) [1, 1, 1]
test1() []
Calling test2
========================================
test2() [0]
test2([1,2,3]) [1, 2, 3, 3]
test2([1,2]) [1, 2, 2]
test2() [0, 1]
test2() [0, 1, 2]


I assume that because test1() does not modify the list it is given, it
behaves as I expect. But when I call test2() with and without arguments,
the two kinds of call seem to use different list objects. Why? Can you
please help me understand this?

-- 
With Regards,
Deepak Kumar Dixit
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] How default arg of function works

2018-06-14 Thread Alan Gauld via Tutor
On 14/06/18 08:04, Deepak Dixit wrote:

> def test2(nums=[]):
>     nums.append(len(nums))
>     return nums
> 
> print 'test2()', test2()
> print 'test2([1,2,3])', test2([1,2,3])
> print 'test2([1,2])', test2([1,2])
> print 'test2()', test2()
> print 'test2()', test2()
> 
> Calling test2
> 
> test2() [0]
> test2([1,2,3]) [1, 2, 3, 3]
> test2([1,2]) [1, 2, 2]
> test2() [0, 1]
> test2() [0, 1, 2]
> 
> I am assuming that in test1() we are not doing any modification in the
> passed list and because of that its working as per my expectation.

Correct. You are still dealing with 2 separate objects (see below)
but you don't modify anything so it looks like it works as you
expected.

> But when I am calling test2() with params and without params then both are
> using different references.  Why ? Can you please help me to understand
> this.

When you create a default value for a function parameter Python
creates an actual object and uses that object for every invocation
of the function that uses the default. If the object is mutable
(a list or class instance, say) then any modifications to that
object will stick and be present on subsequent calls.

Thus in your case, on the first call of the function the list is
empty and has len() 0; on the second call it still holds the zero
stored by the first call, so len() is now 1, and so on.

But when you call the function without using the default,
Python does not involve the default object at all; it uses
whichever object you provide.

The same applies to your test1 function. Each time you call it
with the default, the same empty list object is being printed.
But because you don't modify it, it always appears as
an empty list.
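
A minimal sketch of the behaviour described above, written for Python 3 (the program in the original post is Python 2). It uses the function's `__defaults__` attribute, which holds the default objects, to show that every call relying on the default gets the very same list. The name `append_length` is made up for this illustration.

```python
def append_length(nums=[]):
    """Append the current length of nums to nums and return it."""
    nums.append(len(nums))
    return nums

# The default list is stored on the function object itself.
default_list = append_length.__defaults__[0]

first = append_length()        # uses the default list
second = append_length()       # uses the same default list again
own = append_length([1, 2])    # caller-supplied list; the default is untouched

print(first is default_list)   # True
print(first is second)         # True
print(own is default_list)     # False
print(default_list)            # [0, 1]
```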

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos




Re: [Tutor] How default arg of function works

2018-06-14 Thread Deepak Dixit
So you mean that, for mutable types, Python uses different objects for
default arguments and passed arguments, and that the same default object
is reused on every later call of the function?

One more thing I want to ask: how can I get a deep understanding of
Python, such as how lists and dictionaries work internally? Is that only
possible with more and more practice? Any suggestions would be really
helpful.



[Tutor] accessing buttons (tkinter) made with loop

2018-06-14 Thread Freedom Peacemaker
Hello Tutor,
I'm currently working on a tkinter application. The main idea is to create 25
buttons in a for loop. Each button's text is a random number, and the user has
to click them in order from lowest to highest. When a button is clicked it is
disabled and given the corresponding color (green for OK, red for not OK).
This is my code:

http://pastebin.pl/view/1ac9ad13

I looked for help on Google, but I can't figure out how to move
forward. I am stuck.

The buttons with numbers are created, and all buttons (keys) with their numbers
(values) are added to the buttons dictionary, so the objects are created with
the corresponding values. Here is my problem: I tried the use() function (it
only gives me errors, painting a few buttons green, leaving one unpainted and
the rest red), but I have no idea how to access the configuration of those
buttons. I think I'm close but still can't see it. Can you guide me?

Best Regards
Pi


Re: [Tutor] How default arg of function works

2018-06-14 Thread Alan Gauld via Tutor
On 14/06/18 08:40, Deepak Dixit wrote:
> You mean that for default args and passed args of mutable type, python uses
> different object and same reference will be used for further calling of the
> function.

Yes, the default argument object is created when the
function is defined (i.e. before it is even called the
first time) and the same reference to that object is
always used for the default argument.

When you provide the argument yourself Python uses
whatever object you pass.
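
As a small illustration of "created when the function is defined", the sketch below uses a default whose expression has a visible side effect. The helper names `next_id` and `tagged` are invented for the example; the point is only that the default expression runs once, while the `def` statement executes, and not on each call.

```python
import itertools

counter = itertools.count(1)

def next_id():
    """Return 1, 2, 3, ... on successive calls."""
    return next(counter)

# next_id() runs once, right here, while the def statement executes.
def tagged(value, tag=next_id()):
    return (value, tag)

a = tagged("a")
b = tagged("b")
c = tagged("c", tag=next_id())   # explicit argument: evaluated at call time

print(a, b, c)   # ('a', 1) ('b', 1) ('c', 2)
```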

> Now one more thing I want to ask you that how can I get deep understanding
> of python like how list, dictionary works internally and other topics , is
> it only possible after more and more practice. 

- You can observe the behaviour as you have been doing.
- You can use the disassembly module (dis) to look at
  the object code generated by the compiler.
- And, the ultimate truth, you can read the C source code
  if you know C.

And of course you can ask questions here. We have several
contributors who understand Python internals quite well.





Re: [Tutor] How default arg of function works

2018-06-14 Thread Deepak Dixit
Thanks a lot for this information.



Re: [Tutor] accessing buttons (tkinter) made with loop

2018-06-14 Thread Peter Otten
Freedom Peacemaker wrote:

> Hello Tutor,
> currently im working with tkinter application. Main idea was to create 25
> buttons with for loop. Each button text was random number and user needs
> to click them in order form lowest to highest. When button is clicked its
> being disabled with coresponding color (green-ok, red-not). This is my
> code:
> 
> http://pastebin.pl/view/1ac9ad13

That's small enough to incorporate into your message:

> from tkinter import *
> import random
> 
> def use():
>     if buttons[guzik] == min(buttons.values()):
>         guzik.config(state = 'disabled', bg='green')
>         #buttons.pop(guzik) #  error
>     else:
>         guzik.config(state = 'disabled', bg='red')

If you make that the buttons' command, 'guzik' will always refer to the last 
button because that's what the global guzik variable is bound to when the 
for loop below ends. One way to bind guzik to a specific button is a 
"closure" (other options are default arguments and callable classes):

def make_use(guzik):
    def use():
        if buttons[guzik] == min(buttons.values()):
            guzik.config(state = 'disabled', bg='green')
        else:
            guzik.config(state = 'disabled', bg='red')
        del buttons[guzik]

    return use

Now the local 'guzik' variable inside make_use() is not changed when the 
make_use function ends, and for every call of make_use() the inner use() 
function sees the specific value from the enclosing scope.
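
The same effect can be seen without tkinter. The sketch below is an illustration only: `make_reporter` is a made-up stand-in for `make_use`, and plain strings stand in for the buttons. It compares a callback that looks up a shared loop variable with the closure and default-argument approaches mentioned above.

```python
def make_reporter(name):
    """Return a callback that remembers its own `name`."""
    def report():
        return name
    return report

late = []      # callbacks that look up the shared loop variable when called
closures = []  # callbacks built by the factory function
defaults = []  # callbacks that capture the value as a default argument

for name in ["a", "b", "c"]:
    late.append(lambda: name)
    closures.append(make_reporter(name))
    defaults.append(lambda name=name: name)

print([f() for f in late])      # ['c', 'c', 'c']
print([f() for f in closures])  # ['a', 'b', 'c']
print([f() for f in defaults])  # ['a', 'b', 'c']
```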

> num = [x for x in range(1000)]
> buttons = {}
> 
> root = Tk()
> 
> for i in range(25):
>     guzik = Button(root, 
>                    text = random.sample(num,1), height = 1, width = 7)
>     guzik.grid(row = int(i/5), column = i%5)
>     buttons[guzik] = int(guzik.cget('text'))

    guzik["command"] = make_use(guzik)

> 
> print(buttons)
> 
> root.mainloop()
> 

> 
> I was looking for help in Google but now i cant figure out, how to move
> forward. I am stuck.
> 
> Buttons with numbers are created, and all buttons (key) with their numbers
> (value) are added to buttons dictionary. So objects are created with
> coresponding values. And now is my problem. I tried with use() function
> /it gives me only errors painting few buttons to green, one not painted
> and rest is red/ but i have no idea how i could access configuration of
> those buttons. I think im close but still can see. Can You guide me?
> 
> Best Regards
> Pi




Re: [Tutor] Virtual environment can't find uno

2018-06-14 Thread Mats Wichmann
On 06/13/2018 06:55 PM, Jim wrote:
> Running Linux Mint 18.
> 
> I have python 3.6 running in a virtual environment.
> 
> I want to use a package called oosheet to work with libreoffice calc.
> When I try to import it I get the following error:
> 
> >>> import oosheet
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File
> "/home/jfb/EVs/env36/lib/python3.6/site-packages/oosheet/__init__.py",
> line 38, in <module>
>     import uno, re, zipfile, types, inspect, tempfile, shutil, subprocess
> ModuleNotFoundError: No module named 'uno'
> 
> If I do the same thing in the system installed python3 it imports and
> runs with no errors.
> 
> python3-uno was installed using apt-get.
> 
> How do I get python 3.6 in the virtual environment to find uno?

You should be able to pip install uno in the virtualenv, which might be
best.  After all, once you're using a virtualenv, you've already started
down the road of picking depends from upstream, so why not :)





Re: [Tutor] How default arg of function works

2018-06-14 Thread Mats Wichmann
On 06/14/2018 05:01 AM, Deepak Dixit wrote:
> Thanks a lot for this information.
> 
> On Thu, Jun 14, 2018, 4:28 PM Alan Gauld via Tutor  wrote:

>> Yes, the default argument object is created when the
>> function is defined (i.e. before it is even called the
>> first time) and the same reference to that object is
>> always used for the default argument.

This turns out to be one of the "surprises" in Python that catches quite
a few people.  Once you know how things work under the covers, it makes
sense why it is so, but it's a little unusual on the surface.

def (to define a function) is actually an executable statement that is
run when encountered.  The result is a function object, bound to the
name you def'd.
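
One way to see that `def` is executed like any other statement is to run it more than once. In the sketch below (the names `build` and `collect` are invented for illustration), each pass through the loop executes the `def` again, producing a new function object with its own default list.

```python
def build(n):
    functions = []
    for i in range(n):
        # This def statement runs once per loop iteration, creating a
        # new function object (and a new default list) each time.
        def collect(item, seen=[]):
            seen.append(item)
            return seen
        functions.append(collect)
    return functions

f, g = build(2)
print(f is g)     # False
print(f("x"))     # ['x']
print(f("y"))     # ['x', 'y']
print(g("z"))     # ['z']
```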

One of the long-time Python luminaries wrote about this ages ago; I just
went and hunted up the link in case you're interested. Lots of other
people have written about it too, but Fredrik Lundh's comments are
usually worth a read, and he's got some advice for you if you're seeking
less surprising behavior (that is, don't use an empty list as the
default; instead use a placeholder like None that you can check for):

http://www.effbot.org/zone/default-values.htm
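
A minimal sketch of the None-placeholder idiom, applied to the `test2` function from the start of the thread (Python 3 syntax):

```python
def test2(nums=None):
    if nums is None:
        nums = []          # a fresh list on every call that omits nums
    nums.append(len(nums))
    return nums

print(test2())          # [0]
print(test2())          # [0]
print(test2([1, 2]))    # [1, 2, 2]
```

With this version, calls that omit the argument no longer share state with one another.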




Re: [Tutor] Virtual environment can't find uno

2018-06-14 Thread Jim

On 06/14/2018 10:51 AM, Mats Wichmann wrote:
> On 06/13/2018 06:55 PM, Jim wrote:
>> Running Linux Mint 18.
>>
>> I have python 3.6 running in a virtual environment.
>>
>> I want to use a package called oosheet to work with libreoffice calc.
>> When I try to import it I get the following error:
>>
>> >>> import oosheet
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File
>> "/home/jfb/EVs/env36/lib/python3.6/site-packages/oosheet/__init__.py",
>> line 38, in <module>
>>     import uno, re, zipfile, types, inspect, tempfile, shutil, subprocess
>> ModuleNotFoundError: No module named 'uno'
>>
>> If I do the same thing in the system installed python3 it imports and
>> runs with no errors.
>>
>> python3-uno was installed using apt-get.
>>
>> How do I get python 3.6 in the virtual environment to find uno?
>
> You should be able to pip install uno in the virtualenv, which might be
> best.  After all, once you're using a virtualenv, you've already started
> down the road of picking depends from upstream, so why not :)




Is it available for a pip install? I looked on pypi and didn't see it.

It may be incompatible with 3.6. I was looking at the dependencies with 
synaptic and found. Depends Python3(>= 3.5~), Depends Python3(<= 3.6).


Anyway I had forgotten I have a virtual environment with 3.5 in it, so I 
tried that and it works.


Thanks,  Jim





Re: [Tutor] Virtual environment can't find uno

2018-06-14 Thread Mats Wichmann
On 06/14/2018 11:57 AM, Jim wrote:

> Is it available for a pip install? I looked on pypi and didn't see it.
> 
> It may be incompatible with 3.6. I was looking at the dependencies with
> synaptic and found. Depends Python3(>= 3.5~), Depends Python3(<= 3.6).
> 
> Anyway I had forgotten I have a virtual environment with 3.5 in it, so I
> tried that and it works.
> 
> Thanks,  Jim


Hmmm, the GitHub repository that the rather minimal PyPI page points to is

https://github.com/elbow-jason/Uno-deprecated

That doesn't give one a warm fuzzy feeling :)

I don't know the story here; maybe someone else does.
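
When comparing interpreters as Jim did, a quick check of what a given Python can see may help. The sketch below is a diagnostic only, not a fix: `can_import` is a made-up helper, and the virtual-environment test assumes an environment created with the standard `venv` module.

```python
import importlib.util
import sys

def can_import(module_name):
    """Return True if this interpreter can locate a top-level module."""
    return importlib.util.find_spec(module_name) is not None

# For environments made with venv, sys.prefix differs from sys.base_prefix.
print("interpreter:", sys.executable)
print("in a virtual environment:", sys.prefix != sys.base_prefix)
print("uno importable:", can_import("uno"))
```

Running it once with the system python3 and once inside each virtual environment shows which of them can find the module.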




[Tutor] Recursion depth exceeded in python web crawler

2018-06-14 Thread Daniel Bosah
I am trying to modify code from a web crawler to scrape for keywords from
certain websites. However, I'm trying to run the web crawler before I
modify it, and I'm running into issues.

When I ran this code -




*import threading*
*from Queue import Queue*
*from spider import Spider*
*from domain import get_domain_name*
*from general import file_to_set*


*PROJECT_NAME = "SPIDER"*
*HOME_PAGE = "https://www.cracked.com/ "*
*DOMAIN_NAME = get_domain_name(HOME_PAGE)*
*QUEUE_FILE = '/home/me/research/queue.txt'*
*CRAWLED_FILE = '/home/me/research/crawled.txt'*
*NUMBER_OF_THREADS = 1*
*#Captialize variables and make them class variables to make them const
variables*

*threadqueue = Queue()*

*Spider(PROJECT_NAME,HOME_PAGE,DOMAIN_NAME)*

*def crawl():*
*change = file_to_set(QUEUE_FILE)*
*if len(change) > 0:*
*print str(len(change)) + 'links in the queue'*
*create_jobs()*

*def create_jobs():*
*for link in file_to_set(QUEUE_FILE):*
*threadqueue.put(link) #.put = put item into the queue*
*threadqueue.join()*
*crawl()*
*def create_spiders():*
*for _ in range(NUMBER_OF_THREADS): #_ basically if you dont want to
act on the iterable*
*vari = threading.Thread(target = work)*
*vari.daemon = True #makes sure that it dies when main exits*
*vari.start()*

*#def regex():*
*#for i in files_to_set(CRAWLED_FILE):*
*  #reg(i,LISTS) #MAKE FUNCTION FOR REGEX# i is url's, LISTs is
list or set of keywords*
*def work():*
*while True:*
*url = threadqueue.get()# pops item off queue*
*Spider.crawl_pages(threading.current_thread().name,url)*
*threadqueue.task_done()*

*create_spiders()*

*crawl()*


That used this class:

*from HTMLParser import HTMLParser*
*from urlparse import urlparse*

*class LinkFinder(HTMLParser):*
*def _init_(self, base_url,page_url):*
*super()._init_()*
*self.base_url= base_url*
*self.page_url = page_url*
*self.links = set() #stores the links*
*def error(self,message):*
*pass*
*def handle_starttag(self,tag,attrs):*
*if tag == 'a': # means a link*
*for (attribute,value) in attrs:*
*if attribute  == 'href':  #href relative url i.e not
having www*
*url = urlparse.urljoin(self.base_url,value)*
*self.links.add(url)*
*def return_links(self):*
*return self.links()*


And this spider class:



*from urllib import urlopen #connects to webpages from python*
*from link_finder import LinkFinder*
*from general import directory, text_maker, file_to_set, conversion_to_set*


*class Spider():*
* project_name = 'Reader'*
* base_url = ''*
* Queue_file = ''*
* crawled_file = ''*
* queue = set()*
* crawled = set()*


* def __init__(self,project_name, base_url,domain_name):*
* Spider.project_name = project_name*
* Spider.base_url = base_url*
* Spider.domain_name = domain_name*
* Spider.Queue_file =  '/home/me/research/queue.txt'*
* Spider.crawled_file =  '/home/me/research/crawled.txt'*
* self.boot()*
* self.crawl_pages('Spider 1 ', base_url)*

* @staticmethod  *
* def boot():*
*  directory(Spider.project_name)*
*  text_maker(Spider.project_name,Spider.base_url)*
*  Spider.queue = file_to_set(Spider.Queue_file)*
*  Spider.crawled = file_to_set(Spider.crawled_file)*
* @staticmethod*
* def crawl_pages(thread_name, page_url):*
*  if page_url not in Spider.crawled:*
*  print thread_name + 'crawling ' + page_url*
*  print 'queue' + str(len(Spider.queue)) + '|crawled' +
str(len(Spider.crawled))*
*  Spider.add_links_to_queue(Spider.gather_links(page_url))*
*  Spider.crawled.add(page_url)*
*  Spider.update_files()*
* @staticmethod*
* def gather_links(page_url):*
*  html_string = ''*
*  try:*
*  response = urlopen(page_url)*
*  if 'text/html' in response.getheader('Content Type'):*
*  read = response.read()*
*  html_string = read.decode('utf-8')*
*  finder = LinkFinder(Spider.base_url,page_url)*
*  finder.feed(html_string)*
*  except:*
*   print 'Error: cannot crawl page'*
*   return set()*
*  return finder.return_links()*

* @staticmethod*
* def add_links_to_queue(links):*
*for i in links:*
*if i in Spider.queue:*
*continue*
*if i in Spider.crawled:*
*continue*
*   # if Spider.domain_name != get_domain_name(url):*
*#continue*
*Spider.queue.add()*
* @staticmethod*
* def update_files():*
*conversion_to_set(Spider.queue,Spider.Queue_file)*
*conversion_t

[Tutor] In matplotlib, why are there axes classes vs. axes API? Why not list them under one documentation?

2018-06-14 Thread C W
Hello everyone,

I'm working on matplotlib, could someone explain the difference between
these two?

Axes class: https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes
Axes and tick API: https://matplotlib.org/api/axis_api.html

I began by reading the Axes class documentation, but discovered the axes API
by accident. Why are there two sets of documentation on the same thing? Why
not merge them? I mean, one is already confusing enough. ;)

Axes is already branching off of plt.plot(), and now there is a branch
within axes. I didn't dare to look, but there may even be sub-branches?

Thank you!


Re: [Tutor] In matplotlib, why are there axes classes vs. axes API? Why not list them under one documentation?

2018-06-14 Thread Steven D'Aprano
On Thu, Jun 14, 2018 at 12:31:44PM -0400, C W wrote:
> Hello everyone,
> 
> I'm working on matplotlib, could someone explain the difference between
> these two?
> 
> Axes class: https://matplotlib.org/api/axes_api.html#matplotlib.axes.Axes
> Axes and tick API: https://matplotlib.org/api/axis_api.html
> 
> I began at reading axes class, but discovered axes API by accident. Why are
> there two documentations on the same thing? Why not merge? I mean, one is
> already confusing enough. ;)

*shrug*

Because the designers of matplotlib suck at designing a good, easy to 
use, easy to understand, simple API? Because they're not good at writing 
documentation? I dunno.

It is hard to write an API which is both *obvious* and *powerful*, but 
the few times I tried using matplotlib I found it baroque and confusing.



-- 
Steve


Re: [Tutor] Recursion depth exceeded in python web crawler

2018-06-14 Thread Steven D'Aprano
On Thu, Jun 14, 2018 at 02:32:46PM -0400, Daniel Bosah wrote:

> I am trying to modify code from a web crawler to scrape for keywords from
> certain websites. However, Im trying to run the web crawler before  I
> modify it, and I'm running into issues.
> 
> When I ran this code -

[snip enormous code-dump]

> The interpreter returned this error:
> 
> *RuntimeError: maximum recursion depth exceeded while calling a Python
> object*

Since this is not your code, you should report it as a bug to the 
maintainers of the web crawler software. They wrote it, and it sounds 
like it is buggy.

Quoting the final error message on its own is typically useless, because 
we have no context as to where it came from. We don't know and cannot 
guess what object was called. Without that information, we're blind and 
cannot do more than guess or offer the screamingly obvious advice "find 
and fix the recursion error".

When an error does occur, Python provides you with a lot of useful 
information about the context of the error: the traceback. As a general 
rule, you should ALWAYS quote the entire traceback, starting from the 
line beginning "Traceback: ..." not just the final error message.

Unfortunately, in the case of RecursionError, that information can be a 
firehose of hundreds of identical lines, which is less useful than it 
sounds. The most recent versions of Python redact that and show 
something similar to this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in f
  [ previous line repeats 998 times ]
RecursionError: maximum recursion depth exceeded

but in older versions you should manually cut out the enormous flood of 
lines (sorry). If the lines are NOT identical, then don't delete them!

The bottom line is, without some context, it is difficult for us to tell 
where the bug is.
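
To collect the whole traceback as text for posting, the standard `traceback` module can be used. The sketch below is for Python 3, where the exception is RecursionError (in Python 2, as in the original post, it is a RuntimeError). The function `f` is a deliberately broken example, not code from the crawler.

```python
import traceback

def f(n):
    return f(n + 1)    # no base case, so this recurses until Python stops it

try:
    f(0)
except RecursionError:
    report = traceback.format_exc()   # the full traceback as one string

lines = report.splitlines()
print(lines[0])     # the "Traceback (most recent call last):" header
print(lines[-1])    # the final error message
print(len(lines), "lines in the full traceback")
```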

Another point: whatever you are using to post your messages (Gmail?) is 
annoyingly adding asterisks to the start and end of each line. I see 
your quoted code like this:

[direct quote]
*import threading*
*from Queue import Queue*
*from spider import Spider*
*from domain import get_domain_name*
*from general import file_to_set*

Notice the * at the start and end of each line? That makes the code 
invalid Python. You should check how you are posting to the list, and if 
you have "Rich Text" or some other formatting turned on, turn it off.

(My guess is that you posted the code in BOLD or perhaps some colour 
other than black, and your email program "helpfully" added asterisks to 
it to make it stand out.)

Unfortunately modern email programs, especially web-based ones like 
Gmail and Outlook.com, make it *really difficult* for technical forums 
like this. They are so intent on making email "pretty" (generally pretty 
ugly) for regular users, they punish technically minded users who need
to focus on the text not the presentation.



-- 
Steve


Re: [Tutor] Recursion depth exceeded in python web crawler

2018-06-14 Thread Mark Lawrence

On 14/06/18 19:32, Daniel Bosah wrote:
> I am trying to modify code from a web crawler to scrape for keywords from
> certain websites. However, Im trying to run the web crawler before  I
> modify it, and I'm running into issues.
>
> When I ran this code -
>
> *import threading*
> *from Queue import Queue*
> *from spider import Spider*
> *from domain import get_domain_name*
> *from general import file_to_set*
>
> *PROJECT_NAME = "SPIDER"*
> *HOME_PAGE = "https://www.cracked.com/ "*
> *DOMAIN_NAME = get_domain_name(HOME_PAGE)*
> *QUEUE_FILE = '/home/me/research/queue.txt'*
> *CRAWLED_FILE = '/home/me/research/crawled.txt'*
> *NUMBER_OF_THREADS = 1*
> *#Captialize variables and make them class variables to make them const
> variables*
>
> *threadqueue = Queue()*
>
> *Spider(PROJECT_NAME,HOME_PAGE,DOMAIN_NAME)*
>
> *def crawl():*
> *change = file_to_set(QUEUE_FILE)*
> *if len(change) > 0:*
> *print str(len(change)) + 'links in the queue'*
> *create_jobs()*
>
> *def create_jobs():*
> *for link in file_to_set(QUEUE_FILE):*
> *threadqueue.put(link) #.put = put item into the queue*
> *threadqueue.join()*
> *crawl()*
> *def create_spiders():*
> *for _ in range(NUMBER_OF_THREADS): #_ basically if you dont want to
> act on the iterable*
> *vari = threading.Thread(target = work)*
> *vari.daemon = True #makes sure that it dies when main exits*
> *vari.start()*
>
> *#def regex():*
> *#for i in files_to_set(CRAWLED_FILE):*
> *  #reg(i,LISTS) #MAKE FUNCTION FOR REGEX# i is url's, LISTs is
> list or set of keywords*
> *def work():*
> *while True:*
> *url = threadqueue.get()# pops item off queue*
> *Spider.crawl_pages(threading.current_thread().name,url)*
> *threadqueue.task_done()*
>
> *create_spiders()*
>
> *crawl()*
>
>
> That used this class:
>
> *from HTMLParser import HTMLParser*
> *from urlparse import urlparse*
>
> *class LinkFinder(HTMLParser):*
> *def _init_(self, base_url,page_url):*
> *super()._init_()*
> *self.base_url= base_url*
> *self.page_url = page_url*
> *self.links = set() #stores the links*
> *def error(self,message):*
> *pass*
> *def handle_starttag(self,tag,attrs):*
> *if tag == 'a': # means a link*
> *for (attribute,value) in attrs:*
> *if attribute  == 'href':  #href relative url i.e not
> having www*
> *url = urlparse.urljoin(self.base_url,value)*
> *self.links.add(url)*
> *def return_links(self):*
> *return self.links()*


It's very unpythonic to define getters like return_links, just access 
self.links directly.
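
A sketch of what direct attribute access could look like, written for Python 3 (`html.parser` and `urllib.parse` replace the Python 2 modules in the quoted code). It also spells `__init__` with double underscores and drops the unused `page_url` parameter; these changes are assumptions made for the illustration, not the original author's code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkFinder(HTMLParser):
    def __init__(self, base_url):
        super().__init__()          # note the double underscores
        self.base_url = base_url
        self.links = set()          # callers read this attribute directly

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for attribute, value in attrs:
                if attribute == "href" and value is not None:
                    # Resolve relative links against the base URL.
                    self.links.add(urljoin(self.base_url, value))

finder = LinkFinder("https://example.com/")
finder.feed('<a href="/about">About</a> <a href="https://example.org/">Other</a>')
print(sorted(finder.links))
# ['https://example.com/about', 'https://example.org/']
```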





> And this spider class:
>
> *from urllib import urlopen #connects to webpages from python*
> *from link_finder import LinkFinder*
> *from general import directory, text_maker, file_to_set, conversion_to_set*
>
> *class Spider():*
> * project_name = 'Reader'*
> * base_url = ''*
> * Queue_file = ''*
> * crawled_file = ''*
> * queue = set()*
> * crawled = set()*
>
> * def __init__(self,project_name, base_url,domain_name):*
> * Spider.project_name = project_name*
> * Spider.base_url = base_url*
> * Spider.domain_name = domain_name*
> * Spider.Queue_file =  '/home/me/research/queue.txt'*
> * Spider.crawled_file =  '/home/me/research/crawled.txt'*
> * self.boot()*
> * self.crawl_pages('Spider 1 ', base_url)*


It strikes me as completely pointless to define this class when every 
variable is at the class level and every method is defined as a static 
method.  Python isn't Java :)


[code snipped]



> and these functions:
>
> *from urlparse import urlparse*
>
> *#get subdomain name (name.example.com )*
>
> *def subdomain_name(url):*
> *try:*
> *return urlparse(url).netloc*
> *except:*
> *return ''*


It's very bad practice to use a bare except like this as it hides any 
errors and prevents you from using CTRL-C to break out of your code.




> *def get_domain_name(url):*
> *try:*
> *variable = subdomain_name.split(',')*
> *return variable[-2] + ',' + variable[-1] #returns 2nd to last and
> last instances of variable*
> *except:*
> *return '''*


The above line is a syntax error.
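
A possible rewrite of the two quoted helpers with these points addressed, in Python 3. Two assumptions are made here: that the original meant to call `subdomain_name(url)` before splitting, and that it meant to split on '.' and not ','. The "last two labels" rule is naive and would misjudge hosts such as example.co.uk.

```python
from urllib.parse import urlparse

def subdomain_name(url):
    """Return the network location, e.g. 'www.cracked.com'."""
    try:
        return urlparse(url).netloc
    except ValueError:      # raised for some malformed URLs
        return ""

def get_domain_name(url):
    """Return the last two dot-separated labels of the host name."""
    labels = subdomain_name(url).split(".")
    if len(labels) < 2:
        return ""
    return labels[-2] + "." + labels[-1]

print(subdomain_name("https://www.cracked.com/"))   # www.cracked.com
print(get_domain_name("https://www.cracked.com/"))  # cracked.com
```

Catching only ValueError leaves KeyboardInterrupt and genuine programming errors visible.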




> (there are more functions, but those are housekeeping functions)
>
> The interpreter returned this error:
>
> *RuntimeError: maximum recursion depth exceeded while calling a Python
> object*
>
> After calling crawl() and create_jobs() a bunch of times?
>
> How can I resolve this?
>
> Thanks


Just a quick glance but crawl calls create_jobs which calls crawl...
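
A simplified stand-in for that call pattern, in Python 3, and one way to flatten it. This is not the crawler's code: `crawl_recursive` and `crawl_iterative` are hypothetical names, and a deque of strings replaces the queue file. The recursive version adds a stack frame per round of work, while the loop keeps the stack flat.

```python
from collections import deque

def crawl_recursive(pending, crawled):
    """Mirrors the crawl()/create_jobs() shape: each round calls itself."""
    if pending:
        crawled.add(pending.popleft())
        crawl_recursive(pending, crawled)   # one more stack frame per round

def crawl_iterative(pending, crawled):
    """Same work, but a loop keeps the call stack flat."""
    while pending:
        crawled.add(pending.popleft())

pages = ["page%d" % i for i in range(5000)]   # stand-in for queued links

try:
    crawl_recursive(deque(pages), set())
    recursive_ok = True
except RecursionError:
    recursive_ok = False

done = set()
crawl_iterative(deque(pages), done)

# With Python's default recursion limit of 1000:
print(recursive_ok, len(done))   # False 5000
```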

--
My fellow Pythonistas, ask not what our language can do for you, ask
what you can do for our language.

Mark Lawrence
