Re: [Tutor] extract uri from beautiful soup string

2012-10-16 Thread eryksun
On Tue, Oct 16, 2012 at 7:52 AM, Norman Khine wrote:
>
> thanks, i made the changes https://gist.github.com/3891927

On line 67, use the result of soup.findAll directly:

    assoc_data.extend(assoc_cont.renderContents() for assoc_cont in soup.findAll('td', {'width': '49%', 'class': 'menu2
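
The advice above relies on list.extend accepting any iterable, so a generator expression can feed it without building an intermediate list. A minimal sketch of that idea follows. It assumes nothing about BeautifulSoup beyond the renderContents() call quoted in the thread: FakeCell and the sample values are hypothetical stand-ins, not the real findAll result.

```python
# Hypothetical stand-in for the objects soup.findAll returns; only the
# renderContents() method mentioned in the thread is modelled here.
class FakeCell(object):
    def __init__(self, html):
        self.html = html

    def renderContents(self):
        return self.html


# Pretend this list came back from soup.findAll(...)
cells = [FakeCell("Social"), FakeCell("Action9")]

assoc_data = []
# extend() consumes the generator directly; no temporary list is created
assoc_data.extend(cell.renderContents() for cell in cells)
print(assoc_data)
```

With a real soup object, the generator would iterate over the findAll call itself, as in the quoted line.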

Re: [Tutor] extract uri from beautiful soup string

2012-10-16 Thread Norman Khine
On Tue, Oct 16, 2012 at 6:40 AM, eryksun wrote:
> On Mon, Oct 15, 2012 at 1:17 PM, Norman Khine wrote:
>>
>> i made an update: https://gist.github.com/3891927 which works based on
>> some of the recommendations posted here.
>>
>> any suggestions for improvement?
>
> I can't make any specific reco

Re: [Tutor] extract uri from beautiful soup string

2012-10-15 Thread eryksun
On Mon, Oct 15, 2012 at 1:17 PM, Norman Khine wrote:
>
> i made an update: https://gist.github.com/3891927 which works based on
> some of the recommendations posted here.
>
> any suggestions for improvement?

I can't make any specific recommendations about using BeautifulSoup since I haven't used

Re: [Tutor] extract uri from beautiful soup string

2012-10-15 Thread Norman Khine
On Mon, Oct 15, 2012 at 2:02 AM, Sander Sweers wrote:
> Sander Sweers wrote on Mon 15-10-2012 at 02:35 [+0200]:
>> > On Mon, Oct 15, 2012 at 12:12 AM, Sander Sweers wrote:
>> > > Norman Khine wrote on Sun 14-10-2012 at 23:10 [+0100]:
>> > Norman Khine wrote on Mon 15-10-2012 at 00:17 [+0

Re: [Tutor] extract uri from beautiful soup string

2012-10-14 Thread Sander Sweers
Sander Sweers wrote on Mon 15-10-2012 at 02:35 [+0200]:
> > On Mon, Oct 15, 2012 at 12:12 AM, Sander Sweers wrote:
> > > Norman Khine wrote on Sun 14-10-2012 at 23:10 [+0100]:
> > Norman Khine wrote on Mon 15-10-2012 at 00:17 [+0100]:
> > i tried this: http://pastie.org/5059153

Btw, if I

Re: [Tutor] extract uri from beautiful soup string

2012-10-14 Thread Sander Sweers
Please don't top post.

> On Mon, Oct 15, 2012 at 12:12 AM, Sander Sweers wrote:
> > Norman Khine wrote on Sun 14-10-2012 at 23:10 [+0100]:
> >> One thing is that when I try to write the assoc_data into a CSV file,
> >> it croaks on
> >>
> >> UnicodeEncodeError: 'ascii' codec can't encode char

Re: [Tutor] extract uri from beautiful soup string

2012-10-14 Thread Norman Khine
i tried this: http://pastie.org/5059153 but now i get a

Traceback (most recent call last):
  File "nimes_extract.py", line 75, in <module>
    c.writerow([item.encode("UTF-8")])
TypeError: 'NoneType' object is not callable

On Mon, Oct 15, 2012 at 12:12 AM, Sander Sweers wrote:
> Norman Khine wrote

Re: [Tutor] extract uri from beautiful soup string

2012-10-14 Thread Sander Sweers
Norman Khine wrote on Sun 14-10-2012 at 23:10 [+0100]:
> One thing is that when I try to write the assoc_data into a CSV file,
> it croaks on
>
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xc7' in position
> 0:

It looks like Python is doing an implicit decode/encode on one of y
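
The error quoted above can be reproduced and avoided in a few lines. The sketch below is an illustration, not the code from the thread: it is written for Python 3 (the thread uses Python 2, where each item would instead be encoded explicitly before writerow), and the sample rows and the temporary file are assumptions made for the example.

```python
import csv
import os
import tempfile

# Hypothetical sample values; u'\xc7' is the character from the traceback
rows = [u"Social", u"\xc7a va"]

# Encoding to ASCII fails for any character outside the 7-bit range
try:
    rows[1].encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Let the file object do the encoding: open it as UTF-8 text
fd, path = tempfile.mkstemp(suffix=".csv")
os.close(fd)
with open(path, "w", encoding="utf-8", newline="") as handle:
    writer = csv.writer(handle)
    for item in rows:
        writer.writerow([item])

# Read the raw bytes back to confirm the UTF-8 encoding on disk
with open(path, "rb") as handle:
    raw = handle.read()
os.remove(path)

print(ascii_ok)
print(raw)
```

The point is to pick the encoding explicitly in one place, rather than relying on an implicit ASCII conversion somewhere in the write path.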

Re: [Tutor] extract uri from beautiful soup string

2012-10-14 Thread Norman Khine
Hi, thanks, i changed the code to http://pastie.org/5059153

One thing is that when I try to write the assoc_data into a CSV file, it croaks on

UnicodeEncodeError: 'ascii' codec can't encode character u'\xc7' in position 0:

here is some sample data from the print:

[u'Social', u'Action9', u'ash-ni...

Re: [Tutor] extract uri from beautiful soup string

2012-10-14 Thread Steven D'Aprano
On 15/10/12 05:05, Norman Khine wrote:

    for page_no in pages:
        [...]
        try:
            urllib2.urlopen(req)
        except urllib2.URLError, e:
            pass
        else:
            # do something with the page
            doc = urllib2.urlopen(req)

This is a bug. J
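
The quoted loop calls urlopen once inside the try and a second time in the else branch, so every page is requested twice and the second request is not protected by the except clause. One possible restructuring is to keep the response from the first call. This is a sketch under assumptions: it uses the Python 3 module names (urllib.request and urllib.error rather than urllib2), and fetch, fake_opener and the example.invalid URLs are hypothetical names introduced so the example runs without network access.

```python
import io
from urllib.error import URLError
from urllib.request import urlopen


def fetch(url, opener=urlopen):
    """Open url exactly once; return the body bytes, or None on failure."""
    try:
        response = opener(url)  # keep the response instead of discarding it
    except URLError:
        return None
    else:
        return response.read()


def fake_opener(url):
    """Hypothetical stand-in for urlopen, used only for this demonstration."""
    if "bad" in url:
        raise URLError("simulated failure")
    return io.BytesIO(b"<html>ok</html>")


good = fetch("http://example.invalid/good", opener=fake_opener)
bad = fetch("http://example.invalid/bad", opener=fake_opener)
print(good)
print(bad)
```

In the real loop, fetch(req) would be called with the default opener and the loop would skip pages for which it returns None.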

Re: [Tutor] extract uri from beautiful soup string

2012-10-14 Thread Norman Khine
ignore, i got it:

    get_url = re.compile(r"""window.open\('(.*)','','toolbar=0,""", re.DOTALL).findall
    ...
    get_onclick = str(soup('a')[0]['onclick']) # get the 'onclick' attribute
    urls = get_url(get_onclick)
    print assoc_name
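
For readers who want to try the pattern above in isolation, here is a small self-contained sketch. The onclick text is a hypothetical sample, since the real attribute value is not shown in the thread. The sketch also makes two changes that are not in the original: the dot in window.open is escaped so it matches only a literal dot, and the group is non-greedy so it stops at the first closing quote.

```python
import re

# Hypothetical onclick value; the page's real attribute text is not shown
sample_onclick = (
    "window.open('http://example.org/assoc?id=1','','toolbar=0,width=600')"
)

# Bind findall directly, as in the thread, so get_url behaves like a function
get_url = re.compile(
    r"window\.open\('(.*?)','','toolbar=0,", re.DOTALL
).findall

urls = get_url(sample_onclick)
print(urls)
```

findall returns a list of the captured groups, so urls holds one string per window.open call found in the attribute.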

[Tutor] extract uri from beautiful soup string

2012-10-14 Thread Norman Khine
hello, i have this code:

    #!/usr/local/bin/python
    # -*- coding: utf-8 -*-
    import re
    import urllib2
    import BeautifulSoup

    origin_site = 'http://DOMAIN.TLD/index.php?id=annuaire_assos&theme=0&rech=&num_page='
    pages = range(1,3)

    for page_no in pages:
        print '== %s' % page_no
        re