With Steven's help with the writing and Peter's help with importing codecs, and once I used \r\n instead of \r to give me new lines, everything worked. I just thought that only \n would be necessary?

Thanks.
Tommy
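
For anyone reading along, here is a minimal sketch of how the pieces discussed in this thread fit together: joining the cells with "#" so that no extra spaces appear, writing through a UTF-8 encoded file, and ending each line with an explicit \r\n. The likely reason \n alone was not enough is that codecs.open always opens the file in binary mode, so on Windows \n is not translated to \r\n on writing. The sketch makes some assumptions: the rows list is a hand-typed stand-in for the scraped table cells, format_row and save_rows are hypothetical helper names, and io.open (available in Python 2.6+ and in Python 3) is swapped in for codecs.open so that the newline handling is explicit.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical stand-in for the scraped table: one list of cell texts per
# <tr>, as col.string would give for each <td> in the real script.
rows = [
    [u"Kommunenr", u"Kommune", u"Region", u"Regionsnr"],
    [u"101", u"København", u"Hovedstaden", u"1084"],
    [u"147", u"Frederiksberg", u"Hovedstaden", u"1084"],
]


def format_row(cells):
    # Join the cells with "#" only; unlike a comma-separated print
    # statement, join() adds no spaces around the separator.
    return u"#".join(cells)


def save_rows(rows, path):
    # encoding="utf-8" lets the non-ASCII characters be written safely.
    # newline="" switches off newline translation, so the explicit
    # "\r\n" below ends up in the file exactly as written.
    with io.open(path, "w", encoding="utf-8", newline="") as f:
        for cells in rows:
            f.write(format_row(cells) + u"\r\n")


# Write to a scratch directory rather than the current directory.
out_path = os.path.join(tempfile.mkdtemp(), "tabeltest_demo.txt")
save_rows(rows, out_path)
```

In the real script, each row would presumably be built with something like [col.string for col in tr.findAll('td')] instead of the hand-typed lists.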

> -----Original message-----
> From: tutor-bounces+tommy.kaas=kaasogmulvad...@python.org
> [mailto:tutor-bounces+tommy.kaas=kaasogmulvad...@python.org] On
> behalf of Peter Otten
> Sent: 29 December 2010 11:46
> To: tutor@python.org
> Subject: Re: [Tutor] scraping and saving in file
>
> Tommy Kaas wrote:
>
> > I’m trying to learn basic web scraping and starting from scratch. I’m
> > using ActivePython 2.6.6
> >
> > I have uploaded a simple table on my web page and try to scrape it and
> > will save the result in a text file. I will separate the columns in
> > the file with #.
> >
> > It works fine but besides # I also get spaces between the columns in
> > the text file. How do I avoid that?
> >
> > This is the script:
> >
> > import urllib2
> > from BeautifulSoup import BeautifulSoup
> > f = open('tabeltest.txt', 'w')
> > soup = BeautifulSoup(urllib2.urlopen('http://www.kaasogmulvad.dk/unv/python/tabeltest.htm').read())
> >
> > rows = soup.findAll('tr')
> >
> > for tr in rows:
> >     cols = tr.findAll('td')
> >     print >> f, cols[0].string,'#',cols[1].string,'#',cols[2].string,'#',cols[3].string
> >
> > f.close()
> >
> > And the text file looks like this:
> >
> > Kommunenr # Kommune # Region # Regionsnr
> > 101 # København # Hovedstaden # 1084
> > 147 # Frederiksberg # Hovedstaden # 1084
> > 151 # Ballerup # Hovedstaden # 1084
> > 153 # Brøndby # Hovedstaden # 1084
>
> The print statement automatically inserts spaces, so you can either resort to
> the write method
>
> for i in range(4):
>     if i:
>         f.write("#")
>     f.write(cols[i].string)
>
> which is a bit clumsy, or you build the complete line and then print it as a
> whole:
>
> print >> f, "#".join(col.string for col in cols)
>
> Note that you have non-ascii characters in your data -- I'm surprised that
> writing to a file works for you.
> I would expect that
>
> import codecs
> f = codecs.open("tmp.txt", "w", encoding="utf-8")
>
> is needed to successfully write your data to a file
>
> Peter
>
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor