On 11/28/2011 04:28 AM, Mark Lybrand wrote:
Okay, so I just started to learn Python. I have been working through Dive
Into Python 3 and the Google stuff (great exercises IMHO, totally fun).
However, with Dive, I had an issue with him referencing the files in the
example directory, which seemed very unhandy to get from the website.
Although I have since stumbled upon his GitHub, I made a Python script to
grab those files for me, and it works great, except that it doubles the
line spacing. So here is my code. I hope you critique the heck out of me
and that you point out what I did wrong to introduce double line-spacing.
Thanks a bunch:
import os
import urllib.request
import re

url_root = 'http://diveintopython3.ep.io/examples/'
file_root = os.path.join(os.path.expanduser("~"), "diveintopython3",
                         "examples")

main_page = urllib.request.urlopen(url_root).read()
main_page = main_page.decode("utf-8")

pattern = 'href="([^"].*?.)(py|xml)"'
matches = re.findall(pattern, main_page)

for my_tuple in matches:
    this_file = my_tuple[0] + my_tuple[1]
    data = urllib.request.urlopen(url_root + this_file).read()
    data = data.decode("utf-8")
    with open(os.path.join(file_root, this_file), mode='w',
              encoding='utf-8') as a_file:
        a_file.write(data)
You don't say what your environment is, nor how you decided that the
file is double-spaced. You also don't mention whether you're using
Python 2.x or 3.x.
My guess is that you are using a Unix/Linux environment, that the Dive
author(s) used Windows, and that your text editor is interpreting each
cr/lf pair (hex 0d 0a) as two line endings. I believe emacs would have
ignored the redundant cr. Python likewise probably won't care, though
I'm not positive about things like lines that continue across newline
boundaries.
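A quick way to see why a cr/lf file can look double-spaced in one tool and normal in another is to split the same text two ways. This is a Python 3 sketch (3.8+ for bytes.hex() with a separator) on made-up sample text; the regex split only stands in for an editor that treats cr and lf as two separate line endings, it is not how any particular editor actually works.

```python
import re

# Windows-style sample text: every line ends with the cr/lf pair (hex 0d 0a).
text = "line one\r\nline two\r\n"

# Show the raw bytes in hex; each line ends in "0d 0a".
print(text.encode("ascii").hex(" "))

# str.splitlines() treats "\r\n" as a single line boundary...
print(text.splitlines())

# ...while treating "\r" and "\n" as two separate line endings
# yields an empty "line" between them, i.e. the double spacing.
print(re.split(r"[\r\n]", text))
```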
You can figure out what is actually in the file by using repr() on bytes
read from the file in binary mode. Exactly how you do that will differ
between Python 2.x and 3.x.
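For Python 3, something like the following would do it. It's a sketch, not a polished tool: raw_lines() is a hypothetical helper name, and the demo writes a throwaway sample file; in practice you would pass the path of one of the downloaded examples. A line that ends in b'\r\n' means Windows-style endings; b'\r\r\n' would mean an extra cr got added somewhere along the way.

```python
import os
import tempfile


def raw_lines(path, limit=5):
    """Return up to `limit` lines of the file at `path`, as raw bytes."""
    lines = []
    # Binary mode ("rb") does no newline translation, so any b"\r"
    # bytes in the file survive and show up in repr().
    with open(path, "rb") as f:
        for line in f:
            if len(lines) >= limit:
                break
            lines.append(line)
    return lines


# Demo on a small sample file with Windows-style line endings.
with tempfile.NamedTemporaryFile("wb", suffix=".py", delete=False) as tmp:
    tmp.write(b"import os\r\nprint(os.sep)\r\n")
for line in raw_lines(tmp.name):
    print(repr(line))
os.remove(tmp.name)
```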
As for fixing it, you could either just use one of the dos2unix
utilities kicking around (one's available on my Ubuntu from the Synaptic
package manager), or you could make your utility manage it. On a
regular file open, there's a mode parameter where you can use "U", or
better "rU", to ask for universal newlines. It's intended to handle any
of the three common line endings (\n, \r\n, and \r) and present all
three as a simple newline. (That's Python 2; in Python 3, opening a file
in text mode for reading already does this by default.) I don't know
whether urlopen() also has that option, but if not, you can always copy
the file after you have it locally.
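Here is one way the "copy it after you have it locally" approach might look in Python 3. This is a sketch under assumptions: copy_with_unix_newlines() is a hypothetical name, the files are assumed to be UTF-8, and the demo uses a throwaway sample file rather than the real Dive examples.

```python
import os
import tempfile


def copy_with_unix_newlines(src, dst, encoding="utf-8"):
    """Copy a text file, converting cr/lf and lone cr line endings to lf."""
    # Python 3 text-mode reads use universal newlines by default
    # (newline=None): "\r\n", "\r" and "\n" all come back as "\n".
    with open(src, encoding=encoding) as fin:
        text = fin.read()
    # newline="\n" writes each "\n" unchanged on every platform,
    # instead of translating it to os.linesep ("\r\n" on Windows).
    with open(dst, "w", encoding=encoding, newline="\n") as fout:
        fout.write(text)


# Demo: a sample file mixing Windows (\r\n), old Mac (\r) and Unix (\n) endings.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "dos.txt")
dst = os.path.join(workdir, "unix.txt")
with open(src, "wb") as f:
    f.write(b"one\r\ntwo\rthree\n")
copy_with_unix_newlines(src, dst)
with open(dst, "rb") as f:
    print(repr(f.read()))
```

Inside the original download loop, a comparable (untested) fix would be to normalize the decoded string before writing, e.g. data = data.replace("\r\n", "\n"), and pass newline="\n" to the open() call so nothing gets translated back on write.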
--
DaveA
_______________________________________________
Tutor maillist - Tutor@python.org