On 11/28/2011 04:28 AM, Mark Lybrand wrote:
Okay, so I just started to learn Python.  I have been working through Dive
Into Python 3 and the Google stuff (great exercises IMHO, totally fun).
However, with Dive, I had an issue with him referencing the files in the
example directory, which seem very unhandy to get at from the website.  Although I
have since stumbled upon his GitHub, I made a Python script to grab those
files for me and it works great, with the exception of doubling the line
spacing.  So here is my code. I hope you critique the heck out of me and
that you point out what I did wrong to introduce double line spacing.
Thanks a bunch:

import os
import urllib.request
import re

url_root = 'http://diveintopython3.ep.io/examples/'
file_root = os.path.join(os.path.expanduser("~"), "diveintopython3",
                         "examples")

main_page = urllib.request.urlopen(url_root).read()
main_page = main_page.decode("utf-8")

pattern = 'href="([^"].*?.)(py|xml)"'
matches = re.findall(pattern, main_page)
for my_tuple in matches:
    this_file = my_tuple[0] + my_tuple[1]
    data = urllib.request.urlopen(url_root + this_file).read()
    data = data.decode("utf-8")
    with open(os.path.join(file_root, this_file), mode='w',
              encoding='utf-8') as a_file:
        a_file.write(data)

You don't say what your environment is, nor how you determined that the file is double-spaced. You also don't mention whether you're using Python 2.x or 3.x.

My guess is that you are using a Unix/Linux environment, that the Dive author(s) used Windows, and that your text editor is interpreting the cr/lf pair (hex 0d 0a) as two line endings. I believe emacs would have ignored the redundant cr. Python likewise probably won't care, though I'm not positive about things like lines that continue across newline boundaries.
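A rough illustration of that guess, in Python 3 (the sample string is made up): an editor that treats a cr and an lf each as a separate line break would show a blank line after every real one, whereas Python's str.splitlines() counts the cr/lf pair as a single line ending.

```python
import re

# Made-up sample text with Windows-style cr/lf line endings.
sample = "first line\r\nsecond line\r\n"

# Splitting on "\r" and "\n" separately mimics an editor that treats
# each one as a line break: an empty "line" appears after every real one.
naive = re.split(r"[\r\n]", sample)

# str.splitlines() treats the "\r\n" pair as one line ending.
proper = sample.splitlines()

print(naive)   # ['first line', '', 'second line', '', '']
print(proper)  # ['first line', 'second line']
```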

You can figure out what is actually in the file by using repr() on bytes read from the file in binary mode. Exactly how you do that differs between Python 2.x and 3.x.
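For Python 3, something along these lines should do it. It's a sketch: inspect_line_endings is a made-up helper name, and the temporary file just stands in for one of your downloaded examples.

```python
import os
import tempfile

def inspect_line_endings(path, preview=60):
    """Hypothetical helper: show the raw bytes and count line-ending styles."""
    # Binary mode does no newline translation, so any "\r" bytes stay visible.
    with open(path, "rb") as f:
        data = f.read()
    print(repr(data[:preview]))
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,
        "lone_cr": data.count(b"\r") - crlf,
        "lone_lf": data.count(b"\n") - crlf,
    }

# Usage with a throwaway file standing in for a downloaded example.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"import os\r\nprint(os.sep)\r\n")
    path = tmp.name
counts = inspect_line_endings(path)  # prints b'import os\r\nprint(os.sep)\r\n'
os.remove(path)
print(counts)  # {'crlf': 2, 'lone_cr': 0, 'lone_lf': 0}
```

If the crlf count is nonzero on your Linux box, that would support the cr/lf theory.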

As for fixing it, you could either just use one of the dos2unix utilities kicking around (one's available on my Ubuntu from the Synaptic package manager), or you could make your utility handle it. On a regular file open, there's a mode parameter, and you can use "U", or better "rU", to ask for universal newline support (that's the Python 2 spelling; in Python 3, text-mode reads do this by default). It's intended to handle any of the three common line endings and turn all three into a simple newline. I don't know whether urlopen() also has that option, but if not, you can always copy the file after you have it locally.
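Here is a minimal Python 3 sketch of doing the conversion yourself, assuming UTF-8 files. The function names normalize_newlines and convert_file_to_unix are made up for illustration; the first works on the decoded string in memory, the second rewrites a local copy in the spirit of dos2unix.

```python
import os
import tempfile

def normalize_newlines(text):
    """Turn CR/LF pairs and lone CR characters into LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")

def convert_file_to_unix(path, encoding="utf-8"):
    """Rewrite a local file in place with LF endings (a dos2unix-style sketch)."""
    # Text mode with the default newline=None is universal newlines:
    # "\r\n" and a lone "\r" are both read back as "\n".
    with open(path, encoding=encoding) as f:
        text = f.read()
    # newline="\n" writes "\n" as-is instead of translating it to os.linesep.
    with open(path, "w", encoding=encoding, newline="\n") as f:
        f.write(text)

# Usage with a throwaway file that mixes all three line-ending styles.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"one\r\ntwo\rthree\n")
    path = tmp.name
convert_file_to_unix(path)
with open(path, "rb") as f:
    converted = f.read()
os.remove(path)
print(repr(converted))  # b'one\ntwo\nthree\n'
```

In your download loop, calling a_file.write(normalize_newlines(data)) on a file opened with newline="\n" should give LF-only files regardless of platform.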


--

DaveA

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
