Re: [Tutor] Critique and Question

Dave Angel Mon, 28 Nov 2011 04:29:07 -0800

On 11/28/2011 04:28 AM, Mark Lybrand wrote:

Okay, so I just started to learn Python. I have been working through Dive
Into Python 3 and the Google stuff (great exercises IMHO, totally fun).
However, with Dive, I had an issue with him referencing the files in the
example directory, which seem very unhandy to get at from the website. Although I
have since stumbled upon his GitHub, I made a Python script to grab those
files for me, and it works great, with the exception of doubling the line
spacing. So here is my code. I hope you critique the heck out of me and
that you point out what I did wrong to introduce double line-spacing.
Thanks a bunch:


import os
import urllib.request
import re

url_root = 'http://diveintopython3.ep.io/examples/'
file_root = os.path.join(os.path.expanduser("~"), "diveintopython3",
                         "examples")

main_page = urllib.request.urlopen(url_root).read()
main_page = main_page.decode("utf-8")

# Raw string with an escaped dot, so only a literal "." before py/xml matches.
pattern = r'href="([^"]+?\.)(py|xml)"'
matches = re.findall(pattern, main_page)
for my_tuple in matches:
    this_file = my_tuple[0] + my_tuple[1]
    data = urllib.request.urlopen(url_root + this_file).read()
    data = data.decode("utf-8")
    with open(os.path.join(file_root, this_file), mode='w',
              encoding='utf-8') as a_file:
        a_file.write(data)

You don't tell us what your environment is, nor how you decided that the file is double-spaced. You also don't mention whether you're using Python 2.x or 3.x.

My guess is that you are using a Unix/Linux environment, that the Dive author(s) used Windows, and that your text editor is interpreting the cr/lf pair (hex 0d 0a) as two line endings. I believe emacs would have ignored the redundant cr. Python likewise probably won't care, though I'm not positive about things like lines that continue across newline boundaries.
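To make that guess concrete, here is a small sketch. The string below is a made-up stand-in for one downloaded file, and the "naive editor" is only modelled by splitting on either CR or LF; no real editor's behaviour is being reproduced.

```python
import re

# Hypothetical contents of one downloaded file, with Windows-style CR/LF endings.
data = "x = 1\r\ny = 2\r\n"

# A reader that treats CR and LF as two separate line breaks
# (modelled here by splitting on either character) sees an
# empty line after every real one.
naive_lines = re.split(r"\r|\n", data)

# str.splitlines() treats the CR/LF pair as a single line boundary.
proper_lines = data.splitlines()

print(naive_lines)   # ['x = 1', '', 'y = 2', '', '']
print(proper_lines)  # ['x = 1', 'y = 2']
```

The empty strings in the first list are the phantom blank lines you would see.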

You can figure out what is actually in the file by using repr() on bytes read from the file in binary mode. Exactly how you do that will differ between Python 2.x and 3.x.
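In Python 3 that could look roughly like the sketch below. The helper name show_raw() is made up for illustration, and the demo writes its own throwaway file; you would pass it the path of one of your downloaded files instead.

```python
import os
import tempfile


def show_raw(path, limit=80):
    """Return repr() of the first `limit` bytes of a file, read in binary mode."""
    # "rb" skips all newline translation, so CR (0x0d) and LF (0x0a) show up
    # exactly as stored: as \r and \n in the repr.
    with open(path, "rb") as f:
        return repr(f.read(limit))


# Demo on a throwaway file; pass one of your downloaded files instead.
sample = os.path.join(tempfile.mkdtemp(), "sample.py")
with open(sample, "wb") as f:
    f.write(b"x = 1\r\ny = 2\r\n")
print(show_raw(sample))  # b'x = 1\r\ny = 2\r\n'
```

If you see \r\n pairs in the output, the file itself has Windows line endings; a lone \n means the doubling happens elsewhere.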

As for fixing it, you could either just use one of the dos2unix utilities kicking around (one's available on my Ubuntu from the Synaptic package manager), or you could make your utility manage it. On a regular file open, there's a mode parameter that can take "U", or better "rU", to request Universal newline mode. It's intended to handle any of the three common line endings (lf, cr/lf, and a lone cr) and use a simple newline for all three cases. In Python 3, text-mode reads already do this by default (newline=None), and the "U" flag is deprecated (it was removed in 3.11). urlopen() has no such option, since read() hands back raw bytes, but you can normalize the text yourself after decoding it, or copy the file after you have it locally.
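Here is a minimal Python 3 sketch of having the script manage it, assuming you want plain LF endings in every saved file. The names normalize_newlines(), save_text() and dos2unix_file() are hypothetical helpers, not part of any library.

```python
import os
import tempfile


def normalize_newlines(text):
    """Turn CR/LF pairs and lone CRs into plain LF newlines."""
    # Replace the two-character pair first so it is not turned into two LFs.
    return text.replace("\r\n", "\n").replace("\r", "\n")


def save_text(path, data):
    """Write decoded text to path with LF line endings on any platform."""
    # newline="\n" makes the text layer write "\n" unchanged, so Windows
    # does not expand it back into a CR/LF pair.
    with open(path, mode="w", encoding="utf-8", newline="\n") as a_file:
        a_file.write(normalize_newlines(data))


def dos2unix_file(path):
    """Rewrite an already-downloaded file in place with LF endings."""
    # Text-mode reads use universal newlines by default (newline=None),
    # so CR/LF and lone CR both arrive as "\n".
    with open(path, encoding="utf-8") as f:
        text = f.read()
    save_text(path, text)


if __name__ == "__main__":
    # Demo on a throwaway file holding CR/LF endings.
    demo = os.path.join(tempfile.mkdtemp(), "demo.py")
    with open(demo, "wb") as f:
        f.write(b"x = 1\r\ny = 2\r\n")
    dos2unix_file(demo)
    with open(demo, "rb") as f:
        print(repr(f.read()))  # b'x = 1\ny = 2\n'
```

In the download loop, save_text(os.path.join(file_root, this_file), data) would stand in for the with open(...) block; dos2unix_file() covers files you have already fetched.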



--

DaveA

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
