Re: [Tutor] Python Tutorials: How to create useful programs after learning the syntax?

2009-07-06 Thread Luke Paireepinart

Luis Galvan wrote:
> Hello all, this is my first time using a mailing list, so I'm not sure
> if I'm doing this right!

Everything's fine except perhaps your formatting: it's easier to read
e-mails that are delineated into paragraphs rather than just a single
block of text.  That may be a problem on my end, though; I'm not sure.

> If anyone has any idea on how I can really get started with
> programming useful programs, please do let me know!  Any help would be
> immensely appreciated!

What sorts of things do you want to make?  The most successful projects
are usually born of people's desire for a tool that doesn't exist yet.
Or have you always wanted to make a video game?  Those can be very
satisfying first projects.  There are very good tutorials available for
beginning game development with Python + Pygame; also look into Pyglet.


One thing games do well is let you make that leap of understanding from 
just manipulating variables and lists and such to actually seeing the 
bigger picture and having those manipulations just be the 
behind-the-scenes implementation details to get to what you are really 
trying to do.

___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


[Tutor] int to bytes and the other way around

2009-07-06 Thread Timo
I have written a program that uses a C++ module as its backend. Now I have
found out that I can use Python to call the underlying C library directly.
That's nice, so I don't need to Popen() the C++ module.


I have a problem, though, with some info that is returned (always an integer).
I'll try to explain a bit; this is what I have found out so far.
There are 4 options in the program. The first 3 options go up to 18 and 
the fourth to 7.

If all options are set to 0, the returned int is 0.
If the first option is set to a value from 1 to 18, that same value is what I get returned.
However, for option 2, I get 256, 512, 768, 1024, etc.
For option 3 I get 65536, 131072, 196608, etc, etc.
And for option 4: 16777216, 33554432, etc.

Ok, that's nice so far. But if option 1 is set to 4 and option 2 is set 
to 8 and option 3 is set to 10 (for example), I get this returned: 657412
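As a quick check of the arithmetic (assuming, as the C snippet below indicates, that option 1 sits in the lowest byte), each option is worth 256 to the power of its byte position:

```python
# Option values from the example above; option4 is not set, so it is 0
option1, option2, option3, option4 = 4, 8, 10, 0

# Each option occupies its own byte, and byte k is worth 256**k
value = option1 + option2 * 256 + option3 * 256**2 + option4 * 256**3
print(value)  # 657412
```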


The C++ module counts the bytes. First byte = option1, second byte = 
option2 etc.

   u8 *options = (u8 *)&result[1];
   option1 = options[0]
   option2 = options[1]
   option3 = options[2]
   option4 = options[3]

How will I approach this in Python?


Re: [Tutor] Python Tutorials: How to create useful programs afterlearning the syntax?

2009-07-06 Thread Alan Gauld

"Luis Galvan"  wrote


> Hello all, this is my first time using a mailing list, so I'm not sure if
> I'm doing this right!

Yep. You send a mail to the list, we reply. Easy :-)
One thing to remember is when you reply use "ReplyAll"
on your mail tool, not simple Reply.

> of programming. (I'm new to programming)  What I'm looking for is a tutorial
> series that walks you through the steps of writing a useful program in
> Python.

Try the Case Study in my tutorial; it takes you through what is
supposed to be the typical evolution of an idea from a simple concept
(a word counter) to something grander (a grammar counter), then
adding a GUI front end. It is still somewhat contrived, since it is a
beginner's tutorial after all, but it might give you some ideas.

There are also two rolling examples throughout the tutorial:
- a simple multiplication table printer in the early topics
- an address book which extends all the way to a full client-server
database driven program. (Eventually it will have a GUI and web
front end too! :-)



> Whether it be a text editor, a simple browser, etc,


Neither of those is particularly "simple"! You probably want to
moderate your expectations for your first projects.


> teach me how I can use everything I learned about manipulating strings,
> creating classes, and importing modules and how to stir it all up into
> something meaningful.


Hopefully my case study covers all of those.


--
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/ 





Re: [Tutor] append question

2009-07-06 Thread Steven Buck
Thanks for the previous responses.  This isn't homework--I'm beyond
coursework, although I am a newbie to Python (and I've never had to do much
real programming since I've just used Stata for econometric analysis).  I'm
testing Python as a more powerful alternative to Stata.

I've learned from the responses I received, although I now see my problem
differently.  The data structure I have uses a dictionary, and I now know
that the append command doesn't work on it.  Having said that, perhaps my
variables of interest have already been created; perhaps I just don't know
how to identify them.  I've been using some borrowed code to get me started;
my modified version is below:


import sys

# The modules below help me get a .dta file into Python.
# Although I'm not sure what form they take; I suppose a list of lists???
from StataTools import Reader
from StataTypes import MissingValue

# I call my data set the psid (Panel Study of Income Dynamics)
# In Stata this would look like an NxK matrix (N observations and K variables)
psid=Reader(file('data3.dta'))

# I gather this next just creates a list of the variable names.
varnames=[x.name for x in psid.variables()]

# It's not clear what these next two lines gain me.
labels=psid.file_headers()['vlblist']
Labels=dict(zip(varnames,labels))


From here, I'd like Python to identify the Nx1 vectors (or n-tuples) that
correspond to the varnames list defined above.  I can't seem to grab the
vectors representing age, wage, etc.  I've tried things like
age, psid['age'], psid.age.  My last email was an attempt to create the
vectors myself, although the Reader module puts the data in a dictionary
structure, so the append command I was trying to use doesn't work.

Hopefully once I learn to create and call on my own vectors and matrices
I'll be better off--I'm comfortable working with these in MATLAB and Stata.

Bottom line:  Given the above data I've imported/extracted from Stata .dta
file, how do I create an Nx1 vector which I call 'age'?

Thanks for your patience with this newbie.
Steve




On Sun, Jul 5, 2009 at 5:19 PM, Rich Lovely wrote:


> 2009/7/5 Steven Buck :
> > for i in len(test):
> >     testvar2.append(test[i][2])
> >
> > I want testvar2 = [2,5,8] but instead I get the following error message:
> >
> > Traceback (most recent call last):
> >   File "", line 1, in 
> > for i in len(test):
> > TypeError: 'int' object is not iterable
> >
> > Any insight would be appreciated.
> > Thanks
> > Steve
> > --
> > Steven Buck
> > Ph.D. Student
> > Department of Agricultural and Resource Economics
> > University of California, Berkeley
>
>
> This sounds like a homework assignment, and we're not supposed to give out
> answers to homework.
>
> The error message and the docs explain what you're doing wrong if you
> take a moment to look.
> from http://www.python.org/doc/2.6/reference/compound_stmts.html#for
>
> """for_stmt ::=  "for" target_list "in" expression_list ":" suite
>  ["else" ":" suite]
>
> The expression list is evaluated once; it should yield an iterable
> object. An iterator is created for the result of the expression_list.
> The suite is then executed once for each item provided by the
> iterator, in the order of ascending indices. Each item in turn is
> assigned to the target list using the standard rules for assignments,
> and then the suite is executed."""
>
> As Luke said, len returns an int, which as your error tells you, is
> not iterable.  From the same page:
> """The for statement is used to iterate over the elements of a
> sequence (such as a string, tuple or list) or other iterable
> object:"""
>
> Therefore you have an iterable, there is no need to try and construct a new
> one.
>
> Does that help?
>
> It is extremely unpythonic to iterate over range(len(...)), as it adds
> in the overhead of two function calls, and ruins the readability of
> code.  The latter is probably the most important of the two.
>
> An even more pythonic way to do this would be a list comprehension,
>
> http://www.python.org/doc/2.6/tutorial/datastructures.html#list-comprehensions
>
> If it's not homework, let us know, and we'll be more than willing to
> give you code if you still need it.
>
> --
> Richard "Roadie Rich" Lovely, part of the JNP|UK Famile
> www.theJNP.com 
>
>
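For readers following along, a minimal sketch of the two fixes Rich describes. The question never shows the full `test` list, so the data below is made up so that the expected result is `[2, 5, 8]`:

```python
# Made-up data: column 2 of each row is 2, 5 and 8
test = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]

# len(test) is an int and cannot be iterated over,
# but the list itself can: each pass hands over one row
testvar2 = []
for row in test:
    testvar2.append(row[2])

# The same result as a list comprehension
testvar2_lc = [row[2] for row in test]

print(testvar2)     # [2, 5, 8]
print(testvar2_lc)  # [2, 5, 8]
```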


Re: [Tutor] append question

2009-07-06 Thread Kent Johnson
> 2009/7/6 Steven Buck :

>> # I call my data set the psid (Panel Study of Income Dynamics)
>> # In Stata this would look like an NxK matrix (N observations and K
>> # variables)
>> psid=Reader(file('data3.dta'))
>>
>> # I gather this next just creates a list of the variable names.
>> varnames=[x.name for x in psid.variables()]

Yes, but psid.variables() is already a list of variable names, so you
could just say
varnames = psid.variables()

>>  From here, I'd like Python to identify the Nx1 vectors (or n-tuples) that
>> correspond to the varnames list defined above.  I can't seem grab the
>> vectors representing age, wage, etc..  I've tried things like
>> age, psid['age'], psid.age.  My last email was an attempt to create the
>> vectors myself, although the Reader module puts the data in a dictionary
>> structure so the append command I was trying to use doesn't work.

psid.dataset() is the list of lists that you need to start. Try this:

data = psid.dataset()
ages = [ item[0] for item in data ]
wages = [ item[1] for item in data ]

This way of making a list is called a list comprehension; you can read
about them here:
http://docs.python.org/tutorial/datastructures.html#list-comprehensions
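A small sketch of Kent's suggestion, extended with `zip(*data)` to pull out every column at once and look it up by name. It assumes, as Kent's reply implies, that `psid.dataset()` returns a list of rows whose columns follow the order of the variable names; the names and numbers here are invented stand-ins:

```python
# Hypothetical stand-ins for psid.variables() names and psid.dataset() rows
varnames = ['age', 'wage', 'educ']
data = [
    [25, 12.50, 12],
    [40, 20.00, 16],
    [33, 15.75, 14],
]

# Kent's approach: one list comprehension per column
ages = [row[0] for row in data]

# zip(*data) transposes rows into columns; pairing them with the
# variable names gives a dict of column lists keyed by name
columns = dict((name, list(col)) for name, col in zip(varnames, zip(*data)))

print(ages)             # [25, 40, 33]
print(columns['wage'])  # [12.5, 20.0, 15.75]
```

`columns['age']` then plays the role of the Nx1 vector asked about, as a plain Python list.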

Kent


Re: [Tutor] int to bytes and the other way around

2009-07-06 Thread Kent Johnson
On Mon, Jul 6, 2009 at 4:48 AM, Timo wrote:
> I have written a program that uses a C++ module as backend. Now I found out
> that I can use Python to call an underneath C lib. That's nice, so I don't
> need to Popen() the C++ module.
>
> I have a problem though with some info that is returned (always an integer).
> I'll try to explain a bit, this is what I found out so far.
> There are 4 options in the program. The first 3 options go up to 18 and the
> fourth to 7.
> If all options are set to 0, the returned int is 0.
> If the first option is set from 1 to 18, this is what I get returned.
> However, for option 2, I get 256, 512, 768, 1024, etc.
> For option 3 I get 65536, 131072, 196608, etc, etc.
> And for option 4: 16777216, 33554432, etc.
>
> Ok, that's nice so far. But if option 1 is set to 4 and option 2 is set to 8
> and option 3 is set to 10 (for example), I get this returned: 657412
>
> The C++ module counts the bytes. First byte = option1, second byte = option2
> etc.
>       u8 *options = (u8 *)&result[1];
>       option1 = options[0]
>       option2 = options[1]
>       option3 = options[2]
>       option4 = options[3]
>
> How will I approach this in Python?

Looks like you need the struct module.
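A sketch of what that could look like, assuming the library returns an unsigned 32-bit value with option 1 in the lowest byte (this matches Timo's observations, and is what the C cast produces on little-endian hardware such as x86). The last two lines use `int.to_bytes` and `int.from_bytes`, which exist only in Python 3.2 and later:

```python
import struct

value = 657412  # the example value returned by the library

# '<I' is one little-endian unsigned 32-bit int, '<4B' is four unsigned bytes;
# packing the int and unpacking it as bytes mimics the C cast to u8 *
raw = struct.pack('<I', value)
option1, option2, option3, option4 = struct.unpack('<4B', raw)
print((option1, option2, option3, option4))  # (4, 8, 10, 0)

# The other way around: four option bytes back into one int
packed = struct.pack('<4B', 4, 8, 10, 0)
print(struct.unpack('<I', packed)[0])  # 657412

# Python 3.2+ only: the same conversions are built into int
print(list(value.to_bytes(4, 'little')))               # [4, 8, 10, 0]
print(int.from_bytes(bytes([4, 8, 10, 0]), 'little'))  # 657412
```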

Kent


Re: [Tutor] Poor style to use list as "array"? CORRECTION

2009-07-06 Thread bob gailer

class Coin:

    def __init__(self, name, value, plural=None):
        self.name = name
        self.value = value
        if plural:
            self.plural = plural
        else:
            self.plural = self.name + 's'
        self.count = 0

    def display(self):
        if self.count == 0:
            return None
        if self.count == 1:
            return "%d %s" % (self.count, self.name)
        else:
            return "%d %s" % (self.count, self.plural)

### next line unindented ###

coins = (Coin('dollar', 100), Coin('quarter', 25), Coin('dime', 10),
         Coin('nickel', 5), Coin('penny', 1, 'pennies'))

amnt = 99
buff = []
for coin in coins:
    (coin.count, amnt) = divmod(amnt, coin.value)
    d = coin.display()
    if d:
        buff.append(d)
if len(buff) < 2:
    print(''.join(buff))
else:
    print(', '.join(buff[:-1]) + " and " + buff[-1])

--
Bob Gailer
Chapel Hill NC
919-636-4239


Re: [Tutor] Popen problem with a pipe sign "|"

2009-07-06 Thread hyou
Hi Sander,

Do I post to the list by also replying to Python Tutor List?

Thanks for the answer! I found the problem was because I put the 2nd
argument to Popen with Shell = true. Though I'm not sure why it doesn't work
with Shell = true while the same setting works for other commands.

Thanks again and have a great week!

Cheers,
Shawn

P.S. If you know why some commands work with shell=True and some do
not, please let me know!

-Original Message-
From: Sander Sweers [mailto:sander.swe...@gmail.com] 
Sent: Saturday, July 04, 2009 12:57 PM
To: Shawn Gong
Cc: Python Tutor List
Subject: Re: [Tutor] Popen problem with a pipe sign "|"

Again, you need to also post to the list!!

On Fri, 2009-07-03 at 22:32 -0400, Shawn Gong wrote:
> I see what you mean. However, in my case the | sign simply constitutes
> part of an argument. I'm actually calling devenv.com, which is MS VS2005's
> build command. The whole command looks like:
> "...\devenv.com" solution.sln /build "Debug|Win32"
> If I replace "Debug|Win32" with just Debug, it works fine. But whenever I add
> the |, nothing seems to run.

Ah, ok now I get what your problem is.

I do not know why that would not work. Can you show the actual code you
have? Do you have the command in a list? See below an *untested*
example, does this work?

import subprocess

# Each list item is handed to devenv as one separate argument
command = ['C:\\...\\devenv', 'solution.sln', '/build', 'Debug|Win32']
proc = subprocess.Popen(command,
                        stdout=subprocess.PIPE,
                        stderr=subprocess.PIPE,
                        stdin=subprocess.PIPE)

stdout, stderr = proc.communicate()




Re: [Tutor] Popen problem with a pipe sign "|"

2009-07-06 Thread Sander Sweers
2009/7/6 hyou :
> Do I post to the list by also replying to Python Tutor List?

Yes, thanks.

> Thanks for the answer! I found the problem was because I put the 2nd
> argument to Popen with Shell = true. Though I'm not sure why it doesn't work
> with Shell = true while the same setting works for other commands.

You can read up on subprocess on [1]. I do not understand why this
does not work with shell=True. My best guess is that it gets
interpreted as a pipe.

Greets
Sander

[1] http://docs.python.org/library/subprocess.html


Re: [Tutor] Popen problem with a pipe sign "|"

2009-07-06 Thread Tim Golden

hyou wrote:

> Thanks for the answer! I found the problem was because I put the 2nd
> argument to Popen with Shell = true. Though I'm not sure why it doesn't work
> with Shell = true while the same setting works for other commands.



There's a long-outstanding bug when shell=True is passed to
subprocess.Popen on Windows such that the rest of the line
isn't quoted correctly (i.e. it doesn't cope with special characters
such as space, pipe and ampersand).

In general, you almost never need to pass shell=True on
Windows. The latest docs have just been updated with a
patch I wrote to that effect, but it basically says:
don't use shell=True unless you know you need to.
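To see that advice in action, the sketch below runs a small Python child process as a hypothetical stand-in for `devenv.com` and passes the arguments as a list, without `shell=True`. The child simply echoes the arguments it receives, one per line, showing that `Debug|Win32` arrives intact as one ordinary argument:

```python
import subprocess
import sys

# Stand-in for devenv.com: a Python child that prints its arguments, one per line
child = [sys.executable, '-c', 'import sys; print("\\n".join(sys.argv[1:]))']

# No shell=True: each list item is one argument, and no shell ever sees
# the command line, so "|" has no special meaning
command = child + ['solution.sln', '/build', 'Debug|Win32']
proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                        universal_newlines=True)
stdout, _ = proc.communicate()
print(stdout.splitlines())  # ['solution.sln', '/build', 'Debug|Win32']
```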

TJG


Re: [Tutor] Python Tutorials: How to create useful programs after learning the syntax?

2009-07-06 Thread Robert Berman
I have programmed since I was 21. Since I am now retired, that gives me
a tad of experience in some aspects of coding. I have only been using
Python for two years, and I enjoy it for two reasons. The first, and most
important, is that it is fun; if you don't enjoy the language, find another
one you do enjoy. The second is that it is a language that lends itself to
some really creative solutions.

I do not know if your interest extends to games, designing graphical
user input programs, or puzzle solving. I have done the second for most of
my professional life; games are not at all my forte. I am impressed with
them (my wife loves them), but I have neither the artistic design skills
nor the patience.  I have worked primarily as a consultant to law
enforcement and have designed and implemented juvenile justice systems,
and done work in both drug rehabilitation database studies and child
support studies.

Now, my interest has turned to puzzle solving, and I feel this is a
wonderful method to hone your skills. If this at all piques your
interest, I would suggest you take a look at the following sites. The order
is of no significance at all. What I find complex, you may not, so I
simply pulled these suggestions from my bookmarks in the hope that they
give you both assistance and simple pleasure.

1.  http://www.challenge-you.com/
2.  http://projecteuler.net/index.php?section=logout
3.  http://www.spoj.pl/problems/classical/
4.  http://codegolf.com/
5.  http://www.codechef.com/

These will certainly keep you occupied for many hours, days, weeks, etc.

Enjoy,


Robert




On Sun, 2009-07-05 at 23:48 -0700, Luis Galvan wrote:
> Hello all, this is my first time using a mailing list, so I'm not sure
> if I'm doing this right!  Anyway, I have a wee bit of a problem.  I've
> recently completed watching a Youtube video series on Python 2.6 by
> thenewboston which helped me a TON with learning Python's syntax and
> how to use some of the various builtin modules.  I am very thankful
> for his tutorials, but the only thing that I need is something to help
> me really grasp onto the world of programming. (I'm new to
> programming)  What I'm looking for is a tutorial series that walks you
> through the steps of writing a useful program in Python.  Whether it
> be a text editor, a simple browser, etc, it doesn't really matter.  I
> learned the syntax (at least most of it), but I guess I'm just looking
> for something to help me learn to "utilize it".  Something to teach me
> how I can use everything I learned about manipulating strings,
> creating classes, and importing modules and how to stir it all up into
> something meaningful.  Most people's answer to this kind of question
> is that the best way to learn is to "play around with it", but
> personally I believe that you can't really play with it if you
> don't know where to start.  If anyone has any idea on how I can really
> get started with programming useful programs, please do let me know!
> Any help would be immensely appreciated!  


Re: [Tutor] Popen problem with a pipe sign "|"

2009-07-06 Thread hyou
Hi Sander,

My guess is that the two shell settings treat the command string
differently, so the "|" sign means something different in each case.

Thanks!
Shawn

-Original Message-
From: Sander Sweers [mailto:sander.swe...@gmail.com] 
Sent: Monday, July 06, 2009 10:23 AM
To: hyou
Cc: Python Tutor List
Subject: Re: [Tutor] Popen problem with a pipe sign "|"

2009/7/6 hyou :
> Do I post to the list by also replying to Python Tutor List?

Yes, thanks.

> Thanks for the answer! I found the problem was because I put the 2nd
> argument to Popen with Shell = true. Though I'm not sure why it doesn't work
> with Shell = true while the same setting works for other commands.

You can read up on subprocess on [1]. I do not understand why this
does not work with shell=True. My best guess is that it gets
interpreted as a pipe.

Greets
Sander

[1] http://docs.python.org/library/subprocess.html



Re: [Tutor] Poor style to use list as "array"?

2009-07-06 Thread Angus Rodgers
On Mon, 6 Jul 2009 01:51:22 +0100, Rich Lovely wrote:

[I wrote:]
>>                if name in plural:
>>                    name = plural[name]
>>                else:
>>                    name += 's'
>This could be written more cleanly (although arguably not as readably) as
>
>name = plural.get(name, name + "s")

Nice, and readable enough, I think.
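A self-contained sketch of Rich's one-liner. Angus's actual `plural` table isn't shown in this message, so the dictionary and the `pluralize` helper below are hypothetical:

```python
# Hypothetical table of irregular plurals; anything not listed just gets an "s"
plural = {'penny': 'pennies'}


def pluralize(name):
    # dict.get(key, default) returns default when key is not in the dict
    return plural.get(name, name + "s")


print(pluralize('penny'))  # pennies
print(pluralize('dime'))   # dimes
```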

>d.get(key, default) returns the value from d mapped to key if it
>exists, or default otherwise.
>
>You might also want to split your calculation and display code into
>two separate loops.  This might seem wasteful, but it will make your
>code easier to read and maintain, and the waste is only marginal with
>the loops you're running - there is a maximum of only 17 passes (once
>for each value of coin and note)

If I understand you correctly, this is the same idea as Bob Gailer
used in his code - in which 'buff' becomes a list, instead of (as
in my code) a string, formatted for printing.  It certainly seems 
to simplify the whole thing enormously:

   (retention: 1 day)
(Ignore the boilerplate!)

Thanks to all of you.  I haven't adopted the suggestion of using
classes, which I think I'll leave for when I'm reading the later
chapters of the book (although I do get the gist of the idea).
-- 
Angus Rodgers


Re: [Tutor] int to bytes and the other way around

2009-07-06 Thread Rich Lovely
2009/7/6 Timo :
> I have written a program that uses a C++ module as backend. Now I found out
> that I can use Python to call an underneath C lib. That's nice, so I don't
> need to Popen() the C++ module.
>
> I have a problem though with some info that is returned (always an integer).
> I'll try to explain a bit, this is what I found out so far.
> There are 4 options in the program. The first 3 options go up to 18 and the
> fourth to 7.
> If all options are set to 0, the returned int is 0.
> If the first option is set from 1 to 18, this is what I get returned.
> However, for option 2, I get 256, 512, 768, 1024, etc.
> For option 3 I get 65536, 131072, 196608, etc, etc.
> And for option 4: 16777216, 33554432, etc.
>
> Ok, that's nice so far. But if option 1 is set to 4 and option 2 is set to 8
> and option 3 is set to 10 (for example), I get this returned: 657412
>
> The C++ module counts the bytes. First byte = option1, second byte = option2
> etc.
>       u8 *options = (u8 *)&result[1];
>       option1 = options[0]
>       option2 = options[1]
>       option3 = options[2]
>       option4 = options[3]
>
> How will I approach this in Python?
>
import itertools
# hex(657412) is '0xa0804'; walk its digits from the right, two at a time
i = iter(reversed(hex(657412)[2:]))
optList = map(lambda (m, n): int(n+m, 16),
              itertools.izip_longest(i, i, fillvalue="0"))
This will give you a list of the options.  You will need to check to
see if all four options are present.

Another way would be to use bitshifting and masking:

opt1 = options & 0xff
opt2 = (options >> 8) & 0xff
opt3 = (options >> 16) & 0xff
opt4 = (options >> 24) & 0xff

I personally prefer the former method, as you won't need to change it
if you introduce a fifth (or more) options, but I think the second is
probably more reliable, and easier to extend if you need options that
use more than a single byte.  It is also maybe slightly easier to read
and understand.
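On extending the shifting approach to more options: a loop over the byte positions avoids writing one line per option. The helper names below are made up for illustration, and each option is assumed to fit in a single byte (0 to 255):

```python
def split_bytes(value, count):
    """Return `count` one-byte fields of value, lowest byte first."""
    # Shift each byte down to the bottom, then mask off everything else
    return [(value >> (8 * k)) & 0xFF for k in range(count)]


def join_bytes(fields):
    """Inverse of split_bytes: fields[0] goes into the lowest byte."""
    value = 0
    for k, field in enumerate(fields):
        value |= field << (8 * k)  # assumes 0 <= field <= 255
    return value


print(split_bytes(657412, 4))     # [4, 8, 10, 0]
print(join_bytes([4, 8, 10, 0]))  # 657412
```

Adding a fifth option then only means passing 5 instead of 4.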

Alternatively, to go the other way (you aren't quite clear which way
you want to go):

# opt1 thru opt4 are the original values (1-18 or 1-7)
options = opt1 | (opt2 << 8) | (opt3 << 16) | (opt4 << 24)

Or, seeing as you appear to be treating your options as
(effectively) a char[]:

# options[0] in the C code is option1, so opt1 comes first
options = "".join(chr(x) for x in (opt1, opt2, opt3, opt4))

Alternatively, like Kent said, take a look at the struct module, which
will do all this for you, but is not so educational.

-- 
Richard "Roadie Rich" Lovely, part of the JNP|UK Famile
www.theJNP.com


Re: [Tutor] int to bytes and the other way around

2009-07-06 Thread Chris Fuller
The only things that matter are the arguments and the result.  It sounds to me 
like a good use case for SWIG (http://www.swig.org).  You can do really
complicated stuff with swig, and it takes a correspondingly steep learning 
curve to achieve, but doing simple stuff is really simple.  It sounds like 
the first example in the SWIG docs could be straightforwardly adapted to your 
problem.

Cheers


[Tutor] Urllib, mechanize, beautifulsoup, lxml do not compute (for me)!

2009-07-06 Thread David Kim
Hello all,

I have two questions I'm hoping someone will have the patience to
answer as an act of mercy.

I. How to get past a Terms of Service page?

I've just started learning python (have never done any programming
prior) and am trying to figure out how to open or download a website
to scrape data. The only problem is that whenever I try to open the link
I'm after (via urllib2, for example), I end up getting the HTML of a
Terms of Service Page (where one has to click an "I Agree" button)
rather than the actual target page.

I've seen examples on the web on providing data for forms (typically
by finding the name of the form and providing some sort of dictionary
to fill in the form fields), but this simple act of getting past "I
Agree" is stumping me. Can anyone save my sanity? As a workaround,
I've been using os.popen('curl ' + url + ' > ' + filename) to save the html
in a txt file for later processing. I have no idea why curl works and
urllib2, for example, doesn't (I use OS X). I even tried to use Yahoo
Pipes to try and sidestep coding anything altogether, but ended up
looking at the same Terms of Service page anyway.

Here's the code (tho it's probably not that illuminating since it's
basically just opening a url):

import urllib2
url = 'http://www.dtcc.com/products/derivserv/data_table_i.php?id=table1'
#the first of 23 tables
html = urllib2.urlopen(url).read()

II. How to parse html tables with lxml, beautifulsoup? (for dummies)

Assuming I get past the Terms of Service, I'm a bit overwhelmed by the
need to know XPath, CSS, XML, DOM, etc. to scrape data from the web.
I've tried looking at the documentation included with different python
libraries, but just got more confused.

The basic tutorials show something like the following:

from lxml import html
doc = html.parse("/path/to/test.txt") # the file I downloaded via curl
root = doc.getroot() #what is this root business?
tables = root.cssselect('table')

I understand that selecting all the table tags will somehow target
however many tables on the page. The problem is the table has multiple
headers, empty cells, etc. Most of the examples on the web have to do
with scraping the web for search results or something that doesn't
really depend on the table format for anything other than layout. Are
there any resources out there that are appropriate for web/python
illiterati like myself that deal with structured data as in the url
above?

FYI, the data in the url above goes up in smoke every week, so I'm
trying to capture it automatically on a weekly basis. Getting all of
it into a CSV or database would be a personal cause for celebration as
it would be the first really useful thing I've done with python since
starting to learn it a few months ago.

For anyone who is interested, here is the code that uses "curl" to
pull the webpages. It basically just builds the url string for the
different table-pages and saves down the file with a timestamped
filename:

import os
from time import strftime

BASE_URL = 'http://www.dtcc.com/products/derivserv/data_table_'
SECTIONS = {'section1':{'select':'i.php?id=table', 'id':range(1,9)},
            'section2':{'select':'ii.php?id=table', 'id':range(9,17)},
            'section3':{'select':'iii.php?id=table', 'id':range(17,24)}
            }

def get_pages():

    filenames = []
    path = '~/Dev/Data/DTCC_DerivServ/'
    #os.popen('cd ' + path)

    for section in SECTIONS:
        for id in SECTIONS[section]['id']:
            #urlList.append(BASE_URL + SECTIONS[section]['select']+str(id))
            url = BASE_URL + SECTIONS[section]['select'] + str(id)
            timestamp = strftime('%Y%m%d_')
            #sectionName = BASE_URL.split('/')[-1]
            sectionNumber = SECTIONS[section]['select'].split('.')[0]
            tableNumber = str(id) + '_'
            filename = timestamp + tableNumber + sectionNumber + '.txt'
            os.popen('curl ' + url + '> ' + path + filename)
            filenames.append(filename)

    return filenames

if __name__ == '__main__':
    get_pages()
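One possible restructuring, sketched under a few assumptions: `curl` is on the PATH, the page URLs follow the pattern above, and the function names are hypothetical. Building the `(url, filename)` pairs separately from downloading makes the naming logic easy to check without touching the network, and passing a list of arguments to the `subprocess` module means no shell ever interprets the `?` and `=` in the URLs. It is written for Python 3 and has not been run against the DTCC site:

```python
import subprocess

BASE_URL = 'http://www.dtcc.com/products/derivserv/data_table_'
SECTIONS = {
    'section1': {'select': 'i.php?id=table', 'id': range(1, 9)},
    'section2': {'select': 'ii.php?id=table', 'id': range(9, 17)},
    'section3': {'select': 'iii.php?id=table', 'id': range(17, 24)},
}


def build_jobs(timestamp):
    """Return (url, filename) pairs, named like the files get_pages() writes."""
    jobs = []
    for section in sorted(SECTIONS):
        select = SECTIONS[section]['select']
        section_number = select.split('.')[0]  # 'i', 'ii' or 'iii'
        for table_id in SECTIONS[section]['id']:
            url = BASE_URL + select + str(table_id)
            filename = timestamp + str(table_id) + '_' + section_number + '.txt'
            jobs.append((url, filename))
    return jobs


def fetch(url, dest):
    """Download url to dest; raises CalledProcessError if curl exits with an error."""
    # The argument list goes straight to curl with no shell involved.
    # -s silences the progress meter, -o names the output file.
    subprocess.check_call(['curl', '-s', '-o', dest, url])


# Example usage (hits the network, so it is left commented out here):
# import os, time
# folder = os.path.expanduser('~/Dev/Data/DTCC_DerivServ/')  # no shell, so expand ~ here
# for url, filename in build_jobs(time.strftime('%Y%m%d_')):
#     fetch(url, os.path.join(folder, filename))
```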


--
morenotestoself.wordpress.com


Re: [Tutor] Urllib, mechanize, beautifulsoup, lxml do not compute (for me)!

2009-07-06 Thread Stefan Behnel
Hi,

David Kim wrote:
> I have two questions I'm hoping someone will have the patience to
> answer as an act of mercy.
> 
> I. How to get past a Terms of Service page?
> 
> I've just started learning python (have never done any programming
> prior) and am trying to figure out how to open or download a website
> to scrape data. The only problem is, whenever I try to open the link
> (via urllib2, for example) I'm after, I end up getting the HTML to a
> Terms of Service Page (where one has to click an "I Agree" button)
> rather than the actual target page.

One comment to make here is that you should first read that page and check
if the provider of the service actually allows you to automatically
download content, or to use the service in the way you want. This is
totally up to them, and if their terms of service state that you must not
do that, well, then you must not do that.

Once you know that it's permitted, you can read the ToS page and search for
the form that the "Agree" button triggers. The URL given there is the one
you have to read next, but augmented with the parameter ("?xyz=...") that
the button sends.
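A Python 3 sketch of that recipe (in Python 2 the same pieces live in `urllib`, `urlparse`, `urllib2` and `cookielib`). It assumes the ToS form is submitted with GET, as described above; a POST form would send `urlencode(fields)` as the request body instead. The URLs, form action and field names are hypothetical placeholders that have to be read from the real page. Many sites also record the agreement in a cookie, so the sketch keeps cookies across requests with one cookie-aware opener:

```python
from http.cookiejar import CookieJar
from urllib.parse import urlencode, urljoin
from urllib.request import HTTPCookieProcessor, build_opener


def agree_url(tos_page_url, form_action, fields):
    """Build the URL a GET form requests when its button is clicked.

    form_action is the form's action="..." attribute (often relative to the
    page) and fields holds the name/value pairs of the form's inputs.
    """
    return urljoin(tos_page_url, form_action) + '?' + urlencode(fields)


def make_opener():
    # One opener with a cookie jar: cookies set by one response are sent
    # back on later requests made through the same opener.
    return build_opener(HTTPCookieProcessor(CookieJar()))


# Hypothetical values; the real ones come from the ToS page's HTML
url = agree_url('http://example.com/tos/terms.php', 'agree.php',
                [('agree', 'yes')])
print(url)  # http://example.com/tos/agree.php?agree=yes

opener = make_opener()
# opener.open(url) would submit the form; later opener.open(...) calls
# then reuse whatever cookies the site set.
```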


> I've seen examples on the web on providing data for forms (typically
> by finding the name of the form and providing some sort of dictionary
> to fill in the form fields), but this simple act of getting past "I
> Agree" is stumping me. Can anyone save my sanity? As a workaround,
> I've been using os.popen('curl ' + url + ' > ' + filename) to save the html
> in a txt file for later processing. I have no idea why curl works and
> urllib2, for example, doesn't (I use OS X).

There may be different reasons for that. One is that web servers often
present different content based on the client identifier. So if you see one
page with one client, and another page with a different client, that may be
the reason.
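If the client identifier turns out to be the cause, one thing to try is sending a different `User-Agent` header (urllib2 identifies itself as `Python-urllib/<version>`, curl as `curl/<version>`). A minimal Python 3 sketch; the header value is an arbitrary example, and the request is only built here, not sent:

```python
from urllib.request import Request

url = 'http://www.dtcc.com/products/derivserv/data_table_i.php?id=table1'

# Attach a browser-like client identifier instead of the library default.
# urllib stores header names capitalized, hence 'User-agent' below.
req = Request(url, headers={'User-Agent': 'Mozilla/5.0 (Macintosh)'})

print(req.get_header('User-agent'))  # Mozilla/5.0 (Macintosh)
# urllib.request.urlopen(req).read() would then fetch the page with it.
```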


> Here's the code (tho it's probably not that illuminating since it's
> basically just opening a url):
> 
> import urllib2
> url = 'http://www.dtcc.com/products/derivserv/data_table_i.php?id=table1'
> #the first of 23 tables
> html = urllib2.urlopen(url).read()

Hmmm, if what you want is to read a stock ticker or something like that,
you should *really* read their ToS first and make sure they do not disallow
automated access. Because it's actually quite likely that they do.


> II. How to parse html tables with lxml, beautifulsoup? (for dummies)
> 
> Assuming i get past the Terms of Service, I'm a bit overwhelmed by the
> need to know XPath, CSS, XML, DOM, etc. to scrape data from the web.

Using CSS selectors (lxml.cssselect) is not at all hard. You basically
express the page structure in a *very* short and straightforward way.

Searching the web for a CSS selectors tutorial should give you a few hits.


> The basic tutorials show something like the following:
> 
> from lxml import html
> doc = html.parse("/path/to/test.txt") #the file i downloaded via curl

... or read from the standard output pipe of curl. Note that there is a
stdlib module called "subprocess", which may make running curl easier.

Once you've determined the final URL to parse, you can also push it right
into lxml's parse() function, instead of going through urllib2 or an
external tool. Example:

url = "http://pypi.python.org/pypi?%3Aaction=search&term=lxml"
doc = html.parse(url)


> root = doc.getroot() #what is this root business?

The root (or top-most) node of the document you just parsed. Usually an
"html" tag in HTML pages.


> tables = root.cssselect('table')

Simple, isn't it? :)

BTW, did you look at this?

http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/


> I understand that selecting all the table tags will somehow target
> however many tables on the page. The problem is the table has multiple
> headers, empty cells, etc. Most of the examples on the web have to do
> with scraping the web for search results or something that don't
> really depend on the table format for anything other than layout.

That's because in cases like yours, you have to do most of the work
yourself anyway. No page is like the other, so you have to find your way
through the structure and figure out fixed points that allow you to get to
the data.
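As a sketch of turning a table into rows, the example below swaps in the standard library's `html.parser` (Python 3; the module is called `HTMLParser` in Python 2) for lxml so that it runs without third-party packages; the class name is made up. It keeps empty cells as empty strings, so the columns stay aligned, and writes the result with the `csv` module. It does not handle nested tables, `colspan`, or cells whose closing `</td>` tag is omitted, so a real page may need more work:

```python
import csv
import io
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect cell text from every <table>, as a list of rows per table."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each is a list of rows
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text pieces of the <td>/<th> currently open

    def handle_starttag(self, tag, attrs):
        if tag == 'table':
            self.tables.append([])
        elif tag == 'tr' and self.tables:
            self._row = []
        elif tag in ('td', 'th') and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ('td', 'th') and self._cell is not None:
            # Collapse runs of whitespace; an empty cell becomes ''
            self._row.append(' '.join(''.join(self._cell).split()))
            self._cell = None
        elif tag == 'tr' and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


page = """<table>
<tr><th>Name</th><th>Value</th></tr>
<tr><td>a</td><td></td></tr>
<tr><td> b </td><td>2</td></tr>
</table>"""

parser = TableParser()
parser.feed(page)
parser.close()
print(parser.tables[0])  # [['Name', 'Value'], ['a', ''], ['b', '2']]

# Write the first table as CSV (to a string here; use a real file in practice)
out = io.StringIO()
csv.writer(out).writerows(parser.tables[0])
```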

Stefan



Re: [Tutor] int to bytes and the other way around

2009-07-06 Thread Stefan Behnel
Chris Fuller wrote:
> The only things that matter are the arguments and the result.  It sounds to me
> like a good use case for SWIG (http://www.swig.org).  You can do really
> complicated stuff with swig, and it takes a correspondingly steep learning
> curve to achieve, but doing simple stuff is really simple.  It sounds like
> the first example in the SWIG docs could be straightforwardly adapted to your
> problem.

I think the OP was already talking about a ctypes based solution. If that's
all the OP needs, that's just fine, and will work out-of-the-box without
depending on things like a C++ compiler.
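For the ctypes route, the C pointer cast can be mirrored almost literally. This Python 3 sketch assumes the library hands back an unsigned 32-bit integer; the value below is a stand-in for the real return value, and it uses a single value rather than the `result[1]` array element in Timo's snippet. Because the bytes are read in the machine's native order, exactly as the C code does, the result depends on `sys.byteorder`:

```python
import ctypes
import sys

# Stand-in for the integer the C library returns
result = ctypes.c_uint32(657412)

# Python equivalent of: u8 *options = (u8 *)&result;
options = ctypes.cast(ctypes.pointer(result), ctypes.POINTER(ctypes.c_uint8))
values = [options[i] for i in range(4)]

# On little-endian machines (e.g. x86) values is [4, 8, 10, 0]
print(sys.byteorder, values)
```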

Regarding SWIG, you are definitely right about the "steep learning curve"
and the "complicated stuff". If you want to just get your work done, it's
usually better to go with Cython directly, instead of trying hard to make
SWIG do what you want. Because once you get to the "complicated stuff", you
will (hopefully) take the shortcut of rewriting your code in Cython anyway.

Stefan
