Re: [Tutor] code review please

2005-12-28 Thread Brian van den Broek
Eakin, W said unto the world upon 27/12/05 09:59 AM:
> Hello,
> Although I've been coding in PHP and ASP and JavaScript for a couple of
> years now, I'm relatively new to Python. For learning exercises, I'm writing
> small Python programs that do limited things, but hopefully do them well.
> 
> The following program takes a text file, reads through it, and any word
> longer than four characters will have the internal letters scrambled, but
> the first and last letters of the word will remain unchanged. Here's what
> happened when I ran the program on a file called example.txt.
> 
> Before:
> This is a sample of text that has been scrambled, before and after.
> 
> After:
>  Tihs is a sapmle of txet taht has been sblrmcead, broefe and aetfr.
> 
> The code follows, so any comments and/or suggestions as to what I did right
> or wrong, or what could be done better will be appreciated.
> 
> thanks,
> William


Hi William,

I coded up an approach; no guarantees that it is anywhere near optimal :-)

I didn't bother with the file I/O portions. Also, it respects internal 
punctuation in compound-words and the like. It does not respect extra 
white-space in the sense that "A cat" and "A  cat" produce identical 
output.

Best,

Brian vdB

import random
from string import punctuation

tstring = '''
This is my test string for the scramble_words function. It contains lots
of punctuation marks like ')', and '?'--well, not lots: instead, enough!
Here's what happens when one occurs mid-word: punct#uation.'''

def scramble_words(a_string):
    '''returns a_string with all internal substrings of words randomized

    The scramble_words function respects punctuation in that a word is a
    maximal string with neither whitespace nor characters from punctuation.
    Each word is scrambled in the sense that the characters excepting the
    first and last character are randomized.
    '''
    output = []
    for sequence in a_string.split():
        chunks = punctuation_split(sequence)
        # appending the joined chunks prevents spurious spaces
        # around punctuation marks
        output.append(''.join([_scramble_word(x) for x in chunks]))
    output = ' '.join(output)
    return output

def punctuation_split(sequence):
    '''returns list of character sequences separating punctuation characters'''
    for mark in punctuation:
        sequence = sequence.replace(mark, ' %s ' %mark)
    return sequence.split(' ')

def _scramble_word(word):
    '''returns word with its internal substring randomized'''
    if len(word) < 4:
        return word
    middle = list(word[1:-1])
    random.shuffle(middle)
    return ''.join((word[0], ''.join(middle), word[-1]))

a = scramble_words(tstring)
print a
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] regex

2005-12-28 Thread Will Harris
Thanks, this helped out. I hadn't thought of trying to use strings for this; I will give that a shot.

I removed the TYPE field from the regex thinking that might have been causing a problem and forgot to add it back to my regex.

On 12/27/05, Kent Johnson <[EMAIL PROTECTED]> wrote:

Danny Yoo wrote:
>>Dec 18 10:04:45 dragon logger: TCPWRAP: SERVICE=sshd@:::192.168.0.1
>>,TYPE=ALL_DENY,HOST_ADDRESS=:::195.145.94.75,HOST_INFO=:::
>>195.145.94.75,HOST_NAME=unknown,USER_NAME=unknown,OTHERINFO=
>>
> Hi Will,
>
> Observation: the output above looks comma delimited, at least the stuff
> after the 'TCPWRAP:' part.
>
>>self.twist_fail_re =
>>rc('SERVICE=\S*\sHOST_ADDRESS=\S*\sHOST_INFO=\S*\sHOST_NAME=\S*\sUSER_NAME=\S*\s')
>>
> The line given as example doesn't appear to have whitespace in the places
> that the regular expression expects.  It does contain commas as delimiters
> between the key/value pairs encoded in the line.

Expanding on Danny's comment...

\S*\s matches any amount of non-whitespace followed by one whitespace.
This doesn't match your sample. It looks like you want to match
non-comma followed by comma. For example this will match the first field:

SERVICE=[^,]*,

Presumably you will want to pull out the value of the field so enclose
it in parentheses to make a group:

SERVICE=([^,]*),

Another thing I notice about your regex is it doesn't include all the
fields in the sample, for example TYPE. If the fields are always the
same you can just include them in your regex. If they vary you can try
to make the regex skip them, use a different regex for each field, or
try Danny's approach of using str.split() to break apart the data.

The Regex Demo program that comes with Python is handy for creating and
testing regexes. Look in C:\Python24\Tools\Scripts\redemo.py or the
equivalent.

Kent
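
To make the two suggestions in this thread concrete, here is a minimal sketch that runs both the non-comma regex and the str.split() approach against the sample log line quoted above. The names FIELDS_RE, parse_with_regex and parse_with_split are made up for illustration, the sample line is rejoined onto one line, and the code is written for a current Python rather than the 2.4 of the thread.

```python
import re

# Sample line from the thread, joined back onto a single line.
LINE = ("Dec 18 10:04:45 dragon logger: TCPWRAP: SERVICE=sshd@:::192.168.0.1,"
        "TYPE=ALL_DENY,HOST_ADDRESS=:::195.145.94.75,HOST_INFO=:::195.145.94.75,"
        "HOST_NAME=unknown,USER_NAME=unknown,OTHERINFO=")

# [^,]* means "any run of non-comma characters"; the parentheses capture it.
FIELDS_RE = re.compile(
    r'SERVICE=([^,]*),TYPE=([^,]*),HOST_ADDRESS=([^,]*),'
    r'HOST_INFO=([^,]*),HOST_NAME=([^,]*),USER_NAME=([^,]*),')


def parse_with_regex(line):
    """Return the six captured values as a tuple, or None if nothing matches."""
    match = FIELDS_RE.search(line)
    if match is None:
        return None
    return match.groups()


def parse_with_split(line):
    """Split on commas and '=' instead of using a regex; returns a dict.

    Assumes the line contains the 'TCPWRAP: ' marker seen in the sample.
    """
    payload = line.split('TCPWRAP: ', 1)[1]
    return dict(pair.split('=', 1) for pair in payload.split(','))
```

The split version does not care which fields are present or in what order, which may matter if the set of fields varies from line to line.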


Re: [Tutor] How to call mysqlcommand in Python , "mysqldump for backup "

2005-12-28 Thread John Joseph

--- nephish <[EMAIL PROTECTED]> wrote:

> ooh ooh, i know this one, i have python do this for
> me every day !
> 
> target_dir = '/path/to/where/you/want/to/dump'
> 
> os.system("mysqldump --add-drop-table -c -u user -ppassword database table > "+target_dir+"/table.bak.sql")
> 
> dont forget the p in front of your password !
> 
> hope this helps
> 
> 

  Hi, it helped me a lot.
  I wrote my script like this for backing up my zabbix database:

import os, time
# define the target directory
target_dir = '/home/john/backup/z-b-weekly/zabbix'

# timestamp in the format year-month-day-hours-minutes-seconds
# uses the time module
today = time.strftime('%Y-%m-%d-%H-%M-%S')

# For testing purposes only I have kept %M and %S; we can remove them later
now = target_dir + today

os.system("mysqldump -u root -pjohn zabbix > " + now)

  Thanks a lot,
  Joseph
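
For comparison, here is a hedged sketch of the same backup done with the subprocess module instead of os.system, so that no shell redirection or string quoting is involved. The helper names backup_path, dump_command and run_backup are invented for this example, the file naming scheme is an assumption, and the code targets a current Python 3 (subprocess.run does not exist in Python 2.4).

```python
import os
import subprocess
import time


def backup_path(target_dir, database, when=None):
    """Build a timestamped file name: <target_dir>/<database>-<stamp>.sql."""
    if when is None:
        when = time.localtime()
    stamp = time.strftime('%Y-%m-%d-%H-%M-%S', when)
    return os.path.join(target_dir, '%s-%s.sql' % (database, stamp))


def dump_command(user, password, database):
    """Return the mysqldump command as an argument list (no shell needed)."""
    return ['mysqldump', '--user=' + user, '--password=' + password, database]


def run_backup(user, password, database, target_dir):
    """Run mysqldump, write its output to a new file, and return the path."""
    path = backup_path(target_dir, database)
    with open(path, 'wb') as out:
        # check=True raises CalledProcessError if mysqldump exits non-zero,
        # so a failed dump is not silently mistaken for a good backup.
        subprocess.run(dump_command(user, password, database),
                       stdout=out, check=True)
    return path
```

Using os.path.join also avoids the easy-to-miss missing separator between the directory and the file name.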

 






[Tutor] threaded multipart FTP download

2005-12-28 Thread Justin Ezequiel
Greetings,

Did not think I could post my code here as it's a bit long so I placed it in

http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/465531

Can some of you have a look and post comments, suggestions, alternatives?

Thanks.

BTW, the recipe is sufficient for our needs at the moment but I am
sure there must be better methods than what I used.


Re: [Tutor] code review please

2005-12-28 Thread Kent Johnson
Brian van den Broek wrote:
> def punctuation_split(sequence):
>     '''returns list of character sequences separating punctuation characters'''
>     for mark in punctuation:
>         sequence = sequence.replace(mark, ' %s ' %mark)
>     return sequence.split(' ')

You should look at re.split().

Kent
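
One way the re.split() pointer might be applied is sketched below; this is an illustration, not necessarily what Kent had in mind. A capturing group in the pattern makes re.split() keep the punctuation marks in its result, and empty strings are filtered out. The name _PUNCT_RE is made up for the example.

```python
import re
from string import punctuation

# A character class holding every punctuation mark; the parentheses make
# re.split() return the marks themselves along with the text between them.
_PUNCT_RE = re.compile('([%s])' % re.escape(punctuation))


def punctuation_split(sequence):
    '''returns list of character sequences and the punctuation between them'''
    return [chunk for chunk in _PUNCT_RE.split(sequence) if chunk]
```

Unlike the replace-and-split version, this returns no empty strings, so every chunk is either a run of non-punctuation characters or a single mark.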



Re: [Tutor] code review please

2005-12-28 Thread Kent Johnson
Danny Yoo wrote:
> Similarly, the try/except for IndexError seems too pervasive: rather than
> wrap it around the whole program, we may want to limit its extent to just
> around the sys.argv access.  Otherwise, any other IndexError has the
> potential of being misattributed.  Much of the program does indexing of
> some sort, so that's why this concerns me.
> 
> Alternatively, doing the explicit length check:
> 
>    if not sys.argv[1:]:
>        print some kind of usage message
>        raise SystemExit
> 
> might be sufficient.
> 
> 
> Unless we really want to hide errors from the user, I'd avoid the
> exception handlers altogether: if bad things happen, the default exception
> handler gives a fairly good stack trace that aids debugging.
> 
> But if we do want to handle the exceptions manually, we should try to make
> sure that useful information is preserved in the error messages: it's a
> terrible thing to get an error message that says "Something bad happened."
> when we can do much better.  *grin*

I agree with Danny that in this case there is no need to catch the 
exceptions - just let the default exception handler do its thing.

If you *do* want to handle the exceptions yourself, a good principle is 
to put the try / except around the least amount of code possible - just 
the lines that may generate the expected exception. This prevents the 
except clause from hiding unexpected exceptions.

An easy way to do this is to use try / except / else. The else clause is 
only executed if no exception was caught by the except clause. In your 
case you could write

try:
    fname = sys.argv[1]
except IndexError:
    pass  # report the error as you like
else:
    pass  # normal processing goes here

Kent
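
A slightly fuller sketch of the try / except / else pattern follows, to show why the narrow try block matters. The process function is a hypothetical stand-in for the real work, and the usage message is an assumption; only the shape of the try statement comes from the advice above.

```python
import sys


def process(fname):
    """Stand-in for the real work; hypothetical, not from the thread."""
    return len(fname)


def main(argv, process=process):
    try:
        fname = argv[1]      # the only line we expect to raise IndexError
    except IndexError:
        sys.stderr.write('usage: %s FILE\n' % argv[0])
        return 2
    else:
        # An IndexError raised in here is NOT swallowed by the except
        # clause above, so a genuine bug still produces a full traceback.
        process(fname)
        return 0
```

A script would typically finish with sys.exit(main(sys.argv)).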



Re: [Tutor] Printing

2005-12-28 Thread Ron Phillips

>>> "John Corry" <[EMAIL PROTECTED]> 12/24/2005 12:28 PM >>>
Hi + Season's Greetings!

I have put together a program that queries and modifies a Gadfly database.
I have captured my output. I now want to print it to paper.

I have written the output to a text file. I have searched the tutor mailing
list and used the mailing list advice to get my data into nice looking
columns + tables.

I am using Python 2.4, Glade 2, pygtk2.8.0 + wxWidgets2.6.1.

I have downloaded win32, win32com, Preppy and PIL. I have had a go at using
them but can't get them to work. At the moment I can't even print the text
file.

Is there a good helpguide/FAQ page which deals with printing text files or
is there simple code which prints a text file?

Thanks,

John.

You might want to look at karrigell ( http://karrigell.sourceforge.net/ ) and
consider making your output an html text file, styled with css, that you can
view/print using the browser. I think karrigell is simple for desktop apps -
certainly simpler than wxWidgets, etc.

TurboGears ( http://www.turbogears.org ) is more oriented toward a full
website. Both frameworks are built on CherryPy, which is coming on strong as
a full-featured, lightweight 'Pythonic' server.

I like to use the browser for output because it does so much of the
formatting for you and it's cross-platform, and I like using a framework
because if you ever want to use your program over the web, you're nearly done.

Ron


Re: [Tutor] code review please

2005-12-28 Thread Kent Johnson
Kent Johnson wrote:
> BTW thinking of writing this as a loop brings to mind some problems - 
> what will your program do with 'words' such as 555-1212 or "Ha!" ?

Hmm, on reflection I don't think "Ha!" will be a problem, but a 'word' 
with no letters will cause an IndexError.

Your test for 4 letters is really misplaced. What you want to find is 
words that have only two letters to scramble. I would put a test right 
where you call random.shuffle() to see if tempWord is longer than 2. In 
fact you might put the whole test and shuffle bit in a separate function.

Kent
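
Here is a rough sketch of the "test and shuffle in a separate function" idea. The function name scramble_middle and the optional rng parameter are assumptions made for the example; the point is only that the length test sits right next to the shuffle and looks at the middle of the word, so empty and very short 'words' pass through untouched.

```python
import random


def scramble_middle(word, rng=random):
    """Shuffle the characters between the first and last one.

    Words whose middle has fewer than two characters (including the
    empty string) are returned unchanged, so no IndexError can occur.
    """
    middle = list(word[1:-1])
    if len(middle) < 2:
        return word
    rng.shuffle(middle)
    return word[0] + ''.join(middle) + word[-1]
```

Passing a seeded random.Random instance as rng makes the output repeatable, which is handy for testing.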



Re: [Tutor] How to call mysqlcommand in Python , "mysqldump for backup "

2005-12-28 Thread nephish
Glad to help, glad you got it working too.!

shawn


On Wed, 2005-12-28 at 09:03 +, John Joseph wrote:
> --- nephish <[EMAIL PROTECTED]> wrote:
> 
> > ooh ooh, i know this one, i have python do this for
> > me every day !
> > 
> > target_dir = '/path/to/where/you/want/to/dump'
> > 
> > os.system("mysqldump --add-drop-table -c -u user -ppassword database table > "+target_dir+"/table.bak.sql")
> > 
> > dont forget the p in front of your password !
> > 
> > hope this helps
> > 
> > 
> 
>   Hi, it helped me a lot.
>   I wrote my script like this for backing up my zabbix database:
> 
> import os, time
> # define the target directory
> target_dir = '/home/john/backup/z-b-weekly/zabbix'
> 
> # timestamp in the format year-month-day-hours-minutes-seconds
> # uses the time module
> today = time.strftime('%Y-%m-%d-%H-%M-%S')
> 
> # For testing purposes only I have kept %M and %S; we can remove them later
> now = target_dir + today
> 
> os.system("mysqldump -u root -pjohn zabbix > " + now)
> 
>   Thanks a lot,
>   Joseph



[Tutor] matching a file

2005-12-28 Thread jonasmg
Hi! 

I would like to work with some files, but I have to use a regular expression 
with one of them: 

for file in [glob.glob('/etc/env.d/[0-9]*foo'), '/etc/bar']: 

glob returns a list, so I suppose I would have to convert it into a 
string. 

Is that correct? 

Thanks for your help


Re: [Tutor] matching a file

2005-12-28 Thread Kent Johnson
[EMAIL PROTECTED] wrote:
> Hi! 
> 
> I would like to work with some files, but I have to use a regular expression 
> with one of them: 
> 
> for file in [glob.glob('/etc/env.d/[0-9]*foo'), '/etc/bar']: 
> 
> glob returns a list, so I suppose I would have to convert it into a 
> string. 

glob.glob() returns a list of file paths that match your path spec. You 
are embedding this list in another list. The result of
   [glob.glob('/etc/env.d/[0-9]*foo'), '/etc/bar']

will be a list something like this:
   [ [ '/etc/env.d/123foo', '/etc/env.d/456foo' ], '/etc/bar' ]

Iterating over this list will bind the name 'file' to the entire first 
list, then to '/etc/bar'.

The solution is to create a flat list instead of a nested list. Try

for file in glob.glob('/etc/env.d/[0-9]*foo') + ['/etc/bar']:

Kent

PS 'file' is the name of a built-in function, it would be better to 
choose a different name for your variable.
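
As a small illustration of the flat-list idea, the sketch below wraps it in a function. The name paths_to_process is made up, and sorting the glob results is an extra assumption added so the order is predictable; glob.glob() itself makes no promise about order.

```python
import glob


def paths_to_process(pattern, extra):
    """Return the paths matching pattern, followed by the extra paths,
    as one flat list of strings."""
    return sorted(glob.glob(pattern)) + list(extra)
```

With this, the loop from the question could be written as "for path in paths_to_process('/etc/env.d/[0-9]*foo', ['/etc/bar']):", and every value bound to path is a plain string.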



[Tutor] Multi-Dimensional Dictionary that contains a 12 element list.

2005-12-28 Thread Paul Kraus
I am trying to build a data structure that would be a dictionary of a 
dictionary of a list.

In Perl I would build the structure like so: $dictionary{key1}{key2}[0] = X
I would iterate like so ...
foreach my $key1 ( sort keys %dictionary ) {
    foreach my $key2 ( sort keys %{$dictionary{$key1}} ) {
        foreach my $element ( @{$dictionary{$key1}{$key2} } ) {
            print "$key1 => $key2 => $element\n";
        }
    }
}

Sorry for the Perl reference, but it's the language I am coming from. I use data 
structures like this all the time. I don't always iterate them like this, but 
if I can learn to build these and move through them in Python then a good 
portion of the Perl apps I am using can be ported.

Playing around I have come up with this, but I have no clue how to iterate over 
it or if it's the best way. It seems "clunky" but that is most likely my lack of 
understanding.

dictionary[(key1,key2)]=[ a,b,c,d,e,f,g ... ]

This, I think, gives me a dictionary with two keys (not sure how to do 
anything useful with it though) and a list.

Now I am not sure how I can assign one element at a time to the list.

here is the pseudo code.
read text file.
split line from text file into list of fields.
One of the fields contains the date. Split the date into two fields Year and 
Month/Period. Build data structure that is a dictionary based on year, based 
on period, based on item code then store/increment the units sold based on 
period.

dictionary[(year,period)] = [ jan, feb, mar, apr, may, jun, july, aug, sep, 
oct, nov ,dec]

I would prefer to have the months just be an array index 0 through 11 and when 
it reads the file it increments the number contained there.

TIA,
-- 
Paul Kraus
=-=-=-=-=-=-=-=-=-=-=
PEL Supply Company
Network Administrator
216.267.5775 Voice
216.267.6176 Fax
www.pelsupply.com
=-=-=-=-=-=-=-=-=-=-=


Re: [Tutor] Multi-Dimensional Dictionary that contains a 12 element list.

2005-12-28 Thread Paul Kraus
On Wednesday 28 December 2005 10:18 am, Paul Kraus wrote:
> I am trying to build a data structure that would be a dictionary of a
> dictionary of a list.
>
> In Perl I would build the structure like so $dictionary{key1}{key2}[0] = X
> I would iterate like so ...
> foreach my $key1 ( sort keys %dictionary ) {
>     foreach my $key2 ( sort keys %{$dictionary{$key1}} ) {
>         foreach my $element ( @{$dictionary{$key1}{$key2} } ) {
>             print "$key1 => $key2 => $element\n";
>         }
>     }
> }
>
Here is the code that I used. It's functional and it works, but there have got to 
be some better ways to do a lot of this. Traversing the data structure 
still seems like I have to be doing it the hard way.

The input data file has fixed-width fields that are delimited by pipes,
so depending on the justification of each field it will have either leading 
or trailing whitespace.

TIA,
Paul


#!/usr/bin/python
#
## Paul D. Kraus - 2005-12-27
## parse.py - Parse Text File
## Pipe delimited '|'
#
## Fields: CustCode    [0]
##       : OrdrType    [1]
##       : OrdrReturn  [2]
##       : State       [3]
##       : QtyShipped  [4]
##       : VendCode    [5]
##       : InvoiceDate [7]
#
import re
import string
results = {}
def format_date(datestring):
    (period,day,year) = map(int,datestring.split('/') )
    period += 2
    if period == 13: period = 1; year += 1
    if period == 14: period = 2; year += 1
    if year > 80:
        year = '19%02d' % year
    else:
        year = '20%02d' % year
    return (year,period)

def format_qty(qty,credit,oreturn):
    qty = float(qty)
    if credit == 'C' or oreturn == 'Y':
        return qty * -1
    else:
        return qty

textfile = open('orders.txt','r')
for line in textfile:
    fields = map( string.strip, line.split( '|' ) )
    fields[4] = format_qty(fields[ 4 ],fields[ 1 ], fields[ 2 ] )
    (year, period) = format_date( fields[7] )
    for count in range(12):
        if count == period:
            if results.get( ( year, fields[6], count), 0):
                results[ year,fields[6], count] += fields[4]
            else:
                results[ year,fields[6],count] = fields[4]

sortedkeys = results.keys()
sortedkeys.sort()

for keys in sortedkeys:
    res_string = keys[0]+'|'+keys[1]
    for count in range(12):
        if results.get((keys[0],keys[1],count),0):
            res_string += '|'+str(results[keys[0],keys[1],count])
        else:
            res_string += '|0'
    print res_string



Re: [Tutor] Multi-Dimensional Dictionary that contains a 12 element list.

2005-12-28 Thread Kent Johnson
Paul Kraus wrote:
> I am trying to build a data structure that would be a dictionary of a 
> dictionary of a list.
> 
> In Perl I would build the structure like so: $dictionary{key1}{key2}[0] = X
> I would iterate like so ...
> foreach my $key1 ( sort keys %dictionary ) {
>     foreach my $key2 ( sort keys %{$dictionary{$key1}} ) {
>         foreach my $element ( @{$dictionary{$key1}{$key2} } ) {
>             print "$key1 => $key2 => $element\n";
>         }
>     }
> }
> 
> Sorry for the Perl reference, but it's the language I am coming from. I use 
> data structures like this all the time. I don't always iterate them like 
> this, but if I can learn to build these and move through them in Python then 
> a good portion of the Perl apps I am using can be ported.
> 
> Playing around I have come up with this, but I have no clue how to iterate 
> over it or if it's the best way. It seems "clunky" but that is most likely my 
> lack of understanding.
> 
> dictionary[(key1,key2)]=[ a,b,c,d,e,f,g ... ]
> 
> This, I think, gives me a dictionary with two keys (not sure how to do 
> anything useful with it though) and a list.

This gives you a dict whose keys are the tuple (key1, key2). Since 
tuples sort in lexicographical order you could print this out sorted by 
key1, then key2 with

for (key1, key2), value in sorted(dictionary.iteritems()):
    for element in value:
        print key1, '=>', key2, '=>', element

(Wow, Python code that is shorter than the equivalent Perl? There must 
be some mistake! ;)
> 
> Now I am not sure how I can assign one element at a time to the list.

Assuming the list already has an element 0, use
dictionary[(key1, key2)][0] = X

Python lists don't create new elements on assignment (I think Perl lists 
do this?) so for example
dictionary[(key1, key2)][10] = X

will fail if the list doesn't already have 11 elements or more. You can 
use list.append() or pre-populate the list with default values depending 
on your application.
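
For readers who want the nested shape of the Perl original rather than tuple keys, here is a minimal sketch of a dictionary of dictionaries of 12-element lists, with the lists pre-populated as suggested above. The function names add_units and iter_elements, and the sample keys used with them, are invented for illustration; the code is written for a current Python.

```python
def add_units(data, year, code, month, units):
    """Add units to data[year][code][month], creating levels as needed.

    month is a list index from 0 to 11.
    """
    by_code = data.setdefault(year, {})
    # Pre-populate with twelve zeros so any month index can be assigned.
    months = by_code.setdefault(code, [0] * 12)
    months[month] += units


def iter_elements(data):
    """Yield (year, code, element) in sorted key order, like the Perl loops."""
    for year in sorted(data):
        for code in sorted(data[year]):
            for element in data[year][code]:
                yield year, code, element
```

dict.setdefault() returns the existing value when the key is present and stores the default otherwise, which stands in for the automatic creation of intermediate levels that Perl provides.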

Kent



Re: [Tutor] Multi-Dimensional Dictionary that contains a 12 element list.

2005-12-28 Thread Kent Johnson
Paul Kraus wrote:
> Here is the code that I used. It's functional and it works, but there have got 
> to be some better ways to do a lot of this. Traversing the data structure 
> still seems like I have to be doing it the hard way.
> 
> The input data file has fixed-width fields that are delimited by pipes,
> so depending on the justification of each field it will have either leading 
> or trailing whitespace.
> #
> import re 
> import string
> results = {}
> def format_date(datestring):
>     (period,day,year) = map(int,datestring.split('/') )
>     period += 2
>     if period == 13: period = 1; year += 1
>     if period == 14: period = 2; year += 1
    if period > 12: period -= 12; year += 1

>     if year > 80:
>         year = '19%02d' % year
>     else:
>         year = '20%02d' % year
>     return (year,period)
> 
> def format_qty(qty,credit,oreturn):
>     qty = float(qty)
>     if credit == 'C' or oreturn == 'Y':
>         return qty * -1
>     else:
>         return qty
> 
> textfile = open('orders.txt','r')
> for line in textfile:
>     fields = map( string.strip, line.split( '|' ) )
>     fields[4] = format_qty(fields[ 4 ],fields[ 1 ], fields[ 2 ] )
    qty = format_qty(fields[ 4 ],fields[ 1 ], fields[ 2 ] )
would be clearer in subsequent code.

>     (year, period) = format_date( fields[7] )
>     for count in range(12):
>         if count == period:
>             if results.get( ( year, fields[6], count), 0):
>                 results[ year,fields[6], count] += fields[4]
>             else:
>                 results[ year,fields[6],count] = fields[4]

The loop on count is not doing anything, you can use period directly. 
And the test on results.get() is not needed, it is safe to always add:
    key = (year, fields[6], period)
    results[key] = results.get(key, 0) + qty
> 
> sortedkeys = results.keys()
> sortedkeys.sort()
> 
> for keys in sortedkeys:
>     res_string = keys[0]+'|'+keys[1]
>     for count in range(12):
>         if results.get((keys[0],keys[1],count),0):
>             res_string += '|'+str(results[keys[0],keys[1],count])
>         else:
>             res_string += '|0'
>     print res_string

This will give you duplicate outputs if you ever have more than one 
period for a given year and fields[6] (whatever that is...). OTOH if you 
just show the existing keys you will not have entries for the 0 keys. So 
maybe you should go back to your original idea of using a 12-element 
list for the counts.

Anyway in the above code the test on results.get() is not needed since 
you just use the default value in the else:
    res_string += '|' + str(results.get((keys[0],keys[1],count),0))
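
The 12-element list idea could be sketched as follows. This is an illustration under assumptions, not a drop-in replacement for the script above: the names tally and report_lines are made up, the input is taken to be already-parsed (year, code, period, qty) tuples with period running from 1 to 12, and the code targets a current Python.

```python
def tally(rows):
    """Sum quantities into one 12-element list per (year, code) key.

    rows yields (year, code, period, qty) with period in 1..12.
    """
    results = {}
    for year, code, period, qty in rows:
        counts = results.setdefault((year, code), [0] * 12)
        counts[period - 1] += qty    # periods 1..12 map to indexes 0..11
    return results


def report_lines(results):
    """Return one pipe-delimited line per key, in sorted key order."""
    lines = []
    for (year, code), counts in sorted(results.items()):
        lines.append('|'.join([year, code] + [str(n) for n in counts]))
    return lines
```

Because each key appears once and always carries twelve slots, every output line has the same number of columns and no key is printed twice.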



[Tutor] Multiple Assignment from list.

2005-12-28 Thread Paul Kraus
How do I code this in Python, assuming fields is a list of 3 things?

(myfielda, myfieldb, myfieldc) = fields

When I try that I get 
ValueError: need more than 1 value to unpack.
If I print fields it is indeed printed as
['somestring','somestring','somestring']

TIA,


Re: [Tutor] Multiple Assignment from list.

2005-12-28 Thread Paul Kraus
Never mind, I figured this out. The top line of the file I was reading in and 
splitting had only 1 character, so "fields" for that line was a one-element 
list rather than a list of three. I fixed this.
On Wednesday 28 December 2005 3:12 pm, Paul Kraus wrote:
> How do I code this in Python, assuming fields is a list of 3 things?
>
> (myfielda, myfieldb, myfieldc) = fields
>
> When I try that I get
> ValueError: need more than 1 value to unpack.
> If I print fields it is indeed printed as
> ['somestring','somestring','somestring']
>
> TIA,
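
A defensive version of the unpacking might look like the sketch below, which checks the number of fields before assigning so that a short header or blank line is skipped instead of raising ValueError. The function name parse_line, the '|' separator and the choice to return None for malformed lines are all assumptions made for the example.

```python
def parse_line(line, sep='|', expected=3):
    """Split line and return a tuple of exactly `expected` stripped fields,
    or None when the line does not have that many fields."""
    fields = [field.strip() for field in line.split(sep)]
    if len(fields) != expected:
        return None
    return tuple(fields)
```

The caller can then write "parsed = parse_line(line)" and unpack with "(myfielda, myfieldb, myfieldc) = parsed" only when parsed is not None.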



[Tutor] Extracting data from HTML files

2005-12-28 Thread motorolaguy
Hello,
I'm very new to Python and programming in general. I've been reading Dive
Into Python as an introduction to the language and I think I'm doing pretty
well, but I'm stuck on this problem.
I'm trying to make a Python script for extracting certain data from HTML
files. These files are from a template, so they all have the same formatting. I
just want to extract the data from certain fields. It would also be nice to
insert it into a MySQL database, but I'll leave that for later since I'm
stuck on just reading the files.
Say for example the HTML file has the following format:

Category:Category1
[...]
Name:Filename.exe
[...]
Description:Description1.

Taking into account that each HTML file has a load of code where each
[...] appears, what I want to do is extract the information for each field. In
this case I want the script to read Category1, filename.exe and
Description1, and later on insert this into a MySQL database, or read the
info and generate a CSV file to make DB insertion easier.
Since all the files are generated by a script, each field I want to read
is, from what I've seen, on the same line number, so this could make things
easier. But not all fields are of the same length.
I've read Chapter 8 of Dive Into Python, so I'm basing my work on that.
I also thought regexes might be useful for this, but I suck at using regexes,
so that's another problem.
Do any of you have an idea of where I could get a good start on this, and
are there any modules (like sgmllib.py) that might come in handy for this?
Thanks!



Re: [Tutor] Extracting data from HTML files

2005-12-28 Thread bob
At 01:26 PM 12/28/2005, [EMAIL PROTECTED] wrote:
>[snip]
>I'm trying to make a Python script for extracting certain data from HTML
>files. Say for example the HTML file has the following format:
>Category:Category1
>[...]
>Name:Filename.exe
>[...]
>Description:Description1.
>
>Taking into account that each HTML file has a load of code where each
>[...] appears, what I want to do is extract the information for each field. In
>this case I want the script to read Category1, filename.exe and
>Description1.

Check out BeautifulSoup http://www.crummy.com/software/BeautifulSoup/

>And later on insert this into a MySQL database, or read the
>info and generate a CSV file to make DB insertion easier.
>Since all the files are generated by a script, each field I want to read
>is, from what I've seen, on the same line number, so this could make things
>easier. But not all fields are of the same length.
>I've read Chapter 8 of Dive Into Python, so I'm basing my work on that.
>I also thought regexes might be useful for this, but I suck at using regexes,
>so that's another problem.
>Do any of you have an idea of where I could get a good start on this, and
>are there any modules (like sgmllib.py) that might come in handy for this?
>Thanks!
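
If installing a parser is not an option, a standard-library-only sketch along the lines the question hints at could look like this. It rests on a strong assumption that is not confirmed anywhere in the thread: once the tags are stripped, each label and its value sit together on one line, as in the "Category:Category1" layout shown above. The names TAG_RE, FIELD_RE and extract_fields are made up for the example.

```python
import re

# Crude tag remover: good enough for template-generated pages, but not a
# general HTML parser.
TAG_RE = re.compile(r'<[^>]+>')

# One "Label:value" pair per line, for the three labels in the question.
FIELD_RE = re.compile(
    r'^\s*(Category|Name|Description)\s*:\s*(.+?)\s*$', re.MULTILINE)


def extract_fields(html):
    """Return a dict mapping each label found to its value."""
    text = TAG_RE.sub('', html)
    return dict(FIELD_RE.findall(text))
```

The resulting dict can be passed on to the csv module or to a database insert later; for pages whose markup is less regular, a real parser such as the one recommended above is the safer choice.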


Re: [Tutor] code review please

2005-12-28 Thread Brian van den Broek
Kent Johnson said unto the world upon 28/12/05 07:06 AM:
> Brian van den Broek wrote:
> 
>>def punctuation_split(sequence):
>>    '''returns list of character sequences separating punctuation characters'''
>>    for mark in punctuation:
>>        sequence = sequence.replace(mark, ' %s ' %mark)
>>    return sequence.split(' ')
> 
> 
> You should look at re.split().
> 
> Kent

What, and have *2* problems? :-)

(But seriously, thanks for the pointer. As I was coding, I thought 
there must have been a better way drawing off of the library. See my 
other post for why I couldn't find it.)

Best,

Brian vdB


[Tutor] new to linux and I cannot find some python things

2005-12-28 Thread Brian van den Broek
Hi all,

I'm a week or so into having switched from WinXP to linux (ubuntu 
breezy). There is a lot to learn about the differences in the OS'es 
and that's just fine.

But, a couple of things have been in my way with Python. Most notably, 
  I don't know how one browses the documentation. On Windows, I just 
fired up the .chm (I think cmh--at any rate, the compiled help file.)

I have installed the docs on the linux side and they can be found by 
python:

 >>> help()

help> NONE
 


   2.3.10.7 The Null Object

   This object is returned by functions that don't explicitly return a




I assume there is some linux facility for documentation browsing that 
beats importing modules and accessing docstrings. I'd work it out 
eventually, but a timesaving pointer would be appreciated.

Best to all,

Brian vdB


Re: [Tutor] new to linux and I cannot find some python things

2005-12-28 Thread Simon Gerber
> Hi all,
>
> I'm a week or so into having switched from WinXP to linux (ubuntu
> breezy). There is a lot to learn about the differences in the OS'es
> and that's just fine.

Excellent! Another Ubuntu Breezy user here. If there's anything Ubuntu
I can help you with, drop me an e-mail and I'll do what I can to help.

> But, a couple of things have been in my way with Python. Most notably,
>   I don't know how one browses the documentation. On Windows, I just
> fired up the .chm (I think cmh--at any rate, the compiled help file.)

Yeah, chm. Incidentally, there's a chm reader for Linux. Very
primitive, but it works in a pinch. Look for 'xchm' in the Universe
repository.

> I have installed the docs on the linux side and they can be found by
> python:
>
>  >>> help()
> 
> help> NONE

Nah, that's part of core Python. Nothing to do with the 'python-doc'
package you installed.

> I assume there is some linux facility for documentation browsing that
> beats importing modules and accessing docstrings. I'd work it out
> eventually, but a timesaving pointer would be appreciated.

Firefox!

file:///usr/share/doc/python2.4/html/index.html

The python-doc package is just an offline version of
http://www.python.org/doc/2.4.2/

You can also probably find a copy of the book 'Dive into Python' here:
file:///usr/share/doc/diveintopython/html/index.html

I know Hoary installed it by default. Not sure about Breezy, since I
just did a dist-upgrade from Hoary.

As a rule, with Ubuntu (and most other Linux distros), the
documentation goes under /usr/share/doc/. But you can always
check to see exactly what a package has put where. From the
command-line, just type 'dpkg -L python-doc'.

Hope that helps,

--
Seen in the release notes for ACPI-support 0.34:

'The "I do not wish to discuss it" release
  * Add workaround for prodding fans back into life on resume
  * Add sick evil code for doing sick evil things to sick evil
screensavers'


Re: [Tutor] new to linux and I cannot find some python things

2005-12-28 Thread Brian van den Broek
Simon Gerber said unto the world upon 28/12/05 05:12 PM:
>>Hi all,
>>
>>I'm a week or so into having switched from WinXP to linux (ubuntu
>>breezy). There is a lot to learn about the differences in the OS'es
>>and that's just fine.
> 
> 
> Excellent! Another Ubuntu Breezy user here. If there's anything Ubuntu
> I can help you with, drop me an e-mail and I'll do what I can to help.

Hi Simon,

thanks for the reply and the offer :-)

>>But, a couple of things have been in my way with Python. Most notably,
>>  I don't know how one browses the documentation. On Windows, I just
>>fired up the .chm (I think cmh--at any rate, the compiled help file.)
> 
> 
> Yeah, chm. Incidentally, there's a chm reader for Linux. Very
> primitive, but it works in a pinch. Look for 'xchm' in the Universe
> repository.

Thanks. A big part of the difficulty in the transition is that suddenly I 
don't know what program to use for what. Pointers help :-)

> 
>>I have installed the docs on the linux side and they can be found by
>>python:
>>
>> >>> help()
>>
>>help> NONE
> 
> 
> Nah, that's part of core Python. Nothing to do with the 'python-doc'
> package you installed.

I beg to differ :-)

Before I installed python2.4-doc, I got this:

IDLE 1.1.2
 >>> help()

Welcome to Python 2.4!  This is the online help utility.



help> topics

Here is a list of available topics.  Enter any topic name to get more 
help.

ASSERTION   DELETION    LOOPING     SEQUENCES



help> ASSERTION

Sorry, topic and keyword documentation is not available because the Python
HTML documentation files could not be found.  If you have installed them,
please set the environment variable PYTHONDOCS to indicate their location.

On Debian GNU/{Linux,Hurd} systems you have to install the corresponding
pythonX.Y-doc package, i.e. python2.3-doc.

help>


On Windows, one has to download the HTML version of the documentation 
and point the PYTHONDOCS environment variable (the one named in the 
message above) at it. On Ubuntu, once I installed python2.4-doc, it 
worked as shown in my OP. (I tested this by removing the python2.4-doc 
package to get the behaviour shown in this post, then reinstalling it 
to get the original behaviour.)
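
A minimal sketch of the "point PYTHONDOCS at the HTML docs" step, assuming a
modern Python 3. The function names first_docs_dir and point_pythondocs_at are
invented here, and whether help() notices a change made inside an already
running session may depend on the Python version, so setting the variable
before starting the interpreter is the safer route.

```python
import os


def first_docs_dir(candidates):
    """Return the first directory in `candidates` that contains index.html, else None."""
    for directory in candidates:
        if os.path.isfile(os.path.join(directory, "index.html")):
            return directory
    return None


def point_pythondocs_at(candidates, environ=os.environ):
    """Set PYTHONDOCS to the first usable docs directory.

    Returns the chosen directory, or None if no candidate qualified
    (in which case `environ` is left untouched).
    """
    docs_dir = first_docs_dir(candidates)
    if docs_dir is not None:
        environ["PYTHONDOCS"] = docs_dir
    return docs_dir


# Example call using the path mentioned in this thread (not run here):
#     point_pythondocs_at(["/usr/share/doc/python2.4/html"])
```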

>>I assume there is some linux facility for documentation browsing that
>>beats importing modules and accessing docstrings. I'd work it out
>>eventually, but a timesaving pointer would be appreciated.
> 
> 
> Firefox!
> 
> file:///usr/share/doc/python2.4/html/index.html

But shouldn't it be harder than that? :-)

> The python-doc package is just an offline version of
> http://www.python.org/doc/2.4.2/
> 
> You can also probably find a copy of the book 'Dive into Python' here:
> file:///usr/share/doc/diveintopython/html/index.html
> 
> I know Hoary installed it by default. Not sure about Breezy, since I
> just did a dist-upgrade from Hoary.

Yep, it was installed by default. I'd wondered where it lived. But, 
since I've the dead-tree version, I didn't get motivated enough to 
find out. Still, thanks.

> As a rule, with Ubuntu (and most other Linux distros), the
> documentation goes under /usr/share/doc/. But you can always
> check to see exactly what a package has put where. From the
> command-line, just type 'dpkg -L python-doc'.
> 
> Hope that helps,

Thanks, it did. Especially that last bit. I've been learning the 
various shell commands almost as quickly as I've been finding I want 
to know what command does foo.

Thanks muchly,

Brian vdB


[Tutor] [OT] A thanks for all the help since I've found tutor

2005-12-28 Thread Brian van den Broek
Hi all,

I'd like to thank the tutor community, especially Alan, Danny, and 
Kent, but also all the other posters, regular and occasional, tutor or 
tutee.

I've recently been engaged in what, for pre-python and -tutor me, 
would have been some deeply black magic unrelated to python, and the 
tutor list is a big part of why I had the confidence to try my hand at 
conjuring.


I've mentioned in another thread that I've recently installed ubuntu. 
It's on a laptop, and there has been some pain. My fan isn't working 
right, nor is much of the other ACPI-governed stuff. Within a few days 
of the install, I found myself decompiling the DSDT (Differentiated 
System Description Table) file provided by my OEM and fixing the code 
(C, I think) so that it would compile with an Intel-provided compiler. 
Apparently, the MS compiler which Fujitsu used would ignore various 
"Object does not exist" errors that an Intel compiler (and, by 
extension, the Linux kernel attempting to employ the DSDT file) choked on.

Now, all that has nothing to do with python. But, I don't think I'd 
have felt up to even trying to fix code produced by Fujitsu in a 
language I don't really understand had I not had python and the 
community to give me confidence in my abilities.

FWIW, my fix worked on the immediate problem of compilation failure. 
The ultimate problem remains, as Fujitsu is very proud of its 
proprietary fan control technology :-(  Be that as it may, while I 
didn't succeed, at least I failed nobly in the effort rather than 
merely turning tail :-)

So, thanks to all, and best wishes for 2006!

Brian vdB


Re: [Tutor] Extracting data from HTML files

2005-12-28 Thread Kent Johnson
[EMAIL PROTECTED] wrote:
> I'm trying to make a Python script for extracting certain data from HTML
> files. These files are from a template so they all have the same formatting.
> I just want to extract the data from certain fields. It would also be nice to
> insert it into a MySQL database, but I'll leave that for later since I'm
> stuck on just reading the files.
> Say for example the HTML file has the following format:
> 
> Category:Category1
> [...]
> Name:Filename.exe
> [...]
> Description:Description1.


Since your data is all in the same form, I think a regex will easily 
find this data. Something like

import re
catRe = re.compile(r'Category:(.*?)')
data = ...read the HTML file here
m = catRe.search(data)
category = m.group(1)
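
As displayed here, the pattern ends in a lazy group with nothing after it, so
m.group(1) would come back as the empty string; the original post may well
have had markup around the group that did not survive in this copy. Here is a
minimal sketch of the same idea, assuming each field sits on its own line and
may be wrapped in tags. The sample HTML and the name extract_field are
invented for illustration, since the real template is not shown in the thread.

```python
import re

# Invented sample; the real template's markup is not shown in the thread.
SAMPLE = """<tr><td>Category:</td><td>Category1</td></tr>
<tr><td>Name:</td><td>Filename.exe</td></tr>
<tr><td>Description:</td><td>Description1.</td></tr>"""


def extract_field(label, html):
    """Return the text following `label:` on the same line, tags removed, or None."""
    # '.' does not match a newline by default, so (.*) stops at the end of the line.
    match = re.search(re.escape(label) + r":(.*)", html)
    if match is None:
        return None
    # Drop anything that looks like a tag, then trim surrounding whitespace.
    return re.sub(r"<[^>]*>", "", match.group(1)).strip()


category = extract_field("Category", SAMPLE)
name = extract_field("Name", SAMPLE)
```

Capturing the rest of the line and stripping tags afterwards avoids having to
know the exact closing tag, at the cost of breaking if a field ever spans more
than one line.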

> I also thought regexes might be useful for this but I suck at using regexes
> so that's another problem.

Regexes take some effort to learn, but it is worth it: they are a very 
useful tool in many contexts, not just Python. Have you read the regex 
HOW-TO?
http://www.amk.ca/python/howto/regex/

Kent



Re: [Tutor] Extracting data from HTML files

2005-12-28 Thread motorolaguy
Hello,
I was taking a look at BeautifulSoup as recommended by Bob, and from what I
can tell it's just what I'm looking for, but it's a bit beyond my current
skills with Python, I'm afraid. I'll still keep playing with it and see what I
can come up with.
I'll also take a look at regexes as recommended by Kent Johnson to see if
they'll work here. My guess is this is the way to go since the data I need is
always on the same line number in the HTML source. So I could just go to the
specific line numbers, look for my data and strip out the unnecessary tags.
Thanks for the help guys; if anyone's got more tips they are more than
welcome :)
Thanks again and happy holidays!
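
The line-number idea could be sketched roughly as follows. This assumes the
wanted data really does stay on fixed lines and that each of those lines reads
"Label:value" once the tags are gone. The names text_on_line and
value_after_colon, and the line numbers in the usage comment, are hypothetical.

```python
import re

# Anything between '<' and the next '>' is treated as a tag.
TAG = re.compile(r"<[^>]*>")


def text_on_line(html, line_number):
    """Return line `line_number` (1-based) of `html` with tags stripped."""
    lines = html.splitlines()
    return TAG.sub("", lines[line_number - 1]).strip()


def value_after_colon(text):
    """Split 'Label:value' at the first colon and return the value part."""
    _label, _sep, value = text.partition(":")
    return value.strip()


# Usage, with made-up line numbers:
#     category = value_after_colon(text_on_line(data, 12))
#     name = value_after_colon(text_on_line(data, 15))
```

This breaks as soon as the template gains or loses a line, so searching for
the label itself is the sturdier option if the files ever change.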


> --- Original Message ---
> From: Kent Johnson <[EMAIL PROTECTED]>
> To: Python Tutor 
> Subject: Re: [Tutor] Extracting data from HTML files
> Date: Wed, 28 Dec 2005 22:16:47 -0500
> 
> [EMAIL PROTECTED] wrote:
> > I'm trying to make a Python script for extracting certain data from HTML
> > files. These files are from a template so they all have the same
> > formatting. I just want to extract the data from certain fields. It would
> > also be nice to insert it into a MySQL database, but I'll leave that for
> > later since I'm stuck on just reading the files.
> > Say for example the HTML file has the following format:
> > 
> > Category:Category1
> > [...]
> > Name:Filename.exe
> > [...]
> > Description:Description1.
> 
> 
> Since your data is all in the same form, I think a regex will easily 
> find this data. Something like
> 
> import re
> catRe = re.compile(r'Category:(.*?)')
> data = ...read the HTML file here
> m = catRe.search(data)
> category = m.group(1)
> 
> > I also thought regexes might be useful for this but I suck at using
> > regexes so that's another problem.
> 
> Regexes take some effort to learn, but it is worth it: they are a very 
> useful tool in many contexts, not just Python. Have you read the regex 
> HOW-TO?
> http://www.amk.ca/python/howto/regex/
> 
> Kent