[Tutor] Html entities, beautiful soup and unicode

2010-01-19 Thread andy
Hi people

I'm using beautiful soup to rip the uk headlines from the uk bbc page.
This works rather well but there is the problem of html entities which
appear in the xml feed.
Is there an elegant/simple way to convert them into the "standard"
output? By this I mean &pound; going to £. Or do I have to use regexp?
and where does unicode fit into all of this?

Thanks for your help

Andy 
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Html entities, beautiful soup and unicode

2010-01-19 Thread Gerard Flanagan

andy wrote:

> Hi people
>
> I'm using beautiful soup to rip the uk headlines from the uk bbc page.
> This works rather well but there is the problem of html entities which
> appear in the xml feed.
> Is there an elegant/simple way to convert them into the "standard"
> output? By this I mean &pound; going to £. Or do I have to use regexp?
> and where does unicode fit into all of this?



import re

# Fredrik Lundh, http://effbot.org/zone/re-sub.html
def unescape(text):
    def fixup(m):
        text = m.group(0)
        if text[:2] == "&#":
            # character reference
            try:
                if text[:3].lower() == "&#x":
                    return unichr(int(text[3:-1], 16))
                else:
                    return unichr(int(text[2:-1]))
            except ValueError:
                pass
        else:
            # named entity
            import htmlentitydefs
            try:
                text = unichr(htmlentitydefs.name2codepoint[text[1:-1]])
            except KeyError:
                pass
        return text # leave as is
    return re.sub("&#?\w+;", fixup, text)

print unescape('&pound;')

£
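
On Python 3.4 and later the standard library covers this case directly: html.unescape converts named, decimal and hexadecimal references in one call. A minimal sketch, assuming Python 3 (the thread itself uses Python 2); the sample headline is made up for illustration:

```python
import html

# Made-up sample text, not taken from the BBC feed.
headline = "Petrol hits &pound;1.20 &amp; rising"

# html.unescape handles named (&pound;), decimal (&#163;)
# and hexadecimal (&#xA3;) references alike.
text = html.unescape(headline)
print(text)
```

The result is an ordinary Unicode str. Encoding it to bytes, for example with text.encode("utf-8"), is only needed when writing to a file or socket that expects bytes.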





Re: [Tutor] Html entities, beautiful soup and unicode

2010-01-19 Thread spir
On Tue, 19 Jan 2010 08:49:27 +0100
andy wrote:

> Hi people
> 
> I'm using beautiful soup to rip the uk headlines from the uk bbc page.
> This works rather well but there is the problem of html entities which
> appear in the xml feed.
> Is there an elegant/simple way to convert them into the "standard"
> output? By this I mean &pound; going to £. Or do I have to use regexp?
> and where does unicode fit into all of this?

Ha, ha!
What exactly do you mean by converting them into the "standard" output? What form do
you expect, and what for?
Maybe your aim is to replace number-coded HTML entities in a Python string with
real characters in a given encoding, so that you can output them. Then one way may
be to use a simple regex and replace with a custom function. E.g.:

import re

def rep(result):
    string = result.group()   # "&#xxx;"
    n = int(string[2:-1])
    uchar = unichr(n) # matching unicode char
    # for you, the destination format may be iso-8859-2?
    return unicode.encode(uchar, "utf-8") # format-encoded char

source = "xxx&#161;xxx&#194;xxx&#255;xxx"
pat = re.compile("""&#\d+;""")
print pat.sub(rep, source)
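
As for where unicode fits in: decoding the references gives you text (code points), and choosing utf-8 or iso-8859-2 is a separate, later step that turns text into bytes. A rough Python 3 sketch of those two steps, under the assumption that only decimal references need handling; the function name replace_numeric_refs is made up for this example:

```python
import re

def replace_numeric_refs(text):
    """Replace decimal references such as &#163; with the character they name."""
    return re.sub(r"&#(\d+);", lambda m: chr(int(m.group(1))), text)

source = "xxx&#161;xxx&#194;xxx&#255;xxx"

# Step 1: references -> text (a str of Unicode code points).
decoded = replace_numeric_refs(source)
print(decoded)

# Step 2: text -> bytes in the encoding the destination expects.
encoded = decoded.encode("utf-8")
print(encoded)
```

Keeping the string as text for as long as possible and encoding only at output time avoids mixing encoded bytes into the middle of a string.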

Denis


la vita e estrany

http://spir.wikidot.com/


Re: [Tutor] Python workspace - IDE and version control

2010-01-19 Thread Kent Johnson
On Mon, Jan 18, 2010 at 4:17 PM, Alan Gauld wrote:

> I use plain old RCS for version control because it's just me working on the
> code.

Wow. You should take a look at Mercurial. It is so easy to set up a
Mercurial repository for a local project - just

  hg init # create a repository
  hg st # show what will be checked in
  hg add # mark new files as to be added
  hg ci -m "Initial checkin" # the actual checkin

and voila! you have a version-controlled project!

Kent


Re: [Tutor] Python workspace - IDE and version control

2010-01-19 Thread Kent Johnson
On Tue, Jan 19, 2010 at 9:12 AM, Andreas Kostyrka wrote:

> The cool part about git that I've not yet replicated with hg is git add -p
> which allows you to separate out
> different changes in the same file.

Sounds like the record and crecord extensions come close, anyway:
http://mercurial.selenic.com/wiki/RecordExtension
http://mercurial.selenic.com/wiki/CrecordExtension

Kent


Re: [Tutor] Python workspace - IDE and version control

2010-01-19 Thread Alan Gauld

"Kent Johnson"  wrote

>> I use plain old RCS for version control because it's just me working on the
>> code.

> hg init # create a repository

md RCS in rcs

> hg st # show what will be checked in
> hg add # mark new files as to be added

Don't need any of that stuff

> hg ci -m "Initial checkin" # the actual checkin

ci foo.py in rcs

> and voila! you have a version-controlled project!

I prefer RCS - two commands is all you need (ci/co) :-)

Alan G. 





Re: [Tutor] Python workspace - IDE and version control

2010-01-19 Thread ALAN GAULD


> >>> I use plain old RCS for version control because its just me working 
>> I prefer RCS - two commands is all you need (ci/co) :-)
>
> Certainly, OTOH, you get only file based commits, no upgrade path 
> should you ever decide that you need to go multiuser 
> (and multiuser can be just you with two different places, 

Well, you get tags which allow you to check in/out a whole project
at a time if need be. And RCS does allow multi user and server 
based working (just by locating the RCS folder there!). In fact the 
biggest project I ever worked on had around 3.5 million lines of 
C++ in 10,000 source files in over 200 folders and it was all 
controlled using RCS and makefiles.

And branching and merging are all standard features too. (We had 
over 400 developers working off the repositories with 4 or 5 branches 
active at any one time - but CVS would have been much easier if it 
had been available at the time - v1.0 was just released the same year 
we started work - 1990!)

But modern tools are much better I agree. And at work, as I said, we use 
subversion (and CVS on older projects). In my time I've also used 
several heavyweight version and configuration control tools - ranging 
in price from a few hundred pounds to several hundred thousand dollars.

The best by a long shot is ClearCase on Unix, although Aide de Camp 
is also good. But these both cost 

For my home use, the biggest Python project I've done had less 
than 10 files in a single folder plus some imported modules from 
my personal collection so RCS is more than adequate.

Alan G.


[Tutor] create an object from a class in dll with ctypes?

2010-01-19 Thread katrin schmid
hi,
i am getting started with ctypes in python 2.5 and was wondering if 
i would be able to create an object from the class in my dll somehow.
I only found examples that show how to access a function but the 
function i want to call is part of a cpp class in my dll...
Thank you,
katrin


Re: [Tutor] create an object from a class in dll with ctypes?

2010-01-19 Thread Mark Tolonen


"katrin schmid"  wrote in message 
news:0edda7dddff84d639352526b72dbf...@katissspc...

> hi,
> i am getting started with ctypes in python 2.5 and was wondering if
> i would be able to create an object from the class in my dll somehow.
> I only found examples that show how to access a function but the
> function i want to call is part of a cpp class in my dll...


ctypes is for C DLLs.  You might want to look at www.swig.org instead.
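
To illustrate what "for C DLLs" means: ctypes looks a function up by its plain exported name and calls it with C types. C++ methods are exported under mangled names and expect a hidden object pointer, so they cannot be called this way directly. One common workaround is to add plain C wrapper functions (declared extern "C") to the DLL and call those instead. A small sketch of the C side only, assuming a Unix-like system and using strlen from the C runtime as a stand-in for your own DLL:

```python
import ctypes
import ctypes.util

# Assumption: a Unix-like system. find_library may return None, in which
# case CDLL(None) uses the symbols already loaded into the running
# process, and those include the C runtime.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"ctypes")
print(length)
```

Declaring argtypes and restype up front lets ctypes check and convert the arguments rather than guessing.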

-Mark





Re: [Tutor] Python workspace - IDE and version control

2010-01-19 Thread Mark Tolonen


"Kent Johnson"  wrote in message 
news:1c2a2c591001190905u28db4464hc1d1461ad26e9...@mail.gmail.com...
> On Tue, Jan 19, 2010 at 9:12 AM, Andreas Kostyrka wrote:
>
>> The cool part about git that I've not yet replicated with hg is git add -p
>> which allows you to separate out
>> different changes in the same file.
>
> Sounds like the record and crecord extensions come close, anyway:
> http://mercurial.selenic.com/wiki/RecordExtension
> http://mercurial.selenic.com/wiki/CrecordExtension


TortoiseHg's commit GUI allows this.

-Mark

