[Tutor] Html entities, beautiful soup and unicode
Hi people,

I'm using Beautiful Soup to rip the UK headlines from the UK BBC page. This works rather well, but there is the problem of HTML entities which appear in the XML feed. Is there an elegant/simple way to convert them into the "standard" output? By this I mean &pound; going to £. Or do I have to use a regexp? And where does Unicode fit into all of this?

Thanks for your help,
Andy
___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Html entities, beautiful soup and unicode
andy wrote:
> I'm using Beautiful Soup to rip the UK headlines from the UK BBC page. This works rather well, but there is the problem of HTML entities which appear in the XML feed. Is there an elegant/simple way to convert them into the "standard" output? By this I mean &pound; going to £. Or do I have to use a regexp? And where does Unicode fit into all of this?

    import re
    import htmlentitydefs

    # Fredrik Lundh, http://effbot.org/zone/re-sub.html
    def unescape(text):
        def fixup(m):
            text = m.group(0)
            if text[:2] == "&#":
                # character reference
                try:
                    if text[:3].lower() == "&#x":
                        return unichr(int(text[3:-1], 16))
                    else:
                        return unichr(int(text[2:-1]))
                except ValueError:
                    pass
            else:
                # named entity
                try:
                    text = unichr(htmlentitydefs.name2codepoint[text[1:-1]])
                except KeyError:
                    pass
            return text # leave as is
        return re.sub(r"&#?\w+;", fixup, text)

    print unescape('&pound;')

which prints:

    £
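For anyone reading this on Python 3, where `unichr` and `htmlentitydefs` no longer exist, the standard library's `html.unescape` (Python 3.4 and later) covers the same ground. The following is only a minimal sketch under that assumption; the sample headline is made up for illustration and is not taken from the BBC feed.

```python
import html

# Made-up sample text mixing a named entity, a decimal reference
# and a hexadecimal reference.
headline = "Petrol hits &pound;1.10 &amp; rising &#8211; &#x20AC;1.30 abroad"

# html.unescape returns ordinary (Unicode) text with the entities resolved.
decoded = html.unescape(headline)
print(decoded)

# Unicode comes in at the edges: the decoded text is encoded to bytes
# only when it is written out, in whatever encoding the destination expects.
as_bytes = decoded.encode("utf-8")
```

The design point is the same as in the regex version: decode the entities into Unicode text first, and pick a byte encoding only at output time.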
Re: [Tutor] Html entities, beautiful soup and unicode
On Tue, 19 Jan 2010 08:49:27 +0100 andy wrote:
> Hi people
>
> I'm using beautiful soup to rip the uk headlines from the uk bbc page.
> This works rather well but there is the problem of html entities which
> appear in the xml feed.
> Is there an elegant/simple way to convert them into the "standard"
> output? By this I mean &pound; going to £ ? or do i have to use regexp?
> and where does unicode fit into all of this?

Ha, ha! What do you mean exactly by converting them into the "standard" output? What form do you expect, and to do what?

Maybe your aim is to replace number-coded HTML entities in a Python string with real characters in a given encoding, so that you can output them. Then one way may be to use a simple regex and replace with a custom function. E.g.:

    import re

    def rep(result):
        string = result.group()     # "&#xx;"
        n = int(string[2:-1])
        uchar = unichr(n)           # matching unicode char
        # your destination encoding may be iso-8859-2 instead
        return uchar.encode("utf-8")    # encoded char

    source = "xxx&#161;xxx&#194;xxx&#255;xxx"
    pat = re.compile(r"&#\d+;")
    print pat.sub(rep, source)

Denis

la vita e estrany
http://spir.wikidot.com/
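The regex-plus-callback idea carries over to Python 3 almost unchanged, with `chr` in place of `unichr`. The sketch below is one possible port, not Denis's code: the names `NUMERIC_ENTITY` and `replace_numeric_entities` are made up for illustration, and only decimal references are handled, as in the message above.

```python
import re

# Matches decimal character references only, e.g. "&#161;".
NUMERIC_ENTITY = re.compile(r"&#(\d+);")

def replace_numeric_entities(text):
    """Replace each decimal character reference with the character it names."""
    return NUMERIC_ENTITY.sub(lambda m: chr(int(m.group(1))), text)

source = "xxx&#161;xxx&#194;xxx&#255;xxx"
text = replace_numeric_entities(source)   # Unicode text, 15 characters

# The same text can be written out in different byte encodings.
as_utf8 = text.encode("utf-8")            # non-ASCII chars take 2 bytes each
as_latin1 = text.encode("iso-8859-1")     # every char takes 1 byte

print(text)
print(len(as_utf8), len(as_latin1))       # 18 15
```

Keeping the replacement step in Unicode and encoding afterwards means the destination encoding can change without touching the regex code.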
Re: [Tutor] Python workspace - IDE and version control
On Mon, Jan 18, 2010 at 4:17 PM, Alan Gauld wrote:
> I use plain old RCS for version control because it's just me working on the code.

Wow. You should take a look at Mercurial. It is so easy to set up a Mercurial repository for a local project - just

    hg init                      # create a repository
    hg st                        # show what will be checked in
    hg add                       # mark new files as to be added
    hg ci -m "Initial checkin"   # the actual checkin

and voila! You have a version-controlled project!

Kent
Re: [Tutor] Python workspace - IDE and version control
On Tue, Jan 19, 2010 at 9:12 AM, Andreas Kostyrka wrote:
> The cool part about git that I've not yet replicated with hg is git add -p
> which allows you to separate out different changes in the same file.

Sounds like the record and crecord extensions come close, anyway:
http://mercurial.selenic.com/wiki/RecordExtension
http://mercurial.selenic.com/wiki/CrecordExtension

Kent
Re: [Tutor] Python workspace - IDE and version control
"Kent Johnson" wrote I use plain old RCS for version control because its just me working on the code. hg init # create a repository md RCS in rcs hg st # show what will be checked in hg add # mark new files as to be added Don't need any of that stuff hg ci -m "Initial checkin" # the actual checkin ci foo.py in rcs and voila! you have a version-controlled project! I prefer RCS - two commands is all you need (ci/co) :-) Alan G. ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Python workspace - IDE and version control
>>> I use plain old RCS for version control because it's just me working
>> I prefer RCS - two commands is all you need (ci/co) :-)
>
> Certainly, OTOH, you get only file based commits, no upgrade path
> should you ever decide that you need to go multiuser
> (and multiuser can be just you with two different places,

Well, you get tags, which allow you to check in/out a whole project at a time if need be. And RCS does allow multi-user and server-based working (just by locating the RCS folder there!).

In fact the biggest project I ever worked on had around 3.5 million lines of C++ in 10,000 source files in over 200 folders, and it was all controlled using RCS and makefiles. Branching and merging are standard features too. (We had over 400 developers working off the repositories with 4 or 5 branches active at any one time - but CVS would have been much easier if it had been available at the time - v1.0 was just released the same year we started work - 1990!)

But modern tools are much better, I agree. And at work, as I said, we use Subversion (and CVS on older projects).

In my time I've also used several heavyweight version and configuration control tools, ranging in price from a few hundred pounds to several hundred thousand dollars. The best by a long shot is ClearCase on Unix, although Aide de Camp is also good. But these both cost

For my home use, the biggest Python project I've done had fewer than 10 files in a single folder plus some imported modules from my personal collection, so RCS is more than adequate.

Alan G.
[Tutor] create an object from a class in dll with ctypes?
Hi,

I am getting started with ctypes in Python 2.5 and was wondering if I would be able to create an object from the class in my DLL somehow. I only found examples that show how to access a function, but the function I want to call is part of a C++ class in my DLL...

Thank you,
katrin
Re: [Tutor] create an object from a class in dll with ctypes?
"katrin schmid" wrote in message news:0edda7dddff84d639352526b72dbf...@katissspc... hi, i am getting started with ctypes in python 2.5 and was wondering if i would be able to create an object from the class in my dll somehow. I only found examples that show how to access a function but the function i want to call is part of a cpp class in my dll... ctypes is for C DLLs. You might want to look at www.swig.org instead. -Mark ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor
Re: [Tutor] Python workspace - IDE and version control
"Kent Johnson" wrote in message news:1c2a2c591001190905u28db4464hc1d1461ad26e9...@mail.gmail.com... On Tue, Jan 19, 2010 at 9:12 AM, Andreas Kostyrka wrote: The cool part about git that I've not yet replicated with hg is git add -p which allows you to seperate out different changes in the same file. Sounds like the record and crecord extensions come close, anyway: http://mercurial.selenic.com/wiki/RecordExtension http://mercurial.selenic.com/wiki/CrecordExtension TortoiseHg's commit GUI allows this. -Mark ___ Tutor maillist - Tutor@python.org To unsubscribe or change subscription options: http://mail.python.org/mailman/listinfo/tutor