Re: 2to3 chokes on bad character

2011-02-24 Thread John Machin
On Feb 23, 7:47 pm, "Frank Millman"  wrote:
> Hi all
>
> I don't know if this counts as a bug in 2to3.py, but when I ran it on my
> program directory it crashed, with a traceback but without any indication of
> which file caused the problem.
>
[traceback snipped]

> UnicodeDecodeError: 'utf8' codec can't decode byte 0x92 in position 5055:
> invalid start byte
>
> On investigation, I found some funny characters in docstrings that I
> copy/pasted from a pdf file.
>
> Here are the details if they are of any use. Oddly, I found two instances
> where characters 'look like' apostrophes when viewed in my text editor, but
> one of them was accepted by 2to3 and the other caused the crash.
>
> The one that was accepted consists of three bytes - 226, 128, 153 (as
> reported by python 2.6)

How did you incite it to report like that? Just use repr(the_3_bytes).
It'll show up as '\xe2\x80\x99'.

 >>> from unicodedata import name as ucname
 >>> ''.join(chr(i) for i in (226, 128, 153)).decode('utf8')
 u'\u2019'
 >>> ucname(_)
 'RIGHT SINGLE QUOTATION MARK'

What you have there is the UTF-8 representation of U+2019 RIGHT SINGLE
QUOTATION MARK. That's OK.

> or 226, 8364, 8482 (as reported by python3.2).

Sorry, but you have instructed Python 3.2 to commit a nonsense:

 >>> [ord(chr(i).decode('cp1252')) for i in (226, 128, 153)]
 [226, 8364, 8482]

In other words, you have taken that 3-byte sequence, decoded each byte
separately using cp1252 (aka "the usual suspect") into a meaningless
Unicode character and printed its ordinal.
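The same nonsense is reproducible in Python 3, where the byte-at-a-time cp1252 decode has to be spelled out explicitly (a sketch of what went wrong, not something you would do deliberately):

```python
# Decode each byte of the UTF-8 sequence separately as cp1252, then
# take the ordinal of the resulting character -- the "meaningless"
# operation described above.
utf8_bytes = (226, 128, 153)  # UTF-8 for U+2019 RIGHT SINGLE QUOTATION MARK
ordinals = [ord(bytes([b]).decode('cp1252')) for b in utf8_bytes]
print(ordinals)  # -> [226, 8364, 8482]
```

0x80 and 0x99 map to U+20AC EURO SIGN and U+2122 TRADE MARK SIGN in cp1252, hence 8364 and 8482.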

In Python 3, don't use repr(); it has undergone the MHTP
transformation and become ascii().

>
> The one that crashed consists of a single byte - 146 (python 2.6) or 8217
> (python 3.2).

 >>> chr(146).decode('cp1252')
 u'\u2019'
 >>> hex(8217)
 '0x2019'


> The issue is not that 2to3 should handle this correctly, but that it should
> give a more informative error message to the unsuspecting user.

Your Python 2.x code should be TESTED before you poke 2to3 at it. In
this case just trying to run or import the offending code file would
have given an informative syntax error (you have declared the .py file
to be encoded in UTF-8 but it's not).
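A minimal sketch of that pre-flight check (hypothetical helper; assumes the file claims to be UTF-8): decode the raw bytes yourself and report the offending position, which is exactly what 2to3's traceback failed to tell you:

```python
def check_utf8(raw: bytes) -> str:
    """Return 'ok', or a message locating the first invalid byte."""
    try:
        raw.decode('utf-8')
        return 'ok'
    except UnicodeDecodeError as exc:
        return 'bad byte 0x%02x at offset %d' % (raw[exc.start], exc.start)

# A cp1252 apostrophe (0x92) embedded in otherwise-ASCII text:
print(check_utf8(b'don\x92t'))  # -> bad byte 0x92 at offset 3
```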

> BTW I have always waited for 'final releases' before upgrading in the past,
> but this makes me realise the importance of checking out the beta versions -
> I will do so in future.

I'm willing to bet that the same would happen with Python 3.1, if a
3.1 to 3.2 upgrade is what you are talking about.



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: 2to3 chokes on bad character

2011-02-24 Thread John Machin
On Feb 25, 12:00 am, Peter Otten <[email protected]> wrote:
> John Machin wrote:

> > Your Python 2.x code should be TESTED before you poke 2to3 at it. In
> > this case just trying to run or import the offending code file would
> > have given an informative syntax error (you have declared the .py file
> > to be encoded in UTF-8 but it's not).
>
> The problem is that Python 2.x accepts arbitrary bytes in string constants.

Ummm ... isn't that a bug? According to section 2.1.4 of the Python
2.7.1 Language Reference Manual: """The encoding is used for all
lexical analysis, in particular to find the end of a string, and to
interpret the contents of Unicode literals. String literals are
converted to Unicode for syntactical analysis, then converted back to
their original encoding before interpretation starts ..."""

How do you reconcile "used for all lexical analysis" and "String
literals are converted to Unicode for syntactical analysis" with the
actual (astonishing to me) behaviour?
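For what it's worth, Python 3 behaves as the 2.x manual describes: source bytes that are invalid in the declared encoding are rejected during lexical analysis. A quick sketch:

```python
# Bytes-source with a coding cookie; the 0x92 byte is not valid UTF-8,
# so the tokenizer's decode step fails and compile() raises SyntaxError.
bad_source = b"# -*- coding: utf-8 -*-\ns = 'don\x92t'\n"
try:
    compile(bad_source, '<example>', 'exec')
    outcome = 'compiled'
except SyntaxError as exc:
    outcome = 'SyntaxError: %s' % exc.msg
print(outcome)
```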
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: py3k: converting int to bytes

2011-02-24 Thread John Machin
On Feb 25, 4:39 am, Terry Reedy wrote:

> Note: an as yet undocumented feature of bytes (at least in Py3) is that
> bytes(count) == bytes()*count == b'\x00'*count.

Python 3.1.3 docs for bytes() say same constructor args as for
bytearray(); this says about the source parameter: """If it is an
integer, the array will have that size and will be initialized with
null bytes"""
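The documented behaviour is easy to confirm (Python 3):

```python
# An integer argument gives a zero-filled object of that size,
# for both bytes and bytearray.
assert bytes(4) == b'\x00' * 4 == b'\x00\x00\x00\x00'
assert bytearray(4) == bytearray(b'\x00\x00\x00\x00')
print(bytes(4))  # -> b'\x00\x00\x00\x00'
```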
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: getting text out of an xml string

2011-03-04 Thread John Machin
On Mar 5, 6:53 am, JT  wrote:
> Yo,
>
>  So I have almost convinced a small program to do what I want it to
> do.  One thing remains (at least, one thing I know of at the moment):
> I am converting xml to some other format, and there are strings in the
> xml like this.
>
> The python:
>
> elif v == "content":
>                 print "content", a.childNodes[0].nodeValue
>
> what gets printed:
>
> content \u3c00note xml:space="preserve"\u3e00see forms in red inbox
> \u3c00/note\u3e00
>
> what this should say is "see forms in red inbox" because that is what
> the the program whose xml file i am trying to convert, properly
> displays, because that is what I typed in oh so long ago.  So my
> question to you is, how can I convert this "enhanced" version to a
> normal string?  Esp. since there is this "xml:space="preserve"" thing
> in there ... I suspect the rest is just some unicode issue.  Thanks
> for any help.
>
>        J "long time no post" T

Your data has been FUABARred (the first A being for Almost) -- the
"\u3c00" and "\u3e00" were once "<" and ">" respectively. You will
need to show (a) a snippet of the xml file including the data that has
the problem (b) the code that you have written, cut down to a small
script that is runnable and displays the problem. Tell us what version
of Python you are running, on what OS.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: getting text out of an xml string

2011-03-05 Thread John Machin
On Mar 5, 8:57 am, JT  wrote:
> On Mar 4, 9:30 pm, John Machin  wrote:
>
> > Your data has been FUABARred (the first A being for Almost) -- the
> > "\u3c00" and "\u3e00" were once "<" and ">" respectively. You will
>
> Hi John,
>
>    I realized that a few minutes after posting.  I then realized that
> I could just extract the text between the stuff with \u3c00 xml
> preserve etc, which I did; it was good enough since it was a one-off
> affair, I had to convert a to-do list from one program to another.
> Thanks for replying and sorry for the noise :-)

Next time you need to extract some data from an xml file, please (for
your own good) don't do whatever you did in that code -- note that the
unicode equivalent of "<" is u"\u003c", NOT u"\u3c00"; I wasn't joking
when I said it had been FU.
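The mangling pattern is consistent with UTF-16LE byte pairs being read in the wrong order (or glued together as hex into a single code point). A sketch of the arithmetic, for anyone hitting the same symptom:

```python
# '<' is U+003C; its UTF-16LE encoding is the two bytes 3C 00.
assert ord('<') == 0x003C
assert '<'.encode('utf-16-le') == b'\x3c\x00'

# Reading those two bytes with the wrong byte order yields U+3C00 --
# the bogus "\u3c00" seen in the OP's output.
assert b'\x3c\x00'.decode('utf-16-be') == '\u3c00'
```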

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Snowball to Python compiler

2011-04-21 Thread John Machin

On Friday, April 22, 2011 8:05:37 AM UTC+10, Matt Chaput wrote:

> I'm looking for some code that will take a Snowball program and compile 
> it into a Python script. Or, less ideally, a Snowball interpreter 
> written in Python.
> 
> (http://snowball.tartarus.org/)

If anyone has done such things they are not advertising them in the usual 
places.

A third (more-than-) possible solution: google("python snowball"); the first 
page of results has at least 3 hits referring to Python wrappers for Snowball.
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: codec for UTF-8 with BOM

2011-05-02 Thread John Machin
On Monday, 2 May 2011 19:47:45 UTC+10, Chris Rebert  wrote:
> On Mon, May 2, 2011 at 1:34 AM, Ulrich Eckhardt
>  wrote:

> The correct name, as you found below and as is corroborated by the
> webpage, seems to be "utf_8_sig":
> >>> u"FOøbar".encode('utf_8_sig')
> '\xef\xbb\xbfFO\xc3\xb8bar'

To complete the picture, decoding swallows the BOM:

 >>> '\xef\xbb\xbfFO\xc3\xb8bar'.decode('utf_8_sig')
 u'FO\xf8bar'
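The same round trip in Python 3 terms (a sketch; note what a plain utf-8 decode does instead):

```python
encoded = 'FOøbar'.encode('utf_8_sig')
assert encoded == b'\xef\xbb\xbf' + 'FOøbar'.encode('utf-8')

# Decoding with utf_8_sig swallows the BOM ...
assert encoded.decode('utf_8_sig') == 'FOøbar'
# ... while plain utf-8 leaves it in the text as U+FEFF.
assert encoded.decode('utf-8') == '\ufeffFOøbar'
```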

-- 
http://mail.python.org/mailman/listinfo/python-list


codecs.open() doesn't handle platform-specific line terminator

2011-05-09 Thread John Machin
According to the 3.2 docs
(http://docs.python.org/py3k/library/codecs.html#codecs.open),

"""Files are always opened in binary mode, even if no binary mode was
specified. This is done to avoid data loss due to encodings using 8-bit
values. This means that no automatic conversion of b'\n' is done on
reading and writing."""

The first point is that one would NOT expect "conversion of b'\n'" anyway.
One expects '\n' -> os.sep.encode(the_encoding) on writing and vice versa
on reading.

The second point is that there is no such restriction with the built-in
open(), which appears to work as expected, doing (e.g. Windows, UTF-16LE)
'\n' -> b'\r\x00\n\x00' when writing and vice versa on reading, and not
striking out when thrown curve balls like '\u0a0a'.

Why is codecs.open() different? What does "encodings using 8-bit values"
mean? What data loss?
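The built-in open() behaviour described above can be demonstrated platform-independently with io.TextIOWrapper (a sketch forcing Windows-style '\r\n' translation rather than relying on os.linesep):

```python
import io

buf = io.BytesIO()
writer = io.TextIOWrapper(buf, encoding='utf-16-le', newline='\r\n')
writer.write('a\n')
writer.flush()

# '\n' is translated to '\r\n' *before* encoding, giving the
# 4-byte UTF-16LE sequence for CR LF after the encoded 'a'.
assert buf.getvalue() == b'a\x00\r\x00\n\x00'
```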



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode by default

2011-05-11 Thread John Machin
On Thu, May 12, 2011 8:51 am, harrismh777 wrote:
> Is it true that if I am
> working without using bytes sequences that I will not need to care about
> the encoding anyway, unless of course I need to specify a unicode code
> point?

Quite the contrary.

(1) You cannot work without using bytes sequences. Files are byte
sequences. Web communication is in bytes. You need to (know / assume / be
able to extract / guess) the input encoding. You need to encode your
output using an encoding that is expected by the consumer (or use an
output method that will do it for you).

(2) You don't need to use bytes to specify a Unicode code point. Just use
an escape sequence e.g. "\u0404" is a Cyrillic character.
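For instance:

```python
import unicodedata

ch = '\u0404'  # no bytes involved -- this is a str of length 1
assert len(ch) == 1
assert unicodedata.name(ch) == 'CYRILLIC CAPITAL LETTER UKRAINIAN IE'
```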



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: urllib2 request with binary file as payload

2011-05-11 Thread John Machin
On Thu, May 12, 2011 10:20 am, Michiel Sikma wrote:
> Hi there,
> I made a small script implementing a part of Youtube's API that allows
> you to upload videos. It's pretty straightforward and uses urllib2.
> The script was written for Python 2.6, but the server I'm going to use
> it on only has 2.5 (and I can't update it right now, unfortunately).
> It seems that one vital thing doesn't work in 2.5's urllib2:
>
> --
>
> data = open(video['filename'], 'rb')
>
> opener = urllib2.build_opener(urllib2.HTTPHandler)
> req = urllib2.Request(settings['upload_location'], data, {
>   'Host': 'uploads.gdata.youtube.com',
>   'Content-Type': video['type'],
>   'Content-Length': '%d' % os.path.getsize(video['filename'])
> })
> req.get_method = lambda: 'PUT'
> url = opener.open(req)
>
> --
>
> This works just fine on 2.6:
> send: 
> sendIng a read()able
>
> However, on 2.5 it refuses:
> Traceback (most recent call last):
[snip]
> TypeError: sendall() argument 1 must be string or read-only buffer, not
> file

I don't use this stuff, just curious. But I can read docs. Quoting from
the 2.6.6 docs:

"""
class urllib2.Request(url[, data][, headers][, origin_req_host][,
unverifiable])
This class is an abstraction of a URL request.

url should be a string containing a valid URL.

data may be a string specifying additional data to send to the server, or
None if no such data is needed. Currently HTTP requests are the only ones
that use data; the HTTP request will be a POST instead of a GET when the
data parameter is provided. data should be a buffer in the standard
application/x-www-form-urlencoded format. The urllib.urlencode() function
takes a mapping or sequence of 2-tuples and returns a string in this
format.
"""

2.6 is expecting a string, according to the above. No mention of file.
Moreover it expects the data to be urlencoded. 2.7.1 docs say the same
thing. Are you sure you have shown the code that worked with 2.6?
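For the record, one workaround that sidesteps the file-object question entirely is to read the whole file into a byte string first, which the documented interface accepts on both 2.5 and 2.6 (a sketch; the temp file is a hypothetical stand-in for video['filename'], and it trades memory for compatibility):

```python
import os
import tempfile

# Hypothetical stand-in for video['filename'].
fd, video_filename = tempfile.mkstemp()
os.write(fd, b'fake video payload')
os.close(fd)

# Instead of passing the open file object as the request body,
# read its contents into a byte string -- sendall() on 2.5 only
# accepts strings/buffers, not file objects.
with open(video_filename, 'rb') as f:
    data = f.read()

assert data == b'fake video payload'
os.remove(video_filename)
```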


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode by default

2011-05-11 Thread John Machin
On Thu, May 12, 2011 11:22 am, harrismh777 wrote:
> John Machin wrote:
>> (1) You cannot work without using bytes sequences. Files are byte
>> sequences. Web communication is in bytes. You need to (know / assume /
>> be
>> able to extract / guess) the input encoding. You need to encode your
>> output using an encoding that is expected by the consumer (or use an
>> output method that will do it for you).
>>
>> (2) You don't need to use bytes to specify a Unicode code point. Just
>> use
>> an escape sequence e.g. "\u0404" is a Cyrillic character.
>>
>
> Thanks John.  In reverse order, I understand point (2). I'm less clear
> on point (1).
>
> If I generate a string of characters that I presume to be ascii/utf-8
> (no \u0404 type characters)
> and write them to a file (stdout) how does
> default encoding affect that file.by default..?   I'm not seeing that
> there is anything unusual going on...

About """characters that I presume to be ascii/utf-8 (no \u0404 type
characters)""": All Unicode characters (including U+0404) are encodable in
bytes using UTF-8.

The result of sys.stdout.write(unicode_characters) to a TERMINAL depends
mostly on sys.stdout.encoding. This is likely to be UTF-8 on a
Linux/OSX platform. On a typical American / Western European / [former]
colonies Windows box, this is likely to be cp850 on a Command Prompt
window, and cp1252 in IDLE.

UTF-8: All Unicode characters are encodable in UTF-8. Only problem arises
if the terminal can't render the character -- you'll get spaces or blobs
or boxes with hex digits in them or nothing.

Windows (Command Prompt window): only a small subset of characters can be
encoded in e.g. cp850; anything else causes an exception.

Windows (IDLE): ignores sys.stdout.encoding and renders the characters
itself. Same outcome as *x/UTF-8 above.

If you write directly (or sys.stdout is redirected) to a FILE, the default
encoding is obtained by sys.getdefaultencoding() and is AFAIK ascii unless
the machine's site.py has been fiddled with to make it UTF-8 or something
else.
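In current Python 3 the rules differ: open() and a redirected stdout default to locale.getpreferredencoding(False) rather than a site-wide ASCII default. A quick probe (the values vary by platform):

```python
import locale
import sys

enc = locale.getpreferredencoding(False)
print('preferred encoding:', enc)
print('stdout encoding:', sys.stdout.encoding)
assert isinstance(enc, str) and len(enc) > 0
```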

>   If I open the file with vi?  If
> I open the file with gedit?  emacs?

Any editor will have a default encoding; if that doesn't match the file
encoding, you have a (hopefully obvious) problem if the editor doesn't
detect the mismatch. Consult your editor's docs or HTFF1K.

> Another question... in mail I'm receiving many small blocks that look
> like sprites with four small hex codes, scattered about the mail...
> mostly punctuation, maybe?   ... guessing, are these unicode code
> points,

yes

> and if so what is the best way to 'guess' the encoding?

google("chardet") or rummage through the mail headers (but 4 hex digits in
a box are a symptom of inability to render, not necessarily caused by an
incorrect decoding)

> ... is
> it coded in the stream somewhere...protocol?

Should be.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode by default

2011-05-11 Thread John Machin
On Thu, May 12, 2011 1:44 pm, harrismh777 wrote:
> By
> default it looks like Python3 is writing output with UTF-8 as default...
> and I thought that by default Python3 was using either UTF-16 or UTF-32.
> So, I'm confused here...  also, I used the character sequence \u00A3
> which I thought was UTF-16... but Python3 changed my intent to  'c2a3'
> which is the normal UTF-8...

Python uses either a 16-bit or a 32-bit INTERNAL representation of Unicode
code points. Those NN bits have nothing to do with the UTF-NN encodings,
which can be used to encode the codepoints as byte sequences for EXTERNAL
purposes. In your case, UTF-8 has been used as it is the default encoding
on your platform.
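The distinction is easy to see: one code point, one internal str object, several different external byte sequences depending on the encoding chosen:

```python
pound = '\u00a3'  # POUND SIGN, a single code point

assert pound.encode('utf-8') == b'\xc2\xa3'             # 2 bytes
assert pound.encode('utf-16-le') == b'\xa3\x00'         # 2 different bytes
assert pound.encode('utf-32-le') == b'\xa3\x00\x00\x00' # 4 bytes
assert pound.encode('latin-1') == b'\xa3'               # 1 byte
```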

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode by default

2011-05-11 Thread John Machin
On Thu, May 12, 2011 2:14 pm, Benjamin Kaplan wrote:
>
> If the file you're writing to doesn't specify an encoding, Python will
> default to locale.getdefaultencoding(),

No such attribute. Perhaps you mean locale.getpreferredencoding()



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode by default

2011-05-12 Thread John Machin
On Thu, May 12, 2011 4:31 pm, harrismh777 wrote:

>
> So, the UTF-16 UTF-32 is INTERNAL only, for Python

NO. See one of my previous messages. UTF-16 and UTF-32, like UTF-8 are
encodings for the EXTERNAL representation of Unicode characters in byte
streams.

> I also was not aware that UTF-8 chars could be up to six(6) byes long
> from left to right.

It could be, once upon a time in ISO faerieland, when it was thought that
Unicode could grow to 2**32 codepoints. However ISO and the Unicode
consortium have agreed that 17 planes is the utter max, and accordingly a
valid UTF-8 byte sequence can be no longer than 4 bytes ... see below

>>> chr(17 * 65536)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: chr() arg not in range(0x110000)
>>> chr(17 * 65536 - 1)
'\U0010ffff'
>>> _.encode('utf8')
b'\xf4\x8f\xbf\xbf'
>>> b'\xf5\x8f\xbf\xbf'.decode('utf8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\python32\lib\encodings\utf_8.py", line 16, in decode
return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xf5 in position 0:
invalid start byte


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing a search string

2004-12-31 Thread John Machin
Andrew Dalke wrote:
> "It's me" wrote:
> > Here's a NDFA for your text:
> >
> >b  0 1-9 a-Z ,  . +  -   '   " \n
> > S0: S0 E   E  S1  E E E S3 E S2  E
> > S1: T1 E   E  S1  E E E  E  E  E T1
> > S2: S2 E   E  S2  E E E  E  E T2  E
> > S3: T3 E   E  S3  E E E  E  E  E T3
>
> Now if I only had an NDFA for parsing that syntax...

Parsing your sentence as written ("if I only had"): If you were the
sole keeper of the secret??

Parsing it as intended ("if only I had"), and ignoring the smiley:
Looks like a fairly straight-forward state-transition table to me. The
column headings are not aligned properly in the message, b means blank,
a-Z is bletchworthy, but the da Vinci code it ain't.

If only we had an NDFA (whatever that is) for guessing what acronyms
mean ...

Where I come from:
DFA = deterministic finite-state automaton
NFA = non-det..
SFA = content-free
NFI = concept-free
NDFA = National Dairy Farmers' Association

HTH, and Happy New Year!

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Calling Function Without Parentheses!

2005-01-02 Thread John Machin
Kamilche wrote:
> What a debug nightmare! I just spent HOURS running my script through
> the debugger, sprinkling in log statements, and the like, tracking
down
> my problem.
>
> I called a function without the ending parentheses. I sure do WISH
> Python would trap it when I try to do the following:
> MyFunc
>
> instead of:
>
> MyFunc()
>
> h.

Aaaah indeed. You must be using an extremely old version of
pychecker. The version I have in my Python22 directory gave the same
results as the current one; see below.

C:\junk>type noparens.py
[bangs inserted to defeat Google's lstrip()]
!def bar():
!   foo
!def foo():
!   alist = []
!   alist.sort



C:\junk>pychecker noparens.py

C:\junk>c:\python24\python.exe
c:\python22\Lib\site-packages\pychecker\checker.py noparens.py
Processing noparens...

Warnings...

noparens.py:2: Statement appears to have no effect
noparens.py:5: Statement appears to have no effect
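pychecker is long unmaintained; the same "statement appears to have no effect" check can be sketched with the ast module in modern Python (a rough approximation that flags bare-name and bare-attribute expression statements, like the untouched foo and alist.sort above):

```python
import ast

SOURCE = """\
def bar():
    foo
def foo():
    alist = []
    alist.sort
"""

tree = ast.parse(SOURCE)
# An ast.Expr node is an expression used as a statement; a bare Name
# or Attribute there almost certainly has no effect.
suspicious = sorted(node.lineno for node in ast.walk(tree)
                    if isinstance(node, ast.Expr)
                    and isinstance(node.value, (ast.Name, ast.Attribute)))
print(suspicious)  # -> [2, 5]
```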

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Calling Function Without Parentheses!

2005-01-02 Thread John Machin

Dan Bishop wrote:
> Kamilche wrote:
> > What a debug nightmare! I just spent HOURS running my script
through
> > the debugger, sprinkling in log statements, and the like, tracking
> down
> > my problem.
> >
> > I called a function without the ending parentheses. I sure do WISH
> > Python would trap it when I try to do the following:
> > MyFunc
> >
> > instead of:
> >
> > MyFunc()
>
> You're a former Pascal programmer, aren't you? ;-)
>
> In Python, it's not an error, because you can do things like:
>
> >>> def simpson(f, a, b):
> ..."Simpson's Rule approximation of the integral of f on [a, b]."
> ...return (b - a) * (f(a) + 4 * f((a + b) / 2.0) + f(b)) / 6.0
> ...
> >>> simpson(math.sin, 0.0, math.pi) # Note that math.sin is a
function
> 2.0943951023931953

In Python, it's not an error, because functions are first class
citizens. The OP's problem is evaluating an expression and then doing
SFA with the result. Pychecker appears to be able to make the
distinction; see below:

C:\junk>type simpson.py
import math
def simpson(f, a, b):
return (b - a) * (f(a) + 4 * f((a + b) / 2.0) + f(b)) / 6.0
print simpson(math.sin, 0.0, math.pi)

C:\junk>python simpson.py
2.09439510239

C:\junk>pychecker simpson.py

C:\junk>c:\python24\python.exe
c:\python22\Lib\site-packages\pychecker\checker.py simpson.py
Processing simpson...
2.09439510239

Warnings...

None
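The same demonstration in Python 3 syntax (print as a function), showing the function passed as an ordinary value:

```python
import math

def simpson(f, a, b):
    """Simpson's Rule approximation of the integral of f on [a, b]."""
    return (b - a) * (f(a) + 4 * f((a + b) / 2.0) + f(b)) / 6.0

result = simpson(math.sin, 0.0, math.pi)  # math.sin passed, not called
print(result)  # approximately 2.0943951023931953, i.e. 2*pi/3
```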

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Py2exe and extension issues

2005-01-03 Thread John Machin

[EMAIL PROTECTED] wrote:
> Is anyone aware of issues with Py2exe and extensions compiled with
> cygwin/mingw for Python 2.3?   I have an extension that wraps access
to
> some C DLLs.  The generated executable always segfaults at startup,
> although things work fine when running through the normal python
> interpreter.  I had a guess that perhaps the issue stems from my
> extension being compiled with cygwin and py2exe compiled with  Visual
> C++?

Some questions:
1. Did it work before (e.g. with Python 2.2, or an earlier version of
py2exe), or has it never worked?
2. Where at start-up does it "segfault"? Doesn't the "Dr Watson" log
file tell you anything? You may need to sprinkle prints and printfs
around your code.
3. Those C DLLs: supplied by whom -- you or an nth party? compiled with
which compiler?
4. Which version(s) of which Windows are you using?

Some hints:
1. Ask on the py2exe mailing list.
2. Your guess may be correct. The usual cause of such a problem is
getting a run-time-library resource on one side of the chasm and trying
to use it on the other side. When the resource is a pointer to a data
structure whose contents are not defined by some standard, anything can
go wrong, and usually does. Examples: (a) malloc() on one side and
free() on the other (b) fopen() on one side and fanything() on the
other. However I would expect these problems to show up under normal
interpreter use.
3. Using py2exe instead of python may merely be exposing a bug in your
code caused by e.g. an uninitialised variable. (-: When you say "things
work fine" in normal interpreter mode, where does this lie in the
continuum between "it ran regression tests and volume tests happily all
night" and "I fired it up and it didn't fall over"? :-)

HTH,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed ain't bad

2005-01-03 Thread John Machin
Anders J. Munch wrote:
> Another way is the strategy of "it's easier to ask forgiveness than
to
> ask permission".
> If you replace:
> if(not os.path.isdir(zfdir)):
> os.makedirs(zfdir)
> with:
> try:
> os.makedirs(zfdir)
> except EnvironmentError:
> pass
>
> then not only will your script become a micron more robust, but
> assuming zfdir typically does not exist, you will have saved the call
> to os.path.isdir.

1. Robustness: Both versions will "crash" (in the sense of an unhandled
exception) in the situation where zfdir exists but is not a directory.
The revised version just crashes later than the OP's version :-(
Trapping EnvironmentError seems not very useful -- the result will not
distinguish (on Windows 2000 at least) between the 'existing dir' and
'existing non-directory' cases.


Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on
win32
>>> import os, os.path
>>> os.path.exists('fubar_not_dir')
True
>>> os.path.isdir('fubar_not_dir')
False
>>> os.makedirs('fubar_not_dir')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
  File "c:\Python24\lib\os.py", line 159, in makedirs
    mkdir(name, mode)
OSError: [Errno 17] File exists: 'fubar_not_dir'
>>> try:
...     os.mkdir('fubar_not_dir')
... except EnvironmentError:
...     print 'trapped env err'
...
trapped env err
>>> os.mkdir('fubar_is_dir')
>>> os.mkdir('fubar_is_dir')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OSError: [Errno 17] File exists: 'fubar_is_dir'
>>>
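A sketch of a more robust EAFP version (hypothetical helper name) that still asks forgiveness, but re-raises in the "exists but is not a directory" case instead of silently swallowing it:

```python
import errno
import os
import tempfile

def ensure_dir(path):
    """Create path (and parents) if needed; raise if it exists as a non-directory."""
    try:
        os.makedirs(path)
    except OSError as exc:
        # EEXIST for an existing *directory* is fine; anything else
        # (including "exists but is a plain file") propagates.
        if exc.errno != errno.EEXIST or not os.path.isdir(path):
            raise

d = tempfile.mkdtemp()
ensure_dir(d)                            # existing directory: no error
ensure_dir(os.path.join(d, 'a', 'b'))    # nested creation works
```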

2. Efficiency: I don't see the disk I/O inefficiency in calling
os.path.isdir() before os.makedirs() -- if the relevant part of the
filesystem wasn't already in memory, the isdir() call would make it so,
and makedirs() would get a free ride, yes/no?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: emulating an and operator in regular expressions

2005-01-03 Thread John Machin

Terry Reedy wrote:
> "Craig Ringer" <[EMAIL PROTECTED]> wrote in message
> news:[EMAIL PROTECTED]
> > On Mon, 2005-01-03 at 08:52, Ross La Haye wrote:
> >> How can an and operator be emulated in regular expressions in
Python?
>
> Regular expressions are designed to define and detect repetition and
> alternatives.  These are easily implemented with finite state
machines.
> REs not meant for conjunction.  'And' can be done but, as I remember,
only
> messily and slowly.  The demonstration I once read was definitely
> theoretical, not practical.
>
> Python was designed for and logic (among everything else).  If you
want
> practical code, use it.
>
> if match1 and match2: do whatever.
>

Provided you are careful to avoid overlapping matches e.g. data = 'Fred
Johnson', query = ('John', 'Johnson').

Even this approach (A follows B or B follows A) gets tricky in the real
world of the OP, who appears to be attempting some sort of name
matching, where the word order may be scrambled. Problem is, punters
can have more than 2 words in their names, e.g. Mao Ze Dong[*], Louise
de la Valliere, and Johann Georg Friedrich von und zu Hohenlohe ... or
misreading handwriting can change the number of perceived words, e.g.
Walenkamp -> Wabu Kamp (no kidding).

[*] aka Mao Zedong aka Mao Tse Tung -- difficult enough before we start
considering variations in the order of the words.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: why does UserDict.DictMixin use keys instead of __iter__?

2005-01-04 Thread John Machin

Steven Bethard wrote:
> Sorry if this is a repost -- it didn't appear for me the first time.
>
>
> So I was looking at the Language Reference's discussion about
emulating
> container types[1], and nowhere in it does it mention that .keys() is
> part of the container protocol.

I don't see any reference to a "container protocol". What I do see is
(1) """It is also recommended that mappings provide the methods keys(),
..."""
(2) """The UserDict module provides a DictMixin class to help create
those methods from a base set of __getitem__(), __setitem__(),
__delitem__(), and keys(). """

> Because of this, I would assume that to
> use UserDict.DictMixin correctly, a class would only need to define
> __getitem__, __setitem__, __delitem__ and __iter__.

So I can't see why would you assume that, given that the docs say in
effect "you supply get/set/del + keys as the building blocks, the
DictMixin class will provide the remainder". This message is reinforced
in the docs for UserDict itself.

> So why does
> UserDict.DictMixin require keys() to be defined?

Because it was a reasonable, documented, design?

In any case, isn't UserDict past history? Why are you mucking about
with it?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Reaching the real world

2005-01-04 Thread John Machin

Fuzzyman wrote:
> I have a friend who would like to move and program lights and other
> electric/electro-mechanical devices by computer. I would like to help
-
> and needless to say Python would be an ideal language for the
> 'programmers interface'.

Try Googling for "Python X10"

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: why does UserDict.DictMixin use keys instead of __iter__?

2005-01-04 Thread John Machin

Steven Bethard wrote:
> John Machin wrote:
> > Steven Bethard wrote:
> >
> >>So I was looking at the Language Reference's discussion about
> >>emulating container types[1], and nowhere in it does it mention
that
> >> .keys() is part of the container protocol.
> >
> > I don't see any reference to a "container protocol".
>
> Sorry, I extrapolated "container protocol" from this statement:
>
> "Containers usually are sequences (such as lists or tuples) or
mappings
> (like dictionaries), but can represent other containers as well. The
> first set of methods is used either to emulate a sequence or to
emulate
> a mapping"
>
> and the fact that there is a "sequence protocol" and a "mapping
protocol".
>
> But all I was really reading from this statement was that the "first
set
> of methods" (__len__, __getitem__, __setitem__, __delitem__ and
> __iter__) were more integral than the second set of methods (keys(),
> values(), ...).
>
>
> > What I do see is
> > (1) """It is also recommended that mappings provide the methods
keys(),
> > ..."""
>
> You skipped the remaining 13 methods in this list:
>
> "It is also recommended that mappings provide the methods keys(),
> values(), items(), has_key(), get(), clear(), setdefault(),
iterkeys(),
> itervalues(), iteritems(), pop(), popitem(), copy(), and update()
> behaving similar to those for Python's standard dictionary objects."
>
> This is the "second set of methods" I mentioned above.  I don't
> understand why the creators of UserDict.DictMixin decided that
keys(),
> from the second list, is more important than __iter__, from the first
list.
>
>
> >>Because of this, I would assume that to
> >>use UserDict.DictMixin correctly, a class would only need to define
> >>__getitem__, __setitem__, __delitem__ and __iter__.
> >
> >
> > So I can't see why would you assume that, given that the docs say
in
> > effect "you supply get/set/del + keys as the building blocks, the
> > DictMixin class will provide the remainder". This message is
reinforced
> > in the docs for UserDict itself.
>
> Sorry, my intent was not to say that I didn't know from the docs that

> UserDict.DictMixin required keys().  Clearly it's documented.

Sorry, the combination of (a) "assume X where not(X) is documented" and
(b) posting of tracebacks that demonstrated behaviour that is both
expected and documented lead to my making an unwarranted assumption :-)

> My
> question was *why* does it use keys()?  Why use keys() when keys()
can
> be derived from __iter__, and __iter__ IMHO looks to be a more basic
> part of the mapping protocol.

Now that I understand your question: Hmmm, good question. __iter__
arrived (2.2) before DictMixin (2.3), so primacy is not the reason.
Ease of implementation by the user of DictMixin: probably not, "yield
akey" vs "alist.append(akey)" -- not much in it in Python, different
story in C, but a C extension wouldn't be using DictMixin anyway.
>
> > In any case, isn't UserDict past history? Why are you mucking about
> > with it?
>
> UserDict is past history, but DictMixin isn't.

OK, I'll rephrase: what is your interest in DictMixin?

My interest: I'm into mappings that provide an approximate match
capability, and have a few different data structures that I'd like to
implement as C types in a unified manner. The plot includes a base type
that, similarly to DictMixin, provides all the non-basic methods.
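For comparison, DictMixin's modern successor, collections.abc.MutableMapping, settled the question the other way: it builds on __iter__ (plus __len__) rather than keys(). Supply five basic methods and the rest come free (a small sketch):

```python
from collections.abc import MutableMapping

class LowerDict(MutableMapping):
    """Toy mapping with case-insensitive string keys."""
    def __init__(self):
        self._data = {}
    def __getitem__(self, key):
        return self._data[key.lower()]
    def __setitem__(self, key, value):
        self._data[key.lower()] = value
    def __delitem__(self, key):
        del self._data[key.lower()]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)

d = LowerDict()
d['English'] = 'hello'
assert d['ENGLISH'] == 'hello'
assert list(d.keys()) == ['english']     # keys() derived from __iter__
assert d.get('polish', 'brak') == 'brak' # get() also comes free
```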

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Pythonic search of list of dictionaries

2005-01-04 Thread John Machin
Bulba! wrote:
[big snip]

Forget the csv-induced dicts for the moment, they're just an artifact
of your first solution attempt. Whether english = csv_row[1], or
english = csv_row_dict["english"], doesn't matter yet. Let's take a few
steps back, and look at what you are trying to do through a telescope
instead of a microscope.

Core basic root problem: you have a table with 3 columns, id, english,
polish. Nothing is said about id so let's assume nothing. Evidently
contents of "english" is the "key" for this exercise, but it's not
necessarily unique. You have two instances of the table, and you want
to "diff" them using "english" as the key.

You want to collect together all rows that have the same value of
"english", and then process them somehow. You need to define an object
containing a list of all "left" rows and a list of all "right" rows.

Processing the contents of the object: do whatever you have to with
obj.left_list and obj.right_list depending on the lengths; you have 3
cases of length to consider (0, 1, many). 3 x 3 = 9 but if you get both
zero you have a bug (which you should of course assert does not exist),
so that leaves 8 cases to think about.

Now, how do we get the rows together:

(a) The sort method

Step 1: sort each dataset on (english, id, polish) or (english, polish,
id) -- your choice; sorting on the whole record instead just english
makes the ordering predictable and repeatable.

Step 2: read the two sorted datasets ("left" and "right") in parallel:

when left key < right key: do_stuff(); read another left record
when left key > right key: converse
when left key == right key: do_stuff(); read another record for both
where do_stuff() includes appending to left_list and right_list as
appropriate and at the right moment, process a completed object.
This is a little tricky, and handling end of file needs a little care.
However this algorithm can be implemented with minimal memory ("core"),
no disk drive at all :-O and a minimum of 3 (preferably 4+)
serially-readable rewindable re-writable storage devices e.g. magnetic
tape drives. Once upon a time, it was *the* way of maintaining a
database (left = old database, right = transactions, output file -> new
database).

(b) Fast forwarding 30+ years, let's look at the dictionary method,
assuming you have enough memory to hold all your data at once:

Step 1: read the "left" table; for each row, if english not in mydict,
then do mydict[english] = MyObject(). In any case, do
mydict[english].left_list.append(row)
Step 2: same for the "right" table.
Step 3: for english, obj in mydict.iteritems(): process(english, obj)
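The three steps above can be sketched as follows (a Python 3 sketch; the class name MyObject comes from the steps above, while the sample rows and the key_index parameter are purely illustrative):

```python
from collections import defaultdict

class MyObject:
    # One instance per distinct "english" value, as in the steps above.
    def __init__(self):
        self.left_list = []
        self.right_list = []

def group_rows(left_rows, right_rows, key_index=1):
    mydict = defaultdict(MyObject)
    for row in left_rows:          # Step 1: bucket the "left" rows
        mydict[row[key_index]].left_list.append(row)
    for row in right_rows:         # Step 2: same for the "right" rows
        mydict[row[key_index]].right_list.append(row)
    return mydict

left = [(1, "cat", "kot"), (2, "dog", "pies")]
right = [(7, "dog", "pies"), (8, "fish", "ryba")]
mydict = group_rows(left, right)
# Step 3 would now walk mydict and dispatch on the lengths of
# left_list and right_list (the 0, 1, many cases).
```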

As your datasets are stored in MS Excel spreadsheets, N < 64K so
whether your solution is O(N) or O(N*log(N)) doesn't matter too much.
You are however correct to avoid O(N**2) solutions.

Hoping this sketch of the view through the telescope (and the
rear-vision mirror!) is helpful,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed revisited

2005-01-08 Thread John Machin

Bulba! wrote:
> On 4 Jan 2005 14:33:34 -0800, "John Machin" <[EMAIL PROTECTED]>
> wrote:
>
> >(b) Fast forwarding 30+ years, let's look at the dictionary method,
> >assuming you have enough memory to hold all your data at once:
> >
> >Step 1: read the "left" table; for each row, if english not in mydict,
> >then do mydict[english] = MyObject(). In any case, do
> >mydict[english].left_list.append(row)
> >Step 2: same for the "right" table.
> >Step 3: for english, obj in mydict.iteritems(): process(english, obj)
> >
> >As your datasets are stored in MS Excel spreadsheets, N < 64K so
> >whether your solution is O(N) or O(N*log(N)) doesn't matter too much.
> >You are however correct to avoid O(N**2) solutions.
>
> Following advice of two posters here (thanks) I have written two
> versions of  the same program, and both of them work, but the
> difference in speed is drastic, about 6 seconds vs 190 seconds
> for about 15000 of processed records, taken from 2 lists of
> dictionaries.
>
> I've read "Python Performance Tips" at
>
> http://manatee.mojam.com/~skip/python/fastpython.html
>
> ..but still don't understand why the difference is so big.
>
> Both versions use local variables, etc. Both have their
> lists initially sorted. Both essentially use a loop with
> conditional for comparison,
> then process the record in the
> same way.

"process the record in the same way"??? That's an interesting use of
"same".

> The overhead of second version is that it also
> uses cmp() function and two additional integer
> variables - that should not slow the program _so much_.
>
> I have measured the run of snippet 2 with time checkpoints
> written to a log (write time delta to log every 200 records),
> even made a graph of time deltas in spreadsheet and in fact
> snippet 2 seems after initial slowdown looks exactly linear,
> like  that:
>
> ^ (time)
> |
> |  /---
> | /
> |/
> ---> (# of records written)
>
> So yes, it would scale to big files.

On your empirical evidence, as presented. However, do read on ...

>However, why is it so
> frigging slow?!

Mainly, because you are (unnecessarily) deleting the first item of a
list. This requires copying the remaining items. It is O(N), not O(1).
You are doing this O(N) times, so the overall result is O(N**2). Your
graph has no obvious explanation; after how many cycles does the speed
become constant?

Secondly, you are calling cmp() up to THREE times when once is enough.
Didn't it occur to you that your last elif needed an else to finish it
off, and the only possible action for the else suite was "assert
False"?

It would appear after reading your "snippet 2" a couple of times that
you are trying to implement the old 3-tape update method.

It would also appear that you are assuming/hoping that there are never
more than one instance of a phrase in either list.

You need something a little more like the following.

Note that in your snippet2 it was not very obvious what you want to do
in the case where a phrase is in "new" but not in "old", and vice versa
-- under one circumstance (you haven't met "end of file") you do
nothing but in the other circumstance you do something but seem to
have not only a polarity problem but also a copy-paste-edit problem. In
the following code I have abstracted the real requirements as
handle_XXX_unmatched()

!o = n = 0
!lenold = len(oldl)
!lennew = len(newl)
!while o < lenold and n < lennew:
!    cmp_result = cmp(oldl[o]['English'], newl[n]['English'])
!    if cmp_result == 0:
!        # Eng phrase is in both "new" and "old"
!        cm.writerow(matchpol(oldl[o], newl[n]))
!        o += 1
!        n += 1
!    elif cmp_result < 0:
!        # Eng phrase is in "old", not in "new"
!        handle_old_unmatched(o)
!        o += 1
!    else:
!        assert cmp_result > 0 # :-)
!        # Eng phrase is in "new", not in "old"
!        handle_new_unmatched(n)
!        n += 1
!while o < lenold:
!    # EOF on new, some old remain
!    handle_old_unmatched(o)
!    o += 1
!while n < lennew:
!    # EOF on old, some new remain
!    handle_new_unmatched(n)
!    n += 1

Some general hints: Try stating your requirements clearly, to yourself
first and then to us. Try to ensure that your code is meeting those
requirements before you bother timing it. Try not to use single-letter
names -- in particular, using l (that's "L".lower(), not 1 i.e.
str(4-3)) is barf-inducing and makes people likely not to want to read
your code.

HTH,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed revisited

2005-01-09 Thread John Machin
Bulba! wrote:
> On 8 Jan 2005 18:25:56 -0800, "John Machin" <[EMAIL PROTECTED]>
> wrote:
>
> >Secondly, you are calling cmp() up to THREE times when once is enough.
> >Didn't it occur to you that your last elif needed an else to finish it
> >off, and the only possible action for the else suite was "assert
> >False"?
>
> Sure, but I was trying to make it shorter and absolutely clear
> what I mean in this place (when people see the comparison in every
> place, they see immediately what it was, they don't have to
> recall the variable). Obviously I'd optimise it in practice. It
> was supposed to be "proof of concept" rather than any working
> code.

Three is shorter than one? See immediately? I have to disagree. People
would be more put off by looking at your overly-complicated comparison
three times and having to check character-by-character that it was
doing exactly the same thing each time (i.e. your copy/paste had
worked) than by "recalling" a sensibly-named variable like
"cmp_result".

>
> BTW, this particular example didn't work out, but I still found
> Python to be the most convenient language for prototyping, I
> wrote smth like five versions of this thing, just to learn
> dictionaries, lists, sets, etc. In no other language I'd do
> it so quickly.

NOW we're on the same bus.

>
> >It would appear after reading your "snippet 2" a couple of times that
> >you are trying to implement the old 3-tape update method.
>
> Well, ahem, I admit you got me curious just how it would work
> with tapes (never used them), so I was sort of  trying to simulate
> that - it's just a bit weird undertaking, I did it rather to explore
> the issue and try to learn smth rather than to get working code.

The 3-tape technique is worth understanding, for handling datasets that
won't fit into memory. More generally, when learning, you *need* to get
working code out of your practice exercises. Otherwise what are you
learning? You don't want to wait until you have two 10GB datasets to
"diff" before you start on thinking through, implementing and testing
what to do on end-of-file.

>
> Deleting the first item from a list was to be a convenient
> equivalent of forwarding the  reading head on the tape one
> record ahead, because I assumed that any deletion from the
> list was to take more or less the same time, just like reading
> heads on tapes were probably reading the records with similar
> speeds regardless of what length of tape was already read.

A reasonable assumption; however if the reading stopped and restarted,
an inordinate amount of time was taken accelerating the drive up to
reading speed again. The idea was to keep the tape moving at all times.
This required techniques like double-buffering, plus neat clean brief
fast processing routines.

... but you mixed in moving the indexes (with e.g. o+=1) with confusing
results.

Deleting the first item of a list in this circumstance reminds me of
the old joke: "Q: How many members of military organisation X does it
take to paint the barracks? A: 201, 1 to hold the paint-brush and 200
to lift the barracks and rotate it."

Tip 1: Once you have data in memory, don't move it, move a pointer or
index over the parts you are inspecting.

Tip 2: Develop an abhorrence of deleting data.
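Tip 1 can be demonstrated directly. Here's a minimal Python 3 sketch (the function names and sizes are illustrative; absolute timings are machine-dependent, but the ordering is robust):

```python
import timeit

def drain_by_delete(n):
    # Deleting item 0 copies every remaining element left each time:
    # O(n) work per deletion, O(n**2) overall.
    lst = list(range(n))
    while lst:
        del lst[0]

def drain_by_index(n):
    # Advancing an index over the same data touches each element once: O(n).
    lst = list(range(n))
    i = 0
    while i < len(lst):
        i += 1

t_del = timeit.timeit(lambda: drain_by_delete(20000), number=1)
t_idx = timeit.timeit(lambda: drain_by_index(20000), number=1)
print(t_del > t_idx)  # True: the delete version is dramatically slower
```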

>
> >It would also appear that you are assuming/hoping that there are never
> >more than one instance of a phrase in either list.
>
> Sure. Because using advice of Skip Montanaro I initially used sets
> to eliminate duplicates.

I see two problems with your implementation of Skip's not-unreasonable
suggestion. One: You've made the exercise into a multi-pass algorithm.
Two: As I posted earlier, you need to get all the instances of the same
key together at the same time. Otherwise you get problems. Suppose you
have in each list, two translations apple -> polish1 and apple ->
polish2. How can you guarantee that you remove the same duplicate from
each list, if you remove duplicates independently from each list?

> It's just not shown for brevity.

Then say so. On the evidence, brevity was not a plausible explanation
for the absence. :-)

> If the
> keys are guaranteed to be unique, it makes it easier to think
> about the algorithm.

Unfortunately in the real world you can't just imagine the problems
away. Like I said, you have to think about the requirements first -- 9
cases of (0, 1, many) x (0, 1, many); what do you need to do? Then
think about an implementation.

>
> >Note that in your snippet2 it was not very obvious what you want to do
> >in the case where a phrase is in "new" but not in "old", and vice versa
> >-- under one circumstance (you haven

Re: Python3: on removing map, reduce, filter

2005-01-09 Thread John Machin

Steven Bethard wrote:
> Note that list comprehensions are also C-implemented, AFAIK.

Rather strange meaning attached to "C-implemented". The implementation
generates the code that would have been generated had you written out
the loop yourself, with a speed boost (compared with the fastest DIY
approach) from using a special-purpose opcode LIST_APPEND. See below.

>>> def afunc(n):
...    return [x*x for x in xrange(n)]
...
>>> afunc(3)
[0, 1, 4]
>>> import dis
>>> dis.dis(afunc)
  2           0 BUILD_LIST               0
              3 DUP_TOP
              4 STORE_FAST               1 (_[1])
              7 LOAD_GLOBAL              1 (xrange)
             10 LOAD_FAST                0 (n)
             13 CALL_FUNCTION            1
             16 GET_ITER
        >>   17 FOR_ITER                17 (to 37)
             20 STORE_FAST               2 (x)
             23 LOAD_FAST                1 (_[1])
             26 LOAD_FAST                2 (x)
             29 LOAD_FAST                2 (x)
             32 BINARY_MULTIPLY
             33 LIST_APPEND
             34 JUMP_ABSOLUTE           17
        >>   37 DELETE_FAST              1 (_[1])
             40 RETURN_VALUE
>>> def bfunc(n):
...    blist=[]; blapp=blist.append
...    for x in xrange(n):
...        blapp(x*x)
...    return blist
...
>>> bfunc(3)
[0, 1, 4]
>>> dis.dis(bfunc)
  2           0 BUILD_LIST               0
              3 STORE_FAST               3 (blist)
              6 LOAD_FAST                3 (blist)
              9 LOAD_ATTR                1 (append)
             12 STORE_FAST               2 (blapp)

  3          15 SETUP_LOOP              34 (to 52)
             18 LOAD_GLOBAL              3 (xrange)
             21 LOAD_FAST                0 (n)
             24 CALL_FUNCTION            1
             27 GET_ITER
        >>   28 FOR_ITER                20 (to 51)
             31 STORE_FAST               1 (x)

  4          34 LOAD_FAST                2 (blapp)
             37 LOAD_FAST                1 (x)
             40 LOAD_FAST                1 (x)
             43 BINARY_MULTIPLY
             44 CALL_FUNCTION            1
             47 POP_TOP
             48 JUMP_ABSOLUTE           28
        >>   51 POP_BLOCK

  5     >>   52 LOAD_FAST                3 (blist)
             55 RETURN_VALUE
>>>

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Speed revisited

2005-01-09 Thread John Machin

Andrea Griffini wrote:
> On 9 Jan 2005 12:39:32 -0800, "John Machin" <[EMAIL PROTECTED]>
> wrote:
>
> >Tip 1: Once you have data in memory, don't move it, move a pointer
or
> >index over the parts you are inspecting.
> >
> >Tip 2: Develop an abhorrence of deleting data.
>
> I've to admit that I also found strange that deleting the
> first element from a list is not O(1) in python. My wild
> guess was that the extra addition and normalization required
> to have insertion in amortized O(1) and deletion in O(1) at
> both ends of a random access sequence was going to have
> basically a negligible cost for normal access (given the
> overhead that is already present in python).
>
> But I'm sure this idea is too obvious for not having been
> proposed, and so there must reasons for refusing it
> (may be the cost to pay for random access once measured was
> found being far from negligible, or that the extra memory
> overhead per list - one int for remembering where the live
> data starts - was also going to be a problem).
>

My wild guess: Not a common use case. Double-ended queue is a special
purpose structure.

Note that the OP could have implemented the 3-tape update simulation
efficiently by reading backwards i.e. del alist[-1]


Suggested projects for you, in increasing order of difficulty:

1. Grab the source code (listobject.c) and report back on how you would
implement your proposal.
2. Convince folk that your implementation is faster and more robust and
has better internal documentation than anything the timbot could ever
write.
3. Write a PEP that didn't cause a flamewar.


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: unicode mystery

2005-01-11 Thread John Machin

Sean McIlroy wrote:
> I recently found out that unicode("\347", "iso-8859-1") is the
> lowercase c-with-cedilla, so I set out to round up the unicode numbers
> of the extra characters you need for French, and I found them all just
> fine EXCEPT for the o-e ligature (oeuvre, etc). I examined the unicode
> characters from 0 to 900 without finding it; then I looked at
> www.unicode.org but the numbers I got there (0152 and 0153) didn't
> work. Can anybody put a help on me wrt this? (Do I need to give a
> different value for the second parameter, maybe?)

Characters that are in iso-8859-1 are mapped directly into Unicode.
That is, the first 256 characters of Unicode are identical to
iso-8859-1.

Consider this:

>>> c_cedilla = unicode("\347", "iso-8859-1")
>>> c_cedilla
u'\xe7'
>>> ord(c_cedilla)
231
>>> ord("\347")
231

What you did with c_cedilla "worked" because it was effectively doing
nothing. However if you do unicode(char, encoding) where char is not in
encoding, it won't "work".

As John Lenton has pointed out, if you find a character in the Unicode
tables, you can just use it directly. There is no need in this
circumstance to use unicode().
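In modern Python 3 terms, the same two points look like this (a sketch; the variable names are illustrative):

```python
import unicodedata

# The first 256 code points of Unicode coincide with iso-8859-1 (latin-1),
# so decoding the byte 0o347 (0xE7) is effectively a no-op.
c_cedilla = b"\347".decode("iso-8859-1")
print(ord(c_cedilla))        # 231, i.e. U+00E7

# U+0152/U+0153 lie outside latin-1, so no latin-1 byte can decode to
# them -- just write the code point directly.
oe = "\u0153"
print(unicodedata.name(oe))  # LATIN SMALL LIGATURE OE
```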

HTH,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Octal notation: severe deprecation

2005-01-11 Thread John Machin

Some poster wrote (in connexion with another topic):
> ... unicode("\347", "iso-8859-1") ...

Well, I haven't had a good rant for quite a while, so here goes:

I'm a bit of a retro specimen, being able (inter alia) to recall octal
opcodes from the ICT 1900 series (070=call, 072=exit, 074=branch, ...)
but nowadays I regard continued usage of octal as a pox and a
pestilence.

1. Octal notation is of use to systems programmers on computers where
the number of bits in a word is a multiple of 3. Are there any still in
production use? AFAIK word sizes were 12, 24, 36, 48, and 60 bits --
all multiples of 4, so hexadecimal could be used.

2. Consider the effect on the newbie who's never even heard of "octal":

>>> import datetime
>>> datetime.date(2005,01,01)
datetime.date(2005, 1, 1)
>>> datetime.date(2005,09,09)
File "<stdin>", line 1
datetime.date(2005,09,09)
^
SyntaxError: invalid token

[straight out of the "BOFH Manual of Po-faced Error Messages"]

3. Consider this extract from the docs for the re module:
"""
\number
Matches the contents of the group of the same number. Groups are
numbered starting from 1. For example, (.+) \1 matches 'the the' or '55
55', but not 'the end' (note the space after the group). This special
sequence can only be used to match one of the first 99 groups. If the
first digit of number is 0, or number is 3 octal digits long, it will
not be interpreted as a group match, but as the character with octal
value number. Inside the "[" and "]" of a character class, all numeric
escapes are treated as characters.
"""

I helped to straighten out this description a few years ago, but I fear
it's still not 100% accurate. Worse, take a peek at the code necessary
to implement this.

===

We (un-Pythonically) implicitly take a leading zero (or even just
\[0-7]) as meaning octal, instead of requiring something explicit as
with hexadecimal. The variable length idea in strings doesn't help:
"\18", "\128" and "\1238" are all strings of length 2.

I don't see any mention of octal in GvR's "Python Regrets" or AMK's
"PEP 3000". Why not? Is it not regretted?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Importing Problem on Windows

2005-01-11 Thread John Machin

brolewis wrote:
> I have a directory that has two files in it:
>
> parse.py
> parser.py
>
> parse.py imports a function from parser.py and uses it to parse out the
> needed information. On Linux, the following code works without a
> problem:
>
> parse.py, line 1:
> from parser import regexsearch
>
> However, when I run the same command in Windows, I get the following
> error:
>
> ImportError: cannot import name regexsearch
> Any suggestions on why this would work on Linux but not on Windows?

Hint for the future: use the -v argument (python -v yourscript.py
yourarg1 etc) to see where modules are being imported from.

Example (I don't have a module named parser anywhere):

python -v
[big snip]
>>> from parser import regexsearch
import parser # builtin    aha!
Traceback (most recent call last):
File "<stdin>", line 1, in ?
ImportError: cannot import name regexsearch
>>>
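In current Pythons you can also ask where a module would come from without importing it at all. A minimal sketch using the stdlib (the module names queried here are just examples):

```python
import importlib.util

# find_spec reports a module's origin without executing the module,
# which makes shadowing problems like the one above easy to spot.
json_spec = importlib.util.find_spec("json")
sys_spec = importlib.util.find_spec("sys")
print(json_spec.origin)  # a filesystem path ending in json/__init__.py
print(sys_spec.origin)   # 'built-in': no file, just like 'parser' above
```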

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Help Optimizing Word Search

2005-01-11 Thread John Machin
Case  Nelson wrote:
> Hi there I've just been playing around with some python code and I've
> got a fun little optimization problem I could use some help with.
>
> Basically, the program needs to take in a random list of no more than
> 10 letters,  and find all possible mutations that match a word in my
> dictionary (80k words). However a wildcard letter '?' is also an
> acceptable character which increases the worst case time significantly.
> So if the letters are ['a','b','c'] check a, b, c, ab, ac, ba, bc, ca,
> cb, abc, acb, bac, bca, cab, cba where only a, ba and cab would be
> added to the dict of words. If the letters are ['?','?'] check a-z, aa,
> ab, ac, ad, ..., az, ba, bb, bc, bd, ..., zz

This appears to be a Computer Science 101 Data Structures and
Algorithms question, not a Python question, but here's an answer
anyway:

You appear to want to find all words that have one or more letters in
common with your query or candidate string.

Aside: Have you been following the long thread started by the poster
who appeared to want to store all possible strings that were _not_
words in a given language but could be generated from its alphabet?

Here's a possibility: use a bit mask approach. You attach a bit mask to
each word; simple data structure -- a list of 2-tuples, or two parallel
lists.

!def mask(word):
!    m = 0
!    for letter in word:
!        m |= 1 << (ord(letter) - ord('a'))
!    return m

Searching without wildcards:

!def nowc_search(candidate, mylistof2tuples):
!    candmask = mask(candidate) # treating candidate as str, not list
!    for word, wordmask in mylistof2tuples:  # don't shadow mask()
!        if wordmask & candmask:
!            # one or more letters in common
!            yield word

Note: this treats "mississippi" and "misp" the same. If "aa" is in your
dictionary, what queries would retrieve it? Depending on your exact
requirements, this technique may suit you, or you may want to use it as
a fast(?) filter, with the matches it throws up needing further
checking. You may need a "count number of bits that are set in an int"
function.
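For the bit-counting part, bin(n).count("1") is a perfectly serviceable popcount. A small self-contained sketch (the sample words are illustrative; the mask function is repeated from above so the snippet runs on its own):

```python
def mask(word):
    # Same letter-set bitmask as above: bit k set when chr(ord('a')+k)
    # occurs anywhere in the word.
    m = 0
    for letter in word:
        m |= 1 << (ord(letter) - ord('a'))
    return m

def bit_count(n):
    # Number of set bits == number of distinct letters in common.
    return bin(n).count("1")

common = mask("mississippi") & mask("misprint")
print(bit_count(common))  # 4: the distinct shared letters are i, m, p, s
```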

Ref: Fred J. Damerau, "A technique for computer detection and
correction of spelling errors", CACM vol 7 number 3, March 1964.

Searching with wild cards: your example of query == "??" seems to yield
all two-letter words. I'd like to see what you expect for "a?", "?a",
"ab?", and "aa?" before suggesting how to tackle wild cards.
Reverse-engineering requirements out of other folks' code is not
something I do for fun :-)

An alternative for you to think about: instead of a bitmask, store the
letter-sorted transformation of the words: cat->act, act->act,
dog->dgo, god->dgo.

Alternative data structure: key = bitmask or sorted-letters, value =
list of all words that have that key.
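The letter-sorted alternative looks like this in practice (a sketch; the word list is made up):

```python
from collections import defaultdict

def letter_key(word):
    # cat -> act, act -> act, dog -> dgo, god -> dgo:
    # all anagrams of each other collide on the same key.
    return "".join(sorted(word))

words = ["cat", "act", "dog", "god", "good"]
by_key = defaultdict(list)
for w in words:
    by_key[letter_key(w)].append(w)

print(by_key["dgo"])  # ['dog', 'god']
```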

A further suggestion which should always be considered when setting up
a search where the timing is worse than average O(1): have a separate
dictionary for each different wordlength, or some other
impossible-length-avoidance filter; that way, with minimum preliminary
calculation you can avoid considering words that are so long or so
short that they cannot possibly be matches. For example, with
approximate matching based on edit distance, if you are searching for a
10-letter word allowing for 2 errors, you can avoid doing the
complicated comparison on words shorter than 8 or longer than 12.
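The length-bucketing filter is a one-dictionary affair. A minimal sketch (the word list and max_errors default are illustrative):

```python
from collections import defaultdict

def build_length_index(words):
    # One bucket per word length.
    by_len = defaultdict(list)
    for w in words:
        by_len[len(w)].append(w)
    return by_len

def candidates(by_len, query, max_errors=2):
    # Only words whose length is within max_errors of the query's can
    # possibly be within that edit distance; skip the rest outright.
    n = len(query)
    result = []
    for length in range(n - max_errors, n + max_errors + 1):
        result.extend(by_len.get(length, []))
    return result

by_len = build_length_index(["cat", "cart", "carpet", "go", "category"])
print(candidates(by_len, "cart", max_errors=1))  # ['cat', 'cart']
```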
HTH,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: SciTe

2005-01-11 Thread John Machin

Lucas Raab wrote:
> I didn't want to go through the rigamole of adding myself to the SciTe
> mailing list, so I'm asking my question here. How do I choose a
> different C/C++ compiler to compile in?? I don't use the g++ compiler;
> I use the VC 7 compiler.
>
> TIA,
> Lucas

How the @#$% should we know? Don't be lazy; join the SciTe mailing
list!!!

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Help Optimizing Word Search

2005-01-11 Thread John Machin

Paul Rubin wrote:
> "Case  Nelson" <[EMAIL PROTECTED]> writes:
> > Basically, the program needs to take in a random list of no more than
> > 10 letters,  and find all possible mutations that match a word in my
> > dictionary (80k words). However a wildcard letter '?' is also an
> > acceptable character which increases the worst case time significantly.
>
> For that size pattern and dictionary, simply compiling the pattern to
> a regexp, joining the dictionary together into one big string ("abc
> def ghijk..."), and matching the regexp against the big string, may
> well be faster than using some fancy algorithm coded completely in
> python.

Paul, given the OP appears to want something like words that match any
(per)mutation of any substring of his query string -- and that's before
factoring in wildcards -- I'd like to see an example of a regexp that
could handle that.
Cheers,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: What strategy for random accession of records in massive FASTA file?

2005-01-12 Thread John Machin

Chris Lasher wrote:
> Hello,
> I have a rather large (100+ MB) FASTA file from which I need to
> access records in a random order. The FASTA format is a standard format
> for storing molecular biological sequences. Each record contains a
> header line for describing the sequence that begins with a '>'
> (right-angle bracket) followed by lines that contain the actual
> sequence data. Three example FASTA records are below:
>
> >CW127_A01
> TGCAGTCGAACGAGAACGGTCCTTCGGGATGTCAGCTAAGTGGCGGACGGGTGAGTAATG
> TATAGTTAATCTGCCCTTTAGAGATAACAGTTGGAAACGACTGCTAATAATA
> GCATTAAACAT
[snip]
> Since the file I'm working with contains tens of thousands of these
> records, I believe I need to find a way to hash this file such that I
> can retrieve the respective sequence more quickly than I could by
> parsing through the file request-by-request. However, I'm very new to
> Python and am still very low on the learning curve for programming and
> algorithms in general; while I'm certain there are ubiquitous
> algorithms for this type of problem, I don't know what they are or
> where to look for them. So I turn to the gurus and accost you for help
> once again. :-) If you could help me figure out how to code a solution
> that won't be a resource whore, I'd be _very_ grateful. (I'd prefer to
> keep it in Python only, even though I know interaction with a
> keep it in Python only, even though I know interaction with a
> relational database would provide the fastest method--the group I'm
> trying to write this for does not have access to a RDBMS.)
> Thanks very much in advance,
> Chris

Before you get too carried away, how often do you want to do this and
how grunty is the box you will be running on? Will the data be on a
server? If the server is on a WAN or at the other end of a radio link
between buildings, you definitely need an index so that you can access
the data randomly!

By way of example, to read all of a 157MB file into memory from a local
(i.e. not networked) disk using readlines() takes less than 4 seconds
on a 1.4GHz Athlon processor (see below). The average new corporate
desktop box is about twice as fast as that. Note that Windows Task
Manager showed 100% CPU utilisation for both read() and readlines().

My guess is that you don't need anything much fancier than the effbot's
index method -- which by now you have probably found works straight out
of the box and is more than fast enough for your needs.
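The idea behind an offset index is simple enough to sketch here (this is the general technique, not the effbot's actual code; the sample records are abbreviated from the OP's):

```python
import io

def build_fasta_index(f):
    # Map each record's name to the byte offset of its '>' header line.
    index = {}
    offset = f.tell()
    line = f.readline()
    while line:
        if line.startswith(">"):
            index[line[1:].strip()] = offset
        offset = f.tell()
        line = f.readline()
    return index

def fetch(f, index, name):
    # Seek straight to the record, then read lines up to the next header.
    f.seek(index[name])
    f.readline()                      # skip the header itself
    seq = []
    for line in f:
        if line.startswith(">"):
            break
        seq.append(line.strip())
    return "".join(seq)

data = ">CW127_A01\nTGCAGTCGAA\nCGAGAACGGT\n>CW127_A02\nGCATTAAACAT\n"
f = io.StringIO(data)                 # stands in for open(path)
idx = build_fasta_index(f)
print(fetch(f, idx, "CW127_A02"))     # GCATTAAACAT
```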

BTW, you need to clarify "don't have access to an RDBMS" ... surely
this can only be due to someone stopping them from installing good free
software freely available on the Internet.

HTH,
John

C:\junk>python -m timeit -n 1 -r 6 "print len(file('bigfile.csv').read())"
157581595
157581595
157581595
157581595
157581595
157581595
1 loops, best of 6: 3.3e+006 usec per loop

C:\junk>python -m timeit -n 1 -r 6 "print len(file('bigfile.csv').readlines())"
1118870
1118870
1118870
1118870
1118870
1118870
1 loops, best of 6: 3.57e+006 usec per loop

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Pickled text file causing ValueError (dos/unix issue)

2005-01-14 Thread John Machin
On Fri, 14 Jan 2005 09:12:49 -0500, Tim Peters <[EMAIL PROTECTED]>
wrote:

>[Aki Niimura]
>> I started to use pickle to store the latest user settings for the tool
>> I wrote. It writes out a pickled text file when it terminates and it
>> restores the settings when it starts.
>...
>> I guess DOS text format is creating this problem.
>
>Yes.
>
>> My question is "Is there any elegant way to deal with this?".
>
>Yes:  regardless of platform, always open files used for pickles in
>binary mode.  That is, pass "rb" to open() when reading a pickle file,
>and "wb" to open() when writing a pickle file.  Then your pickle files
>will work unchanged on all platforms.  The same is true of files
>containing binary data of any kind (and despite that pickle protocol 0
>was called "text mode" for years, it's still binary data).

Tim, the manual as of version 2.4 does _not_ mention the need to use
'b' on OSes where it makes a difference, not even in the examples at
the end of the chapter. Further, it still refers to protocol 0 as
'text' in several places. There is also a reference to protocol 0
files being viewable in a text editor.

In other words, enough to lead even the most careful Reader of TFM up
the garden path :-)
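For the record, the binary-mode round trip Tim describes is two lines each way (a Python 3 sketch; the settings dict is made up):

```python
import os, pickle, tempfile

settings = {"window": (800, 600), "recent": ["a.txt", "b.txt"]}

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "settings.pkl")
    # Always "wb"/"rb", never "w"/"r", regardless of platform or protocol.
    with open(path, "wb") as f:
        pickle.dump(settings, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(restored == settings)  # True
```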

Cheers,
John
-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to del item of a list in loop?

2005-01-15 Thread John Machin

Fredrik Lundh wrote:
>
> lst = [i for i in lst if i != 2]
>
> (if you have 2.4, try replacing [] with () and see what happens)

The result is a generator with a name ("lst") that's rather misleading
in the context. Achieving the same result as the list comprehension, by
doing lst = list(i for ... etc etc), appears to be slower.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to del item of a list in loop?

2005-01-15 Thread John Machin

Nick Coghlan wrote:
>
> I think this is about the best you can do for an in-place version:
>    for i, x in enumerate(reversed(lst)):
>        if x == 2:
>            del lst[-i]

Don't think, implement and measure. You may be surprised. Compare these
two for example:

!def method_del_bkwds(lst, x):
!    for inx in xrange(len(lst) - 1, -1, -1):
!        if lst[inx] == x:
!            del lst[inx]
!    return lst

!def method_coghlan(lst, x):
!    for i, obj in enumerate(reversed(lst)):
!        if obj == x:
!            del lst[-i]
!    return lst
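Before measuring, it's worth confirming correctness. A quick Python 3 check of the backwards-index version (range replaces xrange; walking backwards means deletions never disturb the indexes still to be visited):

```python
def method_del_bkwds(lst, x):
    # Iterate from the end so each deletion only shifts already-visited
    # positions.
    for inx in range(len(lst) - 1, -1, -1):
        if lst[inx] == x:
            del lst[inx]
    return lst

print(method_del_bkwds([1, 2, 3, 2, 2], 2))  # [1, 3]
```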

>
> The effbot's version is still going to be faster though:
>    lst = [x for x in lst if x != 2]

Have you measured this?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to del item of a list in loop?

2005-01-15 Thread John Machin

Michael Hoffman wrote:
> skull wrote:
> > but I still have an other thing to worry about coming with this way:
> > does performance sucks when the list is big enough?
> > It makes a copy operation!
> >
> > here is a faster and 'ugly' solution:
> >
> > lst = [1, 2, 3]
> > i = 0
> > while i < len(lst):
> >     if lst[i] == 2:
> >         lst.remove(i)
> >     else:
> >         i += 1
>
> Actually, that is the slowest of the three methods proposed so far for
> large lists on my system.

Assuming, as have other posters, that the requirement is to remove all
elements whose value is 2: it doesn't work. The result is [2, 3]
instead of the expected [1, 3].

>   method_while: [3.868305175781, 3.868305175781, 3.87206539917]

Three significant figures is plenty. Showing just the minimum of the
results might be better.

> If you want to get really hairy, you can compare the bytecode
> instructions for these three methods:
> for these three methods:
Timing and bytecode-peeking a broken function are a little "premature".

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to del item of a list in loop?

2005-01-15 Thread John Machin

Nick Coghlan wrote:
> I think this is about the best you can do for an in-place version:
>    for i, x in enumerate(reversed(lst)):
>        if x == 2:
>            del lst[-i]
I think del lst[-i-1] might be functionally better.
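Here's why, as a minimal Python 3 demonstration (the function names are mine; note that both variants still mutate a list while reversed() iterates it, so adjacent matches can still misbehave -- this only shows the off-by-one):

```python
def del_neg_i(lst, x):
    # del lst[-i] points one place past the element being looked at
    # (and -0 is 0), so the wrong item goes.
    for i, obj in enumerate(reversed(lst)):
        if obj == x:
            del lst[-i]
    return lst

def del_neg_i_minus_1(lst, x):
    # del lst[-i-1] addresses the element actually being looked at.
    for i, obj in enumerate(reversed(lst)):
        if obj == x:
            del lst[-i-1]
    return lst

print(del_neg_i([1, 2, 3], 2))          # [1, 2] -- removed the 3!
print(del_neg_i_minus_1([1, 2, 3], 2))  # [1, 3] -- correct
```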

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to del item of a list in loop?

2005-01-15 Thread John Machin

Michael Hoffman wrote:
> John Machin wrote:
>
> > Three significant figures is plenty. Showing just the minimum of
> > the results might be better.
>
> It might be, but how much time do you want to spend on getting your
> results for a benchmark that will be run once in the "better" format?
>

About the same time as the "worse" format. The Mona Lisa was painted
once. The Taj Mahal was built once.

> Next time you can run the benchmark yourself and it will be in
> exactly the format you want.

I've done that already. I've taken your code and improved it along the
suggested lines, added timing for first, middle, and last elements,
added several more methods, and added a testing facility as well. Would
you like a copy?

-- 
http://mail.python.org/mailman/listinfo/python-list


generator expressions: performance anomaly?

2005-01-16 Thread John Machin
Please consider the timings below, where a generator expression starts
out slower than the equivalent list comprehension, and gets worse:

>python -m timeit -s "orig=range(10)" "lst=orig[:];lst[:]=(x for x in orig)"
10 loops, best of 3: 6.84e+004 usec per loop

>python -m timeit -s "orig=range(20)" "lst=orig[:];lst[:]=(x for x in orig)"
10 loops, best of 3: 5.22e+005 usec per loop

>python -m timeit -s "orig=range(30)" "lst=orig[:];lst[:]=(x for x in orig)"
10 loops, best of 3: 1.32e+006 usec per loop

>python -m timeit -s "orig=range(10)" "lst=orig[:];lst[:]=[x for x in orig]"
10 loops, best of 3: 6.15e+004 usec per loop

>python -m timeit -s "orig=range(20)" "lst=orig[:];lst[:]=[x for x in orig]"
10 loops, best of 3: 1.43e+005 usec per loop

>python -m timeit -s "orig=range(30)" "lst=orig[:];lst[:]=[x for x in orig]"
10 loops, best of 3: 2.33e+005 usec per loop

Specs: Python 2.4, Windows 2000, 1.4GHz Athlon chip, 768Mb of memory.

Background: There was/is a very recent thread about ways of removing
all instances of x from a list. /F proposed a list comprehension to
build the result list. Given a requirement to mutate the original list,
this necessitates the assignment to lst[:]. I tried a generator
expression as well. However while the listcomp stayed competitive up to
a million-element list, the genexp went into outer space, taking about
20 times as long. The above timeit runs show a simpler scenario where
the genexp also seems to be going quadratic.
Comments, clues, ... please.

TIA,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to del item of a list in loop?

2005-01-16 Thread John Machin

Bengt Richter wrote:
> No one seems to have suggested this in-place way yet,
> so I'll trot it out once again ;-)
>
>  >>> lst = [1, 2, 3]
>  >>> i = 0
>  >>> for item in lst:
>  ...if item !=2:
>  ...lst[i] = item
>  ...i += 1
>  ...
>  >>> del lst[i:]
>  >>> lst
>  [1, 3]

Works, but slowly. Here's another that appears to be the best on large
lists, at least for removing 1 element. It's O(len(list) *
number_to_be_removed).

def method_try_remove(lst, remove_this):
    try:
        while 1:
            lst.remove(remove_this)
    except ValueError:
        pass
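A quick sanity check of the approach (restated so the snippet stands alone; it relies on list.remove raising ValueError once no occurrence remains):

```python
def method_try_remove(lst, remove_this):
    # list.remove raises ValueError when nothing is left to remove
    try:
        while 1:
            lst.remove(remove_this)
    except ValueError:
        pass

data = [1, 2, 3, 2, 2]
method_try_remove(data, 2)
print(data)   # [1, 3]
```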

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: generator expressions: performance anomaly?

2005-01-16 Thread John Machin
On Sun, 16 Jan 2005 12:18:23 GMT, "Raymond Hettinger"
<[EMAIL PROTECTED]> wrote:

>"John Machin" <[EMAIL PROTECTED]> wrote in message
>news:[EMAIL PROTECTED]
>> Please consider the timings below, where a generator expression starts
>> out slower than the equivalent list comprehension, and gets worse:
>>
>> >python -m timeit -s "orig=range(10)" "lst=orig[:];lst[:]=(x for x in orig)"
> . . .
>> >python -m timeit -s "orig=range(20)" "lst=orig[:];lst[:]=(x for x in orig)"
>
>This has nothing to do with genexps and everything to do with list slice
>assignment.
>
>List slice assignment is an example of a tool with a special case optimization
>for inputs that know their own length -- that enables the tool to pre-allocate
>its result rather than growing and resizing in spurts.  Other such
>tools include tuple(), map() and zip().
>

My reading of the source: if the input is not a list or tuple, a
(temporary) tuple is built from the input, using PySequence_Tuple() in
abstract.c. If the input cannot report its own length, then that
function resorts to "growing and resizing in spurts", using the
following code:

    if (j >= n) {
        if (n < 500)
            n += 10;
        else
            n += 100;
        if (_PyTuple_Resize(&result, n) != 0) {

Perhaps it could be changed to use a proportional increase, like
list_resize() in listobject.c, which advertises (amortised) linear
time. Alternative: build a temporary list instead?
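The quadratic behaviour follows directly from that schedule: each resize copies O(n) items and there are O(n) resizes, versus O(log n) resizes for proportional growth. A back-of-envelope comparison of resize counts (the proportional schedule here mimics list_resize's roughly one-eighth over-allocation; the exact constants are illustrative):

```python
def count_resizes_additive(target):
    # the PySequence_Tuple schedule quoted above: +10 below 500, then +100
    n = resizes = 0
    while n < target:
        n += 10 if n < 500 else 100
        resizes += 1
    return resizes

def count_resizes_proportional(target):
    # a listobject.c-style schedule: grow by about 1/8 each time
    n, resizes = 8, 1
    while n < target:
        n += (n >> 3) + 6
        resizes += 1
    return resizes

for size in (1000, 100000, 1000000):
    print(size, count_resizes_additive(size), count_resizes_proportional(size))
```

Multiply each resize count by the O(n) copy it implies and the additive schedule's quadratic total cost is plain.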


-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to del item of a list in loop?

2005-01-16 Thread John Machin

skull wrote:
> According to Nick's article, I added three 'reversed' methods to your
> provided test prog. and the result turned out method_reversed is
> faster than others except the 'three' case.
> Following is my modified version:
[snip]
> def method_reversed_idx(lst):
>     idx = 0
>     for i in reversed(lst):
>         if i == 2:
>             del lst[idx]
>         idx += 1

There appears to be a problem with this one:

>>> def method_reversed_idx(lst):
...     idx = 0
...     for i in reversed(lst):
...         if i == 2:
...             del lst[idx]
...         idx += 1
...
>>> lst=[1,2,3];method_reversed_idx(lst);print lst
[1, 3]
>>> lst=[2,1,3];method_reversed_idx(lst);print lst
[2, 1]
>>> lst=[1,3,2];method_reversed_idx(lst);print lst
[3]
>>>

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fuzzy matching of postal addresses

2005-01-17 Thread John Machin
Ermmm ... only remove "the" when you are sure it is a whole word. Even
then it's a dodgy idea. In the first 1000 lines of the nearest address
file I had to hand, I found these: Catherine, Matthew, Rotherwood,
Weatherall, and "The Avenue".

Ermmm... don't rip out commas (or other punctuation); replace them with
spaces. That way "SHORTMOOR,BEAMINSTER" doesn't become one word
"SHORTMOORBEAMINSTER".


A not-unreasonable similarity metric would be float(len(sa1 & sa2))  /
len(sa1 | sa2). Even more reasonable would be to use trigrams instead
of words -- more robust in the presence of erroneous insertion or
deletion of spaces (e.g. Short Moor and Bea Minster are plausible
variations) and spelling errors and typos. BTW, the OP's samples look
astonishingly clean to me, so unlike real world data.
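The set-based metric above is the Jaccard coefficient; with trigrams instead of words it might look like this (a sketch: real code would also normalise punctuation, and the two-space padding is one arbitrary choice among several):

```python
def trigrams(s):
    # pad so leading/trailing characters appear in a full complement of trigrams
    s = "  " + s.lower() + "  "
    return set(s[i:i + 3] for i in range(len(s) - 2))

def similarity(a, b):
    # Jaccard coefficient over trigram sets
    sa1, sa2 = trigrams(a), trigrams(b)
    return float(len(sa1 & sa2)) / len(sa1 | sa2)

print(similarity("SHORTMOOR, BEAMINSTER", "Short Moor, Bea Minster"))
print(similarity("SHORTMOOR, BEAMINSTER", "10 Downing Street"))
```

Because the trigram sets overlap heavily even when spaces are inserted or dropped, "Short Moor" still scores close to "SHORTMOOR".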

Two general comments addressed to the OP:
(1) Your solution doesn't handle the case where the postal code has
been butchered. e.g. "DT8 BEL" or "OT8 3EL".
(2) I endorse John Roth's comments. Validation against an address data
base that is provided by the postal authority, using either an
out-sourced bureau service, or buying a licence to use
validation/standardisation/repair software, is IMHO the way to go. In
Australia the postal authority assigns a unique ID to each delivery
point. This "DPID" has to be barcoded onto the mail article to get bulk
postage discounts. Storing the DPID on your database makes duplicate
detection a snap. You can license s/w (from several vendors) that is
certified by the postal authority and has batch and/or online APIs. I
believe the situation in the UK is similar. At least one of the vendors
in Australia is a British company. Google "address deduplication
site:.uk"
Actually (3): If you are constrained by budget, pointy-haired boss or
hubris to write your own software (a) lots of luck (b) you need to do a
bit more research -- look at the links on the febrl website, also
Google for "Monge Elkan", read their initial paper, look at the papers
referencing that on citeseer; also google for "merge purge"; also
google for "record linkage" (what the statistical and medical
fraternity call the problem) (c) and have a damn good look at your data
[like I said, it looks too clean to be true] and (d) when you add a
nice new wrinkle like "strip out 'the'", do make sure to run your
regression tests :-)
Would you believe (4): you are talking about cross-matching two
databases -- don't forget the possibility of duplicates _within_ each
database.


HTH, 
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fuzzy matching of postal addresses

2005-01-17 Thread John Machin
You can't even get anywhere near 100% accuracy when comparing
"authoritative sources" e.g. postal authority and the body charged with
maintaining a database of which streets are in which electoral district
-- no, not AUS, but close :-)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Fuzzy matching of postal addresses

2005-01-18 Thread John Machin

John Machin wrote:
> Ermmm ... only remove "the" when you are sure it is a whole word.
> Even then it's a dodgy idea. In the first 1000 lines of the nearest
> address file I had to hand, I found these: Catherine, Matthew,
> Rotherwood, Weatherall, and "The Avenue".
>

Partial apologies: I wasn't reading Skip's snippet correctly -- he had
"THE ", I read "THE". Only "The Avenue" is a problem in the above list.
However Skip's snippet _does_ do damage in cases where the word ends in
"the". Grepping lists of placenames found 25 distinct names in UK,
including "The Mythe" and "The Wrythe".

Addendum: Given examples in the UK like "Barton in the Beans" (no
kiddin') and "Barton-on-the-Heath", replacing "-" by space seems
indicated.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: file copy portability

2005-01-18 Thread John Machin

Bob Smith wrote:
> Is shutil.copyfile(src,dst) the *most* portable way to copy files
> with Python? I'm dealing with plain text files on Windows, Linux and
> Mac OSX.
>
> Thanks!

Portable what? Way of copying??

Do you want your files transferred (a) so that they look like native
text files on the destination system, or (b) so that they are exact
byte-wise copies?

A 5-second squint at the source (Lib/shutil.py) indicates that it
provides, reliably and portably, option b:
fsrc = open(src, 'rb')
fdst = open(dst, 'wb')

One way of doing option (a): you would need to be running Python on the
destination system, open the src file with 'rU', open the dst file with
'w'.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: list item's position

2005-01-19 Thread John Machin
On Wed, 19 Jan 2005 22:02:51 -0700, Steven Bethard
<[EMAIL PROTECTED]> wrote:

>
>See Mark's post, if you "need to know the index of something" this is 
>the perfect case for enumerate (assuming you have at least Python 2.3):

But the OP (despite what he says) _doesn't_ need to know the index of
the first thingy containing both a bar and a baz, if all he wants to
do is remove earlier thingies.

def barbaz(iterable, bar, baz):
    seq = iter(iterable)
    for anobj in seq:
        if bar in anobj and baz in anobj:
            yield anobj
            break
    for anobj in seq:
        yield anobj

>>> import barbaz
>>> bars = ["str", "foobaz", "barbaz", "foobar"]
>>> print list(barbaz.barbaz(bars, 'bar', 'baz'))
['barbaz', 'foobar']
>>> print list(barbaz.barbaz(bars, 'o', 'b'))
['foobaz', 'barbaz', 'foobar']
>>> print list(barbaz.barbaz(bars, '', 'b'))
['foobaz', 'barbaz', 'foobar']
>>> print list(barbaz.barbaz(bars, '', ''))
['str', 'foobaz', 'barbaz', 'foobar']
>>> print list(barbaz.barbaz(bars, 'q', 'x'))
[]
>>>



-- 
http://mail.python.org/mailman/listinfo/python-list


Re: why no time() + timedelta() ?

2005-01-20 Thread John Machin
Tim Peters wrote:
> [josh]
> > Why can't timedelta arithmetic be done on time objects?
>
> Obviously, because it's not implemented .
>
> > (e.g. datetime.time(5)-datetime.timedelta(microseconds=3)
> >
> > Nonzero "days" of the timedelta could either be ignored, or
> > trigger an exception.
>
> And if the result is less than 0, or >= 24 hours, it could raise
> OverflowError, or wrap around mod 24*60*60*1000000 microseconds, and
> so on.  There are so many arbitrary endcases that no agreement could
> be reached on what they "should" do.  So it's not supported at all.
> In contrast, it was much easier to reach consensus on what datetime
> arithmetic should do, so that was supported.

Reminds me of the debate at the time whether "add months" functionality
should be provided. Disagreement, including someone who shall remain
nameless opining that Jan 31 + one month should return Mar 3 (or Mar 2
in a leap year). Not supported at all. Those of us who dabble in arcane
esoterica like payrolls, pensions, mortgages, insurance, and government
bloody regulations had a quick ROTFL and started typing: class
RealWorldDateFunctionality(datetime.date): ...
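The real-world workaround is short to sketch; the contentious policy decision (what Jan 31 + 1 month means) is made explicit here by clamping to the last day of the target month, which is what payroll and mortgage code usually wants. (The add_months name and the clamp policy are mine; the stdlib still offers no such function.)

```python
import calendar
import datetime

def add_months(d, months):
    # clamp policy: Jan 31 + 1 month -> Feb 28 (29 in a leap year), never Mar 2/3
    y, m = divmod(d.month - 1 + months, 12)
    year, month = d.year + y, m + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return d.replace(year=year, month=month, day=day)

print(add_months(datetime.date(2005, 1, 31), 1))   # 2005-02-28
print(add_months(datetime.date(2004, 1, 31), 1))   # 2004-02-29
print(add_months(datetime.date(2005, 12, 15), 1))  # 2006-01-15
```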

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Unbinding multiple variables

2005-01-20 Thread John Machin

Johnny Lin wrote:
> Hi!
>
> Is there a way to automate the unbinding of multiple variables?  Say
> I have a list of the names of all variables in the current scope via
> dir().  Is there a command using del or something like that that will
> iterate the list and unbind each of the variables?

Yes. It's called "return".

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Simple (newbie) regular expression question

2005-01-21 Thread John Machin

André Roberge wrote:
> Sorry for the simple question, but I find regular
> expressions rather intimidating.  And I've never
> needed them before ...
>
> How would I go about to 'define' a regular expression that
> would identify strings like
> __alphanumerical__  as in __init__
> (Just to spell things out, as I have seen underscores disappear
> from messages before, that's  2 underscores immediately
> followed by an alphanumerical string immediately followed
> by 2 underscore; in other words, a python 'private' method).
>
> Simple one-liner would be good.
> One-liner with explanation would be better.
>
> One-liner with explanation, and pointer to 'great tutorial'
> (for future reference) would probably be ideal.
> (I know, google is my friend for that last part. :-)
>
> Andre

Firstly, some corrections: (1) google is your friend for _all_ parts of
your question (2) Python has an initial P and doesn't have private
methods.

Read this:

>>> pat1 = r'__[A-Za-z0-9_]*__'
>>> pat2 = r'__\w*__'
>>> import re
>>> tests = ['x', '__', '____', '_____', '__!__', '__a__', '__Z__',
... '__8__', '__xyzzy__', '__plugh']
>>> [x for x in tests if re.search(pat1, x)]
['____', '_____', '__a__', '__Z__', '__8__', '__xyzzy__']
>>> [x for x in tests if re.search(pat2, x)]
['____', '_____', '__a__', '__Z__', '__8__', '__xyzzy__']
>>>

I've interpreted your question as meaning "valid Python identifier that
starts and ends with two [implicitly, or more] underscores".

In the two alternative patterns, the part in the middle says "zero or
more instances of a character that can appear in the middle of a Python
identifier". The first pattern spells this out as "capital letters,
small letters, digits, and underscore". The second pattern uses the \w
shorthand to give the same effect.
You should be able to follow that from the Python documentation.
Now, read this: http://www.amk.ca/python/howto/regex/

HTH,

John

--
http://mail.python.org/mailman/listinfo/python-list


Re: why am I getting a segmentation fault?

2005-01-21 Thread John Machin

Jay  donnell wrote:
> I have a short multi-threaded script that checks web images to make
> sure they are still there. I get a segmentation fault everytime I run
> it and I can't figure out why. Writing threaded scripts is new to me
> so I may be doing something wrong that should be obvious :(
>

def run(self):
    try:
        self.site = urllib.urlopen(self.url)
        self.f = open(self.filename, 'w')
        self.im = Image.open(self.filename)
        self.size = self.im.size
        self.flag = 'yes'
        self.q = "yadda yadda"

That's SIX names that don't need to be attributes of the object; they
are used _only_ in this method, so they can be local to the method,
saving you a whole lot of typing "self." and saving puzzlement &
careful scrutiny by those trying to read your code.

Back to your real problem: does it work when you set maxThreads to 1?
did it work before you added the threading code? what does your
debugger tell you about the location of the seg fault?

MOST IMPORTANTLY, sort this mess out:

        self.q = ("update item set goodImage = '" + self.flag +
                  "' where productId='" + str(self.id) + "'")
        print self.q
        self.cursor.execute(query) ###

### "query" is a global variable[YUK!] (see below) which isn't going to
do what you want. Looks like you meant "self.q".

        self.db.close()

db = MySQLdb.connect(host="localhost", user="xxx", passwd="xxx",
                     db="xxx")
cursor = db.cursor()
query = "select * from item order by rand() limit 0, 100"


### Have you looked in your database to see if the script has actually
updated item.goodImage? Do you have a test plan?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: why am I getting a segmentation fault?

2005-01-21 Thread John Machin

Jay  donnell wrote:
> > ### Have you looked in your database to see if the script has
> > actually updated item.goodImage? Do you have a test plan?
>
> Thank you for the help. Sorry for the messy code. I was under time
> constraints. I had class, and I was rushing to get this working
> before class. I should waited a day and read over it before I asked.
> Sorry again.

Apologise to _yourself_ for the messy code. If 'query' had been safely
tucked away as a local variable instead of a global, that problem
(typing 'query' instead of 'self.q') would have caused an exception on
the first time around.

A few points: Yes, don't rush. I've-forgotten-whom wrote something like
"Don't program standing up". Good advice.

Build things a piece at a time. Build your tests at the same time or
earlier. In this case, a stand-alone method or function that checked
that one file was OK, would have been a reasonable place to start. Then
add the code to query the db. Then add the threading stuff, gingerly.
If you build and test incrementally, then a segfault or other disaster
is highly likely to have been caused by the latest addition.

AND SOME MORE CRUFT:
def __init__(self, url, filename, id):
    self.t = time.time()  # <=== never used again
    threading.Thread.__init__(self)
    self.db = MySQLdb.connect(host="localhost", user="xxx",
                              passwd="xxx", db="xxx")
    # create a cursor
    self.cursor = db.cursor()  # <<<=== should be self.db.cursor();
                               #        picks up the *GLOBAL* 'db'
    self.url = url
    self.filename = filename
    self.id = id
===
threadList = []
[snip]
threadList.append(imageChecker)
=== that doesn't achieve much!


N.B. You still have two problems: (1) Your script as you said now
"seems to work". That doesn't sound like you have a test plan. (2) You
have shuffled your code around and the segfault went away; i.e. you
waved a dead chicken and the volcano stopped erupting. Most of the
changes suggested by others and myself were of a style/clarity nature.
The self.q/query thing would have caused a select instead an update;
hardly segfault territory. I wouldn't expect that busy-wait loop to
cause a segfault. You still don't know what caused the segfault. That
means you don't know how to avoid it in the future. You are still
living in the shadow of the volcano. Will the chicken trick work next
time?

Looking forward to the next episode,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: getting file size

2005-01-21 Thread John Machin

Bob Smith wrote:
> Are these the same:
>
> 1. f_size = os.path.getsize(file_name)
>
> 2. fp1 = file(file_name, 'r')
> data = fp1.readlines()
> last_byte = fp1.tell()
>
> I always get the same value when doing 1. or 2. Is there a reason I
> should do both? When reading to the end of a file, won't tell() be
just
> as accurate as os.path.getsize()?
>

Read the docs. Note the hint that you get what the stdio serves up.
ftell() can only be _guaranteed_ to give you a magic cookie that you
may later use with fseek(magic_cookie) to return to the same place in a
more reliable manner than with Hansel & Gretel's non-magic
bread-crumbs. On 99.99% of modern filesystems, the cookie obtained by
ftell() when positioned at EOF is in fact the size in bytes. But why
chance it? os.path.getsize does as its name suggests; why not use it,
instead of a method with a side-effect? As for doing _both_, why would
you??

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: getting file size

2005-01-23 Thread John Machin

Tim Roberts wrote:
> Bob Smith <[EMAIL PROTECTED]> wrote:
>
> >Are these the same:
> >
> >1. f_size = os.path.getsize(file_name)
> >
> >2. fp1 = file(file_name, 'r')
> >data = fp1.readlines()
> >last_byte = fp1.tell()
> >
> >I always get the same value when doing 1. or 2. Is there a reason I
> >should do both? When reading to the end of a file, won't tell() be
> >just as accurate as os.path.getsize()?
>
> On Windows, those two are not equivalent.  Besides the newline
> conversion done by reading text files,

Doesn't appear to me to go wrong due to newline conversion:

Python 2.4 (#60, Nov 30 2004, 11:49:19) [MSC v.1310 32 bit (Intel)] on
win32
>>> import os.path
>>> txt = 'qwertyuiop\nasdfghjkl\nzxcvbnm\n'
>>> file('bob', 'w').write(txt)
>>> len(txt)
29
>>> os.path.getsize('bob')
32L # as expected
>>> f = file('bob', 'r')
>>> lines = f.readlines()
>>> lines
['qwertyuiop\n', 'asdfghjkl\n', 'zxcvbnm\n']
>>> f.tell()
32L # as expected

> the solution in 2. will stop as soon as it sees
> a ctrl-Z.

... and the value returned by f.tell() is not the position of the
ctrl-Z but more likely the position of the end of the current block --
which could be thousands/millions of bytes before the physical end of
the file.

Good ol' CP/M.

>
> If you used 'rb', you'd be much closer.

And be much less hassled when that ctrl-Z wasn't meant to mean EOF, it
just happened to appear in an unvalidated data field part way down a
critical file :-(

-- 
http://mail.python.org/mailman/listinfo/python-list



Re: Fuzzy matching of postal addresses [1/1]

2005-01-23 Thread John Machin
Andrew McLean wrote:
> In case anyone is interested, here is the latest.

> def insCost(tokenList, indx, pos):
>     """The cost of inserting a specific token at a specific
>     normalised position along the sequence."""
>     if containsNumber(tokenList[indx]):
>         return INSERT_TOKEN_WITH_NUMBER + POSITION_WEIGHT * (1 - pos)
>     elif indx > 0 and containsNumber(tokenList[indx-1]):
>         return INSERT_TOKEN_AFTER_NUMBER + POSITION_WEIGHT * (1 - pos)
>     elif tokenList[indx][0] in minorTokenList:
>         return INSERT_MINOR_TOKEN
>     else:
>         return INSERT_TOKEN + POSITION_WEIGHT * (1 - pos)
>
> def delCost(tokenList, indx, pos):
>     """The cost of deleting a specific token at a specific normalised
>     position along the sequence.
>     This is exactly the same cost as inserting a token."""
>     return insCost(tokenList, indx, pos)

Functions are first-class citizens of Pythonia -- so just do this:

delCost = insCost

Re speed generally: (1) How many addresses in each list and how long is
it taking? On what sort of configuration? (2) Have you considered using
psyco -- if not running on x86 architecture, consider exporting your
files to a grunty PC and doing the match there. (3) Have you considered
some relatively fast filter to pre-qualify pairs of addresses before
you pass the pair to your relatively slow routine?

Soundex?? To put it bluntly, the _only_ problem to which soundex is the
preferred solution is genealogy searching in the US census records, and
even then one needs to know what varieties of the algorithm were in use
at what times. I thought you said your addresses came from
authoritative sources. You have phonetic errors? Can you give some
examples of pairs of tokens that illustrate the problem you are trying
to overcome with soundex?

Back to speed again: When you look carefully at the dynamic programming
algorithm for edit distance, you will note that it is _not_ necessary
to instantiate the whole NxM matrix -- it only ever refers to the
current row and the previous row. What does space saving have to do
with speed, you ask? Well, Python is not FORTRAN; it takes considerable
effort to evaluate d[i][j]. A relatively simple trick is to keep 2 rows
and swap (the pointers to) them each time around the outer loop. At the
expense of a little more complexity, one can reduce this to one row and
3 variables (north, northwest, and west) corresponding to d[i-1][j],
d[i-1][j-1], and d[i][j-1] -- but I'd suggest the simple way first.
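The two-row trick described above, sketched for plain edit distance with unit costs (the address matcher would substitute its token-level insCost/delCost functions for the 1s):

```python
def edit_distance(s, t):
    # keep only the previous row and the current row of the DP matrix
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1,                 # delete
                            curr[j - 1] + 1,             # insert
                            prev[j - 1] + (cs != ct)))   # substitute
        prev = curr                                      # swap rows
    return prev[-1]

print(edit_distance("BEAMINSTER", "BEA MINSTER"))  # 1
```

Memory drops from O(N*M) to O(M), and in Python the fewer subscript evaluations per cell are a speed win as well.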
Hope some of this helps,

John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Set parity of a string

2005-01-23 Thread John Machin

Peter Hansen wrote:
> snacktime wrote:
> > Is there a module that sets the parity of a string?  I have an
> > application that needs to communicate with a host using even parity

> > So what I need is before sending the message, convert it from space
> > to even parity.  And when I get the response I need to convert that
> > from even to space parity.
>
> By what means are the messages being delivered?  I've rarely
> seen parity used outside of simple RS-232-style serial
> communications.  Certainly not (in my experience, though it's
> doubtless been done) in TCP/IP based stuff.  And if it's serial,
> parity can be supplied at a lower level than your code.
>
> As to the specific question: a module is not really required.
> The parity value of a character depends only on the binary
> value of that one byte, so a simple 128-byte substitution
> table is all you need, plus a call to string.translate for
> each string.
>
> -Peter

And for converting back from even parity to space parity, either a
256-byte translation table, or a bit of bit bashing, like
chr(ord(c) & 127), on each byte.

The bank story sounds eminently plausible to me. Pick up old system
where branch manager had to dial HO every evening, insert phone into
acoustic coupler, etc etc and dump it on the internet ...

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: "bad argument type for built-in operation"

2005-01-24 Thread John Machin

Gilles Arnaud wrote:
> Hello,
>
> I've got a nasty bug and no idea to deal with :
>
> here is the method :

Big snip. The Python code is unlikely to be your problem.

> and the trace
> 
> in None [(-2.0, 2.0), (-2.0, 2.0)] [0.1385039192456847,
> 0.87787941093093491] 2 2 
> [0.1385039192456847, 0.87787941093093491]

That's a very mangled trace!

> the first call of the methode succeed
> all following call failed.

So the first call is leaving a bomb behind.

>
> I've got different scenario which call this low level methode,
> many succeed, some failed this way.
>
> what's happened ?
> If someone got an idea ?
> what can raise this exception ?

At this stage, without the benefit of look-ahead, one could only blame
gamma rays or pointy-eared aliens :-)

>
> My program is written partially in python and partially in C.
> the top level is in python which call a C optimisation routine
> which use a callback (PyObject_CallMethod) to evaluate the cost in
> python again.

Aha! *Now* you tell us. *You* have "denormalised" the stack. Read your
C code carefully. Use a debugger, or put some "printf()" in it. With
PyObject_CallMethod, do the format descriptors and the arguments match?
Are you testing the returned value for NULL and acting accordingly? Is
the called-from-C Python method ever executed? Try putting a print
statement (that shows the args) at the top. More generally, are you
testing the returned value from each and every C API call? Are you
testing for the correct error value (some return NULL, some -1, ...)?
Are you doing the right thing on error?

A catalogue of the different ways of messing things up using C would
take forever to write. If you can't find your problem, post the code,
either on the newsgroup or as a web page.

Hope this helps,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Looking for Form Feeds

2005-01-24 Thread John Machin
Greg Lindstrom wrote:
> Hello-
> I have a file generated by an HP-9000 running Unix containing form
> feeds signified by ^M^L.  I am trying to scan for the linefeed to
> signal certain processing to be performed but can not get the regex
> to "see" it.  Suppose I read my input line into a variable named
> "input"
>
> The following does not seem to work...
> input = input_file.readline()

You are shadowing a builtin.

> if re.match('\f', input): print 'Found a formfeed!'
> else: print 'No linefeed!'

formfeed == not not linefeed

>
> I also tried to create a ^M^L (typed in as Q M  L) but that
> gives me a syntax error when I try to run the program (re does not
> like the control characters, I guess).  Is it possible for me to
> pull out the formfeeds in a straightforward manner?
>

For a start, resolve your confusion between formfeed and linefeed.

Formfeed makes your printer skip to the top of a new page (form),
without changing the column position. FF, '\f', ctrl-L, 0x0C.
Linefeed makes the printer skip to a new line, without changing the
column position. LF, '\n', ctrl-J, 0x0A.
There is also carriage return, which makes your typewriter return to
column 1, without moving to the next line. CR, '\r', ctrl-M, 0x0D.

Now you can probably guess why the writer of your report file is
emitting "\r\f". What we can't guess for you is where in your file
these "\r\f" occurrences are in relation to the newlines (i.e. '\n')
which Python is interpreting as line breaks. As others have pointed
out, (1) re.match works on the start of the string and (2) you probably
don't need to use re anyway. The solution may be as simple as: if
input_line[:2] == "\r\f":
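One regex-free way to act on those page breaks (the sample data, and the assumption that "\r\f" sits at the start of a record, are mine; adjust to where the pair really occurs in the file):

```python
# "\r\f" marks the start of a new printed page in this invented sample
report = "A0 heading\nA0 detail\n\r\fC1 heading\n\r\fC1 detail\n"

pages = report.split("\r\f")
for number, page in enumerate(pages, 1):
    print("--- page", number, "---")
    for line in page.splitlines():
        print(line)
```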

BTW, have you checked that there are no other control characters
embedded in the file, e.g. ESC (introducing an escape sequence), SI/SO
(change character set), BEL * 100 (Hey, Fred, the printout's finished),
HT, VT, BS (yeah, probably lots of that, but I mean BackSpace)?
HTH,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Browsing text ; Python the right tool?

2005-01-25 Thread John Machin

Paul Kooistra wrote:
> I need a tool to browse text files with a size of 10-20 Mb. These
> files have a fixed record length of 800 bytes (CR/LF), and containt
> records used to create printed pages by an external company.
>
> Each line (record) contains an 2-character identifier, like 'A0' or
> 'C1'. The identifier identifies the record format for the line,
> thereby allowing different record formats to be used in a textfile.
> For example:
>
> An A0 record may consist of:
> recordnumber [1:4]
> name [5:25]
> filler   [26:800]

1. Python syntax calls these [0:4], [4:25], etc. One has to get into
the habit of deducting 1 from the start column position given in a
document.

2. So where's the "A0"? Are the records really 804 bytes wide -- "A0"
plus the above plus CR LF? What is "recordnumber" -- can't be a line
number (4 digits -> max 10k; 10k * 800 -> only 8Mb); looks too small to
be a customer identifier; is it the key to a mapping that produces
"A0", "C1", etc?

>
> while a C1 record consists of:
> recordnumber [1:4]
> phonenumber  [5:15]
> zipcode  [16:20]
> filler   [21:800]
>
> As you see, all records have a fixed column format. I would like to
> build a utility which allows me (in a windows environment) to open a
> textfile and browse through the records (ideally with a search
> option), where each recordtype is displayed according to its
> recordformat ('Attributename: Value' format). This would mean that
> browsing from a A0 to C1 record results in a different list of
> attributes + values on the screen, allowing me to analyze the data
> generated a lot easier than I do now, browsing in a text editor with
> a stack of printed record formats at hand.
>
> This is of course quite a common way of encoding data in textfiles.
> I've tried to find a generic text-based browser which allows me to do
> just this, but cannot find anything. Enter Python; I know the
> language
> by name, I know it handles text just fine, but I am not really
> interested in learning Python just now, I just need a tool to do what
> I want.
>
> What I would REALLY like is a way to define standard record formats
> in a separate definition, like:
> - defining a common record length;
> - defining the different record formats (attributes, position of the
> line);

Add in the type, number of decimal places, etc. as well ...

> - and defining when a specific record format is to be used, dependent
> on 1 or more identifiers in the record.
>
> I CAN probably build something from scratch, but if I can (re)use
> something that already exists it would be so much better and
> faster...
> And a utility to do what I just described would be REALLY useful in
> LOTS of environments.
>
> This means I have the following questions:
>
> 1. Does anybody know of a generic tool (not necessarily Python based)
> that does the job I've outlined?

No, but please post if you hear of one.

> 2. If not, is there some framework or widget in Python I can adapt to
> do what I want?
> 3. If not, should I consider building all this just from scratch in
> Python - which would probably mean not only learning Python, but some
> other GUI related modules?

Approach I use is along the lines of what you suggested, but w/o the
GUI.
I have a Python script that takes layout info and an input file and can
produce an output file in one of two formats:

Format 1:
something like:
Rec:A0 recordnumber:0001 phonenumber:(123) 555-1234 zipcode:12345

This is usually much shorter than the fixed length record, because you
leave out the fillers (after checking they are blank!), and strip
trailing spaces from alphanumeric fields. Whether you leave integers,
money, date etc fields as per file or translated into human-readable
form depends on who will be reading it.

You then use a robust text editor (preferably one which supports
regular expressions in its find function) to browse the output file.

Format 2:
Rec:A0
recordnumber:0001
etc etc, i.e. one field per line. Why, you ask? So that you can take
small chunks of this, drop it into Excel, let the testers take a copy
and make lots of juicy test data, then run it through another script
which makes a flat file out of it again.
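For what it's worth, Format 1 can be sketched in a few lines. The
layout tuples below are hypothetical, with slice positions already
converted from the 1-based columns in the spec:

```python
# Hypothetical A0 layout: (field_name, start, end) as Python slice
# positions, i.e. already converted from the document's 1-based columns.
A0_LAYOUT = [("recordnumber", 0, 4), ("name", 4, 25)]

def dump_record(rectype, record, layout):
    # Format 1: one "field:value" pair per field, filler omitted and
    # trailing blanks stripped.
    pairs = ["%s:%s" % (name, record[start:end].rstrip())
             for name, start, end in layout]
    return "Rec:%s %s" % (rectype, " ".join(pairs))

line = "0001John Machin".ljust(800)   # fixed-width record, blank filler
print(dump_record("A0", line, A0_LAYOUT))
# -> Rec:A0 recordnumber:0001 name:John Machin
```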

> 4. Or should I forget about Python and build something in another
> environment?

No way!

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: How to input one char at a time from stdin?

2005-01-25 Thread John Machin
On Wed, 26 Jan 2005 01:15:10 +0530, Swaroop C H <[EMAIL PROTECTED]>
wrote:

>On Tue, 25 Jan 2005 12:38:13 -0700, Brent W. Hughes
><[EMAIL PROTECTED]> wrote:
>> I'd like to get a character from stdin, perform some action, get another
>> character, etc.  If I just use stdin.read(1), it waits until I finish typing
>> a whole line before I can get the first character.  How do I deal with this?
>
>This is exactly what you need:
>http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/134892
>Title: "getch()-like unbuffered character reading from stdin on both
>Windows and Unix"

Nice to know how, but all those double underscores made my eyes bleed.
Three classes? What's wrong with something simple like the following
(not tested on Unix)?


import sys
bims = sys.builtin_module_names
if 'msvcrt' in bims:
    # Windows
    from msvcrt import getch
elif 'termios' in bims:
    # Unix
    import tty, termios
    def getch():
        fd = sys.stdin.fileno()
        old_settings = termios.tcgetattr(fd)
        try:
            tty.setraw(sys.stdin.fileno())
            ch = sys.stdin.read(1)
        finally:
            termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)
        return ch
else:
    raise NotImplementedError, '... fill in Mac Carbon code here'

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Browsing text ; Python the right tool?

2005-01-25 Thread John Machin

Jeff Shannon wrote:
> Paul Kooistra wrote:
>
> > 1. Does anybody know of a generic tool (not necessarily Python
> > based) that does the job I've outlined?
> > 2. If not, is there some framework or widget in Python I can adapt
> > to do what I want?
>
> Not that I know of, but...
>
> > 3. If not, should I consider building all this just from scratch in
> > Python - which would probably mean not only learning Python, but
> > some other GUI related modules?
>
> This should be pretty easy.  If each record is CRLF terminated, then
> you can get one record at a time simply by iterating over the file
> ("for line in open('myfile.dat'): ...").  You can have a dictionary
> of classes or factory functions, one for each record type, keyed off
> of the 2-character identifier.  Each class/factory would know the
> layout of that record type,

This is plausible only under the condition that Santa Claus is paying
you $X per class/factory or per line of code, or you are so speed-crazy
that you are machine-generating C code for the factories.

I'd suggest "data driven" -- you grab the .doc or .pdf that describes
your layouts, ^A^C, fire up Excel, paste special, massage it, so you
get one row per field, with start & end posns, type, dec places,
optional/mandatory, field name, whatever else you need. Insert a column
with the record name. Save it as a CSV file.

Then you need a function to load this layout file into dictionaries,
and build cross-references field_name -> field_number (0,1,2,...) and
vice versa.

As your record name is not in a fixed position in the record, you will
also need to supply a function (file_type, record_string) ->
record_name.

Then you have *ONE* function that takes a file_type, a record_name, and
a record_string, and gives you a list of the values. That is all you
need for a generic browser application.
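A rough sketch of that load-and-split machinery, assuming a
hypothetical layout file with record_name, field_name, start, end
columns (start/end 1-based inclusive, as in the spec document):

```python
import csv
import io

# Hypothetical layout.csv content: record_name, field_name, start, end
# (start/end are 1-based inclusive columns, as given in the spec).
LAYOUT_CSV = """\
A0,recordnumber,1,4
A0,name,5,25
C1,recordnumber,1,4
C1,phonenumber,5,15
C1,zipcode,16,20
"""

def load_layout(fileobj):
    # Build {record_name: [(field_name, slice_start, slice_end), ...]},
    # converting 1-based inclusive columns to Python slice positions.
    layout = {}
    for rec, field, start, end in csv.reader(fileobj):
        layout.setdefault(rec, []).append((field, int(start) - 1, int(end)))
    return layout

def split_record(layout, rec_name, record):
    # The *one* generic function: record string -> list of field values.
    return [record[s:e].rstrip() for _, s, e in layout[rec_name]]

layout = load_layout(io.StringIO(LAYOUT_CSV))
print(split_record(layout, "C1", "00025551234    12345"))
# -> ['0002', '5551234', '12345']
```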

For working on a _specific_ known file_type, you can _then_ augment
that to give you record objects that you use like a0.zipcode or record
dictionaries that you use like a0['zipcode'].

You *don't* have to hand-craft a class for each record type. And you
wouldn't want to, if you were dealing with files whose spec keeps on
having fields added and fields obsoleted.

Notice: in none of the above do you ever have to type in a column
position, except if you manually add updates to your layout file.

Then contemplate how productive you will be when/if you need to
_create_ such files -- you will push everything through one function
which will format each field correctly in the correct column positions
(and chuck an exception if it won't fit). Slightly better than an
approach that uses something like
    nbytes = sprintf(buffer, "%04d%-20s%-5s", a0_num, a0_phone, a0_zip);

HTH,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Browsing text ; Python the right tool?

2005-01-26 Thread John Machin

Jeff Shannon wrote:
> John Machin wrote:
>
> > Jeff Shannon wrote:
> >
> >> [...]  If each record is CRLF terminated, then
> >>you can get one record at a time simply by iterating over the file
> >>("for line in open('myfile.dat'): ...").  You can have a dictionary
> >>classes or factory functions, one for each record type, keyed off
> >>of the 2-character identifier.  Each class/factory would know the
> >>layout of that record type,
> >
> > This is plausible only under the condition that Santa Claus is
> > paying you $X per class/factory or per line of code, or you are so
> > speed-crazy that you are machine-generating C code for the
> > factories.
>
> I think that's overly pessimistic.  I *was* presuming a case where
> the number of record types was fairly small, and the definitions of
> those records reasonably constant.  For ~10 or fewer types whose spec
> doesn't change, hand-coding the conversion would probably be quicker
> and/or more straightforward than writing a spec-parser as you
> suggest.

I didn't suggest writing a "spec-parser". No (mechanical) parsing is
involved. The specs that I'm used to dealing with set out the record
layouts in a tabular fashion. The only hassle is extracting that from a
MSWord document or a PDF.

>
> If, on the other hand, there are many record types, and/or those
> record types are subject to changes in specification, then yes, it'd
> be better to parse the specs from some sort of data file.

"Parse"? No parsing, and not much code at all: The routine to "load"
(not "parse") the layout from the layout.csv file into dicts of dicts
is only 35 lines of Python code. The routine to take an input line and
serve up an object instance is about the same. It does more than the
OP's browsing requirement already. The routine to take an object and
serve up a correctly formatted output line is only 50 lines of which
1/4 is comment or blank.

>
> The O.P. didn't mention anything either way about how dynamic the
> record specs are, nor the number of record types expected.

My reasoning: He did mention A0 and C1 hence one could guess from that
he maybe had 6 at least. Also, files used to "create printed pages by
an external company" (especially by a company that had "leaseplan" in
its e-mail address) would indicate "many" and "complicated" to me.

> I suspect
> that we're both assuming a case similar to our own personal
> experiences, which are different enough to lead to different
> preferred solutions. ;)

Indeed. You seem to have lead a charmed life; may the wizards and the
rangers ever continue to protect you from the dark riders! :-)

My personal experiences and attitudes: (1) extreme aversion to having
to type (correctly) lots of numbers (column positions and lengths), and
to having to mentally translate start = 663, len = 13 to [662:675] or
having ugliness like [663-1:663+13-1] (2) cases like 17 record types
and 112 fields in one file, 8 record types and 86 fields in a second --
this being a new relatively clean simple exercise in exchanging files
with a government department (3) Past history of this govt dept is that
there are at least another 7 file types in regular use and they change
the _major_ version number of each file type about once a year on
average (4) These things tend to start out deceptively small and simple
and turn into monsters.

Cheers,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Responding to trollish postings.

2005-01-26 Thread John Machin

Terry Reedy wrote:
>
> No offense taken.  My personal strategy is to read only as much of
trollish
> threads as I find interesting or somehow instructive, almost never
respond,
> and then ignore the rest.  I also mostly ignore discussions about
such
> threads.
>

Indeed. Let's just nominate XL to the "Full Canvas Jacket" website
(http://www.ratbags.com/ranters/) and move on.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Why do look-ahead and look-behind have to be fixed-width patterns?

2005-01-27 Thread John Machin

inhahe wrote:
> Hi i'm a newbie at this and probably always will be, so don't be
> surprised if I don't know what i'm talking about.
>
> but I don't understand why regex look-behinds (and look-aheads) have
> to be fixed-width patterns.
>
> i'm getting the impression that it's supposed to make searching
> exponentially slower otherwise
>
> but i just don't see how.
>
> say i have the expression (?<=.*?:.*?:).*
> all the engine has to do is search for .*?:.*?:.*, and then in each
> result, find .*?:.*?: and return the string starting at the point
> just after the length of the match.
> no exponential time there, and even that is probably more inefficient
> than it has to be.

But that's not what you are telling it to do. You are telling it to
firstly find each position which starts a match with .* -- i.e. every
position -- and then look backwards to check that the previous text
matches .*?:.*?:

To grab the text after the 2nd colon (if indeed there are two or more),
it's much simpler to do this:

>>> import re
>>> q = re.compile(r'.*?:.*?:(.*)').search
>>> def grab(s):
...     m = q(s)
...     if m:
...         print m.group(1)
...     else:
...         print 'not found!'
...
>>> grab('')
not found!
>>> grab('')
::
>>> grab('a:b:yadda')
yadda
>>> grab('a:b:c:d')
c:d
>>> grab('a:b:')

>>>

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: py.dll for version 2.2.1 (Windows)

2005-01-28 Thread John Machin

mike wrote:
> Just recently, my virus checker detected what it called a Trojan
> Horse in the py.dll file in the python22 folder.

Sorry to come on like the Inquisition, but this _might_ be something of
significance to the whole Windows Python community:

When was "just recently"? Which virus checker are you using? Did it say
which Trojan it had detected? Have you kept a copy of the "py.dll"
file? Have you kept a copy of the virus checker's report? Was that the
first time you have run that virus checker? If not when was the
previous run?

>  Installation is version
> 2.2.1 and I think that it came installed when I bought the PC in
> October 2002.
>
> Does anyone know where I can get a copy of the py.dll file from
> version 2.2.1 for Windows (XP)?
>
> I looked at www.python.org and do not see a py.dll file in the
> self-installation or .tgz versions of 2.2.1 that are posted.

That would be because there is no file named "py.dll" in a Windows
distribution of Python! You should have Python22.dll -- they all have
the major/minor version numbers in the name -- and its size should be
about 820Kb.

You may like to keep the evidence (if you still have it) -- move the
presumably bad file into a folder of its own and rename it to have some
extension other than dll -- because targeting a Python installation
would appear to be novel and someone may very well be interested in
inspecting your file.

If you have some good reason to stay with 2.2, then get the _latest_
version of that (2.2.3). Otherwise, install 2.4.

Regards,
John

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Awkwardness of C API for making tuples

2005-02-01 Thread John Machin
Dave Opstad wrote:
> One of the functions in a C extension I'm writing needs to return a
> tuple of integers, where the length of the tuple is only known at
> runtime. I'm currently doing a loop calling PyInt_FromLong to make
> the integers,

What is the purpose of this first loop?

In what variable-length storage are you storing these (Python) integers
during this first loop? Something you created with (a) PyMem_Malloc (b)
malloc (c) alloca (d) your_own_malloc?

> then PyTuple_New, and finally a loop calling PyTuple_SET_ITEM
> to set the tuple's items. Whew.

Whew indeed.

> Does anyone know of a simpler way? I can't use Py_BuildValue because
> I don't know at compile-time how many values there are going to be.
> And there doesn't seem to be a PyTuple_FromArray() function.
>
> If I'm overlooking something obvious, please clue me in!

1. Determine the length of the required tuple; this may need a loop,
but only to _count_ the number of C longs that you have.
2. Use PyTuple_New.
3. Loop to fill the tuple, using PyInt_FromLong and PyTuple_SetItem.

Much later, after thoroughly testing your code, gingerly change
PyTuple_SetItem to PyTuple_SET_ITEM. Benchmark the difference. Is it
anywhere near what you saved by cutting out the store_in_temp_array
thing in the first loop?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Awkwardness of C API for making tuples

2005-02-01 Thread John Machin

Dave Opstad wrote:
> In article <[EMAIL PROTECTED]>,
>  "John Machin" <[EMAIL PROTECTED]> wrote:
>
> > What is the purpose of this first loop?
>
> Error handling. If I can't successfully create all the PyInts then I
> can dispose the ones I've made and not bother making the tuple at all.
> >
> > In what variable-length storage are you storing these (Python)
> > integers during this first loop? Something you created with
> > (a) PyMem_Malloc (b) malloc (c) alloca (d) your_own_malloc?
>
> (b) malloc. The sequence here is: 1) malloc; 2) check for malloc
> success; 3) loop to create PyInts (if failure, Py_DECREF those made
> so far and free the malloc'ed buffer); 4) create new tuple (error
> checks again); and 5) PyTuple_SET_ITEM (no error checks needed)

Don't. If you _must_ allocate your own storage, use PyMem_Malloc.

>
> > 1. Determine the length of the required tuple; this may need a
> > loop, but only to _count_ the number of C longs that you have.
> > 2. Use PyTuple_New.
> > 3. Loop to fill the tuple, using PyInt_FromLong and
> > PyTuple_SetItem.
>
> This would certainly be simpler, although I'm not sure I'm as clear
> as to what happens if, say, in the middle of this loop a
> PyInt_FromLong fails. I know that PyTuple_SetItem steals the
> reference; does that mean I could just Py_DECREF the tuple and all
> the pieces will be automagically freed? If so, I'll take your
> recommendation and rework the logic this way.

This is what I believe happens. However even if you did need to do more
cleaning up, you shouldn't penalise the normal case i.e. when
PyInt_FromLong works. The only failure cause AFAIK is running out of
memory.
This should be rare unless it's triggered by your calling malloc :-)

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Crashing Python interpreter! (windows XP, python2.3.4, 2.3.5rc1, 2.4.0)

2005-02-02 Thread John Machin

Leeuw van der, Tim TOP-POSTED:
> Hi all,
>
> I can use this version of gtk and PyGtk to run simple programs. There
> seems to be no problem with the code-completion in PythonWin.
> I can do: dir(gtk) without problems after importing the gtk module of
> PyGtk, when I use idle or console. (Python version for this test:
> python2.4, python 2.3.5rc1)
>
> I already knew that I could run simple PyGtk programs with this
> combination of Python, PyGtk, and Gtk. Also knew already, that the
> code completion in the PythonWin IDE works.
> The crash comes when invoked from pydev under eclipse. So I can't
> remove them from the equation (I mentioned the problem also on the
> Pydev Sourceforge-forum, but decided to post here since it's the
> interpreter crashing).
>
> It's the "send/don't send" type of error box, so I choose "don't
> send" at that point since that will only send crash info to
> Microsoft... no point in that! :-)
>
> It could of course be a problem in the GTK DLLs and I haven't yet had
> time to test with older GTK versions. Hopefully I can try that
> tomorrow.
>
> cheers,
>
> --Tim
>
>
> -Original Message-
> From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
Behalf Of Stefan Behnel
> Sent: Wednesday, February 02, 2005 3:59 PM
> To: [email protected]
> Subject: Re: Crashing Python interpreter! (windows XP, python2.3.4,
2.3.5rc1, 2.4.0)
>
> Leeuw van der, Tim wrote:
> > I'm using the following combination of software:
> > - Pydev Eclipse plugin (pydev 0.8.5)
> > - eclipse 3.0.1
> > - windows XP SP1
> > - pygtk 2.4.1
> > - GTK 2.6.1 (for windows32 native)
>
> > When trying to get a list of possible completions for the 'gtk'
> > import object, the python interpreter crashes. Happens with all
> > versions listed in the subject: python 2.3.4, 2.3.5rc1, 2.4.0.
> 

Do you have a file called drwtsn32.log anywhere on your computer?

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Reference count question

2005-02-02 Thread John Machin

Fredrik Lundh wrote:
>
> >PyList_SetItem(List,i,Str);
>
> you should check the return value, though.  PyList_SetItem may (in
> theory) fail.
>

:-)
Only a bot could say that. We mere mortals have been known to do things
like (a) pass a non-list as the first argument (b) pass an out-of-range
value for the second argument.
(-:

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Crashing Python interpreter! (windows XP, python2.3.4, 2.3.5rc1, 2.4.0)

2005-02-03 Thread John Machin

Leeuw van der, Tim wrote:
>
>> > Do you have a file called drwtsn32.log anywhere on your computer?
>
> No, unfortunately I cannot find such file anywhere on my computer
>
> What do I do to get such file? Or anything equally useful?
>

On my Windows 2000 box, just crash something :-)

Perhaps this may help:

http://www.windowsnetworking.com/kbase/WindowsTips/Windows2000/RegistryTips/RegistryTools/DrWatson.html

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Crashing Python interpreter! (windows XP, python2.3.4, 2.3.5rc1, 2.4.0)

2005-02-03 Thread John Machin

Leeuw van der, Tim wrote:
> >
> >
> > -Original Message-
> > From: [EMAIL PROTECTED]
on behalf of John Machin
> > Sent: Thu 2/3/2005 12:00 PM
> > To: [email protected]
> > Subject: Re: Crashing Python interpreter! (windows XP, python2.3.4,
2.3.5rc1,2.4.0)
> >
> >
> > Leeuw van der, Tim wrote:
> > >
> > >> > Do you have a file called drwtsn32.log anywhere on your
computer?
> > >
> > > No, unfortunately I cannot find such file anywhere on my
computer
> > >
> > > What do I do to get such file? Or anything equally useful?
> > >
> >
> > On my Windows 2000 box, just crash something :-)
> >
> >
> > Perhaps this may help:
> >
> >
http://www.windowsnetworking.com/kbase/WindowsTips/Windows2000/RegistryTips/RegistryTools/DrWatson.html
> >
>
> Using this URL, I found the log file and it's about 1Gb big...

Yeah, well, it's a *LOG* file -- unless you fiddle with the registry
settings (not recommended), you get one lot of guff per crash. But 1 Gb ...

> I'll have to find out what is the useful part of it (or remove it
> and crash again).

The useful part would be the tail of the file. Remove it? It may
contain evidence useful for other problems (that you didn't know you
were having). Try rename.

> I don't know why searching all drives using windows 'search' did not
> find the file!

Possibilities:
1. You have a virus that renames the drwtsn32.log file for the duration
of a search.
2. Emanations from UFOs are affecting computer circuitry in your
neighbourhood.
3. Microsoft programmers stuffed up.
4. You had a typo.

>
> When I have a useful crashdump, what should I do? Attach to e-mail
> and post it here?

You could eyeball it to see if it gave you a clue first. Failing that,
just post a copy of the part(s) from just ONE crash that look something
like this:

*> Stack Back Trace <*

FramePtr ReturnAd Param#1  Param#2  Param#3  Param#4  Function Name
0012D6F8 77F8819B 7803A700 7C34F639 7803A730 7C36B42C
ntdll!RtlpWaitForCriticalSection
0012D73C 1E0A73CE 7803A710  00AE0900 
ntdll!ZwCreateThread
7C36C576 E87C3822 FFFD5D89 E80C75FF FFFE3081 FC658359
!PyTime_DoubleToTimet
B8680C6A      


> Should I include the user.dmp file too?

Look at it; how big is it? [my latest is 13MB for @#$% sake] Is it in
ascii or binary? Does it look useful? What proportion of people who get
this newsgroup/mailing-list via *E-MAIL* will be overjoyed to receive
it?
Are you sure you will not be splattering company-confidential data all
over the Internet?

>
> Should I do the same for both python 2.3.5r1 and python 2.4? Or is
> it sufficient to do so for Python 2.4?

If you are sending only stack back trace snippets as above, then send
both just in case they are different.

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: how to generate SQL SELECT pivot table string

2005-02-03 Thread John Machin
McBooCzech wrote:
> Hallo all,
>
> I am trying to generate SQL SELECT command which will return pivot
> table. The number of column in the pivot table depends on the data
> stored in the database. It means I do not know in advance how many
> columns the pivot table will have.
>
> For example I will test the database as following:
> SELECT DISTINCT T1.YEAR FROM T1
>
> The SELECT command will return:
> 2002
> 2003
> 2004
> 2005
>
> So I would like to construct following select:
>
> select T1.WEEK,
> SUM (case T1.YEAR when '2002' then T1.PRICE else 0 END) Y_02,
> SUM (case T1.YEAR when '2003' then T1.PRICE else 0 END) Y_03,
> SUM (case T1.YEAR when '2004' then T1.PRICE else 0 END) Y_04,
> SUM (case T1.YEAR when '2005' then T1.PRICE else 0 END) Y_05
> from T1
> group by T1.week
>
> which will return pivot table with 5 columns:
> WEEK, Y_02, Y_03, Y_04, Y_05,
>
> but if the command "SELECT DISTINCT T1.YEAR FROM T1" returns:
> 2003
> 2004
>
> I have to construct only following string:
>
> select T1.WEEK,
> SUM (case T1.YEAR when '2003' then T1.PRICE else 0 END) Y_03,
> SUM (case T1.YEAR when '2004' then T1.PRICE else 0 END) Y_04,
> from T1
> group by T1.week
>
> which will return pivot table with 3 columns:
> WEEK, Y_03, Y_04
>
> Can anyone help and give me a hand or just direct me, how to write a
> code which will generate SELECT string depending on the data stored in
> the database as I described?

>>> step1result = ["2003", "2004"] # for example
>>> prologue = "select T1.WEEK, "
>>> template = "SUM (case T1.YEAR when '%s' then T1.PRICE else 0 END) Y_%s"
>>> epilogue = " from T1 group by T1.week"
>>> step2sql = prologue + ", ".join([template % (x, x[-2:]) for x in step1result]) + epilogue
>>> step2sql
"select T1.WEEK, SUM (case T1.YEAR when '2003' then T1.PRICE else 0
END) Y_03, SUM (case T1.YEAR when '2004' then T1.PRICE else 0 END) Y_04
from T1 group by T1.week"
>>>

Of course you may need to adjust the strings above to allow for your
local SQL syntax (line breaks, line continuations, semicolon at the end
maybe, ...).
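
Wrapped up as a function, the same idea looks like this (a sketch only; in real code the `years` list would come from running the DISTINCT query through your DB-API cursor, and the SQL may need tweaking for your dialect):

```python
def build_pivot_sql(years):
    # Build the pivot SELECT from the years returned by
    # "SELECT DISTINCT T1.YEAR FROM T1" (adjust for your SQL dialect).
    template = "SUM (case T1.YEAR when '%s' then T1.PRICE else 0 END) Y_%s"
    columns = ", ".join(template % (y, y[-2:]) for y in years)
    return "select T1.WEEK, %s from T1 group by T1.week" % columns

print(build_pivot_sql(["2003", "2004"]))
```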

A few quick silly questions:
Have you read the Python tutorial?
Do you read this newsgroup (other than answers to your own questions)?
Could you have done this yourself in a language other than Python?



Re: Converting a string to a function pointer

2005-02-04 Thread John Machin
On Fri, 04 Feb 2005 12:01:35 +0100, Håkan Persson <[EMAIL PROTECTED]>
wrote:

>Hi.
>
>I am trying to "convert" a string into a function pointer.
>Suppose I have the following:
>
>from a import a
>from b import b
>from c import c
>
>funcString = GetFunctionAsString()
>
>and funcString is a string that contains either "a", "b" or "c".
>How can I simply call the correct function?
>I have tried using getattr() but I don't know what the first (object) 
>argument should be in this case.

Try this:

>>> from sys import exit
>>> globals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__':
'__main__', 'exit': <built-in function exit>, '__doc__': None}
>>> afunc = globals()["exit"]

Do you really need to use the "from X import Y" style? Consider the
following alternative:

>>> import sys
>>> afunc = getattr(sys, "exit")
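
A third option, which avoids poking around in globals() altogether, is an explicit dispatch table (a sketch; a, b and c here stand in for the OP's imported functions):

```python
def a(): return "called a"
def b(): return "called b"
def c(): return "called c"

dispatch = {"a": a, "b": b, "c": c}   # string -> function object

func_string = "b"                     # stand-in for GetFunctionAsString()
print(dispatch[func_string]())        # called b
```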





Re: newbie: Syntax error

2005-02-04 Thread John Machin
Chad Everett wrote:
> Hi Everyone,
>
> I am new to Python and programming in general.  I bought the book
> "Python Programming for the Absolute Beginner" by Michael Dawson.
>
> I have been working through it but am having trouble.
> I am trying to make a coin flip program and keep getting a Syntax Error
> "invalid syntax".
>
> If anyone has a moment could you please look at it and tell me what I
> am doing wrong.
>
> thanks for your time and patience.
>
> Chad
>
> # Coin Flip Program
> # This program flips a coin 100 times and tells the number of heads
> # and tails.

Oh, no, it doesn't! Once you fix the few things that stop it from
running at all, it will flip the coin ONCE and tell you the one
then-fixed answer *99* times.

> # Chad Everett 2/3/2005
>
>
> print "\t\tCoin Flip Game*\n"
> import random
>
> # heads = 1
> # tails = 2
>
> tries = random.randrange(2) + 1

Better move the above line _after_ the "while" statement.

> count = 1

Try 0; think about it this way: you are counting the number of times
something has happened, and you want to stop after 100 somethings have
happened. At this stage you ain't hatched no chickens :)

>
You also need to initialise the 'heads' and 'tails' counters.

> while count != 100:

Defensive programming hint #1: while count <= 100, if starting at 1; <
100 if starting from 0

Tip: Python takes the drudgery out of programming. Less typing, less
scope for error. To do something 100 times:

!for count in range(100):
!    do_something()

> if tries == 1:
> heads = 1

I think you meant:
!heads += 1

> count += 1
Do this in only one place; it doesn't/shouldn't depend on the coin toss
result.

>
> else tries == 2:  # I AM GETTING THE SYNTAX ERROR HERE
> tails = 1

and
!tails += 1

> count += 1
>

Others have answered your syntax error question; I'll just round off
with Defensive Programming hint #2:

Get into this habit early:

!if tries == 1:
!    blah_blah()
!elif tries == 2:
!    yadda_yadda()
!else:
!    raise Exception, "Can't happen :-) tries has unexpected value (%r)" % tries

> print "heads: " + heads

Ugh; did you get that out of the book?

Try:
!print "heads:", heads

> print "tails: " + tails
> 
> raw_input("Press enter to quit")
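
Putting all of the above fixes together, here is a sketch of a working version (written with function-style print so it also runs on current Pythons; otherwise it follows your program's structure):

```python
import random

heads = 0
tails = 0
for count in range(100):
    toss = random.randrange(2) + 1   # 1 = heads, 2 = tails
    if toss == 1:
        heads += 1
    elif toss == 2:
        tails += 1
    else:
        raise Exception("Can't happen :-) toss == %r" % toss)

print("heads: %d" % heads)
print("tails: %d" % tails)
```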

HTH,
John



Re: Error!

2005-02-04 Thread John Machin

administrata wrote:
> I'm programming Car Salesman Program.
> It's been "3 days" learning python...

From whom or what book or what tutorial?

> But, i got problem

You got problemS. What Jeff & Brian wrote, plus:

You have "change" instead of "charge".

You forgot to add in the base price -- "actual price" according to you
comprises only the taxes and fees. Where is your car yard? We'd each
like to order a nice shiny red Ferrari :-)

Cheers,
John



Re: extreme newbie

2005-06-18 Thread John Machin
Dennis Lee Bieber wrote:
> On 18 Jun 2005 07:48:13 -0700, "cpunerd4" <[EMAIL PROTECTED]> declaimed
> the following in comp.lang.python:
> 
> 
>>even so,
>>crackers have a harder time getting into compiled programs rather than
>>intepreted languages. I know hiding the code won't stop all crackers
> 
> 
>   A good debugger in step mode can get into anything... At my
> college, those of us with the skills took less than 30 minutes to unlock
> the system assembler after it had been set to run on higher privileged
> accounts (the OS had numeric "priority" levels in accounts; students ran
> at 20 or 40, the assembler had been set to something like 50 to stop the
> troublemakers).  Copy the executable to local, start under debugger,
> step through until the test for account priority was reached, change
> comparison... Voila, private copy of the assembler.
> 

This unnamed OS didn't allow granting execute access but not read access?

I do agree with your main point however. Once you have read access to 
the software, you can do pretty much what you like.


I recall a piece of software that was paid for on an annual licence fee 
basis, and would stop working after a given date. The update sometimes 
arrived late. Fortunately it was a trivial exercise to find the date 
check in the "expired" executable and circumvent it. Debug in step mode? 
How quaint and tedious! All one had to do was to put a Trojan 
DLL-equivalent in the path; this contained a today()-equivalent function 
that simply called the system debug function. Of course the authors 
could have prevented that by dynamically loading the today()-equivalent 
function directly from the manufacturer-supplied system-central 
DLL-equivalent; my guess is that doing so would have prevented easy 
testing of the "stop working" code on a shared machine where they 
couldn't change the system date without upsetting other users, and it's 
probable they were using a Trojan today()-equivalent gadget to supply 
"old" dates for testing.


Cheers,
John


Re: oddness in super()

2005-06-18 Thread John Machin
Michael P. Soulier wrote:
> Ok, this works in Python on Windows, but here on Linux, with Python 2.4.1, I'm
> getting an error. 
> 
> The docs say:
> 
> A typical use for calling a cooperative superclass method is:
> 
> class C(B):
> def meth(self, arg):
> super(C, self).meth(arg)
> 
> However, when I try this, which works on windows, with ActiveState
> ActivePython 2.4.0...
> 
> class RemGuiFrame(RemGlade.RemFrame):
> """This class is the top-level frame for the application."""
> 
> def __init__(self, *args, **kwds):
> "Class constructor."
> # Constructor chaining
> super(RemGuiFrame, self).__init__(*args, **kwds)
> 
> ...on linux I get this...
> 
> [EMAIL PROTECTED] pysrc]$ ./RemGui.py 
> Traceback (most recent call last):
>   File "./RemGui.py", line 206, in ?
> frame_1 = RemGuiFrame(None, -1, "")
>   File "./RemGui.py", line 30, in __init__
> super(RemGuiFrame, self).__init__(*args, **kwds)
> TypeError: super() argument 1 must be type, not classobj
> 
> Why the difference?

You are in the best position to answer that; you have access to the 
source code of the place where the problem occurs (RemGui.py), in both a 
"working" instance and a non-"working" instance, so you can (a) read it 
(b) debug it with a debugger, or insert print statements e.g. print 
repr(RemGuiFrame), repr(RemGlade.RemFrame)

You have already noted two variables, OS Windows/Linux, and Python 
2.4.[01]). Are there any others? Perhaps RemGlade.RemFrame comes from a 
3rd party library and there's a version difference.

> Is Python portability overrated?

No. Please stop baying at the moon, and dig out some evidence.

> Is this a bug? 

Is *what* a bug?

Is "it" a bug in what? Windows? Linux? Python 2.4.0? Python 2.4.1? 3rd 
party code? your code?

=

I've never used "super" before, I'm only vaguely aware what it could be 
useful for, and what a "cooperative superclass method" might be, and I 
don't have access to your source code, nor to Linux ... but let's just 
see what can be accomplished if you drag yourself away from that moonlit 
rock and start digging:

First (1) Ooooh what a helpful error message! What is a "classobj"?

Let's RTFM: Hmmm don't tell X** L** but there's no entry for "classobj" 
in the index. But wait, maybe that's geekspeak for "class object" ...
two or three bounces of the pogo-stick later we find this:

"""
Class Types
Class types, or ``new-style classes,'' are callable. These objects 
normally act as factories for new instances of themselves, but 
variations are possible for class types that override __new__(). The 
arguments of the call are passed to __new__() and, in the typical case, 
to __init__() to initialize the new instance.

Classic Classes
Class objects are described below. When a class object is called, a new 
class instance (also described below) is created and returned. This 
implies a call to the class's __init__() method if it has one. Any 
arguments are passed on to the __init__() method. If there is no 
__init__() method, the class must be called without arguments.
"""

(2) Well fancy that... "type" kinda sorta means "new", "classobj" kinda 
sorta means "old"! Now let's re-read the bit about super in TFM:

"""
super() only works for new-style classes
"""

(3) So let's check out our tentative hypothesis:
"""
class Bold:
    def __init__(self):
        print "Bold"

class Bnew(object):
    def __init__(self):
        print "Bnew"

class Cold(Bold):
    def __init__(self):
        print "Cold", repr(Cold), "derives from", repr(Bold)
        super(Cold, self).__init__()
        print "Cold OK"

class Cnew(Bnew):
    def __init__(self):
        print "Cnew", repr(Cnew), "derives from", repr(Bnew)
        super(Cnew, self).__init__()
        print "Cnew OK"

cnew = Cnew()
cold = Cold()
"""

which when run on Python 2.4.1 on a Windows box produces this result:
"""
Cnew <class '__main__.Cnew'> derives from <class '__main__.Bnew'>
Bnew
Cnew OK
Cold <class __main__.Cold at 0x...> derives from <class __main__.Bold at 0x...>
Traceback (most recent call last):
   File "C:\junk\soulier.py", line 22, in ?
 cold = Cold()
   File "C:\junk\soulier.py", line 12, in __init__
 super(Cold, self).__init__()
TypeError: super() argument 1 must be type, not classobj

"""

Funny that, the class repr()s are different; not in a meaningful way, 
but the mere difference indicates the classes are products of different 
manufacturers.

(4) Ooooh! How old is the Linux copy of that 3rd party library?

(5) Looks like the bug is in RemGui.py -- it is calling super() with an 
invalid argument. No apparent Python portability problems.

> 
> I'm confused. 

Confession is good for the soul. However true confession is even better 
for the soul. Please come up with a better description. :-)

===

I hope these self-help hints are useful with your next "confusion".

Cheers,
John


Re: struct.(un)pack and ASCIIZ strrings

2005-06-18 Thread John Machin
Terry Reedy wrote:
> "Sergey Dorofeev" <[EMAIL PROTECTED]> wrote in message 
> news:[EMAIL PROTECTED]
> 
>>I can use string.unpack if string in struct uses fixed amount of bytes.
> 
> 
> I presume you mean struct.unpack(format, string).  The string len must be 
> known when you call, but need not be fixed across multiple calls with 
> different strings.
> 
> 
>>But is there some extension to struct modue, which allows to unpack
>>zero-terminated string, size of which is unknown?
>>E.g. such struct: long, long, some bytes (string), zero, short, 
>>short,short.


> Size is easy to determine.  Given the above and string s (untested code):
> prelen = struct.calcsize('2l')
> strlen = s.find('\0', prelen) - prelen
> format = '2l %ds h c 3h' % strlen # c swallows null byte
shouldn't this be '2l %ds c 3h'??

> 
> Note that C structs can have only one variable-sized field and only at the 
> end.  With that restriction, one could slice and unpack the fixed stuff and 
> then directly slice out the end string.  (Again, untested)
> 
> format = '2l 3h' # for instance
> prelen = struct.calcsize(format)
> tup = struct.unpack(format, s[:prelen])
> varstr = s[prelen, -1] # -1 chops off null byte

Perhaps you meant varstr = s[prelen:-1]
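
For the record, here is a runnable sketch of the two-step idea, using an 'x' pad byte (rather than 'c') to swallow the NUL; the layout -- two longs, an ASCIIZ string, three shorts -- follows the OP's example, and '<' avoids alignment padding:

```python
import struct

def unpack_asciiz(buf):
    # Find the NUL that terminates the string, then build the format.
    prelen = struct.calcsize('<2l')
    strlen = buf.find(b'\0', prelen) - prelen
    fmt = '<2l %dsx 3h' % strlen   # 'x' pad byte swallows the NUL
    return struct.unpack(fmt, buf)

packed = struct.pack('<2l', 1, 2) + b'hello\0' + struct.pack('<3h', 3, 4, 5)
print(unpack_asciiz(packed))   # (1, 2, b'hello', 3, 4, 5) on Python 3
```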



Re: Regex for repeated character?

2005-06-18 Thread John Machin
Terry Hancock wrote:
> On Saturday 18 June 2005 02:05 am, John Machin wrote:
> 
>>Doug Schwarz wrote:
>>
>>>In article <[EMAIL PROTECTED]>,
>>> Leif K-Brooks <[EMAIL PROTECTED]> wrote:
>>>
>>>>How do I make a regular expression which will match the same character
>>>>repeated one or more times,
>>>
>>>How's this?
>>>
>>>  >>> [x[0] for x in re.findall(r'((.)\2*)', 'abbccccccbba')]
>>>  ['a', 'bb', 'ccc', '', 'ccc', 'bb', 'a']
>>
>>I think it's fantastic, but I'd be bound to say that given that it's the 
>>same as what I posted almost two days ago :-)
> 
> 
> Guess there's only one obvious way to do it, then. ;-)
> 

Yep ... but probably a zillion ways in
re.compile(r"perl", re.I).match(other_languages)



Re: catch argc-argv

2005-06-20 Thread John Machin
mg wrote:
> Hello,
> 
> I am writing bindings for a FEM application. In one of my functions,
> 'initModulename', called when the module is imported, I would like to
> get the argc and argv arguments used in the main function of Python.

This is an "interesting" way of writing bindings. Most people would 
provide an interface in terms of either a library of functions or a 
class or two. I don't recall ever seeing a module or package that did 
what you say you want to do.

Consider that your module should NOT be tied to command-line arguments. 
Abstract out what are the essential inputs to whatever a "FEM 
application" is. Then the caller of your module can parse those inputs 
off the command line using e.g. optparse and/or can collect them via a 
GUI and/or hard code them in a test module and/or read them from a test 
data file or database.


> So, my question is: does the Python API contain functions like
> 'get_argc()' and 'get_argv()' ?
> 

If you can't see them in the documentation, they aren't there. If they 
aren't there, that's probably for a good reason -- no demand, no use case.


Re: catch argc-argv

2005-06-20 Thread John Machin
Duncan Booth wrote:
> John Machin wrote:
> 
> 
>>>So, my question is: does the Python API containe fonctions like 
>>>'get_argc()' and 'get_argv()' ?
>>>
>>
>>If you can't see them in the documentation, they aren't there. If they
>>aren't there, that's probably for a good reason -- no demand, no use
>>case. 
>>
>>
> 
> 
> Leaving aside whether or not there is a use-case for this, the reason they 
> aren't there is that they aren't needed.

"no use-case" == "no need" in my book

> As the OP was already told, to 
> access argv, you simply import the 'sys' module and access sys.argv.

Simple in Python, not in C.

> 
> There are apis both to import modules and to get an attribute of an 
> existing Python object.

I know that; my point was why should you do something tedious like that 
when you shouldn't be interested in accessing sys.argv from a C 
extension anyway.

>  So all you need is something like (untested):
> 
> PyObject *sys = PyImport_ImportModule("sys");
> PyObject *argv = PyObject_GetAttrString(sys, "argv");
> int argc = PyObject_Length(argv);
> if (argc != -1) {
>... use argc, argv ...
> }
> Py_DECREF(argv);
> Py_DECREF(sys);


Re: Python and encodings drives me crazy

2005-06-20 Thread John Machin
Oliver Andrich wrote:
> 2005/6/21, Konstantin Veretennicov <[EMAIL PROTECTED]>:
> 
>>It does, as long as headline and caption *can* actually be encoded as
>>macroman. After you decode headline from utf-8 it will be unicode and
>>not all unicode characters can be mapped to macroman:
>>
>>
>> >>> u'\u0160'.encode('utf8')
>> '\xc5\xa0'
>> >>> u'\u0160'.encode('latin2')
>> '\xa9'
>> >>> u'\u0160'.encode('macroman')
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in ?
>>   File "D:\python\2.4\lib\encodings\mac_roman.py", line 18, in encode
>>     return codecs.charmap_encode(input,errors,encoding_map)
>> UnicodeEncodeError: 'charmap' codec can't encode character u'\u0160'
>> in position 0: character maps to <undefined>
> 
> 
> Yes, this and the coersion problems Diez mentioned were the problems I
> faced. Now I have written a little cleanup method, that removes the
> bad characters from the input

By "bad characters", do you mean characters that are in Unicode but not 
in MacRoman?

By "removes the bad characters", do you mean "deletes", or do you mean 
"substitutes one or more MacRoman characters"?

If all you want to do is torch the bad guys, you don't have to write "a 
little cleanup method".

To leave a tombstone for the bad guys:

 >>> u'abc\u0160def'.encode('macroman', 'replace')
'abc?def'
 >>>

To leave no memorial, only a cognitive gap:

 >>> u'The Good Soldier \u0160vejk'.encode('macroman', 'ignore')
'The Good Soldier vejk'

Do you *really* need to encode it as MacRoman? Can't the Mac app 
understand utf8?

You mentioned cp850 in an earlier post. What would you be feeding 
cp850-encoded data that doesn't understand cp1252, and isn't in a museum?

Cheers,
John


Re: reading a list from a file

2005-06-20 Thread John Machin
Rune Strand wrote:
> But if there are many lists in the file and they're organised like this:
> 
> ['a','b','c','d','e']
> ['a','b','c','d','e']
> ['A','B','C','D','E'] ['X','F','R','E','Q']
> 
> I think this'll do it
> 
> data = open('the_file', 'r').read().split(']')
> 
> lists = []
> for el in data:
>   el = el.replace('[', '').strip()
>   el = el.replace("'", "")
>   lists.append(el.split(','))
> 
> # further processing of lists
> 
> but the type problem is still to be resolved ;-)
> 

Try this:

["O'Reilly, Fawlty, and Manuel", '""', ',', '"Hello", said O\'Reilly']

:-)
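
If you really must parse list literals out of a text file, the robust way to survive booby-traps like the above is ast.literal_eval rather than string surgery (a sketch, assuming one list literal per line; literal_eval arrived in Python 2.6):

```python
import ast

def read_lists(lines):
    # Safely evaluate one Python list literal per non-blank line.
    return [ast.literal_eval(line) for line in lines if line.strip()]

tricky = ["""["O'Reilly, Fawlty, and Manuel", '""', ',']""",
          "['a', 'b', 'c', 'd', 'e']"]
print(read_lists(tricky))
```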


Re: utf8 silly question

2005-06-21 Thread John Machin
Jeff Epler wrote:
> If you want to work with unicode, then write
> us = u"\N{COPYRIGHT SIGN} some text"

You can avoid almost all the wear and tear on your shift keys:

 >>> u"\N{copyright sign}"
u'\xa9'

... you are stuck with \N for reasons that should be obvious :-)

Cheers,
John


Re: *Python* Power Tools

2005-06-21 Thread John Machin
Micah wrote:
> Anyone know if there is any organized effort underway to implement the
> Python equivalent of "Perl Power Tools" ?
> 
> If not, would starting this be a waste of effort since:

+1 WOFTAM-of-the-year

> 
> - it's already being done in Perl?
> - cygwin thrives?

For windows users, apart from cygwin, there are a couple of sources of 
binaries for *x command-line utilities (unxutils, gnuwin32).

> - UNIX is already pervasive :-) ?
> 
> Or would people really like to claim a pure Python set of UNIX
> utilities?

Sorry, can't parse that last sentence.


Re: oddness in shelve module

2005-06-21 Thread John Machin
Michael P. Soulier wrote:
> I'm trying to add objects to a shelve db object via hash assignment, but
> I'm getting an odd exception. 
> 
> Traceback (most recent call last):
>   File "RemGui.py", line 117, in onMonitorButton
> self.startMonitoring()
>   File "RemGui.py", line 163, in startMonitoring
> self.monitor()
>   File "RemGui.py", line 181, in monitor
> self.db.store_sample(dbentry)
>   File "C:\Documents and Settings\Michael Soulier\My
> Documents\projects\rem\pysr
> c\RemDBShelve.py", line 38, in store_sample
> self.db[sample.timestamp] = sample
> TypeError: object does not support item assignment
> 
> The object itself is quite simple. 
> 
> I provide it below. 

AFAICT, wrong "it". The "item assignment" which is alleged not to be 
supported is of this form: an_object[some_key] = a_value

I.e. "self.db" is the suspect, not "sample"
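
A minimal working sketch for comparison (note that shelve keys must be *strings*, which may be a second trap if sample.timestamp is a number -- an assumption on my part, since we can't see the OP's class):

```python
import os
import shelve
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo_db')
db = shelve.open(path)               # a shelf supports item assignment...
db[str(1119398400)] = {'cpu': 0.5}   # ...but only with string keys
value = db['1119398400']
db.close()
print(value)   # {'cpu': 0.5}
```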


Re: oddness in shelve module

2005-06-22 Thread John Machin
Michael P. Soulier wrote:

>On 22/06/05 John Machin said:
>
>  
>
>>AFAICT, wrong "it". The "item assignment" which is alleged not to be 
>>supported is of this form: an_object[some_key] = a_value
>>
>>I.e. "self.db" is the suspect, not "sample"
>>
>>
>
>Ah. Let me test that it is in fact being created properly then. I
>expected an error more like, "object has no property db" in that case. 
>
>Mike
>
>  
>
sorry, perhaps I wasn't clear enough -- you seem to think I meant "self" 
has no attribute called "db". No, "self.db" exists, but it doesn't 
support the activity of assigning to an indexed item, like a dictionary.

The interactive interpreter is your friend:

 >>> adict = {}
 >>> adict[3] = 4
 >>> notlikeadict = 666
 >>> notlikeadict[3] = 4
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object does not support item assignment
 >>>

This is the message that you are thinking about, the word is 
"attribute", not "property".

 >>> class Dummy:
...     pass
...
 >>> x = Dummy()
 >>> x.dc
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: Dummy instance has no attribute 'dc'



Re: trouble subclassing str

2005-06-23 Thread John Machin
Brent wrote:
> I'd like to subclass the built-in str type.  For example:

You'd like to build this weird-looking semi-mutable object as a 
perceived solution to what problem? Perhaps an alternative is a class of 
objects which have a "key" (your current string value) and some data 
attributes? Maybe simply a dict ... adict["some text"] = 100?

> class MyString(str):
> 
> def __init__(self, txt, data):
> super(MyString,self).__init__(txt)
> self.data = data
> 
> if __name__ == '__main__':
> 
> s1 = MyString("some text", 100)
> 
> 
> but I get the error:
> 
> Traceback (most recent call last):
>   File "MyString.py", line 27, in ?
> s1 = MyString("some text", 12)
> TypeError: str() takes at most 1 argument (2 given)
> 
> I am using Python 2.3 on OS X.  Ideas?
> 

__init__ is not what you want.

If you had done some basic debugging before posting (like putting a 
print statement in your __init__), you would have found out that it is 
not even being called.

Suggestions:

1. Read the manual section on __new__
2. Read & run the following:

class MyString(str):

    def __new__(cls, txt, data):
        print "MyString.__new__:"
        print "cls is", repr(cls)
        theboss = super(MyString, cls)
        print "theboss:", repr(theboss)
        new_instance = theboss.__new__(cls, txt)
        print "new_instance:", repr(new_instance)
        new_instance.data = data
        return new_instance

if __name__ == '__main__':

    s1 = MyString("some text", 100)
    print "s1:", type(s1), repr(s1)
    print "s1.data:", s1.data

3. Note, *if* you provide an __init__ method, it will be called 
[seemingly redundantly???] after __new__ has returned.

HTH,
John


Re: Sorting part of a list

2005-06-24 Thread John Machin
Sibylle Koczian wrote:
> Hello,
> 
> I thought I understood list slices, but I don't. I want to sort only the 
> last part of a list, preferably in place. If I do
> 
>  >>> ll = [3, 1, 4, 2]
>  >>> ll[2:].sort()

It may help in unravelling any bogglement to point out that this is 
equivalent to

temp = ll[2:]; temp.sort(); del temp


>  >>> ll
> [3, 1, 4, 2]
> 
> ll isn't changed, because ll[2:] is a copy of the last part of the list, 
> and this copy is sorted, not the original list. Right so far?

Quite correct.

> 
> But assignment to the slice:
> 
>  >>> ll[2:] = [2, 4]
>  >>> ll
> [3, 1, 2, 4]
> 
> _does_ change my original ll.

Quite correct.

> 
> What did I misunderstand?


What misunderstanding? You have described the behaviour rather 
precisely. Which of the two cases is boggling you?
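
For completeness: combining your two observations gives the idiom you were probably after -- sort the copy, then assign it back into the slice:

```python
ll = [3, 1, 4, 2]
ll[2:] = sorted(ll[2:])   # sorts only the tail, updating ll in place
print(ll)   # [3, 1, 2, 4]
```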


Re: howto load and unload a module

2005-06-24 Thread John Machin
Peter Hansen wrote:
> Guy Robinson wrote:
> 
>> I have a directory of python scripts that all (should) contain a 
>> number of attributes and methods of the same name.
>>
>> I need to import each module, test for these items and unload the 
>> module. I have 2 questions.
[snip]
>> 2.. how do I test for the existance of a method in a module without 
>> running it?

What the OP is calling a 'method' is more usually called a 'function' 
when it is defined at module level rather than class level.

> 
> 
> The object bound to the name used in the import statement is, well, an 
> object, so you can use the usual tests:
> 
> import mymodule
> try:
> mymodule.myfunction
> except AttributeError:
> print 'myfunction does not exist'
> 
> or use getattr(), or some of the introspection features available in the 
> "inspect" module.
> 

Ummm ... doesn't appear to scale well for multiple modules and multiple 
attributes & functions. Try something like this (mostly tested):

modules = ['foomod', 'barmod', 'brentstr', 'zotmod']
attrs = ['att1', 'att2', 'att3', 'MyString']
funcs = ['fun1', 'fun2', 'fun3']
# the above could even be read from file(s)
for modname in modules:
    try:
        mod = __import__(modname)
    except ImportError:
        print "module", modname, "not found"
        continue
    for attrname in attrs:
        try:
            attr = getattr(mod, attrname)
        except AttributeError:
            print "module %s has no attribute named %s" % \
                (modname, attrname)
            continue
        # check that attr is NOT a function (maybe)
    for funcname in funcs:
        # similar to above, but check that it IS a function
        pass
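
With the "is it a function" check filled in, the whole thing condenses into a helper that returns the problems instead of printing them (a sketch; callable() is the simplest test and also accepts built-in functions):

```python
def check_module(modname, attrs, funcs):
    # Report everything missing or mis-typed in one module.
    try:
        mod = __import__(modname)
    except ImportError:
        return ["module %s not found" % modname]
    problems = []
    for name in attrs:
        if not hasattr(mod, name):
            problems.append("%s: no attribute %s" % (modname, name))
    for name in funcs:
        if not callable(getattr(mod, name, None)):
            problems.append("%s: no function %s" % (modname, name))
    return problems

print(check_module("math", ["pi"], ["sqrt"]))   # []
```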


BTW, question for the OP: what on earth is the use-case for this? Bulk 
checking of scripts written by students?

Cheers,
John


Re: autoconfigure vss python question

2005-06-24 Thread John Machin
lode leroy wrote:
> Hi folks,
> 
> I'm trying to build a python module using MINGW on MSYS
> the "configure" script is determining where python is installed as follows:
> 
> python.exe -c 'import sys; print sys.prefix'
> c:\Python24
> 
> which is good on native windows (i.e. when invoked from CMD.EXE)
> 
> Is there a way to configure something in python or in the environment
> so that when invoked from MSYS, it would behave as follows: (note the 
> '/' vss '\')
> 
> python.exe -c 'import sys; print sys.prefix'
> c:/Python24

Any good reason for not using distutils?
See this:
http://www.python.org/doc/2.4.1/inst/tweak-flags.html#SECTION000622000



Re: Favorite non-python language trick?

2005-06-24 Thread John Machin
James wrote:
> Interesting thread ...
> 
> 1.) Language support for ranges as in Ada/Pascal/Ruby
> 1..10 rather than range(1, 10)

Did you mean 1..9 or 1...10 or both or neither?

Can this construct be used like this: (i+1)..n ? If not, what would you 
use? What is the frequency of range literals in the average piece of code?




Re: Favorite non-python language trick?

2005-06-24 Thread John Machin
James Stroud wrote:
> On Friday 24 June 2005 05:58 am, Steven D'Aprano wrote:
> 
>>with colour do begin
>>red := 0; blue := 255; green := 0;
>>end;
>>
>>instead of:
>>
>>colour.red := 0; colour.blue := 255; colour.green := 0;
>>
>>Okay, so maybe it is more of a feature than a trick, but I miss it and it
>>would be nice to have in Python.
> 
> 
> class color:# americanized
>   red = 0
>   blue = 255
>   green = 0
colour = color
centre = center
# etc etc
> 
> Less typing than pascal. Also avoids those stupid little colons.
> 

