Re: python backup script

2013-05-06 Thread MRAB

On 06/05/2013 23:12, [email protected] wrote:

On Monday, May 6, 2013 5:48:44 PM UTC-4, Enrico 'Henryx' Bianchi wrote:

Enrico 'Henryx' Bianchi wrote:

> cmd2 =  subprocess.Popen(['gzip' '-c'],
> shell=False,
> stdout=filename)

Doh, my fault:

cmd2 = subprocess.Popen(['gzip' '-c'],
shell=False,
stdout=filename
stdin=cmd1.stdout)

Enrico


Thank you Enrico. I've just tried your script and got this error:
  stdin=cmd1.stdout)
 ^
SyntaxError: invalid syntax

any idea?


Missing comma on the previous line.
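
For reference, here is a minimal Python 3 sketch of a corrected two-stage pipeline. The thread never shows cmd1, so both stages are stand-in Python child processes rather than the real dump command and gzip, and the names producer and consumer are hypothetical. The first line also shows a second pitfall in the quoted code: ['gzip' '-c'] has no comma between the strings, so Python joins them into the single argument 'gzip-c'.

```python
import subprocess
import sys

# Adjacent string literals are concatenated, so a missing comma inside the
# list silently produces one argument instead of two.
assert ['gzip' '-c'] == ['gzip-c']

# Stand-in for cmd1: any process that writes to its stdout.
producer = subprocess.Popen(
    [sys.executable, '-c', 'print("hello")'],
    stdout=subprocess.PIPE,
)

# Stand-in for cmd2: note the comma after every keyword argument.
consumer = subprocess.Popen(
    [sys.executable, '-c',
     'import sys; sys.stdout.write(sys.stdin.read().upper())'],
    shell=False,
    stdin=producer.stdout,
    stdout=subprocess.PIPE,
)

producer.stdout.close()  # the consumer now holds the only read end of the pipe
output, _ = consumer.communicate()
producer.wait()
print(output.decode())
```

With the real commands the second call would presumably use ['gzip', '-c'] and pass an open file object as stdout; if 'filename' is a path string, it would need to be opened first.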

--
http://mail.python.org/mailman/listinfo/python-list


Re: python backup script

2013-05-06 Thread MRAB

On 06/05/2013 23:40, MMZ wrote:

On Monday, May 6, 2013 6:12:28 PM UTC-4, Chris Angelico wrote:

On Tue, May 7, 2013 at 5:01 AM, MMZ  wrote:

> username = config.get('client', 'mmz')
> password = config.get('client', 'pass1')
> hostname = config.get('client', 'localhost')

Are 'mmz', 'pass1', and 'localhost' the actual values you want for
username, password, and hostname? If so, don't pass them through
config.get() at all - just use them directly. In fact, I'd be inclined
to just stuff them straight into the Database_list_command literal;
that way, it's clear how they're used, and the fact that you aren't
escaping them in any way isn't going to be a problem (tip: an
apostrophe in your password would currently break your script).

It's also worth noting that the ~/ notation is a shell feature. You
may or may not be able to use it in config.read().



ChrisA


Thanks Chris. you are right.
So I used them directly and removed configParser. The new error is:

Traceback (most recent call last):
  File "./bbk.py", line 11, in ?
    for database in os.popen(database_list_command).readlines():
NameError: name 'database_list_command' is not defined

any idea?


Check the spelling (remember that the name is case-sensitive).


Re: Why sfml does not play the file inside a function in this python code?

2013-05-07 Thread MRAB

On 07/05/2013 10:27, [email protected] wrote:

from tkinter import *
import sfml


window = Tk()
window.minsize( 640, 480 )


def sonido():
 file = sfml.Music.from_file('poco.ogg')
 file.play()


test = Button ( window, text = 'Sound test', command=sonido )
test.place ( x = 10, y = 60)

window.mainloop()




Using Windows 7, Python 3.3, sfml 1.3.0 library, the file it is played if i put 
it out of the function. ¿ what am i doing wrong ? Thanks.


Perhaps what's happening is that sonido starts playing it and then
returns, meaning that there's no longer a reference to it ('file' is
local to the function), so it's collected by the garbage collector.

If that's the case, try keeping a reference to it, perhaps by making
'file' global (in a simple program like this one, using global should
be OK).



Re: Why sfml does not play the file inside a function in this python code?

2013-05-07 Thread MRAB

On 07/05/2013 14:56, [email protected] wrote:

On Tuesday, 7 May 2013 12:53:25 UTC+2, MRAB wrote:

On 07/05/2013 10:27, [email protected] wrote:
> from tkinter import *
> import sfml
>
> window = Tk()
> window.minsize( 640, 480 )
>
> def sonido():
>  file = sfml.Music.from_file('poco.ogg')
>  file.play()
>
> test = Button ( window, text = 'Sound test', command=sonido )
> test.place ( x = 10, y = 60)
>
> window.mainloop()
>
> Using Windows 7, Python 3.3, sfml 1.3.0 library, the file it is played if i
> put it out of the function. ¿ what am i doing wrong ? Thanks.
>

Perhaps what's happening is that sonido starts playing it and then
returns, meaning that there's no longer a reference to it ('file' is
local to the function), so it's collected by the garbage collector.

If that's the case, try keeping a reference to it, perhaps by making
'file' global (in a simple program like this one, using global should
be OK).


Thanks. A global use of 'sonido' fix the problem. The garbage collector must be 
the point. But this code is part of a longer project. What can i do to fix it 
without the use of globals? I will use more functions like this, and i would 
like to keep learning python as well good programming methodology.
Thanks.


Presumably the details of the window are (or will be) hidden away in a
class, so you could make 'file' an attribute of an instance.
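
A minimal sketch of that idea, assuming nothing about sfml itself: the Sound class below is a hypothetical stand-in for an object such as sfml.Music, and App is a hypothetical class that would own the window. The weak references are only there to show which object is still alive afterwards.

```python
import gc
import weakref


class Sound:
    """Stand-in for an object such as sfml.Music that must stay alive."""

    def play(self):
        return "playing"


class App:
    def __init__(self):
        self.music = None  # holds the reference between calls

    def sonido(self):
        # Stored on the instance, so it survives after the method returns.
        self.music = Sound()
        return self.music.play()


def sonido_local():
    # Only a local name refers to the object; it can be reclaimed on return.
    sound = Sound()
    sound.play()
    return weakref.ref(sound)


app = App()
app.sonido()
kept = weakref.ref(app.music)
dropped = sonido_local()
gc.collect()
print(kept() is not None, dropped() is None)
```

In the tkinter program the button would then be created with command=app.sonido, so no global is needed.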

Also, please read this:

http://wiki.python.org/moin/GoogleGroupsPython

because gmail insists on adding extra linebreaks, which can be somewhat
annoying.


Re: Making safe file names

2013-05-07 Thread MRAB

On 07/05/2013 20:58, Andrew Berg wrote:

Currently, I keep Last.fm artist data caches to avoid unnecessary API calls and 
have been naming the files using the artist name. However,
artist names can have characters that are not allowed in file names for most 
file systems (e.g., C/A/T has forward slashes). Are there any
recommended strategies for naming such files while avoiding conflicts (I 
wouldn't want to run into problems for an artist named C-A-T or
CAT, for example)? I'd like to make the files easily identifiable, and there 
really are no limits on what characters can be in an artist name.


Conflicts won't occur if:

1. All of the characters of the artist's name are mapped to an encoding.

2. Different characters map to different encodings.

3. No encoding is a prefix of another encoding.

In practice, you'll be mapping most characters to themselves.
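
One scheme that appears to satisfy those three conditions is percent-encoding, sketched here with the standard library in Python 3. The helper name safe_name is hypothetical. The sketch does not address case-insensitive file systems, reserved names or length limits.

```python
from urllib.parse import quote, unquote


def safe_name(artist):
    # safe='' makes quote() escape '/' as well; '%' is always escaped,
    # so a literal "%2F" in a name cannot collide with an escaped slash.
    return quote(artist, safe='')


names = ['C/A/T', 'C-A-T', 'CAT', 'C%2FA%2FT']
for artist in names:
    encoded = safe_name(artist)
    print(artist, '->', encoded, '->', unquote(encoded))
```

Because unquote() reverses the mapping, the original artist name can be recovered from the file name.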



Re: MySQL Database

2013-05-08 Thread MRAB

On 08/05/2013 19:52, Kevin Holleran wrote:

Hello,

I want to connect to a MySQL database, query for some records,
manipulate some data, and then update the database.

When I do something like this:

db_c.execute("SELECT a, b FROM Users")

for row in db_c.fetchall():
    (r,d) = row[0].split('|')
    (g,e) = domain.split('.')
    db_c.execute("UPDATE Users SET g = '"+ g + "' WHERE a ='"+ row[0])


Will using db_c to update the database mess up the loop that is cycling
through db_c.fetchall()?


You shouldn't be building an SQL string like that because it's
susceptible to SQL injection. You should be doing it more like this:

db_c.execute("UPDATE Users SET g = %s WHERE a = %s", (g, row[0]))

The values will then be handled safely for you.
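
A self-contained sketch of the same pattern, using the standard library's sqlite3 as a stand-in so that it runs without a MySQL server. sqlite3 uses ? as its placeholder where the MySQL drivers use %s, and the table contents are made up for illustration. Since fetchall() returns all the rows before the loop starts, reusing the cursor inside the loop should not disturb the iteration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE Users (a TEXT, g TEXT)")
cur.executemany("INSERT INTO Users (a) VALUES (?)",
                [("bob|example.com",), ("o'brien|example.org",)])

cur.execute("SELECT a FROM Users")
for (a,) in cur.fetchall():          # fetchall() returns all rows up front
    name, domain = a.split('|')
    g, _tld = domain.split('.')
    # Placeholders keep the apostrophe in "o'brien" from breaking the SQL.
    cur.execute("UPDATE Users SET g = ? WHERE a = ?", (g, a))
conn.commit()

rows = cur.execute("SELECT a, g FROM Users ORDER BY a").fetchall()
print(rows)
```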


Re: help on Implementing a list of dicts with no data pattern

2013-05-08 Thread MRAB

On 09/05/2013 00:47, rlelis wrote:

Hi guys,

I'm working on this long file, where i have to keep reading and
storing different excerpts of text (data) in different variables (list).

Once done that i want to store in dicts the data i got from the lists mentioned 
before. I want them on a list of dicts for later RDBMs purpose's.

The data i'm working with, don't have fixed pattern (see example bellow), so 
what i'm doing is for each row, i want to store combinations of  word/value 
(Key-value) to keep track of all the data.

My problem is that once i'm iterating over the list (original one a.k.a 
file_content in the link), then i'm nesting several if clause to match
the keys i want. Done that i select the keys i want to give them values and 
lastly i append that dict into a new list. The problem here is that i end up 
always with the last line repeated several times for each row it found's.

Please take a look on what i have now:
http://pastebin.com/A9eka7p9


You're creating a dict for highway_dict and a dict for aging_dict, and
then using those dicts for every iteration of the 'for' loop.

You're also appending both of the dicts onto the 'queue_row' list for
every iteration of the 'for' loop.

I think that what you meant to do was, for each match, to create a
dict, populate it, and then append it to the list.
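
The pastebin code is not reproduced here, so the following is only a sketch of the two shapes with made-up input lines and keys. It shows why a dict created once outside the loop ends up repeated, and what a fresh dict per match looks like.

```python
lines = ["highway A1 speed 120", "aging bridge 40", "highway B2 speed 90"]

# Buggy shape: one dict created before the loop and appended every time,
# so every list entry is the same object holding the last values.
shared = {}
buggy = []
for line in lines:
    fields = line.split()
    shared['kind'] = fields[0]
    shared['name'] = fields[1]
    buggy.append(shared)

# Fixed shape: a fresh dict per matching line.
queue_row = []
for line in lines:
    fields = line.split()
    if fields[0] == 'highway':
        queue_row.append({'kind': fields[0], 'name': fields[1]})

print(buggy)
print(queue_row)
```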


Re: Urgent:Serial Port Read/Write

2013-05-09 Thread MRAB

On 09/05/2013 16:35, chandan kumar wrote:

Hi all,
I'm new to python and facing issue using serial in python.I'm facing the
below error

ser.write(port,command)
NameError: global name 'ser' is not defined
Please find the attached script and let me know whats wrong in my script
and also how can i read data from serial port for the  same script.


Best Regards,
Chandan.


RunScripts.py


import time
import os
import serial
import glob


ER_Address = [[0x0A,0x01,0x08,0x99,0xBB,0xBB,0xBB,0xBB,0xBB,0xBB,0xBB]]
"""

Function Name: RunSequence

Function Description:
-
A RunSequence function has Multiple calls to the RunSuite function, each call 
is a single testcase consisting of all the parameters
required by the RunTesSuite.
-
"""
def WriteSerialData(command):


'ser' isn't a local variable (local to this function, that is), nor is
it a global variable (global in this file).


    ser.write(port,command)

def SendPacket(Packet):
    str = chr(len(Packet)) + Packet  # Concatenation of Packet with the PacketLength
    print str
    WriteSerialData(str)


def CreateFrame(Edata):


It's more efficient to build a list of the characters and then join
them together into a string in one step than to build the string one
character at a time.

Also, indexing into the list is considered 'unPythonic'; it's much 
simpler to do it this way:


return chr(0x12) + "".join(chr(d) for d in Edata)


    evt = chr(0x12)
    evt = evt + chr(Edata[0])
    for i in range (1, len(Edata)):
        evt = evt + chr(Edata[i])
    return evt

def SetRequest(data):
    print data
    new = []
    new = sum(data, [])
    Addr = CreateFrame(new)
    SendPacket(Addr)
    print "SendPacket Done"
    ReadPacket()


def OpenPort(COMPort,BAUDRATE):
    """
    This function reads the serial port and writes it.
    """
    comport=COMPort
    BaudRate=BAUDRATE
    try:
        ser = serial.Serial(
            port=comport,
            baudrate=BaudRate,
            bytesize=serial.EIGHTBITS,
            parity=serial.PARITY_NONE,
            stopbits=serial.STOPBITS_ONE,
            timeout=10,
            xonxoff=0,
            rtscts=0,
            dsrdtr=0
        )

        if ser.isOpen():
            print "Port Opened"
            ser.write("Chandan")
            string1 = ser.read(8)
            print string1


This function returns either ser ...


            return ser
        else:
            print "Port CLosed"
            ser.close()


... or 3 ...

            return 3
    except serial.serialutil.SerialException:
        print "Exception"
        ser.close()


... or None!






if __name__ == "__main__":

    CurrDir=os.getcwd()
    files = glob.glob('./*pyc')
    for f in files:
        os.remove(f)


OpenPort returns either ser or 3 or None, but the result is just
discarded.


    OpenPort(26,9600)
    SetRequest(ER_Address)
    #SysAPI.SetRequest('ER',ER_Address)

    print "Test Scripts Execution complete"
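
Putting the comments above together, one possible shape (a Python 3 sketch, not a drop-in replacement) is to keep the object returned when the port is opened and pass it to the functions that need it. The names create_frame and send_packet are hypothetical, and io.BytesIO stands in for the real port so the sketch runs without hardware. Note that pySerial's write() takes only the data, so the two-argument call ser.write(port,command) would need changing as well.

```python
import io


def create_frame(edata):
    # Python 3 equivalent of chr(0x12) + "".join(chr(d) for d in Edata).
    return bytes([0x12]) + bytes(edata)


def send_packet(port, packet):
    # The port object is passed in explicitly instead of relying on a
    # name that only exists inside another function.
    framed = bytes([len(packet)]) + packet
    port.write(framed)
    return framed


# io.BytesIO stands in for the object returned by serial.Serial(...).
fake_port = io.BytesIO()
address = [0x0A, 0x01, 0x08, 0x99]
send_packet(fake_port, create_frame(address))
print(fake_port.getvalue())
```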





Re: object.enable() anti-pattern

2013-05-09 Thread MRAB

On 09/05/2013 19:21, Steven D'Aprano wrote:

On Thu, 09 May 2013 09:07:42 -0400, Roy Smith wrote:


In article <[email protected]>,
 Steven D'Aprano  wrote:


There is no sensible use-case for creating a file without opening it.


Sure there is.  Sometimes just creating the name in the file system is
all you want to do.  That's why, for example, the unix "touch" command
exists.


Since I neglected to make it clear above that I was still talking about
file objects, rather than files on disk, I take responsibility for this
misunderstanding. I thought that since I kept talking about file
*objects* and *constructors*, people would understand that I was talking
about in-memory objects rather than on-disk files. Mea culpa.

So, let me rephrase that sentence, and hopefully clear up any further
misunderstandings.

There is no sensible use-case for creating a file OBJECT unless it
initially wraps an open file pointer.


You might want to do this:

f = File(path)
if f.exists():
    ...

This would be an alternative to:

if os.path.exists(path):
    ...
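
For what it's worth, the standard library's pathlib module (added in Python 3.4, after this thread) offers an object of this kind: a Path can be constructed without opening or touching the file. The sketch below uses a temporary directory and made-up file names.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as folder:
    missing = Path(folder) / 'no_such_file.txt'   # constructing it opens nothing
    present = Path(folder) / 'data.txt'
    present.write_text('hello')

    # Path.exists() and os.path.exists() should agree on both paths.
    results = (missing.exists(), present.exists())
    same = (os.path.exists(missing), os.path.exists(present))

print(results, same)
```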


This principle doesn't just apply to OOP languages. The standard C I/O
library doesn't support creating a file descriptor unless it is a file
descriptor to an open file. open() has the semantics:

"It shall create an open file description that refers to a file and a
file descriptor that refers to that open file description."

http://pubs.opengroup.org/onlinepubs/9699919799/functions/open.html

and there is no corresponding function to create a *closed* file
description. (Because such a thing would be pointless.)


[snip]



Re: Unicode humor

2013-05-10 Thread MRAB

On 10/05/2013 17:07, rusi wrote:

On May 10, 8:32 pm, Chris Angelico  wrote:

On Sat, May 11, 2013 at 1:24 AM, Ned Batchelder  wrote:
> On 5/10/2013 11:06 AM, jmfauth wrote:

>> On 8 mai, 15:19, Roy Smith  wrote:

>>> Apropos to any of the myriad unicode threads that have been going on
>>> recently:

>>>http://xkcd.com/1209/

>> --

>> This reflects a lack of understanding of Unicode.

>> jmf

> And this reflects a lack of a sense of humor.  :)

Isn't that a crime in the UK?

ChrisA


The problem with English humour (as against standard humor) is that
its not unicode compliant


British humour includes "double entendre", which is not French-compliant.


Re: Getting ASCII encoding where unicode wanted under Py3k

2013-05-13 Thread MRAB

On 13/05/2013 16:59, Jonathan Hayward wrote:

I have a Py3k script, pasted below. When I run it I get an error about
ASCII codecs that can't handle byte values that are too high.

The error that I am getting is:

UnicodeEncodeError: 'ascii' codec can't encode character '\u0161' in position 1442: ordinal not in range(128)
   args = ('ascii', "Content-Type: text/html\n\n\n\n...ype='submit'>\n \n \n", 1442, 1443, 'ordinal not in range(128)')
   encoding = 'ascii'
   end = 1443
   object = "Content-Type: text/html\n\n\n\n...ype='submit'>\n \n \n"
   reason = 'ordinal not in range(128)'
   start = 1442
   with_traceback =

(And that was posted to StackOverflow--one shot in the dark answer so far.)

My code is below. What should I be doing differently to be, in the most
immediate sense, calls to '''%(foo)s''' % locals()?


[snip]
The 'print' function sends its output to sys.stdout, which, in your
case, is set up to encode to ASCII for output, but '\u0161' can't be
encoded to ASCII.

Try encoding to UTF-8 instead:

import sys
from codecs import getwriter

sys.stdout = getwriter("utf-8")(sys.stdout.buffer)
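
To see what the wrapper does without touching the real sys.stdout, here is a small sketch in which io.BytesIO stands in for sys.stdout.buffer. The variable names raw and out are only for illustration.

```python
import io
from codecs import getwriter

# io.BytesIO stands in for sys.stdout.buffer so the bytes can be inspected.
raw = io.BytesIO()
out = getwriter("utf-8")(raw)
out.write("\u0161\n")          # the character from the traceback
print(raw.getvalue())

# On Python 3.7 and later the stream can also be reconfigured in place:
# sys.stdout.reconfigure(encoding="utf-8")
```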



Re: Unicode humor

2013-05-15 Thread MRAB

On 15/05/2013 14:19, Jean-Michel Pichavant wrote:

>> >> This reflects a lack of understanding of Unicode.
>>
>> >> jmf
>>
>> > And this reflects a lack of a sense of humor.  :)
>>
>> Isn't that a crime in the UK?
>>
>> ChrisA
>
> The problem with English humour (as against standard humor) is that
> its not unicode compliant
>
British humour includes "double entendre", which is not
French-compliant.


I didn't get that one. Which possibly confirm MRAB's statement.


It's called "double entendre" in English (using French words, from "à
double entente"), but that isn't correct French ("double sens").



Re: Unicode humor

2013-05-15 Thread MRAB

On 15/05/2013 18:04, Jean-Michel Pichavant wrote:

- Original Message -

On 15/05/2013 14:19, Jean-Michel Pichavant wrote:

This reflects a lack of understanding of Unicode.



jmf



And this reflects a lack of a sense of humor.  :)


Isn't that a crime in the UK?

ChrisA


The problem with English humour (as against standard humor)
is that its not unicode compliant


British humour includes "double entendre", which is not
French-compliant.


I didn't get that one. Which possibly confirm MRAB's statement.


It's called "double entendre" in English (using French words, from
"à double entente"), but that isn't correct French ("double
sens").


Thanks for clarifying, I didn't know "double entendre" had actually a
meaning in english, it's obviously 2 french words but this is the
first time I see them used together.


Occasionally speakers of one language will borrow a word or phrase from
another language and use it in a way a native speaker wouldn't use (or
even understand).


Re: How to write fast into a file in python?

2013-05-19 Thread MRAB

On 19/05/2013 04:53, Carlos Nepomuceno wrote:



Date: Sat, 18 May 2013 22:41:32 -0400
From: [email protected]
To: [email protected]
Subject: Re: How to write fast into a file in python?

On 05/18/2013 01:00 PM, Carlos Nepomuceno wrote:

Python really writes '\n\r' on Windows. Just check the files.


That's backwards. '\r\n' on Windows, IF you omit the b in the mode when
creating the file.


Indeed! My mistake just made me find out that Acorn used that inversion on 
Acorn MOS.

According to this[1] (at page 449) the OSNEWL routine outputs '\n\r'.

What the hell those guys were thinking??? :p


Doing it that way saved a few bytes.

Code was something like this:

FFE3 .OSASCI CMP #&0D
FFE5         BNE OSWRCH
FFE7 .OSNEWL LDA #&0A
FFE9         JSR OSWRCH
FFEC         LDA #&0D
FFEE .OSWRCH ...

This means that the contents of the accumulator would always be
preserved by a call to OSASCI.


"OSNEWL
This call issues an LF CR (line feed, carriage return) to the currently selected
output stream. The routine is entered at &FFE7."

[1] http://regregex.bbcmicro.net/BPlusUserGuide-1.07.pdf






Re: Accessing Json data (I think I am nearly there) complete beginner

2013-05-23 Thread MRAB

On 23/05/2013 17:09, Andrew Edwards-Adams wrote:

Hey guys
I think its worth stating that I have been trying to code for 1 week.
I am trying to access some Json data. My innitial code was the below:

"import mechanize
import urllib
import re

def getData():
    post_url = "http://www.tweetnaps.co.uk/leaderboards/leaderboard_json/all_time"
    browser = mechanize.Browser()
    browser.set_handle_robots(False)
    browser.addheaders = [('User-agent', 'Firefox')]

    #These are the parameters you've got from checking with the aforementioned tools
    parameters = {'page' : '1',
                  'rp' : '10',
                  'sortname' : 'total_pl',
                  'sortorder' : 'desc'}
    #Encode the parameters
    data = urllib.urlencode(parameters)
    trans_array = browser.open(post_url,data).read().decode('UTF-8')

    #print trans_array

    myfile = open("test.txt", "w")
    myfile.write(trans_array)
    myfile.close()

getData()

raw_input("Complete")"

I was recommended to use the following code to access the Json data directly, 
however I cannot get it to return anything. I think the guy that recommended me 
this method must have got something wrong? Or perhaps I am simply incompetent:

import mechanize
import urllib
import json
def getData():
    post_url = "http://www.tweetnaps.co.uk/leaderboards/leaderboard_json/current_week"
    browser = mechanize.Browser()
    browser.set_handle_robots(False)
    browser.addheaders = [('User-agent', 'Firefox')]

    #These are the parameters you've got from checking with the aforementioned tools
    parameters = {'page' : '1',
                  'rp' : '50',
                  'sortname' : 'total_pl',
                  'sortorder' : 'desc'
                 }
    #Encode the parameters
    data = urllib.urlencode(parameters)
    trans_array = browser.open(post_url,data).read().decode('UTF-8')

    text1 = json.loads(trans_array)
    print text1['rows'][0]['id']  #play around with these values to access different data..

getData()

He told me to "#play around with these values to access different data.." 
really cant get anything out of this, any ideas?

Many thanks AEA


I've just tried it. It prints "1048".
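
To "play around with these values" it can help to look at the structure first. The sketch below (Python 3) uses a made-up payload: only the 'rows' and 'id' keys come from the code above, and every other key and value is an assumption for illustration.

```python
import json

# Hypothetical payload; only 'rows' and 'id' appear in the original code.
trans_array = '''
{"page": 1, "total": 2,
 "rows": [{"id": "1048", "cell": {"total_pl": 12.5}},
          {"id": "2077", "cell": {"total_pl": 9.0}}]}
'''

data = json.loads(trans_array)
print(sorted(data.keys()))            # see which top-level keys exist
ids = [row['id'] for row in data['rows']]
print(ids)
```

json.loads() returns ordinary dicts and lists, so the usual indexing and loops apply.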


Re: Cutting a deck of cards

2013-05-26 Thread MRAB

On 26/05/2013 18:52, RVic wrote:

Suppose I have a deck of cards, and I shuffle them

import random
cards = []
decks = 6
cards = list(range(13 * 4 * decks))
random.shuffle(cards)

So now I have an array of cards. I would like to cut these cards at some random 
point (between 1 and 13 * 4 * decks - 1, moving the lower half of that to the 
top half of the cards array.

For some reason, I can't see how this can be done (I know that it must be a 
simple line or two in Python, but I am really stuck here). Anyone have any 
direction they can give me on this? Thanks, RVic, python newbie


The list from its start up to, but excluding, index 'i' is cards[ : i],
and the list from index 'i' to its end is cards[i : ].

Now concatenate those slices in the opposite order.
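
As a sketch, with the cut point chosen by random.randrange (the names i and cut are only for illustration):

```python
import random

decks = 6
cards = list(range(13 * 4 * decks))
random.shuffle(cards)

# randrange(1, len(cards)) picks a cut point from 1 to len(cards) - 1.
i = random.randrange(1, len(cards))
cut = cards[i:] + cards[:i]
```

The card that was at index i is now on top, and no cards are lost or duplicated.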


Re: Encodign issue in Python 3.3.1 (once again)

2013-05-28 Thread MRAB

On 28/05/2013 17:00, Νίκος Γκρ33κ wrote:

I do not know where to find connections.py Michael.

But I do not understand, since I am using the following 2 statements, why a
unicode error remains.

#needed line, script does *not* work without it
sys.stdout = os.fdopen(1, 'w', encoding='utf-8')

# connect to database
con = pymysql.connect( db = 'pelatologio', host = 'localhost', user = 'myself', 
passwd = 'mypass', init_command='SET NAMES UTF8' )
cur = con.cursor()

Shall I change the connector from 'pymysql' => 'MySQLdb' ?


A quick look at the documentation tells me that the charset can be
specified in the 'connect' call, something like this:

con = pymysql.connect( db = 'pelatologio', host = 'localhost', user =
'myself', passwd = 'mypass', init_command='SET NAMES UTF8', charset =
'utf8' )




Re: The state of pySerial

2013-05-29 Thread MRAB

On 29/05/2013 22:38, Terry Jan Reedy wrote:

On 5/29/2013 4:00 PM, William Ray Wing wrote:

On May 29, 2013, at 2:23 PM, Ma Xiaojun  wrote:


Hi, all.

pySerial is probably "the solution" for serial port programming.
Physical serial port is dead on PC but USB-to-Serial give it a second
life. Serial port stuff won't interest end users at all. But it is
still used in the EE world and so on. Arduino uses it to upload
programs. Sensors may use serial port to communicate with PC. GSM
Modem also uses serial port to communicate with PC.

Unfortunately, the pySerial project doesn't seem to be in a good state. I
find pySerial + Python 3.3 broken on my machine (Python 2.7 is OK) .
There are unanswered outstanding bugs, PyPI page has 2.6 while SF
homepage still gives 2.5.

Any idea?


Let me add another vote/request for pySerial support.  I've been using it with 
python 2.7 on OS-X, unaware that there wasn't a path forward to python 3.x.  If 
an external sensor absolutely positively has to be readable, then RS-232 is the 
only way to go.  USB interfaces can and do lock up if recovery from a power 
failure puts power on the external side before the computer has finished 
initializing the CPU side.  RS-232, bless its primitive heart, could care less.


Then 'someone' should ask the author his intentions and offer to help or
take over.


This page:

http://pyserial.sourceforge.net/pyserial.html#requirements

says:

"Python 2.3 or newer, including Python 3.x"


I did some RS-232 interfacing in the  1980s, and once past the fiddly
start/stop/parity bit, baud rate, and wiring issues, I had a program run
connected to multiple machines for years with no more interface problems.





Re: User Input

2013-05-30 Thread MRAB

On 30/05/2013 12:48, Eternaltheft wrote:

On Thursday, May 30, 2013 7:33:41 PM UTC+8, Eternaltheft wrote:

Hi, I'm having trouble with how to prompt the user to enter a file name
and how to set up conditions. For example, if there's no file name
input by the user, a default is returned


Thanks for such a fast reply! and no im not using raw input, im just
using input. does raw_input work on python 3?


In Python 2 it's called "raw_input" and in Python 3 it's called "input".

Python 2 does have a function called "input", but it's not recommended
(it's dangerous because it's equivalent to "eval(raw_input())", which
will evaluate _whatever_ is entered).
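
For the original question (falling back to a default when nothing is entered), a minimal Python 3 sketch might look like this. The function name ask_filename, the default 'output.txt' and the prompt_func parameter are all hypothetical.

```python
def ask_filename(default='output.txt', prompt_func=input):
    # prompt_func is a hypothetical hook so the function can be tested
    # without a keyboard; by default it is the built-in input().
    answer = prompt_func('File name [%s]: ' % default).strip()
    return answer or default
```

Calling ask_filename() prompts on the console; an empty reply (just pressing Enter) gives the default because an empty string is false.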


Re: The state of pySerial

2013-05-30 Thread MRAB

On 30/05/2013 02:32, Ma Xiaojun wrote:

I've already mailed the author, waiting for reply.

For Windows people, downloading a exe get you pySerial 2.5, which
list_ports and miniterm feature seems not included. To use 2.6,
download the tar.gz and use standard "setup.py install" to install it
(assume you have .py associated) . There is no C compiling involved in
the installation process.

For whether Python 3.3 is supported or not. I observed something like:
http://paste.ubuntu.com/5715275/ .

miniterm works for Python 3.3 at this time.


The problem there is that 'desc' is a bytestring, but the regex pattern
can match only a Unicode string (Python 3 doesn't let you mix
bytestrings and Unicode strings like Python 2 does).

The simplest fix would probably be to decode 'desc' to Unicode.
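
The pasted traceback is not reproduced here, so the following is only a sketch with a made-up description and pattern. It shows the failure when a str pattern meets a bytestring, and the decode step that avoids it.

```python
import re

desc = b'USB Serial Port (COM3)'      # hypothetical bytestring description
pattern = r'\((COM\d+)\)'             # str pattern, as in the failing code

try:
    re.search(pattern, desc)
except TypeError as exc:
    print('mixing fails:', exc)

# Decode first; errors='replace' avoids failing on unexpected bytes.
text = desc.decode('ascii', errors='replace')
match = re.search(pattern, text)
print(match.group(1))
```

Which encoding to decode with depends on where the bytes come from; 'ascii' with errors='replace' is just a conservative assumption.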


Re: How clean/elegant is Python's syntax?

2013-05-30 Thread MRAB

On 30/05/2013 19:44, Chris Angelico wrote:

On Fri, May 31, 2013 at 4:36 AM, Ian Kelly  wrote:

On Wed, May 29, 2013 at 8:49 PM, rusi  wrote:

On May 30, 6:14 am, Ma Xiaojun  wrote:

What interest me is a one liner:
print '\n'.join(['\t'.join(['%d*%d=%d' % (j,i,i*j) for i in
range(1,10)]) for j in range(1,10)])


Ha,Ha! The join method is one of the (for me) ugly features of python.
You can sweep it under the carpet with a one-line join function and
then write clean and pretty code:

#joinwith
def joinw(l,sep): return sep.join(l)


I don't object to changing the join method (one of the more
shoe-horned string methods) back into a function, but to my eyes
you've got the arguments backward.  It should be:

def join(sep, iterable): return sep.join(iterable)


Trouble is, it makes some sense either way. I often put the larger
argument first - for instance, I would write 123412341324*5 rather
than the other way around - and in this instance, it hardly seems as
clear-cut as you imply. But the function can't be written to take them
in either order, because strings are iterable too. (And functions that
take args either way around aren't better than those that make a
decision.)


An additional argument (pun not intended) for putting sep second is
that you can give it a default value:

   def join(iterable, sep=""): return sep.join(iterable)



Re: lstrip problem - beginner question

2013-06-04 Thread MRAB

On 04/06/2013 16:21, mstagliamonte wrote:

Hi everyone,

I am a beginner in python and trying to find my way through... :)

I am writing a script to get numbers from the headers of a text file.

If the header is something like:
h01 = ('>scaffold_1')
I just use:
h01.lstrip('>scaffold_')
and this returns me '1'

But, if the header is:
h02: ('>contig-100_0')
if I use:
h02.lstrip('>contig-100_')
this returns me with: ''
...basically nothing. What surprises me is that if I do in this other way:
h02b = h02.lstrip('>contig-100')
I get h02b = ('_1')
and subsequently:
h02b.lstrip('_')
returns me with: '1' which is what I wanted!

Why is this happening? What am I missing?


The methods 'lstrip', 'rstrip' and 'strip' don't strip a string, they
strip characters.

You should think of the argument as a set of characters to be removed.

This code:

h01.lstrip('>scaffold_')

will return the result of stripping the characters '>', '_', 'a', 'c',
'd', 'f', 'l', 'o' and 's' from the left-hand end of h01.

A simpler example:

>>> 'xyyxyabc'.lstrip('xy')
'abc'

It strips the characters 'x' and 'y' from the string, not the string
'xy' as such.

They are that way because they have been in Python for a long time,
long before sets and such like were added to the language.
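
To remove a leading string rather than a set of characters, one option is to test with startswith and slice. The helper name remove_prefix is hypothetical; Python 3.9 and later also provide str.removeprefix for this.

```python
def remove_prefix(text, prefix):
    # Drop prefix only when text really starts with it.
    if text.startswith(prefix):
        return text[len(prefix):]
    return text


print(remove_prefix('>scaffold_1', '>scaffold_'))
print(remove_prefix('>contig-100_0', '>contig-100_'))
print(repr('>contig-100_0'.lstrip('>contig-100_')))   # '0' is in the set too
```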


Re: Changing filenames from Greeklish => Greek (subprocess complain)

2013-06-05 Thread MRAB

On 05/06/2013 06:40, Michael Torrie wrote:

On 06/04/2013 10:15 PM, Νικόλαος Κούρας wrote:

One of my Greek filenames is "Ευχή του Ιησού.mp3". Just a Greek
filename with spaces. Is there a problem when a filename contain both
english and greek letters? Isn't it still a unicode string?

All i did in my CentOS was 'mv "Euxi tou Ihsou.mp3" "Ευχή του
Ιησού.mp3"

and the displayed filename after 'ls -l' returned was:

is -rw-r--r-- 1 nikos nikos 3511233 Jun 4 14:11 \305\365\367\336\
\364\357\365\ \311\347\363\357\375.mp3

There is no way at all to check the charset used to store it in hdd?
It should be UTF-8, but it doesn't look like it. Is there some linxu
command or some python command that will print out the actual
encoding of '\305\365\367\336\ \364\357\365\
\311\347\363\357\375.mp3' ?


I can see that you are starting to understand things. I can't answer
your question (don't know the answer), but you're correct about one
thing.  A filename is just a sequence of bytes.  We'd hope it would be
utf-8, but it could be anything.  Even worse, it's not possible to tell
from a byte stream what encoding it is unless we just try one and see
what happens.  Text editors, for example, have to either make a guess
(utf-8 is a good one these days), or ask, or try to read from the first
line of the file using ascii and see if there's a source code character
set command to give it an idea.


From the previous posts I guessed that the filename might be encoded
using ISO-8859-7:

>>> s = b"\305\365\367\336\ \364\357\365\ \311\347\363\357\375.mp3"
>>> s.decode("iso-8859-7")
'Ευχή\\ του\\ Ιησού.mp3'

Yes, that looks the same.


Re: Changing filenames from Greeklish => Greek (subprocess complain)

2013-06-05 Thread MRAB

On 05/06/2013 18:43, Νικόλαος Κούρας wrote:

On Wednesday, 5 June 2013 8:56:36 AM UTC+3, Steven D'Aprano wrote:

Somehow, I don't know how because I didn't see it happen, you have one or
more files in that directory where the file name as bytes is invalid when
decoded as UTF-8, but your system is set to use UTF-8. So to fix this you
need to rename the file using some tool that doesn't care quite so much
about encodings. Use the bash command line to rename each file in turn
until the problem goes away.

But renaming via shell access like 'mv 'Euxi tou Ihsou.mp3' 'Ευχή του Ιησου.mp3'
leads to that unknown encoding of this bytestream '\305\365\367\336\
\364\357\365\ \311\347\363\357\375.mp3'

But please tell me Steven what linux tool you think can encode the weird
filename to proper 'Ευχή του Ιησου.mp3' utf-8?

Or we can write a script as I suggested to decode back the bytestream using all
sorts of available decode charsets boiling down to the original greek letters.


Using Python, I think you could get the filenames using os.listdir,
passing the directory name as a bytestring so that it'll return the
names as bytestrings.

Then, for each name, you could decode from its current encoding and
encode to UTF-8 and rename the file, passing the old and new paths to
os.rename as bytestrings.
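
A rough sketch of those steps in Python 3, assuming the current encoding really is ISO-8859-7. The function names and the dry_run flag are hypothetical, and skipping names that already decode as UTF-8 is an added heuristic rather than part of the suggestion above. It should be tried on copies first.

```python
import os


def reencode_name(name, source_encoding='iso-8859-7'):
    # bytes in one encoding -> text -> bytes in UTF-8
    return name.decode(source_encoding).encode('utf-8')


def rename_all(folder, dry_run=True):
    # folder is a bytestring, so os.listdir() returns bytestrings too.
    planned = []
    for old in os.listdir(folder):
        try:
            old.decode('utf-8')
            continue                      # already valid UTF-8, leave it alone
        except UnicodeDecodeError:
            pass
        new = reencode_name(old)
        planned.append((old, new))
        if not dry_run:
            os.rename(os.path.join(folder, old), os.path.join(folder, new))
    return planned
```

With dry_run=True the function only returns the (old, new) pairs it would rename, which makes it possible to inspect the plan before changing anything.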


Re: Changing filenames from Greeklish => Greek (subprocess complain)

2013-06-06 Thread MRAB

On 06/06/2013 04:43, Νικόλαος Κούρας wrote:

On Wednesday, 5 June 2013 9:43:18 PM UTC+3, Νικόλαος Κούρας wrote:
> On Wednesday, 5 June 2013 9:32:15 PM UTC+3, MRAB wrote:
> > On 05/06/2013 18:43, Νικόλαος Κούρας wrote:
> > > On Wednesday, 5 June 2013 8:56:36 AM UTC+3, Steven D'Aprano wrote:
> > >
> > > Somehow, I don't know how because I didn't see it happen, you have one or
> > > more files in that directory where the file name as bytes is invalid when
> > > decoded as UTF-8, but your system is set to use UTF-8. So to fix this you
> > > need to rename the file using some tool that doesn't care quite so much
> > > about encodings. Use the bash command line to rename each file in turn
> > > until the problem goes away.
> > >
> > > But renaming via shell access like 'mv 'Euxi tou Ihsou.mp3' 'Ευχή του Ιησου.mp3'
> > > leads to that unknown encoding of this bytestream '\305\365\367\336\
> > > \364\357\365\ \311\347\363\357\375.mp3'
> > >
> > > But please tell me Steven what linux tool you think can encode the
> > > weird filename to proper 'Ευχή του Ιησου.mp3' utf-8?
> > >
> > > Or we can write a script as I suggested to decode back the bytestream
> > > using all sorts of available decode charsets boiling down to the original greek letters.
>
> Actually you were correct, I was typing Greek and I saw the filename here in
> Google Groups as:
>
> > > But renaming ia hsell access like 'mv 'Euxi tou Ihsou.mp3' 'οΏ½οΏ½οΏ½οΏ½
> > > οΏ½οΏ½οΏ½ οΏ½οΏ½οΏ½οΏ½οΏ½.mp3
>
> so maybe the filenames have to be decoded to greek-iso, but then again they
> contain both Greek letters and extensions in English chars like '.mp3'
>
> > Using Python, I think you could get the filenames using os.listdir,
> > passing the directory name as a bytestring so that it'll return the
> > names as bytestrings.
> >
> > Then, for each name, you could decode from its current encoding and
> > encode to UTF-8 and rename the file, passing the old and new paths to
> > os.rename as bytestrings.
>
> I am not sure I follow:
>
> Change this:
>
> # Compute a set of current fullpaths
> fullpaths = set()
> path = "/home/nikos/public_html/data/apps/"
>
> for root, dirs, files in os.walk(path):
>     for fullpath in files:
>         fullpaths.add( os.path.join(root, fullpath) )
>
> to what to make the full url readable by files.py?

MRAB, can you please explain your proposed solution more clearly?

I was suggesting a way to rename the files so that their names are 
encoded in UTF-8 (they appear to be encoded in ISO-8859-7).
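
To see why ISO-8859-7 is a plausible guess, here is a small sketch that takes the escaped byte sequence quoted earlier in the thread and tries both encodings. The variable names are made up for illustration; nothing here touches the real files.

```python
# The bytes the shell showed for the problem filename, written as octal escapes.
raw_name = b'\305\365\367\336 \364\357\365 \311\347\363\357\375.mp3'

# Strict UTF-8 decoding rejects this byte sequence...
try:
    raw_name.decode('utf-8')
    is_utf8 = True
except UnicodeDecodeError:
    is_utf8 = False

# ...whereas ISO-8859-7 turns it into readable Greek.
greek_name = raw_name.decode('iso-8859-7')

print(is_utf8)      # False
print(greek_name)   # Ευχή του Ιησού.mp3
```

A readable result does not prove the encoding, but it is good evidence when the text is known to be Greek.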


You MUST TEST IT thoroughly first, of course, before trying it on the 
actual files.


It could go something like this:


import os

# Give the path as a bytestring so that we'll get the names as bytestrings.
root_folder = b"/home/nikos/public_html/data/apps/"

# Setting TESTING to True will make it print out what renamings it will do,
# but not actually do them.
TESTING = True

# Walk through the files.
for root, dirs, files in os.walk(root_folder):
    for name in files:
        try:
            # Is this name encoded in UTF-8?
            name.decode("utf-8")
        except UnicodeDecodeError:
            # Decoding from UTF-8 failed, which means that the name is not
            # valid UTF-8.

            # It appears (from elsewhere) that the names are encoded in
            # ISO-8859-7, so decode from that and re-encode to UTF-8.
            new_name = name.decode("iso-8859-7").encode("utf-8")

            old_path = os.path.join(root, name)
            new_path = os.path.join(root, new_name)
            if TESTING:
                print("Will rename {!r} to {!r}".format(old_path, new_path))
            else:
                print("Renaming {!r} to {!r}".format(old_path, new_path))
                os.rename(old_path, new_path)

--
http://mail.python.org/mailman/listinfo/python-list


Re: Changing filenames from Greeklish => Greek (subprocess complain)

2013-06-06 Thread MRAB

On 06/06/2013 13:04, Νικόλαος Κούρας wrote:

First of all, thank you for helping me, MRAB.
After making some alterations to your code I have this:


# Give the path as a bytestring so that we'll get the filenames as bytestrings
path = b"/home/nikos/public_html/data/apps/"

# Setting TESTING to True will make it print out what renamings it will do, but not actually do them
TESTING = True

# Walk through the files.
for root, dirs, files in os.walk( path ):
    for filename in files:
        try:
            # Is this name encoded in UTF-8?
            filename.decode('utf-8')
        except UnicodeDecodeError:
            # Decoding from UTF-8 failed, which means that the name is not valid UTF-8
            # It appears that the filenames are encoded in ISO-8859-7, so decode from that and re-encode to UTF-8
            new_filename = filename.decode('iso-8859-7').encode('utf-8')

            old_path = os.path.join(root, filename)
            new_path = os.path.join(root, new_filename)
            if TESTING:
                print( '''Will rename {!r} ---> {!r}'''.format( old_path, new_path ) )
            else:
                print( '''Renaming {!r} ---> {!r}'''.format( old_path, new_path ) )
                os.rename( old_path, new_path )
sys.exit(0)

and the output can be seen here: http://superhost.gr/cgi-bin/files.py

We are in test mode, so I don't know what the encodings will be when the 
renaming actually takes place.

Shall I switch off test mode and try it for real?


The first one is '/home/nikos/public_html/data/apps/Ευχή του Ιησού.mp3'.

The second one is '/home/nikos/public_html/data/apps/Σκέψου έναν 
αριθμό.exe'.


These names are currently encoded in ISO-8859-7, but will be encoded in
UTF-8 if you turn off test mode.

If you're happy for that change to happen, then go ahead.
--
http://mail.python.org/mailman/listinfo/python-list


Re: How to store a variable when a script is executing for next time execution?

2013-06-06 Thread MRAB

On 06/06/2013 16:37, Chris Angelico wrote:

On Thu, Jun 6, 2013 at 10:14 PM, Dave Angel  wrote:

If you're planning on having the files densely populated (meaning no gaps in
the numbering), then you could use a binary search to find the last one.
Standard algorithm would converge with 10 existence checks if you have a
limit of 1000 files.


Or, if you can dedicate a directory to those files, you could go even simpler:

dataFile = open('filename0.0.%d.json'%len(os.listdir()), 'w')

The number of files currently existing equals the number of the next file.


Assuming no gaps.
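
If gaps are possible, one alternative is to parse the numbers out of the existing names and take the highest one plus one. The sketch below assumes the naming scheme from the example above; the function name next_index is made up for illustration.

```python
import re

# Naming scheme assumed from the example above: filename0.0.<n>.json
PATTERN = re.compile(r'^filename0\.0\.(\d+)\.json$')

def next_index(names):
    """Return one more than the highest index found in names (0 if none match)."""
    indices = [int(m.group(1)) for m in map(PATTERN.match, names) if m]
    return max(indices) + 1 if indices else 0

print(next_index(['filename0.0.0.json', 'filename0.0.1.json']))   # 2
print(next_index(['filename0.0.0.json', 'filename0.0.3.json']))   # 4, not 2
```

It could be used as open('filename0.0.%d.json' % next_index(os.listdir('.')), 'w'). Unlike counting the files, it never reuses a number that is still present.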
--
http://mail.python.org/mailman/listinfo/python-list


Re: Changing filenames from Greeklish => Greek (subprocess complain)

2013-06-06 Thread MRAB

  
  
On 06/06/2013 19:13, Νικόλαος Κούρας wrote:

On Thursday, 6 June 2013 3:50:52 PM UTC+3, MRAB wrote:

> If you're happy for that change to happen, then go ahead.

I have made some modifications to the code you provided, but I think something that doesn't occur to me needs fixing.


for example i switched:

# Give the path as a bytestring so that we'll get the filenames as bytestrings 
path = b"/home/nikos/public_html/data/apps/" 

# Walk through the files. 
for root, dirs, files in os.walk( path ): 
for filename in files: 

to:

# Give the path as a bytestring so that we'll get the filenames as bytestrings
path = os.listdir( b'/home/nikos/public_html/data/apps/' )


os.listdir returns a list of the names of the objects in the given
directory.


  # iterate over all filenames in the apps directory


Exactly, all the names.


  for fullpath in path
	# Grabbing just the filename from path


The name is a bytestring. Note, name, NOT full path.

The following line will fail because the name is a bytestring,
and you can't mix bytestrings with Unicode strings:

  	filename = fullpath.replace( '/home/nikos/public_html/data/apps/', '' )

Here fullpath is a bytestring, but both arguments passed to replace()
are Unicode strings.
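
A minimal sketch of that failure, using a made-up filename rather than the real directory:

```python
name = b'Euxi tou Ihsou.mp3'   # os.listdir(b'...') yields bytes like this

try:
    name.replace('.mp3', '')    # str arguments on a bytes object
    mixed_ok = True
except TypeError:
    mixed_ok = False

stem = name.replace(b'.mp3', b'')   # bytes arguments work

print(mixed_ok)   # False
print(stem)       # b'Euxi tou Ihsou'
```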

  I don't know if it has the same effect.
Here is the whole snippet:


=
# Give the path as a bytestring so that we'll get the filenames as bytestrings
path = os.listdir( b'/home/nikos/public_html/data/apps/' )

# iterate over all filenames in the apps directory
for fullpath in path
	# Grabbing just the filename from path
	filename = fullpath.replace( '/home/nikos/public_html/data/apps/', '' )
	try: 
		# Is this name encoded in utf-8? 
		filename.decode('utf-8') 
	except UnicodeDecodeError: 
		# Decoding from UTF-8 failed, which means that the name is not valid utf-8
			
		# It appears that this filename is encoded in greek-iso, so decode from that and re-encode to utf-8
		new_filename = filename.decode('iso-8859-7').encode('utf-8') 
			
		# rename filename form greek bytestream-> utf-8 bytestream
		old_path = os.path.join(root, filename) 
		new_path = os.path.join(root, new_filename)
		os.rename( old_path, new_path )


#
# Compute a set of current fullpaths 
path = os.listdir( '/home/nikos/public_html/data/apps/' )

# Load'em
for fullpath in path:
	try:
		# Check the presence of a file against the database and insert if it doesn't exist
		cur.execute('''SELECT url FROM files WHERE url = "" (fullpath,) )
		data = ""#URL is unique, so should only be one
		
		if not data:
			# First time for file; primary key is automatic, hit is defaulted 
			cur.execute('''INSERT INTO files (url, host, lastvisit) VALUES (%s, %s, %s)''', (fullpath, host, lastvisit) )
	except pymysql.ProgrammingError as e:
		print( repr(e) )
==

The error is:
[Thu Jun 06 21:10:23 2013] [error] [client 79.103.41.173]   File "files.py", line 64
[Thu Jun 06 21:10:23 2013] [error] [client 79.103.41.173] for fullpath in path
[Thu Jun 06 21:10:23 2013] [error] [client 79.103.41.173]^
[Thu Jun 06 21:10:23 2013] [error] [client 79.103.41.173] SyntaxError: invalid syntax


Doesn't os.listdir(...) return a list of all the filenames?

But then again, when the replace is done to shorten the full path to just the filename, I think it doesn't work because os.listdir was given a bytestring and not a string.

What am I doing wrong?


You're changing things without checking what they do!

  

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Changing filenames from Greeklish => Greek (subprocess complain)

2013-06-06 Thread MRAB

On 06/06/2013 22:07, Lele Gaifax wrote:

Νικόλαος Κούρας  writes:


Thanks, here is what I have up until now, with many corrections.


I'm afraid many more are needed :-)


...
# rename filename form greek bytestreams --> utf-8 bytestreams
old_path = b'/home/nikos/public_html/data/apps/' + b'filename')
new_path = b'/home/nikos/public_html/data/apps/' + 
b'new_filename')
os.rename( old_path, new_path )


a) there are two syntax errors, you have spurious close brackets there
b) you are basically assigning *constant* expressions to both variables,
most probably not what you meant

Yet again, he's changed things unnecessarily, and the code was meant 
only as a one-time fix to correct the encoding of some filenames. :-(
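
Point (b) in miniature: b'filename' is a literal containing the eight letters "filename", not the value of the variable filename. The name example.mp3 below is hypothetical.

```python
import os

folder = b'/home/nikos/public_html/data/apps/'
filename = b'example.mp3'          # hypothetical name, for illustration only

constant_path = folder + b'filename'          # a literal: always the same text
real_path = os.path.join(folder, filename)    # uses the variable's value

print(constant_path)  # b'/home/nikos/public_html/data/apps/filename'
print(real_path)      # b'/home/nikos/public_html/data/apps/example.mp3'
```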
--
http://mail.python.org/mailman/listinfo/python-list


Re: trigger at TDM/2 only

2013-06-06 Thread MRAB

On 07/06/2013 01:03, cerr wrote:

Hi,

I have a process that I can trigger only at a certain time. Assume I have a TDM 
period of 10 min; that means I can only fire my trigger at the 5th minute of 
every 10-minute cycle, i.e. at XX:05, XX:15, XX:25... For that I came up with 
the following algorithm, which only leaves the waiting while loop if 
minute % TDM/2 is 0 but not if minute % TDM is 0:

min = datetime.datetime.now().timetuple().tm_hour*60 + datetime.datetime.now().timetuple().tm_min
while not (min%tdm_timeslot != 0 ^ min%(int(tdm_timeslot/2)) != 0):
    time.sleep(10)
    logger.debug("WAIT "+str(datetime.datetime.now().timetuple().tm_hour*60 + datetime.datetime.now().timetuple().tm_min))
    logger.debug(str(min%(int(tdm_timeslot/2)))+" - "+str(min%tdm_timeslot))
    min = datetime.datetime.now().timetuple().tm_hour*60 + datetime.datetime.now().timetuple().tm_min
logger.debug("RUN UPDATE CHECK...")

But weirdly enough, the output I get is something like the log below.
I would expect my while loop to exit as soon as the minute turns 1435... 
Why is it staying in? What am I doing wrong here?

WAIT 1434
3 - 3
WAIT 1434
4 - 4
WAIT 1434
4 - 4
WAIT 1434
4 - 4
WAIT 1434
4 - 4
WAIT 1434
4 - 4
WAIT 1435
4 - 4
WAIT 1435
0 - 5
WAIT 1435
0 - 5
WAIT 1435
0 - 5
WAIT 1435
0 - 5
WAIT 1435
0 - 5
WAIT 1436
0 - 5
RUN UPDATE CHECK...


Possibly it's due to operator precedence. The bitwise operators &, |
and ^ have a higher precedence than comparisons such as !=.
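
A sketch of what that precedence does to the posted condition, assuming tdm_timeslot is 10 as in the description. Because ^ binds tighter than !=, the expression becomes a chained comparison; the function names are made up for illustration.

```python
tdm_timeslot = 10
half = tdm_timeslot // 2

def original_test(minute):
    # Parsed as: minute % tdm_timeslot != (0 ^ (minute % half)) != 0
    return minute % tdm_timeslot != 0 ^ minute % half != 0

def intended_test(minute):
    # What the condition was presumably meant to be, with explicit parentheses.
    return (minute % tdm_timeslot != 0) ^ (minute % half != 0)

for minute in (1434, 1435, 1436):
    print(minute, original_test(minute), intended_test(minute))
# 1434 False False
# 1435 False True
# 1436 True False
```

The loop runs while the test is false, so the original form only lets it exit at 1436, which matches the log above.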

A better condition might be:

min % tdm_timeslot != tdm_timeslot // 2

or, better yet, work out how long before the next trigger time and then
sleep until then.
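
The "sleep until then" idea could be sketched as below. The function name and the default period are assumptions for illustration; it treats the whole trigger minute as valid and returns 0 when already inside it.

```python
import datetime

def seconds_until_trigger(now, tdm_minutes=10):
    """Seconds from now until the next minute where minute-of-day % tdm == tdm // 2."""
    minute_of_day = now.hour * 60 + now.minute
    minutes_left = (tdm_minutes // 2 - minute_of_day % tdm_minutes) % tdm_minutes
    if minutes_left == 0:
        return 0.0          # already inside the trigger minute
    return minutes_left * 60 - now.second - now.microsecond / 1e6

print(seconds_until_trigger(datetime.datetime(2013, 6, 6, 23, 54, 0)))    # 60.0
print(seconds_until_trigger(datetime.datetime(2013, 6, 6, 23, 56, 10)))   # 530.0
```

A caller would then do time.sleep(seconds_until_trigger(datetime.datetime.now())) once, instead of polling every 10 seconds.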
--
http://mail.python.org/mailman/listinfo/python-list


Re: Problems with serial port interface

2013-06-07 Thread MRAB

On 07/06/2013 11:17, [email protected] wrote:

Sorry for my quote,
but do you have any suggestion?

On Tuesday, 4 June 2013 23:25:21 UTC+2, [email protected] wrote:

Hi,

I'm programming in Python for the first time: I want to create a serial port 
reader. I'm using Python 3.3 and PyQt4; I'm also using pyserial.

Below is a snippet of the code:


class CReader(QThread):
    def start(self, ser, priority = QThread.InheritPriority):
        self.ser = ser
        QThread.start(self, priority)
        self._isRunning = True
        self.numData=0;

    def run(self):
        print("Enter Creader")
        while True:
            if self._isRunning:
                try:
                    data = self.ser.read(self.numData)
                    n = self.ser.inWaiting()
                    if n:
                        data = self.ser.read(n)
                        self.emit(SIGNAL("newData(QString)"), data.decode('cp1252', 'ignore'))
                        self.ser.flushInput()
                except:
                    pass
            else:
                return

    def stop(self):
        self._isRunning = False
        self.wait()

This code seems to work well, but I have problems in this test case:

+baud rate:19200
+8/n/1
+data transmitted: 1 byte every 5ms

After 30 seconds (more or less) the program crashes: it seems to be a buffer 
problem, but I'm not really sure.

What's wrong?


Using a "bare except" like this:

    try:
        ...
    except:
        ...

is virtually always a bad idea. The only time I'd ever do that would
be, say, to catch something, print a message, and then re-raise it:

    try:
        ...
    except:
        print("Something went wrong!")
        raise

Even then, catching Exception would be better than a bare except. A
bare except will catch _every_ exception, including NameError (which
would mean that it can't find a name, possibly due to a spelling error).

A bare except with pass, like you have, is _never_ a good idea. Python
might be trying to complain about a problem, but you're preventing it
from doing so.

Try removing the try...except: pass and let Python tell you if it has a
problem.
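
A small sketch of the difference. The function names are made up, and OSError merely stands in for whatever exception the serial library really raises; that choice is an assumption, not something from the original post.

```python
def poll_bare(read):
    try:
        return read()
    except:              # swallows everything, including programming errors
        pass

def poll_specific(read):
    try:
        return read()
    except OSError as exc:   # only the failure we know how to handle
        print("read failed:", exc)
        return None

def buggy_read():
    return undefined_name    # deliberate typo: raises NameError when called

print(poll_bare(buggy_read))     # None; the NameError is silently lost
try:
    poll_specific(buggy_read)
except NameError as exc:
    print("now visible:", exc)
```

With the bare except, the typo would never be reported; with the narrow one, Python reports it the first time the code runs.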
--
http://mail.python.org/mailman/listinfo/python-list


Re: Changing filenames from Greeklish => Greek (subprocess complain)

2013-06-07 Thread MRAB

On 07/06/2013 12:53, Νικόλαος Κούρας wrote:
[snip]


#
# Collect filenames of the path dir as bytes
greek_filenames = os.listdir( b'/home/nikos/public_html/data/apps/' )

for filename in greek_filenames:
    # Compute 'path/to/filename' in bytes
    greek_path = b'/home/nikos/public_html/data/apps/' + b'filename'
    try:


This is a worse way of doing it because the ISO-8859-7 encoding has 1
byte per codepoint, meaning that it's more 'tolerant' (if that's the
word) of errors. A sequence of bytes that is actually UTF-8 can be
decoded as ISO-8859-7, giving gibberish.

UTF-8 is less tolerant, and it's the encoding that ideally you should
be using everywhere, so it's better to assume UTF-8 and, if it fails, 
try ISO-8859-7 and then rename so that any names that were ISO-8859-7
will be converted to UTF-8.

That's the reason I did it that way in the code I posted, but, yet
again, you've changed it without understanding why!
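
The asymmetry can be seen with a single Greek word. This is only an illustration with a made-up sample; it does not touch any files.

```python
word = 'του'

utf8_bytes = word.encode('utf-8')
greek_bytes = word.encode('iso-8859-7')

# UTF-8 bytes decode "successfully" as ISO-8859-7, but the result is gibberish.
gibberish = utf8_bytes.decode('iso-8859-7')

# ISO-8859-7 bytes are rejected by the stricter UTF-8 decoder.
try:
    greek_bytes.decode('utf-8')
    rejected = False
except UnicodeDecodeError:
    rejected = True

print(gibberish == word)   # False
print(len(gibberish))      # 6: one character per byte
print(rejected)            # True
```

Trying ISO-8859-7 first would therefore "convert" names that were already valid UTF-8 and corrupt them.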


        filepath = greek_path.decode('iso-8859-7')

        # Rename current filename from greek bytes --> utf-8 bytes
        os.rename( greek_path, filepath.encode('utf-8') )
    except UnicodeDecodeError:
        # Since it's not a greek bytestring then it's a proper utf8 bytestring
        filepath = greek_path.decode('utf-8')


[snip]

--
http://mail.python.org/mailman/listinfo/python-list


Re: Errin when executing a cgi script that sets a cookie in the browser

2013-06-07 Thread MRAB

On 07/06/2013 08:51, Νικόλαος Κούρας wrote:

Finally, no more suexec errors after chowning all the log files and their 
corresponding paths to nobody:nobody.

Now the error has changed to:


[Fri Jun 07 10:48:47 2013] [error] [client 79.103.41.173] (2)No such file or directory: exec of '/home/nikos/public_html/cgi-bin/koukos.py' failed
[Fri Jun 07 10:48:47 2013] [error] [client 79.103.41.173] Premature end of script headers: koukos.py
[Fri Jun 07 10:48:47 2013] [error] [client 79.103.41.173] File does not exist: /home/nikos/public_html/500.shtml


but from the interpreter's point of view:

[email protected] [~/www/cgi-bin]# python koukos.py
Set-Cookie: nikos=admin; expires=Mon, 02 Jun 2014 07:50:18 GMT; Path=/
Content-type: text/html; charset=utf-8

ΑΠΟ ΔΩ ΚΑΙ ΣΤΟ ΕΞΗΣ ΔΕΝ ΣΕ ΕΙΔΑ, ΔΕΝ ΣΕ ΞΕΡΩ, ΔΕΝ ΣΕ ΑΚΟΥΣΑ! ΘΑ ΕΙΣΑΙ ΠΛΕΟΝ Ο 
ΑΟΡΑΤΟΣ ΕΠΙΣΚΕΠΤΗΣ!!



(2)No such file or directory: exec of '/home/nikos/public_html/cgi-bin/koukos.py' failed

Can't find what? koukos.py is there inside the cgi-bin dir with 755 perms.


It's looking for '/home/nikos/public_html/cgi-bin/koukos.py'.

Have a look in '/home/nikos/public_html/cgi-bin'. Is 'koukos.py' in
there?
--
http://mail.python.org/mailman/listinfo/python-list


Re: Changing filenames from Greeklish => Greek (subprocess complain)

2013-06-07 Thread MRAB

On 07/06/2013 20:31, Zero Piraeus wrote:

:

On 7 June 2013 14:52, Νικόλαος Κούρας  wrote:
File "/home/nikos/public_html/cgi-bin/files.py", line 81

[Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173] if( flag == 'greek' )
[Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173]   ^
[Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173] SyntaxError: invalid syntax
[Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173] Premature end of script headers: files.py
---
I don't know why that if statement errors.


Oh for f... READ SOME DOCUMENTATION, FOR THE LOVE OF BOB!!! READ YOUR
OWN EFFING CODE!

Look at this:

   http://docs.python.org/2/tutorial/controlflow.html

Read it now? Of course not. Go away and read it.

Now have you read it? GO AND READ IT.

What does an if statement end with? Hint: yep, that's it.


Have you noticed how the line in the traceback doesn't match the line
in the post?
--
http://mail.python.org/mailman/listinfo/python-list


Re: Errin when executing a cgi script that sets a cookie in the browser

2013-06-07 Thread MRAB

On 07/06/2013 19:24, Νικόλαος Κούρας wrote:

On Friday, 7 June 2013 5:32:09 PM UTC+3, MRAB wrote:

Can find what? koukos.py is there inside the cg-bin dir with 755 perms.



It's looking for '/home/nikos/public_html/cgi-bin/koukos.py'.


Its looking for its self?!?!


Have a look in '/home/nikos/public_html/cgi-bin'. Is 'koukos.py' in
there?

Yes it is.


[email protected] [~/www/cgi-bin]# ls -l
total 56
drwxr-xr-x 2 nikos nikos   4096 Jun  6 20:29 ./
drwxr-x--- 4 nikos nobody  4096 Jun  5 11:32 ../
-rwxr-xr-x 1 nikos nikos   1199 Apr 25 15:33 convert.py*
-rwxr-xr-x 1 nikos nikos   5434 Jun  7 14:51 files.py*
-rw-r--r-- 1 nikos nikos    170 May 30 15:18 .htaccess
-rwxr-xr-x 1 nikos nikos   1160 Jun  6 06:27 koukos.py*
-rwxr-xr-x 1 nikos nikos   9356 Jun  6 09:13 metrites.py*
-rwxr-xr-x 1 nikos nikos  13512 Jun  6 09:13 pelatologio.py*
[email protected] [~/www/cgi-bin]#


The prompt says "~/www/cgi-bin".

Is that the same as "/home/nikos/public_html/cgi-bin"?

Try:

ls -l /home/nikos/public_html/cgi-bin
--
http://mail.python.org/mailman/listinfo/python-list


Re: Changing filenames from Greeklish => Greek (subprocess complain)

2013-06-08 Thread MRAB

On 08/06/2013 07:49, Νικόλαος Κούρας wrote:

On Saturday, 8 June 2013 5:52:22 AM UTC+3, Cameron Simpson wrote:

On 07Jun2013 11:52, Νίκος Γκρ33κ wrote:

| [email protected] [~/www/cgi-bin]# [Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173]   File "/home/nikos/public_html/cgi-bin/files.py", line 81
| [Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173] if( flag == 'greek' )
| [Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173] ^
| [Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173] SyntaxError: invalid syntax
| [Fri Jun 07 21:49:33 2013] [error] [client 79.103.41.173] Premature end of script headers: files.py
| ---
| i dont know why that if statement errors.



Python statements that continue (if, while, try etc) end in a colon, so:


Oh, I am very sorry.
Oh my God, I can't believe I missed a colon *again*.

I have corrected this:

#
# Collect filenames of the path dir as bytes
filename_bytes = os.listdir( b'/home/nikos/public_html/data/apps/' )

for filename in filename_bytes:
    # Compute 'path/to/filename' into bytes
    filepath_bytes = b'/home/nikos/public_html/data/apps/' + b'filename'
    flag = False

    try:
        # Assume current file is utf8 encoded
        filepath = filepath_bytes.decode('utf-8')
        flag = 'utf8'
    except UnicodeDecodeError:
        try:
            # Since current filename is not utf8 encoded then it has to be greek-iso encoded
            filepath = filepath_bytes.decode('iso-8859-7')
            flag = 'greek'
        except UnicodeDecodeError:
            print( '''I give up! File name is unreadable!''' )

    if flag == 'greek':
        # Rename filename from greek bytes --> utf-8 bytes
        os.rename( filepath_bytes, filepath.encode('utf-8') )
==

Now everything was supposed to work, but instead I am getting this surrogate 
error once more.
What is this surrogate thing?

Since I make use of error catching and handling like 'except 
UnicodeDecodeError:', if the utf8 decode fails for some reason, it should 
leave that file alone and try the next file?

    try:
        # Assume current file is utf8 encoded
        filepath = filepath_bytes.decode('utf-8')
        flag = 'utf8'
    except UnicodeDecodeError:

This is what it is supposed to do, correct?

==
[Sat Jun 08 09:39:34 2013] [error] [client 79.103.41.173]   File "/home/nikos/public_html/cgi-bin/files.py", line 94, in <module>
[Sat Jun 08 09:39:34 2013] [error] [client 79.103.41.173] cur.execute('''SELECT url FROM files WHERE url = %s''', (filename,) )
[Sat Jun 08 09:39:34 2013] [error] [client 79.103.41.173]   File "/usr/local/lib/python3.3/site-packages/PyMySQL3-0.5-py3.3.egg/pymysql/cursors.py", line 108, in execute
[Sat Jun 08 09:39:34 2013] [error] [client 79.103.41.173] query = query.encode(charset)
[Sat Jun 08 09:39:34 2013] [error] [client 79.103.41.173] UnicodeEncodeError: 'utf-8' codec can't encode character '\\udcce' in position 35: surrogates not allowed


Look at the traceback.

It says that the exception was raised by:

query = query.encode(charset)

which was called by:

cur.execute('''SELECT url FROM files WHERE url = %s''', (filename,) )

But what is 'filename'? And what has it to do with the first code
snippet? Does the traceback have _anything_ to do with the first code
snippet?
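
The thread does not say where the '\udcce' comes from. One possibility, offered here only as an assumption, is that the names reaching cur.execute were obtained as str from an os function: on POSIX systems Python 3 decodes undecodable filename bytes with the surrogateescape error handler (PEP 383), and such strings cannot be strictly encoded afterwards. A sketch with made-up bytes:

```python
raw = b'\xc5\xf5.mp3'                       # not valid UTF-8

# Roughly what the os functions do when handed a str path on a UTF-8 system.
as_text = raw.decode('utf-8', 'surrogateescape')
print(ascii(as_text))                       # '\udcc5\udcf5.mp3'

# A strict encode (as a database driver might do) then fails...
try:
    as_text.encode('utf-8')
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False
print(strict_ok)                            # False

# ...but the original bytes can be recovered with the same handler.
print(as_text.encode('utf-8', 'surrogateescape') == raw)   # True
```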

--
http://mail.python.org/mailman/listinfo/python-list


Re: Changing filenames from Greeklish => Greek (subprocess complain)

2013-06-08 Thread MRAB

On 08/06/2013 17:53, Νικόλαος Κούρας wrote:

Sorry for the delay, guys. I was busy with other things today and I am still 
reading your responses; I still haven't read them all, just Cameron's.

Here is what I have now, following Cameron's advice:


#
# Collect filenames of the path directory as bytes
path = b'/home/nikos/public_html/data/apps/'
filenames_bytes = os.listdir( path )

for filename_bytes in filenames_bytes:
    try:
        filename = filename_bytes.decode('utf-8)
    except UnicodeDecodeError:
        # Since it's not a utf8 bytestring then it's for sure a greek bytestring

        # Prepare arguments for rename to happen
        utf8_filename = filename_bytes.encode('utf-8')
        greek_filename = filename_bytes.encode('iso-8859-7')

        utf8_path = path + utf8_filename
        greek_path = path + greek_filename

        # Rename current filename from greek bytes --> utf8 bytes
        os.rename( greek_path, utf8_path )
==

I know this is wrong though.


Yet you did it anyway!


Since filename_bytes is the current filename encoded as utf8 or greek-iso
then i cannot just *encode* what is already encoded by doing this:

utf8_filename = filename_bytes.encode('utf-8')
greek_filename = filename_bytes.encode('iso-8859-7')


Try reading and understanding the code I originally posted.
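
For reference, the direction of the two operations in Python 3: bytes are decoded to text, and text is encoded to bytes. The sample name below is made up; it stands in for a name returned by os.listdir(b'...').

```python
greek_bytes = 'Ιησού.mp3'.encode('iso-8859-7')   # stand-in for a name from os.listdir(b'...')

print(hasattr(greek_bytes, 'encode'))   # False: Python 3 bytes cannot be encoded again

text = greek_bytes.decode('iso-8859-7')  # bytes -> str
utf8_bytes = text.encode('utf-8')        # str -> bytes

print(text)                              # Ιησού.mp3
print(utf8_bytes == 'Ιησού.mp3'.encode('utf-8'))   # True
```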

--
http://mail.python.org/mailman/listinfo/python-list


Re: A certainl part of an if() structure never gets executed.

2013-06-11 Thread MRAB

On 11/06/2013 21:20, Νικόλαος Κούρας wrote:

[code]
if not re.search( '=', name ) and not re.search( '=', month ) and not re.search( '=', year ):
    cur.execute( '''SELECT * FROM works WHERE clientsID = (SELECT id FROM clients WHERE name = %s) and MONTH(lastvisit) = %s and YEAR(lastvisit) = %s ORDER BY lastvisit ASC''', (name, month, year) )
elif not re.search( '=', month ) and not re.search( '=', year ):
    cur.execute( '''SELECT * FROM works WHERE MONTH(lastvisit) = %s and YEAR(lastvisit) = %s ORDER BY lastvisit ASC''', (month, year) )
elif not re.search( '=', year ):
    cur.execute( '''SELECT * FROM works WHERE YEAR(lastvisit) = %s ORDER BY lastvisit ASC''', year )
else:
    print('''Πώς να γίνει αναζήτηση αφού δεν επέλεξες ούτε πελάτη ούτε μήνα ή τουλάχιστον το έτος?''')
    print( '' )
    sys.exit(0)

data = cur.fetchall()

hits = money = 0

for row in data:
    hits += 1
    money = money + row[2]

..
..
selects based on either name, month, year or all of them
[/code]


The above if structure works correctly *only* if the user submits via the form:

name, month, year
or
month, year

If he just enters a year in the form and submits, I get no error, but no 
results are displayed.

Any ideas as to why this might happen?


What are the values of 'name', 'month' and 'year' in each of the cases?
Printing out ascii(name), ascii(month) and ascii(year), will be helpful.

Then try stepping through those lines in your head.
--
http://mail.python.org/mailman/listinfo/python-list


Re: A certainl part of an if() structure never gets executed.

2013-06-11 Thread MRAB

On 12/06/2013 02:25, [email protected] wrote:

On Wednesday, 12 June 2013 1:43:21 AM UTC+3, MRAB wrote:

On 11/06/2013 21:20, Νικόλαος Κούρας wrote:

[snip]


What are the values of 'name', 'month' and 'year' in each of the cases?
Printing out ascii(name), ascii(month) and ascii(year), will be helpful.

Then try stepping through those lines in your head.


I have printed all the values of those variables and they are all correct.
I just don't see why it fails to enter the specific if case.

Is there a shorter and clearer way to write this?
I didn't understand what Rick tried to tell me.

Can you help me write it more simply?


What are the values that are printed?

--
http://mail.python.org/mailman/listinfo/python-list


Re: A certainl part of an if() structure never gets executed.

2013-06-12 Thread MRAB

On 12/06/2013 12:17, Νικόλαος Κούρας wrote:



As with most of your problems you are barking up the wrong tree.
Why not use the actual value you get from the form to check whether you
have a valid month?
Do you understand why "0" is submitted instead of "=="?

Bye, Andreas


I have corrected the enumerate loop, but it seems that now the year works
and the selected name and month fail:

if '=' not in ( name and month and year ):
    cur.execute( '''SELECT * FROM works WHERE clientsID = (SELECT id FROM clients WHERE name = %s) and MONTH(lastvisit) = %s and YEAR(lastvisit) = %s ORDER BY lastvisit ASC''', (name, month, year) )
elif '=' not in ( month and year ):
    cur.execute( '''SELECT * FROM works WHERE MONTH(lastvisit) = %s and YEAR(lastvisit) = %s ORDER BY lastvisit ASC''', (month, year) )
elif '=' not in year:
    cur.execute( '''SELECT * FROM works WHERE YEAR(lastvisit) = %s ORDER BY lastvisit ASC''', year )
else:
    print( 'Πώς να γίνει αναζήτηση αφού δεν επέλεξες ούτε πελάτη ούτε μήνα ή τουλάχιστον το έτος?' )
    print( '' )
    sys.exit(0)


I tried in, not in and all possible combinations, but somehow it
confuses me.

Doesn't this:

if '=' not in ( name and month and year ):

mean: if '=' does not exist as a char inside the name and month and year
variables?

I think it does, but why does it fail then?


You think it does, but you're wrong.
--
http://mail.python.org/mailman/listinfo/python-list


Re: A certainl part of an if() structure never gets executed.

2013-06-12 Thread MRAB

On 12/06/2013 18:13, Νικόλαος Κούρας wrote:

On 12/6/2013 7:40 μμ, MRAB wrote:

On 12/06/2013 12:17, Νικόλαος Κούρας wrote:



[snip]


How would you say in English words what this is doing?

if '=' not in ( name and month and year ):


In English, the result of:

x and y

is basically:

if bool(x) is false then the result is x, otherwise the result is y

For example:

>>> bool("")
False
>>> "" and "world"
''
>>> bool("Hello")
True
>>> "Hello" and "world"
'world'



and then what this is doing?

if '=' not in ( name or month or year ):


In English, the result of:

x or y

is basically:

if bool(x) is true then the result is x, otherwise the result is y

For example:

>>> bool("")
False
>>> "" or "world"
'world'
>>> bool("Hello")
True
>>> "Hello" or "world"
'Hello'

These can be strung together, so that:

x and y and z

is equivalent to:

(x and y) and z

and:

x or y or z

is equivalent to:

(x or y) or z

and so on, however many times you wish to do it.


I have never before used not in with so many variables in parentheses; up
until now I specified it as not in var 1 and not in var 2 and not in
var 3 and so on.


Keep it simple:

if '=' not in name and '=' not in month and '=' not in year:

There may be a shorter way, but you seem confused enough as it is.
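
Putting the pieces together with made-up form values. The any() spelling at the end is one possible "shorter way"; it is a suggestion added here, not something from the original reply.

```python
name, month, year = '==========', '5', '2013'   # made-up form values

combined = name and month and year
print(repr(combined))                      # '2013': only the last value survives

print('=' not in combined)                 # True, even though name contains '='
print('=' not in name and '=' not in month and '=' not in year)   # False
print(not any('=' in value for value in (name, month, year)))     # False
```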

--
http://mail.python.org/mailman/listinfo/python-list


Re: Version Control Software

2013-06-13 Thread MRAB

On 13/06/2013 07:00, cutems93 wrote:

Thank you everyone for such helpful responses! Actually, I have one
more question. Does anybody have experience with closed source
version control software? If so, why did you buy it instead of
downloading open source software? Does closed source vcs have some
benefits over open source in some part?


I've used Microsoft SourceSafe. I didn't like it (does anyone? :-)).
--
http://mail.python.org/mailman/listinfo/python-list


Re: Eval of expr with 'or' and 'and' within

2013-06-14 Thread MRAB

On 14/06/2013 18:28, Michael Torrie wrote:

On 06/14/2013 10:49 AM, Steven D'Aprano wrote:

Correct. In Python, all boolean expressions are duck-typed: they aren't
restricted to True and False, but to any "true-ish" and "false-ish"
value, or as the Javascript people call them, truthy and falsey values.

There are a couple of anomalies -- the timestamp representing midnight is
falsey, because it is implemented as a zero number of seconds; also
exhausted iterators and generators ought to be considered falsey, since
they are empty, but because they don't know they are empty until called,
they are actually treated as truthy. But otherwise, the model is very
clean.


Good explanation! Definitely enlightened me.  Thank you.


The general rule is that an object is true-ish unless it's false-ish
(there are fewer false-ish objects than true-ish objects, e.g. zero vs
non-zero int).
--
http://mail.python.org/mailman/listinfo/python-list


Re: problem uploading docs to pypi

2013-06-14 Thread MRAB

On 14/06/2013 23:53, Irmen de Jong wrote:

Hi,

I'm experiencing some trouble when trying to upload the documentation for one 
of my
projects on Pypi. I'm getting a Bad Gateway http error message.
Anyone else experiencing this? Is this an intermittent issue or is there a 
problem with
Pypi?

Downloading documentation (from pythonhosted.org) works fine.


About ten days ago I got the error:

 Upload failed (503): backend write error

while trying to upload to PyPI, and it failed the same way the second 
time, but worked some time later.
--
http://mail.python.org/mailman/listinfo/python-list


Re: Eval of expr with 'or' and 'and' within

2013-06-14 Thread MRAB

On 15/06/2013 00:06, Nobody wrote:

On Fri, 14 Jun 2013 16:49:11 +, Steven D'Aprano wrote:


Unlike Javascript though, Python's idea of truthy and falsey is actually
quite consistent:


Beyond that, if a user-defined type implements a __nonzero__() method then
it determines whether an instance is true or false. If it implements a
__len__() method, then an instance is true if it has a non-zero length.


It's __nonzero__ in Python 2, __bool__ in Python 3.
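
A minimal sketch of how the two hooks interact, using made-up class names (Basket and AlwaysOn are hypothetical, not from any library). The alias at the end is one common way to keep a class working under both Python 2 and Python 3:

```python
class Basket:
    """Hypothetical container used only to illustrate the lookup order."""

    def __init__(self, items):
        self.items = list(items)

    def __len__(self):
        # Used for truth testing only when __bool__ is not defined.
        return len(self.items)


class AlwaysOn(Basket):
    def __bool__(self):
        # Takes priority over __len__ in Python 3.
        return True

    # Alias so the same class behaves identically under Python 2.
    __nonzero__ = __bool__


empty_basket_truth = bool(Basket([]))      # falls back to __len__
full_basket_truth = bool(Basket(["egg"]))
always_on_truth = bool(AlwaysOn([]))       # __bool__ wins over __len__
```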


Re: Fatal Python error: Py_Initialize: can't initialize sys standard streams

2013-06-15 Thread MRAB

On 15/06/2013 23:10, alex23 wrote:

On Jun 16, 7:29 am, [email protected] wrote:

I get this error when I try to save .dxf files in Inkscape:

Fatal Python error: Py_Initialize: can't initialize sys standard streams

Then it seems to recover but it doesn't really recover. It saves the files and
then DraftSite won't open them. Here is what the thing says when Inkscape
tried to fix the saving problem.


What do you mean by "Inkscape tried to fix the saving problem"?


File "D:\Program Files (x86)\Inkscape\python\Lib\encodings\__init__.py", line 123
raise CodecRegistryError,\
^
SyntaxError: invalid syntax


To me that traceback looks like it's Python 3 trying to run code written 
for Python 2.
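
That diagnosis can be checked without touching the Inkscape install. The sketch below feeds a Python 2 style raise statement (the ValueError and message are placeholders, not the real encodings code) to compile() and records whether the running interpreter accepts it:

```python
import sys

# Python 2 style 'raise Class, message' statement, held as a string so this
# file itself still parses under Python 3.
py2_style_source = "raise ValueError, 'bad codec'\n"

try:
    compile(py2_style_source, "<py2-example>", "exec")
    compiled_ok = True
except SyntaxError:
    compiled_ok = False

# Under Python 3 the compile step fails; under Python 2 it succeeds.
print(sys.version_info[0], compiled_ok)
```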



Here's a report of a similar issue with Blender (which also provides a
local install of Python under Windows):
http://translate.google.com.au/translate?hl=en&sl=fr&u=http://blenderclan.tuxfamily.org/html/modules/newbb/viewtopic.php%3Ftopic_id%3D36497&prev=/search%3Fq%3Dinkscape%2BCodecRegistryError

(Sorry for the ugly url, it's a Google translation of a french
language page)

Do you have a separate installation of Python? It's possible it may be
conflicting. If you rename its folder to something else (which will
temporarily break that install), do you still see this same issue in
Inkscape?





Re: Updating a filename's counter value failed each time

2013-06-17 Thread MRAB

On 17/06/2013 17:39, Simpleton wrote:

Hello again, something simple this time:

After a user selects a file from the form, that selection can be
found by reading the variable 'filename'.

If the filename already exists in the database I want to update its
counter, and that is what I'm trying to accomplish with:

---
if form.getvalue('filename'):
    cur.execute('''UPDATE files SET hits = hits + 1, host = %s, lastvisit =
    %s WHERE url = %s''', (host, lastvisit, filename) )
---

For some reason this never returns any data, because for troubleshooting
I have tried:

-
data = cur.fetchone()

if data:
    print("something been returned out of this")


Since for sure the filename the user selected is represented by a record
inside the 'files' table, why does its corresponding counter never seem to get
updated?


You say "for sure". Really? Then why isn't it working as you expect?

When it comes to debugging, """assumption is the mother of all
***-ups""" [insert relevant expletive for "***"].

Assume nothing.

What is the value of 'filename'?

What are the entries in the 'files' table?

Print them out, for example:

print("filename is", ascii(filename))

or write them into a log file and then look at them.
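
To show why ascii() is worth the extra typing, here is a small sketch with two invented values (the real form value and table contents are unknown). They are easy to mistake for each other in plain output, but the escaped form makes the difference obvious:

```python
# Two hypothetical values that are hard to tell apart when printed plainly.
from_form = "report.pdf\n"
in_database = "report.pdf"

print("filename is", ascii(from_form))      # shows the trailing \n escape
print("stored url is", ascii(in_database))

same = from_form == in_database
same_after_strip = from_form.strip() == in_database
```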



Re: Updating a filename's counter value failed each time

2013-06-17 Thread MRAB

On 17/06/2013 19:32, Jens Thoms Toerring wrote:

Νίκος  wrote:

On 17/6/2013 8:54 PM, Jens Thoms Toerring wrote:
> Also take care to check the filename you insert - a malicious
> user might cobble together a file name that is actually a SQL
> statement and then do nasty things to your database. I.e. never
> insert values you received from a user without checking them.



Yes, in general user input validation is always needed, but here
the filename being selected is from an HTML table listing filenames.

But I take it you mean that someone might try to pass a bogus
"filename" value from the URL like:

http://superhost.gr/cgi-bin/files.py?filename="Select.";

Is that what you mean?


Well, you never wrote where this filename is coming from,
so all I could assume was that the user can enter a more
or less random file name. If he only can select one from
a list you put together there's probably less of a problem.


But doesn't the comma inside the execute statement protect me from such
actions, as opposed to when I was using a substitution operator?

> I would guess because you forgot the quotes around string
> values in your SQL statement which thus wasn't executed.



I tried your suggestion:



cur.execute('''UPDATE files SET hits = hits + 1, host = %s, lastvisit =
%s WHERE url = "%s"''', (host, lastvisit, filename) )



seems the same as:



cur.execute('''UPDATE files SET hits = hits + 1, host = %s, lastvisit =
%s WHERE url = %s''', (host, lastvisit, filename) )



since everything is triple-quoted already, what would the difference be
between "%s" and plain %s?


As I wrote you need *single* quotes around strings in
SQL statements. Double quotes won't do - this is SQL
and not Python so you're dealing with a different
language and thus different rules apply. The triple single
quotes are seen by Python, but SQL needs its own.


The query looks safe to me as he _is_ using a parametrised query.
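
For anyone following along, here is a self-contained sketch of what "parametrised" buys you. It uses the standard-library sqlite3 module so it can run anywhere, which means the placeholder is ? where the thread's driver uses %s; the table layout and the sample values are assumptions made up for the demo. Note that no quotes go around the placeholder, since the driver takes care of quoting the value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE files (url TEXT, hits INTEGER, host TEXT, lastvisit TEXT)")
cur.execute("INSERT INTO files VALUES (?, ?, ?, ?)", ("a.txt", 0, "", ""))

# Hostile-looking input: it is passed as data, never spliced into the SQL text.
filename = "a.txt'; DROP TABLE files; --"
cur.execute(
    "UPDATE files SET hits = hits + 1, host = ?, lastvisit = ? WHERE url = ?",
    ("example.org", "2013-06-17", filename),
)
hostile_rows = cur.rowcount   # no row has that literal url

cur.execute(
    "UPDATE files SET hits = hits + 1, host = ?, lastvisit = ? WHERE url = ?",
    ("example.org", "2013-06-17", "a.txt"),
)
normal_rows = cur.rowcount
conn.commit()

hits = cur.execute("SELECT hits FROM files WHERE url = ?", ("a.txt",)).fetchone()[0]
```

The hostile string simply fails to match any row, and the table is still there afterwards.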


Re: Updating a filename's counter value failed each time

2013-06-17 Thread MRAB

On 17/06/2013 21:44, John Gordon wrote:

In  Alister  writes:


> #update file's counter if cookie does not exist
> cur.execute('''UPDATE files SET hits = hits + 1, host = %s, lastvisit =
> %s WHERE url = %s''', (host, lastvisit, filename) )
>
> if cur.rowcount:
>     print( " database has been affected" )
>
> indeed every time I select a filename the message gets printed, but then
> again, checking the database via phpMyAdmin, the filename counter
> always remains 0 and is not incremented by 1



replace
    if cur.rowcount:
        print( " database has been affected" )

with print cur.rowcount()


rowcount isn't a method call; it's just an attribute.  You don't need
the parentheses.


Well, you do need parentheses, it's just that you need them around the
'print':

if cur.rowcount:
    print(cur.rowcount)
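
A small runnable check of both points, again leaning on sqlite3 with a made-up two-row table as a stand-in for the real database (an assumption, since the thread never shows the driver in use):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE files (url TEXT, hits INTEGER)")
cur.executemany("INSERT INTO files VALUES (?, 0)", [("a.txt",), ("b.txt",)])

cur.execute("UPDATE files SET hits = hits + 1 WHERE url = ?", ("a.txt",))
affected = cur.rowcount          # plain attribute access, no call

try:
    cur.rowcount()               # calling the integer fails
    call_error = None
except TypeError as exc:
    call_error = type(exc).__name__

print(affected, call_error)
```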



Re: Why is regex so slow?

2013-06-18 Thread MRAB

On 18/06/2013 17:45, Roy Smith wrote:

I've got a 170 MB file I want to search for lines that look like:

[2010-10-20 16:47:50.339229 -04:00] INFO (6): songza.amie.history - ENQUEUEING: /listen/the-station-one

This code runs in 1.3 seconds:

--
import re

pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
count = 0

for line in open('error.log'):
    m = pattern.search(line)
    if m:
        count += 1

print count
--

If I add a pre-filter before the regex, it runs in 0.78 seconds (about
twice the speed!)

--
import re

pattern = re.compile(r'ENQUEUEING: /listen/(.*)')
count = 0

for line in open('error.log'):
    if 'ENQ' not in line:
        continue
    m = pattern.search(line)
    if m:
        count += 1

print count
--

Every line which contains 'ENQ' also matches the full regex (61425
lines match, out of 2.1 million total).  I don't understand why the
first way is so much slower.

Once the regex is compiled, you should have a state machine pattern
matcher.  It should be O(n) in the length of the input to figure out
that it doesn't match as far as "ENQ".  And that's exactly how long it
should take for "if 'ENQ' not in line" to run as well.  Why is doing
twice the work also twice the speed?

I'm running Python 2.7.3 on Ubuntu Precise, x86_64.

I'd be interested in how the 'regex' module 
(http://pypi.python.org/pypi/regex) compares. :-)




Re: Why is regex so slow?

2013-06-18 Thread MRAB

On 18/06/2013 20:21, Roy Smith wrote:

In article ,
Mark Lawrence   wrote:


Out of curiosity, have you tried the new regex module from PyPI rather
than the stdlib version?  A heck of a lot of work has gone into it; see
http://bugs.python.org/issue2636


I just installed that and gave it a shot.  It's *slower* (and, much
higher variation from run to run).  I'm too exhausted fighting with
OpenOffice to get this into some sane spreadsheet format, so here's
the raw timings:

Built-in re module:
0:01.32
0:01.33
0:01.32
0:01.33
0:01.35
0:01.32
0:01.35
0:01.36
0:01.33
0:01.32

regex with flags=V0:
0:01.66
0:01.53
0:01.51
0:01.47
0:01.81
0:01.58
0:01.78
0:01.57
0:01.64
0:01.60

regex with flags=V1:
0:01.53
0:01.57
0:01.65
0:01.61
0:01.83
0:01.82
0:01.59
0:01.60
0:01.55
0:01.82

I reckon that about 1/3 of that time is spent in 
PyArg_ParseTupleAndKeywords, just getting the arguments!


There's a higher initial overhead in using regex than string methods,
so working just a line at a time will take longer.
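
For anyone who wants to repeat the comparison, here is a rough sketch that puts the two strategies side by side as functions. The sample lines are invented stand-ins for the real log, and no timings are claimed; wrap the calls in timeit on your own data to see how much the cheap substring test saves:

```python
import re

pattern = re.compile(r'ENQUEUEING: /listen/(.*)')

def count_plain(lines):
    """Run the regex on every line."""
    return sum(1 for line in lines if pattern.search(line))

def count_prefiltered(lines):
    """Skip the regex unless a cheap substring test passes first."""
    return sum(1 for line in lines if 'ENQ' in line and pattern.search(line))

# Made-up sample lines standing in for the real log file.
sample = [
    "[2010-10-20 16:47:50.339229 -04:00] INFO (6): songza.amie.history - "
    "ENQUEUEING: /listen/the-station-one",
    "[2010-10-20 16:47:51.000000 -04:00] INFO (6): something unrelated",
] * 1000

assert count_plain(sample) == count_prefiltered(sample) == 1000
```

Both functions must agree on the count; the pre-filter only changes how often the regex machinery is entered.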


Re: A few questiosn about encoding

2013-06-20 Thread MRAB

On 20/06/2013 07:26, Steven D'Aprano wrote:

On Wed, 19 Jun 2013 18:46:59 -0700, Rick Johnson wrote:


On Thursday, June 13, 2013 2:11:08 AM UTC-5, Steven D'Aprano wrote:


Gah! That's twice I've screwed that up. Sorry about that!


Yeah, and your difficulty explaining the Unicode implementation reminds
me of a passage from the Python zen:

 "If the implementation is hard to explain, it's a bad idea."


The *implementation* is easy to explain. It's the names of the encodings
which I get tangled up in.


You're off by one below!


ASCII: Supports exactly 127 code points, each of which takes up exactly 7
bits. Each code point represents a character.


128 codepoints.


Latin-1, Latin-2, MacRoman, MacGreek, ISO-8859-7, Big5, Windows-1251, and
about a gazillion other legacy charsets, all of which are mutually
incompatible: supports anything from 127 to 65535 different code points,
usually under 256.


128 to 65536 codepoints.


UCS-2: Supports exactly 65535 code points, each of which takes up exactly
two bytes. That's fewer than required, so it is obsoleted by:


65536 codepoints.

etc.


UTF-16: Supports all 1114111 code points in the Unicode charset, using a
variable-width system where the most popular characters use exactly two
bytes and the remaining ones use a pair of characters.

UCS-4: Supports exactly 4294967295 code points, each of which takes up
exactly four bytes. That is more than needed for the Unicode charset, so
this is obsoleted by:

UTF-32: Supports all 1114111 code points, using exactly four bytes each.
Code points outside of the range 0 through 1114111 inclusive are an error.

UTF-8: Supports all 1114111 code points, using a variable-width system
where popular ASCII characters require 1 byte, and others use 2, 3 or 4
bytes as needed.


Ignoring the legacy charsets, only UTF-16 is a terribly complicated
implementation, due to the surrogate pairs. But even that is not too bad.
The real complication comes from the interactions between systems which
use different encodings, and that's nothing to do with Unicode.
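
The off-by-one corrections all come from remembering that code point 0 counts too: the largest value is one less than the number of values. A short sketch of the arithmetic, assuming Python 3.3 or later, where sys.maxunicode is 0x10FFFF:

```python
import sys

ascii_count = 2 ** 7                 # 7 bits
ucs2_count = 2 ** 16                 # 16 bits
unicode_count = sys.maxunicode + 1   # code points 0 .. 0x10FFFF on Python 3.3+

# The largest value in each range is one less than the count.
assert ascii_count == 128 and max(range(ascii_count)) == 127
assert ucs2_count == 65536
assert unicode_count == 0x110000 == 1114112
```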






Re: A few questiosn about encoding

2013-06-20 Thread MRAB

On 20/06/2013 17:37, Chris Angelico wrote:

On Fri, Jun 21, 2013 at 2:27 AM,   wrote:

And all these coding schemes have something in common:
they all work with a unique set of code points, more
precisely a unique set of encoded code points (not
the set of implemented code points (byte)).

Just what the flexible string representation is not
doing: it artificially divides Unicode into subsets and tries
to handle each subset differently.




UTF-16 divides Unicode into two subsets: BMP characters (encoded using
one 16-bit unit) and astral characters (encoded using two 16-bit units
in the D800::/5 netblock, or equivalent thereof). Your beloved narrow
builds are guilty of exactly the same crime as the hated 3.3.


UTF-8 divides Unicode into subsets which are encoded in 1, 2, 3, or 4
bytes, and those who previously used ASCII still need only 1 byte per
codepoint!
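
The byte counts are easy to confirm from the interpreter. The four sample characters below are arbitrary picks, one per length class:

```python
# One sample character from each UTF-8 length class (choices are arbitrary).
samples = {
    "A": 1,            # U+0041, ASCII
    "\u00e9": 2,       # U+00E9, Latin small e with acute
    "\u20ac": 3,       # U+20AC, euro sign
    "\U0001f600": 4,   # U+1F600, outside the BMP
}

for char, expected_bytes in samples.items():
    assert len(char.encode("utf-8")) == expected_bytes

# UTF-16 splits the same characters into one or two 16-bit units.
utf16_units = {c: len(c.encode("utf-16-le")) // 2 for c in samples}
```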



Re: Does upgrade from 2.7.3 to 2.7.5 require uninstall?

2013-06-20 Thread MRAB

On 20/06/2013 19:35, Wanderer wrote:

Do I need to uninstall Python 2.7.3 before installing Python 2.7.5?


No.



Re: Default Value

2013-06-21 Thread MRAB

On 21/06/2013 19:26, Rick Johnson wrote:

On Friday, June 21, 2013 12:47:56 PM UTC-5, Rotwang wrote:

It isn't clear to me from your posts what exactly you're
proposing as an alternative to the way Python's default
argument binding works. In your version of Python, what
exactly would happen when I passed a mutable argument as a
default value in a def statement? E.g. this:

 >>> a = [1, 2, 3]
 >>> a.append(a)
 >>> b = object()
 >>> def f(x = [None, b, [a, [4]]]):
 ...     pass # do something

What would you like to see the interpreter do in this case?


Ignoring that this is a completely contrived example that has
no use in the real world, here are three methods by
which i can handle this:


  The Benevolent Approach:

I could cast a "virtual net" over my poor lemmings before
they jump off the cliff by throwing an exception:

   Traceback (most recent screw-up last):
Line BLAH in SCRIPT
 def f(x = [None, b, [a, [4]]]):
   ArgumentError: No mutable default arguments allowed!


What about this:

def f(x=Foo()):
    pass # do something

Should it raise an exception? Only if a Foo instance is mutable? How do
you know whether such an instance is mutable?
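
For readers joining the thread late, the behaviour being argued over is that a default value is evaluated once, when the def statement runs, and the same object is then reused on every call. A minimal sketch with two made-up functions, the second showing the usual None-sentinel idiom:

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when 'def' runs, and then reused.
    bucket.append(item)
    return bucket

def append_fresh(item, bucket=None):
    # Common idiom: use None as a sentinel and build the list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

shared_first = list(append_shared(1))
shared_second = list(append_shared(2))   # still contains the 1 from before
fresh_first = append_fresh(1)
fresh_second = append_fresh(2)
```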



  The Apathetic Approach:

I could just assume that a programmer is responsible for the
code he writes. If he passes mutables into a function as
default arguments, and then mutates the mutable later, too
bad, he'll understand the value of writing solid code after
a few trips to exception Hell.


  The Malevolent Approach (disguised as beneva-loon-icy):

I could use early binding to confuse the hell out of him and
enjoy the laughs with all my ivory tower buddies as he falls
into fits of confusion and rage. Then enjoy again when he
reads the docs. Ahh, the gift that just keeps on giving!


How does the "Apathetic Approach" differ from the "Malevolent Approach"?



  Conclusion:

As you can probably guess the malevolent approach has some
nice fringe benefits.

You know, out of all these posts, not one of you guys has
presented a valid use-case that will give validity to the
existence of this PyWart -- at least not one that CANNOT be
reproduced by using my fine examples. All you can muster is
some weak argument about protecting the lemmings.

  Is anyone up to the challenge?
  Does anyone here have any real chops?

PS: I won't be holding my breath.


Speaking of which, on 11 January 2013, in the thread "PyWart: Import
resolution order", you were asked:

"""Got any demonstrable code for Python 4000 yet?"""

and you said:

"""I am working on it. Stay tuned. Rick is going to rock your little 
programming world /very/ soon."""


How soon is "/very/ soon" (clearly longer than 5 months), and how did
you fix this "PyWart"?



Re: Default Value

2013-06-21 Thread MRAB

On 21/06/2013 21:44, Rick Johnson wrote:

On Friday, June 21, 2013 2:25:49 PM UTC-5, MRAB wrote:

On 21/06/2013 19:26, Rick Johnson wrote:
> 
>   The Apathetic Approach:
> 
> I could just assume that a programmer is responsible for the
> code he writes. If he passes mutables into a function as
> default arguments, and then mutates the mutable later, too
> bad, he'll understand the value of writing solid code after
> a few trips to exception Hell.
> 
>   The Malevolent Approach (disguised as beneva-loon-icy):
> 
> I could use early binding to confuse the hell out of him and
> enjoy the laughs with all my ivory tower buddies as he falls
> into fits of confusion and rage. Then enjoy again when he
> reads the docs. Ahh, the gift that just keeps on giving!

How does the "Apathetic Approach" differ from the
"Malevolent Approach"?


In the apathetic approach i allow the programmer to be the
sole proprietor of his own misfortunes. He lives by the
sword, and thus, he can die by the sword.

Alternatively the malevolent approach injects misfortunes
for the programmer on the behalf of esoteric rules. In this
case he will live by the sword, and he could die by the sword,
or he could be unexpectedly blown to pieces by a supersonic
Howitzer shell.

It's an Explicit death versus an Implicit death; and Explicit
should ALWAYS win!

The only way to strike a reasonable balance between the
explicit death and implicit death is to throw up a warning:

  "INCOMING"

Which in Python would be the "MutableArgumentWarning".

*school-bell*


I notice that you've omitted any mention of how you'd know that the
argument was mutable.



Re: Default Value

2013-06-21 Thread MRAB

On 22/06/2013 00:51, Rick Johnson wrote:

On Friday, June 21, 2013 5:49:51 PM UTC-5, MRAB wrote:

I notice that you've omitted any mention of how you'd know that the
argument was mutable.


My argument has always been that mutables should not be
passed into subroutines as default arguments because bad
things can happen. And Python's excuse of saving the poor
dummies is no excuse.

It does not matter if we are passing the arguments into the
current implementation of "python functions which maintain
state of default mutables arguments between successive
calls" or in a more desirable system of truly "stateless
subroutines".

I also believe that a programmer should not be prevented
from passing mutable default arguments, but if he does, I'm
not going to provide any sort of protection -- other than
possibly throwing up a warning message.


So, having mutables as default arguments is a bad idea, but a
programmer should not be prevented from doing that, and a warning
message should be printed on such occasions.


Now, YOU, and everyone else, cannot destroy the main points
of my argument because the points are in fact rock solid,
however, what you will do is to focus in one small detail,
one little tiny (perceived) weakness in the armor, and you
will proceed to destroy that small detail (in this case how
i will determine mutability), and hope that the destruction
of this insignificant detail will start a chain-reaction
that will propagate out and bring down my entire position.


In order to print a warning, Python needs to know whether the object is
mutable, so it's an important detail.
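
To see why it is not a trivial detail, consider the most obvious heuristic, "hashable means immutable". The sketch below (Point and looks_immutable are hypothetical names invented for the demo) shows it giving the wrong answer for an ordinary user-defined class:

```python
class Point:
    """Hypothetical user-defined class; instances are hashable by default."""
    def __init__(self, x):
        self.x = x

def looks_immutable(obj):
    # Heuristic only: hashable is not the same thing as immutable.
    try:
        hash(obj)
        return True
    except TypeError:
        return False

p = Point(1)
guess_for_point = looks_immutable(p)      # True, yet...
p.x = 2                                   # ...the instance mutates fine

guess_for_list = looks_immutable([1, 2])          # False, correctly
guess_for_tuple = looks_immutable((1, [2]))       # False: tuple holding a list
```

Any warning built on a test like this would stay silent for def f(x=Point(1)) while the default remains freely mutable.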


So you want me to tell you how to query the mutability of an
object... Ha Ha Ha! Sorry, but that's not going to happen!


It's a detail that you're not going to help to solve.


Why should i help the developers of this language. What have
they done for me?


They've developed this language, and provided it for free. They've even
released the source code.

You perceive flaws that you say must be fixed, but you're not going to
help to fix them.


WOULD YOU OFFER ASSISTANCE TO PEOPLE THAT HAVE TREATED YOU THIS WAY?

And let's just be honest. You don't want my assistance. You
just want me to fumble the ball. Then you can use that
fumble as an excuse to write me off. Nice try!


I _do_ want you to help to improve the language, and I don't care if
you don't get it right first time. I didn't get it right first time
when I worked on the regex module (I think that what I have on PyPI is
my _third_ attempt!).


You want to gain my respect? Then start engaging in honest
debates. Start admitting that yes, some things about Python
are not only undesirable, they're just plain wrong.


Python isn't perfect, but then no language is perfect. There will
always be compromises, and the need to maintain backwards compatibility
means that we're stuck with some "mis-features", but I think it's still
worth using; I still much prefer it to other languages.


Stop calling me a troll when i am not. And not just me, stop
calling other people trolls too! Stop using the personal
attacks and straw man arguments.


???


Finally, get the core devs to realize that this list matters
and they need to participate (including you know who!)


Everyone is a volunteer. The core devs contribute by developing the
language, and whether they participate in this particular list is
entirely up to them; how they choose to spend _their own_ free time is,
again, entirely up to them.


Re: Default Value

2013-06-21 Thread MRAB

On 22/06/2013 02:40, Chris Angelico wrote:

On Sat, Jun 22, 2013 at 11:31 AM, Steven D'Aprano
 wrote:

Thinking about this, I think that the only safe thing to do in Rickython
4000 is to prohibit putting mutable objects inside tuples. Putting a list
or a dict inside a tuple is just a bug waiting to happen!


I think you're onto something here, but you really haven't gone far
enough. Mutable objects *anywhere* are a problem. The solution?
Abolish mutable objects. Strings (bytes and Unicode), integers,
decimals (floats are a problem to many people), tuples of the above,
and dictionaries mapping any of the above to any other of the above,
should be enough to do everything.


Pure functional languages don't have mutables, or even variables, but
then we're not talking about a pure functional language, we're talking
about Python.


Re: Default Value

2013-06-22 Thread MRAB

On 22/06/2013 03:32, Rick Johnson wrote:

On Friday, June 21, 2013 8:54:50 PM UTC-5, MRAB wrote:

On 22/06/2013 00:51, Rick Johnson wrote:
> On Friday, June 21, 2013 5:49:51 PM UTC-5, MRAB wrote:
> My argument has always been that mutables should not be
> passed into subroutines as default arguments because bad
> things can happen. [...] I also believe that a programmer
> should not be prevented from passing mutable default
> arguments [...]
So, having mutables as default arguments is a bad idea,
but a programmer should not be prevented from doing that,
and a warning message should be printed on such occasions.


Well i'll admit that does sound like a contradiction.
Basically i meant, programmers should be *discouraged* from
passing mutables as default arguments but not *prevented*.
Of course, utilizing a stateless subroutine like i suggest,
argument mutability would not matter.

Sometimes when you're passionate about something your
explanations become so verbose as to render your idea lost
in the noise. Obviously i made that mistake here :)


Yes, a more measured explanation tends to work better. :-)


In my last reply to Rotwang i explained the functionality i
seek to achieve in a set of three interactive examples.
Take a look at those and let me know what you think.


Hmm. Like they say, "The devil's in the details". As with the
mutability thing, I need to think about it some more. Sometimes it
seems straightforward, until you try to do it! :-)


> Why should i help the developers of this language. What have
> they done for me?

They've developed this language, and provided it for free.
They've even released the source code. You perceive flaws
that you say must be fixed, but you're not going to help
to fix them.


Agreed. And i am thankful for everyone's contributions. I
can be a bit harsh sometimes but my intention has always
been to improve Python.


I _do_ want you to help to improve the language, and I
don't care if you don't get it right first time. I didn't
get it right first time when I worked on the regex module
(I think that what I have on PyPI is my _third_ attempt!).


Well thanks for admitting you are not perfect. I know i am
not. We all had to start somewhere and anyone who believes
he knows everything is most assuredly a fool. Learning is
a perpetual process, same for software evolution.


> You want to gain my respect? Then start engaging in honest
> debates. Start admitting that yes, some things about Python
> are not only undesirable, they're just plain wrong.
Python isn't perfect, but then no language is perfect.
There will always be compromises, and the need to maintain
backwards compatibility means that we're stuck with some
"mis-features", but I think it's still worth using; I
still much prefer it to other languages.


I understand. We can't break backwards compatibility for
everything, even breaking it for some large flaws could
cause a fatal abandonment of the language by long time
users.

I just don't understand why i get so much hostility when i
present the flaws for discussion. Part of my intention is to
air the flaw, both for new users and old users, but a larger
intention is to discover the validity of my, or others,
possible solutions.


The problem is in _how_ you do it, namely, very confrontationally.
You call yourself "RantingRick". People don't like ranting!

Instead of saying "This is obviously a flaw, and you're a fool if you
don't agree", you should say "IMHO, this is a flaw, and this is how I
think it could be fixed". Then, if someone points out a problem in your
suggested fix, you can say "OK, I see your point, I'll try to see
whether I can think of a way around that". Etc.


And even if that solution involves a fork, that is not a bad
thing. Creating a new fork and then garnering an acceptance
of the new spinoff would lead to at worst, a waste of time
and a huge learning experience, or at best, an evolution of
the language.


> Stop calling me a troll when i am not. And not just me, stop
> calling other people trolls too! Stop using the personal
> attacks and straw man arguments.


Sorry. I failed to explain that this statement was meant not
directly for you but as a general statement to all members.
Sometimes i feel like my back is against the wall and i'm
fighting several foes at once. That can lead to me getting
defensive.





Re: n00b question on spacing

2013-06-22 Thread MRAB

On 23/06/2013 00:56, Dave Angel wrote:

On 06/22/2013 07:37 PM, Chris Angelico wrote:

On Sun, Jun 23, 2013 at 9:28 AM, Dave Angel  wrote:

On 06/22/2013 07:12 PM, Chris Angelico wrote:


On Sun, Jun 23, 2013 at 1:24 AM, Rick Johnson
 wrote:


_fmtstr = "Item wrote to MongoDB database {0}, {1}"
msg = _fmtstr.format(_arg1, _arg2)



As a general rule, I don't like separating format strings and their
arguments. That's one of the more annoying costs of i18n. Keep them in
a single expression if you possibly can.



On the contrary, i18n should be done with config files.  The format string


**as specified in the physical program**


is the key to the actual string which is located in the file/dict.
Otherwise you're shipping separate source files for each language -- blecch.


What I was trying to say is that the programmereze format string in the
code is replaced at runtime by the French format string in the config file.



The simplest way to translate is to localize the format string; that's
the point of .format()'s named argument system (since it lets you
localize in a way that reorders the placeholders). What that does is
it puts the format string away in a config file, while the replaceable
parts are here in the source. That's why I say that's a cost of i18n -
it's a penalty that has to be paid in order to move text strings away.




Certainly the reorderability of the format string is significant.  Not
only can it be reordered, but more than one instance of some of the
values is permissible if needed.  (What's missing is a decent handling
of such things as singular/plural, where you want a different version
per country of one (or a few) words from the format string, based on
whether a value is exactly 1.)


[snip]
One vs not-one isn't good enough. Some languages use the singular with
any numbers ending in '1'. Some languages have singular, dual, and
plural. Etc. It's surprising how inventive people can be! :-)
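
As a rough sketch of what that means in code, here are two plural-selection functions. The English rule is the familiar one; the three-form rule is the one commonly given for Russian in gettext catalogues, reproduced here as I understand it. The FILES_EN catalogue and describe_files helper are made up for the demo:

```python
def plural_index_english(n):
    # Two forms: singular for exactly 1, plural otherwise.
    return 0 if n == 1 else 1

def plural_index_russian(n):
    # Three forms, following the rule commonly used in gettext catalogues.
    if n % 10 == 1 and n % 100 != 11:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2

# Hypothetical catalogue: format strings keyed by plural index.
FILES_EN = ["{n} file", "{n} files"]

def describe_files(n):
    return FILES_EN[plural_index_english(n)].format(n=n)
```

With the three-form rule, 21 selects the same form as 1 while 11 does not, which is exactly the kind of case a plain "one vs not-one" test gets wrong.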



Re: Looking for a name for a deployment framework...

2013-06-24 Thread MRAB

On 24/06/2013 13:50, Roy Smith wrote:

In article <[email protected]>,
  [email protected] wrote:


Hi all,

Any suggestions for a good name, for a framework that does automatic server
deployments?

It's like Fabric, but more powerful.
It has some similarities with Puppet, Chef and Saltstack, but is written in
Python.

Key points are that it uses Python, but is still very declarative and
supports introspection. It supports parallel deployments, and interactivity.
And it has a nice commandline shell with autocompletion for traversing the
deployment tree.

The repository:
https://github.com/jonathanslenders/python-deployer/tree/refactoring-a-lot-v2


Suggestions welcome :)
Jonathan


Without forming any opinion on the software itself, the best advice I
can offer is that naming puns are very popular.  If you're thinking of
this as a fabric replacement, I would go with cloth, textile, material,
gabardine, etc.


Snakeskin? Oh, I see that's already taken. :-(


Re: [SPAM] Re: Default Value

2013-06-24 Thread MRAB

On 24/06/2013 15:22, Grant Edwards wrote:

On 2013-06-22, Ian Kelly  wrote:

On Fri, Jun 21, 2013 at 7:15 PM, Steven D'Aprano
 wrote:

On Fri, 21 Jun 2013 23:49:51 +0100, MRAB wrote:


On 21/06/2013 21:44, Rick Johnson wrote:

[...]

Which in Python would be the "MutableArgumentWarning".

*school-bell*


I notice that you've omitted any mention of how you'd know that the
argument was mutable.


That's easy. Just call ismutable(arg). The implementation of ismutable is
just an implementation detail, somebody else can work that out. A
language designer of the sheer genius of Rick can hardly be expected to
worry himself about such trivial details.


While we're at it, I would like to petition for a function
terminates(f, args) that I can use to determine whether a function
will terminate before I actually call it.


I think it should be terminate_time() -- so you can also find out how
long it's going to run.  It can return None if it's not going to
terminate...


Surely that should be float("inf")! Anything else would be ridiculous!
:-)


Re: Is this PEP-able? fwhile

2013-06-24 Thread MRAB

On 24/06/2013 23:35, Chris Angelico wrote:

On Tue, Jun 25, 2013 at 8:30 AM, Tim Chase
 wrote:

On 2013-06-25 07:38, Chris Angelico wrote:

Python has no issues with breaking out of loops, and even has
syntax specifically to complement it (the 'else:' clause). Use
break/continue when appropriate.


from minor_gripes import breaking_out_of_nested_loops_to_top_level


True. There are times I do wish for a 'goto'. But if goto were
implemented, I would also use it for jumping _into_ loops, and I'm not
sure that's going to make the feature popular :)


I think a better way would be to label the outer loop somehow and then
break out of it by name.
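
Python has no labelled break, so in the meantime the usual workarounds are to put the loops in a function and return, or to raise and catch a private exception. A sketch of both, with made-up names (find_pair, Found) and a toy grid:

```python
def find_pair(grid, target):
    """Return (row, col) of the first cell equal to target, else None."""
    for r, row in enumerate(grid):
        for c, value in enumerate(row):
            if value == target:
                return r, c      # leaves both loops at once
    return None


class Found(Exception):
    """Hypothetical exception used purely as a loop exit signal."""


def find_pair_with_exception(grid, target):
    position = None
    try:
        for r, row in enumerate(grid):
            for c, value in enumerate(row):
                if value == target:
                    position = (r, c)
                    raise Found
    except Found:
        pass
    return position


grid = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

The function-plus-return form is usually the easier one to read; the exception form helps when the loops cannot easily be moved into their own function.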


Re: Python development tools

2013-06-24 Thread MRAB

On 25/06/2013 03:24, rusi wrote:

On Tuesday, June 25, 2013 4:41:22 AM UTC+5:30, Ben Finney wrote:

rusi  writes:
> I don't, however, think that the two philosophies are the same. See
> http://www.tcl.tk/doc/scripting.html

That essay contrasts “scripting” versus “system programming”, a useful
(though terminologically confusing) distinction.

It's a mistake to think that essay contrasts “scripting” versus
“programming”. But the essay never justifies its aversion to
“programming” as a term for what it's describing, so that mistake is
easy to make.


The essay is 15 years old, so a bit dated. I referred to it as it conveys the
sense/philosophy of scripting.



> On Monday, June 24, 2013 11:50:38 AM UTC+5:30, Ben Finney wrote:
> > Any time someone has shown me a “Python script”, I don't see how
> > it's different from what I'd call a “Python program”. So I just
> > mentally replace “scripting” with “programming”.
>
> If you are saying that python spans the scripting to programming
> spectrum exceptionally well, I agree.

I'm saying that “scripting” is a complete subset of “programming”, so
it's nonsense to talk about “the scripting-to-programming spectrum”.

Scripting is, always, programming. Scripts are, always, programs. (But
not vice-versa; I do acknowledge there is more to programming than
scripting.) I say this because anything anyone has said to me about the
former is always something included already by the latter.

So I don't see much need for treating scripts as somehow distinct from
programs, or scripting as somehow distinct from programming. Whenever
you're doing the former, you're doing the latter by definition.



My personal associations with the word 'scripting'

- Cavalier attitude towards efficiency


And convenience for the programmer.

"""Manipulating long texts using variable-length strings? Yes, I know 
it's inefficient, but it's still faster than doing it by hand!"""



- No interest (and maybe some scorn) towards over-engineering (hence OOP)
- Heavy use of regular expressions, also sophistication of the command-line args
- A sense (maybe vague) of being glue more than computation, eg. a bash script 
is almost certain to invoke something other than builtins alone and is more 
likely to invoke a non-bash script than a bash script.  For a C program that 
likelihood is the other way round.  For python it could be either

Automating tasks, e.g. controlling other applications and stringing 
together tasks that you would otherwise be doing by hand.


--
http://mail.python.org/mailman/listinfo/python-list


Re: io module and pdf question

2013-06-25 Thread MRAB

On 25/06/2013 17:15, [email protected] wrote:

Thank you Rusi and Christian!

So it sounds like I should read the pdf data in as binary:


import os

pdfPath = '~/Desktop/test.pdf'

colorlistData = ''

with open(os.path.expanduser(pdfPath), 'rb') as f:
    for i in f:
        if 'XYZ:colorList' in i:
            colorlistData = i.split('XYZ:colorList')[1]
            break

print(colorlistData)


This gives me the error:
TypeError: Type str doesn't support the buffer API

I admit I know nothing about binary, except it's ones and zeroes.  Is there a way to read 
it in as binary, convert it to ascii/unicode, and then somehow split it by newline 
characters so that I can pull the appropriate metadata lines out?  For example, 
XYZ:colorList="DarkBlue,Yellow"


In Python 2, string literals like '' are by default bytestrings. If you
want a Unicode string you need to add the prefix u, so u''.

In Python 3, string literals like '' are by default Unicode. If you
want a bytestring you need to add the prefix b, so b''.

Python 2 was lax when mixing bytestrings with Unicode strings.

Python 3, on the other hand, insists that you know the difference: is
it text (Unicode) or binary data (bytestring)?
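
A minimal Python 3 sketch of what that means for the search above: look for a bytes marker in the bytes data, and decode only the piece you want. The sample bytes stand in for the contents of a real file opened with 'rb', and treating the metadata as ASCII is an assumption.

```python
# Hypothetical stand-in for the bytes read from the PDF with open(path, 'rb').
pdf_bytes = (b'%PDF-1.4\n<x:xmpmeta>\n'
             b'XYZ:colorList="DarkBlue,Yellow"\n</x:xmpmeta>\n')

marker = b'XYZ:colorList'   # a bytes literal, so it can be found in bytes
color_list_data = ''

for line in pdf_bytes.splitlines():
    if marker in line:
        # Split on bytes, then decode the wanted piece into text.
        raw = line.split(marker)[1]
        color_list_data = raw.decode('ascii')
        break

print(color_list_data)
```

Searching for a str such as 'XYZ:colorList' in a bytes line is what triggers the TypeError; the b prefix on the marker is the only change needed for the search itself.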


Thanks!

Jay

--


Most of the PDF objects are therefore not encoded. It is, however,
possible to include a PDF into another PDF and to encode it, but that's
a rare case. Therefore the metadata can usually be read in text mode.
However, to correctly find all objects, the xref-table indexes offsets
into the PDF. It must be treated binary in any case, and that's the
funny reason for the first 3 characters of the PDF - they must include
characters with the 8th bit set, such that FTP applications treat it as
binary.



Christian




--
http://mail.python.org/mailman/listinfo/python-list


Re: Parsing soap/xml result

2013-06-25 Thread MRAB

On 25/06/2013 23:28, miguel olivares varela wrote:


I try to parse a soap/xml answer like:

http://schemas.xmlsoap.org/soap/envelope/";
xmlns:xsd="http://www.w3.org/2001/XMLSchema";
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance";>

   http://schemas.xmlsoap.org/soap/encoding/";
xmlns:ns1="http://192.168.2.135:8490/gift-ws/services/SRV_GIFT_PKG";>
  http://schemas.xmlsoap.org/soap/encoding/";>
  xsi:type="xsd:string">0
  xsi:type="xsd:string">OK
xsi:type="xsd:string">
  
   





[snip]
The XML contains this:



That's the problem.
--
http://mail.python.org/mailman/listinfo/python-list


Re: re.finditer() skips unicode into selection

2013-06-26 Thread MRAB

On 26/06/2013 20:18, [email protected] wrote:

I am using the following Highlighter class for Spell Checking to work on my 
QTextEdit.

class Highlighter(QSyntaxHighlighter):


In Python 2.7, the re module has a somewhat limited idea of what a
"word" character is. It recognises 'DEVANAGARI LETTER NA' as a letter,
but 'DEVANAGARI VOWEL SIGN E' as a diacritic. The pattern ur'(?u)\w+'
will therefore split "नेपाली" into 3 parts.


    pattern = ur'\w+'
    def __init__(self, *args):
        QSyntaxHighlighter.__init__(self, *args)
        self.dict = None

    def setDict(self, dict):
        self.dict = dict

    def highlightBlock(self, text):
        if not self.dict:
            return
        text = unicode(text)
        format = QTextCharFormat()
        format.setUnderlineColor(Qt.red)
        format.setUnderlineStyle(QTextCharFormat.SpellCheckUnderline)


The LOCALE flag is for locale-sensitive 1-byte per character
bytestrings. It's rarely useful.

The UNICODE flag is for dealing with Unicode strings, which is what you
need here. You shouldn't be using both at the same time!


        unicode_pattern=re.compile(self.pattern,re.UNICODE|re.LOCALE)

        for word_object in unicode_pattern.finditer(text):
            if not self.dict.spell(word_object.group()):
                print word_object.group()
                self.setFormat(word_object.start(),
                               word_object.end() - word_object.start(), format)

But whenever I pass unicode values into my QTextEdit the re.finditer() does not 
seem to collect it.

When I pass "I am a नेपाली" into the QTextEdit. The output is like this:

 I I I a I am I am I am a I am a I am a I am a I am a I am a I am a I am a

It is completely ignoring the Unicode text. What might be the issue? I am new to
PyQt and regex. I'm using Python 2.7 and PyQt4.


There's an alternative regex implementation at:

http://pypi.python.org/pypi/regex

It's a drop-in replacement for the re module, but with a lot of
additions, including better handling of Unicode.
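
To see the splitting for yourself, here is a small sketch. It is written for Python 3, whose re module (as far as I can tell) treats the vowel signs the same way; the word is spelled with escapes only so the code points are unambiguous.

```python
import re
import unicodedata

# The word from the question, one escape per code point.
word = "\u0928\u0947\u092a\u093e\u0932\u0940"

# Letters are category 'Lo'; the dependent vowel signs are combining
# marks ('Mn' or 'Mc'), which \w does not match.
for ch in word:
    print(hex(ord(ch)), unicodedata.category(ch), unicodedata.name(ch))

# \w+ therefore stops at every vowel sign and returns the word in pieces.
pieces = re.findall(r'\w+', word)
print(pieces)
```

The pieces are the three consonant letters on their own, which is why the spell checker never sees the whole word.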
--
http://mail.python.org/mailman/listinfo/python-list


Re: Devnagari Unicode Conversion Issues

2013-06-27 Thread MRAB

On 27/06/2013 16:05, darpan6aya wrote:

How can i convert text of the following type

 नेपाली

into devnagari unicode in Python 2.7?


Is that a bytestring? In other words, is its type 'str'?

If so, you need to decode it. That particular string is UTF-8:

>>> print "नेपाली".decode("utf-8")
नेपाली
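
For anyone on Python 3, where bytes and text are separate types, the same step looks roughly like this. The round trip through encode() is only there to manufacture some UTF-8 bytes for the example.

```python
# The word as text, one escape per code point.
word = "\u0928\u0947\u092a\u093e\u0932\u0940"

# Encoding gives the UTF-8 bytes (three bytes per Devanagari code point).
raw = word.encode("utf-8")
print(len(word), len(raw))

# Decoding with the right codec recovers the text...
decoded = raw.decode("utf-8")

# ...while decoding with the wrong codec "works" but yields mojibake.
mojibake = raw.decode("latin-1")
print(decoded == word, mojibake == word)
```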

--
http://mail.python.org/mailman/listinfo/python-list


Re: Why is the argparse module so inflexible?

2013-06-29 Thread MRAB

On 29/06/2013 06:28, Steven D'Aprano wrote:

On Fri, 28 Jun 2013 18:36:37 -0700, Ethan Furman wrote:


On 06/27/2013 03:49 PM, Steven D'Aprano wrote:


[rant]
I think it is lousy design for a framework like argparse to raise a
custom ArgumentError in one part of the code, only to catch it
elsewhere and call sys.exit. At the very least, that OUGHT TO BE A
CONFIG OPTION, and OFF BY DEFAULT.


[emphasis added]


Libraries should not call sys.exit, or raise SystemExit. Whether to
quit or not is not the library's decision to make, that decision
belongs to the application layer. Yes, the application could always
catch SystemExit, but it shouldn't have to.


So a library that is explicitly designed to make command-line scripts
easier and friendlier should quit with a traceback?

Really?


Yes, really.


[snip]
+1

It's the job of argparse to parse the arguments. What should happen if
they're invalid is for its caller to decide.
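
One way to get that behaviour today, as a sketch rather than anything official: subclass ArgumentParser and override its error() method so that it raises instead of exiting. The exception and class names here are invented for the example.

```python
import argparse


class ArgumentParsingError(Exception):
    """Hypothetical exception raised instead of exiting."""


class RaisingArgumentParser(argparse.ArgumentParser):
    def error(self, message):
        # The stock implementation prints the usage and calls sys.exit(2);
        # raising leaves the decision to the caller.
        raise ArgumentParsingError(message)


parser = RaisingArgumentParser(prog='demo')
parser.add_argument('--count', type=int, default=1)

try:
    parser.parse_args(['--count', 'not-a-number'])
except ArgumentParsingError as exc:
    outcome = 'invalid arguments: %s' % exc
else:
    outcome = 'ok'

print(outcome)
```

The caller can now log the problem, fall back to defaults, or exit, whichever suits the application.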

--
http://mail.python.org/mailman/listinfo/python-list


Re: MeCab UTF-8 Decoding Problem

2013-06-29 Thread MRAB

On 29/06/2013 12:29, [email protected] wrote:

Hi,

I am trying to use a program called MeCab, which does syntax analysis on 
Japanese text. The problem I am having is that it returns a byte string and if 
I try to print it, it prints question marks for almost all characters. However, 
if I try to use .decode, it throws an error. Here is my code:

#!/usr/bin/python
# -*- coding:utf-8 -*-

import MeCab
tagger = MeCab.Tagger("-Owakati")


This is a bytestring. Are you sure it shouldn't be a Unicode string
instead, i.e. u'MeCabで遊んでみよう!'?


text = 'MeCabで遊んでみよう!'

result = tagger.parse(text)
print result

result = result.decode('utf-8')
print result

And here is the output:

MeCab �� �� ��んで�� �� ��う!

Traceback (most recent call last):
   File "test.py", line 11, in <module>
 result = result.decode('utf-8')
   File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
 return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 6-7: invalid 
continuation byte


--
(program exited with code: 1)
Press return to continue

Also my terminal is able to display Japanese characters properly. For example 
print '日本語' works perfectly fine.

Any ideas?



--
http://mail.python.org/mailman/listinfo/python-list


Re: math functions with non numeric args

2013-06-30 Thread MRAB

On 30/06/2013 19:53, Andrew Berg wrote:

On 2013.06.30 13:46, Andrew Z wrote:

Hello,

print max(-10, 10)
10
print max('-10', 10)
-10

My guess is that max converts the string to a number by decoding each of the
characters to its ASCII equivalent?

Where can i read more on exactly how the situations like these are dealt with?

This behavior is fixed in Python 3:


max('10', 10)

Traceback (most recent call last):
   File "", line 1, in 
TypeError: unorderable types: int() > str()

Python is strongly typed, so it shouldn't magically convert something from one 
type to another.
Explicit is better than implicit.


It doesn't magically convert anyway.

In Python 2, comparing objects of different types like that gives a
consistent but arbitrary result: in this case, bytestrings ('str') are
greater than integers ('int'):

>>> max('-10', 10)
'-10'
>>> max('10', -10)
'10'
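
In Python 3 the same call is refused, so you have to say which comparison you mean. A small sketch (the variable names are just for illustration):

```python
values = ['-10', 10]

try:
    largest = max(values)   # str versus int: not orderable in Python 3
except TypeError as exc:
    largest = None
    print('refused:', exc)

# Say what you mean: compare as numbers...
as_numbers = max(int(v) for v in values)

# ...or compare as text (character by character, '-' sorts before '1').
as_text = max(str(v) for v in values)

print(as_numbers, as_text)
```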

--
http://mail.python.org/mailman/listinfo/python-list


Re: socket data sending problem

2013-07-03 Thread MRAB

On 03/07/2013 23:38, [email protected] wrote:

im trying to do a simple socket test program for a school project using the 
socket module, but im having difficulty in sending data between the client and 
host program.

so far all tutorials and examples have used something along the lines of:

   s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   host = socket.gethostname()
   port = 12345
   s.connect((host, port))


and received it on the server end with:

   s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   host = ''
   port = 12345
   s.bind((host, port))
   s.listen(1)
   conn, addr = s.accept()
   print ('client is at', addr)
   data = conn.recv(5)
   print(data)

it all works fine, except for when i try to use:

   s.send("hello")

to send data between the client and server, i just get this error message:

   >>>
   Traceback (most recent call last):
     File "C:/Users/Ollie/Documents/code/chatroom/client3.py", line 9, in <module>
       s.send("hello")
   TypeError: 'str' does not support the buffer interface
   >>>

if anyone can either show me what im doing wrong, what this means and what's 
causing it, or even better how to fix it it would be greatly appreciated


You didn't say which version of Python you're using, but I think that
you're using Python 3.

A socket handles bytes, not Unicode strings, so you need to encode the
Unicode strings to bytes before sending, and decode the bytes to
Unicode strings after receiving.
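
A minimal sketch of the encode/decode round trip, assuming Python 3 and UTF-8 as the agreed encoding. socket.socketpair() is used only so the example runs on its own; in your programs the two ends would be the client's socket and the server's 'conn'.

```python
import socket

# Two already-connected sockets standing in for the client and server ends.
client, server = socket.socketpair()
try:
    message = "hello"
    client.sendall(message.encode("utf-8"))   # text -> bytes before sending
    data = server.recv(1024)                  # bytes arrive at the other end
    received = data.decode("utf-8")           # bytes -> text after receiving
finally:
    client.close()
    server.close()

print(received)
```

Both ends have to agree on the encoding; UTF-8 is a reasonable default but it is a choice, not something the socket knows about.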

--
http://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte

2013-07-04 Thread MRAB

On 04/07/2013 11:38, Νίκος wrote:

On 4/7/2013 12:50 PM, Ulrich Eckhardt wrote:

On 04.07.2013 10:37, Νίκος wrote:

I just started to have this error without changing nothing


Well, undo the nothing that you didn't change. ;)


UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 0:
invalid start byte
[Thu Jul 04 11:35:14 2013] [error] [client 108.162.229.97] Premature end
of script headers: metrites.py

Why cant it decode the starting byte? what starting byte is that?


It's the 0xb6 but it's expecting the starting byte of a UTF-8 sequence.
Please do some research on UTF-8, that should clear it up. You could
also search for common causes of that error.


So you are also suggesting that what gethostbyaddr() returns is not
UTF-8 encoded either?

What character is 0xb6 anyway?


Well, it's from a bytestring, so you'll have to specify what encoding
you're using! (It clearly isn't UTF-8.)

If it's ISO-8859-7 (what you've previously referred to as "greek-iso"),
then:

>>> import unicodedata
>>> unicodedata.name(b"\xb6".decode("ISO-8859-7"))
'GREEK CAPITAL LETTER ALPHA WITH TONOS'

You'll need to find out where that bytestring is coming from.
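
Here is a short Python 3 sketch of why the UTF-8 codec rejects that byte while ISO-8859-7 accepts it. Nothing in it comes from your script; it only inspects the single byte from the error message.

```python
import unicodedata

raw = b"\xb6"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    print(exc.reason)

# 0xb6 is 0b10110110. The top two bits are '10', the pattern UTF-8
# reserves for continuation bytes, so it can never start a sequence.
is_continuation = (raw[0] & 0b11000000) == 0b10000000

# In a single-byte encoding every byte value is a candidate character.
name = unicodedata.name(raw.decode("ISO-8859-7"))
print(utf8_ok, is_continuation, name)
```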

--
http://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte

2013-07-04 Thread MRAB

On 04/07/2013 12:29, Νίκος wrote:

On 4/7/2013 1:54 PM, Chris Angelico wrote:

On Thu, Jul 4, 2013 at 8:38 PM, Νίκος wrote:

So you are also suggesting that what gethostbyaddr() returns is not UTF-8
encoded either?

What character is 0xb6 anyway?


It isn't. It's a byte. Bytes are not characters.

http://www.joelonsoftware.com/articles/Unicode.html


Well, in the case of UTF-8 encoding, for the first 127 codepoints we can safely
say that a character equals a byte :)


Equals? No. Bytes are not characters. (Strictly speaking, they're
codepoints, not characters.)

And anyway, it's the first _128_ codepoints.
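
A quick way to check the count in Python 3 (a sketch; the variable names are arbitrary):

```python
# Code points 0..127 each encode to exactly one byte in UTF-8.
one_byte_count = sum(
    1 for cp in range(128) if len(chr(cp).encode("utf-8")) == 1
)

# From code point 128 upwards, more than one byte is needed.
bytes_for_128 = len(chr(128).encode("utf-8"))

# GREEK CAPITAL LETTER ALPHA WITH TONOS, the letter discussed earlier.
bytes_for_alpha_tonos = len("\u0386".encode("utf-8"))

print(one_byte_count, bytes_for_128, bytes_for_alpha_tonos)
```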
--
http://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte

2013-07-04 Thread MRAB

On 04/07/2013 12:36, Νίκος wrote:

On 4/7/2013 2:06 PM, MRAB wrote:

[snip]


Right.
But nowhere in my script (metrites.py) do I use an 'Ά', so I really have no
clue where this is coming from.

And you are right: if it were a byte that came from a UTF-8 encoding scheme,
then it would be automatically decoded.

The only thing I can say for sure is that this problem appears only when I
CloudFlare my domain "superhost.gr".

If I un-CloudFlare it, it ceases to display errors.

Can you tell me how to write the following properly:

host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0] or 'UnResolved'

so that even if the function fails, "UnResolved" is returned?
Somehow I need to capture the error.

Or does it not have to, because the 'or' operand will be returned?


If gethostbyaddr fails, it raises socket.gaierror, (which, from Python
3.3 onwards, is a subclass of OSError), so try catching that, setting
'host' to 'UnResolved' if it's raised.

Also, try printing out ascii(os.environ['REMOTE_ADDR']).

--
http://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte

2013-07-04 Thread MRAB

On 04/07/2013 13:52, Νίκος wrote:

On 4/7/2013 3:07 PM, MRAB wrote:

Also, try printing out ascii(os.environ['REMOTE_ADDR']).


'108.162.229.97' is the result of:

print( ascii(os.environ['REMOTE_ADDR']) )

Seems perfectly valid, and it also has a PTR record, so that leaves us
clueless about the internal server error.


For me, socket.gethostbyaddr('108.162.229.97') raises socket.herror,
which is also a subclass of OSError from Python 3.3 onwards.

--
http://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte

2013-07-04 Thread MRAB

On 04/07/2013 13:47, Νίκος wrote:

On 4/7/2013 3:07 PM, MRAB wrote:

[snip]



I have followed your suggestion by trying this:

try:
    host = socket.gethostbyaddr( os.environ['REMOTE_ADDR'] )[0]
except socket.gaierror:
    host = "UnResolved"

and then re-CloudFlared the "superhost.gr" domain.

http://superhost.gr/ gives internal server error.


Try catching OSError instead. (As I said, from Python 3.3,
socket.gaierror is a subclass of it.)

--
http://mail.python.org/mailman/listinfo/python-list


Re: Important features for editors

2013-07-04 Thread MRAB

On 04/07/2013 14:22, Tim Chase wrote:

On 2013-07-04 05:02, Dave Angel wrote:
[snip an excellent list of things to look for in an editor]

Also,

- the ability to perform changes in bulk, especially across files.
   Often, this is done with the ability to record/playback macros,
   though some editors have multiple insertion/edit cursors; others
   allow for performing a bulk-change command across the entire file
   or list of files.

- folding (the ability to collapse multiple lines of text down to one
   line).  Especially if there are various ways to do it (manual
   folding, language-block folding, folding by indentation)

- multiple clipboard buffers/registers

- multiple bookmarks

- the ability to interact with external programs (piping a portion of
   a file through an external utility)

- a good community around it in case you have questions

- easy navigation to "important" things in your file (where
   "important" may vary based on file-type, but may include function
   definitions, paragraph boundaries, matching
   paren/bracket/brace/tag, etc)

Other nice-to-haves include

- split window editing
- tabbed windows
- Unicode support (including various encodings)


It's 2013, yet Unicode support is merely a "nice-to-have"?


- vimgolf.com ;-)


Candidates?
emacs  - standard on most OS's, available for Windows from


And I'll put in a plug for Vim.



--
http://mail.python.org/mailman/listinfo/python-list


Re: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb6 in position 0: invalid start byte

2013-07-04 Thread MRAB

On 04/07/2013 14:38, Νίκος Γκρ33κ wrote:

On 4/7/2013 4:34 PM, MRAB wrote:

[snip]



At least CloudFlare doesn't give me issues:

if i try this:

try:
    host = os.environ['REMOTE_ADDR'][0]
except socket.gaierror:
    host = "UnResolved"


It's pointless trying to catch a socket exception here because you're
not using a socket, you're just getting a string from an environment
variable.


then I get no errors and a valid IP back,

but the above fails.

I don't know how to catch the exception with OSError.

I know only these two:

except socket.gaierror:
except socket.herror:

Both fail.


What do you mean "I don't know how to catch the exception with
OSError"? You've tried "except socket.gaierror" and "except
socket.herror", well just write "except OSError" instead!
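
Spelled out, it could look something like the sketch below (Python 3.3 or later). The 'lookup' parameter and the failing_lookup helper are invented so the example can be run without doing any DNS; in the CGI script you would simply call resolve_host(os.environ['REMOTE_ADDR']).

```python
import socket


def resolve_host(address, lookup=socket.gethostbyaddr):
    """Return the host name for address, or 'UnResolved' if lookup fails.

    'lookup' is a hypothetical hook so the function can be tried offline.
    """
    try:
        return lookup(address)[0]
    except OSError:
        # From Python 3.3 this also covers socket.herror and socket.gaierror.
        return 'UnResolved'


def failing_lookup(address):
    # Simulates the failure reported for this address.
    raise socket.herror(1, 'Unknown host')


print(resolve_host('108.162.229.97', lookup=failing_lookup))
```

Note that "... or 'UnResolved'" on its own cannot help here: the exception is raised before the 'or' is ever evaluated.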

--
http://mail.python.org/mailman/listinfo/python-list


Re: How to make this faster

2013-07-05 Thread MRAB

On 05/07/2013 16:17, Helmut Jarausch wrote:

On Fri, 05 Jul 2013 15:45:25 +0100, Oscar Benjamin wrote:


Presumably then you're now down to the innermost loop as a bottle-neck:

  Possibilities= 0
  for d in range(1,10) :
    if Row_Digits[r,d] or Col_Digits[c,d] or Sqr_Digits[Sq_No,d] : continue
    Possibilities+= 1

If you make it so that e.g. Row_Digits[r] is a set of indices rather
than a list of bools then you can do this with something like

Possibilities = len(Row_Digits[r] | Col_Digits[c] | Sqr_Digits[Sq_No])

or perhaps

Possibilities = len(set.union(Row_Digits[r], Col_Digits[c],
Sqr_Digits[Sq_No]))

which I would expect to be a little faster than looping over range
since the loop is then performed under the hood by the builtin
set-type.
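
To make the comparison concrete, here is a small sketch with made-up sets for a single empty cell. Note that the union counts the digits already taken, so the number of candidates is 9 minus its size.

```python
# Hypothetical state for one empty cell: digits already used in its
# row, column and 3x3 square.
row_used = {1, 4, 7}
col_used = {2, 4, 9}
sqr_used = {1, 3}

# Loop version: test every digit in turn.
possibilities_loop = 0
for d in range(1, 10):
    if d in row_used or d in col_used or d in sqr_used:
        continue
    possibilities_loop += 1

# Set version: one union, done inside the built-in set type.
used = row_used | col_used | sqr_used
possibilities_set = 9 - len(used)

print(possibilities_loop, possibilities_set)
```

Whether the set version is actually faster for a given solver is something to measure, for example with timeit, rather than assume.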

It just takes practice.


indeed


It's a little less obvious in Python than in
low-level languages where the bottlenecks will be and which operations
are faster/slower but optimisation always involves a certain amount of
trial and error anyway.


Oscar


I've tried the following version

def find_good_cell() :
    Best= None
    minPoss= 10
    for r,c in Grid :
        if  Grid[(r,c)] > 0 : continue
        Sq_No= (r//3)*3+c//3
        Possibilities= 9-len(Row_Digits[r] | Col_Digits[c] | Sqr_Digits[Sq_No])
        if ( Possibilities < minPoss ) :
            minPoss= Possibilities
            Best= (r,c)

    if minPoss == 0 : Best=(-1,-1)
    return Best

All_digits= set((1,2,3,4,5,6,7,8,9))

def Solve(R_Cells) :
    if  R_Cells == 0 :
        print("\n\n++ S o l u t i o n ++\n")
        Print_Grid()
        return True

    r,c= find_good_cell()
    if r < 0 : return False
    Sq_No= (r//3)*3+c//3

    for d in All_digits - (Row_Digits[r] | Col_Digits[c] | Sqr_Digits[Sq_No]) :
        # put d into Grid
        Grid[(r,c)]= d
        Row_Digits[r].add(d)
        Col_Digits[c].add(d)
        Sqr_Digits[Sq_No].add(d)

        Success= Solve(R_Cells-1)

        # remove d again
        Grid[(r,c)]= 0
        Row_Digits[r].remove(d)
        Col_Digits[c].remove(d)
        Sqr_Digits[Sq_No].remove(d)

        if Success :
            Zuege.append((d,r,c))
            return True

    return False

which turns out to be as fast as the previous "dictionary only version".
Probably,  set.remove is a bit slow


For comparison, here's my solution:

from collections import Counter

problem = '''
_
_3_85
__1_2
___5_7___
__4___1__
_9___
5__73
__2_1
4___9
'''

# Build the grid.
digits = "123456789"

grid = []

for row in problem.splitlines():
    if not row:
        continue

    new_row = []

    for cell in row:
        if cell.isdigit():
            new_row.append({cell})
        else:
            new_row.append(set(digits))

    grid.append(new_row)

# Solve the grid.
changed = True
while changed:
    changed = False

    # Look for cells that contain only one digit.
    for r in range(9):
        for c in range(9):
            if len(grid[r][c]) == 1:
                digit = list(grid[r][c])[0]

                # Remove from other cells in same row.
                for c2 in range(9):
                    if c2 != c and digit in grid[r][c2]:
                        grid[r][c2].remove(digit)
                        changed = True

                # Remove from other cells in same column.
                for r2 in range(9):
                    if r2 != r and digit in grid[r2][c]:
                        grid[r2][c].remove(digit)
                        changed = True

                # Remove from other cells in the same block of 9.
                start_row = r - r % 3
                start_column = c - c % 3
                for r2 in range(start_row, start_row + 3):
                    for c2 in range(start_column, start_column + 3):
                        if (r2, c2) != (r, c) and digit in grid[r2][c2]:
                            grid[r2][c2].remove(digit)
                            changed = True

    # Look for digits that occur in only one cell in a row.
    for r in range(9):
        counts = Counter()
        for c in range(9):
            counts += Counter(grid[r][c])

        unique = {digit for digit, times in counts.items() if times == 1}

        for c in range(9):
            if len(grid[r][c]) > 1 and len(grid[r][c] & unique) == 1:
                grid[r][c] &= unique
                changed = True

    # Look for digits that occur in only one cell in a column.
    for c in range(9):
        counts = Counter()
        for r in range(9):
            counts += Counter(grid[r][c])

        unique = {digit for digit, times in counts.items() if times == 1}

        for r in range(9):
            if len(grid[r][c]) > 1 and len(grid[r][c] & unique) == 1:
                grid[r][c] &= unique
                changed = True

    # Look for digits that occur in only one cell in a block of 9.
    for start_row in range(0, 9, 3):
        for start_column in range(0, 9, 3):
            counts = Counter()
            for r in range(start_row, start_row + 3):
                for c in range(start_column, start_column + 3):
                    counts += Counter(grid[r][c])

            unique = {digit for digit, times in counts.items() if times == 1}

            for r in range(start_row, start_row + 3):
                for c in range(start_column, start_column + 3):
                    if len(grid[r][c]) > 1 and len(grid[r][c] & unique) == 1:
                        grid[r][c] &= unique

Re: hex dump w/ or w/out utf-8 chars

2013-07-08 Thread MRAB

On 08/07/2013 21:56, Dave Angel wrote:

On 07/08/2013 01:53 PM, [email protected] wrote:

Hi Steven,

thank you for your reply... I really needed another python guru which
is also an English teacher! Sorry if English is not my mother tongue...
"uncorrect" instead of "incorrect" (I misapplied the "similarity
principle" like "unpleasant...>...uncorrect").

Apart from these trifles, you said:

All characters are UTF-8, characters. "a" is a UTF-8 character. So is "ă".

Not using python 3, for me (a programmer which was present at the beginning of
computer science, badly interacting with many languages from assembler to
Fortran and from c to Pascal and so on) it was an hard job to arrange the
abrupt transition from characters only equal to bytes to some special
characters defined with 2, 3 bytes and even more.


Characters do not have a width.

[snip]

It depends what you mean by "width"! :-)

Try this (Python 3):

>>> print("A\N{FULLWIDTH LATIN CAPITAL LETTER A}")
AＡ

--
http://mail.python.org/mailman/listinfo/python-list


Re: hex dump w/ or w/out utf-8 chars

2013-07-08 Thread MRAB

On 08/07/2013 23:02, Joshua Landau wrote:

On 8 July 2013 22:38, MRAB  wrote:

On 08/07/2013 21:56, Dave Angel wrote:

Characters do not have a width.


[snip]

It depends what you mean by "width"! :-)

Try this (Python 3):


print("A\N{FULLWIDTH LATIN CAPITAL LETTER A}")

AＡ


Serious question: How would one find the width of a character by that
definition?


>>> import unicodedata
>>> unicodedata.east_asian_width("A")
'Na'
>>> unicodedata.east_asian_width("\N{FULLWIDTH LATIN CAPITAL LETTER A}")
'F'

The possible widths are:

N  = Neutral
A  = Ambiguous
H  = Halfwidth
W  = Wide
F  = Fullwidth
Na = Narrow

All you then need to do is find out what those actually mean...
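
As a very rough sketch of one common reading, assuming a terminal that draws Wide and Fullwidth characters in two columns and everything else in one (it ignores combining marks and the Ambiguous class, and display_width is a made-up name):

```python
import unicodedata


def display_width(text):
    """Rough column count: 'W' and 'F' characters take two columns."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


print(display_width("A"))
print(display_width("A\N{FULLWIDTH LATIN CAPITAL LETTER A}"))
```

Real terminals and fonts differ, so treat the result as an estimate rather than a guarantee.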

--
http://mail.python.org/mailman/listinfo/python-list


Re: GeoIP2 for retrieving city and region ?

2013-07-12 Thread MRAB

On 12/07/2013 17:32, Νικόλας wrote:


I know I have asked before, but what I get is the ISP's city, not the visitor's
precise city.

GeoLiteCity.dat isn't accurate; that's why it comes for free.
I must somehow get access to GeoIPCity.dat, which is the full version.

And of course it can be done; I don't want to believe that it can't.

When visiting http://www.geoiptool.com/en/__ip_info/ it pinpoints my
_exact_ city of living, not the ISP's.


Have you considered that your ISP might be in the same city as you?

According to geoiptool, my ISP is near Leeds, UK, but the important
point is that _I'm not_.


It did not even ask me to allow a GeoIP JavaScript to run; it presents
it instantly.

So, it certainly is possible if only one can find the correct database
to use.

So, my question now is, if there is some way we can get an accurate Geo
City database.



--
http://mail.python.org/mailman/listinfo/python-list


Re: RE Module Performance

2013-07-12 Thread MRAB

On 12/07/2013 23:16, Tim Delaney wrote:

On 13 July 2013 03:58, Devyn Collier Johnson wrote:


Thanks for the thorough response. I learned a lot. You should write
articles on Python.
I plan to spend some time optimizing the re.py module for Unix
systems. I would love to amp up my programs that use that module.


If you are finding that regular expressions are taking too much time,
have a look at the https://pypi.python.org/pypi/re2/ and
https://pypi.python.org/pypi/regex/2013-06-26 modules to see if they
already give you enough of a speedup.


FYI, you're better off going to http://pypi.python.org/pypi/regex
because that will take you to the latest version.


Re: what thread-synch mech to use for clean exit from a thread

2013-07-15 Thread MRAB

On 15/07/2013 04:04, Steven D'Aprano wrote:

On Mon, 15 Jul 2013 10:27:45 +0800, Gildor Oronar wrote:


A currency exchange thread updates the exchange rate once a minute. If the
thread has failed to update the currency rate for 5 hours, it should inform
main() for a clean exit. This has to be done gracefully, because main()
could be doing something delicate.

I, a newbie, read about all the thread sync tools, and wasn't sure which one to
use. In fact I am not sure if there is a need for thread sync, because
there is no race condition. I thought of this naive way:

class CurrencyExchange():
def __init__(in_case_callback):
   this.callback = in_case_callback


You need to declare the instance parameter, which is conventionally
called "self" not "this". Also, your class needs to inherit from Thread,
and critically it MUST call the superclass __init__.

So:

class CurrencyExchange(threading.Thread):
    def __init__(self, in_case_callback):
        super(CurrencyExchange, self).__init__()
        self.callback = in_case_callback

But I'm not sure that a callback is the right approach here. See below.



def __run__():


Likewise, you need a "self" parameter.



   while time.time() - self.rate_timestamp < 5*3600:
  ... # update exchange rate
  if success:


The "==" in this line should, of course, be "=":


 self.rate_timestamp == time.time()
  time.sleep(60)
   this.callback() # rate not updated 5 hours, a crisis


I think that a cleaner way is to just set a flag on the thread instance.
Initiate it with:

 self.updates_seen = True

in the __init__ method, and then add this after the while loop:

 self.updates_seen = False




def main():
def callback()
   Go_On = False


I don't believe this callback will work, because it will simply create a
local variable called "Go_On", not change the non-local variable.

In Python 3, you can use the nonlocal keyword to get what you want, but I
think a better approach is with a flag on the thread.


agio = CurrencyExchange(in_case = callback)
agio.start()

Go_On = True
while Go_On:
   do_something_delicate(rate_supplied_by=agio)


Change to:

 while agio.updates_seen:
 do_something_delicate...
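Putting the pieces of this thread together, a minimal runnable sketch of the flag approach might look like the following. Note that threading.Thread calls a method named run(), not __run__(). The fetch_rate callable, the max_stale and interval parameters, and the run_until_stale helper are assumptions added so the example can be tried with short timings; they are not from the original post.

```python
import threading
import time


class CurrencyExchange(threading.Thread):
    """Background updater that drops a flag once updates have stalled too long."""

    def __init__(self, fetch_rate, max_stale=5 * 3600, interval=60):
        super().__init__(daemon=True)
        self.fetch_rate = fetch_rate      # hypothetical callable: a rate, or None on failure
        self.max_stale = max_stale        # seconds without an update before giving up
        self.interval = interval          # seconds between update attempts
        self.rate = None
        self.rate_timestamp = time.time()
        self.updates_seen = True

    def run(self):
        while time.time() - self.rate_timestamp < self.max_stale:
            rate = self.fetch_rate()
            if rate is not None:
                self.rate = rate
                self.rate_timestamp = time.time()   # assignment, not comparison
            time.sleep(self.interval)
        self.updates_seen = False         # tell the main thread to stop


def run_until_stale(agio, work, poll=0.01):
    """Main-loop sketch: keep working while the updater still sees updates."""
    agio.start()
    rounds = 0
    while agio.updates_seen:
        work(agio)
        rounds += 1
        time.sleep(poll)
    return rounds
```

The main thread only ever reads the flag and the worker only ever writes it, which is why no lock is used here. A threading.Event would be another reasonable choice if the main loop needs to block while waiting.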






Re: Question regarding building Python Windows installer

2013-07-15 Thread MRAB

On 15/07/2013 14:11, Mcadams, Philip W wrote:

I’m attempting to create a Python 64-bit Windows Installer.  Following
the instructions here: http://docs.python.org/2/distutils/builtdist.html
I’m to navigate to my Python folder and use the command:

python setup.py build --plat-name=win-amd64 bdist_wininst

I get error: COMPILED_WITH_PYDEBUG = ('--with-pydebug' in
sysconfig.get_config_var("CONFIG_ARGS"))

TypeError: argument of type ‘NoneType’ is not iterable

I also have tried:

setup.py build --plat-name=win-amd64 bdist_wininst

and get error:

File “setup.py”, line 263
Print “%-*s %-*s %-*s” % (longest, e, longet, f,

SyntaxError: invalid syntax


Does the line really start with "Print" (initial capital letter)?

Also, are you using Python 2 or Python 3? From the link above it looks
like Python 2.


I followed the instructions here:
http://docs.python.org/devguide/setup.html to create a PC build for
Windows which allows me to run a Python prompt.  Now I need to create a
Windows Installer to install this Python on a Windows Server 2008 R2 box.

To explain why I’m attempting to do this instead of just using the
Windows Installer provided by Python:

I needed to modify the _ssl.c file in the Python source code to deal with a
Mercurial issue that I’m trying to resolve.

Any help on why I’m hitting these errors would be appreciated.
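The first traceback is consistent with sysconfig.get_config_var("CONFIG_ARGS") returning None, which is what get_config_var does for a variable that isn't defined (CONFIG_ARGS comes from ./configure, so it is typically absent on Windows builds). A small sketch of a guard follows; compiled_with_pydebug is a hypothetical helper name for illustration, not something in setup.py:

```python
import sysconfig


def compiled_with_pydebug():
    """True if the interpreter was configured with --with-pydebug."""
    # get_config_var() returns None when the variable is not defined,
    # and "x in None" raises TypeError, so fall back to an empty string.
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    return "--with-pydebug" in config_args


print(compiled_with_pydebug())
```

This only explains the TypeError; it doesn't by itself make the top-level setup.py suitable for building a Windows installer.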





Re: help on python regular expression named group

2013-07-16 Thread MRAB

On 16/07/2013 11:18, Mohan L wrote:




On Tue, Jul 16, 2013 at 2:12 PM, Joshua Landau wrote:

On 16 July 2013 07:55, Mohan L wrote:
 >
 > Dear All,
 >
 > Here is my script :
 >
 > #!/usr/bin/python
 > import re
 >
 > # A string.
 > logs = "date=2012-11-28 time=21:14:59"
 >
 > # Match with named groups.
 > m = re.match("(?P<datetime>(date=(?P<date>[^\s]+))\s+(time=(?P<time>[^\s]+)))",
 >              logs)
 >
 > # print
 > print m.groupdict()
 >
 > Output:
 > 
 >
 > {'date': '2012-11-28', 'datetime': 'date=2012-11-28 time=21:14:59', 'time': '21:14:59'}
 >
 >
 > Required output :
 > ==
 >
 > {'date': '2012-11-28', 'datetime': '2012-11-28 21:14:59', 'time':
 > '21:14:59'}
 >
 > need help to correct the below regex
 >
 > "(?P<datetime>(date=(?P<date>[^\s]+))\s+(time=(?P<time>[^\s]+)))"
 >
 > so that It will have : 'datetime': '2012-11-28 21:14:59' instead of
 > 'datetime': 'date=2012-11-28 time=21:14:59'
 >
 > any help would be greatly appreciated

Why do you need to do this in a single Regex? Can't you just
" ".join(..) the date and time?


I am using a third-party Python script. It takes the regex from a
configuration file. I can't write any code. I have to do all this in
a single regex.


A capture group captures a single substring.

What you're asking is for it to capture 2 substrings (the date and
the time) and then join them together, or capture 1 substring and then
remove part of it.

I don't know of _any_ regex implementation that lets you do that.
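For comparison, this is roughly what it looks like when the joining can be done in code after the match, which is what Joshua suggested. It's only a sketch of the idea (it doesn't help if the third-party script really accepts nothing but a regex), and the simplified pattern below is mine, not the one from the configuration file:

```python
import re

logs = "date=2012-11-28 time=21:14:59"

# Each named group captures one contiguous substring of the input.
pattern = r"date=(?P<date>\S+)\s+time=(?P<time>\S+)"
m = re.match(pattern, logs)

fields = m.groupdict()
# The combined value is built afterwards, outside the regex.
fields["datetime"] = "{date} {time}".format(**fields)
print(fields)
```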



Re: Converting a list of lists to a single list

2013-07-23 Thread MRAB

On 23/07/2013 22:52, [email protected] wrote:

I think that itertools may be able to do what I want but I have not been able 
to figure out how.

I want to convert an arbitrary number of lists with an arbitrary number of 
elements in each list into a single list as follows.

Say I have three lists:

[[A0,A1,A2], [B0,B1,B2], [C0,C1,C2]]

I would like to convert those to a single list that looks like this:

[A0,B0,C0,C1,C2,B1,C0,C1,C2,B2,C0,C1,C2,A1,B0,C0,C1,C2,B1,C0,C1,C2,B2,C0,C1,C2,A2,B0,C0,C1,C2,B1,C0,C1,C2,B2,C0,C1,C2]

An easier way to visualize the pattern I want is as a tree.

A0
    B0
        C0
        C1
        C2
    B1
        C0
        C1
        C2
    B2
        C0
        C1
        C2
A1
    B0
        C0
        C1
        C2
    B1
        C0
        C1
        C2
    B2
        C0
        C1
        C2
A2
    B0
        C0
        C1
        C2
    B1
        C0
        C1
        C2
    B2
        C0
        C1
        C2


Using recursion:

def tree_list(items):
    if len(items) == 1:
        return items[0]

    sublist = tree_list(items[1 : ])

    result = []

    for item in items[0]:
        result.append(item)
        result.extend(sublist)

    return result

items = [["A0","A1","A2"], ["B0","B1","B2"], ["C0","C1","C2"]]
print(tree_list(items))
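If recursion depth is a concern, the same flattening can be built from the last list backwards. This is just a sketch of an equivalent loop; tree_list_iter is a made-up name, and unlike the recursive version it returns an empty list when given no lists at all.

```python
def tree_list_iter(items):
    """Flatten the 'tree' pattern iteratively, starting from the innermost list."""
    result = []
    for level in reversed(items):
        sublist = result          # everything built so far hangs under each item
        result = []
        for item in level:
            result.append(item)
            result.extend(sublist)
    return result


small = tree_list_iter([["A0", "A1"], ["B0", "B1"]])
full = tree_list_iter([["A0", "A1", "A2"], ["B0", "B1", "B2"], ["C0", "C1", "C2"]])
print(small)
print(len(full))
```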



Re: Python Script Hashplings

2013-07-25 Thread MRAB

On 25/07/2013 14:42, Devyn Collier Johnson wrote:

If I execute a Python3 script with this hashpling (#!/usr/bin/python3.3)
and Python3.3 is not installed, but Python3.2 is installed, would the
script still work? Would it fall back to Python3.2?


Why don't you try it?


I hope Dihedral is listening. I would like to see another response from HIM.





Re: Python Script Hashplings

2013-07-26 Thread MRAB

On 26/07/2013 11:43, Chris Angelico wrote:

On Fri, Jul 26, 2013 at 11:37 AM, Devyn Collier Johnson
 wrote:


On 07/25/2013 09:54 AM, MRAB wrote:


On 25/07/2013 14:42, Devyn Collier Johnson wrote:


If I execute a Python3 script with this hashpling (#!/usr/bin/python3.3)
and Python3.3 is not installed, but Python3.2 is installed, would the
script still work? Would it fall back to Python3.2?


Why don't you try it?


I hope Dihedral is listening. I would like to see another response from
HIM.




Good point, but if it falls back to Python3.2, how would I know? Plus, I
have Python3.3, 3.2, and 2.7 installed. I cannot uninstall them due to
dependencies.


Easy:

#!/usr/bin/python3.3
import sys
print(sys.version)

Now run that on lots of different computers (virtual computers work
well for this).


There's also sys.version_info:

>>> import sys
>>> sys.version_info
sys.version_info(major=3, minor=3, micro=2, releaselevel='final', serial=0)

If you want to test what would happen if that version wasn't installed,
set the shebang line to a future version, such as Python 3.4. I doubt
you have that installed! :-)
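As far as I know, the shebang line names one exact interpreter path, so a missing /usr/bin/python3.3 makes the script fail to start rather than fall back to python3.2. If a script has to cope with several versions, one option is to check sys.version_info itself. A small sketch (the helper name meets_minimum is made up for illustration):

```python
import sys


def meets_minimum(required, actual=None):
    """Return True if the (major, minor) part of actual is at least required."""
    if actual is None:
        actual = sys.version_info
    return tuple(actual[:2]) >= tuple(required)


running_ok = meets_minimum((3, 3))
print(sys.version_info[:3], running_ok)
```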



Re: RE Module Performance

2013-07-28 Thread MRAB

On 28/07/2013 19:13, [email protected] wrote:

Le dimanche 28 juillet 2013 05:53:22 UTC+2, Ian a écrit :

On Sat, Jul 27, 2013 at 12:21 PM,   wrote:

> Back to utf. utfs are not only elements of a unique set of encoded
> code points. They have an interesting feature. Each "utf chunk"
> holds intrinsically the character (in fact the code point) it is
> supposed to represent. In utf-32, the obvious case, it is just
> the code point. In utf-8, that's the first chunk which helps and
> utf-16 is a mixed case (utf-8 / utf-32). In other words, in an
> implementation using bytes, for any pointer position it is always
> possible to find the corresponding encoded code point and from this
> the corresponding character without any "programmed" information. See
> my editor example, how to find the char under the caret? In fact,
> a silly example, how can the caret be positioned or moved, if
> the underlying corresponding encoded code point can not be
> discerned!



Yes, given a pointer location into a utf-8 or utf-16 string, it is
easy to determine the identity of the code point at that location.
But this is not often a useful operation, save for resynchronization
in the case that the string data is corrupted.  The caret of an editor
does not conceptually correspond to a pointer location, but to a
character index.  Given a particular character index (e.g. 127504), an
editor must be able to determine the identity and/or the memory
location of the character at that index, and for UTF-8 and UTF-16
without an auxiliary data structure that is an O(n) operation.



> 2) Take a look at this. Get rid of the overhead.
>
> >>> sys.getsizeof('b'*100 + 'c')
> 126
> >>> sys.getsizeof('b'*100 + '€')
> 240
>
> What does it mean? It means that Python has to
> reencode a str every time it is necessary because
> it works with multiple codings.



Large strings in practical usage do not need to be resized like this
often.  Python 3.3 has been in production use for months now, and you
still have yet to produce any real-world application code that
demonstrates a performance regression.  If there is no real-world
regression, then there is no problem.



> 3) Unicode compliance. We know retrospectively, latin-1
> was a bad choice. Unusable for 17 European languages.
> Believe it or not. 20 years of Unicode incubation is not
> long enough to learn it. When discussing once with a French
> Python core dev, one with commit access, he did not know one
> can not use latin-1 for the French language!



Probably because for many French strings, one can.  As far as I am
aware, the only characters that are missing from Latin-1 are the Euro
sign (an unfortunate victim of history), the ligature œ (I have no
doubt that many users just type oe anyway), and the rare capital Ÿ
(the minuscule version is present in Latin-1).  All French strings
that are fortunate enough to be absent these characters can be
represented in Latin-1 and so will have a 1-byte width in the FSR.


--

latin-1? that's not even true.

>>> sys.getsizeof('a')
26
>>> sys.getsizeof('ü')
38
>>> sys.getsizeof('aa')
27
>>> sys.getsizeof('aü')
39



>>> sys.getsizeof('aa') - sys.getsizeof('a')
1

One byte per codepoint.

>>> sys.getsizeof('üü') - sys.getsizeof('ü')
1

Also one byte per codepoint.

>>> sys.getsizeof('ü') - sys.getsizeof('a')
12

Clearly there's more going on here.

FSR is an optimisation. You'll always be able to find some
circumstances where an optimisation makes things worse, but what
matters is the overall result.
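To separate the fixed per-string overhead from the per-character cost, one can measure how much a long string grows when one more copy of a character is added. This is a sketch that assumes CPython 3.3 or later (sys.getsizeof reports implementation details, so other interpreters will differ); bytes_per_char is a name made up for this example.

```python
import sys


def bytes_per_char(ch, n=100):
    """Extra bytes needed for one more copy of ch in a string of n copies."""
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


widths = {
    "ascii": bytes_per_char("a"),
    "latin-1": bytes_per_char("ü"),
    "bmp": bytes_per_char("€"),
    "astral": bytes_per_char("\U00010010"),
}
print(widths)
```

On such a build this gives 1, 1, 2 and 4 bytes respectively. The 12-byte jump between 'a' and 'ü' would then be a difference in the fixed header of the string object, not in the width of the characters.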



Re: FSR and unicode compliance - was Re: RE Module Performance

2013-07-28 Thread MRAB

On 28/07/2013 20:23, [email protected] wrote:
[snip]


Compare these (a BDFL example, where I'm using a non-ascii char)

Py 3.2 (narrow build)


Why are you using a narrow build of Python 3.2? It doesn't treat all
codepoints equally (those outside the BMP can't be stored in one code
unit) and, therefore, it isn't "Unicode compliant"!
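To make the "one code unit" point concrete: a narrow build stores strings as UTF-16 code units, so a codepoint outside the BMP takes two of them (a surrogate pair). The sketch below counts UTF-16 code units by encoding, which works on any Python 3; utf16_code_units is a helper name invented for this example.

```python
def utf16_code_units(ch):
    """Number of 16-bit code units needed to store ch as UTF-16."""
    return len(ch.encode("utf-16-le")) // 2


euro_units = utf16_code_units("€")
astral_units = utf16_code_units("\U00010010")
print(euro_units, astral_units, len("\U00010010"))
```

On Python 3.3 and later len("\U00010010") is 1, whereas a narrow 3.2 build would report 2 for the same string.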


>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.09897159682121348
>>> timeit.timeit("a = 'hundre€'; 'x' in a")
0.09079501961732461
>>> sys.getsizeof('d')
32
>>> sys.getsizeof('€')
32
>>> sys.getsizeof('dd')
34
>>> sys.getsizeof('d€')
34


Py3.3


>>> timeit.timeit("a = 'hundred'; 'x' in a")
0.12183182740848858
>>> timeit.timeit("a = 'hundre€'; 'x' in a")
0.2365732969632326
>>> sys.getsizeof('d')
26
>>> sys.getsizeof('€')
40
>>> sys.getsizeof('dd')
27
>>> sys.getsizeof('d€')
42

Tell me which one seems to be more "unicode compliant"?
The goal of Unicode is to handle every char "equally".

Now, the problem: memory. Do not forget that an "FSR"-like
mechanism is *irrelevant* for a non-ascii user. As
soon as one uses one single non-ascii char, your ascii feature
is lost. (That's why we have all these dedicated coding
schemes, utfs included).


>>> sys.getsizeof('abc' * 1000 + 'z')
3026
>>> sys.getsizeof('abc' * 1000 + '\U00010010')
12044

A bit secret. The larger a repertoire of characters
is, the more bits you need.
Secret #2. You can not escape from this.


jmf





Re: Unexpected results comparing float to Fraction

2013-07-29 Thread MRAB

On 29/07/2013 16:43, Steven D'Aprano wrote:

Comparing floats to Fractions gives unexpected results:

# Python 3.3
py> from fractions import Fraction
py> 1/3 == Fraction(1, 3)
False

but:

py> 1/3 == float(Fraction(1, 3))
True


I expected that float-to-Fraction comparisons would convert the Fraction
to a float, but apparently they do the opposite: they convert the float
to a Fraction:

py> Fraction(1/3)
Fraction(6004799503160661, 18014398509481984)


Am I the only one who is surprised by this? Is there a general rule for
which way numeric coercions should go when doing such comparisons?


I'm surprised that Fraction(1/3) != Fraction(1, 3); after all, floats
are approximate anyway, and the float value 1/3 is more likely to be
Fraction(1, 3) than Fraction(6004799503160661, 18014398509481984).
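For reference, the big denominator is not arbitrary: the float nearest to 1/3 is an exact binary fraction with denominator 2**54, and Fraction(float) converts that value exactly. A short check using only the standard library:

```python
from fractions import Fraction

# The exact rational value stored in the float 1/3.
numerator, denominator = (1 / 3).as_integer_ratio()

exact = Fraction(1 / 3)
is_power_of_two = denominator == 2 ** 54
differs = exact != Fraction(1, 3)
print(numerator, denominator, is_power_of_two, differs)
```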


Re: Unexpected results comparing float to Fraction

2013-07-29 Thread MRAB

On 29/07/2013 17:20, Chris Angelico wrote:

On Mon, Jul 29, 2013 at 5:09 PM, MRAB  wrote:

I'm surprised that Fraction(1/3) != Fraction(1, 3); after all, floats
are approximate anyway, and the float value 1/3 is more likely to be
Fraction(1, 3) than Fraction(6004799503160661, 18014398509481984).


At what point should it become Fraction(1, 3)?


When the error drops below a certain threshold.


>>> Fraction(0.3)
Fraction(5404319552844595, 18014398509481984)
>>> Fraction(0.33)
Fraction(5944751508129055, 18014398509481984)
>>> Fraction(0.333)
Fraction(5998794703657501, 18014398509481984)
>>> Fraction(0.3333333)
Fraction(6004798902680711, 18014398509481984)
>>> Fraction(0.3333333333)
Fraction(6004799502560181, 18014398509481984)
>>> Fraction(0.3333333333333)
Fraction(6004799503160061, 18014398509481984)
>>> Fraction(0.3333333333333333)
Fraction(6004799503160661, 18014398509481984)

Rounding off like that is a job for a cool library function (one of
which was mentioned on this list a little while ago, I believe), but
not IMO for the Fraction constructor.
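One such function already lives on Fraction itself: limit_denominator() returns the closest fraction whose denominator does not exceed a given bound. Whether it is the function Chris had in mind is a guess on my part; the bound of 1000 below is an arbitrary choice for the example.

```python
from fractions import Fraction

exact = Fraction(1 / 3)                     # the float's exact value
rounded = exact.limit_denominator(1000)     # nearest fraction with denominator <= 1000
tenth = Fraction(0.1).limit_denominator(1000)
print(exact, rounded, tenth)
```

The caller picks the threshold explicitly, which keeps the Fraction constructor itself exact.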





Re: Unexpected results comparing float to Fraction

2013-07-29 Thread MRAB

On 29/07/2013 17:40, Ian Kelly wrote:

On Mon, Jul 29, 2013 at 10:20 AM, Chris Angelico  wrote:

On Mon, Jul 29, 2013 at 5:09 PM, MRAB  wrote:

I'm surprised that Fraction(1/3) != Fraction(1, 3); after all, floats
are approximate anyway, and the float value 1/3 is more likely to be
Fraction(1, 3) than Fraction(6004799503160661, 18014398509481984).


At what point should it become Fraction(1, 3)?


At the point where the float is exactly equal to the value you get
from the floating-point division 1/3.  If it's some other float then
the user didn't get there by entering 1/3, so it's not worth trying to
pretend that they did.


I thought that you're not meant to check for equality when using floats.


We do a similar rounding when formatting floats to strings, but in
that case one only has to worry about divisors that are powers of 10.
I imagine it's going to take more time to find the correct fraction
when any pair of relatively prime integers can be a candidate
numerator and denominator.  Additionally, the string rounding only
occurs when the float is being formatted for display; we certainly
don't do it as the result of numeric operations where it could result
in loss of precision.
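A small illustration of that difference, using only the standard library: repr() picks a short decimal string that round-trips to the same float, and Fraction accepts such a string and interprets it as a decimal. So going through the string recovers 1/10 from 0.1, but cannot recover 1/3, because the denominator of a decimal string is always a power of 10.

```python
from fractions import Fraction

round_trips = float(repr(1 / 3)) == 1 / 3        # the string gives back the same float

via_string = Fraction(repr(0.1))                 # Fraction('0.1')
via_float = Fraction(0.1)                        # exact binary value of the float

third_via_string = Fraction(repr(1 / 3))         # 0.3333333333333333 as a decimal
print(round_trips, via_string, via_float, third_via_string)
```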





Re: Bitwise Operations

2013-07-29 Thread MRAB

On 30/07/2013 00:34, Devyn Collier Johnson wrote:


On 07/29/2013 05:53 PM, Grant Edwards wrote:

On 2013-07-29, Devyn Collier Johnson  wrote:


On Python3, how can I perform bitwise operations? For instance, I want
something that will 'and', 'or', and 'xor' a binary integer.

http://www.google.com/search?q=python+bitwise+operations


I understand the symbols. I want to know how to perform the task in a
script or terminal. I have searched Google, but I never saw a command.
Typing "101 & 010" or "x = (int(101, 2) & int(010, 2))" only gives errors.


In Python 2, an integer with a leading 0, such as 0101, was octal (base
8). This was a feature borrowed from C but often confused newbies
because it looked like decimal ("Why does 0101 == 101 return False?").

In Python 3, octal is indicated by a leading 0o, such as 0o101 (==
1*64+0*8+1==65) and the old style raises an exception so that those who
have switched from Python 2 will get a clear message that something has
changed.

For binary you need a leading 0b and for hexadecimal you need a leading
0x, so doing something similar for octal makes sense.

0b101 == 1*4+0*2+1 == 5
0o101 == 1*64+0*8+1 == 65
0x101 == 1*256+0*16+1 == 257
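With those literals, the bitwise operators work directly on integers. A short sketch of what the original attempt was probably aiming for; note that int() with a base needs a string, which is why int(101, 2) fails:

```python
a = 0b101          # binary literal for 5
b = 0b010          # binary literal for 2

and_result = a & b     # bits set in both
or_result = a | b      # bits set in either
xor_result = a ^ b     # bits set in exactly one

# Parsing binary text: the first argument must be a string.
parsed = int("101", 2) & int("011", 2)

print(bin(and_result), bin(or_result), bin(xor_result), bin(parsed))
```

bin() converts back to a binary string for display; format(value, '03b') does the same with zero padding and no 0b prefix.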



Re: RE Module Performance

2013-07-30 Thread MRAB

On 30/07/2013 15:38, Antoon Pardon wrote:

Op 30-07-13 16:01, [email protected] schreef:


I am pretty sure that once you have typed your 127504 ascii
characters, you are very happy the buffer of your editor does not
waste time in reencoding the buffer as soon as you enter an €, the
125505th char. Sorry, I wanted to say z instead of euro, just to
show that backspacing the last char and reentering a new char
implies twice a reencoding.


Using a single string as an editor buffer is a bad idea in python for
the simple reason that strings are immutable.


Using a single string as an editor buffer is a bad idea in _any_
language because an insertion would require all the following
characters to be moved.


So adding characters would mean continuously copying the string
buffer into a new string with the next character added. Copying
127504 characters into a new string will not make that much of a
difference whether the octets are just copied to octets or are
unpacked into 32 bit words.


Somebody wrote "FSR" is just an optimization. Yes, but in the case of
an editor à la FSR, this optimization takes place every time you
enter a char. Your poor editor, in fact the FSR, is finally
spending its time in optimizing and finally it optimizes nothing.
(It is even worse).


Even if you would do it this way, it would *not* take place every
time you enter a char. Once your buffer would contain a wide
character, it would just need to convert the single character that is
added after each keystroke. It would not need to convert the whole
buffer after each key stroke.


If you correctly type a z instead of an €, it is not necessary to
reencode the buffer. The problem: how do you know that you do not have
to reencode? Simple, just check it, but checking wastes time testing
whether you have to optimize or not, and hurts a little bit more
what is supposed to be an optimization.


Your scenario is totally unrealistic. First of all because of the
immutable nature of python strings, second because you suggest that
real time usage would result in frequent conversions which is highly
unlikely.


What you would have is a list of mutable chunks.

Inserting into a chunk would be fast, and a chunk would be split if
it's already full. Also, small adjacent chunks would be joined together.

Finally, a chunk could use FSR to reduce memory usage.
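A very rough sketch of the chunk idea, to show where the saving comes from: an insertion only moves the characters inside one small chunk. The class name, the max_chunk parameter and the split-in-half policy are all assumptions made for illustration; joining small neighbouring chunks and any per-chunk compact storage are left out.

```python
class ChunkedBuffer:
    """Text buffer stored as a list of small mutable chunks (lists of characters)."""

    def __init__(self, text="", max_chunk=64):
        self.max_chunk = max_chunk
        self.chunks = [list(text[i:i + max_chunk])
                       for i in range(0, len(text), max_chunk)] or [[]]

    def _locate(self, index):
        """Return (chunk number, offset inside that chunk) for a character index."""
        for n, chunk in enumerate(self.chunks):
            if index <= len(chunk):
                return n, index
            index -= len(chunk)
        raise IndexError("position past end of buffer")

    def insert(self, index, ch):
        n, offset = self._locate(index)
        chunk = self.chunks[n]
        chunk.insert(offset, ch)            # only this chunk's characters move
        if len(chunk) > self.max_chunk:     # split a chunk that has overflowed
            half = len(chunk) // 2
            self.chunks[n:n + 1] = [chunk[:half], chunk[half:]]

    def text(self):
        return "".join("".join(chunk) for chunk in self.chunks)


buf = ChunkedBuffer("hello world", max_chunk=4)
buf.insert(5, ",")
print(buf.text(), len(buf.chunks))
```

Locating the chunk is still a linear scan over the chunk list here; a real editor would likely keep an index or a tree over the chunks.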


Re: RE Module Performance

2013-07-30 Thread MRAB

On 30/07/2013 17:39, Antoon Pardon wrote:

Op 30-07-13 18:13, MRAB schreef:

On 30/07/2013 15:38, Antoon Pardon wrote:

Op 30-07-13 16:01, [email protected] schreef:


I am pretty sure that once you have typed your 127504 ascii
characters, you are very happy the buffer of your editor does not
waste time in reencoding the buffer as soon as you enter an €, the
125505th char. Sorry, I wanted to say z instead of euro, just to
show that backspacing the last char and reentering a new char
implies twice a reencoding.


Using a single string as an editor buffer is a bad idea in python for
the simple reason that strings are immutable.


Using a single string as an editor buffer is a bad idea in _any_
language because an insertion would require all the following
characters to be moved.


Not if you use a gap buffer.


The disadvantage there is that when you move the cursor you must move
characters around. For example, what if the cursor was at the start and
you wanted to move it to the end? Also, when the gap has been filled,
you need to make a new one.
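For anyone who hasn't met the structure, here is a minimal gap buffer sketch showing both costs mentioned above: moving the cursor shifts characters across the gap one at a time, and a filled gap has to be reopened. The names and the fixed gap_size are assumptions for the example, and move_cursor does no bounds checking.

```python
class GapBuffer:
    """Characters live in one list with an unused 'gap' at the cursor position."""

    def __init__(self, text="", gap_size=16):
        self.gap_size = gap_size
        self.data = list(text) + [None] * gap_size
        self.gap_start = len(text)          # cursor position
        self.gap_end = len(self.data)       # first character after the gap

    def move_cursor(self, pos):
        """Move the gap to pos; cost is proportional to the distance moved."""
        while self.gap_start > pos:         # shift characters to after the gap
            self.gap_start -= 1
            self.gap_end -= 1
            self.data[self.gap_end] = self.data[self.gap_start]
        while self.gap_start < pos:         # shift characters to before the gap
            self.data[self.gap_start] = self.data[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def insert(self, ch):
        if self.gap_start == self.gap_end:  # gap is full: open a new one here
            self.data[self.gap_start:self.gap_start] = [None] * self.gap_size
            self.gap_end += self.gap_size
        self.data[self.gap_start] = ch
        self.gap_start += 1

    def text(self):
        return "".join(self.data[:self.gap_start] + self.data[self.gap_end:])


gb = GapBuffer("helo", gap_size=2)
gb.move_cursor(3)
gb.insert("l")
print(gb.text())
```

Typing at the cursor is cheap; the long jump from the start to the end of a big buffer is the expensive case.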



Re: Conditional decoration

2012-06-18 Thread MRAB

On 18/06/2012 23:16, Roy Smith wrote:

Is there any way to conditionally apply a decorator to a function?
For example, in django, I want to be able to control, via a run-time
config flag, if a view gets decorated with @login_required().

@login_required()
def my_view(request):
    pass


A decorator is just syntactic sugar for function application after the
definition.

This:

@deco
def func():
    pass

is just another way of writing:

def func():
    pass
func = deco(func)

Not as neat, but you can make it conditional.
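If the decorator syntax is still wanted, one way to keep it is a small helper that returns either the real decorator or a do-nothing one. This is only a sketch: login_required below is a stand-in written for the example (not Django's), and conditional and REQUIRE_LOGIN are made-up names.

```python
import functools


def login_required(func):
    """Hypothetical stand-in for a real decorator such as Django's."""
    @functools.wraps(func)
    def wrapper(request):
        if not request.get("user"):
            return "redirect to login"
        return func(request)
    return wrapper


def conditional(decorator, enabled):
    """Return decorator if enabled, otherwise a decorator that changes nothing."""
    return decorator if enabled else (lambda func: func)


REQUIRE_LOGIN = True   # hypothetical run-time config flag


@conditional(login_required, REQUIRE_LOGIN)
def my_view(request):
    return "secret page"


@conditional(login_required, False)
def open_view(request):
    return "public page"
```

Note that the flag is read once, when the function is defined, so changing it later has no effect on views that have already been decorated.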


Re: re.finditer with lookahead and lookbehind

2012-06-20 Thread MRAB

On 20/06/2012 14:30, Christian wrote:

Hi,

I'm having some trouble splitting a string like s. I even have
problems with the first and last match. Is it a greediness problem?

Thanks in advance
Christian

import re

s='v1=pattern1&v2=pattern2&v3=pattern3&v4=pattern4&v5=pattern5&x1=patternx'
pattern =r'(?=[a-z0-9]+=)(.*?)(?<=&)'
regex = re.compile(pattern,re.IGNORECASE)
for match in regex.finditer(s):
    print  match.group(1)

My intention:
pattern1
pattern2
pattern3
pattern4
pattern5
patternx


You could do it like this:

import re

s = 'v1=pattern1&v2=pattern2&v3=pattern3&v4=pattern4&v5=pattern5&x1=patternx'

pattern = r'=([^&]*)'
regex = re.compile(pattern, re.IGNORECASE)
for match in regex.finditer(s):
    print match.group(1)

or avoid regex entirely:

>>> values = [p.partition("=")[2] for p in s.split("&")]
>>> values
['pattern1', 'pattern2', 'pattern3', 'pattern4', 'pattern5', 'patternx']
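If s is actually a URL query string (an assumption on my part, it just looks like one), the standard library can do the splitting and will also handle percent-escapes. The sketch below uses the Python 3 location of the function; in Python 2 it lives in the urlparse module.

```python
from urllib.parse import parse_qsl

s = 'v1=pattern1&v2=pattern2&v3=pattern3&v4=pattern4&v5=pattern5&x1=patternx'

pairs = parse_qsl(s)                      # list of (name, value) tuples, in order
values = [value for _, value in pairs]
print(values)
```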