Introducing pickleDB; a simple, lightweight, and fast key-value database.
Hello,

I have recently started work on a new project called pickleDB. It is a lightweight key-value database engine (inspired by Redis).

Check it out at http://packages.python.org/pickleDB

--
Harrison Erd
--
http://mail.python.org/mailman/listinfo/python-list
SSE4a with ctypes in python? (gcc __builtin_popcount)
Hi guys,
Here is the sample code:
http://stackoverflow.com/questions/6389841/efficiently-find-binary-strings-with-low-hamming-distance-in-large-set/6390606#6390606
static inline int distance(unsigned x, unsigned y)
{
return __builtin_popcount(x^y);
}
Is it possible to rewrite the above gcc code in Python using ctypes
(preferably Win/*nix compatible)?
TIA!
Re: SSE4a with ctypes in python? (gcc __builtin_popcount)
On 31.10.2011 04:13, est wrote:
> Is it possible to rewrite the above gcc code in python using ctypes
> (preferably Win/*nix compatible)?

No; the (gcc-injected) functions starting with __builtin_* are not "real" functions in the sense that they can be called by calling into a library; rather, the compiler converts them directly into a series of assembler instructions.

Wrapping this (distance) primitive in a C module for Python, thus exposing the respective gcc-generated assembler code to Python through a module, won't yield any relevant speedup either, because most of the time will be spent in the call sequence for calling the function, not in the actual computation.

--
--- Heiko.
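Since the builtin cannot be reached through ctypes, and call overhead would dominate a C wrapper anyway, the portable route is to compute the same value in plain Python. A minimal sketch: the 32-bit mask is an assumption that mirrors C's `unsigned` arguments (and keeps negative Python ints from being counted by magnitude). On Python 3.10+ `((x ^ y) & 0xFFFFFFFF).bit_count()` does the counting directly.

```python
def distance(x, y):
    """Hamming distance of two 32-bit values, like __builtin_popcount(x ^ y)."""
    # Mask to 32 bits so the result matches C's unsigned arithmetic,
    # then count the '1' digits in the binary representation.
    return bin((x ^ y) & 0xFFFFFFFF).count("1")
```

For example, `distance(0b1011, 0b0001)` is 2, because the two values differ in exactly two bit positions.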
Re: Introducing pickleDB; a simple, lightweight, and fast key-value database.
patx wrote:
> Hello I have recently started work on a new project called pickleDB. It is
> a lightweight key-value database engine (inspired by redis).
>
> Check it out at http://packages.python.org/pickleDB
>
> import json as pickle # ;)
>
> def load(location):
>     global db
>     try:
>         db = pickle.load(open(location, 'rb'))
>     except IOError:
>         db = {}
>     global loco
>     loco = location
>     return True
>
> def set(key, value):
>     db[key] = value
>     pickle.dump(db, open(loco, 'wb'))
>     return True
>
> def get(key):
>     return db[key]
Hmm, I don't think that will scale...
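The scaling problem is that every `set()` re-serializes the entire dict and rewrites the whole file, so each write costs time proportional to the size of the database. For comparison, here is a minimal sketch using the standard library's `shelve` module instead (a swapped-in technique, not pickleDB's design): it stores each pickled value under its own key in a dbm file, so an update generally touches one entry rather than the whole store. How efficient that is on disk depends on which dbm backend your platform provides; `shelve_roundtrip` is a hypothetical helper name.

```python
import shelve


def shelve_roundtrip(path):
    """Store two keys, close the shelf, reopen it, and read them back."""
    with shelve.open(path) as db:   # creates the dbm file(s) if missing
        db["answer"] = 42           # each assignment pickles just this value
        db["name"] = "pickleDB"
    with shelve.open(path) as db:   # the data survives close/reopen
        return db["answer"], db["name"]
```

Shelf keys must be strings; values can be anything picklable.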
Re: ttk Listbox
Hi,

http://www.tkdocs.com/tutorial/index.html

Remember that you have to import it like this:

from tkinter import ttk

("from tkinter import *" does not include ttk.)
Re: How to mix-in __getattr__ after the fact?
Thanks for all of the responses; everyone was exactly correct, and obeying the binding rules for special methods did work in the example above. Unfortunately, I only have read-only access to the class itself (it was a VTK class wrapped with SWIG), so I had to find another way to accomplish what I was after.

On Oct 28, 10:26 pm, Lie Ryan wrote:
> On 10/29/2011 05:20 AM, Ethan Furman wrote:
> > Python only looks up __xxx__ methods in new-style classes on the class
> > itself, not on the instances.
> >
> > So this works:
> >
> > 8<
> > class Cow(object):
> >     pass
> >
> > def attrgetter(self, a):
> >     print "CAUGHT: Attempting to get attribute", a
> >
> > bessie = Cow()
> >
> > Cow.__getattr__ = attrgetter
> >
> > print bessie.milk
> > 8<
>
> a minor modification might be useful:
>
> bessie = Cow()
> bessie.__class__.__getattr__ = attrgetter
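A Python 3 rendering of Ethan's example (every class is new-style there, and print is a function) that also checks the counter-case: assigning `__getattr__` on the instance has no effect, because the hook is looked up on the type. The getter returns a string instead of printing so the behaviour is easy to verify; `instance_hook_worked` and `class_hook_result` are illustrative names.

```python
class Cow:
    pass


def attrgetter(self, name):
    # Called only after normal attribute lookup has failed.
    return "CAUGHT: Attempting to get attribute %s" % name


bessie = Cow()

# Setting the hook on the instance does NOT work: special methods
# are looked up on type(bessie), not in the instance dict.
bessie.__getattr__ = attrgetter
try:
    bessie.milk
    instance_hook_worked = True
except AttributeError:
    instance_hook_worked = False

# Setting it on the class (or via bessie.__class__) does work.
Cow.__getattr__ = attrgetter
class_hook_result = bessie.milk
```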
locate executables for different platforms
Suppose that I have a project which (should be)/is multiplatform in Python, which, however, uses some executables as black boxes.

These executables are platform-dependent, and at the moment they're just thrown inside the same egg, with pkg_resources used to get the path.

I would like to rewrite this so that it can:
- detect the OS
- find the right executable version
- get the path and run it

It would be nice to still be able to use pkg_resources, but at that point I think I would need to store all the executables in another egg; is that correct?
Is there already something available to manage external multi-platform executables?

Thanks,
Andrea
Re: ttk Listbox
On 10/31/11 12:37 AM, Ric@rdo wrote:
> What would be an equivalent widget in ttk like a Listbox and if
> possible a small example? I tried to look here
> http://docs.python.org/library/ttk.html but did not see anything.
>
> Maybe I did not look in the right place?
>
> tia

The listbox isn't part of the themed ttk widgets. The ttk::treeview is, and that can be set up as a single-column list display. There may be an example of how to do this in the docs or source code tree (I don't use the widget myself, so I don't have any sample code to share).

--Kevin
--
Kevin Walzer
Code by Kevin
http://www.codebykevin.com
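A minimal sketch of Kevin's suggestion, assuming Python 3's `tkinter.ttk` (the module was called `ttk` in Python 2). `make_single_column_list` is a hypothetical helper: `show="tree"` displays only the tree column, with no heading row, and `selectmode="browse"` allows one selection at a time, like a default Listbox.

```python
def make_single_column_list(parent, items):
    """Return a ttk.Treeview set up as a one-column, Listbox-like display."""
    # Imported here so the sketch can be loaded without creating a window.
    from tkinter import ttk

    tree = ttk.Treeview(parent, show="tree", selectmode="browse")
    for item in items:
        # "" = top-level parent, "end" = append; the label goes in column #0.
        tree.insert("", "end", text=item)
    return tree
```

Typical use: `root = tkinter.Tk()`, then `lst = make_single_column_list(root, ["red", "green"])`, `lst.pack()`, `root.mainloop()`. `lst.selection()` gives the selected item IDs and `lst.item(iid, "text")` the label.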
Re: locate executables for different platforms
On 10/31/2011 02:00 PM, Andrea Crotti wrote:
> Is there already something available to manage external multi-platform
> executables?

The simplest way I can think of to solve this problem would be to create a directory for each executable, and in each directory put the executable with the platform in its name, like:

- exec1:
  + exec1-linux2
  + exec1-darwin
  ...

The lookup would then be very simple (as below), but I feel that it is not very smart...

import sys
from os import path

BASE_DIR = '.'

exec_name = sys.argv[1]
assert path.isdir(exec_name)

plat = sys.platform
name_ex = "%s-%s" % (exec_name, plat)
if plat == 'win32':
    name_ex += '.exe'

res_ex = path.join(exec_name, name_ex)
assert path.isfile(res_ex)
print(path.abspath(res_ex))
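The same lookup wrapped as a reusable function, as a sketch assuming the one-directory-per-executable layout above. `find_executable` is a hypothetical name. It raises `FileNotFoundError` instead of using `assert`, which is skipped under `python -O`. Note that since Python 3.3 `sys.platform` reports `'linux'` rather than `'linux2'`, so the file names must match whatever the running interpreter reports.

```python
import os
import sys


def find_executable(base_dir, exec_name, platform=None):
    """Return the absolute path of base_dir/exec_name/exec_name-<platform>[.exe]."""
    plat = platform or sys.platform
    filename = "%s-%s" % (exec_name, plat)
    if plat == "win32":
        filename += ".exe"
    candidate = os.path.join(base_dir, exec_name, filename)
    if not os.path.isfile(candidate):
        raise FileNotFoundError(candidate)
    return os.path.abspath(candidate)
```

Running it is then one more line, e.g. `subprocess.run([find_executable(".", "exec1")])`.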
[ANN] Karrigell-4.3.6 released
Hi,
A new version of the Karrigell web framework for Python 3.2+ has just
been released on http://code.google.com/p/karrigell/
One of the oldest Python web frameworks around (the first version was
released back in 2002), it now has 2 main versions, one for Python 2
and another one for Python 3. The Python 2.x version is available at
http://karrigell.sf.net; this branch is maintained, but no new
features are going to be developed.
All the development work is now focused on the version for Python 3.
The first release was published in February and we are already at the
10th release.
Karrigell's design is about simplicity for the programmer and
integration of the whole web environment into the scripts' namespace. For
instance, the "Hello world" script requires 2 lines:
def index():
    return "Hello world"
All the HTML tags are available as classes in the scripts' namespace:
def index():
    return HTML(BODY("Hello world"))
To build an HTML document as a tree, the HTML tag objects support the
operators + (add sibling) and <= (add child):
def index():
    form = FORM(action="insert",method="post")
    form <= INPUT(name="foo")+BR()+INPUT(name="bar")
    form <= INPUT(Type="submit",value="Ok")
    return HTML(BODY(form))
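To show how `+` and `<=` can build a tree, here is a toy sketch. `Tag` and `_factory` are hypothetical and this is not Karrigell's implementation: it does no HTML escaping and renders void elements such as BR as open/close pairs. Because `+` binds tighter than `<=`, `form <= a + b + c` first builds the list of siblings and then attaches all of them to `form`.

```python
class Tag:
    """Toy HTML element supporting `+` (siblings) and `<=` (add child)."""

    def __init__(self, tag, content=None, **attrs):
        self.tag = tag
        self.attrs = attrs
        self.children = [] if content is None else [content]

    def __add__(self, other):
        # a + b -> list of siblings
        return [self] + (other if isinstance(other, list) else [other])

    def __radd__(self, other):
        # [a, b] + c -> [a, b, c]
        return list(other) + [self]

    def __le__(self, other):
        # parent <= child, or parent <= [siblings]: append, return the parent
        self.children.extend(other if isinstance(other, list) else [other])
        return self

    def __str__(self):
        attrs = "".join(f' {k.lower()}="{v}"' for k, v in self.attrs.items())
        inner = "".join(str(c) for c in self.children)
        return f"<{self.tag}{attrs}>{inner}</{self.tag}>"


def _factory(tag):
    def make(content=None, **attrs):
        return Tag(tag, content, **attrs)
    return make


HTML, BODY, FORM, INPUT, BR = (_factory(n) for n in ("html", "body", "form", "input", "br"))


def index():
    form = FORM(action="insert", method="post")
    form <= INPUT(name="foo") + BR() + INPUT(name="bar")
    form <= INPUT(Type="submit", value="Ok")
    return HTML(BODY(form))
```

`str(index())` renders the nested form as a single HTML string.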
The scripts can be served by a built-in web server, or through the
Apache server, either in CGI mode or using the WSGI interface.
The package obviously has built-in support for usual features such as
cookie and session management, localization, and user login/logout/role
management. It also includes complete documentation, with a tutorial
and a set of how-tos.
A helpful and friendly community welcomes users at
http://groups.google.com/group/karrigell
Enjoy!
Pierre
Re: locate executables for different platforms
On Oct 31, 10:00 am, Andrea Crotti wrote:
> Suppose that I have a project which (should be)/is multiplatform in python,
> which, however, uses some executables as black-boxes.
> [...]
> Is there already something available to manage external multi-platform
> executables?

While this doesn't answer your question fully, here is a beta snippet I wrote in Python that returns a list of full pathnames for a set of specified filenames found in the paths specified by the PATH environment variable. Only tested on WIN32. Note that on WIN32 systems the snippet tries to find filenames with the extensions specified by the environment variable PATHEXT. On Unix it will also try with no extension, of course (not tested).

Enjoy.
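A minimal sketch of the behaviour described above (not the poster's snippet): directories come from PATH, the bare name is always tried, and on Windows each PATHEXT extension is tried as well. PATHEXT is `;`-separated (e.g. `.COM;.EXE;.BAT`). `find_in_path` is a hypothetical name, and the sketch checks only that a file exists, not that it is executable. For a single name, `shutil.which()` in the standard library (Python 3.3+) already handles PATH and PATHEXT.

```python
import os


def find_in_path(names, path=None, pathext=None):
    """Return the full paths of every file in `names` found on the search path."""
    if path is None:
        path = os.environ.get("PATH", "")
    if pathext is None:
        pathext = os.environ.get("PATHEXT", "") if os.name == "nt" else ""
    exts = [""] + [e for e in pathext.split(";") if e]
    found = []
    for name in names:
        for directory in path.split(os.pathsep):
            if not directory:
                continue  # skip empty PATH entries
            for ext in exts:
                candidate = os.path.join(directory, name + ext)
                if os.path.isfile(candidate):
                    found.append(os.path.abspath(candidate))
    return found
```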
Re: Review Python site with useful code snippets
When visitors come to your site to post their code, such forms often ask for a username and email address. Consider adding fields that feed a Python documentation tool such as Sphinx or epydoc, and let your site inject a docstring (module docstring) into the snippet: primarily author, URL (your site's URL if they don't provide one), date, Python version, OS, etc. See reStructuredText for possible fields to inject.
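A minimal sketch of the injection step, assuming the submission form collects author, URL, date, Python version and OS. The function name, `SITE_URL` and the field names are hypothetical: they are illustrative reST field-list entries, not a format Sphinx or epydoc requires. Values are assumed not to contain triple quotes.

```python
from datetime import date

SITE_URL = "http://example.com/snippets"  # hypothetical: your site's URL


def inject_header(snippet, author, url=None, when=None, python_version=None, os_name=None):
    """Return `snippet` prefixed with a module docstring of reST field-list metadata."""
    fields = [
        ("author", author),
        ("url", url or SITE_URL),                      # fall back to the site URL
        ("date", when or date.today().isoformat()),
        ("python", python_version or "unspecified"),
        ("os", os_name or "unspecified"),
    ]
    header = '"""\n' + "".join(":%s: %s\n" % kv for kv in fields) + '"""\n'
    return header + snippet
```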
Tweepy: Invalid arguments at function call (tweepy.Stream())
Hi, I'm trying to fetch real-time data from Twitter using tweepy.Stream(). So I have tried the following... After successfully authenticating using OAuth:

auth = tweepy.OAuthHandler(...)

(it works fine; I have my access_token.key and secret), I did:

streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(), timeout='90')

and:

streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(), timeout='90')

None of this works; it keeps giving me the same error:

Traceback (most recent call last):
  File "", line 1, in
    streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(), timeout='60')
TypeError: *init*() takes at least 4 arguments (4 given)

Then I searched for the parameters of the function:

tweepy.streaming.Stream(login, password, Listener(), ...etc)

but I thought this login and password were for basic authentication, not for the OAuth case. Now I'm really confused; a little help, please?

P.S.: As you can see, I'm trying to get real-time data from Twitter (and do some further NLP with it). I chose tweepy because the dev.twitter.com page recommended it, but if you have any other suggestions for doing this, they will be welcome.
C API: Making a context manager
I am currently rewriting a class using the Python C API to improve its performance; however, I have not been able to find any documentation about how to make a context manager using the C API.

The code I am working to produce is the following (it's a method of a class):

@contextlib.contextmanager
def connected(self, *args, **kwargs):
    connection = self.connect(*args, **kwargs)
    try:
        yield
    finally:
        connection.disconnect()

For this, my first question is: is there any built-in method to make this type of method in the C API? If not, is there a slot on the type object I am missing for __enter__ and __exit__, or should they just be defined using the PyMethodDef struct on the class (presumably named the same as the Python functions)?

Chris
Re: ttk Listbox
On Mon, 31 Oct 2011 10:00:22 -0400, Kevin Walzer wrote:
>The listbox isn't part of the themed ttk widgets. The ttk::treeview is,
>and that can be set up as a single-column list display. There may be an
>example of how to do this in the docs or source code tree (I don't use
>the widget myself so I don't have any sample code to share).
>
>--Kevin

Thank you for the information.
Re: ttk Listbox
On Mon, 31 Oct 2011 10:00:22 -0400, Kevin Walzer wrote:
>The listbox isn't part of the themed ttk widgets. The ttk::treeview is,
>and that can be set up as a single-column list display. There may be an
>example of how to do this in the docs or source code tree (I don't use
>the widget myself so I don't have any sample code to share).
>
>--Kevin

Quick question: then why is it mentioned here?
http://www.tkdocs.com/tutorial/morewidgets.html

"Listboxes are created using the Listbox function:
l = Listbox(parent, height=10)"
Re: C API: Making a context manager
On Mon, Oct 31, 2011 at 13:34, Chris Kaynor wrote:
> For this, my first question is: is there any built-in method to make
> this type of method in the C API? If not, is there a slot on the type
> object I am missing for __enter__ and __exit__, or should just be
> defined using the PyMethodDef struct on the class (presumably named
> the same as the Python functions)?

You'd just add "__enter__" and "__exit__" in the PyMethodDef. If you have the CPython source, we do it in there in a few places. Off the top of my head, PC\winreg.c contains at least one class that works as a context manager (PyHKEY), although there are a few others scattered around the source.
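Per Brian's answer, the C type just needs two methods named `__enter__` and `__exit__` in its PyMethodDef table. To pin down what those two C functions must do, here is a Python class-based equivalent of the `@contextmanager` method above, as a sketch: `Connected` is a hypothetical name, and `connect()`/`disconnect()` are assumed to behave as in the original.

```python
class Connected:
    """Class-based equivalent of the generator-based `connected` method."""

    def __init__(self, owner, *args, **kwargs):
        self._owner = owner
        self._args = args
        self._kwargs = kwargs
        self._connection = None

    def __enter__(self):
        # Code before `yield` in the generator version.
        self._connection = self._owner.connect(*self._args, **self._kwargs)
        return None  # the generator yields nothing, so `as` binds None

    def __exit__(self, exc_type, exc, tb):
        # The `finally:` clause: always disconnect.
        self._connection.disconnect()
        return False  # falsy: do not suppress exceptions, like try/finally
```

The method on the owning class then becomes `def connected(self, *args, **kwargs): return Connected(self, *args, **kwargs)`. In C, `__exit__` receives the three exception arguments and returns `Py_False` (with a new reference) to let exceptions propagate.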
Re: C API: Making a context manager
On Mon, Oct 31, 2011 at 12:15 PM, Brian Curtin wrote:
>
> You'd just add "__enter__" and "__exit__" in the PyMethodDef. If you
> have the CPython source, we do it in there in a few places. Off the
> top of my head, PC\winreg.c contains at least one class that works as
> a context manager (PyHKEY), although there are a few others scattered
> around the source.

That is what I figured. I was just hoping there was some helper class, similar to the contextmanager decorator, that would make it easier to use; however, at the same time it makes sense that there is not.

Thanks,
Chris
Experience with ActiveState Stackato or PiCloud SaaS/PaaS offerings?
Looking for feedback from anyone who has tried or is using ActiveState Stackato, PiCloud, or other Python-oriented SaaS/PaaS offerings. Pros, cons, advice, lessons learned?

Thank you,
Malcolm
Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?
Wondering if there's a fast/efficient built-in way to determine if a string has non-ASCII chars outside the range ASCII 32-127, CR, LF, or Tab?

I know I can look at the chars of a string individually and compare them against a set of legal chars using standard Python code (and this works fine), but I will be working with some very large files in the hundreds of GB to several TB size range, so I thought I'd check to see if there was a built-in in C that might handle this type of check more efficiently.

Does this sound like a use case for Cython or PyPy?

Thanks,
Malcolm
How do I pass a variable to os.popen?
I'm trying to write a simple Python script to print out network
interfaces (as found in the "ifconfig -a" command) and their speed
("ethtool "). The idea is to loop for each interface and
print out its speed. os.popen seems to be the right solution for the
ifconfig command, but it doesn't seem to like me passing the interface
variable as an argument. Code snippet is below:
#!/usr/bin/python
# Quick and dirty script to print out available interfaces and their speed

# Initializations
output = " Interface: %s Speed: %s"
import os, socket, types
fp = os.popen("ifconfig -a")
dat=fp.read()
dat=dat.split('\n')
for line in dat:
    if line[10:20] == "Link encap":
        interface=line[:9]
        cmd = 'ethtool %interface'
        print cmd
        gp = os.popen(cmd)
        fat=gp.read()
        fat=fat.split('\n')
I'm printing out "cmd" in an attempt to debug, and "interface" seems
to be passed as a string and not a variable. Obviously I'm a newbie,
and I'm hoping this is a simple syntax issue. Thanks in advance!
Re: How do I pass a variable to os.popen?
On Mon, Oct 31, 2011 at 2:16 PM, extraspecialbitter wrote:
> cmd = 'ethtool %interface'

That is not Python syntax for string interpolation. Try:

cmd = 'ethtool %s' % interface

On a side note, os.popen is deprecated. You should look into using the higher-level subprocess.check_output instead.

Cheers,
Ian
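A quick check of what each spelling actually produces, with a sample interface name (in the real script it comes from the `ifconfig -a` output). The buggy line never applies the `%` operator, so the text stays literal; the fixes suggested in this thread are equivalent.

```python
interface = "eth0"  # sample value for illustration

buggy = 'ethtool %interface'                # no % operator: literal text
fixed = 'ethtool %s' % interface            # printf-style interpolation
concat = 'ethtool ' + interface             # plain concatenation
formatted = 'ethtool {}'.format(interface)  # str.format ('{}' needs 2.7+/3.1+)
```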
Re: How do I pass a variable to os.popen?
On Mon, Oct 31, 2011 at 16:16, extraspecialbitter wrote:
> if line[10:20] == "Link encap":
>     interface=line[:9]
> cmd = 'ethtool %interface'
> print cmd
> gp = os.popen(cmd)

That's because you're saying that cmd is 'ethtool %interface', as you pointed out later on... How about:

cmd = 'ethtool %s' % interface

Note the spaces there... That tells it to convert the contents of interface to a string and insert them into the string you're assigning to cmd... Assuming interface is something like eth0, you should now see "ethtool eth0" when the print statement runs.
Re: How do I pass a variable to os.popen?
On Mon, 31 Oct 2011 13:16:25 -0700, extraspecialbitter wrote:
> cmd = 'ethtool %interface'

Do you perhaps mean:

cmd = 'ethtool %s' % (interface, )
Re: How do I pass a variable to os.popen?
On Mon, Oct 31, 2011 at 1:16 PM, extraspecialbitter
wrote:
> I'm trying to write a simple Python script to print out network
> interfaces (as found in the "ifconfig -a" command) and their speed
> ("ethtool "). The idea is to loop for each interface and
> print out its speed. os.popen seems to be the right solution for the
os.popen() is somewhat deprecated. Use the subprocess module instead.
> ifconfig command, but it doesn't seem to like me passing the interface
> variable as an argument. Code snippet is below:
>
>
>
> #!/usr/bin/python
>
> # Quick and dirty script to print out available interfaces and their speed
>
> # Initializations
>
> output = " Interface: %s Speed: %s"
>
> import os, socket, types
>
> fp = os.popen("ifconfig -a")
> dat=fp.read()
> dat=dat.split('\n')
> for line in dat:
>     if line[10:20] == "Link encap":
>         interface=line[:9]
>         cmd = 'ethtool %interface'
cmd will literally contain a percent-sign and the word "interface". If
your shell happens to use % as a prefix to indicate a variable, note
that Python variables are completely separate from and not accessible
from the shell. So either ethtool will get the literal string
"%interface" as its argument, or since there is no such shell
variable, after expansion it will end up getting no arguments at all.
Perhaps you meant:
cmd = "ethtool %s" % interface
Which could be more succinctly written:
cmd = "ethtool " + interface
>         print cmd
>         gp = os.popen(cmd)
>         fat=gp.read()
The subprocess equivalent is:
fat = subprocess.check_output(["ethtool", interface])
>         fat=fat.split('\n')
>
> I'm printing out "cmd" in an attempt to debug, and "interface" seems
> to be passed as a string and not a variable. Obviously I'm a newbie,
> and I'm hoping this is a simple syntax issue. Thanks in advance!
Cheers,
Chris
--
http://rebertia.com
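Putting the replies together, a sketch of the whole script on top of `subprocess.check_output`, in Python 3 syntax. It assumes the older net-tools `ifconfig` output the original script parses (interface name in column 0 of a `Link encap` line); newer `ifconfig` versions print `eth0: flags=...` instead and would need a different parser. The helper names are hypothetical. Passing an argv list means the interface name is its own argument, so no string interpolation or shell quoting is involved, and `split()` drops the padding that `line[:9]` would keep.

```python
import subprocess

OUTPUT = " Interface: %s Speed: %s"


def interfaces_from_ifconfig(text):
    """Return interface names from old-style `ifconfig -a` output."""
    names = []
    for line in text.splitlines():
        if line and not line[0].isspace() and "Link encap" in line:
            names.append(line.split()[0])
    return names


def speed_from_ethtool(text):
    """Return the value of the 'Speed:' line in ethtool output, or None."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("Speed:"):
            return line.split(":", 1)[1].strip()
    return None


def main():
    ifconfig = subprocess.check_output(["ifconfig", "-a"], universal_newlines=True)
    for name in interfaces_from_ifconfig(ifconfig):
        try:
            out = subprocess.check_output(["ethtool", name], universal_newlines=True)
        except (OSError, subprocess.CalledProcessError):
            out = ""  # ethtool missing, or it failed for this interface
        print(OUTPUT % (name, speed_from_ethtool(out) or "unknown"))
```

Call `main()` on a Linux machine with `ifconfig` and `ethtool` installed. Depending on the driver and your privileges, ethtool may not report a speed, in which case the script prints "unknown".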
Re: How do I pass a variable to os.popen?
This was exactly what I was looking for. Thanks!
On Mon, Oct 31, 2011 at 4:48 PM, Chris Rebert wrote:
> Perhaps you meant:
> cmd = "ethtool %s" % interface
> [...]
> The subprocess equivalent is:
> fat = subprocess.check_output(["ethtool", interface])
--
Paul David Mena
[email protected]
Re: Tweepy: Invalid arguments at function call (tweepy.Stream())
On 10/31/2011 12:18 PM, Ricardo Mansilla wrote:
> Hi i'm trying to fetch realtime data from twitter using tweepy.Stream().
A reference to your source for tweepy would help.
The link below gives https://github.com/tweepy/tweepy
for the current source.
http://pypi.python.org/pypi/tweepy/1.7.1
has versions for Python 2.4, 2.5 and 2.6 from May 2010.
You neglected to mention which version of Python you are using.
> streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(),
> timeout='90')
> and:
> streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(),
> timeout='90')
These look identical.
> none of this works, it keeps giving me the same error:
> Traceback (most recent call last):
> File "", line 1, in
> streaming_api = tweepy.streaming.Stream(auth, CustomStreamListener(),
> timeout='60')
> TypeError: *init*() takes at least 4 arguments (4 given)
Python 3.2 prints the proper method name: __init__. I do not remember that
older versions did that conversion, but maybe so. In any case, the error
message is screwed up. It has been improved in current Python.
> then i have searched for the parameters of the function:
> tweepy.streaming.Stream(login,password,Listener(),...etc)
> but i thought this login and pass was the authentication method using in
> the basic authentication not in the oauth case.
> Now i'm really confused, a little help please?
The current tweepy/streaming.py source code from the site above says:
class Stream(object):
    def __init__(self, auth, listener, **options):
        self.auth = auth
        self.listener = listener
        self.running = False
        self.timeout = options.get("timeout", 300.0)
According to this, __init__ takes 3 positional params, which is what you
gave it. Perhaps, this was different in an earlier version. Look at the
code you are running.
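Following the advice to look at the code you are actually running, `inspect.signature(...).bind()` checks whether a call matches a constructor without running it. A sketch: `NewStream` mirrors the current source quoted above, while `OldStream` is a hypothetical older signature taking login/password plus a listener (suggested by the poster's search, not checked against any tweepy release). It seems Python 2 counted the keyword argument `timeout` among the "given" arguments, which would explain the odd "at least 4 arguments (4 given)".

```python
import inspect


class OldStream(object):
    # Hypothetical older signature: login, password, then the listener.
    def __init__(self, username, password, listener, timeout=5.0):
        pass


class NewStream(object):
    # Mirrors the current tweepy source quoted above.
    def __init__(self, auth, listener, **options):
        pass


def accepts(cls, *args, **kwargs):
    """Return True if cls(*args, **kwargs) matches cls.__init__'s signature."""
    try:
        # None stands in for `self`; bind() only checks the argument shape.
        inspect.signature(cls.__init__).bind(None, *args, **kwargs)
    except TypeError:
        return False
    return True
```

Against a real install, `print(inspect.signature(tweepy.streaming.Stream.__init__))` and `print(inspect.getsourcefile(tweepy.streaming.Stream))` show the signature and the file actually imported (on Python 2, use `inspect.getargspec` instead of `inspect.signature`). Since the default timeout is the float 300.0, passing `timeout=90` rather than the string `'90'` looks safer.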
> i have chose tweepy because the
> dev.twitter.com page recommended it,
That page mentions no libraries. Perhaps you meant
https://dev.twitter.com/docs/twitter-libraries
--
Terry Jan Reedy
Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?
On 10/31/2011 03:54 PM, [email protected] wrote:
> Wondering if there's a fast/efficient built-in way to determine if a
> string has non-ASCII chars outside the range ASCII 32-127, CR, LF, or
> Tab?
> [...]

How about doing a .replace() method call, with all those characters turning into '', and then seeing if there's anything left?

--
DaveA
Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?
On 10/31/2011 05:47 PM, Dave Angel wrote:
> How about doing a .replace() method call, with all those characters
> turning into '', and then see if there's anything left?

I was wrong once again. But a simple combination of the translate() and split() methods might do it. Here I'm suggesting that the table replace all valid characters with spaces, so split() can use its default behavior.

--
DaveA
Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?
On Mon, Oct 31, 2011 at 4:08 PM, Dave Angel wrote:
> I was wrong once again. But a simple combination of translate() and
> split() methods might do it. Here I'm suggesting that the table replace all
> valid characters with space, so the split() can use its default behavior.
That sounds overly complicated and error-prone. For instance, split()
will split on vertical tab, which is not one of the characters the OP
wanted. I would probably use a regular expression for this.
import re
if re.search(r'[^\r\n\t\040-\177]', string_to_test):
    print("Invalid!")
Cheers,
Ian
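For files in the hundreds-of-GB range, the same character class can be applied to binary chunks so memory stays bounded and the scan stops at the first bad byte. A sketch assuming the data is checked as raw bytes; `first_forbidden_offset` and the 1 MiB chunk size are illustrative choices. Because the pattern matches single bytes, no match can straddle a chunk boundary.

```python
import re

# Any byte that is not tab, LF, CR, or in 0x20-0x7F (space through DEL),
# the same class as the regex above, compiled as a bytes pattern.
FORBIDDEN = re.compile(rb'[^\t\n\r\x20-\x7f]')


def first_forbidden_offset(path, chunk_size=1 << 20):
    """Return the file offset of the first forbidden byte, or -1 if none."""
    offset = 0
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                return -1
            m = FORBIDDEN.search(chunk)
            if m:
                return offset + m.start()
            offset += len(chunk)
```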
Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?
On Mon, 31 Oct 2011 17:47:06 -0400, Dave Angel wrote:

> On 10/31/2011 03:54 PM, [email protected] wrote:
>> Wondering if there's a fast/efficient built-in way to determine if a
>> string has non-ASCII chars outside the range ASCII 32-127, CR, LF, or
>> Tab?
>> [...]
> How about doing a .replace() method call, with all those characters
> turning into '', and then see if there's anything left?

No offense Dave, but do you really think that making a copy of as much as a terabyte of data is *more* efficient than merely scanning the data and stopping on the first non-ASCII character you see?

There is no way of telling whether a string includes non-ASCII characters without actually inspecting each character. So in the event that the string *is* fully ASCII text, you have to check every character; there can be no shortcuts. However, there is a shortcut if the string isn't fully ASCII text: once you've found a single non-text character, stop. So the absolute least amount of work you can do is:

# Define legal characters:
LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
# everybody forgets about formfeed... \f
# and are you sure you want to include chr(127) as a text char?

def is_ascii_text(text):
    for c in text:
        if c not in LEGAL:
            return False
    return True

Algorithmically, that's as efficient as possible: there's no faster way of performing the test, although one implementation may be faster or slower than another. (PyPy is likely to be faster than CPython, for example.)

But can we get better in Python? Yes, I think so. First off, CPython is optimized for local variable lookups over globals, and since you are looking up the global LEGAL potentially once for every character (on the order of a trillion times for a terabyte of text), even a 1% saving per lookup will help a lot. So the first step is to make a local reference, by adding this line just above the for loop:

    legal = LEGAL

But we can do even better still. Each time we test for "c not in legal", we do a linear search of 100 characters. On average, that will mean comparing 50 characters for equality at best. We can do better by using a set or frozenset, which gives us approximately constant-time lookups:

    legal = frozenset(LEGAL)

Finally, we can try to do as much work as possible in fast C code and as little as necessary in relatively slow Python:

def is_ascii_text(text):
    legal = frozenset(LEGAL)
    return all(c in legal for c in text)

Since all() is guaranteed to keep short-circuit semantics, that will be as fast as possible in Python, and quite possibly just as fast as any C extension you might write.

If that's still too slow, use smaller files or get a faster computer :)

--
Steven
Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?
On 10/31/2011 3:54 PM, [email protected] wrote:
> Wondering if there's a fast/efficient built-in way to determine if a string has non-ASCII chars outside the range ASCII 32-127, CR, LF, or Tab?

I presume you also want to disallow the other ascii control chars?

> I know I can look at the chars of a string individually and compare them against a set of legal chars using standard Python code (and this works

If, by 'string', you mean a string of bytes 0-255, then I would, in Python 3, where bytes contain ints in [0,255], make a byte mask of 256 0s and 1s (not '0's and '1's). Example:

mask = b'\1\0'*128
for c in b'\0\1help':
    print(mask[c])

1
0
1
0
1
1

In your case, use \1 for forbidden and replace the print with "if mask[c]: ...; break".

In 2.x, where iterating byte strings gives length 1 byte strings, you would need ord(c) as the index, which is much slower.

> fine), but I will be working with some very large files in the 100's Gb to several Tb size range so I'd thought I'd check to see if there was a built-in in C that might handle this type of check more efficiently.
>
> Does this sound like a use case for cython or pypy?

Cython should get close to C speed, especially with hints. Make sure you compile something like the above as Py 3 code.

--
Terry Jan Reedy
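Below is a Python 3 sketch of Terry's byte-mask idea, applied to the OP's rule that bytes 32-127 plus CR, LF and Tab are allowed. As Terry suggests, 1 marks a forbidden byte, so the loop can break on the first hit. The helper names are illustrative assumptions, not something posted in the thread.

```python
def build_forbidden_mask():
    # 256-entry lookup table: mask[b] == 1 means byte value b is forbidden.
    allowed = set(range(32, 128)) | {ord('\r'), ord('\n'), ord('\t')}
    return bytes(0 if b in allowed else 1 for b in range(256))


FORBIDDEN = build_forbidden_mask()


def has_forbidden_byte(data, mask=FORBIDDEN):
    # In Python 3, iterating bytes yields ints 0-255, which index the mask directly.
    for b in data:
        if mask[b]:
            return True  # stop at the first forbidden byte
    return False
```

Because the table covers all 256 byte values, any input byte can be looked up without an IndexError.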
Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?
On 10/31/11 18:02, Steven D'Aprano wrote:
> # Define legal characters:
> LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
> # everybody forgets about formfeed... \f
> # and are you sure you want to include chr(127) as a text char?
>
> def is_ascii_text(text):
>     for c in text:
>         if c not in LEGAL:
>             return False
>     return True
>
> Algorithmically, that's as efficient as possible: there's no faster way
> of performing the test, although one implementation may be faster or
> slower than another. (PyPy is likely to be faster than CPython, for
> example.)
Additionally, if one has some foreknowledge of the character
distribution, one might be able to tweak your
def is_ascii_text(text):
    legal = frozenset(LEGAL)
    return all(c in legal for c in text)
with some if/else chain that might be faster than the hashing
involved in a set lookup (emphasis on the *might*, not being an
expert on CPython internals) such as
def is_ascii_text(text):
    return all(
        (' ' <= c <= '\x7f') or
        c == '\n' or
        c == '\r' or
        c == '\t'
        for c in text)
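Tim's suggestion can be tested with a rough Python 3 sketch. It first checks that a chained-comparison version accepts exactly the same characters as the frozenset version (using the OP's rule of 32-127 plus CR, LF and Tab), then times both on the same input. Which one wins depends on the interpreter and the data, so no result is claimed here. The function names are hypothetical.

```python
import timeit

LEGAL = frozenset(chr(n) for n in range(32, 128)) | {'\r', '\n', '\t'}


def is_text_set(text):
    # Hash-based membership test against a local reference.
    legal = LEGAL
    return all(c in legal for c in text)


def is_text_compare(text):
    # Chained comparisons; the common printable range is tested first.
    return all(' ' <= c <= '\x7f' or c == '\n' or c == '\r' or c == '\t'
               for c in text)


if __name__ == "__main__":
    # Both predicates must agree on every character up to U+012B.
    assert all(is_text_set(chr(n)) == is_text_compare(chr(n)) for n in range(300))
    sample = "The quick brown fox\n" * 20_000
    for fn in (is_text_set, is_text_compare):
        print(fn.__name__, timeit.timeit(lambda: fn(sample), number=5))
```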
But Steven's main points are all spot on: (1) use an O(1) lookup;
(2) return at the first sign of trouble; and (3) push it into the
C implementation rather than a for-loop. (and the "locals are
faster in CPython" is something I didn't know)
-tkc
Re: ttk Listbox
On 10/31/11 4:03 PM, Ric@rdo wrote:
> On Mon, 31 Oct 2011 10:00:22 -0400, Kevin Walzer wrote:
>
>> On 10/31/11 12:37 AM, Ric@rdo wrote:
>>>
>>> What would be an equivalent widget in ttk like a Listbox and if possible a small example? I tried to look here http://docs.python.org/library/ttk.html but did not see anything.
>>>
>>> Maybe I did not look in the right place?
>>>
>>> tia
>>
>> The listbox isn't part of the themed ttk widgets. The ttk::treeview is, and that can be set up as a single-column list display. There may be an example of how to do this in the docs or source code tree (I don't use the widget myself so I don't have any sample code to share).
>>
>> --Kevin
>
> Quick question:
>
> Then why is it mentioned here http://www.tkdocs.com/tutorial/morewidgets.html?
>
> Listboxes are created using the Listbox function:
>
> l = Listbox(parent, height=10)

The listbox is a Tk widget, not a ttk widget. It's one of the original Tk/Tkinter widgets, and has no themed equivalent.

--
Kevin Walzer
Code by Kevin
http://www.codebykevin.com
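Kevin mentions that a ttk Treeview can be set up as a single-column list display. A minimal Python 3 sketch of that idea follows. The helper names `make_list_view` and `selected_text` are hypothetical. `show="tree"` hides the heading row, and each row's label is stored in the item's `text` option.

```python
import tkinter as tk
from tkinter import ttk


def make_list_view(parent, items, height=10):
    """Build a ttk.Treeview that behaves like a single-column Listbox."""
    # show="tree" hides the column heading; selectmode="browse" allows one selection.
    tree = ttk.Treeview(parent, show="tree", selectmode="browse", height=height)
    for item in items:
        tree.insert("", "end", text=item)  # top-level rows only, no children
    return tree


def selected_text(tree):
    """Return the text of the selected row, or None if nothing is selected."""
    selection = tree.selection()
    return tree.item(selection[0], "text") if selection else None


# Example use (needs a display):
# root = tk.Tk()
# view = make_list_view(root, ["apple", "banana", "cherry"])
# view.pack(fill="both", expand=True)
# view.bind("<<TreeviewSelect>>", lambda e: print(selected_text(view)))
# root.mainloop()
```

The plain Tk `Listbox` that Kevin describes is still available as `tkinter.Listbox` if theming is not needed.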
Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?
On Mon, Oct 31, 2011 at 4:08 PM, Dave Angel wrote:
Yes. Actually, you don't even need the split() -- you can pass an optional deletechars parameter to translate().

On Oct 31, 5:52 pm, Ian Kelly wrote:
> That sounds overly complicated and error-prone.

Not really.

> For instance, split() will split on vertical tab,
> which is not one of the characters the OP wanted.

That's just the default behavior. You can explicitly specify the separator to split on. But it's probably more efficient to just use translate with deletechars.

> I would probably use a regular expression for this.

I use 'em all the time, but not for stuff this simple.

Regards,
Pat
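For comparison with translate(), here is a sketch of the regular-expression approach Ian suggests. The character class is an assumption based on the OP's rule (32-127 plus CR, LF and Tab), and `first_illegal` is a hypothetical name. `re.search()` scans in C and stops at the first match, so it also short-circuits and reports where the problem is.

```python
import re

# A character class of everything the OP does NOT allow.
ILLEGAL_TEXT = re.compile(r'[^\x20-\x7f\r\n\t]')
ILLEGAL_BYTES = re.compile(rb'[^\x20-\x7f\r\n\t]')


def first_illegal(data):
    """Return the index of the first disallowed character/byte, or -1."""
    # A str pattern cannot search bytes (and vice versa), so pick by type.
    pattern = ILLEGAL_BYTES if isinstance(data, (bytes, bytearray)) else ILLEGAL_TEXT
    match = pattern.search(data)
    return match.start() if match else -1
```

Note that vertical tab (`\x0b`), the case Ian raises about split(), is reported as illegal here.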
Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?
On 10/31/2011 7:02 PM, Steven D'Aprano wrote:
> On Mon, 31 Oct 2011 17:47:06 -0400, Dave Angel wrote:
>
>> On 10/31/2011 03:54 PM, [email protected] wrote:
>>> Wondering if there's a fast/efficient built-in way to determine if a
>>> string has non-ASCII chars outside the range ASCII 32-127, CR, LF, or
>>> Tab?
>>>
>>> I know I can look at the chars of a string individually and compare
>>> them against a set of legal chars using standard Python code (and this
>>> works fine), but I will be working with some very large files in the
>>> 100's Gb to several Tb size range so I'd thought I'd check to see if
>>> there was a built-in in C that might handle this type of check more
>>> efficiently.
>>>
>>> Does this sound like a use case for cython or pypy?
>>>
>>> Thanks,
>>> Malcolm
>>>
>> How about doing a .replace() method call, with all those characters
>> turning into '', and then see if there's anything left?
>
> No offense Dave, but do you really think that making a copy of as much as a terabyte of data is *more* efficient than merely scanning the data and stopping on the first non-ASCII character you see?
>
> There is no way of telling whether a string includes non-ASCII characters without actually inspecting each character. So in the event that the string *is* fully ASCII text, you have to check every character, there can be no shortcuts. However, there is a shortcut if the string isn't fully ASCII text: once you've found a single non-text character, stop. So the absolute least amount of work you can do is:
>
> # Define legal characters:
> LEGAL = ''.join(chr(n) for n in range(32, 128)) + '\n\r\t\f'
> # everybody forgets about formfeed... \f
> # and are you sure you want to include chr(127) as a text char?
>
> def is_ascii_text(text):
>     for c in text:
>         if c not in LEGAL:
>             return False
>     return True

If text is 3.x bytes, this does not work ;-). OP did not specify bytes or unicode or Python version.
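Terry's point about Python 3 bytes can be seen directly. Iterating bytes yields ints, and `int in str` raises TypeError ("'in <string>' requires string as left operand"). A minimal sketch with a hypothetical function name picks a legal set whose members match the input type:

```python
LEGAL_CHARS = frozenset(chr(n) for n in range(32, 128)) | frozenset('\n\r\t\f')
LEGAL_BYTES = frozenset(ord(c) for c in LEGAL_CHARS)  # same set, as ints 0-255


def is_ascii_text_any(data):
    # bytes iterate as ints and str iterates as 1-character strings,
    # so each type needs a set whose members have the matching type.
    legal = LEGAL_BYTES if isinstance(data, (bytes, bytearray)) else LEGAL_CHARS
    return all(item in legal for item in data)
```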
> Algorithmically, that's as efficient as possible:

This is a bit strange since you go on to explain that it is inefficient -- O(n*k) where n = text length and k = legal length -- whereas below is O(n).

> there's no faster way of performing the test, although one implementation may be faster or slower than another. (PyPy is likely to be faster than CPython, for example.)
>
> But can we get better in Python? Yes, I think so. First off, CPython is optimized for local variable lookups over globals, and since you are looking up the global LEGAL potentially once for every character of a multi-terabyte file, even a 1% saving per lookup will help a lot. So the first step is to make a local reference, by adding this line just above the for loop:
>
> legal = LEGAL
>
> But we can do even better still. Each time we test for "c not in legal", we do a linear search of 100 characters. On average, that will mean comparing 50 characters for equality at best. We can do better by using a set or frozenset, which gives us approximately constant time lookups:
>
> legal = frozenset(LEGAL)
>
> Finally, we can try to do as much work as possible in fast C code and as little as necessary in relatively slow Python:
>
> def is_ascii_text(text):
>     legal = frozenset(LEGAL)
>     return all(c in legal for c in text)
>
> Since all() is guaranteed to keep short-cut semantics, that will be as fast as possible in Python,

A dangerous statement to make. 'c in legal' has to get hash(c) and look that up in the hash table, possibly skipping around a bit if t

If text is a byte string rather than unicode, a simple lookup 'mask[c]', where mask is a 0-1 byte array, should be faster (see my other post). On my new Pentium Win 7 machine, it is -- by about 5%. For 100,000,000 legal bytes, a minimum of 8.69 versus 9.17 seconds.
from time import perf_counter as clock  # time.clock() was removed in Python 3.8
legal_set = frozenset(range(32, 128))
legal_ray = 128 * b'\1'
illegal = 128 * b'\0'  # only testing legal char 'a'
text = b'a' * 100000000

t = clock()
print(all(c in legal_set for c in text), clock()-t)  # min 9.17
t = clock()
print(all(legal_ray[c] for c in text), clock()-t)  # min 8.69

##t = clock()
##for c in text:
##    if illegal[c]: print(False); break
##else: print(True, clock()-t)  # slower, about 9.7

The explicit loop took about 9.7 seconds. It is more flexible as it could detect the position of the first bad character, or of all of them.

--
Terry Jan Reedy
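Terry notes that the explicit loop, though slower, can report where the bad bytes are. The following Python 3 sketch does that. It assumes a full 256-entry table: Terry's 128-byte test arrays would raise IndexError for any byte above 127. The function names are hypothetical.

```python
# 1 marks a disallowed byte: everything outside 32-127 except TAB, LF and CR.
ILLEGAL = bytes(0 if 32 <= b <= 127 or b in (9, 10, 13) else 1 for b in range(256))


def illegal_positions(data, table=ILLEGAL):
    """Yield the offset of every disallowed byte in data."""
    for pos, b in enumerate(data):
        if table[b]:
            yield pos


def first_illegal_position(data):
    """Offset of the first disallowed byte, or None if the data is clean."""
    # The generator is lazy, so next() stops scanning at the first hit.
    return next(illegal_positions(data), None)
```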
Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?
On 10/31/2011 08:32 PM, Patrick Maupin wrote:
> On Mon, Oct 31, 2011 at 4:08 PM, Dave Angel wrote:
> Yes. Actually, you don't even need the split() -- you can pass an optional deletechars parameter to translate().
>
> On Oct 31, 5:52 pm, Ian Kelly wrote:
>> That sounds overly complicated and error-prone.
>
> Not really.
>
>> For instance, split() will split on vertical tab,
>> which is not one of the characters the OP wanted.
>
> That's just the default behavior. You can explicitly specify the separator to split on. But it's probably more efficient to just use translate with deletechars.
>
>> I would probably use a regular expression for this.
>
> I use 'em all the time, but not for stuff this simple.
>
> Regards,
> Pat

I would claim that a well-written (in C) translate function, without using the delete option, should be much quicker than any python loop, even if it does copy the data. Incidentally, on the Pentium family, there's a machine instruction for that, to do the whole loop in one instruction (with rep prefix). I don't know if the library version is done so.

And with the delete option, it wouldn't be copying anything, if the data is all legal.

As for processing a gig of data, I never said to do it all in one pass. Process it maybe 4k at a time, and quit the first time you encounter a character not in the table.

But I didn't try to post any code, since the OP never specified Python version, nor the encoding of the data. He just said string. And we all know that without measuring, it's all speculation.

DaveA
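Dave's approach deletes the legal bytes with translate(), works through the file in small chunks and stops at the first problem. In Python 3 it could look like the sketch below. `bytes.translate()` accepts `None` as the table plus a bytes argument listing the bytes to delete. The default chunk size follows Dave's "maybe 4k", and `find_first_illegal` is a hypothetical name.

```python
import io

# Bytes the OP allows: 32-127 plus TAB, LF, CR.
LEGAL = bytes(range(32, 128)) + b'\t\n\r'


def find_first_illegal(stream, chunk_size=4096):
    """Return the offset of the first disallowed byte, or -1 if none.

    Reads the binary stream chunk_size bytes at a time, so huge files
    never have to fit in memory.
    """
    offset = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return -1
        # With the legal bytes deleted, anything left over is illegal.
        leftover = chunk.translate(None, LEGAL)
        if leftover:
            # leftover[0] is the first illegal byte; every occurrence of that
            # value is illegal, so its first index is the first bad position.
            return offset + chunk.index(leftover[0])
        offset += len(chunk)


if __name__ == "__main__":
    # In real use, pass open(path, "rb"); an in-memory stream stands in here.
    demo = io.BytesIO(b"header line\r\n" + b"x" * 10_000 + b"\x00tail")
    print(find_first_illegal(demo))  # prints 10013
```

For very large files, a bigger chunk (say 1 MiB) may reduce per-call overhead. As Dave says, only measurement will tell.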
Re: Tweepy: Invalid arguments at function call (tweepy.Stream()) (Terry Reedy)
Thanks a lot for your answer. I'm using Python 2.7.2 and tweepy 1.7:

>>> help(tweepy)
Help on package tweepy:

NAME
    tweepy - Tweepy Twitter API library

(...)

VERSION
    1.7.1

and that is probably the problem: the link that you gave me refers to the 1.2 version page... Anyway, I already have their IRC address and I think it would be easier to find support there.

Thanks again.

Ricardo Mansilla

ps: sometimes I get lazy about writing out the whole link to a specific address when it isn't important to my point; please don't judge me for my exquisite way of keeping the attention in the right place... :)
Re: Efficient, built-in way to determine if string has non-ASCII chars outside ASCII 32-127, CRLF, Tab?
On Oct 31, 9:12 pm, Dave Angel wrote:
> I would claim that a well-written (in C) translate function, without
> using the delete option, should be much quicker than any python loop,
> even if it does copy the data.
Are you arguing with me? I was agreeing with you, I thought, that
translate would be faster than a regex. I also pointed out that the
delete function was available as , but OTOH, that might be a little
slow because I think it does the deletion before the translate. I
think I'd just do something like this:
>>> transtab = ''.join(' ' if 32 <= x <= 126 else chr(x) for x in range(256))
>>> 'abc\x05\def\x0aghi'.translate(transtab).replace(' ', '')
'\x05\n'
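In Python 3, `str.translate()` takes a mapping from code points to replacements, and mapping a code point to `None` deletes that character. The sketch below applies Pat's idea (remove everything legal and see what remains) using a dict instead of a 256-character table. Unlike Pat's table, it treats 127, CR, LF and Tab as legal, following the OP's rule; the name `illegal_chars` is hypothetical.

```python
# Map every legal code point to None so that translate() deletes it.
DELETE_LEGAL = dict.fromkeys(list(range(32, 128)) + [9, 10, 13])


def illegal_chars(text):
    """Return the characters of text that are not ASCII 32-127, TAB, LF or CR."""
    return text.translate(DELETE_LEGAL)
```

An empty result means the text is clean. Like Dave's delete option, this builds a new (usually tiny) string rather than scanning in place.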
Regards,
Pat
Re: Unicode literals and byte string interpretation.
On Oct 28, 3:06 am, Steven D'Aprano wrote:
> On Thu, 27 Oct 2011 20:05:13 -0700, Fletcher Johnson wrote:
> > If I create a new Unicode object u'\x82\xb1\x82\xea\x82\xcd' how does
> > this creation process interpret the bytes in the byte string?
>
> It doesn't, because there is no byte-string. You have created a Unicode
> object from a literal string of unicode characters, not bytes. Those
> characters are:
>
> Dec Hex Char
> 130 0x82 ‚
> 177 0xb1 ±
> 130 0x82 ‚
> 234 0xea ê
> 130 0x82 ‚
> 205 0xcd Í
>
> Don't be fooled that all of the characters happen to be in the range
> 0-255, that is irrelevant.
>
> > Does it
> > assume the string represents a utf-16 encoding, a utf-8 encoding,
> > etc...?
>
> None of the above. It assumes nothing. It takes a string of characters,
> end of story.
>
> > For reference the string is これは in the 'shift-jis' encoding.
>
> No it is not. The way to get a unicode literal with those characters is
> to use a unicode-aware editor or terminal:
>
> >>> s = u'これは'
> >>> for c in s:
> ...     print ord(c), hex(ord(c)), c
> ...
> 12371 0x3053 こ
> 12428 0x308c れ
> 12399 0x306f は
>
> You are confusing characters with bytes. I believe that what you are
> thinking of is the following: you start with a byte string, and then
> decode it into unicode:
>
> >>> bytes = '\x82\xb1\x82\xea\x82\xcd' # not u'...'
> >>> text = bytes.decode('shift-jis')
> >>> print text
> これは
>
> If you get the encoding wrong, you will get the wrong characters:
>
> >>> print bytes.decode('utf-16')
> 놂춂
>
> If you start with the Unicode characters, you can encode it into various
> byte strings:
>
> >>> s = u'これは'
> >>> s.encode('shift-jis')
> '\x82\xb1\x82\xea\x82\xcd'
> >>> s.encode('utf-8')
> '\xe3\x81\x93\xe3\x82\x8c\xe3\x81\xaf'
>
> --
> Steven
Thanks Steven. You are right. I was confusing characters with bytes.
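Steven's examples are Python 2, where '...' is a byte string and u'...' is text. Python 3 spells these roles bytes and str. Here is a sketch of the same round trip in Python 3. It also shows that the six-character string Fletcher built is what you get by decoding the Shift-JIS bytes as Latin-1, which maps each byte to the code point with the same number.

```python
raw = b'\x82\xb1\x82\xea\x82\xcd'   # bytes, e.g. as read from a Shift-JIS file
text = raw.decode('shift-jis')       # a str of three characters

for ch in text:
    print(ord(ch), hex(ord(ch)), ch)

# Encoding the same characters gives different bytes per codec.
assert text.encode('shift-jis') == raw
utf8 = text.encode('utf-8')
print(utf8)

# The u'\x82\xb1...' literal is six Latin-1-range characters, not これは.
wrong = raw.decode('latin-1')
print(len(wrong), wrong == text)
```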
Module for Python and SGE interaction
Hey guys,

I should mention I am relatively new to the language. Could you please let me know, based on your experience, which module could help me farm out jobs to our existing clusters (we use SGE here) using Python? Ideally I would like to do the following:

1. Submit #N jobs to the cluster
2. Monitor their progress
3. When all #N jobs finish, push another set of jobs

Thanks!
-Abhi
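One low-dependency way to get the submit / wait / resubmit cycle described above is sketched below, under stated assumptions. It assumes SGE's `qsub` is on the PATH and that its `-sync y` option is available on your installation; that option makes qsub wait until the submitted job has finished. The helper name and script names are hypothetical, and per-job progress monitoring (for example with `qstat`) is not shown. The DRMAA API, which has Python bindings, is another route worth checking.

```python
import subprocess

# Assumed command: SGE's qsub with "-sync y" blocks until the job completes.
QSUB = ("qsub", "-sync", "y")


def run_in_batches(batches, qsub=QSUB):
    """Submit each batch of job scripts, wait for all of them, then continue.

    batches: iterable of lists of job-script paths (hypothetical names).
    Returns one list of qsub exit codes per batch.
    """
    results = []
    for batch in batches:
        # Start one blocking qsub per job so the whole batch runs concurrently.
        procs = [subprocess.Popen([*qsub, script]) for script in batch]
        # wait() returns when that qsub, and therefore its job, has finished.
        results.append([p.wait() for p in procs])
    return results


# Example (hypothetical scripts):
# run_in_batches([["step1_a.sh", "step1_b.sh"], ["step2.sh"]])
```

Passing a different `qsub` tuple makes the loop easy to try locally without a cluster.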
