Re: "
Ian Kelly, 04.05.2012 01:02:
> BeautifulSoup is supposed to parse like a browser would

Not at all, that would be html5lib.

Stefan
--
http://mail.python.org/mailman/listinfo/python-list
Re: key/value store optimized for disk storage
Steve Howell writes:
> compressor = zlib.compressobj()
> s = compressor.compress("foobar")
> s += compressor.flush(zlib.Z_SYNC_FLUSH)
>
> s_start = s
> compressor2 = compressor.copy()
I think you also want to make a decompressor here, and initialize it
with s and then clone it. Then you don't have to reinitialize every
time you want to decompress something.
I also seem to remember that the first few bytes of compressed output
are always some fixed string or checksum, that you can strip out after
compression and put back before decompression, giving further savings in
output size when you have millions of records.
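
The idea above (prime a compressor with shared data, then clone both the compressor and a matching decompressor per record) can be sketched as follows. This is a minimal illustration for Python 3, where the inputs are bytes rather than the Python 2 strings used in the thread. The function name make_salted_codec and the sample salt are hypothetical and are not taken from salted_compressor.py.

```python
import zlib


def make_salted_codec(salt):
    """Return (compress, decompress) functions primed with a shared salt.

    The salt is fed through one compressor and one decompressor up front;
    each record then works on a cheap copy of that primed state.
    """
    primed_comp = zlib.compressobj()
    prefix = primed_comp.compress(salt) + primed_comp.flush(zlib.Z_SYNC_FLUSH)

    # Feed the same prefix to a decompressor so that it ends up in the
    # matching state; its output (the salt itself) is not needed.
    primed_decomp = zlib.decompressobj()
    primed_decomp.decompress(prefix)

    def compress(data):
        comp = primed_comp.copy()
        # Only the incremental output produced after the salt is returned.
        return comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)

    def decompress(blob):
        decomp = primed_decomp.copy()
        return decomp.decompress(blob)

    return compress, decompress


compress, decompress = make_salted_codec(b"name=;email=;city=;" * 4)
blob = compress(b"name=bob;city=paris;")
print(len(blob), decompress(blob))
```

Because every record is produced by a copy of the same primed object, the records can be stored and decompressed independently of one another, as long as the same salt is available at read time.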
Re: key/value store optimized for disk storage
On Thu, May 3, 2012 at 11:03 PM, Paul Rubin wrote:
> > Sort of as you suggest, you could build a Huffman encoding for a
> > representative run of data, save that tree off somewhere, and then use
> > it for all your future encoding/decoding.
>
> Zlib is better than Huffman in my experience, and Python's zlib module
> already has the right entry points.

Isn't zlib kind of dated? Granted, it's newer than Huffman, but there's
been bzip2 and xz since then, among numerous others.

Here's something for xz:
http://stromberg.dnsalias.org/svn/xz_mod/trunk/

An xz module is in the CPython 3.3 alphas - the above module wraps it if
available, otherwise it uses ctypes or a pipe to an xz binary.

And I believe bzip2 is in the standard library for most versions of
CPython.
Re: key/value store optimized for disk storage
On May 3, 11:59 pm, Paul Rubin wrote:
> Steve Howell writes:
> > compressor = zlib.compressobj()
> > s = compressor.compress("foobar")
> > s += compressor.flush(zlib.Z_SYNC_FLUSH)
>
> > s_start = s
> > compressor2 = compressor.copy()
>
> I think you also want to make a decompressor here, and initialize it
> with s and then clone it. Then you don't have to reinitialize every
> time you want to decompress something.
Makes sense. I believe I got that part correct:
https://github.com/showell/KeyValue/blob/master/salted_compressor.py
> I also seem to remember that the first few bytes of compressed output
> are always some fixed string or checksum, that you can strip out after
> compression and put back before decompression, giving further savings in
> output size when you have millions of records.
I'm pretty sure this happens for free as long as the salt is large
enough, but maybe I'm misunderstanding.
Re: key/value store optimized for disk storage
Steve Howell writes:
> Makes sense. I believe I got that part correct:
>
> https://github.com/showell/KeyValue/blob/master/salted_compressor.py

The API looks nice, but your compress method makes no sense. Why do you
include s.prefix in s and then strip it off? Why do you save the prefix
and salt in the instance, and have self.salt2 and s[len(self.salt):]
in the decompress? You should be able to just get the incremental bit.

> I'm pretty sure this happens for free as long as the salt is large
> enough, but maybe I'm misunderstanding.

No I mean there is some fixed overhead (a few bytes) in the compressor
output, to identify it as such. That's fine when the input and output
are both large, but when there's a huge number of small compressed
strings, it adds up.
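
The fixed per-stream overhead mentioned above can be estimated by compressing empty input, where everything in the output is framing (header and trailer) rather than data. This is a rough sketch for Python 3, where the xz format is exposed through the lzma module; the helper names are hypothetical and the numbers are only an approximation of what real small records would cost.

```python
import bz2
import lzma
import zlib


def fixed_overhead():
    """Compressed size of empty input per codec, i.e. pure framing bytes."""
    return {
        "zlib": len(zlib.compress(b"")),
        "bz2": len(bz2.compress(b"")),
        "lzma": len(lzma.compress(b"")),
    }


def total_overhead(num_records):
    """Rough framing cost when each record is compressed as its own stream."""
    return {name: size * num_records
            for name, size in fixed_overhead().items()}


print(fixed_overhead())
print(total_overhead(1000000))
```

For millions of small records the per-stream framing is paid once per record, which is why sharing a primed compressor state or stripping a constant header is attractive.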
Re: key/value store optimized for disk storage
On May 4, 1:01 am, Paul Rubin wrote:
> Steve Howell writes:
> > Makes sense. I believe I got that part correct:
> >
> > https://github.com/showell/KeyValue/blob/master/salted_compressor.py
>
> The API looks nice, but your compress method makes no sense. Why do you
> include s.prefix in s and then strip it off? Why do you save the prefix
> and salt in the instance, and have self.salt2 and s[len(self.salt):]
> in the decompress? You should be able to just get the incremental bit.

This is fixed now.

https://github.com/showell/KeyValue/commit/1eb316d6e9e44a37ab4f3ca73fcaf4ec0e7f22b4#salted_compressor.py

> > I'm pretty sure this happens for free as long as the salt is large
> > enough, but maybe I'm misunderstanding.
>
> No I mean there is some fixed overhead (a few bytes) in the compressor
> output, to identify it as such. That's fine when the input and output
> are both large, but when there's a huge number of small compressed
> strings, it adds up.

If it's in the header, wouldn't it be part of the output that comes
before Z_SYNC_FLUSH?
Re: Create directories and modify files with Python
On 1/05/12 17:34:57, [email protected] wrote:
> from __future__ import print_function #1
>
> #1: Not sure whether you're using Python 2 or 3. I ran
> this on Python 2.7 and think it will run on Python 3 if
> you remove this line.

You don't have to remove that line: Python3 will accept it. It doesn't
do anything in python3, since 'print' is a function whether or not you
include that line, but for backward compatibility, you're still allowed
to say it.

Incidentally, the same is true for all __future__ features. For
example, Python3 still accepts:

    from __future__ import nested_scopes

even though it's only really needed if you're using python 2.1, since
from 2.2 onwards scopes have nested with or without that command.

HTH,

-- HansM
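
The point about __future__ imports staying legal can be checked from the interpreter itself. The sketch below assumes Python 3; the helper future_feature_info is a hypothetical name. It compiles the import statement to show that it is accepted, and asks the __future__ module when each feature became optional and mandatory.

```python
import __future__


def future_feature_info(name):
    """Return (optional_release, mandatory_release) for a __future__ feature."""
    feature = getattr(__future__, name)
    return feature.getOptionalRelease(), feature.getMandatoryRelease()


# Compiling the statement shows that the running interpreter accepts it.
compile("from __future__ import print_function, nested_scopes\n",
        "<demo>", "exec")

for name in ("nested_scopes", "print_function"):
    print(name, future_feature_info(name))
```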
Re: pyjamas / pyjs
On 5/4/2012 12:52 AM, John O'Hagan wrote:
> Just read the thread on pyjamas-dev. Even without knowing anything about
> the lead-up to the coup, its leader's linguistic contortions trying to
> justify it

And what is the name of the miscreant, so we know who to have nothing to
do with?

--
Terry Jan Reedy
Re: key/value store optimized for disk storage
Steve Howell writes:
>> You should be able to just get the incremental bit.
> This is fixed now.

Nice.

> It it's in the header, wouldn't it be part of the output that comes
> before Z_SYNC_FLUSH?

Hmm, maybe you are right. My version was several years ago and I don't
remember it well, but I half-remember spending some time diddling around
with this issue.
Re: pyjamas / pyjs
On Thursday, 3 May 2012 12:52:36 UTC+1, alex23 wrote: > Anyone else following the apparent hijack of the pyjs project from its > lead developer? Yes, me. The guy now in control got the owner of the domain name to turn it over to him, which is probably ok legally, but he had no public mandate or support. As far as I can see from the mailing list, only 3 or 4 out of the 650 subscribers actively support his actions. He's a long time contributor and genuinely seems quite talented. However there's no getting away from the fact that he's done this undemocratically, when he could have forked the project. To my mind he hasn't made a good enough reasoned justification of his arguments and he's coming across as being very defensive at the moment. The former leader, Luke Leighton, seemed to have vanished from the face of the earth but I mailed him yesterday and he's on holiday so trying not to pay too much attention to it at the moment. There's also an allegation, which I am not making myself at this point - only describing its nature, that a person may have lifted data from the original mail server without authorisation and used it to recreate the mailing list on a different machine. *If* that were to be true, then the law has been broken in at least one country. I'm arguing that there should be a public consultation over who gets to run this project and I'm also thinking of making a suggestion to the python software foundation or maybe other bodies such as the FSF (I'm not a FOSS expert but they were suggested by others) that they host a fork of this project so that we can have a legitimate and stable route forward. The problem for me with all this is that I use pyjamas in a commercial capacity and (sorry if this sounds vague but I have to be a bit careful) there are probably going to be issues with our clients - corporate people distrust FOSS at the best of times and this kind of thing will make them run for the bloody hills. 
In fact, there appear to be a lot of "sleeper" users who make a living out of this stuff and the actions of the new de-facto leader has jeopardised this, pretty needlessly in our opinion. James -- http://mail.python.org/mailman/listinfo/python-list
Re: When convert two sets with the same elements to lists, are the lists always going to be the same?
On Thu, May 3, 2012 at 11:16 PM, Terry Reedy wrote:
> On 5/3/2012 8:36 PM, Peng Yu wrote:
>>
>> Hi,
>>
>> list(a_set)
>>
>> When convert two sets with the same elements to two lists, are the
>> lists always going to be the same (i.e., the elements in each list are
>> ordered the same)? Is it documented anywhere?
>
>
> "A set object is an unordered collection of distinct hashable objects".
> If you create a set from unequal objects with equal hashes, the iteration
> order may (should, will) depend on the insertion order as the first object
> added with a colliding hash will be at its 'natural position in the hash
> table while succeeding objects will be elsewhere.
>
> Python 3.3.0a3 (default, May 1 2012, 16:46:00)
> >>> hash('a')
> -292766495615408879
> >>> hash(-292766495615408879)
> -292766495615408879
> >>> a = {'a', -292766495615408879}
> >>> b = {-292766495615408879, 'a'}
> >>> list(a)
> [-292766495615408879, 'a']
> >>> list(b)
> ['a', -292766495615408879]

Thanks. This is what I'm looking for. I think that this should be
added to the python document as a manifestation (but nonnormalized) of
what "A set object is an unordered collection of distinct hashable
objects" means.
--
Regards,
Peng
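
The string hash in the session above changes between runs when hash randomization is enabled, so here is a sketch that avoids strings. It relies on a CPython implementation detail (an assumption, not a language guarantee): hash(-1) and hash(-2) are equal there, so the two integers collide in the hash table. The helper names are hypothetical. No output is shown for the list() calls, since the order is exactly what cannot be relied on.

```python
def same_elements(a, b):
    """Compare two collections as sets, ignoring any iteration order."""
    return set(a) == set(b)


def stable_listing(s):
    """Return the elements in a defined order instead of the set's own."""
    return sorted(s)


a = {-1, -2}
b = {-2, -1}

# Equal as sets, whatever order each one happens to iterate in.
print(a == b)
# These two lists may or may not come out in the same order.
print(list(a), list(b))
# Sorting gives an order that does not depend on insertion history.
print(stable_listing(a), stable_listing(b))
```

Code that needs a reproducible order should sort (or otherwise order) the elements explicitly rather than depend on what list(a_set) happens to return.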
Re: pyjamas / pyjs
By the way, there's a lot more to say on this, which I'll cover another time. There are arguments for and against what's happened; at this stage I'm just trying to flag up that there is *not* unanimity and we are not just carrying on as normal. -- http://mail.python.org/mailman/listinfo/python-list
Re: syntax for code blocks
On 5/4/2012 4:44, alex23 wrote:
On May 4, 2:17 am, Kiuhnm wrote:
On 5/3/2012 2:20, alex23 wrote:
locals() is a dict. It's not injecting anything into func's scope
other than a dict so there's not going to be any name clashes. If you
don't want any of its content in your function's scope, just don't use
that content.
The clashing is *inside* the dictionary itself. It contains *all* local
functions and variables.
This is nonsense.
locals() produces a dict of the local scope. I'm passing it into a
function. Nothing in the local scope clashes, so the locals() dict has
no "internal clashing". Nothing is injecting it into the function's
local scope, so _there is no "internal clashing"_.
To revise, your original "pythonic" example was, effectively:
def a(): pass
def b(): pass
func_packet = {'a': a, 'b': b}
func(arg, func_packet)
My version was:
def a(): pass
def b(): pass
func_packet = locals()
func(arg, func_packet)
Now, please explain how that produces name-clashes that your version
does not.
It doesn't always produce name-clashes but it may do so.
Suppose that func takes some functions named fn1, fn2 and fn3. If you
only define fn2 but you forget that you already defined somewhere before
fn1, you inadvertently pass to func both fn1 and fn2.
Even worse, if you write
def a(): pass
def b(): pass
func(arg, locals())
and then you want to call func again with c() alone, you must write this:
def c(): pass
a = b = None
func(arg, locals())
Moreover, think what happens if you add a function whose name is equal
to that of a function accepted by func.
That's what I call name-clashing.
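
A minimal sketch of the scenario just described, assuming a hypothetical func that accepts optional callbacks named fn1, fn2 and fn3. It only illustrates the difference between passing locals() and passing an explicit dict; it takes no side in the wider argument.

```python
def func(arg, callbacks):
    """Hypothetical callee: report which optional callbacks it was handed."""
    accepted = ("fn1", "fn2", "fn3")
    return [name for name in accepted if callable(callbacks.get(name))]


def call_with_locals():
    def fn1(): pass      # defined earlier for some unrelated purpose
    def fn2(): pass      # the only callback we meant to pass
    # locals() contains every local name, so fn1 is passed along too.
    return func(None, locals())


def call_with_explicit_dict():
    def fn1(): pass
    def fn2(): pass
    # Only the names listed here reach func.
    return func(None, {"fn2": fn2})


print(call_with_locals())
print(call_with_explicit_dict())
```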
My solution avoids all these problems, promotes encapsulation and lets you
program in a more functional way, which is sometimes more concise than the
OOP way.
That's not the same thing. If a function accepts some optional
callbacks, and you call that function more than once, you will have
problems. You'll need to redefine some callbacks and remove others.
That's total lack of encapsulation.
Hand-wavy, no real example, doesn't make sense.
Really? Then I don't know what would make sense to you.
You haven't presented *any* good code or use cases.
Says who? You and some others? Not enough.
So far, pretty much everyone who has tried to engage you on this
subject on the list. I'm sorry we're not all ZOMGRUBYBLOCKS111
like the commenters on your project page.
It's impossible to have a constructive discussion while you and others
feel that way. You're so biased that you don't even see how biased you are.
The meaning is clear from the context.
Which is why pretty much every post in this thread mentioned finding
it confusing?
I would've come up with something even better if only Python wasn't so rigid.
The inability for people to add 6 billion mini-DSLs to solve any
stupid problem _is a good thing_. It makes Python consistent and
predictable, and means I don't need to parse _the same syntax_ utterly
different ways depending on the context.
If I and my group of programmers devised a good and concise syntax and
semantics to describe some applicative domain, then we would want to
translate that into the language we use.
Unfortunately, Python doesn't let you do that.
I also think that uniformity is the death of creativity. What's worse,
uniformity in language is also uniformity in thinking.
As I said in some other posts, I think that Python is a good language,
but as soon as you need to do something a little different or just
differently, it's a pain to work with.
Because that would reveal part of the implementation.
Suppose you have a complex visitor. The OOP way is to subclass, while
the FP way is to accept callbacks. Why the FP way? Because it's more
concise.
In any case, you don't want to reveal how the visitor walks the data
structure or, better, the user doesn't need to know about it.
Again, nothing concrete, just vague intimations of your way being
better.
Sigh.
So define&use a different scope! Thankfully module level isn't the
only one to play with.
We can do OOP even in ASM, you know?
???
You can do whatever you want by hand: you can certainly define your
functions inside another function or a class, but that's just more noise
added to the mix.
I'm sorry but it is still clear-as-mud what you're trying to show
here. Can you show _one_ practical, real-world, non-toy example that
solves a real problem in a way that Python cannot?
I just did. It's just that you can't see it.
"I don't understand this example, can you provide one." "I just did,
you didn't understand it."
Your rephrasing is quite wrong. You asked for a practical example and I
said that I already showed you one. It's just that you can't see it (as
practical).
Okay, done with this now. Your tautologies and arrogance are not
clarifying your position at all, and I really don't give a damn, so
*plonk*
I don't care if you don't read this post.
Re: When convert two sets with the same elements to lists, are the lists always going to be the same?
On Fri, May 4, 2012 at 8:14 PM, Peng Yu wrote: > Thanks. This is what I'm looking for. I think that this should be > added to the python document as a manifestation (but nonnormalized) of > what "A set object is an unordered collection of distinct hashable > objects" means. There are other things that can prove it to be unordered, too; the exact pattern and order of additions and deletions can affect the iteration order. The only thing you can be sure of is that you can't be sure of it. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: syntax for code blocks
On Fri, May 4, 2012 at 9:12 PM, Kiuhnm wrote: > If I and my group of programmers devised a good and concise syntax and > semantics to describe some applicative domain, then we would want to > translate that into the language we use. > Unfortunately, Python doesn't let you do that. No, this is not unfortunate. Python does certain things and does them competently. If Python doesn't let you write what you want the way you want, then you do not want Python. This is not an insult to Python, nor is it a cop-out whereby the Python Cabal tells you to shut up and go away, you aren't doing things the Proper Way, you need to change your thinking to be more in line with Correct Syntax. It is simply a reflection of the nature of languages. If I want to write a massively-parallel program that can be divided across any number of computers around the world, Python isn't the best thing to use. If I want to write a MUD with efficient reloading of code on command, Python isn't the best thing to use. If I want to write a device driver, Python isn't the best thing to use. If I want to write a simple script that does exactly what it should and didn't take me long to write, then Python quite likely IS the best thing to use. But whatever you do, play to the strengths of the language you use, don't play to its weaknesses. Don't complain when C leaks the memory that you forgot to free(), don't bemoan LISP's extreme parenthesizing, don't fight the Python object model. You'll only hurt yourself. In any case, you know where to find Ruby any time you want it. ChrisA -- http://mail.python.org/mailman/listinfo/python-list
Re: When convert two sets with the same elements to lists, are the lists always going to be the same?
On Fri, May 4, 2012 at 6:21 AM, Chris Angelico wrote: > On Fri, May 4, 2012 at 8:14 PM, Peng Yu wrote: >> Thanks. This is what I'm looking for. I think that this should be >> added to the python document as a manifestation (but nonnormalized) of >> what "A set object is an unordered collection of distinct hashable >> objects" means. > > There are other things that can prove it to be unordered, too; the > exact pattern and order of additions and deletions can affect the > iteration order. The only thing you can be sure of is that you can't > be sure of it. I agree. My point was just to suggest adding more explanations on the details in the manual. -- Regards, Peng -- http://mail.python.org/mailman/listinfo/python-list
set PYTHONPATH for a directory?
I'm testing some software I'm building against an alternative version of a library. So I have an alternative library in directory L. Then I have in an unrelated directory, the test software, which I need to use the library version from directory L. One approach is to set PYTHONPATH whenever I run this test software. Any suggestion on a more foolproof approach? -- http://mail.python.org/mailman/listinfo/python-list
Re: numpy (matrix solver) - python vs. matlab
On 05/04/2012 05:52 AM, Steven D'Aprano wrote:
On Thu, 03 May 2012 19:30:35 +0200, someone wrote:
So how do you explain that the natural frequencies from FEM (with
condition number ~1e6) generally correlates really good with real
measurements (within approx. 5%), at least for the first 3-4 natural
frequencies?
I would counter your hand-waving ("correlates really good", "within
approx 5%" of *what*?) with hand-waving of my own:
Within 5% of experiments of course.
There is not much else to compare with.
"Sure, that's exactly what I would expect!"
*wink*
By the way, if I didn't say so earlier, I'll say so now: the
interpretation of "how bad the condition number is" will depend on the
underlying physics and/or mathematics of the situation. The
interpretation of loss of digits of precision is a general rule of thumb
that holds in many diverse situations, not a rule of physics that cannot
be broken in this universe.
If you have found a scenario where another interpretation of condition
number applies, good for you. That doesn't change the fact that, under
normal circumstances when trying to solve systems of linear equations, a
condition number of 1e6 is likely to blow away *all* the accuracy in your
measured data. (Very few physical measurements are accurate to more than
six digits.)
Not true, IMHO.
Eigenfrequencies (I think that is a very typical physical measurement
and I cannot think of something that is more typical) don't need to be
accurate with 6 digits. I'm happy with below 5% error. So if an
eigenfrequency is measured to 100 Hz, I'm happy if the numerical model
gives a result in the 5%-range of 95-105 Hz. This I got with a condition
number of approx. 1e6 and it's good enough for me. I don't think anyone
expects 6-digit accuracy with eigenfrequencies.
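
For readers following the disagreement, here is a toy illustration of the rule of thumb being debated, in plain Python so that it needs no numpy. The 2x2 matrix is made up for the example and has a condition number of order 1e6; it is not an FEM model, and the sketch says nothing about how eigenfrequency computations behave.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a * x = b with Cramer's rule."""
    (a11, a12), (a21, a22) = a
    det = a11 * a22 - a12 * a21
    x1 = (b[0] * a22 - a12 * b[1]) / det
    x2 = (a11 * b[1] - a21 * b[0]) / det
    return x1, x2


# Nearly singular matrix: the two rows differ by one part in a million.
A = ((1.0, 1.0), (1.0, 1.000001))
b_exact = (2.0, 2.000001)
b_measured = (2.0, 2.000003)  # second entry off by about one part in a million

print(solve_2x2(A, b_exact))      # close to (1, 1)
print(solve_2x2(A, b_measured))   # close to (-1, 3)
```

A relative change of about 1e-6 in the right-hand side moves the solution from roughly (1, 1) to roughly (-1, 3). That is the sense in which a large condition number can consume the accuracy of measured input when solving a linear system directly.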
Re: numpy (matrix solver) - python vs. matlab
On 05/04/2012 06:15 AM, Russ P. wrote:
> On May 3, 4:59 pm, someone wrote:
>> On 05/04/2012 12:58 AM, Russ P. wrote:
>> Ok, but I just don't understand what's in the "empirical" category,
>> sorry...
>
> I didn't look it up, but as far as I know, empirical just means based
> on experiment, which means based on measured data. Unless I am
> mistaken, a finite element analysis is not based on measured data.

FEM based on measurement data? Still, I don't understand it, sorry.

I'm probably a bit narrow-thinking because I just worked with this small
FEM-program (in Matlab), but can you please give an example of a
matrix-problem that is based on measurement data?

> Yes, the results can be *compared* with measured data and perhaps
> calibrated with measured data, but those are not the same thing.

Exactly. That's why I don't understand what solving a matrix system
using measurement/empirical data, could typically be an example of...?

> I agree with Steven D's comment above, and I will reiterate that a
> condition number of 1e6 would not inspire confidence in me. If I had a
> condition number like that, I would look for a better model. But
> that's just a gut reaction, not a hard scientific rule.

I don't have any better model and don't know anything better. I still
think that 5% accuracy is good enough and that nobody needs 6-digits
precision for practical/engineering/empirical work... Maybe quantum
physicists need more than 6 digits of accuracy, but most
practical/engineering problems are ok with an accuracy of 5%, I think,
IMHO... Please tell me if I'm wrong.
Re: pyjamas / pyjs
james hedley wrote: > There's also an allegation, which I am not making myself at this point > - only describing its nature, that a person may have lifted data from > the original mail server without authorisation and used it to recreate > the mailing list on a different machine. *If* that were to be true, > then the law has been broken in at least one country. > I don't know whether they moved it to another machine or not, but what they definitely did do was start sending emails to all the people on the list who had sending of emails disabled (including myself) which resulted in a flood of emails and from the sound of it a lot of annoyed people. If he wanted to community support for the takeover that probably wasn't a good start. In case it isn't obvious why I might be subscribed but emails turned off, I read mailing lists like that through gmane in which case I still need to sign up to the list to post but definitely don't want to receive emails. -- Duncan Booth http://kupuguy.blogspot.com -- http://mail.python.org/mailman/listinfo/python-list
Re: set PYTHONPATH for a directory?
On 05/04/2012 08:21 AM, Neal Becker wrote: > I'm testing some software I'm building against an alternative version of a > library. So I have an alternative library in directory L. Then I have in an > unrelated directory, the test software, which I need to use the library > version > from directory L. > > One approach is to set PYTHONPATH whenever I run this test software. Any > suggestion on a more foolproof approach? > Simply modify sys.path at the beginning of your test software. That's where import searches. -- DaveA -- http://mail.python.org/mailman/listinfo/python-list
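
The sys.path suggestion above can be sketched as follows. The directory "/path/to/L" is a placeholder for the real location of the alternative library, and prefer_library_dir is a hypothetical helper name.

```python
import sys


def prefer_library_dir(lib_dir):
    """Put lib_dir at the front of sys.path so imports search it first."""
    if lib_dir in sys.path:
        sys.path.remove(lib_dir)
    sys.path.insert(0, lib_dir)


ALT_LIB_DIR = "/path/to/L"  # placeholder: the directory holding the alternative library
prefer_library_dir(ALT_LIB_DIR)

# Import the library only after this point, so that the copy in
# ALT_LIB_DIR is the one that gets found.
print(sys.path[0])
```

This has to run before the library is first imported; a module that is already loaded stays in sys.modules and is not re-resolved when sys.path changes.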
Re: set PYTHONPATH for a directory?
Isn't virtualenv for this kind of scenario? Pedro. On Fri, May 4, 2012 at 3:46 PM, Dave Angel wrote: > On 05/04/2012 08:21 AM, Neal Becker wrote: >> I'm testing some software I'm building against an alternative version of a >> library. So I have an alternative library in directory L. Then I have in an >> unrelated directory, the test software, which I need to use the library >> version >> from directory L. >> >> One approach is to set PYTHONPATH whenever I run this test software. Any >> suggestion on a more foolproof approach? >> > Simply modify sys.path at the beginning of your test software. That's > where import searches. > > > > -- > > DaveA > > -- > http://mail.python.org/mailman/listinfo/python-list -- http://mail.python.org/mailman/listinfo/python-list
Re: "
On Fri, May 4, 2012 at 12:57 AM, Stefan Behnel wrote: > Ian Kelly, 04.05.2012 01:02: >> BeautifulSoup is supposed to parse like a browser would > > Not at all, that would be html5lib. Well, I guess that depends on whether we're talking about BeautifulSoup 3 (a regex-based screen scraper with methods for navigating and searching the resulting tree) or 4 (purely a parse tree navigation library that relies on external libraries to do the actual parsing). According to the BS3 documentation, "The BeautifulSoup class is full of web-browser-like heuristics for divining the intent of HTML authors." If we're talking about BS4, though, then the problem in this instance would have nothing to do with BS4 and instead would be an issue of whatever underlying parser the OP is using. -- http://mail.python.org/mailman/listinfo/python-list
Re: syntax for code blocks
On 05/04/2012 05:12 AM, Kiuhnm wrote:
>> Hand-wavy, no real example, doesn't make sense.
>
> Really? Then I don't know what would make sense to you.

Speaking as an observer here, I've read your blog post, and looked at
your examples. They don't make sense to me either. They aren't real
examples. They are abstract examples. They do not answer the questions,
"what actual, real world python problems does this solve?" and "how is
this better than a plain python solution?" For example, I've seen ruby
code where blocks are used in a real-world way. Could you not put in
something similar in your examples? Since you've written this code you
must use it in everyday python coding. Show us what you've been doing
with it.

Also while some of your blog snippets are snippets, other code examples
you provide purport to be complete examples, when in fact they are not.
For example, about 45% of the way down your blog page you have a block
of code that looks to be self-contained. It has "import logging" and
"import random" at the top of it. Yet it cannot run as it's missing an
import of your module.

>>>> You haven't presented *any* good code or use cases.
>>>
>>> Says who? You and some others? Not enough.

How many people do you need to tell you this before it's good enough?
Doesn't matter how genius your code is if no one knows when or how to
use it.

> It's impossible to have a constructive discussion while you and others
> feel that way. You're so biased that you don't even see how biased you are.

Having followed the conversation somewhat, I can say that you have been
given a fair hearing. People aren't just dissing on it because it's
ruby. You are failing to listen to them just as much as you claim they
are failing to listen to you.

>>> The meaning is clear from the context.

Not really. For one we're not Ruby programmers here, and like has been
said, where is a real example of real code that's not just some abstract
"hello this is block1, this is block 2" sort of thing? Providing
non-block code to compare is important too.

> Unfortunately, communication is a two-people thing. It's been clear from
> the first post that your intention wasn't to understand what I'm proposing.
> There are some things, like what I say about name-clashing, that you
> should understand no matter how biased you are.
> If you don't, you're just pretending or maybe you weren't listening at all.

well there's my attempt.
Re: key/value store optimized for disk storage
On May 3, 6:10 pm, Miki Tebeka wrote: > > I'm looking for a fairly lightweight key/value store that works for > > this type of problem: > > I'd start with a benchmark and try some of the things that are already in the > standard library: > - bsddb > - sqlite3 (table of key, value, index key) > - shelve (though I doubt this one) > Thanks. I think I'm ruling out bsddb, since it's recently deprecated: http://www.gossamer-threads.com/lists/python/python/106494 I'll give sqlite3 a spin. Has anybody out there wrapped sqlite3 behind a hash interface already? I know it's simple to do conceptually, but there are some minor details to work out for large amounts of data (like creating the index after all the inserts), so if somebody's already tackled this, it would be useful to see their code. > You might find that for a little effort you get enough out of one of these. > > Another module which is not in the standard library is hdf5/PyTables and in > my experience very fast. Thanks. -- http://mail.python.org/mailman/listinfo/python-list
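
As a starting point for the question above, here is one way sqlite3 could be wrapped behind a dict-like interface. This is a sketch for Python 3, not an existing package: the class name SqliteDict, the table name kv and the column names are all made up for the example.

```python
import sqlite3
from collections.abc import MutableMapping


class SqliteDict(MutableMapping):
    """Dict-like view of a single key/value table in an SQLite database."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
            (key, value))

    def __delitem__(self, key):
        cursor = self._conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        if cursor.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        for (key,) in self._conn.execute("SELECT key FROM kv"):
            yield key

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def commit(self):
        self._conn.commit()

    def close(self):
        self._conn.close()


store = SqliteDict()
store["spam"] = b"eggs"
print(store["spam"], len(store))
```

The PRIMARY KEY here makes SQLite maintain its index during every insert. For a large bulk load, the variant mentioned above (a plain table, with the index created after all the inserts) may be faster, and the inserts would normally be grouped into a few large transactions, with commit() called only occasionally.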
Re: syntax for code blocks
You know what I find rich about all of this? >>>[ ... ]> I'd like to change the syntax of my module 'codeblocks' to make it >>>more >>>[ ... ]> pythonic. Kiuhnm posted a thread to the group asking us to help him make it more Pythonic, but he has steadfastly refused every single piece of help he was offered because he feels his code is good enough after all. So why are we perpetuating it? ~Temia -- When on earth, do as the earthlings do. -- http://mail.python.org/mailman/listinfo/python-list
Re: key/value store optimized for disk storage
On 05/04/12 10:27, Steve Howell wrote: > On May 3, 6:10 pm, Miki Tebeka wrote: >>> I'm looking for a fairly lightweight key/value store that works for >>> this type of problem: >> >> I'd start with a benchmark and try some of the things that are already in >> the standard library: >> - bsddb >> - sqlite3 (table of key, value, index key) > > Thanks. I think I'm ruling out bsddb, since it's recently deprecated: Have you tested the standard library's anydbm module (certainly not deprecated)? In a test I threw together, after populating one gig worth of data, lookups were pretty snappy (compared to the lengthy time it took to populate the 1gb of junk data). -tkc -- http://mail.python.org/mailman/listinfo/python-list
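
For readers on Python 3, where anydbm was renamed to dbm, a small round-trip sketch follows. The helper name roundtrip and the file name demo are made up for the example. Keys and values are bytes, and which backend dbm.open picks depends on what is installed.

```python
import dbm
import os
import tempfile


def roundtrip(path, items):
    """Write items to a dbm file, reopen it read-only, and read them back."""
    with dbm.open(path, "c") as db:
        for key, value in items.items():
            db[key] = value
    with dbm.open(path, "r") as db:
        return {key: db[key] for key in items}


workdir = tempfile.mkdtemp()
print(roundtrip(os.path.join(workdir, "demo"), {b"k1": b"v1", b"k2": b"v2"}))
```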
Re: When convert two sets with the same elements to lists, are the lists always going to be the same?
On 5/4/2012 8:00 AM, Peng Yu wrote: On Fri, May 4, 2012 at 6:21 AM, Chris Angelico wrote: On Fri, May 4, 2012 at 8:14 PM, Peng Yu wrote: Thanks. This is what I'm looking for. I think that this should be added to the python document as a manifestation (but nonnormalized) of what "A set object is an unordered collection of distinct hashable objects" means. There are other things that can prove it to be unordered, too; the exact pattern and order of additions and deletions can affect the iteration order. The only thing you can be sure of is that you can't be sure of it. I agree. My point was just to suggest adding more explanations on the details in the manual. I am not sure how much clearer we can be in the language manual. The word 'unordered' means just that. If one imposes an arbitrary linear order on an unordered collection, it is arbitrary. It is frustrating that people do not want to believe that, and even write tests depending on today's arbitrary serialization order being deterministic indefinitely. There is a section about this in the doctest doc, but people do it anyway. I will think about a sentence to add. -- Terry Jan Reedy -- http://mail.python.org/mailman/listinfo/python-list
Re: key/value store optimized for disk storage
On 05/04/12 12:22, Steve Howell wrote:
> Which variant do you recommend?
>
> """ anydbm is a generic interface to variants of the DBM database
> — dbhash (requires bsddb), gdbm, or dbm. If none of these modules
> is installed, the slow-but-simple implementation in module
> dumbdbm will be used.
>
> """
If you use the stock anydbm module, it automatically chooses the
best it knows from the ones available:

    import os
    import hashlib
    import random
    from string import letters
    import anydbm

    KB = 1024
    MB = KB * KB
    GB = MB * KB
    DESIRED_SIZE = 1 * GB
    KEYS_TO_SAMPLE = 20
    FNAME = "mydata.db"

    i = 0
    md5 = hashlib.md5()
    db = anydbm.open(FNAME, 'c')
    try:
        print("Generating junk data...")
        while os.path.getsize(FNAME) < 6*GB:
            key = md5.update(str(i))[:16]
            size = random.randrange(1*KB, 4*KB)
            value = ''.join(random.choice(letters)
                for _ in range(size))
            db[key] = value
            i += 1
        print("Gathering %i sample keys" % KEYS_TO_SAMPLE)
        keys_of_interest = random.sample(db.keys(), KEYS_TO_SAMPLE)
    finally:
        db.close()

    print("Reopening for a cold sample set in case it matters")
    db = anydbm.open(FNAME)
    try:
        print("Performing %i lookups")
        for key in keys_of_interest:
            v = db[key]
        print("Done")
    finally:
        db.close()

(your specs said ~6gb of data, keys up to 16 characters, values of
1k-4k, so this should generate such data)
-tkc
RE: most efficient way of populating a combobox (in maya)
> > I'm making a GUI in maya using python only and I'm trying to see which
> > is more efficient. I'm trying to populate an optionMenuGrp / combo box
> > whose contents come from os.listdir(folder). Now this is fine if the
> > folder isn't that full but the folder has a few hundred items (almost in
> > the thousands), it is also on the (work) network and people are
> > constantly reading from it as well. Now I'm trying to write the GUI so
> > that it makes the interface, and using threading - Thread, populate the
> > box. Is this a good idea? Has anyone done this before and have
> > experience with any limitations on it? Is the performance not
> > significant?
> > Thanks for any advice
>
> Why don't you try it and see?
>
> It's not like populating a combobox in Tkinter with the contents of
> os.listdir requires a large amount of effort. Just try it and see whether
> it performs well enough.

In my experience, a generic combobox with hundreds or thousands of
elements is difficult and annoying to use. Not sure if the Tkinter version
has scroll bars or auto-completion, but if not you may want to subclass
and add those features.

Ramit

Ramit Prasad | JPMorgan Chase Investment Bank | Currencies Technology
712 Main Street | Houston, TX 77002
work phone: 713 - 216 - 5423
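A minimal sketch of the threading idea from the question, assuming
Python 3 and leaving the widget calls out entirely (they depend on the
toolkit, and nothing here is Maya or Tkinter API). The names
list_folder_async and poll are made up for illustration. The worker
thread only runs os.listdir and hands the result over through a queue;
the GUI thread would call poll from a timer or idle callback and fill
the combo box itself once a result arrives.

```python
import os
import queue
import threading


def list_folder_async(folder):
    """Run os.listdir(folder) in a worker thread; return (thread, queue)."""
    results = queue.Queue()

    def worker():
        # Only plain data crosses the thread boundary, never widgets.
        try:
            results.put(("ok", sorted(os.listdir(folder))))
        except OSError as exc:
            results.put(("error", exc))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread, results


def poll(results):
    """Non-blocking check, meant to be called from the GUI thread."""
    try:
        return results.get_nowait()
    except queue.Empty:
        return None      # listing still in progress


if __name__ == "__main__":
    thread, results = list_folder_async(".")
    status, payload = results.get(timeout=10)   # a GUI would poll() instead
    print(status, len(payload) if status == "ok" else payload)
```

Whether this is faster than a plain os.listdir call on the network share
is something only a measurement can tell; the sketch just keeps the
interface responsive while the listing runs.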
Re: key/value store optimized for disk storage
On 5/4/2012 10:46 AM Tim Chase said...
I hit a few snags testing this on my winxp w/python2.6.1 in that getsize
wasn't finding the file as it was created in two parts with .dat and
.dir extension.
Also, setting key failed as update returns None.
The changes I needed to make are marked below.
Emile

    import os
    import hashlib
    import random
    from string import letters
    import anydbm

    KB = 1024
    MB = KB * KB
    GB = MB * KB
    DESIRED_SIZE = 1 * GB
    KEYS_TO_SAMPLE = 20
    FNAME = "mydata.db"
    FDATNAME = r"mydata.db.dat"    # added: the data file that actually gets created here

    i = 0
    md5 = hashlib.md5()
    db = anydbm.open(FNAME, 'c')
    try:
        print("Generating junk data...")
        # was: while os.path.getsize(FNAME)< 6*GB:
        while os.path.getsize(FDATNAME) < 6*GB:
            # was: key = md5.update(str(i))[:16]   (update() returns None)
            md5.update(str(i))
            key = md5.hexdigest()[:16]
            size = random.randrange(1*KB, 4*KB)
            value = ''.join(random.choice(letters)
                for _ in range(size))
            db[key] = value
            i += 1
        print("Gathering %i sample keys" % KEYS_TO_SAMPLE)
        keys_of_interest = random.sample(db.keys(), KEYS_TO_SAMPLE)
    finally:
        db.close()

    print("Reopening for a cold sample set in case it matters")
    db = anydbm.open(FNAME)
    try:
        print("Performing %i lookups" % len(keys_of_interest))
        for key in keys_of_interest:
            v = db[key]
        print("Done")
    finally:
        db.close()

Re: key/value store optimized for disk storage
On 05/04/12 14:14, Emile van Sebille wrote:
> On 5/4/2012 10:46 AM Tim Chase said...
>
> I hit a few snags testing this on my winxp w/python2.6.1 in that getsize
> wasn't finding the file as it was created in two parts with .dat and
> .dir extension.
Hrm...must be a Win32 vs Linux thing.
> Also, setting key failed as update returns None.
Doh, that's what I get for not testing my hand-recreation of the test
program I cobbled together and then deleted. Thanks for tweaking that.
-tkc
pickle question: sequencing of operations
What is the sequence of calls when unpickling a class with __setstate__?
From experimentation I see that __setstate__ is called and __init__ is
not, but I think I need more info.

I'm trying to pickle an instance of a class that is a subclass of
another class that contains unpickleable objects. What I'd like to do is
basically just pickle the constructor parameters and then use those to
reconstruct the object on unpickle, but I'm not sure how to go about
this. Or an example if anyone has one.

-- Russell
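One possible way to pickle only the constructor parameters, sketched
under assumptions: Python 3, and hypothetical classes Base and Child
standing in for the real ones (a threading.Lock plays the unpickleable
object). By default, unpickling creates the instance without calling
__init__ and then restores the state (through __setstate__ if the class
defines it). If the class defines __reduce__ and returns a callable plus
an argument tuple, pickle stores just those and calls the callable with
them on load, so __init__ does run.

```python
import pickle
import threading


class Base:
    """Stand-in for the parent class that holds an unpickleable object."""

    def __init__(self, name):
        self.name = name
        self.lock = threading.Lock()    # locks cannot be pickled


class Child(Base):
    def __init__(self, name, retries=3):
        super().__init__(name)
        self.retries = retries
        self.calls = ["__init__"]       # records how the object was built

    def __reduce__(self):
        # Tell pickle to rebuild the object as Child(name, retries);
        # nothing from __dict__ (including the lock) is stored.
        return (self.__class__, (self.name, self.retries))


original = Child("worker", retries=5)
clone = pickle.loads(pickle.dumps(original))
print(clone.name, clone.retries, clone.calls)
```

The clone gets a fresh lock of its own because the constructor ran
again; any state changed after construction is not carried over, which
is the trade-off of this approach.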
Re: When convert two sets with the same elements to lists, are the lists always going to be the same?
On Fri, May 4, 2012 at 12:43 PM, Terry Reedy wrote:
> On 5/4/2012 8:00 AM, Peng Yu wrote:
>>
>> On Fri, May 4, 2012 at 6:21 AM, Chris Angelico wrote:
>>>
>>> On Fri, May 4, 2012 at 8:14 PM, Peng Yu wrote:
>>>> Thanks. This is what I'm looking for. I think that this should be
>>>> added to the python document as a manifestation (but nonnormalized) of
>>>> what "A set object is an unordered collection of distinct hashable
>>>> objects" means.
>>>
>>> There are other things that can prove it to be unordered, too; the
>>> exact pattern and order of additions and deletions can affect the
>>> iteration order. The only thing you can be sure of is that you can't
>>> be sure of it.
>>
>> I agree. My point was just to suggest adding more explanations on the
>> details in the manual.
>
> I am not sure how much clearer we can be in the language manual. The word
> 'unordered' means just that. If one imposes an arbitrary linear order on an
> unordered collection, it is arbitrary. It is frustrating that people do not
> want to believe that, and even write tests depending on today's arbitrary
> serialization order being deterministic indefinitely. There is a section
> about this in the doctest doc, but people do it anyway. I will think about a
> sentence to add.

You can just add the example that you posted to demonstrate what the
unordered means. A curious user might want to know under what
condition the "unorderness" can affect the results, because for
trivial examples (like the following), it does seem that there is some
orderness in a set.

set(['a', 'b', 'c'])
set(['c', 'b', 'a'])

--
Regards,
Peng
Re: key/value store optimized for disk storage
On 5/4/2012 12:49 PM Tim Chase said...
> On 05/04/12 14:14, Emile van Sebille wrote:
>> On 5/4/2012 10:46 AM Tim Chase said...
>>
>> I hit a few snags testing this on my winxp w/python2.6.1 in that getsize
>> wasn't finding the file as it was created in two parts with .dat and
>> .dir extension.
>
> Hrm...must be a Win32 vs Linux thing.

Or an anydbm thing -- you may get different results depending...

Emile
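For readers trying the benchmark today, here is a scaled-down sketch,
assuming Python 3, where anydbm was renamed to dbm. The helper names
db_size and build_and_sample are made up for illustration, and the data
volume is cut to 256 KB so it finishes quickly. Because the backend that
dbm.open picks decides which files appear on disk (one file, or several
with extra extensions), the size helper sums every file whose name
starts with the database name instead of calling getsize on one path.

```python
import dbm
import hashlib
import os
import random
import string
import tempfile

KB = 1024
DESIRED_SIZE = 256 * KB     # tiny on purpose; the thread talks about ~6 GB
KEYS_TO_SAMPLE = 20


def db_size(fname):
    """Total bytes of every file the backend created for fname."""
    folder, base = os.path.split(fname)
    folder = folder or "."
    return sum(os.path.getsize(os.path.join(folder, name))
               for name in os.listdir(folder)
               if name.startswith(base))


def build_and_sample(fname):
    """Fill the store with junk values, then look up a random sample."""
    written = 0
    i = 0
    with dbm.open(fname, "c") as db:
        while written < DESIRED_SIZE:
            # hexdigest() returns the digest text; update() returns None.
            key = hashlib.md5(str(i).encode("ascii")).hexdigest()[:16]
            size = random.randrange(1 * KB, 4 * KB)
            value = "".join(random.choice(string.ascii_letters)
                            for _ in range(size))
            db[key.encode("ascii")] = value.encode("ascii")
            written += size
            i += 1
        keys = random.sample(list(db.keys()), KEYS_TO_SAMPLE)
    # Reopen read-only so the lookups do not reuse the writer's handle.
    with dbm.open(fname, "r") as db:
        values = [db[key] for key in keys]
    return keys, values


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        fname = os.path.join(folder, "mydata.db")
        keys, values = build_and_sample(fname)
        print("backend:", dbm.whichdb(fname))
        print("bytes on disk:", db_size(fname))
        print("looked up %i keys" % len(values))
```

The loop counts bytes written rather than polling the file size, since
a backend may buffer writes; that is a design choice of this sketch, not
something the thread settled on.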
for loop: weird behavior
Hi there,
I simply can't print anything in the second for-loop below:
    #
    #!/usr/bin/env python
    import sys

    filename = sys.argv[1]
    outname = filename.split('.')[0] + '_pdr.dat'
    begin = 'Distance distribution'
    end = 'Reciprocal'
    first = 0
    last = 0
    with open(filename) as inf:
        for num, line in enumerate(inf, 1):
            #print num, line
            if begin in line:
                first = num
            if end in line:
                last = num
        for num, line in enumerate(inf, 1):
            print 'Ok!'
            print num, line
            if num in range(first + 5, last - 1):
                print line
    print first, last
    print range(first + 5, last - 1)

The output goes here:
http://pastebin.com/egnahct2
Expected: at least the string 'Ok!' from the second for-loop.
What am I doing wrong?
thanks in advance.
Fred
Re: for loop: weird behavior
On 5/4/2012 4:33 PM, ferreirafm wrote:
> Hi there,
> I simply can't print anything in the second for-loop below:
> #
> #!/usr/bin/env python
> import sys
> filename = sys.argv[1]
> outname = filename.split('.')[0] + '_pdr.dat'
> begin = 'Distance distribution'
> end = 'Reciprocal'
> first = 0
> last = 0
> with open(filename) as inf:
>     for num, line in enumerate(inf, 1):
>         #print num, line
>         if begin in line:
>             first = num
>         if end in line:
>             last = num

The file pointer is now at the end of the file. As an iterator, the file
is exhausted. To iterate over it again, return the file pointer to the
beginning with inf.seek(0).

>     for num, line in enumerate(inf, 1):
>         print 'Ok!'
>         print num, line
>         if num in range(first + 5, last - 1):
>             print line
> print first, last
> print range(first + 5, last - 1)

--
Terry Jan Reedy
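
The fix described above can be sketched as follows, assuming Python 3
(print as a function) and two hypothetical helpers, find_markers and
lines_between, that wrap the two loops from the original script. The
leftover list is there only to show that the file yields nothing more
after the first pass until seek(0) rewinds it.

```python
import os
import tempfile


def find_markers(inf, begin, end):
    """First pass: remember the line numbers of the two marker lines."""
    first = last = 0
    for num, line in enumerate(inf, 1):
        if begin in line:
            first = num
        if end in line:
            last = num
    return first, last


def lines_between(path, begin, end, skip=5):
    """Second pass over the same file object, after rewinding it."""
    with open(path) as inf:
        first, last = find_markers(inf, begin, end)
        leftover = list(inf)     # empty: the iterator is exhausted
        inf.seek(0)              # rewind so the file can be read again
        selected = [line for num, line in enumerate(inf, 1)
                    if first + skip <= num < last - 1]
    return first, last, leftover, selected


if __name__ == "__main__":
    rows = ["header", "Distance distribution"]
    rows += ["row %d" % n for n in range(3, 12)]
    rows += ["Reciprocal space"]
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "sample.out")
        with open(path, "w") as out:
            out.write("\n".join(rows) + "\n")
        print(lines_between(path, "Distance distribution", "Reciprocal"))
```

Reading the file once into a list and indexing that list would avoid the
second pass altogether; the sketch keeps two passes to stay close to the
original script.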
Re: recruiter spam
Please don't spam the list with job adverts, post to the job board instead:
http://www.python.org/community/jobs/howto/

cheers,
Chris

On 03/05/2012 22:13, Preeti Bhattad wrote:
> Hi there,
> If you have USA work visa and if you reside in USA;

--
Simplistix - Content Management, Batch Processing & Python Consulting
- http://www.simplistix.co.uk
Re: When convert two sets with the same elements to lists, are the lists always going to be the same?
On 04May2012 15:08, Peng Yu wrote:
| On Fri, May 4, 2012 at 12:43 PM, Terry Reedy wrote:
| > On 5/4/2012 8:00 AM, Peng Yu wrote:
| >> On Fri, May 4, 2012 at 6:21 AM, Chris Angelico wrote:
| >>> On Fri, May 4, 2012 at 8:14 PM, Peng Yu wrote:
| >>>> Thanks. This is what I'm looking for. I think that this should be
| >>>> added to the python document as a manifestation (but nonnormalized) of
| >>>> what "A set object is an unordered collection of distinct hashable
| >>>> objects" means.
| >>>
| >>> There are other things that can prove it to be unordered, too; the
| >>> exact pattern and order of additions and deletions can affect the
| >>> iteration order. The only thing you can be sure of is that you can't
| >>> be sure of it.
| >>
| >> I agree. My point was just to suggest adding more explanations on the
| >> details in the manual.
| >
| > I am not sure how much clearer we can be in the language manual. The word
| > 'unordered' means just that. [...]
|
| You can just add the example that you posted to demonstrate what the
| unordered means. A curious user might want to know under what
| condition the "unorderness" can affect the results, because for
| trivial examples (like the following), it does seem that there is some
| orderness in a set.

I'm with Terry here: anything else in the line you suggest would
complicate things for the reader, and potentially mislead.

Future implementation changes (and, indeed, _other_ implementations like
Jython) can change any of this. So there _are_ no ``condition the
"unorderness" can affect the results'': a set is unordered, and you
could even _legitimately_ get different orders from the same set
if you iterate over it twice. It is unlikely, but permissible.

Any attempt to describe such conditions beyond "it might happen at any
time" would be misleading.

| set(['a', 'b', 'c'])
| set(['c', 'b', 'a'])

The language does not say these will get the same iteration order. It
happens that the Python you're using, today, does that.

You can't learn the language specification from watching behaviour;
you learn the guaranteed behaviour -- what you may rely on happening --
from the specification, and you can test that an implementation obeys (or
at any rate, does not disobey) the specification by watching behaviour.

You seem to be trying to learn the spec from behaviour.

Cheers,
--
Cameron Simpson DoD#743
http://www.cskk.ezoshosting.com/cs/

Loud pipes make noise. Skill and experience save lives. - EdBob Morandi
Re: When convert two sets with the same elements to lists, are the lists always going to be the same?
On Fri, May 4, 2012 at 6:12 PM, Cameron Simpson wrote:
> On 04May2012 15:08, Peng Yu wrote:
> | On Fri, May 4, 2012 at 12:43 PM, Terry Reedy wrote:
> | > On 5/4/2012 8:00 AM, Peng Yu wrote:
> | >> On Fri, May 4, 2012 at 6:21 AM, Chris Angelico wrote:
> | >>> On Fri, May 4, 2012 at 8:14 PM, Peng Yu wrote:
> | >>>> Thanks. This is what I'm looking for. I think that this should be
> | >>>> added to the python document as a manifestation (but nonnormalized) of
> | >>>> what "A set object is an unordered collection of distinct hashable
> | >>>> objects" means.
> | >>>
> | >>> There are other things that can prove it to be unordered, too; the
> | >>> exact pattern and order of additions and deletions can affect the
> | >>> iteration order. The only thing you can be sure of is that you can't
> | >>> be sure of it.
> | >>
> | >> I agree. My point was just to suggest adding more explanations on the
> | >> details in the manual.
> | >
> | > I am not sure how much clearer we can be in the language manual. The word
> | > 'unordered' means just that. [...]
> |
> | You can just add the example that you posted to demonstrate what the
> | unordered means. A curious user might want to know under what
> | condition the "unorderness" can affect the results, because for
> | trivial examples (like the following), it does seem that there is some
> | orderness in a set.
>
> I'm with Terry here: anything else in the line you suggest would
> complicate things for the reader, and potentially mislead.
>
> Future implementation changes (and, indeed, _other_ implementations like
> Jython) can change any of this. So there _are_ no ``condition the
> "unorderness" can affect the results'': a set is unordered, and you
> could even _legitimately_ get different orders from the same set
> if you iterate over it twice. It is unlikely, but permissible.
>
> Any attempt to describe such conditions beyond "it might happen at any
> time" would be misleading.
>
> | set(['a', 'b', 'c'])
> | set(['c', 'b', 'a'])
>
> The language does not say these will get the same iteration order. It
> happens that the Python you're using, today, does that.
>
> You can't learn the language specification from watching behaviour;
> you learn the guaranteed behaviour -- what you may rely on happening --
> from the specification, and you can test that an implementation obeys (or
> at any rate, does not disobey) the specification by watching behaviour.
>
> You seem to be trying to learn the spec from behaviour.

My point is if something is said in the document, it is better to be
substantiated by an example. I don't think that this has anything to do
with "learn the spec from behaviour."

--
Regards,
Peng
Re: When convert two sets with the same elements to lists, are the lists always going to be the same?
On 05/05/2012 00:37, Peng Yu wrote:
> My point is if something is said in the document, it is better to be
> substantiated by an example. I don't think that this has anything to do
> with "learn the spec from behaviour."

I side with the comments made by Terry Reedy and Cameron Simpson so
please give it a rest, you're flogging a dead horse.

--
Cheers.

Mark Lawrence.
Re: When convert two sets with the same elements to lists, are the lists always going to be the same?
Peng, I actually am thinking about it.
Underlying problem: while unordered means conceptually unordered as far
as the collection goes, the items in the collection, if homogeneous
enough, may have a natural order, which users find hard to ignore. Even
if not comparable, an implementation such as CPython that uses linear
sequential memory will impose some order. Even if the implementation
uses unordered (holographic?) memory, order will be imposed to iterate,
as when creating a serialized representation of the collection. Abstract
objects, concrete objects, and serialized representations are three
different things, but people tend to conflate them.
Order consistency issues: if the unordered collection is iterated, when
can one expect the order to be the same? Theoretically, essentially
never, except that iterating dicts by keys, values, or key-value pairs
is guaranteed to be consistent, which means that re-iterating has to be
consistent. I actually think the same might as well be true for sets,
although there is no doc that says so.
If two collections are equal, should the iteration order be the same? It
has always been true that if hash values collide, insertion order
matters. However, a good hash function avoids hash collisions as much as
possible in practical use cases. Without doing something artificial, as
I did with the example, collisions should be especially rare on 64-bit
builds. If one collection has a series of additions and deletions so
that the underlying hash table has a different size than an equal
collection built just from insertions, then order will also be different.
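
A small sketch of this point, assuming Python 3 on CPython. It uses
small integers because their hashes are not randomized, and it picks 0
and 8 on the assumption that they land in the same slot of a small hash
table; that is an implementation detail, so the script prints what it
observes rather than promising an order. What is guaranteed is only that
the sets compare equal, so order-independent checks such as == or
sorted() are the ones to rely on.

```python
def same_iteration_order(a, b):
    """True if the two collections happen to iterate in the same order."""
    return list(a) == list(b)


# Same elements, inserted in opposite order.
forward = set([0, 8])
backward = set([8, 0])

# Same elements again, but one set was grown and then shrunk, so its
# underlying table may have a different size than the fresh one.
churned = set(range(1000))
for n in range(3, 1000):
    churned.remove(n)
fresh = set([0, 1, 2])

print("equal:", forward == backward, churned == fresh)
print("same order (not guaranteed either way):",
      same_iteration_order(forward, backward),
      same_iteration_order(churned, fresh))
print("order-independent view:", sorted(forward), sorted(churned))
```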
If the same collection is built by insertion in the same order, but in
different runs, bugfix versions, or language versions, will iteration
order be the same? Historically, it has been for CPython for about a
decade, and people have come to depend on that constancy, in spite of
warnings not to. Even core developers have not been immune, as the
CPython test suite had a few set or dict iteration order dependencies
until it was recently randomized.
Late last year, it became obvious that this constancy is a practical
denial-of-service security hole. The option to randomize hashing for
each run was added to 2.6, 2.7, 3.1, and 3.2. Randomized hashing by run
is part of 3.3. So some of the discussion above is obsolete. The example
I gave only works for that one run, as hash('a') changes each run. So
iteration order now changes with each run in fact as well as in theory.
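
A sketch of how one might observe the per-run hashing, assuming Python
3.3 or later. The helper hash_in_fresh_interpreter is hypothetical; it
starts a new interpreter with the PYTHONHASHSEED environment variable
set, since the hash of a string is fixed for the lifetime of one process
and can only differ between processes.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(text, seed):
    """Return hash(text) as computed by a new interpreter with this seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "print(hash(%r))" % text], env=env)
    return int(out.decode("ascii").strip())


if __name__ == "__main__":
    for seed in (1, 1, 2, "random"):
        print(seed, hash_in_fresh_interpreter("a", seed))
```

A fixed seed gives repeatable hashes, which can help when reproducing an
order-dependent test failure; the value "random" asks for a fresh seed
on every run.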
For the doc, the problem is what to say and where without being
repetitious (and to get multiple people to agree ;-).
--
Terry Jan Reedy
