Re: [Python-Dev] standard library mimetypes module pathologically broken?

2009-08-15 Thread Antoine Pitrou
Hello,

Jacob Rus writes:
> 
> Okay, I made another patch,
> 
> http://bugs.python.org/issue6626
> 
> That adds some deprecation warnings to many of the functions/methods
> in the module.

After a fair amount of discussion on Rietveld, I think you should post another
patch without the deprecations.
(since the discussion was fairly long, I won't repeat here the reasons I gave
unless someone asks me to :-))
Besides, it would be nice to have the additional tests you were talking about.

Thanks for doing this anyway.

> (I think the 'strict' parameters should also be deprecated. But I'm
> considering actually making a new class, MimeTypesRegistry, or
> something, and then just making its API stay mostly compatible with
> MimeTypes, but extended to behave the way I think it should, and
> deprecating the MimeTypes class altogether, making it a subclass in
> the interim.)

This sounds very pie-in-the-sky compared to the original intent of the patch
(that is, fix the mimetypes module's implementation oddities). Let's remain
focused. The more a patch tries to cater for different issues, the less easy it
is to review and discuss (and, consequently, the less likely it is to go to the
end of the approval process).

Regards

Antoine.


___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] random number generator state

2009-08-15 Thread Scott David Daniels

I find I have a need in randomized testing for a shorter version
of getstate, even if it _is_ slower to restore.  When running
exhaustive tests, a failure report should show the start state
of the generator.  Unfortunately, our current state includes a
625-element array.  I want a state that can be read off a report
and typed in to reproduce the state.  Something a bit like the
initial seed, a count of cycle calls, and a few other things.

So, in addition to .getstate() and  .setstate(...), I'd at
least need to have .get_slow_state() and possibly expand what
.setstate(...) takes.  However, a call to .setstate should
reset the counter or all is for naught.  That means I need to
change the results of .getstate, thus giving me three kinds of
input to .setstate: old, new-short, and new-long.  In trying to
get this to work, I found what might be a bug:
code says
  mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */
but probably should be:
  mt[0] |= 0x80000000UL; /* MSB is 1; assuring non-zero initial array */

In checking into that issue, I went to the original Mersenne-Twister
code, and I see the original authors are pursuing a newer generator,
dSFMT.

I now have a dilemma.  Should I continue the work on the original M-T
code (which is now seeming problematic for compatibility) or simply make
a new generator with similar calls using dSFMT and put the new feature
in that where there is no compatibility problem.  Which would be more
useful for the Python community?

--Scott David Daniels
scott.dani...@acm.org



Re: [Python-Dev] request for comments - standardization of python's purelib and platlib

2009-08-15 Thread Brett Cannon
Please do not cross-post to python-dev. This discussion has been taken
to the distutils SIG.

On Fri, Aug 14, 2009 at 17:59, David Lyon wrote:
>
> Hi Tarek,
>
> What is needed is to remove/refactor the hardcoding of paths that
> currently exists within distutils and replace it with the ability to
> override the defaults via configuration files. (distutils.cfg?)
>
> If there's one thing that's certain for the future, it's that
> python will go onto more platforms, using different paths.
>
> When people are complaining about paths being hard-coded into
> distutils and it causing angst, I think that their complaints are
> valid.
>
> I can find posts going back to 2004 for windows users complaining
> about exactly the same thing. So it isn't a new issue. The problem
> applies to both linux and windows.
>
> Anyway.. do you know the code that we're talking about?
>
> David
>
>
> On Fri, 14 Aug 2009 10:02:03 +0200, Tarek Ziadé 
> wrote:
>> On Thu, Aug 13, 2009 at 9:22 PM, Brett Cannon wrote:
>>>
>>>
>>> On Thu, Aug 13, 2009 at 11:23, Jan Matejek 
>>> wrote:

>>>> Hello,
>>>>
>>>> I'm cross-posting this to distributi...@freedesktop and python-dev,
>>>> because the topic is relevant to both groups and should be solved in
>>>> cooperation.
>>>>
>>>> The issue:
>>>>
>>>> In Python's default configuration (on linux), both purelib (location for
>>>> pure python modules) and platlib (location for platform-dependent binary
>>>> extensions) point to $prefix/lib/pythonX.Y/site-packages.
>>>> That is no good for two main reasons.
>>>>
>>>> One, python depends on the "lib" directory. (from distro's point of
>>>> view, prefix is /usr, so let's talk /usr/lib) Due to this, it's
>>>> impossible to install python under /usr/lib64 without heavy patching.
>>>> Repeated attempts to bring python developers to acknowledge and rectify
>>>> the situation have all failed (common argument here is "that would mean
>>>> redesign of distutils and huge parts of whatnot").
>>>
>>> This is now Tarek's call, so this may or may not have changed in terms of
>>> what the (now) distutils maintainer thinks.
>>>
>>
>> I don't recall those repeated attempts, but I've been around for less
>> than two years.
>>
>> You are very welcome to come in the Distutils-SIG ML to discuss these
>> matters.
>> I'm moving the discussion there.
>>
>> Among the proposals you have detailed, the sharedir way seems like the
>> most simple/interesting
>> one (depending on your answer to Brett's question)
>>
>>
>> Regards
>> Tarek


Re: [Python-Dev] random number generator state

2009-08-15 Thread Raymond Hettinger


[Scott David Daniels]
> I find I have a need in randomized testing for a shorter version
> of getstate, even if it _is_ slower to restore.  When running
> exhaustive tests, a failure report should show the start state
> of the generator.  Unfortunately, our current state includes a
> 625-element array.  I want a state that can be read off a report
> and typed in to reproduce the state.  Something a bit like the
> initial seed, a count of cycle calls, and a few other things.


Sounds like you could easily wrap the generator to get this.
It would slow you down but would give the information you want.

I think it would be a mistake to complicate the API to accommodate
short states -- I'm not even sure that they are generally useful
(recording my initial seed and how many cycles I've run through
is only helpful for sequences short enough that I'm willing to rerun
them).
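
A minimal sketch of the wrapping approach suggested above, assuming that
only random() is ever called on the wrapper.  The class name CountingRandom
and its methods are hypothetical and are not part of the random module.  The
short state is just the seed plus the number of draws, and restoring it
replays those draws, so the restore cost grows with the count.

```python
import random


class CountingRandom:
    """Hypothetical wrapper that remembers its seed and counts draws."""

    def __init__(self, seed):
        self._seed = seed
        self._count = 0
        self._rng = random.Random(seed)

    def random(self):
        # Every draw goes through here so the count stays accurate.
        self._count += 1
        return self._rng.random()

    def get_short_state(self):
        # Small enough to print in a failure report and type back in.
        return (self._seed, self._count)

    @classmethod
    def from_short_state(cls, state):
        # Slow restore: reseed, then discard `count` draws.
        seed, count = state
        rng = cls(seed)
        for _ in range(count):
            rng.random()
        return rng
```

A failing test could print rng.get_short_state() and a later run could call
CountingRandom.from_short_state(...) with the printed pair to continue from
the same point in the sequence.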

I'm curious what your use case is.  Why not just record the
sequence as generated?  I don't see any analytic value to
just knowing the initial seed and cycle count.


Ability to print out a short state implies that you are using only a
small subset of possible states (i.e. the ones you can get to with
a short seed).  A short state print out isn't even possible if you actually
have a random initial state (every state having an equal chance of
being the starting point).




> In trying to
> get this to work, I found what might be a bug:
> code says
>   mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */
> but probably should be:
>   mt[0] |= 0x80000000UL; /* MSB is 1; assuring non-zero initial array */


Please file a bug report for this and assign it to me.  I put in the existing
MT code and took it directly from the authors' published (and widely
tested) code.  Also, our tests for MT exactly reproduce their published test
sequence.  But, if there is an error, I would be happy to fix it.




> In checking into that issue, I went to the original Mersenne-Twister
> code, and I see the original authors are pursuing a newer generator,
> dSFMT.


The MT itself has the advantage of having been widely exercised and
tested.  The newer generator may have more states but has not been
as extensively tested.



> I now have a dilemma.  Should I continue the work on the original M-T
> code (which is now seeming problematic for compatibility) or simply make
> a new generator with similar calls using dSFMT and put the new feature
> in that where there is no compatibility problem.  Which would be more
> useful for the Python community?


It's not hard to subclass Random and add different generators.  Why not
publish some code on ASPN and see how it gets received?  I've put a
recipe there for a long period generator, 
http://code.activestate.com/recipes/576707/ ,
but there doesn't seem to have been any real interest in generators with
longer periods than MT. 
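
To illustrate the subclassing route, here is a rough sketch that overrides
the methods the random module documents as the extension points (seed,
random, getrandbits, getstate, setstate).  The class name XorShift64 and the
choice of a plain 64-bit xorshift step are assumptions made for the example;
it is not dSFMT, not the recipe linked above, and not a recommendation for
serious use.  Integer seeds are assumed.

```python
import os
import random


class XorShift64(random.Random):
    """Illustrative subclass using a 64-bit xorshift step instead of MT."""

    _MASK = (1 << 64) - 1

    def seed(self, a=None, version=2):
        # Assumption: seeds are integers; None draws one from the OS.
        if a is None:
            a = int.from_bytes(os.urandom(8), "big")
        # An all-zero state would stay zero forever, so substitute a constant.
        self._x = (int(a) & self._MASK) or 0x9E3779B97F4A7C15
        self.gauss_next = None

    def _next64(self):
        # One xorshift step with the (13, 7, 17) shift triple.
        x = self._x
        x ^= (x << 13) & self._MASK
        x ^= x >> 7
        x ^= (x << 17) & self._MASK
        self._x = x
        return x

    def random(self):
        # Keep the top 53 bits to fill a double in [0.0, 1.0).
        return (self._next64() >> 11) / (1 << 53)

    def getrandbits(self, k):
        if k < 0:
            raise ValueError("number of bits must be non-negative")
        result, have = 0, 0
        while have < k:
            result |= self._next64() << have
            have += 64
        return result & ((1 << k) - 1)

    def getstate(self):
        return (self._x, self.gauss_next)

    def setstate(self, state):
        self._x, self.gauss_next = state
```

Because the inherited methods (randrange, choice, shuffle and so on) are
built on random() and getrandbits(), they pick up the replacement generator
without further changes.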



Raymond




Re: [Python-Dev] random number generator state

2009-08-15 Thread Mark Dickinson
On Sat, Aug 15, 2009 at 8:54 PM, Scott David
Daniels wrote:
> [...] input to .setstate: old, new-short, and new-long.  In trying to
> get this to work, I found what might be a bug:
> code says
>  mt[0] = 0x80000000UL; /* MSB is 1; assuring non-zero initial array */
> but probably should be:
>  mt[0] |= 0x80000000UL; /* MSB is 1; assuring non-zero initial array */

I'm 92.3% sure that this isn't a bug.  For one thing, that line comes
directly from the authors' code[1], so if it's a bug then it's a bug in
the original code, dating from 2002;  this seems unlikely, given how
widely used and (presumably) well-scrutinized MT is.

For a more technical justification, the Mersenne Twister is based
on a linear transformation of a 19937-dimensional vector space
over F2, so its state naturally consists of 19937 bits of information,
which is 623 words plus one additional bit.  In this implementation,
that extra bit is the top bit of the first word;  the other 31 bits of that
first word shouldn't really be regarded as part of the state proper.
If you examine the genrand_int32 function in _randommodule.c,
you'll see that the low 31 bits of mt[0] play no role in updating the
state;  i.e., their value doesn't affect the new state.  So using
mt[0] |= 0x80000000UL instead of mt[0] = 0x80000000UL during
initialization should make no difference to the resulting stream of
random numbers (with the possible exception of the first random
number generated).
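
The argument can be checked from Python without touching the C code.  The
sketch below assumes CPython's getstate() layout (a version number, a tuple
of 624 words followed by the array index, and gauss_next) and assumes that a
freshly seeded generator regenerates its array before producing the first
output.  The helper name stream_after_editing_word0 is made up for this
example.

```python
import random

UPPER_MASK = 0x80000000  # top bit of a 32-bit word
LOWER_MASK = 0x7FFFFFFF  # low 31 bits

# 623 full words plus the single top bit of mt[0] make up the state.
assert 623 * 32 + 1 == 19937


def stream_after_editing_word0(seed, flip_mask, n=1000):
    """Seed a generator, XOR mt[0] with flip_mask, and return n outputs."""
    rng = random.Random(seed)
    version, internal, gauss_next = rng.getstate()
    words, index = list(internal[:-1]), internal[-1]
    words[0] ^= flip_mask
    rng.setstate((version, tuple(words) + (index,), gauss_next))
    return [rng.getrandbits(32) for _ in range(n)]


baseline = stream_after_editing_word0(12345, 0)
low_bits_flipped = stream_after_editing_word0(12345, LOWER_MASK)
top_bit_flipped = stream_after_editing_word0(12345, UPPER_MASK)

# Flipping the low 31 bits of mt[0] should leave the stream unchanged,
# while flipping the top bit should change it.
print(low_bits_flipped == baseline)
print(top_bit_flipped == baseline)
```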

[1] http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/CODES/mt19937ar.c

Mark


Re: [Python-Dev] random number generator state

2009-08-15 Thread Greg Ewing

Scott David Daniels wrote:
> I find I have a need in randomized testing for a shorter version
> of getstate, even if it _is_ slower to restore.  When running
> exhaustive tests, a failure report should show the start state
> of the generator.  Unfortunately, our current state includes a
> 625-element array.


Do you need to use the Mersenne Twister in particular
for this? There are other kinds of generator with very
long cycles and good statistical properties, that can
easily be restored to any state in constant time given
an initial state and a count.

Let me know if you're interested and I can give you
further details.
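
The message above does not name a specific generator.  One family that fits
the description, assumed here purely for illustration, is the counter-based
generator, where output number i is computed directly from (seed, i).  The
class CounterRandom below is hypothetical, uses SHA-256 only because it is
in the standard library, and makes no claim about speed or about the
statistical quality a test suite would need.

```python
import hashlib


class CounterRandom:
    """Hypothetical counter-based generator: output i depends only on (seed, i)."""

    def __init__(self, seed, count=0):
        self.seed = seed
        self.count = count

    def random(self):
        block = f"{self.seed}:{self.count}".encode("utf-8")
        digest = hashlib.sha256(block).digest()
        self.count += 1
        # Use the top 53 bits of the first 8 digest bytes as a float in [0.0, 1.0).
        return (int.from_bytes(digest[:8], "big") >> 11) / (1 << 53)

    def get_state(self):
        # The whole state is two small values that fit on one report line.
        return (self.seed, self.count)
```

Restoring is a matter of calling CounterRandom(seed, count) with the two
reported values; no draws have to be replayed.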

--
Greg