Re: [Python-Dev] User's complaints

2006-07-13 Thread Terry Jones
> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:
Greg> Jeroen Ruigrok van der Werven wrote:
>> It's just nice to be able to define a single class
>> in multiple modules.

Greg> It *seems* nice until you want to track down which
Greg> source file the definition of some method comes
Greg> from.

Greg> Those used to the "one huge global namespace" of
Greg> C and C++ likely don't see this as a problem. But
Greg> since I've come to appreciate the benefits of
Greg> Python's module system, I don't want to go back
Greg> to that nightmare.

While I think there are good arguments both ways, I don't think that
finding the source definitions of methods or classes counts as one - let
alone as a nightmare. Tools like tags (in vi, emacs, and lisp
environments) work quickly and accurately, are easy to use (one
keystroke in vi, and M-. or similar in emacs, to go to a definition,
then a tag pop to come back), work in an identical way on many source
languages, and have been around for literally decades. That's to say
nothing of IDE or CASE tools that support finding definitions, callers,
etc.

Regards,
Terry


Re: [Python-Dev] [Python-checkins] MSI being downloaded 10x more than all other files?!

2006-12-13 Thread Terry Jones
> "Guido" == Guido van Rossum <[EMAIL PROTECTED]> writes:
Guido> And I just found out (after everyone else probably :-) that YouTube
Guido> is almost entirely written in Python. (And now I can rub shoulders
Guido> with the developers since they're all Googlers now... :-)

Are any other details available? I'm mainly curious to know if they were
using Twisted.

Terry


Re: [Python-Dev] splitext('.cshrc')

2007-03-06 Thread Terry Jones
I think there are various good arguments that the current behavior of
splitext isn't optimal. But these don't feel strong enough to me to
justify breaking existing code, or forcing people who happen to be in the
know to go hunt down and review old code. I don't see the point in doing
that just to fix something that everyone seems to agree is quite marginal.

There are about 500 hits for splitext\(lang:python at code.google.com.
Many of these blindly use something like

  newname = splitext(filename)[0] + '.bak'

You could convincingly argue that these would be "fixed" if the current
behavior were changed. In fact, from looking at 30 or so examples, I
suspect much more code would be fixed than broken. But changing the
behavior of programs doesn't seem like a good idea - even if you can claim
to have fixed them. (This reminds me of Sun adopting a newer malloc in the
mid-90s - it had better internal fragmentation behavior for many requests,
and thereby broke a bunch of working (but in fact buggy) code that was
inadvertently relying on the slop in the older malloc.)
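
To make that concrete, here's a small sketch of the difference (the two
tuples below are the current and the proposed results discussed in this
thread):

  from os.path import splitext

  # Current behavior treats a leading-dot filename as all extension:
  #   splitext('.cshrc') == ('', '.cshrc')
  # so the blind pattern above maps '.cshrc' to just '.bak', silently
  # discarding the name. Under the proposed behavior,
  #   splitext('.cshrc') == ('.cshrc', '')
  # the same pattern yields '.cshrc.bak' instead.
  for filename in ('notes.txt', '.cshrc'):
      print "%s -> %s" % (filename, splitext(filename)[0] + '.bak')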

I do think the behavior can be improved, and that it should be fixed, but
at a place where other incompatible changes will also be being made, and
where people are already going to have to do serious code review. To me
that argues for fixing it in Py3k. Arguing that it's just a minor change,
and therefore can go straight in, also seems to lend support to the
suggestion that it can wait.

Terry


Re: [Python-Dev] Bug Days

2007-03-07 Thread Terry Jones
Why not offer a Python patching tutorial at the next US/Euro PyCon? It
seems like there's plenty that could be taught. I'd attend. I'd suggest
that that specific tutorial be offered free, or be paid for by sponsors.

Similarly, the first day of the post-PyCon sprints could have a group
learning about the patching process. While I'd probably not attend a
"sprint" (because I imagine I'd be doing more harm than good), I'd
certainly be interested in showing up for a day explicitly aimed at those
interested in learning / sprinting on patching Python.

Terry


Re: [Python-Dev] Breaking calls to object.__init__/__new__

2007-03-22 Thread Terry Jones
> "G" == Guido van Rossum <[EMAIL PROTECTED]> writes:

G> There are different philosophies about the correct style for
G> cooperative super calls.

G> The submitter of the bug report likes to remove "consumed" arguments
G> and pass the others on, having something at the root that complains
G> about any unused arguments. It has the problem that you mention: if
G> multiple classes might be interested in the *same* argument they
G> won't see it. The other style is to pass *all* arguments down, and
G> let everyone cherry-pick them. The last call then just throws them
G> away.  This has the problem that misspelled arguments are silently
G> ignored rather than being diagnosed at the point where you can do
G> something about it.

G> I don't know what the "best practice" is (like Greg Ewing, I don't
G> use either style myself) but I've got a feeling that it must be
G> easier to solve the former problem than the latter (also I don't
G> know that the former actually occurs in practice).

I'm following up to the above, as I'm not sure where the best place to jump
into this discussion is.

I think the problem is more general than just consuming and/or passing on.
In the typical style of passing args to super classes that I've seen,
__init__ methods in the call chain also don't really even know *whether*
they should use an argument. There is also the problem of argument change
over time.

E.g., suppose I'm the writer of a class X and, unbeknownst to me, someone
writes class Y(X, Z), where Z is some other class. Suppose Z's __init__
expects an argument called z. If I one day add an argument called z that
has totally different semantics, things get really hairy. I'm the writer of
X, I'm unaware of classes Y and Z, and if someone passes me a z argument
then naturally I want to (try to) use it.

There are other examples like this, and argument name clashes can be more
specific (like 'limit' or 'debug').

In general, we're faced with a situation that is also both time-sensitive
and semantics-sensitive. Class writers shouldn't need to know all future
argument changes to other classes potentially upstream or downstream from
them in the call chain. Given an unstructured mess of arguments, you don't
know what to consume, what to ignore, what to pass on, etc., and you don't
know the future.

Although this sounds much more difficult than simple diamond inheritance, I
think there's an easy way to solve it.

When writing a class, you know 1) what arguments your __init__ method
wants, and 2) what arguments your superclass(es) __init__ methods want.
You know the second thing from reading the API/docs of your superclasses -
that's why you're calling them. You want to be able, for example, to route
a 'limit' argument to one of your super classes and another different
'limit' argument to another of your super classes. You want this to work
even if one or more of your superclasses (now or in the future) in turn
decides to call some other super class that also has a 'limit' argument
with entirely different (or the same) semantics. Etc.

An easy solution is for __init__ to receive and pass a dictionary whose
keys are class names and whose values are dictionaries of arguments
intended for that class.

That way, an __init__ method in a class knows exactly which arguments are
intended for it. It can detect extra args, missing args, misspelt args,
etc. It can also prepare args for its known immediate superclasses (those
from which it subclasses) - and it does know these classes, as it is
explicitly subclassing from them. It can leave arguments that are intended
for the __init__ methods in other classes alone. It can create different
arguments of the same name for its different superclasses. It can change
its signature (adding and dropping args) without affecting the args passed
to its superclasses. Classes earlier in the call chain can do the same
without affecting the args received by this class. It is also immune to
differences in superclass name ordering by subclasses (i.e., a subclass of
X and Y should be able to be class Z(X, Y) or class Z(Y, X) without
screwing up the args received by either X.__init__ or Y.__init__). An
__init__ could also safely del its own arguments once it has
extracted/used them. In some odd cases it might like to pass them along,
perhaps even adding something (like a None key to its sub-dictionary) to
indicate that it has already been called (yes, this sounds weird - I'm
just mentioning that it would be possible). There's also no need for any
special top-level subclass of object that simply ignores all its
arguments (though you _could_ add one if you wanted to be strict and
insist that all __init__ methods del their args).

This approach adheres nicely to the explicit is better than implicit maxim.
I think the implicit situation is unsolvable.

I guess this could all be dressed up nicely using a metaclass that pulled
out any available args, made them available to the __init__ as regular
keyword args, and made 

Re: [Python-Dev] Breaking calls to object.__init__/__new__

2007-03-22 Thread Terry Jones
Following up to myself, with some examples.

I probably haven't done this as cleanly as is possible, but below are a
bunch of classes and subclasses that cleanly deal with passing around
arguments, even when those args conflict in name, etc., as outlined in my
previous mail.

Here's the general class __init__ pattern:

class myclass(A, B, C, ...):
    def __init__(self, **kwargs):
        initargs = kwargs.setdefault('__initargs__', {})  # 1
        Aargs = initargs.setdefault(A, {})                # 2
        Aargs['limit'] = 10                               # 3
        Bargs = initargs.setdefault(B, {})                # 4
        Bargs['limit'] = 'wednesday'                      # 5
        super(myclass, self).__init__(**kwargs)           # 6
        myargs = initargs.get(myclass, {})                # 7
        limit = myargs.get('limit', 7)                    # 8
        if 'huh?' in myargs: raise Exception              # 9

In words, the steps are:

1. Pull the __initargs__ key out of **kwargs, or create it. This gives you
   the top-level dictionary containing your own arguments, if any, and the
   args for your superclass(es), if any.

2. Get the arguments for superclass A, and
3. Add an argument for A.__init__
4 & 5. Do the same for superclass B. We don't alter args for superclass C.

6. Call super, passing **kwargs along (this contains the original
   __initargs__ that was sent to us, or the one we made in step 1 if
   there was no prior __initargs__).

7. Our arguments, if any, are in initargs too. Get them.
8. Pull out one of our args, or set a default.
9. Check we didn't get any unexpected args, etc.

Some comments:

- This isn't too ugly, I don't think, but it does require mentioning your
  class and superclasses by name. There are many advantages (see last mail)
  if you're willing to do this.

- You can combine positional args and explicit keyword args in a class if
  you choose, and you can then disambiguate or complain if you like (class
  sub4 in the example code below does this). You can also take those
  explicit args and stuff them into the __initargs__ for your superclasses,
  as/if needed. Explicit args might be a slight pain to deal with, but you
  need them as you don't want to destroy the simple class instantiation
  mechanism of Python that passes args normally to __init__.

- It does require that classes cooperate to a small extent. E.g., you
  probably don't want the __init__ method in one of your superclasses to
  go fiddling with initargs[myclass] before myclass even gets to see its
  own args (though this particular nit could be solved by having myclass
  copy / remove its args before calling super).

- A class could also optionally delete its args once done. In that case
  it's cleaner to have line 7 use initargs.setdefault(myclass, {}) so a
  del initargs[myclass] always just works.

- If you don't want to mess with the args to your superclasses, don't. In
  that case lines 2-5 go away in the above (and 1 and 6 could be swapped).

- You can add *args to all your __init__ methods. This allows your class to
  sit in the middle of a class chain where a lower method wants to pass a
  positional argument on to a higher class (one that wasn't written
  according to the above pattern). The args then just get passed along.
  It's not as nice, but it doesn't break anything (by silently dropping
  any positional args that may be present). Classes written using the
  above pattern can just ignore all positional args.

- If I were encouraging doing something like the above in Python proper,
  I think I'd allow line 1 to read

    initargs = kwargs.setdefault(None, {})

  which adopts the convention that the __init__ keywords are passed in the
  None slot of kwargs. Currently you can't do this, as Python complains
  that all keyword args must be strings. While this goes against explicit
  is better than implicit, using __initargs__ as I have done is polluting
  the keyword space and would probably one day cause someone a problem.
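
A quick illustration of that restriction as things stand today:

def f(**kwargs):
    return kwargs

f(**{None: {}})   # raises TypeError: keywords must be strings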

There's some example code below.

Terry


class err(Exception): pass

class sum(object):
    def __init__(self, **kwargs):
        print "-> in sum"
        initargs = kwargs.setdefault('__initargs__', {})
        super(sum, self).__init__(**kwargs)
        myargs = initargs.get(sum, {})
        limit = myargs.get('limit', 5)
        print "my limit is %d" % limit

class daylimitmaker(object):
    def __init__(self, **kwargs):
        print "-> in daylimitmaker"
        initargs = kwargs.setdefault('__initargs__', {})
        super(daylimitmaker, self).__init__(**kwargs)
        myargs = initargs.get(daylimitmaker, {})
        limit = myargs.get('limit', 'sunday (last day of the week)')
        print "my limit is '%s'" % limit

class lastday(object):
    def __init__(self, **kwargs):
        print "-> in lastday"
        initargs = kwargs.setdefault('__initargs__', {})
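
As a hypothetical driver for the classes above (my sketch, not part of
the original message), a subclass could route a different 'limit' to
each superclass. This assumes object.__init__ still silently ignores
extra keyword arguments, as it did at the time:

class both(sum, daylimitmaker):
    def __init__(self, **kwargs):
        initargs = kwargs.setdefault('__initargs__', {})
        initargs.setdefault(sum, {})['limit'] = 3
        initargs.setdefault(daylimitmaker, {})['limit'] = 'friday'
        super(both, self).__init__(**kwargs)

both()
# -> in sum
# -> in daylimitmaker
# my limit is 'friday'
# my limit is 3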

Re: [Python-Dev] Hello, I'm the other new guy

2007-11-15 Thread Terry Jones
> "Guido" == Guido van Rossum <[EMAIL PROTECTED]> writes:
Guido> I feel left out. I have only one child and I don't qualify as
Guido> 'strange' by any stretch of the imagination... Sometimes I think I'm
Guido> the only regular guy working on Python. ;-)

Ah well, that explains a lot!  :-)

Anyone else here think they're normal?

  1. You're a programmer
  2. You work on Python
  3. You're on the dev mailing list (and you read it)

Each one of those must be worth at least one unit of standard deviation.

Terry


Re: [Python-Dev] Syntax suggestion for imports

2008-01-02 Thread Terry Jones
> "John" == John Barham <[EMAIL PROTECTED]> writes:

>> * import readline or emptymodule

John> This I find more problematic as "emptymodule" seems too magical.
John> Even now any code that wants to use a module that might not have been
John> successfully imported needs to check if that's the case.  E.g., a
John> fuller current use-case would be:

John> try:
John>     readline = None
John>     import readline
John> except ImportError:
John>     pass

John> if readline is not None:
John>     readline.foo()

John> Conceivably emptymodule could act as a Null object but that could
John> create more problems than it solves.

How about

import readline or None as readline

This is just for cases where you don't want to trigger ImportError on
import and do want the symbol set to None for later testing. A standalone
"import None" could have no effect.

Terry


Re: [Python-Dev] Dropping __init__.py requirement for subpackages

2006-04-26 Thread Terry Jones
It might be helpful to consider how people would tackle Guido's problem by
pretending that a regular Joe (i.e., someone who couldn't consider changing
the semantics of Python itself) had asked this question.

I would suggest adding a hook to their version control system to
automatically create (and preferably also check out) an __init__.py file
whenever a new (source code) directory was placed under version control
(supposing you can distinguish source code directories from the
checked-in directory name). From one point of view this is a file system
system solution might be a good way to solve it. This approach would also
allow you to add this behavior/support on a per-user basis.
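
As a minimal sketch of the idea (hypothetical, not tied to any
particular version control system), the hook could just walk the tree
and create the missing files:

import os

def ensure_init_files(root):
    # Give every directory under root an (empty) __init__.py.
    for dirpath, dirnames, filenames in os.walk(root):
        if '__init__.py' not in filenames:
            open(os.path.join(dirpath, '__init__.py'), 'w').close()

ensure_init_files('src')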

Terry


[Python-Dev] Efficient set complement and operation on large/infinite sets.

2006-05-11 Thread Terry Jones
I'm about to write some code to manage sets, and wanted to float a few
thoughts here because I have various ideas about how to implement what I
want to do, and I think one of them could be done by changing Python's set
type in a useful and backward-compatible way.

Apologies if this is discussed in the archives. I didn't see it.

My original motivation is a need to work efficiently with very large sets,
let's say 10M elements. In some cases, you can operate on a set without
needing to explicitly represent all its elements. For example, if my set is
the Universal (U) set, then X in U should return True. If my set is U and
I union it with any other set, it is unchanged, etc. By also maintaining a
finite set of things that are _not_ in my large set, I can do other
operations efficiently. E.g., calling U.remove(X) would _add_ X to the set
of elements that were known not to be in the big set that you were trying
to represent efficiently. In most cases, such a set class would simply do
the opposite/inverse operation on its finite exclusion set.

While it's convenient to warm up by talking about the infinite Universal set,
we could just as easily be talking about arbitrarily large finite sets.
There are some examples below.

Another way to discuss the same thing is to talk about set complements. It
would be nice to be able to complement (or invert) a set. Once you did so,
you might have an infinite set on your hands, but as the last paragraph
argues, you can still operate on an infinite set efficiently. Naturally,
you can't fully enumerate it, or take its size.

I think that giving Python sets the ability to handle complementarity would
have some nice uses - and that these go well beyond the particular thing
that I need right now.

For example, suppose you want to represent the set of all floating point
numbers. You just create an empty set and complement it. Then you can test
it as usual, and begin to exclude things from it. Or if you have a system
with 10M objects and you need to support operations on these objects via a
query language that lets the user type "not X" where X is a set of objects,
there is a natural and efficient way (constant time) to execute the query -
you just mark the set as being inverted (complemented).
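
Here's a minimal sketch of the idea (my illustration, not a proposed
API): a normal set plus an inverted flag, with the stored elements read
as exclusions when the flag is set:

class CSet(object):
    def __init__(self, iterable=(), inverted=False):
        self.elems = set(iterable)
        self.inverted = inverted

    def __contains__(self, x):
        # Membership flips when the set is complemented.
        return (x in self.elems) != self.inverted

    def invert(self):
        # Complement in constant time.
        self.inverted = not self.inverted

    def remove(self, x):
        if self.inverted:
            self.elems.add(x)    # exclude x from the universe
        else:
            self.elems.remove(x)

U = CSet(inverted=True)          # the universal set
U.remove(42)
print 42 in U                    # False
print 'spam' in U                # True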

If I were going to implement the above by changing Python's built in set
type, here's what I think I'd do:

Add four optional keyword arguments to the set constructor:

  - invert=False
  - universalSet=None
  - universalSetSize=None
  - universeExclusionFunc=None

invert is just a flag, indicating the sense of the set.

If invert is False:

  the set behaves exactly as a normal Python set and the other three
  new arguments are ignored.

If invert is True:

  universalSet represents the universal set. If it is None, then the
  universal set is infinite. Otherwise, universalSet is an iterable. The
  implementation should not call this iterable unless it's unavoidable, on
  the presumption that if the programmer wanted to operate directly on the
  contents of this iterable, they'd be doing it in a conventional fashion
  (e.g., with a normal set). The universalSetSize, used for len()
  calculations, is the number of elements in universalSet, if known &
  if finite.

  The universeExclusionFunc can be called with a single element argument to
  see if the element should be considered excluded from the universalSet.
  This may seem like a weird idea, but it's the key to flexibility and
  efficiency. In an invert=True set, the code would use the normal set
  content as a mutable set of objects that are not in the universe, as
  described above, but would, in addition, use the universeExclusionFunc
  (if defined) to identify elements not in the set (because they are not in
  the universe), and thus avoid the use of the expensive (or infinite)
  universalSet.

Note that an inverted (or normal) set can be inverted by simply flipping
its invert flag, so this requires a new invert() method (which could be
triggered by the use of something like 'not' or '!' or '~'). In that case,
an inverted set becomes a normal Python set. The elements represented by
universeExclusionFunc remain invalid - they are simply not in the universe
(and, if deemed sensible, this could be sanity checked in add()).

If it's not clear: when a set is inverted, any iterable given to __init__
(i.e., the iterable argument in the normal case of constructing a Python
set) is just treated as usual (but in this case will be taken to
represent things not in the set initially).


Here are some examples of usage:

1. Suppose you want to work with the set of integers [0, 1) and
   that initially your set is all such integers. You could create this set
   via:

   S = set(invert=True,
           universalSet=xrange(1),
           universalSetSize=1,
           universeExclusionFunc=(lambda x: x >= 1))

   This has the intended effect, is efficient, and no-one need call the
   iterator. You can (I t

Re: [Python-Dev] Efficient set complement and operation on large/infinite sets.

2006-05-11 Thread Terry Jones
A quick followup to my own posting:

I meant to say something about implementing __rand__() and pop(). I'd
either add another optional function argument to the constructor - it
would return a random element from the universe, and for __rand__() and
pop() you'd call it until it (hopefully!) returned something not
excluded - or do something non-random, like return an arbitrary
(non-excluded) integer, or just raise an exception.
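
The extra-argument approach might look like this (a sketch; the names
are hypothetical):

import random

def random_element(random_from_universe, excluded):
    # Rejection sampling: draw until we hit a non-excluded element.
    # Expect longer and longer waits as a finite inverted set empties.
    while True:
        x = random_from_universe()
        if x not in excluded:
            return x

# E.g., a universe of ints in [0, 100) with two exclusions:
print random_element(lambda: random.randrange(100), set([3, 7]))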

I think I prefer the extra argument approach, where the docs state clearly
that you can expect to wait longer and longer for random elements as you
empty a finite inverted set. I prefer this approach because getting a
random element from a set is something you really should be able to
do. Just raising an exception is the cleanest and clearest choice.

One thing I certainly would not consider is trying to mess around with the
excluded set (which may in any case be empty) to figure out a suitable
return type.

And yes, I agree in advance: adding 5 new optional arguments to the set()
constructor isn't pretty. Is the added functionality worth it?

Terry


Re: [Python-Dev] Efficient set complement and operation on large/infinite sets.

2006-05-11 Thread Terry Jones
> "Guido" == Guido van Rossum <[EMAIL PROTECTED]> writes:
Guido> Hm...  Without reading though all this, I expect that you'd be
Guido> better off implementing this for yourself without attempting to pull
Guido> the standard library sets into the picture (especially since sets.py
Guido> is obsolete as of 2.4; set and frozenset are now built-in types).
Guido> You're really after rather specialized set representations. I've
Guido> done this myself and as long as you stick to solving *just* the
Guido> problem at hand, it's easy. If you want a general solution, it's
Guido> hard...

I certainly would be better off in the short term, and probably the long
term too. It's likely what I'll do in any case as it's much, much quicker,
I only need a handful of the set operations, and I don't need to talk to
anyone :-)

I don't think I'm proposing something specialized. Set complement is
something one learns in primary school. It's just difficult to provide in
general, as you say.

Aside from my own itch, which I know how to scratch, my question is whether
it's worth trying to work this into Python itself. Sounds like not.

Terry


Re: [Python-Dev] Efficient set complement and operation on large/infinite sets.

2006-05-12 Thread Terry Jones
> "Raymond" == Raymond Hettinger <[EMAIL PROTECTED]> writes:
Raymond> There's room in the world for alternate implementations of sets,
Raymond> each with its own strengths and weaknesses. 
...
Raymond> Alternatve implementations will most likely start-off as
Raymond> third-party extension modules

Raymond> As for the built-in set types, I recommend leaving those alone and 
Raymond> keeping the API as simple as possible.

Agreed. I implemented most of what I need last night - not surprisingly in
less time than it took to write the original mail.

Raymond> The __rand__ idea is interesting but I don't see how to implement
Raymond> an equiprobable hash table selection in O(1) time -- keep in mind
Raymond> that a set may be very sparse at the time of the selection.

I think the rand issue is enough to make the proposal questionable.

Another thing I realized, which I think adds to the argument that this
should be separate, is that it's not so straightforward to deal with
operations on sets that have different universes. You can do it, but it's
messy, not elegant, and not fun (and no-one is asking for it, not even me).

This leads to a much nicer approach in which you have a separate module
that provides a UniversalSet class.  When you call the constructor, you
tell it whatever you can about what the universe looks like: how big it is,
how to generate random elements, how to exclude things, etc. This class
provides a simple (a la Python set.__init__) method that hands back a set
within this universe.  You're then free to operate on that set, and other
instances provided by UniversalSet, as usual, and you get to do complement.
It's easy to imagine subclasses such as InfiniteSet, FiniteSet, etc.
Python's sets are the case where nothing is known about the universe (aside
from implementation details, like elements being hashable). The various
different kinds of UniversalSet subclasses would provide methods according
to what was known about their universes.

That would clearly be a 3rd party extension.
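
A hypothetical sketch of the shape such a module might take (names and
signatures are mine, purely illustrative):

class UniversalSet(object):
    def __init__(self, size=None, randomElement=None, excluder=None):
        self.size = size                    # for len(), if finite & known
        self.randomElement = randomElement  # for random selection, if any
        self.excluder = excluder            # "not in this universe" test

    def set(self, iterable=()):
        # Hand back a set that lives in this universe.
        return SetInUniverse(self, iterable)

class SetInUniverse(object):
    def __init__(self, universe, iterable=(), inverted=False):
        self.universe = universe
        self.elems = set(iterable)
        self.inverted = inverted

    def __contains__(self, x):
        if self.universe.excluder is not None and self.universe.excluder(x):
            return False
        return (x in self.elems) != self.inverted

    def complement(self):
        # Constant time: share the universe, flip the sense.
        return SetInUniverse(self.universe, self.elems, not self.inverted)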

Terry


Re: [Python-Dev] a note in random.shuffle.__doc__ ...

2006-06-10 Thread Terry Jones
That doc note should surely be removed.  Perhaps it's an artifact from some
earlier shuffle algorithm.

The current algorithm (which is simple, well known, and which produces all
permutations with equal probability) only calls the RNG len(x) - 1 times.
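
For reference, the algorithm is the Fisher-Yates shuffle; a sketch very
close to what random.shuffle does:

import random

def shuffle(x, rnd=random.random):
    # One RNG call per position: len(x) - 1 calls in all.
    for i in reversed(xrange(1, len(x))):
        j = int(rnd() * (i + 1))   # pick j uniformly from 0..i
        x[i], x[j] = x[j], x[i]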

Terry


Re: [Python-Dev] a note in random.shuffle.__doc__ ...

2006-06-10 Thread Terry Jones
>>>>> "Tim" == Tim Peters <[EMAIL PROTECTED]> writes:
Tim> [Terry Jones]
>> and which produces all permutations with equal probability)

Tim> That needs proof.  Assuming a true random number generator, such a
Tim> proof is easy.  Using a deterministic PRNG, if the period is "too
Tim> short" it's dead easy (see below) to prove that it can't produce all
Tim> permutations (let alone with equal probability).

OK, thanks. Sorry for the noise.

Terry


Re: [Python-Dev] a note in random.shuffle.__doc__ ...

2006-06-10 Thread Terry Jones
> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:
Greg> A generator with only N possible internal states can't
Greg> possibly result in more than N different outcomes from
Greg> any algorithm that uses its results.

I don't mean to pick nits, but I do find this a bit too general.

Suppose you have a RNG with a cycle length of 5. There's nothing to stop an
algorithm from taking multiple already returned values and combining them
in some (deterministic) way to generate > 5 outcomes. (Yes, those outcomes
might be more, or less, predictable but that's not the point). If you
expanded what you meant by "internal states" to include the state of the
algorithm (as well as the state of the RNG), then I'd be more inclined to
agree.

Worse, if you have multiple threads / processes using the same RNG, the
individual threads could exhibit _much_ more random behavior if individual
thread outcomes depend on multiple RNG return values (as is the case with
random.shuffle) and the scheduler is mixing things up. Here you'd have to
include the state of the operating system to claim you can't get more
outcomes than the number of internal states. But that's getting pretty far
away from what we'd ordinarily think of as the internal state of the RNG.

Terry


Re: [Python-Dev] a note in random.shuffle.__doc__ ...

2006-06-11 Thread Terry Jones
>>>>> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:

Greg> Terry Jones wrote:
>> Suppose you have a RNG with a cycle length of 5. There's nothing to stop an
>> algorithm from taking multiple already returned values and combining them
>> in some (deterministic) way to generate > 5 outcomes.

Greg> No, it's not. As long as the RNG output is the only input to
Greg> the algorithm, and the algorithm is deterministic, it is
Greg> not possible get more than N different outcomes. It doesn't
Greg> matter what the algorithm does with the input.

Greg> If the algorithm can start out with more than one initial
Greg> state, then the RNG is not the only input.

The code below uses a RNG with period 5, is deterministic, and has one
initial state. It produces 20 different outcomes.

It's just doing a simplistic version of what a lagged RNG does, but the
lagging is in the "algorithm", not in the RNG. That's why I said that if
you included the state of the algorithm in what you meant by "state" I'd
be more inclined to agree.

Terry



n = map(float, range(1, 17, 3))   # the 5-element cycle: 1.0, 4.0, ..., 13.0
i = 0

def rng():
    # A deterministic "RNG" with period 5: cycles through n forever.
    global i
    i += 1
    if i == 5: i = 0
    return n[i]

if __name__ == '__main__':
    seen = {}
    history = [rng()]
    o = 0
    for lag in range(1, 5):
        for x in range(5):
            o += 1
            new = rng()
            # The lag is state held by the algorithm, not by the RNG.
            outcome = new / history[-lag]
            if outcome in seen: print "DUP!"
            seen[outcome] = True
            print "outcome %d = %f" % (o, outcome)
            history.append(new)


# Outputs
outcome 1 = 1.750000
outcome 2 = 1.428571
outcome 3 = 1.300000
outcome 4 = 0.076923
outcome 5 = 4.000000
outcome 6 = 7.000000
outcome 7 = 2.500000
outcome 8 = 1.857143
outcome 9 = 0.100000
outcome 10 = 0.307692
outcome 11 = 0.538462
outcome 12 = 10.000000
outcome 13 = 3.250000
outcome 14 = 0.142857
outcome 15 = 0.400000
outcome 16 = 0.700000
outcome 17 = 0.769231
outcome 18 = 13.000000
outcome 19 = 0.250000
outcome 20 = 0.571429


Re: [Python-Dev] a note in random.shuffle.__doc__ ...

2006-06-13 Thread Terry Jones
>>>>> "Dan" == Dan Christensen <[EMAIL PROTECTED]> writes:
Dan> Greg Ewing <[EMAIL PROTECTED]> writes:
>> Terry Jones wrote:
>> 
>>> The code below uses a RNG with period 5, is deterministic, and has one
>>> initial state. It produces 20 different outcomes.
>> 
>> You misunderstand what's meant by "outcome" in this
>> context. The outcome of your algorithm is the whole
>> *sequence* of numbers it produces, not each individual
>> number. 

Dan> I think Terry's point is valid.  While no one call to
Dan> random.shuffle(L) can produce every possible ordering of L (when
Dan> len(L) is large), since random.shuffle shuffles the data in place,
Dan> repeated calls to random.shuffle(L) could in principle produce every
Dan> possible ordering, since the "algorithm" is saving state.  Down below
Dan> I show code that calls random.shuffle on a 4 element list using a
Dan> "random" number generator of period 13, and it produces all
Dan> permutations.

Maybe I should reiterate what I meant, as it seems the discussion is really
just semantics at this point.

Greg said:

>>>>> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:
Greg> A generator with only N possible internal states can't
Greg> possibly result in more than N different outcomes from
Greg> any algorithm that uses its results.

And I replied:

I don't mean to pick nits, but I do find this a bit too general.

Suppose you have a RNG with a cycle length of 5. There's nothing to
stop an algorithm from taking multiple already returned values and
combining them in some (deterministic) way to generate > 5 outcomes.
(Yes, those outcomes might be more, or less, predictable but that's not
the point). If you expanded what you meant by "internal states" to
include the state of the algorithm (as well as the state of the RNG),
then I'd be more inclined to agree.

I was not meaning to say that anyone was wrong, just that I found Greg's
characterization a bit too general, or not as well defined as it might have
been.

It's clear, I think, from the example code that I and Dan posted, that one
can move the boundary between the RNG and the algorithm using it. You can
take a technique (like using lagging as I did, or Dan's method which stores
and composes permutations) out of the RNG and put it in the algorithm.
That's the reason I reacted to Greg's summary - I don't think it's right
when you confine yourself to just the internal states of the generator. As
I said, if you also consider the states of the algorithm, then the argument
is easier to defend. From an information theory point of view, it's simpler
not to draw the distinction between what's in the "RNG" and what's in the
"algorithm" that uses it. I guess we must all agree at that level, which to
me means that the recent discussion is necessarily just semantics.

[Tim's response to my first post wasn't just semantics - I was wrong in
what I said, and he made it clear (to me anyway) why. But at that point
there was no discussion of what any algorithm could produce as an outcome,
algorithm state, determinism, etc.]

And yes, you can also define outcome as you like. I deliberately included
the word 'outcome' in my print statement.  I thought that was definitive :-)

Terry