Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-09 Thread Francesc Alted
On Sunday 06 December 2009 11:47:23 Francesc Alted wrote: > On Saturday 05 December 2009 11:16:55 Dag Sverre Seljebotn wrote: > > > In [19]: t = np.dtype("i4,f4") > > > > > > In [20]: t > > > Out[20]: dtype([('f0', '<i4'), ('f1', '<f4')]) > > > > > > In [21]: hash(t) > > > Out[21]: -9041335829180134223 > > > > > > In

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-06 Thread Francesc Alted
On Saturday 05 December 2009 11:16:55 Dag Sverre Seljebotn wrote: > > Mmh, the only case that I'm aware about dtype *mutability* is changing > > the names of compound types: > > > > In [19]: t = np.dtype("i4,f4") > > > > In [20]: t > > Out[20]: dtype([('f0', '<i4'), ('f1', '<f4')]) > > > > In [21]: hash(t) > > Out[21]:

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-05 Thread David Cournapeau
On Sat, Dec 5, 2009 at 7:16 PM, Dag Sverre Seljebotn wrote: >> Perhaps this should be marked as a bug?  I'm not sure about that, because the >> above seems quite useful. > > Well, I for one don't like this, but that's just an opinion. I think it > is unwise to leave object which supports hash() m
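
The concern in this exchange, an object that supports hash() while the state behind that hash can still change, can be illustrated without NumPy. The sketch below is only an analogy: `FieldSpec` is a hypothetical stand-in class (not NumPy's dtype), and it assumes the hash is derived from the mutable field names.

```python
class FieldSpec:
    """Hypothetical stand-in for a hashable object with mutable names."""

    def __init__(self, names):
        self.names = tuple(names)

    def __eq__(self, other):
        return isinstance(other, FieldSpec) and self.names == other.names

    def __hash__(self):
        # The hash depends on state that can be reassigned later.
        return hash(self.names)


spec = FieldSpec(["f0", "f1"])
cache = {spec: "cached value"}

hash_before = hash(spec)
# A fresh, equal key finds the entry while the stored key is unchanged.
found_before = FieldSpec(["f0", "f1"]) in cache

# Rename the fields in place, as the thread does with a compound dtype.
spec.names = ("one", "two")
hash_after = hash(spec)

# The same fresh key no longer matches: the stored key now compares unequal.
found_after = FieldSpec(["f0", "f1"]) in cache
```

Once the key is mutated, the dictionary entry is still stored under the old hash but no longer compares equal to the old names, which is the general reason hashable objects are usually kept immutable.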

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-05 Thread Dag Sverre Seljebotn
Francesc Alted wrote: > On Thursday 03 December 2009 14:56:16 Dag Sverre Seljebotn wrote: >> Pauli Virtanen wrote: >>> Thu, 03 Dec 2009 14:03:13 +0100, Dag Sverre Seljebotn wrote: >>> [clip] >>> Great! Are you storing the format string in the dtype types as well? (So that no release is

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-04 Thread Bruce Southey
On 12/04/2009 10:57 AM, David Cournapeau wrote: > On Sat, Dec 5, 2009 at 1:31 AM, Bruce Southey wrote: > >> On 12/04/2009 10:12 AM, David Cournapeau wrote: >> >>> On Fri, Dec 4, 2009 at 9:23 PM, Francesc Alted >>> wrote: >>> >>> On Thursday 03 December 2009 14:56:16 Dag S

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-04 Thread David Cournapeau
On Sat, Dec 5, 2009 at 1:57 AM, David Cournapeau wrote: > On Sat, Dec 5, 2009 at 1:31 AM, Bruce Southey wrote: >> On 12/04/2009 10:12 AM, David Cournapeau wrote: >>> On Fri, Dec 4, 2009 at 9:23 PM, Francesc Alted wrote: >>> On Thursday 03 December 2009 14:56:16 Dag Sverre Seljebotn wrote:

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-04 Thread Francesc Alted
On Friday 04 December 2009 17:12:09 David Cournapeau wrote: > > Mmh, the only case that I'm aware about dtype *mutability* is changing > > the names of compound types: > > > > In [19]: t = np.dtype("i4,f4") > > > > In [20]: t > > Out[20]: dtype([('f0', '<i4'), ('f1', '<f4')]) > > > > In [21]: hash(t) > > Out[21]: -90413

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-04 Thread David Cournapeau
On Sat, Dec 5, 2009 at 1:31 AM, Bruce Southey wrote: > On 12/04/2009 10:12 AM, David Cournapeau wrote: >> On Fri, Dec 4, 2009 at 9:23 PM, Francesc Alted wrote: >> >>> On Thursday 03 December 2009 14:56:16 Dag Sverre Seljebotn wrote: >>> Pauli Virtanen wrote: > Thu, 03 Dec 2009 14:

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-04 Thread Bruce Southey
On 12/04/2009 10:12 AM, David Cournapeau wrote: > On Fri, Dec 4, 2009 at 9:23 PM, Francesc Alted wrote: > >> On Thursday 03 December 2009 14:56:16 Dag Sverre Seljebotn wrote: >> >>> Pauli Virtanen wrote: >>> Thu, 03 Dec 2009 14:03:13 +0100, Dag Sverre Seljebotn wrote:

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-04 Thread David Cournapeau
On Fri, Dec 4, 2009 at 9:23 PM, Francesc Alted wrote: > On Thursday 03 December 2009 14:56:16 Dag Sverre Seljebotn wrote: >> Pauli Virtanen wrote: >> > Thu, 03 Dec 2009 14:03:13 +0100, Dag Sverre Seljebotn wrote: >> > [clip] >> > >> >> Great! Are you storing the format string in the dtype types

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-04 Thread Francesc Alted
On Thursday 03 December 2009 14:56:16 Dag Sverre Seljebotn wrote: > Pauli Virtanen wrote: > > Thu, 03 Dec 2009 14:03:13 +0100, Dag Sverre Seljebotn wrote: > > [clip] > > > >> Great! Are you storing the format string in the dtype types as well? (So > >> that no release is needed and acquisitions a

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-03 Thread Dag Sverre Seljebotn
Pauli Virtanen wrote: > Thu, 03 Dec 2009 14:03:13 +0100, Dag Sverre Seljebotn wrote: > [clip] > >> Great! Are you storing the format string in the dtype types as well? (So >> that no release is needed and acquisitions are cheap...) >> > > I regenerate it on each buffer acquisition. It's sim

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-03 Thread Pauli Virtanen
Thu, 03 Dec 2009 14:03:13 +0100, Dag Sverre Seljebotn wrote: [clip] > Great! Are you storing the format string in the dtype types as well? (So > that no release is needed and acquisitions are cheap...) I regenerate it on each buffer acquisition. It's simple low-level C code, and I suspect it will

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-03 Thread Dag Sverre Seljebotn
Dag Sverre Seljebotn wrote: > Dag Sverre Seljebotn wrote: > >> Pauli Virtanen wrote: >> >> >>> Fri, 27 Nov 2009 23:19:58 +0100, Dag Sverre Seljebotn wrote: >>> [clip] >>> >>> >>> One thing to keep in mind here is that PEP 3118 actually defines a standard dtype fo

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-03 Thread Dag Sverre Seljebotn
Dag Sverre Seljebotn wrote: > Pauli Virtanen wrote: > >> Fri, 27 Nov 2009 23:19:58 +0100, Dag Sverre Seljebotn wrote: >> [clip] >> >> >>> One thing to keep in mind here is that PEP 3118 actually defines a >>> standard dtype format string, which is (mostly) incompatible with >>> NumPy's.

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-03 Thread Dag Sverre Seljebotn
Pauli Virtanen wrote: > Fri, 27 Nov 2009 23:19:58 +0100, Dag Sverre Seljebotn wrote: > [clip] > >> One thing to keep in mind here is that PEP 3118 actually defines a >> standard dtype format string, which is (mostly) incompatible with >> NumPy's. It should probably be supported as well when PEP

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-12-03 Thread Pauli Virtanen
Fri, 27 Nov 2009 23:19:58 +0100, Dag Sverre Seljebotn wrote: [clip] > One thing to keep in mind here is that PEP 3118 actually defines a > standard dtype format string, which is (mostly) incompatible with > NumPy's. It should probably be supported as well when PEP 3118 is > implemented. PEP 3118 i
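
For readers unfamiliar with the PEP 3118 format strings mentioned here: they build on the syntax of Python's `struct` module, and the standard library already exposes them through `memoryview`. The sketch below uses only the standard library, not NumPy; treating `"<if"` as a rough counterpart of a packed little-endian "i4,f4" record is an assumption made for illustration, not something stated in the thread.

```python
import array
import struct

# A buffer exporter from the standard library; memoryview reports the
# struct-style format string the exporter advertises for each item.
buf = array.array("i", [1, 2, 3])
view = memoryview(buf)
item_format = view.format      # struct-style code for a C int
item_size = view.itemsize      # bytes per item, matches struct.calcsize

# Assumed illustration: "<if" is little-endian int32 followed by float32,
# with no padding, roughly what a packed "i4,f4" record would hold.
record_format = "<if"
record_size = struct.calcsize(record_format)

packed = struct.pack(record_format, 7, 1.5)
unpacked = struct.unpack(record_format, packed)
```

The point of the sketch is only that a consumer can recover both the item layout and its size from the format string alone, which is why the two string syntaxes being incompatible matters.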

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread Dag Sverre Seljebotn
Francesc Alted wrote: > On Friday 27 November 2009 16:41:04 Pauli Virtanen wrote: I think so. However, I think S is probably closest to bytes... and maybe S can be reused for bytes... I'm not sure though. >>> That could be a good idea because that would ensure compatibility with >>> ex

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread Christopher Barker
Anne Archibald wrote: >>> I don't think it makes sense to handle format strings in Unicode >>> internally -- they should always be coerced to bytes. >> This should be fine -- we control what is a valid format string, and >> thus they can always be ASCII-safe. > > I have to disagree. Why should we

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread Anne Archibald
2009/11/27 Christopher Barker : > >> The point is that I don't think we can just decide to use Unicode or >> Bytes in all places where PyString was used earlier. > > Agreed. I only half agree. It seems to me that for almost all situations where PyString was used, the right data type is a python3 s

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread Pauli Virtanen
Fri, 2009-11-27 at 10:36 -0800, Christopher Barker wrote: [clip] > > Which one it will > > be should depend on the use. Users will expect that eg. array([1,2,3], > > dtype='f4') still works, and they don't have to do e.g. array([1,2,3], > > dtype=b'f4'). > > Personally, I try to use np.float
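
The expectation quoted here, that both dtype='f4' and dtype=b'f4' should be usable, amounts to accepting either spelling and normalizing it in one place. A minimal sketch of that idea follows; `coerce_format` is a hypothetical helper written for this illustration and is not a NumPy function, and normalizing to ASCII bytes is just one of the options debated in the thread.

```python
def coerce_format(fmt):
    """Hypothetical helper: return a format string as ASCII-only bytes."""
    if isinstance(fmt, bytes):
        # Validate only; raises UnicodeDecodeError on non-ASCII input.
        fmt.decode("ascii")
        return fmt
    if isinstance(fmt, str):
        # Raises UnicodeEncodeError on non-ASCII input.
        return fmt.encode("ascii")
    raise TypeError("format must be str or bytes, not " + type(fmt).__name__)


from_text = coerce_format("f4")
from_bytes = coerce_format(b"f4")

# Non-ASCII text is rejected instead of being silently re-encoded.
try:
    coerce_format("f\u00e9")
    rejected_non_ascii = False
except UnicodeError:
    rejected_non_ascii = True
```

With a helper like this at the boundary, user code can pass either type while the internal representation stays uniform.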

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread Christopher Barker
> The point is that I don't think we can just decide to use Unicode or > Bytes in all places where PyString was used earlier. Agreed. I think it's helpful to remember the origins of all this: IMHO, there are two distinct types of data that Python2 strings support: 1) text: this is the traditi

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread Francesc Alted
On Friday 27 November 2009 16:41:04 Pauli Virtanen wrote: > > > I think so. However, I think S is probably closest to bytes... and > > > maybe S can be reused for bytes... I'm not sure though. > > > > That could be a good idea because that would ensure compatibility with > > existing NumPy scrip

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread Pauli Virtanen
Fri, 2009-11-27 at 16:33 +0100, Francesc Alted wrote: > On Friday 27 November 2009 15:09:00 René Dudfield wrote: > > On Fri, Nov 27, 2009 at 1:49 PM, Francesc Alted wrote: > > > Correct. But, in addition, we are going to need a new 'bytes' dtype for > > > NumPy for Python 3, right? > > >

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread Francesc Alted
On Friday 27 November 2009 15:09:00 René Dudfield wrote: > On Fri, Nov 27, 2009 at 1:49 PM, Francesc Alted wrote: > > Correct. But, in addition, we are going to need a new 'bytes' dtype for > > NumPy for Python 3, right? > > I think so. However, I think S is probably closest to bytes... and >

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread René Dudfield
On Fri, Nov 27, 2009 at 3:07 PM, René Dudfield wrote: > > hey, > > yeah I definitely would :)   I don't have much time for the next week > or so though. > btw, feel free to just copy whatever you like from there into your tree. cheers,

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread René Dudfield
On Fri, Nov 27, 2009 at 1:49 PM, Francesc Alted wrote: > Correct.  But, in addition, we are going to need a new 'bytes' dtype for NumPy > for Python 3, right? I think so. However, I think S is probably closest to bytes... and maybe S can be reused for bytes... I'm not sure though. Also, what wi

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread René Dudfield
On Fri, Nov 27, 2009 at 1:41 PM, Pauli Virtanen wrote: >> 2to3/3to2 fixers will probably have to be written for users code >> here... whatever is decided.  At least warnings should be generated >> I'm guessing. > > Possibly. Does 2to3 support plugins? If yes, it could be possible to > write one.

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread Francesc Alted
On Friday 27 November 2009 13:23:10 René Dudfield wrote: > >> I don't think they are internally UTF-8: > >> http://docs.python.org/3.1/c-api/unicode.html > >> > >> """Python’s default builds use a 16-bit type for Py_UNICODE and store > >> Unicode values internally as UCS2.""" > > > > Ah! No chan

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread Pauli Virtanen
Fri, 2009-11-27 at 13:23 +0100, René Dudfield wrote: [clip] > I imagine dtype 'S' and 'U' need more clarification. As it misses the > concept of encodings it seems? Currently, S appears to mean 8bit > characters no encoding, and U appears to mean 16bit characters no > encoding? Or are some
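
The question of how many bits a character takes "with no encoding" can be made concrete in plain Python 3, independent of NumPy. The following is a small sketch of how the same text occupies a different number of bytes depending on the encoding chosen; it makes no claim about what NumPy's 'S' or 'U' actually store.

```python
text = "foo"

# The same three characters under three fixed encodings.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")
utf32_bytes = text.encode("utf-32-le")

sizes = (len(utf8_bytes), len(utf16_bytes), len(utf32_bytes))

# A non-ASCII character shows why "8-bit characters" needs an encoding:
# one byte in Latin-1, two bytes in UTF-8.
accented = "\u00e9"
latin1_size = len(accented.encode("latin-1"))
utf8_size = len(accented.encode("utf-8"))
```

A fixed-width array element can only promise a character count if the encoding, and therefore the bytes per character, is pinned down.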

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread René Dudfield
On Fri, Nov 27, 2009 at 11:50 AM, Francesc Alted wrote: > On Friday 27 November 2009 11:27:00 Pauli Virtanen wrote: >> Yes. But now I wonder, should >> >>       array(['foo'], str) >>       array(['foo']) >> >> be of dtype 'S' or 'U' in Python 3? I think I'm leaning towards 'U', >> which will me

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread Francesc Alted
On Friday 27 November 2009 11:27:00 Pauli Virtanen wrote: > Yes. But now I wonder, should > > array(['foo'], str) > array(['foo']) > > be of dtype 'S' or 'U' in Python 3? I think I'm leaning towards 'U', > which will mean unavoidable code breakage -- there's probably no > avoiding i

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread Pauli Virtanen
Fri, 2009-11-27 at 11:17 +0100, Francesc Alted wrote: > On Friday 27 November 2009 10:47:53 Pauli Virtanen wrote: > > 1) For 'S' dtype, I believe we use Bytes for the raw data and the > > interface. > > > > Maybe we want to introduce a separate "bytes" dtype that's an alias > > fo

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread Francesc Alted
On Friday 27 November 2009 10:47:53 Pauli Virtanen wrote: > 1) For 'S' dtype, I believe we use Bytes for the raw data and the > interface. > > Maybe we want to introduce a separate "bytes" dtype that's an alias > for 'S'? Yeah. As regular strings in Python 3 are Unicode, I think that

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread Pauli Virtanen
Fri, 2009-11-27 at 18:30 +0900, David Cournapeau wrote: > Pauli Virtanen wrote: > > By the way, should I commit this stuff (after factoring the commits to > > logical chunks) to SVN? > > I would prefer getting at least one py3 buildbot before doing anything > significant, I can add it to min

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread David Cournapeau
Pauli Virtanen wrote: > By the way, should I commit this stuff (after factoring the commits to > logical chunks) to SVN? > I would prefer getting at least one py3 buildbot before doing anything significant, cheers, David

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-27 Thread Pauli Virtanen
Thu, 2009-11-26 at 17:37 -0700, Charles R Harris wrote: [clip] > I'm not clear on your recommendation here, is it that we should use > bytes, with unicode converted to UTF8? The point is that I don't think we can just decide to use Unicode or Bytes in all places where PyString was used earli

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-26 Thread René Dudfield
On Fri, Nov 27, 2009 at 1:37 AM, Charles R Harris wrote: > Hi Pauli, > > On Thu, Nov 26, 2009 at 4:08 PM, Pauli Virtanen wrote: >> >> Hi, >> >> The Python 3 porting needs some decisions on what is Bytes and >> what is Unicode. >> >> I'm currently taking the following approach. Comments? >> >>    

Re: [Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-26 Thread Charles R Harris
Hi Pauli, On Thu, Nov 26, 2009 at 4:08 PM, Pauli Virtanen wrote: > Hi, > > The Python 3 porting needs some decisions on what is Bytes and > what is Unicode. > > I'm currently taking the following approach. Comments? > >*** > > dtype field names > >Either Bytes or Unicode. >

[Numpy-discussion] Bytes vs. Unicode in Python3

2009-11-26 Thread Pauli Virtanen
Hi, The Python 3 porting needs some decisions on what is Bytes and what is Unicode. I'm currently taking the following approach. Comments? *** dtype field names Either Bytes or Unicode. But 'a' and b'a' are *different* fields. The issue is that: Pyt
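
The statement that 'a' and b'a' are *different* fields follows from how Python 3 itself compares text and bytes. The sketch below shows this with a plain dictionary standing in for a field table; the dictionary and its values are illustrative assumptions, not NumPy's actual field storage.

```python
text_name = "a"
bytes_name = b"a"

# In Python 3, str and bytes never compare equal, even for ASCII content.
names_equal = text_name == bytes_name

# A plain dict standing in for a field table: both keys coexist.
fields = {
    text_name: "field named with text",
    bytes_name: "field named with bytes",
}
field_count = len(fields)

# Only an explicit decode (or encode) bridges the two names.
decoded_matches = bytes_name.decode("ascii") == text_name
```

Any lookup by field name therefore has to decide which of the two types it accepts, or convert explicitly at the boundary.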