>> I think the value for wstr/uninitialized/reserved should not be
>> removed. The wstr representation is still used in the error case in
>> the utf8 decoder because these strings can be resized.
>
> In Python, you can resize an object if it has only one reference. Why is
> it not possible in your branch?

If you use the new API to create a string (knowing how many
characters you have, and what the maximum character is), the Unicode
object is allocated as a single memory block. It cannot be resized
afterwards. If you allocate in the old style (i.e. giving NULL as the
data pointer, and a length), it still creates a second memory block
for the Py_UNICODE[], and allows resizing. When you then call
PyUnicode_Ready, the object gets frozen.

> I don't like "reserved" value, especially if its value is 0, the first
> value. See Microsoft file formats: they waste a lot of space because
> most fields are reserved, and 10 years later, these fields are still
> unused. Can't we add the value 4 when we will need a new kind?

I don't get the analogy, or the relationship with the value 0.
"Reserving" the value 0 is entirely different from reserving a field.
A reserved field wastes space; the value 0, however, takes up the same
space as the values 1, 2, 3. It's just used to denote an object where
the str pointer is not filled out yet, i.e. which can still be resized.

>>> I suppose that compilers prefer a switch with all cases defined, 0 a
>>> first item
>>> and contiguous values. We may need an enum.
>>
>> During the Summer of Code, Martin and I did a experiment with GCC and
>> it did not seem to produce a jump table as an optimization for three
>> cases but generated comparison instructions anyway.
>
> You mean with a switch with a case for each possible value?

No, a computed jump on the assembler level.
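
To make the term concrete: a computed jump means indexing a table with
the kind value and jumping to whatever address is stored there, with no
comparisons against the individual cases. A hand-written C analogue is a
table of function pointers, sketched below. This is only an illustration
of the idea and not compiler output; the names store_fn, store_table and
foo_table are made up for this sketch, and the enum is assumed to be the
four-valued one discussed in this thread.

```c
#include <stddef.h>

/* 0 ("null") denotes an object whose data pointer is not filled in yet. */
enum kind { null, ucs1, ucs2, ucs4 };

/* Hypothetical helper type: store value v at index i of buffer d. */
typedef void (*store_fn)(void *d, int i, int v);

static void store_ucs1(void *d, int i, int v)
{
    ((unsigned char *)d)[i] = (unsigned char)v;
}

static void store_ucs2(void *d, int i, int v)
{
    ((unsigned short *)d)[i] = (unsigned short)v;
}

static void store_ucs4(void *d, int i, int v)
{
    ((unsigned int *)d)[i] = (unsigned int)v;
}

/* One slot per kind value; slot 0 is empty because "null" stores nothing. */
static const store_fn store_table[] = {
    NULL, store_ucs1, store_ucs2, store_ucs4
};

/* Dispatch by indexing the table instead of comparing k case by case. */
void foo_table(void *d, enum kind k, int i, int v)
{
    size_t n = sizeof store_table / sizeof store_table[0];

    if ((size_t)k < n && store_table[k] != NULL)
        store_table[k](d, i, v);
}
```

The table lookup replaces the chain of comparisons with one bounds check
and one indirect call. Whether that is actually faster for three cases is
a separate question that this sketch does not answer.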

Consider this code

    enum kind {null,ucs1,ucs2,ucs4};

    void foo(void *d, enum kind k, int i, int v)
    {
        switch(k){
        case ucs1:((unsigned char*)d)[i] = v;break;
        case ucs2:((unsigned short*)d)[i] = v;break;
        case ucs4:((unsigned int*)d)[i] = v;break;
        }
    }

gcc 4.6.1 compiles this to

    foo:
    .LFB0:
        .cfi_startproc
        cmpl    $2, %esi
        je      .L4
        cmpl    $3, %esi
        je      .L5
        cmpl    $1, %esi
        je      .L7
        .p2align 4,,5
        rep ret
        .p2align 4,,10
        .p2align 3
    .L7:
        movslq  %edx, %rdx
        movb    %cl, (%rdi,%rdx)
        ret
        .p2align 4,,10
        .p2align 3
    .L5:
        movslq  %edx, %rdx
        movl    %ecx, (%rdi,%rdx,4)
        ret
        .p2align 4,,10
        .p2align 3
    .L4:
        movslq  %edx, %rdx
        movw    %cx, (%rdi,%rdx,2)
        ret
        .cfi_endproc

As you can see, it generates a chain of compares, rather than an
indirect jump through a jump table.

Regards,
Martin
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev