[Bug libgcc/79280] New: mbtowc converts only one byte

2017-01-30 janturon at email dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79280

            Bug ID: 79280
           Summary: mbtowc converts only one byte
           Product: gcc
           Version: 4.8.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgcc
          Assignee: unassigned at gcc dot gnu.org
          Reporter: janturon at email dot cz
  Target Milestone: ---

mbtowc doesn't seem to work with characters longer than one byte; see the
following snippet:

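#include <stdio.h>
#include <stdlib.h>  /* mbtowc */
#include <wchar.h>   /* wchar_t */

/* Hand-written UTF-8 decoder: returns the code point of the sequence at str. */
int u8toint(const char* str) {
  if(!(*str&128)) return *str;                 /* ASCII: a single byte */
  unsigned char c = *str, bytes = 0;
  while((c<<=1)&128) ++bytes;                  /* count the continuation bytes */
  int result = 0;
  for(int i=bytes; i>0; --i) result|= (*(str+i)&63)<<(6*(bytes-i));  /* 6 payload bits each */
  int mask = 1;
  for(int i=bytes; i<6; ++i) mask<<= 1, mask|= 1;  /* low (7 - bytes) bits of the lead byte */
  result|= (*str&mask)<<(6*bytes);
  return result;
}

/* Splits w into its two low-order bytes (assumes a little-endian target). */
union data {
  wchar_t w;
  struct {
    unsigned char b1, b2;
  } bytes;
} a,b,c;

int main(void) {
  /* The source file is UTF-8, so "ř" is the two bytes C5 99. */
  mbtowc(&(a.w),"ř",6);
  b.w = u8toint("ř");
  c.w = L'ř';

  /* %02hhx keeps the leading zero of the low byte (0x105 prints as 105, not 15). */
  printf("\na = %hhx%02hhx", a.bytes.b2, a.bytes.b1); // a = 0c5 wrong
  printf("\nb = %hhx%02hhx", b.bytes.b2, b.bytes.b1); // b = 159 right
  printf("\nc = %hhx%02hhx", c.bytes.b2, c.bytes.b1); // c = 159 right
  return 0;
}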
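u8toint above trusts its input: it never checks that the following bytes really are continuation bytes (10xxxxxx), so a truncated string such as a lone "\xC5" is silently misdecoded. A minimal sketch of a checking variant, assuming only the 1- to 4-byte sequences of current UTF-8 need to be accepted; the name utf8_decode is hypothetical, and overlong forms and surrogates are deliberately not rejected here.

```c
#include <stddef.h>

/* Hypothetical helper: decode one UTF-8 sequence starting at s.
   On success stores the sequence length in *len and returns the code point;
   returns -1 for a malformed or truncated sequence (*len is then untouched).
   Overlong encodings and surrogates are NOT rejected in this sketch. */
long utf8_decode(const char *s, int *len) {
    const unsigned char *p = (const unsigned char *)s;
    int n;     /* total bytes in the sequence */
    long cp;   /* code point being assembled */

    if (p[0] < 0x80)                { n = 1; cp = p[0]; }
    else if ((p[0] & 0xE0) == 0xC0) { n = 2; cp = p[0] & 0x1F; }  /* 110xxxxx */
    else if ((p[0] & 0xF0) == 0xE0) { n = 3; cp = p[0] & 0x0F; }  /* 1110xxxx */
    else if ((p[0] & 0xF8) == 0xF0) { n = 4; cp = p[0] & 0x07; }  /* 11110xxx */
    else return -1;  /* stray continuation byte or invalid lead byte */

    for (int i = 1; i < n; ++i) {
        /* must be 10xxxxxx; this also stops at a '\0' terminator */
        if ((p[i] & 0xC0) != 0x80) return -1;
        cp = (cp << 6) | (p[i] & 0x3F);  /* append 6 payload bits */
    }
    *len = n;
    return cp;
}
```

For the bytes C5 99 this yields 0x159 with a length of 2, the same value u8toint produces, but it refuses the truncated and stray-byte cases instead of guessing.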

[Bug libgcc/79280] mbtowc converts only one byte

2017-01-31 janturon at email dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79280

Jan Turoň  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|INVALID                     |WORKSFORME

[Bug libgcc/79280] mbtowc converts only one byte

2017-01-31 janturon at email dot cz
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79280

--- Comment #2 from Jan Turoň  ---
setlocale changes the result, but it is still not right:

My system locale is cs_CZ, the default code page is 1250, and the console uses
852. I get these results with this code:

const char *str = "ř";
mbtowc(&(a.w), str, 6);
printf("\na = %hhx%hhx", a.bytes.b2, a.bytes.b1);

// result when each of these is called (one at a time) before mbtowc:
setlocale(LC_CTYPE,"C"); // gives 0c5
setlocale(LC_CTYPE,"Czech_Czech Republic.1250"); // (same as "") gives 139
setlocale(LC_CTYPE,"Czech_Czech Republic.852"); // (same as "") gives 253c

The expected result is 159 (U+0159, the Unicode code point of "ř").
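The three results are consistent with mbtowc reading only the first byte of the UTF-8 sequence C5 99 and interpreting it in a single-byte code page: 0xC5 is passed through unchanged in the "C" locale, is Ĺ (U+0139) in code page 1250, and is ┼ (U+253C) in code page 852. Getting 159 requires a locale whose multibyte encoding is UTF-8. A minimal sketch, assuming a C library that ships UTF-8 locales (such as glibc); the helper mbtowc_utf8 and the list of locale names tried are illustrative assumptions, not part of the report.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: switch LC_CTYPE to a UTF-8 locale, then decode the
   first multibyte character of s with mbtowc.
   Returns the code point, -1 if s is invalid in that locale,
   or -2 if none of the tried locale names exists on this system. */
long mbtowc_utf8(const char *s) {
    /* Locale names are platform-specific; these are common glibc spellings. */
    static const char *const names[] = { "C.UTF-8", "en_US.UTF-8", "cs_CZ.UTF-8" };
    int ok = 0;
    for (size_t i = 0; i < sizeof names / sizeof names[0] && !ok; ++i)
        ok = setlocale(LC_CTYPE, names[i]) != NULL;
    if (!ok)
        return -2;

    wchar_t wc;
    mbtowc(NULL, NULL, 0);                  /* reset any internal shift state */
    int n = mbtowc(&wc, s, strlen(s) + 1);  /* never look past the terminator */
    if (n < 0)
        return -1;
    return (long)wc;
}
```

Note that setlocale changes process-wide state, so a real program would normally call it once at startup rather than inside a helper. On the reporter's MinGW setup this may not help: to my knowledge the Microsoft C runtime accepts UTF-8 locale names only in the Universal CRT from Windows 10 version 1803 onward, and MultiByteToWideChar with CP_UTF8 is the usual Windows alternative.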