Re: Re: unicode problem?

2010-10-09 Thread hidura
I had a similar problem, but I can't encode a byte to a file that has been uploaded without damaging the data. If I use utf-8 to encode, the file doubles in size, and when I tried changing the codec to raw_unicode_escape it barely gave me the correct size but still damaged the file. I used
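The size growth described in this message can be reproduced with a small sketch. It is written in Python 3 syntax and uses a made-up payload and file name (both are assumptions, since the original upload code is cut off in the preview). It only illustrates the general point that binary data should be written in binary mode rather than passed through a text codec.

```python
import os
import tempfile

# Hypothetical payload: every possible byte value exactly once.
payload = bytes(range(256))

# Treating the bytes as text and re-encoding them as UTF-8 grows the data,
# because each byte >= 0x80 turns into a two-byte UTF-8 sequence.
reencoded = payload.decode("latin-1").encode("utf-8")

# Writing in binary mode stores the bytes unchanged.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "upload.bin")  # hypothetical file name
    with open(path, "wb") as fh:
        fh.write(payload)
    with open(path, "rb") as fh:
        round_tripped = fh.read()

print(len(payload), len(reencoded), round_tripped == payload)
```

A file made up only of bytes at or above 0x80 would exactly double in size, which may be what the poster observed.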

Re: unicode problem?

2010-10-09 Thread Chris Rebert
On Sat, Oct 9, 2010 at 4:59 PM, Brian Blais wrote: > This may be stemming from my complete ignorance of unicode, but when I do > this (Python 2.6): > > s='\xc2\xa9 2008 \r\n' > > and I want the ascii version of it, ignoring any non-ascii chars, I thought I > could do: > > s.encode('ascii','ign

Re: unicode problem?

2010-10-09 Thread Benjamin Kaplan
On Sat, Oct 9, 2010 at 7:59 PM, Brian Blais wrote: > This may be stemming from my complete ignorance of unicode, but when I do > this (Python 2.6): > > s='\xc2\xa9 2008 \r\n' > > and I want the ascii version of it, ignoring any non-ascii chars, I thought I > could do: > > s.encode('ascii','ign
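For reference, a minimal sketch of the usual fix for the question quoted above. In Python 2, calling .encode() on a byte string first decodes it implicitly with the ASCII codec, and that step fails on '\xc2'. Decoding explicitly with the real encoding and only then encoding to ASCII avoids this. The sketch uses Python 3 syntax and assumes the bytes are UTF-8, which the replies in the thread may or may not have stated in these words.

```python
# The byte string from the question: UTF-8 for a copyright sign plus text.
raw = b"\xc2\xa9 2008 \r\n"

# Step 1: bytes -> text, naming the encoding explicitly (assumed UTF-8).
text = raw.decode("utf-8")

# Step 2: text -> ASCII bytes, silently dropping anything non-ASCII.
ascii_only = text.encode("ascii", "ignore")

print(repr(ascii_only))
```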

Re: Unicode problem in ucs4

2009-03-25 Thread abhi
On Mar 24, 4:55 am, "Martin v. Löwis" wrote: > > So, both Py_UNICODE and wchar_t are 4 bytes and since it contains 3 > > \0s after a char, printf or wprintf is only printing one letter. > > No. printf indeed will see a terminating character. However, wprintf > should correctly know that a wchar_t

Re: Unicode problem in ucs4

2009-03-23 Thread Martin v. Löwis
> So, both Py_UNICODE and wchar_t are 4 bytes and since it contains 3 > \0s after a char, printf or wprintf is only printing one letter. No. printf indeed will see a terminating character. However, wprintf should correctly know that a wchar_t has four bytes per character, and print it correctly. M

Re: Unicode problem in ucs4

2009-03-23 Thread M.-A. Lemburg
On 2009-03-23 12:57, abhi wrote: >>> Is there any way >>> by which I can force wchar_t to be 2 bytes, or can I convert this UCS4 >>> data to UCS2 explicitly? >> Sure: just use the appropriate UTF-16 codec for this. >> >> /* Generic codec based encoding API. >> >>object is passed through the enc
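The suggestion to go through a UTF-16 codec can be illustrated at the Python level. This is only a sketch in Python 3 syntax, not the C API call quoted in the message (which is cut off in the preview). It shows that the same text takes four bytes per character as little-endian UTF-32 and two as little-endian UTF-16.

```python
text = "test"

# Four bytes per code point, as in a UCS4 buffer on a little-endian machine.
utf32 = text.encode("utf-32-le")

# Two bytes per code unit for characters in the Basic Multilingual Plane.
utf16 = text.encode("utf-16-le")

print(len(utf32), len(utf16))
print(utf16)
```

Characters outside the Basic Multilingual Plane would need two 16-bit units (a surrogate pair) in UTF-16, so the two-bytes-per-character figure holds only for BMP text.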

Re: Unicode problem in ucs4

2009-03-23 Thread M.-A. Lemburg
On 2009-03-23 14:05, abhi wrote: > Hi Marc, >Is there any way to ensure that wchar_t size would always be 2 > instead of 4 in ucs4 configured python? Googling gave me the > impression that there is some logic written in PyUnicode_AsWideChar() > which can take care of ucs4 to ucs2 conversion

Re: Unicode problem in ucs4

2009-03-23 Thread abhi
On Mar 23, 4:57 pm, abhi wrote: > On Mar 23, 4:37 pm, "M.-A. Lemburg" wrote: > > > > > On 2009-03-23 11:50, abhi wrote: > > > > On Mar 23, 3:04 pm, "M.-A. Lemburg" wrote: > > > Thanks Marc, John, > > >          With your help, I am at least somewhere. I re-wrote the code > > > to compare Py_Unic

Re: Unicode problem in ucs4

2009-03-23 Thread abhi
On Mar 23, 4:37 pm, "M.-A. Lemburg" wrote: > On 2009-03-23 11:50, abhi wrote: > > > > > On Mar 23, 3:04 pm, "M.-A. Lemburg" wrote: > > Thanks Marc, John, > >          With your help, I am at least somewhere. I re-wrote the code > > to compare Py_Unicode and wchar_t outputs and they both look exac

Re: Unicode problem in ucs4

2009-03-23 Thread M.-A. Lemburg
On 2009-03-23 11:50, abhi wrote: > On Mar 23, 3:04 pm, "M.-A. Lemburg" wrote: > Thanks Marc, John, > With your help, I am at least somewhere. I re-wrote the code > to compare Py_Unicode and wchar_t outputs and they both look exactly > the same. > > #include > > static PyObject *unicode_

Re: Unicode problem in ucs4

2009-03-23 Thread abhi
On Mar 23, 3:04 pm, "M.-A. Lemburg" wrote: > On 2009-03-23 08:18, abhi wrote: > > > > > On Mar 20, 5:47 pm, "M.-A. Lemburg" wrote: > >>> unicodeTest.c > >>> #include > >>> static PyObject *unicode_helper(PyObject *self,PyObject *args){ > >>>    PyObject *sampleObj = NULL; > >>>            Py_UNIC

Re: Unicode problem in ucs4

2009-03-23 Thread M.-A. Lemburg
On 2009-03-23 08:18, abhi wrote: > On Mar 20, 5:47 pm, "M.-A. Lemburg" wrote: >>> unicodeTest.c >>> #include >>> static PyObject *unicode_helper(PyObject *self,PyObject *args){ >>>PyObject *sampleObj = NULL; >>>Py_UNICODE *sample = NULL; >>> if (!PyArg_ParseTuple(args, "O", &

Re: Unicode problem in ucs4

2009-03-23 Thread John Machin
On Mar 23, 6:41 pm, John Machin had a severe attack of backslashitis: > [presuming littleendian] The ucs4 string will look like "\t\0\0\0e > \0\0\0s\0\0\0t\0\0\0" in memory. I suspect that your wprintf is > grokking only 16-bit doodads -- "\t\0" is printed and then "\0\0" is > end-of-string. Try
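The suspicion voiced here (a reader that works in 16-bit units hits a zero unit right after the first letter) can be simulated in a few lines. This is a Python 3 sketch with a hypothetical helper, read_until_nul, that imitates a NUL-terminated string reader. It is not how wprintf is actually implemented.

```python
import struct

# "test" laid out as little-endian UCS4: t\0\0\0 e\0\0\0 s\0\0\0 t\0\0\0
ucs4 = "test".encode("utf-32-le")

def read_until_nul(buf, width):
    """Read fixed-width little-endian units until a zero unit (hypothetical helper)."""
    fmt = {2: "<H", 4: "<I"}[width]
    chars = []
    for offset in range(0, len(buf) - width + 1, width):
        (unit,) = struct.unpack_from(fmt, buf, offset)
        if unit == 0:
            break  # a zero unit ends the string, as in C
        chars.append(chr(unit))
    return "".join(chars)

as_16 = read_until_nul(ucs4, 2)  # reader that assumes 2-byte characters
as_32 = read_until_nul(ucs4, 4)  # reader that assumes 4-byte characters

print(repr(as_16), repr(as_32))
```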

Re: Unicode problem in ucs4

2009-03-23 Thread John Machin
On Mar 23, 6:18 pm, abhi wrote: [snip] > Hi Mark, >      Thanks for the help. I tried PyUnicode_AsWideChar() but I am > getting the same result i.e. only the first letter. > > sample code: > > #include > > static PyObject *unicode_helper(PyObject *self,PyObject *args){ >         PyObject *sampleO

Re: Unicode problem in ucs4

2009-03-23 Thread abhi
On Mar 20, 5:47 pm, "M.-A. Lemburg" wrote: > On 2009-03-20 12:13, abhi wrote: > > > > > > > On Mar 20, 11:03 am, "Martin v. Löwis" wrote: > >>> Any idea on why this is happening? > >> Can you provide a complete example? Your code looks correct, and should > >> just work. > > >> How do you know th

Re: Unicode problem in ucs4

2009-03-20 Thread M.-A. Lemburg
On 2009-03-20 12:13, abhi wrote: > On Mar 20, 11:03 am, "Martin v. Löwis" wrote: >>> Any idea on why this is happening? >> Can you provide a complete example? Your code looks correct, and should >> just work. >> >> How do you know the result contains only 't' (i.e. how do you know it >> does not c

Re: Unicode problem in ucs4

2009-03-20 Thread abhi
On Mar 20, 11:03 am, "Martin v. Löwis" wrote: > > Any idea on why this is happening? > > Can you provide a complete example? Your code looks correct, and should > just work. > > How do you know the result contains only 't' (i.e. how do you know it > does not contain 'e', 's', 't')? > > Regards, >

Re: Unicode problem in ucs4

2009-03-19 Thread Martin v. Löwis
> Any idea on why this is happening? Can you provide a complete example? Your code looks correct, and should just work. How do you know the result contains only 't' (i.e. how do you know it does not contain 'e', 's', 't')? Regards, Martin

Re: Unicode Problem

2008-10-30 Thread Bard Aase
On Thu, Oct 30, 2008 at 8:28 AM, Seid Mohammed <[EMAIL PROTECTED]> wrote: > I am new to python. > I want to print Amharic character using the Python IDLE. > here goes somple code > == abebe = 'አበበ በሶ በላ' abebe > '\xe1\x8a\xa0\xe1

Re: Unicode Problem

2008-10-30 Thread Ulrich Eckhardt
Seid Mohammed wrote: > I am new to python. Welcome! :) > >>> abebe = 'አበበ በሶ በላ' > >>> abebe > '\xe1\x8a\xa0\xe1\x89\xa0\xe1\x89\xa0 \xe1\x89\xa0\xe1\x88\xb6 > \xe1\x89\xa0\xe1\x88\x8b' > >>> print abebe > አበበ በሶ በላ > >>> abeba = ['አበበ','በሶ','በላ'] > >>> abeba > ['\xe1\x8a\xa0\xe1\x89\xa0\xe1\x89\xa0',

Re: Unicode Problem

2008-10-30 Thread Marc 'BlackJack' Rintsch
On Thu, 30 Oct 2008 10:28:39 +0300, Seid Mohammed wrote: > I am new to python. > I want to print Amharic character using the Python IDLE. here goes > somple code > == abebe = 'አበበ በሶ በላ' abebe > '\xe1\x8a\xa0\xe1\x89\xa0\xe1\x89
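The behaviour in this thread comes down to the difference between the repr of a byte string (escaped bytes) and the text it encodes. The sketch below uses Python 3 syntax and assumes the console input was UTF-8, which matches the byte values quoted in the messages.

```python
text = "አበበ በሶ በላ"

# The UTF-8 bytes behind the text; these are what the Python 2 session echoed.
encoded = text.encode("utf-8")

words = ["አበበ", "በሶ", "በላ"]
encoded_words = [w.encode("utf-8") for w in words]

# Displaying a list uses repr() on each item, so bytes appear escaped.
shown = repr(encoded_words[0])

# Decoding gives back readable text: three characters from nine bytes.
first_word = encoded_words[0].decode("utf-8")

print(shown)
print(first_word, len(first_word), len(encoded_words[0]))
```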

Re: Unicode problem

2007-07-08 Thread [EMAIL PROTECTED]
> > What software did you use to make that so? The Python codec certainly > never would do such a thing. > > Are you sure it was latin-1 and \x27, and not windows-1252 and \x92? > > Regards, > Martin you're right...the source of text are html pages and obviously webmasters have poor knowledge o

Re: Unicode problem

2007-07-07 Thread Erik Max Francis
[EMAIL PROTECTED] wrote: > Hi to all, I have a little problem with unicode handling under Python. > > I have this code > > s = u'A unicode string with this damn apostrophe \x2019' > > outf = codecs.open('filename.txt', 'w', 'iso-8859-15') > outf.write(s) > > what I obtain is a UnicodeEncodeErr

Re: Unicode problem

2007-07-07 Thread Alex Martelli
[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote: ... Ah, I answered you on the Italian NG before seeing you had also posted the same request here. What I proposed there was (untested): import codecs _rimedi = { u'\x2019': "'" } def rimedia(exc): if isinstance(exc, (UnicodeEncodeError, Unic
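The proposal above is cut off in the preview, so what follows is only a sketch of the same general idea (a custom codec error handler registered with codecs.register_error), not a reconstruction of the posted code. The names REPLACEMENTS and replace_quotes and the handler name "replace_quotes" are made up for illustration. The sketch is in Python 3 syntax and writes the character as "\u2019", because a \x escape takes only two hex digits.

```python
import codecs

# Hypothetical table of characters to substitute when encoding fails.
REPLACEMENTS = {"\u2019": "'"}

def replace_quotes(exc):
    """Error handler: swap unencodable characters for a known stand-in."""
    if isinstance(exc, (UnicodeEncodeError, UnicodeTranslateError)):
        bad = exc.object[exc.start:exc.end]
        fixed = "".join(REPLACEMENTS.get(ch, "?") for ch in bad)
        # Return the replacement text and the position to resume from.
        return fixed, exc.end
    raise exc

codecs.register_error("replace_quotes", replace_quotes)

text = "A unicode string with this damn apostrophe \u2019"
encoded = text.encode("iso-8859-15", errors="replace_quotes")

print(encoded)
```

The same handler name can be passed as the errors argument when opening a file for writing, so the substitution happens on every write.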

Re: Unicode problem

2007-07-07 Thread Martin v. Löwis
> I agree, but the problem is much more subtle. I have converted a text from > iso-8859-1 to utf-8 and the codecs have translated \x27 ( the iso > apostrophe ) to \xe28099 in utf-8 ( or u'2019' in unicode code point > notation ) What software did you use to make that so? The Python codec certainly never
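The distinction this reply draws can be checked directly. A short Python 3 sketch: byte 0x92 decodes to U+2019 under windows-1252 but to a control character under latin-1, while byte 0x27 is a plain apostrophe in both.

```python
raw = b"\x92"

as_cp1252 = raw.decode("cp1252")    # right single quotation mark, U+2019
as_latin1 = raw.decode("latin-1")   # C1 control character, U+0092

# U+2019 encoded as UTF-8 is the three-byte sequence e2 80 99.
utf8_quote = "\u2019".encode("utf-8")

# Byte 0x27 stays an ordinary apostrophe.
apostrophe = b"\x27".decode("latin-1")

print(repr(as_cp1252), repr(as_latin1), utf8_quote, apostrophe)
```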

Re: Unicode problem

2007-07-07 Thread [EMAIL PROTECTED]
> No it shouldn't because \x2019 is a "right single quotation mark" and not > an apostrophe. > > Ciao, > Marc 'BlackJack' Rintsch I agree, but the problem is much more subtle. I have converted a text from iso-8859-1 to utf-8 and the codecs have translated \x27 ( the iso apostrophe ) to \xe28099

Re: Unicode problem

2007-07-07 Thread Marc 'BlackJack' Rintsch
On Sat, 07 Jul 2007 16:06:03 +, [EMAIL PROTECTED] wrote: > Hi to all, I have a little problem with unicode handling under Python. > > I have this code > > s = u'A unicode string with this damn apostrophe \x2019' > > outf = codecs.open('filename.txt', 'w', 'iso-8859-15') > outf.write(s) > >

Re: Unicode problem

2007-04-09 Thread Martin v. Löwis
> BTW, any reason why an EncodedFile can't act like a Unicode > writer/reader object > if one of its encodings is explicitly set to None? AFAIU, that's not the intention of EncodedFile: instead, it is meant to do recoding. I find it a pretty useless API, and rather see it go away than being enhanc

Re: Unicode problem

2007-04-09 Thread Georg Brandl
Martin v. Löwis wrote: >> Thanks! That's a nice little stumbling block for a newbie like me ;) Is >> there a way to make utf-8 the default encoding for every string, so that >> I do not have to encode each string explicitly? > > You can make sys.stdout encode each string with UTF-8, with > >

Re: Unicode problem

2007-04-08 Thread Martin v. Löwis
> Thanks! That's a nice little stumbling block for a newbie like me ;) Is > there a way to make utf-8 the default encoding for every string, so that > I do not have to encode each string explicitly? You can make sys.stdout encode each string with UTF-8, with sys.stdout = codecs.getwriter('utf-8
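The codecs.getwriter approach mentioned here wraps a byte stream so that text written to it is encoded automatically. The sketch below, in Python 3 syntax, wraps an in-memory io.BytesIO instead of sys.stdout (an assumption made so that the result can be inspected). The wrapping call itself has the same shape as the one in the message.

```python
import codecs
import io

# Stand-in for a byte-oriented output stream such as a pipe.
raw = io.BytesIO()

# getwriter() returns a StreamWriter class; calling it wraps the stream.
writer = codecs.getwriter("utf-8")(raw)

# Text goes in, UTF-8 bytes come out on the underlying stream.
writer.write("L\u00f6wis\n")

print(raw.getvalue())
```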

Re: Unicode problem

2007-04-08 Thread Rehceb Rotkiv
On Sat, 07 Apr 2007 12:46:49 -0700, Gabriel Genellina wrote: > You have to encode the Unicode object explicitly: print > fileString.encode("utf-8") > (or any other suitable one; I said utf-8 just because you read the input > file using that) Thanks! That's a nice little stumbling block for a new

Re: Unicode problem

2007-04-07 Thread Gabriel Genellina
Rehceb Rotkiv wrote: > #!/usr/bin/python > import sys > import codecs > fileHandle = codecs.open(sys.argv[1], 'r', 'utf-8') > fileString = fileHandle.read() > print fileString > > if I call it from a Bash shell like this > > $ ./test.py testfile.utf8.txt > > it works just fine, but when I try to p

Re: Unicode problem in BeautifulSoup; worked in Python 2.4, fails in Python 2.5.

2007-02-04 Thread Mizipzor
On Feb 4, 11:39 pm, John Nagle <[EMAIL PROTECTED]> wrote: > I'm running a website page through BeautifulSoup. It parses OK > with Python 2.4, but Python 2.5 fails with an exception: > > Traceback (most recent call last): >File "./sitetruth/InfoSitePage.py", line 268, in httpfetch > se

Re: Unicode problem with exec

2006-06-23 Thread John Machin
On 23/06/2006 9:06 PM, Thomas Heller wrote: > I'm using code.InteractiveConsole but it doesn't work correctly > with non-ascii characters. I think it boils down to this problem: > > Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on > win32 > Type "help", "copyright", "cre

Re: Unicode problem with exec

2006-06-23 Thread Diez B. Roggisch
Thomas Heller wrote: > I'm using code.InteractiveConsole but it doesn't work correctly > with non-ascii characters. I think it boils down to this problem: > > Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on > win32 > Type "help", "copyright", "credits" or "license" fo