I had a similar problem, but I can't write the bytes of an uploaded file
to disk without damaging the data. If I use utf-8 to encode, the file
doubles in size. I tried changing the codec to raw_unicode_escape, and
that gives me almost the correct size but still damages the file. I used
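
A minimal sketch of one way to avoid this kind of damage, assuming the upload is already available as raw bytes (written for Python 3; the `payload` value and the temporary file are made up for illustration). Binary data is written in binary mode with no codec at all. Pushing arbitrary bytes through a text codec such as utf-8 turns every byte above 0x7F into two bytes, which would be consistent with the size growth described above.

```python
import os
import tempfile

# Hypothetical upload content: every possible byte value once.
payload = bytes(range(256))

# Write in binary mode ("wb") so that no codec touches the data.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as handle:
    handle.write(payload)

# Read it back in binary mode and clean up.
with open(path, "rb") as handle:
    round_tripped = handle.read()
os.remove(path)

# For comparison: treating the bytes as Latin-1 text and encoding to UTF-8
# turns each of the 128 bytes >= 0x80 into two bytes.
inflated = payload.decode("latin-1").encode("utf-8")

print(round_tripped == payload)
print(len(payload), len(inflated))
```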
On Sat, Oct 9, 2010 at 4:59 PM, Brian Blais wrote:
> This may be stemming from my complete ignorance of unicode, but when I do
> this (Python 2.6):
>
> s='\xc2\xa9 2008 \r\n'
>
> and I want the ascii version of it, ignoring any non-ascii chars, I thought I
> could do:
>
> s.encode('ascii','ign
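
One way to get the effect being asked for, sketched in Python 3 syntax and assuming the bytes are UTF-8 (which `\xc2\xa9`, the UTF-8 form of the copyright sign, suggests): decode to text first, then encode to ASCII with the `ignore` handler. In Python 2.6, calling `.encode` on a byte string first triggers an implicit ASCII decode, and that step fails on the non-ASCII bytes.

```python
# The byte string from the question, written as a Python 3 bytes literal.
raw = b'\xc2\xa9 2008 \r\n'

# Step 1: bytes -> text, using the encoding the bytes are assumed to be in.
text = raw.decode('utf-8')

# Step 2: text -> ASCII bytes, silently dropping anything non-ASCII.
ascii_only = text.encode('ascii', 'ignore')

print(repr(ascii_only))
```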
On Sat, Oct 9, 2010 at 7:59 PM, Brian Blais wrote:
> This may be stemming from my complete ignorance of unicode, but when I do
> this (Python 2.6):
>
> s='\xc2\xa9 2008 \r\n'
>
> and I want the ascii version of it, ignoring any non-ascii chars, I thought I
> could do:
>
> s.encode('ascii','ign
On Mar 24, 4:55 am, "Martin v. Löwis" wrote:
> > So, both Py_UNICODE and wchar_t are 4 bytes and since it contains 3
> > \0s after a char, printf or wprintf is only printing one letter.
>
> No. printf indeed will see a terminating character. However, wprintf
> should correctly know that a wchar_t
> So, both Py_UNICODE and wchar_t are 4 bytes and since it contains 3
> \0s after a char, printf or wprintf is only printing one letter.
No. printf indeed will see a terminating character. However, wprintf
should correctly know that a wchar_t has four bytes per character,
and print it correctly. M
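
The byte layout under discussion can be inspected from Python itself. This is only an illustration in Python 3, using the `utf-32-le` codec as a stand-in for a little-endian 4-byte `wchar_t` buffer; it does not call the C API from the thread.

```python
# 'test' as little-endian 4-byte code units: each ASCII letter is followed
# by three NUL bytes.
ucs4 = 'test'.encode('utf-32-le')
print(ucs4)

# A byte-oriented reader (such as printf with %s) stops at the first NUL
# byte, so it only ever sees the first letter.
seen_by_byte_reader = ucs4.split(b'\x00', 1)[0]
print(seen_by_byte_reader)
```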
On 2009-03-23 12:57, abhi wrote:
>>> Is there any way
>>> by which I can force wchar_t to be 2 bytes, or can I convert this UCS4
>>> data to UCS2 explicitly?
>> Sure: just use the appropriate UTF-16 codec for this.
>>
>> /* Generic codec based encoding API.
>>
>>object is passed through the enc
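
As a rough Python-level illustration of the same idea (Python 3; the thread itself is about the C API, so this only shows what the UTF-16 codec produces, not how to call it from C): encoding with `utf-16-le` yields 2-byte code units without a byte order mark, and characters outside the Basic Multilingual Plane become surrogate pairs.

```python
text = 'test'

# Two bytes per BMP character, little-endian, no BOM.
utf16 = text.encode('utf-16-le')
print(utf16)

# A character outside the BMP needs a surrogate pair: 2 units, 4 bytes.
non_bmp = '\U0001F600'.encode('utf-16-le')
print(len(non_bmp))

# The conversion is lossless in both directions.
print(utf16.decode('utf-16-le') == text)
```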
On 2009-03-23 14:05, abhi wrote:
> Hi Marc,
>Is there any way to ensure that wchar_t size would always be 2
> instead of 4 in ucs4 configured python? Googling gave me the
> impression that there is some logic written in PyUnicode_AsWideChar()
> which can take care of ucs4 to ucs2 conversion
On Mar 23, 4:57 pm, abhi wrote:
> On Mar 23, 4:37 pm, "M.-A. Lemburg" wrote:
>
>
>
> > On 2009-03-23 11:50, abhi wrote:
>
> > > On Mar 23, 3:04 pm, "M.-A. Lemburg" wrote:
> > > Thanks Marc, John,
> > > With your help, I am at least somewhere. I re-wrote the code
> > > to compare Py_Unic
On Mar 23, 4:37 pm, "M.-A. Lemburg" wrote:
> On 2009-03-23 11:50, abhi wrote:
>
>
>
> > On Mar 23, 3:04 pm, "M.-A. Lemburg" wrote:
> > Thanks Marc, John,
> > With your help, I am at least somewhere. I re-wrote the code
> > to compare Py_Unicode and wchar_t outputs and they both look exac
On 2009-03-23 11:50, abhi wrote:
> On Mar 23, 3:04 pm, "M.-A. Lemburg" wrote:
> Thanks Marc, John,
> With your help, I am at least somewhere. I re-wrote the code
> to compare Py_Unicode and wchar_t outputs and they both look exactly
> the same.
>
> #include <Python.h>
>
> static PyObject *unicode_
On Mar 23, 3:04 pm, "M.-A. Lemburg" wrote:
> On 2009-03-23 08:18, abhi wrote:
>
>
>
> > On Mar 20, 5:47 pm, "M.-A. Lemburg" wrote:
> >>> unicodeTest.c
> >>> #include <Python.h>
> >>> static PyObject *unicode_helper(PyObject *self,PyObject *args){
> >>> PyObject *sampleObj = NULL;
> >>> Py_UNIC
On 2009-03-23 08:18, abhi wrote:
> On Mar 20, 5:47 pm, "M.-A. Lemburg" wrote:
>>> unicodeTest.c
>>> #include <Python.h>
>>> static PyObject *unicode_helper(PyObject *self,PyObject *args){
>>>PyObject *sampleObj = NULL;
>>>Py_UNICODE *sample = NULL;
>>> if (!PyArg_ParseTuple(args, "O", &
On Mar 23, 6:41 pm, John Machin had a severe
attack of backslashitis:
> [presuming little-endian] The ucs4 string will look like "\t\0\0\0e
> \0\0\0s\0\0\0t\0\0\0" in memory. I suspect that your wprintf is
> grokking only 16-bit doodads -- "\t\0" is printed and then "\0\0" is
> end-of-string. Try
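
The suspicion above can be simulated in Python 3. This sketch assumes a reader that treats the buffer as 16-bit little-endian units and stops at the first zero unit; whether a given `wprintf` really behaves this way is platform-dependent and not confirmed here.

```python
import struct

# 'test' stored as little-endian 4-byte units, as in the thread.
ucs4 = 'test'.encode('utf-32-le')

# Reinterpret the same memory as 16-bit little-endian units.
units = struct.unpack('<%dH' % (len(ucs4) // 2), ucs4)
print(units)

# A 16-bit reader stops at the first zero unit, right after the 't'.
printed = []
for unit in units:
    if unit == 0:
        break
    printed.append(chr(unit))
result = ''.join(printed)
print(result)
```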
On Mar 23, 6:18 pm, abhi wrote:
[snip]
> Hi Mark,
> Thanks for the help. I tried PyUnicode_AsWideChar() but I am
> getting the same result i.e. only the first letter.
>
> sample code:
>
> #include <Python.h>
>
> static PyObject *unicode_helper(PyObject *self,PyObject *args){
> PyObject *sampleO
On Mar 20, 5:47 pm, "M.-A. Lemburg" wrote:
> On 2009-03-20 12:13, abhi wrote:
>
>
>
>
>
> > On Mar 20, 11:03 am, "Martin v. Löwis" wrote:
> >>> Any idea on why this is happening?
> >> Can you provide a complete example? Your code looks correct, and should
> >> just work.
>
> >> How do you know th
On 2009-03-20 12:13, abhi wrote:
> On Mar 20, 11:03 am, "Martin v. Löwis" wrote:
>>> Any idea on why this is happening?
>> Can you provide a complete example? Your code looks correct, and should
>> just work.
>>
>> How do you know the result contains only 't' (i.e. how do you know it
>> does not c
On Mar 20, 11:03 am, "Martin v. Löwis" wrote:
> > Any idea on why this is happening?
>
> Can you provide a complete example? Your code looks correct, and should
> just work.
>
> How do you know the result contains only 't' (i.e. how do you know it
> does not contain 'e', 's', 't')?
>
> Regards,
>
> Any idea on why this is happening?
Can you provide a complete example? Your code looks correct, and should
just work.
How do you know the result contains only 't' (i.e. how do you know it
does not contain 'e', 's', 't')?
Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list
On Thu, Oct 30, 2008 at 8:28 AM, Seid Mohammed <[EMAIL PROTECTED]> wrote:
> I am new to python.
> I want to print Amharic characters using the Python IDLE.
> here goes sample code
> ==
abebe = 'አበበ በሶ በላ'
abebe
> '\xe1\x8a\xa0\xe1
Seid Mohammed wrote:
> I am new to python.
Welcome! :)
abebe = 'አበበ በሶ በላ'
abebe
> '\xe1\x8a\xa0\xe1\x89\xa0\xe1\x89\xa0 \xe1\x89\xa0\xe1\x88\xb6
> \xe1\x89\xa0\xe1\x88\x8b'
print abebe
> አበበ በሶ በላ
abeba = ['አበበ','በሶ','በላ']
abeba
> ['\xe1\x8a\xa0\xe1\x89\xa0\xe1\x89\xa0',
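
The escaped output shown in the question is the interpreter's repr of the UTF-8 bytes of the text, not corrupted data. A small Python 3 sketch (where text and bytes are separate types, unlike the Python 2 session quoted above) makes the relationship visible:

```python
abebe = 'አበበ በሶ በላ'

# Encoding the text to UTF-8 gives the same escapes as in the question.
encoded = abebe.encode('utf-8')
print(repr(encoded))

# Decoding those bytes gives the original text back unchanged.
print(encoded.decode('utf-8') == abebe)

# Each Ethiopic letter takes three bytes in UTF-8; the two spaces take one.
print(len(abebe), len(encoded))
```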
On Thu, 30 Oct 2008 10:28:39 +0300, Seid Mohammed wrote:
> I am new to python.
> I want to print Amharic characters using the Python IDLE. Here goes
> sample code
> ==
abebe = 'አበበ በሶ በላ'
abebe
> '\xe1\x8a\xa0\xe1\x89\xa0\xe1\x89
>
> What software did you use to make that so? The Python codec certainly
> never would do such a thing.
>
> Are you sure it was latin-1 and \x27, and not windows-1252 and \x92?
>
> Regards,
> Martin
You're right... the source of the text is HTML pages, and obviously webmasters
have poor knowledge o
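
The difference Martin is asking about can be checked directly. In this Python 3 sketch, byte 0x27 is a plain apostrophe under Latin-1, while byte 0x92 under windows-1252 is U+2019, whose UTF-8 form is the three bytes mentioned in the thread.

```python
# Latin-1 byte 0x27 is the ordinary ASCII apostrophe.
plain = b'\x27'.decode('latin-1')
print(plain)

# windows-1252 byte 0x92 is the right single quotation mark, U+2019.
curly = b'\x92'.decode('cp1252')
print(hex(ord(curly)))

# Encoded as UTF-8, U+2019 becomes the bytes e2 80 99.
print(curly.encode('utf-8'))
```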
[EMAIL PROTECTED] wrote:
> Hi to all, I have a little problem with unicode handling under Python.
>
> I have this code
>
> s = u'A unicode string with this damn apostrophe \x2019'
>
> outf = codecs.open('filename.txt', 'w', 'iso-8859-15')
> outf.write(s)
>
> what I obtain is a UnicodeEncodeErr
[EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
...
Ah, I answered you on the Italian NG before seeing you had also posted
the same request here. What I proposed there was (untested):
import codecs
_rimedi = { u'\x2019': "'" }
def rimedia(exc):
if isinstance(exc, (UnicodeEncodeError, Unic
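
The snippet above is cut off, so the following is not a completion of it, only a sketch of the same general idea in Python 3 using `codecs.register_error`. The handler name `replace_known` and the `REPLACEMENTS` table are made up for illustration. The sketch writes the character as `\u2019`, because `\x` escapes take exactly two hex digits, so `\x2019` would be read as `\x20` followed by the text `19`.

```python
import codecs

# Hypothetical table of characters the target encoding lacks.
REPLACEMENTS = {'\u2019': "'"}


def replace_known(exc):
    """Swap unencodable characters for a known substitute, else '?'."""
    if isinstance(exc, UnicodeEncodeError):
        bad = exc.object[exc.start:exc.end]
        fixed = ''.join(REPLACEMENTS.get(ch, '?') for ch in bad)
        # Return the replacement text and the position to resume from.
        return fixed, exc.end
    raise exc


codecs.register_error('replace_known', replace_known)

text = 'A unicode string with this damn apostrophe \u2019'
encoded = text.encode('iso-8859-15', 'replace_known')
print(encoded)
```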
> I agree, but the problem is much more subtle. I have converted a text from
> iso-8859-1 to utf-8 and the codecs have translated \x27 ( the iso
> apostrophe ) to \xe28099 in utf-8 ( or u'2019' in unicode code point
> notation )
What software did you use to make that so? The Python codec certainly
never
> No it shouldn't because \x2019 is a "right single quotation mark" and not
> an apostrophe.
>
> Ciao,
> Marc 'BlackJack' Rintsch
I agree, but the problem is much more subtle. I have converted a text from
iso-8859-1 to utf-8 and the codecs have translated \x27 ( the iso
apostrophe ) to \xe28099
On Sat, 07 Jul 2007 16:06:03 +, [EMAIL PROTECTED] wrote:
> Hi to all, I have a little problem with unicode handling under Python.
>
> I have this code
>
> s = u'A unicode string with this damn apostrophe \x2019'
>
> outf = codecs.open('filename.txt', 'w', 'iso-8859-15')
> outf.write(s)
>
>
> BTW, any reason why an EncodedFile can't act like a Unicode
> writer/reader object
> if one of its encodings is explicitly set to None?
AFAIU, that's not the intention of EncodedFile: instead, it is
meant to do recoding. I find it a pretty useless API, and
rather see it go away than being enhanc
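
To illustrate what "recoding" means here, a small Python 3 sketch with `codecs.EncodedFile` wrapped around an in-memory byte stream (the `io.BytesIO` object stands in for a real file): bytes written in one encoding come out in the underlying stream in another.

```python
import codecs
import io

# In-memory stand-in for a file opened in binary mode.
raw = io.BytesIO()

# Data handed to the wrapper is Latin-1; the underlying file gets UTF-8.
recoder = codecs.EncodedFile(raw, data_encoding='latin-1',
                             file_encoding='utf-8')

# b'\xe9' is e-acute in Latin-1.
recoder.write(b'caf\xe9')

# The underlying stream now holds the UTF-8 form of the same text.
stored = raw.getvalue()
print(stored)
```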
Martin v. Löwis schrieb:
>> Thanks! That's a nice little stumbling block for a newbie like me ;) Is
>> there a way to make utf-8 the default encoding for every string, so that
>> I do not have to encode each string explicitly?
>
> You can make sys.stdout encode each string with UTF-8, with
>
>
> Thanks! That's a nice little stumbling block for a newbie like me ;) Is
> there a way to make utf-8 the default encoding for every string, so that
> I do not have to encode each string explicitly?
You can make sys.stdout encode each string with UTF-8, with
sys.stdout = codecs.getwriter('utf-8
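
The line above is truncated, so this is only a sketch of the general technique in Python 3, with an `io.BytesIO` object standing in for a byte-oriented output stream. `codecs.getwriter('utf-8')` returns a wrapper class, and every string written through an instance of it is encoded before it reaches the underlying stream.

```python
import codecs
import io

# Stand-in for a byte-oriented stream such as a pipe.
byte_stream = io.BytesIO()

# Wrap it so that text written to it is encoded as UTF-8 on the way out.
writer = codecs.getwriter('utf-8')(byte_stream)
writer.write('\u00a9 2008\n')

written = byte_stream.getvalue()
print(written)
```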
On Sat, 07 Apr 2007 12:46:49 -0700, Gabriel Genellina wrote:
> You have to encode the Unicode object explicitly: print
> fileString.encode("utf-8")
> (or any other suitable one; I said utf-8 just because you read the input
> file using that)
Thanks! That's a nice little stumbling block for a new
Rehceb Rotkiv wrote:
> #!/usr/bin/python
> import sys
> import codecs
> fileHandle = codecs.open(sys.argv[1], 'r', 'utf-8')
> fileString = fileHandle.read()
> print fileString
>
> if I call it from a Bash shell like this
>
> $ ./test.py testfile.utf8.txt
>
> it works just fine, but when I try to p
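
A self-contained sketch of the read-then-encode pattern discussed in this thread, using `io.open` with an explicit encoding and a temporary file in place of `testfile.utf8.txt` (the file content is made up). The text is decoded on the way in and encoded explicitly on the way out, so the result does not depend on what the output stream happens to be connected to.

```python
import io
import os
import tempfile

# Create a small UTF-8 test file (made-up content).
fd, path = tempfile.mkstemp(suffix='.utf8.txt')
os.close(fd)
with io.open(path, 'w', encoding='utf-8') as handle:
    handle.write(u'Gr\u00fc\u00dfe\n')

# Read it back, decoding from UTF-8 into text.
with io.open(path, 'r', encoding='utf-8') as handle:
    file_string = handle.read()
os.remove(path)

# Encode explicitly before handing the data to a byte-oriented stream.
output_bytes = file_string.encode('utf-8')
print(repr(output_bytes))
```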
On Feb 4, 11:39 pm, John Nagle <[EMAIL PROTECTED]> wrote:
> I'm running a website page through BeautifulSoup. It parses OK
> with Python 2.4, but Python 2.5 fails with an exception:
>
> Traceback (most recent call last):
>File "./sitetruth/InfoSitePage.py", line 268, in httpfetch
> se
On 23/06/2006 9:06 PM, Thomas Heller wrote:
> I'm using code.InteractiveConsole but it doesn't work correctly
> with non-ascii characters. I think it boils down to this problem:
>
> Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on
> win32
> Type "help", "copyright", "cre
Thomas Heller schrieb:
> I'm using code.InteractiveConsole but it doesn't work correctly
> with non-ascii characters. I think it boils down to this problem:
>
> Python 2.4.3 (#69, Mar 29 2006, 17:35:34) [MSC v.1310 32 bit (Intel)] on
> win32
> Type "help", "copyright", "credits" or "license" fo
35 matches