Convert a list with wrong encoding to utf8
Hello, I tried the following to change the encoding to utf8, because for some
reason the encoding of the names in the list has changed.
[python]
#populate client listing into list
names.append( name )
names.append( '' )
names.sort()
for name in names:
    name = name.encode('latin1').decode('utf8')
[/python]
and the error that was presented was:
[output]
UnicodeEncodeError('latin-1', 'Άκης Τσιάμης', 0, 4, 'ordinal not in range(256)')
[/output]
Why can't it encode to latin1 and then decode as utf8 normally?
And since the 'names' are fetched from a MySQL database, where they were stored
as utf8 strings, why and how did the 'names' end up in latin-1?
--
https://mail.python.org/mailman/listinfo/python-list
Re: Convert a list with wrong encoding to utf8
On Fri, Feb 15, 2019 at 3:41 AM wrote:
> Why can't it encode to latin1 and then decode as utf8 normally?
> And since the 'names' are fetched from a MySQL database, where they were
> stored as utf8 strings, why and how did the 'names' end up in latin-1?
You're going to have to figure out what encoding they are ACTUALLY in,
or (since it looks like you're working with strings) what encoding
they were decoded using. Without that information, all you're doing is
taking stabs in the dark.
BTW, you should probably fix your encodings *before* attempting to
sort the names. It doesn't make much sense to sort by byte values in
mojibake.
ChrisA
Re: Convert a list with wrong encoding to utf8
You can only decode FROM the same encoding you've encoded TO. Any decoding
must know the input it receives follows the rules of its encoding scheme.
latin1 is not utf8.
However, in your case, you aren't seeing a problem with the decoding. That
step is never reached. It is failing to encode the string as latin1 because
it is not compatible with the latin1 scheme. Your string contains
characters which cannot be represented in latin1.
It really is not clear what you're trying to accomplish here. The string
encoding was already handled when you pulled this out of the database and
you should not need to do anything like this at all. You already have a
decoded string, because in python ALL strings are decoded already. Encoding
is only a process of converting strings to raw bytes for storage or
transmission, which you don't appear to be doing here.
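
To make the failing step concrete, here is a minimal sketch. The helper name `try_latin1` is made up for illustration; the name is the one from the error message above, written with escapes so the example does not depend on the terminal encoding.

```python
# The name from the error message above: 'Άκης Τσιάμης'
name = '\u0386\u03ba\u03b7\u03c2 \u03a4\u03c3\u03b9\u03ac\u03bc\u03b7\u03c2'


def try_latin1(text):
    """Return the latin-1 bytes for text, or the UnicodeEncodeError raised."""
    try:
        return text.encode('latin1')
    except UnicodeEncodeError as exc:
        return exc


# Greek letters have code points above 255, so the encode step fails
# before decode('utf8') is ever attempted.
result = try_latin1(name)
print(type(result).__name__)
```

The start and end positions stored on the exception are the `0, 4` seen in the reported error: they cover the first word, up to the space, which latin-1 can represent.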
Re: Why float('Nan') == float('Nan') is False
ast writes:
> On 13/02/2019 at 14:21, ast wrote:
>> Hello
>>
>> >>> float('Nan') == float('Nan')
>> False
>>
>> Why ?
>>
>> Regards
>>
>
> Thank you for answers.
>
> If you wonder how I was trapped with it, here
> is the failing program.
>
>
> r = float('Nan')
>
> while r==float('Nan'):
>     inp = input("Enter a number\n")
>     try:
>         r = float(inp)
>     except ValueError:
>         r = float('Nan')
import math
while math.isnan(r) :
will do what you're looking for.
If you're using python 3.5 or higher, you can also use math.nan instead
of float('nan').
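
Put together, the corrected loop might look like the sketch below. The function name `read_number` and its `read_line` parameter are assumptions added here so the loop can be exercised without a keyboard; they are not part of the original program.

```python
import math


def read_number(read_line=input):
    """Keep prompting until read_line returns text that float() accepts."""
    r = math.nan  # math.nan needs Python 3.5+; float('nan') works everywhere
    # NaN compares unequal to everything, itself included, so test with isnan().
    while math.isnan(r):
        inp = read_line("Enter a number\n")
        try:
            r = float(inp)
        except ValueError:
            r = math.nan
    return r
```

Note that with this approach a user who types "nan" is simply asked again, because float("nan") is itself a NaN.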
Re: Why float('Nan') == float('Nan') is False
On Fri, Feb 15, 2019 at 3:56 AM Joe Pfeiffer wrote:
> import math
> while math.isnan(r) :
>
> will do what you're looking for.
>
> If you're using python 3.5 or higher, you can also use math.nan instead
> of float('nan').
Or even better, use None instead of nan. There's nothing in Python that
says you have to (ab)use a floating-point value as a signal. Or use
"while True" and add a break if the exception isn't thrown.
ChrisA
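
Both suggestions in the reply above can be sketched roughly as follows. The function names and the `read_line` parameter are hypothetical, introduced only so the loops can be run without interactive input.

```python
def read_number_with_none(read_line=input):
    """Use None, not NaN, as the 'no value yet' marker."""
    r = None
    while r is None:
        try:
            r = float(read_line("Enter a number\n"))
        except ValueError:
            pass  # leave r as None and ask again
    return r


def read_number_with_break(read_line=input):
    """Loop forever and break out once float() succeeds."""
    while True:
        try:
            r = float(read_line("Enter a number\n"))
        except ValueError:
            pass
        else:
            break  # reached only when no exception was raised
    return r
```

Unlike a NaN-based loop, both variants accept the text "nan" as a valid float, so an explicit check is still needed if NaN should be rejected.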
Re: Convert a list with wrong encoding to utf8
On Thursday, February 14, 2019 at 6:45:29 PM UTC+2, Calvin Spealman wrote:
> You can only decode FROM the same encoding you've encoded TO. Any decoding
> must know the input it receives follows the rules of its encoding scheme.
> latin1 is not utf8.
>
> However, in your case, you aren't seeing problem with the decoding. That
> step is never reached. It is failing to encode the string as latin1 because
> it is not compatible with the latin1 scheme. Your string contains
> characters which cannot be represented in latin1.
>
> It really is not clear what you're trying to accomplish here. The string
> encoding was already handled when you pulled this out of the database and
> you should not need to do anything like this at all. You already have a
> decoded string, because in python ALL strings are decoded already. Encoding
> is only a process of converting strings to raw bytes for storage or
> transmission, which you don't appear to be doing here.
The names in the database are stored as utf8.
When the script runs, it reads them and handles them as utf8, right?
If so, then why do I see bytes in hexadecimal format when I print the 'names'
list?
'\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82'
And only if I do
for name in names:
    print( name.encode('latin1').decode('utf8') )
can I see the values of the 'names' list correctly in Greek.
But where did latin-1 (ISO-8859-1) come into play? And apart from printing the
names as above, how can I store them as proper utf8?
Re: Why float('Nan') == float('Nan') is False
> Or even better, use None instead of nan.

++

On Thu, Feb 14, 2019 at 3:26 AM Joe Pfeiffer wrote:
> [email protected] writes:
>
> > There are more integers than odd numbers, and more odd numbers than prime
> > numbers. An infinite set may be a subset of another infinite set although
> > they may both have the same cardinality. Or in other words, the number of
> > elements in each set is not equal. One has more elements than the other.
> > AND, by induction you can also prove that the other one has more elements
> > than the first one. So the number of elements in two infinite sets can't
> > be equal. Even, if you compare the same set to itself.
>
> You would expect that to be true, but it is not. There are in fact the
> same number of odd integers as integers, and the same number of primes
> as integers. Counterintuitive but true.

I know it's a Python mailing list and not Math, and this thread is
off-topic. But anyway, it depends how you define "more". There are
infinitely many integers and odd numbers, and as I said you can't compare
infinite "numbers" of elements. But the set of odd numbers is a subset of
the set of integers. If you take any big range, for example from 0 to a
googol (10**100), there are more integers in this range than odd numbers.
There are integers which are not odd numbers but there are no odd numbers
which are not integers. The set of integers which are not odd numbers is
infinite. So in this sense, there are more integers than odd numbers. And
also, there are more odd numbers than prime numbers (there is one prime
number which is not odd, and many many odd numbers which are not prime).

> There are in fact the same number of odd integers as integers, and the
> same number of primes as integers.

If you mean that the "number" of odd integers is equal to the "number" of
integers, it is not. They are both infinite and infinity is not a number.
Two sets can have the same cardinality even if one set contains more
elements than the other, at least in the sense I define "more". A
cardinality is equal to the number of elements only for finite sets. For
infinite sets the cardinality is not the number of elements; the number of
elements is infinite.
Re: What's up with Activestate Python?
On 14/02/2019 00:06, Grant Edwards wrote:
> For many, many years I've always installed ActiveState's ActivePython
> Community edition when forced to use Windows. It has always included
> ...
> I guess it's time to switch to Anaconda or ???

I've also used the ActiveState python, especially for the 2.7.x series,
mainly for the offline docs and the pywin32 libraries.

Now the situation is better, and with pip it is really easy to have an
updated python with a lot of libs, so there is less need for the
ActiveState distribution.

My 2 cents.

Daniele Forghieri
Re: Why float('Nan') == float('Nan') is False
Chris Angelico writes:
>
> Or even better, use None instead of nan. There's nothing in Python
> says you have to (ab)use a floating-point value as a signal. Or use
> "while True" and add a break if the exception isn't thrown.

Good point.
Re: What's up with Activestate Python?
On 2019-02-14, Liste guru wrote:
> On 14/02/2019 00:06, Grant Edwards wrote:
>> For many, many years I've always installed ActiveState's ActivePython
>> Community edition when forced to use Windows. It has always included
>> ...
>> I guess it's time to switch to Anaconda or ???
>
> I've also used the ActiveState python, expecially for the 2.7.x
> series, mainly for the oflline docs and the pywin32 libraries.
>
> Now the situation is better and with pip is really easy to have an
> updated python with a lot of libs so there is less need for the
> ActiveState distribution.

How does that work for libraries that require a C compiler? Does pip
know how to download and install any required C/C++ toolchains?

--
Grant Edwards
Re: What's up with Activestate Python?
On Fri, Feb 15, 2019 at 5:11 AM Grant Edwards wrote:
>
> On 2019-02-14, Liste guru wrote:
> > On 14/02/2019 00:06, Grant Edwards wrote:
> >> For many, many years I've always installed ActiveState's ActivePython
> >> Community edition when forced to use Windows. It has always included
> >> ...
> >> I guess it's time to switch to Anaconda or ???
> >
> > I've also used the ActiveState python, expecially for the 2.7.x
> > series, mainly for the oflline docs and the pywin32 libraries.
> >
> > Now the situation is better and with pip is really easy to have an
> > updated python with a lot of libs so there is less need for the
> > ActiveState distribution.
>
> How does that work for libraries that require a C compiler? Does pip
> know how to download and install any required C/C++ toolchains?
>

Ideally, it'll be downloading prebuilt wheels. Whether that's always
possible, I don't know.

ChrisA
Re: Convert a list with wrong encoding to utf8
If you see something like this
'\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82'
then you don't have a string, you have raw bytes. You don't "encode" bytes,
you decode them. If you know this is already encoded as UTF-8 then you just
need the decode('utf8') part and *not* the encode('latin1') step.
encode() is something that turns text into bytes
decode() is something that turns bytes into text
So, if you already have bytes and you need text, you should only want to be
doing a decode() and you just need to specify the correct encoding.
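
As a small illustration of the bytes case, assuming the value really is a bytes object holding UTF-8 (note the `b` prefix on the literal, which the value shown in the question does not have):

```python
# The byte sequence from the question, written here as a real bytes literal.
raw = (b'\xce\x86\xce\xba\xce\xb7\xcf\x82 '
       b'\xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82')

# decode(): bytes -> text
text = raw.decode('utf8')

# encode(): text -> bytes, which gives back the original bytes
round_trip = text.encode('utf8')
```

Here `text` is a 12-character str (the Greek name from the error message) and `round_trip` equals `raw`.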
Re: Convert a list with wrong encoding to utf8
On 2019-02-14 18:16, Calvin Spealman wrote:
> If you see something like this
>
> '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82'
>
> then you don't have a string, you have raw bytes. You don't "encode" bytes,
> you decode them. If you know this is already encoded as UTF-8 then you just
> need the decode('utf8') part and *not* the encode('latin1') step.
>
> encode() is something that turns text into bytes
> decode() is something that turns bytes into text
>
> So, if you already have bytes and you need text, you should only want to be
> doing a decode() and you just need to specific the correct encoding.

It doesn't have a 'b' prefix, so either it's Python 2 or it's a Unicode
string that was decoded wrongly from the bytes.
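
To make the second possibility concrete, here is a sketch of how such a string can arise. That latin-1 was the wrong codec is an assumption; it is merely consistent with the encode('latin1') round trip that worked for the poster.

```python
original = '\u0386\u03ba\u03b7\u03c2'  # the first word of the name, 'Άκης'
utf8_bytes = original.encode('utf8')

# Decoding with the wrong codec never fails for latin-1, because every
# byte value maps to some character; the result is a str, not bytes.
mojibake = utf8_bytes.decode('latin1')
print(ascii(mojibake))  # ASCII-only view: \x escapes, but no b prefix

# Reversing the wrong decode recovers the text.
repaired = mojibake.encode('latin1').decode('utf8')
```

The mis-decoded value has 8 characters instead of 4, one per UTF-8 byte, which is another quick way to spot it.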
Re: Convert a list with wrong encoding to utf8
On Thursday, February 14, 2019 at 8:16:40 PM UTC+2, Calvin Spealman wrote:
> If you see something like this
>
> '\xce\x86\xce\xba\xce\xb7\xcf\x82
> \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82'
>
> then you don't have a string, you have raw bytes. You don't "encode" bytes,
> you decode them. If you know this is already encoded as UTF-8 then you just
> need the decode('utf8') part and *not* the encode('latin1') step.
>
> encode() is something that turns text into bytes
> decode() is something that turns bytes into text
>
> So, if you already have bytes and you need text, you should only want to be
> doing a decode() and you just need to specific the correct encoding.
I agree, but I don't know which encoding the string is in.
I just tried
names = tuple( [s.decode('utf8') for s in names] )
but I get this error:
AttributeError("'str' object has no attribute 'decode'",)
But why does it say s is a string object? Since the names are raw bytes,
shouldn't it be a bytes object?
How can I turn the names from raw bytes into utf-8 strings?
PS. Who encoded them as raw bytes anyway? Since they are fetched directly from
the database, shouldn't Python 3 store them in names as utf-8 strings? Why raw
bytes instead?
Re: Convert a list with wrong encoding to utf8
Hi,
What DBMS? How do you access the DB?
Maybe the field is BLOB?
Thank you.
Problem : Generator
How can I implement a reverse generator? Is it only a matter of passing the
data in reverse, or how is it done? Does yield always return the next value,
and is the question even valid?
Thanks and Regards
Prahallad
Re: Convert a list with wrong encoding to utf8
On Thursday, February 14, 2019 at 9:14:08 PM UTC+2, Igor Korot wrote:
> What DBMS? How do you access the DB?
> Maybe the field is BLOB?
No, the fields are all 'varchar'.
I use pymysql, and I connect like this to fetch the values from the db as utf-8:
con = pymysql.connect( db = 'clientele', user = 'vergos', passwd = '**', charset = 'utf8' )
cur = con.cursor()
Re: Convert a list with wrong encoding to utf8
con = pymysql.connect( db = 'clientele', user = 'vergos', passwd =
'***', charset = 'utf8' )
cur = con.cursor()
Re: Convert a list with wrong encoding to utf8
On Thursday, February 14, 2019 at 8:56:31 PM UTC+2, MRAB wrote:
> It doesn't have a 'b' prefix, so either it's Python 2 or it's a Unicode
> string that was decoded wrongly from the bytes.
Yes, it doesn't have the 'b' prefix, so those hexadecimal escapes represent a
string and not bytes.
I just tried:
names = tuple( [s.encode('latin1').decode('utf8') for s in names] )
but I get
UnicodeEncodeError('latin-1', 'Άκης Τσιάμης', 0, 4, 'ordinal not in range(256)')
'Άκης Τσιάμης' is a valid name, but even so it gives an error.
Is it possible that Python 3 wrongly decoded the string from the
bytes?
What can I do to get the names?!
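
One reading of these symptoms, and it is only an assumption, is that the list is mixed: some entries are already correct Greek text (these raise the UnicodeEncodeError) while others were mis-decoded as latin-1 (these print as \x escapes). Under that assumption, a per-item repair could look like the sketch below; `repair` is a hypothetical helper, and it treats the symptom rather than the cause in the database.

```python
def repair(name):
    """Undo a latin-1 mis-decoding of UTF-8 text; otherwise return name as is."""
    try:
        return name.encode('latin1').decode('utf8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text is already correct (not encodable as latin-1) or
        # its bytes are not valid UTF-8, so leave it untouched.
        return name


# Sample data: one correct name, one mis-decoded copy of it, one empty string.
names = ['\u0386\u03ba\u03b7\u03c2', '\xce\x86\xce\xba\xce\xb7\xcf\x82', '']
fixed = [repair(n) for n in names]
```

A heuristic like this can misfire on genuine latin-1 text that happens to form valid UTF-8, so it is better used for a one-off clean-up than left in the read path.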
Re: Convert a list with wrong encoding to utf8
I'm using Python3 and pymysql and already have the charset present
[python]
con = pymysql.connect( db = 'clientele', user = 'vergos', passwd = '**', charset = 'utf8' )
cur = con.cursor()
[/python]
From that I understand that the names fetched from the db into the Python
script are fetched as utf8, right?
I don't convert or format the strings in the meantime. Python3 handles the
encoding, and I don't know where latin-1 gets into the middle, but when I try
[python]names = tuple( [s.encode('latin1').decode('utf8') for s in names] )[/python]
I get
[output]UnicodeEncodeError('latin-1', 'Άκης Τσιάμης', 0, 4, 'ordinal not in range(256)')[/output]
for what is a perfectly valid name.
Also, the strings produced, such as
'\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82',
are strings, not raw bytes.
Why does Python3 store the values in hex representation instead of fetching
them from the db as 'utf8'?
RE: Convert a list with wrong encoding to utf8
Next question is how did you _insert_ those names into the database
previously? Are the names showing up ok using any other tool to look at them?

The error might have been on insert and you're just seeing weird stuff now
because of that. Maybe, instead of giving it the text and letting the module
deal with encodings, you gave it the raw UTF-8 encoding, and the module or db
server said "let me encode that into the field or database defined default of
latin-1 for you"... or something like that.

-Original Message-
From: Python-list [mailto:[email protected]] On Behalf Of [email protected]
Sent: Thursday, February 14, 2019 2:56 PM
To: [email protected]
Subject: Re: Convert a list with wrong encoding to utf8

On Thursday, February 14, 2019 at 8:56:31 PM UTC+2, MRAB wrote:
> It doesn't have a 'b' prefix, so either it's Python 2 or it's a Unicode
> string that was decoded wrongly from the bytes.

Yes it doesnt have the 'b' prefix so that hexadecimal are representation of
strings and not representation of bytes.

I just tried:

names = tuple( [s.encode('latin1').decode('utf8') for s in names] )

but i get
UnicodeEncodeError('latin-1', 'Άκης Τσιάμης', 0, 4, 'ordinal not in range(256)')

'Άκης Τσιάμης' is a valid name but even so it gives an error.

Is it possible that Python3 a Unicode had the string wrongly decoded from the
bytes ?

What can i do to get the names?!
Re: Convert a list with wrong encoding to utf8
On 02/14/2019 12:02 PM, [email protected] wrote:
> On Thursday, February 14, 2019 at 8:16:40 PM UTC+2, Calvin Spealman wrote:
>> If you see something like this
>>
>> '\xce\x86\xce\xba\xce\xb7\xcf\x82 \xce\xa4\xcf\x83\xce\xb9\xce\xac\xce\xbc\xce\xb7\xcf\x82'
>>
>> then you don't have a string, you have raw bytes. You don't "encode" bytes,
>> you decode them. If you know this is already encoded as UTF-8 then you just
>> need the decode('utf8') part and *not* the encode('latin1') step.
>>
>> encode() is something that turns text into bytes
>> decode() is something that turns bytes into text
>>
>> So, if you already have bytes and you need text, you should only want to be
>> doing a decode() and you just need to specific the correct encoding.
>
> I Agree but I don't know in what encoding the string is encoded into.
>
> I just tried
>
> names = tuple( [s.decode('utf8') for s in names] )
>
> but i get the error of:
>
> AttributeError("'str' object has no attribute 'decode'",)

Strictly speaking, that's correct. A Python 3 string object is already
decoded unicode. It cannot be decoded again.

> but why it says s is a string object? Since we have names in raw bytes is
> should be a bytes object?

It's clearly not raw bytes.

> How can i turn names from raw bytes to utf-8 strings?

They apparently aren't raw bytes. If they were, you could use .decode()

> ps. Who encoded them in raw bytes anyways? Since they fetced directly from
> the database shouldn't
> python3 have them stored in names as utf-8 strings? why raw bytes instead?

Something very strange is going on with your database and/or your queries.
The pymysql api should be already decoding the utf-8 bytes for you and
returning a unicode string. I have no idea why you're getting a unicode
string that consists of code points that are the same as the utf-8 bytes.

You'll have to post a little bit more of your code, like a simple, complete
query example (a few lines of code) that shows absolutely everything you're
trying to do to the string. Also you will want to use the mysql command-line
utilities to try your queries and see what kind of data you're getting out.
Because if mysql is told to use utf-8 for varchar, and if you're inserting
the data using correctly-formed utf-8 encoded byte strings, it should come
back out in Python as unicode.
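
When putting such an example together, it can help to report exactly what kind of object each fetched value is. The following is a minimal sketch; `describe` is a hypothetical helper and the three sample values stand in for what a query might return.

```python
def describe(value):
    """Summarise a value: its type name, its length, and an ASCII-only repr."""
    return type(value).__name__, len(value), ascii(value)


good = '\u0386\u03ba\u03b7\u03c2'              # correctly decoded text
mojibake = '\xce\x86\xce\xba\xce\xb7\xcf\x82'  # UTF-8 bytes mis-decoded into a str
raw = b'\xce\x86\xce\xba\xce\xb7\xcf\x82'      # undecoded bytes

for value in (good, mojibake, raw):
    print(describe(value))
```

The three cases differ in type or length (str of 4 characters, str of 8 characters, bytes of 8 bytes) even though they may look similar when printed, which narrows down where the wrong decode happened.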
Re: Problem : Generator
Prahallad Achar writes:
> How to implement reverse generator

Welcome to the Python forum!

That sounds like an interesting problem. Can you describe it more
precisely? What should a “reverse generator” actually do (and not do)?

Ideally, give an example:

* Some code you would maybe expect to create a "reverse generator", that
  you have already tried but doesn't work.

* Exactly what you *expect* the resulting object to do; what is its
  expected behaviour? What is its expected output?

--
Ben Finney
FW: Why float('Nan') == float('Nan') is False
Other people have replied well enough with better ways to do this but I am
stuck on WHY this was seen as a way to do this at all.
The code was:
r = float('Nan')
while r==float('Nan'):
    inp = input("Enter a number\n")
    try:
        r = float(inp)
    except ValueError:
        r = float('Nan')
But with a single exception, what the user types in to the input statement is
either a valid number OR it throws an exception. The exception is a string like
"nan" which does indeed return something unusual:
>>> a = float('nan')
>>> a
Nan
With that exception, any sentinel is legitimate for such use such as a Boolean
valid = False that you reset to true if float(inp) is working.
So what if the user types in some variant of "nan" perhaps preceded or followed
by whitespace or more?
I experimented a bit:
>>> float("nan ")
nan
>>> float(" nan")
Nan
>>> float(" nAn")
nan
>>> float("nan ny")
Traceback (most recent call last):
File "", line 1, in
float("nan ny")
ValueError: could not convert string to float: 'nan ny'
It seems the algorithm strips whitespace on both sides and converts the text to
upper or lower and is happy if what is left is three characters corresponding
to n a and n.
So if you want a normal number, you have choices. One is to check the string
explicitly before trying to float it.
if text.strip().lower() == 'nan' : ...
Another is to float it unquestioned and, if no exception is thrown, check the
proper way whether it is a NaN, as in:
math.isnan(x)
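Put together, the second choice might look something like the sketch below. The helper name parse_number and the decision to return None for rejected input are assumptions for illustration only.

```python
import math

def parse_number(text):
    """Return float(text), or None if text is not a usable number.

    Both unparsable text and any spelling of NaN are rejected.
    """
    try:
        value = float(text)
    except ValueError:
        return None
    if math.isnan(value):   # the reliable NaN test; == never matches a NaN
        return None
    return value
```

This sketch still lets "inf" through; whether that counts as a normal number is a separate decision.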
As noted earlier, and avoiding another mathematical exploration not needed
further in this forum, some things in python are best not to be touched
directly but with specially created access functions that can handle them well.
-Original Message-
From: Python-list On
Behalf Of ast
Sent: Thursday, February 14, 2019 2:00 AM
To: [email protected]
Subject: Re: Why float('Nan') == float('Nan') is False
Le 13/02/2019 à 14:21, ast a écrit :
> Hello
>
> >>> float('Nan') == float('Nan')
> False
>
> Why ?
>
> Regards
>
Thank you for answers.
If you wonder how I was trapped with it, here is the failing program.
r = float('Nan')
while r==float('Nan'):
    inp = input("Enter a number\n")
    try:
        r = float(inp)
    except ValueError:
        r = float('Nan')
--
https://mail.python.org/mailman/listinfo/python-list
Re: FW: Why float('Nan') == float('Nan') is False
On 2019-02-14, Avi Gross wrote:
> I experimented a bit:
>
> >>> float("nan ")
> nan
> >>> float(" nan")
> Nan
> >>> float(" nAn")
> nan
That's curious. I've never seen "Nan" before. What version of Python
are you using?
--
Grant Edwards               grant.b.edwards        Yow! NANCY!!  Why is
                                  at               everything RED?!
                              gmail.com
--
https://mail.python.org/mailman/listinfo/python-list
RE: FW: Why float('Nan') == float('Nan') is False
Grant,
I can see why you may be wondering. You see the nan concept as having a
specific spelling using all lowercase and to an extent you are right.
As I pointed out, in the context of a string to convert to a float, any
upper/lower-case spelling of NaN is accepted.
But, to answer you anyway, I actually use many versions of python as I have
loaded direct versions as well as through Cygwin and the Anaconda
distribution. Should not be relevant as explained.
But when displaying a value that is of that not-quite-type, all versions
print it in all lower case, as in 'nan', I think.
But here is a curiosity. The numpy add-on package has a nan that is UNIQUE
so two copies are the same. Read this transcript and see if it might
sometimes even be useful while perhaps confusing the heck out of people who
assume all nans are the same, or is it all nans are different?
>>> floata = float('nan')
>>> floatb = float('nan')
>>> floata, floatb
(nan, nan)
>>> floata == floatb
False
>>> floata is floatb
False
>>> numpya = numpy.nan
>>> numpyb = numpy.nan
>>> numpya, numpyb
(nan, nan)
>>> numpya == numpyb
False
>>> numpya is numpyb
True
-Original Message-
From: Python-list On
Behalf Of Grant Edwards
Sent: Thursday, February 14, 2019 6:15 PM
To: [email protected]
Subject: Re: FW: Why float('Nan') == float('Nan') is False
On 2019-02-14, Avi Gross wrote:
> I experimented a bit:
>
> >>> float("nan ")
> nan
> >>> float(" nan")
> Nan
> >>> float(" nAn")
> nan
That's curious. I've never seen "Nan" before. What version of Python are
you using?
--
Grant Edwards               grant.b.edwards        Yow! NANCY!!  Why is
                                  at               everything RED?!
                              gmail.com
--
https://mail.python.org/mailman/listinfo/python-list
Re: FW: Why float('Nan') == float('Nan') is False
On Fri, Feb 15, 2019 at 2:37 PM Avi Gross wrote:
> But here is a curiosity. The numpy add-on package has a nan that is UNIQUE
> so two copies are the same. Read this transcript and see if it might
> sometimes even be useful while perhaps confusing the heck out of people who
> assume all nans are the same, or is it all nans are different?
>
> >>> floata = float('nan')
> >>> floatb = float('nan')
> >>> floata, floatb
> (nan, nan)
> >>> floata == floatb
> False
> >>> floata is floatb
> False
>
> >>> numpya = numpy.nan
> >>> numpyb = numpy.nan
> >>> numpya, numpyb
> (nan, nan)
> >>> numpya == numpyb
> False
> >>> numpya is numpyb
> True
>
You shouldn't be testing floats for identity.
>>> x = 2.0
>>> y, z = x+x, x*x
>>> y == z
True
>>> y is z
False
If nan identity is confusing people, other float identity should be
just as confusing. Or, just don't test value types for identity unless
you're actually trying to see if they're the same object.
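To make the distinction concrete, here is a small sketch using only the standard library. The variable names are made up for illustration, and the result of an "is" test on the two computed floats is deliberately left out because it depends on the implementation.

```python
import math

x = 2.0
y, z = x + x, x * x

same_value = (y == z)              # value comparison: both are 4.0

nan = float("nan")
nan_equals_itself = (nan == nan)   # NaN compares unequal even to itself
nan_detected = math.isnan(nan)     # the intended way to test for NaN
```

Here same_value and nan_detected come out True, while nan_equals_itself is False, which is why neither == nor is will do as a NaN test.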
ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
RE: FW: Why float('Nan') == float('Nan') is False
Chris,
I don't wish to continue belaboring this topic but will answer you and then
ignore anything non-essential.
You said:
> You shouldn't be testing floats for identity.
I am not suggesting anyone compare floats. I repeat that a nan is not
anything. Now as a technicality, it is considered a float by the type
command as there is no easy way to make an int that is a nan:
Here are multiple ways to make a nan:
>>> f = float("nan")
>>> type(f)
<class 'float'>
Oddly you can make a complex nan, sort of:
>>> c = complex("nan")
>>> c
(nan+0j)
>>> type(c)
<class 'complex'>
>>> c+c
(nan+0j)
>>> c+5
(nan+0j)
>>> c + 5j
(nan+5j)
The above makes me suspect that the underlying implementation sees a complex
number as the combination of two floats.
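That suspicion can be probed from Python itself: a complex number exposes its two components as ordinary floats through the real and imag attributes. A small sketch, with variable names chosen only for illustration:

```python
import math
import cmath

c = complex("nan")

real_part_type = type(c.real)           # the real component is a plain float
real_is_nan = math.isnan(c.real)        # that component carries the NaN
imag_is_zero = (c.imag == 0.0)          # the imaginary component is untouched
either_part_nan = cmath.isnan(c + 5j)   # cmath.isnan looks at both components
```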
Now for a deeper anomaly and please don't tell me I shouldn't do this.
There is also a math.nan that seems to behave the same as numpy.nan with a
little twist. It too is unique but not the same anyway. I mean there are two
objects out there in the python world that are implemented seemingly
independently as well as a third that may also be a fourth and fifth and ...
I will now make three kinds of nan, twice, and show how they inter-relate
today in the version of python I am using at this moment. Version 3.7.1
hosted by IDLE under Cygwin under the latest Windblows. I suspect my other
versions would do the same.
>>> nanfloat1 = float("nan")
>>> nanfloat2 = float("nan")
>>> nanmath1 = math.nan
>>> nanmath2 = math.nan
>>> nannumpy1 = numpy.nan
>>> nannumpy2 = numpy.nan
>>> nanfloat1 is nanfloat2
False
>>> nanmath1 is nanmath2
True
>>> nannumpy1 is nannumpy2
True
>>> nanfloat1 is nanmath1
False
>>> nanfloat1 is nannumpy1
False
>>> nanmath1 is nannumpy1
False
This seems a tad inconsistent but perhaps completely understandable. Yet all
three claim to float ...
>>> list(map(type, [ nanfloat1, nanmath1, nannumpy1 ] ))
[<class 'float'>, <class 'float'>, <class 'float'>]
Now granted comparing floats is iffy if the floats are computed and often
fails because of the internal bit representation and rounding. But asking if
a copy of a float variable to a new name points to the same internal does
work:
>>> a = 2.0
>>> b = 2.0
>>> a is b
False
>>> c = a
>>> a is c
True
What I see happening here is that math.nan is a real object of some sorts
that is instantiated by the math module at a specific location and
presumably setting anything to it just copies that, sort of.
>>> str(math.nan)
'nan'
>>> dir(math.nan)
['__abs__', '__add__', '__bool__', '__class__', '__delattr__', '__dir__',
'__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__',
'__format__', '__ge__', '__getattribute__', '__getformat__',
'__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__',
'__int__', '__le__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__',
'__new__', '__pos__', '__pow__', '__radd__', '__rdivmod__', '__reduce__',
'__reduce_ex__', '__repr__', '__rfloordiv__', '__rmod__', '__rmul__',
'__round__', '__rpow__', '__rsub__', '__rtruediv__', '__set_format__',
'__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__',
'__truediv__', '__trunc__', 'as_integer_ratio', 'conjugate', 'fromhex',
'hex', 'imag', 'is_integer', 'real']
>>> id(math.nan)
51774064
Oddly, every copy of it gets another address but the same other address
which hints at some indirection in the way it was set up.
>>> m = math.nan
>>> id(m)
51774064
>>> n = math.nan
>>> id(n)
51774064
>>> o = m
>>> id(o)
51774064
Now do the same for the numpy.nan implementation:
>>> str(numpy.nan)
'nan'
>>> dir(numpy.nan)
['__abs__', '__add__', '__bool__', '__class__', '__delattr__', '__dir__',
'__divmod__', '__doc__', '__eq__', '__float__', '__floordiv__',
'__format__', '__ge__', '__getattribute__', '__getformat__',
'__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__',
'__int__', '__le__', '__lt__', '__mod__', '__mul__', '__ne__', '__neg__',
'__new__', '__pos__', '__pow__', '__radd__', '__rdivmod__', '__reduce__',
'__reduce_ex__', '__repr__', '__rfloordiv__', '__rmod__', '__rmul__',
'__round__', '__rpow__', '__rsub__', '__rtruediv__', '__set_format__',
'__setattr__', '__sizeof__', '__str__', '__sub__', '__subclasshook__',
'__truediv__', '__trunc__', 'as_integer_ratio', 'conjugate', 'fromhex',
'hex', 'imag', 'is_integer', 'real']
>>> id(numpy.nan)
57329632
This time that same address is reused:
>>> m = numpy.nan
>>> id(m)
57329632
>>> n = numpy.nan
>>> id(n)
57329632
So the numpy nan is unique. The math nan is something else but confusingly
generates a new but same copy. You may be getting the address of a proxy
one time and the real one another.
>>> m is n
True
But
>>> m is math.nan
False
Should I give up? No, the above makes some sense as the id() function shows
there were two addresses involved in one case and not the other.
A truly clean implementation might have one copy system-wide as happens with
None or Ellipsis (...) but it seems the development in python went in
multiple directions and is no longer joined.
A similar test (not shown) with numpy.nan shows the m and n above are each
other as well as what they copied because they share an ID.
Re: Problem : Generator
Prahallad Achar writes:
> How to implement reverse generator

A generator generates a sequence of values.

The notion "reverse generator" suggests that you have a sequence
of values and want to produce it in reverse order.
This is not always possible.

Consider:
def natural():
    i = 0
    while True: yield i; i += 1

This generator generates all natural numbers (up to the memory limit).
However, there is no "reverse generator" for the sequence of natural
numbers.

If you have a finite sequence of values, you can turn it into
a list, reverse this list and iterate over it. This will give
you a reverse generator for your sequence of values (though maybe
an inefficient one).
--
https://mail.python.org/mailman/listinfo/python-list
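The list-then-reverse approach for a finite sequence could be sketched as follows. The names natural_up_to and reverse_iter are invented for this illustration; they are not from the thread or from any library.

```python
def natural_up_to(limit):
    """A finite generator: yields 0, 1, ..., limit - 1."""
    i = 0
    while i < limit:
        yield i
        i += 1

def reverse_iter(iterable):
    """Yield the items of a finite iterable in reverse order.

    The whole input is collected into a list first, so memory use grows
    with its length, and an infinite generator cannot be handled.
    """
    items = list(iterable)
    yield from reversed(items)

result = list(reverse_iter(natural_up_to(5)))
```

With these definitions result holds the numbers 4 down to 0.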
Re: FW: Why float('Nan') == float('Nan') is False
On Fri, Feb 15, 2019 at 4:15 PM Avi Gross wrote:
>
> > You shouldn't be testing floats for identity.
>
> I am not suggesting anyone compare floats. I repeat that a nan is not
> anything. Now as a technicality, it is considered a float by the type
> command as there is no easy way to make an int that is a nan:
You've been working with float("nan") all this time. It is, big
surprise, a float. This is not a technicality. It *is* a float.
> Now for a deeper anomaly and please don't tell me I shouldn't do this.
Actually... you're back to comparing floats by identity. So you
shouldn't do this, apart from probing the interpreter itself. This is
not anomalous, it's just the way that Python's immutable types work.
> >>> nanfloat1 = float("nan")
> >>> nanfloat2 = float("nan")
> >>> nanmath1 = math.nan
> >>> nanmath2 = math.nan
> >>> nannumpy1 = numpy.nan
> >>> nannumpy2 = numpy.nan
> >>> nanfloat1 is nanfloat2
> False
> >>> nanmath1 is nanmath2
> True
> >>> nannumpy1 is nannumpy2
> True
> >>> nanfloat1 is nanmath1
> False
> >>> nanfloat1 is nannumpy1
> False
> >>> nanmath1 is nannumpy1
> False
>
> This seems a tad inconsistent but perhaps completely understandable. Yet all
> three claim to float ...
>
> >>> list(map(type, [ nanfloat1, nanmath1, nannumpy1 ] ))
> [<class 'float'>, <class 'float'>, <class 'float'>]
Well... yes. Every one of them IS a float, and every one of them DOES
carry the value of "nan". And they're not identical. So? You can do
the same with other values of floats, too.
> Now granted comparing floats is iffy if the floats are computed and often
> fails because of the internal bit representation and rounding. But asking if
> a copy of a float variable to a new name points to the same internal does
> work:
Nope, this is nothing to do with rounding.
> What I see happening here is that math.nan is a real object of some sorts
> that is instantiated by the math module at a specific location and
> presumably setting anything to it just copies that, sort of.
Okay, I think I see the problem here. You're expecting Python objects
to have locations (they don't, but they have identities) and to be
copied (they aren't, they're referenced), and you're expecting nan to
not be a value (it is). Python's object model demands that math.nan be
a real object. Otherwise you wouldn't be able to do anything at all
with it.
> >>> id(math.nan)
> 51774064
>
> Oddly, every copy of it gets another address but the same other address
> which hints at some indirection in the way it was set up.
>
> >>> m = math.nan
> >>> id(m)
> 51774064
> >>> n = math.nan
> >>> id(n)
> 51774064
> >>> o = m
> >>> id(o)
> 51774064
That isn't an address, it's an integer representing the object's
identity. And you could do this with literally ANY Python object. That
is the entire definition of assignment in Python. When you assign an
expression to a name, and then look up the object via that name, you
get... that object. That is how most modern high level languages work.
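A short sketch of that point, using math.nan and then an ordinary list to show it is not special to NaN. The variable names are only for illustration.

```python
import math

m = math.nan
n = math.nan
o = m

# All three names refer to the one float object held by the math module.
same_object = (m is n) and (n is o) and (o is math.nan)
same_id = (id(m) == id(math.nan))

# The same holds for any object, not just NaN.
data = [1, 2, 3]
alias = data
alias.append(4)    # visible through both names: assignment made no copy
```

Both same_object and same_id come out True, and data now ends in 4 even though only alias was appended to.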
> Now do the same for the numpy.nan implementation:
> This time that same address is reused:
>
> >>> m = numpy.nan
> >>> id(m)
> 57329632
> >>> n = numpy.nan
> >>> id(n)
> 57329632
>
> So the numpy nan is unique. The math nan is something else but confusingly
> generates a new but same copy. You may be getting the address of a proxy
> one time and the real one another.
No, numpy.nan and math.nan behave exactly the same way. They are
distinct objects in the versions of Python and numpy that you're
using, although other versions would be legitimately able to reuse the
same object if they chose. Everything you do with assignment is going
to behave the same way. No matter what name you assign that object to,
it's the same object, and has the same ID.
> >>> m is n
> True
>
> But
>
> >>> m is math.nan
> False
>
> Should I give up? No, the above makes some sense as the id() function shows
> there were two addresses involved in one case and not the other.
Not addresses, identities, and yes, there are two distinct objects here.
> A truly clean implementation might have one copy system-wide as happens with
> None or Ellipsis (...) but it seems the development in python went in
> multiple directions and is no longer joined.
True, and an equally clean implementation could guarantee that two
equal strings are stored at the same place in memory. Some languages
guarantee this. Others don't. Python doesn't make this guarantee, and
CPython doesn't behave that way, but it'd be perfectly valid for a
Python implementation to do exactly this. There isn't much benefit in
mandating this for floats, though; they don't take up much space.
> A similar test (not shown) with numpy.nan shows the m and n above are each
> other as well as what they copied because they share an ID.
>
> The solution is to NOT look at nan except using the appropriate functions.
>
> >>> [ (math.isnan(nothing), numpy.isnan(nothing))
> for nothing in [ float("nan"), math.nan, numpy.nan ] ]
>
> [(True, True), (True, True), (True, True)]
Well... yes. Tha
Re: Convert a list with wrong encoding to utf8
[email protected] wrote:
> I just tried:
>
> names = tuple( [s.encode('latin1').decode('utf8') for s in names] )
>
> but i get
> UnicodeEncodeError('latin-1', 'Άκης Τσιάμης', 0, 4, 'ordinal not in range(256)')

This suggests that the string you're getting from the database
*has* already been correctly decoded, and there is no need to go
through the latin1 re-coding step.

What do you get if you do
print(names)
immediately *before* trying to re-code them?

What *may* be happening is that most of your data is stored in
the database encoded as utf-8, but some of it is actually using
a different encoding, and you're getting confused by the resulting
inconsistencies.

I suggest you look carefully at *all* the names in the list,
straight after getting them from the database. If some of them
look okay and some of them look like mojibake, then you have
bad data in the database in the form of inconsistent encodings.
--
Greg
--
https://mail.python.org/mailman/listinfo/python-list
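If the list really does turn out to be a mixture of good names and mojibake, one possible stopgap is to attempt the latin1 round trip per string and keep the original whenever it fails. This is only a heuristic sketch: the helper name repair_mojibake is made up, the broken value is simulated here rather than taken from any database, and the real fix is still to find out which encoding the data is actually stored and read in.

```python
def repair_mojibake(text):
    """Undo a UTF-8-read-as-Latin-1 mix-up, if that is what happened.

    Strings that cannot make the round trip are returned unchanged, on
    the assumption that they were decoded correctly in the first place.
    """
    try:
        return text.encode("latin1").decode("utf8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text

good = "Άκης Τσιάμης"
broken = good.encode("utf8").decode("latin1")   # simulate the mix-up
names = [repair_mojibake(s) for s in (good, broken)]
```

Both entries of names end up as the correctly decoded Greek name. A string that is legitimately made of Latin-1 characters which happen to form valid UTF-8 would be altered by this, so it is not safe to apply blindly.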
Re: Problem : Generator
How about this
List1=[ 1,2,3,4]
Rever_gen = ( x*x for x in list1, reversed = True)
Rever_gen gets generator object and iterating it now gets reverse order..
Am I correct here? Suggest me

On Fri, 15 Feb 2019, 12:33 dieter wrote:
> Prahallad Achar writes:
> > How to implement reverse generator
>
> A generator generates a sequence of values.
>
> The notion "reverse generator" suggests that you have a sequence
> of values and want to produce it in reverse order.
> This is not always possible.
>
> Consider:
> def natural():
>     i = 0
>     while True: yield i; i += 1
>
> This generator generates all natural numbers (up to the memory limit).
> However, there is no "reverse generator" for the sequence of natural
> numbers.
>
> If you have a finite sequence of values, you can turn it into
> a list, reverse this list and iterate over it. This will give
> you a reverse generator for your sequence of values (though maybe
> an inefficient one).
>
> --
> https://mail.python.org/mailman/listinfo/python-list
>
--
https://mail.python.org/mailman/listinfo/python-list
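A generator expression has no reversed = True keyword, so the line above is a syntax error as written. For a list, one way to get the intended effect is to wrap the list in the built-in reversed(), sketched here with the names from the post normalised to lower case:

```python
list1 = [1, 2, 3, 4]

# reversed() wraps the list being iterated; it is not a keyword
# argument inside the generator expression.
rever_gen = (x * x for x in reversed(list1))

squares = list(rever_gen)
```

Iterating rever_gen yields the squares starting from the last element, so squares comes out as 16, 9, 4, 1.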
