On Fri, Apr 24, 2015 at 01:04:57PM +0200, Laura Creighton wrote:
> In a message of Fri, 24 Apr 2015 12:46:20 +1000, "Steven D'Aprano" writes:
> >The Japanese, Chinese and Korean
> >governments, as well as linguists, are all in agreement that despite a
> >few minor differences, the three languages
>
>
> I wouldn't use utf-8-sig for output, however, as it puts the BOM in the
> file for others to trip over.
>
> --
> DaveA
Yeah, I found that out when I altered the aliases.py dictionary and added
'ubom' : 'utf_8_sig' as an item. Encoding didn't work out so good, but
decoding was fine ;')
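
A small sketch of the point about output, for anyone who wants to see the bytes. It assumes nothing beyond the standard library; the sample string is made up for illustration.

```python
import codecs

text = "First Name"  # made-up sample text

# Encoding with utf-8-sig prepends the three-byte signature EF BB BF.
with_sig = text.encode("utf-8-sig")
plain = text.encode("utf-8")

print(with_sig[:3] == codecs.BOM_UTF8)  # the signature sits at the front
print(with_sig[3:] == plain)            # the rest is ordinary UTF-8

# A reader that assumes plain UTF-8 keeps the signature as U+FEFF,
# which is what "others trip over".
tripped = with_sig.decode("utf-8")
print(repr(tripped[0]))
```

So for writing, plain "utf-8" is probably the safer choice unless the consumer is known to want the signature.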
On Fri, Apr 24, 2015 at 04:34:19PM -0700, Jim Mooney wrote:
> I was looking things up and although there are aliases for utf_8 (utf8 and
> utf-8) I see no aliases for utf_8_sig, so I'm surprised the utf-8-sig I
> tried using, worked at all. Actually, I was trying to find the file where
> the alias
On 04/24/2015 07:34 PM, Jim Mooney wrote:
Apparently so. It looks like utf-8-sig just ignores the sig if it is
present, and uses UTF-8 whether the signature is present or not.
That surprises me.
--
Steve
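
The behaviour described above can be checked in a few lines. This is only a sketch with a made-up payload, not a statement about how the codec is implemented.

```python
import codecs

payload = "Dorsey".encode("utf-8")    # made-up sample bytes, no signature
with_bom = codecs.BOM_UTF8 + payload  # same bytes with the signature in front

# Decoding with utf-8-sig gives the same text either way.
decoded_with = with_bom.decode("utf-8-sig")
decoded_without = payload.decode("utf-8-sig")

print(decoded_with == decoded_without)
```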
I was looking things up and although there are aliases for utf_8 (utf8 and
>
> Apparently so. It looks like utf-8-sig just ignores the sig if it is
> present, and uses UTF-8 whether the signature is present or not.
>
> That surprises me.
>
> --
> Steve
>
>
I was looking things up and although there are aliases for utf_8 (utf8 and
utf-8) I see no aliases for
In a message of Fri, 24 Apr 2015 12:46:20 +1000, "Steven D'Aprano" writes:
>The Japanese, Chinese and Korean
>governments, as well as linguists, are all in agreement that despite a
>few minor differences, the three languages share a common character set.
I don't think that is quite the way to sa
On 24/04/15 09:54, Alan Gauld wrote:
numbers or other symbols so there were two sets of meanings to
each pattern and a shift pattern to switch between them (which is
why we have SHIFT keys on modern keyboards).
Sorry, I'm conflating two sets of issues here.
The SHIFT key pre-dated teleprinters
The quoting seems to be all mangled here, so please excuse me if I
misattribute quotes to the wrong person:
On Thu, Apr 23, 2015 at 04:15:39PM -0700, Jim Mooney wrote:
> So is there any way to sniff the encoding, including the BOM (which appears
> to be used or not used randomly for utf-8), so y
On 24/04/15 03:46, Steven D'Aprano wrote:
Early text encodings all worked in a single byte
which is limited to 256 patterns.
Oh it's much more complicated than that!
Note I said *in* a single byte, ie they were all 8 bits or less.
*seven bits*, not even a full byte. It was seven bits so th
So is there any way to sniff the encoding, including the BOM (which appears
to be used or not used randomly for utf-8), so you can then use the proper
encoding, or do you wander in the wilderness?
Pretty much guesswork.
>
Alan Gauld
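
For the BOM part of the question, at least, a check is possible. The sketch below is only a heuristic: the function name sniff_bom and the fallback default are hypothetical, and a file without a BOM still comes down to guesswork.

```python
import codecs

# Longer signatures first: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw, default="utf-8"):
    """Return a codec name based on a leading BOM, else the default.

    Hypothetical helper: it only inspects the first bytes and cannot
    identify an encoding that carries no BOM.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default


print(sniff_bom(codecs.BOM_UTF8 + b"First Name"))
print(sniff_bom(b"First Name"))
```

In practice you would read the first four bytes of the file in binary mode, pass them to the helper, and then reopen the file in text mode with the returned name.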
--
This all sounds suspiciously like the old browser wars I suf
On Fri, Apr 24, 2015 at 12:33:57AM +0100, Alan Gauld wrote:
> On 24/04/15 00:15, Jim Mooney wrote:
> >Pretty much guesswork.
> >Alan Gauld
> >--
> >This all sounds suspiciously like the old browser wars
>
> It's more about history. Early text encodings all worked in a single byte
> which is
> lim
On Thu, Apr 23, 2015 at 05:40:34PM -0400, Dave Angel wrote:
> On 04/23/2015 05:08 PM, Mark Lawrence wrote:
> >
> >Slight aside, why a BOM, all I ever think of is Inspector Clouseau? :)
> >
>
> As I recall, it stands for "Byte Order Mark". Applicable only to
> multi-byte storage formats (eg. UTF
On Thu, Apr 23, 2015 at 10:08:05PM +0100, Mark Lawrence wrote:
> Slight aside, why a BOM, all I ever think of is Inspector Clouseau? :)
:-)
I'm not sure if you mean that as a serious question or not.
BOM stands for Byte Order Mark, and it is needed for UTF-16 and UTF-32
encodings because the
On Wed, Apr 22, 2015 at 10:18:31PM -0700, Jim Mooney wrote:
> My result:
>
> Ï»¿First NameLast Name # odd characters on header line
Any time you see "odd characters" in text like that, you should
immediately think "encoding problem".
These odd characters are normally called m
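
The three odd characters on the header line can be reproduced by decoding the UTF-8 signature with the wrong codec. A minimal sketch, with the header text assumed from the output quoted above:

```python
import codecs

bom = codecs.BOM_UTF8  # b'\xef\xbb\xbf'
header = bom + "First Name".encode("utf-8")  # assumed header text

as_latin1 = header.decode("latin-1")      # wrong guess: three junk characters
as_utf8_sig = header.decode("utf-8-sig")  # right guess: signature removed

print(ascii(as_latin1[:3]))  # '\xef\xbb\xbf'
print(as_utf8_sig)           # First Name
```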
On 24/04/15 00:15, Jim Mooney wrote:
Pretty much guesswork.
Alan Gauld
--
This all sounds suspiciously like the old browser wars
It's more about history. Early text encodings all worked in a single byte
which is
limited to 256 patterns. That's simply not enough to cover all the
alphabets
aroun
On 04/23/2015 05:08 PM, Mark Lawrence wrote:
Slight aside, why a BOM, all I ever think of is Inspector Clouseau? :)
As I recall, it stands for "Byte Order Mark". Applicable only to
multi-byte storage formats (eg. UTF-16), it lets the reader decide
which of the formats were used.
For exa
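
A rough illustration of why a multi-byte format needs the mark, using a single letter. It is only a sketch of the idea, not a reconstruction of the example that was cut off above.

```python
import codecs

ch = "A"  # U+0041

le = ch.encode("utf-16-le")  # low byte first:  b'A\x00'
be = ch.encode("utf-16-be")  # high byte first: b'\x00A'
print(le, be)

# With the mark in front, the generic "utf-16" decoder picks the right order.
from_le = (codecs.BOM_UTF16_LE + le).decode("utf-16")
from_be = (codecs.BOM_UTF16_BE + be).decode("utf-16")
print(from_le == from_be == ch)

# Without the mark, guessing the wrong order gives a different character.
wrong = le.decode("utf-16-be")
print(ascii(wrong))  # '\u4100'
```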
On 04/23/2015 02:14 PM, Jim Mooney wrote:
By relying on the default when you read it, you're making an unspoken
assumption about the encoding of the file.
--
DaveA
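
To make the unspoken assumption visible, the sketch below prints the default that open() would use and then reads the same file with two explicit encodings. The file contents are made up for illustration and are written to a temporary directory.

```python
import locale
import os
import tempfile

# The default that open() falls back to when no encoding is given.
print(locale.getpreferredencoding(False))

# Made-up sample data, written as UTF-8.
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("Zoë|Dorsey\n")

# Spelling the encoding out removes the guess.
with open(path, encoding="utf-8") as f:
    explicit = f.read()

# The same bytes read under a wrong assumption come back garbled.
with open(path, encoding="latin-1") as f:
    wrong = f.read()

print(explicit, end="")
print(ascii(wrong))
```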
So is there any way to sniff the encoding, including the BOM (which appears
to be used or not used randomly for utf-8), so you c
On 23/04/15 19:14, Jim Mooney wrote:
By relying on the default when you read it, you're making an unspoken
assumption about the encoding of the file.
So is there any way to sniff the encoding, including the BOM (which appears
to be used or not used randomly for utf-8), so you can then use th
On 23/04/2015 19:14, Jim Mooney wrote:
By relying on the default when you read it, you're making an unspoken
assumption about the encoding of the file.
--
DaveA
So is there any way to sniff the encoding, including the BOM (which appears
to be used or not used randomly for utf-8), so you can
>
> By relying on the default when you read it, you're making an unspoken
> assumption about the encoding of the file.
>
> --
> DaveA
So is there any way to sniff the encoding, including the BOM (which appears
to be used or not used randomly for utf-8), so you can then use the proper
encoding, or
On 04/23/2015 06:37 AM, Jim Mooney wrote:
..
Ï»¿
is the UTF-8 BOM (byte order mark) interpreted as Latin 1.
If the input is UTF-8 you can get rid of the BOM with
with open("data.txt", encoding="utf-8-sig") as csvfile:
Peter Otten
I caught the bad arithmetic on name length, but where is t
Jim Mooney wrote:
> ..
>
>> Ï»¿
>>
>> is the UTF-8 BOM (byte order mark) interpreted as Latin 1.
>>
>> If the input is UTF-8 you can get rid of the BOM with
>>
>> with open("data.txt", encoding="utf-8-sig") as csvfile:
>>
>
> Peter Otten
>
> I caught the bad arithmetic on name length, but where
..
> Ï»¿
>
> is the UTF-8 BOM (byte order mark) interpreted as Latin 1.
>
> If the input is UTF-8 you can get rid of the BOM with
>
> with open("data.txt", encoding="utf-8-sig") as csvfile:
>
Peter Otten
I caught the bad arithmetic on name length, but where is the byte order
mark coming from? My
Jim Mooney wrote:
> I'm trying the csv module. It all went well until I tried shortening a
> long first name I put in just to exercise things. It didn't shorten.
> Original file lines:
> Stewartrewqrhjeiwqhreqwhreowpqhrueqwphruepqhruepqwhruepwhqupr|Dorsey|
nec.malesu...@quisqueporttitoreros.co
I'm trying the csv module. It all went well until I tried shortening a long
first name I put in just to exercise things. It didn't shorten. And I also
got weird first characters on the header line. What went wrong?
import csv
allcsv = []
with open('data.txt') as csvfile:
readCSV = csv.reader(c
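
The code above is cut off, so what follows is not a completion of it, only a hedged sketch of one way to read such a file. It assumes a '|' delimiter (taken from the sample line), a header of "First Name|Last Name", and a made-up length limit MAX_NAME; the helper name read_rows is hypothetical. In-memory bytes stand in for data.txt.

```python
import codecs
import csv
import io

MAX_NAME = 20  # hypothetical limit, not from the original post

# Stand-in for data.txt: a UTF-8 signature followed by pipe-delimited rows.
raw = codecs.BOM_UTF8 + (
    "First Name|Last Name\n"
    "Stewartrewqrhjeiwqhreqwhreowpqhrueqwphruepqhruepqwhruepwhqupr|Dorsey\n"
).encode("utf-8")


def read_rows(raw_bytes, max_name=MAX_NAME):
    """Decode with utf-8-sig, split on '|', and shorten the first field."""
    text = io.TextIOWrapper(
        io.BytesIO(raw_bytes), encoding="utf-8-sig", newline=""
    )
    rows = []
    for row in csv.reader(text, delimiter="|"):
        if row:
            row[0] = row[0][:max_name]  # slicing makes a new, shorter string
        rows.append(row)
    return rows


rows = read_rows(raw)
for row in rows:
    print(row)
```

With a real file, open('data.txt', encoding='utf-8-sig', newline='') would take the place of the TextIOWrapper.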