Re: [Tutor] name shortening in a csv module output

2015-04-23 Thread Steven D'Aprano
On Fri, Apr 24, 2015 at 12:33:57AM +0100, Alan Gauld wrote: > On 24/04/15 00:15, Jim Mooney wrote: > >Pretty much guesswork. > >Alan Gauld > >-- > >This all sounds suspiciously like the old browser wars > > It's more about history. Early text encodings all worked in a single byte > which is > lim

Re: [Tutor] name shortening in a csv module output

2015-04-23 Thread Steven D'Aprano
On Thu, Apr 23, 2015 at 05:40:34PM -0400, Dave Angel wrote: > On 04/23/2015 05:08 PM, Mark Lawrence wrote: > > > >Slight aside, why a BOM, all I ever think of is Inspector Clouseau? :) > > > > As I recall, it stands for "Byte Order Mark". Applicable only to > multi-byte storage formats (e.g. UTF

Re: [Tutor] name shortening in a csv module output

2015-04-23 Thread Steven D'Aprano
On Thu, Apr 23, 2015 at 10:08:05PM +0100, Mark Lawrence wrote: > Slight aside, why a BOM, all I ever think of is Inspector Clouseau? :) :-) I'm not sure if you mean that as a serious question or not. BOM stands for Byte Order Mark, and it is needed for UTF-16 and UTF-32 encodings because the
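A small sketch of why byte order matters for UTF-16, using only the standard library. The helper name utf16_variants is made up for illustration and is not from the thread; it just shows the same character stored in both byte orders and how a leading BOM lets the decoder pick the right one.

```python
import codecs


def utf16_variants(text):
    """Return the big-endian and little-endian UTF-16 bytes for text.

    Hypothetical helper, for illustration only.
    """
    return {
        "be": text.encode("utf-16-be"),  # most significant byte first
        "le": text.encode("utf-16-le"),  # least significant byte first
    }


variants = utf16_variants("A")
print(variants["be"])  # b'\x00A'
print(variants["le"])  # b'A\x00'

# Without extra information a reader cannot tell the two apart.
# Prefixing a BOM removes the ambiguity: the generic "utf-16" codec
# reads the mark, picks the matching byte order, and drops the mark.
from_be = (codecs.BOM_UTF16_BE + variants["be"]).decode("utf-16")
from_le = (codecs.BOM_UTF16_LE + variants["le"]).decode("utf-16")
print(from_be == from_le == "A")
```

UTF-8 has no byte-order ambiguity, so a BOM there only acts as a signature saying "this is UTF-8".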

Re: [Tutor] name shortening in a csv module output

2015-04-23 Thread Steven D'Aprano
On Wed, Apr 22, 2015 at 10:18:31PM -0700, Jim Mooney wrote: > My result: > > Ï»¿First NameLast Name # odd characters on header line Any time you see "odd characters" in text like that, you should immediately think "encoding problem". These odd characters are normally called m
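To make the "encoding problem" diagnosis concrete, here is a minimal sketch that reproduces this kind of header mojibake. It assumes the file was saved as UTF-8 with a BOM and then read with a single-byte encoding; Latin-1 is used here as one such encoding. The function name show_mojibake is invented for the example.

```python
import codecs


def show_mojibake(header="First Name"):
    """Encode header as UTF-8 with a BOM, then decode it the wrong way.

    Hypothetical helper: Latin-1 stands in for whatever single-byte
    default encoding the reading program happened to use.
    """
    raw = codecs.BOM_UTF8 + header.encode("utf-8")
    return raw.decode("latin-1")


garbled = show_mojibake()
# The three BOM bytes EF BB BF each turn into a separate character
# (U+00EF, U+00BB, U+00BF) in front of the real header text.
print(ascii(garbled))

# Decoding the same bytes with "utf-8-sig" strips the BOM instead.
clean = (codecs.BOM_UTF8 + b"First Name").decode("utf-8-sig")
print(clean)
```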

Re: [Tutor] name shortening in a csv module output

2015-04-23 Thread Alan Gauld
On 24/04/15 00:15, Jim Mooney wrote: Pretty much guesswork. Alan Gauld -- This all sounds suspiciously like the old browser wars It's more about history. Early text encodings all worked in a single byte which is limited to 256 patterns. That's simply not enough to cover all the alphabets aroun

Re: [Tutor] name shortening in a csv module output

2015-04-23 Thread Dave Angel
On 04/23/2015 05:08 PM, Mark Lawrence wrote: Slight aside, why a BOM, all I ever think of is Inspector Clouseau? :) As I recall, it stands for "Byte Order Mark". Applicable only to multi-byte storage formats (e.g. UTF-16), it lets the reader decide which of the formats was used. For exa

Re: [Tutor] name shortening in a csv module output

2015-04-23 Thread Dave Angel
On 04/23/2015 02:14 PM, Jim Mooney wrote: By relying on the default when you read it, you're making an unspoken assumption about the encoding of the file. -- DaveA So is there any way to sniff the encoding, including the BOM (which appears to be used or not used randomly for utf-8), so you c

Re: [Tutor] name shortening in a csv module output

2015-04-23 Thread Alan Gauld
On 23/04/15 19:14, Jim Mooney wrote: By relying on the default when you read it, you're making an unspoken assumption about the encoding of the file. So is there any way to sniff the encoding, including the BOM (which appears to be used or not used randomly for utf-8), so you can then use th
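One limited form of "sniffing" is to look for a BOM at the start of the raw bytes. The sketch below is only that: it is not a general encoding detector, and the name sniff_bom plus the fallback default of "utf-8" are assumptions made for this example. A file without a BOM could still be in almost any encoding.

```python
import codecs

# Longest marks first: the UTF-32 little-endian BOM begins with the
# same two bytes as the UTF-16 little-endian BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(raw, default="utf-8"):
    """Return a codec name based on a leading BOM, else the default.

    Hypothetical helper. The returned codecs all consume the BOM
    themselves when decoding.
    """
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default


sample = codecs.BOM_UTF8 + "First Name|Last Name".encode("utf-8")
encoding = sniff_bom(sample)
print(encoding)
print(sample.decode(encoding))
```

In practice you would read the first few bytes of the file in binary mode, pass them to a function like this, and then reopen the file in text mode with the encoding it returns.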

Re: [Tutor] name shortening in a csv module output

2015-04-23 Thread Mark Lawrence
On 23/04/2015 19:14, Jim Mooney wrote: By relying on the default when you read it, you're making an unspoken assumption about the encoding of the file. -- DaveA So is there any way to sniff the encoding, including the BOM (which appears to be used or not used randomly for utf-8), so you can

Re: [Tutor] name shortening in a csv module output

2015-04-23 Thread Jim Mooney
> > By relying on the default when you read it, you're making an unspoken > assumption about the encoding of the file. > > -- > DaveA So is there any way to sniff the encoding, including the BOM (which appears to be used or not used randomly for utf-8), so you can then use the proper encoding, or

Re: [Tutor] name shortening in a csv module output

2015-04-23 Thread Dave Angel
On 04/23/2015 06:37 AM, Jim Mooney wrote: .. Ï»¿ is the UTF-8 BOM (byte order mark) interpreted as Latin 1. If the input is UTF-8 you can get rid of the BOM with with open("data.txt", encoding="utf-8-sig") as csvfile: Peter Otten I caught the bad arithmetic on name length, but where is t

Re: [Tutor] name shortening in a csv module output

2015-04-23 Thread Peter Otten
Jim Mooney wrote: > .. > >> Ï»¿ >> >> is the UTF-8 BOM (byte order mark) interpreted as Latin 1. >> >> If the input is UTF-8 you can get rid of the BOM with >> >> with open("data.txt", encoding="utf-8-sig") as csvfile: >> > > Peter Otten > > I caught the bad arithmetic on name length, but where
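A runnable sketch of the encoding="utf-8-sig" suggestion combined with the csv module. The sample rows, the temporary file, the helper name read_rows and the "|" delimiter are assumptions for illustration (the delimiter is guessed from the pipe-separated sample line quoted elsewhere in the thread); only the open(..., encoding="utf-8-sig") call comes from the advice above.

```python
import csv
import os
import tempfile


def read_rows(path, encoding="utf-8-sig", delimiter="|"):
    """Read a delimited text file into a list of rows.

    "utf-8-sig" decodes UTF-8 and silently drops a leading BOM if one
    is present, so it also works for files saved without a BOM.
    """
    with open(path, encoding=encoding, newline="") as csvfile:
        return list(csv.reader(csvfile, delimiter=delimiter))


# Build a small file that starts with the UTF-8 BOM (EF BB BF).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    with open(path, "wb") as f:
        f.write(b"\xef\xbb\xbfFirst Name|Last Name\nAda|Lovelace\n")
    rows = read_rows(path)                         # BOM stripped
    rows_plain = read_rows(path, encoding="utf-8")  # BOM kept as U+FEFF

print(rows[0])
print(ascii(rows_plain[0][0]))
```

With plain "utf-8" the BOM survives as an invisible U+FEFF character stuck to the first header field, which is why the first column name can look right and still fail comparisons.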

Re: [Tutor] name shortening in a csv module output

2015-04-23 Thread Jim Mooney
.. > Ï»¿ > > is the UTF-8 BOM (byte order mark) interpreted as Latin 1. > > If the input is UTF-8 you can get rid of the BOM with > > with open("data.txt", encoding="utf-8-sig") as csvfile: > Peter Otten I caught the bad arithmetic on name length, but where is the byte order mark coming from? My

Re: [Tutor] name shortening in a csv module output

2015-04-23 Thread Peter Otten
Jim Mooney wrote: > I'm trying the csv module. It all went well until I tried shortening a > long first name I put in just to exercise things. It didn't shorten. > Original file lines: > Stewartrewqrhjeiwqhreqwhreowpqhrueqwphruepqhruepqwhruepwhqupr|Dorsey| nec.malesu...@quisqueporttitoreros.co

[Tutor] name shortening in a csv module output

2015-04-23 Thread Jim Mooney
I'm trying the csv module. It all went well until I tried shortening a long first name I put in just to exercise things. It didn't shorten. And I also got weird first characters on the header line. What went wrong? import csv allcsv = [] with open('data.txt') as csvfile: readCSV = csv.reader(c