[Tutor] Reading CSV files in Pandas

2013-10-19 Thread Manish Tripathi
I am trying to import a CSV file with pandas, but it throws an error. When
opened in Notepad++, the data looks as follows, with the first row holding
the column names:

"End Customer Organization ID,End Customer Organization Name,End
Customer Top Parent Organization ID,End Customer Top Parent
Organization Name,Reseller Top Parent ID,Reseller Top Parent
Name,Business,Rev Sum Division,Rev Sum Category,Product
Family,Version,Pricing Level,Summary Pricing Level,Detail Pricing
Level,MS Sales Amount,MS Sales Licenses,Fiscal Year,Sales
Date""11027676,Baroda Western Uttar Pradesh Gramin
Bankgfhgfnjgfnmjmhgmghmghmghmnghnmghnmhgnmghnghngh,4078446,Bank Of
Barodadfhhgfjyjtkyukujkyujkuhykluiluilui;iooi';po'fserwefvegwegf,1809012,""Hcl
Infosystems Ltd - Partnerdghftrutyhb
frhywer5y5tyu6ui7iukluyj,lgjmfgnhfrgweffw"",Server &
CALsdgrgrfgtrhytrnhjdgthjtyjkukmhjmghmbhmgfngdfbndfhtgh,SQL Server &
CALdfhtrhtrgbhrghrye5y45y45yu56juhydsgfaefwe,SQL
CALdhdfthtrutrjurhjethfdehrerfgwerweqeadfawrqwerwegtrhyjuytjhyj,SQL
CALdtrye45y3t434tjkabcjkasdhfhasdjkcbaksmjcbfuigkjasbcjkasbkdfhiwh,2005,Openfkvgjesropiguwe90fujklascnioawfy98eyfuiasdbcvjkxsbhg,Open
Lklbjdfoigueroigbjvwioergyuiowerhgosdhvgfoisdhyguiserhguisrh,""Open
Stddfm,vdnoghioerivnsdflierohgushdfovhsiodghuiohdbvgsjdhgouiwerho"",125.85,1,FY07,12/28/2006""12835756,Uttam
Strips Pvt Ltd,12835756,Uttam Strips Pvt Ltd,12565538,Redington C/O
Fortis Financial Services Ltd,MBS,Dynamics ERP,Dynamics NAV,Dynamics
NAV Business Essentials,Non-specific,Other,MBS SA,MBS New Customer
Enhanc. Def,0,0,FY09,9/15/2008""12233135,Bhagwan Singh
Tondon,12233135,Bhagwan Singh Tondon,2652941,H B S Systems Pvt
Ltd,Server & CAL,SQL Server & CAL,SQL CAL,SQL
CAL,Non-specific,Open,Open L&SA,Deferred Open L&SA -
New,0,0,FY09,9/15/2008""11602305,Maya Academy Of Advanced
Cinematics,9750934,Maya Entertainment Ltd,336146,Embee Software Pvt
Ltd,Server & CAL,Windows Server & CAL,Windows Server HPC,Windows
Compute Cluster Server,Non-specific,Open,Open V/MYO - Rec,OLV Perpet
L&SA Recur-Def,0,0,FY09,9/25/2008""13336009,Remiel Softech Solution
Pvt Ltd,13336009,Remiel Softech Solution Pvt Ltd,13335482,Redington
C/O Remiel Softech Solutions Pvt Ltd,MBS,Dynamics ERP,Dynamics
NAV,Dynamics NAV Business Essentials,Non-specific,Other,MBS SA,MBS New
Customer Enhanc. Def,0,0,FY09,12/23/2008""7872800,Science Application
International Corporation,2839760,GOVERNMENT OF
KARNATAKA,10237455,Cubic Computing P.L,Server & CAL,SQL Server &
CAL,SQL Server Standard,SQL Server Standard
Edition,Non-specific,Open,Open SA/UA,Deferred Open SA -
Renewal,0,0,FY09,1/15/2009""13096361,Pratham Software Pvt
Ltd,13096361,Pratham Software Pvt Ltd,10133086,Krap
Computer,Information Worker,Office,Office Standard / Basic,Office
Standard,2007,Open,Open L,Open
Std,7132.44,28,FY09,9/24/2008""12192276,Texmo Precision
Castings,12192276,Texmo Precision Castings,4059430,Quadra Systems. -
Partner,Server & CAL,Windows Server & CAL,Windows Standard
Server,Windows Server Standard,Non-specific,Open,Open L&SA,Deferred
Open L&SA - New,0,0,FY09,11/15/2008"

*Kindly note that when the same .csv file is double-clicked, it opens in
Excel as comma-separated values BUT with NO quotation marks around each
line, unlike what Notepad++ shows.*

I first tried encoding='utf-8', which gives the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x91 in position
13: invalid start byte

I then tried encoding='cp1252', and after that encoding='latin1':

import pandas as pd

df = pd.read_csv(filename, encoding='cp1252')
or
df = pd.read_csv(filename, encoding='latin1')

Neither encoding raised an error, but the data was imported as one single
column instead of separate columns.
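
For reference, the error message itself hints at the encoding: byte 0x91
cannot start a UTF-8 sequence, while in cp1252 it is the left single
quotation mark. A minimal sketch of that behaviour follows. The byte string
is made up for illustration and is not taken from the real file.

```python
# Hypothetical bytes: 0x91 and 0x92 are the "curly" single quotes in
# Windows-1252 (cp1252). They are not valid UTF-8 start bytes.
raw = b"Customer \x91A\x92"

try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    # Same class of error as the one reported above.
    utf8_ok = False

# cp1252 maps 0x91/0x92 to U+2018/U+2019 (left/right single quotation mark).
as_cp1252 = raw.decode("cp1252")

# latin1 never fails, but maps 0x91/0x92 to the C1 control characters
# U+0091/U+0092, so the text decodes without error yet is not what was meant.
as_latin1 = raw.decode("latin1")

print(utf8_ok)
print(repr(as_cp1252))
print(repr(as_latin1))
```

This suggests, without proving it, that the file is cp1252 rather than
UTF-8, and that the single-column result is a separate problem from the
encoding.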

Does it have to do with the "" marks present before each line in the data?
I had a similar CSV file with comma-separated values that didn't have
double quotation marks on each line, and it was imported correctly with both
cp1252 and latin1, but not with UTF-8, even though the file was saved in
UTF-8 format in Notepad++. In this case UTF-8 fails as before, and the
other two import the data as a single column.

Please advise.

Thanks
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor


Re: [Tutor] Reading CSV files in Pandas

2013-10-20 Thread Manish Tripathi
Thanks, Mark. I have already asked this question on Stack Overflow, but to
no avail, so I thought of asking here.


On Sun, Oct 20, 2013 at 5:47 AM, Mark Lawrence wrote:

> On 19/10/2013 15:29, Manish Tripathi wrote:
>
> You are far more likely to get a response to the identical question that
> you've already asked on stackoverflow than you are here.
>
>
> --
> Roses are red,
> Violets are blue,
> Most poems rhyme,
> But this one doesn't.
>
> Mark Lawrence
>


Re: [Tutor] Reading CSV files in Pandas

2013-10-21 Thread Manish Tripathi
It's pipeline data, so it must have been generated through Siebel and sent
as an Excel CSV.


On Mon, Oct 21, 2013 at 11:32 PM, Danny Yoo wrote:

> >
> > * Where is this data coming from?
> > * Who or what is generating this file?
>
>
> Just to be more specific about this: I have a very strong suspicion that
> whatever is generating the input that you're trying to read is doing
> something ad hoc with regard to the CSV file format.  Knowing what generated
> the file, whether it be Excel or some custom script, is very helpful in
> diagnosing where the problem originates.
>
>
> Your suspicion about the quotes around entire rows:
>
> > Does it have to do with the "" marks present before each line in the
> > data?
>
> sounds reasonable.  I expect quotes around individual fields, but not
> around entire rows.  Such a feature sounds anomalous because it doesn't fit
> the description of known CSV formats:
>
> http://en.wikipedia.org/wiki/Comma-separated_values
>
>
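
One way to see why quotes around entire rows produce a single column, and
one possible workaround, is to parse the text twice. This is only a sketch
under assumptions: the sample below is a shortened, made-up stand-in for the
real file, and it assumes every line is wrapped in one pair of quotes with
inner quotes doubled, as the Notepad++ view in the original post suggests.
It uses the standard-library csv module so that it runs without the file.

```python
import csv
import io

# Hypothetical, shortened sample: each physical line is wrapped in quotes,
# and a field containing a comma is wrapped in doubled quotes inside that.
sample = (
    '"ID,Name,Reseller,Amount"\n'
    '"11027676,Baroda Bank,""Hcl Infosystems Ltd - Partner, North"",125.85"\n'
    '"12835756,Uttam Strips Pvt Ltd,Redington,0"\n'
)

# First pass: the outer quotes make each whole line a single quoted field,
# so every row comes back with exactly one column.
outer = list(csv.reader(io.StringIO(sample)))
print([len(row) for row in outer])

# Second pass: treat the text of that single field as a CSV row of its own.
rows = [next(csv.reader([row[0]])) for row in outer if row]

header, records = rows[0], rows[1:]
print(header)
for record in records:
    print(record)
```

If this matches the real file, the parsed rows could then be handed to
pandas, for example with pd.DataFrame(records, columns=header). That step
is left out above and has not been tried against the actual data.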