Re: [R] strange behavior when reading csv - line wraps

Martin Tomko Sun, 31 May 2009 03:13:49 -0700

Dear Jim,

with the help of Ted, we diagnosed that the cause is the extreme variability in line length during reading in. As the number of table columns is apparently determined from the first five lines, whatever exceeds this length is automatically wrapped onto the next line.

I am now trying to find a way to read in the data despite this. I have no control over the table extent; the only thing that would make sense according to my data would be to read in a fixed number of columns and merge all remaining columns as a long string in the last one. No idea how to do this, though.
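
One possible way to get a fixed number of leading columns plus a merged remainder is to read the raw lines and split them by hand instead of letting read.csv guess the table width. The following is only a sketch under assumptions: the helper name merge_tail_fields and the column name "rest" are made up for illustration, and the demo lines are synthetic rather than taken from the real files.

```r
# Hypothetical helper (not from the thread): keep the first n_fixed fields
# of each line and paste all remaining fields into one string column.
merge_tail_fields <- function(lines, n_fixed = 7, sep = ";") {
  lines <- lines[nzchar(lines)]                 # ignore empty lines
  parts <- strsplit(lines, sep, fixed = TRUE)   # a trailing separator adds no empty field
  rows <- lapply(parts, function(p) {
    fixed_part <- p[seq_len(n_fixed)]           # NA where a line is too short
    rest <- if (length(p) > n_fixed) {
      paste(p[-seq_len(n_fixed)], collapse = sep)
    } else {
      ""
    }
    c(fixed_part, rest)
  })
  out <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)
  names(out) <- c(paste0("V", seq_len(n_fixed)), "rest")
  out
}

# Synthetic demo input: 7 leading fields followed by a variable number of tags
demo_lines <- c("1;2;3;4;5;6;7;a;b;c;", "8;9;10;11;12;13;14;x;")
res <- merge_tail_fields(demo_lines)
print(res)
```

For a real file this would be called as merge_tail_fields(readLines(inputFile)). All columns come back as character, so the numeric ones would still need as.numeric() or similar afterwards.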


Thanks
Martin


jim holtman wrote:

It is still not clear to me exactly how you want to read the lines in. If the lines have a variable number of fields, and some of the lines might be wrapped, is there some way to determine where the start of each line is?

If you are reading them in with read.csv, then the system is assuming that each line starts a new row. If this is not the case, then you will have to state the rules that determine where the lines start. You can always read the data in with 'scan' to separate each line and then do whatever processing is required to put together the rows in a data frame that you want.

In one of your examples, you indicated that the line was split starting at the word "kempten"; if this is in the middle of the line, then you would have to create the break after reading the line in with 'scan' and then create the rows in the data frame. All of this can be done in R if you can state what the criteria are.

On Sat, May 30, 2009 at 4:32 AM, Martin Tomko <martin.to...@geo.uzh.ch> wrote:
    Jim,
    the two lines I put in are the actual problematic input lines.
    In these examples, there are no quotes nor # signs, although I
    have no means to make sure they do not occur in the inputs (any
    hints how I could deal with that?).
    I am trying to avoid as much pre-processing outside R as possible,
    and I have to process about 500 files with up to 3000 records
    each, so I need a more or less automated/batch solution, and any
    string substitution will have to occur in R. But for the moment, I
    do not see a reason for substitution, and the wrapping still occurs.

    Cheers
    Martin



    jim holtman wrote:

        You need to supply the actual input line so we can see what is
        happening.  Are you sure you do not have unbalanced quotes in
        your input (try quote='') or do you have comment characters
        ("#") in your input?
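
A small sketch of what the quote and comment settings do, using synthetic lines (not the real data). Note that read.csv already defaults to comment.char = "" (it is read.table that defaults to "#"), so passing it explicitly here is only for clarity; quote = "" is the setting that changes read.csv's behavior.

```r
# Synthetic lines containing an unbalanced double quote, a '#' and an apostrophe
lines <- c('1;5" tube;a', "2;#hashtag;b", "3;it's;c")

# quote = ""        : treat quote characters as ordinary text
# comment.char = "" : treat '#' as ordinary text (already the read.csv default)
d <- read.csv(text = lines, header = FALSE, sep = ";",
              quote = "", comment.char = "", stringsAsFactors = FALSE)
print(d)
```

With these settings every input line becomes exactly one row, whatever quote or '#' characters it contains.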

        On Fri, May 29, 2009 at 3:15 PM, Martin Tomko
        <martin.to...@geo.uzh.ch> wrote:

           Dear All,
           I am observing a strange behavior, and searching the archives
           and help pages didn't help much.
           I have a csv with a variable number of fields in each line.

           I use
           dataPoints <- read.csv(inputFile, header=FALSE, sep=";", fill=TRUE)

           to read it in, and it works. But some lines are long and 'wrap',
           or split and continue on the next line. So when I check the dim
           of the frame, it is not correct, and I can see when I do a
           printout that the line is split into two in the frame. I checked
           the input file and all is good.

           an example of the input is:
37;2175168475;13;8.522729;47.19537;16366...@n00;30;sculpture;bird;tourism;animal;statue;canon;eos;rebel;schweiz;switzerland;eagle;swiss;adler;skulptur;zug;1750;28;tamron;f28;canton;tourismus;vogel;baar;kanton;xti;tamron1750;1750mm;tamron1750mm;400d;rabbitriotnet;
           where the last value occurs on the next line in the data frame.

           It does not have to be the last value; in the following example,
           the word "kempten" starts the next line:
39;167757703;12;10.309295;47.724545;21903...@n00;36;white;building;tower;clock;clouds;germany;bayern;deutschland;bavaria;europa;europe;eagle;adler;eu;wolke;dome;townhall;rathaus;turm;weiss;allemagne;europeanunion;bundesrepublik;gebaeude;glocke;brd;allgau;kuppel;europ;kempten;niemcy;europo;federalrepublic;europaischeunion;europaeischeunion;germanio;
           What could be the reason?
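
The behavior described above can be reproduced with a few synthetic lines. The read.table documentation states that the number of columns is determined from the first five lines of input, or from the length of col.names if that is specified and longer. The sketch below shows the wrap and one common workaround: count the fields per line with count.fields and pass enough column names. The data and the names wrapped, widened and n_cols are illustrative only.

```r
# Five short lines followed by one long line (synthetic data)
lines <- c("1;a;b", "2;a;b", "3;a;b", "4;a;b", "5;a;b", "6;a;b;c;d;e")

# Width is guessed from the first five lines (3 columns), so the sixth
# line is spread over more than one row
wrapped <- read.csv(text = lines, header = FALSE, sep = ";", fill = TRUE,
                    stringsAsFactors = FALSE)

# Workaround: find the widest line first, then supply that many column names
con <- textConnection(lines)
n_cols <- max(count.fields(con, sep = ";", quote = "", comment.char = ""))
close(con)

widened <- read.csv(text = lines, header = FALSE, sep = ";", fill = TRUE,
                    col.names = paste0("V", seq_len(n_cols)),
                    stringsAsFactors = FALSE)

print(dim(wrapped))
print(dim(widened))
```

For a file on disk, count.fields(inputFile, sep = ";", quote = "", comment.char = "") would take the place of the text connection.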

           I was thinking about solving the issue by using a different
           separator for the first 7 fields and concatenating all of the
           remaining values into a single string value, but could not
           figure out how to do such a substitution in R. Unfortunately,
           on my system I cannot specify a range for sed...
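
The substitution idea can be sketched in R itself with sub() and gsub(), so no sed is needed. This is an illustration under assumptions: the function name collapse_tail and the choice of a space as the replacement separator are made up here, sep is assumed to be a plain character with no special meaning in a regular expression, and the demo lines are synthetic.

```r
# Hypothetical helper: keep the first n_fixed separators, replace all later
# ones with tail_sep so the remainder reads as a single field.
collapse_tail <- function(lines, n_fixed = 7, sep = ";", tail_sep = " ") {
  # group 1: the first n_fixed "field + separator" pairs; group 2: the rest
  pattern <- sprintf("^((?:[^%s]*%s){%d})(.*)$", sep, sep, n_fixed)
  ok <- grepl(pattern, lines, perl = TRUE)     # shorter lines are left unchanged
  head_part <- sub(pattern, "\\1", lines[ok], perl = TRUE)
  tail_part <- sub(pattern, "\\2", lines[ok], perl = TRUE)
  tail_part <- sub(paste0(sep, "$"), "", tail_part)   # drop a trailing separator
  out <- lines
  out[ok] <- paste0(head_part, gsub(sep, tail_sep, tail_part, fixed = TRUE))
  out
}

demo_lines <- c("1;2;3;4;5;6;7;a;b;c;", "8;9;10;11;12;13;14;x;")
collapsed <- collapse_tail(demo_lines)

# Every line now has exactly 8 fields, so read.csv has nothing to wrap
d <- read.csv(text = collapsed, header = FALSE, sep = ";", quote = "",
              stringsAsFactors = FALSE)
print(d)
```

Unlike splitting the lines by hand, this keeps read.csv's type conversion for the leading columns; the price is that the original separator inside the merged field is lost.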

           Thanks for any help/pointers
           Martin

           ______________________________________________
           R-help@r-project.org mailing list
           https://stat.ethz.ch/mailman/listinfo/r-help
           PLEASE do read the posting guide
           http://www.R-project.org/posting-guide.html
           and provide commented, minimal, self-contained,
           reproducible code.

        --
        Jim Holtman
        Cincinnati, OH
        +1 513 646 9390

        What is the problem that you are trying to solve?

--
Martin Tomko

Postdoctoral Research Assistant
Geographic Information Systems Division

Department of Geography
University of Zurich - Irchel
Winterthurerstr. 190
CH-8057 Zurich, Switzerland

email:  martin.to...@geo.uzh.ch
site:   http://www.geo.uzh.ch/~mtomko
mob:    +41-788 629 558
tel:    +41-44-6355256
fax:    +41-44-6356848

