Gabor:
I had success with the following.

1. I created a CSV file called "out.txt" with a Perl script, then ran the following successfully:

library("sqldf")
test_df <- read.csv.sql(file = "out.txt", sql = "select * from file",
    header = TRUE, sep = ",", dbname = tempfile())
2. I did not have success with the following; could you tell me what I may be doing wrong? I can paste the Perl script if necessary. The Perl script reads the file, builds each CSV record, prints the records one by one, and then exits. Thanks.

What I tried:

test_df <- read.csv2.sql(file = "3wkoutstatfcst_small.dat",
    sql = "select * from file", header = TRUE, sep = ",",
    filter = "perl parse_3wkout.pl", dbname = tempfile())

Error message:

Error in readRegistry(key, maxdepth = 3) :
  Registry key 'SOFTWARE\R-core' not found
In addition: Warning messages:
1: closing unused connection 14 (3wkoutstatfcst_small.dat)
2: closing unused connection 13 (3wkoutstatfcst_small.dat)
3: closing unused connection 11 (3wkoutstatfcst_small.dat)
4: closing unused connection 9 (3wkoutstatfcst_small.dat)
5: closing unused connection 3 (3wkoutstatfcst_small.dat)

-----Original Message-----
From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
Sent: Saturday, February 06, 2010 12:14 PM
To: Vadlamani, Satish {FLNA}
Cc: r-help@r-project.org
Subject: Re: [R] Reading large files

No.

On Sat, Feb 6, 2010 at 1:01 PM, Vadlamani, Satish {FLNA}
<satish.vadlam...@fritolay.com> wrote:
> Gabor:
> Can I pass colClasses as a vector to read.csv.sql? Thanks.
> Satish
>
> -----Original Message-----
> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
> Sent: Saturday, February 06, 2010 9:41 AM
> To: Vadlamani, Satish {FLNA}
> Cc: r-help@r-project.org
> Subject: Re: [R] Reading large files
>
> It's just any Windows batch command string that filters stdin to
> stdout. What the command consists of should not be important. An
> invocation of Perl that runs a Perl script filtering stdin to
> stdout might look like this:
>
>   read.csv.sql("myfile.dat", filter = "perl myprog.pl")
>
> For an actual example, see the source of read.csv2.sql, which defaults
> to using a Windows VBScript program as a filter.
>
> On Sat, Feb 6, 2010 at 10:16 AM, Vadlamani, Satish {FLNA}
> <satish.vadlam...@fritolay.com> wrote:
>> Jim, Gabor:
>> Thanks so much for the suggestions that I can use read.csv.sql and embed
>> Perl (or gawk). I just want to mention that I am running on Windows. I am
>> going to read the documentation for the filter argument and see whether
>> it can take a decent-sized Perl script and then use its output as input.
>>
>> Suppose that I write a Perl script that parses this fixed-width file and
>> creates a CSV file. Can I embed this within the read.csv.sql call, or can
>> the filter only be a single statement? If you know the answer, please let
>> me know; otherwise, I will try a few things and report back the results.
>>
>> Thanks again.
>> Satish
>>
>>
>> -----Original Message-----
>> From: jim holtman [mailto:jholt...@gmail.com]
>> Sent: Saturday, February 06, 2010 6:16 AM
>> To: Gabor Grothendieck
>> Cc: Vadlamani, Satish {FLNA}; r-help@r-project.org
>> Subject: Re: [R] Reading large files
>>
>> In Perl, the 'unpack' function makes it very easy to parse fixed-field
>> data.
>>
>> On Fri, Feb 5, 2010 at 9:09 PM, Gabor Grothendieck
>> <ggrothendi...@gmail.com> wrote:
>>> Note that the filter= argument of read.csv.sql can be used to pass the
>>> input through a filter written in Perl, [g]awk, or another language.
>>> For example: read.csv.sql(..., filter = "gawk -f myfilter.awk")
>>>
>>> gawk has the FIELDWIDTHS variable for automatically parsing fixed-width
>>> fields, e.g.
>>> http://www.delorie.com/gnu/docs/gawk/gawk_44.html
>>> which makes this very easy, but Perl or whatever you are most used to
>>> would be fine too.
>>>
>>> On Fri, Feb 5, 2010 at 8:50 PM, Vadlamani, Satish {FLNA}
>>> <satish.vadlam...@fritolay.com> wrote:
>>>> Hi Gabor:
>>>> Thanks. My files are all in fixed-width format, and there are a lot of
>>>> them. It would take me some effort to convert them to CSV; I guess this
>>>> cannot be avoided? I can write some Perl scripts to convert fixed-width
>>>> format to CSV format and then start with your suggestion. Could you let
>>>> me know your thoughts on this approach?
>>>> Satish
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Gabor Grothendieck [mailto:ggrothendi...@gmail.com]
>>>> Sent: Friday, February 05, 2010 5:16 PM
>>>> To: Vadlamani, Satish {FLNA}
>>>> Cc: r-help@r-project.org
>>>> Subject: Re: [R] Reading large files
>>>>
>>>> If your problem is just how long it takes to load the file into R, try
>>>> read.csv.sql in the sqldf package. A single read.csv.sql call can
>>>> create an SQLite database and table layout for you, read the file into
>>>> the database (without going through R, so R cannot slow this down),
>>>> extract all or a portion into R based on the sql argument you give it,
>>>> and then remove the database. See the examples on the home page:
>>>> http://code.google.com/p/sqldf/#Example_13._read.csv.sql_and_read.csv2.sql
>>>>
>>>> On Fri, Feb 5, 2010 at 2:11 PM, Satish Vadlamani
>>>> <satish.vadlam...@fritolay.com> wrote:
>>>>>
>>>>> Matthew:
>>>>> If it is going to help, here is the explanation. I have an end state in
>>>>> mind; it is given below under the "End State" header. In order to get
>>>>> there, I need to start somewhere, right?
>>>>> I started with an 850 MB file and could not load it in what I think is
>>>>> a reasonable time (I waited for an hour).
>>>>>
>>>>> There are references to 64-bit. How will that help? It is a machine
>>>>> with 4 GB of RAM, and there is no paging activity when loading the
>>>>> 850 MB file.
>>>>>
>>>>> I have seen other threads on the same types of questions. I did not see
>>>>> any clear-cut answers or errors that I could have been making in the
>>>>> process. If I am missing something, please let me know. Thanks.
>>>>> Satish
>>>>>
>>>>>
>>>>> End State
>>>>>> Satish wrote: "at one time I will need to load say 15GB into R"
>>>>>
>>>>>
>>>>> -----
>>>>> Satish Vadlamani
>>>>> --
>>>>> View this message in context:
>>>>> http://n4.nabble.com/Reading-large-files-tp1469691p1470667.html
>>>>> Sent from the R help mailing list archive at Nabble.com.
>>>>>
>>>>> ______________________________________________
>>>>> R-help@r-project.org mailing list
>>>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>>>> PLEASE do read the posting guide
>>>>> http://www.R-project.org/posting-guide.html
>>>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>> --
>> Jim Holtman
>> Cincinnati, OH
>> +1 513 646 9390
>>
>> What is the problem that you are trying to solve?
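[A note on the filter scripts discussed in this thread: the filter= argument just runs any command that reads stdin and writes stdout, so the fixed-width-to-CSV converter can be sketched in any language. Below is a minimal Python version; the field widths 5, 3, and 8 and the script name fwf2csv.py are made up for illustration and must be replaced with the real layout of the .dat file.]

```python
import io

# Hypothetical field widths for one fixed-width record;
# substitute the actual layout of your data file.
WIDTHS = [5, 3, 8]

def fwf_to_csv(line, widths=WIDTHS):
    """Slice one fixed-width record into comma-separated fields."""
    fields, pos = [], 0
    for w in widths:
        fields.append(line[pos:pos + w].strip())
        pos += w
    return ",".join(fields)

def run_filter(stream_in, stream_out):
    """The filter proper: fixed-width records in, CSV records out."""
    for line in stream_in:
        stream_out.write(fwf_to_csv(line.rstrip("\n")) + "\n")

# Saved as fwf2csv.py and called via
#   read.csv.sql(..., filter = "python fwf2csv.py")
# this would run as run_filter(sys.stdin, sys.stdout);
# demonstrated here on an in-memory sample instead.
sample = io.StringIO("12345AAA2010FEB \n67890BBB2010MAR \n")
out = io.StringIO()
run_filter(sample, out)
print(out.getvalue())
```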
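[And a sketch of what read.csv.sql does behind the scenes, using Python's sqlite3 module; the table layout and sample rows are invented for illustration. The idea is the one Gabor describes: load the records into a throwaway SQLite database, pull back only the slice the SQL asks for, then discard the database.]

```python
import csv
import io
import sqlite3

# A small stand-in for a large CSV; in practice this would be the
# file produced by the fixed-width filter.
csv_text = "qty,region\n10,east\n25,west\n7,east\n"

# 1. Create a temporary database and a table matching the header
#    (the in-memory database plays the role of dbname = tempfile()).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE file (qty INTEGER, region TEXT)")

# 2. Bulk-load the rows; SQLite's INTEGER affinity converts the
#    numeric-looking text in qty to integers on insert.
rows = list(csv.reader(io.StringIO(csv_text)))[1:]  # skip header
con.executemany("INSERT INTO file VALUES (?, ?)", rows)

# 3. Extract only the portion the SQL asks for, as the sql argument
#    of read.csv.sql does.
subset = con.execute(
    "SELECT qty, region FROM file WHERE region = 'east'"
).fetchall()
print(subset)

# 4. Closing the connection discards the database, as read.csv.sql
#    removes its temporary SQLite file when done.
con.close()
```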