Hi John and Jeff,

Thanks a lot for your help. I agree that row numbers are not a standard feature in SQL. What I am looking for is some kind of hack. After all, the sqlFetch command is able to return a specific number of rows, and the sqlFetchMore command is able to take up the baton from that row onwards and return further rows corresponding to the max parameter.
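For reference, the pattern that works for me at the moment looks, in condensed form, like this (a sketch of the full loop quoted further down in this thread; "sample" and "Table1" are the DSN and table from that code):

library(RODBC)

ch   <- odbcConnect("sample")    # ODBC DSN pointing at the Access database
rowN <- 1e6                      # chunk size

chunk1 <- sqlFetch(ch, "Table1", max = rowN, as.is = TRUE)   # rows 1 .. rowN
chunk2 <- sqlFetchMore(ch, max = rowN, as.is = TRUE)         # rows rowN+1 .. 2*rowN
# ...each further sqlFetchMore() call keeps walking forward through the table

odbcClose(ch)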
I wonder if it is possible to hop straight to a certain row number (without going through sqlFetch and sqlFetchMore, and without loading any data into memory) and then return the contents of a certain number of rows from there. The question is: is there a "catch" for accessing a row location, and what could be the "hook" for that?

I am also interested in the rows of a table that have been updated after a certain date. Is there a quick way to identify them? Running SQL queries on such large tables appears to take too long. I understand that there is no provision for this among the available methods. But is it possible to get under the hood and find some hack?

Jeff, I will take your suggestion and try my luck on the R-sig-db mailing list.

Thanks,
Vivek

2015-11-20 20:25 GMT+01:00 Jeff Newmiller <jdnew...@dcn.davis.ca.us>:

> Row numbers are not a standard feature in SQL, and as far as I know the
> Access Jet engine does not support them. You are supposed to use the key
> columns to partition your data, but that may require knowing how many
> records fall within convenient bin sizes if the data are not uniformly
> distributed. You can find that out using SQL GROUP BY queries.
>
> Note that the resource you appear to be limited by is the database
> engine. Parallel processing (more CPUs) is unlikely to yield any
> improvement, and is in fact likely to slow you down.
>
> This looks like a good topic for the R-sig-db mailing list if you have
> further questions about R and databases, or find a SQL support forum if
> you need to learn more about using SQL in general.
>
> On November 20, 2015 10:32:31 AM PST, Vivek Sutradhara <viveksu...@gmail.com> wrote:
>
>> Hi John,
>> Thanks a lot for your quick reply, and thanks for drawing my attention
>> to the openxlsx package. I will certainly look into it when I work with
>> Excel, but right now my problems are with Microsoft Access.
>>
>> There are huge tables there which I am not able to export to Excel, csv
>> or text files with native Access methods. The only solution that has
>> worked so far is to incrementally extract data with the help of RODBC.
>> This was a huge leap in my attempts to export the tables. Once I have
>> the data in the form of rds files (which are compressed as well), I have
>> found that it is much easier to work with them.
>>
>> But my wishes have suddenly expanded, and I want to find out if it is
>> possible to go beyond the normal capabilities of RODBC (the sqlFetch
>> command does not have a provision for specifying a row number range). I
>> am a newbie with parallel methods (using the 4 cores on my PC), but I am
>> hoping to progress with that for processing the data from the multiple
>> chunks (the first step will be just to filter and gather the data of
>> relevance).
>>
>> I hope that I have explained what I am looking for.
>> Thanks,
>> Vivek
>>
>> 2015-11-20 19:09 GMT+01:00 John McKown <john.archie.mck...@gmail.com>:
>>
>>> A possibility could be to not use ODBC, but the CRAN package openxlsx
>>> ( https://cran.revolutionanalytics.com/web/packages/openxlsx/index.html ).
>>> Then use the read.xlsx() function.
>>> <quote>
>>> Description: Read data from an Excel file or Workbook object into a
>>> data.frame.
>>>
>>> Usage:
>>> read.xlsx(xlsxFile, sheet = 1, startRow = 1, colNames = TRUE,
>>>           rowNames = FALSE, detectDates = FALSE,
>>>           skipEmptyRows = TRUE, rows = NULL,
>>>           cols = NULL, check.names = FALSE, namedRegion = NULL)
>>>
>>> Arguments:
>>> xlsxFile      An xlsx file or Workbook object.
>>> sheet         The name or index of the sheet to read data from.
>>> startRow      First row to begin looking for data. Empty rows at the top
>>>               of a file are always skipped, regardless of the value of
>>>               startRow.
>>> colNames      If TRUE, the first row of data will be used as column names.
>>> rowNames      If TRUE, the first column of data will be used as row names.
>>> detectDates   If TRUE, attempt to recognise dates and perform conversion.
>>> skipEmptyRows If TRUE, empty rows are skipped; else empty rows after the
>>>               first row containing data will return a row of NAs.
>>> rows          A numeric vector specifying which rows in the Excel file to
>>>               read. If NULL, all rows are read.
>>> cols          A numeric vector specifying which columns in the Excel file
>>>               to read. If NULL, all columns are read.
>>> check.names   Logical. If TRUE, the names of the variables in the data
>>>               frame are checked to ensure that they are syntactically
>>>               valid variable names.
>>> namedRegion   A named region in the Workbook. If not NULL, the startRow,
>>>               rows and cols parameters are ignored.
>>> </quote>
>>>
>>> On Fri, Nov 20, 2015 at 11:38 AM, Vivek Sutradhara <viveksu...@gmail.com>
>>> wrote:
>>>
>>>> Hi
>>>> I want to extract data from a Microsoft Access database having many
>>>> tables with more than 1e7 rows. I find that the following code works to
>>>> export a table to an rds file:
>>>> #####################
>>>> setwd('C:/sFolder')
>>>> library(RODBC); library(DBI)
>>>> ch <- odbcConnect("sample")
>>>>
>>>> # No. of rows in the table not known
>>>> rowN <- 1e6                 # no. of rows per chunk
>>>> db <- sqlFetch(ch, "Table1", max = rowN, as.is = TRUE)
>>>>
>>>> file <- paste0('Table1', 1, '.rds')
>>>> saveRDS(db, file)           # was saveRDS(db, file1); file1 was never defined
>>>>
>>>> rm(db); gc()                # garbage collection to free up the memory
>>>>
>>>> # To successively obtain more chunks from the Access database;
>>>> # sqlFetchMore() continues from where the previous fetch on this
>>>> # channel stopped, so no table name is passed
>>>> for (i in 2:10) {
>>>>   df <- sqlFetchMore(ch, max = rowN, as.is = TRUE)
>>>>   file <- paste0('Table1', i, '.rds')
>>>>   saveRDS(df, file)
>>>>   if (dim(df)[1] < rowN)
>>>>     break
>>>>   rm(df); gc()
>>>> }
>>>> rm(df); gc()
>>>> odbcCloseAll()
>>>> ##############################
>>>>
>>>> I would like to know the following:
>>>> 1. Is there any way to extract data from a table by just specifying the
>>>> row number range? I have extracted data before. Instead of repeating
>>>> the operations, I would just like to obtain data from, let's say, the
>>>> 8e6 to 9e6 row range. I cannot do this now; I have to use sqlFetchMore
>>>> successively. I would like to know if it is possible to go straight to
>>>> the 8e6 to 9e6 row range.
>>>> 2. Is it possible to use the foreach package in the extraction step (in
>>>> place of the for loop above)? I am planning to use foreach in parallel
>>>> later for processing the data in the multiple files. I just wonder if
>>>> it is possible to do parallel processing for the data extraction also.
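For question 1 above (and for picking out rows updated after a given date, as asked further up the thread), a rough sketch of what keyed queries through RODBC's sqlQuery() could look like. The column names ID and LastUpdated are placeholders for a real indexed key column and timestamp column in Table1, and whether the Jet engine can answer these quickly is something that would have to be tested:

library(RODBC)
ch <- odbcConnect("sample")

# Jump straight to a row range via an indexed numeric key
# (ID is a placeholder column name):
rng <- sqlQuery(ch,
    "SELECT * FROM Table1 WHERE ID BETWEEN 8000001 AND 9000000",
    as.is = TRUE)

# Rows updated after a given date (LastUpdated is a placeholder column;
# Access/Jet date literals go between # characters, and the exact date
# format may need adjusting):
recent <- sqlQuery(ch,
    "SELECT * FROM Table1 WHERE LastUpdated > #2015-01-01#",
    as.is = TRUE)

# The kind of GROUP BY query Jeff mentions, to see how rows are
# distributed over the key before choosing chunk boundaries:
counts <- sqlQuery(ch,
    "SELECT Int(ID / 1000000) AS keybin, COUNT(*) AS n
       FROM Table1 GROUP BY Int(ID / 1000000)")

odbcClose(ch)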
>>>> Thanks,
>>>> Vivek Sutradhara
>>>
>>> --
>>> John McKown
>
> --
> Sent from my Android device with K-9 Mail. Please excuse my brevity.
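On question 2 in the original message above: as Jeff notes, the database engine is the bottleneck for the extraction itself, but the later filtering of the saved .rds chunks can be done in parallel. A minimal sketch with foreach and doParallel, where the file names follow the extraction loop quoted above and the filter column SomeColumn is a placeholder:

library(foreach)
library(doParallel)

cl <- makeCluster(4)                       # the 4 cores mentioned above
registerDoParallel(cl)

files <- paste0("Table1", 1:10, ".rds")    # chunks written by the extraction loop
files <- files[file.exists(files)]

filtered <- foreach(f = files, .combine = rbind) %dopar% {
  chunk <- readRDS(f)
  chunk[chunk$SomeColumn > 0, , drop = FALSE]   # placeholder filter step
}

stopCluster(cl)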