Hello All, I have to admit that I am not very good at vectorizing functions, and I need some insight. Is the case below one where vectorization could improve speed?
Below the function is a sample of the data. As you can see, it is not delimited; however, each record is 220 characters long. So I wrote the following code to delimit the data set with line breaks ("/r"). The function works, and I end up with a data set that can be inserted into a MySQL table. However, the actual data set is 518,000 records, so the number of characters is 518000 * 220. It takes R hours to parse this using the function I have written. Can this be vectorized, or is this a loop deal?

Best Regards,
Glenn

#' FNMA Factor
#'
#' This function parses the FNMA factor file for load into
#' a database table. The FNMA factor file is non-delimited.
#' @param filepath A character vector specifying a data directory
#' @param length of the line A numeric value equal to the length of a line
#' @export
library(stringr)  # provides str_sub()

FNMAFactor <- function(filepath = character){
  # Build the input and output file paths
  callpath <- paste(filepath, "mbsfact.txt", sep = "")
  returnpath <- paste(filepath, "factor.txt", sep = "")
  # Read the non-delimited file and count its characters
  data <- readLines(con = callpath)
  numchar <- nchar(data, type = "chars")
  # Start and end positions of each 220-character record
  start <- c(seq(1, numchar, 220))
  end <- c(seq(220, numchar, 220))
  # Append one record per line to the output file
  for(i in 1 : length(start)){
    write(str_sub(data, start[i], end[i]), file = returnpath, append = TRUE)}
}

31365EJ46 CI125483 00002003473100OCT03000003103340610.1548980406.500030197040112180MULTIPLE POOL 00000070147FNMS 06.500 CI12548307017009600000000031371KMA6 CL254253 00001304570700OCT03000010156865640.7785600006.000030102030132357MULTIPLE POOL 00000067230FNMS 06.000 CL25425306715033300000000031371RE44 CL259455 00000983651400OCT03000003447615880.3504916406.500050102050132357MULTIPLE POOL 00000070200FNMS 06.500 CL25945507045034000000000031376KBB1 CL357434 00002505145900OCT03000025021294240.9987958905.000090103090133359MULTIPLE POOL 00000055000FNMS 05.000 CL35743405500035800000000031385XE52 WS555556 00003651248300OCT03000033344198060.9132273504.575050103050133356MEGA POOL ** NOT AN ACTIVE SERVICER ** 00000052440FNAR 04.595 WS55555600000000000000000031385XLL9 WS555731 
00013439369600OCT03000129242191330.9616685505.360080103040133352MEGA POOL ** NOT AN ACTIVE SERVICER ** 00000075160FNAR 05.368 WS55573100000000000000000031390XG87 CI659123 00000208856500OCT03000001136251660.5440346206.000080102080117179WASHINGTON MUTUAL BANK, FA 19850 PLUMMER STREET CHATSWORTH CA91311069210FNMS 06.000 CI65912306909016500000000031403BTR4 CL744060 00000770371700OCT03000007694084860.9987496805.000090103080133356MULTIPLE POOL 00000053920FNMS 05.000 CL74406000000000000000000031403GND0 LB748388 00000952312900OCT03000009512089400.9988407604.525090103080133358DLJ MORTGAGE CAPITAL INC. ELEVEN MADISON AVENUE NEW YORK NY10010058430FNAR XX.XXX LB74838800000000000000000031403GNG3 LB748391 00000715661500OCT03000007007212290.9791238304.379090103080133358DLJ MORTGAGE CAPITAL INC. ELEVEN MADISON AVENUE NEW YORK NY10010056530FNAR XX.XXX LB748391000000000000000000
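
For what it is worth, the fixed-width split asked about above can probably be written without an explicit loop. The following is only a sketch, not a tested replacement for FNMAFactor: it assumes the whole file arrives as one long character string, and the helper name split_fixed_width and its width argument are made up for illustration. It relies on base R's substring(), which accepts vectors of start and end positions.

```r
# Hypothetical helper (not part of the original post): split one long
# string into fixed-width records with a single vectorized call.
split_fixed_width <- function(x, width = 220L) {
  n <- nchar(x, type = "chars")
  if (n == 0L) return(character(0))       # nothing to split
  starts <- seq.int(1L, n, by = width)    # first character of each record
  ends <- pmin(starts + width - 1L, n)    # last character, clipped to the string end
  substring(x, starts, ends)              # vectorized over starts and ends
}

# Tiny illustration with width 3 instead of 220
records <- split_fixed_width("AAABBBCCC", width = 3L)
print(records)
# [1] "AAA" "BBB" "CCC"
```

The resulting character vector could then be written in one call, for example writeLines(records, returnpath), rather than appending to the file once per record. Whether that removes most of the run time on the full 518,000-record file is an assumption that would need to be checked on the real data.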