[R] Convert Numeric (20090101) to Date
Hi there,
Is there a way for R to convert a numeric date (20090101) to a "proper" date format? (Ideally dd-mm-)
The original data (in this case) is in .DAT format. I read the multi-column data with the read.fwf function, specifying the column width for the eight-digit date (example above). After the .DAT data is read in and formatted in R, it is to be exported to Excel.
I understand that with the as.Date function, 20090101 is interpreted as a number of days from R's origin date. I read that SAS can convert 20090101 to a date, so I'm hoping R can as well. Converting it to a date in Excel does not work.
Help in this matter is most appreciated!
-- View this message in context: http://r.789695.n4.nabble.com/Convert-Numeric-20090101-to-Date-tp4451859p4451859.html Sent from the R help mailing list archive at Nabble.com. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
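A minimal sketch of the character route, assuming the yyyymmdd values arrive as plain numbers and assuming dd-mm-yyyy (four-digit year) as the target display layout. The object names (clm_dt, dates, dates_ddmmyyyy) are hypothetical.

```r
# Hypothetical yyyymmdd values as read.fwf might return them (numeric).
clm_dt <- c(20090101, 20090323)

# as.Date() on a bare number counts days from an origin, so convert to
# character first and describe the layout with a format string.
dates <- as.Date(as.character(clm_dt), format = "%Y%m%d")

# Display/export form, assuming dd-mm-yyyy is wanted.
# format() returns character strings, not Date objects.
dates_ddmmyyyy <- format(dates, "%d-%m-%Y")
print(dates_ddmmyyyy)
```

Keeping the column as a Date (rather than the formatted text) preserves date arithmetic in R; write.csv() writes Date columns as yyyy-mm-dd text, which Excel typically recognizes as a date, though that depends on the Excel locale settings.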
Re: [R] Convert Numeric (20090101) to Date
Hi again,
Thanks for the responses. The latter solution does the trick! I had experimented with the numeric-to-character route and tried as.Date a few different ways, but needed guidance to hit the bullseye.
Thanks again!
[R] Reading in 9.6GB .DAT File - OK with 64-bit R?
Hi there,
I wish to read a 9.6 GB .DAT file into R (64-bit R on a 64-bit Windows machine), then delete a substantial number of rows and convert the result to a .csv file.
On the first attempt the computer crashed (at some point last night). I'm rerunning it now and closely monitoring CPU and memory usage. Setting aside the possibility that the crash was purely a computer issue, is R equipped to handle this much data? I read on the FAQ page that 64-bit R can handle larger data sets than 32-bit R.
I'm using the read.fwf function to read in the data. I don't have access to a database program (SQL, for instance).
Advice is most appreciated!
Re: [R] Reading in 9.6GB .DAT File - OK with 64-bit R?
Hi Jeff & Steve,
Thanks for your responses. After seven hours, R ran out of memory and the process ended. The machine currently has 4 GB of RAM; I'm looking to install more tomorrow.
I will look into SQLite3; thanks! I've read that SQL would be a great program for data of this size (reading in and manipulating), but I understand there is a hefty price tag (similar to the licensing cost of SAS?). At this time I'm looking for a low-cost solution, if possible. After this data event, a program like SQL would not be needed in the future; also, of the multiple data sets I have to synthesize, only a handful are this large.
Thanks, and please lend any other advice!
Re: [R] Reading in 9.6GB .DAT File - OK with 64-bit R?
Hi Barry,
"You could do a similar thing in R by opening a text connection to your file and reading one line at a time, writing the modified or selected lines to a new file."
Great! I'm aware this is possible, but I don't know the R commands for it. I have a variable [560,1] (560 rows, one column) to use to pare down the incoming large data set (which I'm sure has millions of rows). Other data sets have been small enough that I could use the merge function after the data had been read in. Obviously, here I'm having trouble reading this large data set in the first place.
Any additional help would be great!
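A minimal sketch of the line-at-a-time idea quoted above, reading in chunks rather than single lines for speed. The function name filter_fwf_lines, the key position (key_start, key_stop) and the chunk size are all hypothetical and would need to match the real fixed-width layout; keep_ids stands in for the 560-row variable, converted to character so codes keep any leading zeros.

```r
# Hypothetical helper: stream a large fixed-width text file and write out only
# the lines whose key field (characters key_start..key_stop) is in keep_ids.
filter_fwf_lines <- function(infile, outfile, keep_ids, key_start, key_stop,
                             chunk_size = 100000L) {
  con_in <- file(infile, open = "r")
  con_out <- file(outfile, open = "w")
  on.exit({ close(con_in); close(con_out) })
  n_kept <- 0L
  repeat {
    lines <- readLines(con_in, n = chunk_size)  # next chunk of raw lines
    if (length(lines) == 0L) break              # end of file
    keys <- substr(lines, key_start, key_stop)  # cut the key field as text
    keep <- lines[keys %in% keep_ids]
    writeLines(keep, con_out)                   # append selected lines
    n_kept <- n_kept + length(keep)
  }
  n_kept
}

# Tiny demo with made-up records; the key is assumed to be characters 6-11.
demo_in <- tempfile()
demo_out <- tempfile()
writeLines(c("10193290003", "10193050108", "10574050108"), demo_in)
n_kept <- filter_fwf_lines(demo_in, demo_out, keep_ids = "050108",
                           key_start = 6, key_stop = 11, chunk_size = 2L)
print(readLines(demo_out))
```

Only one chunk is held in memory at a time, so the full 9.6 GB file never has to fit in RAM; the much smaller output file can then be read with read.fwf as before.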
Re: [R] Reading in 9.6GB .DAT File - OK with 64-bit R?
Hi Sarah,
Thanks for the SQL info! I'll look into these straightaway, along with the notion of opening a text connection.
Thanks again!
[R] Subsetting a data.frame -> Read in with FWF format from .DAT file
Hi there,
I am having trouble subsetting a data frame by a condition on one column (of many). I read the file into R with "read.fwf", specifying the column widths; the original data is .DAT. I then used the "names" function to assign the column headings.
Using one column, PRVDR_NUM, I wish to subset the entire data set so that it only contains rows where PRVDR_NUM == 050108. This is where I'm having trouble. I've tried code like this:
newinpatient <- subset(oldinpatient, oldinpatient$PRVDR_NUM == 050108)
#OR
newinpatient <- oldinpatient[oldinpatient$PRVDR_NUM == 050108, ]
#OR
providernum <- data.frame(PRVDR_NUM = c(050108))
newinpatient <- merge(providernum, oldinpatient)
When checking "class" at one point, I found that R interprets PRVDR_NUM as a factor, not a number, which seemed a likely reason for the errors with the code above. So I then tried something like this:
newPRVDR_NUM <- format(as.numeric(levels(oldinpatient$PRVDR_NUM)[oldinpatient$PRVDR_NUM]))
numericprvdr <- data.frame(oldinpatient, newPRVDR_NUM)
bestprvdr <- numericprvdr[,-2]
I thought that after converting PRVDR_NUM to numeric, one of the three options above would work. That has not worked either, although I did confirm that the factor -> numeric conversion itself worked.
Although R runs the three options above with no errors, a "dim" check gives the output: 0 93. The column count is correct, but the row count obviously is not: I confirmed that the desired value appears multiple times in that column, so 0 is definitely incorrect.
Also, I would like to refer to PRVDR_NUM as a variable on its own, but with any of these variables/column names I have to use "allinpatient$PRVDR_NUM." R does not recognize PRVDR_NUM alone. Why?
I increasingly think my problem is more fundamental: is it using the read.fwf function in the first place, or not using read.fwf correctly?
I've made enough progress with other variables and data sets of this type that I've been fine so far, but now and in the future I need to repeat this code often enough that help in better understanding my errors, and a more elegant/efficient solution, would be greatly appreciated.
Also note that R does not read all 93 columns as factors. Why would R interpret this six-character column as a factor, but the nine-character column next to it as numeric?
Your help is most appreciated!
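A sketch of why the column types differ, under stated assumptions: read.fwf splits each record into fields and hands them to read.table, which converts a column to numeric only if every value parses as a number. Provider codes in this thread include values such as "29T003", so PRVDR_NUM cannot be numeric and becomes a factor (character from R 4.0.0 on). The three records, widths and column subset below are made up for illustration; colClasses is passed through read.fwf to read.table.

```r
# Three hypothetical fixed-width records: DESY_SORT_KEY (5 chars),
# PRVDR_NUM (6 chars), CLM_THRU_DT (8 chars).
recs <- c("1019329000320090323",
          "1019329T00320090401",
          "1057405010820090527")
tmp <- tempfile(fileext = ".dat")
writeLines(recs, tmp)

# Default type guessing: V2 holds "29T003", so it is not numeric;
# the all-digit date column V3 is read as a number.
guessed <- read.fwf(tmp, widths = c(5, 6, 8))
print(sapply(guessed, class))

# Declaring the provider code as character keeps the leading zero in "050108".
inpatient <- read.fwf(tmp, widths = c(5, 6, 8),
                      col.names = c("DESY_SORT_KEY", "PRVDR_NUM", "CLM_THRU_DT"),
                      colClasses = c("integer", "character", "integer"))

# Compare with the quoted code: the bare literal 050108 is parsed as 50108.
# subset() evaluates PRVDR_NUM inside the data frame, which is why the bare
# column name works here but not at the top level (use inpatient$PRVDR_NUM).
prov_050108 <- subset(inpatient, PRVDR_NUM == "050108")
print(prov_050108)
```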
Re: [R] Subsetting a data.frame -> Read in with FWF format from .DAT file
Hi Michael,
Thanks so much for your detailed reply! I gained a better understanding of the read.fwf function, and I'll pay closer attention to how these read-in functions convert variables. Your tip on removing "format" while converting the PRVDR_NUM variable from factor to numeric was the ticket! Your reply also helped me see which of the three subsetting options is the "best" one to use. I've now been able to output the data file at hand successfully.
Thanks again for your help!
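One plausible illustration of why dropping format() mattered, using a small made-up factor of provider codes: format() turns the numbers back into character strings padded to a common width, so " 50108" no longer compares equal to 50108 (which R coerces to "50108" for the comparison). The names prvdr, as_num and as_fmt are hypothetical. Codes containing letters (such as "29T003") would become NA with a warning under as.numeric(), so keeping the column as character may be the simpler route.

```r
# Hypothetical provider codes stored as a factor.
prvdr <- factor(c("050108", "290003", "050108"))

# Standard factor-to-number idiom: convert the levels, then index by the codes.
as_num <- as.numeric(levels(prvdr))[prvdr]

# format() returns character strings padded to a common width.
as_fmt <- format(as_num)
print(as_fmt)

n_fmt <- sum(as_fmt == 050108)  # padded text vs "50108": no matches
n_num <- sum(as_num == 050108)  # numeric comparison: matches
print(c(with_format = n_fmt, without_format = n_num))
```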
[R] Merge function - Return NON matches
Hi there,
I wish to merge a list and a data.frame on a common variable and return the rows of the data.frame where there is NO match. Here are some details:
The list, where the variable/column name is CLAIM_NO:
CLAIM_NO
20
83
1440
4439
7002
...
> dim(hrc78_clm_no)
[1] 66781
The data.frame, which has a variable with the same name, CLAIM_NO:
> dim(bestPartAreadmin)
[1] 1306893
I wish to merge the two and return only a data.frame of the rows whose CLAIM_NO has NO match between the two objects. I've read about and tried the "merge" function; if "merge" can do this, I'm missing something in the available options. I'm picturing something like:
clm_no_nomatch <- merge(hrc78_clm_no, bestPartAreadmin, by = "CLAIM_NO", .. .. ..)
Your help is most appreciated!
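A minimal sketch of an "anti-join" with merge(), using small stand-ins built from values shown in this thread (claims for bestPartAreadmin, exclude for hrc78_clm_no; both names are hypothetical). merge() has no direct "non-matches only" option, so the sketch flags the list entries, does a left join with all.x = TRUE, and keeps the rows whose flag is NA. A plain %in% index gives the same rows.

```r
# Small stand-ins for bestPartAreadmin and hrc78_clm_no.
claims <- data.frame(DESY_SORT_KEY = c(10193L, 10193L, 10193L, 10574L, 10574L),
                     CLAIM_NO      = c(20L, 21L, 22L, 83L, 84L))
exclude <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))

# Flag every claim number in the list, then left-join: rows of 'claims'
# with no partner in 'exclude' get NA in the flag column.
exclude$in_list <- TRUE
joined <- merge(claims, exclude, by = "CLAIM_NO", all.x = TRUE)
no_match <- joined[is.na(joined$in_list), names(claims)]
print(no_match)

# The same rows without merge(): a logical index built with %in%.
no_match2 <- claims[!(claims$CLAIM_NO %in% exclude$CLAIM_NO), ]
print(no_match2)
```

The %in% version keeps the original row order and cannot duplicate rows if the exclusion list contains repeated claim numbers, whereas merge() sorts by the key and would repeat matched rows; for an exclusion filter the %in% form is usually the simpler choice.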
Re: [R] Merge function - Return NON matches
Hi Steve,
Thanks for replying. Here's a small piece of the data.frame:
> bestPartAreadmin[1:5,1:6]
  DESY_SORT_KEY PRVDR_NUM CLM_THRU_DT CLAIM_NO NCH_NEAR_LINE_REC_IDEN_CD NCH_CLM_TYPE_CD
1         10193    290003    20090323       20                         V              60
2         10193    290045    20091124       21                         V              60
3         10193    29T003    20090401       22                         V              60
4         10574    050017    20090527       83                         V              60
5         10574    050017    20090921       84                         V              60
There are 93 columns in total in the data.frame; these are the first six, where you can see CLAIM_NO. I want the resulting data.frame to "look" just like the one above, but to contain only the rows whose CLAIM_NO values do not match any CLAIM_NO value in the list (hrc78_clm_no).
Does this help? Thanks!
Re: [R] Merge function - Return NON matches
Hi again,
I tried the sample code like this:
> merged_clmno <- subset(bestPartAreadmin, !CLAIM_NO %in% hrc78_clm_no)
> dim(merged_clmno)
[1] 1306893
Note that:
> dim(bestPartAreadmin)
[1] 1306893
So there is no change between the original data.frame (bestPartAreadmin) and merged_clmno, which should have fewer rows.
Any further help is most appreciated!
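A likely explanation, sketched with small made-up stand-ins (claims and hrc_ids are hypothetical names): hrc78_clm_no is a data.frame, and %in% (via match()) converts a list-like table to character, giving one deparsed string per column such as "c(20, 83, 1440)". No claim number equals that string, so nothing is excluded. The same happens with as.character() applied to the whole data frame. Extracting the column as a plain vector first fixes the filter.

```r
claims  <- data.frame(CLAIM_NO = c(20L, 21L, 22L, 83L, 84L))
hrc_ids <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))

# The whole data frame collapses to a single deparsed string per column.
print(as.character(hrc_ids))

# Matching against the data frame: no claim number matches, nothing dropped.
kept_wrong <- subset(claims, !CLAIM_NO %in% hrc_ids)

# Matching against the column vector: claims 20 and 83 are dropped.
kept_right <- subset(claims, !CLAIM_NO %in% hrc_ids$CLAIM_NO)
print(kept_right)
```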
Re: [R] Merge function - Return NON matches
Hi there,
Thanks for your responses. I hadn't used or heard of dput() before; I'm looking it up to understand how it works.
Thanks!
Re: [R] Merge function - Return NON matches
Hi there,
I've tried the suggested solutions:
"If you do `no <- unlist(hrc78_clm_no)`, do you get a character vector of claim numbers you want to exclude? If so, then `subset(whatever, !CLAIM_NO %in% no)` should work."
I converted the CLAIM_NO list to character with:
> hrc78_clmno_char <- format(as.character(hrc78_clm_no))
> is.character(hrc78_clmno_char)
[1] TRUE
Then I applied your code (above), which didn't work. Thanks though!
Thanks for the dput() help. Here is truncated output of the list (its class is data.frame; I call it a list for the sake of communication) and of the data.frame. Again, your help is most appreciated!
Goal: merge the list and the data.frame together, and output the data.frame, but only with the rows whose CLAIM_NO value *does not match* between the list and the data.frame.
*The List*
truncated_list <- hrc78_clm_no[1:100, , drop = FALSE] #So you can see consistency in previously-mentioned variables
truncated_list <- structure(list(CLAIM_NO = c(20L, 83L, 1440L, 4439L, 7002L,
  9562L, 10463L, 12503L, 16195L, 22987L, 30760L, 32108L, 32640L, 33045L,
  36241L, 37091L, 37934L, 38663L, 39456L, 40544L, 40630L, 40679L, 40734L,
  43054L, 53483L, 54155L, 56151L, 58113L, 61050L, 62056L, 63014L, 68486L,
  68541L, 69298L, 69983L, 73379L, 76810L, 79975L, 91124L, 97697L, 100524L,
  105808L, 112659L, 112955L, 113422L, 114522L, 124159L, 133566L, 135167L,
  137387L, 137954L, 138186L, 144574L, 148573L, 150013L, 152193L, 154680L,
  155414L, 165954L, 171223L, 175077L, 176359L, 177656L, 178155L, 182250L,
  182393L, 182832L, 184245L, 185542L, 186038L, 186087L, 186098L, 186294L,
  186550L, 186897L, 187025L, 190180L, 191472L, 192593L, 196207L, 196689L,
  197372L, 197537L, 197590L, 197730L, 197874L, 198294L, 198750L, 198823L,
  199076L, 199233L, 199284L, 199468L, 199661L, 199913L, 200150L, 200279L,
  200473L, 200927L, 202407L)), .Names = "CLAIM_NO",
  row.names = c(NA, -100L), class = "data.frame")
*The (multi-column) data.frame, but greatly truncated*
truncated_dataframe <- bestPartAreadmin[1:25, 1:4]
truncated_dataframe <- structure(list(DESY_SORT_KEY = c(10193L,
10193L, 10193L, 10574L, 10574L, 19213L, 19213L, 19213L, 100026636L, 100040718L, 100055111L, 100060558L, 100060558L, 100060558L, 100072978L, 100096346L, 100130451L, 100168782L, 100168782L, 100168782L, 100168782L, 100168782L, 100168782L, 100174887L, 100177905L), PRVDR_NUM = structure(c(1368L, 1353L, 1406L, 149L, 149L, 1362L, 1393L, 1367L, 1557L, 1370L, 1360L, 1362L, 1362L, 1362L, 1372L, 1358L, 193L, 196L, 196L, 61L, 166L, 196L, 196L, 311L, 1363L), .Label = c("010001", "010006", "010015", "010016", "010029", "010033", "010034", "010035", "010039", "010040", "010046", "010049", "010083", "010092", "010108", "010131", "010149", "01S001", "01S033", "01S046", "01S145", "020001", "020006", "020012", "020017", "021306", "021311", "030002", "030006", "030007", "030010", "030011", "030012", "030013", "030014", "030016", "030023", "030024", "030030", "030033", "030036", "030037", "030038", "030043", "030055", "030061", "030062", "030064", "030065", "030067", "030069", "030078", "030083", "030085", "030087", "030088", "030089", "030092", "030093", "030100", "030101", "030102", "030103", "030105", "030108", "030110", "030111", "030114", "030115", "030117", "030118", "030119", "030120", "030121", "030122", "030123", "030126", "030128", "031300", "031305", "031311", "032000", "032001", "032002", "032006", "033025", "033028", "033029", "033032", "033034", "033036", "034004", "034013", "034020", "034024", "03S002", "03S006", "03S007", "03S016", "03S022", "03S023", "03S089", "03T002", "03T055", "03T061", "03T069", "03T093", "03T103", "03T114", "03T117", "03T126", "040004", "040007", "040010", "040011", "040016", "040022", "040026", "040027", "040029", "040036", "040041", "040047", "040055", "040062", "040072", "040080", "040084", "040088", "040091", "040114", "040118", "040119", "043028", "044005", "04S027", "04S084", "04T041", "04T062", "04T119", "050002", "050006", "050007", "050008", "050009", "050013", "050014", "050016", "050017", "050018", "050022", "050024", "050025", 
"050026", "050030", "050036", "050038", "050039", "050040", "050042", "050043", "050045", "050046", "050047", "050055", "050056", "050057", "050058", "050060", "050063", "050069", "050070", "050071", "050073", "050075", "050076", "050077", "050078", "050079", "050082", "050084", "050089", "050090", "050091", "050093", "050099", "050100", "050101", "050102", "050103", "050104", "050107", "050108", "050110", "050111", "050112", "050113", "050115", "050116", "050118", "050121", "050122", "050124", "050125", "050126", "050128", "050129", "050131", "050132", "050133", "050135", "050136", "050137", "050138", "050139", "050140", "050145", "050146", "050149", "050150", "050152", "050153", "050158", "050159", "050168", "050169", "050174", "050179", "050180", "050188", "050191", "050193", "050195", "050196", "050197", "050204", "050211", "050219", "050222", "050224", "050225", "050226", "050228", "050230",
Re: [R] Merge function - Return NON matches
Hi again,
Petr, your solution worked! Thanks everyone for your input. I'll look more into "setdiff."
Cheers!
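Since setdiff() came up, here is a small sketch of how it fits this problem (not necessarily the solution Petr posted, which is not shown here). setdiff() works on vectors and returns the unique values of the first argument that are absent from the second, so it yields claim numbers rather than rows; those numbers can then be used to pull the full rows back out. The stand-in data reuse values shown earlier in this thread; the names claims, hrc_ids and unmatched_ids are hypothetical.

```r
claims <- data.frame(DESY_SORT_KEY = c(10193L, 10193L, 10193L, 10574L, 10574L),
                     CLAIM_NO      = c(20L, 21L, 22L, 83L, 84L))
hrc_ids <- c(20L, 83L, 1440L)

# Unique claim numbers present in 'claims' but not in the exclusion vector.
unmatched_ids <- setdiff(claims$CLAIM_NO, hrc_ids)
print(unmatched_ids)

# Use those IDs to retrieve the complete rows.
no_match <- claims[claims$CLAIM_NO %in% unmatched_ids, ]
print(no_match)
```

Because setdiff() drops duplicates, it is handy for checking which claim numbers are unmatched, while the final row selection still goes through %in%.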