[R] Convert Numeric (20090101) to Date

2012-03-06 Thread RHelpPlease
Hi there,
Can R convert a numeric date (20090101) to a "proper"
date format?  (Ideally dd-mm-) 

Original data (in this case) is in .DAT format.  I read the multi-column
data with the read.fwf function, where I specified the column width for the
eight digit date (example above).  After the .DAT data is read-in &
formatted in R, it is to be exported to Excel. 

I understand that with the as.Date function, 20090101 is understood as the
number of days from the R origin date.
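
A small illustration of the day-count behaviour described above (a sketch in base R; the origin shown is R's default epoch, 1970-01-01):

```r
# A Date is stored as the number of days since 1970-01-01.
d <- as.Date("2009-01-01")
days <- as.numeric(d)
print(days)   # 14245

# Going the other way, a number is read as a day count from the origin,
# which is why 20090101 does not come back as 1 January 2009.
back <- as.Date(14245, origin = "1970-01-01")
print(back)   # "2009-01-01"
```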
  
I read that SAS has the capability to convert 20090101 to a date, so I'm
hoping R does as well.  Conversion to a date in Excel does not work.

Help in this matter is most appreciated!

--
View this message in context: 
http://r.789695.n4.nabble.com/Convert-Numeric-20090101-to-Date-tp4451859p4451859.html
Sent from the R help mailing list archive at Nabble.com.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Convert Numeric (20090101) to Date

2012-03-07 Thread RHelpPlease
Hi again,
Thanks for the responses.  The latter solution does the trick!

I had tinkered around the numeric -> character route & tried as.Date a few
different ways, but needed guidance to the bullseye.  
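
The replies are not reproduced in this archive, so the following is only a minimal sketch of the numeric -> character route in base R. The sample values and the day-month-year output layout are assumptions for illustration.

```r
# Hypothetical input: eight-digit yyyymmdd values as read by read.fwf
raw_date <- c(20090101, 20091124)

# numeric -> character -> Date, telling as.Date how the digits are laid out
as_date <- as.Date(as.character(raw_date), format = "%Y%m%d")

# Date -> text in day-month-year order, e.g. for export
as_text <- format(as_date, "%d-%m-%Y")
print(as_text)   # "01-01-2009" "24-11-2009"
```

Keeping the column as a Date inside R and formatting it only at export time preserves correct sorting and date arithmetic.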

Thanks again!

--
View this message in context: 
http://r.789695.n4.nabble.com/Convert-Numeric-20090101-to-Date-tp4451859p4453620.html
Sent from the R help mailing list archive at Nabble.com.



[R] Reading in 9.6GB .DAT File - OK with 64-bit R?

2012-03-08 Thread RHelpPlease
Hi there,
I wish to read a 9.6GB .DAT file into R (64-bit R on a 64-bit Windows machine),
delete a substantial number of rows & then convert the result to a .csv file. 
On the first attempt the computer crashed (at some point last night).

I'm rerunning this now & am closely monitoring Processor/CPU/Memory.

Setting aside the possibility that this crash was purely a computer issue, is
R equipped to handle this much data?  I read on the FAQ page that 64-bit R can
handle larger data sets than 32-bit.

I'm using the read.fwf function to read in the data.  I don't have access to
a database program (SQL, for instance).

Advice is most appreciated!



--
View this message in context: 
http://r.789695.n4.nabble.com/Reading-in-9-6GB-DAT-File-OK-with-64-bit-R-tp4457220p4457220.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Reading in 9.6GB .DAT File - OK with 64-bit R?

2012-03-08 Thread RHelpPlease
Hi Jeff & Steve,
Thanks for your responses.  After seven hours R (or the machine) ran out of
memory & the run ended.  Currently the machine has 4GB RAM.  I'm looking to
install more RAM tomorrow.

I will look into SQLite3; thanks!

I've read that SQL would be a great program for data of this size (read-in,
manipulate), but I understand there is a hefty price tag (similar to the
cost of SAS? [licensing]).  At this time I'm looking for a low-cost
solution, if possible.  After this data event, a program like SQL would not
be needed in the future; also, with these multiple data sets to synthesize,
only a handful are of this size.

Thanks & please lend any other advice!

--
View this message in context: 
http://r.789695.n4.nabble.com/Reading-in-9-6GB-DAT-File-OK-with-64-bit-R-tp4457220p4458042.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Reading in 9.6GB .DAT File - OK with 64-bit R?

2012-03-08 Thread RHelpPlease
Hi Barry,

"You could do a similar thing in R by opening a text connection to 
your file and reading one line at a time, writing the modified or 
selected lines to a new file."

Great!  I'm aware that this exists, but don't know the commands in R.  I
have a variable [560,1] to use to pare down the incoming large data set (I'm
sure it has millions of rows).  The other data sets have been small enough
that I've been able to use the merge function after the data has been read in. 
Obviously I'm having trouble reading in this large data set in the first
place.
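
For reference, a minimal sketch of the connection approach quoted above, in base R. The function name filter_fwf, the key position (characters 1-6) and the sample lines are all hypothetical; the real column positions would come from the file's layout.

```r
# Read a fixed-width file in chunks and keep only lines whose key is wanted.
# key_start/key_end give the character positions of the key (assumed here).
filter_fwf <- function(infile, outfile, keep_ids,
                       key_start = 1, key_end = 6, chunk_lines = 10000) {
  con_in  <- file(infile, open = "r")
  con_out <- file(outfile, open = "w")
  on.exit({ close(con_in); close(con_out) })
  kept <- 0
  repeat {
    lines <- readLines(con_in, n = chunk_lines)   # next chunk only
    if (length(lines) == 0) break                 # end of file
    keys <- substr(lines, key_start, key_end)
    hit  <- lines[keys %in% keep_ids]
    writeLines(hit, con_out)
    kept <- kept + length(hit)
  }
  invisible(kept)
}

# Tiny demonstration with made-up data
infile  <- tempfile()
outfile <- tempfile()
writeLines(c("050108AAA", "290003BBB", "050108CCC"), infile)
n_kept <- filter_fwf(infile, outfile, keep_ids = "050108", chunk_lines = 2)
result <- readLines(outfile)
print(result)
```

Only one chunk is held in memory at a time, so the size of the input file no longer matters much; the much smaller output can then be read with read.fwf as usual.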

Any additional help would be great!  


--
View this message in context: 
http://r.789695.n4.nabble.com/Reading-in-9-6GB-DAT-File-OK-with-64-bit-R-tp4457220p4458074.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Reading in 9.6GB .DAT File - OK with 64-bit R?

2012-03-08 Thread RHelpPlease
Hi Sarah,
Thanks for the SQL info!  I'll look into these straightaway, along with the
notion of opening a text connection. 

Thanks again!



--
View this message in context: 
http://r.789695.n4.nabble.com/Reading-in-9-6GB-DAT-File-OK-with-64-bit-R-tp4457220p4458083.html
Sent from the R help mailing list archive at Nabble.com.



[R] Subsetting a data.frame -> Read in with FWF format from .DAT file

2012-03-09 Thread RHelpPlease
Hi there,
I am having trouble subsetting a data frame by a condition on one column
(of many).

I read the file into R through "read.fwf," where I specified column widths. 
Original data is .DAT.  I then used the "names" function to assign column
headings.

I wish to cut the entire data set down to only the rows where one column,
PRVDR_NUM, satisfies PRVDR_NUM == 050108.  This is where I'm having trouble.

I've tried code like this:

newinpatient <- subset(oldinpatient, oldinpatient$PRVDR_NUM == 050108)
#OR
newinpatient <- oldinpatient[oldinpatient$PRVDR_NUM == 050108, ]
#OR
providernum <- data.frame(PRVDR_NUM = c(050108))
newinpatient <- merge(providernum, oldinpatient)

Checking "class" at one point, I gathered that R interprets PRVDR_NUM
as a factor, not a number, which is a potential reason for the trouble
with the code above.  So, I then tried something like this:

newPRVDR_NUM <- format(as.numeric(levels(oldinpatient$PRVDR_NUM)
[oldinpatient$PRVDR_NUM]))
numericprvdr <- data.frame(oldinpatient, newPRVDR_NUM)
bestprvdr <- numericprvdr[,-2]

I thought that after converting PRVDR_NUM to numeric, one of the three
options above would work.  But that has not worked either.  (I did
confirm that the factor -> numeric conversion worked.)

Though R runs the three options (above) with no errors, a "dim" check
returns: 0 93.  The column count is correct, but the row count
(obviously) is not.  (I did confirm that the desired value exists multiple
times in the noted column, so 0 is definitely incorrect)
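
A likely cause of the zero-row result, sketched with made-up rows: R parses the bare literal 050108 as the number 50108, which never equals the text "050108" held in a factor (or character) column.

```r
# Hypothetical stand-in for the real data
oldinpatient <- data.frame(
  PRVDR_NUM = factor(c("050108", "290003", "050108", "29T003")),
  CLAIM_NO  = c(20L, 21L, 22L, 83L)
)

# The leading zero is lost in a numeric literal
same_number <- 050108 == 50108

# Comparing the factor with the number matches nothing
zero_rows <- oldinpatient[oldinpatient$PRVDR_NUM == 050108, ]

# Comparing with the quoted ID matches as intended
newinpatient <- oldinpatient[oldinpatient$PRVDR_NUM == "050108", ]
print(dim(zero_rows))
print(dim(newinpatient))
```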

As well, I would like to work with PRVDR_NUM as a variable alone, but I've
found that with any of these variables/column names, I have to use
"allinpatient$PRVDR_NUM."  R does not recognize PRVDR_NUM alone.  Why?

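On the question of using PRVDR_NUM alone: columns live inside the data frame rather than in the workspace, so a bare column name is only found by functions that look inside the data frame. A minimal sketch with made-up rows:

```r
# Hypothetical stand-in for the real data
oldinpatient <- data.frame(
  PRVDR_NUM = c("050108", "290003", "050108"),
  CLAIM_NO  = c(20L, 21L, 22L),
  stringsAsFactors = FALSE
)

# with() evaluates an expression using the columns as variables
counts <- with(oldinpatient, table(PRVDR_NUM))

# subset() does the same for its condition, so no prefix is needed there
hits <- subset(oldinpatient, PRVDR_NUM == "050108")

# Copying the column out creates a standalone variable
PRVDR_NUM <- oldinpatient$PRVDR_NUM
```
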
More and more I think my problem is more foundational, meaning using the
read.fwf function in the first place?  Not using the read.fwf function
correctly?  Again, I've made enough progress with other variables & data
sets of this type I've been fine so far, but now & future I need to repeat
this code enough times where help in better understanding my errors & a more
elegant/efficient solution would be greatly appreciated.  

Also note that R does not read all 93 columns as factors.  Why would R
interpret this six-wide column as a factor, but the nine-wide column next
door as numeric?
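
One plausible explanation, stated as an assumption since the raw file is not shown: read.fwf passes the split columns to read.table, which guesses a type per column. A column in which every entry looks like a number becomes numeric; a single non-numeric entry (the dput output later in this archive lists PRVDR_NUM values such as "01S001") turns the whole column into text, stored as a factor in R versions before 4.0.0. The sketch below uses made-up lines and a hypothetical second column, AMOUNT.

```r
tmp <- tempfile()
writeLines(c("050108000012345",
             "29T003000067890",
             "050017000000042"), tmp)
widths <- c(6, 9)
cols   <- c("PRVDR_NUM", "AMOUNT")   # AMOUNT is a hypothetical name

# Default type guessing: the "T" forces PRVDR_NUM to text (or factor)
guess <- read.fwf(tmp, widths = widths, col.names = cols)
print(sapply(guess, class))

# Stating the types up front keeps IDs as text, leading zeros included
fixed <- read.fwf(tmp, widths = widths, col.names = cols,
                  colClasses = c("character", "numeric"))
hits <- subset(fixed, PRVDR_NUM == "050108")
print(hits)
```

Reading identifier columns as character avoids both the factor conversion and the loss of leading zeros.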

Your help is most appreciated!

--
View this message in context: 
http://r.789695.n4.nabble.com/Subsetting-a-data-frame-Read-in-with-FWF-format-from-DAT-file-tp4461051p4461051.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Subsetting a data.frame -> Read in with FWF format from .DAT file

2012-03-12 Thread RHelpPlease
Hi Michael,
Thanks so much for your detailed reply!  

I gained a better understanding of the read.fwf function & will pay closer
attention to how these read-in functions convert variables, etc. 
As well, your tip on removing "format" while converting the PRVDR_NUM
variable to numeric (from factor) is the ticket!  Also, your reply helped
me see which code (of the three options to subset data) is the "best" to
use.
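
The reply itself is not reproduced in this archive; the sketch below only illustrates, with made-up values, why wrapping the conversion in format() gets in the way: format() turns the numbers back into padded text.

```r
f <- factor(c("050108", "290003", "050108"))

# as.numeric() on a factor returns the internal level codes
codes <- as.numeric(f)

# Converting the levels first recovers the values
num <- as.numeric(levels(f))[f]

# format() returns character strings padded to a common width
txt <- format(num)

print(codes)
print(num)
print(txt)
```

Note that IDs containing letters (such as "29T003") cannot be converted this way and become NA, so comparing against the quoted ID is usually the safer route.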

At this time I've been able to successfully output the data file at hand. 
Thanks again for your help!

--
View this message in context: 
http://r.789695.n4.nabble.com/Subsetting-a-data-frame-Read-in-with-FWF-format-from-DAT-file-tp4461051p4466513.html
Sent from the R help mailing list archive at Nabble.com.



[R] Merge function - Return NON matches

2012-04-26 Thread RHelpPlease
Hi there,
I wish to compare a common variable between a list and a data.frame & return
the rows of the data.frame where there is NO match.  Here are some details:

The list, where the variable/col.name = CLAIM_NO
CLAIM_NO
20
83
1440
4439
7002
...

> dim(hrc78_clm_no)
[1] 66781

The data.frame, where there exists a variable with the same name, CLAIM_NO.
> dim(bestPartAreadmin)
[1] 1306893

I wish to merge the two together & only return a data.frame where there is
NO match in the CLAIM_NO between both files.

I've read & tried code via the "merge" function.  If "merge" can do this,
I'm missing something with the available options.

I'm figuring something like:

clm_no_nomatch <- merge(hrc78_clm_no, bestPartAreadmin, by = "CLAIM_NO",  ..
.. ..)
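
merge() has no option that returns only the non-matching rows, but a flag column can stand in for one. A minimal sketch with made-up rows; the helper column in_list is hypothetical.

```r
# Hypothetical stand-ins for the real objects
hrc78_clm_no <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))
bestPartAreadmin <- data.frame(
  CLAIM_NO    = c(20L, 21L, 22L, 83L, 84L),
  CLM_THRU_DT = c(20090323L, 20091124L, 20090401L, 20090527L, 20090921L)
)

# Mark every claim number that appears in the list
hrc78_clm_no$in_list <- TRUE

# Keep all rows of the big data.frame; unmatched rows get NA in the flag
merged <- merge(hrc78_clm_no, bestPartAreadmin, by = "CLAIM_NO", all.y = TRUE)

# Rows with no flag had no match; drop the helper column again
clm_no_nomatch <- merged[is.na(merged$in_list), names(bestPartAreadmin)]
print(clm_no_nomatch)
```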

Your help is most appreciated!



--
View this message in context: 
http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590755.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Merge function - Return NON matches

2012-04-26 Thread RHelpPlease
Hi Steve,
Thanks for replying.  Here's a small piece of the data.frame:

> bestPartAreadmin[1:5,1:6]
  DESY_SORT_KEY PRVDR_NUM CLM_THRU_DT CLAIM_NO NCH_NEAR_LINE_REC_IDEN_CD NCH_CLM_TYPE_CD
1         10193    290003    20090323       20                         V              60
2         10193    290045    20091124       21                         V              60
3         10193    29T003    20090401       22                         V              60
4         10574    050017    20090527       83                         V              60
5         10574    050017    20090921       84                         V              60

There are 93 columns in total in the data.frame, so these are the first six,
where you can see CLAIM_NO.

I wish for the resultant data.frame to "look" just like the data.frame
above, but containing only the rows whose CLAIM_NO values don't match any
CLAIM_NO value in the list (hrc78_clm_no).

Does this help?

Thanks!

--
View this message in context: 
http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590810.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Merge function - Return NON matches

2012-04-26 Thread RHelpPlease
Hi again,
I tried the sample code like this:

> merged_clmno <- subset(bestPartAreadmin, !CLAIM_NO %in% hrc78_clm_no) 
> dim(merged_clmno)
[1] 1306893

Note that:
> dim(bestPartAreadmin)
[1] 1306893

So, no change between the original data.frame (bestPartAreadmin) & the
merged_clmno data.frame, which should have fewer rows.
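
A possible reason nothing was dropped, sketched with made-up rows: when the right-hand side of %in% is a whole data.frame rather than one of its columns, the claim numbers are compared against the column as a single object and none of them match.

```r
# Hypothetical stand-ins for the real objects
hrc78_clm_no <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))
bestPartAreadmin <- data.frame(
  CLAIM_NO  = c(20L, 21L, 22L, 83L, 84L),
  PRVDR_NUM = c("290003", "290045", "29T003", "050017", "050017"),
  stringsAsFactors = FALSE
)

# Comparing against the data.frame itself: every row is kept
unchanged <- subset(bestPartAreadmin, !CLAIM_NO %in% hrc78_clm_no)

# Comparing against the column inside it: matching rows are dropped
merged_clmno <- subset(bestPartAreadmin,
                       !CLAIM_NO %in% hrc78_clm_no$CLAIM_NO)
print(nrow(unchanged))
print(merged_clmno)
```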

Any further help is most appreciated!

--
View this message in context: 
http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4590851.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Merge function - Return NON matches

2012-04-26 Thread RHelpPlease
Hi there,
Thanks for your responses.  I haven't used/heard of dput() before.  I'm
looking it up & understanding how it works.

Thanks!

--
View this message in context: 
http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4591003.html
Sent from the R help mailing list archive at Nabble.com.



Re: [R] Merge function - Return NON matches

2012-04-27 Thread RHelpPlease
Hi there,
I've tried the noted solutions:

"If you do `no <- unlist(hrc78_clm_no)`, do you get a character vector 
of claim numbers you want to exclude? If so, then `subset(whatever, 
!CLAIM_NO %in% no)` should work."

I converted the CLAIM_NO list to a character, with

> hrc78_clmno_char <- format(as.character(hrc78_clm_no))
> is.character(hrc78_clmno_char)
[1] TRUE

Then I applied your code (above), which didn't work.  Thanks though!
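
A short sketch of what may have gone wrong here, using made-up values: as.character() applied to a whole data.frame collapses each column into one long string, whereas unlist() returns the individual claim numbers.

```r
# Hypothetical stand-in for the real one-column data.frame
hrc78_clm_no <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))

# One string per column, not one per claim number
collapsed <- format(as.character(hrc78_clm_no))
print(length(collapsed))

# unlist() keeps the values separate
no <- unlist(hrc78_clm_no, use.names = FALSE)
print(no)
```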

Thanks for the dput() help.  Here is truncated output of the list (its class
is data.frame; I call it a list for communication's sake) & the data.frame. 
Again, your help is most appreciated!

Goal: compare the list & data.frame.  Output the data.frame, but only the
rows whose CLAIM_NO value *does not match* any CLAIM_NO value in the list.

*The List*
# So you can see consistency in previously-mentioned variables
truncated_list <- hrc78_clm_no[1:100,]
truncated_list <- structure(list(CLAIM_NO = c(20L, 83L, 1440L, 4439L, 7002L,
9562L, 10463L, 12503L, 16195L, 
22987L, 30760L, 32108L, 32640L, 33045L, 36241L, 37091L, 37934L, 
38663L, 39456L, 40544L, 40630L, 40679L, 40734L, 43054L, 53483L, 
54155L, 56151L, 58113L, 61050L, 62056L, 63014L, 68486L, 68541L, 
69298L, 69983L, 73379L, 76810L, 79975L, 91124L, 97697L, 100524L, 
105808L, 112659L, 112955L, 113422L, 114522L, 124159L, 133566L, 
135167L, 137387L, 137954L, 138186L, 144574L, 148573L, 150013L, 
152193L, 154680L, 155414L, 165954L, 171223L, 175077L, 176359L, 
177656L, 178155L, 182250L, 182393L, 182832L, 184245L, 185542L, 
186038L, 186087L, 186098L, 186294L, 186550L, 186897L, 187025L, 
190180L, 191472L, 192593L, 196207L, 196689L, 197372L, 197537L, 
197590L, 197730L, 197874L, 198294L, 198750L, 198823L, 199076L, 
199233L, 199284L, 199468L, 199661L, 199913L, 200150L, 200279L, 
200473L, 200927L, 202407L)), .Names = c("CLAIM_NO"), class = "data.frame",
row.names = c(NA, -100L))

*The (multi-column) data.frame, but greatly truncated*
truncated_dataframe <- bestPartAreadmin[1:25, 1:4]
truncated_dataframe <- structure(list(DESY_SORT_KEY = c(10193L,
10193L, 10193L, 
10574L, 10574L, 19213L, 19213L, 19213L, 100026636L, 
100040718L, 100055111L, 100060558L, 100060558L, 100060558L, 100072978L, 
100096346L, 100130451L, 100168782L, 100168782L, 100168782L, 100168782L, 
100168782L, 100168782L, 100174887L, 100177905L), PRVDR_NUM =
structure(c(1368L, 
1353L, 1406L, 149L, 149L, 1362L, 1393L, 1367L, 1557L, 1370L, 
1360L, 1362L, 1362L, 1362L, 1372L, 1358L, 193L, 196L, 196L, 61L, 
166L, 196L, 196L, 311L, 1363L), .Label = c("010001", "010006", 
"010015", "010016", "010029", "010033", "010034", "010035", "010039", 
"010040", "010046", "010049", "010083", "010092", "010108", "010131", 
"010149", "01S001", "01S033", "01S046", "01S145", "020001", "020006", 
"020012", "020017", "021306", "021311", "030002", "030006", "030007", 
"030010", "030011", "030012", "030013", "030014", "030016", "030023", 
"030024", "030030", "030033", "030036", "030037", "030038", "030043", 
"030055", "030061", "030062", "030064", "030065", "030067", "030069", 
"030078", "030083", "030085", "030087", "030088", "030089", "030092", 
"030093", "030100", "030101", "030102", "030103", "030105", "030108", 
"030110", "030111", "030114", "030115", "030117", "030118", "030119", 
"030120", "030121", "030122", "030123", "030126", "030128", "031300", 
"031305", "031311", "032000", "032001", "032002", "032006", "033025", 
"033028", "033029", "033032", "033034", "033036", "034004", "034013", 
"034020", "034024", "03S002", "03S006", "03S007", "03S016", "03S022", 
"03S023", "03S089", "03T002", "03T055", "03T061", "03T069", "03T093", 
"03T103", "03T114", "03T117", "03T126", "040004", "040007", "040010", 
"040011", "040016", "040022", "040026", "040027", "040029", "040036", 
"040041", "040047", "040055", "040062", "040072", "040080", "040084", 
"040088", "040091", "040114", "040118", "040119", "043028", "044005", 
"04S027", "04S084", "04T041", "04T062", "04T119", "050002", "050006", 
"050007", "050008", "050009", "050013", "050014", "050016", "050017", 
"050018", "050022", "050024", "050025", "050026", "050030", "050036", 
"050038", "050039", "050040", "050042", "050043", "050045", "050046", 
"050047", "050055", "050056", "050057", "050058", "050060", "050063", 
"050069", "050070", "050071", "050073", "050075", "050076", "050077", 
"050078", "050079", "050082", "050084", "050089", "050090", "050091", 
"050093", "050099", "050100", "050101", "050102", "050103", "050104", 
"050107", "050108", "050110", "050111", "050112", "050113", "050115", 
"050116", "050118", "050121", "050122", "050124", "050125", "050126", 
"050128", "050129", "050131", "050132", "050133", "050135", "050136", 
"050137", "050138", "050139", "050140", "050145", "050146", "050149", 
"050150", "050152", "050153", "050158", "050159", "050168", "050169", 
"050174", "050179", "050180", "050188", "050191", "050193", "050195", 
"050196", "050197", "050204", "050211", "050219", "050222", "050224", 
"050225", "050226", "050228", "050230",

Re: [R] Merge function - Return NON matches

2012-04-27 Thread RHelpPlease
Hi again,
Petr, your solution worked!

Thanks everyone for your input.  I'll look more into "setdiff."
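
Petr's reply is not reproduced in this archive, so the following is only a generic sketch of how setdiff can be used for this task, with made-up rows.

```r
# Hypothetical stand-ins for the real objects
hrc78_clm_no <- data.frame(CLAIM_NO = c(20L, 83L, 1440L))
bestPartAreadmin <- data.frame(
  CLAIM_NO    = c(20L, 21L, 22L, 83L, 84L),
  CLM_THRU_DT = c(20090323L, 20091124L, 20090401L, 20090527L, 20090921L)
)

# Claim numbers present in the data.frame but absent from the list
unmatched_ids <- setdiff(bestPartAreadmin$CLAIM_NO, hrc78_clm_no$CLAIM_NO)

# Keep the full rows belonging to those claim numbers
clm_no_nomatch <-
  bestPartAreadmin[bestPartAreadmin$CLAIM_NO %in% unmatched_ids, ]
print(clm_no_nomatch)
```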

Cheers!

--
View this message in context: 
http://r.789695.n4.nabble.com/Merge-function-Return-NON-matches-tp4590755p4593101.html
Sent from the R help mailing list archive at Nabble.com.
