842752|ASPIRIN DISP AAH|TABS|75MG|2.9 is taken directly from my y_data text
file.
If I search for ASPIRIN DISP AAH %in% y$name I get: FALSE. Despite the
fact that it is there in the text file that I loaded into y. There must be a
problem with my input.
tsf_data <- read.table("feb09_267_presc_items_tsf.txt", header = TRUE, sep =
"|", quote = "\"'",
dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA,
nrows = 3864284,
skip = 0, check.names = TRUE,fill=TRUE,
strip.white = TRUE, blank.lines.skip = TRUE,
comment.char = "#", allowEscapes = FALSE, flush = FALSE,
fileEncoding = "", encoding = "unknown")
Would fill=TRUE and strip.white=TRUE affect how the item_code columns are
matched? When I look at the file in a script editor I cannot see any
strange symbols or formatting? I have had to use fill otherwise I get the
following error message:
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
number of items read is not a multiple of the number of columns
Not sure if correcting this in read.table is affecting the join function of
sqldb?
Thanks,
Natalie
I also tried 842752
On Fri, Mar 12, 2010 at 6:05 PM, Gabor Grothendieck <[email protected]
> wrote:
> That is not so. 842752 does not exist in y$item_code and ASPIRIN has
> a code of 22730.
>
> > 842752 %in% y$item_code
> [1] FALSE
>
> > subset(y, name == "ASPIRIN")
> item_code name formulation_code strength bnf_code
> 850 22730 ASPIRIN TABS 300MG 4.7.1
> 855 22780 ASPIRIN PDR NULL 4.7.1
> 856 22790 ASPIRIN MIXT $ 4.7.1
>
>
> On Fri, Mar 12, 2010 at 12:51 PM, Natalie Van Zuydam
> <[email protected]> wrote:
> > ---------- Forwarded message ----------
> > From: Natalie Van Zuydam <[email protected]>
> > Date: Fri, Mar 12, 2010 at 5:49 PM
> > Subject: Re: [R] sqldf not joining all the fields
> > To: David Winsemius <[email protected]>
> >
> >
> > Dear David
> >
> > I'm not sure what the problem is as for every item code there is a
> > corresponding information in the y_data. For example 842752 from the
> x_data
> > corresponds to Aspirin in the y_data? Yet when I use sqldf to join the
> two
> > df's I get NA values in the columns from the y_data in z for 842752 item
> > code....is there something wrong with my sqldf code or something wrong
> with
> > the way I have inputed the data frames?
> >
> > Thanks for taking the time to help me,
> > Natalie
> >
> >
> >
> > On Fri, Mar 12, 2010 at 5:42 PM, David Winsemius <[email protected]
> >wrote:
> >
> >> If I assign the file input to y_data and change you sqldf to
> >>
> >> > z <- sqldf("select * from x_data left join y_data using
> (item_code)"); z
> >>
> >> I can replicate your result. Even after changing the types of the two
> >> item_code fields to match I still get the same result and when I see to
> what
> >> degree they share values I get:
> >>
> >> > sum(x_data$item_code %in% y_data$item_code)
> >> [1] 2
> >> > sum(y_data$item_code %in% x_data$item_code)
> >> [1] 2
> >>
> >>
> >> So why are you so sure they are "complete" as you claimed in your first
> >> email.
> >>
> >> --
> >> David.
> >>
> >>
> >>
> >> On Mar 12, 2010, at 12:29 PM, David Winsemius wrote:
> >>
> >> You have now given two different assignments to x_data and none to
> y_data:
> >>>
> >>> The str( from the file access offering:
> >>>
> >>> > str(x_data)
> >>> 'data.frame': 2848 obs. of 5 variables:
> >>> $ item_code : int 100 110 150 160 161 164 200 210 212 220 ...
> >>> $ name : chr "NEONACLEX K" "NEONACLEX" "MESORB" "ABSORBENT
> >>> CELLULOSE MESO" ...
> >>> $ formulation_code: chr "TABS" "TABS" "DRESS" "DRESS" ...
> >>> $ strength : chr "NULL" "5MG" "10CMX10CM" "10CMX10CM" ...
> >>> $ bnf_code : chr "2.2.8" "2.2.1" "20.3.1" "20.3.1" ...
> >>>
> >>> The str from assignment from the dput offering
> >>> > str(x_data)
> >>> 'data.frame': 10 obs. of 10 variables:
> >>> $ prochi : chr "CAO0000713" "CAO0000713" "CAO0000713"
> >>> "CAO0000713" ...
> >>> $ prescribed_date: chr "22/06/2001" "28/04/2000" "10/04/2000"
> >>> "03/07/2000" ...
> >>> $ dataMonth : chr "NULL" "NULL" "NULL" "NULL" ...
> >>> $ item_code : chr "842752" "7800" "842652" "842652" ...
> >>> $ res_seqno : chr "NULL" "NULL" "NULL" "NULL" ...
> >>> $ quantity : chr "60" "100G" "60" "60" ...
> >>> $ directions : chr "1/D" "A/TD" "1/D" "1/D" ...
> >>> $ no_of_packs : chr "NULL" "NULL" "NULL" "NULL" ...
> >>> $ datasource : chr "TSF" "TSF" "TSF" "TSF" ...
> >>> $ scan_ref_no : chr "NULL" "NULL" "NULL" "NULL" ...
> >>>
> >>> This code "worked", but it is not clear that the x-y assignments were
> >>> correct:
> >>>
> >>> x_data <- read.table(file="
> >>> http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt",
> header
> >>> = TRUE, sep = "|", quote = "\"'",
> >>> dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA,
> >>> nrows = 3864284,
> >>> skip = 0, check.names = TRUE,fill=TRUE,
> >>> strip.white = TRUE, blank.lines.skip = TRUE,
> >>> comment.char = "#", allowEscapes = FALSE, flush = FALSE,
> >>> fileEncoding = "", encoding = "unknown")
> >>>
> >>> --
> >>> David.
> >>>
> >>> On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote:
> >>>
> >>>
> >>>> The y_data file has over 9000 rows in it so I thought it would be more
> >>>> practical to give you the file to download....
> >>>> --
> >>>> View this message in context:
> >>>>
> http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
> >>>> Sent from the R help mailing list archive at Nabble.com.
> >>>>
> >>>> ______________________________________________
> >>>> [email protected] mailing list
> >>>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>>> PLEASE do read the posting guide
> >>>> http://www.R-project.org/posting-guide.html
> >>>> and provide commented, minimal, self-contained, reproducible code.
> >>>>
> >>>
> >>> David Winsemius, MD
> >>> West Hartford, CT
> >>>
> >>> ______________________________________________
> >>> [email protected] mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >>
> >> David Winsemius, MD
> >> West Hartford, CT
> >>
> >>
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > [email protected] mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
[[alternative HTML version deleted]]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.