Hi, I did quote the string and as I told read.table to strip.white I also tried it with no spaces. "ASPIRIN DISP AAH" %in% tsf_data$name [1] FALSE
"ASPIRINDISPAAH" %in% tsf_data$name [1] FALSE I have looked at the last lines of my y_data object and there is a problem with the file that I'm going to try to use perl to sort out if I y_data[2847:2848,] RONIC ACID|TABS|5MG|6.6.2\n652903|ALENDRONIC ACID Once Wee|TABS|70MG|1.1.1\n653000|AZATHIOPRINE|INJ|50MG|8.2.1\n653200|DORZOLAMIDE EYE|DROPS|2%|11.6\n653500|TOPIRAMATE|TABS|50MG|4.8.1\n653510|TOPIRAMATE|TABS|100MG|4.8.1\n653513|TOPIRAMATE|TABS|25MG|4.8.1\n653600|GUAR GUM SF|SACH|5G|6.1.2.3\n653700|TACALCITOL|OINT|NULL|13.5.2\n654000|COCODAMOL|CAPS|30/500MG|4.7.1\n654010|COCODAMOL|TABS|12.8/500M|4.7.1\n654020|COCODAMOL|SACH|30/500MG|4.7.1\n654300|ACAMPROSATE CALCIUM|TABS|333MG|4.10\n654400|ACECLOFENAC|TABS|100MG|10.1.1\n654500|TILUDRONIC ACID|TABS|200MG|6.6.2\n654600|TAMSULOSIN HCL MR|CAPS|400MCG|7.4.1\n654800|PENCICLOVIR|CREAM|1%|13.10.3\...... ........formulation_code strength bnf_code 2847 INJ 1MG/ML 3.4.3 2848 The output looks like this so it must be the input files and not an sqldf problem. I didn't see this in the script editor I was using. Natalie On Fri, Mar 12, 2010 at 7:05 PM, Dennis Murphy <djmu...@gmail.com> wrote: > Hi: > > Part of the problem is that your string contains spaces; in your example > below, you would need to quote the string, I believe, as in > > "ASPIRIN DISP AAH" %in% y$name [untested...] > > HTH, > Dennis > > > On Fri, Mar 12, 2010 at 10:59 AM, Natalie Van Zuydam <nvanzuy...@gmail.com > > wrote: > >> 842752|ASPIRIN DISP AAH|TABS|75MG|2.9 is taken directly from my y_data >> text >> file. >> >> If I search for ASPIRIN DISP AAH %in% y$name I get: FALSE. Despite the >> fact that it is there in the text file that I loaded into y. There must be >> a >> problem with my input. >> >> tsf_data <- read.table("feb09_267_presc_items_tsf.txt", header = TRUE, sep >> = >> "|", quote = "\"'", >> dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA, >> nrows = 3864284, >> skip = 0, check.names = TRUE,fill=TRUE, >> strip.white = TRUE, blank.lines.skip = TRUE, >> comment.char = "#", allowEscapes = FALSE, flush = FALSE, >> fileEncoding = "", encoding = "unknown") >> >> Would fill=TRUE and strip.white=TRUE affect how the item_code columns are >> matched? When I look at the file in a script editor I cannot see any >> strange symbols or formatting? I have had to use fill otherwise I get the >> following error message: >> >> Warning message: >> In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : >> number of items read is not a multiple of the number of columns >> >> Not sure if correcting this in read.table is affecting the join function >> of >> sqldb? >> >> Thanks, >> Natalie >> >> >> >> I also tried 842752 >> >> On Fri, Mar 12, 2010 at 6:05 PM, Gabor Grothendieck < >> ggrothendi...@gmail.com >> > wrote: >> >> > That is not so. 842752 does not exist in y$item_code and ASPIRIN has >> > a code of 22730. >> > >> > > 842752 %in% y$item_code >> > [1] FALSE >> > >> > > subset(y, name == "ASPIRIN") >> > item_code name formulation_code strength bnf_code >> > 850 22730 ASPIRIN TABS 300MG 4.7.1 >> > 855 22780 ASPIRIN PDR NULL 4.7.1 >> > 856 22790 ASPIRIN MIXT $ 4.7.1 >> > >> > >> > On Fri, Mar 12, 2010 at 12:51 PM, Natalie Van Zuydam >> > <nvanzuy...@gmail.com> wrote: >> > > ---------- Forwarded message ---------- >> > > From: Natalie Van Zuydam <nvanzuy...@gmail.com> >> > > Date: Fri, Mar 12, 2010 at 5:49 PM >> > > Subject: Re: [R] sqldf not joining all the fields >> > > To: David Winsemius <dwinsem...@comcast.net> >> > > >> > > >> > > Dear David >> > > >> > > I'm not sure what the problem is as for every item code there is a >> > > corresponding information in the y_data. For example 842752 from the >> > x_data >> > > corresponds to Aspirin in the y_data? Yet when I use sqldf to join >> the >> > two >> > > df's I get NA values in the columns from the y_data in z for 842752 >> item >> > > code....is there something wrong with my sqldf code or something wrong >> > with >> > > the way I have inputed the data frames? >> > > >> > > Thanks for taking the time to help me, >> > > Natalie >> > > >> > > >> > > >> > > On Fri, Mar 12, 2010 at 5:42 PM, David Winsemius < >> dwinsem...@comcast.net >> > >wrote: >> > > >> > >> If I assign the file input to y_data and change you sqldf to >> > >> >> > >> > z <- sqldf("select * from x_data left join y_data using >> > (item_code)"); z >> > >> >> > >> I can replicate your result. Even after changing the types of the two >> > >> item_code fields to match I still get the same result and when I see >> to >> > what >> > >> degree they share values I get: >> > >> >> > >> > sum(x_data$item_code %in% y_data$item_code) >> > >> [1] 2 >> > >> > sum(y_data$item_code %in% x_data$item_code) >> > >> [1] 2 >> > >> >> > >> >> > >> So why are you so sure they are "complete" as you claimed in your >> first >> > >> email. >> > >> >> > >> -- >> > >> David. >> > >> >> > >> >> > >> >> > >> On Mar 12, 2010, at 12:29 PM, David Winsemius wrote: >> > >> >> > >> You have now given two different assignments to x_data and none to >> > y_data: >> > >>> >> > >>> The str( from the file access offering: >> > >>> >> > >>> > str(x_data) >> > >>> 'data.frame': 2848 obs. of 5 variables: >> > >>> $ item_code : int 100 110 150 160 161 164 200 210 212 220 ... >> > >>> $ name : chr "NEONACLEX K" "NEONACLEX" "MESORB" >> "ABSORBENT >> > >>> CELLULOSE MESO" ... >> > >>> $ formulation_code: chr "TABS" "TABS" "DRESS" "DRESS" ... >> > >>> $ strength : chr "NULL" "5MG" "10CMX10CM" "10CMX10CM" ... >> > >>> $ bnf_code : chr "2.2.8" "2.2.1" "20.3.1" "20.3.1" ... >> > >>> >> > >>> The str from assignment from the dput offering >> > >>> > str(x_data) >> > >>> 'data.frame': 10 obs. of 10 variables: >> > >>> $ prochi : chr "CAO0000713" "CAO0000713" "CAO0000713" >> > >>> "CAO0000713" ... >> > >>> $ prescribed_date: chr "22/06/2001" "28/04/2000" "10/04/2000" >> > >>> "03/07/2000" ... >> > >>> $ dataMonth : chr "NULL" "NULL" "NULL" "NULL" ... >> > >>> $ item_code : chr "842752" "7800" "842652" "842652" ... >> > >>> $ res_seqno : chr "NULL" "NULL" "NULL" "NULL" ... >> > >>> $ quantity : chr "60" "100G" "60" "60" ... >> > >>> $ directions : chr "1/D" "A/TD" "1/D" "1/D" ... >> > >>> $ no_of_packs : chr "NULL" "NULL" "NULL" "NULL" ... >> > >>> $ datasource : chr "TSF" "TSF" "TSF" "TSF" ... >> > >>> $ scan_ref_no : chr "NULL" "NULL" "NULL" "NULL" ... >> > >>> >> > >>> This code "worked", but it is not clear that the x-y assignments >> were >> > >>> correct: >> > >>> >> > >>> x_data <- read.table(file=" >> > >>> http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt", >> > header >> > >>> = TRUE, sep = "|", quote = "\"'", >> > >>> dec = ".",as.is = TRUE,na.strings = "NA",colClasses = >> NA, >> > >>> nrows = 3864284, >> > >>> skip = 0, check.names = TRUE,fill=TRUE, >> > >>> strip.white = TRUE, blank.lines.skip = TRUE, >> > >>> comment.char = "#", allowEscapes = FALSE, flush = >> FALSE, >> > >>> fileEncoding = "", encoding = "unknown") >> > >>> >> > >>> -- >> > >>> David. >> > >>> >> > >>> On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote: >> > >>> >> > >>> >> > >>>> The y_data file has over 9000 rows in it so I thought it would be >> more >> > >>>> practical to give you the file to download.... >> > >>>> -- >> > >>>> View this message in context: >> > >>>> >> > >> http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html >> > >>>> Sent from the R help mailing list archive at Nabble.com. >> > >>>> >> > >>>> ______________________________________________ >> > >>>> R-help@r-project.org mailing list >> > >>>> https://stat.ethz.ch/mailman/listinfo/r-help >> > >>>> PLEASE do read the posting guide >> > >>>> http://www.R-project.org/posting-guide.html >> > >>>> and provide commented, minimal, self-contained, reproducible code. >> > >>>> >> > >>> >> > >>> David Winsemius, MD >> > >>> West Hartford, CT >> > >>> >> > >>> ______________________________________________ >> > >>> R-help@r-project.org mailing list >> > >>> https://stat.ethz.ch/mailman/listinfo/r-help >> > >>> PLEASE do read the posting guide >> > >>> http://www.R-project.org/posting-guide.html >> > >>> and provide commented, minimal, self-contained, reproducible code. >> > >>> >> > >> >> > >> David Winsemius, MD >> > >> West Hartford, CT >> > >> >> > >> >> > > >> > > [[alternative HTML version deleted]] >> > > >> > > ______________________________________________ >> > > R-help@r-project.org mailing list >> > > https://stat.ethz.ch/mailman/listinfo/r-help >> > > PLEASE do read the posting guide >> > http://www.R-project.org/posting-guide.html >> > > and provide commented, minimal, self-contained, reproducible code. >> > > >> > >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. >> > > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.