On Mar 12, 2010, at 2:15 PM, Natalie Van Zuydam wrote:

Hi,
I did quote the string and as I told read.table to strip.white I also tried
it with no spaces.
"ASPIRIN DISP AAH" %in% tsf_data$name
[1] FALSE

"ASPIRINDISPAAH" %in% tsf_data$name
[1] FALSE

I have looked at the last lines of my y_data object and there is a problem
with the file that I'm going to try to use perl to sort out

> max(tsf_data$item_code)
[1] 123612
Looking at the Firefox version of for this value I see line starting:
123612|MINIJET ADRENALINE 1.5"|INJ|100MCG/ML|2.7.3
... has unmatched '"'

--
David.


if I y_data[2847:2848,]
RONIC ACID|TABS|5MG|6.6.2\n652903|ALENDRONIC ACID Once
Wee|TABS|70MG|1.1.1\n653000|AZATHIOPRINE|INJ|50MG|8.2.1\n653200| DORZOLAMIDE EYE|DROPS|2%|11.6\n653500|TOPIRAMATE|TABS|50MG|4.8.1\n653510| TOPIRAMATE|TABS|100MG|4.8.1\n653513|TOPIRAMATE|TABS|25MG| 4.8.1\n653600|GUAR
GUM
SF|SACH|5G|6.1.2.3\n653700|TACALCITOL|OINT|NULL|13.5.2\n654000| COCODAMOL|CAPS|30/500MG|4.7.1\n654010|COCODAMOL|TABS|12.8/500M| 4.7.1\n654020|COCODAMOL|SACH|30/500MG|4.7.1\n654300|ACAMPROSATE CALCIUM|TABS|333MG|4.10\n654400|ACECLOFENAC|TABS|100MG| 10.1.1\n654500|TILUDRONIC
ACID|TABS|200MG|6.6.2\n654600|TAMSULOSIN HCL
MR|CAPS|400MCG|7.4.1\n654800|PENCICLOVIR|CREAM|1%|13.10.3\......
    ........formulation_code strength bnf_code
2847              INJ   1MG/ML    3.4.3
2848

The output looks like this so it must be the input files and not an sqldf
problem.  I didn't see this in the script editor I was using.

Natalie

On Fri, Mar 12, 2010 at 7:05 PM, Dennis Murphy <djmu...@gmail.com> wrote:

Hi:

Part of the problem is that your string contains spaces; in your example
below, you would need to quote the string, I believe, as in

"ASPIRIN DISP AAH" %in% y$name                        [untested...]

HTH,
Dennis


On Fri, Mar 12, 2010 at 10:59 AM, Natalie Van Zuydam <nvanzuy...@gmail.com
wrote:

842752|ASPIRIN DISP AAH|TABS|75MG|2.9 is taken directly from my y_data
text
file.

If I search for ASPIRIN DISP AAH %in% y$name I get: FALSE. Despite the fact that it is there in the text file that I loaded into y. There must be
a
problem with my input.

tsf_data <- read.table("feb09_267_presc_items_tsf.txt", header = TRUE, sep
=
"|", quote = "\"'",
dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA,
nrows = 3864284,
              skip = 0, check.names = TRUE,fill=TRUE,
              strip.white = TRUE, blank.lines.skip = TRUE,
comment.char = "#", allowEscapes = FALSE, flush = FALSE,
              fileEncoding = "", encoding = "unknown")

Would fill=TRUE and strip.white=TRUE affect how the item_code columns are matched? When I look at the file in a script editor I cannot see any strange symbols or formatting? I have had to use fill otherwise I get the
following error message:

Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
number of items read is not a multiple of the number of columns

Not sure if correcting this in read.table is affecting the join function
of
sqldb?

Thanks,
Natalie



I also tried 842752

On Fri, Mar 12, 2010 at 6:05 PM, Gabor Grothendieck <
ggrothendi...@gmail.com
wrote:

That is not so. 842752 does not exist in y$item_code and ASPIRIN has
a code of 22730.

842752 %in% y$item_code
[1] FALSE

subset(y, name == "ASPIRIN")
  item_code    name formulation_code strength bnf_code
850     22730 ASPIRIN             TABS    300MG    4.7.1
855     22780 ASPIRIN              PDR     NULL    4.7.1
856     22790 ASPIRIN             MIXT        $    4.7.1


On Fri, Mar 12, 2010 at 12:51 PM, Natalie Van Zuydam
<nvanzuy...@gmail.com> wrote:
---------- Forwarded message ----------
From: Natalie Van Zuydam <nvanzuy...@gmail.com>
Date: Fri, Mar 12, 2010 at 5:49 PM
Subject: Re: [R] sqldf not joining all the fields
To: David Winsemius <dwinsem...@comcast.net>


Dear David

I'm not sure what the problem is as for every item code there is a
corresponding information in the y_data. For example 842752 from the
x_data
corresponds to Aspirin in the y_data? Yet when I use sqldf to join
the
two
df's I get NA values in the columns from the y_data in z for 842752
item
code....is there something wrong with my sqldf code or something wrong
with
the way I have inputed the data frames?

Thanks for taking the time to help me,
Natalie



On Fri, Mar 12, 2010 at 5:42 PM, David Winsemius <
dwinsem...@comcast.net
wrote:

If I assign the file input to y_data and change you sqldf to

z  <- sqldf("select * from x_data left join y_data using
(item_code)"); z

I can replicate your result. Even after changing the types of the two item_code fields to match I still get the same result and when I see
to
what
degree they share values I get:

sum(x_data$item_code %in% y_data$item_code)
[1] 2
sum(y_data$item_code %in% x_data$item_code)
[1] 2


So why are you so sure they are "complete" as you claimed in your
first
email.

--
David.



On Mar 12, 2010, at 12:29 PM, David Winsemius wrote:

You have now given two different assignments to x_data and none to
y_data:

The str( from the file access offering:

str(x_data)
'data.frame':   2848 obs. of  5 variables:
$ item_code : int 100 110 150 160 161 164 200 210 212 220 ...
$ name            : chr  "NEONACLEX K" "NEONACLEX" "MESORB"
"ABSORBENT
CELLULOSE MESO" ...
$ formulation_code: chr  "TABS" "TABS" "DRESS" "DRESS" ...
$ strength : chr "NULL" "5MG" "10CMX10CM" "10CMX10CM" ...
$ bnf_code        : chr  "2.2.8" "2.2.1" "20.3.1" "20.3.1" ...

The str from assignment from the dput offering
str(x_data)
'data.frame':   10 obs. of  10 variables:
$ prochi         : chr  "CAO0000713" "CAO0000713" "CAO0000713"
"CAO0000713" ...
$ prescribed_date: chr  "22/06/2001" "28/04/2000" "10/04/2000"
"03/07/2000" ...
$ dataMonth      : chr  "NULL" "NULL" "NULL" "NULL" ...
$ item_code      : chr  "842752" "7800" "842652" "842652" ...
$ res_seqno      : chr  "NULL" "NULL" "NULL" "NULL" ...
$ quantity       : chr  "60" "100G" "60" "60" ...
$ directions     : chr  "1/D" "A/TD" "1/D" "1/D" ...
$ no_of_packs    : chr  "NULL" "NULL" "NULL" "NULL" ...
$ datasource     : chr  "TSF" "TSF" "TSF" "TSF" ...
$ scan_ref_no    : chr  "NULL" "NULL" "NULL" "NULL" ...

This code "worked", but it is not clear that the x-y assignments
were
correct:

x_data <- read.table(file="
http://n4.nabble.com/file/n1590804/ feb09_267_presc_items_tsf.txt",
header
= TRUE, sep = "|", quote = "\"'",
dec = ".",as.is = TRUE,na.strings = "NA",colClasses =
NA,
nrows = 3864284,
            skip = 0, check.names = TRUE,fill=TRUE,
            strip.white = TRUE, blank.lines.skip = TRUE,
            comment.char = "#", allowEscapes = FALSE, flush =
FALSE,
            fileEncoding = "", encoding = "unknown")

--
David.

On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote:


The y_data file has over 9000 rows in it so I thought it would be
more
practical to give you the file to download....
--
View this message in context:


http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Reply via email to