On Mar 12, 2010, at 2:15 PM, Natalie Van Zuydam wrote:
Hi,
I did quote the string and as I told read.table to strip.white I
also tried
it with no spaces.
"ASPIRIN DISP AAH" %in% tsf_data$name
[1] FALSE
"ASPIRINDISPAAH" %in% tsf_data$name
[1] FALSE
I have looked at the last lines of my y_data object and there is a
problem
with the file that I'm going to try to use perl to sort out
> max(tsf_data$item_code)
[1] 123612
Looking at the Firefox version of for this value I see line starting:
123612|MINIJET ADRENALINE 1.5"|INJ|100MCG/ML|2.7.3
... has unmatched '"'
--
David.
if I y_data[2847:2848,]
RONIC ACID|TABS|5MG|6.6.2\n652903|ALENDRONIC ACID Once
Wee|TABS|70MG|1.1.1\n653000|AZATHIOPRINE|INJ|50MG|8.2.1\n653200|
DORZOLAMIDE
EYE|DROPS|2%|11.6\n653500|TOPIRAMATE|TABS|50MG|4.8.1\n653510|
TOPIRAMATE|TABS|100MG|4.8.1\n653513|TOPIRAMATE|TABS|25MG|
4.8.1\n653600|GUAR
GUM
SF|SACH|5G|6.1.2.3\n653700|TACALCITOL|OINT|NULL|13.5.2\n654000|
COCODAMOL|CAPS|30/500MG|4.7.1\n654010|COCODAMOL|TABS|12.8/500M|
4.7.1\n654020|COCODAMOL|SACH|30/500MG|4.7.1\n654300|ACAMPROSATE
CALCIUM|TABS|333MG|4.10\n654400|ACECLOFENAC|TABS|100MG|
10.1.1\n654500|TILUDRONIC
ACID|TABS|200MG|6.6.2\n654600|TAMSULOSIN HCL
MR|CAPS|400MCG|7.4.1\n654800|PENCICLOVIR|CREAM|1%|13.10.3\......
........formulation_code strength bnf_code
2847 INJ 1MG/ML 3.4.3
2848
The output looks like this so it must be the input files and not an
sqldf
problem. I didn't see this in the script editor I was using.
Natalie
On Fri, Mar 12, 2010 at 7:05 PM, Dennis Murphy <djmu...@gmail.com>
wrote:
Hi:
Part of the problem is that your string contains spaces; in your
example
below, you would need to quote the string, I believe, as in
"ASPIRIN DISP AAH" %in% y$name [untested...]
HTH,
Dennis
On Fri, Mar 12, 2010 at 10:59 AM, Natalie Van Zuydam <nvanzuy...@gmail.com
wrote:
842752|ASPIRIN DISP AAH|TABS|75MG|2.9 is taken directly from my
y_data
text
file.
If I search for ASPIRIN DISP AAH %in% y$name I get: FALSE.
Despite the
fact that it is there in the text file that I loaded into y. There
must be
a
problem with my input.
tsf_data <- read.table("feb09_267_presc_items_tsf.txt", header =
TRUE, sep
=
"|", quote = "\"'",
dec = ".",as.is = TRUE,na.strings = "NA",colClasses
= NA,
nrows = 3864284,
skip = 0, check.names = TRUE,fill=TRUE,
strip.white = TRUE, blank.lines.skip = TRUE,
comment.char = "#", allowEscapes = FALSE, flush =
FALSE,
fileEncoding = "", encoding = "unknown")
Would fill=TRUE and strip.white=TRUE affect how the item_code
columns are
matched? When I look at the file in a script editor I cannot see
any
strange symbols or formatting? I have had to use fill otherwise I
get the
following error message:
Warning message:
In scan(file, what, nmax, sep, dec, quote, skip, nlines,
na.strings, :
number of items read is not a multiple of the number of columns
Not sure if correcting this in read.table is affecting the join
function
of
sqldb?
Thanks,
Natalie
I also tried 842752
On Fri, Mar 12, 2010 at 6:05 PM, Gabor Grothendieck <
ggrothendi...@gmail.com
wrote:
That is not so. 842752 does not exist in y$item_code and ASPIRIN
has
a code of 22730.
842752 %in% y$item_code
[1] FALSE
subset(y, name == "ASPIRIN")
item_code name formulation_code strength bnf_code
850 22730 ASPIRIN TABS 300MG 4.7.1
855 22780 ASPIRIN PDR NULL 4.7.1
856 22790 ASPIRIN MIXT $ 4.7.1
On Fri, Mar 12, 2010 at 12:51 PM, Natalie Van Zuydam
<nvanzuy...@gmail.com> wrote:
---------- Forwarded message ----------
From: Natalie Van Zuydam <nvanzuy...@gmail.com>
Date: Fri, Mar 12, 2010 at 5:49 PM
Subject: Re: [R] sqldf not joining all the fields
To: David Winsemius <dwinsem...@comcast.net>
Dear David
I'm not sure what the problem is as for every item code there is a
corresponding information in the y_data. For example 842752
from the
x_data
corresponds to Aspirin in the y_data? Yet when I use sqldf to
join
the
two
df's I get NA values in the columns from the y_data in z for
842752
item
code....is there something wrong with my sqldf code or something
wrong
with
the way I have inputed the data frames?
Thanks for taking the time to help me,
Natalie
On Fri, Mar 12, 2010 at 5:42 PM, David Winsemius <
dwinsem...@comcast.net
wrote:
If I assign the file input to y_data and change you sqldf to
z <- sqldf("select * from x_data left join y_data using
(item_code)"); z
I can replicate your result. Even after changing the types of
the two
item_code fields to match I still get the same result and when
I see
to
what
degree they share values I get:
sum(x_data$item_code %in% y_data$item_code)
[1] 2
sum(y_data$item_code %in% x_data$item_code)
[1] 2
So why are you so sure they are "complete" as you claimed in your
first
email.
--
David.
On Mar 12, 2010, at 12:29 PM, David Winsemius wrote:
You have now given two different assignments to x_data and none
to
y_data:
The str( from the file access offering:
str(x_data)
'data.frame': 2848 obs. of 5 variables:
$ item_code : int 100 110 150 160 161 164 200 210 212
220 ...
$ name : chr "NEONACLEX K" "NEONACLEX" "MESORB"
"ABSORBENT
CELLULOSE MESO" ...
$ formulation_code: chr "TABS" "TABS" "DRESS" "DRESS" ...
$ strength : chr "NULL" "5MG" "10CMX10CM"
"10CMX10CM" ...
$ bnf_code : chr "2.2.8" "2.2.1" "20.3.1" "20.3.1" ...
The str from assignment from the dput offering
str(x_data)
'data.frame': 10 obs. of 10 variables:
$ prochi : chr "CAO0000713" "CAO0000713" "CAO0000713"
"CAO0000713" ...
$ prescribed_date: chr "22/06/2001" "28/04/2000" "10/04/2000"
"03/07/2000" ...
$ dataMonth : chr "NULL" "NULL" "NULL" "NULL" ...
$ item_code : chr "842752" "7800" "842652" "842652" ...
$ res_seqno : chr "NULL" "NULL" "NULL" "NULL" ...
$ quantity : chr "60" "100G" "60" "60" ...
$ directions : chr "1/D" "A/TD" "1/D" "1/D" ...
$ no_of_packs : chr "NULL" "NULL" "NULL" "NULL" ...
$ datasource : chr "TSF" "TSF" "TSF" "TSF" ...
$ scan_ref_no : chr "NULL" "NULL" "NULL" "NULL" ...
This code "worked", but it is not clear that the x-y assignments
were
correct:
x_data <- read.table(file="
http://n4.nabble.com/file/n1590804/
feb09_267_presc_items_tsf.txt",
header
= TRUE, sep = "|", quote = "\"'",
dec = ".",as.is = TRUE,na.strings =
"NA",colClasses =
NA,
nrows = 3864284,
skip = 0, check.names = TRUE,fill=TRUE,
strip.white = TRUE, blank.lines.skip = TRUE,
comment.char = "#", allowEscapes = FALSE, flush =
FALSE,
fileEncoding = "", encoding = "unknown")
--
David.
On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote:
The y_data file has over 9000 rows in it so I thought it
would be
more
practical to give you the file to download....
--
View this message in context:
http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
Sent from the R help mailing list archive at Nabble.com.
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible
code.
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible
code.
David Winsemius, MD
West Hartford, CT
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.