Re: [R] Fwd: sqldf not joining all the fields

Natalie Van Zuydam Fri, 12 Mar 2010 11:16:03 -0800

Hi,
I did quote the string and as I told read.table to strip.white  I also tried
it with no spaces.
 "ASPIRIN DISP AAH" %in% tsf_data$name
[1] FALSE


"ASPIRINDISPAAH" %in% tsf_data$name
[1] FALSE

I have looked at the last lines of my y_data object and there is a problem
with the file that I'm going to try to use perl to sort out


if I y_data[2847:2848,]
RONIC ACID|TABS|5MG|6.6.2\n652903|ALENDRONIC ACID Once
Wee|TABS|70MG|1.1.1\n653000|AZATHIOPRINE|INJ|50MG|8.2.1\n653200|DORZOLAMIDE
EYE|DROPS|2%|11.6\n653500|TOPIRAMATE|TABS|50MG|4.8.1\n653510|TOPIRAMATE|TABS|100MG|4.8.1\n653513|TOPIRAMATE|TABS|25MG|4.8.1\n653600|GUAR
GUM
SF|SACH|5G|6.1.2.3\n653700|TACALCITOL|OINT|NULL|13.5.2\n654000|COCODAMOL|CAPS|30/500MG|4.7.1\n654010|COCODAMOL|TABS|12.8/500M|4.7.1\n654020|COCODAMOL|SACH|30/500MG|4.7.1\n654300|ACAMPROSATE
CALCIUM|TABS|333MG|4.10\n654400|ACECLOFENAC|TABS|100MG|10.1.1\n654500|TILUDRONIC
ACID|TABS|200MG|6.6.2\n654600|TAMSULOSIN HCL
MR|CAPS|400MCG|7.4.1\n654800|PENCICLOVIR|CREAM|1%|13.10.3\......
     ........formulation_code strength bnf_code
2847              INJ   1MG/ML    3.4.3
2848

The output looks like this so it must be the input files and not an sqldf
problem.  I didn't see this in the script editor I was using.

Natalie

On Fri, Mar 12, 2010 at 7:05 PM, Dennis Murphy <djmu...@gmail.com> wrote:

> Hi:
>
> Part of the problem is that your string contains spaces; in your example
> below, you would need to quote the string, I believe, as in
>
> "ASPIRIN DISP AAH" %in% y$name                        [untested...]
>
> HTH,
> Dennis
>
>
> On Fri, Mar 12, 2010 at 10:59 AM, Natalie Van Zuydam <nvanzuy...@gmail.com
> > wrote:
>
>> 842752|ASPIRIN DISP AAH|TABS|75MG|2.9  is taken directly from my y_data
>> text
>> file.
>>
>> If I search for ASPIRIN DISP AAH %in% y$name  I get: FALSE.  Despite the
>> fact that it is there in the text file that I loaded into y. There must be
>> a
>> problem with my input.
>>
>> tsf_data <- read.table("feb09_267_presc_items_tsf.txt", header = TRUE, sep
>> =
>> "|", quote = "\"'",
>>                dec = ".",as.is = TRUE,na.strings = "NA",colClasses = NA,
>> nrows = 3864284,
>>                skip = 0, check.names = TRUE,fill=TRUE,
>>                strip.white = TRUE, blank.lines.skip = TRUE,
>>                comment.char = "#", allowEscapes = FALSE, flush = FALSE,
>>                fileEncoding = "", encoding = "unknown")
>>
>> Would fill=TRUE and strip.white=TRUE affect how the item_code columns are
>> matched?  When I look at the file in a script editor I cannot see any
>> strange symbols or formatting?  I have had to use fill otherwise I get the
>> following error message:
>>
>> Warning message:
>> In scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  :
>>  number of items read is not a multiple of the number of columns
>>
>> Not sure if correcting this in read.table is affecting the join function
>> of
>> sqldb?
>>
>> Thanks,
>> Natalie
>>
>>
>>
>> I also tried 842752
>>
>> On Fri, Mar 12, 2010 at 6:05 PM, Gabor Grothendieck <
>> ggrothendi...@gmail.com
>> > wrote:
>>
>> > That is not so.  842752 does not exist in y$item_code and ASPIRIN has
>> > a code of 22730.
>> >
>> > > 842752 %in% y$item_code
>> > [1] FALSE
>> >
>> > > subset(y, name == "ASPIRIN")
>> >    item_code    name formulation_code strength bnf_code
>> > 850     22730 ASPIRIN             TABS    300MG    4.7.1
>> > 855     22780 ASPIRIN              PDR     NULL    4.7.1
>> > 856     22790 ASPIRIN             MIXT        $    4.7.1
>> >
>> >
>> > On Fri, Mar 12, 2010 at 12:51 PM, Natalie Van Zuydam
>> > <nvanzuy...@gmail.com> wrote:
>> > > ---------- Forwarded message ----------
>> > > From: Natalie Van Zuydam <nvanzuy...@gmail.com>
>> > > Date: Fri, Mar 12, 2010 at 5:49 PM
>> > > Subject: Re: [R] sqldf not joining all the fields
>> > > To: David Winsemius <dwinsem...@comcast.net>
>> > >
>> > >
>> > > Dear David
>> > >
>> > > I'm not sure what the problem is as for every item code there is a
>> > > corresponding information in the y_data.  For example 842752 from the
>> > x_data
>> > > corresponds to Aspirin in the y_data?  Yet when I use sqldf to join
>> the
>> > two
>> > > df's I get NA values in the columns from the y_data in z for 842752
>> item
>> > > code....is there something wrong with my sqldf code or something wrong
>> > with
>> > > the way I have inputed the data frames?
>> > >
>> > > Thanks for taking the time to help me,
>> > > Natalie
>> > >
>> > >
>> > >
>> > > On Fri, Mar 12, 2010 at 5:42 PM, David Winsemius <
>> dwinsem...@comcast.net
>> > >wrote:
>> > >
>> > >> If I assign the file input to y_data and change you sqldf to
>> > >>
>> > >> > z  <- sqldf("select * from x_data left join y_data using
>> > (item_code)"); z
>> > >>
>> > >> I can replicate your result. Even after changing the types of the two
>> > >> item_code fields to match I still get the same result and when I see
>> to
>> > what
>> > >> degree they share values I get:
>> > >>
>> > >> > sum(x_data$item_code %in% y_data$item_code)
>> > >> [1] 2
>> > >> > sum(y_data$item_code %in% x_data$item_code)
>> > >> [1] 2
>> > >>
>> > >>
>> > >> So why are you so sure they are "complete" as you claimed in your
>> first
>> > >> email.
>> > >>
>> > >> --
>> > >> David.
>> > >>
>> > >>
>> > >>
>> > >> On Mar 12, 2010, at 12:29 PM, David Winsemius wrote:
>> > >>
>> > >>  You have now given two different assignments to x_data and none to
>> > y_data:
>> > >>>
>> > >>> The str( from the file access offering:
>> > >>>
>> > >>> > str(x_data)
>> > >>> 'data.frame':   2848 obs. of  5 variables:
>> > >>> $ item_code       : int  100 110 150 160 161 164 200 210 212 220 ...
>> > >>> $ name            : chr  "NEONACLEX K" "NEONACLEX" "MESORB"
>> "ABSORBENT
>> > >>> CELLULOSE MESO" ...
>> > >>> $ formulation_code: chr  "TABS" "TABS" "DRESS" "DRESS" ...
>> > >>> $ strength        : chr  "NULL" "5MG" "10CMX10CM" "10CMX10CM" ...
>> > >>> $ bnf_code        : chr  "2.2.8" "2.2.1" "20.3.1" "20.3.1" ...
>> > >>>
>> > >>> The str from assignment from the dput offering
>> > >>> > str(x_data)
>> > >>> 'data.frame':   10 obs. of  10 variables:
>> > >>> $ prochi         : chr  "CAO0000713" "CAO0000713" "CAO0000713"
>> > >>> "CAO0000713" ...
>> > >>> $ prescribed_date: chr  "22/06/2001" "28/04/2000" "10/04/2000"
>> > >>> "03/07/2000" ...
>> > >>> $ dataMonth      : chr  "NULL" "NULL" "NULL" "NULL" ...
>> > >>> $ item_code      : chr  "842752" "7800" "842652" "842652" ...
>> > >>> $ res_seqno      : chr  "NULL" "NULL" "NULL" "NULL" ...
>> > >>> $ quantity       : chr  "60" "100G" "60" "60" ...
>> > >>> $ directions     : chr  "1/D" "A/TD" "1/D" "1/D" ...
>> > >>> $ no_of_packs    : chr  "NULL" "NULL" "NULL" "NULL" ...
>> > >>> $ datasource     : chr  "TSF" "TSF" "TSF" "TSF" ...
>> > >>> $ scan_ref_no    : chr  "NULL" "NULL" "NULL" "NULL" ...
>> > >>>
>> > >>> This code "worked", but it is not clear that the x-y assignments
>> were
>> > >>> correct:
>> > >>>
>> > >>> x_data <- read.table(file="
>> > >>> http://n4.nabble.com/file/n1590804/feb09_267_presc_items_tsf.txt";,
>> > header
>> > >>> = TRUE, sep = "|", quote = "\"'",
>> > >>>              dec = ".",as.is = TRUE,na.strings = "NA",colClasses =
>> NA,
>> > >>> nrows = 3864284,
>> > >>>              skip = 0, check.names = TRUE,fill=TRUE,
>> > >>>              strip.white = TRUE, blank.lines.skip = TRUE,
>> > >>>              comment.char = "#", allowEscapes = FALSE, flush =
>> FALSE,
>> > >>>              fileEncoding = "", encoding = "unknown")
>> > >>>
>> > >>> --
>> > >>> David.
>> > >>>
>> > >>> On Mar 12, 2010, at 12:23 PM, Newbie19_02 wrote:
>> > >>>
>> > >>>
>> > >>>> The y_data file has over 9000 rows in it so I thought it would be
>> more
>> > >>>> practical to give you the file to download....
>> > >>>> --
>> > >>>> View this message in context:
>> > >>>>
>> >
>> http://n4.nabble.com/sqldf-not-joining-all-the-fields-tp1590786p1590833.html
>> > >>>> Sent from the R help mailing list archive at Nabble.com.
>> > >>>>
>> > >>>> ______________________________________________
>> > >>>> R-help@r-project.org mailing list
>> > >>>> https://stat.ethz.ch/mailman/listinfo/r-help
>> > >>>> PLEASE do read the posting guide
>> > >>>> http://www.R-project.org/posting-guide.html
>> > >>>> and provide commented, minimal, self-contained, reproducible code.
>> > >>>>
>> > >>>
>> > >>> David Winsemius, MD
>> > >>> West Hartford, CT
>> > >>>
>> > >>> ______________________________________________
>> > >>> R-help@r-project.org mailing list
>> > >>> https://stat.ethz.ch/mailman/listinfo/r-help
>> > >>> PLEASE do read the posting guide
>> > >>> http://www.R-project.org/posting-guide.html
>> > >>> and provide commented, minimal, self-contained, reproducible code.
>> > >>>
>> > >>
>> > >> David Winsemius, MD
>> > >> West Hartford, CT
>> > >>
>> > >>
>> > >
>> > >        [[alternative HTML version deleted]]
>> > >
>> > > ______________________________________________
>> > > R-help@r-project.org mailing list
>> > > https://stat.ethz.ch/mailman/listinfo/r-help
>> > > PLEASE do read the posting guide
>> > http://www.R-project.org/posting-guide.html
>> > > and provide commented, minimal, self-contained, reproducible code.
>> > >
>> >
>>
>>        [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
>

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Fwd: sqldf not joining all the fields

Reply via email to