sales <- structure(list(
  ProductCategoryName = structure(c(6L, 6L, 2L, 2L, 2L, 7L, 3L, 3L, 3L, 3L,
    3L, 3L, 3L, 3L, 3L, 2L, 2L, 2L, 2L, 2L),
    .Label = c("Audio", "Cameras and camcorders ", "Cell phones",
      "Computers", "Games and Toys", "Home Appliances",
      "Music, Movies and Audio Books", "TV and Video"), class = "factor"),
  ProductSubcategory = structure(c(26L, 26L, 11L, 12L, 12L, 21L, 27L, 27L,
    27L, 27L, 27L, 27L, 27L, 27L, 27L, 12L, 12L, 12L, 12L, 12L),
    .Label = c("Air Conditioners", "Bluetooth Headphones", "Boxed Games",
      "Camcorders", "Cameras & Camcorders Accessories", "Car Video",
      "Cell phones Accessories", "Coffee Machines", "Computers Accessories",
      "Desktops", "Digital Cameras", "Digital SLR Cameras", "Download Games",
      "Fans", "Home & Office Phones", "Home Theater System", "Lamps",
      "Laptops", "Microwaves", "Monitors", "Movie DVD", "MP4&MP3",
      "Printers, Scanners & Fax", "Projectors & Screens", "Recording Pen",
      "Refrigerators", "Smart phones & PDAs ", "Televisions",
      "Touch Screen Phones ", "VCD & DVD", "Washers & Dryers",
      "Water Heaters"), class = "factor"),
  Product = structure(c(1L, 1L, 2L, 3L, 3L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
    5L, 5L, 6L, 6L, 6L, 6L, 6L),
    .Label = c("Fabrikam Refrigerator 4.6CuFt E2800 Grey",
      "A. Datum Consumer Digital Camera M300 Orange",
      "Contoso SLR Camera M144 Gold", "SV DVD Movies E100 Yellow",
      "The Phone Company Smart phones 160x160 M26 White",
      "Fabrikam SLR Camera 35 X358 Gold",
      "WWI Wireless Transmitter and Bluetooth Headphones X250 White"),
    class = "factor"),
  Region = structure(c(30L, 30L, 30L, 30L, 30L, 30L, 2L, 2L, 2L, 2L, 2L, 2L,
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L),
    .Label = c("Armenia", "Australia", "Bhutan", "Canada", "China", "France",
      "Germany", "Germany ", "Greece ", "India", "Iran", "Ireland ",
      "Italy    ", "Japan", "Kyrgyzstan", "Pakistan", "Poland ", "Portugal",
      "Russia", "Singapore", "South Korea", "Spain", "Switzerland ", "Syria",
      "Taiwan", "Thailand", "the Netherlands", "Turkmenistan",
      "United Kingdom", "United States"), class = "factor"),
  Age = structure(c(32L, 31L, 30L, 40L, 40L, 36L, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA),
    .Label = c("34", "35", "36", "37", "38", "39", "40", "41", "42", "43",
      "44", "45", "46", "47", "48", "49", "50", "51", "52", "53",
      "54", "55", "56", "57", "58", "59", "60", "61", "62", "63",
      "64", "65", "66", "67", "68", "69", "70", "71", "72", "73",
      "74", "75", "76", "77", "78", "79", "80", "81", "82", "83",
      "84", "85", "86", "87", "88", "89", "90", "91", "92", "93",
      "94", "95", "96", "97", "98", "99", "101", "102", "103", "104"),
    class = "factor"),
  IncomeGroup = structure(c(3L, 3L, 3L, 3L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
    .Label = c("High", "Low", "Moderate"), class = "factor"),
  BrandName = structure(c(6L, 6L, 1L, 4L, 4L, 12L, 14L, 14L, 14L, 14L, 14L,
    14L, 14L, 14L, 14L, 6L, 6L, 6L, 6L, 6L),
    .Label = c("A. Datum", "Adventure Works", "Adventure Works ", "Contoso",
      "Contoso ", "Fabrikam", "Fabrikam  ", "Litware", "Litware ",
      "Northwind Traders", "Proseware", "Southridge Video", "Tailspin Toys",
      "The Phone Company", "Wide World Importers"), class = "factor"),
  MaritalStatus = structure(c(2L, 1L, 1L, 1L, 1L, 2L, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA),
    .Label = c("M", "S"), class = "factor"),
  Gender = structure(c(1L, 1L, 1L, 1L, 2L, 2L, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA),
    .Label = c("F", "M"), class = "factor"),
  TotalChildren = structure(c(3L, 3L, 5L, 4L, 4L, 6L, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA),
    .Label = c("0", "1", "2", "3", "4", "5"), class = "factor"),
  NumberChildrenAtHome = structure(c(2L, 2L, 3L, 1L, 1L, 1L, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
    .Label = c("0", "1", "2", "3", "4", "5"), class = "factor"),
  Education = structure(c(4L, 4L, 4L, 2L, 2L, 5L, NA, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA),
    .Label = c("Bachelors", "Graduate Degree", "High School",
      "Partial College", "Partial High School"), class = "factor"),
  Occupation = structure(c(4L, 4L, 4L, 2L, 2L, 5L, NA, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA),
    .Label = c("Clerical", "Management", "Manual", "Professional",
      "Skilled Manual"), class = "factor"),
  HouseOwnerFlag = structure(c(1L, 2L, 2L, 2L, 2L, 1L, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA),
    .Label = c("0", "1"), class = "factor"),
  NumberCarsOwned = structure(c(3L, 2L, 3L, 3L, 3L, 4L, NA, NA, NA, NA, NA,
    NA, NA, NA, NA, NA, NA, NA, NA, NA),
    .Label = c("0", "1", "2", "3", "4"), class = "factor")),
  .Names = c("ProductCategoryName", "ProductSubcategory", "Product",
    "Region", "Age", "IncomeGroup", "BrandName", "MaritalStatus", "Gender",
    "TotalChildren", "NumberChildrenAtHome", "Education", "Occupation",
    "HouseOwnerFlag", "NumberCarsOwned"),
  row.names = c(NA, 20L), class = "data.frame")

Hi all, 

I have a question about the arules package in R. I hope the example tables are 
readable in your email; otherwise you can view them in question.txt in the 
attachment.

With the apriori function in the arules package, I want the resulting rules to 
contain only these two items on the LHS: HouseOwnerFlag=0 and HouseOwnerFlag=1. 
The RHS should contain only items from the column Product. For instance:

  lhs                   rhs                                      support   confidence lift
1 {HouseOwnerFlag=0} => {Product=SV 16xDVD M360 Black}           0.2500000 0.2500000  1.000000
2 {HouseOwnerFlag=1} => {Product=Adventure Works 26" 720p}       0.2500000 0.2500000  1.000000
3 {HouseOwnerFlag=0} => {Product=Litware Wall Lamp E3015 Silver} 0.1666667 0.3333333  1.333333
4 {HouseOwnerFlag=1} => {Product=Contoso Coffee Maker 5C E0900}  0.1666667 0.3333333  1.333333

So now I use the following: 
rules <- apriori(sales,
                 parameter = list(support = 0.01, confidence = 0.8, minlen = 2),
                 appearance = list(lhs = c("HouseOwnerFlag=0", "HouseOwnerFlag=1")))

Then I use this to ensure that only the Product column is on the RHS: 
inspect( subset( rules, subset = rhs %pin% "Product=" ) )
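
As a rough illustration of this kind of post-filtering, here is a minimal 
sketch on a made-up toy data frame (the `toy` data and its values are 
hypothetical, not taken from the sales data). It assumes that subset() on a 
rules object can combine conditions on both lhs and rhs, using size(), %in% 
and %pin% from arules:

```r
library(arules)

# Hypothetical toy data standing in for `sales`; every column is a factor.
toy <- data.frame(
  HouseOwnerFlag = factor(c("0", "1", "1", "0", "1", "0")),
  Gender         = factor(c("F", "M", "M", "F", "M", "F")),
  Product        = factor(c("Lamp", "TV", "TV", "Lamp", "Camera", "Lamp"))
)
trans <- as(toy, "transactions")

# Mine all rules without any appearance restriction.
all_rules <- apriori(
  trans,
  parameter = list(support = 0.01, confidence = 0.1, minlen = 2),
  control   = list(verbose = FALSE)
)

# Keep rules whose LHS is a single HouseOwnerFlag item and whose RHS
# comes from the Product column.
wanted <- subset(
  all_rules,
  subset = size(lhs) == 1 &
    lhs %in% c("HouseOwnerFlag=0", "HouseOwnerFlag=1") &
    rhs %pin% "Product="
)
inspect(wanted)
```

Filtering after mining still generates every rule first, so on a very large 
dataset it may be slow or memory-hungry.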

The outcome is like this (for the sake of readability, I omitted the columns 
for support, lift, and confidence):
  lhs                                                                     rhs
1 {ProductKey=153, IncomeGroup=Moderate, BrandName=Adventure Works }   => {Product=SV 16xDVD M360 Black}
2 {ProductKey=176, MaritalStatus=M, ProductCategoryName=TV and Video } => {Product=Adventure Works 26" 720p}
3 {BrandName=Southridge Video, NumberChildrenAtHome=0 }                => {Product=Litware Wall Lamp E3015 Silver}
4 {HouseOwnerFlag=1, BrandName=Southridge Video, ProductKey=170 }      => {Product=Contoso Coffee Maker 5C E0900}

So apparently the LHS can contain items from every possible column, not just 
HouseOwnerFlag as I specified. I see that I can put default="rhs" in the 
apriori function to prevent this, like so: 
rules <- apriori(sales,
                 parameter = list(support = 0.001, confidence = 0.5, minlen = 2),
                 appearance = list(lhs = c("HouseOwnerFlag=0", "HouseOwnerFlag=1"),
                                   default = "rhs"))

Then upon inspecting (without the subset part, just inspect(rules)), there are 
far fewer rules (7) than before, but they do indeed contain only 
HouseOwnerFlag on the LHS:

  lhs                   rhs                       support   confidence lift
1 {HouseOwnerFlag=0} => {MaritalStatus=S}         0.2500000 0.2500000  1.000000
2 {HouseOwnerFlag=1} => {Gender=M}                0.2500000 0.2500000  1.000000
3 {HouseOwnerFlag=0} => {NumberChildrenAtHome=0}  0.1666667 0.3333333  1.333333
4 {HouseOwnerFlag=1} => {Gender=M}                0.1666667 0.3333333  1.333333

However, nothing from the column Product appears on the RHS, so there is no 
point in inspecting it with subset, as of course it would return null. I tested 
it several times with different support values to see whether Product would 
appear, but the same 7 rules remain.

So my question is, how can I specify both the LHS (HouseOwnerFlag) and RHS 
(Product)? What am I doing wrong?
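
One approach that may be worth trying is to list the allowed items for both 
sides in appearance and set default = "none", so that items not listed are 
excluded altogether. The following is only a sketch on hypothetical toy data 
(the `toy` data frame and its values are made up for illustration), and 
behaviour may differ between arules versions:

```r
library(arules)

# Hypothetical toy data standing in for `sales`; every column is a factor.
toy <- data.frame(
  HouseOwnerFlag = factor(c("0", "1", "1", "0", "1", "0")),
  Gender         = factor(c("F", "M", "M", "F", "M", "F")),
  Product        = factor(c("Lamp", "TV", "TV", "Lamp", "Camera", "Lamp"))
)
trans <- as(toy, "transactions")

# Collect every item label that was generated from the Product column.
product_items <- grep("^Product=", itemLabels(trans), value = TRUE)

rules <- apriori(
  trans,
  parameter  = list(support = 0.01, confidence = 0.1, minlen = 2),
  appearance = list(
    lhs     = c("HouseOwnerFlag=0", "HouseOwnerFlag=1"),
    rhs     = product_items,
    default = "none"  # items not listed above may not appear at all
  ),
  control = list(verbose = FALSE)
)
inspect(rules)
```

The support and confidence thresholds here are deliberately low so that the 
tiny toy data yields some rules; they would need tuning for the real data.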

You can reproduce this problem by downloading the test dataset (testdf.txt) 
from the attachment or via this link:
https://www.dropbox.com/s/tax5xalac5xgxtf/testdf.txt?dl=0 
Mind you, I took only the first 20 rows from a huge dataset (12 million), so 
the output here won't have the same product names as the examples I displayed 
above, but the problem remains the same. (If you would like to have the entire 
dataset, I can email it, of course.) I want to get only HouseOwnerFlag=0 and/or 
HouseOwnerFlag=1 on the LHS and only items from the column Product on the RHS. 

I asked this question on another forum before, but unfortunately got no 
response at all. Since this mailing list is dedicated to R, I thought you might 
be able to help me. 

Thanks in advance! I look forward to hearing from you.

Kim