As usual, careful reading of the relevant Help page would resolve the confusion.
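A minimal sketch that reproduces the situation in the question quoted further down. The poster's data are not available, so the small vector `b1` below is hypothetical and stands in for `r1$B1`. Replacing the values of a level removes the values but leaves the level itself in place with a count of zero. Either `droplevels()` or `factor(..., exclude = NULL)`, as in the reply, then removes the empty level.

```r
# Hypothetical toy data standing in for r1$B1. Levels are given explicitly
# so the result does not depend on the locale's sort order.
b1 <- factor(c("Z", "A", "l", "L", "Z"), levels = c("Z", "A", "l", "L"))

# Replace the lower-case value, as in the original post.
b1[b1 == "l"] <- "L"

# The value is gone, but "l" is still one of the factor's levels.
print(levels(b1))   # "Z" "A" "l" "L"
print(table(b1))    # "l" is listed with a count of 0

# Two ways to drop the now-empty level.
b1_dropped    <- droplevels(b1)
b1_refactored <- factor(b1, exclude = NULL)
print(levels(b1_dropped))      # "Z" "A" "L"
print(levels(b1_refactored))   # "Z" "A" "L"
```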
From ?factor:

"factor(x, exclude = NULL) applied to a factor without NAs is a no-operation unless there are unused levels: in that case, a factor with the reduced level set is returned. If exclude is used, since R version 3.4.0, excluding non-existing character levels is equivalent to excluding nothing, and when exclude is a character vector, that is applied to the levels of x. Alternatively, exclude can be factor with the same level set as x and will exclude the levels present in exclude."

Subsetting a factor does not change the levels attribute, even if some levels are no longer present. One must remove them explicitly, e.g.:

> f <- factor(letters[1:3]) ## 3 levels, all present
> f[1:2]
[1] a b
Levels: a b c ## 3 levels, but one empty
> factor(f[1:2], exclude = NULL)
[1] a b
Levels: a b ## Now only two levels

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Fri, Nov 16, 2018 at 7:38 AM Bill Poling <bill.pol...@zelis.com> wrote:
>
> Hello:
>
> I am running Windows 10 -- R 3.5.1 -- RStudio Version 1.1.456
>
> I would like to know why, when I replace a column value, it still appears in
> subsequent routines.
>
> My example:
>
> r1$B1 is a factor. It is created from the first character of a list of CPT
> codes, r1$CPT.
>
> head(r1$CPT, N= 25)
> [1] A4649 A4649 C9359 C1713 A0394 A0398
> 903 Levels: 00000 00001 00140 00160 00670 00810 00940 01400 01470 01961 01968
> 10160 11000 11012 11042 11043 11044 11045 11100 11101 11200 11201 11401 11402
> ... l8699
>
> str(r1$CPT)
> Factor w/ 903 levels "00000","00001",..: 773 773 816 783 739 741 743 739 739
> 741 ...
>
> I want only those CPTs with a leading alpha character in this column, so I set
> the numeric leading characters to Z:
>
> r1$B1 <- str_sub(r1$CPT,1,1)
>
> r1$B1 <- as.factor(r1$B1) #Redundant
> levels(r1$B1)[levels(r1$B1) %in% c('1','2','3','4','5','6','7','8','9','0')] <- 'Z'
>
> When I check what I have done, I find l & L:
>
> unique(r1$B1)
> #[1] A C Z L G Q U J V E S l D P
> #Levels: Z A C D E G J l L P Q S U V
>
> So I change l to L:
> r1$B1[r1$B1 == 'l'] <- 'L'
>
> When I check again, I have l & L, but l = 0:
> table(r1$B1)
> #     Z     A     C     D     E     G     J     l     L     P     Q     S     U     V
> # 19639  1673   546     2     8   147   281     0   664     1    64    36   114    14
>
> When I go to find those rows as if they existed, they are not accounted for:
>
> tmp <- subset(r1, B1 == "l")
> print(tmp)
> Empty data.table (0 rows) of 9 cols:
> SavingsReversed,productID,ProviderID,PatientGender,ModCnt,Editnumber2...
>
> And I have actually visually inspected the whole darn column, sheesh!
>
> So I ignore it temporarily.
>
> Later on it resurfaces in a tutorial I am following for the caret package:
>
> preProcess(r1b, method = c("center", "scale"),
>            thresh = 0.95, pcaComp = NULL, na.remove = TRUE, k = 5,
>            knnSummary = mean, outcome = NULL, fudge = 0.2, numUnique = 3,
>            verbose = FALSE, freqCut = 95/5, uniqueCut = 10, cutoff = 0.9,
>            rangeBounds = c(0, 1))
> # Warning in preProcess.default(r1b, method = c("center", "scale"), thresh = 0.95, :
> #   These variables have zero variances: B1l   <-- yes, this is a remnant of the r1$B1 clean-up
> # Created from 23141 samples and 22 variables
> #
> # Pre-processing:
> #   - centered (22)
> #   - ignored (0)
> #   - scaled (22)
>
> So my questions are, in consideration of regression modelling accuracy:
>
> Why is this happening?
> How do I remove it?
> Or is it irrelevant, and should I leave it be?
>
> As always, thank you for your support.
>
> WHP
>
> Confidentiality Notice This message is sent from Zelis.
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
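This sketch addresses the caret warning in the quoted question. The post does not say how `r1b` was built, so the sketch assumes its dummy columns came from `model.matrix()`, whose naming scheme matches `B1l`. The data frame `df` is hypothetical. The empty level still gets its own indicator column, which is all zeros and therefore has zero variance. Dropping the level first removes that column. Alternatively, "l" can be merged into "L" by renaming the level, using the same `levels<-` idiom the post used to map the digits to "Z".

```r
# Hypothetical toy data frame standing in for r1; only the factor column matters.
df <- data.frame(
  B1 = factor(c("Z", "A", "l", "L", "Z"), levels = c("Z", "A", "l", "L"))
)

# Value replacement, as in the original post: leaves an empty "l" level.
df$B1[df$B1 == "l"] <- "L"

# model.matrix() keeps unused levels here, so "B1l" is a column of zeros.
mm_before <- model.matrix(~ B1, data = df)
print(colnames(mm_before))   # "(Intercept)" "B1A" "B1l" "B1L"

# Fix 1: drop the unused level before building the design matrix.
df$B1 <- droplevels(df$B1)
mm_after <- model.matrix(~ B1, data = df)
print(colnames(mm_after))    # "(Intercept)" "B1A" "B1L"

# Fix 2 (alternative): rename level "l" to "L"; duplicated level names
# are merged into one level, so no empty level is left behind.
b1 <- factor(c("Z", "A", "l", "L", "Z"), levels = c("Z", "A", "l", "L"))
levels(b1)[levels(b1) == "l"] <- "L"
print(levels(b1))            # "Z" "A" "L"
```

A constant column carries no information for a regression model, so removing the empty level at the source is cleaner than leaving the warning in place.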