I have this problem too. Everything worked fine last year, but after updating R and my packages I can no longer do word stemming. Unfortunately, I didn't save the old binaries; otherwise I would just revert.
Hoping someone finds a solution for R on Windows. Thanks! There is a potential solution for R on Mac OS from Kurt Hornik copied below, but I cannot get this to work on Windows.

Here's the code I'm running:

    #1) Using package Snowball
    library(Snowball)
    source <- readLines(system.file("words", "porter","voc.txt",package = "Snowball"))
    result <- SnowballStemmer(source)

    #2) Using package tm
    library(tm)
    data("crude")
    stemDocument(crude[[1]])

In both instances I got a Java error "Could not initialize the GenericPropertiesCreator. This exception was produced: java.lang.NullPointerException". After receiving this error once in the session, no further error messages are generated. However, SnowballStemmer() and stemDocument() return the original unstemmed text.

Possible Solution: For those on Mac OS, Kurt Hornik wrote...

> These issues seem to be specific to Mac OS X. Recent versions of Weka
> have added a package management system not unlike R's, to the effect
> that now when external packages (or the Snowball jar) is loaded their
> KnowledgeFlow GUI is started, which in turn requires AWT---and from
> what I understand, this does not work on Mac OS X.
>
> Short term, you should be able to Sys.setenv("NOAWT", "true").
>
> More long term, the Weka maintainers have patched their upstream code
> so that it is possible to turn off the dynamic class discovery
> altogether, but I have not found the time to test this ...

I realize this solution was for Mac OS, but not knowing anything about rJava I tried this on Windows anyways resulting in

    "Error in Sys.setenv("NOAWT", "true") : all arguments must be named"

Here's my session info.

R version 2.13.0 Patched (2011-04-21 r55576)
Platform: i386-pc-mingw32/i386 (32-bit) (Windows Vista)

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices datasets utils methods base

other attached packages:
[1] Snowball_0.0-7 tm_0.5-6 rcom_2.2-3.1 rscproxy_1.3-1

loaded via a namespace (and not attached):
[1] grid_2.13.0 rJava_0.9-0 (same error with multiple older versions) RWeka_0.4-7 RWekajars_3.7.3-1
[5] slam_0.1-22 tools_2.13.0

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
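
A note on the Sys.setenv() error quoted above: in R, Sys.setenv() expects name = value pairs, which is why the call with two unnamed strings is rejected with "all arguments must be named". A minimal sketch of the named form follows. Whether setting NOAWT actually clears the GenericPropertiesCreator error on Windows is not confirmed here, and setting the variable before the packages are loaded is an assumption (so that it is already in place when the JVM starts). The stemming check only runs if the Snowball package is installed.

```r
# Sys.setenv() takes name = value pairs: the variable name is the argument name.
Sys.setenv(NOAWT = "true")

# Confirm the variable is visible in this R session.
noawt <- Sys.getenv("NOAWT")
print(noawt)  # [1] "true"

# Assumption: NOAWT should be set before rJava/RWeka start the JVM,
# so the stemming packages are loaded only after the call above.
if (requireNamespace("Snowball", quietly = TRUE)) {
  words <- readLines(system.file("words", "porter", "voc.txt",
                                 package = "Snowball"))
  stems <- Snowball::SnowballStemmer(words)
  # If stemming works, at least some stems should differ from the input words.
  print(any(stems != words))
}
```

If the last line prints FALSE, the stemmer is still returning the original text and the problem lies elsewhere.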