library(sos)
tm <- findFn('text mining')
tm

This produced 15 matches, which you could also find using "RSiteSearch('text mining', 'function')". The difference is that findFn{sos} displays the results in a table sorted to place the package with the most matches first. In this case, there is actually a "Text Mining Package" called "tm". "summary(tm)" says these 15 matches are in 11 packages. The first of the 11 is "FactoMineR".

Hope this helps. Spencer Graves


David Winsemius wrote:

On Oct 1, 2009, at 12:18 AM, cls59 wrote

PDXRugger wrote:

Considering your instructions:

#Define words to find
to.find <- c( 'the', 'is', 'are' ,'dr')
#Read in the file...
file.text <- readLines( 'data/letter.txt' )
#Count number of occurrences of defined word in text
line.matches <- unlist( lapply( to.find, grep, x = unlist(file.text[2]) ) )

Result:
line.matches
[1] 1 1 1

This is not right, of course: first, there are actually four words, and
second, the searched words appear multiple times.



The example I gave was only meant to identify those lines on which matches
occurred. Using x = unlist(file.text[2]) only feeds one line of the file
into the matching routine so the result indicates that all the matches were
on line 1-- the only line present for searching.
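
As a minimal sketch of the line-level search described above, the same call
can be given the whole character vector instead of a single element. The
sample text below is a hypothetical stand-in for the contents of
'data/letter.txt', and ignore.case = TRUE is an added assumption so that
'dr' also finds 'Dr':

```r
# Hypothetical in-memory stand-in for readLines('data/letter.txt')
file.text <- c("Dear Dr Smith,",
               "the results are in and the report is ready.",
               "Regards")
to.find <- c("the", "is", "are", "dr")

# For each word, grep returns the indices of the lines that contain it.
# Passing the full vector (not file.text[2]) searches every line.
line.hits <- lapply(to.find, grep, x = file.text, ignore.case = TRUE)
names(line.hits) <- to.find
line.hits
```

Note that grep matches substrings, so 'is' would also match inside a word
such as 'this'; each line index appears at most once per word, however many
times the word occurs on that line.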

If you want to count the individual occurrences of the words on each line,
you may need to look at using a function such as gregexpr. grep only
indicates if a match or matches is present in a line of text-- gregexpr
indicates at which positions those matches occur in the line.
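
One possible way to turn the gregexpr suggestion into per-line counts is
sketched below. The sample text and the helper count.word are hypothetical,
and restricting matches to whole words with \\b (using perl = TRUE) is an
assumption about what should count as an occurrence:

```r
# Hypothetical in-memory stand-in for readLines('data/letter.txt')
file.text <- c("Dear Dr Smith,",
               "the results are in and the report is ready.",
               "Regards")
to.find <- c("the", "is", "are", "dr")

# Hypothetical helper: number of occurrences of one word on each line
count.word <- function(word, lines) {
  # \\b limits matches to whole words (an assumption, not from the thread)
  pattern <- paste0("\\b", word, "\\b")
  hits <- gregexpr(pattern, lines, ignore.case = TRUE, perl = TRUE)
  # gregexpr returns -1 for a line with no match, so count only
  # positive starting positions
  sapply(hits, function(pos) sum(pos > 0))
}

# Matrix with one row per line and one column per search word
counts <- sapply(to.find, count.word, lines = file.text)
counts
colSums(counts)   # total occurrences of each word over all lines
```

Unlike grep, this counts repeated occurrences on the same line, so a word
that appears twice on one line contributes 2 to that line's count.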

However, you may be getting to the point where R is no longer an appropriate
tool for this job. R is amazingly flexible, and it is possible that it can
give you what you want. But R was not designed to perform text
processing-- Perl comes to mind as a language that was explicitly
designed to perform these sorts of operations.

Perhaps you should use the R-search facilities for such questions:

http://finzi.psych.upenn.edu/R/library/tau/html/00Index.html
http://finzi.psych.upenn.edu/views/NaturalLanguageProcessing.html
http://www.jstatsoft.org/v25/i05/

R may not have been designed for text processing, but it is rather amazing how much has been done.




--
Spencer Graves, PE, PhD
President and Chief Operating Officer
Structure Inspection and Monitoring, Inc.
751 Emerson Ct.
San José, CA 95126
ph:  408-655-4567
