Considering your instructions:

#Define words to find
to.find <- c( 'the', 'is', 'are' ,'dr')

#Read in the file...
file.text <- readLines( 'data/letter.txt' )

#Count number of occurrences of defined word in text
line.matches <- unlist( lapply( to.find, grep, x = unlist(file.text[2]) ) )

Result:
> line.matches
[1] 1 1 1

This is not right, of course: first, there are actually four words, and second, the searched words appear multiple times. I think the problem is that file.text is coming in so that

file.text[2] <- "\tHello sir, I write to you seeking your guidance organizing some data. I have a ....."

So it is reading the document; it's just putting the lines into this type of format. I'm stuck. I tried saving the doc to a csv and searching strings, and I tried using a match process. It would also be useful to simply get a rundown, something like a summary listing the most common words. Ideas?


cls59 wrote:
>
>
> PDXRugger wrote:
>>
>> Howdy Y'all,
>>
>> So I am looking to read a word document in the following formats (.doc) or
>> any type of accessible word processor software (e.g. text .txt, notepad,
>> etc). I'd like the ability to search certain words, for instance "banana",
>> "peacock", "Weapons", "Mass", "Destruction". Then I could summarize and
>> view the results. I looked and the only thing I could find was the below,
>> where I want to analyze "letter.doc" and look for the words mentioned in
>> quotes above. It's apparently wrong but I'm wondering if this is even
>> possible. Please advise. Thanks
>>
>> In Solidarity
>> JR
>>
>
> Well... you could make a vector of the words you want to find:
>
> to.find <- c( 'banana', 'peacock', 'Weapons' )
>
> Read in the file...
>
> file.text <- readLines( 'myFile.txt' )
>
> And recursively apply the grep command in order to determine which lines
> contain matches for your words:
>
> line.matches <- unlist( lapply( to.find, grep, x = file.text ) )
>
> It may do what you want for plain text files, as for Microsoft Word
> files... well...
>
> Sometimes there is a price to pay for using a closed proprietary binary
> document format.
>
> Good luck!
>
> -Charlie
>

--
View this message in context: http://www.nabble.com/reading-and-analyzing-a-word-document-tp25691972p25692751.html
Sent from the R help mailing list archive at Nabble.com.
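
A note on the result above, with a possible way forward. grep() returns the indices of the elements of x that contain a match, not the number of matches. Since x here is the single string file.text[2], every pattern that matches anywhere in it returns 1, which would explain "1 1 1". The fourth word presumably returned nothing because grep() is case-sensitive by default and the letter has "Dr" rather than "dr" (an assumption, since the file itself is not shown). grep() also matches substrings, so 'is' would match inside "this". A minimal sketch of counting whole words instead, assuming a plain-text file; the sample text stands in for readLines('data/letter.txt') and is made up, and the names words and word.counts are only illustrative:

```r
# Hypothetical stand-in for readLines('data/letter.txt'); the real letter is not shown
file.text <- c("",
               "\tHello sir, the data is here and the files are ready, Dr Smith.")

# Words to count (lower case, because the text is lower-cased below)
to.find <- c('the', 'is', 'are', 'dr')

# Join all lines, lower-case them, and split on anything that is not a letter or apostrophe
words <- unlist(strsplit(tolower(paste(file.text, collapse = " ")), "[^a-z']+"))
words <- words[nzchar(words)]   # drop empty strings left by leading separators

# Number of occurrences of each searched word (0 if it never appears)
word.counts <- sapply(to.find, function(w) sum(words == w))
print(word.counts)

# Rundown of the most common words overall
print(head(sort(table(words), decreasing = TRUE), 10))
```

Splitting into words first avoids the substring problem, and table() gives the summary of the most common words directly. For a real letter you would replace the sample vector with the result of readLines().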