Considering your instructions:

#Define words to find
to.find <- c( 'the', 'is', 'are' ,'dr') 
#Read in the file... 
file.text <- readLines( 'data/letter.txt' ) 
#Count number of occurrences of defined words in text
line.matches <- unlist( lapply( to.find, grep, x = unlist(file.text[2]) ) ) 

Result:
> line.matches 
[1] 1 1 1

This is not right, of course: there are actually four words to find, and the
searched words appear multiple times.
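
A likely explanation, sketched with a made-up line of text (sample.line
below is an assumption, not the real letter): grep() returns the indices of
the elements of x that match, not the number of matches. Since file.text[2]
is a vector of length one, every word that is found reports index 1, and a
word that is not found contributes nothing, which would give the same
three 1s.

```r
# Hypothetical stand-in for file.text[2]; the real letter text is not shown
sample.line <- "\tHello sir, this is the letter. The forms are ready."
to.find <- c('the', 'is', 'are', 'dr')

# grep() returns the positions of matching elements of x, not match counts.
# x has one element here, so each word that matches yields index 1 and
# 'dr', which does not occur in this line, yields integer(0).
line.matches <- unlist(lapply(to.find, grep, x = sample.line))
line.matches
```

Note that 'is' also matches inside "this", because grep() looks for the
pattern anywhere in the string rather than for whole words.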

I think the problem is how file.text is coming in: file.text[2] is
"\tHello sir, I write to you seeking your guidance organizing some data.
I have a .....", so R is reading the document but putting the text into this
type of format.  I'm stuck.  I tried saving the doc as a csv and searching
the strings, and I tried using a match process.  It would also be useful to
simply get a rundown, similar to a summary, of the most common words.
Ideas?
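
One possible approach, as a minimal sketch rather than a definitive answer:
split every line into words, then count whole-word matches and tabulate all
words. The file.text vector below is a made-up stand-in for the result of
readLines(), and the splitting rule (lower-case ASCII letters only) is an
assumption that would need adjusting for other languages or for words
containing digits or apostrophes.

```r
# Hypothetical stand-in for readLines('data/letter.txt')
file.text <- c("Dear Dr Smith,",
               "\tHello sir, this is the letter. The forms are ready.",
               "The data are attached.")

to.find <- c('the', 'is', 'are', 'dr')

# Lower-case every line and split on anything that is not a letter,
# then drop the empty strings left by leading tabs or punctuation
words <- unlist(strsplit(tolower(file.text), "[^a-z]+"))
words <- words[words != ""]

# Number of whole-word occurrences of each search term in the whole file
word.counts <- sapply(to.find, function(w) sum(words == w))
word.counts

# Frequency table of all words, most common first
word.freq <- sort(table(words), decreasing = TRUE)
head(word.freq, 10)
```

Because the comparison is on whole words, 'is' no longer matches inside
"this", and searching all of file.text rather than file.text[2] covers the
entire document.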



cls59 wrote:
> 
> 
> PDXRugger wrote:
>> 
>> Howdy Y'all, 
>> 
>> So I am looking to read a Word document (.doc) or a file from any other
>> accessible word processor or editor (e.g. plain text .txt, Notepad,
>> etc.) and have the ability to search for certain words, for instance
>> "banana", "peacock", "Weapons", "Mass", "Destruction".  Then I could
>> summarize and view the results.  I looked, and the only thing I could
>> find was the below, where I want to analyze "letter.doc" and look for the
>> words mentioned in quotes above.  It's apparently wrong, but I'm wondering
>> if this is even possible.  Please advise.  Thanks
>> 
>> In Solidarity
>> JR
>> 
> 
> Well... you could make a vector of the words you want to find:
> 
> to.find <- c( 'banana', 'peacock', 'Weapons' )
> 
> Read in the file...
> 
> file.text <- readLines( 'myFile.txt' )
> 
> And apply the grep command to each word in turn in order to determine
> which lines contain matches for your words:
> 
> line.matches <- unlist( lapply( to.find, grep, x = file.text ) )
> 
> It may do what you want for plain text files, as for Microsoft Word
> files... well...
> 
> Sometimes there is a price to pay for using a closed proprietary binary
> document format.
> 
> Good luck!
> 
> -Charlie
> 

-- 
View this message in context: 
http://www.nabble.com/reading-and-analyzing-a-word-document-tp25691972p25692751.html
Sent from the R help mailing list archive at Nabble.com.

