Eric, agreed. I proposed the Solr + Nutch solution myself, although I had never used either technology before; this is the first time I have worked with the two. My initial response to the client's requirements was to try existing industry tools and adapt them to those requirements instead of reinventing the wheel. I started from zero to get to this point and was not even aware that Solr could handle this sort of requirement.
Now all the infrastructure is in place: the crawler, the index, and an app to run searches. Only this base requirement is left to fulfill. At the moment I am working in the dark trying to configure Solr to handle it.

Here is what I am thinking of doing. Develop a filter that is called at search time on a field holding all the tokens for the page. The filter would:

1. Determine how many tokens (words) match the criteria words and which tokens remain.
2. Get the total number of tokens for the document.
3. Produce the percentage of matched and unmatched tokens.

I am not sure the above solution will work, so I need suggestions.
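
To make the intended calculation concrete, here is a minimal sketch in plain Python, outside Solr. It is only an illustration of the matched/unmatched ratio described above, not a Solr filter: the function name `match_ratio`, its parameters, and the simple lowercase word tokenization are all assumptions made for this example, and a real Solr analyzer chain would likely tokenize differently.

```python
import re


def match_ratio(document_text, criteria_words):
    """Count document tokens that are (and are not) in the criteria word list.

    Hypothetical helper for illustration only; returns a tuple of
    (matched_count, unmatched_count, matched_percent, unmatched_percent).
    """
    # Assumed tokenization: lowercase, split on non-word characters.
    tokens = re.findall(r"\w+", document_text.lower())
    criteria = {word.lower() for word in criteria_words}

    total = len(tokens)
    if total == 0:
        # An empty document has no tokens to compare.
        return 0, 0, 0.0, 0.0

    matched = sum(1 for token in tokens if token in criteria)
    unmatched = total - matched
    return (
        matched,
        unmatched,
        100.0 * matched / total,
        100.0 * unmatched / total,
    )


if __name__ == "__main__":
    result = match_ratio("the cat sat on the mat", ["the", "cat", "mat"])
    print(result)
```

Repeated tokens are counted each time they occur, so the percentages are relative to the total token count of the document rather than to its number of distinct words. Whether that is the right definition depends on the client requirement.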