Eric, agreed. The Solr + Nutch solution was proposed by me, and I had never
used these technologies before; this is the first time I have worked with
either of them. My initial response to the client's requirements was to try
existing industry tools and then modify them to fit those requirements
instead of reinventing the wheel. I started from zero to get to this point
and was not even aware that Solr can handle this sort of requirement.

Now all the infrastructure is in place: the crawler, the index, and an app
to run searches. Only this base requirement is left to fulfill. At the
moment I am working in the dark on how to configure Solr to handle it. Here
is what I am thinking of doing:

Develop a filter that is called at search time on a field that holds all
the tokens for the page. It will determine how many tokens (words) match
the criteria words and which tokens remain. It will then get the total
number of tokens for the document and produce the percentage of matched and
unmatched tokens.
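
To make the idea concrete, here is a rough sketch in plain Python of the
calculation described above, outside of Solr. The function name match_ratio,
the dictionary keys, and the simple regex tokenizer are assumptions for
illustration only; inside Solr the tokens would come from the field's
analyzer, and this is not Solr API code.

```python
import re


def match_ratio(page_text, criteria_words):
    """Count how many page tokens match the criteria words.

    Hypothetical helper: tokenization here is a lowercase split on
    word characters, which only approximates a real Solr analyzer.
    """
    tokens = re.findall(r"\w+", page_text.lower())
    criteria = {word.lower() for word in criteria_words}

    total = len(tokens)
    # Tokens that are not in the criteria set are the "remaining" tokens.
    remaining = [token for token in tokens if token not in criteria]
    matched = total - len(remaining)

    # Avoid dividing by zero for an empty document.
    matched_pct = 100.0 * matched / total if total else 0.0
    unmatched_pct = 100.0 * len(remaining) / total if total else 0.0

    return {
        "total": total,
        "matched": matched,
        "remaining": remaining,
        "matched_pct": matched_pct,
        "unmatched_pct": unmatched_pct,
    }


if __name__ == "__main__":
    result = match_ratio("The quick brown fox", ["the", "fox", "dog"])
    print(result["matched_pct"], result["remaining"])
```

A document made up only of criteria words would come out with a matched
percentage of 100, so a threshold on that value could be used to decide
which documents to return.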

I am not sure the above solution will work, so I would appreciate any
suggestions.






