Find documents that are composed of % words

2013-10-09 Thread shahzad73
Is there a way to write a Solr query that finds documents composed of a given percentage of words from a list? For example, here is the list of words:
- Love
- Ice
- Cream
- Sunny
- I
- To
- A
- On
- Elephant
- Balloon
And a percentage such as 80%. Let's assume you're analyzing the text of the following sentence.

Re: Find documents that are composed of % words

2013-10-09 Thread shahzad73
Please help me formulate the query (that would be easiest), or do I have to build a custom filter for this? Shahzad -- View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-that-are-composed-of-words-tp4094264p4094372.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Find documents that are composed of % words

2013-10-09 Thread shahzad73
My client has a strange requirement: he will give a list of 500 words and then set a percentage such as 80%. He wants to find the pages or documents in which 80% of the words come from those 500 and only 20% are unknown. For example, say we have this document: word1 word2 …
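One way to state the requirement precisely, assuming each document d is reduced to its token sequence t_1, …, t_N (after the same lowercasing/tokenizing used for the list) and W is the client's word list. Note this counts token occurrences; counting distinct words instead would give a different ratio:

```latex
\text{known}(d) = \frac{\left|\{\, i : t_i \in W \,\}\right|}{N} \;\ge\; p,
\qquad
\text{foreign}(d) = 1 - \text{known}(d) \;\le\; 1 - p
```

With p = 0.8, a document of N = 200 tokens qualifies only if at least 0.8 × 200 = 160 of its tokens are in W, i.e. at most 40 are foreign.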

Re: Find documents that are composed of % words

2013-10-10 Thread shahzad73
No, unfortunately I did not get it. How will this help me? Please explain in a bit more detail.

Re: Find documents that are composed of % words

2013-10-10 Thread shahzad73
Yes, the correct answer may be "why?", but you cannot ask the client that. He thinks there is something interesting in this formula, and if it works we can index websites with Nutch + Solr and let users enter queries that locate documents with a given percentage of foreign words, other than the list pr…

Re: Find documents that are composed of % words

2013-10-10 Thread shahzad73
Is there a way to build a plugin that gets all the words on a single page and calculates the percentage of words on the page that are foreign (words not on the search list)?
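The per-page calculation such a plugin would perform can be sketched outside Solr first. This is a minimal Python sketch, not Solr or Nutch plugin code: the regex tokenizer is an assumption and will not split text exactly like Solr's analyzer chain, and the function names `foreign_word_percentage` and `passes` are hypothetical.

```python
import re

# Assumed tokenizer: lowercase, then take runs of letters, digits and apostrophes.
# Solr's configured analyzer may tokenize differently.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def foreign_word_percentage(text, word_list):
    """Percentage of tokens on the page that are NOT in word_list."""
    known = {w.lower() for w in word_list}
    tokens = tokenize(text)
    if not tokens:
        # Pages with no tokens are treated as entirely foreign, so they never pass.
        return 100.0
    foreign = sum(1 for t in tokens if t not in known)
    return 100.0 * foreign / len(tokens)


def passes(text, word_list, min_known_pct=80.0):
    """True if at least min_known_pct percent of the tokens come from word_list."""
    return foreign_word_percentage(text, word_list) <= 100.0 - min_known_pct


words = ["Love", "Ice", "Cream", "Sunny", "I", "To", "A", "On", "Elephant", "Balloon"]
print(foreign_word_percentage("I love ice cream on a sunny day", words))
```

With the word list from the first message, "I love ice cream on a sunny day" has 8 tokens of which only "day" is foreign, so the page is 12.5% foreign and passes an 80% threshold. Because the word list changes per query, this ratio cannot simply be precomputed once per page unless the list is fixed.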

Re: Find documents that are composed of % words

2013-10-11 Thread shahzad73
Eric, agreed. I proposed the Solr + Nutch solution myself and had never used these technologies; this is the first time I am handling these two. My initial response to the client's requirements was to try existing industry tools and then modify them according to the client's requirements instead of re-inve…

Re: Find documents that are composed of % words

2013-10-11 Thread shahzad73
Aloke Ghoshal, I'm trying to work out your equation. I am using the standard schema provided by Nutch for Solr and don't know how to calculate myfieldwordcount in the first query; I have no idea where this count would come from. Is there any filter that will store the number of tokens generated for a speci…
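Whether or not a stock filter exists for this, one workaround is to compute the token count before indexing and send it as an ordinary field. A minimal sketch, assuming an integer field named `myfieldwordcount` (the name from the thread) has been added to the schema, and that `id` and `content` are the document's key and text fields; these field names and `build_solr_doc` are hypothetical here. The count should come from the same tokenization the query side uses, otherwise the ratio will be off.

```python
import json
import re

# Assumed tokenizer; it should match the analyzer of the indexed text field.
TOKEN_RE = re.compile(r"[a-z0-9']+")


def count_tokens(text):
    """Number of tokens the assumed tokenizer produces for text."""
    return len(TOKEN_RE.findall(text.lower()))


def build_solr_doc(doc_id, content):
    """Build a document dict carrying its own precomputed token count."""
    return {
        "id": doc_id,                            # assumed unique key field
        "content": content,                      # assumed text field
        "myfieldwordcount": count_tokens(content),  # hypothetical integer field
    }


docs = [build_solr_doc("page-1", "I love ice cream on a sunny day")]
# A JSON array of documents like this can be sent to Solr's update handler.
print(json.dumps(docs))
```

Here "I love ice cream on a sunny day" is stored with a count of 8, which a query-side formula could then use as the denominator.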