Re: Find documents that are composed of % words

2013-10-16 Thread Aloke Ghoshal
Hi Shahzad, Personally I am of the same opinion as others who have replied, that you are better off going back to your clients at this stage itself, with all the new found info/data points. Further, to the questions that you put to me directly: 1) For option 1, as indicated earlier, you have to

Re: Find documents that are composed of % words

2013-10-14 Thread Chris Hostetter
: bq: but you cannot ask this to client. : : You _can_ ask this of a client. IMO you are obligated to. +1. >> When you are given a requirement/request from your client, >> always verify that you aren't dealing with an XY Problem: >> http://people.apache.org/~hossman/#xyproblem ... >> Don'

Re: Find documents that are composed of % words

2013-10-11 Thread shahzad73
Aloke Ghoshal, I'm trying to work out your equation. I am using the standard schema provided by Nutch for Solr and am not aware of how to calculate myfieldwordcount in the first query. I have no idea where this count will come from. Is there any filter that will store the number of tokens generated for a speci

Re: Find documents that are composed of % words

2013-10-11 Thread shahzad73
Erick, agreed. The Solr + Nutch solution was proposed by me, and I had never used these technologies; this is the first time I have handled these two. My initial response to the client's requirements was to try to work with existing industry tools and then modify them according to the client's requirements instead of re-inve

Re: Find documents that are composed of % words

2013-10-11 Thread Erick Erickson
bq: but you cannot ask this to client. You _can_ ask this of a client. IMO you are obligated to. A gentle way to do that is say something like: "Solr doesn't do that out-of-the-box. I estimate it will take me XXX weeks to implement that in custom code. I will be unable to make progress on feature

Re: Find documents that are composed of % words

2013-10-10 Thread shahzad73
Is there a way I can build a plugin that gets all the words on a single page and computes a percentage showing how many words on the page are foreign (words not on the search list)?
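The calculation asked about here can be sketched outside Solr first. The following is a minimal illustration only, not a Solr plugin: the function name `foreign_word_percentage` and the simple regex tokenizer are assumptions made for the example, and a real Solr analyzer chain would tokenize differently.

```python
import re


def foreign_word_percentage(text, known_words):
    """Return the percentage of tokens in `text` that are NOT in `known_words`.

    Hypothetical helper: tokenization is a plain regex split on word
    characters, which is an assumption and not what a Solr analyzer does.
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        # An empty page has no foreign words by this definition.
        return 0.0
    known = {w.lower() for w in known_words}
    foreign = sum(1 for t in tokens if t not in known)
    return 100.0 * foreign / len(tokens)


if __name__ == "__main__":
    page = "word1 word2 word3 word4 somethingelse"
    word_list = ["word1", "word2", "word3", "word4"]
    print(foreign_word_percentage(page, word_list))
```

A page would satisfy the "80% known" requirement when this value is at most 20. Doing the same inside Solr would require custom code, as the replies in this thread point out.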

Re: Find documents that are composed of % words

2013-10-10 Thread Jack Krupansky
:03 PM To: solr-user@lucene.apache.org Subject: Re: Find documents that are composed of % words Yes, the correct answer may be "Why?" but you cannot ask this to client. He thinks there is something interesting in this formula, and if it works we can index websites with Nutch + Solr

Re: Find documents that are composed of % words

2013-10-10 Thread shahzad73
Yes, the correct answer may be "Why?" but you cannot ask this to client. He thinks there is something interesting in this formula, and if it works we can index websites with Nutch + Solr and let users input queries that can locate documents which have a % of foreign words other than the list pr

Re: Find documents that are composed of % words

2013-10-10 Thread Upayavira
Right - aside from the interesting intellectual exercise, the correct question to ask is, "why?" Why would you want to do this? What's the benefit, and is there a way of doing it that is more in keeping with how Solr has been designed? Upayavira On Thu, Oct 10, 2013, at 01:17 PM, Erick Erickson

Re: Find documents that are composed of % words

2013-10-10 Thread Aloke Ghoshal
This is something you could do via function queries. Performance (for 500+ words) is doubtful. 1) With a separate float field (myfieldwordcount) that holds the count of words from your query field (myfield): http://localhost:8983/solr/collection1/select?wt=xml&indent=true&defType=func&fl=id,myfield&q
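The query in this message is cut off in the archive, so the sketch below is not a reconstruction of it. It only illustrates one way a function query of this kind might be assembled, assuming the Solr functions `termfreq`, `sum`, and `div` together with the `frange` query parser. The helper names are hypothetical, and the word list is assumed to contain plain alphanumeric terms that already match the indexed (analyzed) form.

```python
from urllib.parse import urlencode


def known_fraction_function(field, count_field, words):
    """Build a function string: (sum of term frequencies) / (stored word count).

    Hypothetical helper; assumes `count_field` holds the number of words in
    `field` and that `words` need no escaping.
    """
    terms = ",".join("termfreq({},'{}')".format(field, w) for w in words)
    return "div(sum({}),{})".format(terms, count_field)


def build_select_url(base_url, field, count_field, words, min_fraction):
    """Build a select URL keeping documents whose known fraction >= min_fraction."""
    func = known_fraction_function(field, count_field, words)
    params = {
        "q": "*:*",
        # frange filters documents by the value of the function.
        "fq": "{{!frange l={}}}{}".format(min_fraction, func),
        "fl": "id,{}".format(field),
        "wt": "xml",
        "indent": "true",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_select_url(
        "http://localhost:8983/solr/collection1/select",
        "myfield", "myfieldwordcount", ["word1", "word2"], 0.8))
```

With 500 words the function contains 500 `termfreq` calls evaluated per document, which is the performance concern raised in the message.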

Re: Find documents that are composed of % words

2013-10-10 Thread Erick Erickson
Just to add my $0.02. Often this kind of thing is a mistaken assumption on the part of the client that they know how to score documents better than the really bright people who put a lot of time and energy into scoring (note, I'm _certainly_ not one of those people!). I'll often, instead of making

Re: Find documents that are composed of % words

2013-10-10 Thread Upayavira
On Wed, Oct 9, 2013, at 02:45 PM, shahzad73 wrote: > my client has a strange requirement, he will give a list of 500 words > and > then set a percentage like 80% now he want to find those pages or > documents which consist of the only those 80% of 500 and only 20% > unknown. > like we

Re: Find documents that are composed of % words

2013-10-10 Thread Furkan KAMACI
Hi; Your question seems like a case for the minimum-should-match feature, and Aloke has answered it. However, I wanted to mention the dedup mechanism in Solr (http://wiki.apache.org/solr/Deduplication) in case the *mm* parameter is not what you are looking for and you want to do something more specialized. Ded

Re: Find documents that are composed of % words

2013-10-10 Thread shahzad73
No, unfortunately I did not get how this will help me. Please explain in a bit more detail.

Re: Find documents that are composed of % words

2013-10-09 Thread Furkan KAMACI
Are you asking for something like this: http://wiki.apache.org/solr/TextProfileSignature On Wednesday, 9 October 2013, shahzad73 wrote: > Please help me formulate the query that will be easy or do i have to build a > custom filter for this ? > > Shahzad > > > > -- > View this m

Re: Find documents that are composed of % words

2013-10-09 Thread shahzad73
My client has a strange requirement: he will give a list of 500 words and then set a percentage, like 80%. Now he wants to find those pages or documents which consist only of those 80% of the 500 and only 20% unknown. For example, we have this document: word1 word2

Re: Find documents that are composed of % words

2013-10-09 Thread shahzad73
Please help me formulate the query that will be easy, or do I have to build a custom filter for this? Shahzad

Re: Find documents that are composed of % words

2013-10-09 Thread Aloke Ghoshal
Hi Shahzad, Have you tried with the Minimum Should Match feature: http://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29 Regards, Aloke On Wed, Oct 9, 2013 at 4:55 PM, Otis Gospodnetic wrote: > Hi, > > You can take your words, combine some % of them with AND. Then take
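As a rough illustration of the Minimum Should Match suggestion, the sketch below builds an eDisMax request with `mm=80%`. The host, collection, and field name (`myfield`) are assumptions borrowed from elsewhere in this thread, and `build_mm_query_url` is a hypothetical helper. Note that `mm` controls how many of the query's optional terms must match a document; it does not measure what fraction of a document's own words are on the list.

```python
from urllib.parse import urlencode


def build_mm_query_url(base_url, field, words, mm="80%"):
    """Build an eDisMax select URL requiring `mm` of the query terms to match.

    Hypothetical helper; `field` and `base_url` are assumed values.
    """
    params = {
        "defType": "edismax",
        "qf": field,           # field to search the terms in
        "q": " ".join(words),  # the client's word list as optional terms
        "mm": mm,              # e.g. "80%" of the terms must match
        "fl": "id",
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_mm_query_url(
        "http://localhost:8983/solr/collection1/select",
        "myfield", ["word1", "word2", "word3", "word4", "word5"]))
```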

Re: Find documents that are composed of % words

2013-10-09 Thread Otis Gospodnetic
Hi, You can take your words, combine some % of them with AND. Then take another set of them OR it with the previous set, and so on. Otis Solr & ElasticSearch Support http://sematext.com/ On Oct 9, 2013 6:54 AM, "shahzad73" wrote: > Is there a way that in Solr Query i find documents that is
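One reading of this suggestion is to AND together every subset containing the required share of the words and then OR those groups. The sketch below follows that reading for a small list; the function names are hypothetical, and the approach is shown mainly to make its cost visible, since the number of groups grows combinatorially.

```python
from itertools import combinations
from math import comb


def required_count(total, percent):
    """Smallest number of words covering at least `percent` of `total` (rounded up)."""
    return (percent * total + 99) // 100


def and_or_query(words, percent):
    """OR together every AND-group of the required number of words."""
    k = required_count(len(words), percent)
    groups = ("(" + " AND ".join(c) + ")" for c in combinations(words, k))
    return " OR ".join(groups)


def clause_count(total, percent):
    """Number of AND-groups the query above would contain."""
    return comb(total, required_count(total, percent))


if __name__ == "__main__":
    print(and_or_query(["a", "b", "c"], 60))
    print(clause_count(500, 80))
```

For three words at 60% this yields three groups, but for 500 words at 80% the clause count is astronomically large, so the enumeration is not practical at the size described in the thread.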