termFreq always = 1 ?
Hi,

I want to index a list of keywords. When I search for "k1_en:men", I find a lot of documents like these:

DocA: (k1_en = man;men;Men;business... termFreq=2)
DocB: (k1_en = man;Men;business... termFreq=1)
DocC: ...
DocD: ...
DocE: ...

But I don't want DocA and DocB to have different termFreq values. I tried RemoveDuplicatesTokenFilterFactory, but it doesn't seem to help :-/

...

If you have any ideas, thanks in advance.

--
klessou
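The termFreq=2 for DocA follows from "men" and "Men" becoming the same term once lowercased. Below is a rough Python model of that counting. It assumes the field's analyzer splits on ';' and lowercases each token; the schema in this thread is incomplete, so this is an approximation and not Solr's actual analyzer. `term_freqs` is a hypothetical helper.

```python
from collections import Counter


def term_freqs(field_value: str) -> Counter:
    """Approximate per-document term frequencies for a ';'-separated keyword field.

    Assumption: the real analyzer splits on ';' and lowercases each token;
    stemming and any other filters are ignored here.
    """
    tokens = [t.strip().lower() for t in field_value.split(";") if t.strip()]
    return Counter(tokens)


doc_a = term_freqs("man;men;Men;business")
doc_b = term_freqs("man;Men;business")
print(doc_a["men"], doc_b["men"])  # 2 1
```

Because "men" and "Men" sit at different token positions, a filter that only drops duplicates at the same position will not merge them.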
Re: termFreq always = 1 ?
Yes, this may be my problem.

But is there any way to have only one "men" keyword indexed when I've got something like this:

1 - k1_en = men;business;Men
2 - k1_en = man,business,men
3 - k1_en = Man,men,business,Men,man
...

Thanks in advance,

On Wed, Oct 1, 2008 at 5:12 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Note that RemoveDuplicatesTokenFilterFactory "filters out any tokens which are at the same logical position in the tokenstream as a previous token with the same text."
>
> So if you have "men in black are real men", then RemoveDuplicatesTokenFilterFactory will not remove the duplicate "men".
>
> This may or may not be your problem.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: KLessou <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Wednesday, October 1, 2008 9:48:28 AM
> > Subject: termFreq always = 1 ?
> >
> > [...]
> > (schema excerpt; only these attribute fragments survive in the archive: ignoreCase="true", protected="protwords.txt", generateWordParts="0" generateNumberParts="0" catenateWords="0" catenateNumbers="0" catenateAll="0", and required="false" on the field definition)
> > [...]

--
klessou
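One workaround, not proposed in the thread itself, is to de-duplicate the keyword list on the client before sending it to Solr, so that each keyword reaches the index only once. A minimal Python sketch, assuming keywords are separated by ';' or ',' and that case differences should be ignored (matching the lowercasing implied by "Men" counting as "men"); `dedupe_keywords` is a hypothetical helper name:

```python
import re


def dedupe_keywords(raw: str) -> str:
    """Return the keywords in `raw` lowercased and de-duplicated, in first-seen order.

    Assumes ';' or ',' separates keywords and that case doesn't matter.
    """
    seen = set()
    result = []
    for token in re.split(r"[;,]", raw):
        token = token.strip().lower()
        if token and token not in seen:
            seen.add(token)
            result.append(token)
    return ";".join(result)


for raw in ["men;business;Men", "man,business,men", "Man,men,business,Men,man"]:
    print(dedupe_keywords(raw))
# men;business
# man;business;men
# man;men;business
```

This only catches keywords that are identical after lowercasing. Variants that a stemmer later maps to the same term would still raise termFreq.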
Re: termFreq always = 1 ?
Yes, each one is a document. A real example:

k1_en:men

0.81426066 ... 846 ... ;arm;arms;elbow;elbows;man;men;male;males;indoors;one;person;Men's;moods; ...
...
0.6232885 ... 652 ;portrait;portraits;young;adult;young;adults;*man*;*men*;male;males;male;males;young;*men*;young;*man*;identity;identities;self-confidence;assertiveness;male;beauty;masculine;beauty;*men's*;beauty;indoors;inside;day;daytime;one;person;one;individual;northern;european;caucasian ... .;.

0.81426066 = (MATCH) weight(k1_en:men in 35050), product of:
  0.99999994 = queryWeight(k1_en:men), product of:
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.43420166 = queryNorm
  0.8142607 = (MATCH) fieldWeight(k1_en:men in 35050), product of:
    *1.4142135 = tf(termFreq(k1_en:men)=2)*
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.25 = fieldNorm(field=k1_en, doc=35050)
...
0.62328845 = (MATCH) weight(k1_en:men in 13312), product of:
  0.99999994 = queryWeight(k1_en:men), product of:
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.43420166 = queryNorm
  0.6232885 = (MATCH) fieldWeight(k1_en:men in 13312), product of:
    *1.7320508 = tf(termFreq(k1_en:men)=3)*
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.15625 = fieldNorm(field=k1_en, doc=13312)
...

As you can see, termFreq = 2 for the first document and termFreq = 3 for the second one. I would like termFreq = 1 in every case for this field (k1_en).

Thanks in advance for your help,

On Wed, Oct 1, 2008 at 8:45 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> In each of your examples (is each one a document?) I see only 1 "men" instance, so "men" term frequency should be 1 for that document.
>
> Otis
>
> ----- Original Message ----
> > From: KLessou <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Wednesday, October 1, 2008 11:43:59 AM
> > Subject: Re: termFreq always = 1 ?
> >
> > [...]

--
klessou
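The explain output above can be reproduced by hand. This assumes Lucene's DefaultSimilarity, where tf = sqrt(termFreq), fieldWeight = tf × idf × fieldNorm, queryWeight = idf × queryNorm and weight = queryWeight × fieldWeight. The Python check below plugs in the numbers from the two documents, plus a hypothetical third case where "men" is indexed only once:

```python
import math

IDF = 2.3030772          # idf(docFreq=17576, numDocs=64694), from the explain output
QUERY_NORM = 0.43420166  # queryNorm, from the explain output


def score(term_freq: int, field_norm: float) -> float:
    """Recompute the single-term score the way the explain output breaks it down."""
    tf = math.sqrt(term_freq)          # 1.4142135 for 2, 1.7320508 for 3
    field_weight = tf * IDF * field_norm
    query_weight = IDF * QUERY_NORM    # about 0.99999994
    return query_weight * field_weight


print(round(score(2, 0.25), 4))     # doc 35050 -> 0.8143
print(round(score(3, 0.15625), 4))  # doc 13312 -> 0.6233
print(round(score(1, 0.25), 4))     # doc 35050 with "men" indexed once -> 0.5758
```

Note that the second document has the higher tf but the lower score, because its longer field gets a smaller fieldNorm. If frequency should not count at all, one route not discussed in this thread is a custom Similarity whose tf(float freq) returns 1.0f for any freq > 0, registered through a `<similarity class="..."/>` element in schema.xml (assuming the Solr version in use supports a global similarity there).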
required keyword in all a document
Hi,

I would like to find all documents that contain "France", "Flag" and "French".

I've got documents like this one:

...
wordA,wordB,france, ...
wordA,wordB,flag, ...
wordA,wordB,french, ...
...

I can't write my query like this:

k1_en:(+france +flag +french)^100 OR k2_en:(+france +flag +french)^10 OR k3_en:(+france +flag +french)

So the only way to do what I want is to generate this query:

(k1_en:france^100 OR k2_en:france^10 OR k3_en:france)
AND
(k1_en:flag^100 OR k2_en:flag^10 OR k3_en:flag)
AND
(k1_en:french^100 OR k2_en:french^10 OR k3_en:french)

Is there a better or simpler way to do this?

Thanks in advance!

--
klessou
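Writing that query by hand gets tedious as the number of keywords grows, but it can be generated mechanically: one OR clause per term across all fields, with the clauses ANDed together. A Python sketch, using the field names and boosts from the message; `build_query` is a hypothetical helper, and terms containing query-syntax characters would still need escaping:

```python
def build_query(terms, field_boosts):
    """AND together one clause per term; each clause ORs the term across all fields.

    field_boosts maps field name -> boost (None means no explicit boost).
    """
    clauses = []
    for term in terms:
        parts = []
        for field, boost in field_boosts.items():
            part = f"{field}:{term}"
            if boost is not None:
                part += f"^{boost}"
            parts.append(part)
        clauses.append("(" + " OR ".join(parts) + ")")
    return " AND ".join(clauses)


FIELDS = {"k1_en": 100, "k2_en": 10, "k3_en": None}
print(build_query(["france", "flag", "french"], FIELDS))
```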
Re: required keyword in all a document
MultiFieldQueryParser seems to generate what I want:
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/queryParser/MultiFieldQueryParser.html

But is there a PHP version?

On Mon, Oct 6, 2008 at 1:24 PM, KLessou <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I would like to find all documents who contain "France, Flag, French".
> [...]

--
klessou
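A PHP port of the parser may not be needed: the expanded query string can be built on the client in any language and sent as the q parameter. An alternative not raised in the thread is Solr's dismax query parser. With qf listing the fields and boosts and mm=100%, each term must match in at least one of those fields. Scoring differs, though: dismax takes the best-matching field per term rather than summing all fields. A Python sketch of the request; the host, port and path are assumptions, and whether dismax is selected with defType=dismax or with a qt handler configured in solrconfig.xml depends on the Solr version:

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # assumed Solr URL


def dismax_url(terms):
    """Build a select URL asking dismax to require every term in some listed field."""
    params = {
        "defType": "dismax",               # assumption: or a qt=... handler set up for dismax
        "q": " ".join(terms),
        "qf": "k1_en^100 k2_en^10 k3_en",  # fields and boosts from the original query
        "mm": "100%",                      # all terms must match
    }
    return SOLR_SELECT + "?" + urlencode(params)


print(dismax_url(["france", "flag", "french"]))
```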
Re: required keyword in all a document
On Mon, Oct 6, 2008 at 1:24 PM, KLessou <[EMAIL PROTECTED]> wrote:
> [...]
> I can't make my query like this:
> k1_en:(+france +flag +french)^100 OR k2_en:(+france +flag +french)^10 OR k3_en:(+france +flag +french)

because that way I only get documents of this type:

...
wordA,wordB,france,flag,french, ...
wordA,wordB, ...
wordA,wordB, ...
...

> So, this only way to do what I want is to generate this query:
> [...]

--
klessou
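The difference between the two query shapes can be shown with plain Boolean matching. The per-field query needs one field to hold all three words, while the generated query only needs each word to appear in some field. The Python sketch below ignores scoring and analysis and models each field as a set of lowercased tokens. Which word sits in which k*_en field is an assumption, because the archive stripped the field names from the example documents.

```python
def matches_per_field(doc, terms):
    """Original query shape: some single field must contain every term."""
    return any(all(t in tokens for t in terms) for tokens in doc.values())


def matches_across_fields(doc, terms):
    """Generated query shape: every term must appear in at least one field."""
    return all(any(t in tokens for tokens in doc.values()) for t in terms)


TERMS = ["france", "flag", "french"]
spread = {"k1_en": {"worda", "wordb", "france"},
          "k2_en": {"worda", "wordb", "flag"},
          "k3_en": {"worda", "wordb", "french"}}
together = {"k1_en": {"worda", "wordb", "france", "flag", "french"},
            "k2_en": {"worda", "wordb"},
            "k3_en": {"worda", "wordb"}}

print(matches_per_field(spread, TERMS), matches_across_fields(spread, TERMS))      # False True
print(matches_per_field(together, TERMS), matches_across_fields(together, TERMS))  # True True
```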