Re: termFreq always = 1 ?

KLessou Thu, 02 Oct 2008 01:41:54 -0700

Yes, each one is a document.

A real example :


<str name="q">k1_en:men</str>

<doc>
    <float name="score">0.81426066</float>
...
    <str name="id">846</str>
...
    <str name="k1_en">

;arm;arms;elbow;elbows;man;men;male;males;indoors;one;person;Men's;moods;
    </str>
...
</doc>

...

 <doc>
 <float name="score">0.6232885</float>

...

  <str name="id">652</str>

  <str name="k1_en">
      
;portrait;portraits;young;adult;young;adults;*man*;*men*;male;males;male;males;young;*men*;young;*man*;identity;identities;self-confidence;assertiveness;male;beauty;masculine;beauty;*men's*;beauty;indoors;inside;day;daytime;one;person;one;individual;northern;european;caucasian
  </str>

...
 </doc>


...

<lst name="explain">
  <str name="846">
0.81426066 = (MATCH) weight(k1_en:men in 35050), product of:
  0.99999994 = queryWeight(k1_en:men), product of:
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.43420166 = queryNorm
  0.8142607 = (MATCH) fieldWeight(k1_en:men in 35050), product of:
    *1.4142135 = tf(termFreq(k1_en:men)=2)*
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.25 = fieldNorm(field=k1_en, doc=35050)
</str>
 ...
<str name="652">
0.62328845 = (MATCH) weight(k1_en:men in 13312), product of:
  0.99999994 = queryWeight(k1_en:men), product of:
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.43420166 = queryNorm
  0.6232885 = (MATCH) fieldWeight(k1_en:men in 13312), product of:
    *1.7320508 = tf(termFreq(k1_en:men)=3)*
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.15625 = fieldNorm(field=k1_en, doc=13312)
</str>
...

You can see here for the first document termFreq = 2 and for the
second document termFreq = 3 ...
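
The numbers in the explain output above can be reproduced by hand. A minimal Python sketch, assuming tf is the square root of termFreq (which matches the 1.4142135 and 1.7320508 values shown) and that fieldWeight is the product tf * idf * fieldNorm, as the explain output states. The function name field_weight is only illustrative:

```python
import math

def field_weight(term_freq, idf, field_norm):
    """fieldWeight as printed by explain: tf * idf * fieldNorm.

    Assumes tf = sqrt(termFreq), which matches the tf values in the output.
    """
    return math.sqrt(term_freq) * idf * field_norm

# idf(docFreq=17576, numDocs=64694), copied from the explain output
IDF = 2.3030772

doc_846 = field_weight(2, IDF, 0.25)     # termFreq=2, fieldNorm=0.25
doc_652 = field_weight(3, IDF, 0.15625)  # termFreq=3, fieldNorm=0.15625

print(round(doc_846, 4), round(doc_652, 4))
```

Document 652 has the higher termFreq but the lower score, because its fieldNorm (0.15625) is smaller than that of document 846 (0.25).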

And I would like to have termFreq = 1 in each case for this field (k1_en).
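
To illustrate what a constant termFreq would do to these two scores, here is a minimal Python sketch. flat_tf and field_weight_flat are hypothetical names, not Lucene or Solr API. In Lucene itself the equivalent change would presumably be a custom Similarity whose tf() returns 1 for any match; that is an assumption and has not been tested against this schema:

```python
def flat_tf(term_freq):
    """Hypothetical tf: 1.0 for any match, 0.0 when the term is absent."""
    return 1.0 if term_freq > 0 else 0.0

def field_weight_flat(term_freq, idf, field_norm):
    """fieldWeight with the flattened tf: flat_tf * idf * fieldNorm."""
    return flat_tf(term_freq) * idf * field_norm

# idf(docFreq=17576, numDocs=64694), copied from the explain output
IDF = 2.3030772

flat_846 = field_weight_flat(2, IDF, 0.25)     # was termFreq=2
flat_652 = field_weight_flat(3, IDF, 0.15625)  # was termFreq=3

print(round(flat_846, 4), round(flat_652, 4))
```

With tf fixed at 1, and idf identical for both documents, the two scores would differ only through fieldNorm, so 846 would still rank above 652.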

Thanks in advance for your help,







On Wed, Oct 1, 2008 at 8:45 PM, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:

> In each of your examples (is each one a document?) I see only 1 "men"
> instance, so "men" term frequency should be 1 for that document.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
> > From: KLessou <[EMAIL PROTECTED]>
> > To: [email protected]
> > Sent: Wednesday, October 1, 2008 11:43:59 AM
> > Subject: Re: termFreq always = 1 ?
> >
> > Yes this may be my problem,
> >
> > But is there any solution to have only one "men" keyword indexed when I've
> > got something like this :
> >
> > 1 - k1_en = men;business;Men
> > or :
> > 2 - k1_en = man,business,men
> > or :
> > 3 - k1_en = Man,men,business,Men,man
> > ...
> >
> > Thx in advance,
> >
> > On Wed, Oct 1, 2008 at 5:12 PM, Otis Gospodnetic wrote:
> >
> > > Hi,
> > >
> > > Note that RemoveDuplicatesTokenFilterFactory "filters out any tokens which
> > > are at the same logical position in the tokenstream as a previous token with
> > > the same text."
> > >
> > > So if you have "men in black are real men" then
> > > RemoveDuplicatesTokenFilterFactory will not remove duplicate "men".
> > >
> > > This may or may not be your problem.
> > >
> > > Otis
> > > --
> > > Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> > >
> > >
> > >
> > > ----- Original Message ----
> > > > From: KLessou
> > > > To: [email protected]
> > > > Sent: Wednesday, October 1, 2008 9:48:28 AM
> > > > Subject: termFreq always = 1 ?
> > > >
> > > > Hi,
> > > >
> > > > I want to index a list of keywords.
> > > >
> > > > When I search "k1_en:men", I find a lot of documents like that :
> > > >
> > > > DocA :
> > > > (k1_en = man;men;Men;business... termFreq=2)
> > > > DocB :
> > > > (k1_en = man;Men;business... termFreq=1)
> > > > DocC :
> > > > ...
> > > > DocD :
> > > > ...
> > > > DocE :
> > > > ...
> > > >
> > > > But I don't want to have a different termFreq for DocA & DocB.
> > > >
> > > > I tried RemoveDuplicatesTokenFilterFactory but it doesn't seem to help me
> > > > :-/
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > ignoreCase="true"/>
> > > >
> > > > protected="protwords.txt" />
> > > >
> > > >
> > > >
> > > >
> > > >                     generateWordParts="0"
> > > >                     generateNumberParts="0"
> > > >                     catenateWords="0"
> > > >                     catenateNumbers="0"
> > > >                     catenateAll="0"
> > > >                     />
> > > >
> > > >
> > > >
> > > >
> > > > />
> > > >
> > > >
> > > >
> > > >
> > > > ignoreCase="true"/>
> > > >
> > > > protected="protwords.txt" />
> > > >
> > > >
> > > >
> > > >                     generateWordParts="0"
> > > >                     generateNumberParts="0"
> > > >                     catenateWords="0"
> > > >                     catenateNumbers="0"
> > > >                     catenateAll="0"
> > > >                     />
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > ...
> > > >
> > > >
> > > >
> > > > required="false" />
> > > >
> > > >
> > > > If you have any idea, thx in advance.
> > > >
> > > > --
> > > > ~~~~~
> > > > | klessou |
> > > > ~~~~~
> > >
> > >
> >
> >
> > --
> > ~~~~~
> > | klessou |
> > ~~~~~
>
>


-- 
~~~~~
| klessou |
~~~~~
