termFreq always = 1 ?

2008-10-01 Thread KLessou
Hi,

I want to index a list of keywords.

When I search for "k1_en:men", I get a lot of documents like these:

DocA :
(k1_en = man;men;Men;business... termFreq=2)
DocB :
(k1_en = man;Men;business... termFreq=1)
DocC :
...
DocD :
...
DocE :
...

But I don't want DocA and DocB to have different termFreq values.

I tried RemoveDuplicatesTokenFilterFactory, but it doesn't seem to help :-/

...

If you have any idea, thx in advance.

-- 
~
| klessou |
~


Re: termFreq always = 1 ?

2008-10-01 Thread KLessou
Yes, this may be my problem.

But is there any way to have only one "men" keyword indexed when I've
got something like this:

1 - k1_en = men;business;Men
or :
2 - k1_en = man,business,men
or :
3 - k1_en = Man,men,business,Men,man
...

Thx in advance,
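
One option that does not depend on the analyzer chain is to clean the keyword list on the client before sending the document to Solr. The sketch below is only an illustration of that idea: the function name `unique_keywords` is hypothetical, it assumes the keywords are separated by ";" or ",", and it only folds case, so it will not catch variants that the analyzer later reduces to the same term (for example through stemming).

```python
import re


def unique_keywords(raw: str) -> list[str]:
    """Split a ';' or ',' separated keyword string and drop case-insensitive repeats."""
    seen = set()
    result = []
    for token in re.split(r"[;,]", raw):
        key = token.strip().lower()
        # Skip empty pieces (leading/trailing separators) and repeats.
        if key and key not in seen:
            seen.add(key)
            result.append(key)
    return result


if __name__ == "__main__":
    print(";".join(unique_keywords("Man,men,business,Men,man")))  # man;men;business
```

With the three sample values above, each list ends up containing "men" at most once, so the raw term frequency of "men" in k1_en can no longer exceed 1 because of case variants.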

On Wed, Oct 1, 2008 at 5:12 PM, Otis Gospodnetic <[EMAIL PROTECTED]
> wrote:

> Hi,
>
> Note that RemoveDuplicatesTokenFilterFactory "filters out any tokens which
> are at the same logical position in the tokenstream as a previous token with
> the same text."
>
> So if you have "men in black are real men" then
> RemoveDuplicatesTokenFilterFactory will not remove duplicate "men".
>
> This may or may not be your problem.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Original Message 
> > From: KLessou <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Wednesday, October 1, 2008 9:48:28 AM
> > Subject: termFreq always = 1 ?
> >
> > Hi,
> >
> > I want to index a list of keywords.
> >
> > When I search "k1_en:men", I find a lot of documents like that :
> >
> > DocA :
> > (k1_en = man;men;Men;business... termFreq=2)
> > DocB :
> > (k1_en = man;Men;business... termFreq=1)
> > DocC :
> > ...
> > DocD :
> > ...
> > DocE :
> > ...
> >
> > But I don't want to have a different termFreq for DocA & DocB.
> >
> > I try RemoveDuplicatesTokenFilterFactory but it doesn't seem to help me :-/
> >
> > ignoreCase="true"/>
> > protected="protwords.txt" />
> > generateWordParts="0"
> > generateNumberParts="0"
> > catenateWords="0"
> > catenateNumbers="0"
> > catenateAll="0"
> > />
> > />
> >
> > ignoreCase="true"/>
> > protected="protwords.txt" />
> > generateWordParts="0"
> > generateNumberParts="0"
> > catenateWords="0"
> > catenateNumbers="0"
> > catenateAll="0"
> > />
> >
> > ...
> >
> > required="false" />
> >
> >
> > If you have any idea, thx in advance.
> >
> > --
> > ~
> > | klessou |
> > ~
>
>




Re: termFreq always = 1 ?

2008-10-02 Thread KLessou
Yes, each one is a document.

A real example :

k1_en:men

0.81426066
...
846
...
;arm;arms;elbow;elbows;man;men;male;males;indoors;one;person;Men's;moods;
...

0.6232885
...
652
...
;portrait;portraits;young;adult;young;adults;*man*;*men*;male;males;male;males;young;*men*;young;*man*;identity;identities;self-confidence;assertiveness;male;beauty;masculine;beauty;*men's*;beauty;indoors;inside;day;daytime;one;person;one;individual;northern;european;caucasian
...

0.81426066 = (MATCH) weight(k1_en:men in 35050), product of:
  0.99999994 = queryWeight(k1_en:men), product of:
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.43420166 = queryNorm
  0.8142607 = (MATCH) fieldWeight(k1_en:men in 35050), product of:
    *1.4142135 = tf(termFreq(k1_en:men)=2)*
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.25 = fieldNorm(field=k1_en, doc=35050)

...

0.62328845 = (MATCH) weight(k1_en:men in 13312), product of:
  0.99999994 = queryWeight(k1_en:men), product of:
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.43420166 = queryNorm
  0.6232885 = (MATCH) fieldWeight(k1_en:men in 13312), product of:
    *1.7320508 = tf(termFreq(k1_en:men)=3)*
    2.3030772 = idf(docFreq=17576, numDocs=64694)
    0.15625 = fieldNorm(field=k1_en, doc=13312)

...

You can see here for the first document termFreq = 2 and for the
second document termFreq = 3 ...

And I would like to have termFreq = 1 in each case for this field (k1_en).
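
To see how much termFreq actually contributes, the fieldWeight lines above can be recomputed by hand. This is a small sketch, not Lucene code: it assumes tf is the square root of the raw term frequency (which matches the 1.4142135 and 1.7320508 values in the explain output), and the function name `field_weight` is made up for the illustration.

```python
import math


def field_weight(term_freq: float, idf: float, field_norm: float) -> float:
    """fieldWeight as shown in the explain output: tf * idf * fieldNorm."""
    tf = math.sqrt(term_freq)  # assumed tf formula, consistent with the explain lines
    return tf * idf * field_norm


IDF = 2.3030772  # idf(docFreq=17576, numDocs=64694), copied from the explain output

# Values as currently indexed (termFreq = 2 and termFreq = 3).
doc_35050 = field_weight(2, IDF, 0.25)
doc_13312 = field_weight(3, IDF, 0.15625)

# The same documents if termFreq were forced to 1.
doc_35050_flat = field_weight(1, IDF, 0.25)
doc_13312_flat = field_weight(1, IDF, 0.15625)

if __name__ == "__main__":
    print(f"{doc_35050:.4f} {doc_13312:.4f}")            # 0.8143 0.6233
    print(f"{doc_35050_flat:.4f} {doc_13312_flat:.4f}")  # 0.5758 0.3599
```

Note that even with termFreq = 1 the two documents would not score the same, because fieldNorm (0.25 versus 0.15625) still differs between them.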

Thanks in advance for your help,

On Wed, Oct 1, 2008 at 8:45 PM, Otis Gospodnetic <[EMAIL PROTECTED]
> wrote:

> In each of your examples (is each one a document?) I see only 1 "men"
> instance, so "men" term frequency should be 1 for that document.
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>


required keyword in all a document

2008-10-06 Thread KLessou
Hi,

I would like to find all documents that contain "France, Flag, French".

I've got docs like this one :

...
wordA,wordB,france, ...
wordA,wordB,flag, ...
wordA,wordB,french, ...
...


I can't make my query like this :
k1_en:(+france +flag +french)^100 OR k2_en:(+france +flag +french)^10 OR
k3_en:(+france +flag +french)

So the only way to do what I want is to generate this query:

(k1_en:france^100 OR k2_en:france^10 OR k3_en:france)
AND
(k1_en:flag^100 OR k2_en:flag^10 OR k3_en:flag)
AND
(k1_en:french^100 OR k2_en:french^10 OR k3_en:french)

Is there a better or simpler way to do this?

Thx in advance !
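
For reference, the expanded query above can be generated mechanically from the list of words and the per-field boosts. The following is only a sketch of that string building: `build_query` is a hypothetical helper, a boost of 1 is simply omitted, and the terms are assumed to be plain words that need no query-syntax escaping.

```python
def build_query(terms, field_boosts):
    """Require every term, letting each term match in any of the boosted fields."""
    groups = []
    for term in terms:
        clauses = []
        for field, boost in field_boosts.items():
            clause = f"{field}:{term}"
            if boost != 1:
                clause += f"^{boost}"  # append the boost only when it changes the weight
            clauses.append(clause)
        groups.append("(" + " OR ".join(clauses) + ")")
    return " AND ".join(groups)


if __name__ == "__main__":
    boosts = {"k1_en": 100, "k2_en": 10, "k3_en": 1}
    print(build_query(["france", "flag", "french"], boosts))
```

Each word gets its own OR group across the three fields and the groups are joined with AND, so a document matches only if every word appears in at least one of the fields.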

-- 
~
| klessou |
~


Re: required keyword in all a document

2008-10-06 Thread KLessou
MultiFieldQueryParser seems to generate what I want :
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/queryParser/MultiFieldQueryParser.html

But is there a PHP version?



Re: required keyword in all a document

2008-10-06 Thread KLessou
On Mon, Oct 6, 2008 at 1:24 PM, KLessou <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I would like to find all documents who contain "France, Flag, French".
>
> I've got docs like this one :
> 
> ...
> wordA,wordB,france, ...
> wordA,wordB,flag, ...
> wordA,wordB,french, ...
> ...
> 
>
> I can't make my query like this :
> k1_en:(+france +flag +french)^100 OR k2_en:(+france +flag +french)^10 OR
> k3_en:(+france +flag +french)
>

Because that query only returns documents like this one, where a single field contains all three words:

...
wordA,wordB,france,flag,french, ... 
wordA,wordB, ...
wordA,wordB, ...
...


