Possible to adjust FieldNorm?

2011-12-14 Thread cnyee
Hi, 

Is it possible to adjust FieldNorm? I have a scenario where the search is
not producing the desired result because of fieldNorm:

Search terms: coaching leadership
Record 1: name="Ask the Coach", desc="...",...
Record 2: name="Coaching as a Leadership Development Tool Part 1",
desc="...",...

Record 1 was scored higher than record 2, even though record 2 has two matches.
The scoring is given below:

Record 1:
  1.2878088 = (MATCH) weight(name_en:coach in 6430), product of:
    0.20103075 = queryWeight(name_en:coach), product of:
      6.406029 = idf(docFreq=160, maxDocs=35862)
      0.03138149 = queryNorm
    6.406029 = (MATCH) fieldWeight(name_en:coach in 6430), product of:
      1.0 = tf(termFreq(name_en:coach)=1)
      6.406029 = idf(docFreq=160, maxDocs=35862)
      1.0 = fieldNorm(field=name_en, doc=6430)

Record 2:
  0.56341636 = (MATCH) weight(name_en:coach in 4744), product of:
    0.20103075 = queryWeight(name_en:coach), product of:
      6.406029 = idf(docFreq=160, maxDocs=35862)
      0.03138149 = queryNorm
    2.8026378 = (MATCH) fieldWeight(name_en:coach in 4744), product of:
      1.0 = tf(termFreq(crs_name_en:coach)=1)
      6.406029 = idf(docFreq=160, maxDocs=35862)
      0.4375 = fieldNorm(field=name_en, doc=4744)
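
For reference, the numbers in the two explain outputs above multiply out as follows. This is only a minimal sketch in Python that redoes the arithmetic from the quoted output; the function names are made up for illustration and are not part of Solr or Lucene.

```python
# Redo the arithmetic of the two explain outputs quoted above.
# All numeric inputs are copied from the explain output; the helper
# names are hypothetical and exist only for this illustration.

def field_weight(tf, idf, field_norm):
    """fieldWeight = tf * idf * fieldNorm, as listed in the explain output."""
    return tf * idf * field_norm


def term_weight(idf, query_norm, tf, field_norm):
    """weight = queryWeight * fieldWeight, where queryWeight = idf * queryNorm."""
    query_weight = idf * query_norm
    return query_weight * field_weight(tf, idf, field_norm)


IDF = 6.406029          # idf(docFreq=160, maxDocs=35862)
QUERY_NORM = 0.03138149

record1 = term_weight(IDF, QUERY_NORM, tf=1.0, field_norm=1.0)
record2 = term_weight(IDF, QUERY_NORM, tf=1.0, field_norm=0.4375)

print(f"record 1: {record1:.7f}")
print(f"record 2: {record2:.7f}")
print(f"ratio record2/record1: {record2 / record1:.4f}")
```

Everything except fieldNorm is identical for the two records, so the ratio of the two weights is exactly the ratio of the two fieldNorm values.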

Many thanks in advance.

Chut



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Possible-to-adjust-FieldNorm-tp3584998p3584998.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Possible to adjust FieldNorm?

2011-12-14 Thread cnyee
Sorry, I did not give the full output in the first post. 
From the look of it, the fieldNorm is saying that
1 match out of 3 words in record 1 is more significant than 2 matches out of
8 words in record 2.
That is true as simple arithmetic, but unsatisfactory in terms of human
'meaning'.
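
To make the length effect concrete, here is a small sketch that assumes the default Lucene similarity of that era, where the length norm of a field is roughly boost / sqrt(number of terms). The word counts 3 and 8 are the ones mentioned above; the function name is hypothetical. The values Solr reports need not match these exactly, because the stored norm is compressed into a single byte and can also include index-time boosts, and stopword removal changes the term count.

```python
import math

# Assumption: default-similarity style length normalization,
# norm = boost / sqrt(number_of_terms_in_field).
# The function name is made up for this illustration.

def length_norm(num_terms, boost=1.0):
    """Approximate length norm before it is compressed to one byte."""
    return boost / math.sqrt(num_terms)


short_name = length_norm(3)   # a 3-word name, as in record 1
long_name = length_norm(8)    # an 8-word name, as in record 2

print(f"3-word field: {short_name:.4f}")
print(f"8-word field: {long_name:.4f}")
```

Under this assumption every matching term in the longer name is scaled by a smaller factor, so a second match in a long field does not necessarily make up for the penalty.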

Here is the full explanation. Record 2 has some boosting as well.

Record 1:
1.5843434 = (MATCH) sum of:
  1.5372416 = (MATCH) sum of:
    1.2878088 = (MATCH) max plus 0.1 times others of:
      1.2878088 = (MATCH) weight(crs_name_en:coach in 6430), product of:
        0.20103075 = queryWeight(crs_name_en:coach), product of:
          6.406029 = idf(docFreq=160, maxDocs=35862)
          0.03138149 = queryNorm
        6.406029 = (MATCH) fieldWeight(crs_name_en:coach in 6430), product of:
          1.0 = tf(termFreq(crs_name_en:coach)=1)
          6.406029 = idf(docFreq=160, maxDocs=35862)
          1.0 = fieldNorm(field=crs_name_en, doc=6430)
    0.2494328 = (MATCH) max plus 0.1 times others of:
      0.2494328 = (MATCH) weight(crs_desc_en:leadership in 6430), product of:
        0.15826634 = queryWeight(crs_desc_en:leadership), product of:
          5.043302 = idf(docFreq=628, maxDocs=35862)
          0.03138149 = queryNorm
        1.5760319 = (MATCH) fieldWeight(crs_desc_en:leadership in 6430), product of:
          1.0 = tf(termFreq(crs_desc_en:leadership)=1)
          5.043302 = idf(docFreq=628, maxDocs=35862)
          0.3125 = fieldNorm(field=crs_desc_en, doc=6430)
  0.04710189 = (MATCH) product of:
    0.09420378 = (MATCH) sum of:
      0.09420378 = (MATCH) product of:
        0.3768151 = (MATCH) sum of:
          0.3768151 = (MATCH) weight(published_year:2008 in 6430), product of:
            0.10874291 = queryWeight(published_year:2008), product of:
              3.4651926 = idf(docFreq=3047, maxDocs=35862)
              0.03138149 = queryNorm
            3.4651926 = (MATCH) fieldWeight(published_year:2008 in 6430), product of:
              1.0 = tf(termFreq(published_year:2008)=1)
              3.4651926 = idf(docFreq=3047, maxDocs=35862)
              1.0 = fieldNorm(field=published_year, doc=6430)
        0.25 = coord(1/4)
    0.5 = coord(1/2)
  0.0 = (MATCH) FunctionQuery(int(crs_stars)), product of:
    0.0 = int(crs_stars)=0
    2.5 = boost
    0.03138149 = queryNorm

Record 2:
1.5590522 = (MATCH) sum of:
  1.0096307 = (MATCH) sum of:
    0.6206793 = (MATCH) max plus 0.1 times others of:
      0.56341636 = (MATCH) weight(crs_name_en:coach in 4744), product of:
        0.20103075 = queryWeight(crs_name_en:coach), product of:
          6.406029 = idf(docFreq=160, maxDocs=35862)
          0.03138149 = queryNorm
        2.8026378 = (MATCH) fieldWeight(crs_name_en:coach in 4744), product of:
          1.0 = tf(termFreq(crs_name_en:coach)=1)
          6.406029 = idf(docFreq=160, maxDocs=35862)
          0.4375 = fieldNorm(field=crs_name_en, doc=4744)
      0.11664742 = (MATCH) weight(meta_en:coach in 4744), product of:
        0.11443973 = queryWeight(meta_en:coach), product of:
          3.646727 = idf(docFreq=2541, maxDocs=35862)
          0.03138149 = queryNorm
        1.0192913 = (MATCH) fieldWeight(meta_en:coach in 4744), product of:
          2.236068 = tf(termFreq(meta_en:coach)=5)
          3.646727 = idf(docFreq=2541, maxDocs=35862)
          0.125 = fieldNorm(field=meta_en, doc=4744)
      0.4559821 = (MATCH) weight(crs_desc_en:coach in 4744), product of:
        0.19534174 = queryWeight(crs_desc_en:coach), product of:
          6.2247434 = idf(docFreq=192, maxDocs=35862)
          0.03138149 = queryNorm
        2.3342788 = (MATCH) fieldWeight(crs_desc_en:coach in 4744), product of:
          2.0 = tf(termFreq(crs_desc_en:coach)=4)
          6.2247434 = idf(docFreq=192, maxDocs=35862)
          0.1875 = fieldNorm(field=crs_desc_en, doc=4744)
    0.3889513 = (MATCH) max plus 0.1 times others of:
      0.36372444 = (MATCH) weight(crs_name_en:leadership in 4744), product of:
        0.16152287 = queryWeight(crs_name_en:leadership), product of:
          5.147074 = idf(docFreq=566, maxDocs=35862)
          0.03138149 = queryNorm
        2.251845 = (MATCH) fieldWeight(crs_name_en:leadership in 4744), product of:
          1.0 = tf(termFreq(crs_name_en:leadership)=1)
          5.147074 = idf(docFreq=566, maxDocs=35862)
          0.4375 = fieldNorm(field=crs_name_en, doc=4744)
      0.04061773 = (MATCH) weight(meta_en:leadership in 4744), product of:
        0.076728955 = queryWeight(meta_en:leadership), product of:
          2.4450386 = idf(docFreq=8453, maxDocs=35862)
          0.03138149 = queryNorm
        0.5293664 = (MATCH) fieldWeight(meta_en:leadership in 4744), product of:
          1.7320508 = tf(termFreq(meta_en:leadership)=3)
          2.4450386 = idf(docFreq=8453, maxDocs=35862)
          0.125 = fieldNorm(field=meta_en, doc=4744)
      0.21165074 = (MATCH) weight(crs_desc_en:leadership in 4744), product of:
        0.15826634 = queryWeight(crs_desc_en:leadership), product of:
          5.043302 = i
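
The "max plus 0.1 times others" lines in the record 2 output can be checked by hand. The sketch below takes the per-field weights quoted above and combines them the way that label describes: the best field score plus 0.1 times the sum of the remaining ones. The function name is made up for illustration; only the numbers come from the explain output.

```python
# Check the "max plus 0.1 times others" totals for record 2.
# Sub-scores are copied from the explain output; the helper name
# is hypothetical.

def max_plus_tie(sub_scores, tie=0.1):
    """Best sub-score plus `tie` times the sum of the other sub-scores."""
    best = max(sub_scores)
    return best + tie * (sum(sub_scores) - best)


# coach: crs_name_en, meta_en, crs_desc_en
coach = max_plus_tie([0.56341636, 0.11664742, 0.4559821])
# leadership: crs_name_en, meta_en, crs_desc_en
leadership = max_plus_tie([0.36372444, 0.04061773, 0.21165074])

print(f"coach:      {coach:.7f}")
print(f"leadership: {leadership:.7f}")
print(f"sum:        {coach + leadership:.7f}")
```

With a tie factor of 0.1, matches in the secondary fields add only a tenth of their weight, which is part of why the second matching term does not lift record 2 above record 1.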

Facet filter: how to specify OR expression?

2011-05-11 Thread cnyee
Hi,

Is there any way to specify an 'OR' expression for a facet filter?
For example, docType="pdf" or docType="txt".

Many thanks in advance.
Yee

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2930570.html


Re: Facet filter: how to specify OR expression?

2011-05-12 Thread cnyee
It works. Many thanks.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2930783.html


Re: Facet filter: how to specify OR expression?

2011-05-12 Thread cnyee
I have another facet that is of type integer and it gave an exception.

Is it true that the field has to be of type string or text for the OR
expression to work?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2930863.html


Re: Facet filter: how to specify OR expression?

2011-05-12 Thread cnyee
The exception says:

java.lang.NumberFormatException: for input string "or"

The field type is:




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2931282.html


Re: Facet filter: how to specify OR expression?

2011-05-12 Thread cnyee
Oh, I see.

I was wrong to use (pdf or txt). It worked, but it has a different meaning
altogether from (pdf OR txt).
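
For anyone finding this later, here is a minimal sketch of building such a filter, assuming the standard Lucene query syntax where boolean operators must be upper case. As far as I understand, a lower-case "or" is treated as an ordinary search term, which would explain why an integer field tried to parse "or" as a number. The helper name and the match-all q value are only for illustration.

```python
from urllib.parse import urlencode

# Build an fq value that matches any of several values of one field.
# Note the upper-case OR: a lower-case "or" would be read as a term.
# The helper name is hypothetical; values containing spaces would
# additionally need quoting, which this sketch does not handle.

def or_filter(field, values):
    """Return e.g. 'docType:(pdf OR txt)' for field='docType'."""
    joined = " OR ".join(str(v) for v in values)
    return f"{field}:({joined})"


fq = or_filter("docType", ["pdf", "txt"])
params = urlencode({"q": "*:*", "fq": fq})

print(fq)
print(params)
```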

Thanks a lot for your help.

Best regards,
Yee

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Facet-filter-how-to-specify-OR-expression-tp2930570p2931347.html


What is the difference?

2011-07-21 Thread cnyee
Hi,

I have two queries:

(1) q = (change management)
(2) q = (change management) AND domain_ids:(0^1.3 OR 1)

The purpose of (2) is to boost the records with domain_ids=0.
In my database all records have domain_ids = 0 or 1, so domain_ids:(0 OR 1)
will always return the full database.

My question is: query (2) returns 5000+ results, but query (1) returns
700+ results.

Can somebody enlighten me on the reason behind such a vast
difference in the number of results?

Many thanks in advance.

Yee

--
View this message in context: 
http://lucene.472066.n3.nabble.com/What-is-the-different-tp3190278p3190278.html


Re: Logically equivalent queries but vastly different no of results?

2011-07-22 Thread cnyee
I think I know what it is. The second query has higher scores than the first.

The additional condition "domain_ids:(0^1.3 OR 1)", which always evaluates to
true, pushed up the scores and allowed a LOT more records to pass.

Is there a better way of doing this?
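
One option that might be worth trying, sketched here under the assumption that the request goes through the dismax or edismax parser: leave q as the plain user query and move the boost into the bq (boost query) parameter, which adds an optional clause that influences scoring without being part of the main query string. The helper name is made up, and I have not verified what this does to the result counts above.

```python
from urllib.parse import urlencode, parse_qs

# Keep the user query untouched and express the preference for
# domain_ids=0 as a separate boost query (bq).
# Assumption: the request handler accepts defType=edismax.
# The helper name is hypothetical.

def boosted_query_params(user_query, boost_clause):
    """Return URL-encoded parameters with the boost kept out of q."""
    return urlencode({
        "defType": "edismax",
        "q": user_query,
        "bq": boost_clause,
    })


params = boosted_query_params("change management", "domain_ids:0^1.3")
print(params)
```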

Regards,
Yee

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Logically-equivalent-queries-but-vastly-different-no-of-results-tp3190278p3191211.html


Re: Logically equivalent queries but vastly different no of results?

2011-07-25 Thread cnyee
Yes, I am using edismax, but the reason is not obvious to me. Can you give
me a pointer?

Thanks
Yee

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Logically-equivalent-queries-but-vastly-different-no-of-results-tp3190278p3199362.html


Multiplexing TokenFilter for multi-language?

2011-08-08 Thread cnyee
Sorry if this has already been discussed, but I have already spent a couple
of days googling in vain.

The problem:
- documents in multiple languages (us, de, fr, es).
- language is known (a team of editors determines the language manually, and
users are asked to specify language option for searching).

My intended approach:
- one index.
- a multiplexing token filter, a MultilingualSnowballFilterFactory that
instantiates a Snowball Stemmer for the appropriate language.
- language is a facet, to get rid of cross-language ambiguities with
multiple languages mixed in the same field.

The problem is how to communicate the language to the
MultilingualSnowballFilterFactory. Once the language is known, instantiating
the Snowball Stemmer for the right language is easy. I have attached a working
version below.

My solution:
- prepend the language as the first token for the FilterFactory to pick up.
E.g. "es This is a spanish document".
- this would mean I need to duplicate the fields: an original version for
storing, and a version with the language marker prepended for indexing. E.g.
description (indexed=false, stored=true), description_i (indexed=true,
stored=false).
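
To illustrate the idea outside Solr, here is a rough sketch in Python of the multiplexing step: read the first token as the language marker, then run the remaining tokens through the stemmer for that language. The toy suffix-stripping stemmers stand in for the real Snowball stemmers, and all names here are hypothetical. It is not the attached Java class and says nothing about how Solr instantiates filters.

```python
# Toy illustration of a language-multiplexing stemmer.
# The first token is the language marker; the rest is the text.
# The suffix lists are placeholders, not real Snowball rules.

def strip_suffixes(suffixes):
    """Return a toy stemmer that removes the first matching suffix."""
    def stem(token):
        for suffix in suffixes:
            if token.endswith(suffix) and len(token) > len(suffix) + 2:
                return token[: -len(suffix)]
        return token
    return stem


STEMMERS = {
    "us": strip_suffixes(("ing", "s")),
    "es": strip_suffixes(("os", "as", "es", "s")),
}


def analyze(marked_text):
    """Split off the language marker and stem the remaining tokens."""
    tokens = marked_text.lower().split()
    if not tokens:
        return []
    language, words = tokens[0], tokens[1:]
    # Unknown languages fall back to no stemming at all.
    stem = STEMMERS.get(language, lambda token: token)
    return [stem(word) for word in words]


print(analyze("us coaching leaders"))
print(analyze("es documentos"))
```

The dispatch has to happen each time a field value is analyzed, because the marker can differ from one document to the next.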

Is there a better way?

Many thanks in advance.

Yee

http://lucene.472066.n3.nabble.com/file/n3235341/MultilingualSnowballFilterFactory.java
MultilingualSnowballFilterFactory.java 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiplexing-TokenFilter-for-multi-language-tp3235341p3235341.html


Re: Multiplexing TokenFilter for multi-language?

2011-08-09 Thread cnyee
You are right - the stemmer was only instantiated twice. I am not sure why it
was instantiated twice. I tested with 10 and 50 records, so maybe it was
associated with the auto-commit cycle.

What a bummer. Back to the drawing board again.

Thanks for your input anyway. I was struggling with weird search behavior
all day today. Now it all makes sense.

I think a multiplexing stemmer would be a worthy extension for a future
version of Solr.

Best regards,
Yee

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiplexing-TokenFilter-for-multi-language-tp3235341p3239103.html


Re: Multiplexing TokenFilter for multi-language?

2011-08-09 Thread cnyee
I believe that the FilterFactory is not designed to be called for each
instance of field processing. Come to think of it, that would be terribly
inefficient. The instantiated stemmer is meant to be reused as much as
possible. Maybe the FilterFactory is called to instantiate a new stemmer in
association with a new server thread, and when a dependent resource
changes (e.g. an updated stopword list).

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Multiplexing-TokenFilter-for-multi-language-tp3235341p3239260.html