Setting a Threshold of a sortable field to filter the result?

2008-03-29 Thread Vinci

Hi,

How can I set a threshold value on a field so that results below the
threshold are filtered out? Is this done in schema.xml or in the query?

Thank you,
Vinci
-- 
View this message in context: 
http://www.nabble.com/Setting-a-Threshold-of-a-sortable-field-to-filter-the-result--tp16367336p16367336.html
Sent from the Solr - User mailing list archive at Nabble.com.



Multiple unique field?

2008-03-29 Thread Vinci

Hi,

I want to set two fields that are unique, for different kinds of searching.
Is that possible?

Thank you,
Vinci
-- 
View this message in context: 
http://www.nabble.com/Multiple-unique-field--tp16367339p16367339.html



Re: synonyms

2008-03-29 Thread Christian Vogler
On Friday 28 March 2008 21:44:29 Leonardo Santagada wrote:
> Well, his examples are in Brazilian Portuguese, not Spanish, and the
> biggest problem is that a Spanish stemmer is not going to work. I
> haven't found a pt_BR stemmer; have I overlooked something?

Try the Snowball Porter filter factory. The algorithm is specified in plain 
text files, so adding new stemmers to the codebase is pretty easy. The hard 
part is finding a good specification of the algorithm for Brazilian 
Portuguese.

A Google search reveals some references to Brazilian Portuguese versions of 
the Porter algorithm. Maybe one of these is suitably unencumbered for 
implementation and distribution as free software.

As a last resort, there already is a Snowball Porter stemmer for Portuguese in 
the SOLR codebase. However, I do not know how suitable it would be for 
adaptation to Brazilian Portuguese, as I know zilch about the variant spoken 
in Portugal.

Best regards
- Christian


Re: Solr commits automatically on appserver shutdown

2008-03-29 Thread Yonik Seeley
On Fri, Mar 28, 2008 at 2:05 PM, Noble Paul നോബിള്‍ नोब्ळ्
<[EMAIL PROTECTED]> wrote:
> hi,
>  I am willing to work on this if you can give me some pointers as to
>  where to start?

DirectUpdateHandler2 implements its own duplicate removal, which is
no longer necessary.

-Yonik


Re: hl.requireFieldMatch and idf

2008-03-29 Thread Koji Sekiguchi

Mike,

Thank you for your response.


Cause:
If hl.requireFieldMatch is set to true, DefaultSolrHighlighter.getQueryScorer()
uses the QueryScorer(Query,IndexReader,String) constructor in the Lucene
highlighter.
The constructor then calls getIdfWeightedTerms() to get an array of
WeightedTerm.
In getIdfWeightedTerms(), idf is calculated to weight the terms.
The calculated idf can be negative on an unoptimized index.


Okay, _this_ is the true bug.  I don't see how Lucene can return a
negative idf, optimized index or no.

I think docFreq counts deleted docs as well, and that this is intended
Lucene behavior.

This behavior leads to a negative idf as long as the following formula is used:

// o.a.l.s.highlight.QueryTermExtractor.java
float idf=(float)(Math.log((float)totalNumDocs/(double)(docFreq+1)) + 1.0);
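
To make the arithmetic concrete, here is a minimal standalone sketch of the same expression. It assumes that totalNumDocs counts only live documents while docFreq still counts deleted ones; the class name, method name, and sample numbers are made up for illustration and are not part of Lucene.

```java
// Minimal sketch: the idf expression quoted above, with illustrative inputs.
// The class name, method name, and sample numbers are assumptions made for
// this example; they are not taken from the Lucene source.
final class IdfSketch {

    // Same arithmetic as the quoted line from QueryTermExtractor.
    static float idf(int totalNumDocs, int docFreq) {
        return (float) (Math.log((float) totalNumDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // docFreq no larger than the document count: idf stays positive.
        System.out.println(idf(10, 3));
        // Assumed scenario: docFreq still counts deleted docs, so it can
        // exceed the live document count, and the idf drops below zero.
        System.out.println(idf(10, 40));
    }
}
```

With this expression the result turns negative once docFreq + 1 exceeds roughly e (about 2.718) times totalNumDocs, which cannot happen while docFreq stays at or below totalNumDocs.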


Does DefaultSolrHighlighter.getQueryScorer() use
QueryScorer(Query,IndexReader,String)
by design? If not, I'm happy to open a ticket.


Indeed it is by design: this is how requireFieldMatch is implemented,
as the Lucene highlighter will require the field to match as well as
the term.  A consequence of this is that the idfs are also folded into
the score, which is triggering the bug you are seeing.

Can we use QueryScorer(Query,String) instead of
QueryScorer(Query,IndexReader,String) to implement
hl.requireFieldMatch=true? I've opened SOLR-517 to follow up on this problem.

Thank you,

Koji





Re: solr.search.function

2008-03-29 Thread Chris Hostetter

: SELECT MID, AVG(Rating) as Average FROM mpr
: WHERE PID in (p1[,p2,...])
: GROUP BY MID
: ORDER BY Average DESC LIMIT 0, 10;
: 
: I would also need to boost the values based on PIDs (some products have
: more weight than others, effectively computing a weighted average)

: To handle these queries I am planning to develop a custom request handler
: plugin in the most generic form, so that it is useful in general.

ok .. but i'm not really sure what you're asking at this point ... as i 
said: the FunctionQuery code isn't really going to help you here .. the 
Faceting Code is more akin to what you are asking about.

alternately: just because your database is structured around one record 
for each (MID, PID, Rating) triple doesn't mean your *documents* need to 
be structured that way ... instead you can have one document per product 
and precompute the average before indexing them (that's the theory behind 
building an index, you precompute/denormalize/invert information for 
faster lookup later)
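
as a rough sketch of that precompute step (not a Solr API): the code below collapses (MID, PID, Rating) rows into one weighted average per MID before anything is indexed. the class, method and field names are hypothetical, and the per-PID weight map with a default weight of 1.0 is an assumption based on the weighted average mentioned in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Solr: precomputes one weighted average
// per MID so each MID can be indexed as a single document.
final class PrecomputeSketch {

    /** One database row: a (MID, PID, Rating) triple. */
    static final class RatingRow {
        final String mid;
        final String pid;
        final double rating;

        RatingRow(String mid, String pid, double rating) {
            this.mid = mid;
            this.pid = pid;
            this.rating = rating;
        }
    }

    /**
     * Returns MID -> weighted average rating.
     * Assumption: PIDs missing from pidWeights get a weight of 1.0.
     */
    static Map<String, Double> weightedAverages(List<RatingRow> rows,
                                                Map<String, Double> pidWeights) {
        // mid -> {weighted rating sum, total weight}
        Map<String, double[]> sums = new LinkedHashMap<>();
        for (RatingRow row : rows) {
            double weight = pidWeights.getOrDefault(row.pid, 1.0);
            double[] acc = sums.computeIfAbsent(row.mid, k -> new double[2]);
            acc[0] += weight * row.rating;
            acc[1] += weight;
        }
        Map<String, Double> averages = new LinkedHashMap<>();
        for (Map.Entry<String, double[]> entry : sums.entrySet()) {
            double[] acc = entry.getValue();
            if (acc[1] > 0) { // skip MIDs whose weights sum to zero
                averages.put(entry.getKey(), acc[0] / acc[1]);
            }
        }
        return averages;
    }

    public static void main(String[] args) {
        List<RatingRow> rows = new ArrayList<>();
        rows.add(new RatingRow("m1", "p1", 4.0));
        rows.add(new RatingRow("m1", "p2", 2.0));
        rows.add(new RatingRow("m2", "p1", 5.0));

        Map<String, Double> pidWeights = new HashMap<>();
        pidWeights.put("p1", 3.0); // p1 counts three times as much as p2

        // Each entry would become one field value on the document for that MID.
        System.out.println(weightedAverages(rows, pidWeights));
    }
}
```

the trade-off is that weights are baked in at index time, so changing them means recomputing and reindexing the affected documents.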



-Hoss



Re: Term frequency

2008-03-29 Thread Chris Hostetter

: is there a way to get the term frequency per found result back from Solr?

this info is in the "explain" section of the debugQuery output, see this 
recent post about a similar question...

http://www.nabble.com/Highlight---get-terms-used-by-lucene-to16276184.html#a16323025


-Hoss