Weighting the Lucene score

2008-08-26 Thread s d
I want to compute a weighted average of the Lucene score and an additional
score I have, i.e. (W1 * Lucene score + W2 * Other score) / (W1 + W2).
What is the easiest way to do this?
Also, is the Lucene score normalized?
Thanks,
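
The weighted average asked about above can be sketched as follows. This is a minimal client-side sketch, not a Solr or Lucene API: the function name combine_scores and the sample weights are hypothetical, and because raw Lucene scores are generally not confined to a fixed range such as 0 to 1, it assumes both scores have already been brought onto comparable scales.

```python
def combine_scores(lucene_score, other_score, w1, w2):
    """Weighted average: (W1 * Lucene score + W2 * other score) / (W1 + W2)."""
    total = w1 + w2
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return (w1 * lucene_score + w2 * other_score) / total


# Hypothetical example: weight the Lucene score three times as heavily.
combined = combine_scores(lucene_score=0.8, other_score=0.4, w1=3, w2=1)
```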


Re: Weighting the Lucene score

2008-08-26 Thread s d
But function query doesn't give access to the SOLR score, only to fields in
the index, no ?
thx

On Tue, Aug 26, 2008 at 2:02 PM, Otis Gospodnetic <
[EMAIL PROTECTED]> wrote:

> I think the easiest approach might be making use of Lucene's function
> query.
>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message -----
> > From: s d <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Tuesday, August 26, 2008 1:55:38 PM
> > Subject: Weighting the Lucene score
> >
> > I want to weighted average the Lucene score with an additional score i
> have,
> > i.e. (W1 * Lucene score + W2 * Other score) / (W1 + W2) .
> > What is the easiest way to do this?
> > Also, is the Lucene score normalized.
> > Thanks,
>
>


Partitioning the index

2008-12-17 Thread s d
Hi, is there a recommended index size (on disk, number of documents) for when
to start partitioning it to ensure good response time?
Thanks,
S


display tokens

2007-12-07 Thread s d
How can I retrieve the "analyzed tokens" (e.g. the stemmed values) of a
specific field?


Is there a way to retrieve the "analyzed tokens" (e.g. the stemmed values) of a field from the SOLR index ?

2007-12-09 Thread s d
Is there a way to retrieve the "analyzed tokens" (e.g. the stemmed
values) of a field from the SOLR index ?
Almost like using SOLR as a utility for generating the tokens.
Thanks !


Lucene And SOLR

2007-12-19 Thread s d
Is there a way to import a Lucene index (as is) into SOLR? Basically, I'm
looking to enjoy the "web context" and caching provided by SOLR but keep the
index under my control in Lucene.


RAMDirectory

2007-12-27 Thread s d
Is there a way to use RAMDirectory with SOLR? If you can point me to
documentation that would be great.
Thanks,
S


Query Syntax (Standard handler) Question

2008-01-04 Thread s d
Is there a simpler way to write this query (I'm using the standard handler)?
field1:t1 field1:t2 field1:"t1 t2" field2:t1 field2:t2 field2:"t1 t2"
Thanks,


Re: Query Syntax (Standard handler) Question

2008-01-04 Thread s d
But I want to sum the scores and not use max. Can I still do it with
DisMax? Am I missing anything?

On Jan 4, 2008 2:32 AM, Erik Hatcher <[EMAIL PROTECTED]> wrote:

>
> On Jan 4, 2008, at 4:40 AM, s d wrote:
> > Is there a simpler way to write this query (I'm using the standard
> > handler)
> > ?
> > field1:t1 field1:t2 field1:"t1 t2" field2:t1 field2:t2 field2:"t1 t2"
>
> Looks like you'd be better off using the DisMax handler for 
> (without the brackets).
>
>Erik
>
>


queryResultCache

2008-01-05 Thread s d
What is the best approach to tune queryResultCache? For example, the default
size is size="512", but since a document id is just an int (it is an int,
right?), i.e. 4 bytes, why not set size to 10,000,000, for example (it's only
~38 MB)?
I sense there is something that I'm missing here :). Any help would be
appreciated.
Thanks,


Boosting a Field (Standard Handler)

2008-01-05 Thread s d
How do I boost a field (not a term) using the standard handler syntax? I
know I can do that with DisMax, but I'm trying to stay with the
standard one. Can this be done?
Thanks,


Re: queryResultCache

2008-01-06 Thread s d
Thanks. A factor of 20 or even 30 from my numbers still gives a much larger
number than the default one, and I was wondering: is there any disadvantage in
having a big number/cache? BTW, where is the TTL controlled?

On Jan 6, 2008 7:23 AM, Yonik Seeley <[EMAIL PROTECTED]> wrote:

> On Jan 6, 2008 12:59 AM, s d <[EMAIL PROTECTED]> wrote:
> > What is the best approach to tune queryResultCache ?For example  the
> default
> > size is: size="512" but since a document id is just an int (it is an
> int,
> > right?) ,i.e 4 bytes why not set size to 10,000,000 for example (it's
> only
> > ~38Mb).
>
> This cache size refers to the number of id lists that are stored.
> One query + sort that yields the top 20 results == 1 entry in the cache.
>
> -Yonik
>
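
The difference between the two readings of "size" can be made concrete with a little arithmetic. This is only a back-of-the-envelope sketch: the 4-byte id and the top-20 list length come from the thread, the function names are hypothetical, and the estimate ignores the cache keys (query + sort) and all per-object overhead, so real memory use would be higher.

```python
BYTES_PER_DOC_ID = 4  # assumption from the thread: a document id is a 4-byte int


def id_list_bytes(entries, ids_per_entry):
    """Lower bound on the memory taken by the cached id lists alone."""
    return entries * ids_per_entry * BYTES_PER_DOC_ID


def to_mib(num_bytes):
    """Convert a byte count to mebibytes."""
    return num_bytes / (1024 * 1024)


# Reading "size" as a count of individual ids: 10,000,000 ids.
ids_only = to_mib(id_list_bytes(10_000_000, 1))

# Reading "size" as a count of entries, each holding a top-20 id list.
top20_lists = to_mib(id_list_bytes(10_000_000, 20))

print(f"{ids_only:.1f} MiB vs {top20_lists:.1f} MiB")
```

Under these assumptions the second figure is 20 times the first, which is why the original ~38 MB estimate is too low.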


Re: queryResultCache

2008-01-06 Thread s d
Got it. Smart.
Thx

On 1/6/08, Chris Hostetter <[EMAIL PROTECTED]> wrote:
>
> : number than the default one and i was wondering is there any disadvantage
> in
> : having a big number/ cache?BTW, where is the TTL controlled ?
>
> no disadvantage as long as you've got the RAM ... NOTE: the magic "512"
> number you referred to isn't a "default" -- it's an "example" in the
> "example"
> solrconfig.xml
>
> There is no TTL for Solr caches, as noted in the wiki...
>
> http://wiki.apache.org/solr/SolrCaching
>
> Solr caches are associated with an Index Searcher -- a particular 'view'
> of the index that doesn't change. So as long as that Index Searcher is
> being used, any items in the cache will be valid and available for reuse.
> Caching in Solr is unlike ordinary caches in that Solr cached objects will
> not expire after a certain period of time; rather, cached objects will be
> valid as long as the Index Searcher is valid.
>
>
>
> -Hoss
>
>


How do i normalize diff information (different type of documents) in the index ?

2008-01-07 Thread s d
e.g. if the index has field1 and field2, and documents of type (A) always have
information for field1 AND information for field2, while documents of type (B)
always have information for field1 but NEVER information for field2.
The problem is that the formula will sum field1 and field2, hence skewing in
favour of documents of type (A).
If I combine the 2 fields into 1 field (in an attempt to normalize) I will
obviously skew the statistics.
Please advise,
Thanks,


Re: How do i normalize diff information (different type of documents) in the index ?

2008-01-07 Thread s d
Isn't there a better way to take the information into account but still
normalize? Taking the score of only one of the fields doesn't sound like the
best thing to do (it's basically ignoring part of the information).

On Jan 7, 2008 9:20 PM, Mike Klaas <[EMAIL PROTECTED]> wrote:

>
> On 7-Jan-08, at 9:02 PM, s d wrote:
>
> > e.g. if the index is field1 and field2 and documents of type (A)
> > always have
> > information for field1 AND information for field2 while document of
> > type (B)
> > always have information for field1 but NEVER information for field2.
> > The problem is that the formula will sum field1 and field2 hence
> > skewing in
> > favour of documents of type (A).
> > If i combine the 2 fields into 1 field (in an attempt to normalize)
> > i will
> > obviously skew the statistics.
>
> Try the dismax handler.  Its main goal is to query multiple fields
> while only counting the score of the highest-scoring one (mostly).
>
> -Mike
>


Performance - FunctionQuery

2008-01-08 Thread s d
Adding a FunctionQuery made the query response time slower by ~300ms; adding
a 2nd FunctionQuery added another ~300ms, so overall I got over 0.5 sec for a
response time (slow). Is this expected or am I doing something wrong?
Thx


Min-Score Filter

2008-01-08 Thread s d
Is there a way to filter out all results below a certain score, and is there
any point in doing so? e.g. exclude all results below score Y.
Thanks
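
One way to picture the cutoff is a client-side filter applied after the results come back. This is a minimal sketch, not a Solr feature: it assumes each result has already been parsed into a dict with a numeric "score" key, and the function name and sample values are hypothetical. Since raw scores are generally not comparable from one query to the next, a fixed threshold Y may need tuning per query.

```python
def filter_by_min_score(docs, min_score):
    """Keep only documents whose 'score' is at or above min_score."""
    return [doc for doc in docs if doc["score"] >= min_score]


# Hypothetical parsed results, already ordered by score.
results = [
    {"id": "a", "score": 1.7},
    {"id": "b", "score": 0.9},
    {"id": "c", "score": 0.2},
]
kept = filter_by_min_score(results, min_score=0.5)
```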


Re: How do i normalize diff information (different type of documents) in the index ?

2008-01-08 Thread s d
Got it
(http://wiki.apache.org/solr/DisMaxRequestHandler#head-cfa8058622bce1baaf98607b197dc906a7f09590).
Thx!

On Jan 8, 2008 12:11 AM, Chris Hostetter < [EMAIL PROTECTED]> wrote:

>
> : Isn't there a better way to take the information into account but still
> : normalize? taking the score of only one of the fields doesn't sound like
> the
> : best thing to do (it's basically ignoring part of the information).
>
> note the word "mostly" in Mike's response about dismax ... the "tie" param
>
> lets you decide how much the other fields influence the score.  Try it,
> it works really well ... trust me/us.
>
> For the record: i'm really not sure what your question is ... you say you
> want to normalize for the fact that some docs don't have a value in some
> fields, but you don't want to combine the fields because it will skew the
> statistics ... isn't that "skewing" exactly what you are trying to
> achieve?
>
> don't you need to introduce some "skew" in favor of the docs that don't
> have a value for field2 to compensate for the existing "counter skew"
> they already have?
>
>
>
> -Hoss
>
>
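
The effect of the "tie" param can be sketched numerically. The formula below is the usual disjunction-max combination (best field score plus tie times the sum of the other field scores); the function name and the sample per-field scores are hypothetical, and the sketch leaves out everything else that goes into a real score.

```python
def dismax_score(field_scores, tie=0.0):
    """Best field score plus tie times the sum of the remaining field scores."""
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)


# Hypothetical per-field scores for one document.
scores = [0.6, 0.3]
pure_max = dismax_score(scores, tie=0.0)   # only the best field counts
blended = dismax_score(scores, tie=0.1)    # other fields contribute a little
plain_sum = dismax_score(scores, tie=1.0)  # same as summing all fields
```

With tie=0.0 only the highest-scoring field counts, and with tie=1.0 the result is the plain sum of the field scores, so values in between control how much the other fields influence the score.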


DisMax Syntax

2008-01-08 Thread s d
User Query: x1 x2
Desired query (Lucene): field:x1 x2 field:"x1 x2"~a^b

In the standard handler the only way i saw how to make this work was:
field:x1 field:x2 field:"x1 x2"~a^b

Now that i want to try the DisMax is there a way to implement this without
having duplicate fields? i.e. since the fields and the terms are separated
in the DisMax how do i achieve the same query ?

Thanks


Re: DisMax Syntax

2008-01-08 Thread s d
I may be mistaken, but this is not equivalent to my query. In my query I have
matches for x1 and matches for x2 without slop and/or boosting, and then a
match on "x1 x2" (exact match) with slop (~) a and boost (b), so that
results with an exact match score better.
The total score is the sum of all the above.
Your query seems diff


On Jan 8, 2008 11:56 AM, Chris Hostetter <[EMAIL PROTECTED]> wrote:

>
> : User Query: x1 x2
> : Desired query (Lucene): field:x1 x2 field:"x1 x2"~a^b
> :
> : In the standard handler the only way i saw how to make this work was:
> : field:x1 field:x2 field:"x1 x2"~a^b
> :
> : Now that i want to try the DisMax is there a way to implement this
> without
> : having duplicate fields? i.e. since the fields and the terms are
> separated
> : in the DisMax how do i achieve the same query ?
>
> i'm not sure what you mean by "without duplicate fields" but assuming i
> understand your goal, this seems trivial...
>
>q = x1 x2
>qf = field
>pf = field^b
>ps = a
>
>
> -Hoss
>
>
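
The four parameters in the reply above could be assembled into a request query string roughly like this. It is only a sketch: the helper name is hypothetical, the boost 2 and slop 3 are placeholder values standing in for b and a, and how the DisMax handler itself is selected depends on the Solr setup and is not shown.

```python
from urllib.parse import urlencode


def dismax_params(user_query, field, phrase_boost, phrase_slop):
    """Build the q/qf/pf/ps parameters suggested in the reply above."""
    return {
        "q": user_query,                  # raw user query: x1 x2
        "qf": field,                      # field(s) the terms are matched against
        "pf": f"{field}^{phrase_boost}",  # phrase field with boost b
        "ps": str(phrase_slop),           # phrase slop a
    }


params = dismax_params("x1 x2", "field", phrase_boost=2, phrase_slop=3)
query_string = urlencode(params)
```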


Inconsistent results

2008-02-01 Thread s d
Hi, I use SOLR with the standard handler, and when I send the exact same query
to Solr I get different results every time (i.e. refresh the page with the
query and get different results).
Any ideas?
Thx,


Interleaved results from different sources

2008-04-14 Thread s d
We have an index of documents from different sources and we want to make
sure the results we display are interleaved from the different sources and
not only ranked based on relevancy. Is there a way to do this?
Thanks,
S.
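
One possible reading of "interleaved" is a round-robin over the sources, done on the client after the ranked results come back. This is a hypothetical post-processing sketch, not a Solr feature: it assumes each result is a dict carrying a "source" key, and it keeps the relevancy order within each source.

```python
from itertools import zip_longest


def interleave_by_source(ranked_docs, source_key="source"):
    """Round-robin across sources, keeping relevancy order within each source."""
    by_source = {}
    for doc in ranked_docs:
        by_source.setdefault(doc[source_key], []).append(doc)

    missing = object()  # sentinel for sources that have run out of results
    interleaved = []
    for group in zip_longest(*by_source.values(), fillvalue=missing):
        interleaved.extend(doc for doc in group if doc is not missing)
    return interleaved


# Hypothetical ranked results: source "a" dominates the top of the list.
ranked = [
    {"id": 1, "source": "a"},
    {"id": 2, "source": "a"},
    {"id": 3, "source": "b"},
    {"id": 4, "source": "a"},
]
mixed = interleave_by_source(ranked)
```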


result limit / diversity with an OR query

2008-05-12 Thread s d
Hi, I have a query similar to: x OR y OR z, and I want to know if there is a
way to make sure I get 1 result with x, 1 result with y and 1 with z?
Alternatively, is it possible to achieve this through facets?
Thanks,
S.


Does SOLR support RAMDirectory ?

2008-06-01 Thread s d
Can I use RAMDirectory in SOLR?
Thanks,
S