Need "OR" in DisMax Query

2009-10-05 Thread David Giffin
Hi There,

Maybe I'm missing something, but I can't seem to get the dismax
request handler to perform an OR query. It appears that OR is removed
by the stop words. I'd like to do something like
"qt=dismax&q=red+OR+green" and get all green and all red results.

Thanks,
David


Re: Need "OR" in DisMax Query

2009-10-05 Thread David Giffin
So, I removed the stop word OR from the stopwords file and got the
same result. Using the standard query handler syntax like this
"fq=((tags:red)+OR+(tags:green))" I get 421,000 results. Using dismax
"q=red+OR+green" I get 29,000 results. The debug output from
parsedquery_toString shows this:

+(((tags:red)~0.01 (tags:green)~0.01)~2)

It feels like the dismax handler is not handling the "OR" properly. I
also tried "q=red+|+green" and got the same 29,000 results.
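A note for the archives: the trailing ~2 in that parsed query is dismax's mm ("minimum should match"), which here requires both clauses to match. DisMax has no boolean operators at all (a literal OR is analyzed as a plain term), so the usual workaround is to relax mm rather than send OR. A sketch of building such a request (handler name, field list and parameter values are illustrative, not from the original config):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DismaxOrSketch {
    // dismax has no boolean operators; mm=1 ("minimum should match")
    // makes any single clause sufficient, i.e. OR-like behaviour.
    static String buildRequest(String terms) {
        return "select?qt=dismax&q="
                + URLEncoder.encode(terms, StandardCharsets.UTF_8)
                + "&mm=1";
    }

    public static void main(String[] args) {
        System.out.println(buildRequest("red green"));
    }
}
```

With mm=1 the parsed query should show ~1 instead of ~2, matching documents that contain either term.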

Thanks,
David

On Mon, Oct 5, 2009 at 3:02 PM, Christian Zambrano  wrote:
> David,
>
> If your schema includes fields with analyzers that use the StopFilterFactory
> and the dismax QueryHandler is set-up to search within those fields, then
> you are correct.
>
>
> On 10/05/2009 01:36 PM, David Giffin wrote:
>>
>> Hi There,
>>
>> Maybe I'm missing something, but I can't seem to get the dismax
>> request handler to perform an OR query. It appears that OR is removed
>> by the stop words. I'd like to do something like
>> "qt=dismax&q=red+OR+green" and get all green and all red results.
>>
>> Thanks,
>> David
>>
>
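Christian's scenario in schema.xml terms: any field searched by dismax whose analyzer chain looks like the sketch below (a typical text type, not necessarily David's actual schema) will drop a lowercased "or" before it ever reaches the query:

```xml
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "or" is in the default stopwords.txt, so the query term OR
         (once lowercased) is removed here -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>
```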


facet.query and fq

2009-10-27 Thread David Giffin
Hi There,

Is there a way to get facet.query= to ignore the fq= param? We want to
do a query like this:

select?fl=*&start=0&q=cool&fq=in_stock:true&facet=true&facet.query=in_stock:false&qt=dismax

We want to see the count of items that are not in stock even when the
user has filtered to in-stock items. Or is there a way to combine two
queries into one?

Thanks,
David


Re: facet.query and fq

2009-10-27 Thread David Giffin
Thanks, that was just what I was looking for!
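For the archives, the tag/exclude local params from that wiki page, applied to the original query, look roughly like this (line breaks added for readability; "stock" is just a label):

```
select?fl=*&start=0&q=cool&qt=dismax
  &fq={!tag=stock}in_stock:true
  &facet=true
  &facet.query={!ex=stock}in_stock:false
```

The fq is tagged "stock", and the facet.query excludes that tag, so the facet count ignores the in-stock filter while the result set still honors it.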

On Tue, Oct 27, 2009 at 1:27 PM, Jérôme Etévé  wrote:
> Hi,
>
>  you need to 'tag' your filter and then exclude it from the faceting.
>
>  An example here:
> http://wiki.apache.org/solr/SimpleFacetParameters#Tagging_and_excluding_Filters
>
> J.
>
> 2009/10/27 David Giffin :
>> Hi There,
>>
>> Is there a way to get facet.query= to ignore the fq= param? We want to
>> do a query like this:
>>
>> select?fl=*&start=0&q=cool&fq=in_stock:true&facet=true&facet.query=in_stock:false&qt=dismax
>>
>> To understand the count of items not in stock, when someone has
>> filtered items that are in stock. Or is there a way to combine two
>> queries into one?
>>
>> Thanks,
>> David
>>
>
>
>
> --
> Jerome Eteve.
> http://www.eteve.net
> jer...@eteve.net
>


Solr Replication Performance

2009-01-08 Thread David Giffin
Hi There,

I have been building a Solr environment that indexes roughly 3 million
products. The current index is roughly 9 gigs in size. We have bumped
into some performance issues with Solr's replication: during the
snapshot installation on a Solr slave, query times increase and in
some cases time out.

Here are some of the details: every 3 minutes approximately 2,000
updates are committed to the master Solr index and a snapshot is
taken. There are 4 Solr slaves (2-way quad cores / 32 gigs of RAM /
15k SCSI) which poll every minute for a new snapshot and install it.
During the snapshot install on the slaves I'm seeing two things:
1. a disk I/O hit, and 2. a CPU load spike on the Java/Jetty/Solr
process. I know the I/O is related to the transfer of the snapshot to
the local box. I believe the CPU load is related to cache warming,
which takes roughly 10-30 seconds to complete. Currently for cache
warming I have the following settings:




<!-- element names inferred from a stock solrconfig.xml; the archive
     stripped the XML tags and attributes, leaving only these values -->
<maxWarmingSearchers>2</maxWarmingSearchers>

<queryResultWindowSize>50</queryResultWindowSize>
<queryResultMaxDocsCached>200</queryResultMaxDocsCached>
<maxBooleanClauses>1024</maxBooleanClauses>

<enableLazyFieldLoading>true</enableLazyFieldLoading>
<useColdSearcher>false</useColdSearcher>

I have thought about turning off the cache warming completely and
looking at the search performance. I would love to hear any ideas or
experiences that people have had in tuning Solr Replication.
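For anyone trying the "turn off cache warming" experiment: the knob is autowarmCount on each cache in solrconfig.xml. A sketch (class names are the stock ones; sizes are placeholders, not David's values):

```xml
<!-- autowarmCount="0" stops entries being copied from the old cache
     into the new one, so the new searcher registers faster but starts
     cold and the first queries after a commit are slower -->
<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
```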

Thanks,
David


New Searcher / Commit / Cache Warming Time

2009-01-15 Thread David Giffin
Hi All,

I have been trying to reduce the cpu load and the time it takes to
put a new snapshot in place on our slave servers. I have tried
tweaking many of the system memory, JVM and cache size settings used
by Solr. When running a commit from the command line I'm seeing
roughly 16 seconds before the commit completes. This is a ~7 gig
index with no pending changes, nothing else running, no load:

INFO: {commit=} 0 15771
Jan 15, 2009 11:29:35 PM org.apache.solr.core.SolrCore execute
INFO: [listings] webapp=/solr path=/update params={} status=0 QTime=15771

So I started disabling things. I disabled everything under the query
section of solrconfig.xml, and commit times went down:

INFO: {commit=} 0 103
Jan 15, 2009 11:35:22 PM org.apache.solr.core.SolrCore execute
INFO: [listings] webapp=/solr path=/update params={} status=0 QTime=103

So I started adding things back in, and found that adding the
<listener event="newSearcher"> section was causing the slowdown. When
I comment that section out, commit times go down and the cpu spikes
go away. So I tried putting the newSearcher section back in with no
queries to run; same thing, times jump up:

INFO: {commit=} 0 16306
Jan 15, 2009 11:49:32 PM org.apache.solr.core.SolrCore execute
INFO: [listings] webapp=/solr path=/update params={} status=0 QTime=16306

Do you know what would be causing "newSearcher" to create such delays
and cpu spikes? Is there any reason not to disable the "newSearcher"
section?

Thanks,
David


Custom Sorting Based on Relevancy

2009-05-04 Thread David Giffin
Hi There,

I'm working on a sorting issue. Our site currently sorts by creation
date descending, so users list similar products multiple times to
show up at the top of the results. When sorting by score, we want to
move items by the same user with the same title down the search
results. It would be best if the first item stayed in place based on
score, and each additional item were moved down (rows * repeated
user/title).

Is custom sorting the best way, or is there something else I'm not
thinking of? At the moment I'm looking at doing roughly the opposite
of the QueryElevationComponent.

Thanks,
David


Token filter on multivalue field

2009-06-03 Thread David Giffin
Hi There,

I'm working on a unique token filter to eliminate duplicates in a
multivalued field. My filter works properly for a single-value field,
but it seems that a new TokenFilter is created for each value in the
multivalued field. I need to maintain a set of used tokens across all
of the values in the multivalued field. Is there a good way to do
this? Here is my current code:

import java.io.IOException;
import java.util.ArrayList;

import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;

public class UniqueTokenFilter extends TokenFilter {

    // terms already emitted by this stream
    private final ArrayList<String> words;

    public UniqueTokenFilter(TokenStream input) {
        super(input);
        this.words = new ArrayList<String>();
    }

    @Override
    public final Token next(Token in) throws IOException {
        // pass the reusable token on every call, not just the first
        for (Token token = input.next(in); token != null; token = input.next(in)) {
            if (!words.contains(token.term())) {
                words.add(token.term());
                return token;
            }
        }
        return null;
    }
}
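One way to get that cross-value state is to create the "seen" set once per document and hand the same instance to every per-value filter, so Set.add() rejects tokens an earlier value already emitted. A standalone sketch of just that idea, with no Lucene dependencies (the class and the Iterator stand-in for a TokenStream are purely illustrative):

```java
import java.util.Iterator;
import java.util.Set;

// Illustrative stand-in for a TokenFilter: each instance wraps one value's
// token stream, but all instances for a document share one "seen" set.
class SharedDedupFilter implements Iterator<String> {
    private final Iterator<String> input;        // upstream tokens for one value
    private final Set<String> seenAcrossValues;  // shared across all values
    private String pending;

    SharedDedupFilter(Iterator<String> input, Set<String> seenAcrossValues) {
        this.input = input;
        this.seenAcrossValues = seenAcrossValues;
    }

    @Override
    public boolean hasNext() {
        // skip tokens an earlier value (or this one) already produced
        while (pending == null && input.hasNext()) {
            String token = input.next();
            if (seenAcrossValues.add(token)) {   // add() is false for repeats
                pending = token;
            }
        }
        return pending != null;
    }

    @Override
    public String next() {
        String t = pending;
        pending = null;
        return t;
    }
}
```

In real Solr the natural place to create the shared set is whatever builds the filter chain per document (a custom analyzer, or an update processor); the sketch only shows the state-sharing idea.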

Thanks,
David


Re: Token filter on multivalue field

2009-06-06 Thread David Giffin
I'm doing a combination of an update processor and a token filter.
The token filter is necessary to remove the duplicates that remain
after stemming has occurred.

David

2009/6/4 Noble Paul നോബിള്‍  नोब्ळ् :
> isn't better to use an UpdateProcessor  for this?
>
> On Thu, Jun 4, 2009 at 1:52 AM, Otis Gospodnetic
>  wrote:
>>
>> Hello,
>>
>> It's ugly, but the first thing that came to mind was ThreadLocal.
>>
>>  Otis
>> --
>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>>
>>
>>
>> - Original Message 
>>> From: David Giffin 
>>> To: solr-user@lucene.apache.org
>>> Sent: Wednesday, June 3, 2009 1:57:42 PM
>>> Subject: Token filter on multivalue field
>>>
>>> Hi There,
>>>
>>> I'm working on a unique token filter, to eliminate duplicates on a
>>> multivalue field. My filter works properly for a single value field.
>>> It seems that a new TokenFilter is created for each value in the
>>> multivalue field. I need to maintain an array of used tokens across
>>> all of the values in the multivalue field. Is there a good way to do
>>> this? Here is my current code:
>>>
>>> public class UniqueTokenFilter extends TokenFilter {
>>>
>>>     private ArrayList words;
>>>     public UniqueTokenFilter(TokenStream input) {
>>>         super(input);
>>>         this.words = new ArrayList();
>>>     }
>>>
>>>     @Override
>>>     public final Token next(Token in) throws IOException {
>>>         for (Token token=input.next(in); token!=null; token=input.next()) {
>>>             if ( !words.contains(token.term()) ) {
>>>                 words.add(token.term());
>>>                 return token;
>>>             }
>>>         }
>>>         return null;
>>>     }
>>> }
>>>
>>> Thanks,
>>> David
>>
>>
>
>
>
> --
> -
> Noble Paul | Principal Engineer| AOL | http://aol.com
>