Different behavior for q=goo.com vs q=@goo.com in queries?

2010-12-30 Thread mrw

Using Lucid's Solr 1.4 distribution, if I index my email inbox and then
search it by passing in different email expressions, I notice that I get
different results based on whether the '@' character is included, even
though the character is present in every email address in the field I'm
searching.

For example, q=goo.com returns multiple items, as expected.

However, q=@goo.com returns no results.  Since every address containing
"goo.com" also contains "@goo.com," I would expect the same number of
results.

I get this from both the Solr admin console and from my application, which
URL-encodes the query.

I Googled, searched the Wiki, and grepped the Pugh and Lucid books, but
don't see anything about this.  


Ideas?

Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Different-behavior-for-q-goo-com-vs-q-goo-com-in-queries-tp2168935p2168935.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Different behavior for q=goo.com vs q=@goo.com in queries?

2010-12-30 Thread mrw


Basically, just what you've suggested.  I did the field/query analysis piece
with verbose output.  Not entirely sure how to interpret the results, of
course.  Currently reading anything I can find on that.


Thanks


Erick Erickson wrote:
> 
> What steps have you taken to figure out whether the
> contents of your index are what you think? I suspect
> that the fields you're indexing aren't being
> analyzed/tokenized quite the way you expect either at
> query time or index time (or maybe both!).
> 
> Take a look at the admin/analysis page for the field you're indexing
> the data into. If that doesn't shed any light on the problem,
> please paste in the  definition for the field in question,
> maybe another set of eyes can see the issue.
> 
> Best
> Erick
> 
> 
> 
> 
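To make Erick's suggestion concrete, here is a toy sketch of how an index-time/query-time analysis mismatch could produce this symptom. The two analyzer functions are invented for illustration and are not Solr's actual analyzers; the real cause has to be confirmed on the admin analysis page.

```python
import re

def index_tokens(text):
    # Hypothetical index-time analyzer: lowercase, then split on anything
    # that is not a letter, digit, or dot (so '@' acts as a separator).
    return [t for t in re.split(r"[^a-z0-9.]+", text.lower()) if t]

def query_tokens(text):
    # Hypothetical query-time analyzer: lowercase, split on whitespace only,
    # so a leading '@' stays attached to the token.
    return text.lower().split()

def matches(query, field_value):
    # A document matches if every query token was produced at index time.
    indexed = set(index_tokens(field_value))
    return all(tok in indexed for tok in query_tokens(query))

address = "someone@goo.com"
print(index_tokens(address))          # tokens stored in the toy index
print(matches("goo.com", address))    # query without '@'
print(matches("@goo.com", address))   # query with '@'
```

In this toy model the token "@goo.com" never exists in the index, so the second query cannot match even though the raw text contains it.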



Adding fq to query with mincount=0 causes unexpected 0-count facet values to be returned?

2011-01-11 Thread mrw

I've noticed that performing a query with facet.mincount=0 and no fq clauses
results in a response where only facets with non-zero counts are returned,
but adding in an fq clause (caused by a user selecting a non-zero-valued
facet value checkbox) actually causes a bunch of 0-count facet values
completely unrelated to the query to be returned.

Is adding the fq constraint actually widening the query before
facet.mincount gets applied?  

E.g., say a query with no fq constraint produces the following facet values:

ID
1234 (1)
 (15)
1010 (30)

Color
Red (11)
Green (15)
Blue (32)

but when the user selects Blue (32), and I add &fq=Color:Blue, Solr returns
the following:

ID
1 (0)
2 (0)
3 (0)
...
99 (0)
100 (0)

Color
Orange (0)
Teal (0)
Red (0)
Green (0)
Blue (32)


Notice how, before the fq clause is added, none of the 0-count facets are
returned, even though facet.mincount = 0, but afterward, a bunch of 0-count
facets are returned?


The context of my question is trying to solve a problem where the
application must display facet values with a count of zero as filtering
operations remove them from the result set.  That is, if Red (10) was
displayed after the initial query, but the user filters on Blue (32), then
we must still display Red (0) so the user can select it and widen the query.  
Initially, we were using mincount=1 and managing the missing facets entirely
within the application, but now I'm trying to see if we can use mincount=0
and maybe some other constraints to achieve the same behavior without a lot
of custom code in the application.

Thanks


Re: Adding fq to query with mincount=0 causes unexpected 0-count facet values to be returned?

2011-01-11 Thread mrw


>> Notice how, before the fq clause is added, none of the
>> 0-count facets are
>> returned, even though facet.mincount = 0, but afterward, a
>> bunch of 0-count
>> facets are returned?
>>
> This is normal.

What's behind that?  Is it widening the results before the mincount
constraint is being applied?


> I couldn't fully follow, but you want something like multi-select
> faceting?
> 
> http://search-lucene.com/ is an example for that, user can select solr and
> lucene from the project facet at the same time.
> 
> http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams


No.  Search-Lucene actually appears to remove facets when they're not
returned.  If you select Blue, and Red is eliminated, Red won't show up as a
facet anymore.  Therefore, the user can't select Red to add it back into the
result set.

Multi-selection keeps eliminated facets, but gives them virtual counts
related to the entire result set.  If you select Blue(32) and Red(10) is
eliminated, multi-selection causes Red(10) to be displayed.  Therefore, the
user can't tell that Red was eliminated, and the Red facet no longer has any
connection to the values in the result set.
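
For reference, the multi-select approach on the linked wiki page tags a filter and excludes it when counting one facet field. A minimal sketch of those parameters follows; the helper name and the tag name are assumptions made for illustration.

```python
from urllib.parse import urlencode

def multi_select_params(user_query, field, selected_value):
    # Hypothetical helper: the tag name is simply the lowercased field name.
    tag = field.lower()
    return [
        ("q", user_query),
        ("facet", "true"),
        # Tag the filter so it can be referred to by name...
        ("fq", "{!tag=%s}%s:%s" % (tag, field, selected_value)),
        # ...and exclude it when computing counts for this facet field.
        ("facet.field", "{!ex=%s}%s" % (tag, field)),
    ]

params = multi_select_params("*:*", "Color", "Blue")
print(urlencode(params))
```

Because the Color filter is excluded while counting Color, Red keeps its pre-filter count, which is the Red(10) behavior described above rather than the Red (0) we need.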

What we need to do is show eliminated facets with a 0 count, so if you
select Blue (32) and Red (10) is eliminated, we show Red (0).  That
indicates that there are zero documents in the result set for Red, but Red
can still be selected to add Red documents back into the result set.


Thanks





Re: Adding fq to query with mincount=0 causes unexpected 0-count facet values to be returned?

2011-01-11 Thread mrw


iorixxx wrote:
> 
> 
> After re-reading, it is not normal that none of the 0-count facets are
> showing up. Can you give us full parameter list that you obtain this
> by adding &echoParams=all to your search URL?
> 
> May be you limit facets to three in your first query? What happens when
> you add &facet.limit=-1?
> 
> 

We're actually using the default facet.limit value of 100.  I will increase
it to 200 and see if the non-zero-count facets show up.  Maybe that was
causing my confusion.
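
A small sketch of the request iorixxx is asking for, with the facet limit lifted and the effective parameters echoed back. The base URL and the helper name are assumptions; the parameter names are the ones mentioned in this thread.

```python
from urllib.parse import urlencode

def facet_debug_url(base_url, query, facet_fields, filters=()):
    # facet.limit=-1 removes the per-field cap on returned facet values;
    # echoParams=all makes Solr report every parameter it actually used.
    params = [
        ("q", query),
        ("facet", "true"),
        ("facet.mincount", "0"),
        ("facet.limit", "-1"),
        ("echoParams", "all"),
    ]
    params += [("facet.field", field) for field in facet_fields]
    params += [("fq", fq) for fq in filters]
    return base_url + "?" + urlencode(params)

# Hypothetical local Solr instance.
url = facet_debug_url("http://localhost:8983/solr/select", "*:*",
                      ["ID", "Color"], filters=["Color:Blue"])
print(url)
```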


Re: Adding fq to query with mincount=0 causes unexpected 0-count facet values to be returned?

2011-01-11 Thread mrw


mrw wrote:
> 
> 
> We're actually using the default facet.limit value of 100.  I will
> increase it to 200 and see if the non-zero-count facets show up.  Maybe
> that was causing my confusion.
> 

Yep -- the 0-count facets were not being returned due to the facet.limit
cutoff.

So, unless there is another parameter that can be used with facet.mincount=0
in order to tune the results, it looks like I will need to use
facet.mincount=1 and handle the processing of omitted facets in the
application.
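
A minimal sketch of that application-side handling, assuming the facet values from the initial unfiltered query are kept around. The function name and the sample counts are illustrative only.

```python
def with_zero_counts(initial_counts, filtered_counts):
    """Keep every value seen in the unfiltered query; report 0 for any
    value that the filtered response omits."""
    return {value: filtered_counts.get(value, 0) for value in initial_counts}

initial = {"Red": 10, "Green": 15, "Blue": 32}   # first query, no fq
filtered = {"Blue": 32}                          # after fq=Color:Blue, mincount=1
print(with_zero_counts(initial, filtered))
```

Only values the user has already seen are shown with a zero count, so unrelated 0-count values never reach the UI.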

Thanks for the help.


Changing value of start parameter affects numFound?

2011-02-09 Thread mrw

I have a data set indexed over two machines, with M docs per Solr core for a
total of N cores.

If I perform a query across all N cores with start=0 and rows=30, I get,
say, numFound=27521.  If I simply change the start param to start=27510
(simulating being on the last page of data), I get a smaller result set
(say, numFound=21415).  

I had expected numFound to be the same in either case, since no other aspect
of the query had changed.  Am I mistaken?

I'm using Solr 1.4.1.955763M.  Faceting is enabled on the query. All cores
have the same schema.

Thanks!


Re: Changing value of start parameter affects numFound?

2011-02-09 Thread mrw


More detail:  numFound seems to vary unpredictably based on start value.


start    numFound
-----    --------
0-46     27521
47-59    27520
60       27519
61-91    27518
62       27517


Any ideas?
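
One explanation sometimes given for a numFound that shrinks as start grows in a distributed query is that some documents share a unique key across cores, and duplicates are only discovered within the window of rows that gets merged. Nothing in this thread confirms that is what happens here. The sketch below is a toy model of that idea, not Solr's merge code.

```python
def merged_num_found(shards, start, rows):
    # Toy coordinator: each shard reports its own hit count, and only the
    # top (start + rows) ids from each shard are merged.
    window = start + rows
    num_found = sum(len(ids) for ids in shards)
    seen = set()
    for ids in shards:
        for doc_id in ids[:window]:
            if doc_id in seen:
                # A duplicate key found inside the window lowers the total.
                num_found -= 1
            else:
                seen.add(doc_id)
    return num_found

# Hypothetical shards in which ids "b" and "d" exist on both.
shards = [["a", "b", "c", "d"], ["x", "b", "y", "d"]]
for start in (0, 1, 3):
    print(start, merged_num_found(shards, start, rows=1))
```

In this model a deeper start merges a larger window, finds more duplicates, and reports a smaller total, which resembles the stepwise decline in the table above.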


GET or POST for large queries?

2011-02-17 Thread mrw

We are running into some issues with large queries.  Initially, they were
ostensibly header buffer overruns, because increasing Jetty's
headerBufferSize value to 65536 resolved them. This seems like a kludge, but
it does solve the problem for 95% of our users.

However, we do have queries that are physically larger than that and for
which increasing the headerBufferSize to 65536 does not work.  This is due
to security requirements:  Security descriptors are baked into the index,
and then potentially thousands of them (depending on the user context) are
passed in with each query.  These oversized queries are only a problem for
the approximately 5% of users who are highly entitled, but the number of
security descriptors is likely to increase, and we won't have a
workaround for this security policy any time soon.

After a lot of Googling, it seems to me that it's common to increase the
headerBufferSize, but I don't see any other strategies.  Is it
possible/feasible to switch to use POST for querying?
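
For illustration, a sketch of what sending the same parameters as a form-encoded POST body could look like from a client. The URL, the field name, and the helper are assumptions; the request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_post_search(select_url, params):
    # Put the query parameters in the request body instead of the URL, so
    # they are not subject to the container's header buffer size.
    body = urlencode(params).encode("utf-8")
    return Request(
        select_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

# Hypothetical endpoint and security filter.
req = build_post_search(
    "http://localhost:8983/solr/select",
    [("q", "*:*"), ("fq", "labels:(A OR B)")],
)
print(req.get_method(), len(req.data), "bytes in body")
```

Passing the request to urllib.request.urlopen would actually send it.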

Thanks!


Re: GET or POST for large queries?

2011-02-17 Thread mrw

Yeah, I tried switching to POST.

It seems to be handling the size, but apparently Solr has a limit on the
number of boolean comparisons -- I'm now getting "too many boolean clauses"
errors emanating from

org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:108).
 
:)


Thanks for responding.



Erik Hatcher-4 wrote:
> 
> Yes, you may use POST to make search requests to Solr.
> 
>   Erik
> 
> 



Re: GET or POST for large queries?

2011-02-18 Thread mrw

Thanks for the response.

Yes, the queries are fairly large.  Basically, the corporate security policy
dictates that we use row-level security attributes from the DB for access
control to Solr.   So,  we bake row-level security attributes from the
database into the index, and then, at query time, ask for those same
attributes from the DB and pass them as part of the Solr query.  So, imagine
a bank VP with access to tens of thousands of customer records and
transactions, and all those access attributes get sent to Solr.  The system
works well for the low-level account managers and low-entitlement users, but
cannot scale for the high-level folks.

POSTing the data appears to avoid the header threshold issue, but it breaks
because of the "too many boolean clauses" error.




gearond wrote:
> 
> Probably you could do it, and solving a problem in business supersedes 
> 'rightness' concerns, much to the dismay of geeks and 'those who like
> rightness 
> and say the word "Neemph!" '. 
> 
> 
> the not rightness about this is that:
> POST, PUT, DELETE are assumed to make changes to the URL's backend.
> GET is assumed NOT to make changes.
> 
> So if your POST does not make a change . . . it breaks convention. But if
> it 
> solves the problem . . . :-)
> 
> Another way would be to GET with a 'query file' location, and then have
> the 
> server fetch that query and execute it.
> 
> Boy!!! I'd love to see one of your queries!!! You must have a few ANDs/ORs
> in 
> them :-)
> 
>  Dennis Gearon
> 



Re: GET or POST for large queries?

2011-02-18 Thread mrw

Thanks for the response and info.

I'll try that.  


Jonathan Rochkind wrote:
> 
> Yes, I think it's 1024 by default.  I think you can raise it in your 
> config. But your performance may suffer.
> 
> Best would be to try and find a better way to do what you want without 
> using thousands of clauses. This might require some custom Java plugins 
> to Solr though.
> 
> 
> 
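
A client-side guard along these lines can catch the problem before the request is sent. This is a sketch: the 1024 figure is the default Jonathan recalls above, the function name is made up, and the real limit is whatever maxBooleanClauses is set to in solrconfig.xml.

```python
DEFAULT_MAX_BOOLEAN_CLAUSES = 1024  # assumed default; check solrconfig.xml

def label_filter(field, labels, max_clauses=DEFAULT_MAX_BOOLEAN_CLAUSES):
    # Each label becomes one boolean clause in the parsed query.
    if len(labels) > max_clauses:
        raise ValueError(
            "%d clauses exceeds the limit of %d" % (len(labels), max_clauses)
        )
    return "%s:(%s)" % (field, " OR ".join(labels))

print(label_filter("labels", ["a", "b", "c"]))
```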



Re: GET or POST for large queries?

2011-02-18 Thread mrw

Thanks for the tip.  No, I did not know about that.  Unfortunately, we use
Oracle OLS which does not appear to be supported.


Jan Høydahl / Cominvent wrote:
> 
> Hi,
> 
> There are better ways to combat row level security in search than sending
> huge lists of users over the wire.
> 
> Have you checked out the ManifoldCF project with which you can integrate
> security to Solr? http://incubator.apache.org/connectors/
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> 
> 



Understanding multi-field queries with q and fq

2011-02-18 Thread mrw


After searching this list, Google, and looking through the Pugh book, I am a
little confused about the right way to structure a query.

The Packt book uses the example of the MusicBrainz DB full of song metadata. 
What if they also had the song lyrics in English and German as files on
disk, and wanted to index them along with the metadata, so that each
document would basically have song title, artist, publisher, date, ...,
All_Metadata (copy field of all metadata fields), Text_English, and
Text_German fields?  

There can only be one default field, correct?  So if we want to search for
all songs containing (zeppelin AND (dog OR merle)) do we 

repeat the entire query text for all three major fields in the 'q' clause
(assuming we don't want to use the cache):

q=(+All_Metadata:zeppelin AND (dog OR merle)+Text_English:zeppelin AND (dog
OR merle)+Text_German:(zeppelin AND (dog OR merle))

or repeat the entire query text for all three major fields in the 'fq'
clause (assuming we want to use the cache):

q=*:*&fq=(+All_Metadata:zeppelin AND (dog OR merle)+Text_English:zeppelin
AND (dog OR merle)+Text_German:zeppelin AND (dog OR merle))

?
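
One way to avoid hand-writing the repeated expression is to generate it, wrapping the whole expression in parentheses after each field name so that it applies to that field only. This sketch assumes the intent is "match in any one of the fields"; the helper name is illustrative.

```python
def across_fields(expression, fields):
    # field:(...) applies the entire parenthesized expression to that field.
    return " OR ".join("%s:(%s)" % (field, expression) for field in fields)

q = across_fields(
    "zeppelin AND (dog OR merle)",
    ["All_Metadata", "Text_English", "Text_German"],
)
print(q)
```

The resulting string can be sent as q or as fq. The main differences are that fq results are cached in the filter cache and do not contribute to scoring.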

Thanks!




Re: Understanding multi-field queries with q and fq

2011-02-28 Thread mrw
Hi, Otis.

I have been playing with dismax (defType=dismax, not qt=dismax -- not sure
about the difference). It looks like eDismax won't be available until Solr
3.1, correct?

We actually have to pass hundreds of Oracle OLS labels in each request for
each user (e.g., Loan Officer can see her customers' data, but VP can see
all customer data).   I've been passing them as an fq parameter, but have
recently learned that's bad, since fq parameters participate in caching.  

We obviously *only* want the label comparisons performed against the label
field. (Those values won't be present in the other search-able fields that
the dismax would run all query parameters against.)

Is there some dismax query magic that would allow us to match the labels in
an uncached manner against only the labels field, but match the user-entered
query against the qf fields?   If not, I think we're stuck with moving the
labels piece to q and the user query to fq and sticking with the standard
handler.
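
For comparison, a sketch of the parameter layout being discussed: the user's text goes through dismax against the qf fields, while the labels stay in an fq that names the labels field explicitly. The helper and the sample field names are assumptions, and this layout does not by itself avoid the filter cache.

```python
from urllib.parse import urlencode

def dismax_with_label_filter(user_text, query_fields, labels):
    return [
        ("defType", "dismax"),
        ("q", user_text),                  # matched against the qf fields
        ("qf", " ".join(query_fields)),
        # The filter names its field, so labels are not run against qf.
        ("fq", "labels:(%s)" % " OR ".join(labels)),
    ]

params = dismax_with_label_filter(
    "zeppelin merle", ["All_Metadata", "Text_English"], ["L1", "L2"]
)
print(urlencode(params))
```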


Thanks!



Otis Gospodnetic-2 wrote:
> 
> Hi mrw,
> 
> It sounds like (e)dismax is what you should look into.  You didn't
> mention 
> it/them, so I'm assuming you're not aware of them.
> 
> See: http://search-lucene.com/?q=dismax+OR+edismax&fc_project=Solr
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
> 
> 
> 
> 
> 




Basic Dismax syntax question

2011-02-28 Thread mrw
Say I have an index with first_name and last_name fields, and also a copy
field for the full name called full_name.  Say I add two employees:
Napoleon Bonaparte and Napoleon Dynamite.

If I search for just the first or last name, or both names, with mm=1, I get
the expected results:

q=Napoleon&defType=dismax&tie=0.1&qf=first_name%20last_name&mm=1  // 2 results
q=Bonaparte&defType=dismax&tie=0.1&qf=first_name%20last_name&mm=1  // 2 results
q=Napoleon%20Bonaparte&defType=dismax&tie=0.1&qf=first_name%20last_name&mm=1  // 2 results


However, if I try to search for both names with mm=2 (which I think means
term1 AND term2), I get 0 results:

q=napoleon%20bonaparte&defType=dismax&tie=0.1&qf=first_name%20last_name&mm=2  // 0 results
q=napoleon%20bonaparte&defType=dismax&tie=0.1&qf=full_name&mm=2  // 0 results

I also see this when I put all fields (including the copy field) into the qf
parameter.
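
To pin down what mm=2 is expected to mean here, a toy model: a document matches when at least mm of the query terms are found in at least one of the qf fields. This is a simplification written for illustration, not the dismax implementation, and it assumes untokenized fields compared as whole values.

```python
def dismax_like_match(query, doc, qf, mm):
    # Count the query terms that match a whole value in any qf field.
    terms = query.lower().split()
    hits = sum(
        1 for term in terms
        if any(term in [v.lower() for v in doc.get(field, [])] for field in qf)
    )
    return hits >= mm

bonaparte = {"first_name": ["Napoleon"], "last_name": ["Bonaparte"]}
dynamite = {"first_name": ["Napoleon"], "last_name": ["Dynamite"]}
fields = ["first_name", "last_name"]

print(dismax_like_match("napoleon bonaparte", bonaparte, fields, mm=2))
print(dismax_like_match("napoleon bonaparte", dynamite, fields, mm=2))
print(dismax_like_match("napoleon bonaparte", dynamite, fields, mm=1))
```

Under this model the mm=2 query should still find Napoleon Bonaparte, so getting 0 results suggests that something other than the mm value itself is involved.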


Thoughts?


Thanks!



Re: Basic Dismax syntax question

2011-02-28 Thread mrw
They're all set to LC.  I was just coming up with a safe example to post.  

It sounds like you don't see an issue with the syntax we're using?

Thanks


tjpoe wrote:
> 
> i noticed that your search terms are using caps vs lower case, are your
> search fields perhaps not set to lowercase the terms and/or the search
> term?
> 
> 
> 




Re: Basic Dismax syntax question

2011-02-28 Thread mrw
Fields are str type.

The issue happens regardless of case.  I just threw in some examples using
names to highlight the issue.  In the actual index, the data in the affected
fields is all LC, and I'm searching in LC.  

Sounds like the syntax looks okay to you? 


Thanks


iorixxx wrote:
> 
> 
> 
> &debugQuery=on will dump useful information. What are the field types of
> first_name, last_name and full_name? 
> 
> What happens when you query first letter uppercased?
> 
> q=Napoleon%20Bonaparte&defType=dismax&tie=0.1&qf=first_name%20last_name&mm=2
> 
> 
> 
>  
> 
> 




Disabling caching for fq param?

2011-02-28 Thread mrw
Based on what I've read here and what I could find on the web, it seems that
each fq clause essentially gets its own results cache.  Is that correct?

We have a corporate policy of passing the user's Oracle OLS labels into the
index in order to be matched against the labels field.  I currently separate
this from the user's query text by sticking it into an fq param...

?q=
&fq=labels:
&qf= 
&tie=0.1
&defType=dismax

...but since its value (a collection of hundreds of label values) only applies
to that user, the accompanying result set won't be reusable by other users.

My understanding is that this query will result in two result sets (q and
fq) being cached separately, with the union of the two sets being returned
to the user.  (Is that correct?)

There are thousands of users, each with a unique combination of labels, so
there seems to be little value in caching the result set created from the fq
labels param.  It would be beneficial if there were some kind of fq
parameter override to indicate to Solr to not cache the results?


Thanks!






Re: Disabling caching for fq param?

2011-03-01 Thread mrw
We use fq params for filtering as well (not shown in previous example), so we
only want to be able to override fq caching on a per-parameter basis (e.g.,
fq={!noCache userLabels} ).

Thanks


Markus Jelsma-2 wrote:
> 
> If filterCache hitratio is low then just disable it in solrconfig by
> deleting 
> the section or setting its values to 0.
> 
> 
> 




RE: Disabling caching for fq param?

2011-03-01 Thread mrw
That clause will always be the same per-user (i.e., you have values 1,2,4 and
I have values 1,2,8) across queries.  In the result set denoted by the
labels param, some users will have tens of thousands of documents and others
will have millions of documents.  

It sounds like you don't see a huge problem with our approach, so maybe
we'll stick with it for the time being.

Thanks!


Jonathan Rochkind wrote:
> 
> As far as I know there is not, it might be beneficial, but also worth
> considering: "thousands" of users isn't _that_ many, and if that same
> clause is always the same per user, then if the same user does a query a
> second time, it wouldn't hurt to have their user-specific fq in the cache. 
> A single fq cache may not take as much RAM as you think, you could
> potentially afford to increase your fq cache size to
> thousands/tens-of-thousands, and win all the way around. 
> 
> The filter cache should be a least-recently-used-out-first cache, so even
> if the filter cache isn't big enough for all of them, fq's that are used
> by more than one user will probably stay in the cache as old user-specific
> fq's end up falling off the back as least-recently-used. 
> 
> So in actual practice, one way or another, it may not be a problem. 
> 
> 
> 
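
Jonathan's point about least-recently-used eviction can be illustrated with a toy cache. This is a sketch of the general LRU idea only, not Solr's filterCache; the class name and the shared filter "type:loan" are invented for the example.

```python
from collections import OrderedDict

class ToyFilterCache:
    def __init__(self, size):
        self.size = size
        self.entries = OrderedDict()

    def lookup(self, fq):
        """Return True on a cache hit; on a miss, store the entry and
        evict the least recently used one if the cache is over size."""
        hit = fq in self.entries
        if hit:
            self.entries.move_to_end(fq)       # mark as most recently used
        else:
            self.entries[fq] = True            # stand-in for a cached doc set
            if len(self.entries) > self.size:
                self.entries.popitem(last=False)
        return hit

cache = ToyFilterCache(size=3)
shared_hits = []
for user in ["u1", "u2", "u3", "u4"]:
    shared_hits.append(cache.lookup("type:loan"))   # filter shared by all users
    cache.lookup("labels:(%s)" % user)              # user-specific filter
print(shared_hits)
print(list(cache.entries))
```

The shared filter keeps getting refreshed and stays cached, while the oldest user-specific filters fall out first.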




dismax query with no/empty/*:* q parameter?

2011-03-02 Thread mrw

For standard query handler fq-only queries, we used q=*:*.  However, with
dismax, that returns 0 results.  Are fq-only queries possible with dismax?  




Thanks!



Re: Understanding multi-field queries with q and fq

2011-03-02 Thread mrw
Anyone understand how to do boolean logic across multiple fields?  

Dismax is nice for searching multiple fields, but doesn't necessarily
support our syntax requirements. eDismax appears to be not available until
Solr 3.1.   

In the meantime, it looks like we need to support applying the user's query
to multiple fields, so if the user enters "led zeppelin merle" we need to be
able to do the logical equivalent of 

&fq=field1:led zeppelin merle OR field2:led zeppelin merle


Any ideas?  :)





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Understanding-multi-field-queries-with-q-and-fq-tp2528866p2619700.html


Re: dismax query with no/empty/*:* q parameter?

2011-03-02 Thread mrw

Ah...so I need to be doing 

&q.alt=*:*
&fq=:.

Of course, now that you've shown me what to look for, I also see the
explanation in the Packt book.  Sheesh.

Thanks for the tip!
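
For anyone assembling such a request in client code, here is a minimal sketch of a filter-only dismax query built with Python's standard library. The helper name and the category:music filter are invented for illustration; only defType, q.alt, and fq come from this thread.

```python
from urllib.parse import urlencode


def dismax_filter_only_params(filters):
    """Return request parameters for a dismax query that carries no user text.

    q is left out entirely, so dismax falls back to q.alt, and q.alt=*:*
    matches every document before the fq filters are applied.
    """
    params = [("defType", "dismax"), ("q.alt", "*:*")]
    params.extend(("fq", f) for f in filters)
    return params


# "category:music" is a made-up filter, purely for illustration.
query_string = urlencode(dismax_filter_only_params(["category:music"]))
print(query_string)
```

The resulting string can be appended to the select URL of whichever core is being queried.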


Chris Hostetter-3 wrote:
> 
> : For standard query handler fq-only queries, we used q=*:*.  However,
> with
> : dismax, that returns 0 results.  Are fq-only queries possible with
> dismax?  
> 
> they are if you use the q.alt param.
> 
> http://wiki.apache.org/solr/DisMaxQParserPlugin#q.alt
> 
> -Hoss
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/dismax-query-with-no-empty-q-parameter-tp2619170p2620158.html


Re: Solr Admin Interface, reworked - Go on? Go away?

2011-03-02 Thread mrw

Looks nice.

It might also be worth creating a page with a large query field for pasting
in complete URL-encoded queries that cross cores, etc.  I did that at work
(via ASP.net) so we could paste in queries from logs and debug them.  We
tend to use that quite a bit.


Cheers


Stefan Matheis wrote:
> 
> Hi List,
> 
> given the fact that my Java knowledge is sort of non-existent .. my 
> idea was to rework the Solr Admin Interface.
> 
> Compared to CouchDB's Futon or the MongoDB Admin-Utils .. not that fancy, 
> but it was an idea a few weeks ago - and I would like to contribute 
> something, a thing which has to be non-Java but not useless - hopefully ;)
> 
> Actually it's completely work-in-progress .. but I'm interested in what 
> you guys think. Right direction? Completely wrong, just drop it?
> 
> http://files.mathe.is/solr-admin/01_dashboard.png
> http://files.mathe.is/solr-admin/02_query.png
> http://files.mathe.is/solr-admin/03_schema.png
> http://files.mathe.is/solr-admin/04_analysis.png
> http://files.mathe.is/solr-admin/05_plugins.png
> 
> It's actually using one index.jsp to generate the basic frame, including 
> cores and their navigation. Everything else is loaded via the existing 
> SolrAdminHandler.
> 
> Any questions, ideas, thoughts out there? Please, let me know :)
> 
> Regards
> Stefan
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Admin-Interface-reworked-Go-on-Go-away-tp2620365p2620745.html


Dismax, q, q.alt, and defaultSearchField?

2011-03-02 Thread mrw
We have two banks of Solr nodes with identical schemas.  The data I'm
searching for is in both banks.

One has defaultSearchField set to field1, the other has defaultSearchField
set to field2.

We need to support both user queries and facet queries that have no user
content.  For the latter, it appears I need to use q.alt=*:*, so I am
investigating also using q.alt for user content (e.g., q.alt=banana).

I run the following query:

q.alt=banana
&defType=dismax
&mm=1
&tie=0.1
&qf=field1+field2


On bank one, I get the expected results, but on bank two, I get 0 results.

I noticed (via debugQuery=true), that when I use q.alt, it resolves using
the defaultSearchField (e.g., field1:banana), not the value of the qf param. 
Therefore, I get different results.

If I switched to using q for user queries and q.alt for facet queries, I
would still get different results, because q would resolve against the
fields in the qf param, and q.alt would resolve against the default search
field.

Is there a way to override this behavior in order to get consistent results?

Thanks!






--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dismax-q-q-alt-and-defaultSearchField-tp2621061p2621061.html


Re: Solr Admin Interface, reworked - Go on? Go away?

2011-03-03 Thread mrw

Picture the URI field above the response field, only half-screen.  This
facilitates breaking the query apart on different lines in order to debug
it.  

When you have a lot of shards, fq clauses, etc., you end up with a very long
URI that is difficult to get your head around and manipulate.  We take
queries from the logs, split them around parameters, take the shards out,
put the shards back in, take the OLS labels out, put them back in, etc. 
With long, complex queries, it's essential to have a large work space to
play in. :)
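
As a rough illustration of that workflow, the sketch below splits a logged query URL into one decoded parameter per line and shows how a parameter such as shards could be taken out before re-running the query. The URL, host names, and helper names are hypothetical; only the idea of splitting around parameters comes from the description above.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def split_query(url):
    """Return the decoded (name, value) pairs of a query URL, in order."""
    return parse_qsl(urlsplit(url).query, keep_blank_values=True)


def without_param(pairs, name):
    """Drop every occurrence of one parameter, e.g. to take the shards out."""
    return [(key, value) for key, value in pairs if key != name]


# Hypothetical log entry; host names and the fq value are invented.
logged = (
    "http://localhost:8983/solr/select?q=&q.alt=*%3A*&defType=dismax"
    "&fq=type%3Asong&shards=host1%3A8983%2Fsolr%2Chost2%3A8983%2Fsolr"
)

pairs = split_query(logged)
for name, value in pairs:
    print(name, "=", value)

# Re-encode the query without the shards parameter for a single-node test.
single_node_query = urlencode(without_param(pairs, "shards"))
```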




Stefan Matheis wrote:
> 
> mrw,
> 
> you mean a field like here 
> (http://files.mathe.is/solr-admin/02_query.png) on the right side, 
> between meta-navigation and plain solr-xml response?
> 
> actually it's just to display the computed url, but if so .. we could 
> use a larger field for that, of course :)
> 
> Regards
> Stefan
> 
> Am 02.03.2011 22:31, schrieb mrw:
>>
>> Looks nice.
>>
>> Might be also worth it to create a page with large query field for
>> pasting
>> in complete URL-encoded queries that cross cores, etc.  I did that at
>> work
>> (via ASP.net) so we could paste in queries from logs and debug them.  We
>> tend to use that quite a bit.
>>
>>
>> Cheers
>>
>>
>> Stefan Matheis wrote:
>>>
>>> Hi List,
>>>
>>> given the fact that my Java knowledge is sort of non-existent .. my
>>> idea was to rework the Solr Admin Interface.
>>>
>>> Compared to CouchDB's Futon or the MongoDB Admin-Utils .. not that fancy,
>>> but it was an idea a few weeks ago - and I would like to contribute
>>> something, a thing which has to be non-Java but not useless - hopefully
>>> ;)
>>>
>>> Actually it's completely work-in-progress .. but I'm interested in what
>>> you guys think. Right direction? Completely wrong, just drop it?
>>>
>>> http://files.mathe.is/solr-admin/01_dashboard.png
>>> http://files.mathe.is/solr-admin/02_query.png
>>> http://files.mathe.is/solr-admin/03_schema.png
>>> http://files.mathe.is/solr-admin/04_analysis.png
>>> http://files.mathe.is/solr-admin/05_plugins.png
>>>
>>> It's actually using one index.jsp to generate the basic frame, including
>>> cores and their navigation. Everything else is loaded via the existing
>>> SolrAdminHandler.
>>>
>>> Any questions, ideas, thoughts out there? Please, let me know :)
>>>
>>> Regards
>>> Stefan
>>>
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Solr-Admin-Interface-reworked-Go-on-Go-away-tp2620365p2620745.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Admin-Interface-reworked-Go-on-Go-away-tp2620365p2624956.html


RE: Understanding multi-field queries with q and fq

2011-03-03 Thread mrw
Yes, we're investigating dismax (with the qf param), but we're not sure it
supports our syntax needs.  The users want to put AND/OR/NOT in their
queries, and we don't want to write a lot of code converting those queries
into dismax (+/-/mm) format.  So, until 3.1 (edismax) ships, we're also
trying to get boolean queries to work across multiple fields with the
standard query handler.

I've seen quite a few unanswered or partially-answered posts on this list on
getting boolean syntax right.  I can tell it's a thorny issue.


Robert Sandiford wrote:
> 
> Have you looked at the 'qf' parameter?
> 
> Bob Sandiford | Lead Software Engineer | SirsiDynix
> P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
> www.sirsidynix.com 
> _
> http://www.cosugi.org/ 
> 
> 
> 
> 
>> -Original Message-
>> From: mrw [mailto:mikerobertsw...@gmail.com]
>> Sent: Wednesday, March 02, 2011 2:28 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Understanding multi-field queries with q and fq
>> 
>> Anyone understand how to do boolean logic across multiple fields?
>> 
>> Dismax is nice for searching multiple fields, but doesn't necessarily
>> support our syntax requirements. eDismax appears to be not available
>> until
>> Solr 3.1.
>> 
>> In the meantime, it looks like we need to support applying the user's
>> query
>> to multiple fields, so if the user enters "led zeppelin merle" we need
>> to be
>> able to do the logical equivalent of
>> 
>> &fq=field1:led zeppelin merle OR field2:led zeppelin merle
>> 
>> 
>> Any ideas?  :)
>> 
>> 
>> 
>> mrw wrote:
>> >
>> > After searching this list, Google, and looking through the Pugh book,
>> I am
>> > a little confused about the right way to structure a query.
>> >
>> > The Packt book uses the example of the MusicBrainz DB full of song
>> > metadata.  What if they also had the song lyrics in English and
>> German as
>> > files on disk, and wanted to index them along with the metadata, so
>> that
>> > each document would basically have song title, artist, publisher,
>> date,
>> > ..., All_Metadata (copy field of all metadata fields), Text_English,
>> and
>> > Text_German fields?
>> >
>> > There can only be one default field, correct?  So if we want to
>> search for
>> > all songs containing (zeppelin AND (dog OR merle)) do we
>> >
>> > repeat the entire query text for all three major fields in the 'q'
>> clause
>> > (assuming we don't want to use the cache):
>> >
>> > q=(+All_Metadata:zeppelin AND (dog OR merle)+Text_English:zeppelin
>> AND
>> > (dog OR merle)+Text_German:(zeppelin AND (dog OR merle))
>> >
>> > or repeat the entire query text for all three major fields in the
>> 'fq'
>> > clause (assuming we want to use the cache):
>> >
>> > q=*:*&fq=(+All_Metadata:zeppelin AND (dog OR
>> merle)+Text_English:zeppelin
>> > AND (dog OR merle)+Text_German:zeppelin AND (dog OR merle))
>> >
>> > ?
>> >
>> > Thanks!
>> >
>> 
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Understanding-multi-field-queries-
>> with-q-and-fq-tp2528866p2619700.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Understanding-multi-field-queries-with-q-and-fq-tp2528866p2625068.html


Re: Dismax, q, q.alt, and defaultSearchField?

2011-03-03 Thread mrw
Thanks, Jan.

It looks like what we need to do is use both q and q.alt, such that q.alt is
always "*:*" and q is either empty for filter-only queries or contains the
user text.  That seems to work.
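
A minimal sketch of that arrangement, assuming the same mm, tie, and qf values as in the original question (the helper name dismax_params is made up):

```python
from urllib.parse import urlencode


def dismax_params(user_text, qf, filters=()):
    """Build dismax parameters where q.alt is always *:*.

    When user_text is blank, q is sent empty and q.alt supplies the
    match-all query; otherwise q carries the user text, which dismax
    resolves against the fields listed in qf.
    """
    params = [
        ("defType", "dismax"),
        ("q", user_text.strip()),
        ("q.alt", "*:*"),
        ("qf", " ".join(qf)),
        ("mm", "1"),
        ("tie", "0.1"),
    ]
    params.extend(("fq", f) for f in filters)
    return params


user_query = urlencode(dismax_params("banana", ["field1", "field2"]))
facet_only_query = urlencode(dismax_params("", ["field1", "field2"]))
```

Because the user text only ever goes through q, it is matched against qf on both banks, and the defaultSearchField difference no longer affects it.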


Jan Høydahl / Cominvent wrote:
> 
> Hi,
> 
> Try
> q.alt={!dismax}banana
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> 
> On 2. mars 2011, at 23.06, mrw wrote:
> 
>> We have two banks of Solr nodes with identical schemas.  The data I'm
>> searching for is in both banks.
>> 
>> One has defaultSearchField set to field1, the other has
>> defaultSearchField
>> set to field2.
>> 
>> We need to support both user queries and facet queries that have no user
>> content.  For the latter, it appears I need to use q.alt=*:*, so I am
>> investigating also using q.alt for user content (e.g., q.alt=banana).
>> 
>> I run the following query:
>> 
>> q.alt=banana
>> &defType=dismax
>> &mm=1
>> &tie=0.1
>> &qf=field1+field2
>> 
>> 
>> On bank one, I get the expected results, but on bank two, I get 0
>> results.
>> 
>> I noticed (via debugQuery=true), that when I use q.alt, it resolves using
>> the defaultSearchField (e.g., field1:banana), not the value of the qf
>> param. 
>> Therefore, I get different results.
>> 
>> If I switched to using q for user queries and q.alt for facet queries, I
>> would still get different results, because q would resolve against the
>> fields in the qf param, and q.alt would resolve against the default
>> search
>> field.
>> 
>> Is there a way to override this behavior in order to get consistent
>> results?
>> 
>> Thanks!
>> 
>> 
>> 
>> 
>> 
>> 
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Dismax-q-q-alt-and-defaultSearchField-tp2621061p2621061.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dismax-q-q-alt-and-defaultSearchField-tp2621061p2627134.html


Dismax: field not returned unless in sort clause?

2011-03-15 Thread mrw
We have a "D" field (string, indexed, stored, not required) that is returned
* when we search with the standard request handler
* when we search with dismax request handler _and the field is specified in
the sort parameter_

but is not returned when using the dismax handler and the field is not
specified in the sort param.

IOW, if I do the following query (no sort param), I get all the expected
results, but the D field never comes back...

&q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D

...but if I add "D" to the sort param, the D field comes back on every
single record

&q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D&sort=D%20asc
&q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D&sort=D%20desc

If I omit the fl param, I see that all of our other fields appear to be
returned on every result without any need to specify them in the sort param.  

Obviously, I cannot hard-code the sort order around the D field.  :)

Any ideas?   


Thanks!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dismax-field-not-returned-unless-in-sort-clause-tp2681447p2681447.html


Re: Dismax: field not returned unless in sort clause?

2011-03-16 Thread mrw
No, not setting those options in the query or schema.xml file.

I'll try what you said, however.
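
For reference, the check suggested in the quoted reply could be sketched as two requests whose numFound values are compared: one as-is and one with fq=D:[* TO *] added. This is only an illustration; the helper name is invented, rows=0 is an assumption made here so that only the count comes back, and the remaining parameters are copied from the original query.

```python
from urllib.parse import urlencode

# Parameters copied from the original query; rows=0 is an added assumption
# so that the response carries only the hit count.
BASE_PARAMS = [
    ("q", ""),
    ("q.alt", "*:*"),
    ("defType", "dismax"),
    ("qf", "A,B,C"),
    ("fl", "D"),
    ("rows", "0"),
]


def count_query(require_d):
    """Return the query string, optionally restricted to docs with a D value."""
    params = list(BASE_PARAMS)
    if require_d:
        # D:[* TO *] matches only documents that have some value in D.
        params.append(("fq", "D:[* TO *]"))
    return urlencode(params)


all_docs = count_query(require_d=False)
docs_with_d = count_query(require_d=True)
```

If numFound is smaller for the second request, some documents have no value in D, which would be consistent with the guess in the quoted reply.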


Thanks


Chris Hostetter-3 wrote:
> 
> : We have a "D" field (string, indexed, stored, not required) that is
> returned
> : * when we search with the standard request handler
> : * when we search with dismax request handler _and the field is specified
> in
> : the sort parameter_
> : 
> : but is not returned when using the dismax handler and the field is not
> : specified in the sort param.
> 
> are you using one of the "sortMissing" options on D or its fieldType?
> 
> I'm guessing you have sortMissingLast="true" for D, so anytime you sort on 
> it the docs that do have a value appear first.  but when you don't sort on 
> it, other factors probably lead docs that don't have a value for the D 
> field to appear first -- solr doesn't include fields in docs that don't 
> have any value for that field.
> 
> if my guess is correct, adding "fq=D:[* TO *]" to any of your queries will 
> cause the total number of results to shrink, but the first page of results 
> for your requests that don't sort on D will look exactly the same.
> 
> the LukeRequestHandler will help you see how many docs in your index don't 
> have any values indexed in the "D" field.
> 
> 
> -Hoss
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dismax-field-not-returned-unless-in-sort-clause-tp2681447p2688039.html


Result docs missing only when shards parameter present in query?

2011-05-11 Thread mrw

We have two Solr nodes, each with multiple shards.  If we query each shard
directly (no shards parameter), we get the expected results:

<response>
   <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">22</int>
   </lst>
   <result name="response" numFound="100" start="0">
      <doc>...</doc>
      <doc>...</doc>
   </result>
</response>
(^^^ hand-typed pseudo XML)

However, if we add the shards parameter and even supply one of the above
shards, we get the same number of results, but all the doc elements under
the result element are missing:

<response>
   <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">33</int>
   </lst>
   <result name="response" numFound="100" start="0"/>
</response>

(^^^ note missing doc elements)

It doesn't matter which shard is specified in the shards parameter;  if any
or all of the shards are specified after the shards parameter, we see this
behavior.

When we go to http://:8983/solr/  on either node, we see all the
shards properly listed.  

So, the shards seem to be registered properly, and work individually, but
not when the shards parameter is supplied.   Any ideas?
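
To make the comparison concrete, here is a small sketch of the two kinds of request being described: one sent straight to a shard, and the same query with a shards parameter listing that shard. The node and shard names are placeholders, and the helper names are invented; the only Solr-specific detail relied on is that shards takes a comma-separated list of host:port/path entries without the http:// prefix.

```python
from urllib.parse import urlencode


def select_params(query, shards=None):
    """Return select parameters, adding shards only for distributed requests."""
    params = [("q", query)]
    if shards:
        # Entries look like host:port/path and are joined with commas.
        params.append(("shards", ",".join(shards)))
    return params


def select_url(base_url, query, shards=None):
    """Build the full select URL for one core."""
    return base_url.rstrip("/") + "/select?" + urlencode(select_params(query, shards))


# Placeholder node and shard names, not taken from the actual setup.
direct = select_url("http://node1:8983/solr/shard1", "*:*")
distributed = select_url(
    "http://node1:8983/solr/shard1", "*:*", shards=["node1:8983/solr/shard1"]
)
```

Running both against the same shard and diffing the responses narrows the problem to the distributed code path rather than the data on that shard.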


Thanks!


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Result-docs-missing-only-when-shards-parameter-present-in-query-tp2928889p2928889.html


Re: Result docs missing only when shards parameter present in query?

2011-05-12 Thread mrw

Does this seem like it would be a configuration issue, an indexed data
issue, or something else?

Thanks


mrw wrote:
> 
> We have two Solr nodes, each with multiple shards.  If we query each shard
> directly (no shards parameter), we get the expected results:
> 
> <response>
>    <lst name="responseHeader">
>       <int name="status">0</int>
>       <int name="QTime">22</int>
>    </lst>
>    <result name="response" numFound="100" start="0">
>       <doc>...</doc>
>       <doc>...</doc>
>    </result>
> </response>
> (^^^ hand-typed pseudo XML)
> 
> However, if we add the shards parameter and even supply one of the above
> shards, we get the same number of results, but all the doc elements under
> the result element are missing:
> 
> <response>
>    <lst name="responseHeader">
>       <int name="status">0</int>
>       <int name="QTime">33</int>
>    </lst>
>    <result name="response" numFound="100" start="0"/>
> </response>
> 
> (^^^ note missing doc elements)
> 
> It doesn't matter which shard is specified in the shards parameter;  if
> any or all of the shards are specified after the shards parameter, we see
> this behavior.
> 
> When we go to http://:8983/solr/  on either node, we see all the
> shards properly listed.  
> 
> So, the shards seem to be registered properly, and work individually, but
> not when the shards parameter is supplied.   Any ideas?
> 
> 
> Thanks!
> 


--
View this message in context: 
http://lucene.472066.n3.nabble.com/Result-docs-missing-only-when-shards-parameter-present-in-query-tp2928889p2932248.html