Different behavior for q=goo.com vs q=@goo.com in queries?
Using Lucid's Solr 1.4 distribution, if I index my email inbox and then search it by passing in different email expressions, I notice that I get different results based on whether the '@' character is included, even though the character is present in every email address in the field I'm searching.

For example, q=goo.com returns multiple items, as expected. However, q=@goo.com returns no results. Since every address containing "goo.com" also contains "@goo.com," I would expect the same number of results.

I get this from both the Solr admin console and from my application, which URL-encodes the query. I Googled, searched the Wiki, and grepped the Pugh and Lucid books, but don't see anything about this.

Ideas?

Thanks!

--
View this message in context: http://lucene.472066.n3.nabble.com/Different-behavior-for-q-goo-com-vs-q-goo-com-in-queries-tp2168935p2168935.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Different behavior for q=goo.com vs q=@goo.com in queries?
Basically, just what you've suggested. I did the field/query analysis piece with verbose output. Not entirely sure how to interpret the results, of course. Currently reading anything I can find on that.

Thanks

Erick Erickson wrote:
> What steps have you taken to figure out whether the contents of your index are what you think? I suspect that the fields you're indexing aren't being analyzed/tokenized quite the way you expect either at query time or index time (or maybe both!).
>
> Take a look at the admin/analysis page for the field you're indexing the data into. If that doesn't shed any light on the problem, please paste in the definition for the field in question, maybe another set of eyes can see the issue.
>
> Best
> Erick
Adding fq to query with mincount=0 causes unexpected 0-count facet values to be returned?
I've noticed that performing a query with facet.mincount=0 and no fq clauses results in a response where only facets with non-zero counts are returned, but adding an fq clause (caused by a user selecting a non-zero-valued facet value checkbox) actually causes a bunch of 0-count facet values completely unrelated to the query to be returned. Is adding the fq constraint actually widening the query before facet.mincount gets applied?

E.g., say a query with no fq constraint produces the following facet values:

ID: 1234 (1) (15) 1010 (30)
Color: Red (11) Green (15) Blue (32)

but when the user selects Blue (32), and I add &fq=Color:Blue, Solr returns the following:

ID: 1 (0) 2 (0) 3 (0) ... 99 (0) 100 (0)
Color: Orange (0) Teal (0) Red (0) Green (0) Blue (32)

Notice how, before the fq clause is added, none of the 0-count facets are returned, even though facet.mincount = 0, but afterward, a bunch of 0-count facets are returned?

The context of my question is trying to solve a problem where the application must display facet values with a count of zero as filtering operations remove them from the result set. That is, if Red (10) was displayed after the initial query, but the user filters on Blue (32), then we must still display Red (0) so the user can select it and widen the query.

Initially, we were using mincount=1 and managing the missing facets entirely within the application, but now I'm trying to see if we can use mincount=0 and maybe some other constraints to achieve the same behavior without a lot of custom code in the application.

Thanks
Re: Adding fq to query with mincount=0 causes unexpected 0-count facet values to be returned?
>> Notice how, before the fq clause is added, none of the 0-count facets are returned, even though facet.mincount = 0, but afterward, a bunch of 0-count facets are returned?
>
> This is normal.

What's behind that? Is it widening the results before the mincount constraint is being applied?

> I couldn't fully follow, but you want something like multi-select faceting?
>
> http://search-lucene.com/ is an example for that, user can select solr and lucene from the project facet at the same time.
>
> http://wiki.apache.org/solr/SimpleFacetParameters#Multi-Select_Faceting_and_LocalParams

No. Search-Lucene actually appears to remove facets when they're not returned. If you select Blue, and Red is eliminated, Red won't show up as a facet anymore. Therefore, the user can't select Red to add it back into the result set.

Multi-selection keeps eliminated facets, but gives them virtual counts related to the entire result set. If you select Blue (32) and Red (10) is eliminated, multi-selection causes Red (10) to be displayed. Therefore, the user can't tell that Red was eliminated, and the Red facet no longer has any connection to the values in the result set.

What we need to do is show eliminated facets with a 0 count, so if you select Blue (32) and Red (10) is eliminated, we show Red (0). That indicates that there are zero documents in the result set for Red, but Red can still be selected to add Red documents back into the result set.

Thanks
Re: Adding fq to query with mincount=0 causes unexpected 0-count facet values to be returned?
iorixxx wrote:
> After re-reading, it is not normal that none of the 0-count facets are showing up. Can you give us the full parameter list that you obtain by adding &echoParams=all to your search URL?
>
> Maybe you limit facets to three in your first query? What happens when you add &facet.limit=-1?

We're actually using the default facet.limit value of 100. I will increase it to 200 and see if the non-zero-count facets show up. Maybe that was causing my confusion.
Re: Adding fq to query with mincount=0 causes unexpected 0-count facet values to be returned?
mrw wrote:
> We're actually using the default facet.limit value of 100. I will increase it to 200 and see if the non-zero-count facets show up. Maybe that was causing my confusion.

Yep -- the 0-count facets were not being returned due to the facet.limit cutoff.

So, unless there is another parameter that can be used with facet.mincount=0 in order to tune the results, it looks like I will need to use facet.mincount=1 and handle the processing of omitted facets in the application.

Thanks for the help.
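
Editor's sketch of the application-side handling described above, assuming the application keeps the facet values from the initial (unfiltered) query and fills in a zero for any value that Solr omits once an fq is applied with facet.mincount=1. The function name and the dict shapes are hypothetical and are not part of Solr.

```python
def merge_facet_counts(initial, filtered):
    """Return a count for every facet value seen in the initial query.

    initial  -- {value: count} from the query without fq (hypothetical shape)
    filtered -- {value: count} from the query with fq and facet.mincount=1
    Values that Solr omitted from the filtered response get a count of 0.
    """
    return {value: filtered.get(value, 0) for value in initial}


# Facet values from the first query, then from the query with fq=Color:Blue.
initial = {"Red": 10, "Green": 15, "Blue": 32}
filtered = {"Blue": 32}
print(merge_facet_counts(initial, filtered))
```

Only values present in the initial response are kept, so unrelated 0-count values never reach the display.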
Changing value of start parameter affects numFound?
I have a data set indexed over two irons, with M docs per Solr core for a total of N cores.

If I perform a query across all N cores with start=0 and rows=30, I get, say, numFound=27521. If I simply change the start param to start=27510 (simulating being on the last page of data), I get a smaller result set (say, numFound=21415).

I had expected numFound to be the same in either case, since no other aspect of the query had changed. Am I mistaken?

I'm using Solr 1.4.1.955763M. Faceting is enabled on the query. All cores have the same schema.

Thanks!
Re: Changing value of start parameter affects numFound?
More detail: numFound seems to vary unpredictably based on start value.

start    numFound
-----    --------
0-46     27521
47-59    27520
60       27519
61-91    27518
62       27517

Any ideas?
GET or POST for large queries?
We are running into some issues with large queries. Initially, they were ostensibly header buffer overruns, because increasing Jetty's headerBufferSize value to 65536 resolved them. This seems like a kludge, but it does solve the problem for 95% of our users.

However, we do have queries that are physically larger than that and for which increasing the headerBufferSize to 65536 does not work. This is due to security requirements: security descriptors are baked into the index, and then potentially thousands of them (depending on the user context) are passed in with each query. These excessive queries are only a problem for the approximately 5% of users who are highly entitled, but the number of security descriptors is likely to increase, and we won't have a workaround for this security policy any time soon.

After a lot of Googling, it seems to me that it's common to increase the headerBufferSize, but I don't see any other strategies. Is it possible/feasible to switch to using POST for querying?

Thanks!
Re: GET or POST for large queries?
Yeah, I tried switching to POST. It seems to be handling the size, but apparently Solr has a limit on the number of boolean comparisons -- I'm now getting "too many boolean clauses" errors emanating from org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:108). :)

Thanks for responding.

Erik Hatcher-4 wrote:
> Yes, you may use POST to make search requests to Solr.
>
> Erik
Re: GET or POST for large queries?
Thanks for the response.

Yes, the queries are fairly large. Basically, the corporate security policy dictates that we use row-level security attributes from the DB for access control to Solr. So, we bake row-level security attributes from the database into the index, and then, at query time, ask for those same attributes from the DB and pass them as part of the Solr query. So, imagine a bank VP with access to tens of thousands of customer records and transactions, and all those access attributes get sent to Solr. The system works well for the low-level account managers and low-entitlement users, but cannot scale for the high-level folks.

POSTing the data appears to avoid the header threshold issue, but it breaks because of the "too many boolean clauses" error.

gearond wrote:
> Probably you could do it, and solving a problem in business supersedes 'rightness' concerns, much to the dismay of geeks and 'those who like rightness and say the word "Neemph!" '.
>
> the not rightness about this is that:
> POST, PUT, DELETE are assumed to make changes to the URL's backend.
> GET is assumed NOT to make changes.
>
> So if your POST does not make a change . . . it breaks convention. But if it solves the problem . . . :-)
>
> Another way would be to GET with a 'query file' location, and then have the server fetch that query and execute it.
>
> Boy!!! I'd love to see one of your queries!!! You must have a few ANDs/ORs in them :-)
>
> Dennis Gearon
Re: GET or POST for large queries?
Thanks for the response and info. I'll try that.

Jonathan Rochkind wrote:
> Yes, I think it's 1024 by default. I think you can raise it in your config. But your performance may suffer.
>
> Best would be to try and find a better way to do what you want without using thousands of clauses. This might require some custom Java plugins to Solr though.
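
Editor's sketch of the two points raised in this thread: sending the query parameters in a POST body instead of the URL, and checking the number of clauses against the 1024 default mentioned above before sending. The URL, the `acl` field name, and both function names are hypothetical; the request is only constructed here, never sent, and the clause check is a rough client-side guard rather than Solr's own accounting.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Default limit quoted in the thread; the server-side value may differ.
MAX_BOOLEAN_CLAUSES = 1024


def build_post_query(solr_url, user_query, descriptors):
    """Build (but do not send) a form-encoded POST search request.

    The security descriptors are OR-ed together against a single,
    hypothetical 'acl' field and passed as an fq parameter.
    """
    clauses = " OR ".join(descriptors)
    params = {"q": user_query, "fq": "acl:(" + clauses + ")"}
    body = urlencode(params).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
    # Supplying data makes urllib issue a POST rather than a GET.
    return Request(solr_url, data=body, headers=headers)


def exceeds_clause_limit(descriptors, limit=MAX_BOOLEAN_CLAUSES):
    """Rough check: one boolean clause per descriptor."""
    return len(descriptors) > limit


request = build_post_query("http://localhost:8983/solr/select", "loan", ["a1", "a2"])
print(request.get_method(), len(request.data), "bytes in body")
```

Moving the parameters into the body sidesteps the header buffer size, but it does nothing about the clause limit, which is why the check is kept separate.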
Re: GET or POST for large queries?
Thanks for the tip. No, I did not know about that. Unfortunately, we use Oracle OLS which does not appear to be supported.

Jan Høydahl / Cominvent wrote:
> Hi,
>
> There are better ways to combat row level security in search than sending huge lists of users over the wire.
>
> Have you checked out the ManifoldCF project with which you can integrate security to Solr? http://incubator.apache.org/connectors/
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
Understanding multi-field queries with q and fq
After searching this list, Google, and looking through the Pugh book, I am a little confused about the right way to structure a query.

The Packt book uses the example of the MusicBrainz DB full of song metadata. What if they also had the song lyrics in English and German as files on disk, and wanted to index them along with the metadata, so that each document would basically have song title, artist, publisher, date, ..., All_Metadata (copy field of all metadata fields), Text_English, and Text_German fields?

There can only be one default field, correct? So if we want to search for all songs containing (zeppelin AND (dog OR merle)) do we

repeat the entire query text for all three major fields in the 'q' clause (assuming we don't want to use the cache):

q=(+All_Metadata:zeppelin AND (dog OR merle)+Text_English:zeppelin AND (dog OR merle)+Text_German:(zeppelin AND (dog OR merle))

or repeat the entire query text for all three major fields in the 'fq' clause (assuming we want to use the cache):

q=*:*&fq=(+All_Metadata:zeppelin AND (dog OR merle)+Text_English:zeppelin AND (dog OR merle)+Text_German:zeppelin AND (dog OR merle))

?

Thanks!
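
Editor's sketch of one way the "repeat the expression per field" idea could be written, assuming the standard Lucene query parser, whose field grouping syntax `field:(...)` applies a whole parenthesized expression to one field. The function name is hypothetical, and whether the result belongs in q or fq is left open, as in the question.

```python
from urllib.parse import urlencode


def across_fields(expression, fields):
    """Apply one boolean expression to several fields, OR-ing the results.

    Each field gets its own grouped copy of the expression, so the
    expression's operators are not split across fields.
    """
    return " OR ".join("{}:({})".format(field, expression) for field in fields)


fields = ["All_Metadata", "Text_English", "Text_German"]
query = across_fields("zeppelin AND (dog OR merle)", fields)
print(query)
# URL-encode the query before sending it as the q (or fq) parameter.
print(urlencode({"q": query}))
```

Grouping per field keeps the parentheses balanced no matter how complex the user's expression is, which hand-built strings tend to get wrong.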
Re: Understanding multi-field queries with q and fq
Hi, Otis.

I have been playing with dismax (defType=dismax, not qt=dismax -- not sure about the difference). It looks like eDismax won't be available until Solr 3.1, correct?

We actually have to pass hundreds of Oracle OLS labels in each request for each user (e.g., Loan Officer can see her customers' data, but VP can see all customer data). I've been passing them as an fq parameter, but have recently learned that's bad, since fq parameters participate in caching.

We obviously *only* want the label comparisons performed against the label field. (Those values won't be present in the other searchable fields that dismax would run all query parameters against.)

Is there some dismax query magic that would allow us to match the labels in an uncached manner against only the labels field, but match the user-entered query against the qf fields?

If not, I think we're stuck with moving the labels piece to q and the user query to fq and sticking with the standard handler.

Thanks!

Otis Gospodnetic-2 wrote:
> Hi mrw,
>
> It sounds like (e)dismax is what you should look into. You didn't mention it/them, so I'm assuming you're not aware of them.
>
> See: http://search-lucene.com/?q=dismax+OR+edismax&fc_project=Solr
>
> Otis
>
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Lucene ecosystem search :: http://search-lucene.com/
Basic Dismax syntax question
Say I have an index with first_name and last_name fields, and also a copy field for the full name called full_name. Say I add two employees: Napoleon Bonaparte and Napoleon Dynamite.

If I search for just the first or last name, or both names, with mm=1, I get the expected results:

q=Napoleon&defType=dismax&tie=0.1&qf=first_name%20last_name&mm=1 // 2 results
q=Bonaparte&defType=dismax&tie=0.1&qf=first_name%20last_name&mm=1 // 2 results
q=Napoleon%20Bonaparte&defType=dismax&tie=0.1&qf=first_name%20last_name&mm=1 // 2 results

However, if I try to search for both names with mm=2 (which I think means term1 AND term2), I get 0 results:

q=napoleon%20bonaparte&defType=dismax&tie=0.1&qf=first_name%20last_name&mm=2 // 0 results
q=napoleon%20bonaparte&defType=dismax&tie=0.1&qf=full_name&mm=2 // 0 results

I also see this when I put all fields (including the copy field) into the qf parameter.

Thoughts?

Thanks!
Re: Basic Dismax syntax question
They're all set to LC. I was just coming up with a safe example to post.

It sounds like you don't see an issue with the syntax we're using?

Thanks

tjpoe wrote:
> i noticed that your search terms are using caps vs lower case, are your search fields perhaps not set to lowercase the terms and/or the search term?
Re: Basic Dismax syntax question
Fields are str type. The issue happens regardless of case. I just threw in some examples using names to highlight the issue. In the actual index, the data in the affected fields is all LC, and I'm searching in LC.

Sounds like the syntax looks okay to you?

Thanks

iorixxx wrote:
> &debugQuery=on will dump useful information. What is the field types of first_name, last_name and full_name?
>
> What happens when you query first letter uppercased?
>
> q=Napoleon%20Bonaparte&defType=dismax&tie=0.1&qf=first_name%20last_name&mm=2
Disabling caching for fq param?
Based on what I've read here and what I could find on the web, it seems that each fq clause essentially gets its own results cache. Is that correct?

We have a corporate policy of passing the user's Oracle OLS labels into the index in order to be matched against the labels field. I currently separate this from the user's query text by sticking it into an fq param...

?q=
&fq=labels:
&qf=
&tie=0.1
&defType=dismax

...but since its value (a collection of hundreds of label values) only applies to that user, the accompanying result set won't be reusable by other users.

My understanding is that this query will result in two result sets (q and fq) being cached separately, with the union of the two sets being returned to the user. (Is that correct?)

There are thousands of users, each with a unique combination of labels, so there seems to be little value in caching the result set created from the fq labels param. Would it be beneficial if there were some kind of fq parameter override to tell Solr not to cache the results?

Thanks!
Re: Disabling caching for fq param?
We use fq params for filtering as well (not shown in the previous example), so we only want to be able to override fq caching on a per-parameter basis (e.g., fq={!noCache userLabels} ).

Thanks

Markus Jelsma-2 wrote:
> If filterCache hitratio is low then just disable it in solrconfig by deleting the section or setting its values to 0.
RE: Disabling caching for fq param?
That clause will always be the same per-user (i.e., you have values 1,2,4 and I have values 1,2,8) across queries.

In the result set denoted by the labels param, some users will have tens of thousands of documents and others will have millions of documents.

It sounds like you don't see a huge problem with our approach, so maybe we'll stick with it for the time being.

Thanks!

Jonathan Rochkind wrote:
> As far as I know there is not, it might be beneficial, but also worth considering: "thousands" of users isn't _that_ many, and if that same clause is always the same per user, then if the same user does a query a second time, it wouldn't hurt to have their user-specific fq in the cache. A single fq cache may not take as much RAM as you think, you could potentially afford to increase your fq cache size to thousands/tens-of-thousands, and win all the way around.
>
> The filter cache should be a least-recently-used-out-first cache, so even if the filter cache isn't big enough for all of them, fq's that are used by more than one user will probably stay in the cache as old user-specific fq's end up falling off the back as least-recently-used.
>
> So in actual practice, one way or another, it may not be a problem.
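
Editor's sketch of the least-recently-used behavior described in the reply above. This is a toy model built on Python's OrderedDict, not Solr's filterCache implementation; the class name, the capacity, and the fq strings are all hypothetical. It shows how a filter shared by several users can survive while single-user filters fall out.

```python
from collections import OrderedDict


class LruFilterCache:
    """Toy LRU cache keyed by fq string (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()

    def __contains__(self, fq):
        return fq in self._entries

    def get(self, fq):
        """Return the cached doc set and mark the entry as recently used."""
        if fq not in self._entries:
            return None
        self._entries.move_to_end(fq)
        return self._entries[fq]

    def put(self, fq, doc_set):
        """Store a doc set, evicting the least recently used entry if full."""
        self._entries[fq] = doc_set
        self._entries.move_to_end(fq)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)


cache = LruFilterCache(capacity=2)
cache.put("type:loan", {1, 2, 3})   # filter shared by many users
cache.put("labels:userA", {1, 2})   # user-specific filter
cache.get("type:loan")              # shared filter is reused
cache.put("labels:userB", {3})      # evicts the least recently used entry
print("type:loan" in cache, "labels:userA" in cache, "labels:userB" in cache)
```

In this model the shared filter stays because it was touched most recently, which is the effect the reply expects from a real least-recently-used cache.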
dismax query with no/empty/*:* q parameter?
For standard query handler fq-only queries, we used q=*:*. However, with dismax, that returns 0 results. Are fq-only queries possible with dismax?

Thanks!
Re: Understanding multi-field queries with q and fq
Anyone understand how to do boolean logic across multiple fields?

Dismax is nice for searching multiple fields, but doesn't necessarily support our syntax requirements. eDismax appears to be not available until Solr 3.1.

In the meantime, it looks like we need to support applying the user's query to multiple fields, so if the user enters "led zeppelin merle" we need to be able to do the logical equivalent of

&fq=field1:led zeppelin merle OR field2:led zeppelin merle

Any ideas? :)
Re: dismax query with no/empty/*:* q parameter?
Ah... so I need to be doing &q.alt=*:* &fq=:. Of course, now that you've shown me what to look for, I also see the explanation in the Packt book. Sheesh.

Thanks for the tip!

Chris Hostetter-3 wrote:
> : For standard query handler fq-only queries, we used q=*:*. However, with
> : dismax, that returns 0 results. Are fq-only queries possible with dismax?
>
> they are if you use the q.alt param.
>
> http://wiki.apache.org/solr/DisMaxQParserPlugin#q.alt
>
> -Hoss
Re: Solr Admin Interface, reworked - Go on? Go away?
Looks nice.

It might also be worth creating a page with a large query field for pasting in complete URL-encoded queries that cross cores, etc. I did that at work (via ASP.net) so we could paste in queries from the logs and debug them. We use it quite a bit.

Cheers

Stefan Matheis wrote:
> Hi List,
>
> given the fact that my java-knowledge is sort of non-existent .. my idea was to rework the Solr Admin Interface.
>
> Compared to CouchDB's Futon or the MongoDB Admin-Utils .. not that fancy, but it was an idea a few weeks ago - and i would like to contribute something, a thing which has to be non-java but not useless - hopefully ;)
>
> Actually it's completely work-in-progress .. but i'm interested in what you guys think. Right direction? Completely wrong, just drop it?
>
> http://files.mathe.is/solr-admin/01_dashboard.png
> http://files.mathe.is/solr-admin/02_query.png
> http://files.mathe.is/solr-admin/03_schema.png
> http://files.mathe.is/solr-admin/04_analysis.png
> http://files.mathe.is/solr-admin/05_plugins.png
>
> It's actually using one index.jsp to generate the basic frame, including cores and their navigation. Everything else is loaded via the existing SolrAdminHandler.
>
> Any Questions, Ideas, Thoughts out there? Please, let me know :)
>
> Regards
> Stefan
Dismax, q, q.alt, and defaultSearchField?
We have two banks of Solr nodes with identical schemas. The data I'm searching for is in both banks.

One has defaultSearchField set to field1, the other has defaultSearchField set to field2.

We need to support both user queries and facet queries that have no user content. For the latter, it appears I need to use q.alt=*:*, so I am also investigating using q.alt for user content (e.g., q.alt=banana).

I run the following query:

q.alt=banana
&defType=dismax
&mm=1
&tie=0.1
&qf=field1+field2

On bank one, I get the expected results, but on bank two, I get 0 results.

I noticed (via debugQuery=true) that when I use q.alt, it resolves using the defaultSearchField (e.g., field1:banana), not the value of the qf param. Therefore, I get different results.

If I switched to using q for user queries and q.alt for facet queries, I would still get different results, because q would resolve against the fields in the qf param, and q.alt would resolve against the default search field.

Is there a way to override this behavior in order to get consistent results?

Thanks!
Re: Solr Admin Interface, reworked - Go on? Go away?
Picture the URI field above the response field, only half-screen. This makes it easy to break the query apart onto different lines in order to debug it.

When you have a lot of shards, fq clauses, etc., you end up with a very long URI that is difficult to get your head around and manipulate. We take queries from the logs, split them around parameters, take the shards out, put the shards back in, take the OLS labels out, put them back in, etc. With long, complex queries, it's essential to have a large work space to play in. :)

Stefan Matheis wrote:
> mrw,
>
> you mean a field like here (http://files.mathe.is/solr-admin/02_query.png) on the right side, between meta-navigation and plain solr-xml response?
>
> actually it's just to display the computed url, but if so .. we could use a larger field for that, of course :)
>
> Regards
> Stefan
RE: Understanding multi-field queries with q and fq
Yes, we're investigating dismax (with the qf param), but we're not sure it supports our syntax needs. The users want to put AND/OR/NOT in their queries, and we don't want to write a lot of code converting those queries into dismax (+/-/mm) format.

So, until 3.1 (edismax) ships, we're also trying to get boolean queries to work across multiple fields with the standard query handler. I've seen quite a few unanswered or partially answered posts on this list about getting boolean syntax right. I can tell it's a thorny issue.

Robert Sandiford wrote:
> Have you looked at the 'qf' parameter?
>
> Bob Sandiford | Lead Software Engineer | SirsiDynix
Re: Dismax, q, q.alt, and defaultSearchField?
Thanks, Jan.

It looks like what we need to do is use both q and q.alt, such that q.alt is always "*:*" and q is either empty (for filter-only queries) or contains the user text. That seems to work.

Jan Høydahl / Cominvent wrote:
> Hi,
>
> Try
> q.alt={!dismax}banana
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
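The approach described above (q.alt always "*:*", q either empty or the user's text) could be assembled roughly as follows. This is a sketch under assumptions, not the poster's actual code: the function name dismax_params and its arguments are hypothetical, and the mm and tie values are simply copied from the query shown earlier in the thread.

```python
from urllib.parse import urlencode


def dismax_params(user_text, qf, filters=()):
    """Build dismax parameters that behave the same with or without user text.

    q carries the user text (possibly empty); q.alt is always *:* so that an
    empty q falls back to matching all documents, leaving only fq to restrict.
    """
    params = [
        ("defType", "dismax"),
        ("q", user_text.strip()),
        ("q.alt", "*:*"),
        ("qf", " ".join(qf)),
        ("mm", "1"),
        ("tie", "0.1"),
    ]
    params += [("fq", f) for f in filters]
    return urlencode(params)


if __name__ == "__main__":
    # User query: resolved against the qf fields.
    print(dismax_params("banana", ["field1", "field2"]))
    # Filter-only query: q is empty, so q.alt applies.
    print(dismax_params("", ["field1", "field2"], filters=["field1:banana"]))
```

Because user text always travels in q, it is resolved against qf on every node, so differing defaultSearchField settings should no longer affect it.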
Dismax: field not returned unless in sort clause?
We have a "D" field (string, indexed, stored, not required) that is returned

* when we search with the standard request handler
* when we search with dismax request handler _and the field is specified in the sort parameter_

but is not returned when using the dismax handler and the field is not specified in the sort param.

IOW, if I do the following query (no sort param), I get all the expected results, but the D field never comes back...

&q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D

...but if I add "D" to the sort param, the D field comes back on every single record

&q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D&sort=D%20asc
&q=&q.alt=*:*&defType=dismax&tie=0.1&mm=1&qf=A,B,C&start=0&rows=300&fl=D&sort=D%20desc

If I omit the fl param, I see that all of our other fields appear to be returned on every result without any need to specify them in the sort param.

Obviously, I cannot hard-code the sort order around the D field. :)

Any ideas?

Thanks!
Re: Dismax: field not returned unless in sort clause?
No, we're not setting those options in the query or in the schema.xml file. I'll try what you said, however.

Thanks

Chris Hostetter-3 wrote:
> : We have a "D" field (string, indexed, stored, not required) that is returned
> : * when we search with the standard request handler
> : * when we search with dismax request handler _and the field is specified in
> : the sort parameter_
> :
> : but is not returned when using the dismax handler and the field is not
> : specified in the sort param.
>
> are you using one of the "sortMissing" options on D or its fieldType?
>
> I'm guessing you have sortMissingLast="true" for D, so anytime you sort on it the docs that do have a value appear first. but when you don't sort on it, other factors probably lead docs that don't have a value for the D field to appear first -- solr doesn't include fields in docs that don't have any value for that field.
>
> if my guess is correct, adding "fq=D:[* TO *]" to any of your queries will cause the total number of results to shrink, but the first page of results for your requests that don't sort on D will look exactly the same.
>
> the LukeRequestHandler will help you see how many docs in your index don't have any values indexed in the "D" field.
>
> -Hoss
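Hoss's suggested check amounts to running the same request twice, once as-is and once with fq=D:[* TO *], and comparing numFound. A minimal sketch of building those two parameter strings follows; the helper name count_query is hypothetical, and rows=0 is used here only because the comparison needs the counts rather than the documents.

```python
from urllib.parse import urlencode

# Base parameters taken from the query in the original post; rows is set to 0
# because only numFound is needed for the comparison.
BASE = [
    ("q", ""),
    ("q.alt", "*:*"),
    ("defType", "dismax"),
    ("fl", "D"),
    ("rows", "0"),
]


def count_query(field=None):
    """Return the parameter string, optionally restricted to docs that have a value in field."""
    params = list(BASE)
    if field is not None:
        # field:[* TO *] matches only documents with some value in that field.
        params.append(("fq", "{}:[* TO *]".format(field)))
    return urlencode(params)


if __name__ == "__main__":
    print(count_query())      # all documents
    print(count_query("D"))   # only documents with a value in D
```

If the second request reports a smaller numFound than the first, some documents have no value in D, which would be consistent with the explanation quoted above.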
Result docs missing only when shards parameter present in query?
We have two Solr nodes, each with multiple shards. If we query each shard directly (no shards parameter), we get the expected results:

response
   lst name="responseHeader"
      int name="status" 0
      int name="QTime" 22
   result name="response" numFound="100" start="0"
      doc
      doc

(^^^ hand-typed pseudo XML)

However, if we add the shards parameter and even supply one of the above shards, we get the same number of results, but all the doc elements under the result element are missing:

response
   lst name="responseHeader"
      int name="status" 0
      int name="QTime" 33
   result name="response" numFound="100" start="0"

(^^^ note missing doc elements)

It doesn't matter which shard is specified in the shards parameter; if any or all of the shards are specified after the shards parameter, we see this behavior.

When we go to http://:8983/solr/ on either node, we see all the shards properly listed.

So, the shards seem to be registered properly, and work individually, but not when the shards parameter is supplied. Any ideas?

Thanks!
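For anyone trying to reproduce the comparison, a distributed request of the kind described above can be assembled as in the sketch below. It makes no claim about the cause of the missing doc elements. The host names node1 and node2, the core names, and the helper names are placeholders, and the sketch assumes the shards value is a comma-separated list of host:port/path entries written without the http:// scheme, which is how the Solr distributed search documentation presents it.

```python
from urllib.parse import urlencode


def shards_value(shard_urls):
    """Join shard addresses into a shards parameter value, dropping any URL scheme."""
    cleaned = []
    for url in shard_urls:
        for scheme in ("http://", "https://"):
            if url.startswith(scheme):
                url = url[len(scheme):]
        cleaned.append(url.rstrip("/"))
    return ",".join(cleaned)


def distributed_params(query, shard_urls):
    """Return URL-encoded parameters for a query fanned out over the given shards."""
    return urlencode([("q", query), ("shards", shards_value(shard_urls))])


if __name__ == "__main__":
    shards = ["http://node1:8983/solr/core0/", "node2:8983/solr/core1"]
    print(shards_value(shards))
    print(distributed_params("*:*", shards))
```

Running the same q once directly against a shard and once through distributed_params gives the pair of responses being compared in the post.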
Re: Result docs missing only when shards parameter present in query?
Does this seem like it would be a configuration issue, an indexed data issue, or something else?

Thanks