A few random questions about Solr queries.
*1)* With faceting, how does facet.query perform compared to facet.field? I ask because in my use case I need to facet over a field, which gets me the top n facets for that field. But I also need to show the count for a "selected filter", which may have a relatively low count and so may not appear in the top n returned facets. The solution would be to 'ensure' its presence by adding facet.query=cat:val in addition to facet.field=cat. I want to do this for quite a few fields.

A related, example-based question: when I facet over a field and something is returned, e.g. John Smith (83), and I also 'ensure' this facet's presence with facet.query=author:"John Smith", are two separate calculations performed? Or does facet.query reuse the count already computed by facet.field?

*2)* Is there a performance issue if I have around 20 facet.query conditions along with 10 facet.fields? Three of those ten fields have around 100,000 possible values each; the remaining seven have a few hundred each.

*3)* I've rummaged around a bit looking for information on when to use q vs fq, and I want to clear up my doubts for a particular use case. Where should my date range queries go: in q or in fq? By default my site shows results from the past 90 days, with buttons to show items from the last month and the last week as well. The user can also use a slider to apply any date range; this is allowed, but it's not /that/ common.

I definitely use fq for filtering on various tags, and choosing a tag is a common activity. Should the date range query go in fq too? As mentioned, the default view shows items from the past 90 days. So does each new day invalidate what is in the cache? Or are entries stored in the filter cache in a way that makes it easy to reuse the results for the past 89 days when the same query runs the next day?
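The requests described in questions 1 and 3 could look roughly like the sketch below, which only builds parameters with Python's standard library. The helper name `build_facet_params` and the date field `pubdate` are hypothetical; `cat` and `author` come from the post. The date range goes in its own fq using Solr date-math rounding (`NOW/DAY`), so the filter string stays identical for a whole day and repeated requests that day can be answered from the filterCache. The next day produces a different string and therefore a new cache entry; the sketch makes no claim about whether Solr reuses a facet.field count for a matching facet.query.

```python
from urllib.parse import urlencode


def build_facet_params(facet_fields, selected, date_field="pubdate", days=90):
    """Return Solr request parameters as a list of (name, value) pairs.

    facet_fields: fields faceted with facet.field (top n values per field).
    selected:     dict mapping field -> values the user has selected; each one also
                  gets a facet.query so its count is returned even when it falls
                  outside the top n. Values are not escaped (hypothetical helper).
    date_field:   hypothetical name of the date field.
    """
    params = [("q", "*:*"), ("facet", "true")]
    for field in facet_fields:
        params.append(("facet.field", field))
    for field, values in selected.items():
        for value in values:
            # Quote the value so a multi-word value such as John Smith stays one phrase.
            params.append(("facet.query", f'{field}:"{value}"'))
    # Date range as its own filter query. NOW/DAY rounds down to the start of the
    # current day, so this fq string is identical for the whole day.
    params.append(("fq", f"{date_field}:[NOW/DAY-{days}DAYS TO NOW/DAY+1DAY]"))
    return params


if __name__ == "__main__":
    params = build_facet_params(["cat", "author"], {"author": ["John Smith"]})
    print(urlencode(params))
```

Keeping the date range in a separate fq from the tag filters means each filter is cached on its own, so selecting a different tag does not force the date filter to be recomputed.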
-- View this message in context: http://lucene.472066.n3.nabble.com/A-few-random-questions-about-solr-queries-tp3986562.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: A few random questions about Solr queries.
A wee bit of clarification on the 2nd question. I meant relative performance, i.e. would it be much slower to facet over 20 facet.queries and 10 facet.fields compared to, say, 4 facet.queries and facet.fields? I wonder if this makes sense...

So... is a bump improper etiquette here? >_>
Is it faster to search over many different fields or one field that combines the values of all those other fields?
Say I have various categories of 'tags'. I want a keyword search to search through my index of articles, so I search over: 1) the title, 2) the body, 3) about 10 of these tag categories. Each tag category is multivalued, with a few words per value. Setting aside the effect on relevance, and using the standard Lucene query parser, would it be faster to specify each of these 10 fields in q (q=cat1:keyword OR cat2:keyword OR ...), or to copyField the contents of those 10 fields into one combined field? Or should I be slapped in the face for even thinking about performance in this scenario?
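The two alternatives can be put side by side in a small Python sketch. The field names `cat1` to `cat10` and the combined field `all_tags` are hypothetical, and the code only builds query strings and schema.xml copyField lines; it measures nothing. With copyField, the 10-clause Boolean query becomes one clause against one field, at the cost of a larger index and of losing per-field boosts. Which is faster in practice is best checked against the real index.

```python
# Hypothetical field names: ten multivalued tag fields cat1..cat10 and a combined field.
TAG_FIELDS = [f"cat{i}" for i in range(1, 11)]
COMBINED_FIELD = "all_tags"


def multi_field_query(fields, keyword):
    """One clause per field: cat1:keyword OR cat2:keyword OR ..."""
    return " OR ".join(f"{field}:{keyword}" for field in fields)


def combined_field_query(keyword, combined_field=COMBINED_FIELD):
    """A single clause against the copyField destination."""
    return f"{combined_field}:{keyword}"


def copy_field_directives(fields, dest=COMBINED_FIELD):
    """schema.xml lines that copy every tag field into one destination field.

    The destination field must be declared multiValued, since several sources feed it.
    """
    return [f'<copyField source="{field}" dest="{dest}"/>' for field in fields]


if __name__ == "__main__":
    print(multi_field_query(TAG_FIELDS, "keyword"))
    print(combined_field_query("keyword"))
    print("\n".join(copy_field_directives(TAG_FIELDS)))
```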
Wildcard query vs facet.prefix for autocomplete?
I'm about to implement an autocomplete mechanism for my search box. I've read about some of the common approaches, but I have a question about wildcard queries vs facet.prefix. Say I want autocomplete for a title: 'Shadows of the Damned'. I want this to appear as a suggestion if I type 'sha', 'dam' or 'the'. I don't care that it won't appear if I type 'hadows'. While indexing, I'd use a whitespace tokenizer and a lowercase filter to store that title in the index. For 'dam' typed in the search box, I'm considering two approaches:
1) q=title:dam*
2) q=*:*&facet=on&facet.field=title&facet.prefix=dam
Is there any reason I should favour one over the other? Is speed a factor? The index has around 200,000 items.
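Both requests from the post can be written out as parameter sets in a short Python sketch. The helper names and the `rows` and `facet.limit` values are assumptions. The input is lowercased client-side to mirror the index-time lowercase filter, since depending on the Solr version wildcard terms may not pass through the analyzer; query-syntax characters are not escaped here.

```python
from urllib.parse import urlencode


def wildcard_params(user_input, field="title", rows=10):
    """Approach 1: q=title:dam*, a prefix query that returns matching documents."""
    # Lowercased to match the lowercase filter applied at index time.
    prefix = user_input.strip().lower()
    return {"q": f"{field}:{prefix}*", "fl": field, "rows": rows}


def facet_prefix_params(user_input, field="title", limit=10):
    """Approach 2: facet on the field, keeping only indexed terms that start with the prefix."""
    prefix = user_input.strip().lower()
    return {
        "q": "*:*",
        "rows": 0,  # only the facet counts are wanted, not documents
        "facet": "on",
        "facet.field": field,
        "facet.prefix": prefix,
        "facet.limit": limit,
    }


if __name__ == "__main__":
    print(urlencode(wildcard_params("Dam")))
    print(urlencode(facet_prefix_params("Dam")))
```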
Re: Wildcard query vs facet.prefix for autocomplete?
I'll consider using the other methods, but I'd like to know which would be faster among the two approaches mentioned in my opening post.
Re: Wildcard query vs facet.prefix for autocomplete?
Well silly me... you're right.

On Wed, Jul 18, 2012 at 6:44 PM, Erick Erickson [via Lucene] wrote:
> Well, option 2 won't do you any good, so speed doesn't really matter.
> Your response would have a facet count for "dam", all by itself, something
> like
>
> 2
> 1
>
> etc.
>
> which does not contain anything that lets you reconstruct the title
> for autosuggest.
>
> Best
> Erick
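Erick's point can be seen in a toy in-memory model of the two approaches, written in Python purely for illustration. It only imitates a whitespace tokenizer plus lowercase filter, the titles are sample data, and it is not how Solr computes facets internally. The facet approach yields indexed tokens such as "damned" with counts, while the prefix query yields whole titles that can be shown as suggestions.

```python
from collections import Counter

# Sample titles for illustration only.
TITLES = ["Shadows of the Damned", "Dam Busters", "The Shadow Line"]


def analyze(text):
    """Mimic a whitespace tokenizer followed by a lowercase filter."""
    return text.lower().split()


def prefix_query(titles, prefix):
    """Like q=title:dam*: return whole documents containing a token with the prefix."""
    p = prefix.lower()
    return [t for t in titles if any(tok.startswith(p) for tok in analyze(t))]


def facet_prefix(titles, prefix):
    """Like facet.field=title&facet.prefix=dam: count documents per matching token."""
    p = prefix.lower()
    counts = Counter()
    for t in titles:
        for tok in set(analyze(t)):  # each document counts once per distinct term
            if tok.startswith(p):
                counts[tok] += 1
    return dict(counts)


if __name__ == "__main__":
    print(prefix_query(TITLES, "dam"))  # full titles, usable as suggestions
    print(facet_prefix(TITLES, "dam"))  # bare tokens with counts, no titles
```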
Re: Wildcard query vs facet.prefix for autocomplete?
Very interesting! Thanks for sharing, I'll ponder on it.
Designing an index with multiple entity types, sharing field names across entity-types.
My question stems from a vague memory of reading somewhere that Solr's search performance depends on the total number of terms in the field being searched.

I'm setting up an index core for some autocomplete boxes on my site. There is a search box for each facet group on my results page (suggestions for a single entity type), and a 'generic' search box in my header that will display suggestions for multiple entity types.

The entity types are Books, Authors, Categories and Publishers.
Books, Authors --> over 100,000 of each type right now. Will grow larger.
Categories, Publishers --> around 500 of each type. Will grow slowly.
Books and Categories have 'descriptions', which I also want to be searchable, with lower boosts.

In my per-entity search boxes, for autocomplete suggestions for the user input "man", I'd do:
q=(name:man* OR description:man*^0.5)&fq=type:
For the generic search box at the top of my page, I would not use fq; instead I'd use &group=true&group.field=type. (type --> {'book', 'author', 'category', 'publisher'})

This seems okay, but I'm wondering about what I said in my first paragraph: the total number of terms in a field. For a large index, would it be better to use more specific fields? E.g. instead of a common field 'name', what if I use 'author_name', 'book_name', 'publisher_name', 'category_name', 'book_description' and 'category_description'? Would this be 'faster' to search on? For my per-entity search boxes, the query changes in an obvious manner. But this would complicate things for my generic search box query, for which I haven't yet decided how to design the query.

What say thee?
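The two request shapes described above can be sketched in Python. The query string, the `type` field and its values come from the post; the helper names, `rows` and the `group.limit` value are hypothetical. An fq such as `type:category` is cached in the filterCache independently of q, so it is reused as the user keeps typing. Filtering applies to matching documents, though, so the prefix in q is still expanded against the terms of the shared `name` field; whether that matters at this scale is something to measure rather than assume.

```python
ENTITY_TYPES = ("book", "author", "category", "publisher")


def suggest_query(user_input):
    """q from the post: name prefix match, description prefix match at half boost."""
    term = user_input.strip().lower()
    return f"(name:{term}* OR description:{term}*^0.5)"


def per_entity_params(user_input, entity_type, rows=10):
    """Per-entity box: restrict by type with fq, which is cached separately from q."""
    if entity_type not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity_type}")
    return {"q": suggest_query(user_input), "fq": f"type:{entity_type}", "rows": rows}


def generic_params(user_input, per_type=5):
    """Generic header box: no fq; group results by type instead."""
    return {
        "q": suggest_query(user_input),
        "group": "true",
        "group.field": "type",
        "group.limit": per_type,  # max documents returned per entity type
    }


if __name__ == "__main__":
    print(per_entity_params("man", "category"))
    print(generic_params("man"))
```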
Re: Designing an index with multiple entity types, sharing field names across entity-types.
To clarify a wee bit more: I'm wondering about the performance impact on single-entity queries if I use common field names, e.g. a 'name' field for all entity types. 'Author' and 'Book' together make up 200,000+ 'name' values. Will this affect anything if I search over 'Category'? Will using fq=type:category save me?
Re: Designing an index with multiple entity types, sharing field names across entity-types.
*civilized bump*