bq: If we use groupby(supplier), the 80 products per page is not possible anymore.
The worst case is that you specify 80 groups and a maximum of 80 docs per group. Admittedly, this could return as many as 6,400 docs, but it covers this case. The question is whether it's fast enough.

There are other ways to go about this. Create a cheaper query that covers the most common use case and refine, if necessary, with a second query. Say you specify that you want 80 groups returned and the top 5 docs in each group as your default (or 2 docs or....). If you get 80 groups back, you're done: just pick the top one from each. If you get fewer than 80 groups back but 80 or more _documents_, you're also done: you know that you have all the groups there are and you have enough products to fill the page. Whatever numbers you pick, as long as you specify 80 groups and get at least 80 docs back in total, you can meet your requirements with a single query. Experiment to see what numbers cover your common queries, of course.

If you don't get 80 docs back, use that information to make a second query that gives you enough docs to display your 80. This case means you have < 80 groups and the sum of the docs returned is also < 80. Check the total number of hits to see whether there _are_ at least 80 docs before deciding to make a second query. If so, go ahead and send off a second query that's guaranteed to get 80 docs or more. True, that'll be a performance hit for these kinds of queries. Measure. Is that fast enough? It'll be a lot less work than custom code.

Also, I'd push back on the requirement. You've said "we need 80 products displayed". If there are < 80 suppliers, why? Let's say that there are only 10 suppliers for a particular query and you adopt the suggestion above and ask for 2 docs each. How is the user served better by going back to Solr and getting a bunch more docs from each of these suppliers? Wasn't the problem that you didn't want to display a bunch of docs from the same supplier in the first place? So would just displaying 2 docs from each of the 10 suppliers suffice?
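To make the two-query approach above concrete, here is a rough Python sketch of the decision logic. It makes no network calls. The field name "supplier", the helper names, and the round-robin fill in pick_page are assumptions for illustration only; the request parameters (group, group.field, group.limit, rows) and the default grouped JSON response layout are standard Solr result grouping, but check them against your version.

```python
PAGE_SIZE = 80  # products per page, taken from the requirement in this thread


def grouped_params(keyword, docs_per_group):
    """Request parameters for a grouped query (field name 'supplier' is assumed)."""
    return {
        "q": keyword,
        "group": "true",
        "group.field": "supplier",
        "group.limit": docs_per_group,  # docs returned per group
        "rows": PAGE_SIZE,              # with grouping on, rows counts groups
        "wt": "json",
    }


def extract_groups(response, field="supplier"):
    """Return (list of per-supplier doc lists, total matching docs)."""
    grouped = response["grouped"][field]
    groups = [g["doclist"]["docs"] for g in grouped["groups"]]
    return groups, grouped["matches"]


def needs_second_query(groups, total_hits):
    """True only when the first response cannot fill the page but the index can."""
    if len(groups) >= PAGE_SIZE:
        return False  # one doc per supplier already fills the page
    if sum(len(g) for g in groups) >= PAGE_SIZE:
        return False  # fewer suppliers, but enough docs came back
    return total_hits >= PAGE_SIZE  # more docs exist, so a deeper query helps


def pick_page(groups):
    """Hypothetical fill strategy: take each supplier's top doc, then each
    supplier's second doc, and so on, until the page is full."""
    page, depth = [], 0
    while len(page) < PAGE_SIZE and any(depth < len(g) for g in groups):
        for g in groups:
            if depth < len(g) and len(page) < PAGE_SIZE:
                page.append(g[depth])
        depth += 1
    return page
```

A caller would send grouped_params(keyword, 5) first and, only if needs_second_query says so, resend with grouped_params(keyword, PAGE_SIZE). With fewer than 80 groups and at least 80 total hits, a group.limit of 80 always returns 80 docs or more, which is the "guaranteed" second query mentioned above.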
However, if you simply _must_ satisfy these requirements and grouping isn't performant enough, I think you want a custom collector. You'll extend RankQuery, I'm pretty sure. You really don't want to _score_ documents; you want to keep lists of documents by supplier. My intuition is that you'll find this to be quite a bit of work to get right, and that it will probably perform about the same as grouping. Do yourself a favor: write out the algorithm you'll use and ask "how is this different from grouping?". Then consider what happens if you shard your collection. Then ask management whether they think your time would be better spent doing something other than re-implementing grouping ;)

Best,
Erick

On Thu, Dec 22, 2016 at 9:46 PM, Daisy <daisy...@globalsources.com> wrote:
> Thanks a lot for the response.
> What I would like to do is something similar to groupby function. But because
> of some performance issue and business requirement limitation, I rather don't
> want to use groupby.
>
> Our business requirement is to avoid one supplier is dominating the search
> result page. 50 out of 80 products are belong to one supplier.
> I understand that this issue can be solved by groupby function.
>
> 1. I did check the query time and groupby takes a little bit longer time
> compare to without groupby query.
> 2. Another business limitation is: Our page needs 80 products to display from
> different suppliers per keyword search. We have some of the search keywords
> which only have less than 80 suppliers. If we use groupby(supplier), the 80
> products per page is not possible anymore.
>
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, December 23, 2016 12:01 PM
> To: solr-user
> Subject: Re: Customizing the search result
>
> My very, very, very first question is "why do you think you have to develop
> your own customized re-ranking?". How have you determined that your needs
> aren't satisfied out-of-the-box?
> What I'm going for here is wondering if this
> is an XY problem. You're asking how to do X because you think that will
> accomplish Y, without stating what the task (Y) is. It'll save you a LOT of
> work if you don't have to create (and maintain) your own.
>
> That said, maybe you _do_ have to extend BaseSimilarity. But there's a lot
> built in to Solr so before going there let's see if there's an easier
> solution.
>
> For instance, there's the ReRankingQParserPlugin that takes the output from
> the main clause and pushes it through a completely independent Solr query
> that at least sounds similar to what you want to do. There is boosting,
> altering the score by function queries, etc. etc, etc.....
>
> For <2> what you probably want is a search component, which is pluggable.
> These are chained together in your request handler and you can add a
> <last-components> entry and get the packet to be returned just before it's
> sent. It will contain all the data to be returned, the docs (rows worth), the
> facets, groups, all that stuff.
>
> But again, why do you want to do this? There are also DocTransformers that
> can be used to munge the individual documents coming back that you can
> configure rather than code fresh. They may not actually do what you need but
> before writing your own let's see if maybe there's an easier way to do what
> you want than extending org.apache.solr.response.transform.DocTransformer and
> creating a plugin..
>
> Best,
> Erick
>
> On Thu, Dec 22, 2016 at 6:58 PM, Daisy <daisy...@globalsources.com> wrote:
>> I’m really new to SOLR and excuse me if my question is vague.
>>
>> I found some of the search related things in solr-core →
>> org.apache.solr.search package. I’m not sure this is the right package to
>> look into.
>>
>> 1. I would like to know if we are going to develop our own customized
>> re-ranking, where and how can we add the new codes?
>>
>> 2. Which class is the final step before returning the result from
>> Solr?
>> For e.g. “<result maxScore="9.452013" name="response" numFound="17343"
>> start="0">”
>>
>> Thank you.
>>
>>
>> ----------------------
>> CONFIDENTIALITY NOTICE
>>
>> This e-mail (including any attachments) may contain confidential and/or
>> privileged information. If you are not the intended recipient or have
>> received this e-mail in error, please inform the sender immediately and
>> delete this e-mail (including any attachments) from your computer, and you
>> must not use, disclose to anyone else or copy this e-mail (including any
>> attachments), whether in whole or in part.
>>
>> This e-mail and any reply to it may be monitored for security, legal,
>> regulatory compliance and/or other appropriate reasons.