Yes, these documents have lots of unique values, since the same product can
be assigned to many categories, each time with a different sort order.

We did some evaluation of heap usage and found that, with the kind of queries
we generate, heap usage was going up to 24-26 GB. I could trace it to the fact
that FieldCache is creating an array of 2M size for each of the sort fields.

Since the same products are mapped to multiple categories, we incur significant
memory overhead. Therefore, any solution that reduces memory consumption
is a good one for me.
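
For a rough sense of scale, here is a back-of-the-envelope sketch of that
overhead. It assumes each sort field is cached as one 4-byte int per document
(an assumption on my part; string fields would likely cost more), and the
count of 500 sort fields is purely hypothetical, not a number measured from
our index.

```java
public class SortFieldHeapEstimate {

    // Bytes for one int-per-document array, ignoring the array header.
    static long bytesPerIntField(long maxDoc) {
        return maxDoc * Integer.BYTES;
    }

    // Total bytes if every sort field gets its own array of that size.
    static long totalBytes(long maxDoc, int sortFieldCount) {
        return bytesPerIntField(maxDoc) * sortFieldCount;
    }

    public static void main(String[] args) {
        long maxDoc = 2_000_000L;   // index size mentioned in this thread
        int sortFields = 500;       // hypothetical count of distinct sort fields
        System.out.printf("per field: %,d bytes%n", bytesPerIntField(maxDoc));
        System.out.printf("all fields: %,d bytes%n", totalBytes(maxDoc, sortFields));
    }
}
```

The point is only that the cost grows with the number of distinct sort fields,
not with the number of documents alone.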

In fact, we have situations where the same product is mapped to more than one
sub-category in the same category, like:


Books
 -- Programming
      - Java in a nutshell
 -- Sale (40% off)
      - Java in a nutshell


So, another thought in my mind is to somehow use a second-pass collector to
group books appropriately in the Programming and Sale categories, with the
right sort order.

But I have no clue about that piece :(
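
Just to pin down the ordering I am after, here is a minimal sketch in plain
Java. It is not a Lucene collector and uses no Solr API; the Assignment class,
its field names, and the sample products other than the book above are all
made up for illustration. It buckets hits by sub-category and, within each
bucket, puts regular-price items first (by sort order) and markdown items last.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MarkdownLastGrouping {

    /** One (product, sub-category) assignment; all names here are hypothetical. */
    static final class Assignment {
        final String subCategory;
        final String product;
        final boolean markdown;
        final int sortOrder;

        Assignment(String subCategory, String product, boolean markdown, int sortOrder) {
            this.subCategory = subCategory;
            this.product = product;
            this.markdown = markdown;
            this.sortOrder = sortOrder;
        }
    }

    /** Regular-price items first, then markdown items; ties broken by sortOrder. */
    static final Comparator<Assignment> MARKDOWN_LAST =
            Comparator.comparing((Assignment a) -> a.markdown) // false sorts before true
                      .thenComparingInt(a -> a.sortOrder);

    /** Buckets hits by sub-category (in first-seen order) and sorts each bucket. */
    static Map<String, List<Assignment>> group(List<Assignment> hits) {
        Map<String, List<Assignment>> groups = new LinkedHashMap<>();
        for (Assignment a : hits) {
            groups.computeIfAbsent(a.subCategory, k -> new ArrayList<>()).add(a);
        }
        for (List<Assignment> bucket : groups.values()) {
            bucket.sort(MARKDOWN_LAST);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Sample data: the same book appears in two sub-categories.
        List<Assignment> hits = Arrays.asList(
                new Assignment("Programming", "Product A", true, 1),
                new Assignment("Programming", "Java in a nutshell", false, 2),
                new Assignment("Sale (40% off)", "Java in a nutshell", true, 1),
                new Assignment("Sale (40% off)", "Product B", false, 2));

        for (Map.Entry<String, List<Assignment>> e : group(hits).entrySet()) {
            System.out.println(e.getKey());
            for (Assignment a : e.getValue()) {
                System.out.println("  " + a.product + (a.markdown ? " (markdown)" : ""));
            }
        }
    }
}
```

What I do not know is how to get the same effect inside Solr without paying
for a separate cached sort field per sub-category.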

-Saroj


On Sun, Jun 10, 2012 at 4:30 PM, Erick Erickson <erickerick...@gmail.com> wrote:

> 2M docs is actually pretty small. Sorting is sensitive to the number
> of _unique_ values in the sort fields, not necessarily the number of
> documents.
>
> And sorting only works on fields with a single value (i.e. it can't have
> more than one token after analysis). So for each field you're only talking
> 2M values at the very maximum, assuming that the field in question has
> a unique value per document, which I doubt very much given your
> problem description.
>
> So with a corpus that size, I'd "just try it".
>
> Best
> Erick
>
> On Sun, Jun 10, 2012 at 7:12 PM, roz dev <rozde...@gmail.com> wrote:
> > Thanks Erick for your quick feedback
> >
> > When Products are assigned to a category or Sub-Category then they can be
> > in any order and price type can be regular or markdown.
> > So, reg and markdown products are intermingled as per their assignment but
> > I want to sort them in such a way that we
> > ensure that all the products which are on markdown are at the bottom of the
> > list.
> >
> > I can use these multiple sorts but I realize that they are costly in terms
> > of heap used, as they are using FieldCache.
> >
> > I have an index with 2M docs and docs are pretty big. So, I don't want to
> > use them unless there is no other option.
> >
> > I am wondering if I can define a custom function query which can be like
> > this:
> >
> >
> >   - check if product is on the markdown
> >   - if yes then change its sort order field to be the max value in the
> >   given sub-category, say 999999
> >   - else, use the sort order of the product in the sub-category
> >
> > I have been looking at existing function queries but do not have a good
> > handle on how to make one of my own.
> >
> > - Another option could be to use a custom sort comparator but I am not sure
> > about the way it works
> >
> > Any thoughts?
> >
> >
> > -Saroj
> >
> >
> >
> >
> > On Sun, Jun 10, 2012 at 5:02 AM, Erick Erickson <erickerick...@gmail.com> wrote:
> >
> >> Skimming this, two options come to mind:
> >>
> >> 1> Simply apply primary, secondary, etc sorts. Something like
> >>   &sort=subcategory asc,markdown_or_regular desc,sort_order asc
> >>
> >> 2> You could also use grouping to arrange things in groups and sort within
> >>      those groups. This has the advantage of returning some members
> >>      of each of the top N groups in the result set, which makes it easier to
> >>      get some of each group rather than having to analyze the whole list....
> >>
> >> But your example is somewhat contradictory. You say
> >> "products which are on markdown, are at
> >> the bottom of the documents list"
> >>
> >> But in your examples, products on "markdown" are intermingled....
> >>
> >> Best
> >> Erick
> >>
> >> On Sun, Jun 10, 2012 at 3:36 AM, roz dev <rozde...@gmail.com> wrote:
> >> > Hi All
> >> >
> >> >>
> >> >> I have an index which contains a Catalog of Products and Categories, with
> >> >> Solr 4.0 from trunk
> >> >>
> >> >> Data is organized like this:
> >> >>
> >> >> Category: Books
> >> >>
> >> >> Sub Category: Programming
> >> >>
> >> >> Products:
> >> >>
> >> >> Product # 1,  Price: Regular Sort Order:1
> >> >> Product # 2,  Price: Markdown, Sort Order:2
> >> >> Product # 3   Price: Regular, Sort Order:3
> >> >> Product # 4   Price: Regular, Sort Order:4
> >> >> ....
> >> >> .....
> >> >> ...
> >> >> Product # 100   Price: Regular, Sort Order:100
> >> >>
> >> >> Sub Category: Fiction
> >> >>
> >> >> Products:
> >> >>
> >> >> Product # 1,  Price: Markdown, Sort Order:1
> >> >> Product # 2,  Price: Regular, Sort Order:2
> >> >> Product # 3   Price: Regular, Sort Order:3
> >> >> Product # 4   Price: Markdown, Sort Order:4
> >> >> ....
> >> >> .....
> >> >> ...
> >> >> Product # 70   Price: Regular, Sort Order:70
> >> >>
> >> >>
> >> >> I want to query Solr and sort these products within each
> >> >> sub-category in such a way that products which are on markdown, are at
> >> >> the bottom of the documents list and other products
> >> >> which are on regular price, are sorted as per their sort order in their
> >> >> sub-category.
> >> >>
> >> >> Expected Results are
> >> >>
> >> >> Category: Books
> >> >>
> >> >> Sub Category: Programming
> >> >>
> >> >> Products:
> >> >>
> >> >> Product # 1,  Price: Regular Sort Order:1
> >> >> Product # 2,  Price: Markdown, Sort Order:101
> >> >> Product # 3   Price: Regular, Sort Order:3
> >> >> Product # 4   Price: Regular, Sort Order:4
> >> >> ....
> >> >> .....
> >> >> ...
> >> >> Product # 100   Price: Regular, Sort Order:100
> >> >>
> >> >> Sub Category: Fiction
> >> >>
> >> >> Products:
> >> >>
> >> >> Product # 1,  Price: Markdown, Sort Order:71
> >> >> Product # 2,  Price: Regular, Sort Order:2
> >> >> Product # 3   Price: Regular, Sort Order:3
> >> >> Product # 4   Price: Markdown, Sort Order:71
> >> >> ....
> >> >> .....
> >> >> ...
> >> >> Product # 70   Price: Regular, Sort Order:70
> >> >>
> >> >>
> >> >> My query is like this:
> >> >>
> >> >> q=*:*&fq=category:Books
> >> >>
> >> >> What are the options to implement custom sorting and how do I do it?
> >> >>
> >> >>
> >> >>    - Define a Custom Function query?
> >> >>    - Define a Custom Comparator? Or,
> >> >>    - Define a Custom Collector?
> >> >>
> >> >>
> >> >> Please let me know the best way to go about it and any pointers to
> >> >> customize Solr 4.
> >> >>
> >> >
> >> > Thanks
> >> > Saroj
> >>
>
