Hello Troy,

What a challenge!!

On Thu, Oct 1, 2015 at 3:42 PM, Troy Edwards <tedwards415...@gmail.com>
wrote:

>
> 2) It appears that I cannot have fromIndex=Contracts because it is very
> large and has to be sharded. Per my understanding SolrCloud join does not
> support multiple shards
>

...but that doesn't mean it never will.

Assume the Item collection is fully replicated (shards=1; we can afford to
replicate 2M records) and ContractItem is sharded.
Then imagine we relax the constraint and run the join in the direction
opposite to the supported one:
../item/select?q={!join ...
fromIndex=ContractItem}Active:true&facet.field=SellerCode...
This would have to be executed against each shard as:
shard_1/item/select?q={!join ...
fromIndex=ContractItem_shard1}Active:true&facet.field=SellerCode...&fq=Description:colgate
shard_2/item/select?q={!join ...
fromIndex=ContractItem_shard2}Active:true&facet.field=SellerCode...&fq=Description:colgate
shard_3/item/select?q={!join ...
fromIndex=ContractItem_shard3}Active:true&facet.field=SellerCode...&fq=Description:colgate


At least this can be evaluated manually. If it works, it will give you only
the set of SellerCode values; the facet counts will be wrong, because items
are duplicated across shards. However, the counts can be refined with pivot
facets or stats (countDistinct etc.).
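
To make the manual evaluation concrete, here is a minimal Python sketch of the per-shard queries above and of merging their SellerCode facets into one set. It is only an illustration under several assumptions: the function names are hypothetical; the join fields are elided in the queries above, so they are passed in as a parameter (the usage example borrows from=ItemId to=ItemId from earlier in the thread); facet=true and wt=json are added by me; the shard_N/item/select paths are the same placeholders as above; and the response layout assumed is Solr's default flat JSON list of value, count pairs.

```python
from urllib.parse import urlencode


def shard_query_url(shard_no, join_fields, user_fq, facet_field="SellerCode"):
    """Build the select URL for one ContractItem shard (placeholder host path)."""
    params = [
        # join_fields is an assumption supplied by the caller, e.g. "from=ItemId to=ItemId"
        ("q", "{!join %s fromIndex=ContractItem_shard%d}Active:true"
              % (join_fields, shard_no)),
        ("fq", user_fq),
        ("facet", "true"),        # assumption: not shown in the queries above
        ("facet.field", facet_field),
        ("wt", "json"),           # assumption: JSON response format
    ]
    return "shard_%d/item/select?%s" % (shard_no, urlencode(params))


def merge_seller_codes(responses, facet_field="SellerCode"):
    """Union the facet values that have a non-zero count in any shard response.

    The counts themselves are dropped on purpose: summing them would count
    the same Item once per shard that matched it.
    """
    codes = set()
    for resp in responses:
        flat = resp["facet_counts"]["facet_fields"][facet_field]
        # Default Solr JSON layout: [value, count, value, count, ...]
        for value, count in zip(flat[0::2], flat[1::2]):
            if count > 0:
                codes.add(value)
    return codes


# Usage: one URL per shard, then merge the parsed JSON responses.
urls = [shard_query_url(n, "from=ItemId to=ItemId", "Description:colgate")
        for n in (1, 2, 3)]
```

Only the set of values is kept here; getting correct counts would still need the pivot or countDistinct refinement mentioned above.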

From another perspective, what if you embed Item into ContractItem? What
exactly is wrong with such denormalization?

> 4) The Item index contains approximately 2 million items. For ContractItem
> there are about 10000 clients with about 1.5 million records for each
> client. So the total ContractItem records are close to 15 billion.
>
> Several updates are made to Item during the day. Sometimes clients will
> make large changes to ContractItem.
>
> Any thoughts/suggestions?
>
> On Thu, Oct 1, 2015 at 6:09 AM, Mikhail Khludnev <
> mkhlud...@griddynamics.com
> > wrote:
>
> > 1. I'd say it's a challenge.
> > 2. Can't you do the opposite: filter active contracts, join them back to
> > items, and then facet?
> > q=(Description:colgate OR Categories:colgate OR
> > Sellers:colgate)&fq={!join from=ItemId to=ItemId
> > fromIndex=Contracts}Active:true&facet.field=SellersString
> > 3. Note: there is the {!terms} QParser (which makes leg-shooting easier).
> > 4. How many documents do you operate on? What is the update frequency? Is
> > there a chance to keep both types in a single index?
> >
> > On Thu, Oct 1, 2015 at 5:58 AM, Troy Edwards <tedwards415...@gmail.com>
> > wrote:
> >
> > > I am working with the following indices
> > >
> > > *Item*
> > >
> > > ItemId - string
> > > Description - text (query on this)
> > > Categories - Multivalued text (query on this)
> > > Sellers - Multivalued text (query on this)
> > > SellersString - Multivalued string (Need to facet and filter on this)
> > >
> > > *ContractItem*
> > >
> > > ContractItemId - string
> > > ItemId - string
> > > ContractCode - string (facet and filter on this)
> > > Priority -  integer (order by priority descending)
> > > Active - boolean (filter on this)
> > >
> > > Say someone is searching for colgate
> > >
> > > I am doing two queries:
> > >
> > > First query: q = {!join from=ItemId to=ItemId
> > > fromIndex=Item}(Description:colgate OR Categories:colgate OR
> > > Sellers:colgate)&facet.field=ContractCode
> > >
> > > From the first query I get all the ItemIds and do a second query on the
> > > Item index using q=ItemId:(Id1 Id2 Id3) and generate a facet on
> > > SellersString
> > >
> > > I have to do some custom coding to retain Priority (so that I can sort
> > > on it)
> > >
> > > Following are the issues I am running into:
> > >
> > > 1) Since there are a lot of Items and ContractItems, the number of Ids
> > > becomes large and I had to increase maxBooleanClauses (possible
> > > performance degradation?)
> > >
> > > 2) Since I have to return a lot of items from the first query, the data
> > > size becomes very large (again a performance concern)
> > >
> > > 3) When a filter is applied on the second query, I have to adjust the
> > > facet results of the first query
> > >
> > > 4) Overall this seems complex
> > >
> > > Is it possible to do just one query and apply filters (if any) and get
> > > results along with facets?
> > >
> > > Any suggestions on simplifying this and improving performance?
> > >
> > > Thanks in advance
> > >
> >
> >
> >
> > --
> > Sincerely yours
> > Mikhail Khludnev
> > Principal Engineer,
> > Grid Dynamics
> >
> > <http://www.griddynamics.com>
> > <mkhlud...@griddynamics.com>
> >
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


<mkhlud...@griddynamics.com>
