Re: Using join vs flattening structure

Erick Erickson Mon, 31 Aug 2015 13:19:41 -0700

Mostly, just do the most naive data flattening you can and see
how big the index is. You really have to generate the index and then
run representative queries against it.
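
As a minimal sketch of what "the most naive flattening" could look like:
each item/customer pair becomes one standalone document that copies the
shared item attributes. The field names (item_id, customer_id, color,
price) and the flatten() helper are made up for illustration and are not
from any real schema.

```python
import json

def flatten(items, overrides):
    """Yield one denormalized document per (item, customer) pair.

    items:     {item_id: {shared attribute: value}}
    overrides: {(item_id, customer_id): {customer-specific attribute: value}}
    All field names here are hypothetical.
    """
    for (item_id, customer_id), custom in overrides.items():
        doc = dict(items[item_id])   # copy the shared item attributes
        doc.update(custom)           # add the per-customer attributes
        doc["id"] = f"{item_id}_{customer_id}"
        doc["item_id"] = item_id
        doc["customer_id"] = customer_id
        yield doc

items = {"i1": {"color": "red", "name": "widget"}}
overrides = {
    ("i1", "c1"): {"price": 9.5},
    ("i1", "c2"): {"price": 8.0},
}
docs = list(flatten(items, overrides))

# Size of the raw JSON only; the on-disk index size has to be measured
# after actually indexing a representative sample.
sample_bytes = len(json.dumps(docs).encode("utf-8"))
```

The raw JSON size is only a rough proxy. The useful measurement comes from
indexing a representative sample of such documents and checking the index
size and query times directly.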


But naively flattening the data in this case (15M items x 10K customers)
approaches 150B documents, which is a problem: you'd be sharding over
quite a few shards, etc.
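
The document count can be estimated up front from the numbers in the quoted
message below (about 15 million items, about 10000 customers). This is a
back-of-the-envelope sketch only; the coverage_pct parameter is an
assumption added here to model the case where only some item/customer pairs
carry customer-specific data.

```python
ITEMS = 15_000_000    # "about 15 million items"
CUSTOMERS = 10_000    # "about 10000 customers"

def flattened_doc_count(items, customers, coverage_pct=100):
    """Documents needed when each item/customer pair is its own document.

    coverage_pct is the assumed percentage of pairs that actually exist
    (100 = every customer has special data for every item).
    """
    return items * customers * coverage_pct // 100

worst_case = flattened_doc_count(ITEMS, CUSTOMERS)
one_percent = flattened_doc_count(ITEMS, CUSTOMERS, coverage_pct=1)
```

How many pairs really exist is a property of the data, which is one more
reason to pin down the requirements before choosing a structure.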

Before even going there, though, you need to pin your data model
down; the whole question of "what to flatten" is premature IMO.

For instance, how are you going to search this data? Do you
require searches like
"show me all the red things from customer X sort by price"?
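
To make that concrete, here is a rough sketch of how that one search might
be expressed against a flattened index versus a join-based layout. The field
names (color, customer_id, price, item_id) and both helper functions are
hypothetical; {!join from=... to=...} is Solr's join query parser. The
sketch assumes the join acts only as a filter on the item documents, so the
per-customer price is not available for sorting in that variant.

```python
from urllib.parse import urlencode

def flattened_query(color, customer_id):
    """Params for an index with one document per (item, customer) pair."""
    return {
        "q": f"color:{color}",
        "fq": f"customer_id:{customer_id}",
        "sort": "price asc",
    }

def join_query(color, customer_id):
    """Params for a relational layout: item docs plus separate pricing docs.

    The join only filters item documents; it does not carry the
    per-customer price over to them, so no sort on price is included.
    """
    return {
        "q": f"color:{color}",
        "fq": f"{{!join from=item_id to=id}}customer_id:{customer_id}",
    }

flat_url = "/select?" + urlencode(flattened_query("red", "X"))
join_url = "/select?" + urlencode(join_query("red", "X"))
```

If a search like this is a hard requirement, the "sort by price" part alone
may decide the data model.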

Really, start from the _requirements_ and create your data model from
there, _then_ start worrying about what needs to happen to make it fit,
rather than worrying about the index size/structure first.

Best,
Erick

On Mon, Aug 31, 2015 at 1:02 PM, Brian Narsi <bnars...@gmail.com> wrote:
> We have about 15 million items. Each item has 10 attributes that we are
> indexing at this time. We are planning on adding 15 more attributes in
> the future.
>
> We have about 10,000 customers. Each of the items mentioned above can have
> special pricing, etc. for each of the customers. There are 6 attributes of
> an item that are different for each customer.
>
> Erick - you have mentioned testing. What would be a good test scenario to
> decide between a flattened structure and a relational one?
>
> Best,
>
> On Mon, Aug 31, 2015 at 9:50 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> For 1-3, test and see. The problem I often see is that it is _assumed_ that
>> flattening the data will cost a lot in terms of index size and maintenance.
>> Test that assumption before going down the relational road.
>>
>> You haven't talked about how many documents you have, how much data
>> would have to be replicated in each if you denormalized etc., so there's
>> not much guidance we can give.
>>
>> I'll skip 4
>>
>> 5: probably another month or two, in Solr 5.4
>>
>> Best,
>> Erick
>>
>> On Sun, Aug 30, 2015 at 6:59 PM, Brian Narsi <bnars...@gmail.com> wrote:
>> > I have read a lot about using flattened structures in Solr (instead of
>> > relational). It looks like a flattened structure is preferable. But in
>> > our case we have to consider using a (sort of) relational structure to
>> > keep index maintenance cost low.
>> >
>> > Does anyone have deeper insight into this?
>> >
>> > 1) When should we definitely use relational type of structure and use
>> > join (instead of flattened structure)?
>> >
>> > 2) When should we definitely use flattened structure (instead of
>> > relational)?
>> >
>> > 3) What are the signs that one has made a wrong choice of flattened vs
>> > relational?
>> >
>> > 4) Any best practices when relational structure and join is used?
>> >
>> > 5) I understand that parallel SQL (in Solr) will have more relational
>> > functionality support? Any ETA on when parallel SQL will support
>> > joins?
>> >
>> > Thanks for your help!
>>
