Mostly, just do the most naive data flattening you can and see how big the index is. You really have to generate the index, then run representative queries against it.
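For a rough sense of scale before building anything, here is a back-of-envelope sketch in Python. The item and customer counts are the ones given further down the thread; `coverage` (the fraction of item/customer pairs that really carry customer-specific data) and the budget of 100 million documents per shard are hypothetical assumptions for illustration, not Solr limits or measured figures.

```python
import math

ITEMS = 15_000_000   # item count quoted in the thread
CUSTOMERS = 10_000   # customer count quoted in the thread


def flattened_docs(items, customers, coverage=1.0):
    """Document count if each (item, customer) pair becomes one document.

    coverage is an assumed fraction of pairs that actually have
    customer-specific attributes; 1.0 is the worst case.
    """
    return round(items * customers * coverage)


def shards_needed(docs, docs_per_shard):
    """Shards required for a given (hypothetical) per-shard document budget."""
    return math.ceil(docs / docs_per_shard)


worst_case = flattened_docs(ITEMS, CUSTOMERS)
print("worst-case flattened docs:", worst_case)
print("shards at 100M docs/shard:", shards_needed(worst_case, 100_000_000))
print("docs at 1% coverage:", flattened_docs(ITEMS, CUSTOMERS, coverage=0.01))
```

This only bounds the document count. Actual index size and query latency still have to be measured on a generated index with representative queries.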
But naively flattening the data in this case approaches 150B documents in the worst case (15 million items x 10,000 customers), which is a problem: you'd be sharding over quite a few shards, etc.

Before even going there, though, you need to pin your data model down; the whole question of "what to flatten" is premature IMO. For instance, how are you going to search this data? Do you require searches like "show me all the red things from customer X, sorted by price"? Really, start from the _requirements_ and create your data model from there, _then_ start worrying about what needs to happen to make it fit, rather than worrying about the index size/structure first.

Best,
Erick

On Mon, Aug 31, 2015 at 1:02 PM, Brian Narsi <bnars...@gmail.com> wrote:
> We have about 15 million items. Each item has 10 attributes that we are
> indexing at this time. We are planning on adding 15 more attributes in
> future.
>
> We have about 10000 customers. Each of the items mentioned above can have
> special pricing, etc for each of the customers. There are 6 attributes of
> item that are different for each customer.
>
> Erick - you have mentioned testing. What would be a good test scenario to
> determine using flattened structure or relational?
>
> Best,
>
> On Mon, Aug 31, 2015 at 9:50 AM, Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> For 1-3, test and see. The problem I often see is that it is _assumed_ that
>> flattening the data will cost a lot in terms of index size and maintenance.
>> Test that assumption before going down the relational road.
>>
>> You haven't talked about how many documents you have, how much data
>> would have to be replicated in each if you denormalized etc., so there's
>> not much guidance we can give.
>>
>> I'll skip 4
>>
>> 5 probably another month or two in Solr 5.4
>>
>> Best,
>> Erick
>>
>> On Sun, Aug 30, 2015 at 6:59 PM, Brian Narsi <bnars...@gmail.com> wrote:
>> > I have read a lot about using flattened structures in solr (instead of
>> > relational). Looks like it is preferable to use flattened structure.
>> > But in our case we have to consider using (sort of) relational
>> > structure to keep index maintenance cost low.
>> >
>> > Does anyone have deeper insight into this?
>> >
>> > 1) When should we definitely use relational type of structure and use
>> > join? (instead of flattened structure)
>> >
>> > 2) When should we definitely use flattened structure (instead of
>> > relational)?
>> >
>> > 3) What are the signs that one has made a wrong choice of flattened vs
>> > relational?
>> >
>> > 4) Any best practices when relational structure and join is used?
>> >
>> > 5) I understand that parallel sql (in solr) will have more relational
>> > functionality support? Any ETA on when the parallel sql will support
>> > joins?
>> >
>> > Thanks for your help!
>>