Hello,
I recently asked one question about Solr limitations and another about
joins. It turns out that this question is about both at the same time.
I am trying to figure out how to denormalize my data so that I need just 1
document in my index instead of performing a join. I figure one way of
doing this is to store an entity as a multivalued field, instead of storing
different fields.
Let me give an example. Consider the entities:
User:
id: 1
name: Joan of Arc
age: 27
Webpage:
id: 1
url: http://wiki.apache.org/solr/Join
category: Technical
user_id: 1
id: 2
url: http://stackoverflow.com
category: Technical
user_id: 1
Instead of creating 1 document for the user, 1 for webpage 1 and 1 for
webpage 2 (1 parent and 2 children), I could store the webpages in
multivalued fields on the user document, as follows:
User:
id: 1
name: Joan of Arc
age: 27
webpage1: ["id:1", "url: http://wiki.apache.org/solr/Join", "category:
Technical"]
webpage2: ["id:2", "url: http://stackoverflow.com", "category:
Technical"]
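To make the idea concrete, here is a minimal Python sketch of how I imagine
building that single document before sending it to the index. The function
name denormalize and the "key: value" string encoding are just my
assumptions for illustration; nothing here calls Solr itself.

```python
import json


def denormalize(user, webpages):
    """Fold a user's webpages into the user document.

    Each webpage becomes one multivalued field (webpage1, webpage2, ...)
    whose values are "key: value" strings. This encoding is an assumption
    made for the example, not something Solr requires.
    """
    doc = dict(user)
    for n, page in enumerate(webpages, start=1):
        # user_id is dropped because the parent document already implies it.
        doc[f"webpage{n}"] = [
            f"{key}: {value}" for key, value in page.items() if key != "user_id"
        ]
    return doc


user = {"id": 1, "name": "Joan of Arc", "age": 27}
webpages = [
    {"id": 1, "url": "http://wiki.apache.org/solr/Join",
     "category": "Technical", "user_id": 1},
    {"id": 2, "url": "http://stackoverflow.com",
     "category": "Technical", "user_id": 1},
]

doc = denormalize(user, webpages)
print(json.dumps(doc, indent=2))
```

Note that the number of fields grows with the number of webpages, which is
exactly what worries me below.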
It would probably perform better than the join, right? However, it made
me think about Solr limitations again. What if I have 200 million webpages
(200 million fields) per user? Or imagine a case where I could have 200
million values in a single field, as in the case where I need to index
every HTML DOM element (div, a, etc.) for each web page the user visited.
I mean, if I need to run the query and this is a business requirement no
matter what, then although denormalizing could be better than using query
time joins, I wonder whether distributing the data held in this single
document across the cluster wouldn't give me better performance. And this
is something I won't get with block joins or multivalued fields...
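As a rough illustration of what I mean by distribution, the sketch below
routes documents to shards by hashing their ids. The shard count, the id
naming and the shard_for function are hypothetical, and this is not Solr's
actual router; it only shows that one huge document lands on one shard,
while one document per webpage spreads over all of them.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(doc_id: str) -> int:
    """Illustrative hash routing only; not Solr's real routing logic."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % NUM_SHARDS


# Denormalized: a single user document, so all of its values sit on one shard.
single = Counter([shard_for("user-1")])

# One document per webpage: ids hash independently and spread over the shards.
spread = Counter(shard_for(f"webpage-{n}") for n in range(1, 10001))

print("denormalized:", dict(single))
print("one doc per webpage:", dict(spread))
```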
I guess there is probably no right answer to this question (at least
not a known one), and I know I should create a POC to check how each
option performs... But do you think such a large number of values in a
single document could make denormalization impossible in an extreme case
like this? Would you agree with me if I said denormalization is not always
the right option?
Best regards,
--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr