Hello all,
I am currently using SolrCloud and I have the following requirement:
- My queries focus on counting how many users meet certain criteria,
so my main document is "user" (the parent table).
- Each user can access several web pages (a child table), and each web
page may have several attributes.
- I need to look up users who accessed at least one page that matches
a whole set of attributes. For example, consider these two scenarios:
1. If a user accessed a web page WP1 whose URL starts with "www." and
whose title includes "solr", then the user is a match.
2. However, if the user accessed a web page WP1 with such a URL and
ANOTHER page WP2 whose title includes "solr", this is not a match.
If I were modeling this in a relational DB, user would be one table and
URL (web page) would be another. Since I am using Solr, however, my first
option would be to denormalize. Simply storing all the fields in the user
document wouldn't work, because it would produce a false match in
scenario 2: the URL condition and the title condition could be satisfied
by two different pages.
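A minimal sketch in plain Python (no Solr involved; the data and field names are hypothetical) of why flattening breaks scenario 2: once every page's URL and title are copied into multi-valued fields on the user document, each condition is checked against the whole field independently, so two different pages can satisfy them.

```python
# Hypothetical data: one user who accessed two pages (scenario 2).
pages = [
    {"url": "www.example.com", "title": "Cooking tips"},    # WP1: only the URL matches
    {"url": "blog.example.com", "title": "Learning solr"},  # WP2: only the title matches
]

def url_ok(url: str) -> bool:
    return url.startswith("www.")

def title_ok(title: str) -> bool:
    return "solr" in title

# Denormalized user document: page fields flattened into multi-valued fields.
user_doc = {
    "id": "u1",
    "url": [p["url"] for p in pages],
    "title": [p["title"] for p in pages],
}

def flat_match(doc: dict) -> bool:
    # Each condition is evaluated against the whole field, independently.
    return any(url_ok(u) for u in doc["url"]) and any(title_ok(t) for t in doc["title"])

def per_page_match(page_list: list) -> bool:
    # Both conditions must hold for the same page.
    return any(url_ok(p["url"]) and title_ok(p["title"]) for p in page_list)

print(flat_match(user_doc))   # True  (false positive)
print(per_page_match(pages))  # False (correct for scenario 2)
```

The per-page check is the behavior I actually need; the question is how to get it from Solr.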
I have thought of two solutions for this:
- Using the idea of an inverted index: having several kinds of
documents (user, web page, entity 3, entity 4, etc.), where each child
entity (a web page, for instance) has a field holding the user id. Then,
using a join in Solr to get the users that match both on the user
document (the parent table) and on each child entity (in other words, to
merge the results of several queries that each return user ids). The
drawback is having to use a join.
- Having only a user document and storing each web page as a single
field value (e.g. a JSON string). To search, that single value would have
to match a regular expression that encodes both conditions. This would
make my searches slower, and I could not apply the same technique if the
child tables also had children of their own.
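For comparison, both alternatives can be simulated in memory. This is only an illustration of the matching semantics, not Solr code. The join approach is modeled as "find the matching page documents, collect their user_id values, keep the user documents with those ids", which is roughly what a Solr join with from=user_id to=id does. The single-field approach uses a hypothetical `url=...|title=...` serialization of each page and one regular expression that must match a whole value (`re.fullmatch` is used because Lucene regexp queries also have to match the entire term). All field names and the serialization format are assumptions.

```python
import re

# Approach 1: separate documents; each page carries a user_id field (hypothetical names).
users = [{"id": "u1", "type": "user"}, {"id": "u2", "type": "user"}]
page_docs = [
    {"type": "page", "user_id": "u1", "url": "www.example.com", "title": "Cooking tips"},
    {"type": "page", "user_id": "u1", "url": "blog.example.com", "title": "Learning solr"},
    {"type": "page", "user_id": "u2", "url": "www.solr-guide.org", "title": "solr basics"},
]

def page_pred(page: dict) -> bool:
    # Both conditions are evaluated on the same page document.
    return page["url"].startswith("www.") and "solr" in page["title"]

def join_users(user_docs: list, child_docs: list, child_pred) -> list:
    # Collect the user ids referenced by matching children ("from" side),
    # then keep the parent docs whose id is in that set ("to" side).
    matching_ids = {c["user_id"] for c in child_docs if child_pred(c)}
    return [u["id"] for u in user_docs if u["id"] in matching_ids]

# Approach 2: one user doc; each page serialized into one value of a multi-valued field.
user_docs_flat = [
    {"id": "u1", "pages": ["url=www.example.com|title=Cooking tips",
                           "url=blog.example.com|title=Learning solr"]},
    {"id": "u2", "pages": ["url=www.solr-guide.org|title=solr basics"]},
]

# Both conditions are encoded in one pattern that must match an entire value.
PAGE_RE = re.compile(r"url=www\.[^|]*\|title=[^|]*solr[^|]*")

def regex_users(docs: list) -> list:
    return [d["id"] for d in docs if any(PAGE_RE.fullmatch(v) for v in d["pages"])]

print(join_users(users, page_docs, page_pred))  # ['u2']
print(regex_users(user_docs_flat))              # ['u2']
```

Both approaches correctly exclude u1, whose URL condition and title condition are met by different pages (scenario 2). The regex version also shows the nesting problem: a grandchild entity would have to be squeezed into the same serialized string.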
Am I missing an obvious solution here? I would love to receive criticism
on this, as I am probably not the only one who has this problem. I
would also welcome more ideas on how to denormalize data in this case. Is
a join my best option here?
Best regards,
--
Marcelo Elias Del Valle
http://mvalle.com - @mvallebr