Re: joins in solr cloud - good or bad idea?

Roman Chyla Mon, 08 Jul 2013 13:58:37 -0700

Hello,

Joins are not the only option: you could write your own function
(ValueSource) that implements your logic. However, I don't think you should
throw away the regex idea as too slow before trying it out, because it can
be faster than joins. The catch is that the number of entities needs to be
limited; see Jack Krupansky's recent replies on the number of fields.
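
To make the regex idea concrete, here is a minimal sketch in plain Python
(not Solr). It assumes each visited page is serialized into a single string
of the form "url|title"; that layout and the names encode_page, PAGE_PATTERN
and user_matches are hypothetical, made up for illustration only.

```python
import re

# Hypothetical serialization: one string per visited page, "url|title".
def encode_page(url, title):
    return f"{url}|{title.lower()}"

# Both conditions must hold inside the SAME page string:
# the URL part starts with "www." and the title part contains "solr".
PAGE_PATTERN = re.compile(r"^www\.[^|]*\|.*solr")

def user_matches(pages):
    """True if at least one single page satisfies both conditions."""
    return any(PAGE_PATTERN.search(page) for page in pages)

# Scenario 1: one page satisfies both conditions.
user_a = [encode_page("www.example.org/a", "Intro to Solr")]

# Scenario 2: the conditions are split across two different pages.
user_b = [
    encode_page("www.example.org/a", "Cooking"),
    encode_page("blog.example.org", "Solr tips"),
]

match_a = user_matches(user_a)
match_b = user_matches(user_b)
```

Because the pattern is applied to one page string at a time, a user whose
conditions are spread over two pages is not reported as a match.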


There are different kinds of joins; I recommend this link to see their
differences: http://vimeo.com/44299232

If your data relations can fit in memory, a smart cache (i.e. an [un]inverted
index) will always outperform Lucene joins - look at the chart inside this:
http://code4lib.org/files/2ndOrderOperatorsv2.pdf
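
As a rough illustration of the cache idea, the sketch below keeps a
page -> user mapping in memory and resolves matching pages to users with a
plain lookup. It is ordinary Python, not a Solr or Lucene API; the sample
documents and all function names are assumptions made for the example.

```python
# Hypothetical child documents: one per visited page, each carrying its user id.
pages = [
    {"user_id": "u1", "url": "www.example.org/a", "title": "Intro to Solr"},
    {"user_id": "u2", "url": "www.example.org/a", "title": "Cooking"},
    {"user_id": "u2", "url": "blog.example.org", "title": "Solr tips"},
]

def build_page_to_user(pages):
    """Uninverted view: page position -> user id, built once and kept in memory."""
    return [page["user_id"] for page in pages]

def matching_users(pages, page_to_user, predicate):
    """Evaluate the predicate per page, then map each hit to its user."""
    return {page_to_user[i] for i, page in enumerate(pages) if predicate(page)}

def www_and_solr(page):
    # Both conditions are checked against the same page document.
    return page["url"].startswith("www.") and "solr" in page["title"].lower()

page_to_user = build_page_to_user(pages)
users = matching_users(pages, page_to_user, www_and_solr)
```

Here only u1 ends up in users: u2 has one page with a "www." URL and another
with "solr" in the title, but no single page with both.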

roman


On Mon, Jul 8, 2013 at 4:03 PM, Marcelo Elias Del Valle
<mvall...@gmail.com> wrote:

> Hello all,
>
>     I am using Solr Cloud today and I have the following need:
>
>    - My queries focus on counting how many users meet some criteria.
>    So my main document is "user" (parent table).
>    - Each user can access several web pages (a child table) and each web
>    page might have several attributes.
>    - I need to look up users for whom some page they accessed matches
>    a set of attributes. For example, I have two scenarios:
>       1. If a user accessed a web page WP1 with a URL that starts with
>       "www." and with a title that includes "solr", then the user is a
>       match.
>       2. However, if there is a web page WP1 with such a URL and ANOTHER
>       page WP2 that includes "solr" in the title, this is not a match.
>
>
>     If I were modeling this on a relational DB, user would be one table and
> url would be another. However, as I am using Solr, my first option would be
> denormalizing. Simply storing all the fields in the user document
> wouldn't work, as it would also match the case described in scenario 2.
>      I thought of two solutions for this:
>
>    - Using the idea of an inverted index - Having several kinds of
>    documents (user, web page, entity 3, entity 4, etc.) where each entity
>    (web page, for instance) would have a field to relate to the user id.
>    Then, using a cross join in Solr to get the results where there was a
>    match on user (parent table) and also on each child entity (in other
>    words, to merge the results of several queries that might return user
>    ids). This has the drawback of using a join.
>    - Having just a user document and storing each web page as only one
>    field (like a JSON). To search, the same field would need to match a
>    regular expression that includes both conditions. This would make my
>    search slower and I would not be able to apply the same technique if
>    the child tables also had children.
>
>     Am I missing any obvious solution here? I would love to receive
> criticism on this, as I am probably not the only one who has this
> problem... I would like more ideas on how to denormalize data in this
> case. Is the join my best option here?
>
> Best regards,
> --
> Marcelo Elias Del Valle
> http://mvalle.com - @mvallebr
>
