Re: Many fields versus join

Erick Erickson Tue, 21 Aug 2012 05:17:50 -0700

Steven:

Nope, I don't have any benchmarks off the top of my head.


You could probably compare this pretty quickly with one of the
benchmarking tools (http://wiki.apache.org/solr/BenchmarkingSolr);
jMeter works as well. Set up two different schemas and configure,
say, an edismax request handler to search across all your
fields.
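
If you'd rather script the comparison than set up a full tool, something
like the following might be a starting point. It is only a rough sketch:
the base URLs, core names and field names are made up for illustration,
and the only Solr parameters it relies on are q, defType=edismax, qf,
rows and wt.

```python
import time
from urllib.parse import urlencode
from urllib.request import urlopen


def edismax_url(base_url, query, fields):
    """Build a /select URL that searches `query` across `fields` via edismax."""
    params = {
        "q": query,
        "defType": "edismax",
        "qf": " ".join(fields),  # qf takes a space-separated list of fields
        "rows": 10,
        "wt": "json",
    }
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def time_queries(url_for, queries, fetch=None, repeats=3):
    """Return the mean wall-clock seconds per request.

    `url_for` maps a query string to a full URL; `fetch` performs the
    request (defaults to a plain HTTP GET, swap it out for testing).
    """
    if fetch is None:
        fetch = lambda url: urlopen(url).read()
    start = time.perf_counter()
    for _ in range(repeats):
        for q in queries:
            fetch(url_for(q))
    elapsed = time.perf_counter() - start
    return elapsed / (repeats * len(queries))
```

You would call time_queries once per schema, e.g. with
edismax_url("http://localhost:8983/solr/flat", q, many_fields) for the
many-fields index and a single catch-all field for the other (both URLs
hypothetical), and compare the two means on a warmed-up index.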

You could try some sort of clever indexing on multiValued fields with
an appropriate positionIncrementGap and phrase slop. The idea
here would be to put all the fields in one field and somehow
keep them distinguishable (but I don't understand the domain
well enough to suggest how).
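
To make the positionIncrementGap idea a bit more concrete, here is a toy
model of what the gap does to token positions in a multiValued field. It
is a simplification, not Solr's actual code: it assumes whitespace
tokenization, one position per token, and an in-order two-term phrase
check, whereas real sloppy phrase matching is more involved.

```python
def token_positions(values, position_increment_gap=100):
    """Assign token positions across the values of one multiValued field.

    Simplified model: each token advances the position by 1, and every
    value after the first adds the gap before its first token.
    """
    positions = []  # (term, position, value_index)
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += position_increment_gap
        for term in value.lower().split():
            pos += 1
            positions.append((term, pos, i))
    return positions


def sloppy_match(positions, first, second, slop):
    """True if `first` precedes `second` with at most `slop` positions between."""
    firsts = [p for t, p, _ in positions if t == first]
    seconds = [p for t, p, _ in positions if t == second]
    return any(0 < b - a <= slop + 1 for a in firsts for b in seconds)


values = ["red car", "blue bike"]

# With a large gap, a phrase cannot span two values unless slop exceeds the gap.
gapped = token_positions(values, position_increment_gap=100)
print(sloppy_match(gapped, "car", "blue", slop=10))

# With no gap, the values run together and the same phrase matches.
packed = token_positions(values, position_increment_gap=0)
print(sloppy_match(packed, "car", "blue", slop=10))
```

The point is that as long as the slop stays below the gap, a phrase
query only matches terms that came from the same value, which is one way
the values could stay distinguishable inside a single field.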

But I think the real question is whether your corpus is big enough
to worry about. Try the simple thing, stress test, and go from there.
If you have a million docs, chances are you don't much care. 100M
and it's dicier.

I have seen people like Yonik say that searching a bunch of
separate fields is more expensive than searching a single large
field, but whether it's enough to matter in _your_ situation only
testing will tell....

Best
Erick

On Tue, Aug 21, 2012 at 3:41 AM, Steven Livingstone Pérez
<webl...@hotmail.com> wrote:
> Many Thanks Erick.
> Are you aware of any real-world metrics or best practice/pattern samples that
> use a lot of fields?
> I'm looking to get an idea of the pros/cons as I scale.
> From what you're saying, it definitely looks like I'll try keeping a flat structure
> (which means perhaps 300 fields), but given some things I've read I suspect there
> are things to watch out for when defining so many fields (but then, I'm not sure
> if 300 is a *big* number).
> thanks, steven
>
>> Date: Mon, 20 Aug 2012 19:28:57 -0600
>> Subject: Re: Many fields versus join
>> From: erickerick...@gmail.com
>> To: solr-user@lucene.apache.org
>>
>> Join works best with a small number of unique values. Unfortunately,
>> people often want to join on <uniqueKey>, which is by definition
>> unique per document.
>>
>> The usual advice is to first try to flatten your data as much as possible.
>> There's also some ongoing work on "block joins", aimed explicitly at
>> parent/child relationships, that you may want to look at the JIRA for,
>> but I confess I haven't a real clue what the details are....
>>
>> Best
>> Erick
>>
>> On Mon, Aug 20, 2012 at 2:56 PM, Steven Livingstone Pérez
>> <webl...@hotmail.com> wrote:
>> > Hi folks. I read some posts in the past about this subject but nothing
>> > that definitively answers my question.
>> > I am trying to understand the trade-off when you use a large number of
>> > fields (not sure what a quantitative value of large is in Solr .. say 200
>> > fields) versus a join - and even a multi-value join.
>> > The reason being, I have a document that has a set of core fields and then 
>> > a load of metadata that is a repeating structure.
>> > D1 F1 F2 F3 F4 F5 ..... S1a S1b S1c S2a S2b S2c ....
>> > I'm not sure whether to create a load of fields up to SNx and a single 
>> > document or to have multiple documents with each SNx in a separate 
>> > document with a parent id that points to a parent document (or a 
>> > multivalue metadata pointer field).
>> > I hope that comes across reasonably well - please ask if not. Oh, if
>> > anyone knows of any quantitative studies of Solr fields/documents I'd love
>> > to see the hard stats to improve my knowledge.
>> > Loving Solr.
>> > Cheers,/Steven
>
