RE: Many fields versus join

Steven Livingstone Pérez Tue, 21 Aug 2012 05:41:16 -0700

Thanks again, Erick.
I have read some of Yonik's posts also.
I think 1M is closer to my number (I'm more interested in using Solr to improve 
the quality of search over a limited set of docs with lots of metadata than 
quantity).
I'll make sure to stress test.
Cheers,/Steven


> Date: Tue, 21 Aug 2012 06:17:11 -0600
> Subject: Re: Many fields versus join
> From: erickerick...@gmail.com
> To: solr-user@lucene.apache.org
> 
> Steven:
> 
> Nope, I don't have any benchmarks off the top of my head.
> 
> You could probably compare this pretty quickly by using one of the
> benchmarking tools (http://wiki.apache.org/solr/BenchmarkingSolr;
> JMeter works as well), using two different schemas and
> configuring, say, an edismax request handler to search across
> all your fields.
> 
> You could try some sort of clever indexing on multiValued fields with
> an appropriate positionIncrementGap and phrase slop. The idea
> here would be to put all the fields in one field and somehow
> keep them distinguishable (but I don't understand the domain
> well enough to suggest how).
> 
> But I think the real question is whether your corpus is big enough
> to worry about. Try the simple thing, stress test, and go from there.
> If you have a million docs, chances are you don't much care. 100M
> and it's dicier.
> 
> I have seen people like Yonik say that searching a bunch of
> separate fields is more expensive than searching a single large
> field, but whether it's enough to matter in _your_ situation only
> testing will tell....
> 
> Best
> Erick
> 
> On Tue, Aug 21, 2012 at 3:41 AM, Steven Livingstone Pérez
> <webl...@hotmail.com> wrote:
> > Many thanks, Erick.
> > Are you aware of any real-world metrics or best practice/pattern samples
> > that use a lot of fields?
> > I'm looking to get an idea of the pros/cons as I scale.
> > From what you're saying, it defo looks like I'll try keeping a flat structure
> > (which means perhaps 300 fields), but given some things I read, I suspect
> > there are things to watch out for when defining so many fields (but then,
> > I'm not sure if 300 is a *big* number).
> > thanks,steven
> >
> >> Date: Mon, 20 Aug 2012 19:28:57 -0600
> >> Subject: Re: Many fields versus join
> >> From: erickerick...@gmail.com
> >> To: solr-user@lucene.apache.org
> >>
> >> Join works best with a small number of unique values. Unfortunately,
> >> people often want to join on <uniqueKey>, which is by definition
> >> unique per document.
> >>
> >> The usual advice is to first try to flatten your data as much as possible.
> >> There's also some ongoing work on "block joins", explicitly for
> >> parent/child relationships, that you may want to look at the JIRA
> >> for, but I confess I haven't a real clue what the details are....
> >>
> >> Best
> >> Erick
> >>
> >> On Mon, Aug 20, 2012 at 2:56 PM, Steven Livingstone Pérez
> >> <webl...@hotmail.com> wrote:
> >> > Hi folks. I read some posts in the past about this subject but nothing
> >> > that definitively answers my question.
> >> > I am trying to understand the trade-off when you use a large number of
> >> > fields (not sure what a quantitative value of large is in Solr .. say 200
> >> > fields) versus a join - and even a multi-value join.
> >> > The reason being, I have a document that has a set of core fields and 
> >> > then a load of metadata that is a repeating structure.
> >> > D1 F1 F2 F3 F4 F5 ..... S1a S1b S1c S2a S2b S2c ....
> >> > I'm not sure whether to create a load of fields up to SNx and a single 
> >> > document or to have multiple documents with each SNx in a separate 
> >> > document with a parent id that points to a parent document (or a 
> >> > multivalue metadata pointer field).
> >> > I hope that comes across reasonably well - please ask if not. Oh, if
> >> > anyone knows of any quantitative studies in Solr fields/documents I'd love
> >> > to see the hard stats to improve my knowledge.
> >> > Loving Solr.
> >> > Cheers,/Steven
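
For readers comparing the two layouts described in the original question, here is a minimal sketch of what each might look like as documents to index. All field names (id, type, parent_id, f1, s1_a, ...) are hypothetical and chosen only for illustration; nothing here comes from the poster's actual schema.

```python
# Sketch only: field names such as "s1_a", "type" and "parent_id" are
# assumptions made for illustration, not taken from the thread.

def flat_document(doc_id, core, sections):
    """One document: core fields plus one field per (section, slot) pair."""
    doc = {"id": doc_id, **core}
    for n, section in enumerate(sections, start=1):
        for slot, value in section.items():
            doc[f"s{n}_{slot}"] = value
    return doc


def parent_child_documents(doc_id, core, sections):
    """One parent document plus one child document per repeating section."""
    parent = {"id": doc_id, "type": "parent", **core}
    children = [
        {"id": f"{doc_id}_s{n}", "type": "child", "parent_id": doc_id, **section}
        for n, section in enumerate(sections, start=1)
    ]
    return [parent] + children


core = {"f1": "alpha", "f2": "beta"}
sections = [{"a": "x1", "b": "y1", "c": "z1"}, {"a": "x2", "b": "y2", "c": "z2"}]

flat = flat_document("D1", core, sections)            # a single wide document
docs = parent_child_documents("D1", core, sections)   # one parent + two children
```

With the flat layout the number of fields grows with the number of repeating sections; with the parent/child layout the number of documents grows instead, and a join (or similar) is needed to get back to the parent.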
> >
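
The two query styles discussed in the thread could be sketched roughly as below. This only builds request parameters and sends nothing to Solr. The defType and qf parameters and the {!join from=... to=...} local-params syntax are standard Solr; the field names and the helper functions themselves are hypothetical.

```python
from urllib.parse import urlencode

# Sketch only: helper names and field names are assumptions for illustration.

def edismax_params(user_query, fields):
    """Flat layout: search one document across many fields via edismax."""
    return {"q": user_query, "defType": "edismax", "qf": " ".join(fields)}


def join_params(child_query, from_field="parent_id", to_field="id"):
    """Parent/child layout: match child docs, then return their parents."""
    return {"q": f"{{!join from={from_field} to={to_field}}}{child_query}"}


# Six hypothetical metadata fields: s1_a, s1_b, s1_c, s2_a, s2_b, s2_c
fields = [f"s{n}_{slot}" for n in range(1, 3) for slot in "abc"]

flat_query = edismax_params("solr", fields)
join_query = join_params("a:solr")

# URL-encoded form, ready to append to a /select request
flat_query_string = urlencode(flat_query)
join_query_string = urlencode(join_query)
```

Running both parameter sets against the same data under a load tool is one way to do the stress test suggested above.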
