Thanks again Erick. I have read some of Yonik's posts also. I think 1M is closer to my number (I'm more interested in using Solr to improve the quality of search over a limited document set with lots of metadata than in quantity). I'll make sure to stress test.
Cheers,
/Steven
> Date: Tue, 21 Aug 2012 06:17:11 -0600
> Subject: Re: Many fields versus join
> From: erickerick...@gmail.com
> To: solr-user@lucene.apache.org
>
> Steven:
>
> Nope, I don't have any benchmarks off the top of my head.
>
> You could probably compare this pretty quickly by using one of the
> benchmarking tools (http://wiki.apache.org/solr/BenchmarkingSolr);
> JMeter works as well. Use two different schemas and
> configure, say, an edismax request handler to search across
> all your fields.
>
> You could try some sort of clever indexing on multiValued fields with
> an appropriate positionIncrementGap and phrase slop. The idea
> here would be to put all the fields in one field and somehow
> keep them distinguishable (but I don't understand the domain
> well enough to suggest how).
>
> But I think the real question is whether your corpus is big enough
> to worry about. Try the simple thing, stress test, and go from there.
> If you have a million docs, chances are you don't much care. 100M
> and it's dicier.
>
> I have seen people like Yonik say that searching a bunch of
> separate fields is more expensive than searching a single large
> field, but whether it's enough to matter in _your_ situation only
> testing will tell.
>
> Best
> Erick
>
> On Tue, Aug 21, 2012 at 3:41 AM, Steven Livingstone Pérez
> <webl...@hotmail.com> wrote:
> > Many Thanks Erick.
> > Are you aware of any real world metrics or best practice/pattern samples
> > that use a lot of fields?
> > I'm looking to get an idea of the pros/cons as I scale.
> > On what you're saying it defo looks like I'll try keeping a flat structure
> > (which means perhaps 300 fields) but given some things I read I suspect
> > there are things to watch out for when defining so many fields (but then,
> > not sure if 300 is a *big* number).
> > thanks,steven
> >
> >> Date: Mon, 20 Aug 2012 19:28:57 -0600
> >> Subject: Re: Many fields versus join
> >> From: erickerick...@gmail.com
> >> To: solr-user@lucene.apache.org
> >>
> >> Join works best with a small number of unique values. Unfortunately,
> >> people often want to join on <uniqueKey>, which is by definition
> >> unique per document.
> >>
> >> The usual advice is to first try to flatten your data as much as possible.
> >> There's also some ongoing work on "block joins" that you may want to
> >> look at the JIRA for, explicitly for parent/child relationships, but I
> >> confess I haven't a real clue what the details are.
> >>
> >> Best
> >> Erick
> >>
> >> On Mon, Aug 20, 2012 at 2:56 PM, Steven Livingstone Pérez
> >> <webl...@hotmail.com> wrote:
> >> > Hi folks. I read some posts in the past about this subject but nothing
> >> > that definitively answers my question.
> >> > I am trying to understand the trade off when you use a large number of
> >> > fields (not sure what a quantitative value of large is in Solr .. say 200
> >> > fields) versus a join - and even a multi value join.
> >> > The reason being, I have a document that has a set of core fields and
> >> > then a load of metadata that is a repeating structure.
> >> > D1 F1 F2 F3 F4 F5 ..... S1a S1b S1c S2a S2b S2c ....
> >> > I'm not sure whether to create a load of fields up to SNx and a single
> >> > document or to have multiple documents with each SNx in a separate
> >> > document with a parent id that points to a parent document (or a
> >> > multivalue metadata pointer field).
> >> > I hope that comes across reasonably well - please ask if not. Oh, if
> >> > anyone knows of any quantitative studies in Solr fields/documents I'd love
> >> > to see the hard stats to improve my knowledge.
> >> > Loving Solr.
> >> > Cheers,/Steven
> >
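
For anyone comparing the two layouts discussed in this thread, here is a rough Python sketch of what each option might look like as documents and as request parameters. It is only an illustration under assumptions: the helper names (flatten, split, flat_query, join_query), the child field names (Sa, Sb, Sc) and the parent_id field are made up for the example and are not from the thread or from any schema. The edismax parameters (defType, qf) and the {!join from=... to=...} local-params syntax are standard Solr, but nothing here talks to a server, so it says nothing about relative performance; that still needs the stress test Erick suggests.

```python
from urllib.parse import urlencode


def flatten(doc_id, core, sets):
    """Option 1: a single document, each repeating set expanded into
    numbered fields S1a, S1b, ... S2a, S2b, ... (hypothetical naming)."""
    doc = {"id": doc_id, **core}
    for n, fields in enumerate(sets, start=1):
        for suffix, value in fields.items():
            doc[f"S{n}{suffix}"] = value
    return doc


def split(doc_id, core, sets):
    """Option 2: one parent document plus one child document per set.
    Each child carries a parent_id pointing back at the parent
    (parent_id is an assumed field name)."""
    parent = {"id": doc_id, **core}
    children = []
    for n, fields in enumerate(sets, start=1):
        child = {"id": f"{doc_id}_S{n}", "parent_id": doc_id}
        for suffix, value in fields.items():
            child[f"S{suffix}"] = value
        children.append(child)
    return [parent] + children


def flat_query(text, fields):
    """Request parameters for an edismax search across many flat fields."""
    return urlencode({"defType": "edismax", "q": text, "qf": " ".join(fields)})


def join_query(child_query):
    """Request parameters for a join: match child documents, then return
    the parents whose id equals the children's parent_id."""
    return urlencode({"q": "{!join from=parent_id to=id}" + child_query})


if __name__ == "__main__":
    core = {"F1": "alpha", "F2": "beta"}
    sets = [{"a": "red", "b": "small"}, {"a": "blue", "b": "large"}]
    print(flatten("D1", core, sets))
    print(split("D1", core, sets))
    print(flat_query("red", ["S1a", "S2a"]))
    print(join_query("Sa:red"))
```

Note that in the split layout the join runs from parent_id, which takes one value per parent, to id. Whether that counts as "a small number of unique values" in Erick's sense depends on how many parent documents there are.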