As far as I know, there's no underlying difference between adding all 42K tokens one at a time (multivalued) or all at once (single-valued), with one rather technical exception: if you've set positionIncrementGap to something other than 0 (the default) in your schema, then the position delta between the last token of one value and the first token of the next will be larger than one. Put another way, there's no difference if you leave positionIncrementGap="0". And even that doesn't matter if you're not doing proximity queries on that field.
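To make the position arithmetic concrete, here is a minimal sketch in Python. It is a simplified model, not Lucene's actual indexing code: it assumes whitespace tokenization, that every token advances the position by one, and that the gap is the number of extra positions skipped between successive values (so a gap of 0 leaves tokens in successive positions, as the Lucene javadoc describes). The function name assign_positions and the sample zip codes are made up for illustration.

```python
def assign_positions(values, gap):
    """Return (token, position) pairs for the values of one field.

    Simplified model (an assumption, not Lucene source): each token
    advances the position by 1, and `gap` extra positions are skipped
    before the first token of every value after the first.
    """
    positions = []
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # the positionIncrementGap between values
        for token in value.split():
            pos += 1
            positions.append((token, pos))
    return positions


# One value holding every token vs. one value per token.
single = assign_positions(["10001 10002 10003"], gap=100)
multi_no_gap = assign_positions(["10001", "10002", "10003"], gap=0)
multi_gap = assign_positions(["10001", "10002", "10003"], gap=100)

print(single == multi_no_gap)        # positions are identical with gap 0
print([p for _, p in multi_gap])     # positions spread out with gap 100
```

In this model the only thing the gap changes is how far apart neighbouring values sit, which is why it only matters for phrase or other proximity queries.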
You could even batch them up in chunks, i.e.

<zip>zip1 zip2 zip3</zip>
<zip>zip4 zip5 zip6</zip>

You're only talking 2.5M tokens or so, right? I predict you'll never notice the data duplication etc. I'd guess that it's too small of a data set to worry about...

HTH
Erick

On Tue, Jan 19, 2010 at 3:15 PM, SHS SOLR <shss...@gmail.com> wrote:

> Thanks Erik,
>
> I was not aware of the maxFieldLength.
>
> * Query performance compared to storing data by zipcode. Schema to
>   accommodate this would have 42K * 60 documents approx. Also to consider
>   duplicate document data with varying zipcode in the index.
>
> Hope this makes sense. We however wanted to understand if it is a good
> practice to dump 42K tokens in a multivalued field.
>
> Thanks,
> Pavan.
>
> On Tue, Jan 19, 2010 at 1:56 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > You should be able to do this no problem. Do be aware of the
> > maxfieldlength though, it defaults to 10,000 tokens but you
> > can change it in your schema.xml. Beware, there are TWO
> > instances of this in the schema file. See:
> >
> > http://search.lucidimagination.com/search/document/30616a061f8c4bf6/solr_ignoring_maxfieldlength
> >
> > What do you mean by index/search performance impact? As
> > compared to what?
> >
> > I think the impacts will be negligible when compared to putting all
> > the zip codes into the field at once, and search time should be
> > unaffected over that alternative.
> >
> > HTH
> > Erick
> >
> > On Tue, Jan 19, 2010 at 12:11 PM, SHS SOLR <shss...@gmail.com> wrote:
> >
> > > * Can we define a field in our schema as multiValued (with stored=false,
> > >   indexed=true) that will hold upto 42K zipcode values associated to each
> > >   document?
> > > * Is there any query time performance impact with this.
> > > * Is there any impact on index time.
> > >
> > > The number of documents we are talking here is not more than 100 right now.
> > > There is no requirement to facet or highlight or even show this field in
> > > the search results. We only want to enable zipcode searches that would
> > > return matching docs.
> > >
> > > Thanks,
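
For what it's worth, the chunking idea from the reply at the top of this message could be sketched like this in Python. It is only an illustration: the helper names chunk and zip_fields are made up, the field name "zip" is taken from the shorthand in the reply, and it assumes the field's analyzer splits on whitespace so each zip code still becomes its own token. The output uses the <field name="..."> form of Solr's XML update messages.

```python
from xml.sax.saxutils import escape


def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def zip_fields(zipcodes, per_field, field_name="zip"):
    """Build one <field> element per group of zip codes.

    Each element carries several space-separated zip codes, so the
    document ends up with far fewer field values than tokens.
    """
    return [
        '<field name="%s">%s</field>' % (field_name, escape(" ".join(group)))
        for group in chunk(zipcodes, per_field)
    ]


fields = zip_fields(["zip1", "zip2", "zip3", "zip4", "zip5", "zip6"], per_field=3)
for line in fields:
    print(line)
```

The chunk size is arbitrary; anything from one zip code per value to all 42K in a single value should index the same tokens, subject to the maxFieldLength limit mentioned earlier in the thread.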