We are also evaluating an option in which each document has 600K fields (scalability tested with 1M fields). Index, reindex, and query performance are acceptable (~2.5h to index 130K docs on one machine, query time <20ms); however, atomic updates took a lot of memory and time. Hope this helps.
-Ha Pham

-----Original Message-----
From: david.w.smi...@gmail.com [mailto:david.w.smi...@gmail.com]
Sent: Friday, April 10, 2015 10:34 AM
To: solr-user@lucene.apache.org; Marcelo Valle
Subject: Re: multivalued fields or multiple fields?

I don't at all think a massive number of fields is helpful here. I added an answer to Stack Overflow since you started this question/conversation there. I'll paste it here for those that don't want to follow the link:

> Use highlighting. @Jokin first mentioned it and I feel this is the best
> answer without hacking on Solr. Try either the PostingsHighlighter or
> the FastVectorHighlighter, not the default/standard highlighter.
> Unfortunately both of them internally execute a wildcard query against
> all UIDS in this field. FVH has the *opportunity* internally to be
> smarter about that but it's not implemented that way.
>
> Note: if it's within scope to write a little Java to add to Solr, the ideal
> answer would be to add term vectors (just the terms data in the
> term-vector, no offsets/positions) and then write a "DocTransformer"
> to grab the term vector terms; seek to the prefix, then iterate on
> those that have that prefix. Pretty darned fast.

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

On Fri, Apr 10, 2015 at 5:46 AM, Marcelo Valle (BLOOMBERG/ LONDON) <mvallemil...@bloomberg.net> wrote:

> I have a model where I store a field called `uuid_scores` in a
> document and save values with the following format:
> 123_456 - where 123 is uuid and 456 is the score.
>
> To retrieve scores for uuid 123, I search all documents where
> uuid_scores field starts with 123_ and then I read only the values
> that start with 123_ in the answer.
>
> The problem is I can have about 100k values in this multivalued
> field, so it can be hard to retrieve just what I want, as stated in
> http://stackoverflow.com/questions/29535197/how-to-filter-values-returned-on-a-multivalued-field-in-solr
>
> Someone suggested using 1 field per uuid. So instead of just 1
> multivalued field, I would have about 100k fields, with names like
> score_123 (and value [456]).
>
> Is there a problem in having so many fields in Solr? What are the
> advantages / disadvantages if:
>
> * I have to update this document later, adding one more value
> * I need fast inserts when inserting the document for the first time
> * I need to query everything related to 1 uuid, so what would be the
>   faster option to search?
>
> Thanks
> -Marcelo
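
The "seek to the prefix, then iterate" idea from David's reply can be illustrated outside Solr with a minimal Python sketch. This is not Solr or Lucene code and does not show the DocTransformer API: a sorted Python list stands in for the sorted terms of a term vector, and the function name `scores_for_uuid` and the sample values are hypothetical, chosen only to match the `uuid_score` format described in the question.

```python
from bisect import bisect_left


def scores_for_uuid(sorted_terms, uuid):
    """Return the scores stored for `uuid` in a sorted list of 'uuid_score' terms.

    `sorted_terms` is an assumption: it stands in for the sorted terms of
    one document's uuid_scores field.
    """
    prefix = f"{uuid}_"
    # Seek: binary search for the first term that is >= the prefix.
    i = bisect_left(sorted_terms, prefix)
    scores = []
    # Iterate: terms sharing a prefix are contiguous in sorted order,
    # so stop at the first term that does not start with it.
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        scores.append(int(sorted_terms[i][len(prefix):]))
        i += 1
    return scores


# Hypothetical values for one document's uuid_scores field.
terms = sorted(["123_456", "123_789", "1234_1", "99_5", "124_10"])
print(scores_for_uuid(terms, "123"))  # [456, 789]
```

Because the terms are sorted, the lookup costs one binary search plus one step per matching value, rather than a scan over all ~100k values in the field. Including the trailing underscore in the prefix keeps uuid 123 from matching values that belong to uuid 1234.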