Can you describe how you're planning on using Streaming? I can provide some feedback on how it will perform for your use case.
When scaling out Streaming you'll get large performance boosts when you increase the number of shards, replicas and workers. This is particularly true if you're doing parallel relational algebra or map/reduce operations.

As far as DocValues being expensive with unique fields, you'll want to do a sizing exercise to see how many documents per shard work best for your use case. There are different docValues implementations that will allow you to trade off memory for performance.

Joel Bernstein
http://joelsolr.blogspot.com/

On Mon, Apr 25, 2016 at 3:30 AM, Reth RM <reth.ik...@gmail.com> wrote:

> Hi,
>
> So, is the concern related to the same field value being stored twice: with
> stored=true and docValues=true? If that is the case, there is a jira
> relevant to this, fixed[1]. If you upgrade to version 5.5/6.0, it is
> possible to read non-stored fields from the docValues index. Check it out.
>
> [1] https://issues.apache.org/jira/browse/SOLR-8220
>
> On Mon, Apr 25, 2016 at 9:44 AM, sudsport s <sudssf2...@gmail.com> wrote:
>
> > Thanks Erik for reply,
> >
> > Since I was storing Id (its stored field) and after enabling docValues my
> > guess is it will be stored in 2 places. also as per my understanding
> > docValues are great when you have values which repeat. I am not sure how
> > beneficial it would be for uniqueId field.
> > I am looking at collection of few hundred billion documents , that is
> > reason I really want to care about expense from design phase.
> >
> > On Sun, Apr 24, 2016 at 7:24 PM, Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> > > In a word, "yes".
> > >
> > > DocValues aren't particularly expensive, or expensive at all. The idea
> > > is that when you sort by a field or facet, the field has to be
> > > "uninverted" which builds the entire structure in Java's JVM (this is
> > > when the field is _not_ DocValues).
> > >
> > > DocValues essentially serialize this structure to disk.
> > > So your on-disk index size is larger, but that size is MMaped rather than
> > > stored on Java's heap.
> > >
> > > Really, the question I'd have to ask though is "why do you care about
> > > the expense?". If you have a functional requirement that has to be
> > > served by returning the id via the /export handler, you really have no
> > > choice.
> > >
> > > Best,
> > > Erick
> > >
> > > On Sun, Apr 24, 2016 at 9:55 AM, sudsport s <sudssf2...@gmail.com>
> > > wrote:
> > > > I was trying to use Streaming for reading basic tuple stream. I am using
> > > > sort by id asc ,
> > > > I am getting following exception
> > > >
> > > > I am using export search handler as per
> > > > https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
> > > >
> > > > null:java.io.IOException: id must have DocValues to use this feature.
> > > >     at org.apache.solr.response.SortingResponseWriter.getFieldWriters(SortingResponseWriter.java:241)
> > > >     at org.apache.solr.response.SortingResponseWriter.write(SortingResponseWriter.java:120)
> > > >     at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:53)
> > > >     at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:742)
> > > >     at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:471)
> > > >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
> > > >     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
> > > >     at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> > > >     at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> > > >     at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> > > >     at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> > > >     at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> > > >     at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> > > >     at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> > > >     at org.eclipse.jetty.server.session.SessionHandler.doScope(
> > > >
> > > > does it make sense to enable docValues for unique field? How expensive
> > > > is it?
> > > >
> > > > if I have existing collection can I update schema and optimize
> > > > collection to get docvalues enabled for id?
> > > >
> > > > --
> > > >
> > > > Thanks
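
For readers following the thread, the kind of request that leads to the exception quoted above can be sketched roughly as follows. This is only an illustration, not the original poster's code: the host, port and collection name (`localhost:8983`, `collection1`) and the helper name `build_export_url` are assumptions. The /export handler expects both a `sort` and an `fl` parameter, and every field named in them needs docValues, which is why sorting on `id` fails when `id` has none.

```python
from urllib.parse import urlencode


def build_export_url(base_url, collection, query="*:*", sort="id asc", fields=("id",)):
    """Build a URL for Solr's /export handler (hypothetical helper).

    base_url and collection are placeholders; adjust them to your cluster.
    Every field used in `sort` and `fields` must have docValues enabled.
    """
    params = {
        "q": query,              # main query
        "sort": sort,            # required by /export
        "fl": ",".join(fields),  # required by /export
    }
    return f"{base_url.rstrip('/')}/{collection}/export?{urlencode(params)}"


# Assumed local Solr instance and collection name, for illustration only.
url = build_export_url("http://localhost:8983/solr", "collection1")
print(url)
```

The sketch only builds the URL and does not contact a server, so it can be run without a Solr instance.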