Re: The Streaming API (Solrj.io) : id must have DocValues?

sudsport s Tue, 26 Apr 2016 10:31:30 -0700

Thanks @Reth, yes, that was one of my concerns. I will look at the JIRA
issue you mentioned.


Thanks Joel,
I used some of the streaming client examples from your blog. I got a basic
tuple stream working, but I get the following exception when running a
parallel stream.


java.io.IOException: java.util.concurrent.ExecutionException:
org.noggit.JSONParser$ParseException: JSON Parse Error: char=<,position=0
BEFORE='<' AFTER='html> <head> <meta http-equiv="Content-'
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:332)
        at org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(CloudSolrStream.java:231)



I looked into the Solr logs, and after turning on debug mode I found the
following:
POST /solr/collection_shard20_replica1/stream HTTP/1.1
"HTTP/1.1 404 Not Found[\r][\n]"


It looks like the parallel stream is trying to access /stream on each
shard. Can someone tell me how to enable the stream handler? I have the
export handler enabled. I will look at the latest solrconfig.xml to see if
I can turn it on.
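The `char=<,position=0` parse error suggests the shard answered with an HTML error page (the 404 in the log) instead of JSON. A minimal stdlib-only Java sketch for checking this by hand; `StreamDiagnostics`, its methods and the localhost URL are hypothetical, not part of SolrJ:

```java
import java.util.Locale;

// Hypothetical helpers: a parallel stream posts to <core>/stream on each shard.
// If that handler is missing, Solr returns an HTML 404 page, and a JSON parser
// fails on the very first character '<'.
final class StreamDiagnostics {
    private StreamDiagnostics() {}

    /** Builds the per-core /stream URL, e.g. .../collection_shard20_replica1/stream. */
    static String streamUrl(String solrBaseUrl, String coreName) {
        String base = solrBaseUrl.endsWith("/")
                ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
                : solrBaseUrl;
        return base + "/" + coreName + "/stream";
    }

    /** True if a response body looks like an HTML page rather than JSON. */
    static boolean looksLikeHtml(String body) {
        String head = body.stripLeading().toLowerCase(Locale.ROOT);
        return head.startsWith("<html") || head.startsWith("<!doctype html");
    }

    public static void main(String[] args) {
        System.out.println(streamUrl("http://localhost:8983/solr/", "collection_shard20_replica1"));
        System.out.println(looksLikeHtml("<html> <head> <meta http-equiv=\"Content-"));
        System.out.println(looksLikeHtml("{\"docs\":[]}"));
    }
}
```

If the body returned from that URL looks like HTML, the /stream handler is probably not registered on that core. In Solr 5.x it is typically declared in solrconfig.xml with class `solr.StreamHandler`, but check the reference guide for your exact version.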



@Joel I am already running sizing exercises; I will run a new one with
Solr 5.5+ and docValues enabled on id.

BTW, Solr streaming has amazing response times. Thanks for making it so
FAST!!!







On Mon, Apr 25, 2016 at 10:54 AM, Joel Bernstein <joels...@gmail.com> wrote:

> Can you describe how you're planning on using Streaming? I can provide some
> feedback on how it will perform for your use case.
>
> When scaling out Streaming you'll get large performance boosts when you
> increase the number of shards, replicas and workers. This is particularly
> true if you're doing parallel relational algebra or map/reduce operations.
>
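Regarding workers: as I understand it, a parallel stream gives each worker a disjoint slice of the result set by hashing the partition keys, so tuples with the same key always reach the same worker and adding workers splits the work. A simplified plain-Java sketch of that idea; `HashPartitioner` is hypothetical, and `String.hashCode()` stands in for whatever hash Solr actually uses:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified hash partitioning: each key maps to exactly one worker,
// so the workers' buckets are disjoint and together cover all keys.
// String.hashCode() is an assumption, not Solr's real hash function.
final class HashPartitioner {
    private HashPartitioner() {}

    /** Worker index in [0, numWorkers) responsible for a key. */
    static int workerFor(String partitionKey, int numWorkers) {
        // floorMod keeps the result non-negative even for negative hash codes.
        return Math.floorMod(partitionKey.hashCode(), numWorkers);
    }

    /** Splits keys into numWorkers disjoint buckets. */
    static List<List<String>> partition(List<String> keys, int numWorkers) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numWorkers; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(workerFor(key, numWorkers)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> ids = List.of("doc1", "doc2", "doc3", "doc4", "doc5", "doc6");
        List<List<String>> buckets = partition(ids, 3);
        for (int w = 0; w < buckets.size(); w++) {
            System.out.println("worker " + w + ": " + buckets.get(w));
        }
    }
}
```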
> As far as DocValues being expensive with unique fields goes, you'll want to
> do a sizing exercise to see how many documents per shard work best for your
> use case. There are different docValues implementations that will allow you
> to trade off memory for performance.
>
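One way to start the sizing exercise is plain arithmetic over candidate shard counts. A rough Java sketch, assuming a hypothetical total of 200 billion documents and a hypothetical 16 bytes of id docValues per document (measure this on a sample shard, since the real size depends on the codec and the id format); `ShardSizing` is hypothetical. The only fixed number is Lucene's per-index cap of `Integer.MAX_VALUE - 128` documents, which sets a floor on the shard count:

```java
// Back-of-the-envelope sizing sketch. Inputs in main() are hypothetical;
// bytesPerDoc should come from measuring a real sample shard.
final class ShardSizing {
    // Lucene's per-index document limit (IndexWriter.MAX_DOCS).
    static final long LUCENE_MAX_DOCS = Integer.MAX_VALUE - 128L;

    private ShardSizing() {}

    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    /** Documents per shard, rounded up so no shard is under-counted. */
    static long docsPerShard(long totalDocs, int numShards) {
        return ceilDiv(totalDocs, numShards);
    }

    /** Smallest shard count that keeps every shard under Lucene's cap. */
    static long minShards(long totalDocs) {
        return ceilDiv(totalDocs, LUCENE_MAX_DOCS);
    }

    /** Estimated per-shard size in GiB for a given per-document byte cost. */
    static double gibPerShard(long totalDocs, int numShards, double bytesPerDoc) {
        return docsPerShard(totalDocs, numShards) * bytesPerDoc / (1024.0 * 1024 * 1024);
    }

    public static void main(String[] args) {
        long totalDocs = 200_000_000_000L; // hypothetical "few hundred billion"
        double bytesPerDoc = 16.0;         // hypothetical measured id docValues cost
        System.out.println("minimum shards (Lucene cap): " + minShards(totalDocs));
        for (int shards : new int[] {500, 1000, 2000}) {
            System.out.printf("%d shards: %,d docs/shard, ~%.1f GiB/shard%n",
                    shards, docsPerShard(totalDocs, shards),
                    gibPerShard(totalDocs, shards, bytesPerDoc));
        }
    }
}
```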
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Mon, Apr 25, 2016 at 3:30 AM, Reth RM <reth.ik...@gmail.com> wrote:
>
> > Hi,
> >
> > So, is the concern that the same field value is stored twice, once with
> > stored=true and once with docValues=true? If so, there is a JIRA issue
> > for this that has been fixed [1]. If you upgrade to version 5.5/6.0, it is
> > possible to read non-stored fields from the docValues index; check it out.
> >
> >
> > [1] https://issues.apache.org/jira/browse/SOLR-8220
> >
> > On Mon, Apr 25, 2016 at 9:44 AM, sudsport s <sudssf2...@gmail.com> wrote:
> >
> > > Thanks Erick for the reply,
> > >
> > > Since I was storing id (it's a stored field), my guess is that after
> > > enabling docValues it will be stored in two places. Also, as I
> > > understand it, docValues are great when you have values that repeat,
> > > so I am not sure how beneficial they would be for a unique id field.
> > > I am looking at a collection of a few hundred billion documents, which
> > > is why I really want to account for the expense from the design phase.
> > >
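The intuition that docValues pay off when values repeat can be seen in a toy model of sorted (ordinal-based) docValues: a dictionary of distinct values plus one small ordinal per document. Repeated values share a single dictionary entry, while a unique id contributes one entry per document. `OrdinalColumn` is a hypothetical illustration, not Lucene's actual on-disk encoding:

```java
import java.util.Arrays;
import java.util.TreeSet;

// Toy model of sorted docValues: distinct values in sorted order, plus
// ords[doc] pointing into that dictionary. Lucene compresses far more
// aggressively; this only shows why unique values gain nothing from sharing.
final class OrdinalColumn {
    final String[] dictionary; // distinct values, sorted
    final int[] ords;          // ords[doc] is an index into dictionary

    OrdinalColumn(String[] valuesByDoc) {
        dictionary = new TreeSet<>(Arrays.asList(valuesByDoc)).toArray(new String[0]);
        ords = new int[valuesByDoc.length];
        for (int doc = 0; doc < valuesByDoc.length; doc++) {
            ords[doc] = Arrays.binarySearch(dictionary, valuesByDoc[doc]);
        }
    }

    String valueFor(int doc) {
        return dictionary[ords[doc]];
    }

    public static void main(String[] args) {
        OrdinalColumn category = new OrdinalColumn(new String[] {"books", "music", "books", "books"});
        OrdinalColumn ids = new OrdinalColumn(new String[] {"id-1", "id-2", "id-3", "id-4"});
        System.out.println("repeated field: " + category.dictionary.length + " distinct values for 4 docs");
        System.out.println("unique id field: " + ids.dictionary.length + " distinct values for 4 docs");
    }
}
```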
> > >
> > >
> > >
> > > On Sun, Apr 24, 2016 at 7:24 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > > > In a word, "yes".
> > > >
> > > > DocValues aren't particularly expensive, or expensive at all. The idea
> > > > is that when you sort or facet on a field that is _not_ DocValues, the
> > > > field has to be "uninverted", which builds the entire structure on the
> > > > Java heap.
> > > >
> > > > DocValues essentially serialize this structure to disk. So your
> > > > on-disk index size is larger, but that data is memory-mapped (MMap)
> > > > rather than held on the Java heap.
> > > >
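A toy Java illustration of what "uninverting" means: the inverted index maps term -> documents, while sorting and faceting need document -> value, so the mapping has to be flipped. Without docValues that flip happens at query time on the heap; with docValues an equivalent column is written at index time. `Uninvert` is hypothetical and assumes a single-valued field:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Flips term -> docIds postings into a per-document value array.
// Assumes each document has at most one term for this field.
final class Uninvert {
    private Uninvert() {}

    static String[] uninvert(Map<String, List<Integer>> postings, int maxDoc) {
        String[] valueByDoc = new String[maxDoc];
        for (Map.Entry<String, List<Integer>> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                valueByDoc[doc] = entry.getKey();
            }
        }
        return valueByDoc;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> postings = new TreeMap<>();
        postings.put("apple", List.of(0, 2));
        postings.put("banana", List.of(1));
        System.out.println(Arrays.toString(uninvert(postings, 3))); // [apple, banana, apple]
    }
}
```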
> > > > Really, though, the question I'd have to ask is "why do you care about
> > > > the expense?" If you have a functional requirement that has to be
> > > > served by returning the id via the /export handler, you really have no
> > > > choice.
> > > >
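For reference, a request to the export handler needs both `sort` and `fl`, and, as the error in this thread shows, every field used there must have docValues. A minimal Java sketch that only builds such a URL; the host, port, collection name and the `ExportRequest` class are hypothetical:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical URL builder for an /export request. Fields named in sort and
// fl must have docValues, otherwise Solr raises "<field> must have DocValues".
final class ExportRequest {
    private ExportRequest() {}

    static String url(String solrBaseUrl, String collection, String q, String sort, String fl) {
        return solrBaseUrl + "/" + collection + "/export"
                + "?q=" + enc(q)
                + "&sort=" + enc(sort)
                + "&fl=" + enc(fl);
    }

    // Form-encodes a parameter value (space becomes '+', ':' becomes %3A).
    private static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(url("http://localhost:8983/solr", "collection1", "*:*", "id asc", "id"));
    }
}
```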
> > > > Best,
> > > > Erick
> > > >
> > > >
> > > > On Sun, Apr 24, 2016 at 9:55 AM, sudsport s <sudssf2...@gmail.com> wrote:
> > > > > I was trying to use Streaming to read a basic tuple stream, sorting
> > > > > by id asc, and I am getting the following exception.
> > > > >
> > > > > I am using the export search handler as per
> > > > > https://cwiki.apache.org/confluence/display/solr/Exporting+Result+Sets
> > > > >
> > > > > null:java.io.IOException: id must have DocValues to use this feature.
> > > > >         at org.apache.solr.response.SortingResponseWriter.getFieldWriters(SortingResponseWriter.java:241)
> > > > >         at org.apache.solr.response.SortingResponseWriter.write(SortingResponseWriter.java:120)
> > > > >         at org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:53)
> > > > >         at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:742)
> > > > >         at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:471)
> > > > >         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:214)
> > > > >         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:179)
> > > > >         at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)
> > > > >         at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)
> > > > >         at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> > > > >         at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)
> > > > >         at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)
> > > > >         at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)
> > > > >         at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)
> > > > >         at org.eclipse.jetty.server.session.SessionHandler.doScope(
> > > > >
> > > > > Does it make sense to enable docValues for a unique field? How
> > > > > expensive is it?
> > > > >
> > > > > If I have an existing collection, can I update the schema and optimize
> > > > > the collection to get docValues enabled for id?
> > > > >
> > > > >
> > > > > --
> > > > >
> > > > > Thanks
> > > >
> > >
> >
>
