Hi Jack,

I should have pointed out our use case. For any case where actual end users
will be looking at search results, paging 1,000 at a time is reasonable. But
what we are doing is a dump of the unique ids with a "*:*" query, which lets
us verify that what our system thinks has been indexed is actually indexed.
Since we need to dump out results in the hundreds of millions, requesting
1,000 at a time is not scalable.
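For concreteness, here is roughly what that kind of id dump looks like as a
start/rows loop in SolrJ. This is just a sketch, not our actual harness: the
host/core URL and the IdDump class name are placeholders, though fl=vol_id is
the real unique id field from the log below.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class IdDump {
    public static void main(String[] args) throws SolrServerException {
        // Placeholder URL; in our setup this would be one of the dev cores.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8111/dev-1/core");
        final int pageSize = 1000;      // the "1,000 at a time" page size
        int start = 0;
        long numFound = Long.MAX_VALUE;

        while (start < numFound) {
            SolrQuery q = new SolrQuery("*:*");  // match every document
            q.setFields("vol_id");               // we only need the unique id
            q.setStart(start);
            q.setRows(pageSize);
            QueryResponse rsp = solr.query(q);
            numFound = rsp.getResults().getNumFound();
            for (SolrDocument doc : rsp.getResults()) {
                System.out.println(doc.getFieldValue("vol_id"));
            }
            // Each page repeats the work of skipping over all earlier results
            // on every shard, which is why this does not scale to hundreds of
            // millions of ids.
            start += pageSize;
        }
    }
}

With 365 million documents that is 365,000 round trips, and the deep pages get
progressively more expensive, which is why we would rather pull much larger
chunks per request.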
The other context is that we currently index 10 million books, with each book
as a Solr document. We are looking at indexing at the page level, which would
result in about 3 billion pages. Part of that testing process is checking the
scalability of the queries our current production system uses, such as the
query we run against the not-yet-released index to get the list of unique ids
that are actually indexed in Solr.

Tom

On Thu, Jul 25, 2013 at 2:13 PM, Jack Krupansky <j...@basetechnology.com> wrote:

> As usual, there is no published hard limit per se, but I would urge
> caution about requesting more than 1,000 rows at a time or even 250.
> Sure, in a fair number of cases 5,000 or 10,000 or even 100,000 MAY work
> (at least sometimes), but Solr and Lucene are more appropriate for
> "paged" results, where page size is 10, 20, 50, 100 or something in that
> range. So, my recommendation is to use 250 to 1,000 as the limit for
> rows. And certainly do a proof of concept implementation for anything
> above 1,000.
>
> So, if rows=100000 works for you, consider yourself lucky!
>
> That said, there is sometimes talk of supporting streaming, which
> presumably would allow access to all results, but chunked/paged in some
> way.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Tom Burton-West
> Sent: Thursday, July 25, 2013 1:39 PM
> To: solr-user@lucene.apache.org
> Subject: Solr 4.2.1 limit on number of rows or number of hits per shard?
>
> Hello,
>
> I am running solr 4.2.1 on 3 shards and have about 365 million documents
> in the index total. I sent a query asking for 1 million rows at a time,
> but I keep getting an error claiming that there is an invalid version or
> data not in javabin format (see below).
>
> If I lower the number of rows requested to 100,000, I have no problems.
>
> Does Solr have a limit on number of rows that can be requested or is
> this a bug?
>
> Tom
>
> INFO: [core] webapp=/dev-1 path=/select params={shards=XXX:8111/dev-1/core,XXX:8111/dev-2/core,XXX:8111/dev-3/core&fl=vol_id&indent=on&start=0&q=*:*&rows=1000000} hits=365488789 status=500 QTime=132743
> Jul 25, 2013 1:26:00 PM org.apache.solr.common.SolrException log
> SEVERE: null:org.apache.solr.common.SolrException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format
>         at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302)
>         at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
>         at org.apache.solr.core.SolrCore.execute(SolrCore.java:1817)
>         at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:639)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:345)
>         at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
>         at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:215)
>         at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:188)
>         at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:213)
>         at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:172)
>         at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:548)
>         at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
>         at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:117)
>         at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:108)
>         at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:174)
>         at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:875)
>         at org.apache.coyote.http11.Http11BaseProtocol$Http11ConnectionHandler.processConnection(Http11BaseProtocol.java:665)
>         at org.apache.tomcat.util.net.PoolTcpEndpoint.processSocket(PoolTcpEndpoint.java:528)
>         at org.apache.tomcat.util.net.LeaderFollowerWorkerThread.runIt(LeaderFollowerWorkerThread.java:81)
>         at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:689)
>         at java.lang.Thread.run(Thread.java:619)
> Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format
>         at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:109)
>         at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41)
> :
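In case it helps anyone reproduce this, the failing request in the log above
corresponds roughly to the following SolrJ call. The hosts are anonymized as
XXX, and the RowsRepro class is only a sketch of what our test script sends,
not the script itself.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class RowsRepro {
    public static void main(String[] args) throws SolrServerException {
        // Send the distributed query to the first shard's core, as in the log.
        HttpSolrServer solr = new HttpSolrServer("http://XXX:8111/dev-1/core");
        SolrQuery q = new SolrQuery("*:*");
        q.setFields("vol_id");
        q.setStart(0);
        // Same shards parameter as in the logged request.
        q.set("shards", "XXX:8111/dev-1/core,XXX:8111/dev-2/core,XXX:8111/dev-3/core");
        q.setRows(1000000);   // fails with the javabin error; 100000 works
        QueryResponse rsp = solr.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}

With rows set to 100000 instead of 1000000, the same query completes without
the javabin error.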