Dotan,

Could you please provide more lines of the stack trace? I have no idea why it got worse in 4.3. I know that 4.3 can use facets backed by DocValues, which are light on the heap. But from what I saw (I may be wrong), that is disabled for numeric fields. So I can suggest reindexing id as a string field with DocValues and hoping that helps. However, reindexing everything without a strong guarantee that it will work is questionable.

Also, I checked the source code of http://wiki.apache.org/solr/TermsComponent and found that it can be quite light on memory (i.e. when used without sort or limit). Be aware that the document frequencies (df) returned by that component do not account for deleted documents, so run a commit with expungeDeletes first.
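A minimal sketch of what the TermsComponent approach could look like, written in Python only to build the request URLs. Several things here are assumptions, not something checked against your setup: the core URL `http://localhost:8983/solr/collection1`, the presence of a `/terms` request handler in solrconfig.xml (as in the stock example config), and the reading of "without sort or limit" as `terms.sort=index` plus `terms.limit=-1`. The helper names are made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: hypothetical core URL, adjust to the real host and core name.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"


def expunge_deletes_url(core_url=SOLR_CORE_URL):
    """URL for a commit that also expunges deleted documents."""
    params = [("commit", "true"), ("expungeDeletes", "true")]
    return core_url + "/update?" + urlencode(params)


def terms_duplicates_url(field="id", core_url=SOLR_CORE_URL):
    """URL asking TermsComponent for terms of `field` with df >= 2."""
    params = [
        ("terms", "true"),
        ("terms.fl", field),
        ("terms.mincount", "2"),   # only terms that occur in 2+ documents
        ("terms.limit", "-1"),     # no limit on the number of terms
        ("terms.sort", "index"),   # index order, i.e. no sorting by count
        ("wt", "json"),
    ]
    return core_url + "/terms?" + urlencode(params)


def pair_terms(flat):
    """Turn a flat [term, df, term, df, ...] list into a {term: df} dict."""
    return dict(zip(flat[0::2], flat[1::2]))


if __name__ == "__main__":
    print(expunge_deletes_url())
    print(terms_duplicates_url())
```

The idea is to call the first URL once, wait for the commit to finish, and then fetch the second one. `pair_terms` assumes the JSON response lists each term followed by its count in one flat array, which is worth verifying on a small index before relying on it.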

On Tue, Jul 30, 2013 at 10:16 PM, Dotan Cohen <dotanco...@gmail.com> wrote:

> To search for duplicate IDs, I am running the following query:
> select?q=*:*&facet=true&facet.field=id&rows=0
>
> However, since upgrading from Solr 4.1 to Solr 4.3 I am receiving
> OutOfMemoryError errors instead of the desired facet:
>
> <response><lst name="error"><str
> name="msg">java.lang.OutOfMemoryError: Java heap space</str><str
> name="trace">java.lang.RuntimeException: java.lang.OutOfMemoryError:
> Java heap space
>         at
> org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:670)
>         at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:380)
>         at
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
>         at ...
>
> Might there be a less resource-intensive way to get this information.
> This is Solr 4.3 running on Ubuntu Server 12.04 in Jetty. The index
> has over 100,000,000 small records, for a total of about 95 GiB of
> disk space, with Solr running on it's own disk. Actually, the 'disk'
> is an Amazon Web Service EBS volume.
>
> --
> Dotan Cohen
>
> http://gibberish.co.il
> http://what-is-what.com
>

--
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>