Faceting on very high cardinality fields can use up memory, no doubt
about that. I think the entire delete question was a red herring, but
you know that already ;)....
So I think you can forget about the delete stuff. Although do note
that if you do re-index your old documents, the new version won't have
the field, and as segments are merged the deleted documents will have
all their resources reclaimed, effectively deleting the field from the
old docs.... So you could gradually re-index your corpus and get this
stuff out of there.
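If it helps, here's a rough SolrJ sketch of that gradual re-index. The ZooKeeper ensemble and field names are made up (the collection name is taken from your log line), and it only works if every field you care about is stored; otherwise you have to rebuild the docs from your original source:

// Rough sketch: copy stored fields into a new document, dropping the obsolete
// field, and re-add under the same id. Segment merges later reclaim the old versions.
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class ReindexWithoutField {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr"); // made-up ZK ensemble
        client.setDefaultCollection("UNCLASS");

        SolrQuery q = new SolrQuery("obsolete_field:[* TO *]"); // only docs still carrying the field
        q.setRows(500); // one page shown here; in practice walk the corpus with cursorMark

        for (SolrDocument doc : client.query(q).getResults()) {
            SolrInputDocument copy = new SolrInputDocument();
            for (String name : doc.getFieldNames()) {
                if (!"obsolete_field".equals(name) && !"_version_".equals(name)) {
                    copy.addField(name, doc.getFieldValues(name)); // getFieldValues keeps multi-valued fields intact
                }
            }
            client.add(copy); // same id, so the old version is deleted and reclaimed on merge
        }
        client.commit();
        client.close();
    }
}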
Best,
Erick
On Sat, May 30, 2015 at 5:18 AM, Joseph Obernberger
<j...@lovehorsepower.com> wrote:
Thank you Erick. I was thinking that it actually went through and removed
the index data; thank you for the clarification. What happened was I had
some bad data that created a lot of fields (some 8000). I was getting some
errors adding new fields where Solr could not talk to ZooKeeper, and I
thought it may be because there are so many fields. The index size is some
420 million docs.
I'm hesitant to try to re-create it, because when the shards crash they
leave a write.lock file in HDFS, and I need to manually delete that file
(on 27 machines) before bringing them back up.
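A rough sketch of sweeping those write.lock files out with the Hadoop FileSystem API instead of touching each machine by hand - the namenode address and index path below are guesses and would need to match the real solr.hdfs.home layout:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ClearWriteLocks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // guess; point at the real namenode

        FileSystem fs = FileSystem.get(conf);
        // Guessed layout under solr.hdfs.home; adjust the glob to the real index paths.
        FileStatus[] locks = fs.globStatus(new Path("/solr/UNCLASS/core_node*/data/index/write.lock"));
        if (locks != null) {
            for (FileStatus lock : locks) {
                System.out.println("Removing " + lock.getPath());
                fs.delete(lock.getPath(), false); // only safe while the Solr nodes are down
            }
        }
        fs.close();
    }
}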
I believe this is the stack trace - but it looks to be related to facets,
and I'm not 100% sure that this is the correct trace! Sorry - if it
happens again I will update.
ERROR - 2015-05-29 20:39:34.707; [UNCLASS shard9 core_node14 UNCLASS] org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: unable to create new native thread
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
    at java.lang.Thread.start0(Native Method)
    at java.lang.Thread.start(Thread.java:714)
    at java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:949)
    at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1371)
    at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:637)
    at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:280)
    at org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:106)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:222)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:143)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1984)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:829)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:446)
    ... 26 more
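Looking at the Caused by section - the OOM comes out of SimpleFacets.getFacetFieldCounts handing work to a ThreadPoolExecutor, which I believe only happens when facet.threads is set on the request, so faceting on many fields can try to grab many worker threads at once. A rough SolrJ sketch of that kind of request with facet.threads capped (the ZooKeeper ensemble and facet field names are made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;

public class FacetWithThreadCap {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr"); // made-up ZK ensemble
        client.setDefaultCollection("UNCLASS");

        SolrQuery q = new SolrQuery("*:*");
        q.setFacet(true);
        q.addFacetField("source_s", "author_s", "category_s"); // made-up facet fields
        q.set("facet.threads", 2); // cap the pool; 0 (the default) stays in the request thread,
                                   // a negative value allows one thread per facet field
        q.setFacetLimit(100);

        System.out.println(client.query(q).getFacetFields());
        client.close();
    }
}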
Then later:
ERROR - 2015-05-29 21:57:22.370; [UNCLASS shard9 core_node14 UNCLASS] org.apache.solr.common.SolrException; null:java.lang.RuntimeException: java.lang.OutOfMemoryError: Java heap space
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:854)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:463)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:220)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.OutOfMemoryError: Java heap space
-Joe
On 5/30/2015 12:32 AM, Erick Erickson wrote:
Yes, but deleting fields from the schema only means that _future_
documents using that field will throw an "undefined field" error. All the
documents currently in the index will retain that field.
Why you're hitting an OOM is a mystery, though. But deleting a field isn't
removing the contents of indexed documents. Showing us the full stack
trace when you hit an OOM would be helpful.
Best,
Erick
On Fri, May 29, 2015 at 4:58 PM, Joseph Obernberger
<j...@lovehorsepower.com> wrote:
Thank you Shawn - I'm referring to fields in the schema. With Solr 5, you
can delete fields from the schema.
https://cwiki.apache.org/confluence/display/solr/Schema+API#SchemaAPI-DeleteaField
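For what it's worth, those delete-field calls can also be driven from SolrJ - a rough sketch below. SchemaRequest.DeleteField may only exist in newer SolrJ releases; with an older client you would POST the equivalent {"delete-field":{"name":"..."}} JSON to /schema yourself. The ZooKeeper ensemble and field names here are made up:

import java.util.Arrays;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.schema.SchemaRequest;

public class DeleteObsoleteFields {
    public static void main(String[] args) throws Exception {
        CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr"); // made-up ZK ensemble
        client.setDefaultCollection("UNCLASS");

        List<String> obsolete = Arrays.asList("junk_field_1", "junk_field_2"); // made-up field names
        for (String name : obsolete) {
            new SchemaRequest.DeleteField(name).process(client);
            Thread.sleep(1000); // space the deletes out instead of firing thousands back to back
        }
        client.close();
    }
}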
-Joe
On 5/29/2015 7:30 PM, Shawn Heisey wrote:
On 5/29/2015 5:08 PM, Joseph Obernberger wrote:
Hi All - I have a lot of fields to delete, but noticed that once I
started deleting them, I quickly ran out of heap space. Is delete-field a
memory-intensive operation? Should I delete one field, wait a while, then
delete the next?
I'm not aware of a way to delete a field. I may have a different
definition of what a field is than you do, though.
Solr lets you delete entire documents, but deleting a field from the
entire index would involve re-indexing every document in the index,
excluding that field.
Can you be more specific about exactly what you are doing, what you are
seeing, and what you want to see instead?
Also, please be aware of this:
http://people.apache.org/~hossman/#threadhijack
Thanks,
Shawn