I tried to build a large model from about 1.2 million documents.
One of the nodes ran out of memory and killed itself. Is that too much
data to use for this? The nodes each have 16 GB of heap. I'm happy to
increase it, but I'm not sure whether that is possible.
Thank you!
-Joe
On 4/5/2018 10:24 AM, Joe Obernberger wrote:
Thank you, Shawn - sorry it took so long to respond; I've been playing
around with this a good bit. It is an amazing capability. The problem
looks like it could be related to certain nodes in the cluster not
responding quickly enough. In one case, I got a
java.util.concurrent.ExecutionException, but it looks like the root
cause was a SocketTimeoutException. I'm using HDFS for the index, and
it gets hit pretty hard by other running processes, so I'm wondering
if that's causing this.
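For anyone wanting to pull the innermost cause out of a wrapped exception like this in code, here is a minimal sketch. The class and method names (RootCause, rootCause) are made up for illustration, and the chain built in main only imitates the shape of the trace that follows; the SolrServerException layer is left out so the example needs nothing beyond the JDK.

```java
import java.io.IOException;
import java.net.SocketTimeoutException;
import java.util.concurrent.ExecutionException;

// Hypothetical helper, not part of SolrJ.
final class RootCause {

    // Follow getCause() links until the innermost exception is reached.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null && current.getCause() != current) {
            current = current.getCause();
        }
        return current;
    }

    public static void main(String[] args) {
        // Imitation of the wrapped chain (messages shortened, Solr layer omitted).
        Throwable chain = new IOException(
                new ExecutionException(
                        new IOException("params expr=...",
                                new SocketTimeoutException("Read timed out"))));

        Throwable root = rootCause(chain);
        // prints "java.net.SocketTimeoutException: Read timed out"
        System.out.println(root.getClass().getName() + ": " + root.getMessage());
    }
}
```

Logging the root cause next to the outer exception makes it easier to tell a slow node from a genuine failure in the expression itself.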
java.io.IOException: java.util.concurrent.ExecutionException:
java.io.IOException: params
expr=update(models,+batchSize%3D"50",train(MODEL1033_1522883727011,features(MODEL1033_1522883727011,q%3D"*:*",featureSet%3D"FSet_MODEL1033_1522883727011",field%3D"Text",outcome%3D"out_i",positiveLabel%3D1,numTerms%3D1000),q%3D"*:*",name%3D"MODEL1033",field%3D"Text",outcome%3D"out_i",maxIterations%3D"1000"))&qt=/stream&explain=true&q=*:*&fl=id&sort=id+asc&distrib=false
at org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:405)
at org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(CloudSolrStream.java:275)
at com.ngc.bigdata.ie_solrmodelbuilder.SolrModelBuilderProcessor.doWork(SolrModelBuilderProcessor.java:114)
at com.ngc.intelenterprise.intelentutil.utils.Processor.run(Processor.java:140)
at com.ngc.intelenterprise.intelentutil.jms.IntelEntQueueProc.process(IntelEntQueueProc.java:208)
at org.apache.camel.processor.DelegateSyncProcessor.process(DelegateSyncProcessor.java:63)
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77)
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:460)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190)
at org.apache.camel.component.direct.DirectProducer.process(DirectProducer.java:62)
at org.apache.camel.processor.SendProcessor.process(SendProcessor.java:141)
at org.apache.camel.management.InstrumentationProcessor.process(InstrumentationProcessor.java:77)
at org.apache.camel.processor.RedeliveryErrorHandler.process(RedeliveryErrorHandler.java:460)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190)
at org.apache.camel.processor.CamelInternalProcessor.process(CamelInternalProcessor.java:190)
at org.apache.camel.component.jms.EndpointMessageListener.onMessage(EndpointMessageListener.java:114)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doInvokeListener(AbstractMessageListenerContainer.java:699)
at org.springframework.jms.listener.AbstractMessageListenerContainer.invokeListener(AbstractMessageListenerContainer.java:637)
at org.springframework.jms.listener.AbstractMessageListenerContainer.doExecuteListener(AbstractMessageListenerContainer.java:605)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.doReceiveAndExecute(AbstractPollingMessageListenerContainer.java:308)
at org.springframework.jms.listener.AbstractPollingMessageListenerContainer.receiveAndExecute(AbstractPollingMessageListenerContainer.java:246)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.invokeListener(DefaultMessageListenerContainer.java:1144)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.executeOngoingLoop(DefaultMessageListenerContainer.java:1136)
at org.springframework.jms.listener.DefaultMessageListenerContainer$AsyncMessageListenerInvoker.run(DefaultMessageListenerContainer.java:1033)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException:
java.io.IOException: params
expr=update(models,+batchSize%3D"50",train(MODEL1033_1522883727011,features(MODEL1033_1522883727011,q%3D"*:*",featureSet%3D"FSet_MODEL1033_1522883727011",field%3D"Text",outcome%3D"out_i",positiveLabel%3D1,numTerms%3D1000),q%3D"*:*",name%3D"MODEL1033",field%3D"Text",outcome%3D"out_i",maxIterations%3D"1000"))&qt=/stream&explain=true&q=*:*&fl=id&sort=id+asc&distrib=false
at java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.util.concurrent.FutureTask.get(FutureTask.java:192)
at org.apache.solr.client.solrj.io.stream.CloudSolrStream.openStreams(CloudSolrStream.java:399)
... 27 more
Caused by: java.io.IOException: params
expr=update(models,+batchSize%3D"50",train(MODEL1033_1522883727011,features(MODEL1033_1522883727011,q%3D"*:*",featureSet%3D"FSet_MODEL1033_1522883727011",field%3D"Text",outcome%3D"out_i",positiveLabel%3D1,numTerms%3D1000),q%3D"*:*",name%3D"MODEL1033",field%3D"Text",outcome%3D"out_i",maxIterations%3D"1000"))&qt=/stream&explain=true&q=*:*&fl=id&sort=id+asc&distrib=false
at org.apache.solr.client.solrj.io.stream.SolrStream.open(SolrStream.java:115)
at org.apache.solr.client.solrj.io.stream.CloudSolrStream$StreamOpener.call(CloudSolrStream.java:510)
at org.apache.solr.client.solrj.io.stream.CloudSolrStream$StreamOpener.call(CloudSolrStream.java:499)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
... 3 more
Caused by: org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://leda:9100/solr/MODEL1033_1522883727011_shard20_replica_n74
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:637)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:253)
at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:242)
at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
at org.apache.solr.client.solrj.io.stream.SolrStream.constructParser(SolrStream.java:269)
at org.apache.solr.client.solrj.io.stream.SolrStream.open(SolrStream.java:113)
... 7 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at org.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:139)
at org.apache.http.impl.io.SessionInputBufferImpl.fillBuffer(SessionInputBufferImpl.java:155)
at org.apache.http.impl.io.SessionInputBufferImpl.readLine(SessionInputBufferImpl.java:284)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:138)
at org.apache.http.impl.conn.DefaultHttpResponseParser.parseHead(DefaultHttpResponseParser.java:56)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:261)
at org.apache.http.impl.DefaultBHttpClientConnection.receiveResponseHeader(DefaultBHttpClientConnection.java:165)
at org.apache.http.impl.conn.CPoolProxy.receiveResponseHeader(CPoolProxy.java:165)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:272)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:124)
at org.apache.http.impl.execchain.MainClientExec.execute(MainClientExec.java:272)
at org.apache.http.impl.execchain.ProtocolExec.execute(ProtocolExec.java:185)
at org.apache.http.impl.execchain.RetryExec.execute(RetryExec.java:89)
at org.apache.http.impl.execchain.RedirectExec.execute(RedirectExec.java:111)
at org.apache.http.impl.client.InternalHttpClient.doExecute(InternalHttpClient.java:185)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:83)
at org.apache.http.impl.client.CloseableHttpClient.execute(CloseableHttpClient.java:56)
at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:525)
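The expr parameter in the trace above is URL-encoded (%3D for "=", "+" for a space), which makes the streaming expression hard to read. A small sketch for decoding it with the JDK follows; the class name DecodeExpr is made up for illustration, and the string is copied from the trace.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper for reading the encoded expr value from the log.
final class DecodeExpr {

    // The expr value exactly as it appears in the trace.
    static final String EXPR =
            "update(models,+batchSize%3D\"50\",train(MODEL1033_1522883727011,"
            + "features(MODEL1033_1522883727011,q%3D\"*:*\","
            + "featureSet%3D\"FSet_MODEL1033_1522883727011\",field%3D\"Text\","
            + "outcome%3D\"out_i\",positiveLabel%3D1,numTerms%3D1000),"
            + "q%3D\"*:*\",name%3D\"MODEL1033\",field%3D\"Text\","
            + "outcome%3D\"out_i\",maxIterations%3D\"1000\"))";

    // Decode percent-escapes and '+' using UTF-8.
    static String decode(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decode(EXPR));
    }
}
```

Decoded, the request is an update(models, batchSize="50", ...) wrapping train(...), which in turn wraps features(...) with numTerms=1000 and maxIterations="1000".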
-Joe
On 4/2/2018 7:09 PM, Shawn Heisey wrote:
On 4/2/2018 1:55 PM, Joe Obernberger wrote:
The training data was split across 20 shards - specifically created
with:
http://icarus.querymasters.com:9100/solr/admin/collections?action=CREATE&name=MODEL1024_1522696624083&numShards=20&replicationFactor=2&maxShardsPerNode=5&collection.configName=TRAINING
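As a quick sanity check on that CREATE call, the arithmetic below works out how many replicas it asks for and the smallest number of nodes that can hold them. It is only a sketch: it assumes maxShardsPerNode caps the number of replicas of this collection placed on any one node, and the class name ReplicaMath is made up for illustration.

```java
// Hypothetical helper; the numbers come from the CREATE URL above.
final class ReplicaMath {

    // Each shard gets replicationFactor copies.
    static int totalReplicas(int numShards, int replicationFactor) {
        return numShards * replicationFactor;
    }

    // Smallest node count that fits all replicas under the per-node cap
    // (integer ceiling division).
    static int minNodes(int totalReplicas, int maxShardsPerNode) {
        return (totalReplicas + maxShardsPerNode - 1) / maxShardsPerNode;
    }

    public static void main(String[] args) {
        int replicas = totalReplicas(20, 2);
        System.out.println("replicas = " + replicas);               // replicas = 40
        System.out.println("min nodes = " + minNodes(replicas, 5)); // min nodes = 8
    }
}
```

With fewer live nodes than that at creation time, the placement could not have satisfied the cap, which would be one thing to rule out.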
Any ideas? The complete error is:
<snip>
<body><h2>HTTP ERROR 404</h2>
<p>Problem accessing
/solr/MODEL1024_1522696624083_shard20_replica_n75/select. Reason:
<pre> Not Found</pre></p>
</body>
I'll warn you in advance that I know nothing at all about the learning
to rank functionality. I'm replying about the underlying error you're
getting, independent of what your query is trying to accomplish.
It's a 404 error, trying to access the URL mentioned above.
The error doesn't indicate exactly WHAT wasn't found. It could either
be the core named "MODEL1024_1522696624083_shard20_replica_n75" or the
"/select" handler on that core. That's something you need to figure
out. It could be that the core *does* exist, but for some reason, Solr
on that machine was unable to start it.
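One way to tell those two cases apart is to ask that Solr instance about the core through the CoreAdmin STATUS action. The sketch below only builds the URL; the host and port (vesta:9100) are taken from the error as read in this thread, and the class name CoreStatusUrl is made up for illustration. Fetch the result with a browser or curl.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper that builds a CoreAdmin STATUS request URL.
final class CoreStatusUrl {

    static String build(String hostAndPort, String coreName) {
        try {
            return "http://" + hostAndPort + "/solr/admin/cores?action=STATUS&core="
                    + URLEncoder.encode(coreName, "UTF-8") + "&wt=json";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Host and core name as they appear in this thread.
        System.out.println(build("vesta:9100",
                "MODEL1024_1522696624083_shard20_replica_n75"));
    }
}
```

If the status entry for the core comes back empty, the core is probably not loaded on that node, and any initFailures listed in the response would point at a core that exists but failed to start. If the core is listed normally, the missing piece is more likely the /select handler.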
The solr.log file on the Solr instance that returned the error (which
seems to be on the machine named vesta, answering to port 9100) may have
more detail for the error, or some additional error messages.
Normally SolrCloud is good at making sure that requests aren't sent to
resources that aren't working. So I'm not sure why this happened.
Are there other errors or warnings in the solr.log file, either on the
instance where you sent your request, or the instance that returned the
404 error?
Thanks,
Shawn