Architecture: How to commercialise a Solr based application?

2017-08-06 Thread Paul Smith Parker
Hello,

I am building a search application based on a single-core Solr 6.6 server, with 
an Angular frontend.
Between the frontend and the Solr server I am thinking of using a Java backend 
(to avoid exposing Solr endpoints directly to the frontend).

I would like to package all those components and commercialise the final 
product.

Do you have any advice on what technology I should use to build this final 
product?

I would do the installation at the customer's premises, including data import, 
maintenance and support.

Ideally, I would like the customer to access only the frontend and never access 
the Solr configuration files nor call the Solr endpoints directly.

Initially I thought of delivering a Linux-based VM, but that seems a bit too 
heavy.
Another idea is to create a Docker container with all the components.

In any case I need some kind of licensing mechanism that prevents the customer 
from installing/running an arbitrary number of instances (the commercial model 
is based on a pay-per-installation approach).

I know this is not Solr specific, but I was wondering if you could share your 
experience on how to commercialise a Solr-based application.

Any help is much appreciated.

Thank you,
Paul





SOLR Metric Reporting to graphite

2017-08-06 Thread abhi Abhishek
Hi All,
I am trying to set up the Graphite reporter for Solr 6.5.0. I've started
a sample Docker instance for Graphite with statsd
(https://github.com/hopsoft/docker-graphite-statsd).

I've also added the Graphite metrics reporter in the solr.xml config of the
collection. However, after doing this I don't see any data getting posted to
Graphite (https://cwiki.apache.org/confluence/display/solr/Metrics+Reporting).
XML config added to solr.xml:

 <metrics>
  <reporter name="graphite"
            class="org.apache.solr.metrics.reporters.SolrGraphiteReporter">
    <str name="host">localhost</str>
    <int name="port">2003</int>
    <int name="period">1</int>
  </reporter>
 </metrics>
Graphite mapped ports:

Host  Container  Service
80    80         nginx
2003  2003       carbon receiver - plaintext
2004  2004       carbon receiver - pickle
2023  2023       carbon aggregator - plaintext
2024  2024       carbon aggregator - pickle
8125  8125       statsd
8126  8126       statsd admin

Please advise if I am doing something wrong here.

Thanks,
Abhishek


Re: SOLR Metric Reporting to graphite

2017-08-06 Thread Amrit Sarkar
Hi,

I haven't had a chance to go through the steps you are doing, but I followed
the approach written by Varun Thacker for InfluxDB:
https://github.com/vthacker/solr-metrics-influxdb, and it works fine. Maybe
it can be of some help.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2

On Sun, Aug 6, 2017 at 9:47 PM, abhi Abhishek  wrote:

> Hi All,
> I am trying to set up the Graphite reporter for Solr 6.5.0. I've started
> a sample Docker instance for Graphite with statsd
> (https://github.com/hopsoft/docker-graphite-statsd).
>
> I've also added the Graphite metrics reporter in the solr.xml config of the
> collection. However, after doing this I don't see any data getting posted to
> Graphite (https://cwiki.apache.org/confluence/display/solr/Metrics+Reporting).
> XML config added to solr.xml:
>
>  <metrics>
>   <reporter name="graphite"
>             class="org.apache.solr.metrics.reporters.SolrGraphiteReporter">
>     <str name="host">localhost</str>
>     <int name="port">2003</int>
>     <int name="period">1</int>
>   </reporter>
>  </metrics>
>
> Graphite mapped ports:
>
> Host  Container  Service
> 80    80         nginx
> 2003  2003       carbon receiver - plaintext
> 2004  2004       carbon receiver - pickle
> 2023  2023       carbon aggregator - plaintext
> 2024  2024       carbon aggregator - pickle
> 8125  8125       statsd
> 8126  8126       statsd admin
>
> Please advise if I am doing something wrong here.
>
> Thanks,
> Abhishek
>


Solr 6.6: Configure number of indexing threads

2017-08-06 Thread Nawab Zada Asad Iqbal
Hi

I have switched between the solr and lucene user lists while debugging this
issue (details in the following thread). My current hypothesis is that, since a
large number of indexing threads are being created (the maxIndexingThreads
config is now obsolete), each output segment is really small. Reference:
https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-6659


Is there any config in Solr 6.6 to control this?
If not, why was the old config considered useless?

Thanks
Nawab

---------- Forwarded message ---------
From: Nawab Zada Asad Iqbal 
Date: Sun, Aug 6, 2017 at 8:25 AM
Subject: Re: Understanding flush and DocumentsWriterPerThread
To: 


I think I am hitting this problem. Since maxIndexingThreads is not used
anymore, I see 330+ indexing threads (in the attached log: "334 in-use
non-flushing threads states").

The bugfix recommends using custom code to control concurrency in
IndexWriter; how can I configure that with Solr 6.6?
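As far as I understand, each concurrent indexing request gets its own
DocumentsWriterPerThread, so for now I can only cap concurrency on the client
side. A sketch of what I mean (assuming SolrJ; the URL, queue size and thread
count are made-up values):

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class ThrottledIndexer {
    public static void main(String[] args) throws Exception {
        // withThreadCount caps client-side indexing concurrency, which in
        // turn caps how many DWPTs (in-memory segments) Solr keeps open
        // for this client.
        ConcurrentUpdateSolrClient client = new ConcurrentUpdateSolrClient
                .Builder("http://localhost:8983/solr/mycore")
                .withQueueSize(10000)
                .withThreadCount(8)
                .build();

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        client.add(doc);

        client.blockUntilFinished();
        client.close();
    }
}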


On Sat, Aug 5, 2017 at 12:59 PM, Nawab Zada Asad Iqbal 
wrote:

> Hi,
>
> I am debugging a bulk indexing performance issue while upgrading to 6.6
> from 4.5.0. I have commits disabled while indexing a total of 85G of data
> over 7 hours. At the end of it, I want some 30 or so big segments, but I
> am getting 3000 segments.
> I deleted the index and enabled infoStream logging; I have attached the
> log from when the first segment is flushed. Here are a few questions:
>
> 1. When a segment is flushed, is it permanent, or can more documents
> be written to it (besides the merge scenario)?
> 2. It seems that 330+ threads are writing in parallel. Will each one of
> them become one segment when written to the disk? In which case, I should
> probably decrease concurrency?
> 3. One possibility is to delay flushing. The flush is getting triggered at
> 1MB, probably coming from <ramBufferSizeMB>1</ramBufferSizeMB>; however, the
> segment which is flushed is only 115MB. Is this limit for the combined size
> of all in-memory segments? In which case, is it OK to increase it further to
> use more of my heap (48GB)?
> 4. How can I decrease the concurrency? Maybe the solution is to use fewer
> in-memory segments?
>
> In a previous run, there were 110k files in the index folder after I
> stopped indexing. Before doing the commit, I noticed that the file count
> continued to decrease every few minutes, until it reduced to 27k or so. (I
> committed after it stabilized.)
>
>
> My indexConfig is this:
>
>   <indexConfig>
>     <maxBufferedDocs>1000</maxBufferedDocs>
>     <ramBufferSizeMB>1</ramBufferSizeMB>
>     <maxIndexingThreads>10</maxIndexingThreads>
>     <useCompoundFile>false</useCompoundFile>
>     <writeLockTimeout>1</writeLockTimeout>
>     <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
>       <int name="maxMergeAtOnce">5</int>
>       <double name="maxMergedSegmentMB">3000</double>
>       <int name="segmentsPerTier">10</int>
>       <double name="floorSegmentMB">16</double>
>       <int name="maxMergeAtOnceExplicit">20</int>
>       <double name="reclaimDeletesWeight">1</double>
>     </mergePolicyFactory>
>     <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
>       <int name="maxMergeCount">10</int>
>       <int name="maxThreadCount">10</int>
>     </mergeScheduler>
>     <lockType>${solr.lock.type:native}</lockType>
>     <unlockOnStartup>true</unlockOnStartup>
>     <deletionPolicy class="solr.SolrDeletionPolicy">
>       <str name="maxCommitsToKeep">1</str>
>       <str name="maxOptimizedCommitsToKeep">0</str>
>     </deletionPolicy>
>     <infoStream>true</infoStream>
>     <checkIntegrityAtMerge>false</checkIntegrityAtMerge>
>   </indexConfig>
>
>
> Thanks
> Nawab
>
>
>


Re: solr hangs

2017-08-06 Thread hawk....@139.com
Hi Erick,

I am using the RESTful API directly. In our application, the system issues the 
HTTP request directly to Solr.

<autoCommit>
   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
   <maxDocs>1</maxDocs>
   <openSearcher>true</openSearcher>
</autoCommit>


Thanks
Hawk



> On 6 Aug 2017, at 11:10 AM, Erick Erickson  wrote:
> 
> How are you updating 50K docs? SolrJ? If so are you using
> CloudSolrClient? What are your commit settings? Details matter.
> 
> Best,
> Erick
> 
> On Sat, Aug 5, 2017 at 6:19 PM, hawk  wrote:
>> Hi All,
>> 
>> I have encountered a problem with Solr. In our environment, we set up 2 Solr 
>> nodes, and every hour we send update requests to Solr to update the documents; 
>> the total document count is around 50k. From time to time, Solr hangs and 
>> the client encounters a timeout.
>> 
>> below is the exception in Solr log.
>> 
>> 2017-08-06 07:28:03.682 ERROR (qtp401424608-31250) [c:taoke s:shard2 
>> r:core_node2 x:taoke_shard2_replica2] o.a.s.s.HttpSolrCall 
>> null:java.io.IOException: java.util.concurrent.TimeoutException: Idle 
>> timeout expired: 5/5 ms
>>at 
>> org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:219)
>>at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:220)
>>at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:583)
>>at 
>> org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:55)
>> 
>> 
>> 
>> Thanks
> 



Re: solr hangs

2017-08-06 Thread Erick Erickson
You have several possibilities here:
1> you're hitting a massive GC pause that's timing out. You can turn
on GC logging and analyze if that's the case.
2> your updates are getting backed up. At some point it's possible
that the index writer blocks until merges are done IIUC.

Does this ever happen if you throttle your updates? Does it go away if
you batch your documents in batches of, say, 1,000?
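For example, with SolrJ it could look something like this (an untested sketch;
the ZooKeeper address is a placeholder, and "taoke" is the collection name
from your logs):

import java.util.ArrayList;
import java.util.List;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class BatchedUpdater {
    public static void main(String[] args) throws Exception {
        // One request per 1,000 documents instead of one request per document.
        CloudSolrClient client =
                new CloudSolrClient.Builder().withZkHost("zk1:2181").build();
        client.setDefaultCollection("taoke");

        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 50_000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", Integer.toString(i));
            batch.add(doc);
            if (batch.size() == 1000) {
                client.add(batch);   // one network round trip per batch, no commit
                batch.clear();
            }
        }
        if (!batch.isEmpty()) {
            client.add(batch);
        }
        client.close();              // let autoCommit handle the actual commits
    }
}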

Best,
Erick

On Sun, Aug 6, 2017 at 5:19 PM, hawk@139.com  wrote:
> Hi Erick,
>
> I am using the RESTful API directly. In our application, the system issues 
> the HTTP request directly to Solr.
>
> <autoCommit>
>    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>    <maxDocs>1</maxDocs>
>    <openSearcher>true</openSearcher>
> </autoCommit>
>
>
> Thanks
> Hawk
>
>
>
>> On 6 Aug 2017, at 11:10 AM, Erick Erickson  wrote:
>>
>> How are you updating 50K docs? SolrJ? If so are you using
>> CloudSolrClient? What are your commit settings? Details matter.
>>
>> Best,
>> Erick
>>
>> On Sat, Aug 5, 2017 at 6:19 PM, hawk  wrote:
>>> Hi All,
>>>
>>> I have encountered a problem with Solr. In our environment, we set up 2 Solr 
>>> nodes, and every hour we send update requests to Solr to update the documents; 
>>> the total document count is around 50k. From time to time, Solr hangs and 
>>> the client encounters a timeout.
>>>
>>> below is the exception in Solr log.
>>>
>>> 2017-08-06 07:28:03.682 ERROR (qtp401424608-31250) [c:taoke s:shard2 
>>> r:core_node2 x:taoke_shard2_replica2] o.a.s.s.HttpSolrCall 
>>> null:java.io.IOException: java.util.concurrent.TimeoutException: Idle 
>>> timeout expired: 5/5 ms
>>>at 
>>> org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:219)
>>>at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:220)
>>>at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:583)
>>>at 
>>> org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:55)
>>>
>>>
>>>
>>> Thanks
>>
>


Re: solr hangs

2017-08-06 Thread hawk....@139.com
We found the problem is caused by the delete command. The request is used to 
delete documents by id.

 url --> http://10.91.1.120:8900/solr/taoke/update?&commit=true&wt=json
body --> {"delete":["20ec36ade0ca4da3bcd78269e2300f6f"]}

When we send over 3000 such requests, Solr starts to give OOM exceptions.

We have now changed the logic to put all the ids in one array, and Solr seems 
to work without any exception.

Not sure whether Solr internally optimizes the delete.
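For reference, the batched form looks roughly like this (the extra ids here
are placeholders):

 url --> http://10.91.1.120:8900/solr/taoke/update?&commit=true&wt=json
body --> {"delete":["20ec36ade0ca4da3bcd78269e2300f6f", "<id2>", "<id3>", ...]}

so one request deletes many documents instead of one request per document.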

Thanks
Hawk

> On 7 Aug 2017, at 9:20 AM, Erick Erickson  wrote:
> 
> You have several possibilities here:
> 1> you're hitting a massive GC pause that's timing out. You can turn
> on GC logging and analyze if that's the case.
> 2> your updates are getting backed up. At some point it's possible
> that the index writer blocks until merges are done IIUC.
> 
> Does this ever happen if you throttle your updates? Does it go away if
> you batch your documents in batches of, say, 1,000?
> 
> Best,
> Erick
> 
> On Sun, Aug 6, 2017 at 5:19 PM, hawk@139.com  wrote:
>> Hi Erick,
>> 
>> I am using the RESTful API directly. In our application, the system issues 
>> the HTTP request directly to Solr.
>> 
>> <autoCommit>
>>    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>>    <maxDocs>1</maxDocs>
>>    <openSearcher>true</openSearcher>
>> </autoCommit>
>> 
>> 
>> Thanks
>> Hawk
>> 
>> 
>> 
>>> On 6 Aug 2017, at 11:10 AM, Erick Erickson  wrote:
>>> 
>>> How are you updating 50K docs? SolrJ? If so are you using
>>> CloudSolrClient? What are your commit settings? Details matter.
>>> 
>>> Best,
>>> Erick
>>> 
>>> On Sat, Aug 5, 2017 at 6:19 PM, hawk  wrote:
 Hi All,
 
 I have encountered a problem with Solr. In our environment, we set up 2 
 Solr nodes, and every hour we send update requests to Solr to update the 
 documents; the total document count is around 50k. From time to time, Solr 
 hangs and the client encounters a timeout.
 
 below is the exception in Solr log.
 
 2017-08-06 07:28:03.682 ERROR (qtp401424608-31250) [c:taoke s:shard2 
 r:core_node2 x:taoke_shard2_replica2] o.a.s.s.HttpSolrCall 
 null:java.io.IOException: java.util.concurrent.TimeoutException: Idle 
 timeout expired: 5/5 ms
   at 
 org.eclipse.jetty.util.SharedBlockingCallback$Blocker.block(SharedBlockingCallback.java:219)
   at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:220)
   at org.eclipse.jetty.server.HttpOutput.write(HttpOutput.java:583)
   at 
 org.apache.commons.io.output.ProxyOutputStream.write(ProxyOutputStream.java:55)
 
 
 
 Thanks
>>> 
>> 
> 




Re: solr hangs

2017-08-06 Thread Shawn Heisey
On 8/6/2017 10:29 PM, hawk@139.com wrote:
> We found the problem is caused by the delete command. The request is used to 
> delete documents by id.
>
> url --> http://10.91.1.120:8900/solr/taoke/update?&commit=true&wt=json
> body --> {"delete":["20ec36ade0ca4da3bcd78269e2300f6f"]}
>
> When we send over 3000 such requests, Solr starts to give OOM exceptions.

Do you have the full exception with stacktrace for those OOMs?  I'm
curious exactly what resource ran out, whether it was heap or something
else.

When Java programs throw OOME, the program's execution is usually
completely unpredictable from that point on, because something that the
program tried to do did not happen.  Whatever the program tries to do
next probably depends on the action that failed.  This unpredictability
is why Solr on non-Windows systems will self-terminate when OOME is
encountered; it's the only safe action to take.  There is an open issue
to bring the same self-termination on OOME to Solr running on Windows.

If the OOME was due to heap space, there are exactly two ways to deal
with that.  You can find info about it here:

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Thanks,
Shawn



Re: solr hangs

2017-08-06 Thread hawk....@139.com
Below is the OOM exception.

2017-08-07 12:45:48.446 WARN  (qtp33524623-4275) [c:taoke s:shard2 r:core_node4 x:taoke_shard2_replica1] o.e.j.u.t.QueuedThreadPool
java.lang.OutOfMemoryError: unable to create new native thread
        at java.lang.Thread.start0(Native Method)
        at java.lang.Thread.start(Thread.java:714)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.startThreads(QueuedThreadPool.java:475)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.access$200(QueuedThreadPool.java:48)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:579)
        at java.lang.Thread.run(Thread.java:745)


The timeout exception is thrown as well.

2017-08-07 13:20:20.336 ERROR (qtp33524623-2666) [c:taoke s:shard2 r:core_node4 x:taoke_shard2_replica1] o.a.s.h.RequestHandlerBase
org.apache.solr.update.processor.DistributedUpdateProcessor$DistributedUpdatesAsyncException: Async exception during distributed update: Read timed out
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doFinish(DistributedUpdateProcessor.java:973)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.finish(DistributedUpdateProcessor.java:1912)
        at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:182)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:78)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2440)
        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:298)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.server.Server.handle(Server.java:534)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
        at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)

Thanks
Hawk

> On 7 Aug 2017, at 2:21 PM, Shawn Heisey  wrote:
> 
> On 8/6/2017 10:29 PM, hawk@139.com wrote:
>> We found the problem is caused by the delete command. The request is used 
>> to delete documents by id.
>> 
>> url --> http://10.91.1.120:8900/solr/taoke/update?&commit=true&wt=json
>> body --> {"delete":["20ec36ade0ca4da3bcd78269e2300f6f"]}
>> 
>> When we send over 3000 such requests, Solr starts to give OOM exceptions.
> 
> Do you have the full exception with stacktrace for those OOMs?  I'm
> curious exactly what resource ran out, whether it was heap or something
> else.