Re: Concurrent Indexing and Searching in Solr.

Erick Erickson Fri, 07 Aug 2015 15:38:07 -0700

bq: So, How much minimum concurrent threads should I run?

I really can't answer that in the abstract, you'll simply have to
test.
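
One way to run that test, as a rough sketch rather than a recommendation, is to
sweep over a few thread counts and compare throughput. The names
measure_throughput, sweep and task are made up for illustration; task stands
for whatever callable sends one indexing batch or one query to your cluster.
Nothing in the sketch is Solr-specific.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def measure_throughput(task, items, workers):
    """Run task(item) for every item on `workers` threads; return items/second."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all work and re-raises any exception from a worker.
        results = list(pool.map(task, items))
    elapsed = time.perf_counter() - start
    return len(results) / elapsed if elapsed > 0 else float("inf")


def sweep(task, items, worker_counts=(1, 2, 4, 8, 16)):
    """Measure throughput at each thread count so the results can be compared."""
    return {n: measure_throughput(task, items, n) for n in worker_counts}
```

Run the sweep against a realistic query load at the same time, since the point
is the balance between indexing and searching, not indexing speed alone.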


I'd prefer SolrJ to post.jar. If you're not going to SolrJ, I'd imagine that
moving from Python to post.jar isn't all that useful.

But before you do anything, see what really happens when you remove the
commit=true. That's likely way more important than the rest.
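
For a Python script, a minimal sketch of the change might look like the
following, assuming Python 3 (urllib.request rather than urllib2) and the
update URL from the original post. The helper names batches,
build_update_request and index_all are hypothetical, and the batch size of
1,000 is only a starting point. The request carries no commit=true, so the
autoCommit and autoSoftCommit settings decide when documents become visible.

```python
import json
import urllib.request
from itertools import islice


def batches(docs, size=1000):
    """Yield lists of up to `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_update_request(update_url, docs):
    """Build one POST carrying a whole batch as a JSON array (no commit=true)."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def index_all(update_url, docs, size=1000, send=urllib.request.urlopen):
    """Send every batch in turn; return the number of documents sent."""
    sent = 0
    for chunk in batches(docs, size):
        with send(build_update_request(update_url, chunk)) as resp:
            resp.read()  # drain the response; urlopen raises on HTTP errors
        sent += len(chunk)
    return sent
```

Usage would be along the lines of
index_all("http://localhost:8983/solr/test_commit_fast/update/json", docs).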

Best,
Erick

On Fri, Aug 7, 2015 at 3:15 PM, Nitin Solanki <nitinml...@gmail.com> wrote:
> Hi Erick,
>                 posting files to Solr via curl =>
> Rather than posting files via curl, which is better: SolrJ or post.jar? I
> don't use either. I wrote a Python script for indexing that uses urllib and
> urllib2 to send data over HTTP. I don't have the option of using SolrJ right
> now. How can I do the same thing via post.jar from Python? Any help, please.
>
> indexing with 100 threads is going to eat up a lot of CPU cycles
> => So, How much minimum concurrent threads should I run? And I also need
> concurrent searching. So, How much?
>
> And thanks for the Solr 5.2 link, I will go through it. Thanks for the reply.
> Please help me.
>
> On Fri, Aug 7, 2015 at 11:51 PM Erick Erickson <erickerick...@gmail.com>
> wrote:
>
>> bq: How much limitations does Solr has related to indexing and searching
>> simultaneously? It means that how many simultaneously calls, I made for
>> searching and indexing once?
>>
>> None a priori. It all depends on the hardware you're throwing at it.
>> Obviously indexing with 100 threads is going to eat up a lot of CPU cycles
>> that can't then be devoted to satisfying queries. You need to strike a
>> balance. Do seriously consider using some other method than posting files
>> to Solr via curl or the like, that's rarely a robust solution for
>> production.
>>
>> As for adding the commit=true, this shouldn't be affecting the index size,
>> I suspect you were misled by something else happening.
>>
>> Really, remove it or you'll beat up your system hugely. As for the soft
>> commit interval, that's totally irrelevant when you're committing every
>> document. But do lengthen it as much as you can. Most of the time when
>> people say "real time", it turns out that 10 seconds is OK. Or 60 seconds
>> is OK. You have to check what the _real_ requirement is, it's often not
>> what's stated.
>>
>> bq: I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding
>> indexing and searching data.
>>
>> Did you read the link I provided? With replicas, 5.2 will index almost
>> twice as fast. That means (roughly) half the work on the followers is being
>> done, freeing up cycles for performing queries.
>>
>> Best,
>> Erick
>>
>>
>> On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki <nitinml...@gmail.com>
>> wrote:
>> > Hi Erick,
>> >               You said that soft commit should be more than 3000 ms.
>> > Actually, I need real-time searching, and that's why I need a fast soft
>> > commit.
>> >
>> > commit=true => I set commit=true because it reduces my indexed data size
>> > from 1.5GB to 500MB on *each shard*. With commit=false, my indexed data
>> > size was 1.5GB. After changing to commit=true, the size dropped to 500MB.
>> > I don't understand why.
>> >
>> > I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding
>> > indexing and searching data.
>> >
>> > How much limitations does Solr has related to indexing and searching
>> > simultaneously? It means that how many simultaneously calls, I made for
>> > searching and indexing once?
>> >
>> >
>> > On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson <erickerick...@gmail.com>
>> > wrote:
>> >
>> >> Your soft commit time of 3 seconds is quite aggressive,
>> >> I'd lengthen it to as long as possible.
>> >>
>> >> Ugh, looked at your query more closely. Adding commit=true to every
>> >> update request is horrible performance-wise. Letting your autocommit
>> >> process handle the commits is the first thing I'd do. Second, I'd try
>> >> going to SolrJ and batching up documents (I usually start with 1,000) or
>> >> using the post.jar tool rather than sending them via a raw URL.
>> >>
>> >> I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what
>> >> version of Solr? There was a 2x speedup in Solr 5.2, see:
>> >>
>> >> http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
>> >>
>> >> One symptom was that the followers were doing waaaaay more work than the
>> >> leader (BTW, using master/slave when talking SolrCloud is a bit
>> >> confusing...) which will affect query response rates.
>> >>
>> >> Basically, if query response is paramount, you really need to throttle
>> >> your indexing, there's just a whole lot of work going on here.
>> >>
>> >> Best,
>> >> Erick
>> >>
>> >> On Fri, Aug 7, 2015 at 11:23 AM, Upayavira <u...@odoko.co.uk> wrote:
>> >> > How many CPUs do you have? 100 concurrent indexing calls seems like
>> >> > rather a lot. You're gonna end up doing a lot of context switching,
>> >> > hence degraded performance. Dunno what others would say, but I'd aim
>> >> > for approx one indexing thread per CPU.
>> >> >
>> >> > Upayavira
>> >> >
>> >> > On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:
>> >> >> Hello Everyone,
>> >> >>                           I have indexed 16 million documents in Solr
>> >> >> Cloud. Created 4 nodes and 8 shards with single replica.
>> >> >> I am trying to do concurrent indexing and searching on those indexed
>> >> >> documents, making 100 concurrent indexing calls along with 100
>> >> >> concurrent searching calls.
>> >> >> It *degrades both searching and indexing* performance.
>> >> >>
>> >> >> Configuration :
>> >> >>
>> >> >>       "commitWithin":{"softCommit":true},
>> >> >>       "autoCommit":{
>> >> >>         "maxDocs":-1,
>> >> >>         "maxTime":60000,
>> >> >>         "openSearcher":false},
>> >> >>       "autoSoftCommit":{
>> >> >>         "maxDocs":-1,
>> >> >>         "maxTime":3000}},
>> >> >>
>> >> >>       "indexConfig":{
>> >> >>       "maxBufferedDocs":-1,
>> >> >>       "maxMergeDocs":-1,
>> >> >>       "maxIndexingThreads":8,
>> >> >>       "mergeFactor":-1,
>> >> >>       "ramBufferSizeMB":100.0,
>> >> >>       "writeLockTimeout":-1,
>> >> >>       "lockType":"native"}}}
>> >> >>
>> >> >> AND  <maxWarmingSearchers>2</maxWarmingSearchers>
>> >> >>
>> >> >> I don't know how master and slave work. Normally, I created 8
>> >> >> shards and indexed documents using:
>> >> >>
>> >> >>
>> >> >>
>> >> >>
>> >> >> http://localhost:8983/solr/test_commit_fast/update/json?commit=true -H
>> >> >> 'Content-type:application/json' -d ' [ JSON_Document ]'
>> >> >>
>> >> >> And searching using:
>> >> >> http://localhost:8983/solr/test_commit_fast/select?q=<field_name:search_string>
>> >> >>
>> >> >> Please, any help on making concurrent searching and indexing fast.
>> >> >> Thanks.
>> >> >>
>> >> >>
>> >> >> Regards,
>> >> >> Nitin
>> >>
>>
