Hi Erick,

posting files to Solr via curl => Rather than posting files via curl, which is better, SolrJ or post.jar? I don't use either of them. I wrote a Python script for indexing that uses urllib and urllib2 to send data over HTTP. I don't have the option of using SolrJ right now. How can I do the same thing post.jar does from Python? Any help, please.
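Following Erick's advice (batch the documents, starting around 1,000 per request, and let autoCommit/autoSoftCommit handle commits instead of adding commit=true to every request), a minimal Python sketch of the posting side might look like this. It does not reproduce post.jar. It assumes Python 3, where urllib.request replaces urllib2, and it reuses the test_commit_fast collection and the /update/json endpoint from the original message. BATCH_SIZE, batches, build_request and index_all are hypothetical names, not part of Solr. The optional commitWithin request parameter asks Solr to make documents visible within that many milliseconds; with "commitWithin":{"softCommit":true} in the posted configuration, that should be a soft commit.

```python
import json
import urllib.request
from itertools import islice

# Hypothetical values: adjust to your own collection and batch size.
SOLR_UPDATE_URL = "http://localhost:8983/solr/test_commit_fast/update/json"
BATCH_SIZE = 1000  # Erick's suggested starting point


def batches(docs, size=BATCH_SIZE):
    """Yield lists of at most `size` documents from any iterable."""
    it = iter(docs)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def build_request(batch, url=SOLR_UPDATE_URL, commit_within_ms=None):
    """Build a POST that adds `batch` as a JSON array, without commit=true.

    The collection's autoCommit/autoSoftCommit settings decide when the
    documents become searchable; commitWithin (milliseconds) is optional.
    """
    if commit_within_ms is not None:
        url = f"{url}?commitWithin={int(commit_within_ms)}"
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-type": "application/json"},
    )


def index_all(docs, **kwargs):
    """Send every batch; urlopen raises urllib.error.HTTPError on failure."""
    for batch in batches(docs):
        with urllib.request.urlopen(build_request(batch, **kwargs)) as resp:
            resp.read()
```

Usage: index_all(my_docs) for any iterable of dicts, or index_all(my_docs, commit_within_ms=10000) if a roughly 10-second visibility window is acceptable, which is one of the figures Erick mentions.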
indexing with 100 threads is going to eat up a lot of CPU cycles => So what is the minimum number of concurrent threads I should run? I also need concurrent searching, so how many threads for that? And thanks for the Solr 5.2 link, I will go through it. Thanks for replying. Please help me.

On Fri, Aug 7, 2015 at 11:51 PM Erick Erickson <erickerick...@gmail.com> wrote:

> bq: How much limitations does Solr has related to indexing and searching
> simultaneously? It means that how many simultaneously calls, I made for
> searching and indexing once?
>
> None a priori. It all depends on the hardware you're throwing at it.
> Obviously indexing with 100 threads is going to eat up a lot of CPU cycles
> that can't then be devoted to satisfying queries. You need to strike a
> balance. Do seriously consider using some other method than posting files
> to Solr via curl or the like; that's rarely a robust solution for production.
>
> As for adding the commit=true, this shouldn't be affecting the index size;
> I suspect you were misled by something else happening.
>
> Really, remove it or you'll beat up your system hugely. As for the soft
> commit interval, that's totally irrelevant when you're committing every
> document. But do lengthen it as much as you can. Most of the time when
> people say "real time", it turns out that 10 seconds is OK. Or 60 seconds
> is OK. You have to check what the _real_ requirement is; it's often not
> what's stated.
>
> bq: I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding
> indexing and searching data.
>
> Did you read the link I provided? With replicas, 5.2 will index almost
> twice as fast. That means (roughly) half the work on the followers is being
> done, freeing up cycles for performing queries.
>
> Best,
> Erick
>
>
> On Fri, Aug 7, 2015 at 2:06 PM, Nitin Solanki <nitinml...@gmail.com> wrote:
> > Hi Erick,
> >
> > You said that soft commit should be more than 3000 ms.
> > Actually, I need real-time searching, and that's why I need a fast soft
> > commit.
> >
> > commit=true => I set commit=true because it reduces my indexed data size
> > from 1.5GB to 500MB on *each shard*. When I used commit=false, my indexed
> > data size was 1.5GB. After changing it to commit=true, the size dropped to
> > only 500MB. I don't understand why.
> >
> > I am using Solr 5.0 version. Is 5.0 almost similar to 5.2 regarding
> > indexing and searching data.
> >
> > How much limitations does Solr has related to indexing and searching
> > simultaneously? It means that how many simultaneously calls, I made for
> > searching and indexing once?
> >
> >
> > On Fri, Aug 7, 2015 at 9:18 PM Erick Erickson <erickerick...@gmail.com>
> > wrote:
> >
> >> Your soft commit time of 3 seconds is quite aggressive;
> >> I'd lengthen it to as long as possible.
> >>
> >> Ugh, looked at your query more closely. Adding commit=true to every
> >> update request is horrible performance-wise. Letting your autocommit
> >> process handle the commits is the first thing I'd do. Second, I'd try
> >> going to SolrJ and batching up documents (I usually start with 1,000) or
> >> using the post.jar tool rather than sending them via a raw URL.
> >>
> >> I agree with Upayavira, 100 concurrent threads is a _lot_. Also, what
> >> version of Solr?
> >> There was a 2x speedup in Solr 5.2, see:
> >> http://lucidworks.com/blog/indexing-performance-solr-5-2-now-twice-fast/
> >>
> >> One symptom was that the followers were doing waaaaay more work than the
> >> leader (BTW, using master/slave when talking SolrCloud is a bit
> >> confusing...), which will affect query response rates.
> >>
> >> Basically, if query response is paramount, you really need to throttle
> >> your indexing; there's just a whole lot of work going on here.
> >>
> >> Best,
> >> Erick
> >>
> >> On Fri, Aug 7, 2015 at 11:23 AM, Upayavira <u...@odoko.co.uk> wrote:
> >> > How many CPUs do you have? 100 concurrent indexing calls seems like
> >> > rather a lot. You're gonna end up doing a lot of context switching,
> >> > hence degraded performance. Dunno what others would say, but I'd aim
> >> > for approx one indexing thread per CPU.
> >> >
> >> > Upayavira
> >> >
> >> > On Fri, Aug 7, 2015, at 02:58 PM, Nitin Solanki wrote:
> >> >> Hello Everyone,
> >> >> I have indexed 16 million documents in SolrCloud. I created 4 nodes
> >> >> and 8 shards with a single replica.
> >> >> I am trying to do concurrent indexing and searching on those indexed
> >> >> documents: 100 concurrent indexing calls along with 100 concurrent
> >> >> searching calls.
> >> >> It *degrades both searching and indexing* performance.
> >> >>
> >> >> Configuration:
> >> >>
> >> >> "commitWithin":{"softCommit":true},
> >> >> "autoCommit":{
> >> >>   "maxDocs":-1,
> >> >>   "maxTime":60000,
> >> >>   "openSearcher":false},
> >> >> "autoSoftCommit":{
> >> >>   "maxDocs":-1,
> >> >>   "maxTime":3000}},
> >> >>
> >> >> "indexConfig":{
> >> >>   "maxBufferedDocs":-1,
> >> >>   "maxMergeDocs":-1,
> >> >>   "maxIndexingThreads":8,
> >> >>   "mergeFactor":-1,
> >> >>   "ramBufferSizeMB":100.0,
> >> >>   "writeLockTimeout":-1,
> >> >>   "lockType":"native"}}}
> >> >>
> >> >> AND <maxWarmingSearchers>2</maxWarmingSearchers>
> >> >>
> >> >> I don't know how master and slave work.
> >> >> Normally, I created 8 shards and indexed documents using:
> >> >>
> >> >> curl 'http://localhost:8983/solr/test_commit_fast/update/json?commit=true' -H 'Content-type:application/json' -d '[ JSON_Document ]'
> >> >>
> >> >> And searching using:
> >> >>
> >> >> http://localhost:8983/solr/test_commit_fast/select?q=<field_name>:<search_string>
> >> >>
> >> >> Please, any help on making searching and indexing fast concurrently.
> >> >> Thanks.
> >> >>
> >> >>
> >> >> Regards,
> >> >> Nitin
> >> >
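On the thread-count question: Upayavira's rule of thumb is roughly one indexing thread per CPU, and Erick's point is that every indexing thread competes with queries for the same cycles. A minimal sketch of capping indexing concurrency with Python's standard ThreadPoolExecutor follows. It assumes `send` is whatever function posts one batch to Solr (a hypothetical parameter); default_workers and index_concurrently are illustrative names. The CPU-count default is only a starting point; the right number for this cluster has to be found by measuring query latency while indexing runs.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def default_workers():
    """Upayavira's rule of thumb: roughly one indexing thread per CPU."""
    return os.cpu_count() or 1


def index_concurrently(batches, send, workers=None):
    """Run send(batch) for every batch on a bounded thread pool.

    `send` is a hypothetical callable that posts one batch to Solr.
    Capping `workers` is the throttle: lower it to leave CPU for queries.
    Returns the number of batches sent and re-raises the error of the
    first failing batch (in submission order).
    """
    workers = workers or default_workers()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(send, batch) for batch in batches]
        for future in futures:
            future.result()  # propagate any exception raised in a worker
    return len(futures)
```

Keeping the cap as an explicit parameter makes it easy to lower indexing concurrency when query response is the priority, which is the trade-off Erick describes.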