Hi Angel,

a while ago I had issues with a VMware VM - somehow snapshots were created
regularly, which dragged down the machine. So I think it is a good idea to
baseline the performance on a physical box before moving to VMs, production
boxes or whatever is thrown at you.
Cheers,

Siegfried Goeschl

> On 22 May 2015, at 11:15, Angel Todorov <attodo...@gmail.com> wrote:
>
> Thanks for the feedback, guys. What I am going to try now is deploying my
> SOLR server on a physical machine with more RAM and checking out this
> scenario there. I have some suspicion it could well be a hypervisor issue,
> but let's see. Just for the record - I've noticed those issues on a Win
> 2008R2 VM with 8 GB of RAM and 2 cores.
>
> I don't see anything strange in the logs. One thing that I need to change,
> though, is the verbosity of the logs in the console - it looks like by
> default SOLR writes a log entry for every single document that is indexed,
> as well as for every query that is executed.
>
> Angel
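A note on the console verbosity Angel mentions above: those per-document and
per-query lines are Solr's INFO-level request logging. One way to quiet the
console on Solr 5.x is to raise the threshold in the log4j configuration the
stock scripts read at startup. A minimal sketch, assuming the default
server/resources/log4j.properties layout; the logger names in the second
option are assumptions based on typical Solr log output, so verify them
against the class names in your own log lines:

    # server/resources/log4j.properties
    # Option 1: drop routine INFO traffic everywhere
    log4j.rootLogger=WARN, file, CONSOLE

    # Option 2: keep INFO globally and silence only the chattiest request
    # loggers (assumed names - copy the exact class names from your own logs)
    log4j.logger.org.apache.solr.update.processor.LogUpdateProcessor=WARN
    log4j.logger.org.apache.solr.core.SolrCore=WARN

Changes to this file take effect on the next restart of the Solr instance.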
> On Fri, May 22, 2015 at 1:03 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
>> bq: Which is logical as index growth and time needed to put something
>> to it is log(n)
>>
>> Not really. Solr indexes to segments, and each segment is a fully
>> consistent "mini index". When a segment gets flushed to disk, a new one
>> is started. Of course there'll be a _little bit_ of added overhead, but
>> it shouldn't be all that noticeable.
>>
>> Furthermore, segments are "append only". In the past, when I've indexed
>> the Wiki example, my indexing actually got faster.
>>
>> So on the surface this sounds very strange to me. Are you seeing
>> anything at all in the Solr logs that's suspicious?
>>
>> Best,
>> Erick
>>
>> On Thu, May 21, 2015 at 12:22 PM, Sergey Shvets <ser...@bintime.com> wrote:
>>>
>>> Hi Angel,
>>>
>>> We also noticed that kind of performance degradation in our workloads,
>>> which is logical, as the index grows and the time needed to put
>>> something into it is log(n).
>>>
>>> On Thursday, 21 May 2015, Angel Todorov wrote:
>>>
>>>> hi Shawn,
>>>>
>>>> Thanks a bunch for your feedback. I've played with the heap size, but
>>>> I don't see any improvement. Even if I index, say, a million docs and
>>>> the throughput is about 300 docs per sec, and then I shut down Solr
>>>> completely - after I start indexing again, the throughput drops below
>>>> 300.
>>>>
>>>> I should probably experiment with sharding those documents across
>>>> multiple SOLR cores - that should help, I guess. I am talking about
>>>> something like this:
>>>>
>>>> https://cwiki.apache.org/confluence/display/solr/Shards+and+Indexing+Data+in+SolrCloud
>>>>
>>>> Thanks,
>>>> Angel
>>>>
>>>> On Thu, May 21, 2015 at 11:36 AM, Shawn Heisey <apa...@elyograg.org> wrote:
>>>>
>>>>> On 5/21/2015 2:07 AM, Angel Todorov wrote:
>>>>>> I'm crawling a file system folder and indexing 10 million docs, and
>>>>>> I am adding them in batches of 5000, committing every 50,000 docs.
>>>>>> The problem I am facing is that after each commit, the number of
>>>>>> documents indexed per second gets lower and lower.
>>>>>>
>>>>>> If I do not commit at all, I can index those docs very quickly, and
>>>>>> then I commit once at the end; but once I start indexing docs
>>>>>> _after_ that (for example when new files get added to the folder),
>>>>>> indexing also slows down a lot.
>>>>>>
>>>>>> Is it normal that the SOLR indexing speed depends on the number of
>>>>>> documents that are _already_ indexed? I think it shouldn't matter
>>>>>> whether I start from scratch or index a document into a core that
>>>>>> already has a couple of million docs. Looks like SOLR is either
>>>>>> doing something in a linear fashion, or there is some magic config
>>>>>> parameter that I am not aware of.
>>>>>>
>>>>>> I've read all the perf docs, and I've tried changing mergeFactor,
>>>>>> autowarmCount, and the buffer sizes - to no avail.
>>>>>>
>>>>>> I am using SOLR 5.1.
>>>>>
>>>>> Have you changed the heap size? If you use the bin/solr script to
>>>>> start it and don't change the heap size with the -m option or
>>>>> another method, Solr 5.1 runs with a default size of 512MB, which is
>>>>> *very* small.
>>>>>
>>>>> I bet you are running into problems with frequent and then
>>>>> ultimately constant garbage collection, as Java attempts to free up
>>>>> enough memory to allow the program to continue running. If that is
>>>>> what is happening, then eventually you will see an OutOfMemoryError
>>>>> exception. The solution is to increase the heap size. I would
>>>>> probably start with at least 4G for 10 million docs.
>>>>>
>>>>> Thanks,
>>>>> Shawn
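Picking up Shawn's suggestion: with the stock 5.x scripts the heap is fixed
at startup, so trying a bigger value is just a restart away. A minimal
example, where 4g is only Shawn's starting point for 10 million docs rather
than a tuned value:

    bin/solr stop -all
    bin/solr start -m 4g

The -m option sets both -Xms and -Xmx to the given size; on the Windows box
mentioned earlier the equivalent is bin\solr.cmd start -m 4g. The resulting
JVM args are visible on the Admin UI dashboard, which makes it easy to
confirm the new heap took effect.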