On 9/20/2016 4:13 PM, vsolakhian wrote:
> We knew that optimization is not a good idea and it was discussed in
> forums that it should be completely removed from API and Solr Admin, but
> discussing is one thing and doing it is another. To make the story short,
> we tried to optimize through Solr API to remove deleted records:
>
> URL=http://<host>:8983/solr/<Collection>/update
> curl "$URL?optimize=true&maxSegments=18&waitFlush=true"
>
> and all three replicas of the collection were merged to 18 segments and
> Solr Admin was showing "Optimized: Yes (green)", but the deleted records
> were not removed (which is an inconsistency with Solr Admin or a bug in
> the API).
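First, a quick way to confirm how many deleted documents remain is to
compare maxDoc and numDocs for the core.  This is only a sketch, reusing
the placeholder host and collection names from your command; depending on
your setup you may need to query a specific core rather than the
collection name:

curl "http://<host>:8983/solr/<Collection>/admin/luke?numTerms=0&wt=json"

In the response, maxDoc minus numDocs is the number of deleted documents
still occupying space in the index.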
Very likely the deleted documents were contained in segments that were NOT
merged, and made up the total final segment count of 18.  An optimize will
only guarantee that all deleted documents are gone if it optimizes down to
ONE segment, which is what the "Optimize" button in the admin UI does.

> Finally, because people usually trust features found in the UI (even if
> official documentation is not found, see
> https://cwiki.apache.org/confluence/display/solr/Using+the+Solr+Administration+User+Interface),
> the "Optimize Now" button in Solr Admin was pressed, and it removed all
> deleted records and made the collection look very good (in the UI).  Here
> is the problem:
>
> 1. The index was reduced to one large (60 GB) segment (some people's
> opinion is that this is good, but I doubt it).
> 2. Our use case includes batch updates followed by a soft commit (after
> which the user sees results).  A commit operation that was taking about
> 1.5 minutes now takes from 12 to 25 minutes.
>
> Overall performance of our application is severely degraded.
>
> I am not going to talk about how confusing Solr optimization is, but I am
> asking if anyone knows *what caused the slowness of the commit operation
> after optimization*.  If the issue is having a large segment, then how is
> it possible to split this segment into smaller ones (without sharding)?

Best guess is that actual disk I/O was required after the optimization,
because the important parts of the index were no longer in the OS disk
cache.  For good performance, Solr requires that index data be cached and
immediately available -- disks are slow.  Performance would likely improve
as additional queries were made, until it returned to normal.

If your indexes are in a filesystem local to the Solr server, then you
probably need more memory in the Solr server (not allocated to the Java
heap).  If they are on a remote filesystem (HDFS, NFS, etc.), then the
remote filesystem device/server might need more memory and/or
configuration adjustments.  The speed of the network might also be a
factor with remote filesystems.

Side note: a commit that takes 1.5 minutes is ALREADY very slow.  Commits
should normally take seconds.  Well-tuned NRT environments will probably
have commit times well below one second.  Here's some specific info on
slow commits:

http://wiki.apache.org/solr/SolrPerformanceProblems#Slow_commits

Thanks,
Shawn
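P.S. If the goal next time is only to reclaim space from deleted
documents, without collapsing the whole index into a single segment, a
lighter-weight option is a commit with expungeDeletes, which only rewrites
segments holding a significant fraction of deletes.  This is just a
sketch, reusing the placeholder host and collection from your original
command:

curl "http://<host>:8983/solr/<Collection>/update?commit=true&expungeDeletes=true"

It still forces merging, so it is not free, but it avoids ending up with
one huge segment the way a full optimize does.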