We are using Solr Cloud 4.10.3-cdh5.4.5, which is part of Cloudera CDH 5.4.5. Our collection (one shard with three replicas) became really big, and we decided to delete some old records to improve performance (tests in our staging environment have shown that after reaching 500 million records the index becomes very slow and Solr is less responsive). After deleting about 100 million records (out of 260 million), they were still shown as "Deleted Docs" on the Solr Admin Statistics page. This page was showing "Optimized: No (red)" and "Current: No (red)". Theoretically, having 100 million deleted (but not removed) records would be a performance issue, and also, people tend to prefer a clean picture.
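To put a number on that, here is a minimal sketch of the arithmetic, assuming the "Max Doc" figure counts live plus deleted documents and "Num Docs" counts live ones only. The helper name deleted_pct is made up for illustration and is not part of Solr; the counts are the approximate ones quoted above.

```shell
# Hypothetical helper (not a Solr tool): percentage of documents that are
# marked deleted but still present in the index.
#   $1 = maxDoc  (live + deleted documents)
#   $2 = numDocs (live documents only)
deleted_pct() {
  awk -v max="$1" -v live="$2" 'BEGIN {
    if (max <= 0) { print "0.0"; exit }      # avoid division by zero
    printf "%.1f\n", (max - live) * 100 / max
  }'
}

# Approximate figures from this post: 260 million total, 100 million deleted.
deleted_pct 260000000 160000000   # prints 38.5
```

So roughly 38% of the index was occupied by records that were deleted but not yet merged away.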
Information found in Solr forums was that the only way to remove deleted records is to optimize the index. We knew that optimization is not a good idea, and it was discussed in forums that it should be completely removed from the API and Solr Admin, but discussing is one thing and doing it is another.

To make a long story short, we tried to optimize through the Solr API to remove the deleted records:

    URL="http://<host>:8983/solr/<Collection>/update"
    curl "$URL?optimize=true&maxSegments=18&waitFlush=true"

All three replicas of the collection were merged to 18 segments and Solr Admin was showing "Optimized: Yes (green)", but the deleted records were not removed (which is an inconsistency with Solr Admin or a bug in the API).

Finally, because people usually trust features found in the UI (even if official documentation is not found, see https://cwiki.apache.org/confluence/display/solr/Using+the+Solr+Administration+User+Interface), the "Optimize Now" button in Solr Admin was pressed. It removed all deleted records and made the collection look very good (in the UI).

Here is the problem:

1. The index was reduced to one large (60 GB) segment (some people's opinion is that this is good, but I doubt it).
2. Our use case includes batch updates followed by a soft commit (after which the user sees results). The commit operation that used to take about 1.5 minutes now takes from 12 to 25 minutes.

Overall performance of our application is severely degraded.

I am not going to talk about how confusing Solr optimization is, but I am asking if anyone knows *what caused the slowness of the commit operation after optimization*. If the issue is having one large segment, then how is it possible to split this segment into smaller ones (without sharding)?

Thanks,
Victor

--
View this message in context: http://lucene.472066.n3.nabble.com/Very-Slow-Commits-After-Solr-Index-Optimization-tp4297022.html
Sent from the Solr - User mailing list archive at Nabble.com.