mikemccand commented on PR #12829: URL: https://github.com/apache/lucene/pull/12829#issuecomment-1855755782
One small observation here: one can use the `add/updateDocuments` API today with no intention of using those as doc blocks at search time, purely as an optimization over calling `addDocument` separately for each document. I'm not sure how large the performance difference is, but it should only help indexing throughput. E.g. [luceneserver](https://github.com/mikemccand/luceneserver) (the search engine behind [jirasearch](https://jirasearch.mikemccandless.com/search.py) and soon [githubsearch](https://githubsearch.mikemccandless.com/search.py)) does this in its bulk indexing API, I think. This saves per-call IW/DWPT overhead, but comes with some risk if your blocks are too big, since IW cannot flush until the block is done.

With this change, I think such usage would still be fine if you have no index sort? But if you have an index sort, then IW will always do the parent block validation / tracking? I think net/net this is fine; I just wanted to call it out for crazy people trying to eke out every last bit of indexing throughput :)
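A minimal sketch of the bulk-batching pattern described above, written against plain Java so it runs without Lucene. `BlockSink`, `BulkBatcher`, and `maxBlockSize` are hypothetical names introduced for illustration; the sink stands in for `IndexWriter.addDocuments`. Capping each block's size is one way to limit the risk mentioned in the comment, since IW cannot flush partway through a block. This is not luceneserver's actual implementation.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical stand-in for IndexWriter.addDocuments: receives one block
 * of documents that the writer would index as a single call.
 */
interface BlockSink<D> {
    void addBlock(List<D> block) throws IOException;
}

final class BulkBatcher {
    private BulkBatcher() {}

    /** Splits docs into consecutive blocks of at most maxBlockSize documents. */
    static <D> List<List<D>> batches(List<D> docs, int maxBlockSize) {
        if (maxBlockSize <= 0) {
            throw new IllegalArgumentException("maxBlockSize must be positive");
        }
        List<List<D>> out = new ArrayList<>();
        for (int start = 0; start < docs.size(); start += maxBlockSize) {
            int end = Math.min(start + maxBlockSize, docs.size());
            // Copy the sublist so each block is independent of the caller's list.
            out.add(new ArrayList<>(docs.subList(start, end)));
        }
        return out;
    }

    /**
     * Feeds each bounded block to the sink, in order.
     * Returns the number of sink calls (one per block instead of one per doc).
     */
    static <D> int indexInBlocks(List<D> docs, int maxBlockSize, BlockSink<D> sink)
            throws IOException {
        int calls = 0;
        for (List<D> block : batches(docs, maxBlockSize)) {
            sink.addBlock(block);
            calls++;
        }
        return calls;
    }
}
```

With Lucene on the classpath, something like `BlockSink<Document> sink = writer::addDocuments;` should fit, since `addDocuments` accepts an `Iterable` of documents and declares `IOException`. A reasonable `maxBlockSize` is workload-dependent and is an assumption here, not a recommended value.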