Re: [PR] Add support for index sorting with document blocks [lucene]

via GitHub Thu, 14 Dec 2023 04:24:58 -0800


mikemccand commented on PR #12829:
URL: https://github.com/apache/lucene/pull/12829#issuecomment-1855755782


   One small observation here: one can use the `add/updateDocuments` API today 
with no intention of using those documents as doc blocks at search time, purely 
as an optimization over calling `addDocument` separately for each document.
   
   I'm not sure what performance difference this makes, but it should only help 
speed up indexing throughput.
   
   E.g. [luceneserver](https://github.com/mikemccand/luceneserver) (the search 
engine behind [jirasearch](https://jirasearch.mikemccandless.com/search.py) and 
soon now [githubsearch](https://githubsearch.mikemccandless.com/search.py)) 
does this in its bulk indexing API, I think.
   
   This saves the IndexWriter (IW) / DocumentsWriterPerThread (DWPT) overhead, 
but comes with some risk if your blocks are too big, since IW cannot flush until 
the block is done.
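   One way to bound the risk described above, when the "blocks" are really just unrelated documents batched for throughput, is to cap how many documents go into each `addDocuments` call. The sketch below is a minimal illustration under stated assumptions: `BoundedBatcher` is a hypothetical helper, the cap of 3 is arbitrary, and the sink is a plain `Consumer` standing in for `IndexWriter.addDocuments` so that the example runs without Lucene on the classpath. It only makes sense for documents that are not genuine parent/child blocks, since splitting a real block would break block-join semantics.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical helper (not a Lucene API): accumulates documents and hands them to
 * a sink in batches of at most maxBatchSize, so no single batch grows unbounded.
 */
final class BoundedBatcher<D> {
    private final int maxBatchSize;
    private final Consumer<List<D>> sink;
    private final List<D> pending = new ArrayList<>();

    BoundedBatcher(int maxBatchSize, Consumer<List<D>> sink) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        this.maxBatchSize = maxBatchSize;
        this.sink = sink;
    }

    /** Buffers one document and emits a batch once the cap is reached. */
    void add(D doc) {
        pending.add(doc);
        if (pending.size() >= maxBatchSize) {
            flush();
        }
    }

    /** Emits whatever is buffered (possibly a short final batch). */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // hand off a copy, then reuse the buffer
        pending.clear();
    }

    public static void main(String[] args) {
        List<List<Integer>> batches = new ArrayList<>();
        BoundedBatcher<Integer> batcher = new BoundedBatcher<>(3, batches::add);
        for (int i = 0; i < 7; i++) {
            batcher.add(i);
        }
        batcher.flush(); // don't forget the trailing partial batch
        System.out.println(batches); // prints [[0, 1, 2], [3, 4, 5], [6]]
    }
}
```

   In a real Lucene setup the sink would call `IndexWriter.addDocuments(batch)`, which throws the checked `IOException`, so an adapter would need to wrap or propagate it rather than use a bare `Consumer`.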
   
   With this change, I think such usage would still be fine if you have no 
index sort?  But if you have an index sort, then IW will always do the parent 
block validation / tracking?
   
   I think net/net this is fine; I just wanted to call it out for crazy people 
trying to eke out every last bit of indexing throughput :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
