Where is the link to this patch?
2010/7/24 Yonik Seeley <yo...@lucidimagination.com>:
> On Fri, Jul 23, 2010 at 2:23 PM, MitchK <mitc...@web.de> wrote:
>> why do we do not send the output of TermsComponent of every node in the
>> cluster to a Hadoop instance?
>> Since TermsComponent does the map-part of the map-reduce concept, Hadoop
>> only needs to reduce the stuff. Maybe we even do not need Hadoop for this.
>> After reducing, every node in the cluster gets the current values to compute
>> the idf.
>> We can store this information in a HashMap-based SolrCache (or something
>> like that) to provide constant-time access. To keep the values up to date,
>> we can repeat that after every x minutes.
>
> There's already a patch in JIRA that does distributed IDF.
> Hadoop wouldn't be the right tool for that anyway... it's for batch
> oriented systems, not low-latency queries.
>
>> If we got that, it does not care whereas we use doc_X from shard_A or
>> shard_B, since they will all have got the same scores.
>
> That only works if the docs are exactly the same - they may not be.
>
> -Yonik
> http://www.lucidimagination.com
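
For reference, the "reduce" step MitchK describes in the quoted message could look roughly like the sketch below. This is not the JIRA patch and not Solr code. The function names, the shape of the per-shard statistics, and the sample numbers are all made up for illustration. The idf formula is the classic Lucene-style 1 + ln(numDocs / (docFreq + 1)), and summing the counts assumes every document lives on exactly one shard.

```python
import math
from collections import Counter

def merge_shard_stats(shard_stats):
    """Sum document counts and per-term document frequencies across shards.

    shard_stats is a list of (num_docs, {term: doc_freq}) pairs, a
    hypothetical stand-in for what each node's TermsComponent would report.
    """
    total_docs = 0
    doc_freq = Counter()
    for num_docs, term_df in shard_stats:
        total_docs += num_docs
        doc_freq.update(term_df)  # adds counts term by term
    return total_docs, doc_freq

def global_idf(total_docs, doc_freq):
    """Compute one idf value per term from the merged statistics."""
    # Assumption: classic Lucene-style idf; other similarities differ.
    return {
        term: 1.0 + math.log(total_docs / (df + 1))
        for term, df in doc_freq.items()
    }

# Made-up numbers for two shards.
shard_a = (1000, {"solr": 100, "hadoop": 5})
shard_b = (3000, {"solr": 99, "lucene": 40})

total_docs, doc_freq = merge_shard_stats([shard_a, shard_b])
idf = global_idf(total_docs, doc_freq)  # plain dict, constant-time lookup
```

Every node would then score against the same idf table, which is the HashMap-style lookup suggested in the quoted message. It does nothing about the case Yonik raises, where copies of a document differ between shards.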