Re: Algorithm for retrieving documents

2014-02-13 Thread Harshvardhan Ojha
Hi Mikhail, Don't you think the contains(BytesRef value) method in org.apache.lucene.codecs.bloom.FuzzySet.java returns the probability of having a field, and that it is a place where we are using hashing? Are there any other places in the source which, when given a document id, could determine by calculating it
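
For readers following the FuzzySet question: the package name suggests a Bloom-filter-style set, and the toy Java sketch below shows what such a structure can and cannot answer. It is only an assumption-laden illustration, not Lucene's FuzzySet code; the class ToyBloomSet, its two hash seeds and its Result enum are invented for this note. A structure like this answers "maybe present" or "definitely absent" for a value; it does not compute a document's location from a document id.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Toy Bloom-filter-style set. Hypothetical names; not Lucene's implementation.
class ToyBloomSet {
    enum Result { MAYBE, NO }

    private final BitSet bits;
    private final int size;

    ToyBloomSet(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    // Simple polynomial hash; the seed gives us two different hash functions.
    private int hash(byte[] value, int seed) {
        int h = seed;
        for (byte b : value) {
            h = 31 * h + b;
        }
        return Math.floorMod(h, size);
    }

    // Sets the bits selected by both hash functions.
    void add(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        bits.set(hash(bytes, 17));
        bits.set(hash(bytes, 101));
    }

    // NO is certain; MAYBE can be a false positive caused by bit collisions.
    Result contains(String value) {
        byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
        boolean allSet = bits.get(hash(bytes, 17)) && bits.get(hash(bytes, 101));
        return allSet ? Result.MAYBE : Result.NO;
    }

    public static void main(String[] args) {
        ToyBloomSet set = new ToyBloomSet(1024);
        set.add("lucene");
        System.out.println(set.contains("lucene"));
    }
}
```

A value that was added always yields MAYBE, and an empty set always yields NO; anything else depends on which bits happen to collide.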

Re: Algorithm for retrieving documents

2014-02-13 Thread Mikhail Khludnev
Harshvardhan, There is almost nothing like this in bare Lucene; the closest analogy is http://wiki.apache.org/solr/SolrCaching#documentCache On Thu, Feb 13, 2014 at 1:46 PM, Harshvardhan Ojha <ojha.harshvard...@gmail.com> wrote: > Hi Mikhail, > > Thanks for sharing this nice link. I am pretty com

Re: Algorithm for retrieving documents

2014-02-13 Thread Harshvardhan Ojha
Hi Mikhail, Thanks for sharing this nice link. I am pretty comfortable with searching in Lucene, and this is a very beginner-level question on storage, mainly the hashing part (storage and retrieval). Which data structure (I don't know currently) is being used to keep and again calculate that hash to get the document bac

Re: Algorithm for retrieving documents

2014-02-13 Thread Mikhail Khludnev
Hello, I think you can start from http://www.lucenerevolution.org/2013/What-is-in-a-lucene-index On Thu, Feb 13, 2014 at 12:56 PM, Harshvardhan Ojha <ojha.harshvard...@gmail.com> wrote: > Hi All, > > I have a question regarding retrieval of documents by lucene. > I know lucene uses many files

Algorithm for retrieving documents

2014-02-13 Thread Harshvardhan Ojha
Hi All, I have a question regarding retrieval of documents by Lucene. I know Lucene uses many files on disk to keep documents, each comprising fields, and uses many IR algorithms and an inverted index to match documents. My question is: 1. How does Lucene store these documents inside the file system
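
As a rough illustration of the two pieces the question mentions (documents kept with their content, and an inverted index used to match them), here is a toy Java sketch. It is not Lucene's file format or API; the class ToyIndex and its methods are made up for this note. It assumes the simplest possible model, in which a doc id is just the position at which a document was added, so fetching a document by id is a positional lookup rather than a hash computation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only. Hypothetical names; not Lucene's on-disk format or API.
class ToyIndex {
    // "Stored documents": kept in insertion order; the doc id is the list position.
    private final List<String> stored = new ArrayList<>();
    // "Inverted index": term -> ascending list of ids of documents containing it.
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Adds a document and returns its doc id (0, 1, 2, ...).
    int add(String text) {
        int docId = stored.size();
        stored.add(text);
        for (String term : text.toLowerCase().split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            List<Integer> docs = postings.computeIfAbsent(term, t -> new ArrayList<>());
            // Record each document once per term, even if the term repeats.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
        return docId;
    }

    // Matching: look the term up in the inverted index to get doc ids.
    List<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptyList());
    }

    // Retrieval: the doc id is used directly as a position.
    String document(int docId) {
        return stored.get(docId);
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.add("Lucene stores documents");
        index.add("Documents have fields, fields have values");
        for (int docId : index.search("documents")) {
            System.out.println(docId + ": " + index.document(docId));
        }
    }
}
```

Searching goes from a term to doc ids through the inverted index, and retrieval goes from a doc id to the stored content; the two steps use separate structures in this sketch.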