Re: Algorithm for retrieving documents

Harshvardhan Ojha Thu, 13 Feb 2014 04:17:23 -0800

Hi Mikhail,

Don't you think that the contains(BytesRef value) method of
org.apache.lucene.codecs.bloom.FuzzySet reports whether a field value is
probably present, and that this is a place where hashing is used?


Is there any other place in the source which, given a document id, could
compute its hash and tell whether a document with this id is present or
not in a single O(1) lookup?
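
To make the question concrete, here is a minimal sketch of the kind of
check I mean. It is my own illustration and not Lucene's code: the class
LossySetSketch, its methods, and the FNV-1a-style hash are all assumptions
chosen for the example. It hashes an id once and tests a single bit, so a
lookup is O(1), but the answer can only be "maybe present" or "definitely
not present".

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical illustration of a lossy, hash-based membership check. */
class LossySetSketch {
    /** The check can never say a definite "yes", only MAYBE or NO. */
    public enum Result { MAYBE, NO }

    private final BitSet bits;
    private final int size;

    LossySetSketch(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    /** Maps an id to one bit position using an FNV-1a-style hash (an assumption). */
    private int slot(String id) {
        int h = 0x811C9DC5;
        for (byte b : id.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xFF);
            h *= 0x01000193;
        }
        return Math.floorMod(h, size);
    }

    /** Records an id by setting its bit. */
    void add(String id) {
        bits.set(slot(id));
    }

    /** One hash plus one bit test: constant time per lookup. */
    Result contains(String id) {
        return bits.get(slot(id)) ? Result.MAYBE : Result.NO;
    }

    public static void main(String[] args) {
        LossySetSketch ids = new LossySetSketch(1 << 16);
        ids.add("1");
        // An id that was added is never reported as NO.
        System.out.println(ids.contains("1"));
        // An id that was not added is usually NO, but may collide and give MAYBE.
        System.out.println(ids.contains("2"));
    }
}
```

A NO from such a structure is reliable, while a MAYBE still needs a real
lookup to confirm, which is why I am asking where the exact lookup by id
happens.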

Regards
Harshvardhan Ojha


On Thu, Feb 13, 2014 at 4:07 PM, Mikhail Khludnev <
[email protected]> wrote:

> Harshvardhan,
>
> There is almost nothing like this in bare Lucene; the closest analogy is
> http://wiki.apache.org/solr/SolrCaching#documentCache
>
>
> On Thu, Feb 13, 2014 at 1:46 PM, Harshvardhan Ojha <
> [email protected]> wrote:
>
> > Hi Mikhail,
> >
> > Thanks for sharing this nice link. I am pretty comfortable with
> > searching in Lucene; this is a very beginner-level question on
> > storage, mainly the hashing part (storage and retrieval).
> > Which DS (I don't know currently) is being used to keep that hash
> > and calculate it again to get the document back?
> >
> > Let me put it very clearly:
> > If I know the document to look for is id:1, and there is no other
> > query, then knowing this much about the doc there should ideally be
> > no searching at all (although it was indexed), only fast retrieval.
> >
> > Let me know if you want me to clarify the question.
> >
> > Regards
> > Harshvardhan Ojha
> >
> >
> > On Thu, Feb 13, 2014 at 2:53 PM, Mikhail Khludnev <
> > [email protected]> wrote:
> >
> > > Hello
> > >
> > > I think you can start from
> > > http://www.lucenerevolution.org/2013/What-is-in-a-lucene-index
> > >
> > >
> > >
> > > On Thu, Feb 13, 2014 at 12:56 PM, Harshvardhan Ojha <
> > > [email protected]> wrote:
> > >
> > > > Hi All,
> > > >
> > > > I have a question regarding retrieval of documents by Lucene.
> > > > I know Lucene uses many files on disk to store documents, each
> > > > comprising fields, and uses many IR algorithms and an inverted
> > > > index to match documents.
> > > >
> > > > My questions are:
> > > > 1. How does Lucene store these documents in the file system and
> > > >    retrieve them so fast?
> > > > 2. Does Lucene use any hashing algorithm to get docs in O(1)? If
> > > >    not, which DS is used by Lucene?
> > > > 3. Apart from the id we provide at indexing time, is there any
> > > >    other unique identifier which Lucene assigns to its documents?
> > > >
> > > > I would appreciate it if someone could provide me with source file
> > > > names to study these algorithms in detail.
> > > >
> > > > Regards
> > > > Harshvardhan Ojha
> > > >
> > >
> > >
> > >
> > > --
> > > Sincerely yours
> > > Mikhail Khludnev
> > > Principal Engineer,
> > > Grid Dynamics
> > >
> > > <http://www.griddynamics.com>
> > >  <[email protected]>
> > >
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> <http://www.griddynamics.com>
>  <[email protected]>
>
