Better still, start here: http://en.wikipedia.org/wiki/Inverted_index

http://nlp.stanford.edu/IR-book/html/htmledition/a-first-take-at-building-an-inverted-index-1.html

And there are several books on search engines and related algorithms.
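
To give a rough feel for what those links describe, here is a toy inverted index in Python built from the skill sets in the question below. This is only a sketch of the general idea, not how Lucene actually lays out its index files; the function name build_inverted_index and the integer user IDs are made up for illustration.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each distinct term to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    # Convert the sets to sorted lists so each posting list has a stable order.
    return {term: sorted(ids) for term, ids in index.items()}


# Skill sets taken from the question; the integer IDs are illustrative.
users = {
    1: ["Java", "MySql", "PHP"],
    2: ["C++", "MySql", "PHP"],
    3: ["Java", "Android", "iOS"],
}

index = build_inverted_index(users)
for term in sorted(index):
    print(term, index[term])
```

In this sketch each distinct word shows up once as a key, however many users share it, and what grows with repetition is the list of document IDs attached to it (for example, "MySql" maps to [1, 2]). The real on-disk format is considerably more involved.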



On Tue, May 28, 2013 at 10:41 PM, Alexandre Rafalovitch
<arafa...@gmail.com> wrote:

> And you need to know this why?
>
> If you are really trying to understand how this all works under the
> covers, you need to look at Lucene's inverted index as a start. Start
> here:
> http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description
>
> Might take you a couple of weeks to put it all together.
>
> Or you could try asking the actual business-level question that you
> need an answer to. :-)
>
> Regards,
>    Alex.
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all
> at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
> book)
>
>
> On Tue, May 28, 2013 at 10:13 PM, Kamal Palei <palei.ka...@gmail.com>
> wrote:
> > Dear All
> > I have a basic question about how data is stored in Apache Solr indexes.
> >
> > Say I have a thousand registered users on my site, and I want to store
> > each user's skills as a multivalued string field.
> >
> > Say
> > user 1 has skill set - Java, MySql, PHP
> > user 2 has skill set - C++, MySql, PHP
> > user 3 has skill set - Java, Android, iOS
> > ... so on
> >
> > You can see that users 1 and 2 have two skills in common, MySql and PHP.
> > In a real case, words might be repeated millions of times.
> >
> > Now the question is: does Apache Solr store them as plain words, or does
> > it convert each word to a unique number and store only the number?
> >
> > Best Regards
> > Kamal
> > Net Cloud Systems
> > Bangalore, India
>
