Thanks a lot for all your input. I will go ahead and store as strings.

Best Regards
Kamal

On Wed, May 29, 2013 at 9:00 AM, Jack Krupansky <j...@basetechnology.com> wrote:

> As a general rule with Solr, do a proof of concept implementation with the
> simplest sensible approach and only start piling on complexity if
> performance or capacity become problematic. If the data is naturally a
> string, use a string. If it is naturally a number, use a number. Use
> whatever the query clients will be most comfortable with.
>
> -- Jack Krupansky
>
> -----Original Message-----
> From: Kamal Palei
> Sent: Tuesday, May 28, 2013 10:54 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How apache solr stores indexes
>
> Thanks Alex.
>
> I am in a dilemma about how to store the skill sets in the Solr index: as
> a string token or as an integer. To give a little background:
>
> As of today, I assign each skill a unique id (an auto-increment field in a
> MySQL table) and then store them against the user id in a separate table.
> That's how I search for users having a particular skill or retrieve the
> complete skill set of a particular user.
>
> Now I want to dump everything into Solr and keep MySQL usage as low as
> possible. This will help me scale to higher load.
>
> I am weighing two options:
> 1. Should I store each skill as a string token (in a new multivalued string
> index)
> 2. OR should I store each skill as an integer (in a new multivalued integer
> index)
>
> Kindly suggest which is the better option.
>
> Best Regards
> kamal
>
> On Wed, May 29, 2013 at 8:11 AM, Alexandre Rafalovitch
> <arafa...@gmail.com> wrote:
>
>> And you need to know this why?
>>
>> If you are really trying to understand how this all works under the
>> covers, you need to look at Lucene's inverted index as a start. Start
>> here:
>> http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description
>>
>> Might take you a couple of weeks to put it all together.
>>
>> Or you could try asking the actual business-level question that you
>> need an answer to. :-)
>>
>> Regards,
>> Alex.
>> Personal blog: http://blog.outerthoughts.com/
>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>> - Time is the quality of nature that keeps events from happening all
>> at once. Lately, it doesn't seem to be working. (Anonymous - via GTD
>> book)
>>
>>
>> On Tue, May 28, 2013 at 10:13 PM, Kamal Palei <palei.ka...@gmail.com>
>> wrote:
>> > Dear All
>> > I have a basic doubt about how the data is stored in Apache Solr indexes.
>> >
>> > Say I have a thousand registered users on my site. Let's say I want to
>> > store the skills of each user as a multivalued string index.
>> >
>> > Say
>> > user 1 has skill set - Java, MySql, PHP
>> > user 2 has skill set - C++, MySql, PHP
>> > user 3 has skill set - Java, Android, iOS
>> > ... so on
>> >
>> > You can see users 1 and 2 have two skills in common, MySql and PHP.
>> > In an actual case there might be millions of repetitions of words.
>> >
>> > Now the question is, does Apache Solr store them as just words, OR does
>> > it convert each word to a unique number and store only the number?
>> >
>> > Best Regards
>> > Kamal
>> > Net Cloud Systems
>> > Bangalore, India
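
For readers following the original question about repeated words: a minimal conceptual sketch of an inverted index, using the three example users from the thread. In an inverted index each distinct term is kept once, together with a list of the documents that contain it, so a skill shared by many users is not repeated per user. This is only an illustration of the idea, not Lucene's actual on-disk format; the names build_inverted_index and users are made up for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each distinct term to the sorted list of document ids containing it.

    Hypothetical helper for illustration only; real Lucene segments use a
    compressed term dictionary and postings format, not Python dicts.
    """
    postings = defaultdict(set)
    for doc_id, skills in docs.items():
        for skill in skills:
            # Each term appears once as a key, however many users share it.
            postings[skill].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


# The example skill sets from the original question.
users = {
    1: ["Java", "MySql", "PHP"],
    2: ["C++", "MySql", "PHP"],
    3: ["Java", "Android", "iOS"],
}

index = build_inverted_index(users)

# Look up which users have a given skill.
print("MySql:", index["MySql"])
print("Java:", index["Java"])
```

Nine skill values across three users collapse to six distinct terms here, and finding all users with a skill is a single lookup on the term. The same holds whether the terms are strings or integer ids, which is consistent with the advice above to use whichever type the data naturally is.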