Store them as string tokens in a multivalued field. Solr/Lucene will
do the necessary mapping and lookups. That's what you are paying it
for. :-) That way you can easily facet on them, and so on.
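
To make the "mapping and lookups" part a bit more concrete, here is a toy
sketch in Python of an inverted index. It is only an illustration, not
Lucene's actual on-disk format or API; the names users, build_inverted_index
and facet_counts are made up for this example. The point is that each
distinct term is recorded once, together with the list of documents that
contain it, so repeated skill names do not have to be replaced with integer
ids by hand.

```python
from collections import defaultdict

# Hypothetical sample data: user id -> skills (think: a multivalued string field).
users = {
    1: ["Java", "MySql", "PHP"],
    2: ["C++", "MySql", "PHP"],
    3: ["Java", "Android", "iOS"],
}


def build_inverted_index(docs):
    """Map each distinct term to the sorted list of doc ids that contain it."""
    postings = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def facet_counts(index):
    """Count documents per term, similar in spirit to faceting on the field."""
    return {term: len(ids) for term, ids in index.items()}


index = build_inverted_index(users)
print(index["PHP"])         # ids of the users who have the PHP skill
print(facet_counts(index))  # number of users per skill
```

Looking up a skill is a single dictionary access here, and the facet counts
fall out of the same structure, which is roughly why string tokens are
enough for this use case.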

You may need to change some parts of your architecture later, but you
seem to be over-thinking it too early in the process.

Regards,
   Alex.
Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Tue, May 28, 2013 at 10:54 PM, Kamal Palei <palei.ka...@gmail.com> wrote:
> Thanks Alex.
>
> I am in a dilemma about how to store the skill sets in the Solr index: as
> string tokens or as integers. To give a little background:
>
> As of today, I assign each skill a unique id (an auto-increment field
> in a MySQL table) and then store the ids against the user id in a separate
> table. That's how I search for users having a particular skill or retrieve
> the complete skill set of a particular user.
>
> Now I want to move everything to Solr and keep MySQL usage as low
> as possible. This will help me scale to a higher load.
>
> I am weighing two options:
> 1. Store each skill as a string token (in a new multivalued string
> field)
> 2. Store each skill as an integer (in a new multivalued integer
> field)
>
> Kindly suggest which is the better option.
>
> Best Regards
> kamal
>
> On Wed, May 29, 2013 at 8:11 AM, Alexandre Rafalovitch
> <arafa...@gmail.com> wrote:
>
>> And you need to know this why?
>>
>> If you are really trying to understand how this all works under the
>> covers, you need to look at Lucene's inverted index as a start. Start
>> here:
>> http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/codecs/lucene42/package-summary.html#package_description
>>
>> Might take you a couple of weeks to put it all together.
>>
>> Or you could try asking the actual business-level question that you
>> need an answer to. :-)
>>
>> Regards,
>>    Alex.
>>
>>
>> On Tue, May 28, 2013 at 10:13 PM, Kamal Palei <palei.ka...@gmail.com>
>> wrote:
>> > Dear All
>> > I have a basic question about how data is stored in Apache Solr indexes.
>> >
>> > Say I have a thousand registered users on my site, and I want to store
>> > the skills of each user as a multivalued string field.
>> >
>> > Say
>> > user 1 has skill set - Java, MySql, PHP
>> > user 2 has skill set - C++, MySql, PHP
>> > user 3 has skill set - Java, Android, iOS
>> > ... so on
>> >
>> > You can see that users 1 and 2 have two skills in common, MySql and PHP.
>> > In a real case the same words might be repeated millions of times.
>> >
>> > Now the question is: does Apache Solr store them as plain words, or does
>> > it convert each word to a unique number and store only the number?
>> >
>> > Best Regards
>> > Kamal
>> > Net Cloud Systems
>> > Bangalore, India
>>
