Hi,

I'm new to Solr and already highly impressed by its possibilities and
speed. Until now, I have only used a relational database (MySQL) and
have programmed everything in PHP or Java.

Now I'm stuck and don't know how to represent my data in a Solr index.

To simplify things, I first want to give a partial representation of my
data.

First Table KEYWORDS:
keyword_id, keyword
1, white horse
2, black horse
3, dark dog
4, brown cat

Second Table CATEGORY:
category_id, parent_id, category, full_category
1, 0, color, color
2, 1, light, color > light
3, 2, white, color > light > white
4, 2, beige, color > light > beige
5, 1, dark, color > dark
6, 5, black, color > dark > black
7, 5, brown, color > dark > brown
8, 0, animal, animal
9, 8, horse, animal > horse
10, 8, dog, animal > dog
11, 8, cat, animal > cat

Third Table ANIMAL:
animal_id, animal_name, keyword_ids, category_ids
1, Cathago, 1, 3:7
2, Zebra, 1:2, 3:6:9
3, Bello, 3, 5:10
4, Kitty, 4, 7:11

There are numerous possibilities to represent the data in Solr.

Solution 1:
Save all data as it is represented in my relational database. If
someone searches for brown cat, I first look up the keyword, which gives
keyword_id 4, then search the animal table for that value, and finally fetch
the matching category (11, animal > cat). I know that with Solr 4 and the new
JOIN function I may be able to do this in one query at some point in the
future. The only benefit I see in this solution is the same as in MySQL: it
saves space, because no field value is stored twice.
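
A rough sketch of that three-step lookup, with the tables held as plain
Python dicts instead of MySQL or Solr. The names KEYWORDS, CATEGORIES,
ANIMALS and lookup are made up for illustration, and the Kitty row is
assumed to reference keyword 4 and categories 7 and 11:

```python
# Illustrative in-memory stand-ins for the three tables (not a Solr API).
KEYWORDS = {1: "white horse", 2: "black horse", 3: "dark dog", 4: "brown cat"}
CATEGORIES = {7: "color > dark > brown", 11: "animal > cat"}
# Assumption: Kitty references keyword 4 and categories 7 and 11.
ANIMALS = [
    {"animal_id": 4, "animal_name": "Kitty",
     "keyword_ids": [4], "category_ids": [7, 11]},
]


def lookup(phrase):
    """Three lookups in a row: keyword -> animals -> categories."""
    # Step 1: resolve the search phrase to keyword ids.
    keyword_ids = {kid for kid, kw in KEYWORDS.items() if kw == phrase}
    results = []
    for animal in ANIMALS:
        # Step 2: keep animals that reference one of those keyword ids.
        if keyword_ids & set(animal["keyword_ids"]):
            # Step 3: resolve the animal's category ids to category paths.
            paths = [CATEGORIES[cid] for cid in animal["category_ids"]]
            results.append((animal["animal_name"], paths))
    return results


print(lookup("brown cat"))
```

Each step depends on the result of the previous one, which is why this
layout needs several queries per search.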

Solution 2:
I just save all data based on the animal table, but denormalized.

animal_name: Kitty
keyword: brown cat
category: animal
category: cat
category: color
category: dark
category: brown

animal_name: Zebra
keyword: white horse
keyword: black horse
category: animal
category: horse
category: color
category: light
category: white
category: dark
category: black
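
The flattening behind Solution 2 could be sketched roughly like this: walk
up parent_id from each assigned category and collect every name once. The
CATEGORY dict and the function names are hypothetical, and the names for
ids 6 and 7 are taken from the full_category column (black, brown):

```python
# category_id -> (parent_id, category); parent_id 0 means "no parent".
# Names for ids 6 and 7 follow the full_category column.
CATEGORY = {
    1: (0, "color"), 2: (1, "light"), 3: (2, "white"), 4: (2, "beige"),
    5: (1, "dark"), 6: (5, "black"), 7: (5, "brown"),
    8: (0, "animal"), 9: (8, "horse"), 10: (8, "dog"), 11: (8, "cat"),
}


def with_ancestors(category_id):
    """Return the category names from the root down to category_id."""
    names = []
    while category_id != 0:
        parent_id, name = CATEGORY[category_id]
        names.append(name)
        category_id = parent_id
    return names[::-1]


def flatten(category_ids):
    """Collect every category and ancestor once, in first-seen order."""
    seen = []
    for cid in category_ids:
        for name in with_ancestors(cid):
            if name not in seen:
                seen.append(name)
    return seen


# Denormalized document for Kitty, assuming categories 11 and 7.
doc = {"animal_name": "Kitty", "keyword": ["brown cat"],
       "category": flatten([11, 7])}
print(doc)
```

The resulting document has one multi-valued category field, as in the
Kitty example above.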

Solution 3:
Similar to Solution 2, but with deepest category only

animal_name: Kitty
keyword: brown cat
category: animal > cat
category: color > dark > brown

animal_name: Zebra
keyword: white horse
keyword: black horse
category: animal > horse 
category: color > light > white
category: color > dark > black

As you can see, I'm very confused :-) The second solution gets kind of
messy, and the third solution forces me to extract the parent categories by
hand.

Is there a smart way to solve this that you can recommend?
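
By "extract the parent categories by hand" I mean something like the
following sketch, which expands one stored path into all of its ancestor
paths. The function name ancestor_paths and the " > " separator are only
assumptions based on the full_category values above:

```python
def ancestor_paths(full_category, sep=" > "):
    """Expand 'a > b > c' into ['a', 'a > b', 'a > b > c']."""
    # Split on '>' and trim, so stray spaces around names do not matter.
    parts = [p.strip() for p in full_category.split(">")]
    return [sep.join(parts[:i]) for i in range(1, len(parts) + 1)]


for path in ancestor_paths("color > dark > brown"):
    print(path)
```

Doing this for every stored category value on the client side is the part
I would like to avoid.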

--
View this message in context: 
http://lucene.472066.n3.nabble.com/complex-keywords-hierarchical-data-Solr-representation-problem-tp3642588p3642588.html
Sent from the Solr - User mailing list archive at Nabble.com.