Hi all,
I would like to use Solr to replace our site search based on MySQL but I
am not sure how to map entities into the search index. The model is
described by the attached UML class diagram.
I have a Hotel that resides in some City in some Country. The hotel has
various Rooms. For each Room in a Hotel there are some Packages that can
be purchased by the client.
The entity returned from a search will mainly be the Hotel, e.g.:
- all hotels in USA
- all hotels in New York
- all hotels with name containing "Hilton"
- all hotels in Egypt that have a package with all-inclusive boarding,
  a price lower than 400, and a startDate between 2010-08-20 and
  2010-08-30
Our application also uses faceting a lot, e.g.:
- # of hotels per country/city
- # of hotels based on room size
(# of beds - 1 bed - 100 hotels, 2 beds - 200 hotels, ...)
- # of hotels based on all inclusive package prices
(0-100 EUR, 100-200 EUR, ...)
But there are also use cases where a search should return a Room or
Package directly.
I'd like to use the Data Import Handler (DIH) to index directly from our
database. But which approach to mapping entities into the search index
should I use? It seems to me that there are at least 2 ways.
1) One index based on Hotel with multivalued fields for Rooms and
multivalued fields for Packages. In DIH:
<document>
  <entity name="hotel" ...>
    <field name="id" .../>
    <entity name="room" ...>
      <field name="room_id" .../>
      <entity name="package"...>
        <field .../>
      </entity>
    </entity>
  </entity>
</document>
But I am not sure whether this will work because of the multivalued
fields. The queries may span across all the entities, e.g. I want only
hotels that have a room with 2 beds where that same room has a package
with all-inclusive boarding and a price lower than 400.
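
To make the concern concrete, here is a minimal plain-Python sketch (not Solr code) of what I think happens when rooms and packages are flattened into independent multivalued fields. The field names (room_beds, package_board, package_price) and the sample data are made up for illustration only. As far as I understand, each condition would be checked against its own field, so the link between a room and its packages is lost:

```python
# Hypothetical hotel: the 2-bed room has only an expensive breakfast package,
# while the cheap all-inclusive package belongs to the 4-bed room.
hotel = {
    "id": "h1",
    "rooms": [
        {"beds": 2, "packages": [{"board": "breakfast", "price": 900}]},
        {"beds": 4, "packages": [{"board": "all inclusive", "price": 350}]},
    ],
}


def flatten(hotel):
    """Flatten nested rooms/packages into independent multivalued fields."""
    doc = {"id": hotel["id"], "room_beds": [], "package_board": [], "package_price": []}
    for room in hotel["rooms"]:
        doc["room_beds"].append(room["beds"])
        for pkg in room["packages"]:
            doc["package_board"].append(pkg["board"])
            doc["package_price"].append(pkg["price"])
    return doc


def matches_flat(doc):
    """Check each condition against its own field, independently of the others."""
    return (
        2 in doc["room_beds"]
        and "all inclusive" in doc["package_board"]
        and any(price < 400 for price in doc["package_price"])
    )


def matches_nested(hotel):
    """Intended semantics: one room and one of its packages satisfy everything."""
    return any(
        room["beds"] == 2 and pkg["board"] == "all inclusive" and pkg["price"] < 400
        for room in hotel["rooms"]
        for pkg in room["packages"]
    )


flat_doc = flatten(hotel)
print(matches_flat(flat_doc))  # True: a false positive on the flattened document
print(matches_nested(hotel))   # False: no single room/package actually matches
```

If this model of multivalued matching is right, approach 1 would return hotels that do not really offer the requested combination.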
2) Denormalize the data, so that there is only one index based on
Packages, with each Package document containing a (duplicated) copy of
the data from its Room and Hotel, and then use Field Collapsing on the
Hotel ID for both search results and faceting.
This would also enable direct searches for Packages or Rooms, but I am
not sure about Field Collapsing, which is still a kind of beta
functionality, or about the potential performance costs.
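
A rough plain-Python simulation of what I have in mind for approach 2 follows. It is not Solr's Field Collapsing implementation, only a sketch of the idea: one document per package, filtered as a unit, then grouped by hotel. All field names and rows are hypothetical.

```python
# Hypothetical denormalized rows: one document per package, each carrying
# a duplicated copy of its room and hotel data.
package_docs = [
    {"package_id": "p1", "room_id": "r1", "hotel_id": "h1", "country": "Egypt",
     "beds": 2, "board": "all inclusive", "price": 350},
    {"package_id": "p2", "room_id": "r1", "hotel_id": "h1", "country": "Egypt",
     "beds": 2, "board": "all inclusive", "price": 380},
    {"package_id": "p3", "room_id": "r2", "hotel_id": "h2", "country": "Egypt",
     "beds": 2, "board": "breakfast", "price": 200},
    {"package_id": "p4", "room_id": "r3", "hotel_id": "h3", "country": "Egypt",
     "beds": 4, "board": "all inclusive", "price": 300},
]


def search(docs, beds, board, max_price):
    """All conditions are tested on the same package document."""
    return [
        d for d in docs
        if d["beds"] == beds and d["board"] == board and d["price"] < max_price
    ]


def collapse_by_hotel(docs):
    """Group matching package documents by hotel, keeping first-seen order."""
    groups = {}
    for d in docs:
        groups.setdefault(d["hotel_id"], []).append(d)
    return groups


hits = search(package_docs, beds=2, board="all inclusive", max_price=400)
hotels = collapse_by_hotel(hits)

print([d["package_id"] for d in hits])  # ['p1', 'p2']
print(list(hotels))                     # ['h1']
print(len(hits), len(hotels))           # 2 1
```

The last line shows why faceting worries me here: two package documents match but only one hotel does, so facet counts would have to be computed on the collapsed groups rather than on the raw documents.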
Can anybody give me some advice or share their experiences?
Thanks a lot
Wenca