Hello, I am new to Solr and am in the early planning stages of a large project. I could use some advice so that I don't make a major design blunder that I will regret down the road.
Currently I have about 10 MySQL databases that store information about different archival collections. For example, we have data and metadata about a political poster collection, a television program, documents and photographs of and about a famous author, etc. My job is to work with the staff archivists to come up with a standard metadata template so that the 10 databases can be consolidated into one. The information in these databases is currently accessed through 10 different sets of PHP pages that were written long ago for PHP 4.

My plan is to write a new Java application that will handle both the public display of the information and an administrative interface that lets staff members add or edit records. I have decided to use Solr as the search mechanism for this project.

Because the information in each of our 10 collections is slightly different (e.g., a record about a poster has no duration information, but a record about a TV show does), I was thinking of giving each collection's index its own Solr core, so that commits from one collection do not bog down the other, unrelated collections. One reservation I have is that eventually we would like to be able to type in "Iraq" and find records across all of the collections at once, instead of having to search each collection separately. Although I don't know anything about it yet, I did Google "sharding" after reading someone's recent post on this list, and it sounds like it may be a potential answer to my question.

Does anyone have any advice on how I should initially set up Solr for my situation? I am slowly making my way through the wiki and RTFMing, but I wanted to hear what the experts have to say, because at this point I don't really know where to start.

Thank you very much,
Mari