Re: Best practice for storing relational data in Solr

Ryan Grange Tue, 08 Jan 2008 16:14:55 -0800

I've found that Solr running on modest hardware (a 2.4 GHz PC running Windows XP Pro for testing changes) is able to index about 23,000 records in under three minutes. Assuming you aren't going to make too many typos in your naming, you should be fine just doing the re-indexing. Try timing your system. Make a change to about a thousand records and see how long it takes to index them.
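
As a rough illustration of that timing test, here is a minimal Python sketch. The names time_indexing and estimate_seconds are made up for this example, and index_fn stands in for whatever routine actually sends your records to Solr. The estimate assumes indexing time scales roughly linearly with record count, which is only an approximation.

```python
import time


def time_indexing(index_fn, records):
    """Run index_fn(records) once and return the elapsed wall-clock seconds.

    index_fn is a placeholder for your own indexing routine (an assumption
    of this sketch, not something Solr provides).
    """
    start = time.perf_counter()
    index_fn(records)
    return time.perf_counter() - start


def estimate_seconds(sample_size, sample_seconds, target_size):
    """Scale a measured sample up to a larger record count.

    Assumes roughly linear scaling, so treat the result as a ballpark figure.
    """
    return sample_seconds * target_size / sample_size
```

For example, if a thousand records take 8 seconds, estimate_seconds(1000, 8.0, 20000) suggests about 160 seconds for 20,000 records.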

When indexing, I've found it's better to do them in batches for larger updates. I get up to a few hundred updates ready at a time and commit them at once. Goes much faster than committing each update document individually.
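
A sketch of that batching idea in Python, using Solr's XML update messages (<add> and <commit/>). The URL, the batch size of 200, and all function names here are assumptions for illustration, not my actual setup. Each document is a plain dict of field name to value.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Assumed location of the Solr update handler; adjust for your install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_add_message(docs):
    """Build one <add> message containing a <doc> for every dict in docs."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            field = ET.SubElement(doc_el, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def batches(items, size):
    """Yield consecutive slices of items, each holding at most size entries."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def post_xml(url, body):
    """POST an XML message to Solr and return the raw response bytes."""
    request = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(request) as response:
        return response.read()


def index_in_batches(docs, size=200, send=None, url=SOLR_UPDATE_URL):
    """Send docs in batches, committing once per batch, not once per document.

    send is any callable taking the XML body; it defaults to a real HTTP POST.
    Returns the number of batches sent.
    """
    if send is None:
        send = lambda body: post_xml(url, body)
    count = 0
    for batch in batches(docs, size):
        send(build_add_message(batch))
        send("<commit/>")
        count += 1
    return count
```

Passing your own send callable makes it easy to dry-run the batching without a live Solr instance.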


Ryan Grange, IT Manager
DollarDays International, LLC
[EMAIL PROTECTED]
480-922-8155 x106



steve.lillywhite wrote:

Hi all,

This is a (possibly very naive) newbie question regarding Solr best practice...

I run a website that displays/stores data on job applicants, together with information on where they came from (e.g. which recruiter), which office they are applying to, etc. This data is stored in a MySQL database. I currently have a basic search facility, but I plan to improve on it by also storing applicant data in a Solr schema.

My problem is that *related* applicant data can also be updated in the web GUI 
(e.g. if there was a typo, a recruiter could be changed from “My Rcruiter” to 
“My Recruiter”), and I don’t know how best to reflect this in the Solr schema.

Example:

We may have 20000 applicants that came from recruiter “My Recruiter”. If the 
name of this recruiter is altered in the GUI then I would have to reindex all 
20000 of those applicants in the Solr schema, which seems like overkill. The 
alternative would be not to store the recruiter name in the Solr schema, 
and instead store only its MySQL database identifier. Then, I would need to 
parse any search results from Solr to put in the recruiter name before 
displaying the data in the GUI.

So I guess I’m asking which of these is the better approach:

1. Use Solr to store the text value of related applicant data that exists 
in a relational MySQL database. Whenever that data is updated in the database, 
reindex all dependent entries in the Solr schema. The advantage of this approach, 
I guess, is that search results can be returned from Solr and displayed as is (if 
XSLT is used). E.g. search result for “John Smith” of recruiter “My Recruiter” 
could be returned in the required HTML format from Solr, and displayed in the 
web GUI without any reformatting or further processing.

2. Use Solr to store database IDs of related applicant data that exists 
in a relational MySQL database. When that data is updated in the database there 
is no need to reindex Solr. However, search results from Solr will need to be 
parsed before they can be output in the web GUI. E.g. if Solr returns “John 
Smith” of recruiter with database ID 143, then 143 will need to be mapped back 
to “My Recruiter” by my application before it can be displayed.

Can anyone offer any guidance here?

Regards

Steve


