I've found that Solr running on modest hardware (a 2.4 GHz PC running
Windows XP Pro for testing changes) is able to index about 23,000
records in under three minutes. Assuming you aren't going to make too
many typos in your naming, you should be fine just doing the
re-indexing. Try timing your system. Make a change to about a thousand
records and see how long it takes to index them.

For larger updates, I've found it's better to index in batches. I prepare
up to a few hundred update documents at a time and commit them all at
once. That goes much faster than committing each update document
individually.
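Here is a minimal sketch of that batching idea. It assumes Solr's XML update message format, where documents go in an `<add>` element and a separate `<commit/>` message follows. The record fields (`id`, `name`, `recruiter`) and the batch size of 200 are hypothetical. Actually POSTing each message to Solr's update handler is left out, so the sketch only builds the payloads.

```python
from xml.sax.saxutils import escape, quoteattr


def build_add_message(records):
    """Build one Solr XML <add> message containing every record in the batch."""
    docs = []
    for rec in records:
        # Each dict key becomes a <field name="..."> element; values are escaped.
        fields = "".join(
            f"<field name={quoteattr(name)}>{escape(str(value))}</field>"
            for name, value in rec.items()
        )
        docs.append(f"<doc>{fields}</doc>")
    return "<add>" + "".join(docs) + "</add>"


def batched_update_messages(records, batch_size=200):
    """Split records into <add> messages of at most batch_size documents each,
    followed by a single <commit/> instead of one commit per document."""
    messages = [
        build_add_message(records[i:i + batch_size])
        for i in range(0, len(records), batch_size)
    ]
    messages.append("<commit/>")
    return messages


if __name__ == "__main__":
    # Hypothetical applicants, all linked to the same recruiter.
    applicants = [
        {"id": i, "name": f"Applicant {i}", "recruiter": "My Recruiter"}
        for i in range(1, 1001)
    ]
    msgs = batched_update_messages(applicants)
    print(len(msgs))  # 5 <add> batches + 1 commit = 6
```

With this approach, the commit cost is paid once per run rather than once per document. That is where the speed-up described above comes from.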
Ryan Grange, IT Manager
DollarDays International, LLC
[EMAIL PROTECTED]
480-922-8155 x106
steve.lillywhite wrote:
Hi all,
This is a (possibly very naive) newbie question regarding Solr best practice...
I run a website that displays and stores data on job applicants, together with information on where they came from (e.g. which recruiter), which office they are applying to, etc. This data is stored in a MySQL database. I currently have a basic search facility, but I plan to introduce Solr to improve it by also storing applicant data in a Solr index.
My problem is that *related* applicant data can also be updated in the web GUI
(e.g. if there was a typo, a recruiter could be renamed from “My Rcruiter” to
“My Recruiter”), and I don’t know how best to reflect this in the Solr index.
Example:
We may have 20000 applicants that came from recruiter “My Recruiter”. If the
name of this recruiter is altered in the GUI, then I would have to reindex all
20000 of those applicants in the Solr index, which seems like overkill. The
alternative would be to not store the recruiter name in the Solr index at all,
and instead store only its MySQL database identifier. Then I would need to
post-process any search results from Solr to fill in the recruiter name before
displaying the data in the GUI.
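Here is a minimal sketch of that post-processing step. It assumes Solr returns each applicant as a dictionary with a hypothetical `recruiter_id` field. It also assumes the application has already loaded an ID-to-name lookup from MySQL, which is hard-coded here for illustration.

```python
def attach_recruiter_names(solr_docs, recruiter_names):
    """Return copies of Solr result documents with a recruiter display name added.

    solr_docs: list of dicts from a Solr query (hypothetical field names).
    recruiter_names: dict mapping MySQL recruiter ID -> current recruiter name.
    """
    results = []
    for doc in solr_docs:
        enriched = dict(doc)  # copy so the original result is left untouched
        rid = doc.get("recruiter_id")
        # Fall back to a placeholder if the ID is missing or unknown.
        enriched["recruiter_name"] = recruiter_names.get(rid, "(unknown recruiter)")
        results.append(enriched)
    return results


if __name__ == "__main__":
    # Hypothetical lookup, e.g. loaded with: SELECT id, name FROM recruiters
    names = {143: "My Recruiter"}
    docs = [{"id": "app-1", "name": "John Smith", "recruiter_id": 143}]
    print(attach_recruiter_names(docs, names))
```

Because the name is looked up at display time, renaming a recruiter in MySQL needs no reindex. The trade-off is that a Solr full-text query for “My Recruiter” will not match these applicants, since only the ID is indexed.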
So I guess I’m asking which of these is the better approach:
1. Use Solr to store the text value of related applicant data that exists
in a relational MySQL database. Whenever that data is updated in the database,
reindex all dependent entries in the Solr index. The advantage of this approach,
I guess, is that search results can be returned from Solr and displayed as-is (if
XSLT is used). E.g. a search result for “John Smith” of recruiter “My Recruiter”
could be returned from Solr in the required HTML format and displayed in the
web GUI without any reformatting or further processing.
2. Use Solr to store the database IDs of related applicant data that exists
in a relational MySQL database. When that data is updated in the database, there
is no need to reindex Solr. However, search results from Solr will need to be
post-processed before they can be output in the web GUI. E.g. if Solr returns “John
Smith” of recruiter with database ID 143, then 143 will need to be mapped back
to “My Recruiter” by my application before it can be displayed.
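For comparison, here is a minimal sketch of the reindexing step in option 1. It assumes applicant rows come from MySQL as dictionaries with hypothetical `id`, `name` and `recruiter_id` columns. Only applicants linked to the renamed recruiter are rebuilt, as complete documents, because re-adding a document with the same unique key replaces the old one in Solr. The rebuilt documents could then be sent in batches with a single commit.

```python
def rebuild_docs_for_recruiter(applicant_rows, recruiter_id, new_name):
    """Rebuild Solr documents for every applicant linked to one recruiter.

    applicant_rows: iterable of dicts from MySQL (hypothetical columns:
    id, name, recruiter_id). Only rows whose recruiter_id matches are
    returned, each with the denormalised recruiter name set to new_name.
    """
    return [
        {"id": row["id"], "name": row["name"], "recruiter": new_name}
        for row in applicant_rows
        if row["recruiter_id"] == recruiter_id
    ]


if __name__ == "__main__":
    # In practice these rows would come from something like:
    #   SELECT id, name, recruiter_id FROM applicants WHERE recruiter_id = 143
    rows = [
        {"id": 1, "name": "John Smith", "recruiter_id": 143},
        {"id": 2, "name": "Jane Doe", "recruiter_id": 7},
    ]
    print(rebuild_docs_for_recruiter(rows, 143, "My Recruiter"))
```

The cost of this option grows with the number of dependent applicants, which is why timing a test reindex on your own hardware is a useful way to decide between the two.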
Can anyone offer any guidance here?
Regards
Steve