This is a DIH plug-in that lets you search Solr directly in the processing chain.
https://issues.apache.org/jira/browse/SOLR-1499

You can fetch a database record, search Solr, then search the DB again using the return values.

Lance

On Tue, Jul 20, 2010 at 1:35 PM, Travis Low <t...@4centurion.com> wrote:
> I have a large database table with many document records, and I plan to use
> SOLR to improve the searching for the documents.
>
> The twist here is that perhaps 50% of the records will originate from
> outside sources, and sometimes those records may be updated versions of
> documents we already have. Currently, a human visually examines the
> incoming information, performs a few document searches, and decides whether a
> new document must be created or an existing one should be updated. We
> would like to automate the matching to some extent, and it occurs to me that
> SOLR might be useful for this as well.
>
> Each document has many attributes that can be used for matching. The
> attributes are all in lookup tables. For example, there is a "location"
> field that might be something like "Central Public Library, Crawford, NE"
> for the row with id #4444. The incoming document might have something like
> "Crawford Central Public Library, Nebraska", which ideally would map to
> #4444 as well.
>
> I'm currently thinking that a two-phase import might work. First, we use
> SOLR to try and get a list of attribute ids for the incoming document.
> Those can be used for ordinary database queries to find primary keys of
> potential matches. Then we use SOLR again to search the reduced list for
> the unstructured information, essentially by including those primary keys as
> part of the search.
>
> I was looking at the example for DIH here:
> http://wiki.apache.org/solr/DataImportHandler and it is clear, but it is
> obviously slanted toward finding the products. I need to find the categories so
> that I can *then* find the products, if that makes sense.
>
> Any suggestions on how to proceed? My first thought is that I should set up
> two SOLR instances, one for indexing only attributes, and one for the
> documents themselves.
>
> Thanks in advance for any help.
>
> cheers,
>
> Travis
>

--
Lance Norskog
goks...@gmail.com