This is a DIH plug-in that lets you search Solr directly in the processing chain.

https://issues.apache.org/jira/browse/SOLR-1499

You can fetch a database record, search Solr, then search the DB again
using the return values.
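
If you end up driving the two passes from your own code instead, here is a
rough sketch of how the request parameters might be built. It only constructs
the standard /select parameters and does not talk to a server. The field
names ("location_name", "id"), the core URL and the helper names are all
assumptions for illustration, not anything taken from SOLR-1499.

```python
from urllib.parse import urlencode


def attribute_lookup_params(raw_value, field="location_name", rows=5):
    """Phase 1: look up candidate attribute ids for a free-text value.

    `field` is a hypothetical field in an attributes-only index.
    The raw value is passed through unescaped; real input may need
    query-syntax escaping first.
    """
    return {"q": raw_value, "df": field, "fl": "id,score",
            "rows": rows, "wt": "json"}


def restricted_document_params(text, candidate_ids, id_field="id", rows=10):
    """Phase 2: search the document text, limited to known primary keys.

    `candidate_ids` would come from the ordinary database query that
    sits between the two Solr searches.
    """
    if not candidate_ids:
        raise ValueError("need at least one candidate id")
    ids = " OR ".join(str(int(i)) for i in candidate_ids)
    return {"q": text, "fq": "{}:({})".format(id_field, ids),
            "fl": "id,score", "rows": rows, "wt": "json"}


def select_url(base_url, params):
    """Build a /select URL for a core; base_url is an assumed location."""
    return base_url.rstrip("/") + "/select?" + urlencode(params)


if __name__ == "__main__":
    phase1 = attribute_lookup_params("Crawford Central Public Library, Nebraska")
    print(select_url("http://localhost:8983/solr/attributes", phase1))
    phase2 = restricted_document_params("annual report", [4444, 4512])
    print(select_url("http://localhost:8983/solr/documents", phase2))
```

Putting the primary keys in a filter query (fq) rather than in q keeps them
out of the relevance score, so the ranking depends only on the text match.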

Lance

On Tue, Jul 20, 2010 at 1:35 PM, Travis Low <t...@4centurion.com> wrote:
> I have a large database table with many document records, and I plan to use
> SOLR to improve the searching for the documents.
>
> The twist here is that perhaps 50% of the records will originate from
> outside sources, and sometimes those records may be updated versions of
> documents we already have.  Currently, a human visually examines the
> incoming information and performs a few document searches, and decides if a
> new document must be created, or an existing one should be updated.  We
> would like to automate the matching to some extent, and it occurs to me that
> SOLR might be useful for this as well.
>
> Each document has many attributes that can be used for matching.  The
> attributes are all in lookup tables.  For example, there is a "location"
> field that might be something like "Central Public Library, Crawford, NE"
> for the row with id #4444.  The incoming document might have something like
> "Crawford Central Public Library, Nebraska", which ideally would map to
> #4444 as well.
>
> I'm currently thinking that a two-phase import might work.  First, we use
> SOLR to try and get a list of attribute ids for the incoming document.
> Those can be used for ordinary database queries to find primary keys of
> potential matches.  Then we use SOLR again to search the reduced list for
> the unstructured information, essentially by including those primary keys as
> part of the search.
>
> I was looking at the example for DIH here:
> http://wiki.apache.org/solr/DataImportHandler and it is clear, but it is
> obviously slanted toward finding the products.  I need to find the categories so
> that I can *then* find the products, if that makes sense.
>
> Any suggestions on how to proceed?  My first thought is that I should set up
> two SOLR instances, one for indexing only attributes, and one for the
> documents themselves.
>
> Thanks in advance for any help.
>
> cheers,
>
> Travis
>



-- 
Lance Norskog
goks...@gmail.com
