Re: Indexing from a database via SolrJ

Erick Erickson Tue, 16 Aug 2011 10:24:16 -0700

The problem with anything "automatic" is that I don't see how it could know
which fields in the document to map DB columns to. Unless you had
fields that exactly matched column names, it would be iffy...


I assume DIH actually does something like this, but don't know any way
of having SolrJ automagically do this.

At root these kinds of things don't generalize well, but that doesn't mean
that there's not a good case for doing this.
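
For what it's worth, the metadata-driven loop described in the quoted message
below could look roughly like this. It is only a sketch: RowMapper and
rowToFields are made-up names, not anything SolrJ or DIH provides, and it
assumes every column label is also a field name in the Solr schema. As far as
I know, a column with no matching field or dynamicField would be rejected by
Solr rather than silently ignored.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration; not part of SolrJ or DIH.
class RowMapper {
    // Copies the current row of rs into a name -> value map keyed by column label.
    // Assumption: each column label matches a field name in the Solr schema.
    static Map<String, Object> rowToFields(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int i = 1; i <= meta.getColumnCount(); i++) { // JDBC columns are 1-based
            Object value = rs.getObject(i); // the driver chooses the Java type
            if (value != null) {            // leave SQL NULLs out of the document
                fields.put(meta.getColumnLabel(i), value);
            }
        }
        return fields;
    }
}
```

Inside the while (rs.next()) loop, each entry of the returned map would then be
passed to doc.addField(entry.getKey(), entry.getValue()) in place of the
hand-written getString/addField pairs.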

Best
Erick

On Tue, Aug 16, 2011 at 11:26 AM, Shawn Heisey <[email protected]> wrote:
> On 8/16/2011 7:14 AM, Erick Erickson wrote:
>>
>> What have you tried and what doesn't it do that you want it to do?
>>
>> This works; instantiating the StreamingUpdateSolrServer (server) and
>> the JDBC connection/SQL statement are left as exercises for the
>> reader <G>:
>>
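>>     // Batch of documents waiting to be sent, plus running counters.
>>     List<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
>>     int counter = 0;
>>     int total = 0;
>>
>>     while (rs.next()) {
>>       SolrInputDocument doc = new SolrInputDocument();
>>
>>       String id = rs.getString("id");
>>       String title = rs.getString("title");
>>       String text = rs.getString("text");
>>
>>       doc.addField("id", id);
>>       doc.addField("title", title);
>>       doc.addField("text", text);
>>
>>       docs.add(doc);
>>       ++counter;
>>       ++total;
>>       // Completely arbitrary, just batch up more than one document
>>       // for throughput!
>>       if (counter > 100) {
>>         server.add(docs);
>>         docs.clear();
>>         counter = 0;
>>       }
>>     }
>>
>>     // Send whatever is left over from the last partial batch.
>>     if (!docs.isEmpty()) {
>>       server.add(docs);
>>     }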
>
> I've implemented a basic loop with the structure you've demonstrated, but it
> doesn't do anything yet with SolrInputDocument or
> SolrDocumentList.  I figured there would be a way to avoid going through the
> field list one by one, but what you've written suggests that the
> field-by-field method is required.  I can live with that.
>
> It does look like addField just takes an Object, so hopefully I can create a
> loop that determines the type of each field from the JDBC metadata,
> retrieves the correct Java type from the ResultSet, and inserts it.  I
> imagine that everything still works if you happen to insert a field that
> doesn't exist in the index.  This must be how the DIH does it, so I was
> hoping that the DIH might expose a method that takes a ResultSet and
> produces a SolrDocumentList.  I still have to take a deeper look at the
> source and documentation.
>
> Thanks for the help so far, I can get a little more implemented now.
>
> Shawn
>
>
