Hi,

I just discovered UpdateProcessorFactory
<http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/update/processor/package-summary.html>
in a big way. How did this completely slip by me?
I'm working on two ideas:

1. I have used the DIH in a local EmbeddedSolrServer previously. I could
write a ForwardingUpdateProcessorFactory that takes that local update and
sends it on to an HttpSolrServer (a rough sketch is at the bottom of this
message, below the quoted thread).

2. I have code which walks the file system to compose rough documents, but
I haven't yet written the part that handles the templated fields and the
cross-walking of the source(s) to the schema. I could configure the update
handler on the Solr server side to do this with the RegexReplace
<http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/update/processor/RegexReplaceProcessorFactory.html>
and DefaultValue
<http://lucene.apache.org/solr/4_5_1/solr-core/org/apache/solr/update/processor/DefaultValueUpdateProcessorFactory.html>
UpdateProcessorFactories (an example chain is also at the bottom).

Any thoughts on the advantages/disadvantages of these approaches?

Thanks,
Tricia

On Thu, Nov 14, 2013 at 7:49 AM, Erick Erickson <erickerick...@gmail.com> wrote:

> There's nothing that I know of that takes a DIH configuration and
> uses it through SolrJ. You can use Tika directly in SolrJ if you
> need to parse structured documents though, see:
> http://searchhub.org/2012/02/14/indexing-with-solrj/
>
> Yep, you're going to be kind of reinventing the wheel a bit, I'm
> afraid.
>
> Best,
> Erick
>
>
> On Wed, Nov 13, 2013 at 1:55 PM, P Williams
> <williams.tricia.l...@gmail.com> wrote:
>
> > Hi All,
> >
> > I'm building a utility (Java jar) to create SolrInputDocuments and
> > send them to an HttpSolrServer using the SolrJ API. The intention is
> > to find an efficient way to create documents from a large directory of
> > files (where multiple files make one Solr document) and send them to a
> > remote Solr instance for update and commit.
> >
> > I've already solved the problem using the DataImportHandler (DIH), so
> > I have a data-config.xml that describes the templated fields and the
> > cross-walking of the source(s) to the schema. The original data won't
> > always be able to be co-located with the Solr server, which is why I'm
> > looking for another option.
> >
> > I've also already solved the problem using ant and xslt to create a
> > temporary (and unfortunately potentially large) document which the
> > UpdateHandler will accept. I couldn't think of a solution that took
> > advantage of the XSLT support in the UpdateHandler because each
> > document is created from multiple files. Our current (dated)
> > Java-based solution significantly outperforms the ant/xslt approach in
> > terms of disk and time, so I've rejected it and gone back to the
> > drawing board.
> >
> > Does anyone have any suggestions on how I might be able to reuse my
> > DIH configuration in the SolrJ context without re-inventing the wheel
> > (or DIH in this case)? If I'm doing something ridiculous, I hope
> > you'll point that out too.
> >
> > Thanks,
> > Tricia
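For idea 1, this is roughly the sketch I have in mind (untested; the class
name and the hard-coded remote URL are my own placeholders, not an existing
Solr class):

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

public class ForwardingUpdateProcessorFactory extends UpdateRequestProcessorFactory {

  // Placeholder URL for the remote Solr instance the updates should end up in.
  private final HttpSolrServer remote =
      new HttpSolrServer("http://remotehost:8983/solr/collection1");

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
                                            SolrQueryResponse rsp,
                                            UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        SolrInputDocument doc = cmd.getSolrInputDocument();
        try {
          // Forward the document produced by the local DIH run to the remote server.
          remote.add(doc);
        } catch (SolrServerException e) {
          throw new IOException(e);
        }
        // Then continue the local chain (e.g. RunUpdateProcessorFactory) as usual.
        super.processAdd(cmd);
      }
    };
  }
}

It would be registered in the local core's solrconfig.xml as a <processor>
in an updateRequestProcessorChain, ahead of RunUpdateProcessorFactory.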
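And for idea 2, something along these lines in the Solr server's
solrconfig.xml (the field names, default value and pattern are just
placeholders for the real cross-walk rules), with the chain referenced from
the /update handler via an update.chain default:

<updateRequestProcessorChain name="crosswalk">
  <!-- Supply a constant value when the source documents don't provide one. -->
  <processor class="solr.DefaultValueUpdateProcessorFactory">
    <str name="fieldName">collection</str>
    <str name="value">my_collection</str>
  </processor>
  <!-- Regex clean-up of a templated field, e.g. collapse runs of whitespace. -->
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">title</str>
    <str name="pattern">\s+</str>
    <str name="replacement"> </str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>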