Re: adding and updating a lot of documents to Solr, metadata extraction etc.

Eugene Dzhurinsky Tue, 03 Nov 2009 08:08:30 -0800

On Mon, Nov 02, 2009 at 05:45:37PM -0800, Lance Norskog wrote:
> About large XML files and http overhead: you can tell solr to load the
> file directly from a file system. This will stream thousands of
> documents in one XML file without loading everything in memory at
> once.
> 
> This is a new book on Solr. It will help you through this early learning 
> phase.
> 
> http://www.packtpub.com/solr-1-4-enterprise-search-server


Thank you, but we have to prepare a proof of concept with the stable
version, and I haven't seen any 1.4.0 artifacts released to repo1.maven.org yet.

Additionally, I've learned about http://wiki.apache.org/solr/DataImportHandler
and it looks like this approach is preferable in my case.

I have a lot of HTML pages on disk, plus some metadata stored in SQL
tables. It seems I need to provide a custom EntityProcessor and DataSource
to DataImportHandler. I will also need to pass some properties that tell the
data source how to retrieve the data (table names, etc.).
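To make the requirement above concrete, here is a rough, Solr-free sketch of the row-producing logic such a custom EntityProcessor might wrap: walk a directory of HTML files and emit one row per page, merged with that page's metadata. It follows the contract of DIH's EntityProcessor.nextRow() (one row as a Map, or null when there are no more rows), but the class name HtmlRowSource, the field names, and the in-memory map standing in for the SQL metadata tables are all hypothetical, chosen only for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical, Solr-free sketch of the row-producing logic that a custom
 * DataImportHandler EntityProcessor could wrap: one row per HTML file on
 * disk, merged with metadata that would normally come from SQL tables.
 */
class HtmlRowSource {
    private final Iterator<Path> files;
    // Assumption: an in-memory stand-in for the SQL metadata lookup, keyed by
    // file name; a real version would query the tables instead.
    private final Map<String, Map<String, Object>> metadata;

    HtmlRowSource(Path dir, Map<String, Map<String, Object>> metadata) throws IOException {
        // List the directory once; page contents are read lazily in nextRow().
        List<Path> html;
        try (Stream<Path> listing = Files.list(dir)) {
            html = listing
                    .filter(p -> p.getFileName().toString().endsWith(".html"))
                    .sorted()
                    .collect(Collectors.toList());
        }
        this.files = html.iterator();
        this.metadata = metadata;
    }

    /** Same contract as EntityProcessor.nextRow(): one row, or null when done. */
    Map<String, Object> nextRow() {
        if (!files.hasNext()) {
            return null;
        }
        Path page = files.next();
        String name = page.getFileName().toString();
        Map<String, Object> row = new HashMap<>();
        row.put("id", name); // hypothetical field names; match them to schema.xml
        try {
            row.put("content", new String(Files.readAllBytes(page), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, Object> meta = metadata.get(name);
        if (meta != null) {
            row.putAll(meta); // merge the SQL-side columns into the same document
        }
        return row;
    }
}
```

Listing the directory up front but reading each page only when its row is requested keeps roughly one HTML page in memory at a time. In an actual plugin this logic would presumably live in a subclass of DIH's EntityProcessorBase, with the metadata coming from a query against the SQL tables; that wiring is not shown here.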

Is there perhaps a tutorial or how-to describing how to create custom
classes for importing data into Solr 1.3.0?

Thank you in advance!

-- 
Eugene N Dzhurinsky
