On Mon, Nov 02, 2009 at 05:45:37PM -0800, Lance Norskog wrote:
> About large XML files and http overhead: you can tell solr to load the
> file directly from a file system. This will stream thousands of
> documents in one XML file without loading everything in memory at
> once.
>
> This is a new book on Solr. It will help you through this early learning
> phase.
>
> http://www.packtpub.com/solr-1-4-enterprise-search-server

Thank you, but we have to prepare a proof of concept with the stable version. I haven't seen any 1.4.0 artifacts released to repo1.maven.org so far.

Additionally, I've learned about http://wiki.apache.org/solr/DataImportHandler, and it looks like this approach is preferred in my case. I have a lot of HTML pages on disk, and some metadata stored in SQL tables.

What I seem to need is to provide a custom EntityProcessor and DataSource to DataImportHandler. I will also need to pass some properties that tell the data source how to retrieve the data (table names, etc.).

So maybe there is a tutorial or how-to describing how to create custom classes for importing data into Solr 1.3.0?

Thank you in advance!

-- 
Eugene N Dzhurinsky