The DataImportHandler (DIH) has improved a great deal from Solr 1.3 to 1.4. You will be much better off using the DIH from 1.4.
This is the current Solr release candidate binary:
http://people.apache.org/~gsingers/solr/1.4.0/

On Tue, Nov 3, 2009 at 8:08 AM, Eugene Dzhurinsky <b...@redwerk.com> wrote:
> On Mon, Nov 02, 2009 at 05:45:37PM -0800, Lance Norskog wrote:
>> About large XML files and http overhead: you can tell solr to load the
>> file directly from a file system. This will stream thousands of
>> documents in one XML file without loading everything in memory at
>> once.
>>
>> This is a new book on Solr. It will help you through this early learning
>> phase.
>>
>> http://www.packtpub.com/solr-1-4-enterprise-search-server
>
> Thank you, but we have to prepare some proof of concept with the stable
> version. I didn't see any 1.4.0 artifacts released to repo1.maven.org for now.
>
> Additionally, I've learned about http://wiki.apache.org/solr/DataImportHandler
> and looks like this way is preferred in my case.
>
> I do have a lot of HTML pages on disk storage, and some metadata being stored
> in SQL tables. What I seem to need is to provide some sort of EntityProcessor
> and DataSource to DataImportHandler. Additionally I will need to provide some
> sort of properties to instruct data source for data retrieval (table names
> etc).
>
> So may be there is some tutorial or how-to, describing the process of creation
> of custom classes for importing the data into Solr 1.3.0?
>
> Thank you in advance!
>
> --
> Eugene N Dzhurinsky
>

--
Lance Norskog
goks...@gmail.com