Man, you people are fast! There is a bug in Solr/Lucene: it keeps memory around from previous fields, so indexing giant text files can run out of memory when it should not. This bug is fixed in trunk.
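
For reference, here is a rough sketch of the data-config.xml I had in mind. The driver settings, base directory, table/column names, and the Solr field names (content etc.) are placeholders for your setup, and it assumes the TikaEntityProcessor from trunk (it is not in 1.4):

<dataConfig>
  <!-- JDBC source for the metadata; url/user/password are placeholders -->
  <dataSource name="db" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/yourdb"
              user="user" password="pass"/>
  <!-- binary stream source for reading the text files off disk -->
  <dataSource name="bin" type="BinFileDataSource"/>
  <document>
    <entity name="meta" dataSource="db"
            query="SELECT id, title, description, tags, file_path FROM files">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="description" name="description"/>
      <field column="tags" name="tags"/>
      <!-- nested entity: open each row's file and extract plain text with Tika -->
      <entity name="doc" processor="TikaEntityProcessor"
              dataSource="bin" format="text"
              url="/path/to/filecloud/${meta.file_path}">
        <!-- Tika puts the extracted body in the 'text' column -->
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>

Once the handler is registered in solrconfig.xml (at /dataimport, say), kick it off with command=full-import and the file contents get indexed into the same document as the database metadata.
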
On 4/17/10, Lance Norskog <goks...@gmail.com> wrote:
> The DataImportHandler can let you fetch the file name from the
> database record, and then load the file as a field and process the
> text with Tika.
>
> It will not be easy :) but it is possible.
>
> http://wiki.apache.org/solr/DataImportHandler
>
> On 4/17/10, Serdar Sahin <anlamar...@gmail.com> wrote:
>> Hi,
>>
>> I am rather new to Solr and have a question.
>>
>> We have around 200,000 txt files which are placed into the file cloud.
>> The file paths look something like this:
>>
>> file/97/8f/840/fa4-1.txt
>> file/a6/9d/ab0/ca2-2.txt etc.
>>
>> We also store the metadata (title, description, tags, etc.)
>> about these files in the MySQL server. What I want to do is to
>> index title, description, tags, and other data from MySQL, and also
>> get the txt file from the file server, and link them as one record
>> for searching, but I could not figure out how to automate this
>> process. I can return the path from the SQL query, like SELECT id,
>> title, description, file_path, and then Solr could use this path to
>> retrieve the txt file, but I don't know whether that is possible.
>>
>> What is the best way to index these files with their tags, title,
>> and description without coding in Java (Perl is OK)? These txt
>> files are large, between 100kb and 10mb, so storing them in the
>> database is a last resort.
>>
>> Thanks,
>>
>> Serdar
>>
>
> --
> Lance Norskog
> goks...@gmail.com
>

--
Lance Norskog
goks...@gmail.com