Solr is indexing XML only?

2006-04-26 Thread David Trattnig
Hello! I'd like to set up/develop a search server. I thought I would use Lucene, then I read about Solr, so I have done the Solr tutorial. At first I was really happy about the features it adds on top of Lucene, but I have now noticed that Solr can index only XML files. Or am I completely wrong? What

Re: Solr is indexing XML only?

2006-04-26 Thread Erik Hatcher
David, Solr doesn't index XML files, but rather XML is used as the wrapper of the text that does get indexed. The document structure is defined in schema.xml, and the field text to be indexed is sent wrapped in an XML request. Regarding your scenario, you would need to write code that pa
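To illustrate the "XML as wrapper" point above, here is a minimal sketch of how client code might wrap already-extracted text in an `<add><doc><field>` update message. The field names `id` and `text` and the helper `build_add_request` are assumptions for illustration only; the real field names must match whatever schema.xml defines.

```python
import xml.etree.ElementTree as ET


def build_add_request(doc_fields):
    """Wrap a dict of field name -> text in an <add><doc> XML update message.

    The field names are hypothetical; they must match the fields in schema.xml.
    """
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in doc_fields.items():
        field = ET.SubElement(doc, "field", {"name": name})
        # ElementTree escapes &, < and > when serializing, so raw text is safe here.
        field.text = value
    return ET.tostring(add, encoding="unicode")


# The text can come from any source (PDF, HTML, database row, ...);
# only the envelope sent to the server is XML.
request_body = build_add_request({
    "id": "doc-1",
    "text": "Plain text extracted from a non-XML source & escaped on output",
})
print(request_body)
```

The resulting string would then be POSTed to the server's update handler; the transport is left out here because the URL depends on the installation.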

Re: Solr is indexing XML only?

2006-04-26 Thread Bill Au
With Solr you can index anything Lucene can index, since Solr uses Lucene under the covers. The input to Solr is in XML format. You will need to process the data you want to index (i.e., exclude certain files and remove HTML tags) and put it into Solr's input format. Bill On 4/26/06, David Tratt
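The preprocessing step described above (skipping certain files and removing HTML tags before building the XML input) could be sketched on the client side roughly as follows. This uses only Python's standard library; the names `TextExtractor`, `strip_html`, `should_index`, and the list of skipped suffixes are hypothetical and not part of Solr.

```python
from html.parser import HTMLParser

# Hypothetical list of file types to leave out of the index.
SKIPPED_SUFFIXES = (".png", ".jpg", ".zip")


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document and ignores the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html(markup):
    """Return the text content of `markup` with whitespace runs collapsed."""
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def should_index(filename):
    """Exclude files whose suffix is in the (assumed) skip list."""
    return not filename.lower().endswith(SKIPPED_SUFFIXES)


cleaned = strip_html("<p>Hello <b>Solr</b> users</p>")
```

The cleaned text is what would go into a field of the XML input; this sketch does not try to skip the contents of `<script>` or `<style>` elements.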

Re: Solr is indexing XML only?

2006-04-26 Thread Chris Hostetter
: will need to process that data you want to index (ie exclude certain : files and remove HTML tags) and put them into Solr's input format. minor clarification: Solr does ship with two Tokenizers that do a pretty good job of throwing away HTML markup, so you don't have to parse it yourself -- but

distributing indexes via solr

2006-04-26 Thread Johnny Monsod
Hi, Suppose I want the xml input submitted to solr to be distributed among a fixed set of partitions; basically, something like round-robin among each of them, so that each directory has a relatively equal size in terms of # of segments. Is there an easy way to do this? I took a quick look at th
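For reference, the round-robin distribution asked about above can be sketched on the client side as below. This is only an illustration of the assignment scheme, not a Solr feature: the function `assign_round_robin` and the partition names are hypothetical, and it balances the number of documents per partition, which is not the same thing as balancing the number of segments.

```python
from itertools import cycle


def assign_round_robin(documents, partitions):
    """Deal documents out to partitions in turn, like cards to players."""
    buckets = {partition: [] for partition in partitions}
    # cycle() repeats the partition list indefinitely; zip() stops when
    # the documents run out.
    for document, partition in zip(documents, cycle(partitions)):
        buckets[partition].append(document)
    return buckets


# Hypothetical document ids and partition names.
buckets = assign_round_robin(["a", "b", "c", "d", "e"], ["part0", "part1"])
```

Each bucket would then be submitted to the index that backs the corresponding partition.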