Hello! I'd like to set up and develop a search server. I originally planned to use Lucene, but then I read about Solr and worked through the Solr tutorial. At first I was really happy about the features it adds on top of Lucene, but then I noticed that Solr seems to index only XML files. Or am I completely wrong?
What should I use for the following situation?

1. Copy HTML files to the live server (via rsync)
2. Have the search engine index them
3. Exclude some "tagged" files (for example, files that carry a specific meta tag)
4. Strip HTML tags and other irrelevant markup before indexing

How much development work would that be with Lucene or Solr (if it is possible at all)?

Any help would be appreciated!

Thanks in advance,
David
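
P.S. To make steps 3 and 4 more concrete, here is a rough Python sketch of what I have in mind, using only the standard library. The meta tag name `search-exclude` and the helper names are made up for illustration, and I am assuming the resulting plain text would then be handed to whichever indexer I end up using.

```python
from html.parser import HTMLParser


class PageText(HTMLParser):
    """Collects visible text and notes whether an exclusion meta tag is present."""

    # Elements whose content should never reach the index.
    SKIP = {"script", "style"}

    def __init__(self, meta_name="search-exclude"):
        super().__init__()
        self.meta_name = meta_name  # hypothetical tag name, not a standard
        self.excluded = False
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Step 3: a page carrying <meta name="search-exclude"> is flagged.
        if tag == "meta" and (attrs.get("name") or "").lower() == self.meta_name:
            self.excluded = True
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Step 4: keep only text that sits outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html, meta_name="search-exclude"):
    """Return plain text for indexing, or None if the page is tagged for exclusion."""
    parser = PageText(meta_name)
    parser.feed(html)
    parser.close()
    if parser.excluded:
        return None
    return " ".join(parser.parts)
```

The idea would be to run `extract_text` over each file after the rsync step, skip any file for which it returns `None`, and index the rest. Whether Lucene or Solr can already do this kind of filtering for me is exactly what I am unsure about.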