On Dec 12, 2007, at 2:50 AM, Nuno Leitao wrote:


FAST uses two pipelines, an ingestion pipeline (for document feeding) and a query pipeline, both of which are fully programmable (i.e., you can customize them as needed). At ingestion time you typically prepare documents for indexing (tokenize, normalize characters, lemmatize, clean up text, perform entity extraction for facets, apply static boosting to certain documents, etc.), while at query time you can expand synonyms and perform other general query-side tasks (not unlike Solr).
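To make the two-pipeline idea concrete, here is a minimal sketch in Python. It is not FAST's or Solr's actual API; every name in it (run_pipeline, INGESTION_STAGES, QUERY_STAGES, SYNONYMS, the "official" source value) is hypothetical and only illustrates how a chain of small, replaceable stages might look on the ingestion side and on the query side.

```python
import unicodedata
from typing import Callable, Dict, List

# Hypothetical types: a document or query is a plain dict, a stage is a function.
Item = Dict[str, object]
Stage = Callable[[Item], Item]


def normalize_characters(doc: Item) -> Item:
    """Strip accents and lowercase the text (character normalization)."""
    decomposed = unicodedata.normalize("NFKD", str(doc["text"]))
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return {**doc, "text": stripped.lower()}


def tokenize(doc: Item) -> Item:
    """Split the normalized text on whitespace."""
    return {**doc, "tokens": str(doc["text"]).split()}


def static_boost(doc: Item) -> Item:
    """Give documents from an assumed 'official' source a higher static boost."""
    boost = 2.0 if doc.get("source") == "official" else 1.0
    return {**doc, "boost": boost}


# Assumed synonym table, for illustration only.
SYNONYMS: Dict[str, List[str]] = {"car": ["automobile"]}


def expand_synonyms(query: Item) -> Item:
    """Append known synonyms after each query term."""
    expanded: List[str] = []
    for term in query["terms"]:  # type: ignore[union-attr]
        expanded.append(term)
        expanded.extend(SYNONYMS.get(term, []))
    return {**query, "terms": expanded}


def run_pipeline(stages: List[Stage], item: Item) -> Item:
    """Pass the item through each stage in order."""
    for stage in stages:
        item = stage(item)
    return item


INGESTION_STAGES: List[Stage] = [normalize_characters, tokenize, static_boost]
QUERY_STAGES: List[Stage] = [expand_synonyms]

if __name__ == "__main__":
    print(run_pipeline(INGESTION_STAGES, {"text": "Café Search", "source": "official"}))
    print(run_pipeline(QUERY_STAGES, {"terms": ["car", "engine"]}))
```

Customizing a pipeline in this sketch just means adding, removing or reordering functions in the stage lists.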

Horizontal scalability means the ability to cluster your search engine across a large number of servers, so you can scale up on the number of documents, queries, crawls, etc.
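As a rough illustration of what clustering an index across servers involves, the toy Python sketch below partitions documents across in-memory "shards" by hashing the document id, then answers a query by asking every shard and merging the partial results. This is an assumption-laden simplification, not how FAST or Solr actually distribute an index; the Shard and Cluster classes and the term-count scoring are made up for the example.

```python
import heapq
import zlib
from typing import Dict, List, Tuple


def shard_for(doc_id: str, num_shards: int) -> int:
    """Pick a shard with a stable hash of the document id."""
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


class Shard:
    """Stands in for one search node holding part of the index."""

    def __init__(self) -> None:
        self.docs: Dict[str, List[str]] = {}

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text.lower().split()

    def search(self, term: str, limit: int) -> List[Tuple[int, str]]:
        # Score is simply the number of times the term occurs in the document.
        hits = [
            (tokens.count(term), doc_id)
            for doc_id, tokens in self.docs.items()
            if term in tokens
        ]
        return heapq.nlargest(limit, hits)


class Cluster:
    """Routes documents to shards and merges per-shard results."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, doc_id: str, text: str) -> None:
        self.shards[shard_for(doc_id, len(self.shards))].add(doc_id, text)

    def search(self, term: str, limit: int = 10) -> List[Tuple[int, str]]:
        # Each shard returns its own top hits; the merged top hits are the answer.
        partial = [hit for shard in self.shards for hit in shard.search(term, limit)]
        return heapq.nlargest(limit, partial)


if __name__ == "__main__":
    cluster = Cluster(num_shards=4)
    cluster.add("a", "solr solr lucene")
    cluster.add("b", "solr")
    cluster.add("c", "fast")
    print(cluster.search("solr"))
```

Adding shards lets each node hold fewer documents; handling more queries per second would additionally need replicas of each shard, which this sketch leaves out.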

There are FAST deployments out there that run on dozens, in some cases hundreds, of nodes, serving multi-terabyte indexes and achieving hundreds of queries per second.

Then again, if your requirements are relatively simple, Lucene might do the job just fine.

Hope this helps.

With Fast, you will also get things like:
- categorization
- clustering
- more flexible collapsing / grouping
- more scalable facets (navigators) - at least for multivalued fields
- gigabytes of poorly documented software
- operations from hell
- a huge number of bugs
- high bills, both for software and hardware.

As for linguistic features (named entity extraction, dictionary-based lemmatization and so on) and things like categorization and clustering, do not expect them to work too well unless you put a huge amount of work into them; some of the features are really primitive.

To sum up, if Solr meets your needs I would highly recommend Solr. If you need some additional features and have the knowledge, integrate other products with Solr. If you really need the scalability, go for Fast or some other commercial software.

As for document preprocessing and connectors for Solr, if you need it, you could have a look at OpenPipe, http://openpipe.berlios.de/ (not yet announced).

Svein
