Re: SOLR X FAST

Svein Parnas Wed, 12 Dec 2007 02:49:06 -0800


On Dec 12, 2007, at 2:50 AM, Nuno Leitao wrote:

FAST uses two pipelines, an ingestion pipeline (for document feeding) and a query pipeline, both of which are fully programmable (i.e., you can customize them fully). At ingestion time you typically prepare documents for indexing (tokenize, character normalize, lemmatize, clean up text, perform entity extraction for facets, perform static boosting for certain documents, etc.), while at query time you can expand synonyms and do other general query-side tasks (not unlike Solr).
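
To make the two-pipeline idea concrete, here is a minimal sketch in Python. It is not FAST's or Solr's API: every function name, the synonym table and the boosting rule are hypothetical and only illustrate how documents and queries each pass through their own chain of stages.

```python
import re
import unicodedata

# All stage names below are hypothetical; they do not come from FAST or Solr.

def normalize_chars(doc):
    # Character normalization: strip accents, then lowercase the body text.
    text = unicodedata.normalize("NFKD", doc["body"])
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return {**doc, "body": text.lower()}

def tokenize(doc):
    # Split the normalized body into word tokens.
    return {**doc, "tokens": re.findall(r"\w+", doc["body"])}

def static_boost(doc):
    # Toy static boosting rule (an assumption): favour one document source.
    boost = 2.0 if doc.get("source") == "press" else 1.0
    return {**doc, "boost": boost}

def parse_query(query):
    # Turn the raw query string into lowercase terms.
    return {**query, "terms": re.findall(r"\w+", query["text"].lower())}

# Hypothetical synonym dictionary used at query time.
SYNONYMS = {"laptop": ["notebook"]}

def expand_synonyms(query):
    # Append known synonyms right after each original term.
    terms = []
    for term in query["terms"]:
        terms.append(term)
        terms.extend(SYNONYMS.get(term, []))
    return {**query, "terms": terms}

def run_pipeline(stages, item):
    # Feed the item through each stage in order.
    for stage in stages:
        item = stage(item)
    return item

INGESTION_PIPELINE = [normalize_chars, tokenize, static_boost]
QUERY_PIPELINE = [parse_query, expand_synonyms]

doc = run_pipeline(INGESTION_PIPELINE, {"body": "Café Laptops", "source": "press"})
query = run_pipeline(QUERY_PIPELINE, {"text": "cheap laptop"})
```

Customizing a pipeline in this sketch just means adding, removing or reordering entries in the stage lists.
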
Horizontal scalability means the ability to cluster your search engine across a large number of servers, so you can scale up on the number of documents, queries, crawls, etc.
There are FAST deployments out there which run on dozens, in some cases hundreds, of nodes serving multi-terabyte indexes and achieving hundreds of queries per second.
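
As a rough illustration of what spreading an index across nodes involves, the sketch below routes each document to a shard by hashing its id and answers a query by asking every shard and merging the partial results. This is a toy, in-process model with made-up class names and a term-frequency score; it does not describe how FAST or Solr actually distribute an index.

```python
import heapq
import zlib

class Shard:
    """One node's slice of the index (hypothetical, in-memory)."""

    def __init__(self):
        self.docs = {}  # doc_id -> list of tokens

    def add(self, doc_id, tokens):
        self.docs[doc_id] = tokens

    def search(self, term, k):
        # Score is plain term frequency, chosen only to keep the sketch short.
        hits = [
            (tokens.count(term), doc_id)
            for doc_id, tokens in self.docs.items()
            if term in tokens
        ]
        return heapq.nlargest(k, hits)

class Cluster:
    """Routes documents to shards and merges per-shard results."""

    def __init__(self, num_shards):
        self.shards = [Shard() for _ in range(num_shards)]

    def _route(self, doc_id):
        # A stable hash of the id decides which shard owns the document.
        index = zlib.crc32(doc_id.encode("utf-8")) % len(self.shards)
        return self.shards[index]

    def add(self, doc_id, tokens):
        self._route(doc_id).add(doc_id, tokens)

    def search(self, term, k=10):
        # Scatter the query to every shard, then gather and keep the best k.
        partial = []
        for shard in self.shards:
            partial.extend(shard.search(term, k))
        return heapq.nlargest(k, partial)

cluster = Cluster(num_shards=3)
cluster.add("a", ["solr", "solr", "fast"])
cluster.add("b", ["solr"])
cluster.add("c", ["fast"])
top = cluster.search("solr", k=2)
```

Adding capacity for more documents means adding shards; in a real deployment each shard would live on its own server and would usually be replicated to handle more queries.
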
Yet again, if your requirements are relatively simple then Lucene might do the job just fine.
Hope this helps.


With Fast, you will also get things like:
- categorization
- clustering
- more flexible collapsing / grouping
- more scalable facets (navigators) - at least for multivalued fields
- gigabytes of poorly documented software
- operations from hell
- a huge number of bugs
- high bills, both for software and hardware.

As for linguistic features (named entity extraction, dictionary-based lemmatization and so on) and things like categorization / clustering etc., they should not be expected to work too well unless you put a huge amount of work into them, and some of the features are really primitive.

To sum up, if Solr meets your needs I would highly recommend Solr. If you need some additional features and have the knowledge, integrate other products with Solr. If you really need the scalability, go for Fast or some other commercial software.

As for document preprocessing and connectors for Solr, if you need it, you could have a look at OpenPipe, http://openpipe.berlios.de/ (not yet announced).


Svein
