On 15/01/2020 04:02, Dc Tech wrote:
I am a SOLR fan and implemented it in our company over 10 years ago.
I moved away from that role, and in the meantime the new search team
implemented a proprietary (and expensive) NoSQL-style search engine. That
project did not go well, and now I am back on the project and reviewing the
technology stack.
Some of the team think that ElasticSearch could be a good option,
especially since we can easily get hosted versions with AWS where we have
all the contractual stuff sorted out.
You can, but you should be aware that:
1. Amazon's hosted Elasticsearch isn't great, often lags behind the
current version, doesn't allow plugins etc.
2. Amazon and Elastic are currently engaged in legal battles over who
is the most open sourcey, who allegedly copied code that was 'open' but
commercially licensed, and who would like to capture the hosted search
market... not sure how this will pan out (Google for details)
3. You can also buy fully hosted Solr from several places.
While SOLR definitely seems more advanced (LTR, streaming expressions,
graph, and all the knobs and dials for relevancy tuning), Elastic may be
sufficient for our needs. It does not seem to have LTR out of the box but
the relevancy tuning knobs and dials seem to be similar to what SOLR has.
Yes, they're basically the same under the hood (unsurprising as they're
both based on Lucene). If you need LTR there's an ES plugin for that
(disclaimer, my new employer built and maintains it:
https://github.com/o19s/elasticsearch-learning-to-rank). I've lost track
of the number of times I've been asked 'Elasticsearch or Solr, which
should I choose?' and my current thoughts are:
1. Don't switch from one to the other for the sake of it. Switching
search engines rarely addresses underlying issues (content quality, team
skills, relevance tuning methodology)
2. Elasticsearch is easier to get started with, but at some point you'll
need to learn how it all works
3. Solr is harder to get started with, but you'll know more about how it
all works earlier
4. Both can be used for most search projects, most features are the
same, both can scale.
5. Lots of Elasticsearch projects (and developers) are focused on logs,
which is often not really a 'search' project.
The corpus size is not a challenge - we have about one million documents,
of which about 1/2 have full text, while the rest are simpler (i.e. company
directory etc.).
The query volumes are also quite low (max 5/second at peak).
We have implemented the content ingestion and processing pipelines already
in Python and Spark, so most of the data will be pushed in using APIs.
I would really appreciate any guidance from the community!
Sounds like a pretty small setup to be honest, but as ever the devil is
in the details.
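
Since you already have Python pipelines and plan to push data in over the
APIs, here is a rough sketch of how similar that step looks for the two
engines. It only builds the request bodies: newline-delimited JSON for
Elasticsearch's _bulk endpoint and a plain JSON array for Solr's /update
handler. The index name "company_search", the field names and the sample
documents are made up for illustration, and the actual HTTP call, batching
and error handling are left out.

```python
import json

# Hypothetical sample documents; the field names are assumptions, not a real schema.
SAMPLE_DOCS = [
    {"id": "1", "title": "Annual report", "body": "Full text of the report"},
    {"id": "2", "title": "Company directory entry", "body": ""},
]


def to_elasticsearch_bulk(docs, index_name):
    """Build an NDJSON body for Elasticsearch's _bulk endpoint.

    Each document becomes two lines: an action line naming the index and
    document id, then the document source. The body must end with a newline.
    """
    lines = []
    for doc in docs:
        action = {"index": {"_index": index_name, "_id": doc["id"]}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"


def to_solr_update(docs):
    """Build a JSON body for Solr's /update handler: a plain array of documents."""
    return json.dumps(docs)


if __name__ == "__main__":
    # "company_search" is a placeholder index name.
    print(to_elasticsearch_bulk(SAMPLE_DOCS, "company_search"))
    print(to_solr_update(SAMPLE_DOCS))
```

Either body would then be POSTed by whatever HTTP client the pipeline
already uses (application/x-ndjson for the bulk body, application/json for
Solr), so the ingestion side should not be what drives the choice of engine.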
Cheers
Charlie
--
Charlie Hull
Flax - Open Source Enterprise Search (now part of OpenSourceConnections)
tel/fax: +44 (0)8700 118334
mobile: +44 (0)7767 825828
web: www.o19.com