> We have implemented the content ingestion and processing pipelines already
> in Python and Spark, so most of the data will be pushed in using APIs.
I use the spark-solr library in production and have looked at the ES equivalent, and the Solr connector looks much more advanced for both loading and fetching data. In particular, the fetching part uses the Solr export handler, which makes things incredibly fast. Also, spark-solr uses the DataFrame API, while the ES connector is still stuck with the RDD API, AFAIK. A good connector to Spark opens up a lot of possibilities in terms of data transformation and advanced machine learning features around the search engine (see the short sketch at the end of this mail).

On Tue, Jan 14, 2020 at 11:02:17PM -0500, Dc Tech wrote:
> I am a SOLR fan and had implemented it in our company over 10 years ago.
> I moved away from that role, and the new search team in the meanwhile
> implemented a proprietary (and expensive) NoSQL-style search engine. That
> project did not go well, and now I am back on the project and reviewing the
> technology stack.
>
> Some of the team think that Elasticsearch could be a good option,
> especially since we can easily get hosted versions with AWS, where we have
> all the contractual stuff sorted out.
>
> While SOLR definitely seems more advanced (LTR, streaming expressions,
> graph, and all the knobs and dials for relevancy tuning), Elastic may be
> sufficient for our needs. It does not seem to have LTR out of the box, but
> its relevancy tuning knobs and dials seem to be similar to what SOLR has.
>
> The corpus size is not a challenge - we have about one million documents,
> of which about half have full text, while the rest are simpler (e.g. a
> company directory etc.).
> The query volumes are also quite low (max 5/second at peak).
> We have implemented the content ingestion and processing pipelines already
> in Python and Spark, so most of the data will be pushed in using APIs.
>
> I would really appreciate any guidance from the community!

-- nicolas
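
PS: for anyone curious, here is roughly what loading and fetching look like
with the spark-solr DataFrame API from PySpark. This is a minimal sketch,
not production code: the zkhost, collection, and field names are
placeholders, and the option names are the ones I remember from the
spark-solr docs, so verify them against the version you deploy (the jar
has to be on the classpath, e.g.
--packages com.lucidworks.spark:spark-solr:<version>).

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("solr-sketch").getOrCreate()

    # Fetching: request_handler=/export streams results through Solr's
    # export handler, which is what makes large reads so fast. The catch
    # is that every exported field needs docValues enabled.
    read_opts = {
        "zkhost": "zk1:2181,zk2:2181/solr",  # placeholder ZK connect string
        "collection": "documents",           # placeholder collection
        "query": "*:*",
        "fields": "id,title,company",        # docValues fields only
        "request_handler": "/export",
    }
    df = spark.read.format("solr").options(**read_opts).load()

    # Loading: write any DataFrame straight into a collection.
    write_opts = {
        "zkhost": "zk1:2181,zk2:2181/solr",
        "collection": "documents",
        "commit_within": "5000",             # ms, let Solr batch the commits
    }
    df.write.format("solr").options(**write_opts).mode("overwrite").save()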