On 05/01/2016 16:05, Allison, Timothy B. wrote:
Might want to look into:

https://github.com/flaxsearch/luwak

Yes, this sounds like a very good fit for Luwak. We built it originally for media monitoring applications where one also needs just a hit/no-hit result. It's running in production at much larger scale than this.
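
To make the "stored queries, hit/no-hit" idea concrete, here is a toy Python sketch of matching each incoming sentence against a set of registered phrase patterns. It is not Luwak's API: the PhraseMonitor class, its method names, and the crude tokenizer are all made up for illustration, and it only handles exact phrases rather than full Lucene query syntax.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase and split into word tokens (a deliberately crude analyzer)."""
    return re.findall(r"\w+", text.lower())


class PhraseMonitor:
    """Hypothetical stored-query matcher: register phrases, then test sentences."""

    def __init__(self):
        self.patterns = {}                      # pattern id -> list of tokens
        self.by_first_token = defaultdict(set)  # first token -> pattern ids

    def add_pattern(self, pattern_id, phrase):
        """Add a phrase pattern, replacing any existing pattern with this id."""
        tokens = tokenize(phrase)
        if not tokens:
            raise ValueError("pattern must contain at least one word")
        self.remove_pattern(pattern_id)
        self.patterns[pattern_id] = tokens
        self.by_first_token[tokens[0]].add(pattern_id)

    def remove_pattern(self, pattern_id):
        """Forget a pattern if it is registered; do nothing otherwise."""
        tokens = self.patterns.pop(pattern_id, None)
        if tokens:
            self.by_first_token[tokens[0]].discard(pattern_id)

    def match(self, sentence):
        """Return the ids of all patterns occurring as an exact phrase."""
        tokens = tokenize(sentence)
        hits = set()
        for i, token in enumerate(tokens):
            # Only patterns starting with this token can match at position i,
            # so most of the stored patterns are never examined.
            for pattern_id in self.by_first_token.get(token, ()):
                pattern = self.patterns[pattern_id]
                if tokens[i:i + len(pattern)] == pattern:
                    hits.add(pattern_id)
        return hits


monitor = PhraseMonitor()
monitor.add_pattern("p1", "unemployment has fallen")
monitor.add_pattern("p2", "crime is rising")
print(sorted(monitor.match("The minister said unemployment has fallen again.")))
# prints ['p1']
```

The lookup by first token is only a stand-in for the general trick of cheaply ruling out most stored queries before running the remaining ones properly. When a pattern is added or changed, the existing sentences still have to be re-checked against that one pattern to keep the list of detections up to date.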

Best

Charlie


or
  https://github.com/OpenSextant/SolrTextTagger

-----Original Message-----
From: Will Moy [mailto:w...@fullfact.org]
Sent: Tuesday, January 05, 2016 11:02 AM
To: solr-user@lucene.apache.org
Subject: Many patterns against many sentences, storing all results

Hello

Please may I have your advice as to whether Solr is a good tool for this job?

We have (per year) –
Up to 50,000,000 sentences
And about 5,000 search patterns (i.e. queries)

Our task is to identify all matches between any sentence and any search pattern.

That list of detections must be kept up to date as patterns are added or 
updated (a handful an hour), and as new sentences are added.

Some of the sentences will be added in real time, at a peak of probably 100 per 
second and usually far fewer. The detections on these should be provided within 
3 seconds.

It's an unusual application in that we want all results in an external DB, and 
also in that every sentence is either a hit or not. We don't care about scoring 
results, only about matches for the exact search pattern entered.

The application automatically detects instances of fact-checked statements.

The smaller-scale prototype was done with PostgreSQL full-text search, but 
that can't do exact phrase matching or other more sophisticated searches, so 
it's out.

Thanks very much

Will



--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk
