Highlight with multi word synonyms

2011-12-24 Thread O. Klein
Copy pasted following text from lucene mailing list as it describes my problem: " I'm trying to use multi-word synonyms. For example in my synonyms file I have nhl, national hockey league. If I do this index only, a search for nhl returns a correct match, but highlights the first word only, nation

Identifying common text in documents

2011-12-24 Thread Mike O'Leary
I am looking for a way to identify blocks of text that occur in several documents in a corpus for a research project with electronic medical records. They can be copied and pasted sections inserted into another document, text from a previous email in the corpus that is repeated in a follow-up em
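Copied-and-pasted blocks like these are often found with word shingling, a technique swapped in here for illustration and not one proposed in the thread. Each document is split into overlapping k-word n-grams, and any n-gram that appears in two or more documents marks shared text. This is a minimal sketch: the function names `shingles` and `shared_shingles` and the default `k=8` are hypothetical choices.

```python
import re
from collections import defaultdict

def shingles(text, k=8):
    """Return the set of k-word shingles (word n-grams) in text (hypothetical helper)."""
    words = re.findall(r"\w+", text.lower())
    # If the text has fewer than k words, the range is empty and no shingles are produced.
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}

def shared_shingles(docs, k=8, min_docs=2):
    """Map each shingle to the set of doc ids containing it, keeping only shingles
    that occur in at least min_docs documents (hypothetical helper)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for sh in shingles(text, k):
            index[sh].add(doc_id)
    return {sh: ids for sh, ids in index.items() if len(ids) >= min_docs}

docs = {
    "a": "Patient reports mild chest pain after exercise. Follow up in two weeks.",
    "b": "Note copied: patient reports mild chest pain after exercise. Plan unchanged.",
}
common = shared_shingles(docs, k=4)
```

Runs of overlapping shared shingles can then be merged to recover the full repeated passage. On a large corpus, hashing the shingles (or MinHash) would keep the index smaller. That is a design choice, not a requirement.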

Re: Identifying common text in documents

2011-12-24 Thread Lance Norskog
Great topic! 1) SignatureUpdateProcessor creates a hash of the exact byte stream of the document. Often your crawling software can't do an incremental update of your data, but can only re-index the entire corpus. The SUP makes the hash, searches for it, and if it is there the document indexer says
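The hash-then-look-up idea described above can be sketched outside Solr as follows. This is not Solr's SignatureUpdateProcessor code. `DedupIndex` and `signature` are hypothetical names, MD5 is an assumed hash choice, and the preview is cut off before it says what happens on a match. The sketch therefore simply *assumes* that a document whose hash is already present gets skipped.

```python
import hashlib

def signature(doc_bytes: bytes) -> str:
    """Hash the exact byte stream of a document (MD5 is an assumed choice)."""
    return hashlib.md5(doc_bytes).hexdigest()

class DedupIndex:
    """Hypothetical in-memory index that records one doc id per content signature."""

    def __init__(self):
        self._seen = {}  # signature -> id of the first document with that content

    def add(self, doc_id: str, doc_bytes: bytes) -> bool:
        sig = signature(doc_bytes)
        if sig in self._seen:
            # Assumption: byte-identical content is already indexed, so skip it.
            return False
        self._seen[sig] = doc_id
        return True

idx = DedupIndex()
first = idx.add("1", b"hello")
repeat = idx.add("2", b"hello")
changed = idx.add("3", b"hello!")
```

Because the hash covers the exact bytes, a single changed character yields a new signature. This is why an exact hash catches re-crawled identical documents but not near-duplicates.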

Re: Migration from Solr 1.4 to Solr 3.5

2011-12-24 Thread Shawn Heisey
On 12/23/2011 5:41 AM, Bhavnik Gajjar wrote: Consider this case. http://myserver:8080/solr/mainindex/select/?q=solr&start=0&rows=10&shards=myserver:8080/solr/index1,myserver:8080/solr/mainindex,remoteserver:8080/solr/remotedata. In this example, consider that 'myserver' has been upgraded with S

Re: PlainTextEntityProcessor and RegexTransformer in DataImport Handler

2011-12-24 Thread Matthew Parker
I would try something like the following:

Re: Highlight with multi word synonyms

2011-12-24 Thread Koji Sekiguchi
(11/12/24 21:20), O. Klein wrote: Copy pasted following text from lucene mailing list as it describes my problem: " I'm trying to use multi-word synonyms. For example in my synonyms file I have nhl, national hockey league. If I do this index only, a search for nhl returns a correct match, but hi