Hi, I think I would look at a hybrid approach: keep adding new synonyms to a query-side synonym dictionary so they take effect immediately. Then, every now and then (say, every Nth night), move those synonyms over to the index-side dictionary and trigger a full reindex.
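A minimal sketch of how that hybrid setup might look in schema.xml, adapted from the fieldType in the question. The file names `synonyms-index.txt` (stable dictionary, expanded at index time) and `synonyms-query.txt` (entries the editors add during the day, expanded at query time) are hypothetical. One caveat: Solr reads synonym files when the core is loaded, so query-side edits still need a core reload before they apply, though no reindex.

```xml
<!-- Sketch only: synonyms-index.txt and synonyms-query.txt are hypothetical file names -->
<fieldType name="keywordText" class="solr.TextField"
           sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true"/>
    <!-- Stable synonyms, expanded into the index; changing this file requires a reindex -->
    <filter class="solr.SynonymFilterFactory"
            tokenizerFactory="solr.KeywordTokenizerFactory"
            synonyms="synonyms-index.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true"/>
    <!-- Newly added synonyms, expanded at query time; a core reload picks up changes -->
    <filter class="solr.SynonymFilterFactory"
            tokenizerFactory="solr.KeywordTokenizerFactory"
            synonyms="synonyms-query.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
```

On the periodic run, the entries in the query-side file would be appended to the index-side file, the query-side file emptied, and the full reindex started, so each synonym group lives in only one dictionary at a time.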
A nice side effect of reindexing now and then is that any changes to your OpenNLP extraction dictionaries will be reflected in the index too.

BTW: Could you share details of your OpenNLP integration with us? I'm about to do it on another project.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 1 July 2010, at 06:57, Ravi Kiran wrote:

> Hello,
> Hoping some Solr guru can help me out here. We are a news
> organization trying to migrate 10 million documents from FAST to Solr. The
> plan is to have our Editorial team add/modify synonyms multiple times during
> a day as they deem appropriate. Hence we plan on using query-time synonyms,
> as we cannot reindex every time they modify the synonyms file (for the
> entities extracted by OpenNLP, like locations/organizations/person names from
> the article body). Since the synonyms are for names, I am concerned that the
> multi-phrase issue crops up with the query-time synonyms. For example,
> synonyms could be as follows:
>
> The Washington Post Co., The Washington Post, Washington Post, The Post, TWP, WAPO
> DHS,D.H.S,D.H.S.,Department of Homeland Security,Homeland Security
> USCIS, United States Citizenship and Immigration Services, U.S.C.I.S.
>
> Barack Obama,Barack H. Obama,Barack Hussein Obama,President Obama
> Hillary Clinton,Hillary R. Clinton,Hillary Rodham Clinton,Secretary Clinton,Sen. Clinton
> William J. Clinton,William Jefferson Clinton,President Clinton,President Bill Clinton
>
> Virginia, Va., VA
> D.C,Washington D.C, District of Columbia
>
> I have the following fieldType in schema.xml for the keywords/entities. What
> issues should I be aware of? And is there a better way to achieve this
> without having to reindex a million docs on each synonym change?
> NOTE that I use tokenizerFactory="solr.KeywordTokenizerFactory" for the
> SynonymFilterFactory to keep the words intact without splitting.
>
> <!-- Field Type Keywords/Entities Extracted from OpenNLP -->
> <fieldType name="keywordText" class="solr.TextField"
>     sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.TrimFilterFactory" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>         words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.KeywordTokenizerFactory"/>
>     <filter class="solr.TrimFilterFactory" />
>     <filter class="solr.StopFilterFactory" ignoreCase="true"
>         words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true"/>
>     <filter class="solr.SynonymFilterFactory"
>         tokenizerFactory="solr.KeywordTokenizerFactory"
>         synonyms="person-synonyms.txt,organization-synonyms.txt,location-synonyms.txt,subject-synonyms.txt"
>         ignoreCase="true" expand="true" />
>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>   </analyzer>
> </fieldType>