About reindexing and performance: this is not really a problem, as you can reindex on a completely different machine, then just move the completed index to your production machines and reopen it. SOLR has this capability out of the box. Here's a link to get you started: http://wiki.apache.org/solr/SolrCollectionDistributionScripts
Your first few queries on a newly opened index will be a bit slower unless you do pre-warming. But the reindexing process can be done without affecting the current searcher in any way. Of course you'll need the disk space available, but disks are cheap <G>...

HTH
Erick

On Thu, Jul 1, 2010 at 2:06 PM, Ravi Kiran <ravi.bhas...@gmail.com> wrote:

> Hello Mr. Høydahl,
> I thought of doing it exactly as you have said. I shall try it out and see where I land. However, I am still skeptical about that approach from the performance point of view, as we are a round-the-clock news organization and a huge reindexing might affect the speed of searches. Moreover, in the news business "being first" is more important, hence we need those synonyms to take effect right away, and that's where we are in a quandary.
>
> With regards to the OpenNLP implementation, our design is plain vanilla outside of SOLR. We generate the XML on the fly with extracted entities from OpenNLP and then index it straight into SOLR. However, we do some sanity checks for locations prior to indexing using WordNet so that false positives are avoided in location names.
>
> Thanks,
>
> Ravi Kiran Bhaskar
>
> On Thu, Jul 1, 2010 at 5:40 AM, Jan Høydahl / Cominvent <jan....@cominvent.com> wrote:
>
> > Hi,
> >
> > I think I would look at a hybrid approach, where you keep adding new synonyms to a query-side synonym dictionary for immediate effect. And then every now and then, or every Nth night, you move those synonyms over to the index-side dictionary and trigger a full reindex.
> >
> > A nice side effect of reindexing now and then could be that if your OpenNLP extraction dictionaries have changed, it will be reflected too.
> >
> > BTW: Could you share details of your OpenNLP integration with us? I'm about to do it on another project.
> >
> > --
> > Jan Høydahl, search solution architect
> > Cominvent AS - www.cominvent.com
> > Training in Europe - www.solrtraining.com
> >
> > On 1. juli 2010, at 06.57, Ravi Kiran wrote:
> >
> > > Hello,
> > > Hoping some solr guru can help me out here. We are a news organization trying to migrate 10 million documents from FAST to solr. The plan is to have our Editorial team add/modify synonyms multiple times a day as they deem appropriate. Hence we plan on using query-time synonyms, as we cannot reindex every time they modify the synonyms file (for the entities extracted by OpenNLP, like locations/organizations/person names, from the article body). Since the synonyms are for names, I am concerned that the multi-phrase issue crops up with the query-time synonyms. For example, synonyms could be as follows:
> > >
> > > The Washington Post Co., The Washington Post, Washington Post, The Post, TWP, WAPO
> > > DHS,D.H.S,D.H.S.,Department of Homeland Security,Homeland Security
> > > USCIS, United States Citizenship and Immigration Services, U.S.C.I.S.
> > >
> > > Barack Obama,Barack H. Obama,Barack Hussein Obama,President Obama
> > > Hillary Clinton,Hillary R. Clinton,Hillary Rodham Clinton,Secretary Clinton,Sen. Clinton
> > > William J. Clinton,William Jefferson Clinton,President Clinton,President Bill Clinton
> > >
> > > Virginia, Va., VA
> > > D.C,Washington D.C, District of Columbia
> > >
> > > I have the following fieldType in schema.xml for the keywords/entities. What issues should I be aware of? And is there a better way to achieve it without having to reindex a million docs on each synonym change?
> > > NOTE that I use tokenizerFactory="solr.KeywordTokenizerFactory" for the SynonymFilterFactory to keep the words intact without splitting
> > >
> > > <!-- Field Type Keywords/Entities Extracted from OpenNLP -->
> > > <fieldType name="keywordText" class="solr.TextField" sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
> > >   <analyzer type="index">
> > >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> > >     <filter class="solr.TrimFilterFactory" />
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true"/>
> > >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > >   </analyzer>
> > >   <analyzer type="query">
> > >     <tokenizer class="solr.KeywordTokenizerFactory"/>
> > >     <filter class="solr.TrimFilterFactory" />
> > >     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true" />
> > >     <filter class="solr.SynonymFilterFactory" tokenizerFactory="solr.KeywordTokenizerFactory" synonyms="person-synonyms.txt,organization-synonyms.txt,location-synonyms.txt,subject-synonyms.txt" ignoreCase="true" expand="true" />
> > >     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
> > >   </analyzer>
> > > </fieldType>
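
To make the multi-phrase concern with the query analyzer quoted above a bit more concrete, here is a minimal sketch in plain Python. It is a simulation, not Solr code: `load_synonyms` and `expand` are hypothetical helper names, and the matching rule (the whole input string is one token, compared case-insensitively, and every equivalent in the group is returned) is only an approximate reading of tokenizerFactory="solr.KeywordTokenizerFactory" combined with ignoreCase="true" and expand="true". As a simplification it leaves the case of the returned entries untouched.

```python
def load_synonyms(lines, ignore_case=True):
    """Map every entry of a comma-separated synonym line to its whole group."""
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        group = [entry.strip() for entry in line.split(",") if entry.strip()]
        for entry in group:
            key = entry.lower() if ignore_case else entry
            known = table.setdefault(key, [])
            for other in group:
                if other not in known:
                    known.append(other)
    return table


def expand(term, table, ignore_case=True):
    """Treat the whole input as a single token and return its equivalents."""
    token = term.strip()
    key = token.lower() if ignore_case else token
    # An unknown token passes through unchanged.
    return table.get(key, [token])


# Two of the example lines from the original message.
SYNONYM_LINES = [
    "DHS,D.H.S,D.H.S.,Department of Homeland Security,Homeland Security",
    "Virginia, Va., VA",
]

if __name__ == "__main__":
    table = load_synonyms(SYNONYM_LINES)
    print(expand("department of homeland security", table))  # whole phrase: expands
    print(expand("va", table))                               # abbreviation: expands
    print(expand("Homeland", table))                         # single word of a phrase: no match
```

The last call is the case to watch for: if a multi-word name reaches the synonym filter as separate words rather than as one intact phrase, none of the entries match and nothing is expanded.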