Wenca, I have an app with requirements similar to yours. We have maybe 40 caches that need to be built; when they're done (and only if they all succeed), the main indexing runs. For this I wrote some quick-n-squirrelly code that executes a configurable number of cache-building handlers at a time. When one finishes, another starts, until they're all done. When they all finish, the main indexing DIH starts. I just run this in a separate JVM on the master Solr node. It keeps track of which handlers are running and polls them over HTTP every few seconds to see if they're done (scraping that "experimental/subject-to-change with typos" page to get the status).
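
The scheduling loop described above can be sketched roughly as follows. This is not the code I actually run, just a minimal Python illustration under some assumptions: the base URL and the handler paths are placeholders, the names `run_all`, `start_import`, `is_busy`, and `parse_busy` are made up for this sketch, and it only looks at the `busy`/`idle` status string that DIH reports for `command=status`. It does not check whether an import succeeded, which you would still need to add before kicking off the main indexing.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET

SOLR = "http://localhost:8983/solr"  # assumption: adjust to your master node


def http_get(url):
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")


def start_import(handler):
    # Kick off a full import on one DIH handler path, e.g. "/dih-cache".
    http_get(f"{SOLR}{handler}?command=full-import")


def parse_busy(xml_text):
    # The status response carries <str name="status">busy</str> or "idle".
    node = ET.fromstring(xml_text).find("./str[@name='status']")
    return node is not None and (node.text or "").strip() == "busy"


def is_busy(handler):
    return parse_busy(http_get(f"{SOLR}{handler}?command=status"))


def run_all(handlers, max_running, start=start_import, busy=is_busy,
            poll_seconds=5, sleep=time.sleep):
    """Run every handler, at most max_running at once; return when all are idle."""
    pending = list(handlers)
    running = []
    while pending or running:
        # Forget handlers that have gone back to idle.
        running = [h for h in running if busy(h)]
        # Fill the free slots with handlers that are still waiting.
        while pending and len(running) < max_running:
            handler = pending.pop(0)
            start(handler)
            running.append(handler)
        if running:
            sleep(poll_seconds)  # wait before polling again
```

With something like this, the cache handlers would be run with `run_all(cache_handlers, 4)` and the main import started afterwards with `start_import("/dataimport")`. The `start` and `busy` parameters are there only so the loop can be exercised without a live Solr.
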
So this is similar to Mikhail's advice. If you have just one or a few caches to build, you can probably script this simply. You might even be able to monitor your container's log output to know when the first one finishes and the next one starts, if you don't want to scrape the HTTP output (I forget whether DIHCacheWriter logs anything useful you could use).

My opinion is that this is a real missing feature in DIH. However, I would shy away from adding more stuff like this until we can clean up some of DIH's more fundamental shortcomings. (DIH is great for many use cases, but the code has suffered neglect and needs a facelift, in my opinion.)

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com]
Sent: Wednesday, March 07, 2012 3:24 AM
To: solr-user@lucene.apache.org
Subject: Re: How to stop processing of DataImportHandler in EventListener

Hello,

It seems you have some app which triggers these DIH requests. Can't you add a precondition in that app? Before running the second DIH, check whether the status of the first one is RUNNING or IDLE.

Regards

2012/3/7 Wenca <we...@dovolenou.cz>

> Hi,
>
> I have 2 DataImportHandlers configured. The first one prepares data to
> berkeley backed cache (SOLR-2382, SOLR-2613) and the second one then
> indexes documents reading subentity data from the cache.
>
> I need some way to prevent the second handler from running while the
> first one is running, to avoid reading any inconsistent data. I haven't
> found any clear way to achieve this yet.
>
> I thought I could use an EventListener before the second handler that
> will check whether the cache dataimport is running and, if so, set some
> flag that the processing should not continue.
>
> Or is there another way to block a data import handler when another one
> is running?
>
> in solrconfig.xml I have:
>
> <requestHandler name="/dataimport"
>     class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">db-data-config.xml</str>
>     <str name="persistCacheBaseDir">...</str>
>   </lst>
> </requestHandler>
>
> <requestHandler name="/dih-cache"
>     class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">cache-db-data-config.xml</str>
>     <str name="writerImpl">
>       org.apache.solr.handler.dataimport.DIHCacheWriter
>     </str>
>     <str name="persistCacheImpl">
>       org.apache.solr.handler.dataimport.BerkleyBackedCache
>     </str>
>     <str name="persistCacheBaseDir">...</str>
>     <str name="persistCacheName">data_cache</str>
>     <str name="cachePk">id</str>
>   </lst>
> </requestHandler>
>
> Thanks
> wenca
>

--
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

<http://www.griddynamics.com>
<mkhlud...@griddynamics.com>