Wenca,

I have an app with requirements similar to yours.  We have maybe 40 caches that 
need to be built, then when they're done (and if they all succeed), the main 
indexing runs.  For this I wrote some quick-n-squirrley code that executes a 
configurable # of cache-building handlers at a time.  When one finishes, 
another starts until they're all done.  When they all finish, the main indexing 
DIH starts.  I just run this in a separate JVM on the master Solr node.  It 
keeps track of which ones are running and polls the handlers over HTTP every 
few seconds to see if they're done (scraping that 
"experimental/subject-to-change with typos" page to get the status). 

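A rough sketch of that kind of scheduler, written in Python for brevity rather
than as the separate-JVM program described above.  The base URL, handler names,
function names and defaults are made up for illustration.  The only Solr
details assumed are the DIH request parameters command=full-import and
command=status, and a <str name="status"> element holding "idle" or "busy" in
the default XML response.  It does not check whether an import succeeded,
since that would mean scraping the status messages.

```python
import re
import time
from collections import deque
from urllib.request import urlopen

# Assumption: the status page contains <str name="status">idle</str> or busy.
STATUS_RE = re.compile(r'<str name="status">\s*(\w+)\s*</str>')


def parse_status(body):
    """Return the status word found in a DIH status page, or None."""
    match = STATUS_RE.search(body)
    return match.group(1).lower() if match else None


def http_get(url):
    """Fetch a URL and return the response body as text."""
    with urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def run_pool(base_url, handlers, max_running=4, poll_seconds=5,
             fetch=http_get, sleep=time.sleep):
    """Start full-import on every handler, at most max_running at a time.

    Returns the handlers in the order they were seen to go idle.
    """
    pending = deque(handlers)
    running = set()
    finished = []
    while pending or running:
        # Top up the pool whenever a slot is free.
        while pending and len(running) < max_running:
            handler = pending.popleft()
            fetch(f"{base_url}{handler}?command=full-import")
            running.add(handler)
        sleep(poll_seconds)
        # sorted() copies the set, so removing entries inside the loop is safe.
        for handler in sorted(running):
            page = fetch(f"{base_url}{handler}?command=status")
            if parse_status(page) == "idle":
                running.discard(handler)
                finished.append(handler)
    return finished
```

A call might look like run_pool("http://localhost:8983/solr", ["/dih-cache"]),
followed by a full-import on /dataimport once it returns.  A handler counts as
finished the first time it reports idle after the poll delay, so a very short
poll interval could catch an import before it has flipped to busy.
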
So this is similar to Mikhail's advice.  Possibly you can script this simply if 
you have just one or a few caches that need to be built.  You might even be 
able to monitor your container's log output to know when the first one finishes 
and the next one starts, if you don't want to scrape the http output (I forget 
if DIHCacheWriter logs anything useful you could use).
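
For the one-or-two-caches case, a minimal script along these lines might be
enough.  It is only a sketch: the base URL, function names and timeout are
hypothetical, and it assumes that asking DIH for wt=json returns an object
with a top-level "status" field of "idle" or "busy".  It refuses to start a
handler that is already running, then waits for each import to go idle before
starting the next one.

```python
import json
import time
from urllib.request import urlopen


def http_get_json(url):
    """Fetch a URL and decode the response body as JSON."""
    with urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def is_idle(base_url, handler, fetch=http_get_json):
    """True if the handler reports status "idle" (assumed response field)."""
    reply = fetch(f"{base_url}{handler}?command=status&wt=json")
    return reply.get("status") == "idle"


def run_in_order(base_url, handlers, poll_seconds=5, timeout_seconds=3600,
                 fetch=http_get_json, sleep=time.sleep, clock=time.monotonic):
    """Run full-import on each handler in turn, waiting for each to finish."""
    for handler in handlers:
        # Precondition: never start an import on a handler that is busy.
        if not is_idle(base_url, handler, fetch):
            raise RuntimeError(f"{handler} is already running")
        fetch(f"{base_url}{handler}?command=full-import&wt=json")
        deadline = clock() + timeout_seconds
        sleep(poll_seconds)
        while not is_idle(base_url, handler, fetch):
            if clock() > deadline:
                raise TimeoutError(f"{handler} did not finish in time")
            sleep(poll_seconds)
```

With the handlers from the original question this would be called as
run_in_order("http://localhost:8983/solr", ["/dih-cache", "/dataimport"]).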

My opinion is this is a real missing feature with DIH.  However, I would shy 
away from adding more stuff like this until we can clean up some of DIH's more 
fundamental shortcomings.  (DIH is great for many use cases, but the code has 
suffered neglect and needs a facelift, in my opinion.)

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-----Original Message-----
From: Mikhail Khludnev [mailto:mkhlud...@griddynamics.com] 
Sent: Wednesday, March 07, 2012 3:24 AM
To: solr-user@lucene.apache.org
Subject: Re: How to stop processing of DataImportHandler in EventListener

Hello,

It seems you have some app which triggers these DIH requests. Can't you add
a precondition in that app? Before running the second DIH, check whether the
status of the first one is RUNNING or IDLE.

Regards

2012/3/7 Wenca <we...@dovolenou.cz>

> Hi,
>
> I have 2 DataImportHandlers configured. The first one prepares data to
> berkeley backed cache (SOLR-2382, SOLR-2613) and the second one then
> indexes documents reading subentity data from the cache.
>
> I need some way to prevent the second handler from running if the first one
> is currently running, to avoid reading any inconsistent data. I haven't found
> any clear way to achieve this yet.
>
> I thought I could use an EventListener before the second handler that will
> check whether the cache dataimport is running and, if so, set some flag that
> the processing should not continue.
>
> Or is there another way to block data import handler when another one is
> running?
>
> in solrconfig.xml I have:
>
> <requestHandler name="/dataimport"
>  class="org.apache.solr.handler.dataimport.DataImportHandler">
>    <lst name="defaults">
>      <str name="config">db-data-config.xml</str>
>      <str name="persistCacheBaseDir">...</str>
>    </lst>
> </requestHandler>
>
> <requestHandler name="/dih-cache"
>  class="org.apache.solr.handler.dataimport.DataImportHandler">
>    <lst name="defaults">
>        <str name="config">cache-db-data-config.xml</str>
>        <str name="writerImpl">
>                org.apache.solr.handler.dataimport.DIHCacheWriter
>        </str>
>        <str name="persistCacheImpl">
>                org.apache.solr.handler.dataimport.BerkleyBackedCache
>        </str>
>        <str name="persistCacheBaseDir">...</str>
>        <str name="persistCacheName">data_cache</str>
>        <str name="cachePk">id</str>
>    </lst>
> </requestHandler>
>
> Thanks, Wenca
>



-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

<http://www.griddynamics.com>
 <mkhlud...@griddynamics.com>
