DIH multithreading is useful for parsing PDFs on a multi-processor machine. It also helps when a sub-entity makes an outbound I/O call to a database, a file, or another Solr instance (SOLR-1499).
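
As a rough sketch of the kind of configuration being discussed (not taken from this thread: the table, column, and connection values are made up for illustration, and this assumes a 3.x build that includes the "threads" attribute from SOLR-1352), the thread count goes on the root entity only:

```xml
<dataConfig>
  <!-- Hypothetical JDBC connection; driver, url, user and password are placeholders -->
  <dataSource name="db" type="JdbcDataSource"
              driver="org.hsqldb.jdbcDriver"
              url="jdbc:hsqldb:file:/tmp/example/db"
              user="sa" password=""/>
  <document>
    <!-- threads is set on the root entity; rows from this one query are
         handed out to the worker threads, it is not one query per thread -->
    <entity name="item" dataSource="db" threads="4"
            query="SELECT id, name FROM item">
      <field column="id" name="id"/>
      <field column="name" name="name"/>
      <!-- Child entity: runs inside the pipeline of whichever thread
           picked up the parent row -->
      <entity name="detail" dataSource="db"
              query="SELECT description FROM item_detail WHERE item_id='${item.id}'">
        <field column="description" name="description"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

With a setup like this, the four threads would each pull the next row from the root entity and run that document's pipeline on their own, which is where the per-row work (the child lookup here) gets parallelized.
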
In general, it helps anywhere the pipeline time outweighs disk I/O time. Threading happens at the per-document level: there is no concurrent access inside a document pipeline.

There is a bug that causes EntityProcessors that look up attributes to throw an exception. This makes Tika unusable inside a thread. Two other EPs also won't work, but I did not test them.

https://issues.apache.org/jira/browse/SOLR-2186

On Mon, Nov 1, 2010 at 10:43 AM, Dyer, James <james.d...@ingrambook.com> wrote:
> Mark,
>
> I have the same question so I did a little research on this. Not a complete
> answer but here is what I've found:
>
> - "threads" was added with SOLR-1352
> (https://issues.apache.org/jira/browse/SOLR-1352).
>
> - Also see
> http://www.lucidimagination.com/search/document/a9b26ade46466ee/queries_regarding_a_paralleldataimporthandler
> for background info.
>
> - Only available in 3.x and trunk. Committed on 1/12/2010 by Noble Paul (who
> surely can tell you more accurate info than I can).
>
> - Seems like when using, each thread will call "nextRow" on your root entity
> datasource in parallel.
>
> - Not sure this will help with child entities (i.e. I had hoped I could get it
> to build child caches in parallel but I don't think this is the case).
>
> - A doc comment on ThreadedEntityProcessorWrapper indicates this will help
> speed up running transformers because they'd be in parallel. This would
> make sense if maybe your database can only pull back so fast, but then you
> have an intensive transformer. Maybe adding a thread would make your
> processing no slower than the db...
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: markwaddle [mailto:m...@markwaddle.com]
> Sent: Tuesday, October 26, 2010 2:25 PM
> To: solr-user@lucene.apache.org
> Subject: How does DIH multithreading work?
>
>
> I understand that the thread count is specified on root entities only. Does
> it spawn multiple threads per root entity? Or multiple threads per
> descendant entity? Can someone give an example of how you would make a
> database query in an entity with 4 threads that would select 1 row per
> thread?
>
> Thanks,
> Mark
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-does-DIH-multithreading-work-tp1776111p1776111.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>

--
Lance Norskog
goks...@gmail.com