It is useful for parsing PDFs on a multi-processor machine. It also helps
when a sub-entity makes an outbound I/O call to a database, a file, or
another Solr (SOLR-1499).

In general, it helps with anything where the pipeline time outweighs the
disk I/O time.
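
A back-of-the-envelope way to see when extra threads pay off, as a sketch
only. It assumes rows are fetched one at a time (so fetch time is not
divided by the thread count), that pipeline work spreads evenly across
threads, and that there are enough cores. The function name and the
numbers are made up for illustration and are not measurements of DIH.

```python
def time_per_doc(fetch_ms, pipeline_ms, threads):
    """Rough steady-state time per document under the assumptions above."""
    # Fetching is serial, so it sets a floor that threads cannot lower.
    # Pipeline work (parsing, transformers) is split across the threads.
    return max(fetch_ms, pipeline_ms / threads)


# Hypothetical numbers: 5 ms to fetch a row, 40 ms to run the pipeline.
for n in (1, 2, 4, 8, 16):
    print(n, "threads:", time_per_doc(5, 40, n), "ms per document")
```

Under these assumptions the gain levels off once the pipeline share drops
to the fetch time. If fetching already dominates, threads change nothing.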

Threading happens at the per-document level: there is no concurrent
access inside a document pipeline.
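
A minimal sketch of that model, in Python rather than the actual DIH code:
several workers pull the next row from one shared source, and each worker
then runs the whole pipeline for its document on its own thread. The names
run_import, rows and pipeline are hypothetical and do not come from Solr.
It shows the control flow only. CPU-bound steps would not speed up in
Python the way they can on a JVM.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_DONE = object()  # sentinel: the row source is exhausted


def run_import(rows, pipeline, threads):
    """Process rows with `threads` workers; each document stays on one thread."""
    source = iter(rows)
    lock = threading.Lock()  # guards the shared row source and result list
    results = []

    def worker():
        while True:
            with lock:  # only the "next row" call is shared between threads
                row = next(source, _DONE)
            if row is _DONE:
                return
            doc = row
            for step in pipeline:  # steps run one after another, same thread
                doc = step(doc)
            with lock:
                results.append(doc)

    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(worker) for _ in range(threads)]
        for future in futures:
            future.result()  # re-raise any exception from a worker
    return results


docs = run_import(
    rows=[{"id": i} for i in range(10)],
    pipeline=[lambda d: {**d, "parsed": True}, lambda d: {**d, "indexed": True}],
    threads=4,
)
print(len(docs), "documents processed")
```

Documents can finish in any order, but no single document is ever touched
by two threads at once.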

There is a bug which causes EntityProcessors that look up attributes to
throw an exception. This makes Tika unusable inside a thread. Two other
EPs also won't work as far as I can tell, but I did not test them.

https://issues.apache.org/jira/browse/SOLR-2186

On Mon, Nov 1, 2010 at 10:43 AM, Dyer, James <james.d...@ingrambook.com> wrote:
> Mark,
>
> I have the same question so I did a little research on this.  Not a complete 
> answer but here is what I've found:
>
> - "threads" was added with SOLR-1352
> (https://issues.apache.org/jira/browse/SOLR-1352).
>
> - Also see 
> http://www.lucidimagination.com/search/document/a9b26ade46466ee/queries_regarding_a_paralleldataimporthandler
>  for background info.
>
> - Only available in 3.x and trunk.  Committed on 1/12/2010 by Noble Paul (who 
> surely can tell you more accurate info than I can).
>
> - It seems that, when it is used, each thread will call "nextRow" on your
> root entity datasource in parallel.
>
> - Not sure this will help with child entities (ie. I had hoped I could get it 
> to build child caches in parallel but I don't think this is the case).
>
> - A doc comment on ThreadedEntityProcessorWrapper indicates this will help 
> speed up running transformers because they'd be in parallel.  This would
> make sense if maybe your database can only pull back so fast, but then you 
> have an intensive transformer.  Maybe adding a thread would make your 
> processing no slower than the db...
>
> James Dyer
> E-Commerce Systems
> Ingram Content Group
> (615) 213-4311
>
>
> -----Original Message-----
> From: markwaddle [mailto:m...@markwaddle.com]
> Sent: Tuesday, October 26, 2010 2:25 PM
> To: solr-user@lucene.apache.org
> Subject: How does DIH multithreading work?
>
>
> I understand that the thread count is specified on root entities only. Does
> it spawn multiple threads per root entity? Or multiple threads per
> descendant entity? Can someone give an example of how you would make a
> database query in an entity with 4 threads that would select 1 row per
> thread?
>
> Thanks,
> Mark
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/How-does-DIH-multithreading-work-tp1776111p1776111.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Lance Norskog
goks...@gmail.com
