Alexandre,

Unfortunately this is poorly documented and it takes a little trial-and-error
to figure out what is going on.  I believe the order is this:

1. Get data from the EntityProcessor (in your case, MailEntityProcessor)
2. Run transformers on the data.
3. Run any "post-transform" operations on the data (MailEntityProcessor doesn't 
do this)
4. Add the data to the Solr Document.
5. Run child entities
6. Repeat with next document
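
To make that order concrete, here is a toy model in Python. It is only a
sketch of the sequence as I understand it, not DIH's actual code: run_entity,
regex_transformer, job_child and the JOBS table are all made-up names, and
step 3 is left out because MailEntityProcessor has no post-transform step.

```python
import re

# Hypothetical stand-in for the child entity's data source (e.g. a jobs table).
JOBS = {"A17": {"title": "Editor"}}


def regex_transformer(row):
    """Toy equivalent of a RegexTransformer: derive JobNo from the subject."""
    out = dict(row)
    match = re.search(r"JobNo[:\s]+(\w+)", out.get("subject", ""))
    if match:
        out["JobNo"] = match.group(1)
    return out


def job_child(parent_row):
    """Toy child entity: looks up job fields using the parent's JobNo."""
    return dict(JOBS.get(parent_row.get("JobNo"), {}))


def run_entity(rows, transformers, child=None):
    docs = []
    for row in rows:                 # 1. get data from the entity processor
        for transform in transformers:
            row = transform(row)     # 2. run transformers on the data
        doc = dict(row)              # 4. add the data to the document
        if child is not None:
            doc.update(child(row))   # 5. run child entities
        docs.append(doc)             # 6. repeat with next document
    return docs


emails = [{"subject": "Offer JobNo: A17"}]
docs = run_entity(emails, [regex_transformer], job_child)
```

Because the parent's transformers run before the child entity does, JobNo is
already on the row when the child needs it.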

You can have transformers on child entities.  However, even if the child 
entity's data is cached, the transformer runs after the data is taken from the 
cache.  This is significant because the parent entity triggers a cache lookup 
based on the key.  But if the key is supposed to be generated by a transformer 
in the child it won't exist yet, so this will fail. 

In short (if I'm fully correct here!), a transformer on the parent entity can 
generate the key used to look up its child, but a transformer on the child 
cannot generate the child's own lookup key, even if a cache is being used.
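
The cache problem can be sketched the same way. Again this is a toy model
under my own assumptions, not DIH's implementation: build_cache, child_lookup
and make_job_no are invented names, and the toy cache simply skips rows that
lack the key column, which may not be how DIH itself fails.

```python
def build_cache(child_rows, key):
    """Index the raw child rows by the key column, before any transformer runs."""
    cache = {}
    for row in child_rows:
        if key in row:  # assumption: rows without the key are just not indexed
            cache.setdefault(row[key], []).append(row)
    return cache


def make_job_no(row):
    """Hypothetical child transformer that derives JobNo from raw_code."""
    out = dict(row)
    out["JobNo"] = out["raw_code"].split("-", 1)[1]
    return out


def child_lookup(cache, parent_key, transformers):
    rows = cache.get(parent_key, [])     # the cache lookup happens first
    for transform in transformers:       # transformers run on what came back
        rows = [transform(r) for r in rows]
    return rows


child_rows = [{"raw_code": "job-A17", "title": "Editor"}]

# Key generated by the child's transformer: it does not exist at lookup time.
by_generated = child_lookup(build_cache(child_rows, "JobNo"), "A17", [make_job_no])

# Key present in the raw child data: the lookup succeeds, then the transformer runs.
by_raw = child_lookup(build_cache(child_rows, "raw_code"), "job-A17", [make_job_no])
```

In this model by_generated comes back empty, while by_raw returns the job row
with JobNo added afterwards by the transformer.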

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Thursday, January 10, 2013 7:29 AM
To: solr-user@lucene.apache.org
Subject: DIH: Transformers and Nested entities - order of execution

Hello,

If I have both nested entities and transformers defined, what is the order
of execution?

As an example, say I have an email with a job offer, which includes JobNo
as part of a field. I want to parse that JobNo with a RegexTransformer and
then use a nested entity to import that job's fields into the same entity.

This, obviously, relies on being able to create the JobNo field in between
an outer entity and inner entity processing. But I am not sure whether
that's the case.

Does anyone know off the top of their head what the order is?

Also, can I have Transformers on inner entities? If so, when do they
trigger?

Regards,
  Alex.

Personal blog: http://blog.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
