Hi Dave,

Sorry for the delayed reply. Did you end up trying the (scary) caching idea?

Yeah, there's no reasonable way today to access data from other fields in the document from within the analyzers. The way to do this today is to create an update request processor that pulls the data before the field-by-field analysis and injects it (in some format) into the field that needs it. In my examples, I only inserted a prefix before the entire field (e.g. en,es|hables espanol is what she asks), but if you need something more complicated that marks specific sections of the field for different analyzers, you could pull that off as well. For example:

<field name="multilingual_field">[langs="en"]hello world [langs="en,es"]hables espanol is what she asks.[ autodetectOtherLangs="true" fallbackLangs="en"]some unknown language text for identification</field>

Then you would just have the analyzer for the field parse the content, pass each chunk of text to the appropriate analyzer, and then adjust the term positions and offsets as necessary. My example in chapter 14 of Solr in Action assumed you would be using the same languages throughout the whole field, but it would only take a little pre-parsing work to direct specific analyzers at specific parts of the content.

Frankly, I'm not sure pulling the data from another field (particularly if you want different sections processed with different languages) is going to be much simpler than putting it all into the field to be analyzed to begin with (or better yet, having an update request processor do it for you - including the detection of language boundaries - inside of Solr so the customer doesn't have to worry about it).

-Trey

On Tue, Oct 29, 2013 at 12:18 PM, davetroiano <dtroi...@basistech.com> wrote:

> Hi Trey,
>
> I was reading v9 of the Solr in Action MEAP but browsing your github repo,
> so I think I'm looking at the latest stuff.
>
> Agreed that the thread caching idea is dangerous. Perhaps it would work
> now, but it could easily break in a later version of Solr.
>
> I didn't mention another reason why I'd like to analyze based on other
> field values, which is that I'd like the ability to run analyzers on
> sub-sections of the MultiTextField. e.g., given a multilingual document,
> run my text_english analyzer on the first half of a document and my
> text_french analyzer on the second half. Of course, I could extend the
> prepend approach to take start and end offsets (e.g.,
> <field name="myField">[en_0_1000,fr_1001_2500|]blah, blah, ...</field>),
> but if it were possible I'd rather grab that data from another field and
> simplify the tokenizer (in terms of the string manipulation and having to
> adjust position offsets to ignore the prepended data... though you've
> already done the tricky part).
>
> Based on what I'm seeing on the message boards and JIRA (e.g., SOLR-1536 /
> SOLR-1327 not being fixed), it seems like there isn't a clean way to run
> analyzers dynamically based on data in other field(s). If I end up trying
> the caching idea, I'll report my findings here.
>
> Thanks,
> Dave
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Single-multilingual-field-analyzed-based-on-other-field-values-tp4098141p4098242.html
> Sent from the Solr - User mailing list archive at Nabble.com.
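
For anyone following the thread, the pre-parsing step described in the reply (split the field on the bracketed markers, remember which attributes apply to each chunk, and track how far offsets shift once the markers are removed) could be sketched roughly as follows. This is plain Python for brevity, not a Lucene Tokenizer and not the code from the book. The names MARKER, ATTR, Chunk and split_chunks are made up for illustration, and the marker grammar is only inferred from the multilingual_field example in the reply.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

# Assumed marker grammar, inferred from the example field:
# [key="value" key2="value2"], with optional whitespace after the "[".
MARKER = re.compile(r'\[\s*((?:\w+="[^"]*"\s*)+)\]')
ATTR = re.compile(r'(\w+)="([^"]*)"')


@dataclass
class Chunk:
    attrs: Dict[str, str]  # marker attributes, e.g. {"langs": "en,es"}
    text: str              # chunk text with the marker removed
    raw_start: int         # offset of the text in the raw field value
    clean_start: int       # offset of the text once all markers are stripped


def split_chunks(raw: str) -> List[Chunk]:
    """Split a marked-up field value into chunks, one per marker."""
    chunks = []
    removed = 0  # number of marker characters seen so far
    attrs = {}   # attributes in effect for the current chunk
    pos = 0      # where the current chunk's text starts in raw
    for m in MARKER.finditer(raw):
        if m.start() > pos:
            chunks.append(Chunk(attrs, raw[pos:m.start()], pos, pos - removed))
        attrs = dict(ATTR.findall(m.group(1)))
        removed += m.end() - m.start()
        pos = m.end()
    if pos < len(raw):
        chunks.append(Chunk(attrs, raw[pos:], pos, pos - removed))
    return chunks


if __name__ == "__main__":
    value = ('[langs="en"]hello world [langs="en,es"]hables espanol is what '
             'she asks.[ autodetectOtherLangs="true" fallbackLangs="en"]'
             'some unknown language text for identification')
    for chunk in split_chunks(value):
        print(chunk.attrs, chunk.clean_start, repr(chunk.text))
```

In a real analyzer, each chunk's text would be handed to the analyzer chosen from its attributes, and clean_start (or raw_start, depending on which string the stored value and highlighter see) would be added to the offsets that analyzer reports. Position increments would need the same kind of adjustment so that chunks do not overlap.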