Re: Document Processing

Tommaso Teofili Tue, 06 Dec 2011 13:40:32 -0800

Hello Michael,

I can help you with using the UIMA UpdateRequestProcessor [1]. The current
implementation executes UIMA pipelines in memory, but I am planning to add
support for higher scalability (with UIMA-AS [2]), which may help you as
well.


Tommaso

[1] :
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/contrib/uima/src/java/org/apache/solr/uima/processor/UIMAUpdateRequestProcessor.java
[2] : http://uima.apache.org/doc-uimaas-what.html

2011/12/5 Michael Kelleher <mj.kelle...@gmail.com>

> Hello Erik,
>
> I will take a look at both:
>
> org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessor
>
> and
>
> org.apache.solr.update.processor.TikaLanguageIdentifierUpdateProcessor
>
>
> and figure out what I need to extend to handle processing in the way I am
> looking for.  I am assuming that "component" configuration is handled in a
> standard way such that I can configure my new UpdateProcessor in the same
> way I would configure any other UpdateProcessor "component"?
>
> Thanks for the suggestion.
>
>
> 1 more question:  given that I am probably going to convert the HTML to
> XML so I can use XPath expressions to "extract" my content, do you think
> that this kind of processing will overload Solr?  This Solr instance will
> be used solely for indexing, and will only ever have a single ManifoldCF
> crawling job feeding it documents at one time.
>
> --mike
>
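
The XPath-based extraction mentioned in the quoted message could be sketched
in Java roughly as follows. This is only a minimal sketch using the JDK's
javax.xml APIs: the class name XPathExtractSketch, the helper extract, and the
sample markup are hypothetical, and it assumes the HTML has already been
converted to well-formed XML. It says nothing about how such processing would
behave under load inside Solr.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathExtractSketch {

    // Hypothetical helper: parses well-formed XML from a string and returns
    // the string value of the first node matched by the XPath expression
    // (an empty string if nothing matches).
    static String extract(String xml, String expression) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
    }

    public static void main(String[] args) throws Exception {
        // Sample input standing in for HTML that was already converted to XML.
        String xml = "<html><body><div id=\"content\">Hello Solr</div></body></html>";
        System.out.println(extract(xml, "//div[@id='content']"));
    }
}
```

A custom UpdateProcessor could call a helper like this on a field value and
write the result into another field, but that wiring is not shown here.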
