Hi Bart,

Like I said, I didn't actually hook my UIMA stuff into Solr; content and 
queries are annotated before they reach Solr. What you describe sounds like a 
classpath problem (but of course you already knew that :-)). Since I haven't 
actually done what you are trying to do, here are some suggestions. They may or 
may not work...

1) Package up the XML files into your custom JAR at the top level; that way you 
don't need to specify it as /RoomNumberAnnotator.xml.
2) If you are using Solr 4, then you should drop your custom JAR into 
$SOLR_HOME/collection1/lib, not $SOLR_HOME/lib.
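
One way to narrow down a classpath problem like this is to check whether the 
descriptor is visible as a classpath resource at all. The following is only a 
sketch: the class name DescriptorCheck and the locate() helper are hypothetical 
names made up for this example, and it assumes (unverified) that the descriptor 
is meant to be resolved from the classpath root, as in suggestion 1.

```java
import java.net.URL;

public class DescriptorCheck {

    // Returns the URL of a classpath resource, or null if it is not visible.
    // Hypothetical helper for illustration; not part of Solr or UIMA.
    public static URL locate(String name) {
        ClassLoader cl = Thread.currentThread().getContextClassLoader();
        if (cl == null) {
            cl = DescriptorCheck.class.getClassLoader();
        }
        // ClassLoader lookups expect names without a leading slash,
        // so "/RoomNumberAnnotator.xml" is normalized first.
        String normalized = name.startsWith("/") ? name.substring(1) : name;
        return cl.getResource(normalized);
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0] : "RoomNumberAnnotator.xml";
        URL url = locate(name);
        if (url == null) {
            System.out.println(name + " was not found on the classpath");
        } else {
            System.out.println(name + " found at " + url);
        }
    }
}
```

Compile it and run it with the custom JAR on the classpath, for example 
java -cp RoomAnnotator.jar:. DescriptorCheck RoomNumberAnnotator.xml 
A hit only shows that the JAR is packaged as expected. Solr loads plugin JARs 
through its own class loader, so this does not prove that Solr can see the file.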

-sujit

On Feb 11, 2013, at 9:40 AM, jazz wrote:

> Hi Sujit and others who answered my question,
> 
> I have been working on the UIMA path which seems great with the available 
> Eclipse tooling and this:
> 
> http://sujitpal.blogspot.nl/2011/03/smart-query-parsing-with-uima.html
> 
> Now I worked through the UIMA tutorial of the RoomNumberAnnotator: 
> http://uima.apache.org/doc-uima-annotator.html
> And I am able to test it using the UIMA CAS Visual Debugger. So far so good.
> 
> Now I want to use the new RoomNumberAnnotator with Solr, but it cannot 
> find the XML file and the Java class (they are in the correct lib 
> directories, because the WhitespaceTokenizer works fine).
> 
> <updateRequestProcessorChain name="uima">
>      <processor 
> class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
>        <lst name="uimaConfig">
>          <lst name="runtimeParameters">
>          </lst>
>          <str name="analysisEngine">/RoomNumberAnnotator.xml</str>
>          <bool name="ignoreErrors">false</bool>
>          <lst name="analyzeFields">
>            <bool name="merge">false</bool>
>            <arr name="fields">
>              <str>content</str>
>            </arr>
>          </lst>
>          <lst name="fieldMappings">
>            <lst name="type">
>              <str name="name">org.apache.uima.tutorial.RoomNumber</str>
>              <lst name="mapping">
>                <str name="feature">building</str>
>                <str name="field">UIMAname</str>
>              </lst>
>            </lst>
>          </lst>
>        </lst>
>      </processor>
>      <processor class="solr.LogUpdateProcessorFactory" />
>      <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
> 
> On the Wiki (http://wiki.apache.org/solr/SolrUIMA) this is mentioned but it 
> fails:
> Deploy new jars inside one of the lib directories
> 
> Run 'ant clean dist' (or 'mvn clean package') from the solr/contrib/uima path.
> 
> Do I need to deploy the new jar (RoomAnnotator.jar)? If yes, which branch 
> can I check out? This is the stable release I am running:
> 
> Solr 4.1.0 1434440 - sarowe - 2013-01-16 17:21:36
> 
> Regards, Bart
> 
> 
> On 8 Feb 2013, at 22:11, SUJIT PAL wrote:
> 
>> Hi Bart,
>> 
>> I did some work with UIMA, but this was to annotate the data before it goes 
>> to Lucene/Solr, i.e. not built as an UpdateRequestProcessor. I just looked 
>> through the SolrUima wiki page [http://wiki.apache.org/solr/SolrUIMA] and I 
>> believe you will have to set up your own aggregate analysis chain in place 
>> of the one currently configured.
>> 
>> Writing UIMA annotators is very simple (there is a tutorial here:  
>> [http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html]).
>>  You provide the XML description for the annotation and let UIMA generate 
>> the annotation bean. You write Java code for the annotator and also the 
>> annotator XML descriptor. UIMA uses the annotator XML descriptor to 
>> instantiate and run your annotator. Overall, it sounds really complicated, 
>> but it's actually quite simple.
>> 
>> The tutorial has quite a few examples that you will find useful, but in case 
>> you need more, I have some on this github repository:
>> [https://github.com/sujitpal/tgni/tree/master/src/main/java/com/mycompany/tgni/analysis/uima]
>> 
>> The dictionary and pattern annotators may be similar to what you are looking 
>> for (date and city annotators).
>> 
>> Best regards,
>> Sujit
>> 
>> On Feb 8, 2013, at 8:50 AM, Bart Rijpers wrote:
>> 
>>> Hi Alex,
>>> 
>>> Indeed that is exactly what I am trying to achieve using worldcities. Date 
>>> will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how 
>>> do I integrate the Java library as UIMA? The documentation about changing 
>>> schema.xml and solr.xml is not very detailed. 
>>> 
>>> Regards, Bart
>>> 
>>> On 8 Feb 2013, at 16:57, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>>> 
>>>> Hi Bart,
>>>> 
>>>> I haven't done any UIMA work (I used other stuff for my NLP phase), so not
>>>> sure I can help much further. But in general, you are venturing into pure
>>>> research territory here.
>>>> 
>>>> Even for dates, what do you actually mean? Just fixed expressions? Relative
>>>> dates (e.g. last Tuesday)? What about times (7pm)?
>>>> 
>>>> Same with cities. If you want it offline, you need the gazetteer and
>>>> disambiguation modules. Gazetteer for cities (worldwide) is huge and has a
>>>> lot of duplicate names (Paris, Ontario is apparently a short drive from
>>>> London, Ontario eh?). Something like
>>>> http://www.maxmind.com/en/worldcities? And disambiguation usually
>>>> requires a training corpus that is similar to what your text will
>>>> look like.
>>>> 
>>>> Online services like OpenCalais are backed by gigantic databases and some
>>>> serious corpus-trained machine learning disambiguation algorithms.
>>>> 
>>>> So, no plug-and-play solution here. If you really need to get this done, I
>>>> would recommend narrowing down the specification of exactly what you will
>>>> settle for and looking for software that can do it. Once you have that,
>>>> integration with Solr is your next - and smaller - concern.
>>>> 
>>>> Regards,
>>>> Alex.
>>>> 
>>>> Personal blog: http://blog.outerthoughts.com/
>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>> - Time is the quality of nature that keeps events from happening all at
>>>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>>>> 
>>>> 
>>>> On Fri, Feb 8, 2013 at 10:41 AM, jazz <jazzsa...@me.com> wrote:
>>>> 
>>>>> Thanks Alex,
>>>>> 
>>>>> I checked the documentation but it seems there is only a webservice
>>>>> (OpenCalais) available to extract dates and places.
>>>>> 
>>>>> http://uima.apache.org/sandbox.html
>>>>> 
>>>>> Do you know if there is a Solr-compatible UIMA add-on which detects dates
>>>>> and places (cities) without a webservice? If not, how do you write one?
>>>>> 
>>>>> Regards, Bart
>>>>> 
>>>>> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote:
>>>>> 
>>>>>> Yes, it is possible. You are looking at UIMA or OpenNLP integration, most
>>>>>> probably in Update Request Processor pipeline.
>>>>>> 
>>>>>> Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA
>>>>>> 
>>>>>> You will have to put some serious work into this; it is not all tied
>>>>>> together and packaged, mostly because Natural Language Processing (the
>>>>>> field you are getting into) is kind of messy all on its own.
>>>>>> 
>>>>>> Good luck,
>>>>>> Alex.
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Fri, Feb 8, 2013 at 9:24 AM, jazz <jazzsa...@me.com> wrote:
>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> I want to know if Solr can analyze text and recognize dates and places.
>>>>>>> If yes, is it then possible to create new dynamic fields with these dates
>>>>>>> and places (e.g. city)?
>>>>>>> 
>>>>>>> Thanks, Bart
>>>>>>> 
>>>>> 
>>>>> 
>> 
> 
