Hi Sujit,

Thanks for your help! I moved the RoomNumberAnnotator.xml to the top level of 
the jar and used the same solrconfig.xml (with the /). Now it works perfectly.
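
For anyone who hits the same error, here is a rough sketch of the setup that worked. The jar name (RoomAnnotator.jar) is the one mentioned further down in this thread, and where the annotator classes sit inside the jar is only an assumption for illustration. The one thing that actually changed is that the descriptor now lives at the jar root.

```xml
<!--
  Sketch only. Assumed layout inside the custom jar (RoomAnnotator.jar):
    RoomNumberAnnotator.xml    descriptor at the top level of the jar
    (annotator classes in their usual package directories)
-->
<!-- Unchanged entry inside <lst name="uimaConfig"> in solrconfig.xml;
     the leading / is kept exactly as before. -->
<str name="analysisEngine">/RoomNumberAnnotator.xml</str>
```

The rest of the update chain is the same as in my earlier mail.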

Best regards, Bart


On 11 Feb 2013, at 20:13, SUJIT PAL wrote:

> Hi Bart,
> 
> Like I said, I didn't actually hook my UIMA stuff into Solr, content and 
> queries are annotated before they reach Solr. What you describe sounds like a 
> classpath problem (but of course you already knew that :-)). Since I haven't 
> actually done what you are trying to do, here are some suggestions, they may 
> or may not work...
> 
> 1) package up the XML files into your custom JAR at the top level, that way 
> you don't need to specify it as /RoomNumberAnnotator.xml.
> 2) if you are using Solr 4, then you should drop your custom JAR into 
> $SOLR_HOME/collection1/lib, not $SOLR_HOME/lib.
> 
> -sujit
> 
> On Feb 11, 2013, at 9:40 AM, jazz wrote:
> 
>> Hi Sujit and others who answered my question,
>> 
>> I have been working on the UIMA path which seems great with the available 
>> Eclipse tooling and this:
>> 
>> http://sujitpal.blogspot.nl/2011/03/smart-query-parsing-with-uima.html
>> 
>> Now I worked through the UIMA tutorial of the RoomNumberAnnotator: 
>> http://uima.apache.org/doc-uima-annotator.html
>> And I am able to test it using the UIMA CAS Visual Debugger. So far so 
>> good.
>> 
>> But, now I want to use the new RoomNumberAnnotator with Solr, but it cannot 
>> find the xml file and the Java class (they are in the correct lib 
>> directories, because the WhitespaceTokenizer works fine).
>> 
>> <updateRequestProcessorChain name="uima">
>>     <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
>>       <lst name="uimaConfig">
>>         <lst name="runtimeParameters">
>>         </lst>
>>         <str name="analysisEngine">/RoomNumberAnnotator.xml</str>
>>         <bool name="ignoreErrors">false</bool>
>>         <lst name="analyzeFields">
>>           <bool name="merge">false</bool>
>>           <arr name="fields">
>>             <str>content</str>
>>           </arr>
>>         </lst>
>>         <lst name="fieldMappings">
>>           <lst name="type">
>>             <str name="name">org.apache.uima.tutorial.RoomNumber</str>
>>             <lst name="mapping">
>>               <str name="feature">building</str>
>>               <str name="field">UIMAname</str>
>>             </lst>
>>           </lst>
>>         </lst>
>>       </lst>
>>     </processor>
>>     <processor class="solr.LogUpdateProcessorFactory" />
>>     <processor class="solr.RunUpdateProcessorFactory" />
>> </updateRequestProcessorChain>
>> 
>> On the Wiki (http://wiki.apache.org/solr/SolrUIMA) this is mentioned but it 
>> fails:
>> Deploy new jars inside one of the lib directories
>> 
>> Run 'ant clean dist' (or 'mvn clean package') from the solr/contrib/uima 
>> path.
>> 
>> Do I need to deploy the new jar (RoomAnnotator.jar)? If yes, which branch 
>> can I check out? This is the stable release I am running:
>> 
>> Solr 4.1.0 1434440 - sarowe - 2013-01-16 17:21:36
>> 
>> Regards, Bart
>> 
>> 
>> On 8 Feb 2013, at 22:11, SUJIT PAL wrote:
>> 
>>> Hi Bart,
>>> 
>>> I did some work with UIMA but this was to annotate the data before it goes 
>>> to Lucene/Solr, i.e. not built as an UpdateRequestProcessor. I just looked 
>>> through the SolrUima wiki page [http://wiki.apache.org/solr/SolrUIMA] and I 
>>> believe you will have to set up your own aggregate analysis chain in place 
>>> of the one currently configured.
>>> 
>>> Writing UIMA annotators is very simple (there is a tutorial here:  
>>> [http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html]).
>>> You provide the XML description for the annotation and let UIMA generate 
>>> the annotation bean. You write Java code for the annotator and also the 
>>> annotator XML descriptor. UIMA uses the annotator XML descriptor to 
>>> instantiate and run your annotator. Overall, it sounds really complicated but 
>>> it's actually quite simple.
>>> 
>>> The tutorial has quite a few examples that you will find useful, but in 
>>> case you need more, I have some on this github repository:
>>> [https://github.com/sujitpal/tgni/tree/master/src/main/java/com/mycompany/tgni/analysis/uima]
>>> 
>>> The dictionary and pattern annotators may be similar to what you are 
>>> looking for (date and city annotators).
>>> 
>>> Best regards,
>>> Sujit
>>> 
>>> On Feb 8, 2013, at 8:50 AM, Bart Rijpers wrote:
>>> 
>>>> Hi Alex,
>>>> 
>>>> Indeed that is exactly what I am trying to achieve using worldcities. Date 
>>>> will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how 
>>>> do I integrate the Java library as UIMA? The documentation about changing 
>>>> schema.xml and solr.xml is not very detailed. 
>>>> 
>>>> Regards, Bart
>>>> 
>>>> On 8 Feb 2013, at 16:57, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>>>> 
>>>>> Hi Bart,
>>>>> 
>>>>> I haven't done any UIMA work (I used other stuff for my NLP phase), so not
>>>>> sure I can help much further. But in general, you are venturing into pure
>>>>> research territory here.
>>>>> 
>>>>> Even for dates, what do you actually mean? Just fixed expressions? Relative
>>>>> dates (e.g. last Tuesday)? What about times (7pm)?
>>>>> 
>>>>> Same with cities. If you want it offline, you need the gazetteer and
>>>>> disambiguation modules. Gazetteer for cities (worldwide) is huge and has a
>>>>> lot of duplicate names (Paris, Ontario is apparently a short drive from
>>>>> London, Ontario eh?). Something like
>>>>> http://www.maxmind.com/en/worldcities? And disambiguation usually requires
>>>>> a training corpus that is similar to what your text will look like.
>>>>> 
>>>>> Online services like OpenCalais are backed by gigantic databases and some
>>>>> serious corpus-trained machine learning disambiguation algorithms.
>>>>> 
>>>>> So, no plug-and-play solution here. If you really need to get this done, I
>>>>> would recommend narrowing down the specification of exactly what you will
>>>>> settle for and looking for software that can do it. Once you have that,
>>>>> integration with Solr is your next - and smaller - concern.
>>>>> 
>>>>> Regards,
>>>>> Alex.
>>>>> 
>>>>> Personal blog: http://blog.outerthoughts.com/
>>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>>> - Time is the quality of nature that keeps events from happening all at
>>>>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>>>>> 
>>>>> 
>>>>> On Fri, Feb 8, 2013 at 10:41 AM, jazz <jazzsa...@me.com> wrote:
>>>>> 
>>>>>> Thanks Alex,
>>>>>> 
>>>>>> I checked the documentation but it seems there is only a webservice
>>>>>> (OpenCalais) available to extract dates and places.
>>>>>> 
>>>>>> http://uima.apache.org/sandbox.html
>>>>>> 
>>>>>> Do you know if there is a Solr-compatible UIMA add-on which detects dates
>>>>>> and places (cities) without a webservice? If not, how do you write one?
>>>>>> 
>>>>>> Regards, Bart
>>>>>> 
>>>>>> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote:
>>>>>> 
>>>>>>> Yes, it is possible. You are looking at UIMA or OpenNLP integration, most
>>>>>>> probably in an Update Request Processor pipeline.
>>>>>>> 
>>>>>>> Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA
>>>>>>> 
>>>>>>> You will have to put some serious work into this, it is not all tied
>>>>>>> together and packaged. Mostly because the Natural Language Processing (the
>>>>>>> field you are getting into) is kind of messy all of its own.
>>>>>>> 
>>>>>>> Good luck,
>>>>>>> Alex.
>>>>>>> 
>>>>>>> Personal blog: http://blog.outerthoughts.com/
>>>>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>>>>> - Time is the quality of nature that keeps events from happening all at
>>>>>>> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>>>>>>> 
>>>>>>> 
>>>>>>> On Fri, Feb 8, 2013 at 9:24 AM, jazz <jazzsa...@me.com> wrote:
>>>>>>> 
>>>>>>>> Hi,
>>>>>>>> 
>>>>>>>> I want to know if Solr can analyze text and recognize dates and places. If
>>>>>>>> yes, is it then possible to create new dynamic fields with these dates and
>>>>>>>> places (e.g. city)?
>>>>>>>> 
>>>>>>>> Thanks, Bart
>>>>>>>> 
>>>>>> 
>>>>>> 
>>> 
>> 
> 
