Hi Sujit,

Thanks for your help! I moved RoomNumberAnnotator.xml to the top level of the jar and kept the same solrconfig.xml (still referencing the file with the leading /). Now it works perfectly.
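For anyone hitting the same problem, here is a minimal sketch of why the top-level location matters. It assumes the descriptor is looked up as a classpath resource, which is what the leading / in solrconfig.xml suggests; the thread does not confirm how Solr loads it internally. The class name DescriptorLookup is hypothetical and only for illustration.

```java
import java.io.IOException;
import java.io.InputStream;

public class DescriptorLookup {

    // True if the resource can be opened from the classpath of this class.
    // A leading '/' means "start at the classpath root" (for example the top
    // level of a jar); without it, the name is resolved relative to the
    // package of this class.
    static boolean isOnClasspath(String resourcePath) {
        try (InputStream in = DescriptorLookup.class.getResourceAsStream(resourcePath)) {
            return in != null;
        } catch (IOException e) {
            // Only close() can throw here; treat that as "not usable".
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumption: the descriptor name from this thread; pass another path as args[0].
        String descriptor = args.length > 0 ? args[0] : "/RoomNumberAnnotator.xml";
        System.out.println(descriptor + " found on classpath: " + isOnClasspath(descriptor));
    }
}
```

Running this with the custom jar on the classpath is a quick way to check whether the XML file really sits at the top level of the jar before involving Solr at all.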
Best regards,
Bart

On 11 Feb 2013, at 20:13, SUJIT PAL wrote:

> Hi Bart,
>
> Like I said, I didn't actually hook my UIMA stuff into Solr; content and
> queries are annotated before they reach Solr. What you describe sounds like a
> classpath problem (but of course you already knew that :-)). Since I haven't
> actually done what you are trying to do, here are some suggestions; they may
> or may not work...
>
> 1) Package up the XML files into your custom JAR at the top level; that way
> you don't need to specify it as /RoomNumberAnnotator.xml.
> 2) If you are using Solr 4, then you should drop your custom JAR into
> $SOLR_HOME/collection1/lib, not $SOLR_HOME/lib.
>
> -sujit
>
> On Feb 11, 2013, at 9:40 AM, jazz wrote:
>
>> Hi Sujit and others who answered my question,
>>
>> I have been working on the UIMA path, which seems great with the available
>> Eclipse tooling and this:
>>
>> http://sujitpal.blogspot.nl/2011/03/smart-query-parsing-with-uima.html
>>
>> Now I worked through the UIMA tutorial of the RoomNumberAnnotator:
>> http://uima.apache.org/doc-uima-annotator.html
>> And I am able to test it using the UIMA CAS Visual Debugger. So far so
>> good.
>>
>> But now I want to use the new RoomNumberAnnotator with Solr, and it cannot
>> find the XML file and the Java class (they are in the correct lib
>> directories, because the WhitespaceTokenizer works fine).
>>
>> <updateRequestProcessorChain name="uima">
>>   <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
>>     <lst name="uimaConfig">
>>       <lst name="runtimeParameters">
>>       </lst>
>>       <str name="analysisEngine">/RoomNumberAnnotator.xml</str>
>>       <bool name="ignoreErrors">false</bool>
>>       <lst name="analyzeFields">
>>         <bool name="merge">false</bool>
>>         <arr name="fields">
>>           <str>content</str>
>>         </arr>
>>       </lst>
>>       <lst name="fieldMappings">
>>         <lst name="type">
>>           <str name="name">org.apache.uima.tutorial.RoomNumber</str>
>>           <lst name="mapping">
>>             <str name="feature">building</str>
>>             <str name="field">UIMAname</str>
>>           </lst>
>>         </lst>
>>       </lst>
>>     </lst>
>>   </processor>
>>   <processor class="solr.LogUpdateProcessorFactory" />
>>   <processor class="solr.RunUpdateProcessorFactory" />
>> </updateRequestProcessorChain>
>>
>> On the Wiki (http://wiki.apache.org/solr/SolrUIMA) this is mentioned but it
>> fails:
>> Deploy new jars inside one of the lib directories
>>
>> Run 'ant clean dist' (or 'mvn clean package') from the solr/contrib/uima
>> path.
>>
>> Do I need to deploy the new jar (RoomAnnotator.jar)? If yes, which branch
>> can I check out? This is the stable release I am running:
>>
>> Solr 4.1.0 1434440 - sarowe - 2013-01-16 17:21:36
>>
>> Regards, Bart
>>
>>
>> On 8 Feb 2013, at 22:11, SUJIT PAL wrote:
>>
>>> Hi Bart,
>>>
>>> I did some work with UIMA but this was to annotate the data before it goes
>>> to Lucene/Solr, i.e. not built as an UpdateRequestProcessor. I just looked
>>> through the SolrUima wiki page [http://wiki.apache.org/solr/SolrUIMA] and I
>>> believe you will have to set up your own aggregate analysis chain in place
>>> of the one currently configured.
>>>
>>> Writing UIMA annotators is very simple (there is a tutorial here:
>>> [http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html]).
>>> You provide the XML description for the annotation and let UIMA generate
>>> the annotation bean. You write Java code for the annotator and also the
>>> annotator XML descriptor. UIMA uses the annotator XML descriptor to
>>> instantiate and run your annotator. Overall, it sounds really complicated,
>>> but it's actually quite simple.
>>>
>>> The tutorial has quite a few examples that you will find useful, but in
>>> case you need more, I have some in this GitHub repository:
>>> [https://github.com/sujitpal/tgni/tree/master/src/main/java/com/mycompany/tgni/analysis/uima]
>>>
>>> The dictionary and pattern annotators may be similar to what you are
>>> looking for (date and city annotators).
>>>
>>> Best regards,
>>> Sujit
>>>
>>> On Feb 8, 2013, at 8:50 AM, Bart Rijpers wrote:
>>>
>>>> Hi Alex,
>>>>
>>>> Indeed that is exactly what I am trying to achieve using worldcities. Date
>>>> will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how
>>>> do I integrate the Java library as UIMA? The documentation about changing
>>>> schema.xml and solr.xml is not very detailed.
>>>>
>>>> Regards, Bart
>>>>
>>>> On 8 Feb 2013, at 16:57, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>>>>
>>>>> Hi Bart,
>>>>>
>>>>> I haven't done any UIMA work (I used other stuff for my NLP phase), so not
>>>>> sure I can help much further. But in general, you are venturing into pure
>>>>> research territory here.
>>>>>
>>>>> Even for dates, what do you actually mean? Just fixed expressions? Relative
>>>>> dates (e.g. last Tuesday)? What about times (7pm)?
>>>>>
>>>>> Same with cities. If you want it offline, you need the gazetteer and
>>>>> disambiguation modules. A gazetteer for cities (worldwide) is huge and has a
>>>>> lot of duplicate names (Paris, Ontario is apparently a short drive from
>>>>> London, Ontario, eh?). Something like
>>>>> http://www.maxmind.com/en/worldcities?
>>>>> And disambiguation usually requires a training corpus that is similar to
>>>>> what your text will look like.
>>>>>
>>>>> Online services like OpenCalais are backed by gigantic databases and some
>>>>> serious corpus-trained machine learning disambiguation algorithms.
>>>>>
>>>>> So, no plug-and-play solution here. If you really need to get this done, I
>>>>> would recommend narrowing down the specification of exactly what you will
>>>>> settle for and looking for software that can do it. Once you have that,
>>>>> integration with Solr is your next - and smaller - concern.
>>>>>
>>>>> Regards,
>>>>> Alex.
>>>>>
>>>>> Personal blog: http://blog.outerthoughts.com/
>>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>>> - Time is the quality of nature that keeps events from happening all at
>>>>> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
>>>>>
>>>>>
>>>>> On Fri, Feb 8, 2013 at 10:41 AM, jazz <jazzsa...@me.com> wrote:
>>>>>
>>>>>> Thanks Alex,
>>>>>>
>>>>>> I checked the documentation but it seems there is only a web service
>>>>>> (OpenCalais) available to extract dates and places.
>>>>>>
>>>>>> http://uima.apache.org/sandbox.html
>>>>>>
>>>>>> Do you know if there is a Solr-compatible UIMA add-on which detects dates
>>>>>> and places (cities) without a web service? If not, how do you write one?
>>>>>>
>>>>>> Regards, Bart
>>>>>>
>>>>>> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote:
>>>>>>
>>>>>>> Yes, it is possible. You are looking at UIMA or OpenNLP integration,
>>>>>>> most probably in the Update Request Processor pipeline.
>>>>>>>
>>>>>>> Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA
>>>>>>>
>>>>>>> You will have to put some serious work into this; it is not all tied
>>>>>>> together and packaged. Mostly because Natural Language Processing (the
>>>>>>> field you are getting into) is kind of messy all of its own.
>>>>>>>
>>>>>>> Good luck,
>>>>>>> Alex.
>>>>>>>
>>>>>>> Personal blog: http://blog.outerthoughts.com/
>>>>>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
>>>>>>> - Time is the quality of nature that keeps events from happening all at
>>>>>>> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Feb 8, 2013 at 9:24 AM, jazz <jazzsa...@me.com> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I want to know if Solr can analyze text and recognize dates and places.
>>>>>>>> If yes, is it then possible to create new dynamic fields with these dates
>>>>>>>> and places (e.g. city)?
>>>>>>>>
>>>>>>>> Thanks, Bart
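
A note on the date case raised in this thread (16-Jan becoming 16-Jan-2013): the sketch below shows one rough way to do that expansion with a regular expression. It is not a UIMA annotator and not something anyone in the thread posted. The class name PartialDateExpander, the English month abbreviations, and the idea of passing in a default year are all assumptions made for illustration; how the year should really be chosen is not stated above.

```java
import java.util.regex.Pattern;

public class PartialDateExpander {

    // Matches day-month tokens such as "16-Jan" that are NOT already followed
    // by "-<digit>" (so "16-Jan-2013" is left alone).
    private static final Pattern DAY_MONTH = Pattern.compile(
        "\\b(\\d{1,2})-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\\b(?!-\\d)");

    // Appends the given year to every year-less day-month token in the text.
    static String expand(String text, int defaultYear) {
        return DAY_MONTH.matcher(text).replaceAll("$1-$2-" + defaultYear);
    }

    public static void main(String[] args) {
        // prints "Meeting on 16-Jan-2013 in Paris"
        System.out.println(expand("Meeting on 16-Jan in Paris", 2013));
    }
}
```

The same pattern could plausibly sit inside an annotator's process method, with the expanded value mapped to a dynamic field, but relative dates and times (the cases Alex raises) would need considerably more than this.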