Lance did some work on OpenNLP integration. Check the wiki.

Otis
Solr & ElasticSearch Support
http://sematext.com/

On Feb 8, 2013 4:12 PM, "SUJIT PAL" <sujit....@comcast.net> wrote:
> Hi Bart,
>
> I did some work with UIMA, but this was to annotate the data before it goes
> to Lucene/Solr, i.e. not built as an UpdateRequestProcessor. I just looked
> through the SolrUIMA wiki page [http://wiki.apache.org/solr/SolrUIMA] and
> I believe you will have to set up your own aggregate analysis chain in
> place of the one currently configured.
>
> Writing UIMA annotators is very simple (there is a tutorial here:
> [http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html]).
> You provide the XML description for the annotation and let UIMA generate
> the annotation bean. You write Java code for the annotator and also the
> annotator XML descriptor. UIMA uses the annotator XML descriptor to
> instantiate and run your annotator. Overall, it sounds really complicated,
> but it's actually quite simple.
>
> The tutorial has quite a few examples that you will find useful, but in
> case you need more, I have some in this GitHub repository:
> [https://github.com/sujitpal/tgni/tree/master/src/main/java/com/mycompany/tgni/analysis/uima]
>
> The dictionary and pattern annotators may be similar to what you are
> looking for (date and city annotators).
>
> Best regards,
> Sujit
>
> On Feb 8, 2013, at 8:50 AM, Bart Rijpers wrote:
>
> > Hi Alex,
> >
> > Indeed that is exactly what I am trying to achieve using worldcities.
> > Date will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field.
> > But how do I integrate the Java library as UIMA? The documentation about
> > changing schema.xml and solr.xml is not very detailed.
> >
> > Regards, Bart
> >
> > On 8 Feb 2013, at 16:57, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >
> >> Hi Bart,
> >>
> >> I haven't done any UIMA work (I used other stuff for my NLP phase), so
> >> I'm not sure I can help much further. But in general, you are venturing
> >> into pure research territory here.
> >>
> >> Even for dates, what do you actually mean?
> >> Just fixed expressions? Relative dates (e.g. last Tuesday)? What about
> >> times (7pm)?
> >>
> >> Same with cities. If you want it offline, you need the gazetteer and
> >> disambiguation modules. A gazetteer for cities (worldwide) is huge and
> >> has a lot of duplicate names (Paris, Ontario is apparently a short drive
> >> from London, Ontario, eh?). Something like
> >> http://www.maxmind.com/en/worldcities? And disambiguation usually
> >> requires a training corpus that is similar to what your text will look
> >> like.
> >>
> >> Online services like OpenCalais are backed by gigantic databases and some
> >> serious corpus-trained machine learning disambiguation algorithms.
> >>
> >> So, no plug-and-play solution here. If you really need to get this done,
> >> I would recommend narrowing down the specification of exactly what you
> >> will settle for and looking for software that can do it. Once you have
> >> that, integration with Solr is your next - and smaller - concern.
> >>
> >> Regards,
> >> Alex.
> >>
> >> Personal blog: http://blog.outerthoughts.com/
> >> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> >> - Time is the quality of nature that keeps events from happening all at
> >> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
> >>
> >> On Fri, Feb 8, 2013 at 10:41 AM, jazz <jazzsa...@me.com> wrote:
> >>
> >>> Thanks Alex,
> >>>
> >>> I checked the documentation but it seems there is only a web service
> >>> (OpenCalais) available to extract dates and places.
> >>>
> >>> http://uima.apache.org/sandbox.html
> >>>
> >>> Do you know if there is a Solr-compatible UIMA add-on which detects
> >>> dates and places (cities) without a web service? If not, how do you
> >>> write one?
> >>>
> >>> Regards, Bart
> >>>
> >>> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote:
> >>>
> >>>> Yes, it is possible. You are looking at UIMA or OpenNLP integration,
> >>>> most probably in an Update Request Processor pipeline.
> >>>>
> >>>> Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA
> >>>>
> >>>> You will have to put some serious work into this; it is not all tied
> >>>> together and packaged, mostly because Natural Language Processing (the
> >>>> field you are getting into) is kind of messy all on its own.
> >>>>
> >>>> Good luck,
> >>>> Alex.
> >>>>
> >>>> Personal blog: http://blog.outerthoughts.com/
> >>>> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> >>>> - Time is the quality of nature that keeps events from happening all at
> >>>> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
> >>>>
> >>>> On Fri, Feb 8, 2013 at 9:24 AM, jazz <jazzsa...@me.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I want to know if Solr can analyze text and recognize dates and
> >>>>> places. If yes, is it then possible to create new dynamic fields with
> >>>>> these dates and places (e.g. city)?
> >>>>>
> >>>>> Thanks, Bart
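
For the simple date case mentioned in this thread (16-Jan becoming 16-Jan-2013), a minimal sketch in plain Java might look like the one below. The names DateCompleter, complete and defaultYear are hypothetical and are not part of Solr or UIMA. The sketch assumes only the fixed day-month form with three-letter English month names; relative dates and times are out of scope. The same logic could be called from a UIMA annotator or an update request processor that writes the result to a dynamic field.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not a Solr or UIMA class.
final class DateCompleter {

    // Day (1-2 digits), hyphen, three-letter English month.
    // The negative lookahead skips dates that already carry a year (e.g. 16-Jan-2013).
    private static final Pattern PARTIAL_DATE = Pattern.compile(
            "\\b(\\d{1,2})-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\\b(?!-\\d)");

    // Appends defaultYear to every day-month expression that has no year yet.
    static String complete(String text, int defaultYear) {
        return PARTIAL_DATE.matcher(text).replaceAll("$1-$2-" + defaultYear);
    }

    public static void main(String[] args) {
        // prints "Meeting on 16-Jan-2013 in Paris"
        System.out.println(complete("Meeting on 16-Jan in Paris", 2013));
    }
}
```

Passing the year in explicitly (rather than reading the system clock) keeps the behaviour predictable when older documents are re-indexed; which year to assume is a decision for the indexing pipeline.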