Lance did some work on OpenNLP integration. Check the wiki.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Feb 8, 2013 4:12 PM, "SUJIT PAL" <sujit....@comcast.net> wrote:

> Hi Bart,
>
> I did some work with UIMA, but this was to annotate the data before it goes
> to Lucene/Solr, i.e. not built as an UpdateRequestProcessor. I just looked
> through the SolrUIMA wiki page [http://wiki.apache.org/solr/SolrUIMA] and
> I believe you will have to set up your own aggregate analysis chain in
> place of the one currently configured.
>
> Writing UIMA annotators is very simple (there is a tutorial here:
> [http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html]).
> You provide the XML description for the annotation type and let UIMA
> generate the annotation bean. You write the Java code for the annotator and
> also the annotator XML descriptor. UIMA uses the annotator XML descriptor to
> instantiate and run your annotator. Overall, it sounds really complicated,
> but it's actually quite simple.
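What a simple pattern annotator does during processing can be sketched in plain Java. This is not UIMA code: a real annotator would extend a UIMA base class such as `JCasAnnotator_ImplBase`, use the generated annotation bean, and be wired up through its XML descriptor. Here the hypothetical `DayMonthAnnotator` class and its nested `Span` type stand in for those pieces, and the day-month regex is an illustrative assumption. The sketch only shows the core idea: scan the document text and record typed spans (begin and end offsets plus covered text), which in UIMA would be annotations added to the CAS.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java sketch of a pattern annotator; no UIMA classes are used.
class DayMonthAnnotator {

    // Stand-in for a generated UIMA annotation type: a typed span over the text.
    static final class Span {
        final String type;
        final int begin;
        final int end;
        final String coveredText;

        Span(String type, int begin, int end, String coveredText) {
            this.type = type;
            this.begin = begin;
            this.end = end;
            this.coveredText = coveredText;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + "]=" + coveredText;
        }
    }

    // Assumed format: day-month tokens such as "16-Jan" or "3-Feb" (English abbreviations only).
    private static final Pattern DAY_MONTH = Pattern.compile(
            "\\b\\d{1,2}-(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)\\b");

    // Returns one Span per regex match, in document order.
    static List<Span> annotate(String text) {
        List<Span> spans = new ArrayList<>();
        Matcher m = DAY_MONTH.matcher(text);
        while (m.find()) {
            spans.add(new Span("DayMonth", m.start(), m.end(), m.group()));
        }
        return spans;
    }

    public static void main(String[] args) {
        System.out.println(annotate("Meeting on 16-Jan, follow-up 3-Feb."));
    }
}
```

Keeping offsets rather than just the matched strings mirrors how UIMA annotations point back into the document text, so later annotators can combine or filter overlapping results.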
>
> The tutorial has quite a few examples that you will find useful, but in
> case you need more, I have some in this GitHub repository:
> [https://github.com/sujitpal/tgni/tree/master/src/main/java/com/mycompany/tgni/analysis/uima]
>
> The dictionary and pattern annotators may be similar to what you are
> looking for (date and city annotators).
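The dictionary approach can be approximated in plain Java as below. This is a hypothetical sketch, not the code from the repository above and not a UIMA API: `CityDictionaryAnnotator` and `Match` are illustrative names, and the gazetteer is assumed to be a small in-memory collection of city names (a worldwide list would be far larger). Longer names are tried first so that a multi-word city is not split into a shorter entry.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical plain-Java sketch of a dictionary (gazetteer) annotator for city names.
class CityDictionaryAnnotator {

    // A matched city name and its character offsets in the text.
    static final class Match {
        final String city;
        final int begin;
        final int end;

        Match(String city, int begin, int end) {
            this.city = city;
            this.begin = begin;
            this.end = end;
        }
    }

    // Finds whole-word occurrences of any gazetteer entry.
    static List<Match> annotate(String text, Collection<String> gazetteer) {
        List<Match> matches = new ArrayList<>();
        if (gazetteer.isEmpty()) {
            return matches;
        }
        // Longest names first: regex alternation takes the first alternative that
        // matches, so "New York" wins over "York" at the same position.
        List<String> names = new ArrayList<>(gazetteer);
        names.sort((a, b) -> Integer.compare(b.length(), a.length()));
        StringBuilder alternation = new StringBuilder();
        for (String name : names) {
            if (alternation.length() > 0) {
                alternation.append('|');
            }
            alternation.append(Pattern.quote(name));
        }
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b");
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            matches.add(new Match(m.group(), m.start(), m.end()));
        }
        return matches;
    }
}
```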
>
> Best regards,
> Sujit
>
> On Feb 8, 2013, at 8:50 AM, Bart Rijpers wrote:
>
> > Hi Alex,
> >
> > Indeed, that is exactly what I am trying to achieve using worldcities.
> > The date part will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic
> > field. But how do I integrate the Java library as a UIMA component? The
> > documentation about changing schema.xml and solr.xml is not very detailed.
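The date rule above (16-Jan becomes 16-Jan-2013) can be sketched with `java.time`. The class and method names are hypothetical, and the year to fill in is passed explicitly rather than assumed to be the current one. Because a Solr date field expects an ISO 8601 UTC string such as `2013-01-16T00:00:00Z`, the sketch also produces that form; which one to store depends on whether the new dynamic field is a string or a date type.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeFormatterBuilder;
import java.time.temporal.ChronoField;
import java.util.Locale;

// Hypothetical helper: completes a day-month token like "16-Jan" with a given year.
class DayMonthNormalizer {

    private static final DateTimeFormatter OUTPUT =
            DateTimeFormatter.ofPattern("dd-MMM-yyyy", Locale.ENGLISH);

    // Parses "d-MMM" (case-insensitive, English month abbreviations) and fills in the year.
    static LocalDate parse(String dayMonth, int defaultYear) {
        DateTimeFormatter input = new DateTimeFormatterBuilder()
                .parseCaseInsensitive()
                .appendPattern("d-MMM")
                .parseDefaulting(ChronoField.YEAR, defaultYear)
                .toFormatter(Locale.ENGLISH);
        return LocalDate.parse(dayMonth, input);
    }

    // "16-Jan" with 2013 -> "16-Jan-2013"
    static String normalize(String dayMonth, int defaultYear) {
        return parse(dayMonth, defaultYear).format(OUTPUT);
    }

    // ISO 8601 form at midnight UTC, the string format Solr date fields accept.
    static String toSolrDate(String dayMonth, int defaultYear) {
        return parse(dayMonth, defaultYear) + "T00:00:00Z";
    }
}
```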
> >
> > Regards, Bart
> >
> > On 8 Feb 2013, at 16:57, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
> >
> >> Hi Bart,
> >>
> >> I haven't done any UIMA work (I used other stuff for my NLP phase), so not
> >> sure I can help much further. But in general, you are venturing into pure
> >> research territory here.
> >>
> >> Even for dates, what do you actually mean? Just fixed expressions? Relative
> >> dates (e.g. "last Tuesday")? What about times ("7pm")?
> >>
> >> Same with cities. If you want it offline, you need gazetteer and
> >> disambiguation modules. A gazetteer of cities worldwide is huge and has a
> >> lot of duplicate names (Paris, Ontario is apparently a short drive from
> >> London, Ontario, eh?). Something like
> >> http://www.maxmind.com/en/worldcities? And disambiguation usually
> >> requires a training corpus that is similar to what your text will look
> >> like.
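To make the duplicate-name problem concrete, here is a hypothetical Java sketch of a gazetteer lookup that can return several candidates for one name, with a deliberately naive tie-breaker: keep only the candidates whose region is mentioned in the surrounding text. This is not the corpus-trained disambiguation described above; the `Place` type, the heuristic and the sample entries are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one city name can map to several places in a gazetteer.
class NaiveCityDisambiguator {

    static final class Place {
        final String name;
        final String region;

        Place(String name, String region) {
            this.name = name;
            this.region = region;
        }

        @Override
        public String toString() {
            return name + ", " + region;
        }
    }

    // Naive heuristic: keep candidates whose region appears in the context text.
    // If no region is mentioned, all candidates are returned (still ambiguous).
    static List<Place> resolve(String name, String context, Map<String, List<Place>> gazetteer) {
        List<Place> candidates = gazetteer.getOrDefault(name, List.of());
        List<Place> supported = new ArrayList<>();
        for (Place p : candidates) {
            if (context.contains(p.region)) {
                supported.add(p);
            }
        }
        return supported.isEmpty() ? candidates : supported;
    }

    public static void main(String[] args) {
        Map<String, List<Place>> gazetteer = Map.of("Paris", List.of(
                new Place("Paris", "France"),
                new Place("Paris", "Ontario"),
                new Place("Paris", "Texas")));
        System.out.println(resolve("Paris", "a short drive from London, Ontario", gazetteer));
        System.out.println(resolve("Paris", "flights to Paris next week", gazetteer));
    }
}
```

The second call shows why this is not enough on its own: without an explicit region in the text, every candidate survives, which is exactly the case a trained model is meant to handle.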
> >>
> >> Online services like OpenCalais are backed by gigantic databases and some
> >> serious corpus-trained machine learning disambiguation algorithms.
> >>
> >> So, there is no plug-and-play solution here. If you really need to get
> >> this done, I would recommend narrowing down the specification of exactly
> >> what you will settle for and looking for software that can do it. Once you
> >> have that, integration with Solr is your next, and smaller, concern.
> >>
> >> Regards,
> >>  Alex.
> >>
> >> Personal blog: http://blog.outerthoughts.com/
> >> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> >> - Time is the quality of nature that keeps events from happening all at
> >> once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
> >>
> >>
> >> On Fri, Feb 8, 2013 at 10:41 AM, jazz <jazzsa...@me.com> wrote:
> >>
> >>> Thanks Alex,
> >>>
> >>> I checked the documentation but it seems there is only a webservice
> >>> (OpenCalais) available to extract dates and places.
> >>>
> >>> http://uima.apache.org/sandbox.html
> >>>
> >>> Do you know if there is a Solr-compatible UIMA add-on that detects dates
> >>> and places (cities) without a web service? If not, how do you write one?
> >>>
> >>> Regards, Bart
> >>>
> >>> On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote:
> >>>
> >>>> Yes, it is possible. You are looking at UIMA or OpenNLP integration,
> >>>> most probably in the Update Request Processor pipeline.
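The idea behind an update processor pipeline is that each incoming document passes through a chain of steps, and any step can add fields before the document is indexed. The sketch below models only that idea in plain Java; it does not use Solr's `UpdateRequestProcessor` API. The class `EnrichmentChain` is hypothetical, and the field names `text`, `city_ss` and `city_count_i` are assumptions whose suffixes follow the dynamic-field naming of Solr's example schema, so check them against your own schema.xml.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Plain-Java analogy of an update processor chain; not Solr's API.
class EnrichmentChain {

    private final List<Consumer<Map<String, Object>>> steps = new ArrayList<>();

    EnrichmentChain add(Consumer<Map<String, Object>> step) {
        steps.add(step);
        return this;
    }

    // Runs every step in order on the same document, as a chain would before indexing.
    void process(Map<String, Object> doc) {
        for (Consumer<Map<String, Object>> step : steps) {
            step.accept(doc);
        }
    }

    // Hypothetical step: naive substring lookup of known cities in "text", stored in "city_ss".
    static Consumer<Map<String, Object>> cityTagger(List<String> cities) {
        return doc -> {
            String text = String.valueOf(doc.getOrDefault("text", ""));
            List<String> found = new ArrayList<>();
            for (String city : cities) {
                if (text.contains(city)) {
                    found.add(city);
                }
            }
            if (!found.isEmpty()) {
                doc.put("city_ss", found);
            }
        };
    }

    // Hypothetical later step that sees the field added by the earlier one.
    static Consumer<Map<String, Object>> cityCounter() {
        return doc -> {
            Object cities = doc.get("city_ss");
            int count = (cities instanceof List) ? ((List<?>) cities).size() : 0;
            doc.put("city_count_i", count);
        };
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("text", "Bart drove from London to Paris on 16-Jan.");
        new EnrichmentChain()
                .add(cityTagger(List.of("London", "Paris", "Berlin")))
                .add(cityCounter())
                .process(doc);
        System.out.println(doc);
    }
}
```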
> >>>>
> >>>> Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA
> >>>>
> >>>> You will have to put some serious work into this; it is not all tied
> >>>> together and packaged, mostly because Natural Language Processing (the
> >>>> field you are getting into) is kind of messy in its own right.
> >>>>
> >>>> Good luck,
> >>>>  Alex.
> >>>>
> >>>>
> >>>> On Fri, Feb 8, 2013 at 9:24 AM, jazz <jazzsa...@me.com> wrote:
> >>>>
> >>>>> Hi,
> >>>>>
> >>>>> I want to know if Solr can analyze text and recognize dates and places.
> >>>>> If yes, is it then possible to create new dynamic fields with these
> >>>>> dates and places (e.g. city)?
> >>>>>
> >>>>> Thanks, Bart
> >>>>>
> >>>
> >>>
>
>
