Can Solr analyze content and find dates and places

2013-02-08 Thread jazz
Hi,

I want to know if Solr can analyze text and recognize dates and places. If so, 
is it then possible to create new dynamic fields from these dates and places 
(e.g. city)?

Thanks, Bart


Re: Can Solr analyze content and find dates and places

2013-02-08 Thread jazz
Thanks Alex,

I checked the documentation but it seems there is only a webservice 
(OpenCalais) available to extract dates and places.

http://uima.apache.org/sandbox.html

Do you know if there is a Solr-compatible UIMA add-on that detects dates and 
places (cities) without a web service? If not, how do you write one?

Regards, Bart

On 8 Feb 2013, at 15:29, Alexandre Rafalovitch wrote:

> Yes, it is possible. You are looking at UIMA or OpenNLP integration, most
> probably in an Update Request Processor pipeline.
> 
> Have a look here as a start: https://wiki.apache.org/solr/SolrUIMA
> 
> You will have to put some serious work into this; it is not all tied
> together and packaged, mostly because Natural Language Processing (the
> field you are getting into) is kind of messy all on its own.
> 
> Good luck,
> Alex.
> 
> Personal blog: http://blog.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
> 
> 



Re: Can Solr analyze content and find dates and places

2013-02-11 Thread jazz
Hi Sujit and others who answered my question,

I have been working on the UIMA path which seems great with the available 
Eclipse tooling and this:

http://sujitpal.blogspot.nl/2011/03/smart-query-parsing-with-uima.html

I have now worked through the UIMA tutorial for the RoomNumberAnnotator: 
http://uima.apache.org/doc-uima-annotator.html
and I am able to test it using the UIMA CAS Visual Debugger. So far so good.

However, when I try to use the new RoomNumberAnnotator with Solr, it cannot 
find the XML file and the Java class (they are in the correct lib directories, 
because the WhitespaceTokenizer works fine).

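  <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
    <lst name="uimaConfig">
      <lst name="runtimeParameters"/>
      <str name="analysisEngine">/RoomNumberAnnotator.xml</str>
      <bool name="ignoreErrors">false</bool>
      <lst name="analyzeFields">
        <bool name="merge">false</bool>
        <arr name="fields">
          <str>content</str>
        </arr>
      </lst>
      <lst name="fieldMappings">
        <lst name="type">
          <str name="name">org.apache.uima.tutorial.RoomNumber</str>
          <lst name="mapping">
            <str name="feature">building</str>
            <str name="field">UIMAname</str>
          </lst>
        </lst>
      </lst>
    </lst>
  </processor>
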
The wiki (http://wiki.apache.org/solr/SolrUIMA) mentions the following steps, 
but they fail for me:

1) Deploy new jars inside one of the lib directories
2) Run 'ant clean dist' (or 'mvn clean package') from the solr/contrib/uima path.

Do I need to deploy the new jar (RoomAnnotator.jar)? If so, which branch 
should I check out? This is the stable release I am running:

Solr 4.1.0 1434440 - sarowe - 2013-01-16 17:21:36

Regards, Bart


On 8 Feb 2013, at 22:11, SUJIT PAL wrote:

> Hi Bart,
> 
> I did some work with UIMA, but this was to annotate the data before it goes to 
> Lucene/Solr, i.e. not built as an UpdateRequestProcessor. I just looked through 
> the SolrUIMA wiki page [http://wiki.apache.org/solr/SolrUIMA] and I believe 
> you will have to set up your own aggregate analysis chain in place of the one 
> currently configured.
> 
> Writing UIMA annotators is very simple (there is a tutorial here:  
> [http://uima.apache.org/downloads/releaseDocs/2.1.0-incubating/docs/html/tutorials_and_users_guides/tutorials_and_users_guides.html]).
> You provide the XML description for the annotation and let UIMA generate the 
> annotation bean. You write Java code for the annotator and also the annotator 
> XML descriptor. UIMA uses the annotator XML descriptor to instantiate and run 
> your annotator. Overall, it sounds really complicated but it's actually quite 
> simple.
> 
> The tutorial has quite a few examples that you will find useful, but in case 
> you need more, I have some on this github repository:
> [https://github.com/sujitpal/tgni/tree/master/src/main/java/com/mycompany/tgni/analysis/uima]
> 
> The dictionary and pattern annotators may be similar to what you are looking 
> for (date and city annotators).
> 
> Best regards,
> Sujit
> 
> On Feb 8, 2013, at 8:50 AM, Bart Rijpers wrote:
> 
>> Hi Alex,
>> 
>> Indeed, that is exactly what I am trying to achieve using worldcities. Dates 
>> will be simple: 16-Jan becomes 16-Jan-2013 in a new dynamic field. But how 
>> do I integrate the Java library as UIMA? The documentation about changing 
>> schema.xml and solr.xml is not very detailed. 
>> 
>> Regards, Bart
>> 
>> On 8 Feb 2013, at 16:57, Alexandre Rafalovitch  wrote:
>> 
>>> Hi Bart,
>>> 
>>> I haven't done any UIMA work (I used other stuff for my NLP phase), so not
>>> sure I can help much further. But in general, you are venturing into pure
>>> research territory here.
>>> 
>>> Even for dates, what do you actually mean? Just fixed expressions? Relative
>>> dates (e.g. last Tuesday)? What about times (7pm)?
>>> 
>>> Same with cities. If you want it offline, you need the gazetteer and
>>> disambiguation modules. A gazetteer for cities (worldwide) is huge and has a
>>> lot of duplicate names (Paris, Ontario is apparently a short drive from
>>> London, Ontario eh?). Something like
>>> http://www.maxmind.com/en/worldcities? And disambiguation usually
>>> requires a training corpus that is similar to what your text will look like.
>>> 
>>> Online services like OpenCalais are backed by gigantic databases and some
>>> serious corpus-trained Machine Learning disambiguation algorithms.
>>> 
>>> So, no plug-and-play solution here. If you really need to get this done, I
>>> would recommend narrowing down the specification of exactly what you will
>>> settle for and looking for software that can do it. Once you have that,
>>> integration with Solr is your next - and smaller - concern.
>>> 
>>> Regards,
>>> Alex.
>>> 

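The date rule mentioned in the quoted messages above (16-Jan becomes 16-Jan-2013) can be tried outside Solr and UIMA first. The following is only a minimal Python sketch of that rule, not the annotator discussed in the thread. The function names, the regular expression, the English month abbreviations and the choice of default year are all assumptions made for illustration.

```python
import re
from datetime import datetime

# Assumption: dates appear as "<day>-<English month abbreviation>", e.g. "16-Jan".
MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Match day-month tokens that are NOT already followed by "-<digit>" (a year).
PARTIAL_DATE = re.compile(rf"\b(\d{{1,2}})-({MONTHS})\b(?!-\d)")


def complete_dates(text, default_year):
    """Return every partial date found in text, completed with default_year."""
    found = []
    for day, month in PARTIAL_DATE.findall(text):
        try:
            parsed = datetime.strptime(f"{day}-{month}-{default_year}", "%d-%b-%Y")
        except ValueError:
            continue  # not a real calendar date, e.g. "31-Feb"
        found.append(parsed.strftime("%d-%b-%Y"))
    return found


def to_solr_date(display_date):
    """Convert "16-Jan-2013" to the ISO-8601 form used by Solr date fields."""
    parsed = datetime.strptime(display_date, "%d-%b-%Y")
    return parsed.strftime("%Y-%m-%dT%H:%M:%SZ")


print(complete_dates("Invoice sent 16-Jan, paid 3-Feb", 2013))
# ['16-Jan-2013', '03-Feb-2013']
print(to_solr_date("16-Jan-2013"))
# 2013-01-16T00:00:00Z
```

Relative dates ("last Tuesday") and times are deliberately left out; as noted in the quoted reply, those need a narrower specification first.
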
Re: Can Solr analyze content and find dates and places

2013-02-11 Thread jazz
Hi Sujit,

Thanks for your help! I moved RoomNumberAnnotator.xml to the top level of 
the jar and used the same solrconfig.xml (with the /). Now it works perfectly.

Best regards, Bart
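
For anyone hitting the same problem, one way to check where a descriptor ended up inside a JAR is to list the archive entries. This is a small Python sketch using only the standard library; the helper names are made up for this example, and the demo archives are built in memory with placeholder entries rather than taken from the real RoomAnnotator.jar.

```python
import io
import zipfile


def descriptor_locations(jar, descriptor="RoomNumberAnnotator.xml"):
    """Return every entry path in the JAR whose file name equals descriptor."""
    with zipfile.ZipFile(jar) as archive:
        return [name for name in archive.namelist()
                if name.rsplit("/", 1)[-1] == descriptor]


def is_at_top_level(jar, descriptor="RoomNumberAnnotator.xml"):
    """True if the descriptor sits at the root of the JAR (no directory prefix)."""
    return descriptor in descriptor_locations(jar, descriptor)


def build_demo_jar(entry_names):
    """Build an in-memory archive with placeholder content (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name in entry_names:
            archive.writestr(name, "<placeholder/>")
    buffer.seek(0)
    return buffer


nested = build_demo_jar(["desc/RoomNumberAnnotator.xml"])
print(is_at_top_level(nested))       # False
print(descriptor_locations(nested))  # ['desc/RoomNumberAnnotator.xml']
```

With a real file, pass its path instead of the in-memory buffer, for example `is_at_top_level("RoomAnnotator.jar")`.
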


On 11 Feb 2013, at 20:13, SUJIT PAL wrote:

> Hi Bart,
> 
> Like I said, I didn't actually hook my UIMA stuff into Solr; content and 
> queries are annotated before they reach Solr. What you describe sounds like a 
> classpath problem (but of course you already knew that :-)). Since I haven't 
> actually done what you are trying to do, here are some suggestions; they may 
> or may not work...
> 
> 1) Package up the XML files into your custom JAR at the top level; that way 
> you don't need to specify it as /RoomNumberAnnotator.xml.
> 2) If you are using Solr 4, then you should drop your custom JAR into 
> $SOLR_HOME/collection1/lib, not $SOLR_HOME/lib.
> 
> -sujit
> 
Solr UIMA

2013-02-20 Thread jazz
Hi,

I managed to get Solr and UIMA to work together. When I send a document to Solr, it 
annotates the field "contents" and adds the result of the UIMA annotations to, 
e.g., a field "location". My question is: how do I annotate the contents of an 
already existing Solr database without triggering an /update? My UIMA 
processor is the default for the /update command. 
I was thinking about exporting the contents and re-importing them, but that seems 
too complex using the DIH. Is there a smarter way?

Regards Bart
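
One possible approach, sketched here in Python, is to read the stored documents back and send them through the update chain again. Note that this still goes through /update, because that is where the UIMA processor runs. It rests on several assumptions that may not hold for a given index: every field is stored, the unique key is called `id`, the /update handler accepts a JSON array of documents, and "location" is the only generated field. The core URL and all function names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

SOLR = "http://localhost:8983/solr/collection1"  # assumed core URL


def prepare_for_reindex(doc, generated_fields=("location",)):
    """Drop fields that Solr or the UIMA chain will regenerate on update."""
    skip = set(generated_fields) | {"_version_"}
    return {name: value for name, value in doc.items() if name not in skip}


def fetch_page(start, rows=100):
    """Read one page of stored documents, sorted by id for stable paging."""
    params = urlencode({"q": "*:*", "wt": "json", "sort": "id asc",
                        "start": start, "rows": rows})
    with urlopen(f"{SOLR}/select?{params}") as response:
        return json.load(response)["response"]["docs"]


def post_docs(docs, commit=False):
    """Send documents to /update so the configured chain annotates them."""
    url = f"{SOLR}/update" + ("?commit=true" if commit else "")
    request = Request(url, data=json.dumps(docs).encode("utf-8"),
                      headers={"Content-Type": "application/json"})
    urlopen(request).close()


def reindex_all(rows=100):
    start = 0
    while True:
        docs = fetch_page(start, rows)
        if not docs:
            break
        post_docs([prepare_for_reindex(doc) for doc in docs])
        start += rows
    post_docs([], commit=True)  # single commit at the end


if __name__ == "__main__":
    reindex_all()
```

If some fields are indexed but not stored, this would silently lose them, so it is worth testing on a copy of the index first.
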

xml output question

2013-03-12 Thread jazz
Hi,

I am having trouble with XML output:

localhost:8983/solr/collection1/select?q=*.*?wt=xml

Is it possible to use this schema format in Solr?


  

  
So, extending a field with XML child elements ext and last? Or is it possible to 
reformat the XML output with an XSLT processor such as Saxon 
(http://wiki.apache.org/solr/XsltResponseWriter)?

Regards,

Bart
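
A side note on the request URL quoted above: as written it uses `*.*` and a second `?`, whereas Solr's match-all query is `*:*` and parameters are separated by `&`. Whether that is related to the trouble described is not clear from the message. The Python sketch below just builds a well-formed select URL with the standard library; the helper name is hypothetical, and `custom.xsl` is a placeholder stylesheet name for the XsltResponseWriter's `tr` parameter.

```python
from urllib.parse import urlencode


def select_url(base, query="*:*", **params):
    """Build a /select URL with properly separated and escaped parameters."""
    pairs = [("q", query)] + sorted(params.items())
    return f"{base}/select?{urlencode(pairs)}"


base = "http://localhost:8983/solr/collection1"

print(select_url(base, wt="xml"))
# http://localhost:8983/solr/collection1/select?q=%2A%3A%2A&wt=xml

print(select_url(base, wt="xslt", tr="custom.xsl"))
# http://localhost:8983/solr/collection1/select?q=%2A%3A%2A&tr=custom.xsl&wt=xslt
```

The percent-encoded `%2A%3A%2A` is simply `*:*` after URL escaping.
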

Re: xml output question

2013-03-12 Thread jazz
Hi Michael,

Thanks for the reply. My question is how to make child XML elements such as:

>

These can be converted using Saxon.

Regards Bart

On 12 Mar 2013, at 20:18, Michael Della Bitta wrote:

>


Auto Suggest

2010-09-01 Thread Jazz Globe

Hello

How would one implement a multiple-term auto-suggest feature in Solr that is 
filter-sensitive?
For example, a user enters:
"mp3"
  and Solr might suggest:
  ->   "mp3 player"
  ->   "mp3 nano"
  ->   "mp3 sony"
and then the user starts the second word:
"mp3 n"
and that narrows it down to:
  ->   "mp3 nano"

I had a quick look at the Terms Component.
I suppose it just returns term totals for the entire index and cannot be used 
with a filter or query?

Thanks
Johan
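
One commonly used alternative to the Terms Component is faceting with `facet.prefix`, since facet counts are computed over the documents matching `q` and `fq`. The Python sketch below only builds the request parameters for such a query. The field name `suggest_phrase` and the filter `category:audio` are hypothetical, and the approach assumes that field is indexed so that a whole phrase such as "mp3 nano" is a single lowercase term (for example via shingles or an untokenized copy field).

```python
from urllib.parse import urlencode


def suggest_params(typed, filter_query, field="suggest_phrase", limit=5):
    """Build Solr facet parameters that return filter-aware prefix suggestions."""
    return [
        ("q", "*:*"),
        ("fq", filter_query),             # suggestions only from matching documents
        ("rows", "0"),                    # no documents needed, only facet counts
        ("facet", "true"),
        ("facet.field", field),
        ("facet.prefix", typed.lower()),  # what the user has typed so far
        ("facet.limit", str(limit)),
        ("facet.mincount", "1"),          # hide terms with no matching documents
    ]


query_string = urlencode(suggest_params("mp3 n", "category:audio"))
print("/select?" + query_string)
```

The suggestions would then be read from the facet counts for the chosen field in the response.
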