On Jul 20, 2009, at 6:43 AM, JCodina wrote:

D: Break things down. The CAS would only produce XML that Solr can process. Then different Tokenizers can be used to deal with the data in the CAS. The main point is that the XML carries the doc and field labels of Solr.

I just committed the DelimitedPayloadTokenFilterFactory; I suspect this is along the lines of what you are thinking, but I haven't done all that much with UIMA.
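For reference, here is a minimal schema.xml sketch of how the factory would be wired into an analysis chain. The field type name, delimiter, and encoder values are illustrative choices, not anything mandated by the factory:

```xml
<!-- Sketch: a text field whose tokens carry annotations as payloads,
     e.g. input like "fox|NN jumped|VBD". Delimiter and encoder values
     here are illustrative defaults. -->
<fieldType name="text_payload" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits each token on '|' and stores the trailing part
         as that token's payload -->
    <filter class="solr.DelimitedPayloadTokenFilterFactory"
            delimiter="|" encoder="identity"/>
  </analyzer>
</fieldType>
```

The encoder attribute controls how the text after the delimiter is turned into payload bytes (e.g. float, integer, or identity for raw bytes).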

I also suspect the Tee/Sink capabilities of Lucene could be helpful, but they aren't available in Solr yet.


E: The set of capabilities to process the XML is defined in XML, similar to the way Lucas defines the output and the Solr schema defines how it is processed.


I want to use it in order to index something that is common, but I can't get any tool to do it with Solr: indexing a word and encoding, at the same position, its syntactic and semantic information. I know that in Lucene this is evolving and it will become possible to include metadata, but for the moment it isn't.

What does Lucas do with Lucene? Is it putting multiple tokens at the same position or using Payloads?
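To make the distinction concrete, here is a toy model of the two index shapes in question (plain Python, not Lucene API): stacking an extra token at the same position (position increment 0, the way stacked synonym tokens work) versus attaching the annotation as a payload on the original token.

```python
# Toy model of two ways to index "fox" together with its POS tag "NN".
# Illustrative only; real Lucene stores this in its postings format.

def index_same_position(tokens):
    """Each item is (term, pos_increment); an increment of 0 stacks a
    term on the previous position, like stacked synonym tokens."""
    postings, pos = {}, -1
    for term, incr in tokens:
        pos += incr
        postings.setdefault(term, []).append(pos)
    return postings

def index_with_payloads(tokens):
    """Each item is (term, payload); the annotation rides along with
    the term occurrence instead of occupying a position itself."""
    postings = {}
    for pos, (term, payload) in enumerate(tokens):
        postings.setdefault(term, []).append((pos, payload))
    return postings

# "the quick fox", with "NN" stacked at the same position as "fox":
stacked = index_same_position(
    [("the", 1), ("quick", 1), ("fox", 1), ("NN", 0)])
# -> "fox" and "NN" both occupy position 2

# Same sentence, with the tag stored as a payload on "fox":
payloaded = index_with_payloads(
    [("the", None), ("quick", None), ("fox", "NN")])
```

The trade-off is roughly: same-position tokens are searchable as ordinary terms (phrase queries see them), while payloads keep the term dictionary clean and attach the metadata to the occurrence itself.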

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
