No, sorry, maybe my explanation was just too abstract.

What I was suggesting is an alternative way of extracting the language, based on stopword dictionaries: use one DictionaryAnnotator instance per language, plus a custom Annotator that evaluates which dictionary collected the most hits.

In general, extracting the language with UIMA without an internet connection can be done in various ways. If you need help with this, however, it may be better to ask on the UIMA mailing list ( [email protected] ).

Another option for the language identification task, which does not use UIMA but exploits Tika's capabilities, is being discussed/developed at https://issues.apache.org/jira/browse/SOLR-1979

Hope this helps,
Tommaso
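
P.S. To make the stopword idea a bit more concrete, here is a minimal sketch of the "which dictionary collected more hits" logic in plain Python. It is not UIMA code and not a DictionaryAnnotator configuration; the tiny stopword lists and the names STOPWORDS, count_hits and guess_language are made up for illustration, and a real setup would need much larger dictionaries.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword lists (one "dictionary" per language).
# Real dictionaries would contain far more entries.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "es": {"el", "la", "los", "de", "que", "y", "en", "es"},
    "it": {"il", "lo", "gli", "di", "che", "e", "per", "non"},
}


def count_hits(text, stopwords=STOPWORDS):
    """Count, per language, how many tokens of the text are in that language's list."""
    # Lowercase the text and keep runs of letters only (Unicode-aware).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    hits = Counter()
    for lang, words in stopwords.items():
        hits[lang] = sum(1 for token in tokens if token in words)
    return hits


def guess_language(text, stopwords=STOPWORDS):
    """Return the language whose list collected the most hits, or None if no hits.

    Ties are resolved in favour of the language listed first in the dictionary.
    """
    hits = count_hits(text, stopwords)
    lang, best = max(hits.items(), key=lambda item: item[1])
    return lang if best > 0 else None


if __name__ == "__main__":
    print(guess_language("The cat is in the garden and it likes the sun"))
    print(guess_language("El perro de la casa es el que ladra"))
```

In the UIMA version, each per-language count would come from the annotations produced by the corresponding DictionaryAnnotator, and the comparison step would live in the custom Annotator.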

2011/7/4 PacoPeralta <[email protected]>

> Sorry for my insistence...
> If I have configured into the uima_config in the solrconfig.xml:
>
>   <lst name="type">
>     <str name="name">org.apache.uima.alchemy.ts.language.LanguageFS</str>
>     <lst name="mapping">
>       <str name="feature">language</str>
>       <str name="field">language</str>
>     </lst>
>   </lst>
>
>   <lst name="type">
>     <str name="name">org.apache.uima.DictionaryEntry</str>
>     <lst name="mapping">
>       <str name="feature">coveredText</str>
>       <str name="field">tag</str>
>     </lst>
>   </lst>
>
> And I follow the steps that you listed, could I extract language and
> dictionary entries from the indexed documents?
>
> Excuse my ignorance...
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/UIMA-without-API-key-tp3135299p3137478.html
> Sent from the Lucene - General mailing list archive at Nabble.com.
