I suggest you take a look at the WikipediaTokenizer test source here: http://www.javadocexamples.com/java_source/org/apache/lucene/wikipedia/analysis/WikipediaTokenizerTest.java.html
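
As far as I know, WikipediaTokenizer only tokenizes the wiki markup inside the article text. The XML-element-to-field mapping asked about below happens in whatever reads the dump before analysis. To illustrate the idea only, here is a minimal sketch using the JDK's StAX parser. This is not Lucene or Solr code: the class name, the path table, and the "user" field name are my own assumptions, while "date" and "body" are the field names from the question.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not part of Lucene or Solr.
class WikiFieldSketch {

    // Assumed mapping from element path (relative to the document root) to field name.
    static final Map<String, String> PATH_TO_FIELD = new LinkedHashMap<>();
    static {
        PATH_TO_FIELD.put("page/revision/timestamp", "date");
        PATH_TO_FIELD.put("page/revision/text", "body");
        // "user" is an invented field name for the contributor's username.
        PATH_TO_FIELD.put("page/revision/contributor/username", "user");
    }

    // Walks one <page> element and returns fieldName -> text for every mapped path.
    static Map<String, String> extract(String pageXml) {
        Map<String, String> fields = new LinkedHashMap<>();
        Deque<String> path = new ArrayDeque<>();   // currently open elements, root first
        StringBuilder text = new StringBuilder();  // text of the innermost open element
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(pageXml));
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    path.addLast(reader.getLocalName());
                    text.setLength(0);
                } else if (event == XMLStreamConstants.CHARACTERS
                        || event == XMLStreamConstants.CDATA) {
                    text.append(reader.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    String field = PATH_TO_FIELD.get(String.join("/", path));
                    if (field != null) {
                        fields.put(field, text.toString().trim());
                    }
                    path.removeLast();
                    text.setLength(0);
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse page XML", e);
        }
        return fields;
    }
}
```

Calling WikiFieldSketch.extract(...) on a single <page> element returns a map keyed by field name, which could then be copied into a Solr document. Getting the username in a real setup would mean adding an equivalent path entry to whichever importer you use.
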

2013/10/4 Ken Krugler <kkrugler_li...@transpac.com>

> Hi all,
>
> Where's the documentation on the WikipediaTokenizer?
>
> Specifically I'm wondering how pieces from the source XML get mapped to
> field names in the Solr schema.
>
> For example, <revision><timestamp> seems to be going into the "date" field
> for an example schema I've got.
>
> And <revision><text> goes into "body".
>
> But is there any way to get <revision><contributor><username>, for example?
>
> Thanks,
>
> -- Ken
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr