I have some info and examples for the WikipediaTokenizer in my book, but a
tokenizer does not direct tokens to a field. Rather, you would use the
tokenizer in the analyzer for whatever field you wish to store values in.
You could use the same input for multiple fields and then filter the tokens
to keep only some token types.
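To illustrate "same input, multiple fields, filter by token type", here is a minimal plain-Java sketch. It does not use Lucene. The TypedToken record and the type strings ("internal_link", "category", "word") are hypothetical placeholders, not the real WikipediaTokenizer type constants; check the tokenizer source for those. In an actual Solr/Lucene analyzer chain, the filtering step would be done by a type filter placed after the tokenizer, such as Lucene's TypeTokenFilter in whitelist mode.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class TokenTypeFilterSketch {

    // Hypothetical stand-in for a token text plus its type attribute; not a Lucene class.
    record TypedToken(String text, String type) {}

    // Keep only tokens whose type is in the allowed set, mimicking what a
    // whitelist-mode type filter does downstream of the tokenizer.
    static List<String> keepTypes(List<TypedToken> tokens, Set<String> allowed) {
        List<String> kept = new ArrayList<>();
        for (TypedToken t : tokens) {
            if (allowed.contains(t.type())) {
                kept.add(t.text());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The same tokenized input, as one shared tokenizer might emit it.
        // Type strings are placeholders (assumption), not real WikipediaTokenizer values.
        List<TypedToken> tokens = List.of(
                new TypedToken("Lucene", "internal_link"),
                new TypedToken("is", "word"),
                new TypedToken("Software", "category"),
                new TypedToken("search", "word"));

        // One field's analyzer keeps only link tokens, another keeps only category tokens.
        System.out.println(keepTypes(tokens, Set.of("internal_link"))); // [Lucene]
        System.out.println(keepTypes(tokens, Set.of("category")));      // [Software]
    }
}
```

Each field would get its own analyzer with the same tokenizer but a different set of allowed types, so the tokenizer itself never decides which field a token ends up in.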
Besides my book, the best reference is going to be... the source code.
-- Jack Krupansky
-----Original Message-----
From: Ken Krugler
Sent: Thursday, October 03, 2013 9:03 PM
To: solr-user@lucene.apache.org
Subject: WikipediaTokenizer documentation
Hi all,
Where's the documentation on the WikipediaTokenizer?
Specifically I'm wondering how pieces from the source XML get mapped to
field names in the Solr schema.
For example, <revision><timestamp> seems to be going into the "date" field
for an example schema I've got.
And <revision><text> goes into "body".
But is there any way to get <revision><contributor><username>, for example?
Thanks,
-- Ken
--------------------------
Ken Krugler