I have some info and examples for the WikipediaTokenizer in my book, but a tokenizer does not direct tokens to a field. Rather, you use the tokenizer in the analyzer of whichever field you want the values to go into. You could send the same input to multiple fields and then filter the tokens in each field to keep only certain token types.
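
As a rough sketch of that approach (not taken from the book), a schema could route the same wiki markup into a second field whose analyzer keeps only category tokens. The field names, the field type name, and the file name wiki_keep_types.txt below are made up for illustration, and the "text_general" type is assumed to exist in your schema. The "c" token type is what I believe WikipediaTokenizer assigns to category tokens, so check that against the source before relying on it.

```xml
<!-- Hypothetical field type: tokenizes wiki markup, then keeps only the
     token types listed in wiki_keep_types.txt (a made-up file name). -->
<fieldType name="text_wiki_categories" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WikipediaTokenizerFactory"/>
    <!-- useWhitelist="true" keeps the listed types instead of dropping them -->
    <filter class="solr.TypeTokenFilterFactory"
            types="wiki_keep_types.txt" useWhitelist="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical fields: "body" holds the full text, "categories" gets
     the same input but indexes only the category tokens. -->
<field name="body" type="text_general" indexed="true" stored="true"/>
<field name="categories" type="text_wiki_categories" indexed="true" stored="false"/>
<copyField source="body" dest="categories"/>
```

Here wiki_keep_types.txt would contain a single line with the type name c. Note that this only affects the indexed terms. The stored value of a field is always the original input, and none of this reaches XML elements such as <contributor><username>, since those are outside the text the tokenizer sees.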

Besides my book, the best reference is going to be... the source code.

-- Jack Krupansky

-----Original Message-----
From: Ken Krugler
Sent: Thursday, October 03, 2013 9:03 PM
To: solr-user@lucene.apache.org
Subject: WikipediaTokenizer documentation

Hi all,

Where's the documentation on the WikipediaTokenizer?

Specifically I'm wondering how pieces from the source XML get mapped to field names in the Solr schema.

For example, <revision><timestamp> seems to be going into the "date" field for an example schema I've got.

And <revision><text> goes into "body".

But is there any way to get <revision><contributor><username>, for example?

Thanks,

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr




