Re: WikipediaTokenizer documentation

Jack Krupansky Fri, 04 Oct 2013 08:01:21 -0700

I have some info and examples for the WikipediaTokenizer in my book, but a tokenizer does not direct tokens to a field. Rather, you would use the tokenizer in the analyzer for whatever field you wish to store values in. You could use the same input for multiple fields and then filter the tokens to keep only some token types.
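
Since a tokenizer only sees the text already assigned to a field, pulling a value such as <revision><contributor><username> out of the dump would have to happen in whatever code reads the XML, before analysis. The following is a minimal Python sketch of that idea, not the mechanism behind any particular example schema. The sample page, the page_to_fields helper, and the "contributor" field name are hypothetical; the sample also omits the XML namespace that real MediaWiki exports declare, which would need to be handled in the paths.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified <page> element in the style of a MediaWiki export.
# Real dumps declare an XML namespace, which is left out here for brevity.
SAMPLE_PAGE = """
<page>
  <title>Example</title>
  <revision>
    <timestamp>2013-10-03T21:03:00Z</timestamp>
    <contributor>
      <username>ExampleUser</username>
    </contributor>
    <text>'''Example''' is a [[sample]] article.</text>
  </revision>
</page>
"""


def page_to_fields(page_xml):
    """Map pieces of one <page> element to index field names.

    "date" and "body" mirror the fields mentioned in the question;
    "contributor" is an assumed name for the extra field.
    """
    page = ET.fromstring(page_xml.strip())
    return {
        "title": page.findtext("title"),
        "date": page.findtext("revision/timestamp"),
        "body": page.findtext("revision/text"),
        "contributor": page.findtext("revision/contributor/username"),
    }


fields = page_to_fields(SAMPLE_PAGE)
for name, value in fields.items():
    print(name, "=", value)
```

The resulting dictionary could then be sent to Solr as one document, with the "body" field's analyzer being the place where the WikipediaTokenizer would apply.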


Besides my book, the best reference is going to be... the source code.


-- Jack Krupansky

-----Original Message-----
From: Ken Krugler
Sent: Thursday, October 03, 2013 9:03 PM
To: solr-user@lucene.apache.org
Subject: WikipediaTokenizer documentation

Hi all,

Where's the documentation on the WikipediaTokenizer?

Specifically, I'm wondering how pieces from the source XML get mapped to field names in the Solr schema.

For example, <revision><timestamp> seems to be going into the "date" field for an example schema I've got.


And <revision><text> goes into "body".

But is there any way to get <revision><contributor><username>, for example?

Thanks,

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
