Hi,

Have you tried the RegexTransformer <http://wiki.apache.org/solr/DataImportHandler#RegexTransformer>? I think you can write regular expressions against the wikipedia text field to extract the categories and external links into separate fields.
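To give an idea of the kind of expressions I mean, here is a rough sketch in Python, just for trying the patterns out on a sample of wikitext. The patterns, the function name `extract_fields` and the sample text are my own assumptions, not something taken from Solr, and they only cover the common `[[Category:Name]]`, `[[Category:Name|sort key]]` and `[http://... label]` forms. I have not tested them inside a RegexTransformer configuration, so they may need adapting there.

```python
import re

# Assumed pattern: [[Category:Name]] or [[Category:Name|sort key]].
# Group 1 captures the category name only.
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")

# Assumed pattern: [http://example.org optional label].
# Group 1 captures the URL only.
EXTERNAL_LINK_RE = re.compile(r"\[(https?://[^\s\]]+)(?:\s+[^\]]*)?\]")


def extract_fields(wikitext):
    """Return the categories and external links found in a wikitext string."""
    categories = [name.strip() for name in CATEGORY_RE.findall(wikitext)]
    external_links = EXTERNAL_LINK_RE.findall(wikitext)
    return {"categories": categories, "external_links": external_links}


if __name__ == "__main__":
    # Hypothetical sample text, not taken from a real dump.
    sample = (
        "Solr is a search server. "
        "[http://lucene.apache.org/solr/ Official site] "
        "[[Category:Search engines]] "
        "[[Category:Apache Software Foundation|Solr]]"
    )
    print(extract_fields(sample))
```

Bare URLs without square brackets and links inside templates are not handled here, so you would have to extend the patterns if you need those as well.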
Bye,
Andras

2012/5/13 vineet yadav <vineet.yadav.i...@gmail.com>

> Hi all,
> I want to create a Lucene/Solr index of the wikipedia xml dump. I used the Solr
> example (
> http://wiki.apache.org/solr/DataImportHandler#Example:_Indexing_wikipedia)
> to index the wikipedia xml dump. Since in wikipedia, Category and external
> links are part of the wikipedia text, I am not able to index category and
> external links separately. I want to index Category, External
> links etc. separately and store them in separate fields.
> Would anyone please be kind enough to give me a bit of advice?
> Thanks
> Vineet Yadav
>