Re: Indexing Wikipedia with Solr/Lucene

András Bártházi Sun, 13 May 2012 12:01:18 -0700

Hi,

Have you tried the
RegexTransformer<http://wiki.apache.org/solr/DataImportHandler#RegexTransformer>?
I think you could write regular expressions against the Wikipedia text field to
extract the categories and external links into separate fields.
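
To illustrate the kind of patterns I mean, here is a rough sketch in plain
Python rather than in DIH configuration. The two patterns, the function name
extract_fields and the field names "category" and "external_link" are my own
assumptions, not anything from the Solr example, and real wikitext has more
edge cases (templates, nested brackets, localized namespace names) than these
cover.

```python
import re

# Assumed pattern: [[Category:Name]] or [[Category:Name|sort key]].
# Only the category name before an optional "|" is captured.
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")

# Assumed pattern: [http://example.org optional label], i.e. a single
# bracket (not "[[") followed by an http or https URL.
EXTERNAL_LINK_RE = re.compile(r"(?<!\[)\[(https?://[^\s\]]+)(?:\s+[^\]]*)?\]")


def extract_fields(wikitext):
    """Return categories and external link URLs found in a wikitext string."""
    categories = [name.strip() for name in CATEGORY_RE.findall(wikitext)]
    links = EXTERNAL_LINK_RE.findall(wikitext)
    # Hypothetical field names, chosen only for this illustration.
    return {"category": categories, "external_link": links}


sample = (
    "Apache Solr is a search platform. "
    "[http://lucene.apache.org/solr/ Official site] "
    "[[Category:Search engines]] [[Category:Apache Software Foundation|Solr]]"
)
fields = extract_fields(sample)
print(fields["category"])
print(fields["external_link"])
```

Similar expressions could probably be adapted for the transformer's regex
attribute, but I have not tested that, and it is worth checking how the
transformer handles a text with several matches.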


Bye,
  Andras

2012/5/13 vineet yadav <vineet.yadav.i...@gmail.com>

> Hi all,
> I want to create a Lucene/Solr index of the Wikipedia XML dump. I used the
> Solr example (
> http://wiki.apache.org/solr/DataImportHandler#Example:_Indexing_wikipedia)
> to index the dump. Since categories and external links are part of the
> Wikipedia article text, I am not able to index them separately. I want to
> index categories, external links, etc. separately and store them in
> separate fields.
> Would anyone please be kind enough to give me a bit of advice?
> Thanks
> Vineet Yadav
>
