Hi,

I have indexed Wikipedia data with the Solr DIH. However, when I look at
the data indexed in Solr, I also see raw wiki markup like this:

{| style="text-align: left; width: 50%; table-layout: fixed;" border="0"
|- valign="top"
| style="width: 50%"|
:*[[Ubuntu]]
:*[[Fedora]]
:*[[Mandriva]]
:*[[Linux Mint]]
:*[[Debian]]
:*[[OpenSUSE]]
|
*[[Red Hat]]
*[[Mageia]]
*[[Arch Linux]]
*[[PCLinuxOS]]
*[[Slackware]]
|}

I want to remove this markup before indexing. I know that there is a
WikipediaTokenizer in Lucene, but how can I remove the unnecessary parts
(such as links, styles, etc.) with Solr?
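
To make the goal concrete, here is a rough sketch in plain Python (outside
Solr) of the kind of cleanup I am after. The function name
strip_wiki_markup and the regular expressions are only my own illustration,
not anything Solr or Lucene provides, and they cover just the table, link
and list syntax shown above, not wiki markup in general.

```python
import re

# Hypothetical helper, for illustration only: handles just the table,
# link and list-marker syntax from the sample above.
LINK = re.compile(r"\[\[(?:[^\]|]*\|)?([^\]|]*)\]\]")
LIST_MARKERS = re.compile(r"^[:*#;]+\s*")


def strip_wiki_markup(text: str) -> str:
    # [[Target]] -> Target, [[Target|Label]] -> Label
    text = LINK.sub(r"\1", text)

    cleaned = []
    for raw in text.splitlines():
        line = raw.strip()
        # Drop table open/close lines and row separators entirely.
        if line.startswith(("{|", "|}", "|-")):
            continue
        # Table cell: discard the leading pipe and any attributes
        # (e.g. style="...") that precede a second pipe.
        if line.startswith("|"):
            line = line[1:]
            if "|" in line:
                line = line.split("|", 1)[1]
            line = line.strip()
        # Remove list and indentation markers such as ":*" or "*".
        line = LIST_MARKERS.sub("", line)
        if line:
            cleaned.append(line)
    return "\n".join(cleaned)
```

For the sample above I would expect this to leave only the distribution
names, one per line (Ubuntu, Fedora, ..., Slackware). What I am looking for
is the equivalent of this step inside Solr at index time.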
