Hi, I have indexed Wikipedia data with Solr DIH. However, when I look at the data indexed in Solr, I also see something like this:
{| style="text-align: left; width: 50%; table-layout: fixed;" border="0" |- valign="top" | style="width: 50%"| :*[[Ubuntu]] :*[[Fedora]] :*[[Mandriva]] :*[[Linux Mint]] :*[[Debian]] :*[[OpenSUSE]] | *[[Red Hat]] *[[Mageia]] *[[Arch Linux]] *[[PCLinuxOS]] *[[Slackware]] |}

I want to remove this kind of markup before indexing. I know that Lucene has a WikipediaTokenizer, but how can I remove the unnecessary parts (links, styles, etc.) with Solr?
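
To make the goal concrete, here is a rough Python sketch of the cleanup I have in mind, run as a pre-processing step outside Solr. The function name strip_wiki_markup and the regular expressions are my own illustration, not part of Solr or Lucene, and they only cover the constructs in the sample above (links, table delimiters, HTML-style attributes, list markers). Real wiki text has many more cases, such as templates and references.

```python
import re


def strip_wiki_markup(text: str) -> str:
    """Hypothetical helper: reduce a flattened wiki-markup string to plain words."""
    # [[Target|Label]] -> Label, [[Target]] -> Target
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # HTML-style attributes such as style="..." or border="0"
    text = re.sub(r'\b[\w-]+="[^"]*"', " ", text)
    # Table delimiters: {|  |}  |-  and bare cell separators |
    text = re.sub(r"\{\||\|\}|\|-|\|", " ", text)
    # List and indent markers (:, *, #) at the start of a token
    text = re.sub(r"(?:^|(?<=\s))[:*#]+", " ", text)
    # Collapse the whitespace left behind by the removals
    return " ".join(text.split())


sample = (
    '{| style="text-align: left; width: 50%; table-layout: fixed;" border="0" '
    '|- valign="top" | style="width: 50%"| :*[[Ubuntu]] :*[[Fedora]] '
    ':*[[Mandriva]] :*[[Linux Mint]] :*[[Debian]] :*[[OpenSUSE]] | '
    '*[[Red Hat]] *[[Mageia]] *[[Arch Linux]] *[[PCLinuxOS]] *[[Slackware]] |}'
)
print(strip_wiki_markup(sample))
```

For the sample, this leaves only the distribution names separated by single spaces. What I am looking for is a way to get an equivalent result inside Solr itself.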