Re: WikipediaTokenizer for Removing Unnecesary Parts

2013-07-23 Thread Furkan KAMACI
dmin UI analysis page? > > You should just see the text tokens plus the URLs for links. > > -- Jack Krupansky > > -Original Message- From: Furkan KAMACI > Sent: Tuesday, July 23, 2013 10:53 AM > To: solr-user@lucene.apache.org > Subject: WikipediaTokenizer for R

Re: WikipediaTokenizer for Removing Unnecesary Parts

2013-07-23 Thread Jack Krupansky
AM To: solr-user@lucene.apache.org Subject: WikipediaTokenizer for Removing Unnecesary Parts Hi; I have indexed Wikipedia data with Solr DIH. However, when I look at the data indexed in Solr, I see something like this as well: {| style="text-align: left; width: 50%; table-layout: fixed;"

Re: WikipediaTokenizer for Removing Unnecesary Parts

2013-07-23 Thread Robert Muir
If you use WikipediaTokenizer, it tags different wiki elements with different token types (you can see them in the admin UI). Follow it with TypeTokenFilter to keep only the types you care about, and I think it will do what you want. On Tue, Jul 23, 2013 at 7:53 AM, Furkan KAMACI wrote: > Hi
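
To make the suggestion above concrete, here is a toy Python model of type-based token filtering. It is only a sketch of the idea, not the Lucene or Solr API: the `Token` class, the `filter_by_type` helper, and the sample type labels ("il" for an internal link, "elu" for an external link URL) are assumptions made for illustration. In Solr itself this would presumably be configured in the field type's analyzer chain, with the tokenizer followed by a type filter.

```python
from typing import Iterable, List, NamedTuple, Set


class Token(NamedTuple):
    """A token plus the type label a tokenizer attached to it (hypothetical model)."""
    text: str
    type: str


def filter_by_type(tokens: Iterable[Token], types: Set[str],
                   use_whitelist: bool = False) -> List[Token]:
    """Keep or drop tokens based on their type label.

    With use_whitelist=False, tokens whose type is in `types` are dropped.
    With use_whitelist=True, only tokens whose type is in `types` are kept.
    """
    if use_whitelist:
        return [t for t in tokens if t.type in types]
    return [t for t in tokens if t.type not in types]


# Sample tokens with assumed type labels, loosely based on the wiki links
# in the original question.
tokens = [
    Token("Ubuntu", "il"),
    Token("Fedora", "il"),
    Token("http://example.org", "elu"),
    Token("distributions", "<ALPHANUM>"),
]

# Drop external link URLs, keep everything else.
kept = filter_by_type(tokens, {"elu"})
print([t.text for t in kept])  # ['Ubuntu', 'Fedora', 'distributions']
```

Passing `use_whitelist=True` inverts the behavior, so `filter_by_type(tokens, {"il"}, use_whitelist=True)` would keep only the internal-link tokens.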

WikipediaTokenizer for Removing Unnecesary Parts

2013-07-23 Thread Furkan KAMACI
Hi; I have indexed Wikipedia data with Solr DIH. However, when I look at the data indexed in Solr, I see something like this as well: {| style="text-align: left; width: 50%; table-layout: fixed;" border="0" |- valign="top" | style="width: 50%"| :*[[Ubuntu]] :*[[Fedora]] :*[[Mandriva]] :*[[Linux Mint]]