> 1- How can I put the file extension into my index? I'm using Nutch to
> crawl web pages and sending Nutch's data to Solr for indexing, and I
> have no idea how to put the file extension into my index.

To get the file extension into a separate field, you can copyField the url
field and use Solr's pattern replace char filter
(PatternReplaceCharFilterFactory) in the target field's analyzer to strip
away everything up to the last dot, if there is one.
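
As a rough illustration of what that pattern does, and where it can go wrong,
here is a small Python sketch. It is not Solr configuration: the function
names and example URLs are made up for the demonstration. It applies the
"strip everything up to the last dot" pattern to a whole URL, then compares
that with looking only at the last path segment.

```python
import re
from urllib.parse import urlparse

# Greedy match from the start of the string through the last dot.
# This is the "strip everything up to the last dot" idea.
STRIP_TO_LAST_DOT = re.compile(r"^.*\.")


def naive_extension(url: str) -> str:
    """Apply the pattern to the whole URL (hypothetical helper)."""
    return STRIP_TO_LAST_DOT.sub("", url).lower()


def path_extension(url: str) -> str:
    """Use only the last path segment; return '' if it has no dot."""
    last_segment = urlparse(url).path.rsplit("/", 1)[-1]
    if "." not in last_segment:
        return ""
    return last_segment.rsplit(".", 1)[-1].lower()


if __name__ == "__main__":
    for url in (
        "http://example.com/docs/Report.PDF",
        "http://example.com/about",
        "http://example.com/a/paper.docx?download=1",
    ):
        print(url, "|", naive_extension(url), "|", path_extension(url))
```

The naive pattern works for URLs that end in a file name, but for a URL with
no dot in its path the last dot is the one in the host name, and a query
string ends up glued to the extension. A pattern used in Solr would probably
need to account for both cases.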

> 2- Please give me some helpful links about MIME types. I'm new to Solr and
> don't know anything about MIME types. Please note that I need to index
> Nutch's data, and I couldn't find useful commands in the Nutch tutorial for
> advanced indexing! Thank you very much.

Use Nutch's index-more plugin. By default it adds two or three values to a
multi-valued field (type): both sub-types and the complete MIME type, if I'm
not mistaken. There's a configuration directive to have it index only the
complete MIME type.
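
Once the type field is populated, restricting a search to doc, docx, and pdf
comes down to a filter query on that field. The Python sketch below only
builds the request URL. The base URL, the helper names, and the assumption
that Nutch indexes exactly these standard MIME type strings are illustrative,
so check the values actually stored in your index first.

```python
from urllib.parse import urlencode

# Standard MIME types for pdf, doc, and docx. Whether the index holds
# exactly these strings depends on what Nutch detected (assumption).
WANTED_TYPES = [
    "application/pdf",
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
]


def type_filter(mime_types, field="type"):
    """Build a filter query such as type:("a/b" OR "c/d")."""
    quoted = " OR ".join(f'"{m}"' for m in mime_types)
    return f"{field}:({quoted})"


def select_url(base, user_query, mime_types):
    """Combine the user's query with the MIME type filter (hypothetical helper)."""
    params = {"q": user_query, "fq": type_filter(mime_types)}
    return f"{base}?{urlencode(params)}"


if __name__ == "__main__":
    # The base URL is a placeholder for your own Solr select handler.
    print(select_url("http://localhost:8983/solr/select", "solr", WANTED_TYPES))
```

Using fq rather than adding the restriction to q keeps the type filter out of
relevance scoring and lets Solr cache it separately.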

> 
> On Mon, Sep 12, 2011 at 6:07 PM, Jaeger, Jay - DOT <jay.jae...@dot.wi.gov> wrote:
> > Some possibilities:
> > 
> > 1) Put the file extension into your index (that is what we did when we
> > were testing indexing documents with Solr)
> > 2) Put a mime type for the document into your index.
> > 3) Put the whole file name / URL into your index, and match on part of
> > the name.  This will give some false positives.
> > 
> > JRJ
> > 
> > -----Original Message-----
> > From: ahmad ajiloo [mailto:ahmad.aji...@gmail.com]
> > Sent: Monday, September 12, 2011 5:58 AM
> > To: solr-user@lucene.apache.org
> > Subject: Fwd: How to search on specific file types ?
> > 
> > Hello
> > I want to search articles, so I need to find only specific files like
> > doc, docx, and pdf.
> > I don't need any html pages. Thus the results of our search should
> > consist only of doc, docx, and pdf files.
> > Can you help me?
