> 1- How can I put the file extension into my index? I'm using Nutch to
> crawling web pages and sending Nutch's data to Solr for indexing. and I
> have no idea to put the file extension to my index.

To get the file extension into a separate field, you can copyField the url
field and use Solr's pattern replace char filter to strip away everything up
to the last dot, if there is one.

> 2- please give me some help links about mime type. I'm new to Solr and
> don't know anything about mime type. please note that I should index data
> of Nutch and I couldn't find useful commands in Nutch tutorial for
> advanced indexing! thank you very much

Use Nutch's index-more plugin. By default it adds two or three values to a
multi-valued field (type): both sub-types and the complete MIME type, if I'm
not mistaken. There's a configuration directive to have it index only the
complete MIME type.

> > On Mon, Sep 12, 2011 at 6:07 PM, Jaeger, Jay - DOT <jay.jae...@dot.wi.gov> wrote:
> > Some possibilities:
> >
> > 1) Put the file extension into your index (that is what we did when we
> > were testing indexing documents with Solr)
> > 2) Put a mime type for the document into your index.
> > 3) Put the whole file name / URL into your index, and match on part of
> > the name. This will give some false positives.
> >
> > JRJ
> >
> > -----Original Message-----
> > From: ahmad ajiloo [mailto:ahmad.aji...@gmail.com]
> > Sent: Monday, September 12, 2011 5:58 AM
> > To: solr-user@lucene.apache.org
> > Subject: Fwd: How to serach on specific file types ?
> >
> > Hello
> > I want to search on articles. So need to find only specific files like
> > doc, docx, and pdf.
> > I don't need any html pages. Thus the result of our search should only
> > consists of doc, docx, and pdf files.
> > can you help me?
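
P.S. For reference, here is a rough sketch of the copyField approach for the file extension. It is untested and makes a few assumptions: the field and type names `ext` and `file_ext` are made up for illustration, the source field is assumed to be Nutch's `url` field, and the regex is only one possible way to keep what follows the last dot. The elements belong in the matching sections of your schema.xml.

```xml
<!-- Hypothetical field type: reduces a URL to its trailing extension. -->
<fieldType name="file_ext" class="solr.TextField">
  <analyzer>
    <!-- Keep only what follows the last dot, provided that tail contains
         no further '.', '/', '?' or '#'. If nothing matches, the value is
         left unchanged (the whole URL becomes the token). -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^.*\.([^./?#]+)$" replacement="$1"/>
    <!-- Treat the remaining text as a single token. -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <!-- So that PDF and pdf end up as the same term. -->
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Hypothetical field; not stored, because the stored value would be
     the full URL rather than the extension. -->
<field name="ext" type="file_ext" indexed="true" stored="false"/>

<!-- copyField copies the raw url value; the analyzer above then runs on it. -->
<copyField source="url" dest="ext"/>
```

With something like this in place, a filter query such as `fq=ext:(pdf OR doc OR docx)` should restrict results to those file types. Note that a bare host URL like http://example.com would yield `com`, so expect a little noise in the field.

For the index-more directive, if memory serves the property is called `moreIndexingFilter.indexMimeTypeParts`; please treat that name as an assumption and check the nutch-default.xml of your Nutch version before relying on it. Overriding it in nutch-site.xml would look roughly like this:

```xml
<!-- Assumed property name; verify against your nutch-default.xml. -->
<property>
  <name>moreIndexingFilter.indexMimeTypeParts</name>
  <!-- false: index only the complete MIME type, not its sub-parts -->
  <value>false</value>
</property>
```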