dhastings, here is my recommendation for both sides, Lucene and Solr:
Lucene: try a whitespace analyzer on for size:

    Analyzer an = new WhitespaceAnalyzer(Version.LUCENE_31);

Solr: in your /index/solr/conf/schema.xml:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        ...
      </analyzer>
    </fieldType>

-craig

-----Original Message-----
From: dhastings [mailto:dhasti...@wshein.com]
Sent: Tuesday, 2 August 2011 10:14 PM
To: solr-user@lucene.apache.org
Subject: lucene/solr, raw indexing/searching

Hello,

I am trying to get Lucene and Solr to agree on a completely raw indexing method. I use Lucene in my indexers, which write to an index on disk, and Solr to search the indexes I create, because creating the indexes without Solr is much, much faster than going through the Solr server.

Are there settings for BOTH Solr and Lucene to use EXACTLY what's in the content, as opposed to interpreting what they think I'm trying to do? My content is extremely specific and needs no interpretation or adjustment, at either index time or search time; it is a text field.

For example: 203.1 seems to be indexed as 2031. I can get a search for 203.1 to work correctly, but then it won't find what was indexed using Lucene 3.1's StandardAnalyzer.

If I have content that is "this is rev. 23.302", I need it indexed EXACTLY as it appears: "this is rev. 23.302". I do not want any of Solr's or Lucene's attempts to "fix" my content or my queries. "rev." needs to stay "rev." and not turn into "rev"; "23.302" needs to stay as such and NOT turn into "23302". This applies to BOTH indexing and searching.

Any hints?
Right now, for indexing I have:

    Set<String> nostopwords = new HashSet<String>();
    nostopwords.add("buahahahahahaha"); // dummy entry, so effectively no stop words are removed
    Analyzer an = new StandardAnalyzer(Version.LUCENE_31, nostopwords);
    writer = new IndexWriter(fsDir, an, MaxFieldLength.UNLIMITED);
    writer.setUseCompoundFile(false);

and for searching I have this in my schema:

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>

Thanks. Very much appreciated.

--
View this message in context: http://lucene.472066.n3.nabble.com/lucene-solr-raw-indexing-searching-tp3219277p3219277.html
Sent from the Solr - User mailing list archive at Nabble.com.
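The whitespace suggestion works because a whitespace tokenizer splits only on whitespace: every run of non-whitespace characters becomes one token, so "rev." and "23.302" survive untouched, and nothing is lowercased. The sketch below is a plain-Java approximation of that splitting rule, written so it runs without Lucene on the classpath. The class `WhitespaceTokens` and its `tokenize` method are hypothetical illustrations, not part of the Lucene or Solr API, and the real tokenizer also has details (such as a maximum token length) that are not modelled here.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper, not a Lucene/Solr class: approximates how a
 * whitespace tokenizer splits text. Each maximal run of non-whitespace
 * characters becomes one token; punctuation and case are left as-is.
 */
public class WhitespaceTokens {

    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isWhitespace(c)) {
                // Whitespace ends the current token, if there is one.
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
            } else {
                // Any other character (including '.') stays in the token.
                current.append(c);
            }
        }
        // Flush the last token when the text does not end in whitespace.
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> indexed = tokenize("this is rev. 23.302");
        System.out.println(indexed);                    // [this, is, rev., 23.302]
        System.out.println(indexed.contains("23.302")); // true
        System.out.println(indexed.contains("rev"));    // false: "rev." is kept verbatim
        // No lowercasing: "Rev." and "rev." are different tokens.
        System.out.println(tokenize("Rev. 23.302").contains("rev.")); // false
    }
}
```

The last line is the reason both sides have to match exactly: WhitespaceAnalyzer does not lowercase, so if the Solr query analyzer keeps LowerCaseFilterFactory (as in the schema quoted above), a document indexed with "Rev." would not be found by a query for "Rev.".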