dhastings,

My recommendation, covering both sides:

Lucene:
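Try a whitespace analyzer on for size. It splits on whitespace only and
leaves punctuation and case alone, so "rev." and "23.302" stay as they are.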

   Analyzer an = new WhitespaceAnalyzer(Version.LUCENE_31);
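
To show why whitespace-only splitting should keep your tokens intact, here is
a rough plain-Java sketch. It does not use Lucene at all: the class name
WhitespaceSplitSketch and the method tokenize are made up for illustration,
and it only approximates what a whitespace tokenizer does.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: hypothetical names, not part of Lucene or Solr.
class WhitespaceSplitSketch {

    // Splits on runs of whitespace; punctuation and case are left untouched.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<String>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {   // guards against empty or blank input
                tokens.add(piece);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [this, is, rev., 23.302]
        System.out.println(tokenize("this is rev. 23.302"));
    }
}
```

As long as the same splitting is applied to both the indexed content and the
query, a search for 23.302 is compared against the token 23.302, not 23302.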


Solr:
In your /index/solr/conf/schema.xml:

   <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
     <analyzer type="query">
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        ...
     </analyzer>
   </fieldType>


-craig


-----Original Message-----
From: dhastings [mailto:dhasti...@wshein.com] 
Sent: Tuesday, 2 August 2011 10:14 PM
To: solr-user@lucene.apache.org
Subject: lucene/solr, raw indexing/searching

Hello,
I am trying to get Lucene and Solr to agree on a completely raw indexing
method.  I use Lucene in my indexers, which write an index to disk, and
Solr to search the indexes that I create, since creating the indexes without
Solr is much, much faster than going through the Solr server.

Are there settings for BOTH Solr and Lucene to use EXACTLY what's in the
content, as opposed to interpreting what they think I'm trying to do?  My
content is extremely specific and needs no interpretation or adjustment,
either when indexing or when searching.  It is a text field.

For example:

203.1 seems to be indexed as 2031.  I can get searching for 203.1 to work
correctly, but then it won't find what was indexed using 3.1's standard
analyzer.

If I have content such as:
"this is rev. 23.302"

I need it indexed EXACTLY as it appears:
"this is rev. 23.302"

I do not want any of Solr's or Lucene's attempts to "fix" my content or my
queries.  "rev." needs to stay "rev." and not turn into "rev"; "23.302"
needs to stay as such, and NOT turn into "23302".  This is for BOTH indexing
and searching.

Any hints?

Right now, for indexing, I have:

    // The stop set holds only a dummy word, so no real words are dropped.
    Set<String> nostopwords = new HashSet<String>();
    nostopwords.add("buahahahahahaha");

    Analyzer an = new StandardAnalyzer(Version.LUCENE_31, nostopwords);
    writer = new IndexWriter(fsDir, an, MaxFieldLength.UNLIMITED);

    writer.setUseCompoundFile(false);


And for searching, I have this in my schema:


    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>


Thanks.  Very much appreciated.


--
View this message in context:
http://lucene.472066.n3.nabble.com/lucene-solr-raw-indexing-searching-tp3219277p3219277.html
Sent from the Solr - User mailing list archive at Nabble.com.
