In your Solr schema.xml, are the fields you are using defined as text fields with analyzers? It sounds like you want no analysis at all, which probably means you don't want text fields either; you just want string fields. That will make it impossible to search for individual tokens, though: a search will match only when the query equals the complete field value.
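
For illustration, a string field in schema.xml might look roughly like the fragment below. This is only a sketch: the field name "content" is a made-up placeholder, not something from your schema. On the Lucene side, the matching choice would presumably be to add the field as not analyzed (Field.Index.NOT_ANALYZED in Lucene 3.x), so the whole value is indexed as a single term.

```xml
<!-- Sketch only: "content" is a hypothetical field name. -->
<!-- StrField applies no analysis; the whole value is indexed as one term. -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<field name="content" type="string" indexed="true" stored="true"/>
```

With a field like this, a query would have to supply the full value, "this is rev. 23.302", to match; a search for 23.302 alone would find nothing.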

I'm not quite sure how to do what you want; it depends on exactly what you want. What kind of searching do you expect to support? If you still want tokenization, you'll still need some analysis, but I'm not sure how that corresponds to what you'd do on the Lucene end. What you're trying to do is inevitably going to be confusing, I think, which doesn't mean it's not possible. You might find it less confusing if you were willing to index through Solr rather than straight Lucene. You could use Solr via the SolrJ Java classes rather than the HTTP interface.
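
If what you want is tokenization on whitespace and nothing else, one possibility (a sketch, not something I've tested against your index) is a text field type whose analyzer is only a whitespace tokenizer. The type name "text_ws" is just an illustrative choice here.

```xml
<!-- Sketch only: split on whitespace, with no lowercasing and no punctuation handling. -->
<!-- "this is rev. 23.302" would become the tokens: this | is | rev. | 23.302 -->
<fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

The closest Lucene-side counterpart would presumably be WhitespaceAnalyzer in place of StandardAnalyzer in your indexer, so that both ends produce the same tokens. Note that without a lowercase filter, matching is case-sensitive.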

On 8/2/2011 11:14 AM, dhastings wrote:
Hello,
I am trying to get Lucene and Solr to agree on a completely raw indexing
method. I use Lucene in my indexers that write to an index on disk, and
Solr to search the indexes that I create, as creating the indexes without
Solr is much, much faster than using the Solr server.

Are there settings for BOTH Solr and Lucene to use EXACTLY what's in the
content, as opposed to interpreting what it thinks I'm trying to do? My
content is extremely specific and needs no interpretation or adjustment,
whether indexing or searching a text field.

for example:

203.1 seems to be indexed as 2031. I can get searching for 203.1 to work
correctly, but then it won't find what was indexed using Lucene 3.1's
StandardAnalyzer.

If I have content that is:
"this is rev. 23.302"

I need it indexed EXACTLY as it appears:
"this is rev. 23.302"

I do not want any of Solr's or Lucene's attempts to "fix" my content or my
queries. "rev." needs to stay "rev." and not turn into "rev"; "23.302"
needs to stay as such and NOT turn into "23302". This is for BOTH indexing
and searching.

Any hints?

Right now for indexing I have:

// One-word dummy stop set, so StandardAnalyzer removes no real stopwords.
Set<String> nostopwords = new HashSet<String>();
nostopwords.add("buahahahahahaha");

Analyzer an = new StandardAnalyzer(Version.LUCENE_31, nostopwords);
writer = new IndexWriter(fsDir, an, MaxFieldLength.UNLIMITED);
writer.setUseCompoundFile(false);


And for searching I have in my schema:


  <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>


Thanks.  Very much appreciated.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/lucene-solr-raw-indexing-searching-tp3219277p3219277.html
Sent from the Solr - User mailing list archive at Nabble.com.
