searching

Jonathan Rochkind Tue, 02 Aug 2011 09:37:32 -0700

In your Solr schema.xml, are the fields you are using defined as text fields with analyzers? It sounds like you want no analysis at all, which probably means you don't want text fields either; you just want string fields. That will make it impossible to search for individual tokens, though: searches will match only on complete matches of the value.
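
To make the string-field option concrete, here is a rough sketch of what the schema.xml side might look like. `solr.StrField` is the stock non-analyzed field class; the field name `content_exact` is made up for illustration, and the attribute choices are assumptions, not a recommendation for your index.

```xml
<!-- Non-analyzed type: the whole value is indexed as a single term -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

<!-- "content_exact" is a hypothetical field name used only for this example -->
<field name="content_exact" type="string" indexed="true" stored="true"/>
```

On the Lucene side, the matching choice would presumably be to add the field with `Field.Index.NOT_ANALYZED`, so that nothing is tokenized at index time either.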

I'm not quite sure how to do what you want; it depends on exactly what you want. What kind of searching do you expect to support? If you still do want tokenization, you'll still want some analysis... but I'm not quite sure how that corresponds to what you'd want to do on the Lucene end. What you're trying to do is going to be inevitably confusing, I think. Which doesn't mean it's not possible. You might find it less confusing if you were willing to use Solr to index though, rather than straight Lucene -- you could use Solr via the SolrJ Java classes, rather than the HTTP interface.
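
To illustrate the trade-off between "no analysis" and "some analysis", here is a small plain-Java sketch. It is not Lucene or Solr code: the class and method names are invented for this example, and it only imitates the two behaviours (whole-value matching, as with a string field, versus splitting on whitespace and nothing else, roughly what a whitespace-only tokenizer would do).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; none of these names come from Lucene or Solr.
public class RawMatchSketch {

    // String-field-like behaviour: the whole value is one term,
    // so a query matches only if it equals the entire value.
    public static boolean exactMatch(String indexedValue, String query) {
        return indexedValue.equals(query);
    }

    // Whitespace-only tokenization: split on runs of whitespace and
    // leave case and punctuation untouched.
    public static List<String> whitespaceTokens(String value) {
        List<String> tokens = new ArrayList<String>();
        for (String t : value.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // A single query token matches if it equals one of the indexed tokens.
    public static boolean tokenMatch(String indexedValue, String queryToken) {
        return whitespaceTokens(indexedValue).contains(queryToken);
    }

    public static void main(String[] args) {
        String content = "this is rev. 23.302";
        System.out.println(whitespaceTokens(content));
        System.out.println(exactMatch(content, "23.302"));
        System.out.println(tokenMatch(content, "23.302"));
    }
}
```

With whole-value matching, a query for "23.302" does not find the document; with whitespace-only splitting it does, and "rev." and "23.302" stay exactly as written. Whichever behaviour you pick, the same choice has to be made on both the indexing side and the query side.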


On 8/2/2011 11:14 AM, dhastings wrote:

Hello,
I am trying to get Lucene and Solr to agree on a completely raw indexing
method.  I use Lucene in my indexers that write to an index on disk, and
Solr to search the indexes that I create, as creating the indexes without
Solr is much, much faster than using the Solr server.

Are there settings for BOTH Solr and Lucene to use EXACTLY what's in the
content, as opposed to interpreting what it thinks I'm trying to do?  My
content is extremely specific and needs no interpretation or adjustment
when indexing or searching a text field.

For example:

203.1 seems to be indexed as 2031.  I can get searching for 203.1 to work
correctly, but then it won't find what's indexed using 3.1's standard
analyzer.

If I have content that is:
"this is rev. 23.302"

I need it indexed EXACTLY as it appears,
"this is rev. 23.302"

I do not want any of Solr's or Lucene's attempts to "fix" my content or my
queries.  "rev." needs to stay "rev." and not turn into "rev"; "23.302"
needs to stay as such, and NOT turn into "23302".  This is for BOTH indexing
and searching.

Any hints?

Right now for indexing I have:

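// Stop-word set containing only a dummy term, so no real words are dropped
Set<String> nostopwords = new HashSet<String>();
nostopwords.add("buahahahahahaha");

// StandardAnalyzer still tokenizes and lowercases the text
Analyzer an = new StandardAnalyzer(Version.LUCENE_31, nostopwords);
writer = new IndexWriter(fsDir, an, MaxFieldLength.UNLIMITED);
writer.setUseCompoundFile(false);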


And for searching I have in my schema:


<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


Thanks.  Very much appreciated.


--
View this message in context: 
http://lucene.472066.n3.nabble.com/lucene-solr-raw-indexing-searching-tp3219277p3219277.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: lucene/solr, raw indexing/searching
