In your Solr schema.xml, are the fields you are using defined as text
fields with analyzers? It sounds like you want no analysis at all. That
probably means you don't want text fields either; you just want string
fields. That will make it impossible to search for individual tokens,
though: a search will match only the complete value.
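To make the string-versus-text distinction concrete, here is a plain-Java sketch. It does not use Lucene or Solr, and the class and method names are hypothetical. It assumes three things: a string field indexes the whole value as one term, a text field tokenized only on whitespace indexes one term per word, and a query term matches when it is identical to an indexed term.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration only: models the terms a field would hold,
// not real Solr/Lucene classes.
final class FieldTermsDemo {

    // String-type field: the whole value becomes exactly one term.
    static List<String> stringFieldTerms(String value) {
        return Collections.singletonList(value);
    }

    // Text field tokenized only on whitespace, with no other analysis.
    static List<String> whitespaceFieldTerms(String value) {
        String trimmed = value.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList();
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    // A query term matches if it is identical to one of the indexed terms.
    static boolean matches(List<String> indexedTerms, String queryTerm) {
        return indexedTerms.contains(queryTerm);
    }

    public static void main(String[] args) {
        String value = "this is rev. 23.302";
        // String field: only the complete value matches.
        System.out.println(matches(stringFieldTerms(value), "23.302"));
        System.out.println(matches(stringFieldTerms(value), value));
        // Whitespace-tokenized field: an individual token matches.
        System.out.println(matches(whitespaceFieldTerms(value), "23.302"));
    }
}
```

Run as-is, `main` prints `false`, `true`, `true`. The string field matches only the full value. The whitespace-tokenized field matches `23.302` on its own and keeps the period intact.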
I'm not quite sure how to do what you want; it depends on exactly what
you want. What kind of searching do you expect to support? If you still
want tokenization, you'll still need some analysis. But I'm not
quite sure how that corresponds to what you'd need to do on the Lucene
end. I think what you're trying to do is inevitably going to be confusing,
which doesn't mean it's not possible. You might find it less
confusing if you were willing to index with Solr rather than
straight Lucene. You could use Solr via the SolrJ Java classes rather
than the HTTP interface.
On 8/2/2011 11:14 AM, dhastings wrote:
Hello,
I am trying to get Lucene and Solr to agree on a completely raw indexing
method. My indexers use Lucene to write an index on disk, and I use
Solr to search the indexes I create. Creating the indexes without
Solr is much, much faster than using the Solr server.
Are there settings for BOTH Solr and Lucene to use EXACTLY what's in the
content, as opposed to interpreting what they think I'm trying to do? My
content is extremely specific. It needs no interpretation or adjustment
in a text field, for either indexing or searching.
For example:
203.1 seems to be indexed as 2031. I can get searching for 203.1 to work
correctly, but then it won't find what's indexed using Lucene 3.1's
StandardAnalyzer.
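The underlying problem is that index-time and query-time analysis have to produce identical terms. Otherwise a query term never equals an indexed term. The plain-Java sketch below shows this without any Lucene or Solr classes. `periodDroppingAnalysis` is a hypothetical stand-in that mimics the 203.1 → 2031 behavior reported above. It is not a model of what StandardAnalyzer actually does, and all names here are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: neither method is a real Lucene/Solr analyzer.
final class AnalysisMismatchDemo {

    // Stand-in for an index-time analysis that drops periods, mimicking the
    // "203.1 seems to be indexed as 2031" behavior described in the post.
    static List<String> periodDroppingAnalysis(String text) {
        List<String> terms = new ArrayList<String>();
        for (String token : text.trim().split("\\s+")) {
            String term = token.replace(".", "");
            if (!term.isEmpty()) {
                terms.add(term);
            }
        }
        return terms;
    }

    // Whitespace-only analysis: tokens are kept exactly as written.
    static List<String> whitespaceAnalysis(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // A document matches when every query term is among its indexed terms.
    static boolean matches(List<String> indexedTerms, List<String> queryTerms) {
        return indexedTerms.containsAll(queryTerms);
    }

    public static void main(String[] args) {
        String doc = "this is rev. 23.302";
        String query = "23.302";
        // Mismatched: the index side drops periods, the query side keeps them.
        System.out.println(matches(periodDroppingAnalysis(doc), whitespaceAnalysis(query)));
        // The same analysis on both sides gives a hit.
        System.out.println(matches(whitespaceAnalysis(doc), whitespaceAnalysis(query)));
    }
}
```

`main` prints `false`, then `true`. Whichever analysis is chosen, the Lucene indexer and the Solr field type must apply the same one.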
If I have content that is:
"this is rev. 23.302"
I need it indexed EXACTLY as it appears:
"this is rev. 23.302"
I do not want Solr or Lucene to attempt to "fix" my content or my
queries. "rev." needs to stay "rev." and not turn into "rev". "23.302"
needs to stay as it is and NOT turn into "23302". This applies to BOTH
indexing and searching.
Any hints?
Right now, for indexing, I have:
// Stop word set holding only a dummy word, so no real words are removed
Set<String> nostopwords = new HashSet<String>();
nostopwords.add("buahahahahahaha");
Analyzer an = new StandardAnalyzer(Version.LUCENE_31, nostopwords);
writer = new IndexWriter(fsDir, an, MaxFieldLength.UNLIMITED);
writer.setUseCompoundFile(false);
And for searching, I have this in my schema:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
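In the posted schema, analysis consists of a tokenizer followed by filters applied in order. The sketch below models that chain in plain Java. The class, interface, and method names are hypothetical and are not Solr's API. It swaps whitespace-only tokenization in for `solr.StandardTokenizerFactory`, which is the role Solr's `solr.WhitespaceTokenizerFactory` and Lucene 3.1's `WhitespaceAnalyzer` play, and it keeps the lowercasing step. Periods survive, but case does not. If case must be preserved too, the lowercase step has to be dropped on both the indexing and the searching side. `solr.RemoveDuplicatesTokenFilterFactory` is left out because it only drops a token that repeats the previous token's text at the same position, so it would not change this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical model of a Solr-style analyzer chain: one tokenizer followed
// by filters applied in declaration order. Not the real Solr/Lucene API.
final class AnalyzerChainDemo {

    interface FilterStep {
        List<String> apply(List<String> tokens);
    }

    // Plays the role of a whitespace tokenizer in place of StandardTokenizerFactory.
    static List<String> whitespaceTokenize(String text) {
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new ArrayList<String>();
        }
        return new ArrayList<String>(Arrays.asList(trimmed.split("\\s+")));
    }

    // Plays the role of solr.LowerCaseFilterFactory.
    static final FilterStep LOWERCASE = new FilterStep() {
        public List<String> apply(List<String> tokens) {
            List<String> out = new ArrayList<String>();
            for (String t : tokens) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
            return out;
        }
    };

    // Run the tokenizer, then every filter step in order.
    static List<String> analyze(String text, List<FilterStep> steps) {
        List<String> tokens = whitespaceTokenize(text);
        for (FilterStep step : steps) {
            tokens = step.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<FilterStep> chain = Arrays.asList(LOWERCASE);
        System.out.println(analyze("This is Rev. 23.302", chain));
    }
}
```

`main` prints `[this, is, rev., 23.302]`. Passing an empty list of steps instead would leave every token exactly as written.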
Thanks. Very much appreciated.
--
Sent from the Solr - User mailing list archive at Nabble.com.