I predict you'll spend a lot of time on the admin/analysis page
understanding what the various combinations of tokenizers and filters do.
Because, you see, you already have differences, to wit: your Solr schema
has LowerCaseFilterFactory and RemoveDuplicatesTokenFilterFactory.
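
To make the point concrete, here is a rough sketch of what goes wrong when one side applies a step the other side skips. It uses plain string operations only, not Lucene's or Solr's real analyzers, and the class and method names are invented for illustration. Whether this exact mismatch applies to your setup is something the analysis page will tell you; the sketch only shows the mechanism.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: simplified stand-ins, NOT the real Lucene/Solr analyzers.
final class AnalysisMismatchDemo {

    // Stand-in for the indexing side: split on whitespace and change nothing else.
    static List<String> indexSide(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Stand-in for the search side: same split, plus a lowercase step
    // that the indexing side above does not apply.
    static List<String> searchSide(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : indexSide(text)) {
            tokens.add(t.toLowerCase(Locale.ROOT));
        }
        return tokens;
    }

    // A query "matches" here only if every query token was indexed verbatim.
    static boolean matches(String indexedText, String query) {
        return indexSide(indexedText).containsAll(searchSide(query));
    }

    public static void main(String[] args) {
        System.out.println(matches("this is rev. 23.302", "rev. 23.302"));
        System.out.println(matches("this is Rev. 23.302", "Rev."));
    }
}
```

The second call fails to match because the query token is lowercased while the indexed token kept its capital letter. Any step that exists on only one side can produce this kind of miss.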

Have you determined *why* Solr indexing is slower? You might consider using
SolrJ and firing multiple threads/processes at the issue to bring indexing
performance up to acceptable levels and avoid this problem entirely....
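
As a rough sketch of the multi-threaded approach: split the documents into batches and hand each batch to a worker thread. The BatchSink interface below is hypothetical. It stands in for whatever actually sends a batch to Solr (for example, a call on a SolrJ client), and the batch size and thread count are values you would have to tune yourself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

final class ParallelIndexSketch {

    // Hypothetical stand-in for the code that sends one batch to Solr.
    // It must be safe to call from several threads at once.
    interface BatchSink {
        void send(List<String> batch);
    }

    // Sends all docs to the sink in batches, using a fixed pool of worker threads.
    // Returns the number of documents handed to the sink.
    static int indexAll(List<String> docs, int batchSize, int threads, BatchSink sink) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger sent = new AtomicInteger();
        List<Future<?>> pending = new ArrayList<>();
        try {
            for (int start = 0; start < docs.size(); start += batchSize) {
                int end = Math.min(start + batchSize, docs.size());
                // Copy the slice so each task owns its own batch.
                List<String> batch = new ArrayList<>(docs.subList(start, end));
                pending.add(pool.submit(() -> {
                    sink.send(batch);
                    sent.addAndGet(batch.size());
                }));
            }
            // Wait for every batch and surface the first failure, if any.
            for (Future<?> f : pending) {
                try {
                    f.get();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while indexing", e);
                } catch (ExecutionException e) {
                    throw new IllegalStateException("a batch failed", e.getCause());
                }
            }
        } finally {
            pool.shutdown();
        }
        return sent.get();
    }
}
```

Called as, say, indexAll(docs, 1000, 4, sink), this keeps four batches in flight at a time. Measure before and after to see whether it closes the gap for you.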

Best
Erick
On Aug 2, 2011 12:37 PM, "Jonathan Rochkind" <rochk...@jhu.edu> wrote:
> In your Solr schema.xml, are the fields you are using defined as text
> fields with analyzers? It sounds like you want no analysis at all, which
> probably means you don't want text fields either; you just want string
> fields. That will make it impossible to search for individual tokens,
> though: searches will match only on the complete value.
>
> I'm not quite sure how to do what you want; it depends on exactly what
> you want. What kind of searching do you expect to support? If you still
> do want tokenization, you'll still want some analysis... but I'm not
> quite sure how that corresponds to what you'd want to do on the Lucene
> end. What you're trying to do is going to be inevitably confusing, I
> think. Which doesn't mean it's not possible. You might find it less
> confusing if you were willing to use Solr to index, though, rather than
> straight Lucene -- you could use Solr via the SolrJ Java classes rather
> than the HTTP interface.
>
> On 8/2/2011 11:14 AM, dhastings wrote:
>> Hello,
>> I am trying to get Lucene and Solr to agree on a completely raw indexing
>> method. I use Lucene in my indexers that write to an index on disk, and
>> Solr to search the indexes that I create, as creating the indexes without
>> Solr is much, much faster than using the Solr server.
>>
>> Are there settings for BOTH Solr and Lucene to use EXACTLY what's in the
>> content, as opposed to interpreting what it thinks I'm trying to do? My
>> content is extremely specific and needs no interpretation or adjustment,
>> whether indexing or searching a text field.
>>
>> for example:
>>
>> 203.1 seems to be indexed as 2031. I can get searching for 203.1 to work
>> correctly, but then it won't find what's indexed using 3.1's
>> StandardAnalyzer.
>>
>> if i have content that is :
>> "this is rev. 23.302"
>>
>> i need it indexed EXACTLY as it appears,
>> "this is rev. 23.302"
>>
>> I do not want any of Solr's or Lucene's attempts to "fix" my content or my
>> queries. "rev." needs to stay "rev." and not turn into "rev"; "23.302"
>> needs to stay as such and NOT turn into "23302". This is for BOTH indexing
>> and searching.
>>
>> any hints?
>>
>> right now for indexing i have:
>>
>> // Dummy stop word, so that no real words are treated as stop words
>> Set<String> nostopwords = new HashSet<String>();
>> nostopwords.add("buahahahahahaha");
>>
>> Analyzer an = new StandardAnalyzer(Version.LUCENE_31, nostopwords);
>> writer = new IndexWriter(fsDir, an, MaxFieldLength.UNLIMITED);
>> writer.setUseCompoundFile(false);
>>
>>
>> and for searching i have in my schema :
>>
>>
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer>
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>>
>> Thanks. Very much appreciated.
>>
>>
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/lucene-solr-raw-indexing-searching-tp3219277p3219277.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
