I predict you'll spend a lot of time on the admin/analysis page figuring out what the various combinations of tokenizers and filters do. Because, you see, you already have differences between the two sides, to wit: your Solr schema has LowerCaseFilterFactory and RemoveDuplicatesTokenFilterFactory.
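
For what it's worth, here is a rough sketch of a field type that does nothing but split on whitespace, so "rev." and "23.302" would come through untouched and case would be preserved. The name text_raw is made up for illustration, and this only helps if the Lucene side indexes with a matching analyzer (WhitespaceAnalyzer rather than StandardAnalyzer), which is an assumption on my part and not what your current indexing code does:

```xml
<!-- Hypothetical field type: "text_raw" is an illustrative name, not from the original schema. -->
<!-- Assumes the Lucene indexer is switched to WhitespaceAnalyzer so both sides agree. -->
<fieldType name="text_raw" class="solr.TextField" positionIncrementGap="100">
  <!-- One analyzer element with no type attribute applies to both indexing and querying. -->
  <analyzer>
    <!-- Splits on whitespace only; no lowercasing, no punctuation handling, no stop words. -->
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  </analyzer>
</fieldType>
```

If you want no tokenization at all, solr.StrField would be the analogue, but then only complete-value matches will work. Either way, run a sample value through the admin/analysis page before trusting it.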
Have you determined *why* Solr indexing is slower? You might consider using SolrJ and firing multiple threads/processes at the issue to bring indexing performance up to acceptable levels and avoid this problem entirely....

Best
Erick

On Aug 2, 2011 12:37 PM, "Jonathan Rochkind" <rochk...@jhu.edu> wrote:
> In your solr schema.xml, are the fields you are using defined as text
> fields with analyzers? It sounds like you want no analysis at all, which
> probably means you don't want text fields either, you just want string
> fields. That will make it impossible to search for individual tokens
> though, searches will match only on complete matches of the value.
>
> I'm not quite sure how to do what you want, it depends on exactly what
> you want. What kind of searching do you expect to support? If you still
> do want tokenization, you'll still want some analysis... but I'm not
> quite sure how that corresponds to what you'd want to do on the lucene
> end. What you're trying to do is going to be inevitably confusing, I
> think. Which doesn't mean it's not possible. You might find it less
> confusing if you were willing to use Solr to index though, rather than
> straight lucene -- you could use Solr via the SolrJ java classes, rather
> than the HTTP interface.
>
> On 8/2/2011 11:14 AM, dhastings wrote:
>> Hello,
>> I am trying to get lucene and solr to agree on a completely Raw indexing
>> method. I use lucene in my indexers that write to an index on disk, and
>> solr to search those indexes that i create, as creating the indexes without
>> solr is much much faster than using the solr server.
>>
>> are there settings for BOTH solr and lucene to use EXACTLY whats in the
>> content as opposed to interpreting what it thinks im trying to do? My
>> content is extremely specific and needs no interpretation or adjustment,
>> indexing or searching, a text field.
>>
>> for example:
>>
>> 203.1 seems to be indexed as 2031. searching for 203.1 i can get to work
>> correctly, but then it wont find whats indexed using 3.1's standard
>> analyzer.
>>
>> if i have content that is :
>> "this is rev. 23.302"
>>
>> i need it indexed EXACTLY as it appears,
>> "this is rev. 23.302"
>>
>> I do not want any of solr or lucenes attempts to "fix" my content or my
>> queries. "rev." needs to stay "rev." and not turn into "rev", "23.302"
>> needs to stay as such, and NOT turn into "23302". this is for BOTH indexing
>> and searching.
>>
>> any hints?
>>
>> right now for indexing i have:
>>
>> Set nostopwords = new HashSet(); nostopwords.add("buahahahahahaha");
>>
>> Analyzer an = new StandardAnalyzer(Version.LUCENE_31, nostopwords);
>> writer = new IndexWriter(fsDir,an,MaxFieldLength.UNLIMITED);
>> writer.setUseCompoundFile(false) ;
>>
>>
>> and for searching i have in my schema :
>>
>>
>> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>>   <analyzer>
>>     <tokenizer class="solr.StandardTokenizerFactory"/>
>>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>   </analyzer>
>> </fieldType>
>>
>>
>> Thanks. Very much appreciated.
>>
>>
>> --
>> View this message in context: http://lucene.472066.n3.nabble.com/lucene-solr-raw-indexing-searching-tp3219277p3219277.html
>> Sent from the Solr - User mailing list archive at Nabble.com.