SpellCheckComponent questions

2010-06-16 Thread Blargy

Is it generally wiser to build the dictionary from the existing index? Search
Log? Other? 

For "Did you mean" does one usually just use collate=true and then return
that string?

Should I be using a separate spellchecker handler, or should I just always
include spellcheck=true in my original search queries? I noticed that some
sample solrconfig files recommend against creating a separate request
handler just for spellcheck requests, but why should I tax every single
request when I really only want to perform a spellcheck when there are
fewer than x results?

I'm guessing that if I wanted to achieve the above functionality (only
spellcheck when there are < x results) I could create a custom
SearchComponent that subclasses solr.SpellCheckComponent. If I decide to go
down this route, how can I get access to the number of results and/or the
actual results?
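
For reference, a rough sketch of what such a subclass might look like,
assuming it replaces the stock spellcheck component in the handler's
component list and runs after QueryComponent (the "spellcheck.maxHits"
parameter name here is invented):

  import java.io.IOException;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SpellCheckComponent;
  import org.apache.solr.search.DocListAndSet;

  public class ConditionalSpellCheckComponent extends SpellCheckComponent {
    @Override
    public void process(ResponseBuilder rb) throws IOException {
      // QueryComponent runs earlier in the chain and fills in the results
      DocListAndSet results = rb.getResults();
      long hits = (results == null || results.docList == null)
          ? 0 : results.docList.matches();
      // only run the spellchecker for sparse result sets
      if (hits < rb.req.getParams().getInt("spellcheck.maxHits", 5)) {
        super.process(rb);
      }
    }
  }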

Thanks again nabble ;)



-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SpellCheckComponent-questions-tp901672p901672.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SpellCheckComponent questions

2010-06-16 Thread Blargy

Follow-up question.

How can I influence the "scoring" of suggestions that come back, either
through term frequency (if I build off the index) or through the number of
search results returned (if using a search log)?

Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SpellCheckComponent-questions-tp901672p901789.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to apply patch SOLR-1316

2010-06-16 Thread Blargy

I'm trying to apply this via the command line: "patch -p0 < SOLR-1316.patch".

When patching against trunk I get the following errors:

~/workspace $ patch -p0 < SOLR-1316.patch 
patching file
dev/trunk/solr/src/java/org/apache/solr/handler/component/SpellCheckComponent.java
Hunk #2 succeeded at 575 (offset -3 lines).
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/AbstractLuceneSpellChecker.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/IndexBasedSpellChecker.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/SolrSpellChecker.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/BufferingTermFreqIteratorWrapper.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/FileDictionary.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/Lookup.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/SortedTermFreqIteratorWrapper.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/Suggester.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/UnsortedTermFreqIteratorWrapper.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/jaspell/JaspellLookup.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/jaspell/JaspellTernarySearchTrie.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/tst/TSTAutocomplete.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/tst/TSTLookup.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/tst/TernaryTreeNode.java
patching file
dev/trunk/solr/src/java/org/apache/solr/util/HighFrequencyDictionary.java
Hunk #1 FAILED at 54.
Hunk #2 FAILED at 69.
2 out of 2 hunks FAILED -- saving rejects to file
dev/trunk/solr/src/java/org/apache/solr/util/HighFrequencyDictionary.java.rej
patching file
dev/trunk/solr/src/java/org/apache/solr/util/SortedIterator.java
patching file
dev/trunk/solr/src/java/org/apache/solr/util/TermFreqIterator.java
patching file
dev/trunk/solr/src/test/org/apache/solr/spelling/suggest/SuggesterTest.java
patching file
dev/trunk/solr/src/test/test-files/solr/conf/schema-spellchecker.xml
patching file
dev/trunk/solr/src/test/test-files/solr/conf/solrconfig-spellchecker.xml

Patching against the 1.4.0 tag I get the following errors:

$ patch -p0 < SOLR-1316.patch 
patching file
dev/trunk/solr/src/java/org/apache/solr/handler/component/SpellCheckComponent.java
Hunk #1 succeeded at 102 (offset -5 lines).
Hunk #2 succeeded at 348 (offset -230 lines).
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/AbstractLuceneSpellChecker.java
Hunk #1 succeeded at 40 (offset 1 line).
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/IndexBasedSpellChecker.java
Hunk #1 succeeded at 105 (offset 3 lines).
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/SolrSpellChecker.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/BufferingTermFreqIteratorWrapper.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/FileDictionary.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/Lookup.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/SortedTermFreqIteratorWrapper.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/Suggester.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/UnsortedTermFreqIteratorWrapper.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/jaspell/JaspellLookup.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/jaspell/JaspellTernarySearchTrie.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/tst/TSTAutocomplete.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/tst/TSTLookup.java
patching file
dev/trunk/solr/src/java/org/apache/solr/spelling/suggest/tst/TernaryTreeNode.java
patching file
dev/trunk/solr/src/java/org/apache/solr/util/HighFrequencyDictionary.java
patching file
dev/trunk/solr/src/java/org/apache/solr/util/SortedIterator.java
patching file
dev/trunk/solr/src/java/org/apache/solr/util/TermFreqIterator.java
patching file
dev/trunk/solr/src/test/org/apache/solr/spelling/suggest/SuggesterTest.java
patching file
dev/trunk/solr/src/test/test-files/solr/conf/schema-spellchecker.xml
patching file
dev/trunk/solr/src/test/test-files/solr/conf/solrconfig-spellchecker.xml
Hunk #1 succeeded at 86 with fuzz 1 (offset -6 lines).

As you can see, neither version applies cleanly. I tried building each but
neither would compile. Which version/tag should be used when applying this
patch?

Thanks

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-apply-patch-SOLR-1316-tp676497p901887.html
Sent from the Solr - User mailing list archive at Nabble.com.


Autosuggest/autocomplete/spellcheck phrases

2010-06-17 Thread Blargy

How can I preserve phrases for autosuggest/autocomplete/spellcheck?

For example, we have a bunch of product listings, and if someone types
"louis" I want it to come up with "Louis Vuitton"; "world" ... "World Cup".

Would I need n-grams? Shingling? Thanks
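
For reference, a rough sketch of a phrase-preserving autocomplete field
type (the type name and gram sizes are arbitrary):

  <fieldType name="autocomplete" class="solr.TextField">
    <analyzer type="index">
      <!-- keep "Louis Vuitton" as a single token -->
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

With something like this, the prefix "louis" matches the indexed edge
grams of "louis vuitton".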
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Autsuggest-autocomplete-spellcheck-phrases-tp902951p902951.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Autosuggest/autocomplete/spellcheck phrases

2010-06-17 Thread Blargy

Thanks for the reply Michael. I'll definitely try that out and let you know
how it goes. Your solution sounds similar to the one I've read here:
http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

There are some good comments in there too.

I think I'm having the biggest trouble distinguishing what needs to be done
for autocomplete/autosuggest (Google-like behavior) from the separate issue
of spellchecking ("Did you mean..."). I originally thought those two
distinct features would involve the same solution, but it appears they are
completely different. Your solution sounds like it works best for
autocomplete, and I will be using it for that exact purpose ;) One question
though... how do you handle more popular words/documents over others?

Now my next question is: how would I get the spellchecker to work with
phrases? So if I typed "vitton" it would come back with something like:
"Did you mean: 'Louis Vuitton'?" Will this also require a combination of
n-grams and shingles?
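
For reference, a rough sketch of a shingled field type that could back a
phrase-capable spellcheck dictionary (the type name and shingle size are
arbitrary):

  <fieldType name="phrase_spell" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- emits "louis vuitton" as a dictionary entry alongside the single terms -->
      <filter class="solr.ShingleFilterFactory" maxShingleSize="2" outputUnigrams="true"/>
    </analyzer>
  </fieldType>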

Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Autsuggest-autocomplete-spellcheck-phrases-tp902951p903225.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Autosuggest/autocomplete/spellcheck phrases

2010-06-17 Thread Blargy

Ok, that makes perfect sense.

"What I did was use a combination of the two running the indexed terms
through " - I initially read this as: you used your current index and used
the terms from it to build up your dictionary.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Autsuggest-autocomplete-spellcheck-phrases-tp902951p903299.html
Sent from the Solr - User mailing list archive at Nabble.com.


DismaxRequestHandler

2010-06-17 Thread Blargy

I have a title field and a description field. I am searching across both
fields, but I don't want description matches unless the terms are within
some slop of each other. How can I query for this? It seems that I'm
getting back crazy results when there are matches that are nowhere near
each other.

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/DismaxRequestHandler-tp903641p903641.html
Sent from the Solr - User mailing list archive at Nabble.com.


defType=Dismax questions

2010-06-17 Thread Blargy

Sorry for the repost, but I posted under DismaxRequestHandler when I should
have listed it as DismaxQueryParser... i.e. I'm using defType=dismax.

I have a title field and a description field. I am searching across both
fields, but I don't want description matches unless the terms are within
some slop of each other. How can I query for this? It seems that I'm
getting back crazy results when there are matches that are nowhere near
each other.
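
For reference, the dismax params that usually bear on this are the
phrase-field ones; a rough sketch (the boosts and slop are arbitrary):

  defType=dismax
  q=louis vuitton bag
  qf=title^2 description
  pf=title^5 description^2
  ps=3

Note that pf/ps boost documents where the terms appear near each other
rather than excluding the scattered matches outright.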
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/defType-Dismax-questions-tp904087p904087.html
Sent from the Solr - User mailing list archive at Nabble.com.


Performance tuning

2010-06-17 Thread Blargy

After indexing our item descriptions our index grew from around 3GB to
17.5GB, and our search times have deteriorated from sub-50ms to over 500ms.
The sick thing is I'm not even searching across that field at the moment,
but I plan to in the near future, as well as to include highlighting.

What size is considered "too big" for one index? When should one start
looking into sharding/federation, etc.?

What are some generic performance tuning options that could possibly help?
We are currently hosting 4 slaves. Would increasing the number of slaves
help?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Peformance-tuning-tp904540p904540.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance tuning

2010-06-17 Thread Blargy

Is there an alternative for highlighting on a large stored field? I thought
that for highlighting you needed the field stored? I really just need the
excerpting feature for highlighting relevant portions of our item
descriptions.

Not sure if this is because of the index size (17.5GB) or because of
highlighting, but our slave servers are experiencing high loads... possibly
due to replication. That actually leads me to my next question: I thought
replication would only download new segments, without always needing to
re-download the whole index. This doesn't appear to be the case from what
I'm seeing. Am I wrong?

Thanks again

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Peformance-tuning-tp904540p904610.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance tuning

2010-06-17 Thread Blargy


> Blargy - Please try to quote the mail you're responding to, at least
> the relevant piece.  It's nice to see some context to the discussion.

No problem ;)


> Depends - if you optimize the index on the master, then the entire index
> is replicated.  If you simply commit and let Lucene take care of adding
> segments you'll generally reduce what is replicated.

As a side question... would reducing the mergeFactor help at all? This is
what I am currently using:


<useCompoundFile>false</useCompoundFile>
<ramBufferSizeMB>64</ramBufferSizeMB>
<mergeFactor>5</mergeFactor>
<unlockOnStartup>false</unlockOnStartup>
<reopenReaders>true</reopenReaders>

<deletionPolicy class="solr.SolrDeletionPolicy">
  <str name="maxCommitsToKeep">1</str>
  <str name="maxOptimizedCommitsToKeep">0</str>
</deletionPolicy>

<infoStream file="INFOSTREAM.txt">false</infoStream>
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Peformance-tuning-tp904540p904810.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance tuning

2010-06-17 Thread Blargy



> first step is to do an &debugQuery=true and see where the time is  
> going on the server-side.  If you're doing highlighting of a stored  
> field, that can be a biggie.   The timings will be in the debug output  
> - be sure to look at both sections of the timings. 
> 

Looks like the majority of the time is spent in the QueryComponent in the
process section. Any suggestions on how I can improve this? Thanks!

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Peformance-tuning-tp904540p904861.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance tuning

2010-06-18 Thread Blargy


Otis Gospodnetic-2 wrote:
> 
> Smaller merge factor will make things worse - 
> 

- Whoops... I guess I'll change it from 5 back to the default 10.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Peformance-tuning-tp904540p905726.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Performance tuning

2010-06-18 Thread Blargy


Otis Gospodnetic-2 wrote:
> 
> You may want to try the RPM tool, it will show you what inside of that
> QueryComponent is really slow.
> 

We are already using it :)

What should I be concentrating on? Transaction traces?

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Peformance-tuning-tp904540p905730.html
Sent from the Solr - User mailing list archive at Nabble.com.


jdbc4.CommunicationsException

2010-06-20 Thread Blargy

Does anyone know a solution to this problem? I've already tried
autoReconnect=true and it doesn't appear to help. This happened 34 hours
into my full-import... ouch!

org.apache.solr.handler.dataimport.DataImportHandlerException:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: The last packet
successfully received from the server was 21 milliseconds ago.  The last
packet sent successfully to the server was 124,896,004 milliseconds ago. is
longer than the server configured value of 'wait_timeout'. You should
consider either expiring and/or testing connection validity before use in
your application, increasing the server configured values for client
timeouts, or using the Connector/J connection property 'autoReconnect=true'
to avoid this problem.
at
org.apache.solr.handler.dataimport.DataImportHandlerException.wrapAndThrow(DataImportHandlerException.java:64)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.hasnext(JdbcDataSource.java:339)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator.access$700(JdbcDataSource.java:228)
at
org.apache.solr.handler.dataimport.JdbcDataSource$ResultSetIterator$1.hasNext(JdbcDataSource.java:262)
at
org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(EntityProcessorBase.java:78)
at
org.apache.solr.handler.dataimport.SqlEntityProcessor.nextRow(SqlEntityProcessor.java:73)
at
org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:237)
at
org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:361)
at
org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:246)
at
org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180)
at
org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331)
at
org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389)
at
org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/jdbc4-CommunicationsException-tp909274p909274.html
Sent from the Solr - User mailing list archive at Nabble.com.


DIH - "Total Documents Processed" is missing

2010-06-20 Thread Blargy

It seems that when importing via DIH, the "Total Documents Processed"
status message does not appear when there are two entities for a given
document. Is this by design?

-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/IDH-Total-Documents-Processed-is-missing-tp909325p909325.html
Sent from the Solr - User mailing list archive at Nabble.com.


LocalParams?

2010-06-21 Thread Blargy

Huh? I read through the wiki (see http://wiki.apache.org/solr/LocalParams)
but I still don't understand its utility.

Can someone explain to me why this would even be used? Any examples to help
clarify? Thanks!
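
For reference, a few typical uses (the field names are arbitrary):

  q={!dismax qf='title description'}ipod
  fq={!tag=color}color:blue
  facet.field={!ex=color}color

The first switches the query parser and sets its params inline; the second
tags a filter so the third can exclude it when computing facet counts
(multi-select faceting).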
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/LocalParams-tp913183p913183.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: anyone use hadoop+solr?

2010-06-22 Thread Blargy

Neeb,

Seems like we are in the same boat. Our index consists of 5M records, which
roughly equals around 30GB. All in all that's not too bad, however our
indexing process (we use DIH, but I'm now revisiting that idea) takes a
whopping 30+ hours!!!

I just bought the Hadoop in Action early edition but haven't had time to
read it yet. I was wondering what resources you are using to learn Hadoop
and, more importantly, its applications to Solr. Would you mind explaining
your thought process on how you will be using Hadoop in more detail?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p914606.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: anyone use hadoop+solr?

2010-06-22 Thread Blargy


Muneeb Ali wrote:
> 
> Hi Blargy,
> 
> Nice to hear that I am not alone ;) 
> 
> Well we have been using Hadoop for other data-intensive services, those
> that can be done in parallel. We have multiple nodes, which are used by
> Hadoop for all our MapReduce jobs. I personally don't have much experience
> with its use and hence wouldn't be able to help you much with that.
> 
> Our indexing takes 6+ hours to index 15 million documents (using
> solrj.streamUpdateSolrServer). I wanted to explore hadoop for this task,
> as it can be done in parallel.
> 
> I have just started investigating into this, will keep this post updated
> if found anything helpful.
>  
> -Neeb 
> 

Would you mind explaining how your full indexing strategy is implemented
using the StreamingUpdateSolrServer? I am currently only familiar with
using the DataImportHandler. Thanks.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/anyone-use-hadoop-solr-tp485333p915227.html
Sent from the Solr - User mailing list archive at Nabble.com.


Similarity

2010-06-24 Thread Blargy

Can someone explain how I can override the default behavior of tf
contributing a higher score to documents with repeated words?

For example:

Query: "foo"
Doc1: "foo bar" score 1.0
Doc2: "foo foo bar" score 1.1

Doc2 contains "foo" twice, so it is scored higher. How can I override this
behavior?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Similarity-tp920366p920366.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Similarity

2010-06-24 Thread Blargy


Yonik Seeley-2-2 wrote:
> 
> Depends on the larger context of what you are trying to do.
> Do you still want the idf and length norm relevancy factors?  If not,
> use a filter, or boost the particular clause with 0.
> 

I do want the other relevancy factors, i.e. boosts, phrase boosting, etc.,
but I just want to make it so that only unique terms in the query
contribute to the overall score.

For example:

Query: "foo"
Doc1: "foo bar baz"
Doc2: "foo foo bar"

The above documents should have the same score.

Query "foo baz"
Doc1: "foo bar baz"
Doc2: "foo foo bar"

In this example Doc1 should be scored higher because it has 2 unique terms
that match.
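
For reference, a minimal sketch of a Similarity that flattens tf while
leaving the other factors alone (the class name is arbitrary):

  import org.apache.lucene.search.DefaultSimilarity;

  public class UniqueTermSimilarity extends DefaultSimilarity {
    @Override
    public float tf(float freq) {
      // count each matching term once, however often it repeats
      return freq > 0 ? 1.0f : 0.0f;
    }
  }

It would then be registered in schema.xml via the <similarity
class="..."/> element.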


-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Similarity-tp920366p920530.html
Sent from the Solr - User mailing list archive at Nabble.com.


SweetSpotSimilarity

2010-06-25 Thread Blargy

Would someone mind explaining how this differs from the DefaultSimilarity?
Also, how would one replace the DefaultSimilarity class with this one? I
can't seem to find any such configuration in solrconfig.xml.

Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SweetSpotSimilarity-tp922546p922546.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SweetSpotSimilarity

2010-06-25 Thread Blargy


iorixxx wrote:
> 
> it is in schema.xml:
> 
> <similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>
> 

Thanks. I'm guessing this is all or nothing, i.e. you can't use one
similarity class for one request handler and another for a separate request
handler. Is that correct?



-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SweetSpotSimilarity-tp922546p922622.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SweetSpotSimilarity

2010-06-28 Thread Blargy


iorixxx wrote:
> 
> it is in schema.xml:
> 
> <similarity class="org.apache.lucene.misc.SweetSpotSimilarity"/>
> 

How would you configure the tfBaselineTfFactors and lengthNormFactors when
configuring via schema.xml? Do I have to create a subclass that hardcodes
these values?
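
For reference, a rough sketch of such a factory (the tuning numbers are
placeholders, not recommendations):

  import org.apache.lucene.misc.SweetSpotSimilarity;
  import org.apache.lucene.search.Similarity;
  import org.apache.solr.schema.SimilarityFactory;

  public class MySweetSpotSimilarityFactory extends SimilarityFactory {
    @Override
    public Similarity getSimilarity() {
      SweetSpotSimilarity sim = new SweetSpotSimilarity();
      sim.setBaselineTfFactors(1.0f, 0.0f);        // base, min
      sim.setLengthNormFactors(2, 10, 0.5f, true); // min, max, steepness, discountOverlaps
      return sim;
    }
  }

and in schema.xml: <similarity class="com.example.MySweetSpotSimilarityFactory"/>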
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SweetSpotSimilarity-tp922546p928730.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SweetSpotSimilarity

2010-06-28 Thread Blargy


iorixxx wrote:
> 
> CustomSimilarityFactory that extends
> org.apache.solr.schema.SimilarityFactory should do it. There is an example
> CustomSimilarityFactory.java under src/test/org...
> 

This is exactly what I was looking for... this is very similar (no pun
intended ;)) to the updateProcessorFactory configuration in solrconfig.xml.
The wiki should probably include this information.

Side question: how would I know if a configuration option can also take a
factory class, like in this instance?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/SweetSpotSimilarity-tp922546p928862.html
Sent from the Solr - User mailing list archive at Nabble.com.


Optimizing cache

2010-06-28 Thread Blargy

Here is a screenshot of our caches from New Relic:

http://s4.postimage.org/mmuji-31d55d69362066630eea17ad7782419c.png

Query cache: 55-65%
Filter cache: 100%
Document cache: 63%

Cache size is 512 for above 3 caches.

How do I interpret this data? What are some optimal configuration changes
given the above stats?
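
For reference, the knobs live in solrconfig.xml; a sketch of the kind of
change a 100% filter-cache ratio at size 512 might suggest (the sizes are
starting points to experiment with, not recommendations):

  <filterCache      class="solr.FastLRUCache" size="1024" initialSize="512" autowarmCount="256"/>
  <queryResultCache class="solr.LRUCache"     size="1024" initialSize="512" autowarmCount="256"/>
  <documentCache    class="solr.LRUCache"     size="2048" initialSize="512" autowarmCount="0"/>

(The document cache cannot be autowarmed, so its autowarmCount stays 0.)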
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Optimizing-cache-tp929156p929156.html
Sent from the Solr - User mailing list archive at Nabble.com.


Custom PhraseQuery

2010-06-29 Thread Blargy

Is there any way to override/swap out the default PhraseQuery class that is
used, similar to how you can swap out the Similarity class?

Let me explain what I am trying to do. I would like to override how the TF
is calculated, always returning a max of 1 for phraseFreq.

For example:
Query: "foo bar"
Doc1: "foo bar baz"
Doc2: "foo bar foo bar"

These two documents should be scored exactly the same. I accomplished the
above in the "normal" query use case by using the SweetSpotSimilarity
class. There doesn't happen to be a SweetSpotPhraseQuery class, is there?

Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-PhraseQuery-tp932414p932414.html
Sent from the Solr - User mailing list archive at Nabble.com.


ValueSource/Function questions

2010-07-01 Thread Blargy

Can someone explain what the createWeight method should do?

And would someone mind explaining what the hashCode method is doing in this
use case?

  public int hashCode() {
    int h = a.hashCode();
    h ^= (h << 13) | (h >>> 20);  // scramble the bits of the first source's hash
    h += b.hashCode();
    h ^= (h << 23) | (h >>> 10);  // scramble again after mixing in the second hash
    h += name().hashCode();       // include the function name so different functions hash differently
    return h;
  }
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/ValueSource-Function-questions-tp936672p936672.html
Sent from the Solr - User mailing list archive at Nabble.com.


MLT with boost capability

2010-07-09 Thread Blargy

I've asked this question in the past without too much success. I figured I
would try to revive it.

Is there a way I can incorporate boost functions into a MoreLikeThis
search? Can it be accomplished at the MLT request handler level, or would I
need to create a custom request handler which in turn delegates the
majority of the search to a specialized instance of MLT? Can someone point
me in the right direction?

Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/MLT-with-boost-capability-tp954650p954650.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Custom PhraseQuery

2010-07-09 Thread Blargy

Oh, I didn't know about the different signatures of tf. Thanks for that
clarification.

It sounds like all I need to do is override tf(float) in the
SweetSpotSimilarity class to delegate to baselineTf, just like tf(int)
does. Is this correct?
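
For reference, a minimal sketch of that idea (the class name is arbitrary):

  import org.apache.lucene.misc.SweetSpotSimilarity;

  public class FlatPhraseSweetSpotSimilarity extends SweetSpotSimilarity {
    @Override
    public float tf(float freq) {
      // PhraseQuery scores via the float variant, tf(phraseFreq);
      // route it through baselineTf just like tf(int) already does
      return baselineTf(freq);
    }
  }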

Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Custom-PhraseQuery-tp932414p955257.html
Sent from the Solr - User mailing list archive at Nabble.com.


Foreign characters question

2010-07-13 Thread Blargy

I am trying to add the following synonym for indexing/searching:

swimsuit, bañadores, bañador

I tested searching for "bañadores" but it didn't return any results. After
further inspection I noticed in the field analysis admin that swimsuit gets
expanded to ba�adores. Not sure if it will show up here, but the "ñ" is a
black diamond with a white question mark in it.

So basically, how can I add support for foreign characters? Thanks
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Foreign-characters-question-tp964078p964078.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Foreign characters question

2010-07-14 Thread Blargy

Thanks for the reply, but that didn't help.

Tomcat is accepting foreign characters, but for some reason when Solr reads
the synonyms file and encounters the character ñ, it doesn't appear
correctly in the Field Analysis admin. It shows up as �. If I query exactly
for ñ it will work, but the synonyms file is screwy.
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Foreign-characters-question-tp964078p966740.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Foreign characters question

2010-07-14 Thread Blargy

How can I tell whether a synonyms file is UTF-8, and/or create one that is?
Do I have to instruct Solr that this file is UTF-8?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Foreign-characters-question-tp964078p967037.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Foreign characters question

2010-07-14 Thread Blargy

Nevermind. Apparently my IDE (NetBeans) was set to "No encoding"... wtf.
Changed it to UTF-8, recreated the file, and all is good now. Thanks!
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Foreign-characters-question-tp964078p967058.html
Sent from the Solr - User mailing list archive at Nabble.com.


Stemming

2010-07-20 Thread Blargy

I am using the LucidKStemmer and I noticed that it doesn't stem certain
words, for example "bags". How could I create a list of explicit words to
stem, i.e. sort of the opposite of protected words?

I know this can be accomplished using the synonyms file, but I want to know
how to just replace one word with another:

"This is a bags test" => "This is a bag test"
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemming-tp982690p982690.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Stemming

2010-07-20 Thread Blargy

Perfect!

Is there an associated JIRA ticket/patch for this so I can patch my 4.1
build?
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Stemming-tp982690p982786.html
Sent from the Solr - User mailing list archive at Nabble.com.


Architectural help

2010-03-09 Thread blargy

I was wondering if someone could be so kind to give me some architectural
guidance.

A little about our setup. We are a RoR shop that is currently using Ferret
(no laughs please) as our search technology. Our indexing process at the
moment is quite poor, as are our search results. After some deliberation we
have decided to switch to Solr to satisfy our search requirements.

We have about 5M records, ranging in size, all coming from a DB source
(only 2 tables). What will be the most efficient way of indexing all of
these documents? I am looking at DIH, but before I go down that road I
wanted to get some guidance. Are there any pitfalls I should be aware of
before I start? Anything I can do now that will help me down the road?

I have also been exploring the Sunspot rails plugin
(http://outoftime.github.com/sunspot/), which so far seems amazing. There
is an easy way to reindex all of your models, like Model.reindex, but I
doubt this is the most efficient. Has anyone had any experience using
Sunspot in their rails environment, and if so, should I bother with the
DIH?

Please let me know of any suggestions/opinions you may have. Thanks.
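
For reference, a minimal sketch of a DIH data-config.xml for a two-table
DB import (the driver, table and column names here are made up):

  <dataConfig>
    <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://localhost/shop"
                user="solr" password="secret" batchSize="-1"/>
    <document>
      <entity name="item" query="SELECT id, title FROM items">
        <entity name="detail"
                query="SELECT description FROM item_details WHERE item_id = '${item.id}'"/>
      </entity>
    </document>
  </dataConfig>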


-- 
View this message in context: 
http://old.nabble.com/Architectural-help-tp27844268p27844268.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Architectural help

2010-03-10 Thread blargy

So I can just create a view (or temporary table) and then just have a
simple "select * from (view or table)" in my DIH config?


Constantijn Visinescu wrote:
> 
> Try making a database view that contains everything you want to index, and
> then just use the DIH.
> 
> Worked when i tested it ;)
> 
> On Wed, Mar 10, 2010 at 1:56 AM, blargy  wrote:
> 
>>
>> I was wondering if someone could be so kind to give me some architectural
>> guidance.
>>
>> A little about our setup. We are RoR shop that is currently using Ferret
>> (no
>> laughs please) as our search technology. Our indexing process at the
>> moment
>> is quite poor as well as our search results. After some deliberation we
>> have
>> decided to switch to Solr to satisfy our search requirements.
>>
>> We have about 5M records ranging in size all coming from a DB source
>> (only
>> 2
>> tables). What will be the most efficient way of indexing all of these
>> documents? I am looking at DIH but before I go down that road I wanted to
>> get some guidance. Are there any pitfalls I should be aware of before I
>> start? Anything I can do now that will help me down the road?
>>
>> I have also been exploring the Sunspot rails plugin
>> (http://outoftime.github.com/sunspot/) which so far seems amazing. There
>> is
>> an easy way to reindex all of your models like Model.reindex but I doubt
>> this is the most efficient. Has anyone had any experience using Sunspot
>> with
>> their rails environment and if so should I bother with the DIH?
>>
>> Please let me know of any suggestions/opinions you may have. Thanks.
>>
>>
>> --
>> View this message in context:
>> http://old.nabble.com/Architectural-help-tp27844268p27844268.html
>> Sent from the Solr - User mailing list archive at Nabble.com.
>>
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Architectural-help-tp27844268p27854256.html
Sent from the Solr - User mailing list archive at Nabble.com.



DIH field options

2010-03-11 Thread blargy

How can you simply add a static value, like <field column="..." value="123"/>?
How does one add a static multi-value field, like <field column="..." values="123, 456"/>?

Is there any documentation on all the options for the field tag in
data-config.xml?

Thanks for the help
-- 
View this message in context: 
http://old.nabble.com/DIH-field-options-tp27873996p27873996.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH field options

2010-03-12 Thread blargy

Forgive me but I'm slightly retarded... I grew up underneath some power lines
;)

I've read through that wiki but I still can't find what I'm looking for. I
just want to give one of the DIH entities/fields a static value (i.e. it
doesn't come from a database column). How can I configure this?

FYI this is data-config.xml, not schema.xml.

  <document>
    <entity name="..." query="...">
      <field column="static_value_not_from_db"/>
      ...
    </entity>
  </document>

Tommy Chheng-4 wrote:
> 
>   The wiki page has most of the info you need
> *http://wiki*.apache.org/*solr*/DataImportHandler
> 
> To use multi-value fields, your schema.xml must define it with 
> multiValued="true"
> 
> 
> On 3/11/10 10:58 PM, blargy wrote:
>> How can you simply add a static value, like <field column="..." value="123"/>?
>> How does one add a static multi-value field, like <field column="..." values="123, 456"/>?
>>
>> Is there any documentation on all the options for the field tag in
>> data-config.xml?
>>
>> Thanks for the help
> 
> -- 
> Tommy Chheng
> Programmer and UC Irvine Graduate Student
> Twitter @tommychheng
> http://tommy.chheng.com
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/DIH-field-options-tp27873996p27878836.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH field options

2010-03-12 Thread blargy

I feel like the default option is a little hacky, plus I'll probably be
sharing my schema.xml across multiple cores using dynamic field types.

I can't believe there isn't an easy way to specify this. So my only option
is something like this?

  <entity name="..." transformer="TemplateTransformer" query="...">
    <field column="..." template="static_value"/>
    ...
  </entity>

What if I don't need a template transformer for all the other fields? Is it
ok to mix and match? Will this affect performance at all?

Thanks again!



Ahmet Arslan wrote:
> 
>> Forgive me but I'm slightly retarded... I grew up
>> underneath some power lines
>> ;)
>> 
>> I've read through that wiki but I still can't find what I'm
>> looking for. I
>> just want to give one of the DIH entities/fields a static
>> value (ie it
>> doesnt come from a database column). How can I configure
>> this?
>> 
>> FYI this is data-config.xml not schema.xml.
>> 
>>   <document>
>>     <entity name="..." query="...">
>>       <field column="static_value_not_from_db"/>
>>       ...
>>     </entity>
>>   </document>
>> 
> 
> I didn't do it by-myself but i think it can be done with
> TemplateTransformer[1] with something like:
> 
> <field column="static_value_not_from_db" template="some_static_value"/>
> 
> Alternatively you can define default value of a field in schema.xml:
> 
> <field name="timestamp" type="date" indexed="true" stored="true"
> default="NOW" multiValued="false"/>
> 
> [1] http://wiki.apache.org/solr/DataImportHandler#TemplateTransformer
> 
> 
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/DIH-field-options-tp27873996p27880065.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH field options

2010-03-12 Thread blargy

I'm still having a problem with this... for example, I would assume this
would index the value "Item" into the field called "type":

  <entity name="..." transformer="TemplateTransformer" query="...">
    <field name="type" template="Item"/>
  </entity>

However, I receive this error when starting up Solr: Caused by:
org.apache.solr.handler.dataimport.DataImportHandlerException: Field must
have a column attribute

I must not be getting something. Did I mention I was booted in the head by
a horse when I was young?

Ok, even if I get the template transformer working to set static values,
will this work for multi-valued fields?

Thanks yet again



Ahmet Arslan wrote:
> 
>> I feel like the default option is a little hacky plus I'll
>> probably be
>> sharing my schema.xml for multiple cores using dynamic
>> field types.
>> 
>> I can't believe there isnt an easy way to specify this. So
>> my only options
>> is something like this?
> 
> Also you can generate this static value from your SQL sentence. Something
> like:  select *, 'static_value_not_from_db' as my_field from items
>  
>> What if I don't need a template transformer for all the
>> other fields? Is it
>> ok to mix and match? Will this effect performance at all? 
> 
> Template Transformer is activated with template="". All Transformers work
> like that. I don't think that it will noticeably effect performance.
> 
> 
>   
> 
> 

-- 
View this message in context: 
http://old.nabble.com/DIH-field-options-tp27873996p27881023.html
Sent from the Solr - User mailing list archive at Nabble.com.



DIH template multivalued fields

2010-03-12 Thread blargy

How can I manually specify a static multi-valued field in the
DataImportHandler?

I finally figured out how to statically define a value from this FAQ:
http://wiki.apache.org/solr/DataImportHandlerFaq which basically says to
use the TemplateTransformer.

My question is: what do I put in the template when the value is
multi-valued?
-- 
View this message in context: 
http://old.nabble.com/DIH-template-multivalued-fields-tp27883630p27883630.html
Sent from the Solr - User mailing list archive at Nabble.com.



Hardware Recommendations

2010-03-12 Thread blargy

I'll have about 5M documents indexed (ranging in size) with an expected
search volume between 750k and 1M per day.

I'll be using a master/slave setup with an unknown number of slaves. What
hardware would you recommend/suggest?

Thoughts?
-- 
View this message in context: 
http://old.nabble.com/Hardware-Recommendations-tp27885080p27885080.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH template multivalued fields

2010-03-12 Thread blargy

I was actually able to accomplish what I wanted (although it's not pretty)
using a regex transformer.

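A sketch of that combination: emit a delimited constant with
TemplateTransformer, then split it with RegexTransformer (the column name
and values here are made up, and I believe the transformers run in the
order listed):

  <entity name="item" transformer="TemplateTransformer,RegexTransformer" query="...">
    <field column="type" template="Item,Product" splitBy=","/>
  </entity>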

blargy wrote:
> 
> How can I manually specify a static multiple value field in the
> DataImportHandler?
> 
> I finally figured out the answer of how to statically define a value from
> this FAQ: http://wiki.apache.org/solr/DataImportHandlerFaq which basically
> states to use the TemplateTransformer.
> 
> My question is what do I put for template when the value is multi-valued?
> 

-- 
View this message in context: 
http://old.nabble.com/DIH-template-multivalued-fields-tp27883630p27885965.html
Sent from the Solr - User mailing list archive at Nabble.com.



Solr Logging XML

2010-03-13 Thread blargy

How can I enable logging of all the XML posted to my Solr server? Is this
possible? As of right now all I see in the logs are the request params when
querying.

While I am on the topic of logging, I have one other question too. Is it
possible to use custom variables in the logging.properties file, such as:

catalina.base=/my/catalina/dir
java.util.logging.FileHandler.pattern = ${catalina.base}/solr_log-%g.log

This doesn't seem to work. Can you even do interpolation in .properties
files?

Thanks
-- 
View this message in context: 
http://old.nabble.com/Solr-Logging-XML-tp27890682p27890682.html
Sent from the Solr - User mailing list archive at Nabble.com.



DataImportHandler development console

2010-03-13 Thread blargy

Is there any documentation on this screen? (And don't point me to
http://wiki.apache.org/solr/DataImportHandler.)

When using Full-import, Status, Reload-Config, Document-Count and Full
Import With Cleaning, everything works as expected, but when I use any of
the following I get an exception: Debug Now, Start Row, No of Rows.

Caused by: java.sql.SQLException: Illegal value for setFetchSize().

Any ideas what might be causing this?

Thanks
-- 
View this message in context: 
http://old.nabble.com/DataImportHandler-development-console-tp27890750p27890750.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DataImportHandler development console

2010-03-13 Thread blargy

Also, how would one auto-commit after a delta-import?

I click on the commit, clean and verbose checkboxes but those seem to have
no effect.


blargy wrote:
> 
> Is there any documentation on this screen? (and dont point me
> http://wiki.apache.org/solr/DataImportHandler)
> 
> When using the Full-import, Status, Reload-Config, Document-Count and Full
> Import With Cleaning everything works as expected but when I use any of
> the following I get an exception: Debug Now, Start Row, No of Rows.
> 
> Caused by: java.sql.SQLException: Illegal value for setFetchSize().
> 
> Any ideas what might be causing this?
> 
> Thanks
> 

-- 
View this message in context: 
http://old.nabble.com/DataImportHandler-development-console-tp27890750p27890914.html
Sent from the Solr - User mailing list archive at Nabble.com.



Managing configuration files/Environment variables

2010-03-13 Thread blargy

How are you guys solving the problem of managing all of your configuration
differences between development and production?

For example, when deploying to production I need to change the
data-config.xml (DataImportHandler) database settings. I also have some ant
scripts to start/stop tomcat as well as symlink a context docBase.

I was wondering if there is some way to interpolate variables in the
configuration files, similar to how you can in ant build files, e.g.
something like ${datasource.url}.

Same question goes for all the other Solr .xml files... can I insert custom
variables?

-- 
View this message in context: 
http://old.nabble.com/Managing-configuration-files-Environment-variables-tp27892349p27892349.html
Sent from the Solr - User mailing list archive at Nabble.com.



DIH datasource configuration

2010-03-14 Thread blargy

My current DIH is configured via the requestHandler block in solrconfig.xml:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <lst name="datasource">
      <str name="driver">${datasource.driver}</str>
      <str name="url">${datasource.url}</str>
      <str name="user">${datasource.user}</str>
      <str name="password">${datasource.password}</str>
      <str name="batchSize">-1</str>
      <str name="readOnly">true</str>
    </lst>
  </lst>
</requestHandler>

My question is: do the batchSize and readOnly properties still work if I
specify them here as opposed to in data-config.xml? I can't seem to find
the answer anywhere. An even better question is: how can I check my current
datasource configuration while the application is running?

Thanks!


-- 
View this message in context: 
http://old.nabble.com/DIH-datasource-configuration-tp27897206p27897206.html
Sent from the Solr - User mailing list archive at Nabble.com.



RegexTransformer

2010-03-14 Thread blargy

How would I go about splitting a column by a certain delimiter AND ignoring
all empty matches?

For example:

  <field column="values" splitBy=","/>

Some rows don't have a value for this column, so blank values are actually
getting indexed. I just want to totally ignore those values. Is this
possible?

-- 
View this message in context: 
http://old.nabble.com/RegexTransformer-tp27897870p27897870.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: RegexTransformer

2010-03-15 Thread blargy

Thanks for the replies. I'll just roll my own transformer for this.
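
A rough sketch of such a transformer (the column name "values" is assumed
from the original post):

  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;
  import org.apache.solr.handler.dataimport.Context;
  import org.apache.solr.handler.dataimport.Transformer;

  public class StripEmptyValuesTransformer extends Transformer {
    @Override
    public Object transformRow(Map<String, Object> row, Context context) {
      Object v = row.get("values");
      if (v instanceof List) {
        List<Object> kept = new ArrayList<Object>();
        for (Object o : (List<?>) v) {
          // drop nulls and the blank strings produced by the split
          if (o != null && o.toString().trim().length() > 0) {
            kept.add(o);
          }
        }
        row.put("values", kept);
      }
      return row;
    }
  }

It would be referenced from the entity's transformer attribute alongside
RegexTransformer.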


Shalin Shekhar Mangar wrote:
> 
> On Mon, Mar 15, 2010 at 2:53 PM, Michael Kuhlmann <
> michael.kuhlm...@zalando.de> wrote:
> 
>> On 03/15/10 08:56, Shalin Shekhar Mangar wrote:
>> > On Mon, Mar 15, 2010 at 2:12 AM, blargy  wrote:
>> >
>> >>
>> >> How would I go about splitting a column by a certain delimiter AND
>> >> ignore all empty matches.
>> [...]
>> > You will probably have to write a custom Transformer to remove empty
>> values.
>> > See http://wiki.apache.org/solr/DIHCustomTransformer
>> >
>> Shouldn't a PatternTokenizerFactory combined with a LengthFilterFactory
>> do the job?
>>
>> See http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters.
>>
>>
> Yes but only on the indexed values. Empty values will still be stored and
> returned in the response unless you stop them from reaching the indexing
> chain.
> 
> -- 
> Regards,
> Shalin Shekhar Mangar.
> 
> 

-- 
View this message in context: 
http://old.nabble.com/RegexTransformer-tp27897870p27907090.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH datasource configuration

2010-03-15 Thread blargy

Shalin,

Where in the admin console can I view the current data-config.xml settings?

The reason I chose solrconfig.xml to configure the datasource is that there
is no way for me to pass dynamic values into data-config.xml as I have in
my example. Is there a way this can be accomplished?


blargy wrote:
> 
> My current DIH is configured via the requestHandler block in
> solrconfig.xml:
> 
> <requestHandler name="/dataimport"
>     class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">data-config.xml</str>
>     <lst name="datasource">
>       <str name="driver">${datasource.driver}</str>
>       <str name="url">${datasource.url}</str>
>       <str name="user">${datasource.user}</str>
>       <str name="password">${datasource.password}</str>
>       <str name="batchSize">-1</str>
>       <str name="readOnly">true</str>
>     </lst>
>   </lst>
> </requestHandler>
> 
> My question is: do the batchSize and readOnly properties still work if I
> specify them here as opposed to in data-config.xml? I can't seem to find
> the answer anywhere. An even better question is: how can I check my
> current datasource configuration while the application is running?
> 
> Thanks!

-- 
View this message in context: 
http://old.nabble.com/DIH-datasource-configuration-tp27897206p27907208.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH datasource configuration

2010-03-15 Thread blargy

Thanks, but I was thinking more of a way to check the datasource's current
internal configuration, i.e. I wanted to check that if I put batchSize=-1
in solrconfig.xml it was actually set.

Anyway, back to my previous question. Is there a way to dynamically set the
values for the datasource in data-config.xml? I can accomplish this in
solrconfig.xml by passing in arguments on the command line via the -D
option.


Ahmet Arslan wrote:
> 
> 
>> Where in the admin console can I view the current
>> data-config.xml settings?
> 
> solr/admin/file/?file=data-config.xml&contentType=text/xml;charset=utf-8
> 
> 
>   
> 
> 

-- 
View this message in context: 
http://old.nabble.com/DIH-datasource-configuration-tp27897206p27910915.html
Sent from the Solr - User mailing list archive at Nabble.com.



Dynamic variables/properties

2010-03-15 Thread blargy

Can someone point me in the right direction as to where to find some
documentation on how and where I can configure dynamic variables/properties
to be used throughout the Solr configuration files? Also, what is the
correct term for these dynamic variables?

For example, in solrconfig.xml there is this one define:
${solr.abortOnConfigurationError:true}, which can be altered on the command
line.
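
For reference, values for such defines can be supplied as JVM system
properties at startup; a sketch (reusing the property names from the DIH
posts above):

  java -Ddatasource.url=jdbc:mysql://prod-db/shop \
       -Dsolr.abortOnConfigurationError=false \
       -jar start.jar

${datasource.url} then resolves inside solrconfig.xml, and the text after
the colon in ${name:value} is the default used when the property is unset.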

Thanks
-- 
View this message in context: 
http://old.nabble.com/Dyanmic-variables-properties-tp27913280p27913280.html
Sent from the Solr - User mailing list archive at Nabble.com.



Stemming suggestions

2010-03-16 Thread blargy

Most of our documents will be in English, but not all, and we are certainly
in the process of acquiring more international content. Does anyone have
any experience using the different stemmers on content of unknown language
origin? Which ones perform the best? Give the most relevant results? What
are the main advantages of each one? I've heard that the KStemmer is a less
aggressive stemmer and is supposed to perform quite well; will it work
across multiple languages?

Any suggestions would be appreciated. Thanks
 
-- 
View this message in context: 
http://old.nabble.com/Stemming-suggestions-tp27920788p27920788.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: LucidWorks Solr

2010-03-16 Thread blargy

Kevin,

When you say you just included the war, you mean /packs/solr.war, correct?
I see that the KStemmer is nicely packed in there, but I don't see
LucidGaze anywhere. Have you had any experience using it?

So I'm guessing you would suggest using the LucidWorks solr.war over the
apache-solr war just because of the various bug fixes/tests.

As a side question: is there a reason you chose the LucidKStemmer over any
other stemmer (KStemmer, Porter, etc.)? I'm unsure which stemmer would work
best. Thanks again!


Kevin Osborn-2 wrote:
> 
> I used it mostly for KStemmer, but I also liked the fact that it included
> about a dozen or so stable patches since Solr 1.4 was released. We just
> use the included WAR in our project however. We don't use the installer or
> anything like that.
> 
> 
> 
> 
> 
> 
> From: blargy 
> To: solr-user@lucene.apache.org
> Sent: Tue, March 16, 2010 11:52:17 AM
> Subject: LucidWorks Solr
> 
> 
> Has anyone used this?:
> http://www.lucidimagination.com/Downloads/LucidWorks-for-Solr
> 
> Other than the KStemmer and installer what are the other "enhancements"
> that
> this download offers? Is it worth using over the default Solr
> installation?
> 
> Thanks
> 
> -- 
> View this message in context:
> http://old.nabble.com/LucidWorks-Solr-tp27922870p27922870.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 
> 
>   
> 

-- 
View this message in context: 
http://old.nabble.com/LucidWorks-Solr-tp27922870p27923359.html
Sent from the Solr - User mailing list archive at Nabble.com.



Stopwords

2010-03-16 Thread blargy

I was reading "Scaling Lucene and Solr"
(http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Scaling-Lucene-and-Solr/)
and I came across the section on stopwords.

It mentions that it's not recommended to remove stopwords at index time.
Why is this the case? Don't all the extraneous stopwords bloat the index
and lead to less relevant results? Can someone please explain this to me?
Thanks
-- 
View this message in context: 
http://old.nabble.com/Stopwords-tp27927028p27927028.html
Sent from the Solr - User mailing list archive at Nabble.com.



APR setup

2010-03-16 Thread blargy

[java] INFO: The APR based Apache Tomcat Native library which allows optimal
performance in production environments was not found on the
java.library.path:
.:/Library/Java/Extensions:/System/Library/Java/Extensions:/usr/lib/java

What the heck is this and why is it recommended for production settings?
Anyone?

-- 
View this message in context: 
http://old.nabble.com/APR-setup-tp27927553p27927553.html
Sent from the Solr - User mailing list archive at Nabble.com.



Recommended OS

2010-03-18 Thread blargy

Does anyone have any recommendations on which OS to use when setting up a
Solr search server?

Any memory/disk space recommendations? 

Thanks
-- 
View this message in context: 
http://old.nabble.com/Recommended-OS-tp27948306p27948306.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Recommended OS

2010-03-18 Thread blargy

Beat me to the punch with that question.

K Wong, did you happen to install the Apache APR? Wondering if it is even
worth the trouble.

I am thinking about going with RedHat Enterprise 5 unless anyone has any
objections?


Jean-Sebastien Vachon wrote:
> 
> 
> On 2010-03-18, at 1:03 PM, K Wong wrote:
> 
>> http://wiki.apache.org/solr/FAQ#What_are_the_Requirements_for_running_a_Solr_server.3F
>> 
>> I have Solr running on CentOS 5.4. It runs fine on the OpenJDK 1.6.0
>> and Tomcat 5. If I were to do it again, I'd probably just stick with
>> Jetty.
> 
> Would you mind explaining why you would stick with Jetty instead of
> Tomcat?
> 
> 
>> You really will need to read the docs to get the settings right as
>> there is no one-size-fits-all setting. (re your mem/dsk question)
>> 
>> K
>> 
>> 
>> 
>> On Thu, Mar 18, 2010 at 9:51 AM, blargy  wrote:
>>> 
>>> Does anyone have any recommendations on which OS to use when setting up
>>> Solr
>>> search server?
>>> 
>>> Any memory/disk space recommendations?
>>> 
>>> Thanks
>>> --
>>> View this message in context:
>>> http://old.nabble.com/Recommended-OS-tp27948306p27948306.html
>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>> 
>>> 
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Recommended-OS-tp27948306p27948867.html
Sent from the Solr - User mailing list archive at Nabble.com.



Omitting norms question

2010-03-18 Thread blargy

Should I not omit norms on any fields that I would like to boost via a
boost-query/function query?

For example, I have a created_on field on one of my documents and I would
like to apply some sort of function query to this field when querying. In
this case, does this mean I need to keep the norms?

What about sortable fields? Facetable fields?

Thanks!
-- 
View this message in context: 
http://old.nabble.com/Omitting-norms-question-tp27950893p27950893.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Omitting norms question

2010-03-19 Thread blargy

Ok, so if I want to add boosts to fields at indexing time then I should
include norms. On the other hand, if I just want to boost at query time
then it's quite alright to omit norms.

Anyone mind explaining what norms are in layman's terms? ;)


Marc Sturlese wrote:
> 
>>> Should I not omit norms on any fields that I would like to boost via a
>>> boost-query/function query?
> You don't have to set norms to use boost queries or functions. Just have
> to set them when you want to boost docs or fields at indexing time.
> 
>>>What about sortable fields? Facetable fields?
> You can use both without setting norms aswell.
> 
> See what norms are for:
> http://lucene.apache.org/java/2_4_0/api/org/apache/lucene/search/Similarity.html#lengthNorm%28java.lang.String,%20int%29
> 
> 
> blargy wrote:
>> 
>> Should I include not omit-norms on any fields that I would like to boost
>> via a boost-query/function query?
>> 
>> For example I have a created_on field on one of my documents and I would
>> like to add some sort of function query to this field when querying. In
>> this case does this mean I need to have the norms?
>> 
>> What about sortable fields? Facetable fields?
>> 
>> Thanks!
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Omitting-norms-question-tp27950893p27950977.html
Sent from the Solr - User mailing list archive at Nabble.com.



Delta-Import quick question

2010-03-19 Thread blargy

Does the DIH delta-import automatically commit and optimize after it's done?

...
8120
0
...

What is the difference between these two numbers? Usually I see "Total
Documents Processed".
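
For reference, I believe both full-import and delta-import accept explicit
flags, so the commit/optimize behavior can be forced either way, e.g.:

  http://localhost:8983/solr/dataimport?command=delta-import&commit=true&optimize=false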
-- 
View this message in context: 
http://old.nabble.com/Delta-Import-quick-question-tp27951022p27951022.html
Sent from the Solr - User mailing list archive at Nabble.com.



MLT question

2010-03-20 Thread blargy

I'm playing around with MLT and I am getting back decent results when
searching against a particular document.

My question is: how can I paginate the results of this query? For example,
instead of setting rows you must specify mlt.count in the params. But how
can I set the offset? mlt.offset?

Thanks
-- 
View this message in context: 
http://old.nabble.com/MLT-question-tp27973301p27973301.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: DIH - Deleting documents

2010-03-23 Thread blargy

Are there any examples out there of using these special commands? I'm not
quite sure of the syntax. Any simple example will suffice. Thanks
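
A sketch of the usual shape (the table and column names are made up; for
delta runs there is also the entity-level deletedPkQuery attribute):

  <entity name="deleted_item" transformer="TemplateTransformer"
          query="SELECT id FROM items WHERE deleted = 1">
    <!-- rows carrying the magic $deleteDocById key delete that document -->
    <field column="$deleteDocById" template="${deleted_item.id}"/>
  </entity>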


mausch wrote:
> 
> Take a look at the DIH special commands:
> http://wiki.apache.org/solr/DataImportHandler#Special_Commands
> Some other
> options:
> http://stackoverflow.com/questions/1555610/solr-dih-how-to-handle-deleted-documents
> 
> Cheers,
> Mauricio
> 
> 2010/3/23 André Maldonado 
> 
>> Hy all.
>>
>> How can I delete documents when using DataImportHandler on a delta
>> import?
>>
>> Thank's
>>
>> "Então aproximaram-se os que estavam no barco, e adoraram-no, dizendo: És
>> verdadeiramente o Filho de Deus." (Mateus 14:33)
>>
> 
> 

-- 
View this message in context: 
http://old.nabble.com/DIH---Deleting-documents-tp28004771p28005199.html
Sent from the Solr - User mailing list archive at Nabble.com.



Impossible Boost Query?

2010-03-23 Thread blargy

I was wondering if this is even possible. I'll try to explain what I'm trying
to do to the best of my ability. 

Ok, so our site has a bunch of products that are sold by any number of
sellers. Currently when I search for some product I get back all products
matching that search term, but the problem is there may be multiple
products sold by the same seller that are all closely related, and
therefore their scores are related. So basically the search ends up with
results that are all closely clumped together by the same seller, but I
would much rather prefer to distribute these results across sellers
(giving each seller a fair shot to sell their goods).

Is there any way to add some boost query, for example, that will start
weighing products lower once their seller has already been listed a few
times? For example, right now I have:

Product foo by Seller A
Product foo by Seller A
Product foo by Seller A
Product foo by Seller B
Product foo by Seller B
Product foo by Seller B
Product foo by Seller C
Product foo by Seller C
Product foo by Seller C

where each result is very close in score. I would like something like this

Product foo by Seller A
Product foo by Seller B
Product foo by Seller C
Product foo by Seller A
Product foo by Seller B
Product foo by Seller C


basically distributing the results over the sellers. Is something like this
possible? I don't care if the solution involves a boost query or not. I just
want some way to distribute closely related documents.

Thanks!!!
-- 
View this message in context: 
http://old.nabble.com/Impossible-Boost-Query--tp28005354p28005354.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Impossible Boost Query?

2010-03-23 Thread blargy

Possibly. How can I install this as a contrib or do I need to actually
perform the patch?


Otis Gospodnetic wrote:
> 
> Would Field Collapsing from SOLR-236 do the job for you?
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Impossible-Boost-Query--tp28005354p2800.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Impossible Boost Query?

2010-03-23 Thread blargy

Maybe a better question is... how can I install this and will it work with
1.4?

Thanks


blargy wrote:
> 
> Possibly. How can I install this as a contrib or do I need to actually
> perform the patch?
> 
> 
> Otis Gospodnetic wrote:
>> 
>> Would Field Collapsing from SOLR-236 do the job for you?
>> 
>> Otis
>> 
>> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
>> Hadoop ecosystem search :: http://search-hadoop.com/
>> 
>> 
>> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Impossible-Boost-Query--tp28005354p28007880.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Impossible Boost Query?

2010-03-23 Thread blargy

Thanks, but I'm not quite sure how to apply the patch. I just use the
packaged solr-1.4.0.war in my deployment (no compiling, etc.). Is there a
way I can patch the war file?

Any instructions would be greatly appreciated. Thanks


Otis Gospodnetic wrote:
> 
> You'd likely want to get the latest patch and trunk and try applying.
> 
> Otis
> 
> Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
> Hadoop ecosystem search :: http://search-hadoop.com/
> 
> 
> 

-- 
View this message in context: 
http://old.nabble.com/Impossible-Boost-Query--tp28005354p28008495.html
Sent from the Solr - User mailing list archive at Nabble.com.



SOLR-236 patch with version 1.4

2010-03-23 Thread blargy

Is the field collapsing patch (236) not compatible with Solr 1.4?

$ patch -p0 -i ~/Desktop/SOLR-236.patch 
patching file src/test/test-files/solr/conf/solrconfig-fieldcollapse.xml
patching file
src/java/org/apache/solr/search/fieldcollapse/collector/DocumentGroupCountCollapseCollectorFactory.java
patching file
src/java/org/apache/solr/search/fieldcollapse/CollapseGroup.java
patching file
src/java/org/apache/solr/search/fieldcollapse/AdjacentDocumentCollapser.java
patching file src/java/org/apache/solr/search/DocSetAwareCollector.java
patching file src/java/org/apache/solr/search/SolrIndexSearcher.java
Hunk #1 FAILED at 17.
Hunk #2 FAILED at 530.
Hunk #3 FAILED at 586.
Hunk #4 FAILED at 610.
Hunk #5 FAILED at 663.
Hunk #6 FAILED at 705.
Hunk #7 FAILED at 716.
Hunk #8 FAILED at 740.
Hunk #9 FAILED at 1255.
9 out of 9 hunks FAILED -- saving rejects to file
src/java/org/apache/solr/search/SolrIndexSearcher.java.rej
patching file
src/java/org/apache/solr/handler/component/CollapseComponent.java
patching file
src/java/org/apache/solr/search/fieldcollapse/collector/CollapseCollectorFactory.java
patching file
src/java/org/apache/solr/search/fieldcollapse/collector/aggregate/AggregateFunction.java
patching file
src/test/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapserTest.java
patching file src/java/org/apache/solr/util/DocSetScoreCollector.java
patching file
src/java/org/apache/solr/search/fieldcollapse/AbstractDocumentCollapser.java
patching file
src/java/org/apache/solr/search/fieldcollapse/util/Counter.java
patching file
src/java/org/apache/solr/search/fieldcollapse/DocumentCollapser.java
patching file
src/solrj/org/apache/solr/client/solrj/response/FieldCollapseResponse.java
patching file
src/java/org/apache/solr/search/fieldcollapse/collector/FieldValueCountCollapseCollectorFactory.java
patching file
src/java/org/apache/solr/search/fieldcollapse/collector/CollapseCollector.java
patching file
src/test/org/apache/solr/search/fieldcollapse/DistributedFieldCollapsingIntegrationTest.java
patching file
src/test/org/apache/solr/client/solrj/response/FieldCollapseResponseTest.java
patching file
src/java/org/apache/solr/search/fieldcollapse/collector/aggregate/MaxFunction.java
patching file src/test/test-files/solr/conf/solrconfig.xml
Hunk #1 FAILED at 396.
Hunk #2 FAILED at 418.
2 out of 2 hunks FAILED -- saving rejects to file
src/test/test-files/solr/conf/solrconfig.xml.rej
patching file
src/java/org/apache/solr/search/fieldcollapse/collector/CollapseContext.java
patching file
src/java/org/apache/solr/search/fieldcollapse/collector/aggregate/MinFunction.java
patching file
src/solrj/org/apache/solr/client/solrj/response/QueryResponse.java
Hunk #1 FAILED at 17.
Hunk #2 FAILED at 42.
Hunk #3 FAILED at 58.
Hunk #4 FAILED at 125.
Hunk #5 FAILED at 298.
5 out of 5 hunks FAILED -- saving rejects to file
src/solrj/org/apache/solr/client/solrj/response/QueryResponse.java.rej
patching file src/test/test-files/fieldcollapse/testResponse.xml
patching file
src/java/org/apache/solr/search/fieldcollapse/NonAdjacentDocumentCollapser.java
patching file
src/java/org/apache/solr/search/fieldcollapse/collector/AbstractCollapseCollector.java
patching file src/java/org/apache/solr/handler/component/QueryComponent.java
Hunk #1 FAILED at 522.
1 out of 1 hunk FAILED -- saving rejects to file
src/java/org/apache/solr/handler/component/QueryComponent.java.rej
patching file
src/java/org/apache/solr/search/fieldcollapse/DocumentCollapseResult.java
patching file
src/test/org/apache/solr/handler/component/CollapseComponentTest.java
patching file
src/java/org/apache/solr/search/fieldcollapse/collector/DocumentFieldsCollapseCollectorFactory.java
patching file src/test/test-files/solr/conf/schema-fieldcollapse.xml
patching file src/common/org/apache/solr/common/params/CollapseParams.java
patching file src/solrj/org/apache/solr/client/solrj/SolrQuery.java
Hunk #1 FAILED at 17.
Hunk #2 FAILED at 50.
Hunk #3 FAILED at 76.
Hunk #4 FAILED at 148.
Hunk #5 FAILED at 197.
Hunk #6 FAILED at 665.
Hunk #7 FAILED at 721.
7 out of 7 hunks FAILED -- saving rejects to file
src/solrj/org/apache/solr/client/solrj/SolrQuery.java.rej
patching file
src/test/org/apache/solr/search/fieldcollapse/AdjacentCollapserTest.java
patching file
src/java/org/apache/solr/search/fieldcollapse/collector/AggregateCollapseCollectorFactory.java
patching file
src/java/org/apache/solr/search/fieldcollapse/collector/aggregate/SumFunction.java
patching file src/java/org/apache/solr/search/DocSetHitCollector.java
Hunk #1 FAILED at 17.
Hunk #2 FAILED at 28.
2 out of 2 hunks FAILED -- saving rejects to file
src/java/org/apache/solr/search/DocSetHitCollector.java.rej
patching file
src/java/org/apache/solr/search/fieldcollapse/collector/aggregate/AverageFunction.java
patching file
src/test/org/apache/solr/search/fieldcollapse/FieldCollapsingIntegrationTest.java

-- 
View this message in context: 
http://old.nabble.com/SOLR-236-patch-with-version-1.4-tp28008954p28008954.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Impossible Boost Query?

2010-03-24 Thread blargy

This sounds a little closer to what I want, but I don't want fully
randomized results.

How exactly does this field work? Is it more than just a simple random sort
(order by rand())? What would be nice is if I could randomize documents
within a certain score percentage of each other. Is this available?

Thanks



Lance Norskog-2 wrote:
> 
> Also, there is a 'random' type which generates random numbers. This
> might help you also.
> 
> On Tue, Mar 23, 2010 at 7:18 PM, Lance Norskog  wrote:
>> At this point (and for almost 3 years :) field collapsing is a source
>> patch. You have to check out the Solr trunk from the Apache subversion
>> server, apply the patch with the 'patch' command, and build the new
>> Solr with 'ant'.
>>

Field Collapsing SOLR-236

2010-03-24 Thread blargy

Has anyone had any luck with the field collapsing patch (SOLR-236) on Solr
1.4? I tried patching my version of 1.4 with no luck.
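
For reference, a sketch of the source-patch workflow Lance described
earlier (svn URL assumed; the patch targets a source checkout, not the
packaged war):

svn co http://svn.apache.org/repos/asf/lucene/solr/trunk solr-trunk
cd solr-trunk
patch -p0 -i ~/Desktop/SOLR-236.patch
ant dist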

Thanks
-- 
View this message in context: 
http://old.nabble.com/Field-Collapsing-SOLR-236-tp28019949p28019949.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: Impossible Boost Query?

2010-03-25 Thread Blargy

Ok, so this is basically just a random sort.

Is there any way I can get this to randomly sort documents that are closely
related, and not the rest of the results?
-- 
View this message in context: 
http://n3.nabble.com/Impossible-Boost-Query-tp472080p580214.html
Sent from the Solr - User mailing list archive at Nabble.com.


DIH best pratices question

2010-03-26 Thread Blargy

I have an items table on db1 and an item_descriptions table on db2.

The items table is very small in the sense that it has small columns, while
the item_descriptions table has a very large text field column. Both tables
are around 7 million rows.

What is the best way to import these into one document?

<document>
  <entity name="item" dataSource="db1" query="SELECT * FROM items">
    <entity name="item_description" dataSource="db2"
            query="SELECT * FROM item_descriptions WHERE item_id = '${item.id}'"/>
  </entity>
</document>

Or

<document>
  <entity name="item_description" dataSource="db2"
          query="SELECT * FROM item_descriptions">
    <entity name="item" dataSource="db1"
            query="SELECT * FROM items WHERE id = '${item_description.item_id}'"/>
  </entity>
</document>

Or is there an alternative way? Maybe using the second way with a
CachedSqlEntityProcessor for the item entity?

Any thoughts are greatly appreciated. Thanks!
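
A hedged sketch of that CachedSqlEntityProcessor variant, following the
where= syntax on the DIH wiki (column and data source names assumed):

<entity name="item_description" dataSource="db2"
        query="SELECT item_id, description FROM item_descriptions">
  <entity name="item" dataSource="db1"
          processor="CachedSqlEntityProcessor"
          query="SELECT * FROM items"
          where="id=item_description.item_id"/>
</entity>
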
-- 
View this message in context: 
http://n3.nabble.com/DIH-best-pratices-question-tp677568p677568.html
Sent from the Solr - User mailing list archive at Nabble.com.


Multicore process

2010-03-28 Thread Blargy

I was hoping someone could explain to me how your Solr multicore process
currently operates.

This is what I am thinking about and I was hoping I could get some
ideas/suggestions. 

I have a master/slave setup where the master will be doing all the indexing
via DIH. I'll be doing a full-import every day or two, with delta-imports
being run throughout the day. I want to have an offline core that will be
responsible for the full importing, and when it finishes it will be swapped
with the live core. While the full-import may take a few hours on the
offline core, I'll have delta-imports running on the live core. All slaves
will be replicating from the master's live core. Any comments on this logic?

Ok, now to the implementation. I've been playing around with the core admin
all day today but I'm still unsure of the best way to accomplish the above
process. I'm guessing first I need to create a new core, then issue a DIH
full-import against it, then run a swap command against the offline and
live cores, which should switch them. This sounds about right, but then
I'll have a core named live which will not actually be live anymore, right?
Is there any way around this?
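
For reference, a hedged sketch of that sequence against the DIH and
CoreAdmin handlers (host and core names assumed):

http://localhost:8983/solr/offline/dataimport?command=full-import
http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=offline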

When setting up the new core what should I use for my instanceDir and
dataDir? At first I had something like this

home/items/data/live/index
home/items/data/offline/index

but I don't think this is right. Should I have something like this?

home/items/data/index
home/items-offline/data/index

When creating a new core from an existing core do the index files get
copied? 

Can someone please explain to me this whole process. Thanks!



-- 
View this message in context: 
http://n3.nabble.com/Multicore-process-tp681929p681929.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multicore process

2010-03-28 Thread Blargy

Also, how do I share the same schema and config files?
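
For the archive: two cores can point at the same instanceDir (and therefore
the same schema.xml and solrconfig.xml) while keeping separate dataDirs; a
minimal sketch, names assumed:

<core name="live" instanceDir="items" dataDir="data/live"/>
<core name="offline" instanceDir="items" dataDir="data/offline"/>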
-- 
View this message in context: 
http://n3.nabble.com/Multicore-process-tp681929p681936.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multicore process

2010-03-28 Thread Blargy

Mark, first off, thanks for the response. I'm glad someone is around today ;)

So this is what I have so far:

<cores adminPath="/admin/cores">
  <core name="live" instanceDir="items" dataDir="data/live"/>
  <core name="offline" instanceDir="items" dataDir="data/offline"/>
</cores>

So my directory structure is:

home/items/data/live/index
home/items/data/offline/index

So after playing around I see that swap literally just swaps the dataDir in
solr.xml. I have persistent=true so it saves which core is pointing to
which dataDir. Where I think I am a little confused is the naming
convention I used above. In this type of setup there is no such thing as a
live or offline dataDir, as at any point either can be one or the other...
the core name is what really matters. So I'm guessing this naming
convention makes a little more sense:

<cores adminPath="/admin/cores">
  <core name="live" instanceDir="items" dataDir="data/core0"/>
  <core name="offline" instanceDir="items" dataDir="data/core1"/>
</cores>

Since the actual dataDir name really doesn't mean anything. Is this the
correct reasoning?
-- 
View this message in context: 
http://n3.nabble.com/Multicore-process-tp681929p682088.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multicore process

2010-03-28 Thread Blargy

Ok great... it's starting to make sense. Now I'm just a little confused
about replication.

So I had previously had my slave configuration as follows:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
  <lst name="slave">
    <str name="masterUrl">http://${replication.host}:8983/solr/${solr.core.instanceDir}replication</str>
    <str name="pollInterval">${replication.interval}</str>
  </lst>
</requestHandler>

But I'm assuming I'll need to change this now? I really only want my "live"
data to be replicated, so how can I configure this? There is no real need
for the slaves to replicate the "offline" data.

FYI my dir structure looks like this:

home/items/data/core0/index
home/items/data/core1/index

-- 
View this message in context: 
http://n3.nabble.com/Multicore-process-tp681929p682141.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multicore process

2010-03-28 Thread Blargy

I just thought about this...

I'm guessing my slaves should always be replicating from the "live" master
core: (http://localhost:8983/solr/items-live/replication). 

So my master solr will have a directory structure like this:

home/items/data/core0/index
home/items/data/core1/index

and at any point the "live" core could be physically located at core0 or
core1

Whereas my slave solr will have a directory structure like this:
home/items/data/index

Is this close?



-- 
View this message in context: 
http://n3.nabble.com/Multicore-process-tp681929p682149.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multicore process

2010-03-28 Thread Blargy

Nice. Almost there...

So it appears then that I will need two different solr.xml configurations.
One for the master defining core0 and core1 and one for the slave with the
default configuration. Is there any way to specify master/slave-specific
settings in solr.xml, or will I have to have 2 different versions?

Not as big of a deal but in the future when I have more than 1 type of
document (currently "items") how would I configure solrconfig.xml for
replication? For example, I have this as of now:

<lst name="slave">
  <str name="masterUrl">http://localhost:8983/solr/items-live/replication</str>
</lst>

Which is fine... but what happens when I have another object, say "users"?

<lst name="slave">
  <str name="masterUrl">http://localhost:8983/solr/users-live/replication</str>
</lst>

I guess when it comes down to that I will have to have 2 different versions
of solrconfig.xml too?

ps. I can't thank you enough for your time
-- 
View this message in context: 
http://n3.nabble.com/Multicore-process-tp681929p682176.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multicore process

2010-03-28 Thread Blargy

Thanks, that makes perfect sense for solrconfig.xml; however, I don't see
that sort of functionality for solr.xml.

I'm guessing I'll need to manage 2 different versions of solr.xml:

Version 1 (master):

<cores adminPath="/admin/cores">
  <core name="items-live" instanceDir="items" dataDir="data/core0"/>
  <core name="items-offline" instanceDir="items" dataDir="data/core1"/>
</cores>

Version 2 (slave):

<cores adminPath="/admin/cores">
  <core name="items" instanceDir="items"/>
</cores>


And my app will always be pointing to http://slave-host:8983/solr/items

This isn't the biggest deal, but if there is a better/alternative way I
would love to know.

Mark, I see you work for LucidImagination. Does the Lucid Solr distribution
happen to come with the SOLR-236 patch (field collapsing)? I know it has
some extras thrown in there, but I'm not quite sure of the exact nature of
it. I'm already using the LucidKStemmer ;)
-- 
View this message in context: 
http://n3.nabble.com/Multicore-process-tp681929p682205.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multicore process

2010-03-28 Thread Blargy


Mark Miller-3 wrote:
> 
> Hmmm...but isn't your slave on a different machine? Every install is
> going to need a solr.xml, no way around that..
> 

Of course it's on another machine. I was just hoping to have only 1 version
of solr.xml checked into our source control, and to be able to change which
configuration to use by passing some sort of Java property on the command
line. Like I said, it's no real problem... I'm just getting picky now ;)
I'll just have to make sure that during the deploy the correct
configuration gets copied to home/solr.xml.
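
A sketch of that deploy-time copy, with repository file names assumed:

cp config/solr-master.xml $SOLR_HOME/solr.xml   # on the master
cp config/solr-slave.xml $SOLR_HOME/solr.xml    # on each slave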

Thanks again!


-- 
View this message in context: 
http://n3.nabble.com/Multicore-process-tp681929p682225.html
Sent from the Solr - User mailing list archive at Nabble.com.


Optimize after delta-import (DIH)

2010-03-29 Thread Blargy

According to the wiki: http://wiki.apache.org/solr/DataImportHandler#Commands
the delta-import command will accept the same clean, commit and optimize
parameters that the full-import command takes, but my index keeps saying
it's not optimized.

[java] INFO: [items] webapp=/solr path=/dataimport
params={optimize=true&clean=true&commit=true&command=delta-import} status=0
QTime=1 

Also, can someone explain to me exactly what the clean parameter does? The
wiki states: "Tells whether to clean up the index before the indexing is
started", but that's kind of vague. What does it actually do?

Thanks
-- 
View this message in context: 
http://n3.nabble.com/Optimize-after-delta-import-DIH-tp685147p685147.html
Sent from the Solr - User mailing list archive at Nabble.com.


DIH after import hooks

2010-03-30 Thread Blargy

Can you use a RunExecutableListener with DIH to run external scripts after a
full-import/delta-import, just like you can on the DirectUpdateHandler2?

If not, is there any alternative way to achieve this functionality? Thanks
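
For reference, the listener wiring in question on the update handler side
(executable path assumed); whether DIH's end-of-import commit triggers it is
exactly what's being asked:

<updateHandler class="solr.DirectUpdateHandler2">
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">solr/bin/snapshooter</str>
    <str name="dir">.</str>
    <bool name="wait">true</bool>
  </listener>
</updateHandler>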
-- 
View this message in context: 
http://n3.nabble.com/DIH-after-import-hooks-tp686482p686482.html
Sent from the Solr - User mailing list archive at Nabble.com.


MoreLikeThis function queries

2010-04-01 Thread Blargy

Are function queries possible using the MLT request handler? How about using
the _val_ hack? Thanks for your help
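
For reference, the _val_ hack against the standard handler looks like this
(field name assumed); whether the MLT handler honors it is the open
question:

q=ipod _val_:"recip(rord(created_at),1,1000,1000)"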
-- 
View this message in context: 
http://n3.nabble.com/MoreLikeThis-function-queries-tp692377p692377.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MoreLikeThis function queries

2010-04-02 Thread Blargy

Bueller? Anyone? :)
-- 
View this message in context: 
http://n3.nabble.com/MoreLikeThis-function-queries-tp692377p693648.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MoreLikeThis function queries

2010-04-02 Thread Blargy

Fair enough :)
-- 
View this message in context: 
http://n3.nabble.com/MoreLikeThis-function-queries-tp692377p693872.html
Sent from the Solr - User mailing list archive at Nabble.com.


Related terms/combined terms

2010-04-02 Thread Blargy

Not sure of the exact vocabulary I am looking for so I'll try to explain
myself.

Given a search term, is there any way to return a list of related/grouped
keywords (based on the current state of the index) for that term?

For example, say I have a sports catalog and I search for "Callaway". Is
there anything that could give me back

"Callaway Driver"
"Callaway Golf Balls"
"Callaway Hat"
"Callaway Glove"

since these words are always grouped together/related. Not sure if
something like this is even possible.

Thanks

-- 
View this message in context: 
http://n3.nabble.com/Related-terms-combined-terms-tp694083p694083.html
Sent from the Solr - User mailing list archive at Nabble.com.


an OR filter query

2010-04-04 Thread Blargy

Is there any way to use a filter query as an OR clause?

For example I have product listings and I want to be able to filter out
mature items by default. To do this I added:

<lst name="appends">
  <str name="fq">mature:false</str>
</lst>

But then I can never return any mature items, because appending
fq=mature:true will obviously return 0 results; no item can be both
mature and non-mature.

I can get around this using defaults:

<lst name="defaults">
  <str name="fq">mature:false</str>
</lst>

But this is a little hacky, because any time I want to include mature items
with non-mature items I need to explicitly set fq to a blank string.
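
One hedged alternative: keep mature:false in the defaults and override with
an explicit catch-all filter instead of a blank string:

fq=mature:(true OR false)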

So is there any better way to do this? Thanks
-- 
View this message in context: 
http://n3.nabble.com/an-OR-filter-query-tp696579p696579.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: MoreLikeThis function queries

2010-04-05 Thread Blargy

Ok, it's now Monday and everyone should have had their nice morning cup of
coffee :)
-- 
View this message in context: 
http://n3.nabble.com/MoreLikeThis-function-queries-tp692377p698304.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Related terms/combined terms

2010-04-05 Thread Blargy

Thanks for the response Mitch. 

I'm not too sure how well this will work for my needs, but I'll certainly
play around with it. I think something more along the lines of Ahmet's
solution is what I was looking for.
-- 
View this message in context: 
http://n3.nabble.com/Related-terms-combined-terms-tp694083p698327.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Related terms/combined terms

2010-04-05 Thread Blargy

Ahmet thanks, this sounds like what I was looking for. 

Would one recommend using the TermsComponent prefix search or the faceted
prefix search for this sort of functionality? I know that for auto-suggest
functionality the general consensus has been leaning towards the faceted
prefix search over the TermsComponent. Wondering if this holds true for
this use case.
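
For the archive, a hedged sketch of the faceted prefix approach (field name
assumed; the field should be indexed lowercased for prefix matching):

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=title_prefix&facet.prefix=callaway&facet.limit=10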

Thanks again
-- 
View this message in context: 
http://n3.nabble.com/Related-terms-combined-terms-tp694083p698349.html
Sent from the Solr - User mailing list archive at Nabble.com.


DIH multiple queries per sub-entity?

2010-04-06 Thread Blargy

I am going through some of my DIH verbose output and I noticed that for
each sub-entity it appears to query the DB multiple times, and the count
keeps increasing in a linear fashion!

For example:

<lst name="document#1">
  ...
  <lst name="entity:item_categories">
    <str name="query">select * from item_categories where item_id=1</str>
  </lst>
  ...
</lst>

<lst name="document#2">
  ...
  <lst name="entity:item_categories">
    <str name="query">select * from item_categories where item_id=2</str>
    <str name="query">select * from item_categories where item_id=2</str>
  </lst>
  ...
</lst>


Notice how document#2 has two queries for item_categories, document#3 has
3... document#1000 has 1000 queries. Is this normal, or is this just how
the output is displayed?
-- 
View this message in context: 
http://n3.nabble.com/DIH-multiple-queries-per-sub-entity-tp701038p701038.html
Sent from the Solr - User mailing list archive at Nabble.com.


Bucketing a price field

2010-04-06 Thread Blargy

What would be the best way to do range bucketing on a price field? 

I'm sort of taking the example from the Solr 1.4 book and I was thinking
about using a PatternTokenizerFactory with a SynonymFilterFactory.

Is there a better way?

Thanks
-- 
View this message in context: 
http://n3.nabble.com/Bucketing-a-price-field-tp701801p701801.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Bucketing a price field

2010-04-07 Thread Blargy

Duh, didn't even think of that. This will probably be the easiest way for
now, since we are only using a small number of predefined ranges.
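
For the archive, a hedged sketch of predefined ranges via facet queries
(field name and cutoffs assumed):

facet=true&facet.query=price:[* TO 25]&facet.query=price:[25 TO 100]&facet.query=price:[100 TO *]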

Thanks for the reply
-- 
View this message in context: 
http://n3.nabble.com/Bucketing-a-price-field-tp701801p703169.html
Sent from the Solr - User mailing list archive at Nabble.com.


Best practice to handle misspellings

2010-04-07 Thread Blargy

What is the best way to handle misspellings? Completely ignore them and
suggest alternative searches, or do some sort of fuzzy matching?

Also, is it possible to use fuzzy matching with the dismax request handler?
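
For reference, fuzzy matching with the standard query parser (term and
similarity threshold assumed):

q=calloway~0.7

As far as I know, the classic dismax parser escapes Lucene operators, so
the ~ syntax isn't available there directly.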

Thanks
-- 
View this message in context: 
http://n3.nabble.com/Best-practice-to-handle-misspellings-tp704006p704006.html
Sent from the Solr - User mailing list archive at Nabble.com.


Need help with StackOverflowError

2010-04-07 Thread Blargy

My last few delta-imports via DIH have been failing with a
StackOverflowError. Has anyone else encountered this while trying to
import? I don't even see any relevant information in the stack trace. Can
anyone lend some suggestions? Thanks...

Apr 7, 2010 2:13:34 PM org.apache.solr.handler.dataimport.DataImporter
doDeltaImport
SEVERE: Delta Import Failed
java.lang.StackOverflowError
at sun.nio.cs.UTF_8$Decoder.decodeLoop(UTF_8.java:324)
at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:561)
at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:158)
at java.lang.StringCoding.decode(StringCoding.java:191)
at java.lang.String.<init>(String.java:451)
at java.lang.String.<init>(String.java:523)
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3296)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:1941)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2114)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2690)
at
com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1545)
at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:201)
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7624)
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:908)
at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2364)
at
com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1583)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4454)
at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1359)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2723)
at
com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1545)
at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:201)
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7624)
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:908)
at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2364)
at
com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1583)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4454)
at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1359)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2723)
at
com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1545)
at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:201)
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7624)
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:908)
at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2364)
at
com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1583)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4454)
at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1359)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2723)
at
com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1545)
at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:201)
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7624)
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:908)
at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2364)
at
com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1583)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4454)
at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1359)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2723)
at
com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1545)
at com.mysql.jdbc.RowDataDynamic.close(RowDataDynamic.java:201)
at com.mysql.jdbc.ResultSetImpl.realClose(ResultSetImpl.java:7624)
at com.mysql.jdbc.ResultSetImpl.close(ResultSetImpl.java:908)
at com.mysql.jdbc.StatementImpl.realClose(StatementImpl.java:2364)
at
com.mysql.jdbc.ConnectionImpl.closeAllOpenStatements(ConnectionImpl.java:1583)
at com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4454)
at com.mysql.jdbc.ConnectionImpl.cleanup(ConnectionImpl.java:1359)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2723)
at
com.mysql.jdbc.StatementImpl.executeSimpleNonQuery(StatementImpl.java:1545)
at com.mysql.jdbc.

Re: Need help with StackOverflowError

2010-04-07 Thread Blargy

If it helps at all to mention, I manually updated the last_index_time in
conf/dataimport.properties so I could select a smaller subset, and the
delta-import worked, which leads me to believe there is nothing wrong with
my DIH delta queries themselves. There must be something in my dataset that
ends up causing this circular recursion?

Any thoughts?
-- 
View this message in context: 
http://n3.nabble.com/Need-help-with-StackOverflowError-tp704451p705022.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Need help with StackOverflowError

2010-04-08 Thread Blargy

Also, if I remove the deletedPkQuery on the root entity, the delta-import
will complete successfully. Does anyone have any idea how a deletedPkQuery
could end up in this circular StackOverflowError?

FYI.

I have a logical model called "item" and whenever an item gets deleted it
gets moved over to the "deleted_items" table. Here is my deletedPkQuery

... deletedPkQuery="select id from deleted_items where updated_on >
'${dataimporter.last_index_time}'"
-- 
View this message in context: 
http://n3.nabble.com/Need-help-with-StackOverflowError-tp704451p706618.html
Sent from the Solr - User mailing list archive at Nabble.com.

