questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-15 Thread jiag
Hello everyone :)

I have a product called "xbox" indexed, and when a user searches for
either "x-box" or "x box", I want the "xbox" product to be
returned. I'm new to Solr, and from reading online, I gathered that I need
to use WordDelimiterFilterFactory for the "x-box" case and
WordBreakSolrSpellChecker for the "x box" case. Is this correct?
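
For the "x-box" case, a simplified model of what a word-delimiter style filter does to a single token may help show why catenation matters. This is only a sketch in Python, not Solr's implementation: the function name and its two flags are hypothetical stand-ins for the catenateWords and preserveOriginal options, and all other options are ignored.

```python
import re


def word_delimiter(token, catenate_words=True, preserve_original=True):
    """Simplified, hypothetical model of a word-delimiter filter for one token."""
    # Split on runs of non-alphanumeric characters such as "-".
    parts = [p for p in re.split(r"[^A-Za-z0-9]+", token) if p]
    out = []
    if preserve_original:
        out.append(token)
    out.extend(parts)
    if catenate_words and len(parts) > 1:
        # Join the parts back together: "x" + "box" becomes "xbox".
        out.append("".join(parts))
    # Drop duplicates while keeping the original order.
    seen = set()
    return [t for t in out if not (t in seen or seen.add(t))]


print(word_delimiter("x-box"))  # ['x-box', 'x', 'box', 'xbox']
print(word_delimiter("xbox"))   # ['xbox']
```

In this model the query "x-box" and the indexed "xbox" share the token "xbox", so they can match. That only holds if the filter also runs at query time and actually receives the whole "x-box" token. A tokenizer that already splits on the hyphen would leave nothing to catenate.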

(1) In my schema file, this is what I changed:


But the "xbox" product is not returned when the search term is
"x-box", so I must have missed something.

(2) I tried to use WordBreakSolrSpellChecker together with
DirectSolrSpellChecker as shown below, but the WordBreakSolrSpellChecker
is never used:


wc_textSpell


  default
  spellCheck
  solr.DirectSolrSpellChecker
  internal
0.3
2
1
5
3
0.01
0.004

 
wordbreak
solr.WordBreakSolrSpellChecker
spellCheck
true
true
10
  
  

  

SpellCheck
true
default
wordbreak
 true
false
10
true
false


  wc_spellcheck

  

I tried to build the dictionary this way:
http://localhost/solr/coreName/select?spellcheck=true&spellcheck.build=true
but the response returned is this:


0
0

true
true


build



What's the correct way to build the dictionary?
Even though my requestHandler has name="/spellcheck", I wasn't able to
use
http://localhost/solr/coreName/spellcheck?spellcheck=true&spellcheck.build=true
Is there something wrong with my definition above?
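
As a sketch of how such a request could be assembled, the Python snippet below builds a spellcheck URL that names both dictionaries and includes an actual query. The helper name is hypothetical and the host, core, and handler path are taken from the URLs above; whether the "/spellcheck" path responds depends on how the handler is registered. The spellcheck, spellcheck.dictionary, and spellcheck.build parameters are standard Solr request parameters. As far as I understand, DirectSolrSpellChecker and WordBreakSolrSpellChecker both read from the main index, so a separate build step may not be needed for them at all.

```python
from urllib.parse import urlencode


def spellcheck_url(base, handler, query, dictionaries, build=False):
    """Assemble a Solr spellcheck request URL (hypothetical helper)."""
    params = [("q", query), ("spellcheck", "true")]
    # One spellcheck.dictionary parameter per dictionary to consult.
    params += [("spellcheck.dictionary", name) for name in dictionaries]
    if build:
        params.append(("spellcheck.build", "true"))
    return f"{base}{handler}?{urlencode(params)}"


url = spellcheck_url(
    "http://localhost/solr/coreName",
    "/spellcheck",
    "x box",
    ["default", "wordbreak"],
)
print(url)
```

Sending a real q value alongside spellcheck=true makes it easier to see in the response whether either dictionary produced suggestions.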

(3) I also tried to use WordBreakSolrSpellChecker without the
DirectSolrSpellChecker as shown below:


  wc_textSpell

default
solr.WordBreakSolrSpellChecker
spellCheck
true
true
10
  
   

   

SpellCheck
true
default

 true
false
10
true
false


  wc_spellcheck

  

I am still unable to see WordBreakSolrSpellChecker being called anywhere.
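
For the "x box" case, the idea behind combining words can be modelled roughly as follows. This is a simplified Python sketch under my own assumptions, not the WordBreakSolrSpellChecker code: the function name, the `limit` parameter, and the in-memory set of index terms are all hypothetical.

```python
def combine_suggestions(query_terms, index_terms, limit=10):
    """Suggest joined forms of adjacent query terms that exist in the index."""
    suggestions = []
    for start in range(len(query_terms)):
        combined = query_terms[start]
        for end in range(start + 1, len(query_terms)):
            # Grow the candidate by one adjacent term at a time.
            combined += query_terms[end]
            if combined in index_terms:
                suggestions.append((query_terms[start:end + 1], combined))
                if len(suggestions) >= limit:
                    return suggestions
    return suggestions


print(combine_suggestions(["x", "box"], {"xbox", "playstation"}))
# [(['x', 'box'], 'xbox')]
```

Note that in this model the checker only proposes "xbox" as a suggestion. The search results themselves do not change unless the suggestion is applied and the query is run again, so the place to look for its effect is the spellcheck section of the response.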

Would someone kindly help me?

Many thanks,
Jia


Re: questions on Solr WordBreakSolrSpellChecker and WordDelimiterFilterFactory

2014-07-17 Thread jiag
Hi Ahmet,

Using either of the two didn't make any difference. I am still running
into the same issues as before :(

Thanks,
Jia

On 7/16/2014, "Ahmet Arslan"  wrote:

>Hi Jia,
>
>What happens when you use 
>
> 
>
>instead of 
>
> 
>
>Ahmet
>
>
>On Wednesday, July 16, 2014 3:07 AM, "j...@ece.ubc.ca"  wrote:
>
>
>
>Hello everyone :)
>
>I have a product called "xbox" indexed, and when the user search for
>either "x-box" or "x box" i want the "xbox" product to be
>returned.  I'm new to Solr, and from reading online, I thought I need
>to use WordDelimiterFilterFactory for "x-box" case, and
>WordBreakSolrSpellChecker for "x box" case. Is this correct?
>
>(1) In my schema file, this is what I changed:
>generateNumberParts="1" catenateWords="1" catenateNumbers="1"
>catenateAll="1" splitOnCaseChange="0" preserveOriginal="1"/>
>
>But I don't see the xbox product returned when the search term is
>"x-box", so I must have missed something
>
>(2) I tried to use  WordBreakSolrSpellChecker together with
>DirectSolrSpellChecker as shown below, but the WordBreakSolrSpellChecker
>never got used:
>
>class="solr.SpellCheckComponent">
>    wc_textSpell
>
>    
>      default
>      spellCheck
>      solr.DirectSolrSpellChecker
>      internal
>          0.3
>            2
>            1
>            5
>            3
>            0.01
>            0.004
>    
>
>    wordbreak
>    solr.WordBreakSolrSpellChecker
>    spellCheck
>    true
>    true
>    10
>  
>  
>
>  class="org.apache.solr.handler.component.SearchHandler">
>    
>        SpellCheck
>        true
>       default
>        wordbreak
>         true
>       false
>       10
>       true
>       false
>    
>    
>      wc_spellcheck
>    
>  
>
>I tried to build the dictionary this way:
>http://localhost/solr/coreName/select?spellcheck=true&spellcheck.build=true,
>but the response returned is this:
>
>
>0
>0
>
>true
>true
>
>
>build
>
>
>
>What's the correct way to build the dictionary?
>Even though my requestHandler's name="/spellcheck", i wasn't able to
>use
>http://localhost/solr/coreName/spellcheck?spellcheck=true&spellcheck.build=true
>.. is there something wrong with my definition above?
>
>(3) I also tried to use WordBreakSolrSpellChecker without the
>DirectSolrSpellChecker as shown below:
>class="solr.SpellCheckComponent">
>
>  wc_textSpell
>    
>    default
>    solr.WordBreakSolrSpellChecker
>    spellCheck
>    true
>    true
>    10
>  
>   
>
>   class="org.apache.solr.handler.component.SearchHandler">
>    
>        SpellCheck
>        true
>       default
>        
>         true
>       false
>       10
>       true
>       false
>    
>    
>      wc_spellcheck
>    
>  
>
>And still unable to see WordBreakSolrSpellChecker being called anywhere.
>
>Would someone kindly help me?
>
>Many thanks,
>Jia
>


how to combine solr join with boost in Edismax query?

2014-08-25 Thread jiag
Hello everyone :)

I have an index for groupId and one for product. For an input search
keyword, I only want to boost the result if the keyword appears in both
groupId and product indices.
I was able to get Solr join with fq to work with the following syntax:
example: q=searchTerm&fq={!join from=id_1 to=id_2
fromIndex=groupId}searchTerm

However, I want to use the Solr join with bf or bq. Does anyone have
suggestions on how to make it work?
(I also use qf, pf, and ps.)

I tried the following but failed:
q=searchTerm&bf=({!join from=id_1 to=id_2
fromIndex=groupId}searchTerm)^100

q=searchTerm&bq=({!join from=id_1 to=id_2
fromIndex=groupId}searchTerm)^100
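
One form that might be worth trying is to wrap the join in Solr's nested-query syntax (the `_query_` pseudo-field), since local params such as {!join ...} are generally only recognised at the very start of a query string and not inside parentheses. The Python sketch below only assembles the request parameters; the helper name is hypothetical, the field and index names come from the examples above, and I have not verified this against a running Solr instance. Since bf expects a function query, bq looks like the more natural place for a join.

```python
from urllib.parse import urlencode


def boosted_join_params(term, boost=100):
    """Build edismax parameters with a join wrapped as a nested boost query."""
    join = f"{{!join from=id_1 to=id_2 fromIndex=groupId}}{term}"
    # Escape backslashes and double quotes so the join fits inside "...".
    escaped = join.replace("\\", "\\\\").replace('"', '\\"')
    return {
        "defType": "edismax",
        "q": term,
        "bq": f'_query_:"{escaped}"^{boost}',
    }


params = boosted_join_params("searchTerm")
print(params["bq"])
print(urlencode(params))
```

The qf, pf, and ps parameters would be added to the same dictionary before encoding.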

Many thanks
jia


UUIDUpdateProcessorFactory causes repeated documents when uploading csv files?

2015-01-02 Thread jiag
Happy New Year Everyone :)

I am trying to automatically generate a document ID when indexing a CSV
file that contains multiple lines of documents. The desired case: if the
CSV file contains 2 lines (each line is a document), then the index
should contain 2 documents.

What I observed: if the CSV file contains 2 lines, then the index
contains 3 documents, because the 1st document is repeated once. An
example output:

 doc1 
 rank1 
 randomlyGeneratedId1


 doc1 
 rank1 
 randomlyGeneratedId2


 doc2 
 rank2 
 randomlyGeneratedId3


And if the CSV file contains 3 lines, then the index contains 6 documents,
because document 1 appears 3 times and document 2 appears twice,
as follows:

 doc1 
 rank1 
 randomlyGeneratedId1


 doc1 
 rank1 
 randomlyGeneratedId2


 doc2 
 rank2 
 randomlyGeneratedId3

 doc1 
 rank1 
 randomlyGeneratedId4


 doc2 
 rank2 
 randomlyGeneratedId5


 doc3 
 rank3 
 randomlyGeneratedId6


Here's what I have done:
1. In my solrConfig:


doc_key





   
autoGenId
   
  
2. in schema.xml:



 id

This problem doesn't occur when I assign an ID field myself instead of using
the UUIDUpdateProcessorFactory, so I assume the problem lies there. It looks
as if the CSV file is processed one line at a time and the index records
every intermediate step, so each previous line is repeated in the output.
Is there a way to keep only the final result rather than the
accumulated previous lines, so that the total number of indexed
documents matches the number of documents in the CSV file?
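
Since assigning the ID explicitly avoids the duplicates, one possible workaround is to add the UUIDs to the CSV before uploading it. This is a minimal Python sketch of that workaround, not a fix for the processor chain itself. It assumes the CSV has a header row and that the unique key field is called "id"; the function name and the column names in the sample data are hypothetical.

```python
import csv
import io
import uuid


def add_ids(csv_text, id_field="id"):
    """Return the CSV text with a random UUID column prepended to every row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = [id_field] + list(reader.fieldnames)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # One fresh ID per input line, so N lines give N documents.
        row[id_field] = str(uuid.uuid4())
        writer.writerow(row)
    return out.getvalue()


sample = "doc_key,rank\ndoc1,rank1\ndoc2,rank2\n"
print(add_ids(sample))
```

With the IDs generated on the client side, the file can be uploaded without the UUID update chain.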

Many thanks,
Jia