Re: SolrCloud without NRT and indexing only on the master

2014-07-31 Thread Ramkumar R. Aiyengar
I agree with Erick that the gain you are looking at might not be worth it, so
do measure and see if there's a difference.

Also, the next release of Solr will have some significant improvements
when it comes to CPU usage under heavy indexing load, and we have had at
least one anecdote so far where the throughput increased by an order of
magnitude, so one option might be to try that out as well. See
SOLR-6136 and potentially SOLR-6259 (probably less so; it depends on your
schema) if you want to try them out before the release.

Another option is to use the HDFS directory support in Solr. That way
you can build indices offline and make them available to all your Solr
replicas for search. See batch indexing at
http://www.cloudera.com/content/cloudera-content/cloudera-docs/Search/latest/Cloudera-Search-User-Guide/csug_introducing.html
On 30 Jul 2014 11:54, "Harald Kirsch"  wrote:

> Hi Daniel,
>
> well, I assume there is a performance difference on host B between
>
> a) getting some ready-made segments from host A (master, taking care of
> indexing) to host B (slave, taking care of answering queries)
>
> and
>
> b) host B (along with host A) doing all the work necessary to turn
> incoming SolrDocument objects into a segment and make it searchable.
>
> I am talking here about a setup where during peak loads the CPUs on host B
> are sweating at >80% and I assume the following:
>
> i) Indexing will draw more than 20% CPU. Thereby it would start competing
> with query answering
>
> ii) Merely copying finished segments to the query-answering node will not
> draw more than 20% CPU and will thereby not compete with query answering.
>
> Index consistency is not an issue, because the number of documents and the
> number of different, hard-to-get-at sources we will be indexing will always
> be out-of-sync with the index. Adding an hour or two here is the least of
> my problems.
>
> Harald.
>
> On 30.07.2014 11:58, Daniel Collins wrote:
>
>> Working backwards slightly, what do you think SolrCloud is going to give
>> you, apart from the consistency of the index (which you want to turn off)?
>>   What are "all the other benefits of SolrCloud", if you are querying
>> separate instances that aren't guaranteed to be in sync (since you want to
>> use the traditional-style master-slave for indexing)?
>>
>> And secondly, why don't you want to use SolrCloud for indexing everywhere?
>>   Again, what do you think master-slave methodology gains you?  You have
>> said you want all the resources of the slaves to be for querying, which
>> makes sense, but the slaves have to get the new updates somehow, surely?
>> Whether that is from SolrCloud directly, or via master-slave replication,
>> the work has to be done at some point?
>>
>> If you don't have NRT, and you set your commit frequency to something
>> reasonably large, then I don't see the "cost" of SolrCloud, but I guess it
>> depends on the frequency of your updates.
>>
>>
>> On 30 July 2014 08:22, Harald Kirsch  wrote:
>>
>>  Thanks Erick,
>>>
>>> for the confirmation.
>>>
>>> You say "traditional" but the docs call it "legacy". Not being a native
>>> speaker I might misinterpret the meaning slightly, but to me it conveys
>>> the notion of "don't use this stuff if you don't have to".
>>>
>>>
>>> "SolrCloud indexes to all nodes all the time, there's no real way to turn
>>> that off."
>>>
>>> which is really a pity when only query-load must be scaled and NRT is not
>>> necessary. :-/
>>>
>>> Harald.
>>>
>>>
>>> On 29.07.2014 18:16, Erick Erickson wrote:
>>>
>>>  bq: What if I don't need NRT and in particular want the slave to use all
 resources for query answering, i.e. only the master shall index. But at the
 same time I want all the other benefits of SolrCloud.

 You want all the benefits of SolrCloud without... using SolrCloud?

 Your only two choices are traditional master/slave or SolrCloud. SolrCloud
 indexes to all nodes all the time, there's no real way to turn that off.
 You _can_ control the frequency of commits but you can't turn off the
 indexing to all the nodes.

 FWIW,
 Erick


 On Tue, Jul 29, 2014 at 5:41 AM, Mikhail Khludnev <mkhlud...@griddynamics.com> wrote:

> I never did it, but always liked this approach:
>
> http://lucene.472066.n3.nabble.com/Best-practice-for-rebuild-index-in-SolrCloud-td4054574.html
>
> From time to time such recipes are mentioned in the list.
>
>
> On Tue, Jul 29, 2014 at 12:39 PM, Harald Kirsch <harald.kir...@raytion.com>
> wrote:
>
>> Hi all,
>>
>> from the Solr documentation I find two options for how replication of an
>> index is handled:
>>
>> a) SolrCloud indexes on master and all slaves in parallel to support NRT
>> (near-realtime search)
>>
>> b) Legacy replication, where only the master does the indexing and slaves
>> receiv

RE: Identify specific document insert error inside a solrj batch request

2014-07-31 Thread Liram Vardi
Hi Jack,
Thank you for your reply.
This is the Solr stack trace. As you can see, the missing field is "hourOfDay".

Thanks,
Liram

2014-07-30 14:27:54,934 ERROR [qtp-608368492-19] (SolrException.java:108) -
org.apache.solr.common.SolrException: [doc=53b16126--0002-2b03-17ac4d4a07b6] missing required field: hourOfDay
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:189)
    at org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:73)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:556)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:692)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:94)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
    at com.checkpoint.solr_plugins.MulticoreUpdateRequestProcessor.processAdd(MulticoreUpdateRequestProcessorFactory.java:152)
    at org.apache.solr.handler.loader.XMLLoader.processUpdate(XMLLoader.java:246)
    at org.apache.solr.handler.loader.XMLLoader.load(XMLLoader.java:173)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
    at com.checkpoint.solr_plugins.MulticoreUpdateRequestProcessor.processAdd(MulticoreUpdateRequestProcessorFactory.java:248)
    at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:86)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:143)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:123)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:220)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:108)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:185)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:111)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:150)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:96)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1474)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:499)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1086)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:428)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1020)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)

Re: Querying from solr shards

2014-07-31 Thread Jack Krupansky
That would be two separate queries, one specifying a single core, and the 
other specifying all cores.


Or, if that one ID is unique to that core, just combine the two queries as 
an OR.


If not unique, try to find some field query that would make it unique. If 
even that is not unique, then you do need separate queries.


-- Jack Krupansky

-Original Message- 
From: Smitha Rajiv

Sent: Thursday, July 31, 2014 1:31 AM
To: solr-user@lucene.apache.org
Subject: Querying from solr shards

Hi All,

Currently I am using the Solr legacy distributed configuration (not SolrCloud;
a single Solr server with multiple shards).

I need to write a query to get one particular document (id-specific) from
one shard and all documents from the other shards.

Can you please help me get this query right?

Thanks & Regards,
Smitha S 



Auto suggest with adding accents

2014-07-31 Thread benjelloun
Hello,

I'm trying to autosuggest French words with accents,
but if the user writes q="gene" it will not suggest "genève"; it will suggest
"general", "genetic", ...



  suggestDic
  org.apache.solr.spelling.suggest.Suggester
  org.apache.solr.spelling.suggest.fst.WFSTLookupFactory
  suggestFolder
  suggestField  
  true
  true
   suggest/emptyDic.txt

textSuggest
  
  
  

  suggests
  true
  
  suggestDic
  true
  6   
  true
  6 
  true  


  suggests

  
 
The field "suggestField" does not handle accents.

Thanks for help,

Best regards,
Anass BENJELLOUN




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Index a time/date range

2014-07-31 Thread Ryan Cutter
Great resources, thanks everyone!


On Wed, Jul 30, 2014 at 8:12 PM, david.w.smi...@gmail.com <
david.w.smi...@gmail.com> wrote:

> The wiki page on the technique cleans up some small errors from Hoss’s
> presentation:
> http://wiki.apache.org/solr/SpatialForTimeDurations
>
> But please try Solr trunk which has first-class support for date durations:
> https://issues.apache.org/jira/browse/SOLR-6103
> Soonish I’ll back-port to 4x.
>
> ~ David Smiley
> Freelance Apache Lucene/Solr Search Consultant/Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Wed, Jul 30, 2014 at 7:29 PM, Jost Baron  wrote:
>
>> -BEGIN PGP SIGNED MESSAGE-
>> Hash: SHA1
>>
>> Hi Ryan,
>>
>> On 07/31/2014 01:26 AM, Ryan Cutter wrote:
>> > Is there a way to index time or date ranges?  That is, assume 2
>> > docs:
>> >
>> > #1: date = 2014-01-01 #2: date = 2014-02-01 through 2014-05-01
>> >
>> > Would there be a way to index #2's date as a single field and have
>> > all the search options you usually get with time/date?
>> >
>> > One strategy could be to index the start and stop values
>> > separately.  Just wondering if there's a fancier option out there.
>>
>> Take a look at this:
>>
>>
>> https://people.apache.org/~hossman/spatial-for-non-spatial-meetup-20130117/
>>
>> Regards,
>> Jost
>>
>>
>> -BEGIN PGP SIGNATURE-
>> Version: GnuPG v1
>> Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
>>
>> iEYEARECAAYFAlPZf94ACgkQNme/yCvmvTIp9ACfeuKfCRFuGY/Y2aLH6BxtkS+c
>> kNMAoIcWFuJnnwV8ouajvTUXojR6HiTo
>> =EKfo
>> -END PGP SIGNATURE-
>>
>
>
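The core idea behind the spatial-for-durations technique linked above can be sketched outside Solr: a duration [start, end] is indexed as the 2-D point (x=start, y=end), and "find durations overlapping [qs, qe]" becomes the rectangle query x <= qe AND y >= qs. The day-of-year numbers below are hypothetical, chosen to match the two example documents from the question:

```python
# Sketch of the spatial-for-durations idea, not the Solr API.
# A duration [start, end] is indexed as the point (start, end).
docs = {
    "doc1": (1, 1),      # 2014-01-01, a zero-length duration
    "doc2": (32, 121),   # 2014-02-01 through 2014-05-01, as days of year
}

def overlapping(qs, qe):
    """Durations [start, end] intersecting [qs, qe]: start <= qe and end >= qs."""
    return sorted(d for d, (start, end) in docs.items()
                  if start <= qe and end >= qs)

print(overlapping(30, 40))   # doc2 overlaps February
print(overlapping(1, 5))     # doc1 only
```

In Solr's spatial field this rectangle query is what the `Intersects` predicate computes; indexing start and stop as two separate date fields and writing the two comparisons yourself is the equivalent manual strategy mentioned in the question.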


Re: Auto suggest with adding accents

2014-07-31 Thread Ahmet Arslan
Hi,

What happens when you add ASCIIFoldingFilter to the field type definition of
suggestField?

Ahmet
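A sketch of what Ahmet suggests, assuming a simple analyzer for the suggest field (the fieldType name textSuggest comes from the quoted configuration; the tokenizer choice is an assumption):

```xml
<fieldType name="textSuggest" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Folds accented characters to ASCII, e.g. "genève" -> "geneve",
         so the typed prefix "gene" can match it -->
    <filter class="solr.ASCIIFoldingFilterFactory"/>
  </analyzer>
</fieldType>
```

Note the trade-off that surfaces later in this thread: with folding applied to the suggester's source field, the dictionary stores the folded form ("geneve"), so returning the accented surface form "genève" needs a lookup that analyzes the input but preserves the original text, such as an analyzing lookup (AnalyzingLookupFactory).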


On Thursday, July 31, 2014 5:49 PM, benjelloun  wrote:
Hello,

I'm trying to autosuggest French words with accents,
but if the user writes q="gene" it will not suggest "genève"; it will suggest
"general", "genetic", ...


    
      suggestDic
      org.apache.solr.spelling.suggest.Suggester
      org.apache.solr.spelling.suggest.fst.WFSTLookupFactory
      suggestFolder
      suggestField  
      true
      true
       suggest/emptyDic.txt
    
    textSuggest
  
  
  
    
      suggests
      true
      
      suggestDic
      true
      6  
      true
      6 
      true  
    
    
      suggests
    
  

The field "suggestField" does not handle accents.

Thanks for help,

Best regards,
Anass BENJELLOUN




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Ranking based on match position in field

2014-07-31 Thread Ahmet Arslan
Hi Tomas,

Sorry for the confusion. That link (an open issue) means that it is proposed
and desired functionality. However, it hasn't been included in the code base yet.

You could:

* ping the author through JIRA and request that the patch be brought to trunk
* vote for the issue
* try whether the patch works with the current version, etc.

http://wiki.apache.org/solr/HowToContribute#Working_With_Patches

Ahmet


On Thursday, July 31, 2014 9:55 AM, Thomas Michael Engelke 
 wrote:
Hi,

thanks for the link. I've upgraded from the used 4.7 to the
recent 4.9 version. I've tried to use the new feature with this query in
the admin interface using edismax:

description:Kühler^~1^5

However, the result seems to stay the same:


description:Kühler~1^5
description:Kühler~1^5
(+description:kühler~1^5.0)/no_coord
+description:kühler~1^5.0


2.334934 = (MATCH) weight(description:kühler^5.0 in 4080) [DefaultSimilarity], result of:
  2.334934 = score(doc=4080,freq=1.0 = termFreq=1.0), product of:
    0.9994 = queryWeight, product of:
      5.0 = boost
      6.226491 = idf(docFreq=64, maxDocs=12099)
      0.03212082 = queryNorm
    2.3349342 = fieldWeight in 4080, product of:
      1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
      6.226491 = idf(docFreq=64, maxDocs=12099)
      0.375 = fieldNorm(doc=4080)

2.334934 = (MATCH) weight(description:kühler^5.0 in 5754) [DefaultSimilarity], result of:
  2.334934 = score(doc=5754,freq=1.0 = termFreq=1.0), product of:
    0.9994 = queryWeight, product of:
      5.0 = boost
      6.226491 = idf(docFreq=64, maxDocs=12099)
      0.03212082 = queryNorm
    2.3349342 = fieldWeight in 5754, product of:
      1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
      6.226491 = idf(docFreq=64, maxDocs=12099)
      0.375 = fieldNorm(doc=5754)


Am I using this feature wrong?

On 30.07.2014 14:48, Ahmet Arslan wrote:

> Hi,
>
> Please see: https://issues.apache.org/jira/browse/SOLR-3925
>
> Ahmet
>
> On Wednesday, July 30, 2014 2:39 PM, Thomas Michael Engelke wrote:
> Hi,
>
> an example. We have 2 records with this data in the same field
> (description):
>
> 1: Lufthutze vor Kühler Bj 62-65, DS
> 2: Kühler HY im Austausch, Altteilpfand 250 Euro
>
> A search with the parameters 'description:Kühler' does provide this debug:
>
> 2.3234584 = (MATCH) weight(description:kühler in 4053) [DefaultSimilarity], result of:
>   2.3234584 = fieldWeight in 4053, product of:
>     1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
>     6.195889 = idf(docFreq=69, maxDocs=12637)
>     0.375 = fieldNorm(doc=4053)
>
> 2.3234584 = (MATCH) weight(description:kühler in 5729) [DefaultSimilarity], result of:
>   2.3234584 = fieldWeight in 5729, product of:
>     1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0
>     6.195889 = idf(docFreq=69, maxDocs=12637)
>     0.375 = fieldNorm(doc=5729)
>
> As you can see, both get the exact same score. However, we would like to
> rank the second document higher, on the basis that the search term occurs
> further to the left of the field.
>
> Is there a component/setting that can do that?






Re: Searching words with spaces for word without spaces in solr

2014-07-31 Thread sunshine glass
I am not clear on this. This link is related to spell check. Can you
elaborate on it more?


On Wed, Jul 30, 2014 at 9:17 PM, Dyer, James 
wrote:

> In addition to the analyzer configuration you're using, you might want to
> also use WordBreakSolrSpellChecker to catch possible matches that can't
> easily be solved through analysis.  For more information, see the section
> for it at https://cwiki.apache.org/confluence/display/solr/Spell+Checking
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
> -Original Message-
> From: sunshine glass [mailto:sunshineglassof2...@gmail.com]
> Sent: Wednesday, July 30, 2014 9:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Searching words with spaces for word without spaces in solr
>
> This is the new configuration:
>
>  > positionIncrementGap="100">
> >   
> > 
> > 
> >  > outputUnigrams="true" tokenSeparator=""/>
> >  > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> > 
> >  > language="English" protected="protwords.txt"/>
> >> synonyms="stemmed_synonyms_text_prime_index.txt" ignoreCase="true"
> > expand="true"/>
> >   
> >   
> > 
> > 
> >  > words="stopwords_text_prime_search.txt" enablePositionIncrements="true"
> />
> >  > outputUnigrams="true" tokenSeparator=""/>
> >  > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
> >  > language="English" protected="protwords.txt"/>
> >   
> >
> >
> These are current docs in my index:
>
> 
> 
> 2
> Icecream
> 1475063961342705664
> 
> 
> 3
> Ice-cream
> 1475063961344802816
> 
> 
> 1
> Ice Cream
> 1475063961203245056
> 
> 
> 
>
> Query:
> http://localhost:8983/solr/collection1/select?q=title:ice+cream&debug=true
>
> Response:
>
> 
> 
> 1
> Ice Cream
> 1475063961203245056
> 
> 
> 3
> Ice-cream
> 1475063961344802816
> 
> 
> 
> title:ice cream
> title:ice cream
> 
> (+(title:ice DisjunctionMaxQuery((title:cream/no_coord
> 
> +(title:ice (title:cream))
> 
> 
> 0.875 = (MATCH) sum of: 0.4375 = (MATCH) weight(title:ice in 0)
> [DefaultSimilarity], result of: 0.4375 = score(doc=0,freq=2.0 =
> termFreq=2.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
> idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.61871845 = fieldWeight
> in 0, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 =
> termFreq=2.0 1.0 = idf(docFreq=2, maxDocs=3) 0.4375 = fieldNorm(doc=0)
> 0.4375 = (MATCH) weight(title:cream in 0) [DefaultSimilarity], result of:
> 0.4375 = score(doc=0,freq=2.0 = termFreq=2.0 ), product of: 0.70710677 =
> queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 =
> queryNorm 0.61871845 = fieldWeight in 0, product of: 1.4142135 =
> tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 1.0 = idf(docFreq=2,
> maxDocs=3) 0.4375 = fieldNorm(doc=0)
> 
> 
> 0.70710677 = (MATCH) sum of: 0.35355338 = (MATCH) weight(title:ice in 2)
> [DefaultSimilarity], result of: 0.35355338 = score(doc=2,freq=1.0 =
> termFreq=1.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
> idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.5 = fieldWeight in 2,
> product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.0 =
> idf(docFreq=2, maxDocs=3) 0.5 = fieldNorm(doc=2) 0.35355338 = (MATCH)
> weight(title:cream in 2) [DefaultSimilarity], result of: 0.35355338 =
> score(doc=2,freq=1.0 = termFreq=1.0 ), product of: 0.70710677 =
> queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 =
> queryNorm 0.5 = fieldWeight in 2, product of: 1.0 = tf(freq=1.0), with freq
> of: 1.0 = termFreq=1.0 1.0 = idf(docFreq=2, maxDocs=3) 0.5 =
> fieldNorm(doc=2)
> 
> 
>
> Still not working 
>
>
> On Fri, May 30, 2014 at 9:21 PM, Erick Erickson 
> wrote:
>
> > I'd spend some time with the admin/analysis page to understand the exact
> > tokenization going on here. For instance, sequencing the
> > shinglefilterfactory before worddelimiterfilterfactory may produce
> > "interesting" results. And then throwing the Snowball factory at it and
> > putting synonyms in front, I suspect you're not indexing or searching
> > what you think you are.
> >
> > Second, what happens when you query with &debug=query? That'll show you
> > what the search string looks like.
> >
> > If that doesn't help, please post the results of looking at those things
> > here, that'll provide some information for us to work with.
> >
> > Best,
> > Erick
> >
> >
> > On Fri, May 30, 2014 at 3:32 AM, sunshine glass <
> > sunshineglassof2...@gmail.com> wrote:
> >
> > > Hi Folks,
> > >
> > > Any updates ??
> > >
> > >
> > > On Wed, May 28, 2014 at 12:13 PM, sunshine glass <
> > > sunshineglassof2...@gmail.com> wrote:
> > >
> > > > Dear Team,
> > > >
> > > > How can I handle compound word searches in solr ?.
> > > > How can i search "hand bag" if I have "handbag

Re: Auto suggest with adding accents

2014-07-31 Thread benjelloun
Hi,

With q="gene" it suggests "geneve"; ASCIIFoldingFilter works by stripping accents.

What I need it to suggest is "genève".

Any idea?

Thanks,
best regards,
Anass BENJELLOUN



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379p4150392.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Searching words with spaces for word without spaces in solr

2014-07-31 Thread Dyer, James
If a user is searching on "ice cream" but your index has "icecream", you can
treat this like a spelling error.  WordBreakSolrSpellChecker would identify
that while "ice cream" is not in your index, "icecream" is, and then you can
re-query for the corrected version without the space.

The problem with solving this with analyzers is that you can analyze
"ice-cream" as either "ice cream" or "icecream" (split or catenate on hyphen).
You can even analyze "IceCream > Ice Cream" (catenate on case change).  But how
is your analyzer going to know that "icecream" should index as two tokens:
"ice" "cream"?  You're asking analysis to do too much in this case.  This is
where spellcheck can bridge the gap.

Of course, if you have a discrete list of words you want split like this, then
you can do it with analysis using index-time synonyms. In this case, you need
to provide it with the list. See
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
for more information.
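A sketch of the WordBreakSolrSpellChecker setup James describes, as it might look in solrconfig.xml (the field name title is taken from the example documents in this thread; the component and parameter names follow the standard Solr spellcheck configuration):

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">title</str>
    <!-- combineWords catches "ice cream" -> "icecream"; breakWords the reverse -->
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">2</int>
  </lst>
</searchComponent>
```

Querying with spellcheck=true and spellcheck.dictionary=wordbreak then returns the joined/split candidates, and the client re-issues the suggested collation. For the index-time synonym alternative, a synonyms.txt line such as "icecream => ice cream" splits the joined form into two tokens at index time.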

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: sunshine glass [mailto:sunshineglassof2...@gmail.com] 
Sent: Thursday, July 31, 2014 10:32 AM
To: solr-user@lucene.apache.org
Subject: Re: Searching words with spaces for word without spaces in solr

I am not clear on this. This link is related to spell check. Can you
elaborate on it more?


On Wed, Jul 30, 2014 at 9:17 PM, Dyer, James 
wrote:

> In addition to the analyzer configuration you're using, you might want to
> also use WordBreakSolrSpellChecker to catch possible matches that can't
> easily be solved through analysis.  For more information, see the section
> for it at https://cwiki.apache.org/confluence/display/solr/Spell+Checking
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
> -Original Message-
> From: sunshine glass [mailto:sunshineglassof2...@gmail.com]
> Sent: Wednesday, July 30, 2014 9:38 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Searching words with spaces for word without spaces in solr
>
> This is the new configuration:
>
>  > positionIncrementGap="100">
> >   
> > 
> > 
> >  > outputUnigrams="true" tokenSeparator=""/>
> >  > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> > 
> >  > language="English" protected="protwords.txt"/>
> >> synonyms="stemmed_synonyms_text_prime_index.txt" ignoreCase="true"
> > expand="true"/>
> >   
> >   
> > 
> > 
> >  > words="stopwords_text_prime_search.txt" enablePositionIncrements="true"
> />
> >  > outputUnigrams="true" tokenSeparator=""/>
> >  > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
> >  > language="English" protected="protwords.txt"/>
> >   
> >
> >
> These are current docs in my index:
>
> 
> 
> 2
> Icecream
> 1475063961342705664
> 
> 
> 3
> Ice-cream
> 1475063961344802816
> 
> 
> 1
> Ice Cream
> 1475063961203245056
> 
> 
> 
>
> Query:
> http://localhost:8983/solr/collection1/select?q=title:ice+cream&debug=true
>
> Response:
>
> 
> 
> 1
> Ice Cream
> 1475063961203245056
> 
> 
> 3
> Ice-cream
> 1475063961344802816
> 
> 
> 
> title:ice cream
> title:ice cream
> 
> (+(title:ice DisjunctionMaxQuery((title:cream/no_coord
> 
> +(title:ice (title:cream))
> 
> 
> 0.875 = (MATCH) sum of: 0.4375 = (MATCH) weight(title:ice in 0)
> [DefaultSimilarity], result of: 0.4375 = score(doc=0,freq=2.0 =
> termFreq=2.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
> idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.61871845 = fieldWeight
> in 0, product of: 1.4142135 = tf(freq=2.0), with freq of: 2.0 =
> termFreq=2.0 1.0 = idf(docFreq=2, maxDocs=3) 0.4375 = fieldNorm(doc=0)
> 0.4375 = (MATCH) weight(title:cream in 0) [DefaultSimilarity], result of:
> 0.4375 = score(doc=0,freq=2.0 = termFreq=2.0 ), product of: 0.70710677 =
> queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 =
> queryNorm 0.61871845 = fieldWeight in 0, product of: 1.4142135 =
> tf(freq=2.0), with freq of: 2.0 = termFreq=2.0 1.0 = idf(docFreq=2,
> maxDocs=3) 0.4375 = fieldNorm(doc=0)
> 
> 
> 0.70710677 = (MATCH) sum of: 0.35355338 = (MATCH) weight(title:ice in 2)
> [DefaultSimilarity], result of: 0.35355338 = score(doc=2,freq=1.0 =
> termFreq=1.0 ), product of: 0.70710677 = queryWeight, product of: 1.0 =
> idf(docFreq=2, maxDocs=3) 0.70710677 = queryNorm 0.5 = fieldWeight in 2,
> product of: 1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq=1.0 1.0 =
> idf(docFreq=2, maxDocs=3) 0.5 = fieldNorm(doc=2) 0.35355338 = (MATCH)
> weight(title:cream in 2) [DefaultSimilarity], result of: 0.35355338 =
> score(doc=2,freq=1.0 = termFreq=1.0 ), product of: 0.70710677 =
> queryWeight, product of: 1.0 = idf(docFreq=2, maxDocs=3) 0.70710677 =
> queryNorm 0.5 = fieldWeight in 2, prod

Re: SolrCloud loadbalancing, replication, and failover

2014-07-31 Thread Shawn Heisey
On 7/31/2014 12:58 AM, shuss...@del.aithent.com wrote:
> Thanks for giving a great explanation about the memory requirements. Could you
> tell me what parameters I need to change in my solrconfig.xml to
> handle a large index size? What are the optimal values that I need to use?
>
> My indexed data size is 65 GB (for 8.6 million documents) and I have 48
> GB of RAM on my server. Whenever I perform delta-indexing, the server becomes
> unresponsive while updating the index.
>
> Following are the changes that I made in solrconfig.xml after reading around on the net:
> 6
> 256
> false
> 1000
>
>  
>   10
>   10
>  
>  
> 10
> 
>
> simple
> true
>
> 
>   
> 15000
> true
>   
>   
>   ${solr.data.dir:}
>  
> 
>
> So, please provide your valuable suggestion on this problem

You replied directly to me, not to the list.  I am redirecting this back
to the list.

One of the first things that I would do is change openSearcher to false
in your autoCommit settings.  This will mean that you must take care of
commits yourself when you index, to make documents visible.  If you want
any more suggestions, we'll need to see the entire solrconfig.xml file.
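The change Shawn describes is a one-line flip in the autoCommit block of solrconfig.xml; a sketch, reusing the 15000 ms interval from the quoted configuration (the surrounding updateHandler element is the standard location):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>
    <!-- Hard-commit for durability only; do not open a new searcher
         on every automatic commit -->
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```

With openSearcher=false, new documents become visible only when you issue an explicit commit (or configure a separate soft-commit policy), so heavy indexing no longer tears down searchers and caches every fifteen seconds.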

The fact that you don't have enough RAM to cache your whole index could
be a problem.  If 8.6 million documents results in 65GB of index, then
your documents are probably quite large, and that can lead to other
possible challenges, because it usually means that a lot of work must be
done to index a single document.  There are also probably a lot of terms
to match when querying.

I do not know how much of your 48GB has been allocated to the java heap,
which takes away from memory that the operating system can use to cache
index files.

Thanks,
Shawn



How to sync lib directory in SolrCloud?

2014-07-31 Thread P Williams
Hi,

I have an existing collection that I'm trying to add to a new SolrCloud.
 This collection has all the normal files in conf but also has a lib
directory to support the filters schema.xml uses.

wget
https://github.com/projectblacklight/blacklight-jetty/archive/v4.9.0.zip
unzip v4.9.0.zip

I add the configuration to Zookeeper

cd /solr-4.9.0/example/scripts
cloud-scripts/zkcli.sh -cmd upconfig -confname blacklight -zkhost
zk1:2181,zk2:2181,zk3:2181 -confdir
~/blacklight-jetty-4.9.0/solr/blacklight-core/conf/

I try to create the collection
curl "
http://solr1:8080/solr/admin/collections?action=CREATE&name=blacklight&numShards=3&collection.configName=blacklight&replicationFactor=2&maxShardsPerNode=2
"

but it looks like the jars in the lib directory aren't available, and this
is what is causing my collection creation to fail.  I guess this makes
sense because the lib directory is not among the files that I added to
Zookeeper to share.  How do I share the lib directory via Zookeeper?

Thanks,
Tricia

[pjenkins@solr1 scripts]$ cloud-scripts/zkcli.sh -cmd upconfig -zkhost
zk1:2181,zk2:2181,zk3:2181 -confdir
~/blacklight-jetty-4.9.0/solr/blacklight-core/conf/ -confname blacklight
INFO  - 2014-07-31 09:28:06.289; org.apache.zookeeper.Environment; Client
environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
INFO  - 2014-07-31 09:28:06.292; org.apache.zookeeper.Environment; Client
environment:host.name=solr1.library.ualberta.ca
INFO  - 2014-07-31 09:28:06.295; org.apache.zookeeper.Environment; Client
environment:java.version=1.7.0_65
INFO  - 2014-07-31 09:28:06.295; org.apache.zookeeper.Environment; Client
environment:java.vendor=Oracle Corporation
INFO  - 2014-07-31 09:28:06.295; org.apache.zookeeper.Environment; Client
environment:java.home=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre
INFO  - 2014-07-31 09:28:06.295; org.apache.zookeeper.Environment; Client

How to search for phrase "IAE_UPC_0001"

2014-07-31 Thread Paul Rogers
Hi Guys

I have a Solr application searching on data uploaded by Nutch.  The search
I wish to carry out is for a particular document reference contained within
the "url" field, e.g. IAE-UPC-0001.

The problem is that the file names that comprise the URLs are not
consistent, so a URL might contain the reference as IAE-UPC-0001 or
IAE_UPC_0001 (i.e. using either the hyphen or the underscore as the
delimiter) but not both.

I have created the query (in the solr admin interface):

url:"IAE-UPC-0001"

which works (returning the single expected document), as do:

url:"IAE*UPC*0001"
url:"IAE?UPC?0001"

when the doc ref is in the format IAE-UPC-0001 (i.e. using the hyphen as
the delimiter).

However:

url:"IAE_UPC_0001"
url:"IAE*UPC*0001"
url:"IAE?UPC?0001"

do not work (returning zero documents) when the doc ref is in the format
IAE_UPC_0001 (i.e. using the underscore as the delimiter).

I'm assuming the underscore is a special character, but I've looked at the
Solr wiki and can't find anything that explains the problem. The minus sign
also has a special meaning, but that is neutralized by adding the quotes.

Can anyone suggest what I'm doing wrong?

Many thanks

Paul


Re: Auto suggest with adding accents

2014-07-31 Thread Otis Gospodnetic
You need to do the opposite.  Make sure accents are NOT removed at index &
query time.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Jul 31, 2014 at 5:49 PM, benjelloun  wrote:

> hi,
>
> q="gene" suggests "geneve"; the ASCIIFoldingFilter strips the accent.
>
> What I need it to suggest is "genève".
>
> any idea?
>
> thanks
> best regards
> Anass BENJELLOUN
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Auto-suggest-with-adding-accents-tp4150379p4150392.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: How to search for phrase "IAE_UPC_0001"

2014-07-31 Thread Erick Erickson
Take a look at WordDelimiterFilterFactory. It has a bunch of
options to allow this kind of thing to be indexed and searched.

Note that in the default schema, the index-time part of the fieldType
definition has slightly different parameters than the query-time
WordDelimiterFilterFactory; that's a good place to start.

WARNING: WDFF is a bit complex, you _really_ would be well
served by spending some time with the Admin/Analysis page to
understand the effects of these parameters...
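As an illustrative sketch only (field name and parameter values here are
assumptions, not the stock schema — verify them on the Admin/Analysis
page), a fieldType along these lines splits on both "-" and "_" and also
indexes the catenated form, so IAE-UPC-0001 and IAE_UPC_0001 can both
match:

```
<fieldType name="text_ref" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- split on non-alphanumerics (hyphen, underscore) and also
         index the whole catenated token -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="1"
            splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- query side: split only, no catenation, to avoid phrase issues -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```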

Best,
Erick




On Thu, Jul 31, 2014 at 9:31 AM, Paul Rogers  wrote:

> Hi Guys
>
> I have a Solr application searching on data uploaded by Nutch.  The search
> I wish to carry out is for a particular document reference contained within
> the "url" field, e.g. IAE-UPC-0001.
>
> The problem is is that the file names that comprise the url's are not
> consistent, so a url might contain the reference as IAE-UPC-0001 or
> IAE_UPC_0001 (ie using either the minus or underscore as the delimiter) but
> not both.
>
> I have created the query (in the solr admin interface):
>
> url:"IAE-UPC-0001"
>
> which works (returning the single expected document), as do:
>
> url:"IAE*UPC*0001"
> url:"IAE?UPC?0001"
>
> when the doc ref is in the format IAE-UPC-0001 (ie using the minus sign as
> a delimiter).
>
> However:
>
> url:"IAE_UPC_0001"
> url:"IAE*UPC*0001"
> url:"IAE?UPC?0001"
>
> do not work (returning zero documents) when the doc ref is in the format
> IAE_UPC_0001 (ie using the underscore character as the delimiter).
>
> I'm assuming the underscore is a special character but have tried looking
> at the solr wiki but can't find anything to say what the problem is.  Also
> the minus sign also has a specific meaning but is nullified by adding the
> quotes.
>
> Can anyone suggest what I'm doing wrong?
>
> Many thanks
>
> Paul
>


Solr gives the same fieldnorm for two different-size fields

2014-07-31 Thread gorjida
I use Solr for searching over a collection of institution names... My Solr
DB contains multiple fields such as name, country, city, etc. A sample
document looks like this:

{
"solr_id": 130950,
"rg_id": 140239,
"rg_parent_id": 1438,
"name": "University of California Berkeley Research",
"ext_name": "",
"city": "Berkeley",
"country": "US",
"state": "CA",
"type": "academic/gen",
"ext_city": "",
"zip": "94720-5100",
"_version_": 1474909528315134000
  },

I need to search over this database... My query looks like this:

name: (university of california berkeley)

After running this query, top-2 matches are as follows:

{
"solr_id": 130950,
"rg_id": 140239,
"rg_parent_id": 1438,
"name": "University of California Berkeley Research",
"ext_name": "",
"city": "Berkeley",
"country": "US",
"state": "CA",
"type": "academic/gen",
"ext_city": "",
"zip": "94720-5100",
"_version_": 1474909528315134000,
"score": 1.8849033
  },
  {
"solr_id": 350,
"rg_id": 1438,
"rg_parent_id": 1439,
"name": "University of California Berkeley",
"ext_name": "",
"city": "Berkeley",
"country": "US",
"state": "CA",
"type": "academic",
"ext_city": "",
"zip": "94720",
"_version_": 1474909520371122200,
"score": 1.8849033
  },

Indeed, both "University of California Berkeley Research" and "University of
California Berkeley" get the same score (1.8849033)... FYI, my schema looks
like this:

<fieldType name="text_general" class="solr.TextField" omitNorms="false"
autoGeneratePhraseQueries="true">
  [analyzer and filter definitions stripped by the mailing-list archive]
</fieldType>

I also checked the debugger and noticed that both documents return the same
fieldnorm (.5)... The bizarre thing is that solr works fine for these
queries:
--- name: (university of toronto)
--- name: (university of california los angeles)

Indeed, it seems that solr fails once the number of tokens in the document
is equal to 4... Of the above queries, the first one (university of toronto)
has three tokens and the second one has five tokens... I am totally stuck on
why solr cannot provide different fieldnorms for (University of
California Berkeley) and (University of California Berkeley Research)...
Also, I do not understand why it happens only when I have 4 tokens in the
field. I would appreciate it if anyone can share feedback...

PS. I have also tested solr.StopFilterFactory with ignoreCase="true" and
the problem is still not resolved...

Regards,

Ali



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-gives-the-same-fieldnorm-for-two-different-size-fields-tp4150418.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to search for phrase "IAE_UPC_0001"

2014-07-31 Thread Paul Rogers
Hi Erick

Thanks for the reply.  I'll have a look and see if it is any help.  Again
thanks for pointing me in the right direction.

regards

Paul


On 31 July 2014 11:58, Erick Erickson  wrote:

> Take a look at WordDelimiterFilterFactory. It has a bunch of
> options to allow this kind of thing to be indexed and searched.
>
> Note that in the default schema, the definition in the index part
> of the fieldType definition has slightly different parameters than
> the query time WordDelimiterFilterFactory, that's a good place
> to start.
>
> WARNING: WDFF is a bit complex, you _really_ would be well
> served by spending some time with the Admin/Analysis page to
> understand the effects of these parameters...
>
> Best,
> Erick
>
>
>
>
> On Thu, Jul 31, 2014 at 9:31 AM, Paul Rogers 
> wrote:
>
> > Hi Guys
> >
> > I have a Solr application searching on data uploaded by Nutch.  The
> search
> > I wish to carry out is for a particular document reference contained
> within
> > the "url" field, e.g. IAE-UPC-0001.
> >
> > The problem is is that the file names that comprise the url's are not
> > consistent, so a url might contain the reference as IAE-UPC-0001 or
> > IAE_UPC_0001 (ie using either the minus or underscore as the delimiter)
> but
> > not both.
> >
> > I have created the query (in the solr admin interface):
> >
> > url:"IAE-UPC-0001"
> >
> > which works (returning the single expected document), as do:
> >
> > url:"IAE*UPC*0001"
> > url:"IAE?UPC?0001"
> >
> > when the doc ref is in the format IAE-UPC-0001 (ie using the minus sign
> as
> > a delimiter).
> >
> > However:
> >
> > url:"IAE_UPC_0001"
> > url:"IAE*UPC*0001"
> > url:"IAE?UPC?0001"
> >
> > do not work (returning zero documents) when the doc ref is in the format
> > IAE_UPC_0001 (ie using the underscore character as the delimiter).
> >
> > I'm assuming the underscore is a special character but have tried looking
> > at the solr wiki but can't find anything to say what the problem is.
>  Also
> > the minus sign also has a specific meaning but is nullified by adding the
> > quotes.
> >
> > Can anyone suggest what I'm doing wrong?
> >
> > Many thanks
> >
> > Paul
> >
>


Re: Solr gives the same fieldnorm for two different-size fields

2014-07-31 Thread Erick Erickson
And it won't be. Basically, the norms are an approximation (they used to be
just a byte long), so fields of "close" lengths will have the same value
here.

Why is this an issue? If you back up a second, is a word appearing in a
4-word field really "enough" more important than one appearing in a 5-word
field to require a distinction?

Lately you can specify field norms that are longer than a byte, but the
overall problem still remains.
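To make the quantization concrete, here is a small sketch that reimplements
Lucene 4.x's single-byte norm encoding (SmallFloat.floatToByte315 /
byte315ToFloat) in Python for illustration, assuming the classic
DefaultSimilarity lengthNorm of 1/sqrt(numTerms). Nearby field lengths
collapse to the same stored byte:

```python
# Sketch of Lucene 4.x's byte-sized norm encoding, reimplemented for
# illustration. So little precision survives the byte that 3-term and
# 4-term fields store the identical norm.
import math
import struct

def _bits(f):
    return struct.unpack('>I', struct.pack('>f', f))[0]

def _float(b):
    return struct.unpack('>f', struct.pack('>I', b))[0]

def float_to_byte315(f):
    # keep only the sign, exponent, and top few mantissa bits
    bits = _bits(f)
    small = bits >> 21
    fzero = (63 - 15) << 3          # 384
    if small <= fzero:
        return 0 if bits <= 0 else 1
    if small >= fzero + 0x100:
        return 255
    return small - fzero

def byte315_to_float(b):
    if b == 0:
        return 0.0
    return _float(((b & 0xFF) << 21) + ((63 - 15) << 24))

for n in (3, 4, 5):
    norm = 1.0 / math.sqrt(n)       # classic lengthNorm
    stored = byte315_to_float(float_to_byte315(norm))
    print(n, round(norm, 4), stored)
# 3-term and 4-term fields both decode to a stored norm of 0.5
```

This is exactly the fieldnorm of .5 seen in the debug output: once the
raw 1/sqrt(n) values land in the same quantization bucket, the scores tie.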

Frankly, though, I think this is something that's a distraction and that
users won't notice.

FWIW,
Erick


On Thu, Jul 31, 2014 at 9:56 AM, gorjida  wrote:

> I use solr for searching over a collection of institution names... My solr
> DB
> contains multiple field names such as name, country, city,  A sample
> document looks like this:
>
> {
> "solr_id": 130950,
> "rg_id": 140239,
> "rg_parent_id": 1438,
> "name": "University of California Berkeley Research",
> "ext_name": "",
> "city": "Berkeley",
> "country": "US",
> "state": "CA",
> "type": "academic/gen",
> "ext_city": "",
> "zip": "94720-5100",
> "_version_": 1474909528315134000
>   },
>
> I need to search over this database... My query looks like this:
>
> name: (university of california berkeley)
>
> After running this query, top-2 matches are as follows:
>
> {
> "solr_id": 130950,
> "rg_id": 140239,
> "rg_parent_id": 1438,
> "name": "University of California Berkeley Research",
> "ext_name": "",
> "city": "Berkeley",
> "country": "US",
> "state": "CA",
> "type": "academic/gen",
> "ext_city": "",
> "zip": "94720-5100",
> "_version_": 1474909528315134000,
> "score": 1.8849033
>   },
>   {
> "solr_id": 350,
> "rg_id": 1438,
> "rg_parent_id": 1439,
> "name": "University of California Berkeley",
> "ext_name": "",
> "city": "Berkeley",
> "country": "US",
> "state": "CA",
> "type": "academic",
> "ext_city": "",
> "zip": "94720",
> "_version_": 1474909520371122200,
> "score": 1.8849033
>   },
>
> Indeed, both "University of California Berkeley Research" and "University
> of
> California Berkeley" get the same score (1.8849033)... FYI, my schema looks
> like this:
>
> fieldType name="text_general" class="solr.TextField" omitNorms="false"
> autoGeneratePhraseQueries="true">
>   
> 
> 
> 
>   
>   
> 
> 
> 
>   
> 
>
> I also checked the debugger and noticed that both documents return the same
> fieldnorm (.5)... The bizzare thing is that solr works fine for these
> queries:
> --- name: (university of toronto)
> --- name: (university of california los angeles)
>
> Indeed, it seems that solr fails once the number of tokens in the documents
> is equal to "4"... For above queries, the first one (university of toronto)
> has three tokens and the second one has 5 tokens... I am totally stuck at
> this point why solr cannot provide different fieldnorms for (University of
> California Berkeley) and (University of California Berkeley Research)...
> Also, I do not understand why it just happens when I have 4 tokens in the
> field? I would appreciate if anyone can share the feedback...
>
> PS. I have also tested "solr.StopFilterFactory" ignoreCase="true" and the
> problem is not still resolved...
>
> Regards,
>
> Ali
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-gives-the-same-fieldnorm-for-two-different-size-fields-tp4150418.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Searching words with spaces for word without spaces in solr

2014-07-31 Thread sunshine glass
*Point 1:*
On Thu, Jul 31, 2014 at 9:32 PM, Dyer, James 
 wrote:

> If a user is searching on "ice cream" but your index has "icecream", you
> can treat this like a spelling error.  WordBreakSolrSpellChecker would
> identify the fact that  while "ice cream" is not in your index, "icecream"
> and then you can re-query for the corrected version without the space.
>

What if I have 1M records for "ice cream" and the same number for
"icecream"? Then the trick will not work here. What I want in this case is
that whether I search for "ice cream" or "icecream", Solr should return 2M
results.
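For reference, a solrconfig sketch of the WordBreakSolrSpellChecker
approach being discussed (the field name here is hypothetical):

```
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">wordbreak</str>
    <str name="classname">solr.WordBreakSolrSpellChecker</str>
    <str name="field">title</str>
    <!-- combineWords covers "ice cream" -> "icecream";
         breakWords covers the reverse direction -->
    <str name="combineWords">true</str>
    <str name="breakWords">true</str>
    <int name="maxChanges">2</int>
  </lst>
</searchComponent>
```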

*Point 2:*
On Thu, Jul 31, 2014 at 9:32 PM, Dyer, James 
 wrote:
The problem with solving this with analyers, is that you can analyze
"ice-cream" as either "ice cream" or "icecream" (split or catenate on
hyphen).  You can even analyze "IceCream > Ice Cream" (catenate on case
change).  But how is your analyzer going to know that "icecream" should
index as two tokens: "ice" "cream" ?  You're asking analysis to do too much
in this case. This is where spellcheck can bridge the gap.

I don't want "icecream" to be indexed as "ice" or "cream"; I agree that
this is not feasible. What I am looking for is to create shingles at query
time as well. In other words, while querying "ice cream", can't it search
for "ice", "cream", or "icecream"? That is, forming shingles at query time.

There is a long list of such words in my index, so I don't want to
implement this via the synonym filter factory.
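A sketch of what such query-time shingling could look like (field name and
parameter choices are assumptions): a ShingleFilterFactory with an empty
tokenSeparator emits "ice", "cream", and the joined "icecream".

```
<!-- Hypothetical sketch: shingle adjacent tokens with no separator so
     "ice cream" also yields the single token "icecream" -->
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            outputUnigrams="true" tokenSeparator=""/>
  </analyzer>
</fieldType>
```

One caveat: with the default lucene query parser, whitespace-separated
terms are analyzed independently, so the query-side shingle only forms when
both terms reach the analyzer together (e.g. inside a quoted phrase).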


On Thu, Jul 31, 2014 at 9:32 PM, Dyer, James 
wrote:

> If a user is searching on "ice cream" but your index has "icecream", you
> can treat this like a spelling error.  WordBreakSolrSpellChecker would
> identify the fact that  while "ice cream" is not in your index, "icecream"
> and then you can re-query for the corrected version without the space.
>
> The problem with solving this with analyers, is that you can analyze
> "ice-cream" as either "ice cream" or "icecream" (split or catenate on
> hyphen).  You can even analyze "IceCream > Ice Cream" (catenate on case
> change).  But how is your analyzer going to know that "icecream" should
> index as two tokens: "ice" "cream" ?  You're asking analysis to do too much
> in this case.  This is where spellcheck can bridge the gap.
>
> Of course, if you have a discrete list of words you want split like this,
> then you can do it with analysis using index-time synonyms.  In this case,
> you need to provide it with the list.  See
> https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
> for more information.
>
> James Dyer
> Ingram Content Group
> (615) 213-4311
>
>
> -Original Message-
> From: sunshine glass [mailto:sunshineglassof2...@gmail.com]
> Sent: Thursday, July 31, 2014 10:32 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Searching words with spaces for word without spaces in solr
>
> I am not clear with this. This link is related to spell check. Can you
> elaborate it more ?
>
>
> On Wed, Jul 30, 2014 at 9:17 PM, Dyer, James  >
> wrote:
>
> > In addition to the analyzer configuration you're using, you might want to
> > also use WordBreakSolrSpellChecker to catch possible matches that can't
> > easily be solved through analysis.  For more information, see the section
> > for it at
> https://cwiki.apache.org/confluence/display/solr/Spell+Checking
> >
> > James Dyer
> > Ingram Content Group
> > (615) 213-4311
> >
> > -Original Message-
> > From: sunshine glass [mailto:sunshineglassof2...@gmail.com]
> > Sent: Wednesday, July 30, 2014 9:38 AM
> > To: solr-user@lucene.apache.org
> > Subject: Re: Searching words with spaces for word without spaces in solr
> >
> > This is the new configuration:
> >
> >  > > positionIncrementGap="100">
> > >   
> > > 
> > > 
> > >  > > outputUnigrams="true" tokenSeparator=""/>
> > >  > > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > > catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
> > > 
> > >  > > language="English" protected="protwords.txt"/>
> > >> > synonyms="stemmed_synonyms_text_prime_index.txt" ignoreCase="true"
> > > expand="true"/>
> > >   
> > >   
> > > 
> > > 
> > >  > > words="stopwords_text_prime_search.txt" enablePositionIncrements="true"
> > />
> > >  > > outputUnigrams="true" tokenSeparator=""/>
> > >  > > generateWordParts="1" generateNumberParts="1" catenateWords="1"
> > > catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"/>
> > >  > > language="English" protected="protwords.txt"/>
> > >   
> > >
> > >
> > These are current docs in my index:
> >
> > 
> > 
> > 2
> > Icecream
> > 1475063961342705664
> > 
> > 
> > 3
> > Ice-cream
> > 1475063961344802816
> > 
> > 
> > 1
> > Ice Cream
> > 1475063961203245056
> > 
> > 
> > 
> >
> > Query:
> >
> http://localhost:8983/solr/collection1/select?q=title:ice+cream&debug=true
> >
> > Response:
> >
> > 
> > 
> 

Re: Solr gives the same fieldnorm for two different-size fields

2014-07-31 Thread gorjida
Thanks so much for your reply... In my case, it really matters because I am
going to find the correct institution match for an affiliation string... For
example, if an author belongs to the "University of Toronto", his/her
affiliation should be normalized against the Solr index... In this case,
"University of California Berkeley Research" is a different place from
"University of California Berkeley"... I see the top matches are tied in
score for this specific example... I can break the tie using other
techniques... However, I am keen to know whether this is a common problem in solr?

Regards,

Ali  



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-gives-the-same-fieldnorm-for-two-different-size-fields-tp4150418p4150430.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to sync lib directory in SolrCloud?

2014-07-31 Thread Timothy Potter
You'll need to scp the JAR files to all nodes in the cluster. ZK is
not a great distribution mechanism for large binary files since it has
a 1MB znode size limit (by default).

On Thu, Jul 31, 2014 at 10:26 AM, P Williams
 wrote:
> Hi,
>
> I have an existing collection that I'm trying to add to a new SolrCloud.
>  This collection has all the normal files in conf but also has a lib
> directory to support the filters schema.xml uses.
>
> wget
> https://github.com/projectblacklight/blacklight-jetty/archive/v4.9.0.zip
> unzip v4.9.0.zip
>
> I add the configuration to Zookeeper
>
> cd /solr-4.9.0/example/scripts
> cloud-scripts/zkcli.sh -cmd upconfig -confname blacklight -zkhost
> zk1:2181,zk2:2181,zk3:2181 -confdir
> ~/blacklight-jetty-4.9.0/solr/blacklight-core/conf/
>
> I try to create the collection
> curl "
> http://solr1:8080/solr/admin/collections?action=CREATE&name=blacklight&numShards=3&collection.configName=blacklight&replicationFactor=2&maxShardsPerNode=2
> "
>
> but it looks like the jars in the lib directory aren't available and this
> is what is causing my collection creation to fail.  I guess this makes
> sense because it's not one of the files that I added to Zookeeper to share.
>  How do I share the lib directory via Zookeeper?
>
> Thanks,
> Tricia
>
> [pjenkins@solr1 scripts]$ cloud-scripts/zkcli.sh -cmd upconfig -zkhost
> zk1:2181,zk2:2181,zk3:2181 -confdir
> ~/blacklight-jetty-4.9.0/solr/blacklight-core/conf/ -confname blacklight
> INFO  - 2014-07-31 09:28:06.289; org.apache.zookeeper.Environment; Client
> environment:zookeeper.version=3.4.6-1569965, built on 02/20/2014 09:09 GMT
> INFO  - 2014-07-31 09:28:06.292; org.apache.zookeeper.Environment; Client
> environment:host.name=solr1.library.ualberta.ca
> INFO  - 2014-07-31 09:28:06.295; org.apache.zookeeper.Environment; Client
> environment:java.version=1.7.0_65
> INFO  - 2014-07-31 09:28:06.295; org.apache.zookeeper.Environment; Client
> environment:java.vendor=Oracle Corporation
> INFO  - 2014-07-31 09:28:06.295; org.apache.zookeeper.Environment; Client
> environment:java.home=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.65.x86_64/jre
> INFO  - 2014-07-31 09:28:06.295; org.apache.zookeeper.Environment; Client
> environment:java.class.path=cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/hppc-0.5.2.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/hadoop-auth-2.2.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/asm-commons-4.1.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-queries-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-memory-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/commons-codec-1.9.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-join-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/joda-time-2.2.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-codecs-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-common-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/hadoop-common-2.2.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/httpmime-4.3.1.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/hadoop-hdfs-2.2.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/noggit-0.5.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-kuromoji-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/guava-14.0.1.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/commons-configuration-1.6.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-expressions-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-highlighter-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/hadoop-annotations-2.2.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/asm-4.1.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/dom4j-1.6.1.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/commons-io-2.3.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/zookeeper-3.4.6.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/spatial4j-0.4.1.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/httpcore-4.3.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/protobuf-java
-2.5.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-spatial-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-grouping-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-misc-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-suggest-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-analyzers-phonetic-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/lucene-core-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/commons-cli-1.2.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/solr-core-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/solr-solrj-4.9.0.jar:cloud-scripts/../../solr-webapp/webapp/WEB-INF/lib/antlr-runtime-3.5.jar:cloud-scripts/../../solr-webapp

Solr is working very slow after certain time

2014-07-31 Thread Ameya Aware
Hi,

I could index around 10 documents in a couple of hours, but after that
the indexing time became very large (just around 15-20 documents per minute).

I have taken care of garbage collection.

I am passing the below parameters to Solr:
-Xms6144m -Xmx6144m -XX:MaxPermSize=128m -XX:+UseConcMarkSweepGC
-XX:ConcGCThreads=6 -XX:ParallelGCThreads=6
-XX:CMSInitiatingOccupancyFraction=70 -XX:NewRatio=3
-XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
-XX:+UseCompressedOops -XX:+ParallelRefProcEnabled -XX:+UseLargePages
-XX:+AggressiveOpts -XX:-UseGCOverheadLimit



Can anyone help to solve this problem?


Thanks,
Ameya


Re: Solr is working very slow after certain time

2014-07-31 Thread Otis Gospodnetic
Can we look at your disk I/O and CPU? SPM can help.

Isn't "UseCompressedOops" a typo? And deprecated? In general, you may want
to simplify your JVM params unless you are really sure they are helping.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Jul 31, 2014 at 7:54 PM, Ameya Aware  wrote:

> Hi,
>
> i could index around 10 documents in couple of hours. But after that
> the time for indexing very large (around just 15-20 documents per minute).
>
> i have taken care of garbage collection.
>
> i am passing below parameters to Solr:
> -Xms6144m -Xmx6144m -XX:MaxPermSize=128m -XX:+UseConcMarkSweepGC
> -XX:ConcGCThreads=6 -XX:ParallelGCThreads=6
> -XX:CMSInitiatingOccupancyFraction=70 -XX:NewRatio=3
> -XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
> -XX:+UseCompressedOops -XX:+ParallelRefProcEnabled -XX:+UseLargePages
> -XX:+AggressiveOpts -XX:-UseGCOverheadLimit
>
>
>
> Can anyone help to solve this problem?
>
>
> Thanks,
> Ameya
>


re: Solr is working very slow after certain time

2014-07-31 Thread Chris Morley
The Solr Performance Factors wiki page mentions two big tips that may help
you, but read the rest of the page to make sure you understand the
caveats there.

In general, adding many documents per update request is faster than one per
update request.

Reducing the frequency of automatic commits, or disabling them entirely,
may speed indexing.
 Source:
 http://wiki.apache.org/solr/SolrPerformanceFactors#Indexing_Performance
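As a sketch of the first tip (collection path and document fields here are
hypothetical), the client-side batching amounts to grouping documents so
each /update POST carries many docs instead of one:

```python
# Sketch: batch documents so each update request carries many docs;
# commit once at the end instead of per document.
# (Collection name and doc fields are hypothetical.)
import json

def chunked(docs, size):
    """Yield successive batches of `size` documents."""
    for i in range(0, len(docs), size):
        yield docs[i:i + size]

def build_update_bodies(docs, batch_size=1000):
    """One JSON body per HTTP POST to /solr/<collection>/update."""
    return [json.dumps(batch) for batch in chunked(docs, batch_size)]

docs = [{"id": str(i), "title": "doc %d" % i} for i in range(2500)]
bodies = build_update_bodies(docs)
print(len(bodies))  # 3 POSTs instead of 2500
```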
  
  


 From: "Ameya Aware" 
Sent: Thursday, July 31, 2014 1:56 PM
To: solr-user@lucene.apache.org
Subject: Solr is working very slow after certain time   
Hi,

i could index around 10 documents in couple of hours. But after that
the time for indexing very large (around just 15-20 documents per minute).

i have taken care of garbage collection.

i am passing below parameters to Solr:
-Xms6144m -Xmx6144m -XX:MaxPermSize=128m -XX:+UseConcMarkSweepGC
-XX:ConcGCThreads=6 -XX:ParallelGCThreads=6
-XX:CMSInitiatingOccupancyFraction=70 -XX:NewRatio=3
-XX:MaxTenuringThreshold=8 -XX:+CMSParallelRemarkEnabled
-XX:+UseCompressedOops -XX:+ParallelRefProcEnabled -XX:+UseLargePages
-XX:+AggressiveOpts -XX:-UseGCOverheadLimit

Can anyone help to solve this problem?

Thanks,
Ameya
 



Re: Solr gives the same fieldnorm for two different-size fields

2014-07-31 Thread Erick Erickson
You can consider, say, a copyField directive to copy the field into a
string type (or perhaps a KeywordTokenizer followed by a LowerCaseFilter)
and then match or boost on an exact match rather than trying to make
scoring fill this role.

In any case, I'm thinking of normalizing the sensitive fields and indexing
them as a single token (i.e. the string type or KeywordTokenizer) to
disambiguate these cases.

Because otherwise I fear you'll get one situation to work, then fail on the
next case. In your example, you're trying to use length normalization to
influence scoring to get the doc with the shorter field to sort above the
doc with the longer field. But what are you going to do when your target is
"university of california berkley research"? Rely on matching all the
terms? And so on...
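A minimal sketch of that copyField approach (all field and type names here
are hypothetical):

```
<!-- whole value indexed as one lowercased token, for exact matching -->
<fieldType name="name_exact" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="name_str" type="name_exact" indexed="true" stored="false"/>
<copyField source="name" dest="name_str"/>
```

Queries can then add a boost clause on name_str so an exact institution
match sorts above a partial one, instead of leaning on length norms.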

Best,
Erick


On Thu, Jul 31, 2014 at 10:26 AM, gorjida  wrote:

> Thanks so much for your reply... In my case, it really matters because I am
> going to find the correct institution match for an affiliation string...
> For
> example, if an author belongs to the "university of Toronto", his/her
> affiliation should be normalized against the solr... In this case,
> "University of California Berkley Research" is a different place to
> "university of california berkeley"... I see top-matches are tied in the
> score for this specific example... I can break the tie using other
> techniques... However, I am keen to see if this is a common problem in
> solr?
>
> Regards,
>
> Ali
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-gives-the-same-fieldnorm-for-two-different-size-fields-tp4150418p4150430.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Solr vs ElasticSearch

2014-07-31 Thread Salman Akram
This is quite an old discussion. I wanted to check whether there are any
new comparisons since Solr 4, especially with regard to
performance/scalability/throughput?


On Tue, Jul 26, 2011 at 7:33 PM, Peter  wrote:

> Have a look:
>
>
> http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage
>
> http://karussell.wordpress.com/2011/05/12/elasticsearch-vs-solr-lucene/
>
> Regards,
> Peter.
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-vs-ElasticSearch-tp3009181p3200492.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Regards,

Salman Akram


Re: integrating Accumulo with solr

2014-07-31 Thread Jack Krupansky
To be clear, I wasn't suggesting that Accumulo was the cause of integration 
complexity - EVERY NoSQL will have integration complexity of comparable 
magnitude. The advantage of DataStax Enterprise or Sqrrl Enterprise is that 
they have done the integration work for you.


-- Jack Krupansky

-Original Message- 
From: Ali Nazemian

Sent: Wednesday, July 30, 2014 2:53 AM
To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Sure,
Thank you very much for your guidance. I think I am not that kind of
gunslinger, so I will probably go for another NoSQL store that can be
integrated with Solr/Elasticsearch much more easily. :)
Best regards.


On Sun, Jul 27, 2014 at 5:02 PM, Jack Krupansky 
wrote:


Right, and that's exactly what DataStax Enterprise provides (at great
engineering effort!) - synchronization of database updates and search
indexing. Sure, you can do it as well, but that's a significant 
engineering
challenge with both sides of the equation, and not a simple "plug and 
play"

configuration setting by writing a simple "connector."

But, hey, if you consider yourself one of those "true hard-core
gunslingers" then you'll be able to code that up in a weekend without any
of our assistance, right?

In short, synchronizing two data stores is a real challenge. Yes, it is
doable, but... it is non-trivial. Especially if both stores are 
distributed

clusters. Maybe now you can guess why the Sqrrl guys went the Lucene route
instead of Solr.

I'm certainly not suggesting that it can't be done. Just highlighting the
challenge of such a task.

Just to be clear, you are referring to "sync mode" and not mere "ETL",
which people do all the time with batch scripts, Java extraction and
ingestion connectors, and cron jobs.

Give it a shot and let us know how it works out.


-- Jack Krupansky

-Original Message- From: Ali Nazemian
Sent: Sunday, July 27, 2014 1:20 AM

To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Dear Jack,
Hi,
One more thing to mention: I don't want to use Solr or Lucene for indexing
Accumulo or full-text search inside it. I am looking to keep both in sync.
I mean importing some parts of the data to Solr for indexing. For this
purpose I probably need something like a trigger in an RDBMS; I would have
to define something (probably with an Accumulo iterator) to import into
Solr when new data is inserted.
Regards.

On Fri, Jul 25, 2014 at 12:59 PM, Ali Nazemian 
wrote:

Dear Jack,

Actually I am going to do a benefit-cost analysis of in-house development
versus going with Sqrrl support.
Best regards.
Best regards.


On Thu, Jul 24, 2014 at 11:48 PM, Jack Krupansky 
wrote:

Like I said, you're going to have to be a real, hard-core gunslinger to do
that well. Sqrrl uses Lucene directly, BTW:

"Full-Text Search: Utilizing open-source Lucene and custom indexing
methods, Sqrrl Enterprise users can conduct real-time, full-text search
across data in Sqrrl Enterprise."

See:
http://sqrrl.com/product/search/

Out of curiosity, why are you not using that integrated Lucene support of
Sqrrl Enterprise?


-- Jack Krupansky

-Original Message- From: Ali Nazemian
Sent: Thursday, July 24, 2014 3:07 PM

To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr

Dear Jack,
Thank you. I am aware of DataStax, but I am looking to integrate Accumulo
with Solr. This is something like what the Sqrrl guys offer.
Regards.


On Thu, Jul 24, 2014 at 7:27 PM, Jack Krupansky 
wrote:

If you are not a "true hard-core gunslinger" who is willing to dive in and
integrate the code yourself, you should instead give serious consideration
to a product such as DataStax Enterprise that fully integrates and packages
a NoSQL database (Cassandra) and Solr for search. The security aspects are
still a work in progress, but certainly headed in the right direction. And
it has Hadoop and Spark integration as well.

See:
http://www.datastax.com/what-we-offer/products-services/
datastax-enterprise

-- Jack Krupansky

-Original Message- From: Ali Nazemian
Sent: Thursday, July 24, 2014 10:30 AM
To: solr-user@lucene.apache.org
Subject: Re: integrating Accumulo with solr


Thank you very much. Nice idea, but how can Solr and Accumulo be
synchronized in this way?
I know that Solr can be integrated with HDFS and that Accumulo also works
on top of HDFS. So can I use HDFS as the integration point? I mean, set
Solr to use HDFS as the source of documents as well as the destination of
documents.
Regards.


On Thu, Jul 24, 2014 at 4:33 PM, Joe Gresock 
wrote:

Ali,

Sounds like a good choice.  It's pretty standard to store the primary
storage id as a field in Solr so that you can search the full text in Solr
and then retrieve the full document elsewhere.

I would recommend creating a document structure in Solr with whatever
fields you want indexed (most likely as text_en, etc.), and then store a
"string" field named "content_id", which would be the Accumulo row id that
you look up with a scan.

One caveat -- Accu
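Joe's suggestion could be sketched as a schema fragment like the following (the text field names are illustrative, not from the thread; only the "content_id" idea comes from his message):

```xml
<!-- Indexed text fields for full-text search in Solr -->
<field name="title" type="text_en" indexed="true" stored="false"/>
<field name="body"  type="text_en" indexed="true" stored="false"/>
<!-- The Accumulo row id, stored so a Solr hit can be resolved with an Accumulo scan -->
<field name="content_id" type="string" indexed="true" stored="true"/>
```

The search flow would then be: query Solr on the text fields, read content_id from each hit, and fetch the full record from Accumulo by row id.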

Re: Searching and highlighting ten's of fields

2014-07-31 Thread Manuel Le Normand
Right, it works!
I was not aware of this functionality, or that it can be customized via the
hl.requireFieldMatch param.

Thanks


Re: How to search for phrase "IAE_UPC_0001"

2014-07-31 Thread Jack Krupansky
And I have a lot more explanation and examples for word delimiter filter in 
my e-book:

http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-7/ebook/product-21203548.html

-- Jack Krupansky

-Original Message- 
From: Erick Erickson

Sent: Thursday, July 31, 2014 12:58 PM
To: solr-user@lucene.apache.org
Subject: Re: How to search for phrase "IAE_UPC_0001"

Take a look at WordDelimiterFilterFactory. It has a bunch of
options to allow this kind of thing to be indexed and searched.

Note that in the default schema, the index-time definition in the fieldType
has slightly different parameters than the query-time
WordDelimiterFilterFactory; that's a good place to start.

WARNING: WDFF is a bit complex, you _really_ would be well
served by spending some time with the Admin/Analysis page to
understand the effects of these parameters...

Best,
Erick
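As a rough sketch of the kind of fieldType Erick describes (loosely modeled on the stock splitting text types; the name "text_ref" and the exact parameter values are illustrative, so verify them on the Admin/Analysis page), both IAE-UPC-0001 and IAE_UPC_0001 would reduce to the tokens iae / upc / 0001:

```xml
<fieldType name="text_ref" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- WDFF splits on non-alphanumeric delimiters, so "-" and "_" behave alike;
         the catenate options also index the joined form (iaeupc0001) -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- At query time the catenate options are typically turned off -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"
            splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With such a type on the url field, a phrase query like url:"IAE_UPC_0001" should match either delimiter variant, since both produce the same token stream.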




On Thu, Jul 31, 2014 at 9:31 AM, Paul Rogers  wrote:


Hi Guys

I have a Solr application searching on data uploaded by Nutch.  The search
I wish to carry out is for a particular document reference contained within
the "url" field, e.g. IAE-UPC-0001.

The problem is that the file names that comprise the urls are not
consistent, so a url might contain the reference as IAE-UPC-0001 or
IAE_UPC_0001 (i.e. using either the minus or underscore as the delimiter),
but not both.

I have created the query (in the solr admin interface):

url:"IAE-UPC-0001"

which works (returning the single expected document), as do:

url:"IAE*UPC*0001"
url:"IAE?UPC?0001"

when the doc ref is in the format IAE-UPC-0001 (i.e. using the minus sign
as a delimiter).

However:

url:"IAE_UPC_0001"
url:"IAE*UPC*0001"
url:"IAE?UPC?0001"

do not work (returning zero documents) when the doc ref is in the format
IAE_UPC_0001 (i.e. using the underscore character as the delimiter).

I'm assuming the underscore is a special character; I've tried looking at
the Solr wiki but can't find anything that explains the problem. The minus
sign also has a specific meaning, but it is nullified by adding the quotes.

Can anyone suggest what I'm doing wrong?

Many thanks

Paul





Extend the Solr Terms Component to implement a customized Autosuggest

2014-07-31 Thread Juan Pablo Albuja
Good afternoon guys. I would really appreciate it if someone in the
community could help me with the following issue:

I need to implement a Solr autosuggest that supports:

1.   Autosuggestion over multivalued fields

2.   Case-insensitivity

3.   Matching in the middle of a value: for example, I have the value "Hello
World" indexed, and I need to get that value when the user types "wor"

4.   Filtering by an additional field.

I was using the Terms component because it can satisfy points 1 to 3, but
point 4 is not possible with it. I also looked at faceted search and
N-gram/Edge N-gram approaches, but the problem there is that I would need to
copy fields in order to tokenize them or apply grams, and I don't want to do
that because I have more than 6 fields that need autosuggest; my index is
big (more than 400k documents) and I don't want to increase its size.
I tried to extend the Terms component in order to add an additional filter,
but it uses a TermsEnum, which iterates over a specific field, and I
couldn't figure out how to filter it in a really efficient way.
Do you guys have an idea of how I can satisfy my requirements efficiently?
An approach that doesn't use the Terms component would also be welcome.
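For points 1-3, a Terms Component request using terms.regex can be assembled like this. This is only a sketch: the base URL and field name are assumptions, it assumes the field is an untokenized string type (so whole values like "Hello World" come back as terms), and it does not solve point 4, since the Terms Component ignores filters.

```python
from urllib.parse import urlencode

def terms_autosuggest_url(base_url, field, user_input):
    """Build a Terms Component URL covering points 1-3 of the question."""
    params = {
        "terms": "true",
        "terms.fl": field,                        # Terms works on multivalued fields (1)
        "terms.regex": ".*" + user_input + ".*",  # match anywhere in the value (3)
        "terms.regex.flag": "case_insensitive",   # case-insensitive matching (2)
        "terms.limit": 10,
        "wt": "json",
    }
    return base_url + "/terms?" + urlencode(params)

url = terms_autosuggest_url("http://localhost:8983/solr/collection1", "title", "wor")
```

Note that regex matching scans the term dictionary, so on a large index it is slower than a prefix (terms.prefix) lookup; that trade-off is part of why the dedicated suggester approaches keep coming up.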

Thanks




Juan Pablo Albuja
Senior Developer




Re: How to search for phrase "IAE_UPC_0001"

2014-07-31 Thread Paul Rogers
Hi Jack

Thanks for the info. I'll take a look and see if I can figure it out (just
purchased the book).

P


On 31 July 2014 17:16, Jack Krupansky  wrote:

> And I have a lot more explanation and examples for word delimiter filter
> in my e-book:
> http://www.lulu.com/us/en/shop/jack-krupansky/solr-4x-
> deep-dive-early-access-release-7/ebook/product-21203548.html
>
> -- Jack Krupansky
>
> -Original Message- From: Erick Erickson
> Sent: Thursday, July 31, 2014 12:58 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to search for phrase "IAE_UPC_0001"
>
>
> Take a look at WordDelimiterFilterFactory. It has a bunch of
> options to allow this kind of thing to be indexed and searched.
>
> Note that in the default schema, the definition in the index part
> of the fieldType definition has slightly different parameters than
> the query time WordDelimiterFilterFactory, that's a good place
> to start.
>
> WARNING: WDFF is a bit complex, you _really_ would be well
> served by spending some time with the Admin/Analysis page to
> understand the effects of these parameters...
>
> Best,
> Erick
>
>
>
>
> On Thu, Jul 31, 2014 at 9:31 AM, Paul Rogers 
> wrote:
>
>  Hi Guys
>>
>> I have a Solr application searching on data uploaded by Nutch.  The search
>> I wish to carry out is for a particular document reference contained
>> within
>> the "url" field, e.g. IAE-UPC-0001.
>>
>> The problem is is that the file names that comprise the url's are not
>> consistent, so a url might contain the reference as IAE-UPC-0001 or
>> IAE_UPC_0001 (ie using either the minus or underscore as the delimiter)
>> but
>> not both.
>>
>> I have created the query (in the solr admin interface):
>>
>> url:"IAE-UPC-0001"
>>
>> which works (returning the single expected document), as do:
>>
>> url:"IAE*UPC*0001"
>> url:"IAE?UPC?0001"
>>
>> when the doc ref is in the format IAE-UPC-0001 (ie using the minus sign as
>> a delimiter).
>>
>> However:
>>
>> url:"IAE_UPC_0001"
>> url:"IAE*UPC*0001"
>> url:"IAE?UPC?0001"
>>
>> do not work (returning zero documents) when the doc ref is in the format
>> IAE_UPC_0001 (ie using the underscore character as the delimiter).
>>
>> I'm assuming the underscore is a special character but have tried looking
>> at the solr wiki but can't find anything to say what the problem is.  Also
>> the minus sign also has a specific meaning but is nullified by adding the
>> quotes.
>>
>> Can anyone suggest what I'm doing wrong?
>>
>> Many thanks
>>
>> Paul
>>
>>
>


Re: Solr vs ElasticSearch

2014-07-31 Thread Otis Gospodnetic
Not super fresh, but more recent than the 2 links you sent:
http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Jul 31, 2014 at 10:33 PM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:

> This is quite an old discussion. Wanted to check any new comparisons after
> SOLR 4 especially with regards to performance/scalability/throughput?
>
>
> On Tue, Jul 26, 2011 at 7:33 PM, Peter  wrote:
>
> > Have a look:
> >
> >
> >
> http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage
> >
> > http://karussell.wordpress.com/2011/05/12/elasticsearch-vs-solr-lucene/
> >
> > Regards,
> > Peter.
> >
> > --
> > View this message in context:
> >
> http://lucene.472066.n3.nabble.com/Solr-vs-ElasticSearch-tp3009181p3200492.html
> > Sent from the Solr - User mailing list archive at Nabble.com.
> >
>
>
>
> --
> Regards,
>
> Salman Akram
>


Solr Query Elevation Component

2014-07-31 Thread dboychuck
The documentation is very unclear (at least to me) about the Query Elevation
Component and filter queries (the fq param).

The documentation for Solr 4.9 states:

The fq Parameter
Query elevation respects the standard filter query (fq) parameter. That is,
if the query contains the fq parameter, all results will be within that
filter even if elevate.xml adds other documents to the result set.


When I read this, it made me think that only documents contained in the
filtered result set could be elevated. So when I apply a filter using the fq
param that removes a document from the result set, that document should no
longer be elevated.

I have tested the elevation component using both elevateIds and elevate.xml,
and both still elevate documents that have been filtered from the result set
by the fq parameter IF they exist in the result set before filtering.

I would like the elevation component to have an optional flag, something
like showFiltered=false, so that any results filtered from the result set by
the fq parameter are no longer elevated.

I have created the following ticket if anybody wants to take a stab at it:
https://issues.apache.org/jira/browse/SOLR-6308
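For readers unfamiliar with the component, elevate.xml pins specific documents to the top of the results for a given query; a minimal sketch (the query text and doc ids here are made up, not from the ticket):

```xml
<elevate>
  <!-- For the query "power drill", force these documents to the top -->
  <query text="power drill">
    <doc id="SKU1001"/>
    <doc id="SKU1002"/>
  </query>
</elevate>
```

As described above, documents listed this way are currently elevated even when an fq would otherwise exclude them, which is the behavior SOLR-6308 asks to make optional.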



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Query-Elevation-Component-tp4150531.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Search on Date Field

2014-07-31 Thread Jack Krupansky
A range query: published_date:["2012-09-26T00:00:00Z" TO 
"2012-09-27T00:00:00Z"}


With LucidWorks Search, you can simply say published_date:2012-09-26 and it
will internally generate that full range query.


See:
http://docs.lucidworks.com/display/lweug/Date+Queries

-- Jack Krupansky
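Outside LucidWorks, the expansion Jack describes can easily be done client-side before sending the query; a minimal sketch (the helper name is made up):

```python
from datetime import date, timedelta

def day_range_query(field, day):
    """Expand a bare day like '2012-09-26' into the range query above:
    inclusive start of the day, exclusive start of the next day."""
    start = date.fromisoformat(day)
    end = start + timedelta(days=1)
    return '%s:["%sT00:00:00Z" TO "%sT00:00:00Z"}' % (
        field, start.isoformat(), end.isoformat())

q = day_range_query("published_date", "2012-09-26")
# -> published_date:["2012-09-26T00:00:00Z" TO "2012-09-27T00:00:00Z"}
```

The mixed brackets ([ inclusive, } exclusive) avoid double-counting a document timestamped exactly at midnight of the following day.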

-Original Message- 
From: Pbbhoge

Sent: Wednesday, July 30, 2014 8:59 AM
To: solr-user@lucene.apache.org
Subject: Search on Date Field

In my Solr there is a date field (published_date) whose values are in this
format: "2012-09-26T10:08:09.123Z".

How can I search with a simple input like "2012-09-10" instead of the full
ISO date format?

Is that possible in Solr?




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Search-on-Date-Field-tp4150076.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Solr vs ElasticSearch

2014-07-31 Thread Salman Akram
I did see that earlier. My main concern is search
performance/scalability/throughput, which unfortunately that article didn't
address. Any benchmarks or comments about that?

We are already using Solr, but there has been a push to check out
Elasticsearch. All the benchmarks I have seen are at least a few years old.


On Fri, Aug 1, 2014 at 4:59 AM, Otis Gospodnetic  wrote:

> Not super fresh, but more recent than the 2 links you sent:
> http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Thu, Jul 31, 2014 at 10:33 PM, Salman Akram <
> salman.ak...@northbaysolutions.net> wrote:
>
> > This is quite an old discussion. Wanted to check any new comparisons
> after
> > SOLR 4 especially with regards to performance/scalability/throughput?
> >
> >
> > On Tue, Jul 26, 2011 at 7:33 PM, Peter  wrote:
> >
> > > Have a look:
> > >
> > >
> > >
> >
> http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage
> > >
> > >
> http://karussell.wordpress.com/2011/05/12/elasticsearch-vs-solr-lucene/
> > >
> > > Regards,
> > > Peter.
> > >
> > > --
> > > View this message in context:
> > >
> >
> http://lucene.472066.n3.nabble.com/Solr-vs-ElasticSearch-tp3009181p3200492.html
> > > Sent from the Solr - User mailing list archive at Nabble.com.
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Salman Akram
> >
>



-- 
Regards,

Salman Akram


Re: Solr vs ElasticSearch

2014-07-31 Thread Alexandre Rafalovitch
Maybe Charlie Hull can answer that:
https://twitter.com/FlaxSearch/status/494859596117602304 . He seems to
think that - at least in some cases - Solr is faster.

I am also doing a talk and a book on Solr vs. ElasticSearch, but I am
not really planning to address those issues either, only the feature
comparisons.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On Fri, Aug 1, 2014 at 12:35 PM, Salman Akram
 wrote:
> I did see that earlier. My main concern is search
> performance/scalability/throughput which unfortunately that article didn't
> address. Any benchmarks or comments about that?
>
> We are already using SOLR but there has been a push to check elasticsearch.
> All the benchmarks I have seen are at least few years old.
>
>
> On Fri, Aug 1, 2014 at 4:59 AM, Otis Gospodnetic > wrote:
>
>> Not super fresh, but more recent than the 2 links you sent:
>> http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/
>>
>> Otis
>> --
>> Performance Monitoring * Log Analytics * Search Analytics
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>>
>> On Thu, Jul 31, 2014 at 10:33 PM, Salman Akram <
>> salman.ak...@northbaysolutions.net> wrote:
>>
>> > This is quite an old discussion. Wanted to check any new comparisons
>> after
>> > SOLR 4 especially with regards to performance/scalability/throughput?
>> >
>> >
>> > On Tue, Jul 26, 2011 at 7:33 PM, Peter  wrote:
>> >
>> > > Have a look:
>> > >
>> > >
>> > >
>> >
>> http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage
>> > >
>> > >
>> http://karussell.wordpress.com/2011/05/12/elasticsearch-vs-solr-lucene/
>> > >
>> > > Regards,
>> > > Peter.
>> > >
>> > > --
>> > > View this message in context:
>> > >
>> >
>> http://lucene.472066.n3.nabble.com/Solr-vs-ElasticSearch-tp3009181p3200492.html
>> > > Sent from the Solr - User mailing list archive at Nabble.com.
>> > >
>> >
>> >
>> >
>> > --
>> > Regards,
>> >
>> > Salman Akram
>> >
>>
>
>
>
> --
> Regards,
>
> Salman Akram


Re: Autocommit, opensearchers and ingestion

2014-07-31 Thread rulinma
good



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Autocommit-opensearchers-and-ingestion-tp4119604p4150558.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr vs ElasticSearch

2014-07-31 Thread Otis Gospodnetic
If performance is the main reason, you can stick with Solr. Both Solr and ES
have many knobs to turn for performance; it is impossible to give a direct
and correct answer to the question of which is faster.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Fri, Aug 1, 2014 at 7:35 AM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:

> I did see that earlier. My main concern is search
> performance/scalability/throughput which unfortunately that article didn't
> address. Any benchmarks or comments about that?
>
> We are already using SOLR but there has been a push to check elasticsearch.
> All the benchmarks I have seen are at least few years old.
>
>
> On Fri, Aug 1, 2014 at 4:59 AM, Otis Gospodnetic <
> otis.gospodne...@gmail.com
> > wrote:
>
> > Not super fresh, but more recent than the 2 links you sent:
> >
> http://blog.sematext.com/2012/08/23/solr-vs-elasticsearch-part-1-overview/
> >
> > Otis
> > --
> > Performance Monitoring * Log Analytics * Search Analytics
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> > On Thu, Jul 31, 2014 at 10:33 PM, Salman Akram <
> > salman.ak...@northbaysolutions.net> wrote:
> >
> > > This is quite an old discussion. Wanted to check any new comparisons
> > after
> > > SOLR 4 especially with regards to performance/scalability/throughput?
> > >
> > >
> > > On Tue, Jul 26, 2011 at 7:33 PM, Peter  wrote:
> > >
> > > > Have a look:
> > > >
> > > >
> > > >
> > >
> >
> http://stackoverflow.com/questions/2271600/elasticsearch-sphinx-lucene-solr-xapian-which-fits-for-which-usage
> > > >
> > > >
> > http://karussell.wordpress.com/2011/05/12/elasticsearch-vs-solr-lucene/
> > > >
> > > > Regards,
> > > > Peter.
> > > >
> > > > --
> > > > View this message in context:
> > > >
> > >
> >
> http://lucene.472066.n3.nabble.com/Solr-vs-ElasticSearch-tp3009181p3200492.html
> > > > Sent from the Solr - User mailing list archive at Nabble.com.
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Salman Akram
> > >
> >
>
>
>
> --
> Regards,
>
> Salman Akram
>


Solr 4.4.0 on hadoop 2.2.0

2014-07-31 Thread Jeniba Johnson
Hi,

I am new to Solr. I have integrated Solr 4.9.0 and Hadoop 2.3.0.
I have changed the solrconfig.xml file so that it can index and store the
data on HDFS.

Solrconfig.xml

<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <str name="solr.hdfs.home">hdfs://xxx.xx.xx.xx:50070/user/solr/data</str>
  <bool name="solr.hdfs.blockcache.enabled">true</bool>
  <int name="solr.hdfs.blockcache.slab.count">1</int>
  <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
  <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
  <bool name="solr.hdfs.blockcache.read.enabled">true</bool>
  <bool name="solr.hdfs.blockcache.write.enabled">true</bool>
  <bool name="solr.hdfs.nrtcachingdirectory.enable">true</bool>
  <int name="solr.hdfs.nrtcachingdirectory.maxmergesizemb">16</int>
  <int name="solr.hdfs.nrtcachingdirectory.maxcachedmb">192</int>
</directoryFactory>

Commands

java -Dsolr.directoryFactory=HdfsDirectoryFactory -Dsolr.lock.type=hdfs
-Dsolr.data.dir=hdfs://xxx.xx.xx.xx:50070/user/solr/data/collection1
-Dsolr.updatelog=hdfs://xxx.xx.xx.xx:50070/user/solr/data/collection1
-jar start.jar

Error

48454 [coreLoadExecutor-5-thread-1] ERROR org.apache.solr.core.CoreContainer -
Unable to create core: collection1
org.apache.solr.common.SolrException: Problem creating directory: 
hdfs://172.29.17.40:50070/user/solr/data/collection1
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:868)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:643)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:556)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:261)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:253)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.lang.RuntimeException: Problem creating directory: 
hdfs://172.29.17.40:50070/user/solr/data/collection1
at 
org.apache.solr.store.hdfs.HdfsDirectory.<init>(HdfsDirectory.java:87)
at 
org.apache.solr.core.HdfsDirectoryFactory.create(HdfsDirectoryFactory.java:148)
at 
org.apache.solr.core.CachingDirectoryFactory.get(CachingDirectoryFactory.java:351)
at org.apache.solr.core.SolrCore.getNewIndexDir(SolrCore.java:267)
at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:479)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:774)
... 12 more
Caused by: java.io.IOException: Failed on local exception: 
com.google.protobuf.InvalidProtocolBufferException: Protocol message end-group 
tag did not match expected tag.; Host Details : local host is: 
"cldx-1310-1182/172.29.17.40"; destination host is: "cldx-1310-1182":50070;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:764)
at org.apache.hadoop.ipc.Client.call(Client.java:1351)
at org.apache.hadoop.ipc.Client.call(Client.java:1300)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:186)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:651)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1679)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1106)
at 
org.apache.hadoop.hdfs.DistributedFileSystem$17.doCall(DistributedFileSystem.java:1102)
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
   at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1102)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1397)
at 
org.apache.solr.store.hdfs.HdfsDirectory.<init>(HdfsDirectory.java:62)
... 17 more
Caused by: com.google.protobuf.InvalidProtocolBufferException: Protocol message 
end-group tag did not match expected tag.
at 
com.google.protobuf.InvalidProtocolBufferException.invalidEndTag(InvalidProtocolBufferException.java:94)
at 
com.google.protobuf.CodedInputStream.checkLastTagWas(CodedInputStream.java:124)
at 
com.google.protobuf.AbstractParser.parsePartialFrom(AbstractParser.java:202)
at 
com.google.protobuf.AbstractParser.parsePartialDelimitedFrom(AbstractParser.java:241)
at 
com.google.protobuf.AbstractParser.parseDelimitedFrom(AbstractParser.java:25