Re: Solr and ActiveMQ

2014-09-12 Thread Flavio Pompermaier
Sorry for the dumb question, but how do you integrate ActiveMQ and Solr? What is
the purpose/use case?

Thanks in advance,
Flavio

On Thu, Sep 11, 2014 at 11:00 PM, vivison  wrote:

> Solr works fine with ActiveMQ, provided solrconfig.xml is set up correctly.
> I was omitting the required property "java.naming.provider.url".


Re: Facets not supporting multi language?

2014-09-12 Thread davyme
The reason I'm asking is that I have no influence on the fields
that are indexed; the CMS does that automatically. So there is no way for me
to split up languages into separate fields.

I can change the schema.xml, but I don't know if there is a way to copy
fields into separate language fields.

So if there is a way without splitting fields, I would like to know it.

Thanks!
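
For reference, one way to get per-language fields without changing what the CMS
sends is Solr's language-identifier update processor, which detects the language
at index time and can map a field to a language-specific variant. A minimal
sketch for solrconfig.xml, assuming the CMS writes into a field named "content"
(the field names are illustrative, not from this thread):

<updateRequestProcessorChain name="langid">
  <processor class="solr.LangDetectLanguageIdentifierUpdateProcessorFactory">
    <!-- detect the language of the incoming "content" field -->
    <str name="langid.fl">content</str>
    <!-- store the detected language code in this field -->
    <str name="langid.langField">language</str>
    <!-- rewrite "content" to "content_en", "content_nl", ... per document -->
    <bool name="langid.map">true</bool>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

The mapped fields (content_en, content_nl, ...) then need matching explicit or
dynamic fields in schema.xml, each with its language's analysis chain.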





AUTO: Nicholas M. Wertzberger is out of the office (returning 09/15/2014)

2014-09-12 Thread Nicholas M. Wertzberger


I am out of the office until 09/15/2014.

I'll be out of the office this Friday, but I will be back on Monday.
Please contact Jason Brown for anything JAS Team related.


Note: This is an automated response to your message "Re: Is there any
sentence tokenizers in sold 4.9.0?" sent on 9/12/2014 12:15:18 AM.

This is the only notification you will receive while this person is away.


Re: q and logical operators.

2014-09-12 Thread John Nielsen
I didn't know about sloppy queries. This is great stuff!

I solved it with a &qs=100.

Thank you for the help.
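
For reference, qs is the edismax "query phrase slop": it is applied to phrases
explicitly quoted in the q parameter, so a quoted search tolerates reordered
words within the slop. A sketch of the resulting request, reusing the handler
and query from the earlier examples (spaces and quotes URL-encoded in practice):

/solr/11731_Danish/search?defType=edismax&qs=100
    &q="Visitkort display Durable 4 rum til 240 kort"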



On Thu, Sep 11, 2014 at 11:36 PM, Erick Erickson 
wrote:

> just skimmed, but:
>
> bq:  I would get a hit for "vis dis dur", but "vis dur dis" no longer
> returns anything. This is not an option for me
>
> Would slop help here? i.e. "vis dur dis"~3 or some such?
>
> Best
> Erick
>
> On Thu, Sep 11, 2014 at 4:34 AM, John Nielsen  wrote:
> > q and logical operators.
> >
> > Hi all,
> >
> > I have a strange problem which seems to stump my google-fu skills.
> >
> > We have a webshop with a Solr-based search mechanism which allows
> > customers to search for products based on a range of different fields,
> > including item numbers. I recently added a feature which allows users who
> > are logged in to search for custom item numbers which are associated with
> > that user. What this means in practical terms is that when a user logs in,
> > the Solr search query has to look in one extra field compared to when the
> > user is not logged in.
> >
> > The standard non-logged-in search query looks like this (I only included
> > the relevant first part of the query):
> > http://
> >
> /solr/11731_Danish/search?defType=edismax&q=Visitkort+display+Durable+4+rum+til+240+kort
> >
> > When doing the same search while logged in, the query looks like this:
> > http://
> >
> /solr/11731_Danish/search?defType=edismax&q=Visitkort+display+Durable+4+rum+til+240+kort+OR+customer_5266762_product_number_string:Visitkort+display+Durable+4+rum+til+240+kort
> >
> > Here I add an extra field, customer_5266762_product_number_string (5266762
> > being the logged-in user's internal ID), basically including the same
> > search term two times.
> >
> > The above examples work beautifully when searching for a specific item
> > number stored in the customer_5266762_product_number_string. The problem is
> > that when a user is logged in and wants to do regular searches, the system
> > begins to break down. In the specific example above, I expect to get a
> > single hit for a product with the title "Visitkort display Durable 4 rum
> > til 240 kort". It works as expected with the first non-logged-in example.
> > The second logged-in example returns over 7000 hits. I would expect it to
> > return just one hit since there is nothing relevant in the
> > customer_5266762_product_number_string for this query.
> >
> > Now, the following is where my brain begins to melt down.
> >
> > I discovered that if you put the search text in quotation marks, it will
> > work as expected, but doing so breaks another loved feature we have:
> >
> > If I want a hit on the product named "Visitkort display Durable 4 rum til
> > 240 kort", I could do a search for "vis dis dur", and it would show up. I
> > could also get a hit if I write "vis dur dis", changing the order of the
> > words. If I put the search query in quotation marks, I break that
> > capability. I would get a hit for "vis dis dur", but "vis dur dis" no
> > longer returns anything. This is not an option for me.
> >
> > It is entirely possible that there is a better way of implementing this
> > and, fortunately, a rewrite is possible at this time. If my basic approach
> > is correct and I just don't understand how to construct my query correctly,
> > an RTFM pointer will be most welcome!
> >
> > --
> > Med venlig hilsen / Best regards
> >
> > *John Nielsen*
> > Programmer
> >
> >
> >
> > *MCB A/S*
> > Enghaven 15
> > DK-7500 Holstebro
> >
> > Kundeservice: +45 9610 2824
> > p...@mcb.dk
> > www.mcb.dk
>



-- 
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk


Solr personalize document deletion.

2014-09-12 Thread Guy Moshkowich
I'm working on a production system that indexes users' interaction
events as documents in a Solr index.
Each document looks similar to: {user_id, event_data, update_time}
The index size increases monotonically over time, so documents need to be
deleted from the index at fixed intervals.
A requirement for the deletion process is to delete documents so that each user
is left with ~500 of the most recently updated documents (by the update_time
field).
Another requirement is that the deletion process needs to be efficient, as
there are millions of users and many documents that need to be deleted
each time.

Can you advise on how I can implement such a deletion mechanism?

-Guy
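
A minimal sketch of one way to shape this with SolrJ (not from the thread: the
cutoff logic is an assumption, and since it costs one query plus one delete per
user it does not by itself satisfy the efficiency requirement at millions of
users):

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TrimUserEvents {
  // Keep only the ~500 most recently updated documents for one user.
  public static void trim(SolrServer server, String userId) throws Exception {
    // Find the update_time of this user's 500th-newest document.
    SolrQuery q = new SolrQuery("user_id:" + userId);
    q.setSort("update_time", SolrQuery.ORDER.desc);
    q.setStart(499); // 0-based offset: the 500th document
    q.setRows(1);
    q.setFields("update_time");
    QueryResponse rsp = server.query(q);
    if (rsp.getResults().isEmpty()) {
      return; // fewer than 500 documents, nothing to trim
    }
    Date cutoff = (Date) rsp.getResults().get(0).getFieldValue("update_time");

    // Delete everything strictly older than the cutoff for this user.
    SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
    fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
    server.deleteByQuery("user_id:" + userId
        + " AND update_time:{* TO " + fmt.format(cutoff) + "}");
  }
}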


SolrJ : fieldcontent from (multiple) file(s)

2014-09-12 Thread Clemens Wyss DEV
First of all I'd like to say hello to the Solr world/community ;) So far we
have been using Lucene as-is and now intend to go for Solr.

Say I have a document which in one field should have the content of a file
(indexed only, not stored), in order to make the document searchable by the
file's content. I know

How is this achieved using SolrJ, i.e. how do I hand in this document?

Thx
Clemens



How to make solr fault tolerant for query?

2014-09-12 Thread Amey Jadiye
Just a dumb question, but how can I make SolrCloud fault tolerant for queries?
I'm asking because I have 12 different physical servers and I am running 12
Solr shards on them. Whenever any one of them goes down for any reason, I get
the error below. I have 3 ZooKeeper nodes for the 12 servers; all shards are
leaders with no replicas in this SolrCloud.
I have the option of using shards.tolerant=true, but this is slow and doesn't
give all results.
Best, Amey
{
  "responseHeader": {
"status": 503,
"QTime": 7,
"params": {
  "sort": "last_modified asc",
  "indent": "true",
  "q": "+links:[* TO *]",
  "_": "1410512274068",
  "wt": "json"
}
  },
  "error": {
"msg": "no servers hosting shard: ",
"code": 503
  }
} 
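
For reference, the tolerant form of the failing request above just adds the
parameter (the collection name is a placeholder, and the parameters would be
URL-encoded in practice):

/solr/collection1/select?q=+links:[* TO *]&sort=last_modified asc&wt=json&shards.tolerant=true

A partial response is then flagged with "partialResults": true in the
responseHeader instead of failing with a 503.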


AW: SolrJ : fieldcontent from (multiple) file(s)

2014-09-12 Thread Clemens Wyss DEV
Looks like I haven't finished " I know"
I know I could extract the content on our server's side, but I'd really like to
take that burden off it.
That said:
Can I hand in the path-to-the-file in a "specific field" which would yield an
extraction in Solr?

-----Original Message-----
From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
Sent: Friday, 12 September 2014 11:30
To: 'solr-user@lucene.apache.org'
Subject: SolrJ : fieldcontent from (multiple) file(s)

First of all  I'd like to say hello to the Solr world/community ;) So far we 
have been using Lucene as-is and now intend to go for Solr.

Say I have a document which in one field should have the content of  a file 
(indexed only, not stored), in order to make the document searchable due to the 
file's content. I know

How is this achieved using SolrJ, i.e. how do I hand in this document?

Thx
Clemens



Re: Solr personalize document deletion.

2014-09-12 Thread Alexandre Rafalovitch
In an ideal world, how often would you be running such cleanup and how
many documents would you expect to delete each time?

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 12 September 2014 05:42, Guy Moshkowich  wrote:
> I'm working on a production system that is indexing user's interaction
> events as documents in Solr index.
> Each documents looks similar to: {user_id, event_data, update_time}
> The index size increase monotonously over time and so documents need to be
>
> deleted from the index in fixed intervals.
> A requirement for the deletion process is to delete documents so each user
>
> will be left with ~500 of the most updated documents (by update_time
> field).
> Another requirement is that deletion process needs to be efficient as
> there are millions of users and many documents that need to be deleted
> each time.
>
> Can you advise on how can I implement such deletion mechanism?
>
> -Guy


Re: SolrJ : fieldcontent from (multiple) file(s)

2014-09-12 Thread Alexandre Rafalovitch
Do you just care about document content? Not metadata, such as file
name, date, author, etc?

Does it have to be pushed into Solr, or can it be pulled? If pull,
DataImportHandler should be able to do what you want with a nested
entities design.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 12 September 2014 06:53, Clemens Wyss DEV  wrote:
> Looks like I haven't finished " I know"
> I know I could extract the content on our server's side, but I'd really like
> to take that burden off it.
> That said:
> Can I hand in the path-to-the-file in a "specific field" which would yield an
> extraction in Solr?
>
> -----Original Message-----
> From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
> Sent: Friday, 12 September 2014 11:30
> To: 'solr-user@lucene.apache.org'
> Subject: SolrJ : fieldcontent from (multiple) file(s)
>
> First of all  I'd like to say hello to the Solr world/community ;) So far we 
> have been using Lucene as-is and now intend to go for Solr.
>
> Say I have a document which in one field should have the content of  a file 
> (indexed only, not stored), in order to make the document searchable due to 
> the file's content. I know
>
> How is this achieved using SolrJ, i.e. how do I hand in this document?
>
> Thx
> Clemens
>


Re: Is there any sentence tokenizers in sold 4.9.0?

2014-09-12 Thread Benson Margulies
Basis Technology's toolset includes sentence boundary detectors. Please
contact me for more details.

On Fri, Sep 12, 2014 at 1:15 AM, Sandeep B A 
wrote:

> Hi All,
> Sorry for the delayed response.
> I was out of the office for the last few days and was not able to reply.
> Thanks for the information.
>
> We have a use case where one sentence is the unit token on which we need
> to do normalization and semantic analysis.
>
> We need to finalize the type of normalizer and analyzer, but I wanted to
> see whether Solr has any built-in libraries, so that no cross-language
> integration is required.
>
> I will get back on whether it works or not.
>
> @susheel,
> Thanks, I will try to see if that works.
>
> Thanks,
> Sandeep.
> On Sep 8, 2014 12:54 PM, "Sandeep B A"  wrote:
>
> > Hi Susheel,
> > Thanks for the information.
> > I have crawled a few websites, and all I need is a sentence tokenizer for
> > the data I have collected.
> > These websites are English only.
> >
> > Well, I don't have experience writing custom sentence tokenizers for
> > Solr. Is there any tutorial link which tells how to do it?
> >
> > Is it possible to integrate NLTK with Solr? If yes, how? Because I
> > found sentence tokenizers for English in NLTK.
> >
> > Thanks,
> > Sandeep
> > On Sep 5, 2014 8:10 PM, "Sandeep B A"  wrote:
> >
> >> Sorry for the typo: it is Solr 4.9.0, not sold 4.9.0
> >>  On Sep 5, 2014 7:48 PM, "Sandeep B A" 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> I was looking for default sentence tokenizer options in Solr
> >>> but could not find any. Has anyone used one, or integrated a tokenizer
> >>> from another language (e.g. Python) into Solr? Please let me know.
> >>>
> >>>
> >>> Thanks and regards,
> >>> Sandeep
> >>>
> >>
>


AW: SolrJ : fieldcontent from (multiple) file(s)

2014-09-12 Thread Clemens Wyss DEV
Thanks Alex,
> Do you just care about document content?
Content only.

The documents (not necessarily coming from a DB) are being pushed (through
SolrJ). This is at least the initial idea, mainly due to the dynamic nature of
our index/search architecture.
I could of course push the filename(s) into a field, but this would require
Solr (due to a field type, e.g. "filecontent") to extract the content from the
given file. Is something like this possible in Solr indexing?

> DataImportHandler
Would I need to write a custom DIH? Or does the DIH work as is, i.e. just
configured through data-config.xml?

> nested entities design
Could you link me to this concept/idea?

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Friday, 12 September 2014 14:12
To: solr-user
Subject: Re: SolrJ : fieldcontent from (multiple) file(s)

Do you just care about document content? Not metadata, such as file name, date, 
author, etc?

Does it have to be pushed into Solr, or can it be pulled? If pull, DataImportHandler
should be able to do what you want with a nested entities design.

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and 
newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers 
community: https://www.linkedin.com/groups?gid=6713853


On 12 September 2014 06:53, Clemens Wyss DEV  wrote:
> Looks like I haven't finished " I know"
> I know I could extract the content on our server's side, but I'd really like
> to take that burden off it.
> That said:
> Can I hand in the path-to-the-file in a "specific field" which would yield an 
> extraction in Solr?
>
> -----Original Message-----
> From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
> Sent: Friday, 12 September 2014 11:30
> To: 'solr-user@lucene.apache.org'
> Subject: SolrJ : fieldcontent from (multiple) file(s)
>
> First of all  I'd like to say hello to the Solr world/community ;) So far we 
> have been using Lucene as-is and now intend to go for Solr.
>
> Say I have a document which in one field should have the content of  a 
> file (indexed only, not stored), in order to make the document 
> searchable due to the file's content. I know
>
> How is this achieved using SolrJ, i.e. how do I hand in this document?
>
> Thx
> Clemens
>


Fatal full GC

2014-09-12 Thread YouPeng Yang
Hi
  We built our SolrCloud with Solr 4.6.0 and JDK 1.7.0_60; our cluster
contains 360GB * 3 of data (one core with 2 replicas).
  Our cluster has become unstable: occasionally a long full GC occurs. This is
awful; the full GC takes so long that SolrCloud considers the node down.
  Normally a full GC happens when the old generation reaches 70%, and that is
OK. In the awful case, however, the percentage climbs well above 70%, reaching
99%, so the long full GC happens and the node is considered down.
  We set the JVM parameters following
https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning; the only difference
is that we changed -Xms48009m -Xmx48009m to -Xms49152M -Xmx81920M.
  The appendix [1] is the output of jstat when the awful full GC happens. I
have marked the important part in red font, hoping it is helpful.
  By the way, I have noticed that the Eden part of the young generation always
sits at 100% while the awful condition happens, which I think is an important
indication.
  The SolrCloud will be used to support our applications as a very important
component.
  Would you please give me any suggestions? Do I need to change the JDK
version?


Any suggestions will be appreciated.

Best Regard


[1]--
   S0     S1     E      O      P     YGC     YGCT  FGC   FGCT      GCT
 ..omitted..
 33.27  84.37 100.00  70.14  59.94  28070 4396.141   14  3.724 4399.864
100.00   0.00  30.96  70.38  59.94  28077 4397.877   14  3.724 4401.601
  0.00  72.06   0.00  70.66  59.94  28083 4399.554   14  3.724 4403.277
 59.50  49.30 100.00  70.88  59.94  28091 4401.101   14  3.724 4404.825
 76.98 100.00 100.00  71.07  59.94  28098 4402.707   14  3.724 4406.431
100.00  84.59 100.00  71.41  59.94  28105 4404.526   14  3.724 4408.250
100.00  89.60 100.00  71.77  59.94  28111 4406.216   14  3.724 4409.939
100.00 100.00  99.92  72.16  59.94  28116 4407.609   14  3.724 4411.333
100.00 100.00 100.00  72.68  59.94  28120 4409.041   14  3.724 4412.764
100.00 100.00 100.00  73.02  59.94  28126 4410.666   14  3.724 4414.390
 92.06 100.00 100.00  73.37  59.94  28132 4412.389   14  3.724 4416.113
 68.89 100.00 100.00  73.74  59.94  28138 4414.004   14  3.724 4417.728
100.00 100.00 100.00  73.99  59.94  28144 4415.555   14  3.724 4419.278
100.00  56.44 100.00  74.31  59.94  28151 4417.311   14  3.724 4421.034
 65.78  25.37 100.00  74.57  59.94  28159 4419.051   14  3.724 4422.774
 62.41  43.09 100.00  74.76  59.94  28167 4420.740   14  3.724 4424.464
 36.14  15.59 100.00  74.97  59.94  28175 4422.353   14  3.724 4426.077
 91.86  37.75 100.00  75.09  59.94  28183 4423.976   14  3.724 4427.700
 87.88 100.00 100.00  75.30  59.94  28190 4425.713   14  3.724 4429.437
 88.91 100.00 100.00  75.63  59.94  28196 4427.293   14  3.724 4431.017
100.00 100.00 100.00  76.01  59.94  28202 4428.816   14  3.724 4432.539
  0.00 100.00  97.08  76.28  59.94  28208 4430.504   14  3.724 4434.228
 63.42  45.06 100.00  76.57  59.94  28215 4432.018   14  3.724 4435.742
 52.26  35.19 100.00  76.73  59.94  28223 4433.644   14  3.724 4437.367
100.00   0.00  75.24  76.88  59.94  28230 4435.231   14  3.724 4438.955
100.00 100.00 100.00  77.27  59.94  28235 4436.334   14  3.724 4440.057
 87.09 100.00 100.00  77.63  59.94  28242 4438.118   14  3.724 4441.842
 92.06 100.00 100.00  95.77  59.94  28248 4439.763   14  3.724 4443.487
  0.00 100.00  37.93  78.65  59.94  28253 4441.483   14  3.724 4445.207
 68.38  81.73 100.00  79.04  59.94  28260 4442.971   14  3.724 4446.695
100.00 100.00 100.00  79.24  59.94  28267 4444.706   14  3.724 4448.429
 95.40   0.00   0.00  79.56  59.94  28274 4446.608   14  3.724 4450.332
 53.60   0.00 100.00  79.82  59.94  28283 4448.213   14  3.724 4451.937
100.00  89.81 100.00  80.01  59.94  28291 4449.759   14  3.724 4453.483
 88.21 100.00 100.00  80.38  59.94  28298 4451.466   14  3.724 4455.190
 88.21 100.00 100.00  80.38  59.94  28298 4451.466   14  3.724 4455.190
 88.21 100.00 100.00  80.38  59.94  28298 4451.466   14  3.724 4455.190
 88.21 100.00 100.00  80.38  59.94  28298 4451.466   14  3.724 4455.190
 88.21 100.00 100.00  80.38  59.94  28298 4451.466   14  3.724 4455.190
 88.21 100.00 100.00  80.38  59.94  28298 4451.466   14  3.724 4455.190
 88.21 100.00 100.00  80.38  59.94  28298 4451.466   14  3.724 4455.190
 88.21 100.00 100.00  80.38  59.94  28298 4451.466   14  3.724 4455.190
 88.21 100.00 100.00  80.38  59.94  28298 4451.466   14  3.724 4455.190
 88.21 100.00 100.00  80.38  59.94  28298 4451.466   14  3.724 4455.190
 88.21 100.00 100.00  80.38  59.94  28298 4451.466   14  3.724 4455.190
 88.21 100.00 100.00  80.38  59.94  28298 4451.466   14  3.724 4455.190
 ..omitted..


Re: Fatal full GC

2014-09-12 Thread Shawn Heisey
On 9/12/2014 7:36 AM, YouPeng Yang wrote:
> We built our SolrCloud with Solr 4.6.0 and JDK 1.7.0_60; our cluster
> contains 360GB * 3 of data (one core with 2 replicas).
>   Our cluster has become unstable: occasionally a long full GC occurs. This
> is awful; the full GC takes so long that SolrCloud considers the node down.
>   Normally a full GC happens when the old generation reaches 70%, and that
> is OK. In the awful case, however, the percentage climbs well above 70%,
> reaching 99%, so the long full GC happens and the node is considered down.
>   We set the JVM parameters following
> https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning; the only difference
> is that we changed -Xms48009m -Xmx48009m to -Xms49152M -Xmx81920M.
>   The appendix [1] is the output of jstat when the awful full GC happens. I
> have marked the important part in red font, hoping it is helpful.
>   By the way, I have noticed that the Eden part of the young generation
> always sits at 100% while the awful condition happens, which I think is an
> important indication.
>   The SolrCloud will be used to support our applications as a very
> important component.
>   Would you please give me any suggestions? Do I need to change the JDK
> version?

My GC parameter page is getting around. :)

Do you really need an 80GB heap?  I realize that your index is 360GB ...
but if you really do need a heap that large, you may need to adjust your
configuration so you use a lot less heap memory.

The red font you mentioned did not make it through, so I cannot tell
what lines you highlighted.

I pulled your jstat output into a spreadsheet and calculated the length
of each GC.  The longest GC in there took 1.903 seconds.  It's the one
that had a GCT of 4450.332.  For an 80GB heap, you couldn't hope for
anything better.  Based on what I see here, I don't think GC is your
problem.  If I read the other numbers on that 1.903 second GC line
correctly (not sure that I am), it dropped your Eden size from 100% to
0% ... suggesting that you really don't need an 80GB heap.

How much RAM does this machine have?  For ideal performance, you'll need
your index size plus your heap size, which for you right now is 440 GB. 
Normally you don't need the ideal memory size ... but you do need a
*significant* portion of it.  I don't think I'd try running this index
with less than 256GB of RAM, and that's assuming a much lower heap size
than 80GB.

Here's some general info about performance problems and possible solutions:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn



Re: Is there any sentence tokenizers in sold 4.9.0?

2014-09-12 Thread Aman Tandon
Hi,

Is there any semantic analyzer in solr?
On Sep 12, 2014 10:51 AM, "Sandeep B A"  wrote:

> Hi All,
> Sorry for the delayed response.
> I was out of the office for the last few days and was not able to reply.
> Thanks for the information.
>
> We have a use case where one sentence is the unit token on which we need
> to do normalization and semantic analysis.
>
> We need to finalize the type of normalizer and analyzer, but I wanted to
> see whether Solr has any built-in libraries, so that no cross-language
> integration is required.
>
> I will get back on whether it works or not.
>
> @susheel,
> Thanks, I will try to see if that works.
>
> Thanks,
> Sandeep.
> On Sep 8, 2014 12:54 PM, "Sandeep B A"  wrote:
>
> > Hi Susheel,
> > Thanks for the information.
> > I have crawled a few websites, and all I need is a sentence tokenizer for
> > the data I have collected.
> > These websites are English only.
> >
> > Well, I don't have experience writing custom sentence tokenizers for
> > Solr. Is there any tutorial link which tells how to do it?
> >
> > Is it possible to integrate NLTK with Solr? If yes, how? Because I
> > found sentence tokenizers for English in NLTK.
> >
> > Thanks,
> > Sandeep
> > On Sep 5, 2014 8:10 PM, "Sandeep B A"  wrote:
> >
> >> Sorry for the typo: it is Solr 4.9.0, not sold 4.9.0
> >>  On Sep 5, 2014 7:48 PM, "Sandeep B A" 
> wrote:
> >>
> >>> Hi,
> >>>
> >>> I was looking for default sentence tokenizer options in Solr
> >>> but could not find any. Has anyone used one, or integrated a tokenizer
> >>> from another language (e.g. Python) into Solr? Please let me know.
> >>>
> >>>
> >>> Thanks and regards,
> >>> Sandeep
> >>>
> >>
>


Re: Fatal full GC

2014-09-12 Thread Walter Underwood
I agree about the 80GB heap as a possible problem.

A GC is essentially a linear scan of memory. More memory means a longer scan.

We run with an 8GB heap. I'd try that. Test it by replaying logs from
production against a test instance. You can use JMeter and the Apache access
log sampler.

https://jmeter.apache.org/usermanual/jmeter_accesslog_sampler_step_by_step.pdf

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
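
For reference, a smaller fixed-size heap combined with CMS settings along the
lines of the wiki page cited above might look like this; the sizes are
illustrative starting points, not a recommendation for this particular index:

-Xms8g -Xmx8g                          # fixed-size heap avoids resize pauses
-XX:+UseConcMarkSweepGC                # concurrent old-generation collector
-XX:+UseParNewGC                       # parallel young-generation collector
-XX:CMSInitiatingOccupancyFraction=70  # start CMS before the old gen fills up
-XX:+UseCMSInitiatingOccupancyOnly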


On Sep 12, 2014, at 7:10 AM, Shawn Heisey  wrote:

> On 9/12/2014 7:36 AM, YouPeng Yang wrote:
>> We built our SolrCloud with Solr 4.6.0 and JDK 1.7.0_60; our cluster
>> contains 360GB * 3 of data (one core with 2 replicas).
>>  Our cluster has become unstable: occasionally a long full GC occurs. This
>> is awful; the full GC takes so long that SolrCloud considers the node down.
>>  Normally a full GC happens when the old generation reaches 70%, and that
>> is OK. In the awful case, however, the percentage climbs well above 70%,
>> reaching 99%, so the long full GC happens and the node is considered down.
>>  We set the JVM parameters following
>> https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning; the only difference
>> is that we changed -Xms48009m -Xmx48009m to -Xms49152M -Xmx81920M.
>>  The appendix [1] is the output of jstat when the awful full GC happens. I
>> have marked the important part in red font, hoping it is helpful.
>>  By the way, I have noticed that the Eden part of the young generation
>> always sits at 100% while the awful condition happens, which I think is an
>> important indication.
>>  The SolrCloud will be used to support our applications as a very
>> important component.
>>  Would you please give me any suggestions? Do I need to change the JDK
>> version?
> 
> My GC parameter page is getting around. :)
> 
> Do you really need an 80GB heap?  I realize that your index is 360GB ...
> but if you really do need a heap that large, you may need to adjust your
> configuration so you use a lot less heap memory.
> 
> The red font you mentioned did not make it through, so I cannot tell
> what lines you highlighted.
> 
> I pulled your jstat output into a spreadsheet and calculated the length
> of each GC.  The longest GC in there took 1.903 seconds.  It's the one
> that had a GCT of 4450.332.  For an 80GB heap, you couldn't hope for
> anything better.  Based on what I see here, I don't think GC is your
> problem.  If I read the other numbers on that 1.903 second GC line
> correctly (not sure that I am), it dropped your Eden size from 100% to
> 0% ... suggesting that you really don't need an 80GB heap.
> 
> How much RAM does this machine have?  For ideal performance, you'll need
> your index size plus your heap size, which for you right now is 440 GB. 
> Normally you don't need the ideal memory size ... but you do need a
> *significant* portion of it.  I don't think I'd try running this index
> with less than 256GB of RAM, and that's assuming a much lower heap size
> than 80GB.
> 
> Here's some general info about performance problems and possible solutions:
> 
> http://wiki.apache.org/solr/SolrPerformanceProblems
> 
> Thanks,
> Shawn
> 



UpdateRequest commit policy

2014-09-12 Thread Joshi, Shital
Hi,

We're updating Solr cloud from a java process using UpdateRequest API. 
 
UpdateRequest req = new UpdateRequest();
req.setResponseParser(new XMLResponseParser());
req.setParam("_shard_", shard);
req.add(docs);

We see too many searcher-open errors in the log and are wondering if frequent
updates from the Java process are causing them. What commit policy gets used
when the UpdateRequest Solr API is used? Is it openSearcher=true? How do we
disable it?

This is from log file: 

[commitScheduler-16-thread-1] INFO  org.apache.solr.update.UpdateHandler  ? 
start 
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,soft
Commit=true,prepareCommit=false}


Re: UpdateRequest commit policy

2014-09-12 Thread Shawn Heisey
On 9/12/2014 9:10 AM, Joshi, Shital wrote:
> We're updating Solr cloud from a java process using UpdateRequest API. 
>  
> UpdateRequest req = new UpdateRequest();
> req.setResponseParser(new XMLResponseParser());
> req.setParam("_shard_", shard);
> req.add(docs);
>
> We see too many searcher-open errors in the log and are wondering if frequent
> updates from the Java process are causing them. What commit policy gets used
> when the UpdateRequest Solr API is used? Is it openSearcher=true? How do we
> disable it?
>
> This is from log file: 
>
> [commitScheduler-16-thread-1] INFO  org.apache.solr.update.UpdateHandler  ? 
> start 
> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,soft
> Commit=true,prepareCommit=false}

There is no commit policy specifically for UpdateRequests.  To Solr,
they are simply HTTP calls to the /update handler.

If you send a commit to Solr with SolrJ, the default will be
openSearcher=true, but you can override it.  If you are relying on
autoCommit, the value of openSearcher will be what you define in
autoCommit.  An autoSoftCommit always opens a new searcher -- there
would be no point to a soft commit without it.

Thanks,
Shawn



Re: q and logical operators.

2014-09-12 Thread Erick Erickson
John:

Glad it worked. But be a little careful with large slops. As the slop
increases, you approach the same result set as

vis AND dis AND dur

so choosing the appropriate slop is something of a balancing act.

Best,
Erick

On Fri, Sep 12, 2014 at 2:10 AM, John Nielsen  wrote:
> I didn't know about sloppy queries. This is great stuff!
>
> I solved it with a &qs=100.
>
> Thank you for the help.
>
>
>
> On Thu, Sep 11, 2014 at 11:36 PM, Erick Erickson 
> wrote:
>
>> just skimmed, but:
>>
>> bq:  I would get a hit for "vis dis dur", but "vis dur dis" no longer
>> returns anything. This is not an option for me
>>
>> Would slop help here? i.e. "vis dur dis"~3 or some such?
>>
>> Best
>> Erick
>>
>> On Thu, Sep 11, 2014 at 4:34 AM, John Nielsen  wrote:
>> > q and logical operators.
>> >
>> > Hi all,
>> >
>> > I have a strange problem which seems to stump my google-fu skills.
>> >
>> > We have a webshop with a Solr-based search mechanism which allows
>> > customers to search for products based on a range of different fields,
>> > including item numbers. I recently added a feature which allows users who
>> > are logged in to search for custom item numbers which are associated with
>> > that user. What this means in practical terms is that when a user logs in,
>> > the Solr search query has to look in one extra field compared to when the
>> > user is not logged in.
>> >
>> > The standard non-logged-in search query looks like this (I only included
>> > the relevant first part of the query):
>> > http://
>> >
>> /solr/11731_Danish/search?defType=edismax&q=Visitkort+display+Durable+4+rum+til+240+kort
>> >
>> > When doing the same search while logged in, the query looks like this:
>> > http://
>> >
>> /solr/11731_Danish/search?defType=edismax&q=Visitkort+display+Durable+4+rum+til+240+kort+OR+customer_5266762_product_number_string:Visitkort+display+Durable+4+rum+til+240+kort
>> >
>> > Here I add an extra field, customer_5266762_product_number_string (5266762
>> > being the logged-in user's internal ID), basically including the same
>> > search term two times.
>> >
>> > The above examples work beautifully when searching for a specific item
>> > number stored in the customer_5266762_product_number_string. The problem is
>> > that when a user is logged in and wants to do regular searches, the system
>> > begins to break down. In the specific example above, I expect to get a
>> > single hit for a product with the title "Visitkort display Durable 4 rum
>> > til 240 kort". It works as expected with the first non-logged-in example.
>> > The second logged-in example returns over 7000 hits. I would expect it to
>> > return just one hit since there is nothing relevant in the
>> > customer_5266762_product_number_string for this query.
>> >
>> > Now, the following is where my brain begins to melt down.
>> >
>> > I discovered that if you put the search text in quotation marks, it will
>> > work as expected, but doing so breaks another loved feature we have:
>> >
>> > If I want a hit on the product named "Visitkort display Durable 4 rum til
>> > 240 kort", I could do a search for "vis dis dur", and it would show up. I
>> > could also get a hit if I write "vis dur dis", changing the order of the
>> > words. If I put the search query in quotation marks, I break that
>> > capability. I would get a hit for "vis dis dur", but "vis dur dis" no
>> > longer returns anything. This is not an option for me.
>> >
>> > It is entirely possible that there is a better way of implementing this
>> > and, fortunately, a rewrite is possible at this time. If my basic approach
>> > is correct and I just don't understand how to construct my query correctly,
>> > an RTFM pointer will be most welcome!
>> >
>> > --
>> > Med venlig hilsen / Best regards
>> >
>> > *John Nielsen*
>> > Programmer
>> >
>> >
>> >
>> > *MCB A/S*
>> > Enghaven 15
>> > DK-7500 Holstebro
>> >
>> > Kundeservice: +45 9610 2824
>> > p...@mcb.dk
>> > www.mcb.dk
>>
>
>
>
> --
> Med venlig hilsen / Best regards
>
> *John Nielsen*
> Programmer
>
>
>
> *MCB A/S*
> Enghaven 15
> DK-7500 Holstebro
>
> Kundeservice: +45 9610 2824
> p...@mcb.dk
> www.mcb.dk


Re: How to make solr fault tolerant for query?

2014-09-12 Thread Erick Erickson
Hmmm, if all the nodes for a shard are down, shards.tolerant=true
shouldn't be slow unless there's some kind of bug. Solr should be
smart enough not to wait for a timeout. So I'm a bit surprised by that
statement, how sure of it are you? Do you have a test case?

bq: but this is slow and doesn't give all results.

Well, you can _never_ have all results if all the replicas for a
shard are down, so the second part of that statement is just the way
the system _has_ to work.

SolrCloud is a wonderful system. The HA/DR handling is predicated upon
at least one member of each shard being available. When you violate
that expectation you have to pay the price of at least incomplete
responses.

Best,
Erick

On Fri, Sep 12, 2014 at 2:33 AM, Amey Jadiye
 wrote:
> Just a dumb question, but how can I make SolrCloud fault tolerant for queries?
> I'm asking because I have 12 different physical servers and I am running 12
> Solr shards on them. Whenever any one of them goes down for any reason, I get
> the error below. I have 3 ZooKeeper nodes for the 12 servers; all shards are
> leaders with no replicas in this SolrCloud.
> I have the option of using shards.tolerant=true, but this is slow and doesn't
> give all results.
> Best, Amey
> {
>   "responseHeader": {
> "status": 503,
> "QTime": 7,
> "params": {
>   "sort": "last_modified asc",
>   "indent": "true",
>   "q": "+links:[* TO *]",
>   "_": "1410512274068",
>   "wt": "json"
> }
>   },
>   "error": {
> "msg": "no servers hosting shard: ",
> "code": 503
>   }
> }


Re: SolrJ : fieldcontent from (multiple) file(s)

2014-09-12 Thread Erick Erickson
bq: I could of course push the filename(s) into a field, but this
would require Solr (due to a field type, e.g. "filecontent") to extract
the content from the given file.

Why? If you're already dealing with SolrJ, you do all the work you
need to there by adding fields to a SolrInputDocument, including any
metadata and content your client extracts. Here's an example that uses
Tika (shipped with Solr) to do just that, as well as extract DB
contents etc.

http://searchhub.org/2012/02/14/indexing-with-solrj/

Best,
Erick
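
A minimal sketch of the client-side extraction described above, assuming Tika
is on the client classpath and a schema with "id" and "filecontent" fields (the
URL and field names are illustrative):

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class IndexFileContent {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");
    File file = new File(args[0]);

    // Extract plain text from the file on the client side with Tika.
    AutoDetectParser parser = new AutoDetectParser();
    BodyContentHandler handler = new BodyContentHandler(-1); // -1 = no write limit
    Metadata metadata = new Metadata();
    try (InputStream in = new FileInputStream(file)) {
      parser.parse(in, handler, metadata);
    }

    // Push only the extracted text; the file itself never reaches Solr.
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", file.getName());
    doc.addField("filecontent", handler.toString());
    server.add(doc); // visibility is left to the server's autoCommit settings
  }
}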

On Fri, Sep 12, 2014 at 5:55 AM, Clemens Wyss DEV  wrote:
> Thanks Alex,
>> Do you just care about document content?
> Content only.
>
> The documents (not necessarily coming from a DB) are being pushed (through
> SolrJ). This is at least the initial idea, mainly due to the dynamic nature
> of our index/search architecture.
> I could of course push the filename(s) into a field, but this would require
> Solr (due to a field type, e.g. "filecontent") to extract the content from the
> given file. Is something like this possible in Solr indexing?
>
>> DataImportHandler
> Would I need to write a custom DIH? Or does the DIH work as is, i.e. just
> configured through data-config.xml?
>
>> nested entities design
> Could you link me to this concept/idea?
>
> -----Original Message-----
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Friday, 12 September 2014 14:12
> To: solr-user
> Subject: Re: SolrJ : fieldcontent from (multiple) file(s)
>
> Do you just care about document content? Not metadata, such as file name, 
> date, author, etc?
>
> Does it have to be pushed into Solr, or can it be pulled? If pull, DataImportHandler
> should be able to do what you want with a nested entities design.
>
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and 
> newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers 
> community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 12 September 2014 06:53, Clemens Wyss DEV  wrote:
>> Looks like I haven't finished " I know"
>> I know I could extract the content on our server's side, but I'd really like
>> to take that burden off it.
>> That said:
>> Can I hand in the path-to-the-file in a "specific field" which would yield 
>> an extraction in Solr?
>>
>> -----Original Message-----
>> From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
>> Sent: Friday, 12 September 2014 11:30
>> To: 'solr-user@lucene.apache.org'
>> Subject: SolrJ : fieldcontent from (multiple) file(s)
>>
>> First of all  I'd like to say hello to the Solr world/community ;) So far we 
>> have been using Lucene as-is and now intend to go for Solr.
>>
>> Say I have a document which in one field should have the content of  a
>> file (indexed only, not stored), in order to make the document
>> searchable due to the file's content. I know
>>
>> How is this achieved using SolrJ, i.e. how do I hand in this document?
>>
>> Thx
>> Clemens
>>


Re: UpdateRequest commit policy

2014-09-12 Thread Erick Erickson
Usually, I recommend:
1> configure autocommit on the server. Make it reasonable, as in
multiple seconds.
2> do NOT commit from the client.

If you must commit from the client, then consider the
server.add(docs, commitWithin) call.

Under no circumstances should you commit from the client IMO except,
perhaps, at the very end of the run, but even that's unnecessary if
you've configured your autocommit intervals.

Here's a long blog on all the autocommit options:

http://searchhub.org/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick
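
A sketch of the server-side solrconfig.xml settings this implies; the intervals
are placeholders to tune, not recommendations:

<!-- hard commit: flushes and truncates the transaction log, no new searcher -->
<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- soft commit: this is what makes new documents visible to searches -->
<autoSoftCommit>
  <maxTime>60000</maxTime>
</autoSoftCommit>

With that in place the client just calls server.add(docs), or
server.add(docs, 60000) to request visibility within 60 seconds, and never
calls commit() itself.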

On Fri, Sep 12, 2014 at 8:17 AM, Shawn Heisey  wrote:
> On 9/12/2014 9:10 AM, Joshi, Shital wrote:
>> We're updating Solr cloud from a java process using UpdateRequest API.
>>
>> UpdateRequest req = new UpdateRequest();
>> req.setResponseParser(new XMLResponseParser());
>> req.setParam("_shard_", shard);
>> req.add(docs);
>>
>> We see too many searcher-open errors in the log and are wondering if frequent
>> updates from the Java process are causing them. What commit policy gets used
>> when the UpdateRequest Solr API is used? Is it openSearcher=true? How do we
>> disable it?
>>
>> This is from log file:
>>
>> [commitScheduler-16-thread-1] INFO  org.apache.solr.update.UpdateHandler  ? 
>> start 
>> commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,soft
>> Commit=true,prepareCommit=false}
>
> There is no commit policy specifically for UpdateRequests.  To Solr,
> they are simply HTTP calls to the /update handler.
>
> If you send a commit to Solr with SolrJ, the default will be
> openSearcher=true, but you can override it.  If you are relying on
> autoCommit, the value of openSearcher will be what you define in
> autoCommit.  An autoSoftCommit always opens a new searcher -- there
> would be no point to a soft commit without it.
>
> Thanks,
> Shawn
>


Re: Fatal full GC

2014-09-12 Thread YouPeng Yang
Hi
  Thank you very much. We made the change to lower the heap size, and we are
watching the effect of this change; we will inform you of the result.
  It is really helpful.

Best Regard

2014-09-12 23:00 GMT+08:00 Walter Underwood :

> I agree about the 80GB heap as a possible problem.
>
> A GC is essentially a linear scan of memory. More memory means a longer
> scan.
>
> We run with an 8GB heap. I'd try that. Test it by replaying logs from
> production against a test instance. You can use JMeter and the Apache
> access log sampler.
>
>
> https://jmeter.apache.org/usermanual/jmeter_accesslog_sampler_step_by_step.pdf
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/
>
>
> On Sep 12, 2014, at 7:10 AM, Shawn Heisey  wrote:
>
> > On 9/12/2014 7:36 AM, YouPeng Yang wrote:
> >> We built our SolrCloud with Solr 4.6.0 and JDK 1.7.0_60; our cluster
> >> contains 360GB * 3 of data (one core with 2 replicas).
> >>  Our cluster has become unstable: occasionally a long full GC occurs. This
> >> is awful; the full GC takes so long that SolrCloud considers the node down.
> >>  Normally a full GC happens when the old generation reaches 70%, and that
> >> is OK. In the awful case, however, the percentage climbs well above 70%,
> >> reaching 99%, so the long full GC happens and the node is considered down.
> >>  We set the JVM parameters following
> >> https://wiki.apache.org/solr/ShawnHeisey#GC_Tuning; the only difference
> >> is that we changed -Xms48009m -Xmx48009m to -Xms49152M -Xmx81920M.
> >>  The appendix [1] is the output of jstat when the awful full GC happens. I
> >> have marked the important part in red font, hoping it is helpful.
> >>  By the way, I have noticed that the Eden part of the young generation
> >> always sits at 100% while the awful condition happens, which I think is an
> >> important indication.
> >>  The SolrCloud will be used to support our applications as a very
> >> important component.
> >>  Would you please give me any suggestions? Do I need to change the JDK
> >> version?
> >
> > My GC parameter page is getting around. :)
> >
> > Do you really need an 80GB heap?  I realize that your index is 360GB ...
> > but if you really do need a heap that large, you may need to adjust your
> > configuration so you use a lot less heap memory.
> >
> > The red font you mentioned did not make it through, so I cannot tell
> > what lines you highlighted.
> >
> > I pulled your jstat output into a spreadsheet and calculated the length
> > of each GC.  The longest GC in there took 1.903 seconds.  It's the one
> > that had a GCT of 4450.332.  For an 80GB heap, you couldn't hope for
> > anything better.  Based on what I see here, I don't think GC is your
> > problem.  If I read the other numbers on that 1.903 second GC line
> > correctly (not sure that I am), it dropped your Eden size from 100% to
> > 0% ... suggesting that you really don't need an 80GB heap.
> >
> > How much RAM does this machine have?  For ideal performance, you'll need
> > your index size plus your heap size, which for you right now is 440 GB.
> > Normally you don't need the ideal memory size ... but you do need a
> > *significant* portion of it.  I don't think I'd try running this index
> > with less than 256GB of RAM, and that's assuming a much lower heap size
> > than 80GB.
> >
> > Here's some general info about performance problems and possible
> solutions:
> >
> > http://wiki.apache.org/solr/SolrPerformanceProblems
> >
> > Thanks,
> > Shawn
> >
>
>


Re: Solr and ActiveMQ

2014-09-12 Thread vivison
Solr allows customization to return query results through a message broker
such as ActiveMQ. How: define a listener event, as in my example. A sample use
case: a searchable/sortable log display where the front end gets Solr entries
via messages through a predefined topic of a message broker (such as
ActiveMQ).

Hope the above response is clear enough to answer your question.
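
The actual configuration isn't shown in this thread; a hypothetical sketch of
the kind of solrconfig.xml listener described, with the JNDI property from the
earlier message (the listener class name is made up):

<listener event="postCommit" class="com.example.ActiveMqPublishingListener">
  <str name="java.naming.factory.initial">org.apache.activemq.jndi.ActiveMQInitialContextFactory</str>
  <str name="java.naming.provider.url">tcp://localhost:61616</str>
  <str name="topic">solr.events</str>
</listener>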


Flavio Pompermaier wrote
> Sorry for the dumb question, but how do you integrate ActiveMQ and Solr? What
> is the purpose/use case?
>
> Thanks in advance,
> Flavio







Re: How to make solr fault tolerant for query?

2014-09-12 Thread Amey - codeinventory
Hi, thanks for the feedback, Erick.

What do you mean by all the nodes for a shard being down? I have 12 shards and
I suppose I have 12 nodes, right? Correct me if I am wrong.

Now whenever any one of them is down, say 1 down and 11 active, I still get the
same error... is there any way to get results despite it, other than
shards.tolerant=true?

I have replicationFactor=1; the default seems to be 1, so I kept it as it is.
If I change it in my clusterstate.json to 2 or 3, is it possible I will get all
results even though a node is down?

Regards,
Amey

--- Original Message ---

From: "Erick Erickson" 
Sent: September 12, 2014 9:23 PM
To: solr-user@lucene.apache.org
Subject: Re: How to make solr fault tolerant for query?

Hmmm, if all the nodes for a shard are down, shards.tolerant=true
shouldn't be slow unless there's some kind of bug. Solr should be
smart enough not to wait for a timeout. So I'm a bit surprised by that
statement, how sure of it are you? Do you have a test case?

bq: but this is slow and doesn't give all results.

Well, you can _never_ have all results if all the replicas for a
shard are down, so the second part of that statement is just the way
the system _has_ to work.

SolrCloud is a wonderful system. The HA/DR handling is predicated upon
at least one member of each shard being available. When you violate
that expectation you have to pay the price of at least incomplete
responses.

Best,
Erick

On Fri, Sep 12, 2014 at 2:33 AM, Amey Jadiye
 wrote:
> Just a dumb question, but how can I make SolrCloud fault tolerant for queries?
> I'm asking because I have 12 different physical servers and I am running 12
> Solr shards on them. Whenever any one of them goes down for any reason, I get
> the error below. I have 3 ZooKeeper nodes for the 12 servers; all shards are
> leaders with no replicas in this SolrCloud.
> I have the option of using shards.tolerant=true, but this is slow and doesn't
> give all results.
> Best, Amey
> {
>   "responseHeader": {
> "status": 503,
> "QTime": 7,
> "params": {
>   "sort": "last_modified asc",
>   "indent": "true",
>   "q": "+links:[* TO *]",
>   "_": "1410512274068",
>   "wt": "json"
> }
>   },
>   "error": {
> "msg": "no servers hosting shard: ",
> "code": 503
>   }
> }


Re: How to make solr fault tolerant for query?

2014-09-12 Thread Erick Erickson
Right. Consider your situation where you have 12 shards on 12
machines. 1/12 of your index is stored on each, so by definition if
one of them is down you cannot get that 1/12 of your data.
shards.tolerant is the only option here. Although I'm surprised
shards.tolerant makes things slow; perhaps there's a timeout happening.
I thought that we just didn't send requests to shards that were down,
so it shouldn't take any extra time.

The fact that replicationFactor is 1 is just because it's simple. By
having a higher replication factor, and assuming that no two replicas
for the same shard are on the same node, then taking one of the nodes
down doesn't affect search and you get all your docs back.

The SolrCloud tutorial works through this in some detail, perhaps it
would be good if you reviewed it.

Best,
Erick
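
For reference, the redundancy has to come from actual replicas; editing
replicationFactor in clusterstate.json by hand does not create them. A
collection with a second copy of each shard could be created along these lines
(host and names are placeholders):

http://host:8983/solr/admin/collections?action=CREATE&name=collection1&numShards=12&replicationFactor=2&maxShardsPerNode=2

Newer releases also offer an ADDREPLICA collection action for adding replicas
to an existing collection one shard at a time.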

On Fri, Sep 12, 2014 at 10:50 AM, Amey - codeinventory
 wrote:
> Hi, thanks for the feedback, Erick.
>
> What do you mean by all the nodes for a shard being down? I have 12 shards
> and I suppose I have 12 nodes, right? Correct me if I am wrong.
>
> Now whenever any one of them is down, say 1 down and 11 active, I still get
> the same error... is there any way to get results despite it, other than
> shards.tolerant=true?
>
> I have replicationFactor=1; the default seems to be 1, so I kept it as it is.
> If I change it in my clusterstate.json to 2 or 3, is it possible I will get
> all results even though a node is down?
>
> Regards,
> Amey
>
> --- Original Message ---
>
> From: "Erick Erickson" 
> Sent: September 12, 2014 9:23 PM
> To: solr-user@lucene.apache.org
> Subject: Re: How to make solr fault tolerant for query?
>
> Hmmm, if all the nodes for a shard are down, shards.tolerant=true
> shouldn't be slow unless there's some kind of bug. Solr should be
> smart enough not to wait for a timeout. So I'm a bit surprised by that
> statement, how sure of it are you? Do you have a test case?
>
bq: but this is slow and doesn't give all results.

Well, you can _never_ have all results if all the replicas for a
shard are down, so the second part of that statement is just the way
the system _has_ to work.
>
> SolrCloud is a wonderful system. The HA/DR handling is predicated upon
> at least one member of each shard being available. When you violate
> that expectation you have to pay the price of at least incomplete
> responses.
>
> Best,
> Erick
>
> On Fri, Sep 12, 2014 at 2:33 AM, Amey Jadiye
>  wrote:
>> Just a dumb question, but how can I make SolrCloud fault tolerant for
>> queries? I'm asking because I have 12 different physical servers and I am
>> running 12 Solr shards on them. Whenever any one of them goes down for any
>> reason, I get the error below. I have 3 ZooKeeper nodes for the 12 servers;
>> all shards are leaders with no replicas in this SolrCloud.
>> I have the option of using shards.tolerant=true, but this is slow and
>> doesn't give all results.
>> Best, Amey
>> {
>>   "responseHeader": {
>> "status": 503,
>> "QTime": 7,
>> "params": {
>>   "sort": "last_modified asc",
>>   "indent": "true",
>>   "q": "+links:[* TO *]",
>>   "_": "1410512274068",
>>   "wt": "json"
>> }
>>   },
>>   "error": {
>> "msg": "no servers hosting shard: ",
>> "code": 503
>>   }
>> }


Running an updateProcessor after copyField has occurred?

2014-09-12 Thread Douglas Stonham
I'm using the StatelessScriptUpdateProcessorFactory to run a script against 
data as it is imported.

Some relevant pieces from solrconfig.xml:

<requestHandler name="/dataimport"
    class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">data-config.xml</str>
    <str name="update.chain">textCopyByLang</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="textCopyByLang">
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">textCopyByLang.js</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

In schema.xml I have a number of fields all being combined into one "text"
field using copyField.  Unfortunately this apparently doesn't happen until
after the processor chain has completed, so the script can't make use of the
combined text.

i.e.
doc = cmd.solrDoc;
text = doc.getFieldValue("text");  // text is null


Is there a way around this without duplicating all the copyFields inside the 
processing script?

Thanks,

Douglas


Advice on highlighting

2014-09-12 Thread Craig Longman
In order to take our Solr usage to the next step, we really need to
improve its highlighting abilities.  What I'm trying to do is to be able
to write a new component that can return the fields that matched the
search (including numeric fields) and the start/end positions for the
alphanumeric matches.



I see three different approaches to take; any of them will require making
some modifications to the Lucene/Solr parts, as it just does not appear
to be doable as a completely standalone component.



1) At initial search time.

This seemed like a good approach.  I can follow IndexSearcher creating
the TermContext that parses through AtomicReaderContexts to see if it
contains a match and then adds it to the contexts available for later.
However, at this point, inside SegmentTermsEnum.seekExact(), it seems
like Solr is not really looking for matching terms as such; it's just
scanning what looks like the raw index.  So, I don't think I can easily
extract term positions at this point.



2) Write a modified HighlighterComponent.  We have managed to get phrases
to highlight properly, but it seems like getting the full field matches
would be more difficult in this module. However, because it does its
highlighting oblivious to any other criteria, we can't use it as is.
For example, this search:



  (body:large+AND+user_id:7)+OR+user_id:346



Will highlight "large" in records that have user_id = 346 when
technically (for our purposes at least) it should not be considered a
hit because the "large" was accompanied by the user_id = 7 criteria.
It's not immediately clear to me how difficult it would be to change
this.



3) Make a modified DebugComponent and enhance the existing explain()
methods (in the query types we require it at least) to include more
information such as the start/end positions of the term that was hit.
I'm exploring this now, but I don't easily see how I can figure out what
those positions might be from the explain() information.  Any pointers
on how, at the point that TermQuery.explain() is being called, I can
figure out which indexed token the actual hit was on?





Craig Longman

C++ Developer

iCONECT Development, LLC
519-645-1663





This message and any attachments are intended only for the use of the addressee 
and may contain information that is privileged and confidential. If the reader 
of the message is not the intended recipient or an authorized representative of 
the intended recipient, you are hereby notified that any dissemination of this 
communication is strictly prohibited. If you have received this communication 
in error, notify the sender immediately by return email and delete the message 
and any attachments from your system.



Re: Running an updateProcessor after copyField has occurred?

2014-09-12 Thread Alexandre Rafalovitch
If you do copyField equivalent in the request processor (there is a
URP for that) before the script one, you would then not need to do the
copyField in the schema. So, a move, not a duplicate.

Or are things more complicated than that?

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 12 September 2014 15:49, Douglas Stonham
 wrote:
> I'm using the StatelessScriptUpdateProcessorFactory to run a script against 
> data as it is imported.
>
> Some relevant pieces from solrconfig.xml:
>
> <requestHandler name="/dataimport"
>     class="org.apache.solr.handler.dataimport.DataImportHandler">
>   <lst name="defaults">
>     <str name="config">data-config.xml</str>
>     <str name="update.chain">textCopyByLang</str>
>   </lst>
> </requestHandler>
>
> <updateRequestProcessorChain name="textCopyByLang">
>   <processor class="solr.StatelessScriptUpdateProcessorFactory">
>     <str name="script">textCopyByLang.js</str>
>   </processor>
>   <processor class="solr.RunUpdateProcessorFactory" />
> </updateRequestProcessorChain>
>
> In schema.xml I have a number of fields all being combined into one "text" 
> field using copyField.  Unfortunately this apparently doesn't happen until 
> after the processor chain has completed so the script can't make use of the 
> combined text.
>
> i.e.
> doc = cmd.solrDoc;
> text = doc.getFieldValue("text");  // text is null
>
>
> Is there a way around this without duplicating all the copyFields inside the 
> processing script?
>
> Thanks,
>
> Douglas


RE: Running an updateProcessor after copyField has occurred?

2014-09-12 Thread Douglas Stonham
Hi Alex,

That seems fair.  The only downside I can think of is that I have to include 
the copyField URP in every request handler that imports data.  Not convenient 
but not a big problem either.

Do you happen to know the name of the URP that performs the copyField 
functionality?  I looked through the list at 
(http://www.solr-start.com/info/update-request-processors/) but don't see one 
that obviously fills the role (sorry if I missed something obvious).

I could easily write one using the script URP but that sounds like it would be 
much less performant than something built in.

Thanks,
Douglas


> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Friday, September 12, 2014 1:36 PM
> To: solr-user
> Subject: Re: Running an updateProcessor after copyField has occurred?
> 
> If you do copyField equivalent in the request processor (there is a
> URP for that) before the script one, you would then not need to do the
> copyField in the schema. So, a move, not a duplicate.
> 
> Or are things more complicated than that?
> 
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov
> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
> Solr popularizers community:
> https://www.linkedin.com/groups?gid=6713853
> 
> 
> On 12 September 2014 15:49, Douglas Stonham
>  wrote:
> > I'm using the StatelessScriptUpdateProcessorFactory to run a script against
> data as it is imported.
> >
> > Some relevant pieces from solrconfig.xml:
> >
> > <requestHandler name="/dataimport"
> >     class="org.apache.solr.handler.dataimport.DataImportHandler">
> >   <lst name="defaults">
> >     <str name="config">data-config.xml</str>
> >     <str name="update.chain">textCopyByLang</str>
> >   </lst>
> > </requestHandler>
> >
> > <updateRequestProcessorChain name="textCopyByLang">
> >   <processor class="solr.StatelessScriptUpdateProcessorFactory">
> >     <str name="script">textCopyByLang.js</str>
> >   </processor>
> >   <processor class="solr.RunUpdateProcessorFactory" />
> > </updateRequestProcessorChain>
> >
> > In schema.xml I have a number of fields all being combined into one "text"
> field using copyField.  Unfortunately this apparently doesn't happen until
> after the processor chain has completed so the script can't make use of the
> combined text.
> >
> > i.e.
> > doc = cmd.solrDoc;
> > text = doc.getFieldValue("text");  // text is null
> >
> >
> > Is there a way around this without duplicating all the copyFields inside the
> processing script?
> >
> > Thanks,
> >
> > Douglas


Re: Advice on highlighting

2014-09-12 Thread P Williams
Hi Craig,

Have you seen SOLR-4722 (https://issues.apache.org/jira/browse/SOLR-4722)?
 This was my attempt at something similar.

Regards,
Tricia

On Fri, Sep 12, 2014 at 2:23 PM, Craig Longman  wrote:

> In order to take our Solr usage to the next step, we really need to
> improve its highlighting abilities.  What I'm trying to do is to be able
> to write a new component that can return the fields that matched the
> search (including numeric fields) and the start/end positions for the
> alphanumeric matches.
>
>
>
> I see three different approaches to take; any of them will require making
> some modifications to the Lucene/Solr parts, as it just does not appear
> to be doable as a completely standalone component.
>
>
>
> 1) At initial search time.
>
> This seemed like a good approach.  I can follow IndexSearcher creating
> the TermContext that parses through AtomicReaderContexts to see if it
> contains a match and then adds it to the contexts available for later.
> However, at this point, inside SegmentTermsEnum.seekExact(), it seems
> like Solr is not really looking for matching terms as such; it's just
> scanning what looks like the raw index.  So, I don't think I can easily
> extract term positions at this point.
>
>
>
> 2) Write a modified HighlighterComponent.  We have managed to get phrases
> to highlight properly, but it seems like getting the full field matches
> would be more difficult in this module. However, because it does its
> highlighting oblivious to any other criteria, we can't use it as is.
> For example, this search:
>
>
>
>   (body:large+AND+user_id:7)+OR+user_id:346
>
>
>
> Will highlight "large" in records that have user_id = 346 when
> technically (for our purposes at least) it should not be considered a
> hit because the "large" was accompanied by the user_id = 7 criteria.
> It's not immediately clear to me how difficult it would be to change
> this.
>
>
>
> 3) Make a modified DebugComponent and enhance the existing explain()
> methods (in the query types we require it at least) to include more
> information such as the start/end positions of the term that was hit.
> I'm exploring this now, but I don't easily see how I can figure out what
> those positions might be from the explain() information.  Any pointers
> on how, at the point that TermQuery.explain() is being called, I can
> figure out which indexed token the actual hit was on?
>
>
>
>
>
> Craig Longman
>
> C++ Developer
>
> iCONECT Development, LLC
> 519-645-1663
>
>
>
>
>
> This message and any attachments are intended only for the use of the
> addressee and may contain information that is privileged and confidential.
> If the reader of the message is not the intended recipient or an authorized
> representative of the intended recipient, you are hereby notified that any
> dissemination of this communication is strictly prohibited. If you have
> received this communication in error, notify the sender immediately by
> return email and delete the message and any attachments from your system.
>
>


Re: Running an updateProcessor after copyField has occurred?

2014-09-12 Thread Alexandre Rafalovitch
http://www.solr-start.com/javadoc/solr-lucene/org/apache/solr/update/processor/CloneFieldUpdateProcessorFactory.html ?

Clone, not Copy.
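
A sketch of what the chain could look like with the clone done first
(untested, and the source-field list is invented; use whatever fields your
copyFields combine today, and make sure "text" accepts multiple values):

<updateRequestProcessorChain name="textCopyByLang">
  <processor class="solr.CloneFieldUpdateProcessorFactory">
    <arr name="source">
      <str>title</str>
      <str>body</str>
    </arr>
    <str name="dest">text</str>
  </processor>
  <processor class="solr.StatelessScriptUpdateProcessorFactory">
    <str name="script">textCopyByLang.js</str>
  </processor>
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

The script then runs after the clone, so getFieldValue("text") has
something to read.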

Regards,
   Alex.
P.s. I welcome private email feedback on that resource page as well :-)

Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 12 September 2014 17:02, Douglas Stonham
 wrote:
> Hi Alex,
>
> That seems fair.  The only downside I can think of is that I have to include 
> the copyField URP in every request handler that imports data.  Not convenient 
> but not a big problem either.
>
> Do you happen to know the name of the URP that performs the copyField 
> functionality?  I looked through the list at 
> (http://www.solr-start.com/info/update-request-processors/) but don't see one 
> that obviously fills the role (sorry if I missed something obvious).
>
> I could easily write one using the script URP but that sounds like it would 
> be much less performant than something built in.
>
> Thanks,
> Douglas
>
>
>> -Original Message-
>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>> Sent: Friday, September 12, 2014 1:36 PM
>> To: solr-user
>> Subject: Re: Running an updateProcessor after copyField has occurred?
>>
>> If you do copyField equivalent in the request processor (there is a
>> URP for that) before the script one, you would then not need to do the
>> copyField in the schema. So, a move, not a duplicate.
>>
>> Or are things more complicated than that?
>>
>> Regards,
>>Alex.
>> Personal: http://www.outerthoughts.com/ and @arafalov
>> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
>> Solr popularizers community:
>> https://www.linkedin.com/groups?gid=6713853
>>
>>
>> On 12 September 2014 15:49, Douglas Stonham
>>  wrote:
>> > I'm using the StatelessScriptUpdateProcessorFactory to run a script against
>> data as it is imported.
>> >
>> > Some relevant pieces from solrconfig.xml:
>> >
>> > <requestHandler name="/dataimport"
>> >     class="org.apache.solr.handler.dataimport.DataImportHandler">
>> >   <lst name="defaults">
>> >     <str name="config">data-config.xml</str>
>> >     <str name="update.chain">textCopyByLang</str>
>> >   </lst>
>> > </requestHandler>
>> >
>> > <updateRequestProcessorChain name="textCopyByLang">
>> >   <processor class="solr.StatelessScriptUpdateProcessorFactory">
>> >     <str name="script">textCopyByLang.js</str>
>> >   </processor>
>> >   <processor class="solr.RunUpdateProcessorFactory" />
>> > </updateRequestProcessorChain>
>> >
>> > In schema.xml I have a number of fields all being combined into one "text"
>> field using copyField.  Unfortunately this apparently doesn't happen until
>> after the processor chain has completed so the script can't make use of the
>> combined text.
>> >
>> > i.e.
>> > doc = cmd.solrDoc;
>> > text = doc.getFieldValue("text");  // text is null
>> >
>> >
>> > Is there a way around this without duplicating all the copyFields inside 
>> > the
>> processing script?
>> >
>> > Thanks,
>> >
>> > Douglas


RE: SolrJ : fieldcontent from (multiple) file(s)

2014-09-12 Thread Clemens Wyss DEV
Erick, thanks for your input. You are right that the "miraculous connection" is
not always that miraculous ;)

In your example the extraction is being done on the client side. But as I said,
I'd ideally like to put the burden of Tika extraction into the Solr process.
All fields except the file-content-based fields would be filled on the client
side, and only the file-content-based fields would be extracted (before
indexing) in Solr. So it would "only" be the files that needed to be "shared".
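
One built-in way to get exactly that split is the ExtractingRequestHandler
(Solr Cell): the client posts the raw file, Solr runs Tika server-side, and
every other field travels along as a literal.* parameter. A rough SolrJ
sketch; the core URL, file name, and field names are invented, and it assumes
/update/extract is configured as in the stock example solrconfig.xml:

import java.io.File;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractUpload {
  public static void main(String[] args) throws Exception {
    SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

    // Post the file to the extracting handler; Tika runs inside Solr.
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
    req.addFile(new File("manual.pdf"), "application/pdf");

    // Fields filled on the client side are passed as literals.
    req.setParam("literal.id", "doc-1");
    req.setParam("literal.title", "Some client-side title");

    // Map Tika's extracted body into the indexed-only content field.
    req.setParam("fmap.content", "filecontent");
    req.setParam("commit", "true");

    server.request(req);
  }
}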

--Clemens

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Friday, September 12, 2014 17:57
To: solr-user@lucene.apache.org
Subject: Re: SolrJ : fieldcontent from (multiple) file(s)

bq: I could of course push in the filename(s) in a field, but this would 
require Solr (due to field-type e.g. "filecontent") to extract the content from 
the given file.

Why? If you're already dealing with SolrJ, you can do all the work you need
there by adding fields to a SolrInputDocument, including any metadata and
content your client extracts. Here's an example that uses Tika (shipped with
Solr) to do just that, as well as extract DB contents etc.

http://searchhub.org/2012/02/14/indexing-with-solrj/

Best,
Erick

On Fri, Sep 12, 2014 at 5:55 AM, Clemens Wyss DEV  wrote:
> Thanks Alex,
>> Do you just care about document content?
> content only.
>
> The documents (not necessarily coming from a Db) are being pushed (through 
> Solrj). This is at least the initial idea, mainly due to the dynamic nature 
> of our index/search architecture.
> I could of course push in the filename(s) in a field, but this would require
> Solr (due to a field type, e.g. "filecontent") to extract the content from the
> given file. Is something like this possible in Solr indexing?
>
>> DataImportHandler
> Would I need to write a custom DIH? Or is the DIH as is, i.e. just 
> configurable through the data-config.xml?
>
>> nested entities design
> Could you link me to this concept/idea?
>
> -Original Message-
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Friday, September 12, 2014 14:12
> To: solr-user
> Subject: Re: SolrJ : fieldcontent from (multiple) file(s)
>
> Do you just care about document content? Not metadata, such as file name, 
> date, author, etc?
>
> Does it have to be push into Solr, or can it be pull? If pull, DataImportHandler
> should be able to do what you want with a nested-entities design.
>
> Regards,
>Alex.
> Personal: http://www.outerthoughts.com/ and @arafalov Solr resources 
> and newsletter: http://www.solr-start.com/ and @solrstart Solr 
> popularizers community: https://www.linkedin.com/groups?gid=6713853
>
>
> On 12 September 2014 06:53, Clemens Wyss DEV  wrote:
>> Looks like I haven't finished " I know"
>> I know I could extract the content on our server's side, but I'd really like
>> to take that burden off of it.
>> That said:
>> Can I hand in the path-to-the-file in a "specific field" which would yield 
>> an extraction in Solr?
>>
>> -Original Message-
>> From: Clemens Wyss DEV [mailto:clemens...@mysign.ch]
>> Sent: Friday, September 12, 2014 11:30
>> To: 'solr-user@lucene.apache.org'
>> Subject: SolrJ : fieldcontent from (multiple) file(s)
>>
>> First of all  I'd like to say hello to the Solr world/community ;) So far we 
>> have been using Lucene as-is and now intend to go for Solr.
>>
>> Say I have a document which in one field should have the content of
>> a file (indexed only, not stored), in order to make the document
>> searchable by the file's content. I know
>>
>> How is this achieved using SolrJ, i.e. how do I hand in this document?
>>
>> Thx
>> Clemens
>>