Re: Proximity(tilde) combined with wildcard, AutomatonQuery ?

2012-09-27 Thread Vadim Kisselmann
Hi Ahmet,
thanks for your reply:)
I see that it will not make it into the 4.0 release, because the given
patches do not work with this version.
Right?
Best regards
Vadim


2012/9/26 Ahmet Arslan :
>
>> let's assume I have a simple query like this with a wildcard and
>> tilde:
>>
>> "japa* fukushima"~10
>>
>> instead of "japan fukushima"~10 OR "japanese fukushima"~10,
>> etc.
>>
>> Do we have a solution in Solr 4.0 to work with these kind of
>> queries?
>
> Vadim, two open jira issues:
>
> https://issues.apache.org/jira/browse/SOLR-1604
> https://issues.apache.org/jira/browse/LUCENE-1486
>
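For background, the LUCENE-1486 work is the ComplexPhraseQueryParser, which
accepts wildcards inside proximity phrases. A minimal sketch of calling it
directly from Java, assuming the Lucene queryparser module is on the
classpath (the package moved between releases, so the import may differ):

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryparser.complexPhrase.ComplexPhraseQueryParser;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.Version;

public class ComplexPhraseDemo {
    public static void main(String[] args) throws Exception {
        // Parses a proximity phrase that contains a wildcard; japa* is
        // expanded against the index when the query is rewritten.
        ComplexPhraseQueryParser parser = new ComplexPhraseQueryParser(
                Version.LUCENE_40, "text",
                new StandardAnalyzer(Version.LUCENE_40));
        Query q = parser.parse("\"japa* fukushima\"~10");
        System.out.println(q);
    }
}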


Re: Items disappearing from Solr index

2012-09-27 Thread Kissue Kissue
#What is the field type for that field - string or text?

It is a string type.

Thanks.

On Wed, Sep 26, 2012 at 8:14 PM, Jack Krupansky wrote:

> What is the field type for that field - string or text?
>
>
> -- Jack Krupansky
>
> -Original Message- From: Kissue Kissue
> Sent: Wednesday, September 26, 2012 1:43 PM
>
> To: solr-user@lucene.apache.org
> Subject: Re: Items disappearing from Solr index
>
> # It is looking for documents with "Emory" in the specified field OR "Labs"
> in the default search field.
>
> This does not seem to be the case. For instance issuing a deleteByQuery for
> catalogueId: "PEARL LINGUISTICS LTD" also deletes the contents of a
> catalogueId with the value:
> "Ncl_MacNaughtonMcGregorCoaching_vf010811".
>
> Thanks.
>
> On Wed, Sep 26, 2012 at 2:37 PM, Jack Krupansky wrote:
>
>  It is looking for documents with "Emory" in the specified field OR "Labs"
>> in the default search field.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Kissue Kissue
>> Sent: Wednesday, September 26, 2012 7:47 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Items disappearing from Solr index
>>
>>
>> I have just solved this problem.
>>
>> We have a field called catalogueId. One possible value for this field
>> could
>> be "Emory Labs". I found out that when the following delete by query is
>> sent to solr:
>>
>> getSolrServer().deleteByQuery(catalogueId + ":" + Emory Labs)
>> [Notice that there are no quotes surrounding the catalogueId value -
>> Emory Labs]
>>
>> For some reason this delete by query ends up deleting the contents of some
>> other random catalogues too, which is the reason why we are losing items
>> from the index. When the query is changed to:
>>
>> getSolrServer().deleteByQuery(catalogueId + ":" + "Emory Labs"),
>> then it starts to correctly delete only items in the Emory Labs
>> catalogue.
>>
>> So my first question is, what exactly does deleteByQuery do in the first
>> query without the quotes? How is it determining which catalogues to
>> delete?
>>
>> Secondly, shouldn't the correct behaviour be not to delete anything at all
>> in this case since when a search is done for the same catalogueId without
>> the quotes it just simply returns no results?
>>
>> Thanks.
>>
>>
>> On Mon, Sep 24, 2012 at 3:12 PM, Kissue Kissue 
>> wrote:
>>
>>  Hi Erick,
>>
>>>
>>> Thanks for your reply. Yes, I am using delete by query. I am currently
>>> logging the number of items to be deleted before handing off to solr,
>>> and from the solr logs I can see it deleted exactly that number. I will
>>> verify further.
>>>
>>> Thanks.
>>>
>>>
>>> On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson wrote:
>>>
>>>
>>>  How do you delete items? By ID or by query?
>>>

 My guess is that one of two things is happening:
 1> your delete process is deleting too much data.
 2> your index process isn't indexing what you think.

 I'd add some logging to the SolrJ program to see what
 it thinks it has deleted or added to the index and go from there.

 Best
 Erick

 On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue 
 wrote:
 > Hi,
 >
 > I am running Solr 3.5, using SolrJ and StreamingUpdateSolrServer
 > to index and delete items from solr.
 >
 > I basically index items from the db into solr every night. Existing
 > items can be marked for deletion in the db and a delete request sent
 > to solr to delete such items.
 >
 > My process runs as follows every night:
 >
 > 1. Check if items have been marked for deletion and delete from solr.
 > I commit and optimize after the entire solr deletion runs.
 > 2. Index any new items to solr. I commit and optimize after all the
 > new items have been added.
 >
 > Recently I started noticing that huge chunks of items that have not
 > been marked for deletion are disappearing from the index. I checked
 > the solr logs and the logs indicate that it is deleting exactly the
 > number of items requested, but still a lot of other items disappear
 > from the index from time to time. Any ideas what might be causing
 > this or what I am doing wrong.
 >
 >
 > Thanks.



>>>
>>>
>>
>


Re: How can I create about 100000 independent indexes in Solr?

2012-09-27 Thread Tanguy Moal
Hello Monton,

I wanted to make sure that you understood me well: I really don't know
how well solr scales as the number of fields increases...

What I mean here is that the more distinct fields you index, the more
memory you will need.

So if in your schema you have something like 15 fields declared, then
storing data for 100 distinct customers would generate 1500 fields in the
index.

I really don't know how well that would scale.

The simplest solution is one core per customer, but the same issue (memory
consumption) will arise at some point, I guess.

There must be a clever way to do that...
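(For what it's worth, a minimal SolrJ sketch of indexing with
customer-prefixed dynamic fields, assuming a dynamicField pattern like
*_field_a1 is declared in schema.xml; the URL and field names are
illustrative, and on Solr 3.x the server class is CommonsHttpSolrServer
rather than HttpSolrServer:)

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class PerCustomerIndexing {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "customer1-doc1");
        doc.addField("customer", "customer1");
        // Field names carry the customer prefix, so each customer's terms
        // (and their frequencies) stay isolated from other customers'.
        doc.addField("customer1_field_a1", "value for field_a1");
        doc.addField("customer1_field_a2", "value for field_a2");
        server.add(doc);
        server.commit();
    }
}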

--
Tanguy

2012/9/26 韦震宇 

> Hi, Tanguy
>  I would do as your suggestion.
> Best Regards!
> Monton
> - Original Message -
> From: "Tanguy Moal" 
> To: ; 
> Sent: Tuesday, September 25, 2012 11:05 PM
> Subject: Re: How can I create about 100000 independent indexes in Solr?
>
>
> That is an interesting issue...
> I was wondering if relying on dynamic fields could be an option...
>
> Something like :
>
> * : 
> * customer : string
> * *_field_a1 : type_a
> * *_field_a2 : type_a
> * *_field_b1 : type_b
> * ...
>
> And then prefix each field with the customer name, so for customer1, indexed
> documents are as follows:
> * customer : customer1
> * customer1_field_a1 : value for field_a1
> * customer1_field_a2 : value for field_a2
> * customer1_field_b1 : value for field_b1
> * ...
> And for customer2 :
> * customer : customer2
> * customer2_field_a1 : value for field_a1
> * customer2_field_a2 : value for field_a2
> * customer2_field_b1 : value for field_b1
> * ...
>
> This solution is simple, and helps isolate each customer's fields so
> features like the suggester, spellcheck, ..., things relying on frequencies
> would work (as if in a single core).
>
> I just don't know how well solr scales as the number of fields increases...
>
> Then scaling could be achieved depending on number of doc / customer and
> number of customer / core (if amount of fields consumes resources)
>
> Could that help ?
>
> --
> Tanguy
>
> 2012/9/25 Toke Eskildsen 
>
> > On Tue, 2012-09-25 at 04:21 +0200, 韦震宇 wrote:
> > > The company I'm working for has a website serving more than 100000
> > > customers, and every customer should have its own search category.
> > > So I should create an independent index for every customer.
> >
> > How many of the customers are active at any given time and how large are
> > the indexes? Depending on usage you might be able to have a limited
> > number of indexes open at any given time and opening new indexes on
> > demand.
> >
> >
>


ExtractingRequestHandler causes Out of Memory Error

2012-09-27 Thread Shigeki Kobayashi
Hi guys,


I use Manifold CF to crawl files in Windows file server and index them to
Solr using Extracting Request Handler.
Most of the documents are successfully indexed, but some fail with an Out
of Memory Error in Solr, so I need some advice.

The failed files are not so big: a 240MB CSV file and a 170MB text file.

Here is environment and machine spec:
Solr 3.6 (also Solr4.0Beta)
Tomcat 6.0
CentOS 5.6
java version 1.6.0_23
HDD 60GB
MEM 2GB
JVM Heap: -Xmx1024m -Xms1024m

I feel there is enough memory that Solr should be able to extract and index
file content.

Here is a Solr log below:
--
[solr.servlet.SolrDispatchFilter]-[http-8080-8]-:java.lang.OutOfMemoryError: Java heap space
        at java.util.Arrays.copyOf(Arrays.java:2882)
        at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
        at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
        at java.lang.StringBuilder.append(StringBuilder.java:189)
        at org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:293)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
        at org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
        at org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
        at org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
        at org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
        at org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:268)
        at org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:134)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)

-

Anyone has any ideas?

Regards,

Shigeki


Re: How to retrieve value from float field in custom request handler?

2012-09-27 Thread ravicv
Thanks guys, I was able to retrieve all the values now.
But why doesn't Solr's Field have a method to retrieve values for all data
types? Something similar to:

Object obj = doc.getField("Field1");

Why is only stringValue exposed in this Field class?

doc.getField("Field1").stringValue()

Thanks,
ravi
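(In Lucene 3.x a stored field's value is exposed through stringValue(), and
numeric types are parsed from it. A minimal sketch inside a custom request
handler, assuming Field1 is a stored float field:)

// Given an org.apache.lucene.document.Document doc inside the handler:
// stored values come back as strings, so parse the numeric type yourself.
float value = Float.parseFloat(doc.getField("Field1").stringValue());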





Problem with Special Characters in SOLR Query

2012-09-27 Thread aniljayanti
Hi ,

Im using "text_general" fieldType for searching in SOLR. while searching
keywords along with special characters not getting proper results and
getting errors. used special characters like below.
1) - 
2) &
3) +

QUERY :: 

*solr?q=Healing - Live*
*solr?q=Healing & Live*
*solr?q=Healing ? Live*

Error  message:

The request sent by the client was syntactically incorrect
(org.apache.lucene.queryParser.ParseException: Cannot parse '("Healing \':
Lexical error at line 1, column 8. Encountered: <EOF> after : "\"Healing
\\").


schema.xml
---

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="..." type="text_general" indexed="true" stored="true" />
<field name="..." type="text_general" indexed="true" stored="true" multiValued="true"/>

<defaultSearchField>text</defaultSearchField>

Please suggest me in this, and thanks in advance.

AnilJayanti



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-with-Special-Characters-in-SOLR-Query-tp4010712.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Problem with Special Characters in SOLR Query

2012-09-27 Thread irshad siddiqui
Hi,
Just escape all the solr special chars.

example:
solr?q=Healing \& Live

Regards,
Irshad

On Thu, Sep 27, 2012 at 3:55 PM, aniljayanti  wrote:

> Hi ,
>
> Im using "text_general" fieldType for searching in SOLR. while searching
> keywords along with special characters not getting proper results and
> getting errors. used special characters like below.
> 1) -
> 2) &
> 3) +
>
> QUERY ::
>
> *solr?q=Healing - Live*
> *solr?q=Healing & Live*
> *solr?q=Healing ? Live*
>
> Error  message:
>
> The request sent by the client was syntactically incorrect
> (org.apache.lucene.queryParser.ParseException: Cannot parse '("Healing \':
> Lexical error at line 1, column 8. Encountered: <EOF> after : "\"Healing
> \\").
>
>
> schema.xml
> ---
>
>  positionIncrementGap="100">
>   
> 
>  words="stopwords.txt" enablePositionIncrements="true" />
> 
> 
>   
>   
> 
>  words="stopwords.txt" enablePositionIncrements="true" />
>  ignoreCase="true" expand="true"/>
> 
> 
>   
> 
>
>
>   stored="true" />
>
>  multiValued="true"/>
>
> text
>
> 
>
> Please suggest me in this, and thanks in advance.
>
> AnilJayanti
>
>
>
>


Re: Solr Replication and Autocommit

2012-09-27 Thread Erick Erickson
I'll echo Otis, nothing comes to mind...

Unless you were indexing stuff to the _slaves_, which you should
never do, now or in the past

Erick

On Thu, Sep 27, 2012 at 12:00 AM, Aleksey Vorona  wrote:
> Hi,
>
> I remember having some issues with replication and autocommit previously.
> But now we are using Solr 3.6.1. Are there any known issues or any other
> reasons to avoid autocommit while using replication? I guess not, just want
> confirmation from someone confident and competent.
>
> -- Aleksey


Re: How can I create about 100000 independent indexes in Solr?

2012-09-27 Thread 韦震宇
Hi, Tanguy
   Oh, I understand now. I don't have the same issue as you. Though there
are many customers on our site, the fields they own are the same,
so a few fields are fine in my case.
Best Regards!
Monton

- Original Message - 
From: "Tanguy Moal" 
To: 
Sent: Thursday, September 27, 2012 4:34 PM
Subject: Re: How can I create about 100000 independent indexes in Solr?


Hello Monton,

I wanted to make sure that you understood me well: I really don't know
how well solr scales as the number of fields increases...

What I mean here is that the more distinct fields you index, the more
memory you will need.

So if in your schema you have something like 15 fields declared, then
storing data for 100 distinct customers would generate 1500 fields in the
index.

I really don't know how well that would scale.

The simplest solution is one core per customer, but the same issue (memory
consumption) will arise at some point, I guess.

There must be a clever way to do that...

--
Tanguy

2012/9/26 韦震宇 

> Hi, Tanguy
>  I would do as your suggestion.
> Best Regards!
> Monton
> - Original Message -
> From: "Tanguy Moal" 
> To: ; 
> Sent: Tuesday, September 25, 2012 11:05 PM
> Subject: Re: How can I create about 100000 independent indexes in Solr?
>
>
> That is an interesting issue...
> I was wondering if relying on dynamic fields could be an option...
>
> Something like :
>
> * : 
> * customer : string
> * *_field_a1 : type_a
> * *_field_a2 : type_a
> * *_field_b1 : type_b
> * ...
>
> And then prefix each field with the customer name, so for customer1, indexed
> documents are as follows:
> * customer : customer1
> * customer1_field_a1 : value for field_a1
> * customer1_field_a2 : value for field_a2
> * customer1_field_b1 : value for field_b1
> * ...
> And for customer2 :
> * customer : customer2
> * customer2_field_a1 : value for field_a1
> * customer2_field_a2 : value for field_a2
> * customer2_field_b1 : value for field_b1
> * ...
>
> This solution is simple, and helps isolate each customer's fields so
> features like the suggester, spellcheck, ..., things relying on frequencies
> would work (as if in a single core).
>
> I just don't know how well solr scales as the number of fields increases...
>
> Then scaling could be achieved depending on number of doc / customer and
> number of customer / core (if amount of fields consumes resources)
>
> Could that help ?
>
> --
> Tanguy
>
> 2012/9/25 Toke Eskildsen 
>
> > On Tue, 2012-09-25 at 04:21 +0200, 韦震宇 wrote:
> > > The company I'm working for has a website serving more than 100000
> > > customers, and every customer should have its own search category.
> > > So I should create an independent index for every customer.
> >
> > How many of the customers are active at any given time and how large are
> > the indexes? Depending on usage you might be able to have a limited
> > number of indexes open at any given time and opening new indexes on
> > demand.
> >
> >
>


httpSolrServer and external load balancer

2012-09-27 Thread Lee Carroll
Hi

We have the following solr http server










The issue we face is that the f5 balancer is returning a cookie which the
client is hanging onto, resulting in the same slave being hit for all
requests.

One obvious solution is to configure the load balancer to be non-sticky;
however, politically, a "non-standard" load balancer is timescale suicide.
(It is an outsourced corporate thing.)

I'm not keen to use the LB http solr server, as I don't want this to be a
concern of the software and to have to maintain a list of servers etc.
(although as a stop gap I may well have to).

My question is: can I configure the solr server to ignore client state? We
are on solr 3.4.

Thanks in advance, lee c
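(A possible route, sketched under the assumption that SolrJ 3.4's
CommonsHttpSolrServer sits on Commons HttpClient 3.x: hand it an HttpClient
whose cookie policy ignores cookies, so the balancer's session cookie is
never stored or replayed. The URL is a placeholder.)

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.cookie.CookiePolicy;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class StatelessSolrClient {
    public static void main(String[] args) throws Exception {
        HttpClient httpClient = new HttpClient();
        // Never store or send cookies, so the balancer's stickiness
        // cookie is dropped and requests are spread across slaves.
        httpClient.getParams().setCookiePolicy(CookiePolicy.IGNORE_COOKIES);
        CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://f5-vip/solr/core1", httpClient);
        System.out.println(server.ping().getStatus());
    }
}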


Re: Items disappearing from Solr index

2012-09-27 Thread Erick Erickson
Wild shot in the dark

What happens if you switch from StreamingUpdateSolrServer to HttpSolrServer?

What I'm wondering is if somehow you're getting a queueing problem. If you have
multiple threads defined for SUSS, it might be possible (and I'm guessing) that
the delete bit is getting sent after some of the adds. Frankly I doubt this is
the case, but this issue is so weird that I'm grasping at straws.

BTW, there's no reason to optimize twice. Actually, the new thinking is that
optimizing usually isn't necessary anyway. But if you insist on optimizing
there's no reason to do it _both_ after the deletes and after the adds, just
do it after the adds.

Best
Erick

On Thu, Sep 27, 2012 at 4:31 AM, Kissue Kissue  wrote:
> #What is the field type for that field - string or text?
>
> It is a string type.
>
> Thanks.
>
> On Wed, Sep 26, 2012 at 8:14 PM, Jack Krupansky 
> wrote:
>
>> What is the field type for that field - string or text?
>>
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Kissue Kissue
>> Sent: Wednesday, September 26, 2012 1:43 PM
>>
>> To: solr-user@lucene.apache.org
>> Subject: Re: Items disappearing from Solr index
>>
>> # It is looking for documents with "Emory" in the specified field OR "Labs"
>> in the default search field.
>>
>> This does not seem to be the case. For instance issuing a deleteByQuery for
>> catalogueId: "PEARL LINGUISTICS LTD" also deletes the contents of a
>> catalogueId with the value: "Ncl_**MacNaughtonMcGregorCoaching_**
>> vf010811".
>>
>> Thanks.
>>
>> On Wed, Sep 26, 2012 at 2:37 PM, Jack Krupansky wrote:
>>
>>  It is looking for documents with "Emory" in the specified field OR "Labs"
>>> in the default search field.
>>>
>>> -- Jack Krupansky
>>>
>>> -Original Message- From: Kissue Kissue
>>> Sent: Wednesday, September 26, 2012 7:47 AM
>>> To: solr-user@lucene.apache.org
>>> Subject: Re: Items disappearing from Solr index
>>>
>>>
>>> I have just solved this problem.
>>>
>>> We have a field called catalogueId. One possible value for this field
>>> could
>>> be "Emory Labs". I found out that when the following delete by query is
>>> sent to solr:
>>>
>>> getSolrServer().deleteByQuery(catalogueId + ":" + Emory Labs)
>>> [Notice that there are no quotes surrounding the catalogueId value -
>>> Emory Labs]
>>>
>>> For some reason this delete by query ends up deleting the contents of some
>>> other random catalogues too, which is the reason why we are losing items
>>> from the index. When the query is changed to:
>>>
>>> getSolrServer().deleteByQuery(catalogueId + ":" + "Emory Labs"),
>>> then it starts to correctly delete only items in the Emory Labs
>>> catalogue.
>>>
>>> So my first question is, what exactly does deleteByQuery do in the first
>>> query without the quotes? How is it determining which catalogues to
>>> delete?
>>>
>>> Secondly, shouldn't the correct behaviour be not to delete anything at all
>>> in this case since when a search is done for the same catalogueId without
>>> the quotes it just simply returns no results?
>>>
>>> Thanks.
>>>
>>>
>>> On Mon, Sep 24, 2012 at 3:12 PM, Kissue Kissue 
>>> wrote:
>>>
>>>  Hi Erick,
>>>

 Thanks for your reply. Yes, I am using delete by query. I am currently
 logging the number of items to be deleted before handing off to solr, and
 from the solr logs I can see it deleted exactly that number. I will
 verify further.

 Thanks.


 On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson wrote:


  How do you delete items? By ID or by query?

>
> My guess is that one of two things is happening:
> 1> your delete process is deleting too much data.
> 2> your index process isn't indexing what you think.
>
> I'd add some logging to the SolrJ program to see what
> it thinks it has deleted or added to the index and go from there.
>
> Best
> Erick
>
> On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue 
> wrote:
> > Hi,
> >
> > I am running Solr 3.5, using SolrJ and StreamingUpdateSolrServer
> > to index and delete items from solr.
> >
> > I basically index items from the db into solr every night. Existing
> > items can be marked for deletion in the db and a delete request sent
> > to solr to delete such items.
> >
> > My process runs as follows every night:
> >
> > 1. Check if items have been marked for deletion and delete from solr.
> > I commit and optimize after the entire solr deletion runs.
> > 2. Index any new items to solr. I commit and optimize after all the
> > new items have been added.
> >
> > Recently I started noticing that huge chunks of items that have not
> > been marked for deletion are disappearing from the index. I checked
> > the solr logs and the logs indicate that it is deleting exactly the
> > number of items requested [...]

Re: Problem with Special Characters in SOLR Query

2012-09-27 Thread aniljayanti
Hi,

thanks,

I tried the query below and got results:

q=Cheat \- Album Version

But I get an error with the one below:

q=Oot \& Aboot

Error message :
--
message org.apache.lucene.queryParser.ParseException: Cannot parse 'Oot \':
Lexical error at line 1, column 6. Encountered: <EOF> after : ""

description The request sent by the client was syntactically incorrect
(org.apache.lucene.queryParser.ParseException: Cannot parse 'Oot \': Lexical
error at line 1, column 6. Encountered: <EOF> after : "").

anilJayanti







Re: httpSolrServer and external load balancer

2012-09-27 Thread Erick Erickson
What client state? Solr servers are stateless, they don't
keep any information specific to particular clients so this
doesn't seem to be a problem.

What Solr _does_ do is cache things like fq clauses, but
these are not user-specific. Which actually argues for going
to the same slave on the theory that requests from a
user are more likely to have the same fq clauses. Consider
faceting on shoes. The user clicks "mens" and you add an
fq like &fq=gender:mens. Then the user wants dress shoes
so you submit another query &fq=gender:mens&fq=style:dress.
The first fq clause has already been calculated and cached so
doesn't have to be re-calculated for the second query...

But the stickiness is usually the way Solr is used, so this seems
like a red herring.

FWIW,
Erick

On Thu, Sep 27, 2012 at 7:06 AM, Lee Carroll
 wrote:
> Hi
>
> We have the following solr http server
>
>  id="solrserver" >
> 
> 
> 
> 
> 
> 
> 
>
> The issue we face is the f5 balancer is returning a cookie which the client
> is hanging onto. resulting in the same slave being hit for all requests.
>
> one obvious solution is to config the load balancer to be non sticky
> however politically a "non-standard" load balancer is timescale suicide.
> (It is an out sourced corporate thing)
>
> I'm not keen to use the LB http solr server as i don't want this to be a
> concern of the software and have a list of servers etc. (although as a stop
> gap may well have to)
>
> My question is can I configure the solr server to ignore client state ? We
> are on solr 3.4
>
> Thanks in advance lee c


Re: Problem with Special Characters in SOLR Query

2012-09-27 Thread Toke Eskildsen
On Thu, 2012-09-27 at 13:49 +0200, aniljayanti wrote:
> But getting error with below.
> 
> q=Oot \& Aboot
> 
> Error message :
> --
> message org.apache.lucene.queryParser.ParseException: Cannot parse 'Oot \':
> Lexical error at line 1, column 6. Encountered: <EOF> after : ""

It seems like you are sending the query by performing a REST call.
You need to URL-escape those characters, because & in a URL is a delimiter
between arguments.

Instead of
http://localhost:8983/solr/collection1/select/?q=Oot \& Aboot
you need to send
http://localhost:8983/solr/collection1/select/?q=Oot%20%5C%26%20Aboot
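(In SolrJ the two layers can be handled like this; a minimal sketch using
ClientUtils for the query-syntax escaping and URLEncoder for the transport
layer:)

import java.net.URLEncoder;
import org.apache.solr.client.solrj.util.ClientUtils;

public class EscapeDemo {
    public static void main(String[] args) throws Exception {
        String raw = "Oot & Aboot";
        // First escape Lucene/Solr query syntax (&, -, +, etc.) ...
        String queryEscaped = ClientUtils.escapeQueryChars(raw);
        // ... then URL-encode the whole parameter value.
        String url = "http://localhost:8983/solr/collection1/select/?q="
                + URLEncoder.encode(queryEscaped, "UTF-8");
        System.out.println(url);
    }
}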



Re: Problem with Special Characters in SOLR Query

2012-09-27 Thread Erick Erickson
Right, you're conflating two separate issues
1> URL escaping. the & is a special character in the URL, entirely
separate from Solr. Try using %26 rather than \&
2> Query parsing. Once the string gets through the URL and servlet
   container, it's in query parsing land, where the escaping of
   _query_ special characters like '-' counts.
3> And just to confuse matters a LOT, when you're looking at
 URLs, space is translated to '+'. So when you look in your log
 file, you'll see the query q=me myself reported as
 q=me+myself which has nothing to do with the Lucene MUST
(+) operator

Best
Erick

On Thu, Sep 27, 2012 at 7:49 AM, aniljayanti  wrote:
> Hi,
>
> thanks,
>
> I tried with below query getting result.
>
> q=Cheat \- Album Version
>
> But getting error with below.
>
> q=Oot \& Aboot
>
> Error message :
> --
> message org.apache.lucene.queryParser.ParseException: Cannot parse 'Oot \':
> Lexical error at line 1, column 6. Encountered: <EOF> after : ""
>
> description The request sent by the client was syntactically incorrect
> (org.apache.lucene.queryParser.ParseException: Cannot parse 'Oot \': Lexical
> error at line 1, column 6. Encountered: <EOF> after : "").
>
> anilJayanti
>
>
>
>
>


Re: httpSolrServer and external load balancer

2012-09-27 Thread Lee Carroll
Hi Erick,

the load balancer in front of the solr servers is setting the cookie, not
the solr servers themselves.

Are you saying the http connection manager the client builds will ignore
this state? It looks like it does not: the client is passing the cookie
back to the load balancer.

Basically, I want to configure the clients not to pass cookies.

Does that make sense?



On 27 September 2012 12:54, Erick Erickson  wrote:

> What client state? Solr servers are stateless, they don't
> keep any information specific to particular clients so this
> doesn't seem to be a problem.
>
> What Solr _does_ do is cache things like fq clauses, but
> these are not user-specific. Which actually argues for going
> to the same slave on the theory that requests from a
> user are more likely to have the same fq clauses. Consider
> faceting on shoes. The user clicks "mens" and you add an
> fq like &fq=gender:mens. Then the user wants dress shoes
> so you submit another query &fq=gender:mens&fq=style:dress.
> The first fq clause has already been calculated and cached so
> doesn't have to be re-calculated for the second query...
>
> But the stickiness is usually the way Solr is used, so this seems
> like a red herring.
>
> FWIW,
> Erick
>
> On Thu, Sep 27, 2012 at 7:06 AM, Lee Carroll
>  wrote:
> > Hi
> >
> > We have the following solr http server
> >
> >  > id="solrserver" >
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> >
> > The issue we face is the f5 balancer is returning a cookie which the
> client
> > is hanging onto. resulting in the same slave being hit for all requests.
> >
> > one obvious solution is to config the load balancer to be non sticky
> > however politically a "non-standard" load balancer is timescale suicide.
> > (It is an out sourced corporate thing)
> >
> > I'm not keen to use the LB http solr server as i don't want this to be a
> > concern of the software and have a list of servers etc. (although as a
> stop
> > gap may well have to)
> >
> > My question is can I configure the solr server to ignore client state ?
> We
> > are on solr 3.4
> >
> > Thanks in advance lee c
>


Re: httpSolrServer and external load balancer

2012-09-27 Thread Erick Erickson
But again, why do you want to do this? I really think you don't.

I'm assuming that when you say this:
"...resulting in the same slave being hit for all requests."

you mean "all requests _from the same client_". If that's
not what's happening, then disregard my maundering
because when it comes to setting up LBs, I'm clueless. But
I can say that many installations have LBs set up with
sticky sessions on a per-client basis..

Consider another scenario; replication. If you have 2 slaves,
each with a polling interval of 5 minutes note that they are
not coordinated. So slave 1 can poll at 14:00:00. Slave 2
at 14:01:00. Say there's been a commit at 14:00:30. Requests
to slave 2 will have a different view of the index than slave 1,
so if your user resends the exact same request, they may
see different results. I could submit the request 5 times in a
row and the results would not only be different each time, they
would flip-flop back and forth.

I wouldn't do this unless and until you have a demonstrated need.

Best
Erick

On Thu, Sep 27, 2012 at 8:07 AM, Lee Carroll
 wrote:
> Hi Erick,
>
> the load balancer in front of the solr servers is dropping the cookie not
> the solr server themselves.
>
> are you saying the clients http connection manager builds will ignore this
> state ? it looks like they do not. It looks like the
> client is passing the cookie back to the load balancer
>
> I want to configure the clients not to pass cookies basically.
>
> Does that make sense ?
>
>
>
> On 27 September 2012 12:54, Erick Erickson  wrote:
>
>> What client state? Solr servers are stateless, they don't
>> keep any information specific to particular clients so this
>> doesn't seem to be a problem.
>>
>> What Solr _does_ do is cache things like fq clauses, but
>> these are not user-specific. Which actually argues for going
>> to the same slave on the theory that requests from a
>> user are more likely to have the same fq clauses. Consider
>> faceting on shoes. The user clicks "mens" and you add an
>> fq like &fq=gender:mens. Then the user wants dress shoes
>> so you submit another query &fq=gender:mens&fq=style:dress.
>> The first fq clause has already been calculated and cached so
>> doesn't have to be re-calculated for the second query...
>>
>> But the stickiness is usually the way Solr is used, so this seems
>> like a red herring.
>>
>> FWIW,
>> Erick
>>
>> On Thu, Sep 27, 2012 at 7:06 AM, Lee Carroll
>>  wrote:
>> > Hi
>> >
>> > We have the following solr http server
>> >
>> > > > id="solrserver" >
>> > 
>> > 
>> > 
>> > 
>> > 
>> > 
>> > 
>> >
>> > The issue we face is the f5 balancer is returning a cookie which the
>> client
>> > is hanging onto. resulting in the same slave being hit for all requests.
>> >
>> > one obvious solution is to config the load balancer to be non sticky
>> > however politically a "non-standard" load balancer is timescale suicide.
>> > (It is an out sourced corporate thing)
>> >
>> > I'm not keen to use the LB http solr server as i don't want this to be a
>> > concern of the software and have a list of servers etc. (although as a
>> stop
>> > gap may well have to)
>> >
>> > My question is can I configure the solr server to ignore client state ?
>> We
>> > are on solr 3.4
>> >
>> > Thanks in advance lee c
>>


Re: Items disappearing from Solr index

2012-09-27 Thread Kissue Kissue
Actually this problem occurs even when I am doing just deletes. I tested by
sending only one delete query for a single catalogue and had the same
problem. I always optimize once.

I changed to the syntax you suggested ({!term f=catalogueId}Emory Labs)
and it works like a charm. Thanks for the pointer; it saved me from another
issue that could have occurred at some point.

Thanks.
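(For reference, a minimal sketch of that form in SolrJ; the {!term} parser
matches the raw indexed value of a string field, so values containing
spaces or reserved characters need no escaping. The URL is a placeholder.)

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class DeleteByTerm {
    public static void main(String[] args) throws Exception {
        SolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");
        // Deletes only documents whose catalogueId term is exactly
        // "Emory Labs"; no other catalogues are touched.
        server.deleteByQuery("{!term f=catalogueId}Emory Labs");
        server.commit();
    }
}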



On Thu, Sep 27, 2012 at 12:30 PM, Erick Erickson wrote:

> Wild shot in the dark
>
> What happens if you switch from StreamingUpdateSolrServer to
> HttpSolrServer?
>
> What I'm wondering is if somehow you're getting a queueing problem. If you
> have
> multiple threads defined for SUSS, it might be possible (and I'm guessing)
> that
> the delete bit is getting sent after some of the adds. Frankly I doubt
> this is
> the case, but this issue is so weird that I'm grasping at straws.
>
> BTW, there's no reason to optimize twice. Actually, the new thinking is
> that
> optimizing usually isn't necessary anyway. But if you insist on optimizing
> there's no reason to do it _both_ after the deletes and after the adds,
> just
> do it after the adds.
>
> Best
> Erick
>
> On Thu, Sep 27, 2012 at 4:31 AM, Kissue Kissue 
> wrote:
> > #What is the field type for that field - string or text?
> >
> > It is a string type.
> >
> > Thanks.
> >
> > On Wed, Sep 26, 2012 at 8:14 PM, Jack Krupansky wrote:
> >
> >> What is the field type for that field - string or text?
> >>
> >>
> >> -- Jack Krupansky
> >>
> >> -Original Message- From: Kissue Kissue
> >> Sent: Wednesday, September 26, 2012 1:43 PM
> >>
> >> To: solr-user@lucene.apache.org
> >> Subject: Re: Items disappearing from Solr index
> >>
> >> # It is looking for documents with "Emory" in the specified field OR
> "Labs"
> >> in the default search field.
> >>
> >> This does not seem to be the case. For instance issuing a deleteByQuery
> for
> >> catalogueId: "PEARL LINGUISTICS LTD" also deletes the contents of a
> >> catalogueId with the value:
> >> "Ncl_MacNaughtonMcGregorCoaching_vf010811".
> >>
> >> Thanks.
> >>
> >> On Wed, Sep 26, 2012 at 2:37 PM, Jack Krupansky
> >> <j...@basetechnology.com> wrote:
> >>
> >>  It is looking for documents with "Emory" in the specified field OR
> "Labs"
> >>> in the default search field.
> >>>
> >>> -- Jack Krupansky
> >>>
> >>> -Original Message- From: Kissue Kissue
> >>> Sent: Wednesday, September 26, 2012 7:47 AM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: Items disappearing from Solr index
> >>>
> >>>
> >>> I have just solved this problem.
> >>>
> >>> We have a field called catalogueId. One possible value for this field
> >>> could
> >>> be "Emory Labs". I found out that when the following delete by query is
> >>> sent to solr:
> >>>
> >>> getSolrServer().deleteByQuery(catalogueId + ":" + Emory Labs)
> >>> [Notice that there are no quotes surrounding the catalogueId value -
> >>> Emory Labs]
> >>>
> >>> For some reason this delete by query ends up deleting the contents
> >>> of some other random catalogues too, which is the reason why we are
> >>> losing items from the index. When the query is changed to:
> >>>
> >>> getSolrServer().deleteByQuery(catalogueId + ":" + "Emory Labs"),
> >>> then it starts to correctly delete only items in the Emory Labs
> >>> catalogue.
> >>>
> >>> So my first question is, what exactly does deleteByQuery do in the
> first
> >>> query without the quotes? How is it determining which catalogues to
> >>> delete?
> >>>
> >>> Secondly, shouldn't the correct behaviour be not to delete anything at
> all
> >>> in this case since when a search is done for the same catalogueId
> without
> >>> the quotes it just simply returns no results?
> >>>
> >>> Thanks.
> >>>
> >>>
> >>> On Mon, Sep 24, 2012 at 3:12 PM, Kissue Kissue 
> >>> wrote:
> >>>
> >>>  Hi Erick,
> >>>
> 
>  Thanks for your reply. Yes, I am using delete by query. I am currently
>  logging the number of items to be deleted before handing off to solr,
>  and from the solr logs I can see it deleted exactly that number. I will
>  verify further.
> 
>  Thanks.
> 
> 
>  On Mon, Sep 24, 2012 at 1:21 PM, Erick Erickson
>  <erickerick...@gmail.com> wrote:
> 
> 
>   How do you delete items? By ID or by query?
> 
> >
> > My guess is that one of two things is happening:
> > 1> your delete process is deleting too much data.
> > 2> your index process isn't indexing what you think.
> >
> > I'd add some logging to the SolrJ program to see what
> > it thinks it has deleted or added to the index and go from there.
> >
> > Best
> > Erick
> >
> > On Mon, Sep 24, 2012 at 6:55 AM, Kissue Kissue 
> > wrote:
> > > Hi,
> > >
> > > I am running Solr 3.5, using SolrJ and StreamingUpdateSolrServer
> > > to index and delete items from solr.
> > >
> > > I basically index items from the db into solr every night. [...]

Query filtering

2012-09-27 Thread Finotti Simone
Hello,
I'm doing this query to return the top 10 facet values within a given
"context", specified via the fq parameter.

http://solr/core/select?fq=(...)&q=*:*&rows=0&facet.field=interesting_facet&facet.limit=10

Now, I should search for a term inside the context AND the previously 
identified top 10 facet values.

Is there a way to do this with a single query?

thank you in advance,
S


Regarding delta-import and full-import

2012-09-27 Thread darshan
Hi All,

Can anyone refer me to a few blogs that explain both
imports in a little more detail, with examples?

 

Thanks,

Darshan



Re: Regarding delta-import and full-import

2012-09-27 Thread Koji Sekiguchi

(12/09/27 22:45), darshan wrote:

Hi All,

 Can anyone refer me to a few blogs that explain both
imports in a little more detail, with examples?



Thanks,

Darshan




Asking Google, I got:

http://www.arunchinnachamy.com/apache-solr-mysql-data-import/
http://www.andornot.com/blog/post/Sample-Solr-DataImportHandler-for-XML-Files.aspx
http://pooteeweet.org/blog/1827

:

koji
--
http://soleami.com/blog/starting-lab-work.html
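(In outline: a full-import runs the entity's "query", while a delta-import
runs "deltaQuery" to find the changed ids and then "deltaImportQuery" once
per id. A minimal data-config sketch, with the driver, table, and column
names made up for illustration:)

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb" user="user" password="pass"/>
  <document>
    <!-- ${dataimporter.last_index_time} and ${dataimporter.delta.id}
         are supplied by the DataImportHandler at run time -->
    <entity name="item" pk="id"
            query="SELECT id, name FROM item"
            deltaQuery="SELECT id FROM item
                        WHERE last_modified &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, name FROM item
                              WHERE id = '${dataimporter.delta.id}'"/>
  </document>
</dataConfig>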


Re: httpSolrServer and external load balancer

2012-09-27 Thread Lee Carroll
Hi Erick

Our application has one CommonsHttpSolrServer for each solr core used by
our web app. Whilst we have many web app clients,
solr only has 1 client: our application. Does that make sense? This is why
sticky load balancing is an issue for us.

I cannot see anywhere that the state is being handled in the
CommonsHttpSolrServer impl. It looks like the state is not being passed
by the client, or am I missing something?

Cheers Lee c

On 27 September 2012 14:00, Erick Erickson  wrote:

> But again, why do you want to do this? I really think you don't.
>
> I'm assuming that when you say this:
> "...resulting in the same slave being hit for all requests."
>
> you mean "all requests _from the same client_". If that's
> not what's happening, then disregard my maundering
> because when it comes to setting up LBs, I'm clueless. But
> I can say that many installations have LBs set up with
> sticky sessions on a per-client basis..
>
> Consider another scenario; replication. If you have 2 slaves,
> each with a polling interval of 5 minutes note that they are
> not coordinated. So slave 1 can poll at 14:00:00. Slave 2
> at 14:01:00. Say there's been a commit at 14:00:30. Requests
> to slave 2 will have a different view of the index than slave 1,
> so if your user resends the exact same request, they may
> see different results. I could submit the request 5 times in a
> row and the results would not only be different each time, they
> would flip-flop back and forth.
>
> I wouldn't do this unless and until you have a demonstrated need.
>
> Best
> Erick
>
> On Thu, Sep 27, 2012 at 8:07 AM, Lee Carroll
>  wrote:
> > Hi Erick,
> >
> > the load balancer in front of the solr servers is dropping the cookie not
> > the solr server themselves.
> >
> > are you saying the clients http connection manager builds will ignore
> this
> > state ? it looks like they do not. It looks like the
> > client is passing the cookie back to the load balancer
> >
> > I want to configure the clients not to pass cookies basically.
> >
> > Does that make sense ?
> >
> >
> >
> > On 27 September 2012 12:54, Erick Erickson 
> wrote:
> >
> >> What client state? Solr servers are stateless, they don't
> >> keep any information specific to particular clients so this
> >> doesn't seem to be a problem.
> >>
> >> What Solr _does_ do is cache things like fq clauses, but
> >> these are not user-specific. Which actually argues for going
> >> to the same slave on the theory that requests from a
> >> user are more likely to have the same fq clauses. Consider
> >> faceting on shoes. The user clicks "mens" and you add an
> >> fq like &fq=gender:mens. Then the user wants dress shoes
> >> so you submit another query &fq=gender:mens&fq=style:dress.
> >> The first fq clause has already been calculated and cached so
> >> doesn't have to be re-calculated for the second query...
> >>
> >> But the stickiness is usually the way Solr is used, so this seems
> >> like a red herring.
> >>
> >> FWIW,
> >> Erick
> >>
> >> On Thu, Sep 27, 2012 at 7:06 AM, Lee Carroll
> >>  wrote:
> >> > Hi
> >> >
> >> > We have the following solr http server
> >> >
> >> >  >> > id="solrserver" >
> >> > 
> >> > 
> >> > 
> >> > 
> >> > 
> >> > 
> >> > 
> >> >
> >> > The issue we face is the f5 balancer is returning a cookie which the
> >> client
> >> > is hanging onto. resulting in the same slave being hit for all
> requests.
> >> >
> >> > one obvious solution is to config the load balancer to be non sticky
> >> > however politically a "non-standard" load balancer is timescale
> suicide.
> >> > (It is an out sourced corporate thing)
> >> >
> >> > I'm not keen to use the LB http solr server as i don't want this to
> be a
> >> > concern of the software and have a list of servers etc. (although as a
> >> stop
> >> > gap may well have to)
> >> >
> >> > My question is can I configure the solr server to ignore client state
> ?
> >> We
> >> > are on solr 3.4
> >> >
> >> > Thanks in advance lee c
> >>
>


Re: Query filtering

2012-09-27 Thread Amit Nithian
I think one way to do this is to issue another query and set a bunch of
filter queries to restrict "interesting_facet" to just those ten
values returned in the first query.

fq=interesting_facet:1 OR interesting_facet:2 etc&q=context:

Does that help?
Amit
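(A minimal SolrJ sketch of that two-step approach; the facet field name is
taken from the thread, while the URL, context filter, and search term are
illustrative:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.client.solrj.util.ClientUtils;

public class TopFacetSearch {
    public static void main(String[] args) throws Exception {
        SolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Step 1: facet-only query to find the top 10 values in the context.
        SolrQuery q1 = new SolrQuery("*:*");
        q1.addFilterQuery("context:foo");
        q1.setRows(0);
        q1.setFacet(true);
        q1.addFacetField("interesting_facet");
        q1.setFacetLimit(10);
        FacetField ff = server.query(q1).getFacetField("interesting_facet");

        // Step 2: restrict the term search to those facet values.
        StringBuilder fq = new StringBuilder("interesting_facet:(");
        for (FacetField.Count c : ff.getValues()) {
            if (fq.charAt(fq.length() - 1) != '(') fq.append(" OR ");
            fq.append(ClientUtils.escapeQueryChars(c.getName()));
        }
        fq.append(')');
        SolrQuery q2 = new SolrQuery("some term");
        q2.addFilterQuery("context:foo", fq.toString());
        QueryResponse results = server.query(q2);
        System.out.println(results.getResults().getNumFound());
    }
}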

On Thu, Sep 27, 2012 at 6:33 AM, Finotti Simone  wrote:
> Hello,
> I'm doing this query to return top 10 facets within a given "context", 
> specified via the fq parameter.
>
> http://solr/core/select?fq=(...)&q=*:*&rows=0&facet.field=interesting_facet&facet.limit=10
>
> Now, I should search for a term inside the context AND the previously 
> identified top 10 facet values.
>
> Is there a way to do this with a single query?
>
> thank you in advance,
> S


Re: Solr Replication and Autocommit

2012-09-27 Thread Aleksey Vorona

Thank both of you for the responses!

-- Aleksey

On 12-09-27 03:51 AM, Erick Erickson wrote:

I'll echo Otis, nothing comes to mind...

Unless you were indexing stuff to the _slaves_, which you should
never do, now or in the past

Erick

On Thu, Sep 27, 2012 at 12:00 AM, Aleksey Vorona  wrote:

Hi,

I remember having some issues with replication and autocommit previously.
But now we are using Solr 3.6.1. Are there any known issues or any other
reasons to avoid autocommit while using replication? I guess not, just want
confirmation from someone confident and competent.

-- Aleksey




RE: SolrJ - IOException

2012-09-27 Thread balaji.gandhi
Thanks for your reply. The SOLR server is not stalled; just the add fails
with this exception.
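(Along the lines of the retry suggestion quoted below, a minimal wrapper;
the attempt count and backoff are arbitrary choices:)

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrInputDocument;

public class RetryingAdd {
    // Retry a transient add failure a few times before giving up; the
    // NoHttpResponseException case usually succeeds on a second attempt.
    static void addWithRetry(SolrServer server, SolrInputDocument doc)
            throws Exception {
        for (int attempt = 1; ; attempt++) {
            try {
                server.add(doc);
                return;
            } catch (SolrServerException e) {
                if (attempt >= 3) throw e;
                Thread.sleep(500L * attempt); // simple linear backoff
            }
        }
    }
}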

Balaji Gandhi, Senior Software Developer, Horizontal Platform Services
Product Engineering  │  Apollo Group, Inc.
1225 W. Washington St.  |  AZ23  |  Tempe, AZ  85281
Phone: 602.713.2417  |  Email: 
balaji.gan...@apollogrp.edu

Go Green. Don't Print. Moreover, soft copies can be indexed by algorithms.

From: roz dev [via Lucene] [mailto:ml-node+s472066n4010037...@n3.nabble.com]
Sent: Monday, September 24, 2012 5:46 PM
To: Balaji Gandhi
Subject: Re: SolrJ - IOException

I have seen this happening

We retry and that works. Is your solr server stalled?

On Mon, Sep 24, 2012 at 4:50 PM, balaji.gandhi
<[hidden email]>wrote:

> Hi,
>
> I am encountering this error randomly (under load) when posting to Solr
> using SolrJ.
>
> Has anyone encountered a similar error?
>
> org.apache.solr.client.solrj.SolrServerException: IOException occured when
> talking to server at: http://localhost:8080/solr/profile at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:414)
> at
>
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
> at
>
> org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
> at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:122) at
> org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:107) at
>
> Thanks,
> Balaji
>
>
>
>










RE: SolrJ - IOException

2012-09-27 Thread balaji.gandhi
Here is the stack trace:-

org.apache.solr.client.solrj.SolrServerException: IOException occured when 
talking to server:
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:414)
 at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:182)
 at 
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
 at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:122) at 
org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:107) at 
org.apache.solr.handler.dataimport.thread.task.SolrUploadTask.upload(SolrUploadTask.java:31)
 at 
org.apache.solr.handler.dataimport.thread.SolrUploader.run(SolrUploader.java:31)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) at 
java.lang.Thread.run(Unknown Source) Caused by: 
org.apache.http.NoHttpResponseException: The target server failed to respond at 
org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:101)
 at 
org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:252)
 at 
org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:282)
 at 
org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:247)
 at 
org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:216)
 at 
org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:298)
 at 
org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:125)
 at 
org.apache.http.impl.client.DefaultRequestDirector.tryExecute(DefaultRequestDirector.java:647)
 at 
org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:464)
 at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:820)
 at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:754)
 at 
org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:732)
 at 
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:353)
 ... 9 more

Balaji Gandhi, Senior Software Developer, Horizontal Platform Services
Product Engineering  │  Apollo Group, Inc.
1225 W. Washington St.  |  AZ23  |  Tempe, AZ  85281
Phone: 602.713.2417  |  Email: 
balaji.gan...@apollogrp.edu

Go Green. Don't Print. Moreover, soft copies can be indexed by algorithms.

From: Toke Eskildsen [via Lucene] 
[mailto:ml-node+s472066n4010082...@n3.nabble.com]
Sent: Tuesday, September 25, 2012 12:19 AM
To: Balaji Gandhi
Subject: Re: SolrJ - IOException

On Tue, 2012-09-25 at 01:50 +0200, balaji.gandhi wrote:
> I am encountering this error randomly (under load) when posting to Solr
> using SolrJ.
>
> Has anyone encountered a similar error?
>
> org.apache.solr.client.solrj.SolrServerException: IOException occured when
> talking to server at: http://localhost:8080/solr/profile at
> org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:414)
[...]

This looks suspiciously like a potential bug in the HTTP keep-alive flow
that we encountered some weeks ago. I am guessing that you are issuing
more than 100 separate updates/second. Could you please provide the full
stack trace?











How to run Solr Cloud using Tomcat?

2012-09-27 Thread Benjamin, Roy
I've gone through the guide on running Solr Cloud using Jetty but it's not
practical to use JAVA_OPTS etc on real cloud deployments. I don't see how
to extend these instructions to running on Tomcat.

Has anyone run Solr Cloud under Tomcat successfully?  Did they document how?

Thanks

Roy


RE: How to run Solr Cloud using Tomcat?

2012-09-27 Thread Markus Jelsma
Hi - on Debian systems there's a /etc/default/tomcat properties file you can 
use to set your flags.
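For illustration, a line along these lines (the ZooKeeper ensemble address
and shard count are placeholders):

JAVA_OPTS="$JAVA_OPTS -DzkHost=zk1:2181,zk2:2181,zk3:2181 -DnumShards=2 -Dbootstrap_conf=true"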
 
-Original message-
> From:Benjamin, Roy 
> Sent: Thu 27-Sep-2012 19:57
> To: solr-user@lucene.apache.org
> Subject: How to run Solr Cloud using Tomcat?
> 
> I've gone through the guide on running Solr Cloud using Jetty but it's not
> practical to use JAVA_OPTS etc on real cloud deployments. I don't see how
> to extend these instructions to running on Tomcat.
> 
> Has anyone run Solr Cloud under Tomcat successfully?  Did they document how?
> 
> Thanks
> 
> Roy
> 


Re: How to run Solr Cloud using Tomcat?

2012-09-27 Thread Vadim Kisselmann
Hi Roy,
Yep, it works with Tomcat 6 and an external Zookeeper.
I will publish a blog post about it tomorrow on sentric.ch.
My blog post is ready, but I had no time to publish it in the last
couple of days :)
Best regards
Vadim



2012/9/27 Markus Jelsma :
> Hi - on Debian systems there's a /etc/default/tomcat properties file you can 
> use to set your flags.
>
> -Original message-
>> From:Benjamin, Roy 
>> Sent: Thu 27-Sep-2012 19:57
>> To: solr-user@lucene.apache.org
>> Subject: How to run Solr Cloud using Tomcat?
>>
>> I've gone through the guide on running Solr Cloud using Jetty but it's not
>> practical to use JAVA_OPTS etc on real cloud deployments. I don't see how
>> to extend these instructions to running on Tomcat.
>>
>> Has anyone run Solr Cloud under Tomcat successfully?  Did they document how?
>>
>> Thanks
>>
>> Roy
>>


Re: httpSolrServer and external load balancer

2012-09-27 Thread Erick Erickson
Ahh, I finally think I get it. I was missing that the connection is held by
the CommonsHttpSolrServer. That's the thing that's locking
on to a particular slave...

I'm afraid I'm not up enough on the internals here to be much
help, so I'll have to defer...

Erick.

On Thu, Sep 27, 2012 at 10:20 AM, Lee Carroll
 wrote:
> Hi Erick
>
> Our application has one  CommonsHttpSolrServer for each solr core used by
> our web app. Whilst we have many web app clients
> solr only has 1 client, our application. Does that make sense. This is why
> sticky load balancing is an issue for us.
>
> I cannot see any where the state is being handled in the
> CommonsHttpSolrServer  impl ? It looks like the state is not being passed
> by the client or am i missing something?
>
> Cheers Lee c
>
> On 27 September 2012 14:00, Erick Erickson  wrote:
>
>> But again, why do you want to do this? I really think you don't.
>>
>> I'm assuming that when you say this:
>> "...resulting in the same slave being hit for all requests."
>>
>> you mean "all requests _from the same client_". If that's
>> not what's happening, then disregard my maundering
>> because when it comes to setting up LBs, I'm clueless. But
>> I can say that many installations have LBs set up with
>> sticky sessions on a per-client basis..
>>
>> Consider another scenario; replication. If you have 2 slaves,
>> each with a polling interval of 5 minutes note that they are
>> not coordinated. So slave 1 can poll at 14:00:00. Slave 2
>> at 14:01:00. Say there's been a commit at 14:00:30. Requests
>> to slave 2 will have a different view of the index than slave 1,
>> so if your user resends the exact same request, they may
>> see different results. I could submit the request 5 times in a
>> row and the results would not only be different each time, they
>> would flip-flop back and forth.
>>
>> I wouldn't do this unless and until you have a demonstrated need.
>>
>> Best
>> Erick
>>
>> On Thu, Sep 27, 2012 at 8:07 AM, Lee Carroll
>>  wrote:
>> > Hi Erick,
>> >
>> > the load balancer in front of the solr servers is dropping the cookie not
>> > the solr server themselves.
>> >
>> > are you saying the clients http connection manager builds will ignore
>> this
>> > state ? it looks like they do not. It looks like the
>> > client is passing the cookie back to the load balancer
>> >
>> > I want to configure the clients not to pass cookies basically.
>> >
>> > Does that make sense ?
>> >
>> >
>> >
>> > On 27 September 2012 12:54, Erick Erickson 
>> wrote:
>> >
>> >> What client state? Solr servers are stateless, they don't
>> >> keep any information specific to particular clients so this
>> >> doesn't seem to be a problem.
>> >>
>> >> What Solr _does_ do is cache things like fq clauses, but
>> >> these are not user-specific. Which actually argues for going
>> >> to the same slave on the theory that requests from a
>> >> user are more likely to have the same fq clauses. Consider
>> >> faceting on shoes. The user clicks "mens" and you add an
>> >> fq like &fq=gender:mens. Then the user wants dress shoes
>> >> so you submit another query &fq=gender:mens&fq=style:dress.
>> >> The first fq clause has already been calculated and cached so
>> >> doesn't have to be re-calculated for the second query...
>> >>
>> >> But the stickiness is usually the way Solr is used, so this seems
>> >> like a red herring.
>> >>
>> >> FWIW,
>> >> Erick
>> >>
>> >> On Thu, Sep 27, 2012 at 7:06 AM, Lee Carroll
>> >>  wrote:
>> >> > Hi
>> >> >
>> >> > We have the following solr http server
>> >> >
>> >> > > >> > id="solrserver" >
>> >> > 
>> >> > 
>> >> > 
>> >> > 
>> >> > 
>> >> > 
>> >> > 
>> >> >
>> >> > The issue we face is that the F5 balancer is returning a cookie
>> >> > which the client is hanging on to, resulting in the same slave
>> >> > being hit for all requests.
>> >> >
>> >> > One obvious solution is to configure the load balancer to be
>> >> > non-sticky; however, politically, a "non-standard" load balancer is
>> >> > timescale suicide. (It is an outsourced corporate thing.)
>> >> >
>> >> > I'm not keen to use LBHttpSolrServer, as I don't want this to be a
>> >> > concern of the software, with a list of servers etc. (although as a
>> >> > stop gap I may well have to).
>> >> >
>> >> > My question is: can I configure the solr server to ignore client
>> >> > state? We are on Solr 3.4.
>> >> >
>> >> > Thanks in advance, lee c
>> >>
>>
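
A minimal sketch of the client-side fix Lee is asking about, assuming
SolrJ 3.x backed by commons-httpclient 3.x: build the HttpClient yourself,
tell it to ignore cookies, and hand it to CommonsHttpSolrServer. The URL
below is a placeholder.

import java.net.MalformedURLException;
import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.cookie.CookiePolicy;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class CookielessSolrClient {
    public static CommonsHttpSolrServer build(String url)
            throws MalformedURLException {
        // Discard any cookie the load balancer drops, so no request
        // sticks to a particular slave.
        HttpClient httpClient = new HttpClient();
        httpClient.getParams().setCookiePolicy(CookiePolicy.IGNORE_COOKIES);
        return new CommonsHttpSolrServer(url, httpClient);
    }
}

With this in place, each request carries no session cookie, so the F5 can
balance every call independently.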


Re: Change config to use port 8080 instead of port 8983

2012-09-27 Thread Sami Siren
I just tried this with Tomcat and the props work for me. Did you wipe
out your zoo_data before starting with the additional system
properties?

Here's how I ran it:

JAVA_OPTS="-DzkRun -DnumShards=1 -Djetty.port=8080
-Dbootstrap_conf=true -Dhost=127.0.0.1" bin/catalina.sh run

--
 Sami Siren



On Thu, Sep 27, 2012 at 9:47 PM, JesseBuesking  wrote:
> I've set the JAVA_OPTS you mentioned (-Djetty.port and -Dhost), but zookeeper
> still says that the node runs on port 8983 (clusterstate.json is the same).
>
> Would you happen to have any other suggestions that I could try?
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Change-config-to-use-port-8080-instead-of-port-8983-tp4010663p4010805.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: 4.0.snapshot to 4.0.beta index migration

2012-09-27 Thread vybe3142
Thanks, that's what we decided to do too.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/4-0-snapshot-to-4-0-beta-index-migration-tp4009247p4010828.html
Sent from the Solr - User mailing list archive at Nabble.com.


Can SOLR Index UTF-16 Text

2012-09-27 Thread vybe3142
Our SOLR setup (4.0.BETA on Tomcat 6) works as expected when indexing
UTF-8 files. Recently, however, we noticed that it has issues with
indexing certain text files, e.g. UTF-16 files. See the attachment for an
example (tarred+zipped):

tesla-utf16.txt

Looking at the "text" terms, I see 35 terms, i.e. (1, 2, 3, ..., 9, 0, a,
b, c, ..., z)! A UTF-8 version of this file indexes fine.

Here's what the index analyzer looks like:

[analyzer definition stripped by the mail archive]

Are UTF-16 text files supported? Any thoughts?

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-SOLR-Index-UTF-16-Text-tp4010834.html
Sent from the Solr - User mailing list archive at Nabble.com.
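
Since the poster notes that a UTF-8 version of the same file indexes fine,
one workaround is to re-encode UTF-16 files before sending them to Solr. A
minimal sketch in plain Java (the file names follow the thread's example;
Java's "UTF-16" charset honours the byte-order mark):

import java.io.*;

public class Utf16ToUtf8 {
    public static void main(String[] args) throws IOException {
        Reader in = new InputStreamReader(
                new FileInputStream("tesla-utf16.txt"), "UTF-16");
        Writer out = new OutputStreamWriter(
                new FileOutputStream("tesla-utf8.txt"), "UTF-8");
        char[] buf = new char[8192];
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n); // copy characters, not raw bytes
        }
        out.close();
        in.close();
    }
}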


Re: ExtractingRequestHandler causes Out of Memory Error

2012-09-27 Thread Lance Norskog
These are very large files and this is not enough memory. Do you upload these 
as files? 

If the CSV file is one document per line, you can split it up. Unix has a 
'split' command which does this very nicely. 
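
The same idea as the Unix split, as a hedged Java sketch (the file name
and chunk size are placeholders; a real CSV header line would need to be
repeated per chunk):

import java.io.*;

public class CsvSplitter {
    public static void main(String[] args) throws IOException {
        BufferedReader in = new BufferedReader(new FileReader("big.csv"));
        int linesPerChunk = 100000;
        int lineNo = 0, chunk = 0;
        PrintWriter out = null;
        for (String line; (line = in.readLine()) != null; lineNo++) {
            if (lineNo % linesPerChunk == 0) { // start a new chunk file
                if (out != null) out.close();
                out = new PrintWriter(new FileWriter("big.csv.part" + chunk++));
            }
            out.println(line);
        }
        if (out != null) out.close();
        in.close();
    }
}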

- Original Message -
| From: "Shigeki Kobayashi" 
| To: solr-user@lucene.apache.org
| Sent: Thursday, September 27, 2012 2:22:06 AM
| Subject: ExtractingRequestHandler causes Out of Memory Error
| 
| Hi guys,
| 
| 
| I use Manifold CF to crawl files in Windows file server and index
| them to
| Solr using Extracting Request Handler.
| Most of the documents are successfully indexed, but some fail and an
| Out of Memory Error occurs in Solr, so I need some advice.
| 
| Those failed files are not so big: a CSV file of 240 MB and a
| text file of 170 MB.
| 
| Here is environment and machine spec:
| Solr 3.6 (also Solr4.0Beta)
| Tomcat 6.0
| CentOS 5.6
| java version 1.6.0_23
| HDD 60GB
| MEM 2GB
| JVM Heap: -Xmx1024m -Xms1024m
| 
| I feel there is enough memory that Solr should be able to extract and
| index
| file content.
| 
| Here is a Solr log below:
| --
| [solr.servlet.SolrDispatchFilter]-[http-8080-8]-:java.lang.OutOfMemoryError:
| Java heap space
| at java.util.Arrays.copyOf(Arrays.java:2882)
| at
| java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
| at
| java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:515)
| at java.lang.StringBuilder.append(StringBuilder.java:189)
| at
| 
org.apache.solr.handler.extraction.SolrContentHandler.characters(SolrContentHandler.java:293)
| at
| 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
| at
| 
org.apache.tika.sax.SecureContentHandler.characters(SecureContentHandler.java:270)
| at
| 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
| at
| 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
| at
| 
org.apache.tika.sax.ContentHandlerDecorator.characters(ContentHandlerDecorator.java:146)
| at
| org.apache.tika.sax.SafeContentHandler.access$001(SafeContentHandler.java:46)
| at
| org.apache.tika.sax.SafeContentHandler$1.write(SafeContentHandler.java:82)
| at
| org.apache.tika.sax.SafeContentHandler.filter(SafeContentHandler.java:140)
| at
| org.apache.tika.sax.SafeContentHandler.characters(SafeContentHandler.java:287)
| at
| 
org.apache.tika.sax.XHTMLContentHandler.characters(XHTMLContentHandler.java:268)
| at
| org.apache.tika.parser.txt.TXTParser.parse(TXTParser.java:134)
| at
| org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
| at
| org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
| at
| org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
| at
| 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
| at
| 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
| at
| 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
| at
| 
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
| at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
| at
| 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
| at
| 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
| at
| 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
| at
| 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
| at
| 
filters.SetCharacterEncodingFilter.doFilter(SetCharacterEncodingFilter.java:122)
| at
| 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
| at
| 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
| at
| 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
| 
| -
| 
| Anyone has any ideas?
| 
| Regards,
| 
| Shigeki
| 


Re: ExtractingRequestHandler causes Out of Memory Error

2012-09-27 Thread Jan Høydahl
Please try to increase -Xmx and see how much RAM you need for it to succeed.

I believe it is simply a case where this particular file needs double the
memory (480 MB) to parse, and you have only allocated 1 GB (which is not
particularly much). Perhaps the code could be optimized to avoid the
Arrays.copyOf() call.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 27 Sep 2012, at 11:22, Shigeki Kobayashi wrote:

> Hi guys,
> 
> 
> I use Manifold CF to crawl files in Windows file server and index them to
> Solr using Extracting Request Handler.
> Most of the documents are successfully indexed, but some fail and an Out
> of Memory Error occurs in Solr, so I need some advice.
> 
> Those failed files are not so big: a CSV file of 240 MB and a
> text file of 170 MB.
> 
> Here is environment and machine spec:
> Solr 3.6 (also Solr4.0Beta)
> Tomcat 6.0
> CentOS 5.6
> java version 1.6.0_23
> HDD 60GB
> MEM 2GB
> JVM Heap: -Xmx1024m -Xms1024m
> 
> I feel there is enough memory that Solr should be able to extract and index
> file content.
> 
> Here is a Solr log below:
> --
> [OutOfMemoryError stack trace elided; quoted in full in Lance Norskog's
> reply above]
> 
> -
> 
> Anyone has any ideas?
> 
> Regards,
> 
> Shigeki



Filter query not null or in list

2012-09-27 Thread Kiran J
Hi everyone,

I have a group field which restricts permissions for each user. A user
can belong to multiple groups; a document can belong to only one group,
i.e. the field is not multivalued. There are some documents which are
unrestricted, hence their group id is null. How can I build the filter
for a given user so that it includes results from both Group=NULL and
Group=(X or Y or Z)? I tried something like this, but it doesn't work:

-Group:[* TO *] OR Group:(X OR Y OR Z)

Note that Group is a UUID field. Is it possible to assign a default
UUID value?

Any help is much appreciated.

Thanks
Kiran


Re: Filter query not null or in list

2012-09-27 Thread Jack Krupansky

Add a "*:*" before the negative query.

(*:* -Group:[* TO *]) OR Group:(X OR Y OR Z)
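
As a SolrJ illustration (a hedged sketch; the field name and group values
are taken from the question), the same expression can be applied as a
filter query:

import org.apache.solr.client.solrj.SolrQuery;

// The pure-negative clause is anchored with *:* so it can stand on its
// own inside the OR: a document matches if it has no Group at all or
// belongs to one of the user's groups.
SolrQuery q = new SolrQuery("*:*");
q.addFilterQuery("(*:* -Group:[* TO *]) OR Group:(X OR Y OR Z)");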

-- Jack Krupansky

-Original Message- 
From: Kiran J 
Sent: Thursday, September 27, 2012 8:07 PM 
To: solr-user@lucene.apache.org 
Subject: Filter query not null or in list 


Hi everyone,

I have a group field which restricts permissions for each user. A user
can belong to multiple groups; a document can belong to only one group,
i.e. the field is not multivalued. There are some documents which are
unrestricted, hence their group id is null. How can I build the filter
for a given user so that it includes results from both Group=NULL and
Group=(X or Y or Z)? I tried something like this, but it doesn't work:

-Group:[* TO *] OR Group:(X OR Y OR Z)

Note that Group is a UUID field. Is it possible to assign a default
UUID value?

Any help is much appreciated.

Thanks
Kiran


RE: File content indexing

2012-09-27 Thread Zhang, Lisheng
Hi Erik,

I really meant to send this message earlier: I read the code, tested it,
and your suggestion solved my problem. Really appreciate it!

Thanks very much for the help, Lisheng

-Original Message-
From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Tuesday, September 18, 2012 5:04 PM
To: solr-user@lucene.apache.org
Subject: Re: File content indexing


Solr Cell can already do this. See the stream.file parameter and content
stream info on the wiki.

Erik
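
A hedged SolrJ sketch of the approach Erik points to, assuming remote
streaming is enabled (enableRemoteStreaming="true" in solrconfig.xml); the
path, id, and handler URL are placeholders. The file path is resolved on
the Solr server's own disk, so the file body never travels over HTTP:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class LocalFileIndexer {
    public static void index(SolrServer server, String localPath, String id)
            throws Exception {
        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("stream.file", localPath); // path as seen by the server
        params.set("literal.id", id);
        params.set("commit", "true");
        QueryRequest req = new QueryRequest(params);
        req.setPath("/update/extract"); // Solr Cell handler
        server.request(req);
    }
}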

On Sep 18, 2012, at 19:56, "Zhang, Lisheng"  
wrote:

> Hi, 
> 
> Sorry I just sent out an unfinished message!
> 
> Reading about Solr Cell: we index a file by first uploading it through
> HTTP to Solr, and in my experience it is rather expensive to pass a big
> file through HTTP.
> 
> If the file is local, maybe the better way is to pass the file path to
> Solr so that Solr can use the java.io API to get the file content; maybe
> this could be much faster?
> 
> I am thinking to change Solr a little to do this. Do you think this is a
> sensible thing to do? (I know how to do it, but am not sure it can
> improve performance significantly.)
> 
> Thanks very much for the help, Lisheng


Re: ExtractingRequestHandler causes Out of Memory Error

2012-09-27 Thread Shigeki Kobayashi
Hi Jan.

Thank you very much for your advice.

So I understand Solr needs more memory to parse the files.
To parse a file of size x, it needs double the memory (2x). Then how much
heap should be allocated? 8x? 16x?

Regards,


Shigeki

2012/9/28 Jan Høydahl 

> Please try to increase -Xmx and see how much RAM you need for it to
> succeed.
>
> I believe it is simply a case where this particular file needs double the
> memory (480 MB) to parse, and you have only allocated 1 GB (which is not
> particularly much). Perhaps the code could be optimized to avoid the
> Arrays.copyOf() call.
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Solr Training - www.solrtraining.com
>
> On 27 Sep 2012, at 11:22, Shigeki Kobayashi <
> shigeki.kobayas...@g.softbank.co.jp> wrote:
>
> > Hi guys,
> >
> >
> > I use Manifold CF to crawl files in Windows file server and index them to
> > Solr using Extracting Request Handler.
> > Most of the documents are successfully indexed, but some fail and an
> > Out of Memory Error occurs in Solr, so I need some advice.
> >
> > Those failed files are not so big: a CSV file of 240 MB and a
> > text file of 170 MB.
> >
> > Here is environment and machine spec:
> > Solr 3.6 (also Solr4.0Beta)
> > Tomcat 6.0
> > CentOS 5.6
> > java version 1.6.0_23
> > HDD 60GB
> > MEM 2GB
> > JVM Heap: -Xmx1024m -Xms1024m
> >
> > I feel there is enough memory that Solr should be able to extract and
> index
> > file content.
> >
> > Here is a Solr log below:
> > --
> >
> > [OutOfMemoryError stack trace elided; quoted in full earlier in this
> > thread]
> >
> > -
> >
> > Anyone has any ideas?
> >
> > Regards,
> >
> > Shigeki
>
>


Re: Getting the distribution information of scores from query

2012-09-27 Thread Amit Nithian
Thanks! That did the trick! Although it required some more work at the
component level to generate the same query key as the index searcher;
otherwise, when you go to fetch scores for a cached query result, you get
a lot of NPEs, since the stats are computed at the collector level, which
in my case isn't set because a cache hit bypasses the Lucene level.
I'll write up what I did and will probably try to open-source the work
for others to see. The stuff with PostFiltering is nice but needs some
examples and documentation; hopefully mine will help the cause.

Thanks again
Amit
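
For illustration, a minimal sketch of the collector side of this approach,
assuming Solr 4.x's DelegatingCollector (the class name and the running
mean/variance accumulation are mine, not Amit's actual code):

import java.io.IOException;
import org.apache.lucene.search.Scorer;
import org.apache.solr.search.DelegatingCollector;

public class ScoreStatsCollector extends DelegatingCollector {
    private Scorer scorer;
    private long n = 0;
    private double mean = 0.0, m2 = 0.0;

    @Override
    public void setScorer(Scorer scorer) throws IOException {
        this.scorer = scorer;
        super.setScorer(scorer); // keep the delegate chain informed
    }

    @Override
    public void collect(int doc) throws IOException {
        // Welford's online algorithm: running mean and variance of scores.
        double score = scorer.score();
        n++;
        double delta = score - mean;
        mean += delta / n;
        m2 += delta * (score - mean);
        super.collect(doc); // pass the doc down to the delegate
    }

    public double mean() { return mean; }

    public double stdDev() { return n > 1 ? Math.sqrt(m2 / (n - 1)) : 0.0; }
}

The post filter installs this collector, and the component reads mean()
and stdDev() after the search to build the z-scores used for blending.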

On Wed, Sep 26, 2012 at 5:13 AM, Mikhail Khludnev
 wrote:
> I suggest creating a component and putting it after QueryComponent. In
> prepare() it should add its own PostFilter into the list of request
> filters; your post filter will be able to inject its own
> DelegatingCollector, and then you can just add the collected histogram
> into the result named list:
>
> On Tue, Sep 25, 2012 at 10:03 PM, Amit Nithian  wrote:
>
>> We have a federated search product that issues multiple parallel
>> queries to solr cores and fetches the results and blends them. The
>> approach we were investigating was taking the scores, normalizing them
>> based on some distribution (normal distribution seems reasonable) and
>> use that "z score" as the way to blend the results (else you'll be
>> blending scores on different scales). To accomplish this, I was
>> looking to get the distribution of the scores for the query as an
>> analog to the "stats component", but it seems the only way to
>> accomplish this would be to create a custom collector that would
>> accumulate and store this information (mean, std-dev etc) since the
>> stats component only operates on indexed fields.
>>
>> Is there an easy way to tell Solr to use a custom collector without
>> having to modify the SolrIndexSearcher class? Or is there an
>> alternative way to get this information?
>>
>> Thanks
>> Amit
>>
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Tech Lead
> Grid Dynamics
>


Regarding delta-import and full-import

2012-09-27 Thread darshan
Hi All,

Can anyone refer me to a few blogs that explain both imports in a little
more detail, with examples?

 

Thanks,

Darshan



Merge Policy Recommendation for 3.6.1

2012-09-27 Thread Sujatha Arun
Hello,

In the case where there are 200+ cores on a single node, is it
recommended to go with the Tiered MP with a segment size of 4? Our index
sizes vary from a few MB to 4 GB.

Will there be any issue with "Too many open files" given the number of
indexes, with respect to the MP? At the moment we are thinking of going
with the Tiered MP.

The OS file limit has been set to the maximum.

Regards
Sujatha