Configuring Tomcat 6 with Solr431 with multiple cores

2013-07-18 Thread PeterKerk

Thanks to Sandeep in this post:
http://lucene.472066.n3.nabble.com/HTTP-Status-503-Server-is-shutting-down-td4065958.html#a4078567
I was able to set up Tomcat 6 with Solr 4.3.1.

However, I need a multicore implementation and am now stuck on how to do so.
Here is what I did so far based on Sandeep's recommended steps, and what I
need:

1. Extract the solr-4.3.1 package. In my case I did this in "E:\solr-4.3.1\example\solr".
Peter's path: C:\Dropbox\Databases\solr-4.3.1\example\solr
2. Now copy the solr dir from the extracted package (E:\solr-4.3.1\example\solr)
into the TOMCAT_HOME dir. In my case the TOMCAT_HOME dir points to
E:\Apache\Tomcat 6.0.
3. I can now refer to SOLR_HOME as "E:\Apache\Tomcat 6.0\solr" (please
remember this).
Peter's path: C:\Program Files\Apache Software Foundation\Tomcat
6.0\solr
4. Copy the solr.war file from the extracted package to the SOLR_HOME dir,
i.e. E:\Apache\Tomcat 6.0\solr. This is required to create the context, as I
do not want to pass this as JAVA_OPTS.
5. Create a solr1.xml file in TOMCAT_HOME\conf\Catalina\localhost (I gave the
file the name solr1.xml):
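
A minimal sketch of that context file (the XML was stripped by the list
archive; the docBase and solr/home values here are assumptions based on the
paths in steps 2-4):

<?xml version="1.0" encoding="utf-8"?>
<Context docBase="E:/Apache/Tomcat 6.0/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
               value="E:/Apache/Tomcat 6.0/solr" override="true"/>
</Context>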

6. Also copy the solr.war file into TOMCAT_HOME\webapps for deployment purposes.
7. If you start Tomcat now you will get errors as mentioned by Shawn, so you
need to copy all 5 jar files from the extracted Solr package
(E:\solr-4.3.1\example\lib\ext) to the TOMCAT_HOME\lib dir (jul-to-slf4j-1.6.6,
jcl-over-slf4j-1.6.6, slf4j-log4j12-1.6.6, slf4j-api-1.6.6, log4j-1.2.16).
8. Also copy the log4j.properties file from the
E:\solr-4.3.1\example\resources dir to the TOMCAT_HOME\lib dir.
9. Now if you start Tomcat you won't have any problems.

So far for Sandeep's steps.

I can now reach http://localhost:8080/solr-4.3.1/#/

Now that I've completed the basic setup of Tomcat 6 and Solr 4.3.1, I want to
migrate my Solr 3.5.0 cores (now running on Cygwin) to that environment:

C:\Dropbox\Databases\apache-solr-3.5.0\example\example-DIH\solr\tt 
C:\Dropbox\Databases\apache-solr-3.5.0\example\example-DIH\solr\shop 
C:\Dropbox\Databases\apache-solr-3.5.0\example\example-DIH\solr\homes 

Where do I need to copy the above cores for this all to work? To C:\Program
Files\Apache Software Foundation\Tomcat 6.0\solr?
And how can I then reach the data-import handler? I now do this like so:
http://localhost:8983/solr/tt/dataimport?command=full-import
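
For reference: in Solr 4.x with the legacy layout, SOLR_HOME contains a
solr.xml listing each core, and each core gets its own instanceDir with a conf
directory. A sketch for the three cores above (instanceDir names assumed):

<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="tt">
    <core name="tt" instanceDir="tt" />
    <core name="shop" instanceDir="shop" />
    <core name="homes" instanceDir="homes" />
  </cores>
</solr>

With that in place, the data-import call would become
http://localhost:8080/solr/tt/dataimport?command=full-import (port depending
on the Tomcat connector).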


Thanks!






Inconsistent solrcloud search

2013-07-18 Thread Vladimir Poroshin

Hi,

I have a strange behavior while searching my solrcloud cluster:
for a query like this
http://localhost/solr/my_collection/select?q="my+query"


Solr sometimes responds with one document and sometimes with no documents.
This found document is located at shard8, so if I query with 
&shards=shard8 then I always get this document,
but if I query with &shards=shard8,shard1 then about 50% of my requests 
return no documents at all.

I tried it with solr 4.3.0 and also with 4.3.1.
My cluster has 8 shards with 8 replicas with about 100M docs and default 
(compositeId) document routing.




boost docs if token matches happen in the first 5 words

2013-07-18 Thread Anatoli Matuskova
I've a set of documents with a whitespace-tokenized field. I want to give more
boost when the match of the query happens in the first 3 token positions of
the field. Is there any way to do that? (I don't want to use payloads, as they
mean one more seek to disk and thus lower performance.)
 





RE: boost docs if token matches happen in the first 5 words

2013-07-18 Thread Markus Jelsma
You must implement a SpanFirst query yourself. These are not implemented in any
Solr query parser. You can easily extend the (e)dismax parsers and add support
for it.
 
-Original message-
> From:Anatoli Matuskova 
> Sent: Thursday 18th July 2013 11:54
> To: solr-user@lucene.apache.org
> Subject: boost docs if token matches happen in the first 5 words
> 
> I've a set of documents with a WhiteSpaceTokenize field. I want to give more
> boost when the match of the query happens in the first 3 token positions of
> the field. Is there any way to do that (don't want to use payloads as they
> mean on more seek to disk so lower performance)
>  
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/boost-docs-if-token-matches-happen-in-the-first-5-words-tp4078786.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Vineet Mishra
Hi all

I am using a custom RequestHandlerBase where I am querying multiple
different Solr instances and aggregating their output as an XML Document
using DOM. Now, in the RequestHandler's function handleRequestBody(SolrQueryRequest
req, SolrQueryResponse resp), I want to output this XML Document to the user
as a response. But if I write it as a Document or Node with

For a Document:
response.add("grouped", domResult);
or for a Node:
response.add("grouped", domNode);

it writes to the user

For a Document:
com.sun.org.apache.xerces.internal.dom.DocumentImpl:[#document: null]
or for a Node:
com.sun.org.apache.xerces.internal.dom.ElementImpl:[arr: null]

even though the Document is present: when I convert the Document to a
String it comes out perfectly. But I don't want it as a String, I want
it in XML format.

Please help, this is very urgent; has anybody worked on this?

Regards
Vineet


RE: boost docs if token matches happen in the first 5 words

2013-07-18 Thread Anatoli Matuskova
Thanks for the quick answer Markus.
Could you give me a guideline or point me to where to check in the Solr
source code to see how to get it done?






Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Shalin Shekhar Mangar
This isn't a Solr issue. Maybe ask on the xerces list?


On Thu, Jul 18, 2013 at 3:31 PM, Vineet Mishra wrote:

> Hi all
>
> I am using a Custom RequestHandlerBase where I am querying from multiple
> different Solr instance and aggregating their output as a XML Document
> using DOM,
> now in the RequestHandler's function handleRequestBody(SolrQueryRequest
> req, SolrQueryResponse resp) I want to output this XML Document to the user
> as a response, but if I write it as a Document or Node by
>
> For Document
> response.add("grouped", domResult);
> or
>
> response.add("grouped", domNode);
>
> its writing to the user
>
> For Document
> com.sun.org.apache.xerces.internal.dom.DocumentImpl:[#document: null]
> or
> For Node
> com.sun.org.apache.xerces.internal.dom.ElementImpl:[arr: null]
>
>
> Even when the Document is present, because when I convert the Document to
> String its coming perfectly, but I don't want it as a String rather I want
> it in a XML format.
>
> Please this is very urgent, has anybody worked on this!
>
> Regards
> Vineet
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Vineet Mishra
Thanks for your response Shalin,
so does that mean that we can't return a XML object in SolrQueryResponse
through Custom RequestHandler?


On Thu, Jul 18, 2013 at 4:04 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> This isn't a Solr issue. Maybe ask on the xerces list?
>
>
> On Thu, Jul 18, 2013 at 3:31 PM, Vineet Mishra  >wrote:
>
> > Hi all
> >
> > I am using a Custom RequestHandlerBase where I am querying from multiple
> > different Solr instance and aggregating their output as a XML Document
> > using DOM,
> > now in the RequestHandler's function handleRequestBody(SolrQueryRequest
> > req, SolrQueryResponse resp) I want to output this XML Document to the
> user
> > as a response, but if I write it as a Document or Node by
> >
> > For Document
> > response.add("grouped", domResult);
> > or
> >
> > response.add("grouped", domNode);
> >
> > its writing to the user
> >
> > For Document
> > com.sun.org.apache.xerces.internal.dom.DocumentImpl:[#document: null]
> > or
> > For Node
> > com.sun.org.apache.xerces.internal.dom.ElementImpl:[arr: null]
> >
> >
> > Even when the Document is present, because when I convert the Document to
> > String its coming perfectly, but I don't want it as a String rather I
> want
> > it in a XML format.
> >
> > Please this is very urgent, has anybody worked on this!
> >
> > Regards
> > Vineet
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


RE: boost docs if token matches happen in the first 5 words

2013-07-18 Thread Markus Jelsma

You'll need to import the org.apache.lucene.search.spans package in Solr's
ExtendedDismaxQParserPlugin and add SpanFirstQuerys to the main query.
Something like:
query.add(new SpanFirstQuery(new SpanTermQuery(new Term(field, term)), distance),
BooleanClause.Occur.SHOULD);
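
For illustration, a self-contained sketch of that idea against the Lucene 4.x
API (field name, term and boost value are placeholders):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.spans.SpanFirstQuery;
import org.apache.lucene.search.spans.SpanTermQuery;

// Require the term as usual, then add an optional clause that boosts
// documents where it occurs within the first 3 positions of the field.
BooleanQuery query = new BooleanQuery();
query.add(new TermQuery(new Term("product", "spider")), BooleanClause.Occur.MUST);

SpanFirstQuery first = new SpanFirstQuery(
    new SpanTermQuery(new Term("product", "spider")), 3);
first.setBoost(5.0f);  // arbitrary boost factor, tune to taste
query.add(first, BooleanClause.Occur.SHOULD);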

 
-Original message-
> From:Anatoli Matuskova 
> Sent: Thursday 18th July 2013 12:33
> To: solr-user@lucene.apache.org
> Subject: RE: boost docs if token matches happen in the first 5 words
> 
> Thanks for the quick answer Markus.
> Could you give me a a guideline or point me where to check in the solr
> source code to see how to get it done?
> 
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/boost-docs-if-token-matches-happen-in-the-first-5-words-tp4078786p4078792.html
> Sent from the Solr - User mailing list archive at Nabble.com.
> 


Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Shalin Shekhar Mangar
Solr's response writers support only a few known types. Look at the
writeVal method in TextResponseWriter:

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/TextResponseWriter.java


On Thu, Jul 18, 2013 at 4:08 PM, Vineet Mishra wrote:

> Thanks for your response Shalin,
> so does that mean that we can't return a XML object in SolrQueryResponse
> through Custom RequestHandler?
>
>
> On Thu, Jul 18, 2013 at 4:04 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > This isn't a Solr issue. Maybe ask on the xerces list?
> >
> >
> > On Thu, Jul 18, 2013 at 3:31 PM, Vineet Mishra  > >wrote:
> >
> > > Hi all
> > >
> > > I am using a Custom RequestHandlerBase where I am querying from
> multiple
> > > different Solr instance and aggregating their output as a XML Document
> > > using DOM,
> > > now in the RequestHandler's function handleRequestBody(SolrQueryRequest
> > > req, SolrQueryResponse resp) I want to output this XML Document to the
> > user
> > > as a response, but if I write it as a Document or Node by
> > >
> > > For Document
> > > response.add("grouped", domResult);
> > > or
> > >
> > > response.add("grouped", domNode);
> > >
> > > its writing to the user
> > >
> > > For Document
> > > com.sun.org.apache.xerces.internal.dom.DocumentImpl:[#document: null]
> > > or
> > > For Node
> > > com.sun.org.apache.xerces.internal.dom.ElementImpl:[arr: null]
> > >
> > >
> > > Even when the Document is present, because when I convert the Document
> to
> > > String its coming perfectly, but I don't want it as a String rather I
> > want
> > > it in a XML format.
> > >
> > > Please this is very urgent, has anybody worked on this!
> > >
> > > Regards
> > > Vineet
> > >
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: and performance

2013-07-18 Thread Aditya
Hi

It totally depends upon your affordability. If you can afford it, go for
more RAM, an SSD drive and a 64-bit OS.

Benchmark your application, with certain set of docs, how much RAM it
takes, Indexing time, Search time etc. Increase the document count and
perform benchmarking tasks again. This will provide more information.
Everything is directly proportional to number of docs.

In my case, I have a basic hosting plan and I am happy with the performance.
My point is you don't always need fancy hardware. Start with a basic plan and
change it based on need.

Regards
Aditya
www.findbestopensource.com





On Wed, Jul 17, 2013 at 4:55 PM, Ayman Plaha  wrote:

> Thanks Aditya, can I also please get some advice on hosting.
>
>    - What *hosting specs* should I get? How much RAM? Considering my
>    client application is very simple: it just registers users to a database,
>    queries SOLR and displays SOLR results.
>    - A simple batch program adds 1000 OR 2000 documents to SOLR every second.
>
> I'm hoping to deploy the code next week, if you guys can give me any other
> advice I'd really appreciate that.
>
>
> On Wed, Jul 17, 2013 at 7:07 PM, Aditya  >wrote:
>
> > Hi
> >
> > It will not affect the performance. We are doing this  regularly. If you
> do
> > optimize and search then there may be some impact.
> >
> > Regards
> > Aditya
> > www.findbestopensource.com
> >
> >
> >
> > On Wed, Jul 17, 2013 at 12:52 PM, Ayman Plaha 
> > wrote:
> >
> > > Hey Guys,
> > >
> > > I've finally finished my Spring Java application that uses SOLR for
> > > searches and just had performance related question about SOLR. I'm
> > indexing
> > > exactly 1000 *OR* 2000 records every second. Every record having 13
> > fields
> > > including 'id'. Majority of the fields are solr.StrField (no filters)
> > with
> > > characters ranging from 5 - 50 in length and one field which is text_t
> > > (solr.TextField) which can be of length 100 characters to 2000
> characters
> > > and has the following tokenizer and filters
> > >
> > >- PatternTokenizerFactory
> > >- LowerCaseFilterFactory
> > >- SynonymFilterFactory
> > >- SnowballPorterFilterFactory.
> > >
> > >
> > > I'm not using shards. I was hoping when searches get slow I will
> consider
> > > this or should I consider this now ?
> > >
> > > *Questions:*
> > >
> > >    - I'm using SOLR autoCommit (every 15 minutes) with openSearcher set as
> > >    true. I'm not using autoSoftCommit because instant availability of the
> > >    documents for search is not necessary and I don't want to chew up too
> > >    much memory because I'm considering Cloud hosting.
> > >    <autoCommit>
> > >      <maxTime>900000</maxTime>
> > >      <openSearcher>true</openSearcher>
> > >    </autoCommit>
> > >    Will this affect the query performance of the client website if the
> > >    index grew to 10 million records? I mean, while the commit is happening,
> > >    does that *affect the performance of queries*, and how will this affect
> > >    the queries if the index grew to 10 million records?
> > >    - What *hosting specs* should I get? How much RAM? Considering my
> > >    client application is very simple: it just registers users to a
> > >    database, queries SOLR and displays SOLR results.
> > >    - A simple batch program adds 1000 OR 2000 documents to SOLR every
> > >    second.
> > >
> > >
> > > I'm hoping to deploy the code next week, if you guys can give me any
> > other
> > > advice I'd really appreciate that.
> > >
> > > Thanks
> > > Ayman
> > >
> >
>


Re: and performance

2013-07-18 Thread Ayman Plaha
Thanks Shawn and Aditya. Really appreciate your help. Based on your advice
and reading the SolrPerformance article Shawn linked me to, I ended up
getting an Intel Dual Core (2 core) i3 3220 3.3GHz with 36GB RAM and 2 x
125GB SSD drives for $227 per month. It's still expensive for me but I got
it anyway, because a very basic dedicated host in Australia is $150 per
month, and VPSs in Australia don't offer more than 2GB. I hope I made the
right decision. What do you guys think?

Thanks
Ayman



On Thu, Jul 18, 2013 at 9:07 PM, Aditya wrote:

> Hi
>
> It totally depends upon your affordability. If you could afford go for
> bigger RAM, SSD drive and 64 Bit OS.
>
> Benchmark your application, with certain set of docs, how much RAM it
> takes, Indexing time, Search time etc. Increase the document count and
> perform benchmarking tasks again. This will provide more information.
> Everything is directly proportional to number of docs.
>
> In my case, I have basic hosting plan and i am happy with the performance.
> My point is you don't always need fancy hardware. Start with basic and
> based on the need you could change the plan.
>
> Regards
> Aditya
> www.findbestopensource.com
>
>
>
>
>
> On Wed, Jul 17, 2013 at 4:55 PM, Ayman Plaha  wrote:
>
> > Thanks Aditya, can I also please get some advice on hosting.
> >
> >    - What *hosting specs* should I get? How much RAM? Considering my
> >    client application is very simple: it just registers users to a
> >    database, queries SOLR and displays SOLR results.
> >    - A simple batch program adds 1000 OR 2000 documents to SOLR every
> >    second.
> >
> > I'm hoping to deploy the code next week, if you guys can give me any
> other
> > advice I'd really appreciate that.
> >
> >
> > On Wed, Jul 17, 2013 at 7:07 PM, Aditya  > >wrote:
> >
> > > Hi
> > >
> > > It will not affect the performance. We are doing this  regularly. If
> you
> > do
> > > optimize and search then there may be some impact.
> > >
> > > Regards
> > > Aditya
> > > www.findbestopensource.com
> > >
> > >
> > >
> > > On Wed, Jul 17, 2013 at 12:52 PM, Ayman Plaha 
> > > wrote:
> > >
> > > > Hey Guys,
> > > >
> > > > I've finally finished my Spring Java application that uses SOLR for
> > > > searches and just had performance related question about SOLR. I'm
> > > indexing
> > > > exactly 1000 *OR* 2000 records every second. Every record having 13
> > > fields
> > > > including 'id'. Majority of the fields are solr.StrField (no filters)
> > > with
> > > > characters ranging from 5 - 50 in length and one field which is
> text_t
> > > > (solr.TextField) which can be of length 100 characters to 2000
> > characters
> > > > and has the following tokenizer and filters
> > > >
> > > >- PatternTokenizerFactory
> > > >- LowerCaseFilterFactory
> > > >- SynonymFilterFactory
> > > >- SnowballPorterFilterFactory.
> > > >
> > > >
> > > > I'm not using shards. I was hoping when searches get slow I will
> > consider
> > > > this or should I consider this now ?
> > > >
> > > > *Questions:*
> > > >
> > > >    - I'm using SOLR autoCommit (every 15 minutes) with openSearcher set
> > > >    as true. I'm not using autoSoftCommit because instant availability of
> > > >    the documents for search is not necessary and I don't want to chew up
> > > >    too much memory because I'm considering Cloud hosting.
> > > >    <autoCommit>
> > > >      <maxTime>900000</maxTime>
> > > >      <openSearcher>true</openSearcher>
> > > >    </autoCommit>
> > > >    Will this affect the query performance of the client website if the
> > > >    index grew to 10 million records? I mean, while the commit is
> > > >    happening, does that *affect the performance of queries*, and how
> > > >    will this affect the queries if the index grew to 10 million records?
> > > >    - What *hosting specs* should I get? How much RAM? Considering my
> > > >    client application is very simple: it just registers users to a
> > > >    database, queries SOLR and displays SOLR results.
> > > >    - A simple batch program adds 1000 OR 2000 documents to SOLR every
> > > >    second.
> > > >
> > > >
> > > > I'm hoping to deploy the code next week, if you guys can give me any
> > > other
> > > > advice I'd really appreciate that.
> > > >
> > > > Thanks
> > > > Ayman
> > > >
> > >
> >
>


Re: Doc's FunctionQuery result field in my custom SearchComponent class ?

2013-07-18 Thread Jack Krupansky
As detailed in previous email, "termfreq" is not a "field" - it is a 
"transformer" or function. Technically, it is actually a "ValueSource".


If you look at the TextResponseWriter.writeVal method you can see how it
kicks off the execution of transformers for writing documents.
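
To make that concrete, a hedged sketch of the aggregation Tony is after,
computed inside a SearchComponent from the postings instead of from a "freq"
pseudo-field (Lucene/Solr 4.x API; field and term are hard-coded for
illustration):

import java.util.Arrays;
import org.apache.lucene.index.DocsEnum;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.Term;
import org.apache.solr.search.DocIterator;
import org.apache.solr.search.DocList;

// Collect the result doc ids and sort them: DocsEnum.advance() only moves forward.
DocList docs = rb.getResults().docList;
int[] ids = new int[docs.size()];
DocIterator it = docs.iterator();
for (int i = 0; i < ids.length; i++) ids[i] = it.nextDoc();
Arrays.sort(ids);

Term term = new Term("product", "spider");
DocsEnum postings = MultiFields.getTermDocsEnum(
    rb.req.getSearcher().getIndexReader(), null, term.field(), term.bytes());

int sumFreq = 0;
if (postings != null) {
  for (int id : ids) {
    if (postings.advance(id) == id) {
      sumFreq += postings.freq();  // per-document term frequency
    }
  }
}
rb.rsp.add("sumTermFreq", sumFreq);  // shows up as an extra entry in the response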


-- Jack Krupansky

-Original Message- 
From: Tony Mullins

Sent: Thursday, July 18, 2013 2:49 AM
To: solr-user@lucene.apache.org
Subject: Re: Doc's FunctionQuery result field in my custom SearchComponent 
class ?


Erick,
In freq:termfreq(product,'spider'), freq is an alias for the 'termfreq' function
query, so I should have that field with the name 'freq' in the document response.
This is the code I am using to get the document object, and there is no
termfreq field in its fields collection:

DocList docs = rb.getResults().docList;
DocIterator iterator = docs.iterator();
int sumFreq = 0;
String id = null;

for (int i = 0; i < docs.size(); i++) {
    try {
        int docId = iterator.nextDoc();

        // Document doc = searcher.doc(docId, fieldSet);
        Document doc = searcher.doc(docId);

Thanks,
Tony


On Wed, Jul 17, 2013 at 5:30 PM, Erick Erickson 
wrote:



Where are you getting the syntax
freq:termfreq(product,'spider')
? Try just

termfreq(product,'spider')
you'll get an element in the doc labeled 'termfreq', at least
I do.

Best
Erick

On Tue, Jul 16, 2013 at 1:03 PM, Tony Mullins 
wrote:
> OK, So thats why I cannot see the FunctionQuery fields in my
> SearchComponent class.
> So then question would be how can I apply my custom processing/logic to
> these FunctionQuery ? Whats the ExtensionPoint in Solr for such
scenarios ?
>
> Basically I want to call termfreq() for each document and then apply the
> sum to all doc's termfreq() results and show in one aggregated TermFreq
> field in my query response.
>
> Thanks.
> Tony
>
>
>
> On Tue, Jul 16, 2013 at 6:01 PM, Jack Krupansky wrote:
>
>> Basically, the evaluation of function queries in the "fl" parameter
occurs
>> when the response writer is composing the document results. That's 
>> AFTER

>> all of the search components are done.
>>
>> SolrReturnFields.getTransformer() gets the DocTransformer, which is
>> really a DocTransformers, and then a call to DocTransformers.transform() in
>> each response writer will evaluate the embedded function queries and insert
>> their values in the results as they are being written.
>>
>> -- Jack Krupansky
>>
>> -Original Message- From: Tony Mullins
>> Sent: Tuesday, July 16, 2013 1:37 AM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Doc's FunctionQuery result field in my custom
SearchComponent
>> class ?
>>
>>
>> No sorry, I am still not getting the termfreq() field in my 'doc'
object.
>> I do get the _version_ field in my 'doc' object which I think is
>> realValue=StoredField.
>>
>> At which point termfreq() or any other FunctionQuery field becomes the
part
>> of doc object in Solr ? And at that point can I perform some custom
logic
>> and append the response ?
>>
>> Thanks.
>> Tony
>>
>>
>>
>>
>>
>> On Tue, Jul 16, 2013 at 1:34 AM, Patanachai Tangchaisin <
>> patanachai.tangchai...@wizecommerce.com> wrote:
>>
>>>  Hi,
>>>
>>> I think the process of retrieving a stored field (through fl) happens
>>> after SearchComponent.
>>>
>>> One solution: If you wrap the q param with a function, your score will be
>>> a result of the function.
>>> For example,
>>>
>>> http://localhost:8080/solr/collection2/demoendpoint?q=termfreq%28product,%27spider%27%29&wt=xml&indent=true&fl=*,score
>>>
>>>
>>>
>>> Now your score is going to be a result of termfreq(product,'spider')
>>>
>>>
>>> --
>>> Patanachai Tangchaisin
>>>
>>>
>>>
>>> On 07/15/2013 12:01 PM, Tony Mullins wrote:
>>>
>>>> any help plz !!!
>>>>
>>>> On Mon, Jul 15, 2013 at 4:13 PM, Tony Mullins <tonymullins...@gmail.com> wrote:
>>>>
>>>>> Please any help on how to get the value of 'freq' field in my custom
>>>>> SearchComponent?
>>>>>
>>>>> http://localhost:8080/solr/collection2/demoendpoint?q=spider&wt=xml&indent=true&fl=*,freq:termfreq%28product,%27spider%27%29
>>>>>
>>>>> (truncated XML response snippet: a doc with id 11, cat "Video Games",
>>>>> format "xbox 360", product "The Amazing ...")

Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Vineet Mishra
But it seems there is even something called an XMLResponseWriter:

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/XMLResponseWriter.java

Won't it be appropriate in my case?
Although I have not implemented it yet, how come there is no
way to make a SolrQueryResponse in XML format?


On Thu, Jul 18, 2013 at 4:36 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Solr's response writers support only a few known types. Look at the
> writeVal method in TextResponseWriter:
>
>
> https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/TextResponseWriter.java
>
>
> On Thu, Jul 18, 2013 at 4:08 PM, Vineet Mishra  >wrote:
>
> > Thanks for your response Shalin,
> > so does that mean that we can't return a XML object in SolrQueryResponse
> > through Custom RequestHandler?
> >
> >
> > On Thu, Jul 18, 2013 at 4:04 PM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> > > This isn't a Solr issue. Maybe ask on the xerces list?
> > >
> > >
> > > On Thu, Jul 18, 2013 at 3:31 PM, Vineet Mishra  > > >wrote:
> > >
> > > > Hi all
> > > >
> > > > I am using a Custom RequestHandlerBase where I am querying from
> > multiple
> > > > different Solr instance and aggregating their output as a XML
> Document
> > > > using DOM,
> > > > now in the RequestHandler's function
> handleRequestBody(SolrQueryRequest
> > > > req, SolrQueryResponse resp) I want to output this XML Document to
> the
> > > user
> > > > as a response, but if I write it as a Document or Node by
> > > >
> > > > For Document
> > > > response.add("grouped", domResult);
> > > > or
> > > >
> > > > response.add("grouped", domNode);
> > > >
> > > > its writing to the user
> > > >
> > > > For Document
> > > > com.sun.org.apache.xerces.internal.dom.DocumentImpl:[#document: null]
> > > > or
> > > > For Node
> > > > com.sun.org.apache.xerces.internal.dom.ElementImpl:[arr: null]
> > > >
> > > >
> > > > Even when the Document is present, because when I convert the
> Document
> > to
> > > > String its coming perfectly, but I don't want it as a String rather I
> > > want
> > > > it in a XML format.
> > > >
> > > > Please this is very urgent, has anybody worked on this!
> > > >
> > > > Regards
> > > > Vineet
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Shalin Shekhar Mangar.
> > >
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Jack Krupansky

It would probably be better to integrate the responses (document lists).

Solr response writers do a lot of special processing of the response data, 
so you can't just throw random objects into the response.


You may need to explain your use case a little more clearly.

-- Jack Krupansky

-Original Message- 
From: Vineet Mishra

Sent: Thursday, July 18, 2013 8:41 AM
To: solr-user@lucene.apache.org
Subject: Re: Custom RequestHandlerBase XML Response Issue

But it seems it even have something called  XML ResponseWriter

https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/XMLResponseWriter.java

Wont it be appropriate in my case?
Although I have not implemented it yet but how come there couldn't be any
way to make a SolrQueryResponse in XML format!


On Thu, Jul 18, 2013 at 4:36 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:


Solr's response writers support only a few known types. Look at the
writeVal method in TextResponseWriter:


https://svn.apache.org/repos/asf/lucene/dev/trunk/solr/core/src/java/org/apache/solr/response/TextResponseWriter.java


On Thu, Jul 18, 2013 at 4:08 PM, Vineet Mishra wrote:

> Thanks for your response Shalin,
> so does that mean that we can't return a XML object in SolrQueryResponse
> through Custom RequestHandler?
>
>
> On Thu, Jul 18, 2013 at 4:04 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > This isn't a Solr issue. Maybe ask on the xerces list?
> >
> >
> > On Thu, Jul 18, 2013 at 3:31 PM, Vineet Mishra  > >wrote:
> >
> > > Hi all
> > >
> > > I am using a Custom RequestHandlerBase where I am querying from
> multiple
> > > different Solr instance and aggregating their output as a XML
Document
> > > using DOM,
> > > now in the RequestHandler's function
handleRequestBody(SolrQueryRequest
> > > req, SolrQueryResponse resp) I want to output this XML Document to
the
> > user
> > > as a response, but if I write it as a Document or Node by
> > >
> > > For Document
> > > response.add("grouped", domResult);
> > > or
> > >
> > > response.add("grouped", domNode);
> > >
> > > its writing to the user
> > >
> > > For Document
> > > com.sun.org.apache.xerces.internal.dom.DocumentImpl:[#document: 
> > > null]

> > > or
> > > For Node
> > > com.sun.org.apache.xerces.internal.dom.ElementImpl:[arr: null]
> > >
> > >
> > > Even when the Document is present, because when I convert the
Document
> to
> > > String its coming perfectly, but I don't want it as a String rather 
> > > I

> > want
> > > it in a XML format.
> > >
> > > Please this is very urgent, has anybody worked on this!
> > >
> > > Regards
> > > Vineet
> > >
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>



--
Regards,
Shalin Shekhar Mangar.





Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Vineet Mishra
So does that mean there is no way that we can write an XML or JSON object to
the SolrQueryResponse and expect it to be formatted?


Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Shalin Shekhar Mangar
Okay, let me explain. If you construct your combined response (why are you
doing that again?) in the form of a Solr NamedList or SolrDocumentList, then
the XMLResponseWriter (which btw uses TextResponseWriter) has no problem
writing it down as XML. The problem here is that you are giving it an
object (a DOM Document?) which it doesn't know how to serialize, so it just
calls .toString on it and writes it out.

As long as you stick a known type into the SolrQueryResponse, you should be
fine.
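
A minimal sketch of that approach (the keys and values are placeholders; in
practice they would come from the aggregated XML):

import org.apache.solr.common.util.NamedList;
import org.apache.solr.common.util.SimpleOrderedMap;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;

// Inside handleRequestBody: build Solr's own container types from the values
// extracted out of the remote XML, instead of adding a DOM object.
public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
    throws Exception {
  NamedList<Object> grouped = new SimpleOrderedMap<Object>();
  grouped.add("source", "instance1");
  grouped.add("matches", 42);
  rsp.add("grouped", grouped);  // serialized natively by XMLResponseWriter
}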


On Thu, Jul 18, 2013 at 6:24 PM, Vineet Mishra wrote:

> So does that mean there is no way that we can write a XML or JSON object to
> the SolrQueryResponse and expect it to be formatted?
>



-- 
Regards,
Shalin Shekhar Mangar.


Sort by document similarity counts

2013-07-18 Thread zygis
Hi,

Is it possible to sort search results based on the count of similar documents a
document has? Say we have a document A which has 4 other similar documents in
the index and document B which has 10. Then the order in which Solr returns them
should be B, A. Sorting on moreLikeThis counts for each document would be an
example of this (in my case I use ngram similarity detection from Tika).

I have tried doing this via a custom SearchComponent, where I can find all
similar documents for each document in the current search result, then add a new
field into the document, hoping to use the sort parameter
(q=*&sort=similarityCount). But this will not work because sorting is done
before my custom search component is handled, if added via last-components. I
can't add it via first-components, because then I will have no access to the
query results. And I do not want to override QueryComponent, because I need all
the functionality it covers: grouping, facets, etc.

Thanks


Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Vineet Mishra
My case is like this: I have got a few Solr instances that I am querying,
getting their XML responses. Out of that XML I have to extract a group of
specific XML nodes, and later I combine the other Solr responses into a
single XML and make a DOM Document out of it.

So, as you mentioned in your last mail, how can I prepare a combined
response for this XML doc? Even if I do, I don't think it would work,
because I am doing the same in the RequestHandler.





On Thu, Jul 18, 2013 at 6:30 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Okay, let me explain. If you construct your combined response (why are you
> doing that again?) in the form a Solr NamedList or SolrDocumentList then
> the XMLResponseWriter (which btw uses TextResponseWriter) has no problem
> writing it down as XML. The problem here is that you are giving it an
> object (a DOM Document?) which it doesn't know how to serialize so it just
> calls .toString on it and writes it out.
>
> As long as you stick a known type into the SolrQueryResponse, you should be
> fine.
>
>
> On Thu, Jul 18, 2013 at 6:24 PM, Vineet Mishra  >wrote:
>
> > So does that mean there is no way that we can write a XML or JSON object
> to
> > the SolrQueryResponse and expect it to be formatted?
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Sort by document similarity counts

2013-07-18 Thread Koji Sekiguchi

> I have tried doing this via custom SearchComponent, where I can find all similar
> documents for each document in current search result, then add a new field into
> document hoping to use sort parameter (q=*&sort=similarityCount).

I don't understand this part very well, but:

> But this will not work because sort is done before handling my custom search
> component, if added via last-components. Can't add it via first-components,
> because then I will have no access to query results. And I do not want to
> override QueryComponent because I need to have all the functionality it covers:
> grouping, facets, etc.

You may want to put your custom SearchComponent in last-components and inject a
SortSpec in your prepare() so that QueryComponent can sort the result complying
with your SortSpec.
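
A hedged sketch of that injection point (Solr 4.x; the SortSpec(Sort, offset,
count) constructor, the setSortSpec call and the "similarityCount" field are
assumptions to verify against your version):

import java.io.IOException;
import org.apache.lucene.search.Sort;
import org.apache.lucene.search.SortField;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.search.SortSpec;

public class SimilaritySortComponent extends SearchComponent {
  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // prepare() of every registered component runs before any process(),
    // so a sort injected here is the one QueryComponent actually executes.
    SortSpec current = rb.getSortSpec();
    Sort byCount = new Sort(new SortField("similarityCount", SortField.Type.INT, true));
    rb.setSortSpec(new SortSpec(byCount, current.getOffset(), current.getCount()));
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    // nothing to do at process time
  }

  @Override
  public String getDescription() { return "Injects a similarity-count sort"; }

  @Override
  public String getSource() { return null; }
}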

koji
--
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html


Re: How can I learn the total count of how many documents indexed and how many documents updated?

2013-07-18 Thread Furkan KAMACI
Hi Shawn;

This is what I see when I look at mbeans:
<str name="class">org.apache.solr.update.DirectUpdateHandler2</str>
<str name="version">1.0</str>
<str name="description">Update handler that efficiently directly updates the
on-disk main lucene index</str>
<str name="src">$URL$</str>
<lst name="stats">
  <long name="commits">41</long>
  <str name="autocommit maxTime">15000ms</str>
  <long name="autocommits">37</long>
  <long name="soft autocommits">0</long>
  <long name="optimizes">2</long>
  <long name="rollbacks">0</long>
  <long name="expungeDeletes">0</long>
  <long name="docsPending">0</long>
  <long name="adds">0</long>
  <long name="deletesById">0</long>
  <long name="deletesByQuery">0</long>
  <long name="errors">0</long>
  <long name="cumulative_adds">211453</long>
  <long name="cumulative_deletesById">0</long>
  <long name="cumulative_deletesByQuery">0</long>
  <long name="cumulative_errors">0</long>
</lst>

I think there is no information there about what I am looking for?

2013/7/18 Shawn Heisey 

> On 7/17/2013 8:06 AM, Furkan KAMACI wrote:
> > I have crawled some web pages and indexed them at my SolrCloud(Solr
> 4.2.1).
> > However before I index them there was already some indexes. I can
> calculate
> > the difference between current and previous document count. However it
> > doesn't mean that I have indexed that count of documents. Because urls of
> > websites are unique ids at my system. So it means that some of documents
> > updated and they did not increased document count.
> >
> > My question is that: How can I learn the total count of how many
> documents
> > indexed and how many documents updated?
>
> Look at the update handler statistics.  Your application should record
> the numbers there, then you can check the handler statistics again and
> note the differences.  Here's a URL that can give you those statistics.
>
> http://server:port/solr/mycollectionname/admin/mbeans?stats=true
>
> They are also available in the UI on the UPDATEHANDLER section of
> Plugins / Stats, but you can't really use that in a program.
>
> By setting the request handler path on a query object to /admin/mbeans
> and setting the stats parameter, you can get this information with SolrJ.
>
> Thanks,
> Shawn
>
>


RE: How can I learn the total count of how many documents indexed and how many documents updated?

2013-07-18 Thread Markus Jelsma
Not your updateHandler, that only shows numbers about what it's doing, and it
can be restarted. Check your cores:
host:port/solr/admin/cores
 
 
-Original message-
> From:Furkan KAMACI 
> Sent: Thursday 18th July 2013 15:46
> To: solr-user@lucene.apache.org
> Subject: Re: How can I learn the total count of how many documents indexed 
> and how many documents updated?
> 
> Hi Shawn;
> 
> This is what I see when I look at mbeans:
> <str name="class">org.apache.solr.update.DirectUpdateHandler2</str>
> <str name="version">1.0</str>
> <str name="description">Update handler that efficiently directly updates the
> on-disk main lucene index</str>
> <str name="src">$URL$</str>
> <lst name="stats">
>   <long name="commits">41</long>
>   <str name="autocommit maxTime">15000ms</str>
>   <long name="autocommits">37</long>
>   <long name="soft autocommits">0</long>
>   <long name="optimizes">2</long>
>   <long name="rollbacks">0</long>
>   <long name="expungeDeletes">0</long>
>   <long name="docsPending">0</long>
>   <long name="adds">0</long>
>   <long name="deletesById">0</long>
>   <long name="deletesByQuery">0</long>
>   <long name="errors">0</long>
>   <long name="cumulative_adds">211453</long>
>   <long name="cumulative_deletesById">0</long>
>   <long name="cumulative_deletesByQuery">0</long>
>   <long name="cumulative_errors">0</long>
> </lst>
> 
> I think that there is no information about what I look for?
> 
> 2013/7/18 Shawn Heisey 
> 
> > On 7/17/2013 8:06 AM, Furkan KAMACI wrote:
> > > I have crawled some web pages and indexed them at my SolrCloud(Solr
> > 4.2.1).
> > > However before I index them there was already some indexes. I can
> > calculate
> > > the difference between current and previous document count. However it
> > > doesn't mean that I have indexed that count of documents. Because urls of
> > > websites are unique ids at my system. So it means that some of documents
> > > updated and they did not increased document count.
> > >
> > > My question is that: How can I learn the total count of how many
> > documents
> > > indexed and how many documents updated?
> >
> > Look at the update handler statistics.  Your application should record
> > the numbers there, then you can check the handler statistics again and
> > note the differences.  Here's a URL that can give you those statistics.
> >
> > http://server:port/solr/mycollectionname/admin/mbeans?stats=true
> >
> > They are also available in the UI on the UPDATEHANDLER section of
> > Plugins / Stats, but you can't really use that in a program.
> >
> > By setting the request handler path on a query object to /admin/mbeans
> > and setting the stats parameter, you can get this information with SolrJ.
> >
> > Thanks,
> > Shawn
> >
> >
> 


Re: Custom RequestHandlerBase XML Response Issue

2013-07-18 Thread Shalin Shekhar Mangar
This sounds like a bad idea. You could have done this much simply inside
your own application using libraries that you know well.

That being said, instead of creating a DOM document, create a solr
NamedList object which can be serialized by XMLResponseWriter.


On Thu, Jul 18, 2013 at 6:48 PM, Vineet Mishra wrote:

> My case is like, I have got a few Solr Instances and querying them and
> getting their xml response, out of that xml I have to extract a group of
> specific xml nodes, later I am combining other solr's response into a
> single xml and making a DOM document out of it.
>
> So as you mentioned in your last mail, how can I prepare a combined
> response for this xml doc and even if I do I don't think it would work
> because the same I am doing in the RequstHandler.
>
>
>
>
>
> On Thu, Jul 18, 2013 at 6:30 PM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > Okay, let me explain. If you construct your combined response (why are
> you
> > doing that again?) in the form a Solr NamedList or SolrDocumentList then
> > the XMLResponseWriter (which btw uses TextResponseWriter) has no problem
> > writing it down as XML. The problem here is that you are giving it an
> > object (a DOM Document?) which it doesn't know how to serialize so it
> just
> > calls .toString on it and writes it out.
> >
> > As long as you stick a known type into the SolrQueryResponse, you should
> be
> > fine.
> >
> >
> > On Thu, Jul 18, 2013 at 6:24 PM, Vineet Mishra  > >wrote:
> >
> > > So does that mean there is no way that we can write a XML or JSON
> object
> > to
> > > the SolrQueryResponse and expect it to be formatted?
> > >
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>



-- 
Regards,
Shalin Shekhar Mangar.


Getting a large number of documents by id

2013-07-18 Thread Brian Hurt
I have a situation which is common in our current use case, where I need to
get a large number (many hundreds) of documents by id.  What I'm doing
currently is creating a large query of the form "id:12345 OR id:23456 OR
..." and sending it off.  Unfortunately, this query is taking a long time,
especially the first time it's executed.  I'm seeing times of like 4+
seconds for this query to return, to get 847 documents.

So, my question is: what should I be looking at to improve the performance
here?

Brian


Re: Clearing old nodes from zookeper without restarting solrcloud cluster

2013-07-18 Thread Luis Carlos Guerrero Covo
Hey André, that isn't a possibility for us right now, since we are
terminating nodes using aws autoscaling policies. We'll have to either
change our policies so that we can have some kind of graceful shutdown
where we get the possibility to unload cores or update zookeeper's cluster
state every once in a while to clear old offline nodes. Thanks for the help!
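
For reference, the graceful-shutdown path discussed here is the CoreAdmin
UNLOAD call, issued to a node before terminating it (host and core name are
placeholders):

http://host:8983/solr/admin/cores?action=UNLOAD&core=collection1_shard1_replica2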


On Wed, Jul 17, 2013 at 2:23 AM, Andre Bois-Crettez
wrote:

> Indeed we are using UNLOAD of cores before shutting down extra replica
> nodes, works well but already said, it needs such nodes to be up.
> Once UNLOADed it is possible to stop them, works well for our use case.
>
> But if nodes are already down, maybe it is possible to manually create
> and upload a cleaned /clusterstate.json to Zookeeper ?
>
>
> André
>
>
> On 07/16/2013 11:18 PM, Marcin Rzewucki wrote:
>
>> Unloading a core is the known way to unregister a solr node in zookeeper
>> (and not use for further querying). It works for me. If you didn't do that
>> like this, unused nodes may remain in the cluster state and Solr may try
>> to
>> use them without a success. I'd suggest to start some machine with the old
>> name, run solr, join the cluster for a while, unload a core to unregister
>> it from the cluster and shutdown host at the end. This way you could have
>> clear cluster state.
>>
>>
>>
>> On 16 July 2013 14:41, Luis Carlos Guerrero Covo wrote:
>>
>>> Thanks, I was actually asking about deleting nodes from the cluster state,
>>> not cores, unless you can unload cores specific to an already offline node
>>> from zookeeper.
>>>
>>> --
>>> André Bois-Crettez
>>>
>>> Search technology, Kelkoo
>>> http://www.kelkoo.com/
>>>
>>
> Kelkoo SAS
> Société par Actions Simplifiée
> Au capital de € 4.168.964,30
> Siège social : 8, rue du Sentier 75002 Paris
> 425 093 069 RCS Paris
>
> This message and its attachments are confidential and intended solely for
> their addressees. If you are not the intended recipient of this message,
> please destroy it and notify the sender.
>



-- 
Luis Carlos Guerrero Covo
M.S. Computer Engineering
(57) 3183542047


Two-steps queries with different sorting criteria

2013-07-18 Thread Fabio Amato
Hi all,
I need to execute a Solr query in two steps: in the first step, a generic
limited-results query ordered by relevance; in the second step, an ordering of
the first step's results according to a given sorting criterion (different
from relevance).

This two-step query is meaningful when the query terms are so generic that
the number of matched results exceeds the wanted number of results.

In such circumstance, using single-step queries with different sorting
criteria has a very confusing effect on the user experience, because at
each change of sorting criterion the user gets different results even if
the search query and the filtering conditions have not changed.

On the contrary, using a two-step query where the sorting order of the
first step is always relevance is more acceptable in the case of a large
number of matched results, because the result set would not change with the
sorting criterion of the second step.

I am wondering if such a two-step query is achievable with a single Solr
query, or if I am obliged to execute the sorting step of my two-step query
out of Solr (i.e. in my application). Another possibility could be the
development of a Solr plugin, but I am afraid of the possible effects on
performance.
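
For what it's worth, a minimal SolrJ sketch of the client-side variant Fabio
mentions (the sort field "price" and the row count are assumptions; server is
an initialized SolrServer, e.g. CommonsHttpSolrServer in SolrJ 3.4):

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

// Step 1: fetch a fixed window of top-N results ordered by relevance.
SolrQuery query = new SolrQuery("generic query terms");
query.setRows(100);  // the limited-results window
SolrDocumentList topByRelevance = server.query(query).getResults();

// Step 2: re-sort that fixed set by the user's criterion; the set of
// documents never changes when the criterion does.
List<SolrDocument> docs = new ArrayList<SolrDocument>(topByRelevance);
Collections.sort(docs, new Comparator<SolrDocument>() {
  public int compare(SolrDocument a, SolrDocument b) {
    Float pa = (Float) a.getFieldValue("price");
    Float pb = (Float) b.getFieldValue("price");
    return pa.compareTo(pb);
  }
});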

I am using Solr 3.4.0

Thanks in advance for your kind help.
Fabio


Re: How can I learn the total count of how many documents indexed and how many documents updated?

2013-07-18 Thread Furkan KAMACI
Hi Markus;

It doesn't give me how many documents were updated since the last commit.

2013/7/18 Markus Jelsma 

> Not your updateHandler, that only shows number about what it's doing and
> it can be restarted. Check your cores:
> host:port/solr/admin/cores
>
>
> -Original message-
> > From:Furkan KAMACI 
> > Sent: Thursday 18th July 2013 15:46
> > To: solr-user@lucene.apache.org
> > Subject: Re: How can I learn the total count of how many documents
> indexed and how many documents updated?
> >
> > Hi Shawn;
> >
> > This is what I see when I look at mbeans:
> >  > name="class">org.apache.solr.update.DirectUpdateHandler2 > name="version">1.0Update handler that
> > efficiently directly updates the on-disk main lucene index > name="src">$URL$
> > 
> > 41
> > 15000ms
> > 37
> > 0
> > 2
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 0
> > 211453
> > 0
> > 0
> > 0
> > 
> >
> > I think that there is no information about what I look for?
> >
> > 2013/7/18 Shawn Heisey 
> >
> > > On 7/17/2013 8:06 AM, Furkan KAMACI wrote:
> > > > I have crawled some web pages and indexed them at my SolrCloud(Solr
> > > 4.2.1).
> > > > However before I index them there was already some indexes. I can
> > > calculate
> > > > the difference between current and previous document count. However
> it
> > > > doesn't mean that I have indexed that count of documents. Because
> urls of
> > > > websites are unique ids at my system. So it means that some of
> documents
> > > > updated and they did not increased document count.
> > > >
> > > > My question is that: How can I learn the total count of how many
> > > documents
> > > > indexed and how many documents updated?
> > >
> > > Look at the update handler statistics.  Your application should record
> > > the numbers there, then you can check the handler statistics again and
> > > note the differences.  Here's a URL that can give you those statistics.
> > >
> > > http://server:port/solr/mycollectionname/admin/mbeans?stats=true
> > >
> > > They are also available in the UI on the UPDATEHANDLER section of
> > > Plugins / Stats, but you can't really use that in a program.
> > >
> > > By setting the request handler path on a query object to /admin/mbeans
> > > and setting the stats parameter, you can get this information with
> SolrJ.
> > >
> > > Thanks,
> > > Shawn
> > >
> > >
> >
>


RE: How can I learn the total count of how many documents indexed and how many documents updated?

2013-07-18 Thread Markus Jelsma
No, nothing will. If you must know, you'll have to do it on the client side
and make sure autocommit is disabled.
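
A hedged client-side sketch of that counting (SolrJ; assumes "id" is the
unique key holding the url, server is an initialized SolrServer, and
updatedCount/indexedCount are your own counters):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.util.ClientUtils;

// An "update" is an add whose id already exists in the index, so probe first.
SolrQuery probe = new SolrQuery("id:" + ClientUtils.escapeQueryChars(url));
probe.setRows(0);  // only numFound is needed
boolean exists = server.query(probe).getResults().getNumFound() > 0;
if (exists) { updatedCount++; } else { indexedCount++; }
server.add(doc);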
 
-Original message-
> From:Furkan KAMACI 
> Sent: Thursday 18th July 2013 17:01
> To: solr-user@lucene.apache.org
> Subject: Re: How can I learn the total count of how many documents indexed 
> and how many documents updated?
> 
> Hi Markus;
> 
> It doesn't give me how many documents updated from last commit.
> 
> 2013/7/18 Markus Jelsma 
> 
> > Not your updateHandler, that only shows number about what it's doing and
> > it can be restarted. Check your cores:
> > host:port/solr/admin/cores
> >
> >
> > -Original message-
> > > From:Furkan KAMACI 
> > > Sent: Thursday 18th July 2013 15:46
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: How can I learn the total count of how many documents
> > indexed and how many documents updated?
> > >
> > > Hi Shawn;
> > >
> > > This is what I see when I look at mbeans:
> > > <str name="class">org.apache.solr.update.DirectUpdateHandler2</str>
> > > <str name="version">1.0</str>
> > > <str name="description">Update handler that efficiently directly updates
> > > the on-disk main lucene index</str>
> > > <str name="src">$URL$</str>
> > > <lst name="stats">
> > >   <long name="commits">41</long>
> > >   <str name="autocommit maxTime">15000ms</str>
> > >   <long name="autocommits">37</long>
> > >   <long name="soft autocommits">0</long>
> > >   <long name="optimizes">2</long>
> > >   <long name="rollbacks">0</long>
> > >   <long name="expungeDeletes">0</long>
> > >   <long name="docsPending">0</long>
> > >   <long name="adds">0</long>
> > >   <long name="deletesById">0</long>
> > >   <long name="deletesByQuery">0</long>
> > >   <long name="errors">0</long>
> > >   <long name="cumulative_adds">211453</long>
> > >   <long name="cumulative_deletesById">0</long>
> > >   <long name="cumulative_deletesByQuery">0</long>
> > >   <long name="cumulative_errors">0</long>
> > > </lst>
> > >
> > > I think that there is no information about what I look for?
> > >
> > > 2013/7/18 Shawn Heisey 
> > >
> > > > On 7/17/2013 8:06 AM, Furkan KAMACI wrote:
> > > > > I have crawled some web pages and indexed them at my SolrCloud(Solr
> > > > 4.2.1).
> > > > > However before I index them there was already some indexes. I can
> > > > calculate
> > > > > the difference between current and previous document count. However
> > it
> > > > > doesn't mean that I have indexed that count of documents. Because
> > urls of
> > > > > websites are unique ids at my system. So it means that some of
> > documents
> > > > > updated and they did not increased document count.
> > > > >
> > > > > My question is that: How can I learn the total count of how many
> > > > documents
> > > > > indexed and how many documents updated?
> > > >
> > > > Look at the update handler statistics.  Your application should record
> > > > the numbers there, then you can check the handler statistics again and
> > > > note the differences.  Here's a URL that can give you those statistics.
> > > >
> > > > http://server:port/solr/mycollectionname/admin/mbeans?stats=true
> > > >
> > > > They are also available in the UI on the UPDATEHANDLER section of
> > > > Plugins / Stats, but you can't really use that in a program.
> > > >
> > > > By setting the request handler path on a query object to /admin/mbeans
> > > > and setting the stats parameter, you can get this information with
> > SolrJ.
> > > >
> > > > Thanks,
> > > > Shawn
> > > >
> > > >
> > >
> >
> 


Re: Getting a large number of documents by id

2013-07-18 Thread Alexandre Rafalovitch
You could start by doing id:(12345 23456) to reduce the query length and
possibly speed up parsing.
You could also move the query from the 'q' parameter to the 'fq' parameter,
since you probably don't care about ranking ('fq' does not rank).
If these are unique every time, you could probably also look at not caching
(can't remember the exact syntax).

That's all I can think of at the moment without digging deep into why you
need to do this at all.
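
Putting those tips together, a hedged example request (ids, rows and endpoint
are placeholders; {!cache=false} is the local param for skipping the filter
cache):

http://localhost:8983/solr/select?q=*:*&rows=1000&fq={!cache=false}id:(12345 23456 34567)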

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Jul 18, 2013 at 10:46 AM, Brian Hurt  wrote:

> I have a situation which is common in our current use case, where I need to
> get a large number (many hundreds) of documents by id.  What I'm doing
> currently is creating a large query of the form "id:12345 OR id:23456 OR
> ..." and sending it off.  Unfortunately, this query is taking a long time,
> especially the first time it's executed.  I'm seeing times of like 4+
> seconds for this query to return, to get 847 documents.
>
> So, my question is: what should I be looking at to improve the performance
> here?
>
> Brian
>


Re: Getting a large number of documents by id

2013-07-18 Thread Jack Krupansky
Solr really isn't designed for that kind of use case. If it happens to work 
well for your particular situation, great, but don't complain when you are 
well outside the normal usage for a "search engine" (10, 20, 50, 100 results 
paged at a time, with modest sized query strings.)


If you must get these 847 documents, do them in reasonable size batches,
like 20, 50, or 100 at a time.


That said, there may be something else going on here, since a query for 837 
results should not take 4 seconds anyway.


Check QTime - is it 4 seconds?

Add debugQuery=true to your query and check the individual module times - 
which ones are the biggest hogs? Or, maybe it is none of them and the 
problem is elsewhere, like formatting the response, network problems, etc.


Hmmm... I wonder if the new real-time "Get" API would be better for your 
case. It takes a comma-separated list of document IDs (keys). Check it out:


http://wiki.apache.org/solr/RealTimeGet
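
For example, a hedged request against the default /get handler (ids are
placeholders):

http://localhost:8983/solr/get?ids=12345,23456,34567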

-- Jack Krupansky

-Original Message- 
From: Brian Hurt

Sent: Thursday, July 18, 2013 10:46 AM
To: solr-user@lucene.apache.org
Subject: Getting a large number of documents by id

I have a situation which is common in our current use case, where I need to
get a large number (many hundreds) of documents by id.  What I'm doing
currently is creating a large query of the form "id:12345 OR id:23456 OR
..." and sending it off.  Unfortunately, this query is taking a long time,
especially the first time it's executed.  I'm seeing times of like 4+
seconds for this query to return, to get 847 documents.

So, my question is: what should I be looking at to improve the performance
here?

Brian 



Re: Getting a large number of documents by id

2013-07-18 Thread Michael Della Bitta
Brian,

Have you tried the realtime get handler? It supports multiple documents.

http://wiki.apache.org/solr/RealTimeGet

Michael Della Bitta

Applications Developer

o: +1 646 532 3062  | c: +1 917 477 7906

appinions inc.

“The Science of Influence Marketing”

18 East 41st Street

New York, NY 10017

t: @appinions  | g+:
plus.google.com/appinions
w: appinions.com 


On Thu, Jul 18, 2013 at 10:46 AM, Brian Hurt  wrote:

> I have a situation which is common in our current use case, where I need to
> get a large number (many hundreds) of documents by id.  What I'm doing
> currently is creating a large query of the form "id:12345 OR id:23456 OR
> ..." and sending it off.  Unfortunately, this query is taking a long time,
> especially the first time it's executed.  I'm seeing times of like 4+
> seconds for this query to return, to get 847 documents.
>
> So, my question is: what should I be looking at to improve the performance
> here?
>
> Brian
>


Re: Solr with Hadoop

2013-07-18 Thread Matt Lieber
Rajesh,

If you require to have an integration between Solr and Hadoop or NoSQL, I
would recommend using a commercial distribution. I think most are free to
use as long as you don't require support.
I inquired about the Cloudera Search capability, but it seems that so far it
is just preliminary: there is no tight integration yet between HBase and Solr,
for example, other than full text search on the HDFS data (I believe enabled
in Hue). I am not too familiar with what MapR's M7 has to offer.
However Datastax does a good job of tightly integrating Solr with
Cassandra, and lets you query over the data ingested from Solr in Hive for
example, which is pretty nice. Solr would not trigger Hadoop jobs, though.

Cheers,
Matt


On 7/17/13 7:37 PM, "Rajesh Jain"  wrote:

>I
> have a newbie question on integrating Solr with Hadoop.
>
>There are some vendors like Cloudera/MapR who have announced Solr Search
>for Hadoop.
>
>If I use the Apache distro, how can I use Solr Search on docs in
>HDFS/Hadoop
>
>Is there a tutorial on how to use it or getting started.
>
>I am using Flume to sink CSV docs into Hadoop/HDFS and I would like to use
>Solr to provide Search.
>
>Does Solr Search trigger MapReduce Jobs (like Splunk-Hunk) does?
>
>Thanks,
>Rajesh
>











Re: Getting a large number of documents by id

2013-07-18 Thread Roman Chyla
Look at speed of reading the data - likely, it takes long time to assemble
a big response, especially if there are many long fields - you may want to
try SSD disks, if you have that option.

Also, to gain better understanding: Start your solr, start jvisualvm and
attach to your running solr. Start sending queries and observe where the
most time is spent - it is very easy, you don't have to be a programmer to
do it.

The crucial parts are (but they will show up under different names) are:

1. query parsing
2. search execution
3. response assembly

quite likely, your query is a huge boolean OR clause, that may not be as
efficient as some filter query.

Your use case is actually not at all exotic. There will soon be a JIRA
ticket that makes the scenario of sending/querying with a large number of
IDs less painful.

http://lucene.472066.n3.nabble.com/Solr-large-boolean-filter-td4070747.html#a4070964
http://lucene.472066.n3.nabble.com/ACL-implementation-Pseudo-join-performance-amp-Atomic-Updates-td4077894.html

But I would really recommend you to do the jvisualvm measurement - that's
like bringing the light into darkness.

roman


On Thu, Jul 18, 2013 at 10:46 AM, Brian Hurt  wrote:

> I have a situation which is common in our current use case, where I need to
> get a large number (many hundreds) of documents by id.  What I'm doing
> currently is creating a large query of the form "id:12345 OR id:23456 OR
> ..." and sending it off.  Unfortunately, this query is taking a long time,
> especially the first time it's executed.  I'm seeing times of like 4+
> seconds for this query to return, to get 847 documents.
>
> So, my question is: what should I be looking at to improve the performance
> here?
>
> Brian
>


RE: Solr with Hadoop

2013-07-18 Thread Saikat Kanjilal
I'm familiar with and have used the DSE cluster, and am in the process of 
evaluating Cloudera Search. In general, Cloudera Search has tight integration 
with HDFS and takes care of replication and sharding transparently by using the 
pre-existing HDFS replication and sharding. However, Cloudera Search actually 
uses SolrCloud underneath, so you would need to install ZooKeeper to enable 
coordination between the Solr nodes. DataStax allows you to talk to Solr, but 
their model scales around the data model and architecture of Cassandra; release 
3.1 adds some Solr admin functionality and removes the need to write 
Cassandra-specific code.

If you go the open source route you have a few options:

1) You can build a custom plugin inside Solr that internally queries HDFS and 
returns data. You would need to figure out how to scale this, potentially with a 
solution very similar to Cloudera Search (i.e. leverage SolrCloud), and if using 
SolrCloud you would need to install ZooKeeper for node coordination

2) You could create a Flume channel that accumulates specific events from HDFS 
and a sink that writes the data directly to Solr (see the sketch after this list)

3) I would look at Cloudera Search if you need tight integration with Hadoop; 
it might save you some time and effort
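
A rough sketch of option 2 (hedged - the class name, field names, and the
"solrUrl" parameter are made up; it assumes Flume's sink API and SolrJ on the
classpath):

import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class SolrSink extends AbstractSink implements Configurable {
  private SolrServer solr;

  @Override
  public void configure(Context context) {
    // "solrUrl" is a hypothetical sink parameter set in the agent config
    solr = new HttpSolrServer(context.getString("solrUrl", "http://localhost:8983/solr"));
  }

  @Override
  public Status process() throws EventDeliveryException {
    Channel channel = getChannel();
    Transaction tx = channel.getTransaction();
    try {
      tx.begin();
      Event event = channel.take();
      if (event == null) {            // nothing queued right now
        tx.commit();
        return Status.BACKOFF;
      }
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", event.getHeaders().get("id"));  // assumes an id header
      doc.addField("text", new String(event.getBody(), "UTF-8"));
      solr.add(doc);
      tx.commit();
      return Status.READY;
    } catch (Exception e) {
      tx.rollback();
      throw new EventDeliveryException("failed to index event", e);
    } finally {
      tx.close();
    }
  }
}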

I don't think you want Solr triggering map-reduce jobs if you're looking for 
very fast throughput from your search service.


Hope this helps, ping me offline if you have more questions.
Regards

> From: mlie...@impetus.com
> To: solr-user@lucene.apache.org
> Subject: Re: Solr with Hadoop
> Date: Thu, 18 Jul 2013 15:41:36 +
> 
> Rajesh,
> 
> If you require to have an integration between Solr and Hadoop or NoSQL, I
> would recommend using a commercial distribution. I think most are free to
> use as long as you don't require support.
> I inquired about the Cloudera Search capability, but it seems that so far
> it is just preliminary: there is no tight integration yet between
> Hbase and Solr, for example, other than full text search on the HDFS data
> (I believe enabled in Hue). I am not too familiar with what MapR's M7 has
> to offer.
> However Datastax does a good job of tightly integrating Solr with
> Cassandra, and lets you query over the data ingested from Solr in Hive for
> example, which is pretty nice. Solr would not trigger Hadoop jobs, though.
> 
> Cheers,
> Matt
> 
> 
> On 7/17/13 7:37 PM, "Rajesh Jain"  wrote:
> 
> >I have a newbie question on integrating Solr with Hadoop.
> >
> >There are some vendors like Cloudera/MapR who have announced Solr Search
> >for Hadoop.
> >
> >If I use the Apache distro, how can I use Solr Search on docs in
> >HDFS/Hadoop
> >
> >Is there a tutorial on how to use it or getting started.
> >
> >I am using Flume to sink CSV docs into Hadoop/HDFS and I would like to use
> >Solr to provide Search.
> >
> >Does Solr Search trigger MapReduce Jobs (like Splunk-Hunk) does?
> >
> >Thanks,
> >Rajesh
> >
> 
> 
> 
> 
> 
> 
> 
> 
> 
> NOTE: This message may contain information that is confidential, proprietary, 
> privileged or otherwise protected by law. The message is intended solely for 
> the named addressee. If received in error, please destroy and notify the 
> sender. Any use of this email is prohibited when received in error. Impetus 
> does not represent, warrant and/or guarantee, that the integrity of this 
> communication has been maintained nor that the communication is free of 
> errors, virus, interception or interference.
  

Re: Getting a large number of documents by id

2013-07-18 Thread Alexandre Rafalovitch
And I guess, if only a subset of fields is being requested but there are
other large fields present, there could be the cost of loading those extra
fields into memory before discarding them. In which case,
using enableLazyFieldLoading may help.
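(That is the <enableLazyFieldLoading>true</enableLazyFieldLoading> setting in
the query section of solrconfig.xml.)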

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Jul 18, 2013 at 11:47 AM, Roman Chyla  wrote:

> Look at the speed of reading the data - likely it takes a long time to
> assemble a big response, especially if there are many long fields - you may
> want to try SSD disks, if you have that option.
>
> Also, to gain better understanding: Start your solr, start jvisualvm and
> attach to your running solr. Start sending queries and observe where the
> most time is spent - it is very easy, you don't have to be a programmer to
> do it.
>
> The crucial parts (though they will show up under different names) are:
>
> 1. query parsing
> 2. search execution
> 3. response assembly
>
> quite likely, your query is a huge boolean OR clause, that may not be as
> efficient as some filter query.
>
> Your use case is actually not at all exotic. There will soon be a JIRA
> ticket that makes the scenario of sending/querying with large number of IDs
> less painful.
>
>
> http://lucene.472066.n3.nabble.com/Solr-large-boolean-filter-td4070747.html#a4070964
>
> http://lucene.472066.n3.nabble.com/ACL-implementation-Pseudo-join-performance-amp-Atomic-Updates-td4077894.html
>
> But I would really recommend you to do the jvisualvm measurement - that's
> like bringing the light into darkness.
>
> roman
>
>
> On Thu, Jul 18, 2013 at 10:46 AM, Brian Hurt  wrote:
>
> > I have a situation which is common in our current use case, where I need
> to
> > get a large number (many hundreds) of documents by id.  What I'm doing
> > currently is creating a large query of the form "id:12345 OR id:23456 OR
> > ..." and sending it off.  Unfortunately, this query is taking a long
> time,
> > especially the first time it's executed.  I'm seeing times of like 4+
> > seconds for this query to return, to get 847 documents.
> >
> > So, my question is: what should I be looking at to improve the
> performance
> > here?
> >
> > Brian
> >
>


XInclude and Document Entity not working on schema.xml

2013-07-18 Thread Elodie Sannier

Hello,

I am using the solr nightly version 4.5-2013-07-18_06-04-44 and I want
to use "Document Entity" in schema.xml; I get this exception:
java.lang.RuntimeException: schema fieldtype
string(org.apache.solr.schema.StrField) invalid
arguments:{xml:base=solrres:/commonschema_types.xml}
at org.apache.solr.schema.FieldType.setArgs(FieldType.java:187)
at
org.apache.solr.schema.FieldTypePluginLoader.init(FieldTypePluginLoader.java:141)
at
org.apache.solr.schema.FieldTypePluginLoader.init(FieldTypePluginLoader.java:43)
at
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:190)
... 16 more

schema.xml:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE schema [
  <!ENTITY commonschema_types SYSTEM "commonschema_types.xml">
]>
<schema name="..." version="...">
  <types>
    &commonschema_types;
  </types>
  ...
</schema>

commonschema_types.xml:

<fieldType name="string" class="solr.StrField"/>
...

The same error appears in this bug (fixed ?):
https://issues.apache.org/jira/browse/SOLR-3087

It works with solr-4.2.1.

//-

I also tried to use the XML XInclude mechanism
(http://en.wikipedia.org/wiki/XInclude) to include parts of schema.xml.

When I try to include a fieldType, I get this exception :
org.apache.solr.common.SolrException: Unknown fieldType 'long' specified
on field _version_
at org.apache.solr.schema.IndexSchema.loadFields(IndexSchema.java:644)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:470)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:164)
at
org.apache.solr.schema.IndexSchemaFactory.create(IndexSchemaFactory.java:55)
at
org.apache.solr.schema.IndexSchemaFactory.buildIndexSchema(IndexSchemaFactory.java:69)
at org.apache.solr.core.ZkContainer.createFromZk(ZkContainer.java:267)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:622)
... 10 more

The type is not found.

I include 'schema_integration.xml' like this in 'schema.xml':

<schema name="..." version="...">
  ...
  <xi:include href="schema_integration.xml"
              xmlns:xi="http://www.w3.org/2001/XInclude"/>
  ...
</schema>

Is it a bug of the nightly version?

Elodie Sannier

Kelkoo SAS
Société par Actions Simplifiée
Au capital de € 4.168.964,30
Siège social : 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended solely for 
their addressees. If you are not the intended recipient of this message, 
please destroy it and notify the sender.


Re: solr autodetectparser tikaconfig dataimporter error

2013-07-18 Thread Andreas Owen
i have now changed some things and the import runs without error. in schema.xml 
i haven't got the field "text" but "contentsExact". unfortunately the text (from 
file) isn't indexed even though i mapped it to the proper field. what am i 
doing wrong?

data-config.xml:

<dataConfig>
  <dataSource baseUrl="http://127.0.0.1/tkb/internet/" name="main"/>
  <!-- the remaining entity and field definitions were not preserved by the
       list archive -->
</dataConfig>

i noticed that when I move the field author into the tika entity it isn't 
indexed. can this have something to do with why the text from the file isn't 
indexed? Do I have to do something special about the entity levels in the 
data-config?

ps: how do i import tsstamp, it's a static value?




On 14. Jul 2013, at 10:30 PM, Jack Krupansky wrote:

> "Caused by: java.lang.NoSuchMethodError:"
> 
> That means you have some out of date jars or some newer jars mixed in with 
> the old ones.
> 
> -- Jack Krupansky
> 
> -Original Message- From: Andreas Owen
> Sent: Sunday, July 14, 2013 3:07 PM
> To: solr-user@lucene.apache.org
> Subject: Re: solr autodetectparser tikaconfig dataimporter error
> 
> hi
> 
> is there no one with an idea what this error is, or who can even give me a 
> pointer where to look? If not, is there an alternative way to import documents 
> from a xml-file with meta-data and the filename to parse?
> 
> thanks for any help.
> 
> 
> On 12. Jul 2013, at 10:38 PM, Andreas Owen wrote:
> 
>> i am using solr 3.5, tika-app-1.4 and tagcloud 1.2.1. when i try to import a
>> file via xml i get this error, it doesn't matter what file format i try to
>> index txt, cfm, pdf all the same error:
>> 
>> SEVERE: Exception while processing: rec document :
>> SolrInputDocument[{id=id(1.0)={myTest.txt},
>> title=title(1.0)={Beratungsseminar kundenbrief},
>> contents=contents(1.0)={wie kommuniziert man}, author=author(1.0)={Peter Z.},
>> path=path(1.0)={download/online}}]:
>> org.apache.solr.handler.dataimport.DataImportHandlerException:
>> java.lang.NoSuchMethodError:
>> org.apache.tika.parser.AutoDetectParser.setConfig(Lorg/apache/tika/config/TikaConfig;)V
>> at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
>> at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
>> at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
>> at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
>> at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
>> at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
>> at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
>> Caused by: java.lang.NoSuchMethodError:
>> org.apache.tika.parser.AutoDetectParser.setConfig(Lorg/apache/tika/config/TikaConfig;)V
>> at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:122)
>> at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
>> at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
>> ... 6 more
>> 
>> Jul 11, 2013 5:23:36 PM org.apache.solr.common.SolrException log
>> SEVERE: Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
>> java.lang.NoSuchMethodError:
>> org.apache.tika.parser.AutoDetectParser.setConfig(Lorg/apache/tika/config/TikaConfig;)V
>> at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:669)
>> at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:622)
>> at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:268)
>> at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:187)
>> at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:359)
>> at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:427)
>> at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:408)
>> Caused by: java.lang.NoSuchMethodError:
>> org.apache.tika.parser.AutoDetectParser.setConfig(Lorg/apache/tika/config/TikaConfig;)V
>> at org.apache.solr.handler.dataimport.TikaEntityProcessor.nextRow(TikaEntityProcessor.java:122)
>> at org.apache.solr.handler.dataimport.EntityProcessorWrapper.nextRow(EntityProcessorWrapper.java:238)
>> at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:596)
>> ... 6 more
>> 
>> Jul 11, 2013 5:23:36 PM org.apache.solr.updat

Luke's analysis of Trie Dates

2013-07-18 Thread JohnRodey
I have a TrieDateField dynamic field setup in my schema, pretty standard...

  <dynamicField name="*_tdt" type="tdate" indexed="true" stored="false"/>

  <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6"
             positionIncrementGap="0"/>

In my code I only set one field, "creation_tdt" and I round it to the
nearest second before storing it.  However when I analyze it with Luke I
get:

<lst name="creation_tdt">
  <str name="type">tdate</str>
  <str name="schema">IT--OF--</str>
  <str name="dynamicBase">*_tdt</str>
  <str name="index">(unstored field)</str>
  <int name="docs">22404</int>
  <int name="distinct">-1</int>
  <lst name="topTerms">
    <int name="...">22404</int>
    <int name="...">22404</int>
    <int name="...">22404</int>
    <int name="...">22404</int>
    <int name="...">22404</int>
    <int name="...">22404</int>
    <int name="...">22404</int>
    <int name="...">16014</int>
    <int name="...">6390</int>
    <int name="...">1535</int>
    <int name="...">1459</int>
    <int name="...">1268</int>
    <int name="...">1193</int>
    <int name="...">1187</int>
    <int name="...">1152</int>
    <int name="...">1129</int>
    <int name="...">1089</int>
    ...
  </lst>
</lst>

So my question is: where are all these entries coming from?  They are not
the dates I specified because they have millis, and my field isn't
multivalued, so the term counts don't add up (how could I have more than
22404 terms if I only have 22404 documents).  Why multiple
"1970-01-01T00:00:00Z" entries?

Is this somehow related to Trie fields and how they are indexed?

Thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Luke-s-analysis-of-Trie-Dates-tp4078885.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Luke's analysis of Trie Dates

2013-07-18 Thread Yonik Seeley
On Thu, Jul 18, 2013 at 12:53 PM, JohnRodey  wrote:
> I have a TrieDateField dynamic field setup in my schema, pretty standard...
>
>   <dynamicField name="*_tdt" type="tdate" indexed="true" stored="false"/>
>
>   <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6"
> positionIncrementGap="0"/>
>
> In my code I only set one field, "creation_tdt" and I round it to the
> nearest second before storing it.  However when I analyze it with Luke I
> get:
>
> <lst name="creation_tdt">
>   <str name="type">tdate</str>
>   <str name="schema">IT--OF--</str>
>   <str name="dynamicBase">*_tdt</str>
>   <str name="index">(unstored field)</str>
>   <int name="docs">22404</int>
>   <int name="distinct">-1</int>
>   <lst name="topTerms">
>     <int name="...">22404</int>
>     <int name="...">22404</int>
>     <int name="...">22404</int>
>     <int name="...">22404</int>
>     <int name="...">22404</int>
>     <int name="...">22404</int>
>     <int name="...">22404</int>
>     <int name="...">16014</int>
>     <int name="...">6390</int>
>     <int name="...">1535</int>
>     <int name="...">1459</int>
>     <int name="...">1268</int>
>     <int name="...">1193</int>
>     <int name="...">1187</int>
>     <int name="...">1152</int>
>     <int name="...">1129</int>
>     <int name="...">1089</int>
>     ...
>   </lst>
> </lst>
>
>
> So my question is: where are all these entries coming from?  They are not
> the dates I specified because they have millis, and my field isn't
> multivalued, so the term counts don't add up (how could I have more than
> 22404 terms if I only have 22404 documents).  Why multiple
> "1970-01-01T00:00:00Z" entries?
>
> Is this somehow related to Trie fields and how they are indexed?

Yes, it's due to how trie fields are indexed (can have multiple
indexed tokens per logical value to speed up range queries).
If you want counts of values (as opposed to tokens), use faceting.
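
(With precisionStep="6", each 64-bit date value is indexed as up to
ceil(64/6) = 11 tokens - one per precision level, at shifts 0, 6, 12, ..., 60 -
and the low-precision prefix tokens are shared by many documents, which is why
several top counts equal the total document count and the rest tail off.)

A minimal SolrJ sketch of the faceting approach (the server URL is assumed):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
SolrQuery q = new SolrQuery("*:*");
q.setRows(0);                      // only the counts are needed
q.setFacet(true);
q.addFacetField("creation_tdt");   // counts logical values, not trie tokens
q.setFacetLimit(-1);               // return all values
QueryResponse rsp = server.query(q);
System.out.println(rsp.getFacetField("creation_tdt").getValueCount());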

-Yonik
http://lucidworks.com


Re: JVM Crashed - SOLR deployed in Tomcat

2013-07-18 Thread neoman
Thanks for your reply. Yes, it worked. No more crashes after switching to
1.6.0_30



--
View this message in context: 
http://lucene.472066.n3.nabble.com/JVM-Crashed-SOLR-deployed-in-Tomcat-tp4078439p4078906.html
Sent from the Solr - User mailing list archive at Nabble.com.


Indexing into SolrCloud

2013-07-18 Thread Beale, Jim (US-KOP)
Hey folks,

I've been migrating an application which indexes about 15M documents from 
straight-up Lucene into SolrCloud.  We've set up 5 Solr instances with a 3 
zookeeper ensemble using HAProxy for load balancing. The documents are 
processed on a quad core machine with 6 threads and indexed into SolrCloud 
through HAProxy using ConcurrentUpdateSolrServer in order to batch the updates. 
 The indexing box is heavily-loaded during indexing but I don't think it is so 
bad that it would cause issues.

I'm using Solr 4.3.1 on client and server side, zookeeper 3.4.5 and HAProxy 
1.4.22.

I've been accepting the default HttpClient with 50K buffered docs and 2 
threads, i.e.,

int solrMaxBufferedDocs = 50000;
int solrThreadCount = 2;
solrServer = new ConcurrentUpdateSolrServer(solrHttpIPAddress,
    solrMaxBufferedDocs, solrThreadCount);

autoCommit is configured in the solrconfig as follows:

<autoCommit>
  <maxTime>60</maxTime>
  <maxDocs>50</maxDocs>
  <openSearcher>false</openSearcher>
</autoCommit>

I'm getting the following errors on the client and server sides respectively:

Client side:

2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO  
SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught when 
processing request: Software caused connection abort: socket write error
2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-4] INFO  
SystemDefaultHttpClient - Retrying request
2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO  
SystemDefaultHttpClient - I/O exception (java.net.SocketException) caught when 
processing request: Software caused connection abort: socket write error
2013-07-16 19:02:47,002 [concurrentUpdateScheduler-1-thread-5] INFO  
SystemDefaultHttpClient - Retrying request

Server side:

7988753 [qtp1956653918-23] ERROR org.apache.solr.core.SolrCore - 
java.lang.RuntimeException: [was class org.eclipse.jetty.io.EofException] early 
EOF
at 
com.ctc.wstx.util.ExceptionUtil.throwRuntimeException(ExceptionUtil.java:18)
at com.ctc.wstx.sr.StreamScanner.throwLazyError(StreamScanner.java:731)
at 
com.ctc.wstx.sr.BasicStreamReader.safeFinishToken(BasicStreamReader.java:3657)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:809)
at org.apache.solr.handler.loader.XMLLoader.readDoc(XMLLoader.java:393)

When I disabled autoCommit on the server side, I didn't see any errors there 
but I still get the issue client-side after about 2 million documents - which 
is about 45 minutes.

Has anyone seen this issue before?  I couldn't find anything useful on the 
usual places.

I suppose I could setup wireshark to see what is happening but I'm hoping that 
someone has a better suggestion.

Thanks in advance for any help!


Best regards,
Jim Beale

hibu.com
2201 Renaissance Boulevard, King of Prussia, PA, 19406
Office: 610-879-3864
Mobile: 610-220-3067

The information contained in this email message, including any attachments, is 
intended solely for use by the individual or entity named above and may be 
confidential. If the reader of this message is not the intended recipient, you 
are hereby notified that you must not read, use, disclose, distribute or copy 
any part of this communication. If you have received this communication in 
error, please immediately notify me by email and destroy the original message, 
including any attachments. Thank you.



Re: Getting a large number of documents by id

2013-07-18 Thread Brian Hurt
Thanks everyone for the response.

On Thu, Jul 18, 2013 at 11:22 AM, Alexandre Rafalovitch
wrote:

> You could start from doing id:(12345 23456) to reduce the query length and
> possibly speed up parsing.
>

I didn't know about this syntax - it looks useful.


> You could also move the query from 'q' parameter to 'fq' parameter, since
> you probably don't care about ranking ('fq' does not rank).
>

Yes, I don't care about rank, so this helps.


> If these are unique every time, you could probably look at not caching
> (can't remember exact syntax).
>
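
For reference, I believe the local-param form is:

fq={!cache=false}id:(12345 23456 34567)

or via SolrJ (id values made up):
q.addFilterQuery("{!cache=false}id:(12345 23456 34567)");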

> That's all I can think of at the moment without digging deep into why you
> need to do this at all.
>
>
Short version of a long story: I'm implementing a graph database on top of
solr.  Which is not what solr is designed for, I know.  This is a case
where I'm following a set of edges from a given node to its 847 children,
and I need to get the children.  And yes, I've looked at neo4j- it doesn't
help.



> Regards,
>Alex.
>
> Personal website: http://www.outerthoughts.com/
> LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
> - Time is the quality of nature that keeps events from happening all at
> once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)
>
>
> On Thu, Jul 18, 2013 at 10:46 AM, Brian Hurt  wrote:
>
> > I have a situation which is common in our current use case, where I need
> to
> > get a large number (many hundreds) of documents by id.  What I'm doing
> > currently is creating a large query of the form "id:12345 OR id:23456 OR
> > ..." and sending it off.  Unfortunately, this query is taking a long
> time,
> > especially the first time it's executed.  I'm seeing times of like 4+
> > seconds for this query to return, to get 847 documents.
> >
> > So, my question is: what should I be looking at to improve the
> performance
> > here?
> >
> > Brian
> >
>


Auto-sharding and numShard parameter

2013-07-18 Thread Flavio Pompermaier
Hi to all,
Probably this question has a simple answer but I just want to be sure of
the potential drawbacks... When I run SolrCloud I run the main solr instance
with the -numShard option (e.g. 2).
Then as data grows, the number of shards could potentially become huge. If I
had to restart all nodes and re-ran the master with numShard=2, what would
happen? Would it just be ignored, or would Solr try to reduce the
shards...?

Another question: in SolrCloud, how do I restart the whole cloud at once? Is
that possible?

Best,
Flavio


Need ideas to perform historical search

2013-07-18 Thread SolrLover

I am trying to implement Historical search using SOLR.

Ex:

If I search on address 800 5th Ave and provide a time range, it should list
the name of the person who was living at the address during the time period.
I am trying to figure out a way to store the data without redundancy.

I can do a join in the database to return all the names of people who were
living at a particular address during a particular time, but I know it's
difficult to do that in SOLR, and SOLR is not a database (it works best when
the data is denormalized).

Is there any other way / idea by which I can reduce the redundancy of
creating multiple records for a particular person again and again?







--
View this message in context: 
http://lucene.472066.n3.nabble.com/Need-ideas-to-perform-historical-search-tp4078980.html
Sent from the Solr - User mailing list archive at Nabble.com.


Spellcheck questions

2013-07-18 Thread smanad
Exploring various SpellCheckers in solr, I have a few questions:
1. Which algorithm is used for generating suggestions when using
IndexBasedSpellChecker? I know it's Levenshtein (with edit distance=2 by
default) in DirectSolrSpellChecker.
2. If I have 2 indices, can I set up multiple IndexBasedSpellCheckers to
point to different spellcheck dictionaries and generate suggestions from
both?
3. Can I use IndexBasedSpellChecker and FileBasedSpellChecker together? I
tried doing it and ran into an exception "All checkers need to use the same
StringDistance."

Any help will be much appreciated.
Thanks, 
-Manasi



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-questions-tp4078985.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Spellcheck questions

2013-07-18 Thread SolrLover
Check the link below for more info on the IndexBasedSpellChecker:

http://searchhub.org/2010/08/31/getting-started-spell-checking-with-apache-lucene-and-solr/



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Spellcheck-questions-tp4078985p4079000.html
Sent from the Solr - User mailing list archive at Nabble.com.


additional requests sent to solr

2013-07-18 Thread alxsss
Hello,

I sent to solr (to server1 in a cluster of two servers) the following request

http://server1:8983/solr/mycollection/select?q=alex&wt=xml&defType=edismax&facet.field=school&facet.field=company&facet=true&facet.limit=10&facet.mincount=1&qf=school_txt+company_txt+name&shards=server1:8983/solr/mycollection,server2.com:8983/solr/mycollection

I see in the logs 2 additional requests

INFO: [mycollection] webapp=/solr path=/select 
params={facet=true&f.company.facet.limit=25&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&f.school_facet.facet.limit=25&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&fl=id,score&start=0&q=alex&facet.field=school&facet.field=company&isShard=true&fsv=true}
 hits=9118 status=0 QTime=72

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select 
params={facet=true&facet.mincount=1&company__terms=Google&ids=957642543183429632,957841245982425088,67612781366,56659036467,50875569066,957707339232706560,465078975511&facet.limit=10&qf=school_txt+company_txt+name&distrib=false&wt=javabin&version=2&rows=10&defType=edismax&NOW=1374191542130&shard.url=server1:8983/solr/mycollection&school__terms=Michigan+State+University,Brigham+Young+University,Northeastern+University&q=alex&facet.field={!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}company&isShard=true}
 status=0 QTime=6

Jul 18, 2013 4:52:22 PM org.apache.solr.core.SolrCore execute
INFO: [mycollection] webapp=/solr path=/select 
params={facet=true&shards=server1.prod.mylife.com:8983/solr/mycollection,server2:8983/solr/mycollection&facet.mincount=1&q=alex&facet.limit=10&qf=school_txt+company_txt+name&facet.field=school&facet.field=company&wt=xml&defType=edismax}
 hits=97262 status=0 QTime=168


I can understand that the first and the third log records are related to the 
above request, but cannot understand where the second one comes from. 
I see in it company__terms and 
{!terms%3D$school__terms}school&facet.field={!terms%3D$company__terms}, which 
seem to have nothing to do with the initial request. This is solr-4.2.0.


Any ideas about it are welcome.

Thanks in advance.
Alex.


Solr 4.3 open a lot more files than solr 3.6

2013-07-18 Thread Zhang, Lisheng
Hi,
 
After upgrading solr from 3.6 to 4.3, we found that solr opened a lot more 
files compared to solr 3.6 (when a core is open). Since we have many cores 
(more than 2K and still growing), we would like to reduce the number of open 
files.
 
We already used shareSchema and sharedLib, we also shared SolrConfig across all 
cores,
we also commented out autoSoftCommit in solrconfig.xml.
 
In solr 3.6, it seems that the IndexWriter was opened only when an indexing 
request came in and was closed immediately after the request was done, but in 
solr 4.3 the IndexWriter is kept open. Is there an easy way to go back to the 
3.6 behavior (we do not need Near Real-Time Search)? Can we change the code to 
disable keeping the IndexWriter open (if there is no better way)?
 
Any guidance on reducing open files would be very helpful.
 
Thanks very much for your help, Lisheng


Re: add to ContributorsGroup - Instructions for setting up SolrCloud on jboss

2013-07-18 Thread Erick Erickson
Thank you for adding to the wiki! It's always appreciated...

On Wed, Jul 17, 2013 at 5:18 PM, Ali, Saqib  wrote:
> Thanks Erick!
>
> I have added the instructions for running SolrCloud on Jboss:
> http://wiki.apache.org/solr/SolrCloud%20using%20Jboss
>
> I will refine the instructions further, and also post some screenshots.
>
> Thanks.
>
>
> On Sun, Jul 14, 2013 at 5:05 AM, Erick Erickson 
> wrote:
>
>> Done, sorry it took so long, hadn't looked at the list in a couple of days.
>>
>>
>> Erick
>>
>> On Fri, Jul 12, 2013 at 5:46 PM, Ali, Saqib  wrote:
>> > username: saqib
>> >
>> >
>> > On Fri, Jul 12, 2013 at 2:35 PM, Ali, Saqib 
>> wrote:
>> >
>> >> Hello,
>> >>
>> >> Can you please add me to the ContributorsGroup? I would like to add
>> >> instructions for setting up SolrCloud using Jboss.
>> >>
>> >> thanks.
>> >>
>> >>
>>


Re: Need ideas to perform historical search

2013-07-18 Thread Alexandre Rafalovitch
Why do you care about redundancy? That's the search engine's architectural
tradeoff (as far as I understand). And, the tokens are all normalized under
the covers, so it does not take as much space as you expect.

Specifically regarding your issue, maybe you should store 'occupancy' as
the record. That's similar to what they do at Gilt:
http://www.slideshare.net/trenaman/personalized-search-on-the-largest-flash-sale-site-in-america
(slide 36+)
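
As an illustration, each 'occupancy' could be one denormalized document - a
hedged SolrJ sketch with made-up field names:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr");
SolrInputDocument occ = new SolrInputDocument();
occ.addField("id", "p42-800-5th-ave-1");       // person + address + stay
occ.addField("person_name", "Jane Doe");
occ.addField("address", "800 5th Ave");
occ.addField("from_dt", "2005-03-01T00:00:00Z");
occ.addField("to_dt", "2009-06-30T00:00:00Z");
server.add(occ);
server.commit();
// a search for occupants during 2008 then becomes:
// q=address:"800 5th Ave" AND from_dt:[* TO 2008-12-31T23:59:59Z]
//   AND to_dt:[2008-01-01T00:00:00Z TO *]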

The other option is to use location as spans with some clever queries:
http://wiki.apache.org/solr/SpatialForTimeDurations (follow the links).

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all at
once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD book)


On Thu, Jul 18, 2013 at 5:58 PM, SolrLover  wrote:

>
> I am trying to implement Historical search using SOLR.
>
> Ex:
>
> If I search on address 800 5th Ave and provide a time range, it should list
> the name of the person who was living at the address during the time
> period.
> I am trying to figure out a way to store the data without redundancy.
>
> I can do a join in the database to return all the names who were living in
> a
> particular address during a particular time but I know it's difficult to do
> that in SOLR and SOLR is not a database (it works best when the data is
> denormalized).,..
>
> Is there any other way / idea by which I can reduce the redundancy of
> creating multiple records for a particular person again and again?
>
>
>
>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Need-ideas-to-perform-historical-search-tp4078980.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Early Access Release #3 for Solr 4.x Deep Dive book is now available for download on Lulu.com

2013-07-18 Thread Jack Krupansky
Okay, it’s hot off the e-presses: Solr 4.x Deep Dive, Early Access Release #3 
is now available for purchase and download as an e-book for $9.99 on Lulu.com 
at:

http://www.lulu.com/shop/jack-krupansky/solr-4x-deep-dive-early-access-release-1/ebook/product-21079719.html


(That link says “1”, but it apparently correctly redirects to EAR #3.)

My recent blog posts over the past two weeks detailed the changes from EAR#2. 
Besides more cleanup, the focus was on features of Solr 4.4, including update 
processors and token filters. I still haven’t finished 4.4 coverage, but this 
is progress.

See:
http://basetechnology.blogspot.com/

The next EAR will be in approximately two weeks, contents TBD.

If you have purchased EAR#1 or #2, there is no need to rush out and pick up 
EAR#3. I mean, the technical content changes were relatively modest (68 new 
pages), and EAR#4 will be out in another two weeks anyway. That said, EAR#3 is 
a significant improvement over EAR#1 and EAR#2.

-- Jack Krupansky

Re: Sort by document similarity counts

2013-07-18 Thread zygis


Not sure if it will work. Say we have a SearchComponent which does this in 
its process method:

1. DocList docs = rb.getResults().docList;

2. Go over docs and for each doc do:

3. 
BooleanQuery q = new BooleanQuery(); //construct a query which gets all docs 
which are not equal to current one and are from a different host (we deal there 
with web pages)
q.add(new TermQuery(new Term("host", host)), BooleanClause.Occur.MUST_NOT);
q.add(new TermQuery(new Term("id", name)), BooleanClause.Occur.MUST_NOT);
DocListAndSet sim = searcher.getDocListAndSet( q, (TermQuery) null, null, 0, 
1000); //TODO how to set proper limit not hard-coded 1000

4. for all docs in sim calculate similarity to current doc (from #2)

5. Count all similar documents and add a new field
            FieldType ft = new FieldType();
            ft.setStored(true);
            ft.setIndexed(true);
            Field f = new IntField("similarCount", ds.size(), ft);
            d.add(f);


Now the problem is with #1: this comes in already sorted. That is, if I call solr 
with q=*&sort=similarityCount, sort is applied before calling the last component, 
which does all the steps defined above. If I add this to first-components then 
the #1 call will return null.


A completely different approach would be to calculate aggregate values on update 
via an UpdateRequestProcessor. But then I need to be able to do searches in the 
update processor (step #3), and in that case docs are available to the searcher 
only after commit. I'd expect that this would work, but the search always returns 0:

public void processCommit(CommitUpdateCommand cmd) throws IOException {
               TopDocs topDocs = searcher.search(new MatchAllDocsQuery(), 100);
               DocListAndSet sim = searcher.getDocListAndSet(
                    new MatchAllDocsQuery(), (TermQuery) null, null, 0, 10);
               DocList docs = sim.docList; // <-- is always empty

(Tried placing it after solr.RunUpdateProcessorFactory in update chain, no 
change)

Even if the searcher worked, it looks bad, because in this case I would need to 
update not only the incoming document but also all those documents which are 
similar to the current one (that is, if A is similar to B and C, then B and C are 
similar to A, and the similarCount field has to be increased in B and C as well).




 From: Koji Sekiguchi 
To: solr-user@lucene.apache.org 
Sent: Thursday, July 18, 2013 4:29 PM
Subject: Re: Sort by document similarity counts
 

> I have tried doing this via custom SearchComponent, where I can find all 
> similar documents for each document in current search result, then add a new 
> field into document hoping to use sort parameter (q=*&sort=similarityCount).

I don't understand this part very well, but:

> But this will not work because sort is done before handling my custom search 
> component, if added via last-components. Can't add it via first-components, 
> because then I will have no access to query results. And I do not want to 
> override QueryComponent because I need to have all the functionality it 
> covers: grouping, facets, etc.

You may want to put your custom SearchComponent to last-component and inject 
SortSpec
in your prepare() so that QueryComponent can sort the result complying with 
your SortSpec?

koji
-- 
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
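
A hedged sketch of Koji's SortSpec suggestion (it assumes similarityCount is an
indexed field, and the Solr 4.x SortSpec API - worth double-checking; the
component must sit in last-components so this prepare() runs after
QueryComponent.prepare()):

import java.io.IOException;
import org.apache.lucene.search.Sort;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.schema.SchemaField;
import org.apache.solr.search.SortSpec;

@Override
public void prepare(ResponseBuilder rb) throws IOException {
  // override the sort that QueryComponent.prepare() already parsed, so that
  // QueryComponent.process() sorts by similarityCount descending
  SchemaField f = rb.req.getSchema().getField("similarityCount");
  Sort sort = new Sort(f.getType().getSortField(f, true)); // true = descending
  SortSpec current = rb.getSortSpec();
  rb.setSortSpec(new SortSpec(sort, current.getOffset(), current.getCount()));
}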