Re: set keepword file to be used based on a field value

2014-12-22 Thread leostro
Hi Tomoko,

I understand your first reply and the first hint (one field for each
categoryid).
I thought this was a relatively "common" scenario.

I'm interested in understanding the option you are talking about in the
second reply.

> you can tell "which keepwords set (file) should be used" to the custom
> filter by adding a special prefix (or something like that) to the target
> field value, but of course it makes the indexing/querying process slightly
> complicated.

Are you talking about adding a suffix (like _CAT1) to the value of the field
I'm going to analyze with keepwords? If the value ends with "_CAT1" ==> use
"keepwords1.txt" as the keepword file, and so on?

I can't see how to reach this goal. Have you seen any configuration
examples?
I didn't find anything :(

Thanks
Leo






SolrCloud & Paging on large indexes

2014-12-22 Thread Bram Van Dam

Hi folks,

If I understand things correctly, you can use paging & sorting in a 
SolrCloud environment. However, if I request the first 10 documents, a 
distributed query will be launched to all shards requesting their top 10, 
and the (shards * 10) results will then be sorted so that only the overall 
top 10 are returned.


This is fine.

But I'm a little worried about going beyond the first page ... this 
becomes (page * shards * 10): page 1,000 across 50 shards, for example, 
means merging 500,000 entries. I'm worried that in a 50-billion-document 
setup paging will just explode.


Does anyone have experience with paging on large cloud setups, positive 
or negative? Can anyone offer some reassurance, or words of caution, 
about this approach?


Or should I tell my users that they can never go beyond page X (which is 
fine if the alternative is hell fire and brimstone)?


Thanks,

 - Bram


Re: set keepword file to be used based on a field value

2014-12-22 Thread Tomoko Uchida
Hi Leo,

Yes, my idea is similar to yours.
> If the value ends with "_CAT1" ==> use
> "keepwords1.txt" as the keepword file, and so on?

But my second option is not about configuration; it is about "customizing"
Solr.

Lucene/Solr is designed to be customizable, so you can write your own
TokenFilter class. Your requirement can probably be met by subclassing
org.apache.lucene.analysis.util.FilteringTokenFilter.

The custom filter class would take multiple keepword files and build
multiple word sets (KeepWordFilter holds only a single word set), then
switch between the word sets based on the field value's prefix (or other
information). That is just my rough idea; there may be a more sophisticated
way...
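
Just to illustrate, here is a rough, untested sketch (the class name, the
marker-token convention, and the per-category word sets are all invented
for illustration):

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.analysis.util.FilteringTokenFilter;

/**
 * Keeps a token only if it appears in the word set selected for the
 * current category. The category is signalled by a marker token
 * (e.g. "_CAT1") indexed as the first token of the field value.
 */
public class CategoryKeepWordFilter extends FilteringTokenFilter {
  private final Map<String, CharArraySet> setsByCategory;
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private CharArraySet activeSet;

  public CategoryKeepWordFilter(TokenStream in,
                                Map<String, CharArraySet> setsByCategory) {
    super(in);
    this.setsByCategory = setsByCategory;
  }

  @Override
  protected boolean accept() throws IOException {
    CharArraySet selected = setsByCategory.get(termAtt.toString());
    if (selected != null) {   // marker token: switch word sets, drop the marker
      activeSet = selected;
      return false;
    }
    // keep the token only if it is in the currently active word set
    return activeSet != null
        && activeSet.contains(termAtt.buffer(), 0, termAtt.length());
  }
}

A corresponding factory (subclassing TokenFilterFactory) would load one
keepword file per category at startup.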

If you are interested (and familiar with Java programming, of course), you
may want to check out the Solr source code from SVN and browse the
KeepWordFilter / KeepWordFilterFactory classes to get a feel for the
implementation.

Thanks,
Tomoko





Re: SolrCloud & Paging on large indexes

2014-12-22 Thread Mikhail Khludnev
Hello Bram,

Make sure you have checked the documentation:
https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Querying parent with multiple child documents

2014-12-22 Thread Rajesh
Hi,

I have a document with multiple child documents associated with it, and the
child documents come from different tables (each containing different
fields), as shown below. I can query the parent and child documents with an
OR condition between the two child records. Is there a way to specify an
AND condition between the two child records to retrieve the parent?

My sample doc. structure:

<doc>
  <field name="type">parent</field>
  <field name="id">order1</field>
  <doc>
    <field name="type">child</field>
    <field name="id">product1</field>
    <field name="productname">childproduct</field>
  </doc>
  <doc>
    <field name="type">child</field>
    <field name="id">product2</field>
    <field name="orderDetail">childproduct2</field>
  </doc>
</doc>


OR query between child docs:
fq = {!parent which="type:parent" v="productname:childproduct OR
orderDetail:childproduct2"}
fl = *,[child parentFilter="type:parent"
childFilter="productname:childproduct OR orderDetail:childproduct2"]

How can I get a parent which has both childproduct and childproduct2, but
in different children?





Re: Querying parent with multiple child documents

2014-12-22 Thread Mikhail Khludnev
On Mon, Dec 22, 2014 at 2:16 PM, Rajesh 
wrote:
>
> OR query between child docs:
> fq = {!parent which="type:parent" v="productname:childproduct OR
> orderDetail:childproduct2"}
> fl = *,[child parentFilter="type:parent"
> childFilter="productname:childproduct OR orderDetail:childproduct2"]
>
> How can I get a parent which has both childproduct and childproduct2, but
> in different children?
>

I think you are searching for a parent which has one matching child, where
the same parent also has another matching child. That gives:
fq={!parent which="type:parent"
v="productname:childproduct"}&fq={!parent which="type:parent"
v="orderDetail:childproduct2"}
which is the same as
fq=+{!parent which="type:parent" v="productname:childproduct"}
+{!parent which="type:parent" v="orderDetail:childproduct2"}
Caveat: it yields the cross-match you asked for ("in different children"),
but it also matches parents where both values occur in the same child.

-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: SolrCloud & Paging on large indexes

2014-12-22 Thread heaven
I have had a very bad experience with pagination on collections larger than
a few million documents. Pagination becomes very, very slow: I just tried to
switch to page 76662 and it took almost 30 seconds.

Solr now supports cursors, which are fast and useful for exports and some
data processing, but I don't see how I can use them to draw page numbers
and let users paginate through large data sets.





Re: Endless 100% CPU usage on searcherExecutor thread

2014-12-22 Thread heaven
It is getting better now with smaller caches, like this:
filterCache
class:org.apache.solr.search.FastLRUCache
version:1.0
description:Concurrent LRU Cache(maxSize=4096, initialSize=512,
minSize=3686, acceptableSize=3891, cleanupThread=false, autowarmCount=256,
regenerator=org.apache.solr.search.SolrIndexSearcher$2@4668b788)
src:null
stats:
lookups:34
hits:33
hitratio:0.97
inserts:1
evictions:0
size:282
warmupTime:1879
cumulative_lookups:51190
cumulative_hits:35938
cumulative_hitratio:0.7
cumulative_inserts:15252
cumulative_evictions:0

Is warmupTime in milliseconds or seconds?





IOException occured when talking to solr server

2014-12-22 Thread Aditya
Hello all

I am getting the following error. Could anyone shed some light on it? I am
accessing Solr via SolrJ, and when there is more load on the server I am
getting this error. Is there any way to overcome this situation?

org.apache.solr.client.solrj.SolrServerException: IOException occured when
talking to server at: http://localhost/solr
org.apache.solr.client.solrj.SolrServerException: Server refused connection
at: http://localhost/solr

Once this error is encountered, Tomcat stops responding and I need to
restart the server.

Regards
Aditya
www.findbestopensource.com


Re: IOException occured when talking to solr server

2014-12-22 Thread Tomoko Uchida
Hi,

> org.apache.solr.client.solrj.SolrServerException: Server refused
connection at: http://localhost/solr

Clearly it is a server-side problem, so the client-side SolrJ logs are not
helpful. You should check the Tomcat and Solr error logs and look for the
cause of the load.

Best,
Tomoko



Re: SolrCloud & Paging on large indexes

2014-12-22 Thread Bram Van Dam

On 12/22/2014 12:47 PM, heaven wrote:

I have had a very bad experience with pagination on collections larger than
a few million documents. Pagination becomes very, very slow: I just tried to
switch to page 76662 and it took almost 30 seconds.


Yeah, that's pretty much my experience too, and I think SolrCloud would only 
exacerbate the problem (due to the increased complexity of sorting). If 
there's no silver bullet to be found, I guess I'll just have to disable 
paging on large data sets -- which is fine, really; who the hell browses 
through 50 billion documents anyway? That's what search is for, right?


Thx,

 - Bram



Re: IOException occured when talking to solr server

2014-12-22 Thread Alexandre Rafalovitch
If this happens only under load, it could be the size of the listener
thread pool. That's a pure Tomcat setting; look for it there.

But look for the exception in the logs first; it may give better clues.

Regards,
   Alex.

Sign up for my Solr resources newsletter at http://www.solr-start.com/




Solr Search Inconsistent result

2014-12-22 Thread Ankit Jain
Hi All,

We are getting inconsistent search results when searching on a *multivalued*
field:

*Input Query:*
( t : [ 0 TO 1419245069253 ] )AND(_all:"impetus-i0111.impetus.co.in")

The "_all" field is a multivalued field.

The above query sometimes returns 11 records and sometimes 12471
records.

Please help.

Thanks,
Ankit


Parallel Indexing

2014-12-22 Thread Peri Subrahmanya
Hi,

We have millions of records in our DB, and we do a complete re-index of them 
every fortnight or so. It takes around 11 hours, and I was wondering if there 
is a way to fetch the records in parallel batches and issue the Solr HTTP 
commands with the Solr docs in parallel. Please let me know.

Thanks
-Peri.S
http://www.kuali.org/ole 





Re: Querying parent with multiple child documents

2014-12-22 Thread Rajesh
Thanks for your reply, Mikhail. It's working as expected.





Re: Parallel Indexing

2014-12-22 Thread Ahmet Arslan
Hi Peri,

You can always send concurrent update requests to Solr.
Usually data acquisition takes more time than the indexing itself. You could 
dump your DB records into several CSV files and feed them to Solr in parallel.
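
Something like this, with one curl per file running in parallel (the URL,
file name, and commit parameter are just illustrative):

curl 'http://localhost:8983/solr/update?commit=true' \
  --data-binary @records1.csv -H 'Content-type: text/csv; charset=utf-8'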

Ahmet 





Re: Solr Search Inconsistent result

2014-12-22 Thread Ahmet Arslan
Hi,

Is this a sharded query?

Ahmet




Re: Endless 100% CPU usage on searcherExecutor thread

2014-12-22 Thread Erick Erickson
Milliseconds. The thing to track here is your
cumulative_hitratio.

0.7 isn't bad, but it's not great either. I'd be really
curious what kinds of fq clauses you're entering;
anything that mentions NOW is potentially a cache
waste unless you round it with date math (e.g. NOW/DAY).

FWIW,
Erick



Re: Parallel Indexing

2014-12-22 Thread Mikhail Khludnev
What is your indexer built on? Do you use SolrJ, plain REST, or
DataImportHandler? And what, briefly, does your DB schema look like?
Frankly speaking, there are a few approaches to handling indexing
concurrently; the details depend on the answers above.



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: SolrCloud & Paging on large indexes

2014-12-22 Thread Erick Erickson
Have you read Hossman's blog here?
https://lucidworks.com/blog/coming-soon-to-solr-efficient-cursor-based-iteration-of-large-result-sets/#referrer=solr.pl

And how to use it here?
http://wiki.apache.org/solr/CommonQueryParameters#Deep_paging_with_cursorMark

Because if you're trying this and _still_ getting bad performance, we
need to know.

Bram:
One minor pedantic clarification: the first round-trip only returns
the id and sort criteria (score by default), not the whole document,
although the effect is the same. As you page N deep into the corpus, the
default implementation returns rows * (N + 1) entries from each shard.
Even worse, each node itself has to _sort_ that many entries. Then a
second call is made to fetch the page's worth of docs...

About telling your users not to page past N... that's up to you, especially
if the deep paging stuff works as advertised (and I have no reason to
believe it doesn't).

That said, it's pretty easy to argue that the 500th page is
pretty useless; nobody will ever hit the "next page" button 499 times.

The different use-case, though, is when people want to return the
entire corpus for whatever reason and _must_ page through to the
end...
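
For that case, the cursorMark loop in SolrJ looks roughly like this
(untested sketch; the URL, rows value, and the "id" uniqueKey field are
assumptions):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorWalk {
  public static void main(String[] args) throws Exception {
    HttpSolrServer server =
        new HttpSolrServer("http://localhost:8983/solr/collection1");
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(100);
    q.setSort(SolrQuery.SortClause.asc("id")); // sort must include the uniqueKey
    String cursor = CursorMarkParams.CURSOR_MARK_START;
    while (true) {
      q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
      QueryResponse rsp = server.query(q);
      // ... process rsp.getResults() ...
      String next = rsp.getNextCursorMark();
      if (cursor.equals(next)) break; // an unchanged mark means we're done
      cursor = next;
    }
    server.shutdown();
  }
}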

Best,
Erick



Re: Parallel Indexing

2014-12-22 Thread Erick Erickson
Just to pile on

_very_ frequently in my experience the problem
is not Solr at all, but acquiring the data in the
first place, i.e. executing the DB query.

A very simple test (in the SolrJ world) is to just comment
out the server.add(doclist) call.

Assuming you're using SolrJ, you _are_ indexing in
batches, right? And you are _not_ committing from
the program, right? And as Hossman often says,
details matter.
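
FWIW, the batched, no-commit pattern looks roughly like this (untested
sketch; the URL, queue size, thread count, and field names are all
placeholders):

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class BatchIndexer {
  public static void main(String[] args) throws Exception {
    // background threads drain the queue, so updates are sent in parallel
    ConcurrentUpdateSolrServer server = new ConcurrentUpdateSolrServer(
        "http://localhost:8983/solr/collection1", 10000, 4);
    List<SolrInputDocument> batch = new ArrayList<SolrInputDocument>(1000);
    for (int id = 0; id < 1000000; id++) { // stand-in for the DB result set
      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", Integer.toString(id));
      doc.addField("title", "record " + id);
      batch.add(doc);
      if (batch.size() == 1000) { // index in batches...
        server.add(batch);
        batch.clear();
      }
    }
    if (!batch.isEmpty()) server.add(batch);
    server.blockUntilFinished(); // ...and don't commit from the client;
    server.shutdown();           // let the server's autoCommit handle it
  }
}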

Also, take a look at your Solr server's CPU utilization. It
gives a crude idea of how much work Solr is doing;
unless it's running at 100%, your bottleneck is
on the acquisition side.

For a benchmark (admittedly not directly comparable),
I can index 11M Wikipedia docs on my laptop in under an
hour without tuning anything. They're in XML format,
so data acquisition is very fast...

Best,
Erick


Re: Parallel Indexing

2014-12-22 Thread Peri Subrahmanya
Thanks, guys, for the quick responses. I need to take the suggestions, 
incorporate them, figure out how we are actually doing the fetching, and 
report back on this thread. The suggestions have been very helpful in 
moving this forward for us.

Thanks
-Peri.S



Re: Old facet value doesn't go away after index update

2014-12-22 Thread Tang, Rebecca
Thank you for the explanation!

Rebecca Tang
Applications Developer, UCSF CKM
Industry Documents Digital Libraries
E: rebecca.t...@ucsf.edu





On 12/19/14 12:37 PM, "Shawn Heisey"  wrote:

>On 12/19/2014 11:22 AM, Tang, Rebecca wrote:
>> I have an index that has a field called collection_facet.
>>
>> There was a value 'Ness Motley Law Firm Documents' that we wanted to
>>update to 'Ness Motley Law Firm'.  There were 36,132 records with this
>>value.  So I re-indexed just the 36,132 records.  After the update, I
>>ran a facet query (q=*:*&facet=true&facet.field=collection_facet) to see
>>if the value got updated and I saw
>> Ness Motley Law Firm 36,132  -- as expected
>> Ness Motley Law Firm Documents 0 -- Why is this value still here even
>>though clearly there are no records with this value anymore?  I thought
>>maybe it was cached, so I restarted solr, but I still got the same
>>results.
>>
>> "facet_fields": { "collection_facet": [
>> ... "Ness Motley Law Firm", 36132,
>> ... "Ness Motley Law Firm Documents", 0 ]
>
>Updating a document in Solr is actually a delete of the old document
>followed by indexing a new version.
>
>When a document is deleted from an index, Lucene (the search API that
>Solr uses) does not actually remove that document from the index
>segment, it just writes an ID value to a file that tracks deletes.  That
>document is still in the index, and its terms are still present, but the
>software can remove it from any results when it sees that ID value in
>the delete tracking file(s).  Only a segment merge can eliminate the
>document and remove its terms from the inverted index.
>
>When you do a facet on that field, Lucene still sees "Ness Motley Law
>Firm Documents" in the inverted index, because nothing has actually
>removed it. The upper layers of Solr faceting code are aware that all
>the documents containing that term have been deleted, so it gets a
>correct document count of zero.
>
>To eliminate it from the results, you have two choices.  One is to set
>facet.mincount=1 as a parameter on your query, the other is to run an
>optimize (also known as a forceMerge down to one segment) on the index.
>
>Thanks,
>Shawn
>



Re: Solr Search Inconsistent result

2014-12-22 Thread Ankit Jain
Hi Ahmet,

Thanks for the response.
I am running this query from the Solr search UI. The collection has two
shards.

Thanks,
Ankit




-- 
Thanks,
Ankit Jain


Re: Solr Search Inconsistent result

2014-12-22 Thread Ahmet Arslan
Hi,

Do you happen to have documents with the same unique id in different shards?
When unique ids are not unique across shards, people see inconsistent
results. Please see: http://find.searchhub.org/document/2814183511b5a52

Ahmet





Re: IOException occured when talking to solr server

2014-12-22 Thread Shawn Heisey

One setting in the servlet container that might be responsible here is
maxThreads.  It typically defaults to 200.  It is very easy to exceed
200 threads with Solr, which is why the Jetty that's included with Solr
is configured with a maxThreads value of 10000.  Tomcat has this same
setting.
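
In Tomcat it lives on the <Connector> element in server.xml; something
like this (the port and timeout values are just illustrative):

<Connector port="8080" protocol="HTTP/1.1"
           maxThreads="10000"
           connectionTimeout="20000" />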

Thanks,
Shawn



Pointing solr cloud to multiple index directories.

2014-12-22 Thread Nishanth S
Hey folks,

I have 5 drives in my machine which are mounted at 5 different
locations (/d/1, /d/2, /d/3). How can I point Solr to write to all these
directories?


Thanks,
Nishanth


Re: Pointing solr cloud to multiple index directories.

2014-12-22 Thread Erick Erickson
Not at all sure what you're asking...

If you're creating cores/replicas, you can specify a dataDir.

But you haven't really told us anything at all about what you're
trying to do here, or _why_ you want to write to them all.

Best
Erick



Solr unit tests intermittently fail with error: java.lang.NoClassDefFoundError: org/eclipse/jetty/util/security/CertificateUtils

2014-12-22 Thread brian4
I'm trying to run a unit test for a custom request handler component with
Solr 4.10.0.

I followed the pattern of the existing unit tests, extending SolrTestCaseJ4.
I first ran "ant eclipse" on the 4.10 source, then included all the generated
lib files (as well as the Solr and Lucene core lib files and the test
framework lib files) on the build path in Eclipse.

Usually the tests run fine.  However, seemingly at random, they sometimes
fail with the following error (even if I make no changes from a previously
successful run):

java.lang.NoClassDefFoundError:
org/eclipse/jetty/util/security/CertificateUtils
at __randomizedtesting.SeedInfo.seed([632CB8A91CFAD9B4]:0)
at 
org.apache.solr.util.SSLTestConfig.buildKeyStore(SSLTestConfig.java:85)
at
org.apache.solr.util.SSLTestConfig.buildSSLContext(SSLTestConfig.java:77)
at
org.apache.solr.util.SSLTestConfig$SSLHttpClientConfigurer.configure(SSLTestConfig.java:99)
at
org.apache.solr.client.solrj.impl.HttpClientUtil.configureClient(HttpClientUtil.java:142)
at
org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:118)
at
org.apache.solr.handler.component.HttpShardHandlerFactory.init(HttpShardHandlerFactory.java:155)
at
org.apache.solr.handler.component.ShardHandlerFactory.newInstance(ShardHandlerFactory.java:49)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:211)
at org.apache.solr.util.TestHarness.(TestHarness.java:137)
at org.apache.solr.util.TestHarness.(TestHarness.java:147)
at org.apache.solr.util.TestHarness.(TestHarness.java:98)
at org.apache.solr.SolrTestCaseJ4.createCore(SolrTestCaseJ4.java:559)
at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:551)
at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:371)
at org.apache.solr.SolrTestCaseJ4.initCore(SolrTestCaseJ4.java:378)
at
com.mathworks.solr.related.content.params.parse.ExtraQueryAdderTest.beforeClass(ExtraQueryAdderTest.java:43)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
at java.lang.reflect.Method.invoke(Unknown Source)
at
com.carrotsearch.randomizedtesting.RandomizedRunner.invoke(RandomizedRunner.java:1618)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$4.evaluate(RandomizedRunner.java:767)
at
com.carrotsearch.randomizedtesting.RandomizedRunner$5.evaluate(RandomizedRunner.java:783)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.rules.SystemPropertiesRestoreRule$1.evaluate(SystemPropertiesRestoreRule.java:53)
at
org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:46)
at
org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:42)
at
com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:55)
at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at
com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43)
at
org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48)
at
org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:65)
at
org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55)
at
com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36)
at
com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:365)
at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException:
org.eclipse.jetty.util.security.CertificateUtils
at java.net.URLClassLoader$1.run(Unknown Source)
at java.net.URLClassLoader$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.la

Re: Pointing solr cloud to multiple index directories.

2014-12-22 Thread Shawn Heisey
On 12/22/2014 2:10 PM, Nishanth S wrote:
> I have 5 drives in my machine which are mounted to  5 different
> locations(/d/1 ,/d/2,/d/3).How can I point solr to write to all these
> directories?.

Erick has asked a relevant question.  I assume that you're trying to
take advantage of the extra I/O bandwidth offered by additional
spindles, but it would be good for you to clarify why you're trying to
do this.

The way that I would do it if it were me is by letting SolrCloud create
the instance directories with a "data" directory in them, then shut down
Solr, move the data directories to alternate locations, and create a
symlink to link each "data" directory to the other location.  You have
indicated paths with forward slashes, so I'm assuming you're on a *NIX
platform that has symlinks.

Something to remember about Solr: If Solr is actually reading off the
disk, performance is going to be far less than optimal, even if
different indexes live on different spindles.  The way to make Solr fast
is to install a LOT of memory, so that the operating system can load the
index into RAM and run without hitting the disk very often at all.  If
you've got an ideal setup, the number and speed of your disks will have
very little impact on performance, except at reboot time.

http://wiki.apache.org/solr/SolrPerformanceProblems#OS_Disk_Cache

Thanks,
Shawn



How to define Json list in schema in xml

2014-12-22 Thread Xin Cai
Hi guys,
I am looking to parse a JSON file that contains a field holding a list of
schools.

So, for example, I would have:

{"Schools": [
  {"name": "Seirra High School"},
  {"name": "Walnut elementary School"}]}

I want to index all of the different schools so I can do fast lookups of
people who went to a certain school. What is the best way for me to define
the schema file? I have looked around, and I don't think Solr has native
support for lists as such, but I could be wrong because lists are used so
often...
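
I did find multiValued fields; would declaring something like this in
schema.xml (just a guess on my part) be the right approach?

<field name="school" type="string" indexed="true" stored="true"
       multiValued="true"/>

Any advice would be appreciated. Thanks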

Xin Cai


Re: Solr Search Inconsistent result

2014-12-22 Thread Ankit Jain
Hi Ahmet,

Thanks for the response.
The document IDs are unique because we are using *UUID*s to generate the
document IDs.

Thanks,
Ankit




-- 
Thanks,
Ankit Jain