Re: How to get all the docs whose field contain a specialized string?

2016-05-06 Thread Ahmet Arslan
Hi,


It looks like brand_s is defined as string, which is not tokenized.
Please do one of the following to retrieve "brand_s":"ibm hp"
 
a) use a tokenized field type
or
b) issue a wildcard query of q=ibm*

Ahmet
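A sketch of option (a), assuming a tokenized type such as text_general exists in the schema (the field name below is illustrative, not from the original schema):

```xml
<!-- index a tokenized copy of the string field -->
<field name="brand_txt" type="text_general" indexed="true" stored="false"/>
<copyField source="brand_s" dest="brand_txt"/>
```

A query of q=brand_txt:ibm would then match both "ibm" and "ibm hp". Option (b) instead keeps the string field and matches by prefix: q=brand_s:ibm* matches any value starting with "ibm".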


On Friday, May 6, 2016 8:35 AM, 梦在远方  wrote:



Hi, all


I ran a query through the Solr admin UI, but the response is not what I expected. My steps follow.


first step: get all data.
http://127.0.0.1:8080/solr/example/select?q=*%3A*&wt=json&indent=true


response follows:


"response": { "numFound": 5, "start": 0, "docs": [   { 
"id": "1", "goods_name_s": "cpu1", "brand_s": "amd", 
"_version_": 1533546720443498500   },   { "id": "2", 
"goods_name_s": "cpu2", "brand_s": "ibm",// there is a 'ibm' 
"_version_": 1533546730775117800   },   { "id": "3", 
"goods_name_s": "cpu3", "brand_s": "intel", "_version_": 
1533546741316452400   },   { "id": "4", "goods_name_s": 
"cpu4", "brand_s": "other", "_version_": 1533546750936088600
   },   { "id": "5", "goods_name_s": "cpu5", 
"brand_s": "ibm hp",//there is a 'ibm' "_version_": 1533548604687384600 
  } ]

second step: query the records whose 'brand_s' contains 'ibm'.
http://127.0.0.1:8080/solr/example/select?q=brand_s%3Aibm&wt=json&indent=true


"response": { "numFound": 1, "start": 0, "docs": [   { 
"id": "2", "goods_name_s": "cpu2", "brand_s": "ibm", 
"_version_": 1533546730775117800   } ]   }


My question is: why is only one doc found? There are two docs among all five that contain 'ibm'.


Re: Facet ignoring repeated word

2016-05-06 Thread Ahmet Arslan
Hi Rajesh,

Can you please explain what you mean by "tag cloud"?
How is it related to a query?
Please explain your requirements.

Ahmet



On Friday, May 6, 2016 8:44 AM, "G,"  wrote:
Hi,

Can you please help? If there is a solution then it will be easy; otherwise I have 
to create a Python script that processes the results from TermVectorComponent, 
groups them by word across documents, and finds the word counts. The Python 
script will accept the exported Solr result as input.

Thanks
Rajesh



CEB India Private Limited. Registration No: U741040HR2004PTC035324. Registered 
office: 6th Floor, Tower B, DLF Building No.10 DLF Cyber City, Gurgaon, 
Haryana-122002, India.

This e-mail and/or its attachments are intended only for the use of the 
addressee(s) and may contain confidential and legally privileged information 
belonging to CEB and/or its subsidiaries, including SHL. If you have received 
this e-mail in error, please notify the sender and immediately, destroy all 
copies of this email and its attachments. The publication, copying, in whole or 
in part, or use or dissemination in any other way of this e-mail and 
attachments by anyone other than the intended person(s) is prohibited.


-Original Message-
From: G, Rajesh [mailto:r...@cebglobal.com]
Sent: Thursday, May 5, 2016 4:29 PM
To: Ahmet Arslan ; solr-user@lucene.apache.org; 
erickerick...@gmail.com
Subject: RE: Facet ignoring repeated word

Hi,

TermVectorComponent works. I am able to find the repeating words within the 
same document, which faceting was not able to do. The problem I see is that 
TermVectorComponent produces results per document, and I have to combine the 
counts, i.e. the count of the word "my" is 6 across the list of documents. Can you 
please suggest a solution to group counts by word across documents? Basically we 
want to build a word cloud from the Solr result.


[term-vector response snippet; the XML tags were lost in the archive. Recoverable values: document 1675 with term frequencies 4 and 2.]
http://localhost:8182/solr/dev/tvrh?q=*:*&tv=true&tv.fl=comments&tv.tf=true&fl=comments&rows=1000


Hi Erick,
I need the count of repeated words to build word cloud

Thanks
Rajesh




-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Tuesday, May 3, 2016 6:19 AM
To: solr-user@lucene.apache.org; G, Rajesh 
Subject: Re: Facet ignoring repeated word

Hi,

StatsComponent does not respect the query parameter. However you can feed a 
function query (e.g., termfreq) to it.

Instead consider using TermVectors or MLT's interesting terms.


https://cwiki.apache.org/confluence/display/solr/The+Term+Vector+Component
https://cwiki.apache.org/confluence/display/solr/MoreLikeThis

Ahmet


On Monday, May 2, 2016 9:31 AM, "G, Rajesh"  wrote:
Hi Erick/ Ahmet,

Thanks for your suggestion. Can we have a query in TermsComponent like the one 
below? I need the word counts of comments for one question id, not all. When I 
include the query q=questionid=123 I still see the counts for all documents:

http://localhost:8182/solr/dev/terms?terms.fl=comments&terms=true&terms.limit=1000&q=questionid=123

StatsComponent does not support text fields:

Field type textcloud_en{class=org.apache.solr.schema.TextField, analyzer=org.apache.solr.analysis.TokenizerChain, args={positionIncrementGap=100, class=solr.TextField}} is not currently supported


Thanks
Rajesh





-Original Message-
From: Erick Erickson [mailto:e

RE: Facet ignoring repeated word

2016-05-06 Thread G, Rajesh
Hi Ahmet,

Sorry it is Word Cloud  
https://www.google.co.uk/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#newwindow=1&q=word+cloud

We have comments from a survey. We want to build a word cloud using the field 
'comments'.

e.g. for question 1 the comments are:

Comment 1: Projects, technology, features, performance
Comment 2: Too many projects and technology, not enough people to run 
projects

I want to run a query for question 1 that will produce the below result

projects: 3
technology: 2
features: 1
performance: 1
Too: 1
Many: 1
Enough: 1
People: 1
Run: 1
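The desired counts can be reproduced with a plain token count. A minimal sketch, assuming simple lowercased whitespace tokenization rather than Solr's analysis chain:

```python
from collections import Counter

# the two example comments for question 1
comments = [
    "Projects, technology, features, performance",
    "Too many projects and technology, not enough people to run projects",
]

counts = Counter()
for comment in comments:
    # naive tokenization: lowercase, treat commas as whitespace
    for token in comment.lower().replace(",", " ").split():
        counts[token] += 1

print(counts["projects"])    # 3 -- repeats within one comment are counted
print(counts["technology"])  # 2
```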


Facet produces the result but ignores repeated words in a document [projects 
count will be 2 instead of 3]:

projects: 2
technology: 2
features: 1
performance: 1
Too: 1
Many: 1
Enough: 1
People: 1
Run: 1

TermVectorComponent produces the result as expected, but the counts are not 
grouped by word; instead they are grouped by document id.


[term-vector response snippet; the XML tags were lost in the archive. Recoverable values: doc 1 with term frequency 1, doc 2 with term frequency 2.]
I wanted to know if it is possible to produce a result that is grouped by word 
and also does not ignore repeated words within a document. If that is not 
possible, then I have to write a script that takes the above result from Solr, 
groups by word, and sums the counts.
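The grouping step described here is a small aggregation. A minimal sketch, assuming the TermVectorComponent response has already been parsed into one {term: tf} mapping per document (the document keys and numbers below are made up for illustration):

```python
from collections import Counter

# hypothetical parsed term-vector output: per-document term frequencies
per_doc_term_freqs = {
    "doc1": {"projects": 1, "technology": 1, "features": 1, "performance": 1},
    "doc2": {"projects": 2, "technology": 1, "people": 1, "run": 1},
}

# group by word across documents and sum the counts
totals = Counter()
for term_freqs in per_doc_term_freqs.values():
    totals.update(term_freqs)

for word, count in totals.most_common():
    print(word, count)
```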

Thanks
Rajesh






the highlight does not work when query without specified field

2016-05-06 Thread ????????
hi all,


I want to query without specifying a field (like q=java), so I use the 'copyField' 
tag to copy my custom fields to the 'text' field. This works fine, but raises one 
problem: the fields of the returned docs that contain the query keyword are not 
highlighted. I guess this is because the keyword was found in the 'text' field, but 
this is not what I want. Does someone have a good solution? Please give me some help!


thanks

max
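A common approach worth trying here (a sketch; the field name is illustrative): keep searching the catch-all 'text' field, but request highlighting on the original stored fields. hl.requireFieldMatch defaults to false, so a match found via 'text' can still produce highlights in the stored fields:

```
q=java&df=text&hl=true&hl.fl=my_custom_field&hl.requireFieldMatch=false
```

This requires the original fields named in hl.fl to be stored; the 'text' copy target itself typically is not.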

Re-ranking query: issue with sort criteria and how to disable it

2016-05-06 Thread Andrea Gazzarini

Hi guys,
I have a Solr 4.10.4 instance with a RequestHandler that has a 
re-ranking query configured like this:



<lst name="defaults">
  <str name="defType">dismax</str>
  ...
  <str name="rqq">{!boost b=someFunction() v=$q}</str>
  <str name="rq">{!rerank reRankQuery=$rqq reRankDocs=60 reRankWeight=1.2}</str>
  <str name="sort">score desc</str>
</lst>


Everything works until the client sends a sort param that doesn't include the 
score field. So if, for example, the request contains "sort=price asc", then a 
NullPointerException is thrown:

09:46:08,548 ERROR [org.apache.solr.core.SolrCore] java.lang.NullPointerException
[INFO] [talledLocalContainer] at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
[INFO] [talledLocalContainer] at org.apache.solr.search.ReRankQParserPlugin$ReRankCollector.collect(ReRankQParserPlugin.java:263)
[INFO] [talledLocalContainer] at org.apache.solr.search.SolrIndexSearcher.sortDocSet(SolrIndexSearcher.java:1999)
[INFO] [talledLocalContainer] at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1423)
[INFO] [talledLocalContainer] at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:514)
[INFO] [talledLocalContainer] at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:484)
[INFO] [talledLocalContainer] at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
[INFO] [talledLocalContainer] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)


The only way to avoid this exception is to _explicitly_ add the "score desc" 
value to the incoming sort (i.e. sort=price asc, score desc). In this way I get 
no exception. I said "explicitly" because adding an "appends" section in my handler



<lst name="appends">
  <str name="sort">score desc</str>
</lst>


Even though I don't know whether that would solve my problem, in practice it is 
completely ignored (i.e. I'm still getting the NPE above).
However, when I explicitly add "sort=price asc, score desc", as a consequence of 
the re-ranking the top 60 results are still shuffled, although I told Solr to 
"order by price", and that's not what I want.


On top of that I have two questions:

 * Any idea about the exception above?
 * How can I disable the re-ranking query in case the order is not by
   score?

About the second question, I'm thinking of the following solutions, but 
I'm not sure if there's a better way to do it.


1. Create another request handler, which is basically a clone of the 
handler above but without the re-ranking stuff

2. Use local params for the reRankDocs...


<lst name="defaults">
  <str name="defType">dismax</str>
  ...
  <str name="rqq">{!boost b=someFunction() v=$q}</str>
  <str name="rq">{!rerank reRankQuery=$rqq reRankDocs=$rrd reRankWeight=1.2}</str>
  <str name="rrd">60</str>
  <str name="sort">score desc</str>
</lst>


...and have the client (in case of sorting by something different from the score) 
send an additional param "rrd=0". This is working, but I still need to 
explicitly declare "sort=price asc, score desc".


Any thoughts?

Best,
Andrea



Re: Solr 6 / Solrj RuntimeException: First tuple is not a metadata tuple

2016-05-06 Thread deniz
I went on digging and debugged the code, and here is what I got at the point where
it breaks:


 

So basically the tuple doesn't have anything for "isMetadata", hence the null at
that point... Is this a bug, or are there missing configs on the client side or
classpath?



-
Zeki ama calismiyor... Calissa yapar...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-6-Solrj-RuntimeException-First-tuple-is-not-a-metadata-tuple-tp4274451p4275053.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Multi-word Synonyms Solr 4.3.1 does not work

2016-05-06 Thread SRINI SOLR
Hi Reth & All -
Thanks for your quick reply.
As I can see the work-around link given by you is dealing with the
different syntax of the synonyms ... like as below ...


big apple
new york city
city of new york
new york new york
new york ny
ny city
ny ny
new york

But I need the synonyms in the format below (left-hand => right-hand):
test1 test, test2 test, test3 test => movie1 cinema,movie2 cinema,movie3 cinema

Can you please help me out and suggest the approach.


On Fri, May 6, 2016 at 12:13 PM, Reth RM  wrote:

> Right, this is a known issue. There is currently an active jira that you
> may like to watch https://issues.apache.org/jira/browse/SOLR-5379
>
> And other possible workaround is explained here :
>
> https://lucidworks.com/blog/2014/07/12/solution-for-multi-term-synonyms-in-lucenesolr-using-the-auto-phrasing-tokenfilter/
>
> On Fri, May 6, 2016 at 11:51 AM, SRINI SOLR  wrote:
>
> > Hi All -
> > Can you please help me out on the multi-word synonyms with Solr 4.3.1.
> >
> > I am using the synonyms as below 
> >
> > test1,test2 => movie1 cinema,movie2 cinema,movie3 cinema
> >
> > I am able to success with the above syntax  like - if I search for
> > words like test1 or test2  then right hand side multi-word values are
> > shown.
> >
> > But -
> >
> > I have a synonyms like below - multi-word on both the side left-hand and
> > right-hand...
> >
> > test1 test, test2 test, test3 test =>movie1 cinema,movie2 cinema,movie3
> > cinema
> >
> > With the above left-hand multi-word format, it is not working as expected ...
> >
> > Here below is the configuration I am using on query analyzer ...
> >
> >  > ignoreCase="true"  expand="true"
> > tokenizerFactory="solr.KeywordTokenizerFactory"/>
> >
> >
> > Please Help me 
> >
>


Re: Can Highlighting and MoreLikeThis works together in same requestHandler?

2016-05-06 Thread Zheng Lin Edwin Yeo
Does anyone know if this configuration for the MoreLikeThisHandler will
actually work for highlighting? I tried, but it does not work, although the
conversation in the link below says that it can work.
http://grokbase.com/t/lucene/solr-user/094v8xm8h6/term-highlighting-with-morelikethishandler

   

[MoreLikeThisHandler configuration; most XML tags were lost in the archive. Recoverable values include: explicit, 10, json, true, edismax, id, score, the boost string subject^10.0 content^20.0, the field lists content / subject content / id, subject, content, author, tag, and assorted numeric and boolean MLT and highlighting settings.]



Regards,
Edwin

On 5 May 2016 at 14:54, Zheng Lin Edwin Yeo  wrote:

> Hi,
>
> I'm finding out if we could possibly implement highlighting and
> MoreLikeThis (MLT) function in the same requestHandler.
>
> I understand that highlighting uses the normal SearchHandler, while MLT
> uses the MoreLikeThisHandler, and each requestHandler can only have one
> handler class. For this case, we have to implement two different requestHandlers,
> and when a user does a search, we have to send two different queries to Solr.
>
> Is there anyway which we can combine the two together, so that when user
> does a search, the same query can give both the highlight and MLT results?
>
> I'm using Solr 5.4.0.
>
> Regards,
> Edwin
>
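One approach worth testing (a sketch, not verified on 5.4): instead of the MoreLikeThisHandler, register MoreLikeThis as a search component inside a regular SearchHandler, so highlighting and MLT run in the same request (handler name and field names below are illustrative):

```xml
<searchComponent name="mlt" class="solr.MoreLikeThisComponent"/>

<requestHandler name="/mltsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="mlt">true</str>
    <str name="mlt.fl">content</str>
    <str name="hl">true</str>
    <str name="hl.fl">subject content</str>
  </lst>
  <arr name="last-components">
    <str>mlt</str>
  </arr>
</requestHandler>
```

The MLT results then appear in a moreLikeThis section of the normal search response, alongside the highlighting section.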


RE: Query String Limit

2016-05-06 Thread Prasanna S. Dhakephalkar
Hi,

This got resolved. I needed to do 2 things:

1. maxBooleanClauses needed to be raised from its default of 1024 in
solrconfig.xml for all cores.
2. In the jetty.xml file, solr.jetty.request.header.size needed to be raised from
its default of 8192.
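For reference, a sketch of the two changes (the values are illustrative, and the exact jetty.xml element may differ between Solr versions):

```xml
<!-- solrconfig.xml (each core): raise the clause limit above the default 1024 -->
<maxBooleanClauses>10240</maxBooleanClauses>

<!-- jetty.xml: raise the HTTP request header limit above the default 8192 -->
<Set name="requestHeaderSize">
  <Property name="solr.jetty.request.header.size" default="65536"/>
</Set>
```

Very long filter queries can also be sent in a POST body instead of the URL, which avoids the header-size limit altogether.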

Thanks all for giving pointers to come to a solution.

Regards,

Prasanna.

-Original Message-
From: Susmit Shukla [mailto:shukla.sus...@gmail.com] 
Sent: Thursday, May 5, 2016 11:31 AM
To: solr-user@lucene.apache.org
Subject: Re: Query String Limit

Hi Prasanna,

What is the exact number you set it to?
What error did you get on solr console and in the solr logs?
Did you reload the core/restarted solr after bumping up the solrconfig?

Thanks,
Susmit

On Wed, May 4, 2016 at 9:45 PM, Prasanna S. Dhakephalkar < 
prasann...@merajob.in> wrote:

> Hi
>
> We had increased the maxBooleanClauses to a large number, but it did
> not work
>
> Here is the query
>
>
> http://localhost:8983/solr/collection1/select?fq=record_id%3A(604929+5
> 04197+
>
> 500759+510957+624719+524081+544530+375687+494822+468221+553049+441998+495212
>
> +462613+623866+344379+462078+501936+189274+609976+587180+620273+479690+60601
>
> 8+487078+496314+497899+374231+486707+516582+74518+479684+1696152+1090711+396
>
> 784+377205+600603+539686+550483+436672+512228+1102968+600604+487699+612271+4
>
> 87978+433952+479846+492699+380838+412290+487086+515836+487957+525335+495426+
>
> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+1
> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+1
> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+0
> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+8
> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+2
> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+9
> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+9
>
> +498818+528683+530270+595087+468595+585998+487888+600612+515884+455568+60643
>
> 8+526281+497992+460147+587530+576456+526021+790508+486148+469160+365923+4846
>
> 54+510829+488792+610933+254610+632700+522376+594418+514817+439283+1676569+52
>
> 4031+431557+521628+609255+627205+1255921+57+477017+519675+548373+350309+
>
> 491176+524276+570935+549458+495765+512814+494722+382249+619036+477309+487718
>
> +470604+514622+1240902+570607+613830+519130+479708+630293+496994+623870+5706
>
> 72+390434+483496+609115+490875+443859+292168+522383+501802+606498+596773+479
>
> 881+486020+488654+490422+512636+495512+489480+626269+614618+498967+476988+47
>
> 7608+486568+270095+295480+478367+607120+583892+593474+494373+368030+484522+5
>
> 01183+432822+448109+553418+584084+614868+486206+481014+495027+501880+479113+
>
> 615208+488161+512278+597663+569409+139097+489490+584000+493619+607479+281080
>
> +518617+518803+487896+719003+584153+484341+505689+278177+539722+548001+62529
>
> 6+1676456+507566+619039+501882+530385+474125+293642+612857+568418+640839+519
>
> 893+524335+612859+618762+479460+479719+593700+573677+525991+610965+462087+52
>
> 1251+501197+443642+1684784+533972+510695+475499+490644+613829+613893+479467+
>
> 542478+1102898+499230+436921+458632+602303+488468+1684407+584373+494603+4992
>
> 45+548019+600436+606997+59+503156+440428+518759+535013+548023+494273+649
>
> 062+528704+469282+582249+511250+496466+497675+505937+489504+600444+614240+19
>
> 35577+464232+522398+613809+1206232+607149+607644+498059+506810+487115+550976
>
> +638174+600849+525655+625011+500082+606336+507156+487887+333601+457209+60111
>
> 0+494927+1712081+601280+486061+501558+600451+263864+527378+571918+472415+608
>
> 130+212386+380460+590400+478850+631886+486782+608013+613824+581767+527023+62
>
> 3207+607013+505819+485418+486786+537626+507047+92+527473+495520+553141+5
>
> 17837+497295+563266+495506+532725+267057+497321+453249+524341+429654+720001+
>
> 539946+490813+479491+479628+479630+1125985+351147+524296+565077+439949+61241
>
> 3+495854+479493+1647796+600259+229346+492571+485638+596394+512112+477237+600
>
> 459+263780+704068+485934+450060+475944+582280+488031+1094010+1687904+539515+
>
> 525820+539516+505985+600461+488991+387733+520928+362967+351847+531586+616101
>
> +479925+494156+511292+515729+601903+282655+491244+610859+486081+325500+43639
>
> 7+600708+523445+480737+486083+614767+486278+1267655+484845+495145+562624+493
>
> 381+8060+638731+501347+565979+325132+501363+268866+614113+479646+1964487+631
>
> 934+25717+461612+376451+513712+527557+459209+610194+1938903+488861+426305+47
>
> 7676+1222682+1246647+567986+501908+791653+325802+498354+435156+484862+533068
>
> +339875+395827+475148+331094+528741+540715+623480+416601+516419+600473+62563
>
> 2+480570+447412+449778+503316+492365+563298+486361+500907+514521+138405+6123
>
> 27+495344+596879+524918+474563+47273+514739+553189+548418+448943+450612+6006
>
> 78+484753+485302+271844+474199+487922+473784+431524+535371+513583+514746+612
>
> 534+327470+485855+517878+384102+485856+612768+494791+504840+601330+493551+55
>
> 8620+540131+479809+394179

Oddity with importing documents...

2016-05-06 Thread Betsey Benagh
Since it appears that using a recent version of Tika with Solr is not really 
feasible, I'm trying to run Grobid on my files, and then import the
corresponding XML into Solr.

I don't see any errors on the post:

bba0124$ bin/post -c lrdtest ~/software/grobid/out/021002_1.tei.xml
/Library/Java/JavaVirtualMachines/jdk1.8.0_71.jdk/Contents/Home/bin/java
-classpath /Users/bba0124/software/solr-5.5.0/dist/solr-core-5.5.0.jar
-Dauto=yes -Dc=lrdtest -Ddata=files org.apache.solr.util.SimplePostTool
/Users/bba0124/software/grobid/out/021002_1.tei.xml
SimplePostTool version 5.0.0
Posting files to [base] url http://localhost:8983/solr/lrdtest/update...
Entering auto mode. File endings considered are
xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,r
tf,htm,html,txt,log
POSTing file 021002_1.tei.xml (application/xml) to [base]
1 files indexed.
COMMITting Solr index changes to
http://localhost:8983/solr/lrdtest/update...
Time spent: 0:00:00.027

But the documents don't seem to show up in the index, either.


Additionally, if I try uploading the documents using the web UI, they
appear to upload successfully,

Response:{
  "responseHeader": {
"status": 0,
"QTime": 7
  }
}


But aren't in the index.

What am I missing?



Re: Filter queries & caching

2016-05-06 Thread Shawn Heisey
On 5/5/2016 2:44 PM, Jay Potharaju wrote:
> Are you suggesting rewriting it like this ?
> fq=filter(fromfield:[* TO NOW/DAY+1DAY]&& tofield:[NOW/DAY-7DAY TO *] )
> fq=filter(type:abc)
>
> Is this a better use of the cache as supposed to fq=fromfield:[* TO
> NOW/DAY+1DAY]&& tofield:[NOW/DAY-7DAY TO *] && type:"abc"

I keep writing emails and forgetting to send them.  Supplementing the
excellent information you've already gotten:

Because all three clauses are ANDed together, what I would suggest doing
is three filter queries:

fq=fromfield:[* TO NOW/DAY+1DAY]
fq=tofield:[NOW/DAY-7DAY TO *]
fq=type:abc

Whether or not to split your fq like this will depend on how you use
filters, and how much memory you can let them use.  With three separate
fq parameters, you'll get three cache entries in filterCache from the
one query.  If the next query changes only one of those filters to
something that's not in the cache yet, but leaves the other two alone,
then Solr can get the results from the cache for two of them, and then
will only need to run the query for one of them, saving time and system
resources.

I removed the quotes from "abc" because for that specific example,
quotes are not necessary.  For more complex information than abc, quotes
might be important.  Experiment, and use what gets you the results you want.
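As a sketch of the difference, the suggested request would carry three separate fq parameters rather than one combined clause (the q value here is just a placeholder):

```python
from urllib.parse import urlencode

# three separate fq clauses: each gets its own filterCache entry
params = [
    ("q", "*:*"),
    ("fq", "fromfield:[* TO NOW/DAY+1DAY]"),
    ("fq", "tofield:[NOW/DAY-7DAY TO *]"),
    ("fq", "type:abc"),
]
query_string = urlencode(params)
print(query_string)
```

If only one of the three filters changes in the next query, the other two are served from the cache.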

Thanks,
Shawn



Re: Filter queries & caching

2016-05-06 Thread Jay Potharaju
Thanks Shawn, Erick & Ahmet, this was very helpful.



Re: Re-ranking query: issue with sort criteria and how to disable it

2016-05-06 Thread Joel Bernstein
I would consider the NPE when sort by score is not included a bug. There is
the work around, that you mentioned, which is to have a compound sort which
includes score.

The second issue, disabling the ReRanker when someone doesn't include a sort by
score, would be a new feature of the ReRanker. I think it's a good idea, but it's
not implemented yet.

I'm not sure if anyone has any ideas about conditionally adding the
ReRanker using configurations?

Joel Bernstein
http://joelsolr.blogspot.com/



Re: id field always stored?

2016-05-06 Thread Siddhartha Singh Sandhu
Solr 6. Thank you that was what I was looking for.

On Fri, May 6, 2016 at 1:04 AM, Alexandre Rafalovitch 
wrote:

> Solr 6 or Solr 5.5, right?
>
> docValues now return the values, even if stored=false. That's probably
> what you are hitting. Check release notes (under 5.5 I believe) for
> more details.
>
> Regards,
>Alex.
> 
> Newsletter and resources for Solr beginners and intermediates:
> http://www.solr-start.com/
>
>
> On 6 May 2016 at 06:30, Siddhartha Singh Sandhu 
> wrote:
> > Hi,
> >
> > 1. I was doing some exploration and wanted to know if
> > the id field is always stored even when I set
> stored
> > = false.
> >
> >
> > * > multiValued="false" stored="false"/>*
> >
> > 2. Also, even though I removed dynamic fields, anything tagged *_id is
> > getting stored despite marking that field stored = false.
> >
> > * indexed="true"
> > required="true" stored="false"/>*
> >
> > Where string is defined as:
> >
> > * sortMissingLast="true"
> > docValues="true" />*
> >
> > Regards,
> >
> > Sid.
>


Re: Oddity with importing documents...

2016-05-06 Thread Shawn Heisey
On 5/6/2016 6:38 AM, Betsey Benagh wrote:
> Since it appears that using a recent version of Tika with Solr is not really 
> feasible, I'm trying to run Grobid on my files, and then import the
> corresponding XML into Solr.
>
> I don't see any errors on the post:
>
> bba0124$ bin/post -c lrdtest ~/software/grobid/out/021002_1.tei.xml
> /Library/Java/JavaVirtualMachines/jdk1.8.0_71.jdk/Contents/Home/bin/java
> -classpath /Users/bba0124/software/solr-5.5.0/dist/solr-core-5.5.0.jar
> -Dauto=yes -Dc=lrdtest -Ddata=files org.apache.solr.util.SimplePostTool
> /Users/bba0124/software/grobid/out/021002_1.tei.xml
> SimplePostTool version 5.0.0
> Posting files to [base] url http://localhost:8983/solr/lrdtest/update...
> Entering auto mode. File endings considered are
> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,r
> tf,htm,html,txt,log
> POSTing file 021002_1.tei.xml (application/xml) to [base]
> 1 files indexed.
> COMMITting Solr index changes to
> http://localhost:8983/solr/lrdtest/update...
> Time spent: 0:00:00.027
>
> But the documents don't seem to show up in the index, either.
>
>
> Additionally, if I try uploading the documents using the web UI, they
> appear to upload successfully,
>
> Response:{
>   "responseHeader": {
> "status": 0,
> "QTime": 7
>   }
> }
>
> But aren't in the index.
>
> What am I missing?

The way that you have used bin/post assumes that the XML is in the Solr
xml update format.  Is your XML file in that format, or is it something
else generated by Tika?  A 'bad' XML file will not necessarily throw an
error, it might simply be ignored because it does not contain any
actions for Solr to process.

https://wiki.apache.org/solr/UpdateXmlMessages
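For reference, a minimal document in that format looks like the sketch below (the field names and values are made up for illustration); TEI XML from Grobid looks nothing like this, which would explain the silent no-op:

```xml
<add>
  <doc>
    <field name="id">021002_1</field>
    <field name="title">Example title pulled out of the TEI file</field>
  </doc>
</add>
```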

If it's some other kind of XML data generated by Tika, then I am not
sure what you need to do in order to get the information into Solr. 
Perhaps it needs to be sent through the /update/extract handler (instead
of /update), or maybe you will need to use DIH to run it through the
XPathEntityProcessor.

Thanks,
Shawn



Re: BigDecimal Solr Field in schema

2016-05-06 Thread Shawn Heisey
On 5/5/2016 11:22 PM, Roshan Kamble wrote:
> I am using Solr 6.0.0 in cloud mode and have a requirement to support all
> numbers as BigDecimal.
>
> Does anyone know which solr field type should be used for BigDecimal?
>
> I tried using DoubleTrieField but it does not meet the requirement and rounds
> very big numbers after approximately 16 digits.

Solr has built-in support for the basic Java numeric types -- the ones
that don't need an "import" statement to use, like Integer, Float,
Double, etc.  BigDecimal is in the java.math package and requires an
import to use.

For something that is not covered by the built-in field classes, you
will need a custom solr schema class that knows how to handle your data,
and that class will need to use Lucene classes (perhaps those will need
to be custom too) to read and write the information in the Lucene index.

There are plenty of posts/issues about this that went nowhere:

Here's an old stackoverflow post about doing it in Lucene -- but I am
not sure that this work actually went anywhere:

http://stackoverflow.com/questions/2730200/how-to-index-bigdecimal-values-in-lucene-3-0-1

The lucidimagination.com links in that SO post do not work any more.

Another issue:

https://jira.kuali.org/browse/KITSKMS-1058

Yet another email (on this mailing list):

http://osdir.com/ml/solr-user.lucene.apache.org/2011-09/msg00636.html

Thanks,
Shawn



Re: OOM script executed

2016-05-06 Thread Shawn Heisey
On 5/5/2016 11:42 PM, Bastien Latard - MDPI AG wrote:
> So if I run the two following requests, it will only store once 7.5Mo,
> right?
> - select?q=*:*&fq=bPublic:true&rows=10
> - select?q=field:my_search&fq=bPublic:true&rows=10

That is correct.

Thanks,
Shawn



Re: Filter queries & caching

2016-05-06 Thread Shawn Heisey
On 5/6/2016 7:19 AM, Shawn Heisey wrote:
> With three separate
> fq parameters, you'll get three cache entries in filterCache from the
> one query.

One more tidbit of information related to this:

When you have multiple filters and they aren't cached, I am reasonably
certain that they run in parallel.  Instead of one complex filter, you
would have three simple filters running simultaneously.  For low to
medium query loads on a server with a whole bunch of CPUs, where there
is plenty of spare CPU power, this can be a real gain in performance ...
but if the query load is really high, it might be a bad thing.

Thanks,
Shawn
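As a rough sketch of the trade-off (the filters and query here are hypothetical), the only difference on the wire is whether the clauses arrive as separate fq parameters or as one combined one:

```python
from urllib.parse import urlencode

# Three separate fq parameters: three filterCache entries,
# and the uncached filters can run in parallel.
separate = urlencode([
    ("q", "*:*"),
    ("fq", "brand_s:ibm"),
    ("fq", "price:[0 TO 100]"),
    ("fq", "inStock:true"),
])

# One combined fq: a single (more complex) filter and one cache entry.
combined = urlencode([
    ("q", "*:*"),
    ("fq", "brand_s:ibm AND price:[0 TO 100] AND inStock:true"),
])

print(separate.count("fq="), combined.count("fq="))  # 3 1
```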



Re: fq behavior...

2016-05-06 Thread Shawn Heisey
On 5/6/2016 12:07 AM, Bastien Latard - MDPI AG wrote:
> Thank you Susmit, so the answer is:
> fq queries are by default run before the main query.

Queries in fq parameters are normally executed in parallel with the main
query, unless they are a postfilter.  I am not sure that the standard
parser supports being run as a postfilter.  Some parsers (like geofilt)
do support that.

Susmit already gave you this link where some of that is explained:

http://yonik.com/advanced-filter-caching-in-solr/

Thanks,
Shawn



Re: Solr 6 / Solrj RuntimeException: First tuple is not a metadata tuple

2016-05-06 Thread Joel Bernstein
It appears that the /sql handler is not sending the metadata Tuple.
According to the log the parameter includeMetadata=true is being sent. This
should trigger the sending of the metadata Tuple.

Is it possible that you are using a pre 6.0 release version of Solr from
the master branch? The JDBC client appears to be from the 6.0 release but
the server could be an older version.

The reason I ask this, is that older versions of the /sql handler don't
have the metadata Tuple logic. So the query would be processed correctly
but the metadata Tuple wouldn't be there.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, May 6, 2016 at 4:50 AM, deniz  wrote:

> I went on digging and debug the code and here is what I got on the point it
> breaks:
>
>
> 
>
> so basically the tuple doesn't have anything for "isMetadata", hence getting
> null at that point... is this a bug, or are there missing configs on the
> client side or classpath?
>
>
>
> -
> Zeki ama calismiyor... Calissa yapar...
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr-6-Solrj-RuntimeException-First-tuple-is-not-a-metadata-tuple-tp4274451p4275053.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


SolrCloud slower than standalone Solr

2016-05-06 Thread Bismaya Vikash
Hi everyone,

We are trying to migrate from a standalone Solr 4.7.0 with 1 collection (no sharding, no replication) and 3GB of heap memory to a SolrCloud configuration with a 3-Zookeeper ensemble and 2 Solr instances with 1 collection (2 shards per collection and a replication factor of 2). We have the two Solr instances on two different servers A and B, with 1 zookeeper on server A and the other 2 on server B. We have assigned 4GB of heap memory to each Solr instance. Below are the config files I am using.

solrconfig.xml (attached)

Zookeeper config:
tickTime=2000
initLimit=10
syncLimit=5

We have less than 400,000 documents indexed and we are using a singleton CloudSolrClient instance from Solrj 5.4.1 to index/search over the collection. However, our search/indexing performance is slower than the standalone version:

Standalone Solr: avg. 1000 ms
SolrCloud: varies from 4000 to 15000 ms

Following are our queries:
1. Is SolrCloud slower compared to standalone Solr?
2. How can we improve the search time to less than 1000 ms?
3. Should we use a singleton CloudSolrClient or create a new instance for every new request?

Thanks and Regards,
Bismaya

Re: Advice to add additional non-related fields to a collection or create a subset of it?

2016-05-06 Thread Erick Erickson
Denormalizing the data is usually the first thing to try. That's
certainly the preferred option if it doesn't bloat the index
unacceptably.

But my real question is what have you done to try to figure out _why_
it's slow? Do you have some loop
like
for (each found document)
   extract all the supplier IDs and query Solr for them)

? That's a fundamental design decision that will be expensive.

Have you examined the time each query takes to see if Solr is really
the bottleneck or whether it's "something else"? Mind you, I have no
clue what "something else" is here

Do you ever return lots of rows (i.e. thousands)?

Solr serves queries very quickly, so I'd concentrate on identifying what
is slow before jumping to a solution

Best,
Erick

On Wed, May 4, 2016 at 10:28 PM, Derek Poh  wrote:
> Hi
>
> We have a "product" collection and a "supplier" collection.
> The "product" collection contains products information and "supplier"
> collection contains the product's suppliers information.
> We have a subsidiary page that query on "product" collection for the search.
> The display result include product and supplier information.
> This page will query the "product" collection to get the matching product
> records.
> From this query a list of the matching product's supplier id is extracted
> and used in a filter query against the "supplier" collection to get the
> necessary supplier's information.
>
> The loading of this page is very slow, it leads to timeout at times as well.
> Beside looking at tweaking the codes of the page we are also looking at what
> tweaking can be done on solr side. Reducing the number of queries generated
> by this page was one of the options to try.
>
> The main "product" collection is also use by our site main search page and
> other subsidiary pages as well. So the query load on it is substantial.
> It has about 6.5 million documents and index size of 38-39 GB.
> It is setup as 1 shard with 5 replicas. Each replica is on it's own server.
> Total of 5 servers.
> There are other smaller collections with similar 1 shard 5 replicas setup
> residing on these servers as well.
>
> I am thinking of either
> 1. Index supplier information into the "product" collection.
> 2. Create another similar "product" collection for this page to use. This
> collection will have lesser product fields and will include the required
> supplier fields. But the number of documents in it will be the same as the
> main "product" collection. The index size will be smaller though.
>
> With either 2 options we do not need to query "supplier" collection. So
> there is one less query and hopefully it will improve the performance of
> this page.
>
> What is the advise between the 2 options?
> Any other advice or options?
>
> Derek
>
> --
> CONFIDENTIALITY NOTICE
> This e-mail (including any attachments) may contain confidential and/or
> privileged information. If you are not the intended recipient or have
> received this e-mail in error, please inform the sender immediately and
> delete this e-mail (including any attachments) from your computer, and you
> must not use, disclose to anyone else or copy this e-mail (including any
> attachments), whether in whole or in part.
> This e-mail and any reply to it may be monitored for security, legal,
> regulatory compliance and/or other appropriate reasons.


Re: fq behavior...

2016-05-06 Thread Erick Erickson
From Yonik's blog:
"By default, Solr resolves all of the filters before the main query"

By definition, the non-cached fq clause _must_ be
executed over the entire data set in order to be
cached. Otherwise, how could the next query
that uses an identical fq clause make use of the
cached value?

If cache=false, it's a different story, as per Yonik's
blog.

On Fri, May 6, 2016 at 7:25 AM, Shawn Heisey  wrote:
> On 5/6/2016 12:07 AM, Bastien Latard - MDPI AG wrote:
>> Thank you Susmit, so the answer is:
>> fq queries are by default run before the main query.
>
> Queries in fq parameters are normally executed in parallel with the main
> query, unless they are a postfilter.  I am not sure that the standard
> parser supports being run as a postfilter.  Some parsers (like geofilt)
> do support that.
>
> Susmit already gave you this link where some of that is explained:
>
> http://yonik.com/advanced-filter-caching-in-solr/
>
> Thanks,
> Shawn
>


Re: Oddity with importing documents...

2016-05-06 Thread Erick Erickson
Shawn's spot on in identifying your problem, I think.

Actually, I'm not sure what happens if you just replace the Tika jars
in Solr. I actually doubt it'd work, but it _might_.

Personally I'm not a great fan of using SolrCell in production,
you're putting all the work on the Solr server that's also indexing
and serving queries. With what's actually not very much effort
you can use Java/SolrJ to parse the docs on as many
clients as you want and send them to Solr, see:
http://searchhub.org/2012/02/14/indexing-with-solrj/

Best,
Erick


On Fri, May 6, 2016 at 7:20 AM, Shawn Heisey  wrote:
> On 5/6/2016 6:38 AM, Betsey Benagh wrote:
>> Since it appears that using a recent version of Tika with Solr is not really 
>> feasible, I'm trying to run Grobid on my files, and then import the
>> corresponding XML into Solr.
>>
>> I don't see any errors on the post:
>>
>> bba0124$ bin/post -c lrdtest ~/software/grobid/out/021002_1.tei.xml
>> /Library/Java/JavaVirtualMachines/jdk1.8.0_71.jdk/Contents/Home/bin/java
>> -classpath /Users/bba0124/software/solr-5.5.0/dist/solr-core-5.5.0.jar
>> -Dauto=yes -Dc=lrdtest -Ddata=files org.apache.solr.util.SimplePostTool
>> /Users/bba0124/software/grobid/out/021002_1.tei.xml
>> SimplePostTool version 5.0.0
>> Posting files to [base] url http://localhost:8983/solr/lrdtest/update...
>> Entering auto mode. File endings considered are
>> xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,r
>> tf,htm,html,txt,log
>> POSTing file 021002_1.tei.xml (application/xml) to [base]
>> 1 files indexed.
>> COMMITting Solr index changes to
>> http://localhost:8983/solr/lrdtest/update...
>> Time spent: 0:00:00.027
>>
>> But the documents don't seem to show up in the index, either.
>>
>>
>> Additionally, if I try uploading the documents using the web UI, they
>> appear to upload successfully,
>>
>> Response:{
>>   "responseHeader": {
>> "status": 0,
>> "QTime": 7
>>   }
>> }
>>
>> But aren't in the index.
>>
>> What am I missing?
>
> The way that you have used bin/post assumes that the XML is in the Solr
> xml update format.  Is your XML file in that format, or is it something
> else generated by Tika?  A 'bad' XML file will not necessarily throw an
> error, it might simply be ignored because it does not contain any
> actions for Solr to process.
>
> https://wiki.apache.org/solr/UpdateXmlMessages
>
> If it's some other kind of XML data generated by Tika, then I am not
> sure what you need to do in order to get the information into Solr.
> Perhaps it needs to be sent through the /update/extract handler (instead
> of /update), or maybe you will need to use DIH to run it through the
> XPathEntityProcessor.
>
> Thanks,
> Shawn
>


Re: query action with wrong result size zero

2016-05-06 Thread Erick Erickson
bq: does this means that different kinds of docs can not be put into
the same solr core

You can certainly put different kinds of docs in the same core,
you just have to search them appropriately, something like
q=field1:value OR field2:value

Say doc1 had "value" in field1 (but did  not have field2)
and doc2 had "value" in field2 (but did not have field1)

Then the above query would return both docs.

However, this may have surprising results since presumably
the different "types" of docs represent very different things.
Let's say you have "people" and "places" docs. Ogden is a
surname, but there is also a city in Utah called "Ogden".
A search like above might return both and if the user expected
to be searching places they'd be surprised to see a person.

So, to sum up there's no restriction on having different types
of docs with different fields in Solr, you just have to search
them appropriately (and so the users get what they expect).

Very often, people will put a "type" field in the doc and restrict
what kinds of docs are returned with an fq clause (fq=type:people
in the above example for instance) when appropriate.
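A toy sketch of the idea in plain Python (the doc shapes are invented), showing why the type filter avoids the Ogden-the-person vs. Ogden-the-city surprise:

```python
docs = [
    {"type": "people", "surname": "Ogden"},
    {"type": "places", "city": "Ogden"},
]

def search(term, doc_type=None):
    # stand-in for q=surname:term OR city:term, with fq=type:doc_type
    hits = [d for d in docs if term in (d.get("surname"), d.get("city"))]
    if doc_type is not None:
        hits = [d for d in hits if d["type"] == doc_type]
    return hits

print(len(search("Ogden")))                     # 2 - person and place
print(len(search("Ogden", doc_type="people")))  # 1 - just the person
```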

Best,
Erick

On Thu, May 5, 2016 at 10:58 PM, 梦在远方  wrote:
> thank you, Jay Potharaju
>
>
> I made a discovery: in the same solr core, I put two kinds of docs, which
> means that they do not have the same fields. Does this mean that different
> kinds of docs cannot be put into the same solr core?
>
>
> thanks!
> 
> max mi
>
>
>
>
> -- Original Message --
> From: "Erick Erickson";;
> Sent: Friday, May 6, 2016, 12:14 PM
> To: "solr-user";
>
> Subject: Re: query action with wrong result size zero
>
>
>
> Please show us:
> 1> a sample doc that you expect to be returned
> 2> the results of adding '&debug=query' to the URL
> 3> the schema definition for the field you're querying against.
>
> It is likely that your query isn't quite what you think it is, is going
> against a different field than you think or your schema isn't
> quite doing what you think...
>
> On Thu, May 5, 2016 at 9:40 AM, Jay Potharaju  wrote:
>> Can you check if the field you are searching on is case sensitive? You can
>> quickly test it by copying the exact contents of the brand field into your
>> query and comparing it against the query you have posted above.
>>
>> On Thu, May 5, 2016 at 8:57 AM, mixiangliu <852262...@qq.com> wrote:
>>
>>>
>>> I found a strange thing with a solr query: when I set the value of the query
>>> field like "brand:amd", the size of the query result is zero, but the real data
>>> is not zero. Can somebody tell me why? Thank you very much!!
>>> My english is not very good, I wish somebody understands my words!
>>>
>>
>>
>>
>> --
>> Thanks
>> Jay Potharaju


Re: the highlight does not work when query without specified field

2016-05-06 Thread Erick Erickson
You need to give more details. How is your highlighter
defined? Does it reference the "text" field? And is the
"text" field stored (it must be to be highlighted)?

Details really matter for these questions.

Best,
Erick

On Fri, May 6, 2016 at 12:57 AM, 梦在远方  wrote:
> hi all,
>
>
> I want to query without a specified field (like q=java), so I use the 'copyField'
> tag to copy my custom fields to the 'text' field. This works fine, but raises
> one problem: the field of the returned doc which contains the query keyword does
> not get highlighted. I guess this is because the keyword was found in the 'text' field, but
> this is not what I want. Does someone have a good solution? Please give me
> some help!
>
>
> thanks
> 
> max


Re: Query String Limit

2016-05-06 Thread Erick Erickson
By the way, this is the use-case for the TermsQueryParser
rather than a standard clause, see:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

I _think_ that this doesn't trip the maxBooleanClauses bits...

Best,
Erick

On Fri, May 6, 2016 at 5:01 AM, Prasanna S. Dhakephalkar
 wrote:
> Hi,
>
> This got resolved. Needed to do 2 things
>
> 1. maxBooleanClauses needed to be set to a large value (up from 1024) in
> solrconfig.xml for all cores.
> 2. In the jetty.xml file, solr.jetty.request.header.size needed to be set to a
> higher value (up from 8192).
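(For anyone hitting the same wall, the two changes might look roughly like this — the 65536 values are illustrative; size them to your own queries:)

```xml
<!-- solrconfig.xml, for every core -->
<maxBooleanClauses>65536</maxBooleanClauses>

<!-- jetty.xml: raise the request header size from the 8192 default -->
<Set name="requestHeaderSize">
  <Property name="solr.jetty.request.header.size" default="65536" />
</Set>
```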
>
> Thanks all for giving pointers to come to a solution.
>
> Regards,
>
> Prasanna.
>
> -Original Message-
> From: Susmit Shukla [mailto:shukla.sus...@gmail.com]
> Sent: Thursday, May 5, 2016 11:31 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Query String Limit
>
> Hi Prasanna,
>
> What is the exact number you set it to?
> What error did you get on solr console and in the solr logs?
> Did you reload the core/restarted solr after bumping up the solrconfig?
>
> Thanks,
> Susmit
>
> On Wed, May 4, 2016 at 9:45 PM, Prasanna S. Dhakephalkar < 
> prasann...@merajob.in> wrote:
>
>> Hi
>>
>> We had increased the maxBooleanClauses to a large number, but it did
>> not work
>>
>> Here is the query
>>
>>
>> http://localhost:8983/solr/collection1/select?fq=record_id%3A(604929+5
>> 04197+
>>
>> 500759+510957+624719+524081+544530+375687+494822+468221+553049+441998+495212
>>
>> +462613+623866+344379+462078+501936+189274+609976+587180+620273+479690+60601
>>
>> 8+487078+496314+497899+374231+486707+516582+74518+479684+1696152+1090711+396
>>
>> 784+377205+600603+539686+550483+436672+512228+1102968+600604+487699+612271+4
>>
>> 87978+433952+479846+492699+380838+412290+487086+515836+487957+525335+495426+
>>
>> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+1
>> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+1
>> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+0
>> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+8
>> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+2
>> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+9
>> 619724+49726+444558+67422+368749+630542+473638+613887+1679503+509367+9
>>
>> +498818+528683+530270+595087+468595+585998+487888+600612+515884+455568+60643
>>
>> 8+526281+497992+460147+587530+576456+526021+790508+486148+469160+365923+4846
>>
>> 54+510829+488792+610933+254610+632700+522376+594418+514817+439283+1676569+52
>>
>> 4031+431557+521628+609255+627205+1255921+57+477017+519675+548373+350309+
>>
>> 491176+524276+570935+549458+495765+512814+494722+382249+619036+477309+487718
>>
>> +470604+514622+1240902+570607+613830+519130+479708+630293+496994+623870+5706
>>
>> 72+390434+483496+609115+490875+443859+292168+522383+501802+606498+596773+479
>>
>> 881+486020+488654+490422+512636+495512+489480+626269+614618+498967+476988+47
>>
>> 7608+486568+270095+295480+478367+607120+583892+593474+494373+368030+484522+5
>>
>> 01183+432822+448109+553418+584084+614868+486206+481014+495027+501880+479113+
>>
>> 615208+488161+512278+597663+569409+139097+489490+584000+493619+607479+281080
>>
>> +518617+518803+487896+719003+584153+484341+505689+278177+539722+548001+62529
>>
>> 6+1676456+507566+619039+501882+530385+474125+293642+612857+568418+640839+519
>>
>> 893+524335+612859+618762+479460+479719+593700+573677+525991+610965+462087+52
>>
>> 1251+501197+443642+1684784+533972+510695+475499+490644+613829+613893+479467+
>>
>> 542478+1102898+499230+436921+458632+602303+488468+1684407+584373+494603+4992
>>
>> 45+548019+600436+606997+59+503156+440428+518759+535013+548023+494273+649
>>
>> 062+528704+469282+582249+511250+496466+497675+505937+489504+600444+614240+19
>>
>> 35577+464232+522398+613809+1206232+607149+607644+498059+506810+487115+550976
>>
>> +638174+600849+525655+625011+500082+606336+507156+487887+333601+457209+60111
>>
>> 0+494927+1712081+601280+486061+501558+600451+263864+527378+571918+472415+608
>>
>> 130+212386+380460+590400+478850+631886+486782+608013+613824+581767+527023+62
>>
>> 3207+607013+505819+485418+486786+537626+507047+92+527473+495520+553141+5
>>
>> 17837+497295+563266+495506+532725+267057+497321+453249+524341+429654+720001+
>>
>> 539946+490813+479491+479628+479630+1125985+351147+524296+565077+439949+61241
>>
>> 3+495854+479493+1647796+600259+229346+492571+485638+596394+512112+477237+600
>>
>> 459+263780+704068+485934+450060+475944+582280+488031+1094010+1687904+539515+
>>
>> 525820+539516+505985+600461+488991+387733+520928+362967+351847+531586+616101
>>
>> +479925+494156+511292+515729+601903+282655+491244+610859+486081+325500+43639
>>
>> 7+600708+523445+480737+486083+614767+486278+1267655+484845+495145+562624+493
>>
>> 381+8060+638731+501347+565979+325132+501363+268866+614113+479646+1964487+631
>>
>> 934+25717+461612+376451+513712+527557+459209+610194+1938903+488861+426305+47
>>
>> 7676+1222682+1246647+567986+501908

Solr 5.4.1 Mergeindexes duplicate rows

2016-05-06 Thread Kalpana
Hello

I am trying to create a new core by merging two indexes. All of them have
the same schema, and the data in the cores does not have duplicates. As soon as I
do a merge I see lots of duplicates. I used this for merging:
http://localhost:8983/solr/admin/cores?action=mergeindexes&core=Sitecore_SharePoint&srcCore=sitecore_web_index&srcCore=SharePoint_All

One core is from Sitecore web index and the other one is a SQL database
table. Both have unique ids set.

My result set


/sitecore/content/spanish/spanish page 1

sitecore://web/{9ee61bd5-6a08-490a-bd9d-50c48a23b518}?lang=en&ver=1&ndx=sitecore_web_index




/sitecore/templates/user defined/niddk/page type/health detail
page/__standard values


sitecore://web/{46b6bcd8-8b29-4e61-8207-058e26bf622c}?lang=en&ver=1&ndx=sitecore_web_index



Diabetes
9105

/health-information/informacion-de-la-salud/diabetes/Pages/default.aspx



/sitecore/content/spanish/spanish page 1

sitecore://web/{9ee61bd5-6a08-490a-bd9d-50c48a23b518}?lang=en&ver=1&ndx=sitecore_web_index




/sitecore/templates/user defined/niddk/page type/health detail
page/__standard values


sitecore://web/{46b6bcd8-8b29-4e61-8207-058e26bf622c}?lang=en&ver=1&ndx=sitecore_web_index



Diabetes
9105

/health-information/informacion-de-la-salud/diabetes/Pages/default.aspx






Any help is greatly appreciated.
Thanks




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-4-1-Mergeindexes-duplicate-rows-tp4275153.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 5.4.1 Mergeindexes duplicate rows

2016-05-06 Thread Shawn Heisey
On 5/6/2016 9:47 AM, Kalpana wrote:
> I am trying to create a new core by merging two indexes. All of them have
> the same schema and data on the cores do not have duplicates. As soon as I
> do a merge I see lots of duplicates. I used this for merging :
> http://localhost:8983/solr/admin/cores?action=mergeindexes&core=Sitecore_SharePoint&srcCore=sitecore_web_index&srcCore=SharePoint_All

Merging indexes happens 100% at the Lucene level.  Lucene does not have
the concept of a uniqueKey -- this is a concept added and enforced by
Solr.  Merging has zero ability to eliminate duplicates.

If the same uniqueKey value is in both indexes, you will have duplicate
records after merging.

The documentation doesn't go into very much detail on this topic, but it
DOES say that the indexes which you are merging must not include
duplicate documents:

https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
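A toy illustration of the difference (plain Python, invented doc shapes): a Lucene-level merge is a bare concatenation, while a Solr-level add treats the uniqueKey as an upsert key:

```python
core1 = [{"_uniquekey": "9105"}, {"_uniquekey": "9106"}]
core2 = [{"_uniquekey": "9105"}, {"_uniquekey": "sitecore://web/..."}]

# Lucene merge: no notion of a uniqueKey, documents are simply appended
merged = core1 + core2
print(len(merged))  # 4 -- "9105" is now in there twice

# Solr add: a later doc with the same uniqueKey replaces the earlier one
deduped = {}
for doc in core1 + core2:
    deduped[doc["_uniquekey"]] = doc
print(len(deduped))  # 3
```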

Thanks,
Shawn



Re: Solr 5.4.1 Mergeindexes duplicate rows

2016-05-06 Thread Kalpana
Thank you for your reply, I did see the website (that's the reason I used merge
indexes). However, the individual cores do not have duplicates and the two cores
don't have common records. So I am not sure why there are duplicates.

One of them is a sitecore core and the other one is a SQL db. They both have
different _uniquekey values. So it looks like I am seeing duplicate rows from the
individual cores when I do the merge.

Core 1
<_uniquekey>9105
<_uniquekey>9106
<_uniquekey>9107

Core 2
<_uniquekey>sitecore://web/{9ee61bd5-6a08-490a-bd9d-50c48a23b518}?lang=en&ver=1&ndx=sitecore_web_index
<_uniquekey>sitecore://web/{46b6bcd8-8b29-4e61-8207-058e26bf622c}?lang=en&ver=1&ndx=sitecore_web_index

Merged core - Core 3
<_uniquekey>9105
<_uniquekey>9105
<_uniquekey>sitecore://web/{9ee61bd5-6a08-490a-bd9d-50c48a23b518}?lang=en&ver=1&ndx=sitecore_web_index
<_uniquekey>sitecore://web/{9ee61bd5-6a08-490a-bd9d-50c48a23b518}?lang=en&ver=1&ndx=sitecore_web_index
<_uniquekey>sitecore://web/{46b6bcd8-8b29-4e61-8207-058e26bf622c}?lang=en&ver=1&ndx=sitecore_web_index
<_uniquekey>9106
<_uniquekey>9107
<_uniquekey>9107

So I am not sure what is going on...

Thanks again!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-4-1-Mergeindexes-duplicate-rows-tp4275153p4275160.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Suggester no results

2016-05-06 Thread Erick Erickson
First off, kudos for providing the details, that really helps!

The root of your problem is that your suggest field has stored="false".
DocumentDictionaryFactory reads through all the
docs in your corpus, extracts the stored data and puts it in the FST. Since
you don't have any stored data your FST is...er...minimal.

I'd also add
suggester_fuzzy_dir
to the searchComponent. You'll find the FST on disk in that directory where it
can be read next time Solr starts up. It is also helpful for figuring out
whether there are suggestions to be had.
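Putting those two fixes together, the relevant pieces might look like this (field and suggester names copied from the question; the directory name is just an example):

```xml
<!-- schema.xml: the dictionary field must be stored -->
<field name="citySuggest" type="string" indexed="true" stored="true"/>

<!-- solrconfig.xml: persist the built FST across restarts -->
<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">citySuggest</str>
    <str name="suggestAnalyzerFieldType">string</str>
    <str name="storeDir">suggester_fuzzy_dir</str>
  </lst>
</searchComponent>
```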

And a minor nit, you probably don't want to specify suggest.dictionary
in your query,
that's already specified in your config.

And it looks like you're alive to the fact that with that setup
capitalization matters,
as does the fact that suggestions are matched from the beginning of the
field...

Best,
Erick

On Thu, May 5, 2016 at 1:05 AM, Grigoris Iliopoulos
 wrote:
> Hi there,
>
> I want to use the Solr suggester component for city names. I have the
> following settings:
> schema.xml
>
> Field definition
>
>  positionIncrementGap="100">
>   
> 
> 
> 
>   
> 
>
> The field i want to apply the suggester on
>
> 
>
> The copy field
>
> 
>
> The field
>
> 
>
> solr-config.xml
>
> 
>   
> true
> 10
> mySuggester
>   
>   
> suggest
>   
> 
>
>
>
> 
>   
> mySuggester
> FuzzyLookupFactory
> DocumentDictionaryFactory
> citySuggest
> string
>   
> 
>
> Then i run
>
> http://localhost:8983/solr/company/suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q=Ath&suggest.build=true
>
> to build the suggest component
>
> Finally i run
>
>
> http://localhost:8983/solr/company/suggest?suggest=true&suggest.dictionary=mySuggester&wt=json&suggest.q=Ath
>
> but i get an empty result set
>
{"responseHeader":{"status":0,"QTime":0},"suggest":{"mySuggester":{"Ath":{"numFound":0,"suggestions":[]}}}}
>
> Are there any obvious mistakes? Any thoughts?


Re: Filter queries & caching

2016-05-06 Thread Jay Potharaju
We have a high query load, and considering that, I think the suggestions made
above will help with performance.
Thanks
Jay

On Fri, May 6, 2016 at 7:26 AM, Shawn Heisey  wrote:

> On 5/6/2016 7:19 AM, Shawn Heisey wrote:
> > With three separate
> > fq parameters, you'll get three cache entries in filterCache from the
> > one query.
>
> One more tidbit of information related to this:
>
> When you have multiple filters and they aren't cached, I am reasonably
> certain that they run in parallel.  Instead of one complex filter, you
> would have three simple filters running simultaneously.  For low to
> medium query loads on a server with a whole bunch of CPUs, where there
> is plenty of spare CPU power, this can be a real gain in performance ...
> but if the query load is really high, it might be a bad thing.
>
> Thanks,
> Shawn
>
>


-- 
Thanks
Jay Potharaju


Re: Passing IDs in query takes more time

2016-05-06 Thread Erick Erickson
Well, you're parsing 80K IDs and forming them into a query. Consider
what has to happen. Even in the very best case of the 
being evaluated first, for every doc that satisfies that clause the inverted
index must be examined 80,000 times to see if that doc matches
one of the IDs in your huge clause for scoring purposes.

You might be better off by moving the 80K list to an fq clause like
fq={!cache=false}docid:(111 222 333).

Additionally, you probably want to use the TermsQueryParser, something like:
fq={!terms f=id cache=false}111,222,333
see:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser
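Building that filter from a programmatic ID list is just a string join — a minimal sketch (field name and IDs are placeholders):

```python
doc_ids = ["111", "222", "333"]

# {!terms} takes a comma-separated list and skips scoring; cache=false
# avoids polluting the filterCache with a one-off 80K-ID filter
fq = "{!terms f=id cache=false}" + ",".join(doc_ids)
print(fq)  # {!terms f=id cache=false}111,222,333
```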

In any case, though, an 80K clause will slow things down considerably.

Best,
Erick

On Thu, May 5, 2016 at 2:42 AM, Bhaumik Joshi  wrote:
> Hi,
>
>
> I am retrieving ids from collection1 based on some query and passing those
> ids as a query to collection2, so the query to collection2, which contains the ids
> in it, takes much more time compared to a normal query.
>
>
> Que. 1 - Why does passing ids in the query take more time compared to a
> normal query, even though we are narrowing the criteria by passing ids?
>
> e.g. query-1: doc_id:(111 222 333 444 ...) AND <query> is slower
> (takes 7-9 sec) than
>
> the <query> alone (700-800 ms). Please note that in this case I am
> passing 80k ids in query-1 and retrieving 250 rows.
>
>
> Que. 2 - Any idea on how I can achieve the above (get ids from one collection
> and pass those ids to the other one) in an efficient manner, or any other way
> to get data from one collection based on the response of another collection?
>
>
> Thanks & Regards,
>
> Bhaumik Joshi


Re: SolrCloud slower than standalone Solr

2016-05-06 Thread Erick Erickson
Without seeing the queries, it's hard to say. There was a faceting
issue with some
Solr versions that you might be hitting. The first thing I'd try is just
straight-up 5.4, non-cloud, non-sharded, to compare against 4.7 and see whether
it's just a difference between 4.7 and 5.4 before SolrCloud has anything to do
with it, then tune if so.
_Then_ go to SolrCloud.

Sharding should _only_ be used when there are too many documents to
fit into a single shard and give you good results. Sharding inevitably
adds overhead
to the query time (although nothing like what you're seeing so that's
a different
problem).

You can easily use SolrCloud with a singly-sharded collection with
replication and get
all the goodness of failover, HA/DR etc. after you determine whether
your perf issues
are related to differences (perhaps you've tuned the 4.7 setup differently?).

Best,
Erick

On Fri, May 6, 2016 at 12:44 AM, Bismaya Vikash  wrote:
> Hi everyone,
>
> We are trying to migrate from a
>
> standalone Solr 4.7.0 with 1 collection (no sharding, no replication) and 3GB
> of heap memory
> to a
> SolrCloud configuration with a 3-Zookeeper ensemble, 2 Solr instances with 1
> collection (2 shards per collection and a replication factor of 2).
> We have the two Solr instances on two different servers A and B, with 1
> zookeeper on server A and other 2 on server B.
> We have assigned 4GB heap memory to each Solr instance.
>
>
> Below are the config files I am using.
>
>
>
> Zookeeper Config:
> tickTime=2000
> initLimit=10
> syncLimit=5
>
> We have less than 400,000 documents indexed and we are using
> a singleton CloudSolrClient instance from Solrj 5.4.1. to index/search over
> the collection.
>
> However, our search/indexing performance is slower than the standalone
> version.
>
>              Standalone Solr   SolrCloud
> Time taken:  avg. 1000 ms      varies from 4000 to 15000 ms
>
> Following are our queries:
>
> 1. Is SolrCloud slower compared to standalone Solr ?
> 2. How can we improve the search time to less than 1000 ms ?
> 3. Should we use a singleton CloudSolrClient or create a new instance for
> every new request ?
>
> Thanks and Regards,
> Bismaya
>


Re: Solr 5.4.1 Mergeindexes duplicate rows

2016-05-06 Thread Erick Erickson
My _guess_ is that you somehow hit the merge multiple times and,
perhaps, interrupted it, and thus don't have complete duplicates.

If we're all talking about the same thing, what you're seeing doesn't
make sense. I'm assuming you're totally sure that a query on

_uniqueKey:9105

will return only one doc from Core1 and 0 from Core2 before the merge?

Best,
Erick

On Fri, May 6, 2016 at 9:33 AM, Kalpana  wrote:
> Thank you for your reply, I did see the website (the reason to use the merge
> indexes). However, the individual cores do not have duplicates and the two
> cores don't have common records. So I am not sure why there are duplicates.
>
> One of them is a sitecore core and the other one is a SQL db. They both have
> different _uniquekey. So looks like just from the individual cores I see
> duplicate rows when I do the merge.
>
> Core 1
> <_uniquekey>9105
> <_uniquekey>9106
> <_uniquekey>9107
>
> Core 2
> <_uniquekey>sitecore://web/{9ee61bd5-6a08-490a-bd9d-50c48a23b518}?lang=en&ver=1&ndx=sitecore_web_index
> <_uniquekey>sitecore://web/{46b6bcd8-8b29-4e61-8207-058e26bf622c}?lang=en&ver=1&ndx=sitecore_web_index
>
> Merged core - Core 3
> <_uniquekey>9105
> <_uniquekey>9105
> <_uniquekey>sitecore://web/{9ee61bd5-6a08-490a-bd9d-50c48a23b518}?lang=en&ver=1&ndx=sitecore_web_index
> <_uniquekey>sitecore://web/{9ee61bd5-6a08-490a-bd9d-50c48a23b518}?lang=en&ver=1&ndx=sitecore_web_index
> <_uniquekey>sitecore://web/{46b6bcd8-8b29-4e61-8207-058e26bf622c}?lang=en&ver=1&ndx=sitecore_web_index
> <_uniquekey>9106
> <_uniquekey>9107
> <_uniquekey>9107
>
> So I am not sure what is going on...
>
> Thanks again!
>
>
>
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-5-4-1-Mergeindexes-duplicate-rows-tp4275153p4275160.html
> Sent from the Solr - User mailing list archive at Nabble.com.


Re: Filtering on nGroups

2016-05-06 Thread Erick Erickson
What version of Solr? Regardless, if you can pre-process
at index time it'll be faster than anything else (probably).

pre-processing isn't very dynamic though so there are lots
of situations where that's just not viable.

Best,
Erick

On Thu, May 5, 2016 at 6:05 PM, Nick Vasilyev  wrote:
> I am grouping documents on a field and would like to retrieve documents
> where the number of items in a group matches a specific value or a range.
>
> I haven't been able to experiment with all new functionality, but I wanted
> to see if this is possible without having to calculate the count and add it
> at index time as a field.
>
> Does anyone have any ideas?
>
> Thanks in advance


Re: Solr 5.4.1 Mergeindexes duplicate rows

2016-05-06 Thread Kalpana
Yes, when I query them separately I do not see duplicates. I am using Solr
5.4.1. I created the core and then browsed to
http://localhost:8983/solr/admin/cores?action=mergeindexes&core=Sitecore_SharePoint&srcCore=sitecore_web_index&srcCore=SharePoint_All

Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-4-1-Mergeindexes-duplicate-rows-tp4275153p4275173.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr 5.4.1 Mergeindexes duplicate rows

2016-05-06 Thread Kalpana
Querying on _uniqueKey:9105 returns only one doc from Core1 and 0 from Core2
before the merge




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-5-4-1-Mergeindexes-duplicate-rows-tp4275153p4275174.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Re-ranking query: issue with sort criteria and how to disable it

2016-05-06 Thread Andrea Gazzarini
Hi Joel,
many thanks for the response and sorry for this late reply.

About the first question, I can open a JIRA for that. Instead, for
disabling the component I think it would be useful to add

- an automatic behaviour: if the sort criteria excludes the score the
re-ranking could be automatically excluded
- a parameter / flag (something like *rr=true*) which enables / disables
the reranking. In this way such behaviour could be also driven on the
client side

What do you think? I guess this should be another JIRA

Best,
Andrea
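
Until such a flag exists, the client-side behaviour described above could be sketched like this (a hypothetical Python helper, not a Solr feature; the rerank local-params string comes from this thread, and `rq`/`sort` are Solr's standard parameter names):

```python
def apply_rerank(params,
                 rq="{!rerank reRankQuery=$rqq reRankDocs=60 reRankWeight=1.2}"):
    # Only attach the rerank query when the sort involves score; otherwise
    # leave the request untouched so the ReRankCollector is never built
    # and the NPE discussed in this thread cannot occur.
    sort = params.get("sort", "score desc")  # Solr's default sort is by score
    if "score" in sort:
        params["rq"] = rq
    return params

print(apply_rerank({"sort": "price asc"}))  # no "rq" key added
print("rq" in apply_rerank({"sort": "price asc, score desc"}))  # True
```

Note the `"score" in sort` check is deliberately crude (a field named e.g. `score_x` would false-positive); it is only meant to show the shape of the conditional.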


On Fri, May 6, 2016 at 3:32 PM, Joel Bernstein  wrote:

> I would consider the NPE when sort by score is not included a bug. There is
> the work around, that you mentioned, which is to have a compound sort which
> includes score.
>
> The second issue though of disabling the ReRanker when someone doesn't
> include a sort by score, would be a new feature of the ReRanker. I think
> it's a good idea but it's not implemented yet.
>
> I'm not sure if anyone has any ideas about conditionally adding the
> ReRanker using configurations?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Fri, May 6, 2016 at 4:10 AM, Andrea Gazzarini  wrote:
>
> > Hi guys,
> > I have a Solr 4.10.4 instance with a RequestHandler that has a re-ranking
> > query configured like this:
> >
> > <lst name="defaults">
> >   <str name="defType">dismax</str>
> >   ...
> >   <str name="rqq">{!boost b=someFunction() v=$q}</str>
> >   <str name="rq">{!rerank reRankQuery=$rqq reRankDocs=60
> >     reRankWeight=1.2}</str>
> >   <str name="sort">score desc</str>
> > </lst>
> >
> > Everything is working until the client sends a sort params that doesn't
> > include the score field. So if for example the request contains
> "sort=price
> > asc" then a NullPointerException is thrown:
> >
> > 09:46:08,548 ERROR [org.apache.solr.core.SolrCore] java.lang.NullPointerException
> > [INFO] [talledLocalContainer] at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
> > [INFO] [talledLocalContainer] at org.apache.solr.search.ReRankQParserPlugin$ReRankCollector.collect(ReRankQParserPlugin.java:263)
> > [INFO] [talledLocalContainer] at org.apache.solr.search.SolrIndexSearcher.sortDocSet(SolrIndexSearcher.java:1999)
> > [INFO] [talledLocalContainer] at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1423)
> > [INFO] [talledLocalContainer] at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:514)
> > [INFO] [talledLocalContainer] at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:484)
> > [INFO] [talledLocalContainer] at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
> > [INFO] [talledLocalContainer] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> >
> > The only way to avoid this exception is to _explicitly_ add the "score
> > desc" value to the incoming sort (i.e. sort=price asc, score desc). In
> > this way I get no exception. I said "explicitly" because adding an
> > "appends" section in my handler
> >
> > <lst name="appends">
> >   <str name="sort">score desc</str>
> > </lst>
> >
> > Even though I don't know if that could solve my problem, in practice it is
> > completely ignored (i.e. I'm still getting the NPE above).
> > However, when I explicitly add "sort=price asc, score desc", as a
> > consequence of the re-ranking the top 60 results, although I told Solr
> > "order by price", are still shuffled and that's not what I want.
> >
> > On top of that I have two questions:
> >
> >  * Any idea about the exception above?
> >  * How can I disable the re-ranking query in case the order is not by
> >score?
> >
> > About the second question, I'm thinking to the following solutions, but
> > I'm not sure if there's a better way to do that.
> >
> > 1. Create another request handler, which is basically a clone of the
> > handler above but without the re-ranking stuff
> > 2. Use local params for the reRankDocs...
> >
> > <lst name="defaults">
> >   <str name="defType">dismax</str>
> >   ...
> >   <str name="rqq">{!boost b=someFunction() v=$q}</str>
> >   <str name="rq">{!rerank reRankQuery=$rqq reRankDocs=$rrd
> >     reRankWeight=1.2}</str>
> >   <str name="rrd">60</str>
> >   <str name="sort">score desc</str>
> > </lst>
> >
> > ...and have (in case of sorting by something different from the score) the
> > client send an additional param "rrd=0". This is working but I still
> > need to explicitly declare "sort=price asc, score desc"
> >
> > Any thoughts?
> >
> > Best,
> > Andrea
> >
> >
>


Re: Filtering on nGroups

2016-05-06 Thread Nick Vasilyev
I am on 6.1 preview, I just need this to gather some one time metrics so
performance isn't an issue.
On May 6, 2016 1:13 PM, "Erick Erickson"  wrote:

What version of Solr? Regardless, if you can pre-process
at index time it'll be faster than anything else (probably).

pre-processing isn't very dynamic though so there are lots
of situations where that's just not viable.

Best,
Erick

On Thu, May 5, 2016 at 6:05 PM, Nick Vasilyev 
wrote:
> I am grouping documents on a field and would like to retrieve documents
> where the number of items in a group matches a specific value or a range.
>
> I haven't been able to experiment with all new functionality, but I wanted
> to see if this is possible without having to calculate the count and add
it
> at index time as a field.
>
> Does anyone have any ideas?
>
> Thanks in advance


Re: Re-ranking query: issue with sort criteria and how to disable it

2016-05-06 Thread Joel Bernstein
Maybe one ticket would work. Something like: "ReRanker should gracefully
handle sorts without score". Then you can describe the two scenarios. It
might be that these problems are tackled outside of the
ReRankQParserPlugin. Possibly the QueryComponent could add some logic that
would tack on the secondary score sort or remove the reRanker.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, May 6, 2016 at 1:39 PM, Andrea Gazzarini  wrote:

> Hi Joel,
> many thanks for the response and sorry for this late reply.
>
> About the first question, I can open a JIRA for that. Instead, for
> disabling the component I think it would be useful to add
>
> - an automatic behaviour: if the sort criteria excludes the score the
> re-ranking could be automatically excluded
> - a parameter / flag (something like *rr=true*) which enables / disables
> the reranking. In this way such behaviour could be also driven on the
> client side
>
> What do you think? I guess this should be another JIRA
>
> Best,
> Andrea
>
>
> On Fri, May 6, 2016 at 3:32 PM, Joel Bernstein  wrote:
>
> > I would consider the NPE when sort by score is not included a bug. There
> is
> > the work around, that you mentioned, which is to have a compound sort
> which
> > includes score.
> >
> > The second issue though of disabling the ReRanker when someone doesn't
> > include a sort by score, would be a new feature of the ReRanker. I think
> > it's a good idea but it's not implemented yet.
> >
> > I'm not sure if anyone has any ideas about conditionally adding the
> > ReRanker using configurations?
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Fri, May 6, 2016 at 4:10 AM, Andrea Gazzarini 
> wrote:
> >
> > > Hi guys,
> > > I have a Solr 4.10.4 instance with a RequestHandler that has a
> re-ranking
> > > query configured like this:
> > >
> > > <lst name="defaults">
> > >   <str name="defType">dismax</str>
> > >   ...
> > >   <str name="rqq">{!boost b=someFunction() v=$q}</str>
> > >   <str name="rq">{!rerank reRankQuery=$rqq reRankDocs=60
> > >     reRankWeight=1.2}</str>
> > >   <str name="sort">score desc</str>
> > > </lst>
> > >
> > > Everything is working until the client sends a sort params that doesn't
> > > include the score field. So if for example the request contains
> > "sort=price
> > > asc" then a NullPointerException is thrown:
> > >
> > > 09:46:08,548 ERROR [org.apache.solr.core.SolrCore] java.lang.NullPointerException
> > > [INFO] [talledLocalContainer] at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291)
> > > [INFO] [talledLocalContainer] at org.apache.solr.search.ReRankQParserPlugin$ReRankCollector.collect(ReRankQParserPlugin.java:263)
> > > [INFO] [talledLocalContainer] at org.apache.solr.search.SolrIndexSearcher.sortDocSet(SolrIndexSearcher.java:1999)
> > > [INFO] [talledLocalContainer] at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1423)
> > > [INFO] [talledLocalContainer] at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:514)
> > > [INFO] [talledLocalContainer] at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:484)
> > > [INFO] [talledLocalContainer] at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218)
> > > [INFO] [talledLocalContainer] at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
> > >
> > > The only way to avoid this exception is to _explicitly_ add the "score
> > > desc" value to the incoming sort (i.e. sort=price asc, score desc). In
> > > this way I get no exception. I said "explicitly" because adding an
> > > "appends" section in my handler
> > >
> > > <lst name="appends">
> > >   <str name="sort">score desc</str>
> > > </lst>
> > >
> > > Even though I don't know if that could solve my problem, in practice it is
> > > completely ignored (i.e. I'm still getting the NPE above).
> > > However, when I explicitly add "sort=price asc, score desc", as a
> > > consequence of the re-ranking the top 60 results, although I told Solr
> > > "order by price", are still shuffled and that's not what I want.
> > >
> > > On top of that I have two questions:
> > >
> > >  * Any idea about the exception above?
> > >  * How can I disable the re-ranking query in case the order is not by
> > >score?
> > >
> > > About the second question, I'm thinking to the following solutions, but
> > > I'm not sure if there's a better way to do that.
> > >
> > > 1. Create another request handler, which is basically a clone of the
> > > handler above but without the re-ranking stuff
> > > 2. Use local params for the reRankDocs...
> > >
> > > <lst name="defaults">
> > >   <str name="defType">dismax</str>
> > >   ...
> > >   <str name="rqq">{!boost b=someFunction() v=$q}</str>
> > >   <str name="rq">{!rerank reRankQuery=$rqq reRankDocs=$rrd
> > >     reRankWeight=1.2}</str>
> > >   <str name="rrd">60</str>
> > >   <str name="sort">score

Re: Filtering on nGroups

2016-05-06 Thread Nick Vasilyev
I guess it would also work if I could facet on the group counts. I just
need to know how many groups of different sizes there are.
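
For a one-off metric like this, the histogram can also be derived offline from a plain facet on the grouping field (a sketch, not Solr API; the facet pairs below are sample data standing in for a real facet response):

```python
from collections import Counter

# Facet counts on the grouping field: term -> number of docs in that group.
# In practice, read these pairs from a facet.field response on the group field.
facet_counts = {"groupA": 3, "groupB": 1, "groupC": 3, "groupD": 2}

# Histogram: group size -> how many groups have exactly that many documents.
size_histogram = Counter(facet_counts.values())
print(size_histogram[3])  # 2: two groups contain exactly 3 documents
```

With `facet.limit=-1` (or paging via `facet.offset`) the full term list can be pulled and folded the same way.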

On Fri, May 6, 2016 at 2:10 PM, Nick Vasilyev 
wrote:

> I am on 6.1 preview, I just need this to gather some one time metrics so
> performance isn't an issue.
> On May 6, 2016 1:13 PM, "Erick Erickson"  wrote:
>
> What version of Solr? Regardless, if you can pre-process
> at index time it'll be faster than anything else (probably).
>
> pre-processing isn't very dynamic though so there are lots
> of situations where that's just not viable.
>
> Best,
> Erick
>
> On Thu, May 5, 2016 at 6:05 PM, Nick Vasilyev 
> wrote:
> > I am grouping documents on a field and would like to retrieve documents
> > where the number of items in a group matches a specific value or a range.
> >
> > I haven't been able to experiment with all new functionality, but I
> wanted
> > to see if this is possible without having to calculate the count and add
> it
> > at index time as a field.
> >
> > Does anyone have any ideas?
> >
> > Thanks in advance
>
>


Re: Filtering on nGroups

2016-05-06 Thread Joel Bernstein
You may want to check this out:
https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=62693238

It does aggregations that might work for you.

Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, May 6, 2016 at 2:31 PM, Nick Vasilyev 
wrote:

> I guess it would also work if I could facet on the group counts. I just
> need to know how many groups of different sizes there are.
>
> On Fri, May 6, 2016 at 2:10 PM, Nick Vasilyev 
> wrote:
>
> > I am on 6.1 preview, I just need this to gather some one time metrics so
> > performance isn't an issue.
> > On May 6, 2016 1:13 PM, "Erick Erickson" 
> wrote:
> >
> > What version of Solr? Regardless, if you can pre-process
> > at index time it'll be faster than anything else (probably).
> >
> > pre-processing isn't very dynamic though so there are lots
> > of situations where that's just not viable.
> >
> > Best,
> > Erick
> >
> > On Thu, May 5, 2016 at 6:05 PM, Nick Vasilyev 
> > wrote:
> > > I am grouping documents on a field and would like to retrieve documents
> > > where the number of items in a group matches a specific value or a
> range.
> > >
> > > I haven't been able to experiment with all new functionality, but I
> > wanted
> > > to see if this is possible without having to calculate the count and
> add
> > it
> > > at index time as a field.
> > >
> > > Does anyone have any ideas?
> > >
> > > Thanks in advance
> >
> >
>


Re: relaxed vs. improved validation in solr.TrieDateField

2016-05-06 Thread David Smiley
Sorry to hear that, Uwe Reh.

If this is just in your input/index data, then this could be handled with
an URP, maybe evan an existing URP.
See ParseDateFieldUpdateProcessorFactory which uses the Joda-time API.  I
am not sure if that will work, I'm a little doubtful in fact since Solr now
uses the Java 8 time API which was taken, more or less, from Joda-time.
But it's worth a shot, any way.  If it doesn't work, let me know and I'll
give you a snippet of JavaScript you can use in your URP chain.
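
If pre-processing the source data outside Solr is also an option, the old lenient rollover can be approximated with a small helper (a sketch of the pre-6.0 behaviour, emulating java.util.Calendar's lenient mode; this is an assumption for illustration, not Solr or URP API):

```python
from datetime import date, timedelta

def lenient_date(year, month, day):
    # Lenient rollover: out-of-range months and days overflow (or underflow)
    # into adjacent months/years instead of raising an error, the way
    # java.util.Calendar behaves in lenient mode.
    year, month = year + (month - 1) // 12, (month - 1) % 12 + 1
    return date(year, month, 1) + timedelta(days=day - 1)

print(lenient_date(1997, 2, 29))  # 1997-03-01 (1997 is not a leap year)
print(lenient_date(2006, 6, 31))  # 2006-07-01 (June has 30 days)
print(lenient_date(2000, 0, 0))   # 1999-11-30 (month 0, day 0 roll backwards)
```

Whether silently rewriting intentionally wrong dates is acceptable depends, of course, on how faithful the index must stay to the source records.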

~ David

On Fri, Apr 29, 2016 at 4:07 AM Uwe Reh  wrote:

> Hi,
>
> doing some migration tests (4.10 to 6.0) I noticed an improved
> validation in TrieDateField.
> Syntactically correct but impossible dates are rejected now. (stack trace
> at the end of the mail)
>
> Examples:
> - '1997-02-29T00:00:00Z'
> - '2006-06-31T00:00:00Z'
> - '2000-00-00T00:00:00Z'
> The first two dates are formally OK, but the days do not exist. The
> third date is more suspicious, but was also accepted by Solr 4.10.
>
> I appreciate this improvement in principle, but I have to respect the
> original data. The dates might be intentionally wrong.
>
> Is there an easy way to get the weaker validation back?
>
> Regards
> Uwe
>
>
> > Invalid Date in Date Math String:'1997-02-29T00:00:00Z'
> > at
> org.apache.solr.util.DateMathParser.parseMath(DateMathParser.java:254)
> > at
> org.apache.solr.schema.TrieField.createField(TrieField.java:726)
> > at
> org.apache.solr.schema.TrieField.createFields(TrieField.java:763)
> > at
> org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:47)
>
> --
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com


Re: Mockito issues with private SolrTestCaseJ4.beforeClass

2016-05-06 Thread Georg Sorst
Anyway, this is now SOLR-9081.

Best,
Georg

Georg Sorst  wrote on Sun, 24 Apr 2016 at 17:34:

> Hi list,
>
> I just ran into some issues with Mockito and SolrTestCaseJ4. It looks like
> this:
>
> * Mockito requires all @BeforeClass methods in the class hierarchy to be
> "public static void"
> * SolrTestCaseJ4.beforeClass (which is @BeforeClass) is "private static
> void"
> * So I cannot use Mockito as a test runner when my tests are derived from
> SolrTestCaseJ4
>
> Is there a specific reason why it is private? Am I missing something? I'll
> gladly open a JIRA issue if someone can confirm that there is no good
> reason for it.
>
> Best,
> Georg
> --
> *Georg M. Sorst I CTO*
> FINDOLOGIC GmbH
>
>
>
> Jakob-Haringer-Str. 5a | 5020 Salzburg I T.: +43 662 456708
> E.: g.so...@findologic.com
> www.findologic.com | Follow us on: XING, Facebook, Twitter
>
> See us at the *Shopware Community Day* in Ahaus on 20.05.2016 - book an
> appointment here!
> See us at *dmexco* in Cologne on 14.09. and 15.09.2016 - book an
> appointment here!
>