Re: SOLR ranking

2016-02-15 Thread Emir Arnautovic
Hi Nitin, You can use the pf parameter to boost results that match the exact phrase. You can also use pf2 and pf3 to boost results that match word bigrams and trigrams (phrase matches with 2 or 3 words in case the input has more than 3 words). Regards, Emir On 16.02.2016 06:18, Nitin.K wrote: I am using edismax parser with the fo
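
As a rough illustration of the pf/pf2/pf3 advice above, here is a minimal Python sketch that builds the request parameters for an edismax query. The helper name `edismax_phrase_params` and the boost values on the phrase fields are assumptions made for the example; only the parameter names (defType, q, qf, pf, pf2, pf3) and the topic_title/subtopic_title boosts come from the thread.

```python
from urllib.parse import urlencode


def edismax_phrase_params(user_query, qf, pf, pf2=None, pf3=None):
    """Build edismax request parameters with optional phrase boosts.

    Hypothetical helper: it only assembles parameters, it does not call Solr.
    """
    params = {"defType": "edismax", "q": user_query, "qf": qf, "pf": pf}
    if pf2:
        params["pf2"] = pf2  # boost for matches on adjacent word pairs
    if pf3:
        params["pf3"] = pf3  # boost for matches on adjacent word triples
    return params


params = edismax_phrase_params(
    "eating disorders",
    qf="topic_title^100 subtopic_title^40",
    pf="topic_title^200",   # assumed boost value, tune for your data
    pf2="topic_title^50",   # assumed boost value
    pf3="topic_title^75",   # assumed boost value
)
query_string = urlencode(params)  # ready to append to .../select?
```

The resulting string can be appended to the select URL; checking the debugQuery output afterwards shows whether the phrase clauses were actually generated.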

Re: SOLR ranking

2016-02-15 Thread Binoy Dalal
First, to do phrase searching, you need to set omitTermFreqAndPositions=false. You've set this to true. Changing it will require a reindex. Second, it will be helpful to check the debugQuery output and see how the query is parsed and searched. On Tue, 16 Feb 2016, 12:28 Modassar Ather wrote: > First

Re: Highlight brings the content from the first pages of pdf

2016-02-15 Thread Anil
you mean default fl ? On 16 February 2016 at 12:57, Binoy Dalal wrote: > Oh wait. We don't append the fl parameter to the query. > We've configured it in the request handler in solrconfig.xml > Maybe that is something that you can do. > > On Tue, 16 Feb 2016, 12:39 Anil wrote: > > > Thanks for

Re: Highlight brings the content from the first pages of pdf

2016-02-15 Thread Binoy Dalal
Oh wait. We don't append the fl parameter to the query. We've configured it in the request handler in solrconfig.xml Maybe that is something that you can do. On Tue, 16 Feb 2016, 12:39 Anil wrote: > Thanks for your response Binoy. > > Yes.I am looking for any alternative to this. With long numbe

Re: Highlight brings the content from the first pages of pdf

2016-02-15 Thread Anil
Thanks for your response Binoy. Yes. I am looking for any alternative to this. With a large number of fields, the URL will become long and might lead to a "URL too long" exception when using an HTTP request. On 16 February 2016 at 11:01, Binoy Dalal wrote: > Filling in the fl parameter with all the required

Re: SOLR ranking

2016-02-15 Thread Modassar Ather
First it will search for "Eating Disorders" together and then the individual words "Eating" and "Disorders" I don't think the phrase will be searched as individual ANDed terms until the query has it like below. "Eating Disorders" OR (Eating AND Disorders). Best, Modassar On Tue, Feb 16, 2016 at
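
The explicit form Modassar describes can be sketched in Python as follows. The function name `phrase_or_all_terms` is made up for the example, and the sketch does no escaping of query-syntax characters, so it assumes plain alphanumeric input.

```python
def phrase_or_all_terms(text):
    """Return a query matching the exact phrase OR all terms ANDed together.

    Assumes plain words with no special query characters (no escaping is done).
    """
    terms = text.split()
    phrase = '"' + " ".join(terms) + '"'
    if len(terms) < 2:
        # A single word has no separate ANDed form worth adding.
        return phrase
    return phrase + " OR (" + " AND ".join(terms) + ")"


q = phrase_or_all_terms("Eating Disorders")
```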

Data Import Handler Usage

2016-02-15 Thread vidya
Hi, I have gone through the documentation on defining a data import handler in Solr, but I could not implement it. I have created a data-config.xml file that specifies moving data from the collection1 core to another collection, but I don't know where I need to specify that second collection. http://localhost:8983/so

Re: Highlight brings the content from the first pages of pdf

2016-02-15 Thread Binoy Dalal
Filling in the fl parameter with all the required fields is what we do at my project as well, and I don't think there is any alternative to this. Maybe somebody else can advise on this? On Tue, 16 Feb 2016, 10:30 Anil wrote: > Any help on this ? Thanks. > > On 15 February 2016 at 19:06, Anil w

Re: Solr Query Explain Plan

2016-02-15 Thread Binoy Dalal
Wildcard and fuzzy queries are in general expensive to compute for the simple reason that the number of query combinations that Solr has to check against increases. So the fewer combinations Solr has to try, the faster it'll be. I believe that this is what you're seeing. Additionally,

Re: SOLR ranking

2016-02-15 Thread Nitin.K
I am using edismax parser with the following query: localhost:8983/solr/tgl/select?q=eating%20disorders&wt=xml&tie=1.0&rows=200&q.op=AND&indent=true&defType=edismax&stopwords=true&lowercaseOperators=true&debugQuery=true&qf=topic_title%5E100+subtopic_title%5E40+index_term%5E20+drug%5E15+content%5E3

Re: Need to move on SOlr cloud (help required)

2016-02-15 Thread Midas A
Susheel, Is there any client available in PHP for SolrCloud which does the same? On Tue, Feb 16, 2016 at 7:31 AM, Susheel Kumar wrote: > In SolrJ, you would use CloudSolrClient which interacts with Zookeeper > (which maintains Cluster State). See CloudSolrClient API. So that's how > Sol

Re: Solr Query Explain Plan

2016-02-15 Thread Shahzad Masud
Thanks Binoy, these links help. The explain/debug output really helped me, and after some experimentation and debugging, I conclude that if we move wildcard queries (marked with *) to the right, it improves performance. I haven't been able to find a reference in the documentation, but does this statement hold

Re: Highlight brings the content from the first pages of pdf

2016-02-15 Thread Anil
Any help on this? Thanks. On 15 February 2016 at 19:06, Anil wrote: > Yes. But I have a long list of fields. > > I feel adding all the fields in fl is not good practice unless one is > interested in only a few fields. In my case, I am interested in all fields except > one. > > Is there any alternative

Re: Why is my index size going up (or: why it was smaller)?

2016-02-15 Thread Shawn Heisey
On 2/15/2016 1:12 PM, Steven White wrote: > I'm fixing code that I noticed to have a defect. My expectation was that > once I make the fix, the index size will be smaller but instead I see it > growing. I'm going to assume that SolrField_ID_LIST and SolrField_ALL_FIELDS_DATA are String instances

Re: SOLR ranking

2016-02-15 Thread Binoy Dalal
You'll have to provide more information. How exactly do you want phrase search to work and how is it not working properly? On Tue, 16 Feb 2016, 00:08 Nitin.K wrote: > Thanks Binoy.. > > I have used the boost parameters and its working as expected. > I also need to give the priority to the phrase

Re: Index writer addIndexes method not working

2016-02-15 Thread Binoy Dalal
I don't know how fitting this solution might be but I'll have a go. You could have a separate core that is an exact replica of your current core. Merge your created index with that core and then after loading it, swap it with your original one. This can be done at runtime. Then merge your original

Re: Adding nodes

2016-02-15 Thread Susheel Kumar
Hi Paul, Thanks for the detail but I am still not able to understand how the Core API would make it easier for you to create replicas. I understand that using the Core API, you can add more cores, but would that also populate the data so that it can serve queries / act like a replica? Second, As Shaw

Re: Need to move on SOlr cloud (help required)

2016-02-15 Thread Susheel Kumar
In SolrJ, you would use CloudSolrClient which interacts with Zookeeper (which maintains Cluster State). See CloudSolrClient API. So that's how SolrJ would know which node is down or not. Thanks, Susheel On Mon, Feb 15, 2016 at 12:07 AM, Midas A wrote: > Erick, > > We are using php for our app

Re: Near Duplicate Documents, "authorization"? tf/idf implications, spamming the index?

2016-02-15 Thread Jack Krupansky
Sounds a lot like multi-tenancy, where you don't want the document frequencies of one tenant to influence the query relevancy scores for other tenants. No ready solution. Although, I have thought of a simplified document scoring using just tf and leaving out df/idf. Not as good as tf*idf or BM25 s
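
To make the tf-only idea concrete, here is a small Python sketch comparing a tf-only term weight with a tf*idf weight. The formulas are textbook-style assumptions (square-root tf, logarithmic idf) chosen for illustration, not a statement of what any particular Lucene or Solr similarity class computes, and the function names are hypothetical.

```python
import math


def tf_only(term_freq):
    """Term weight that ignores how common the term is across the index."""
    return math.sqrt(term_freq)


def tf_idf(term_freq, doc_freq, num_docs):
    """Term weight that also rewards terms found in few documents."""
    idf = 1.0 + math.log(num_docs / (doc_freq + 1))
    return math.sqrt(term_freq) * idf


# With tf only, the score is the same no matter how many other documents
# (for example, another tenant's near duplicates) contain the term.
same_everywhere = tf_only(4)
rare_term = tf_idf(4, doc_freq=1, num_docs=10)
common_term = tf_idf(4, doc_freq=9, num_docs=10)
```

The trade-off is visible in the numbers: the tf-only weight is immune to document-frequency inflation from other tenants, but it also stops distinguishing rare terms from common ones.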

Near Duplicate Documents, "authorization"? tf/idf implications, spamming the index?

2016-02-15 Thread Chris Morley
Hey Solr people: Suppose that we did not want to break up our document set into separate indexes, but had certain cases where many versions of a document were not relevant for certain searches. I guess this could be thought of as an "authorization" class of problem, however it is not that

Index writer addIndexes method not working

2016-02-15 Thread jeba earnest
My requirement is to add the index folder to the solr data directory. I am generating a lucene index by mapreduce program. And later I would like to merge the index with the solr index without bringing the solr down. I actually tried index merger tool but this tool works when the solr is down. Is

Errors on master after upgrading to 4.10.3

2016-02-15 Thread Joseph Hagerty
After migrating from 3.5 to 4.10.3, I'm seeing the following error with alarming regularity in the master's error log: 2/15/2016, 4:32:22 PM ERROR PDSimpleFont Can't determine the width of the space character using 250 as default I can't seem to glean much information about this one from the web.

Re: Why is my index size going up (or: why it was smaller)?

2016-02-15 Thread Steven White
That's not the case (please read the entire email). I'm starting with a fresh index each time when I run my tests. In fact, I even tested (multiple times) by deleting the entire "data" folder (stop / start Solr). In each case, I get the same exact results. At one point, I started to wonder if m

Re: Why is my index size going up (or: why it was smaller)?

2016-02-15 Thread Upayavira
Not got time to read your mail in depth, but I bet it is because you are overwriting docs. When docs are overwritten, they are effectively marked as deleted then re-inserted, thus leaving you with both versions of your doc physically in your index. When you query though, the deleted one is filtered

Re: join and NOT together

2016-02-15 Thread Mikhail Khludnev
Hello Sergio, What does the debugQuery=true output look like? On Mon, Feb 15, 2016 at 7:10 PM, marotosg wrote: > Hi, > > I am trying to solve an issue when doing a search joining two collections > and negating the cross core query. > > Let's say I have one collection person and another collection

Why is my index size going up (or: why it was smaller)?

2016-02-15 Thread Steven White
Hi folks, I'm fixing code that I noticed to have a defect. My expectation was that once I make the fix, the index size will be smaller but instead I see it growing. Here is the stripped down version of the code to show the issue: Buggy code #1: for (String field : fieldsList) { doc.add

Re: solr-4.3.1 docValues usage

2016-02-15 Thread Mikhail Khludnev
Hello Neeraj, Check slide 23 and overall http://www.slideshare.net/lucenerevolution/what-is-inaluceneagrandfinal On Mon, Feb 15, 2016 at 4:09 PM, Neeraj Lajpal wrote: > Hi, > I recently asked this question on stackoverflow: > I am trying to access a field in custom request handler. I am access

Re: SOLR ranking

2016-02-15 Thread Nitin.K
Thanks Binoy. I have used the boost parameters and they are working as expected. I also need to give priority to phrase matches. Kindly advise on this. I am using the edismax parser right now, with the pf, pf2 and pf3 parameters, but those are not working properly.

Prevent the SSL Keystore and Truststore password from showing up in the Solr Admin and Linux processes (Solr 5.2.1)

2016-02-15 Thread Katherine Mora
Hello All, I've configured Solr 5.2.1 to enable SSL by following the instructions listed in the Wiki in Enabling SSL. This is working fine. However, if I go to the Solr Admin (Dashboard -> JVM -> Args) or if I list the processes ru

Re: Negating multiple array fileds

2016-02-15 Thread Jack Krupansky
I should also have noted that your full query: (-persons:*)AND(-places:*)AND(-orgs:*) can be written as: -persons:* -places:* -orgs:* Which may work as is, or can also be written as: *:* -persons:* -places:* -orgs:* -- Jack Krupansky On Mon, Feb 15, 2016 at 1:57 AM, Salman Ansari wrote:
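
The last form Jack gives (a match-all clause followed by the negations) is easy to generate for any list of fields. A minimal Python sketch, with `docs_missing_all` as a hypothetical helper name:

```python
def docs_missing_all(fields):
    """Build a query for documents that have no value in any of the fields.

    The leading *:* gives the negative clauses a full set to subtract from.
    """
    clauses = ["-{}:*".format(field) for field in fields]
    return " ".join(["*:*"] + clauses)


q = docs_missing_all(["persons", "places", "orgs"])
```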

join and NOT together

2016-02-15 Thread marotosg
Hi, I am trying to solve an issue when doing a search joining two collections and negating the cross core query. Let's say I have one collection person and another collection documents and I can join them using local param !join because I have PersonIDS in document collection. if my query is li

Re: SOLR ranking

2016-02-15 Thread Binoy Dalal
I'm sorry, missed that part. It's true, you cannot sort on multivalued fields. The workaround will be pretty complex; you'll either have to find the max or min value of the fields at index time and store those in separate fields and use those to sort, or somehow come up with some function that can
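
The first workaround mentioned above (computing the min or max at index time and storing it in a separate single-valued field) could look roughly like this on the client side, before the document is sent to Solr. The `_min`/`_max` field names and the helper `add_sort_fields` are assumptions for the sketch; the extra fields would also need to be defined as single-valued in the schema.

```python
def add_sort_fields(doc, multivalued_field):
    """Return a copy of doc with single-valued min/max companions added.

    Hypothetical field naming: <field>_min and <field>_max.
    """
    values = doc.get(multivalued_field) or []
    enriched = dict(doc)  # leave the caller's document untouched
    if values:
        enriched[multivalued_field + "_min"] = min(values)
        enriched[multivalued_field + "_max"] = max(values)
    return enriched


doc = {"id": "1", "index_term": ["bulimia", "anorexia"]}
indexed = add_sort_fields(doc, "index_term")
```

Sorting can then use the single-valued companion field, for example sort=index_term_min asc, instead of the multivalued original.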

Re: SOLR ranking

2016-02-15 Thread Nitin.K
Thanks Binoy. Actually, it is throwing the following error: can not sort on multivalued field: index_term

Re: Default max number of connections

2016-02-15 Thread Shawn Heisey
On 2/14/2016 9:45 PM, Anil wrote: > I am using SolrCloud with ZooKeeper. Is 20 the default number of > max connections per host? This is very vague. What you *might* be talking about is the defaults for the ShardHandler, which is effectively the configuration for the internal HttpClient

Re: SOLR ranking

2016-02-15 Thread Emir Arnautovic
Hi, Not sure how ordering will help (maybe missing question) but what seems to me that would help your case is simple boosting. See https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_make_.22superman.22_in_the_title_field_score_higher_than_in_the_subject_field Regards, Emir On 15.02.2

Re: solr-4.3.1 docValues usage

2016-02-15 Thread Emir Arnautovic
Sorry - replied to wrong thread :( On 15.02.2016 15:17, Emir Arnautovic wrote: Hi, Not sure how ordering will help (maybe missing question) but what seems to me that would help your case is simple boosting. See https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_make_.22superman.22_in_the

Re: solr-4.3.1 docValues usage

2016-02-15 Thread Emir Arnautovic
Hi, Not sure how ordering will help (maybe missing question) but what seems to me that would help your case is simple boosting. See https://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_make_.22superman.22_in_the_title_field_score_higher_than_in_the_subject_field Regards, Emir On 15.02.201

Re: Default max number of connections

2016-02-15 Thread Anil
Guys, Any help on this? Thanks. On 15 February 2016 at 10:15, Anil wrote: > Hi, > > I am using SolrCloud with ZooKeeper. Is 20 the default number of > max connections per host? > > Is there any way to use connection pooling like solr http connection? > Please clarify. > > Regards,

Re: Highlight brings the content from the first pages of pdf

2016-02-15 Thread Anil
Yes. But I have a long list of fields. I feel adding all the fields in fl is not good practice unless one is interested in only a few fields. In my case, I am interested in all fields except one. Is there any alternative approach? Thanks in advance. On 15 February 2016 at 17:27, Binoy Dalal wrote:

Re: solr-4.3.1 docValues usage

2016-02-15 Thread Binoy Dalal
DocValues has nothing to do with your handler. It is a field property. To use it simply put docValues=true in your field definitions and reindex. On Mon, 15 Feb 2016, 18:40 Neeraj Lajpal wrote: > Hi, > I recently asked this question on stackoverflow: > I am trying to access a field in custom req

Re: SOLR ranking

2016-02-15 Thread Binoy Dalal
Use the sort parameter with your query and pass the fields in the order in which you want to sort them. So if you want topic > subtopic > index > drug > content all ascending, your sort parameter will look like &sort=topic asc,subtopic asc,index asc,drug asc,content asc On Mon, 15 Feb 2016, 18:17
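
A small Python sketch of assembling the sort parameter described above. `build_sort` is a hypothetical helper; note that, as reported elsewhere in this thread, Solr rejects sorting on multivalued fields, so this assumes every field passed in is single-valued.

```python
def build_sort(fields, direction="asc"):
    """Join field names into a Solr sort value, highest priority first."""
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return ",".join("{} {}".format(field, direction) for field in fields)


sort_value = build_sort(["topic", "subtopic", "index", "drug", "content"])
```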

solr-4.3.1 docValues usage

2016-02-15 Thread Neeraj Lajpal
Hi, I recently asked this question on stackoverflow: I am trying to access a field in custom request handler. I am accessing it like this for each document: Document doc; doc = reader.document(id); DocFields = doc.getValues("state"); There are around 600,000 documents in the solr. For a query runnin

SOLR ranking

2016-02-15 Thread Nitin.K
I have five fields in Solr: topic_title, subtopic_title, index_terms (multivalued), drug (multivalued), and content. Now, I want to rank the documents using all these fields: documents having the search term in topic_title should come first in the order, then documents having search

Re: Highlight brings the content from the first pages of pdf

2016-02-15 Thread Binoy Dalal
If I understand correctly, you have already highlighted the field and only want to return the highlights and not the field itself. Well in that case, simply remove the field name from your fl list. On Mon, 15 Feb 2016, 17:04 Anil wrote: > HOw can highlighted field excluded in the main result ? a

doubt about timeAllowed

2016-02-15 Thread Anatoli Matuskova
Hey there, I have a doubt about using timeAllowed. Long ago it used to affect just the queryComponent. Now it seems to be affecting all components. So, in the past, if you used queryComponent, facetComponent and highlightComponent, you might get a subset of the results in the queryComponent due to

Re: Highlight brings the content from the first pages of pdf

2016-02-15 Thread Anil
How can the highlighted field be excluded from the main result, as it is available in the highlight section? In my scenario, one field (let's say commands) of each Solr document would be around 10 MB. I don't want to fetch that field in the response when its highlight snippets are available in the response. Pl

Re: Highlight brings the content from the first pages of pdf

2016-02-15 Thread Evert R.
Hello Mark, Thanks for your reply. All text is indexed (1 pdf file). It works now. Best regards, *--Evert* 2016-02-14 23:47 GMT-02:00 Mark Ehle : > is all the text being indexed? Check to make sure that there's actually the > data you are looking for in the index. Is there a setting in tika th

Re: Highlight brings the content from the first pages of pdf

2016-02-15 Thread Evert R.
Binoy, Thank you very much for your reply and explanation. Best regards, *--Evert* 2016-02-14 23:28 GMT-02:00 Binoy Dalal : > What you've done so far will highlight every instance of "nietava" found in > the field, and return it, i.e., your entire field will return with all the > "nietava"s in

Why QueryWeight with Custom Similarity

2016-02-15 Thread Markus, Sascha
Hi, I created a custom similarity and factory which extend DefaultSimilarity/-Factory. To achieve this, my similarity overrides idfExplain like this, and also the method for an array of terms. public Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats)