Re: Highlight brings the content from the first pages of pdf

2016-02-16 Thread Binoy Dalal
Yeah. Under an entry like so: fields On Tue, 16 Feb 2016, 13:00 Anil wrote: > you mean default fl ? > > On 16 February 2016 at 12:57, Binoy Dalal wrote: > > > Oh wait. We don't append the fl parameter to the query. > > We've configured it in the request handler in solrconfig.xml > > Maybe that

Re: Using Solr's spatial functionality for astronomical catalog

2016-02-16 Thread Colin Freas
David, thanks for getting back to me. SpatialRecursivePrefixTreeFieldType seems to be what I need, and the default search seems appropriate. This is for entries in an astronomical catalog, so great circle distances on a perfect sphere is what I'm after. I am having a bit of difficulty though.

Re: Using Solr's spatial functionality for astronomical catalog

2016-02-16 Thread Colin Freas
Looks like the only issue was that I did not have an alias for SourceRpt field in the SQL. With that in place, everything seems to work more or less as expected. SourceRpt shows up where it should. Queries like http://localhost:8983/solr/spatial/select?q=*:*&fq={!geofilt%20sfield=Sour c

Re: Data Import Handler Usage

2016-02-16 Thread Erik Hatcher
The "other" collection (destination of the import) is the collection where that data import handler definition resides. Erik > On Feb 16, 2016, at 01:54, vidya wrote: > > Hi > > I have gone through documents to define data import handler in solr. But I > could not implement it. > I have cr

Re: SOLR ranking

2016-02-16 Thread Alessandro Benedetti
Binoy, the omitTermFreqAndPositions is set only for text_ws which is used only on the "indexed_terms" field. The text_general fields seem fine to me. Are you omitting norms on purpose ? To be fair it could be relevant in title or short topic searches to boost up short field values, containing a lo

Re: SOLR ranking

2016-02-16 Thread Binoy Dalal
@Nitin Why are you phrase boosting on string fields? More often than not, it won't do anything because the phrases simply won't match the entire string. On Tue, 16 Feb 2016, 15:36 Alessandro Benedetti wrote: > Binoy, the omitTermFreqAndPositions is set only for text_ws which is used > only on th

Re: join and NOT together

2016-02-16 Thread Sergio García Maroto
My debugQuery=true returns related to the NOT: 0.06755901 = (MATCH) sum of: 0.06755901 = (MATCH) MatchAllDocsQuery, product of: 0.06755901 = queryNorm I tried changing v='(*:* -DocType:pdf)' to v='(-DocType:pdf)' and it worked. Anyone could explain the difference? Thanks Sergo On 15 February

Re: SOLR ranking

2016-02-16 Thread Nitin.K
You are absolutely right, Binoy! But my problem is: we don't want the term frequency to be taken into account for index term as well as drug. (i.e. we don't want to consider the number of occurrences of the search term for both of these fields.) Is it possible that I can omit the term frequency for these two

Re: SOLR ranking

2016-02-16 Thread Nitin.K
Hi Emir, I tried using the boost parameters for phrase search by removing the omitTermFreqAndPositions from the multivalued field type but somehow while searching phrases; the documents that have exact match are not coming up in the order. Instead; in the content field, it is considering the mutua

Re: SOLR ranking

2016-02-16 Thread Binoy Dalal
Based on a quick look at the documentation, I think that you should use termPositions=true to achieve what you want. On Tue, 16 Feb 2016, 16:08 Nitin.K wrote: > Hi Emir, > > I tried using the boost parameters for phrase search by removing the > omitTermFreqAndPositions from the multivalued field

Re: Need to move on SOlr cloud (help required)

2016-02-16 Thread Paul Borgermans
On 16 February 2016 at 06:09, Midas A wrote: > Susheel, > > Is there any client available in php for solr cloud which maintain the same > ?? > > No there is none. I recommend HAProxy for Non SolrJ clients and loadbalancing SolrCloud. HAProxy makes it also easy to do rolling updates of your SolrCl

Re: SOLR ranking

2016-02-16 Thread Alessandro Benedetti
Nithin, have you read my reply ? kindly let me know, how can i first search the phrase and then go to the > individual words (i.e word-1 AND word-2) > On 16 February 2016 at 10:45, Binoy Dalal wrote: > Based on a quick look at the documentation, I think that you should use > termPositions=true

Re: SOLR ranking

2016-02-16 Thread Emir Arnautovic
Hi Nitin, Not sure if you changed what fields you use for phrase boost, but in example you sent, all fields except content are "string" fields and content is boosted with 6 while topic_title in qf is boosted with 100. Try setting same field you use in qf in pf2 and you should see the differenc

Re: SOLR ranking

2016-02-16 Thread Modassar Ather
Actually you can get it with the edismax. Just set mm to 100% and then configure a pf field ( or more) . You are going to search all the search terms mandatory and boost phrases match . @Alessandro Thanks for your insight. I thought that the document will be boosted if all of the terms appear in c

Re: SOLR ranking

2016-02-16 Thread Alessandro Benedetti
If I remember well , it is going to be as a phrase query ( when you use the "quotes") . So the close proximity means a match of the phrase with 0 tolerance ( so the terms must respect the position distance in the query). If I remember well I debugged that recently. Cheers On 16 February 2016 at 1

Re: Data Import Handler Usage

2016-02-16 Thread vidya
Hi Dataimport section in web ui page still shows me that no data import handler is defined. And no data is being added to my new collection. -- View this message in context: http://lucene.472066.n3.nabble.com/Data-Import-Handler-Usage-tp4257518p4257576.html Sent from the Solr - User mailing li

Re: SOLR ranking

2016-02-16 Thread Modassar Ather
In that case will a phrase with a given slop match a document having the terms of the given phrase with more than the given slop in between them when pf field and mm=100%? Per my understanding as a phrase it will not match for sure. Best, Modassar On Tue, Feb 16, 2016 at 5:26 PM, Alessandro Bene

Re: SOLR ranking

2016-02-16 Thread Binoy Dalal
By my understanding, it will depend on whether you're explicitly running the phrase query or whether you're also searching for the terms individually. In the first case, it will not match. In the second case, it will match just as long as your field contains all the terms. On Tue, 16 Feb 2016, 17:

Re: SOLR ranking

2016-02-16 Thread Alessandro Benedetti
You can describe the pf field as an exact phrase query: ""~0. But you can specify the slop with the ps parameter: "Default amount of slop on phrase queries built with pf, pf2 and/or pf3 fields (affects boosting)." Just take a look at the edismax page in the wiki, it seems well described : http
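
The combination discussed in this thread (edismax with mm=100%, a pf field for phrase boosting, and ps for the phrase slop) can be sketched as a set of request parameters. This is only an illustration: the field names topic_title and content and their qf boosts come from the thread, while the pf boost of 50, the slop of 2, the sample query text, and the helper name edismax_params are assumptions made for the example.

```python
from urllib.parse import urlencode


def edismax_params(user_query, qf, pf, ps=0, mm="100%"):
    """Build request parameters for an edismax query.

    mm="100%" makes every query term mandatory; pf names the fields
    on which an implicit phrase query is built to boost documents
    where the terms appear together; ps is the slop allowed on that
    phrase query (0 means the terms must be adjacent and in order).
    """
    return {
        "defType": "edismax",
        "q": user_query,
        "qf": qf,
        "pf": pf,
        "ps": str(ps),
        "mm": mm,
    }


# Hypothetical query text and pf boost, chosen only for illustration.
params = edismax_params(
    "heart attack",
    qf="topic_title^100 content^6",
    pf="content^50",
    ps=2,
)
query_string = urlencode(params)
print(query_string)
```

The resulting string can be appended to a select handler URL. Note that pf and ps only affect scoring; which documents match is still decided by q, qf and mm.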

Re: SOLR ranking

2016-02-16 Thread Alessandro Benedetti
Sorry for the misleading mail, actually if you play with the slop factor, that is going to be easy. A proximity search can be done with a sloppy phrase query. The closer > together the two terms appear in the document, the higher the score will > be. A sloppy phrase query specifies a maximum "slop

Re: doubt about timeAllowed

2016-02-16 Thread Anatoli Matuskova
Is there any way to tell timeAllowed to just affect the query component and not the others? -- View this message in context: http://lucene.472066.n3.nabble.com/doubt-about-timeAllowed-tp4257363p4257622.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: join and NOT together

2016-02-16 Thread marotosg
Actually, I was wrong: this doesn't work: (-DocType:pdf) -- View this message in context: http://lucene.472066.n3.nabble.com/join-and-NOT-together-tp4257411p4257620.html Sent from the Solr - User mailing list archive at Nabble.com.

Delay in replication between cloud servers

2016-02-16 Thread Cool Techi
We are using solr cloud with 1 shard and a replication factor of 3. We are noticing that the time for data to become available across all replicas from the leader is very high. The data rate is not very high; is there any way to control this? In a master-slave setup we give a replication time. Rega

Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Victor D'agostino
Hi I am building a Solr 5 architecture with 3 Solr nodes and 1 ZooKeeper. The database backend is PostgreSQL 9 on RHEL 6. I am looking for a free open-source crawler that uses SolrJ. What do you guys recommend? Best regards Victor d'Agostino

Re: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Emir Arnautovic
Hi, It is most common to use Nutch as crawler, but it seems that it still does not have support for SolrCloud (if I am reading this ticket correctly https://issues.apache.org/jira/browse/NUTCH-1662). Anyway, I would recommend Nutch with standard http client. Regards, Emir On 16.02.2016 16:02

RE: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Markus Jelsma
Nutch has Solr 5 cloud support in trunk, i committed it earlier this month. https://issues.apache.org/jira/browse/NUTCH-2197 Markus -Original message- > From:Emir Arnautovic > Sent: Tuesday 16th February 2016 16:26 > To: solr-user@lucene.apache.org > Subject: Re: Which open-source craw

Re: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Victor D'agostino
Hi, Thanks for your help. Nutch is exactly what I'm looking for, and I'm feeling lucky the Solr cloud support has just been committed! I'll try the trunk version and wait until the 1.12 version is released. Regards Victor Nutch has Solr 5 cloud support in trunk, i committed it earlier this mo

Re: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Emir Arnautovic
Markus, Ticket I run into is for Nutch2 and NUTCH-2197 is for Nutch1. Haven't been using Nutch for a while so cannot recommend version. Thanks, Emir On 16.02.2016 16:37, Markus Jelsma wrote: Nutch has Solr 5 cloud support in trunk, i committed it earlier this month. https://issues.apache.org/j

RE: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Davis, Daniel (NIH/NLM) [C]
I'm far, far from an expert on this sort of thing, but my personal experience a year ago was that Nutch-1 was easier to use, and the blog post I link below suggests that the abstraction layer in Nutch-2 really costs some time. I expect that Nutch-2 has matured some since then, but going with

Re: Why is my index size going up (or: why it was smaller)?

2016-02-16 Thread Steven White
I found the issue: as soon as I restart Solr, the index size goes down. My index and data size must have been at a border line where some segments are not released on my last document commit. Steve On Mon, Feb 15, 2016 at 11:09 PM, Shawn Heisey wrote: > On 2/15/2016 1:12 PM, Steven White wrote

RE: Delay in replication between cloud servers

2016-02-16 Thread Cool Techi
Further, we have noticed that the delay increases a couple of hours after restart. Details related to solrconfig.xml are given below, 15000 25000 false 1000 Regards, Rohit > From: cooltec...@outlook.com > To: solr-user@lucene.apache.org > Subject: Delay in replic

Re: Why is my index size going up (or: why it was smaller)?

2016-02-16 Thread Shawn Heisey
On 2/16/2016 9:37 AM, Steven White wrote: > I found the issue: as soon as I restart Solr, the index size goes down. > > My index and data size must have been at a border line where some segments > are not released on my last document commit. I think the only likely thing that could cause this beha

Re: Why is my index size going up (or: why it was smaller)?

2016-02-16 Thread Steven White
Here is how I was testing: stop Solr, delete the "data" folder, start Solr, start indexing, and finally check index size. I used the same pattern for the before and after my (see my original email) and each time I run this test, the index size ended up being larger; restarting Solr did the trick.

Re: Why is my index size going up (or: why it was smaller)?

2016-02-16 Thread Chris Hostetter
: I'm testing this on Windows, so that maybe a factor too (the OS is not : releasing file handles?!) specifically: Windows won't let Solr delete files on disk that have open file handles... https://wiki.apache.org/solr/FAQ#Why_doesn.27t_my_index_directory_get_smaller_.28immediately.29_when_i_de

Re: Errors on master after upgrading to 4.10.3

2016-02-16 Thread Joseph Hagerty
Does literally nobody else see this error in their logs? I see this error hundreds of times per day, in occasional bursts. Should I file this as a bug? On Mon, Feb 15, 2016 at 4:56 PM, Joseph Hagerty wrote: > After migrating from 3.5 to 4.10.3, I'm seeing the following error with > alarming regu

RE: Which open-source crawler to use with SolrJ and Postgresql ?

2016-02-16 Thread Markus Jelsma
Hello - Nutch 1.x is much more feature rich than 2.x; both can do tremendously large crawls with ease. I haven't tried all others mentioned except ManifoldCF, which is very good at retrieving data from shared file systems and stuff like filenet. We use Nutch 1.x for most of our crawls, small and

Re: SOLR ranking

2016-02-16 Thread david.w.smi...@gmail.com
I just want to interject to say one thing: You *can* sort on multi-valued fields as of recent Solr 5 releases. It's done using the "field" function query with either a "min" or "max" second argument: https://cwiki.apache.org/confluence/display/solr/Function+Queries Of course it'd be nicer to simply s
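
As a rough sketch of the point above, the sort clause for a multi-valued field can be assembled from the "field" function query with a "min" or "max" selector. The helper name multivalued_sort and the field name prices are hypothetical, and the example assumes the field meets whatever requirements the Function Queries page lists for this feature (for instance, docValues on the field).

```python
def multivalued_sort(field_name, selector="min", direction="asc"):
    """Return a Solr sort clause that orders documents by the
    smallest ("min") or largest ("max") value of a multi-valued field."""
    if selector not in ("min", "max"):
        raise ValueError("selector must be 'min' or 'max'")
    if direction not in ("asc", "desc"):
        raise ValueError("direction must be 'asc' or 'desc'")
    return f"field({field_name},{selector}) {direction}"


# Hypothetical field name, used only for illustration.
sort_clause = multivalued_sort("prices", "min", "asc")
print(sort_clause)
```

The returned string is meant to be passed as the value of the sort request parameter.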

Solr and Nutch integration

2016-02-16 Thread Tom Running
I am having a problem configuring Solr to read Nutch data or integrate with Nutch. Has anyone been able to get Solr 5.4.x to work with Nutch? I went through a lot of articles found via Google and am still not able to get Solr 5.4.1 to search Nutch content. Any howto or working configuration sample that you can

Re: Using Solr's spatial functionality for astronomical catalog

2016-02-16 Thread david.w.smi...@gmail.com
Ah; I saw that. I'm glad you figured it out. Yes, you needed the SQL alias. I'm kinda surprised you didn't get an error about a field by the name of your expression not existing... but maybe you have a catch-all dynamic field or maybe you're in data-driven mode. In either case, I'd expect a qui

RE: Solr and Nutch integration

2016-02-16 Thread Markus Jelsma
Hello Tom - Nutch 2.x has iirc old SolrServer client implemented. It should just send an HTTP request to a specified node. The Solr node will then forward it to a destination shard. In Nutch, you should set up indexer-solr as an indexing plugin in the plugin.includes configuration directive and

Re: Negating multiple array fileds

2016-02-16 Thread Shawn Heisey
On 2/15/2016 9:22 AM, Jack Krupansky wrote: > I should also have noted that your full query: > > (-persons:*)AND(-places:*)AND(-orgs:*) > > can be written as: > > -persons:* -places:* -orgs:* > > Which may work as is, or can also be written as: > > *:* -persons:* -places:* -orgs:* Salman, One fac
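
The last rewrite quoted above, where the negative clauses are anchored to an explicit match-all query, can be generated mechanically. A minimal sketch follows: the field names persons, places and orgs come from the thread, while the helper name none_populated is made up for the example.

```python
def none_populated(fields):
    """Build a query matching documents in which none of the given
    fields has a value: start from all documents (*:*) and subtract
    every document that has any value in each field."""
    if not fields:
        raise ValueError("at least one field name is required")
    negations = " ".join(f"-{name}:*" for name in fields)
    return f"*:* {negations}"


query = none_populated(["persons", "places", "orgs"])
print(query)  # *:* -persons:* -places:* -orgs:*
```

Starting from *:* gives the negative clauses a set of documents to subtract from, so the query does not depend on the parser handling purely negative queries.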

Re: Negating multiple array fileds

2016-02-16 Thread Binoy Dalal
Hi Shawn, Please correct me If I'm wrong here, but don't the all inclusive range query [* TO *] and an only wildcard query like the one above essentially do the same thing from a black box perspective? In such a case wouldn't it be better to default an only wildcard query to an all inclusive range