Re: DIH XPathEntityProcessor XPath subset?

2018-01-05 Thread Rick Leir
Stefan There is at least one free Solr WP plugin. There are several Solr PHP toolkits on github. Start with these unless your WP is wildly custo..  .. cheers -- Rick On 01/03/2018 11:50 AM, Erik Hatcher wrote: Stefan - If you pre-transform the XML, I’d personally recommend either transform

Re: Small Tokenization issue

2018-01-05 Thread Rick Leir
Nawab Look at classicTokenizer. It is a good choice if you have part numbers with hyphens. The second tokenizer on this page: https://lucene.apache.org/solr/guide/6_6/tokenizers.html Cheers -- Rick On 01/03/2018 04:52 PM, Shawn Heisey wrote: On 1/3/2018 1:56 PM, Nawab Zada Asad Iqbal wrote

Re: SOLR SSL Java command line properties

2018-01-05 Thread Rick Leir
Bob Thanks for mentioning the jetty-ssl.xml file. I have a follow-on question: since it is strongly recommended that you host Solr behind a web app (perhaps solr-security-proxy is adequate), the Solr REST interface will not be on the open Internet, so perhaps HTTP is the appropriate protocol?

Re: Deliver static html content via solr

2018-01-05 Thread Rick Leir
Using Velocity, you can have some results-driven HTML served by Solr and all your JS, CSS etc 'assets' served by Apache from /var/www/html. Warning: the Velocity learning curve is steep and you still need a separate front-end web app for security because Velocity is a templating output filter.

Re: Solr - custom ordering

2018-01-05 Thread Emir Arnautović
Hi, Solr can return documents by score or some field value. In case of all docs having the same score, it’ll use its internal id to sort docs. That being said, you have two choices: 1. sort jobids in your fq and sort by jobid (in this case you should use terms query parser) 2. use q instead of f

Re: Replication Factor Bug in Collections Restore API?

2018-01-05 Thread Ansgar Wiechers
On 2018-01-04 Shalin Shekhar Mangar wrote: > Sounds like a bug. Can you please open a Jira issue? https://issues.apache.org/jira/browse/SOLR-11823 Regards Ansgar Wiechers

Personalized search parameters

2018-01-05 Thread marco
Hi, first of all I want to say that I'm a beginner with the whole Lucene/Solr environment. I'm trying to create a simple personalized search engine, and to do so I was thinking about adding a parameter user= to the URI of the query requests, which I would need during the scoring phase to rerank

Very high number of deleted docs, part 2

2018-01-05 Thread Markus Jelsma
Hello, We discussed [1] this problem before, and we could not fix it until it became clear my collection was rather small, thanks again. Another collection, now on 7.1, also shows this problem and has default TMP settings. This time size is different, each shard of this collection is over 40 G

Replacing the entire schema using schema API

2018-01-05 Thread André Widhani
Hi, I know I can retrieve the entire schema using Schema API and I can also use it to manipulate the schema by adding fields etc. I don't see any way to post an entire schema file back to the Schema API though ... this is what most REST APIs offer: You retrieve an object, modify it and send back

Re: Very high number of deleted docs, part 2

2018-01-05 Thread Shawn Heisey
On 1/5/2018 5:33 AM, Markus Jelsma wrote: Another collection, now on 7.1, also shows this problem and has default TMP settings. This time size is different, each shard of this collection is over 40 GB, and each shard has about 50 % deleted documents. Each shard's largest segment is just under

Re: Replacing the entire schema using schema API

2018-01-05 Thread Shawn Heisey
On 1/5/2018 6:51 AM, André Widhani wrote: I know I can retrieve the entire schema using Schema API and I can also use it to manipulate the schema by adding fields etc. I don't see any way to post an entire schema file back to the Schema API though ... this is what most REST APIs offer: You retri

Re: slow solr facet processing

2018-01-05 Thread Ere Maijala
Hi Everyone, This is a follow-up on the discussion from September 2017. Since then I've spent a lot of time gathering a better understanding of docValues compared to UIF and other stuff related to Solr performance. Here's a summary of the results based on my real-world experience: 1. Making s

Re: Limit search queries only to pull replicas

2018-01-05 Thread Ere Maijala
Hi, It would be really nice to have a server-side option, though. Not everyone uses Solrj, and a typical fairly dummy client just queries the server without any understanding about shards etc. Solr could be clever enough to not forward the query to NRT shards when configured to prefer PULL sh

Re: Replacing the entire schema using schema API

2018-01-05 Thread André Widhani
Hi Shawn, thanks for confirming. I am not using Solr Cloud (I forgot to mention that), or at least not in all instances where that particular piece of code would be used. I'll think about opening a Jira issue, or just doing it iteratively through the API. Regards, André 2018-01-05 15:05 GMT+0

Re: Personalized search parameters

2018-01-05 Thread Erik Hatcher
IMO you’re making this more complicated than it needs to be. Forget for a moment where the user profile is stored. Say user A likes turtles. User B likes puppies. User A queries, and this gets sent to Solr: q=something&bq=turtles User B queries: q=something&bq=puppies I’d fetch the user pref
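The approach described above, where the application looks up the user's preference and appends it as a boost query, might look roughly like the following sketch. The `USER_PREFERENCES` store and the `build_personalized_params` helper are hypothetical names made up for illustration, and the sketch assumes a dismax or edismax request handler, since `bq` is a parameter of those parsers.

```python
from urllib.parse import urlencode

# Hypothetical profile store; a real application might use a database or cache.
USER_PREFERENCES = {"userA": "turtles", "userB": "puppies"}


def build_personalized_params(user_id, query):
    """Return a URL-encoded parameter string with an optional per-user bq."""
    params = [("q", query)]
    preference = USER_PREFERENCES.get(user_id)
    if preference is not None:
        # bq adds an optional boosting clause without restricting the result set.
        params.append(("bq", preference))
    return urlencode(params)


print(build_personalized_params("userA", "something"))  # q=something&bq=turtles
print(build_personalized_params("userB", "something"))  # q=something&bq=puppies
```

A user without a stored preference simply gets the unboosted query, so the client code does not need a separate path for anonymous searches.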

Re: Personalized search parameters

2018-01-05 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
Why do you want the personalization to happen in Similarity? Similarity will score all the docs matching your query, so it has to be really fast. Unless your personalization is very easy (e.g., tf/idf computed in a different way based on the user) I would not put it there. Did you consider wri

RE: Very high number of deleted docs, part 2

2018-01-05 Thread Markus Jelsma
It could be that when this index was first reconstructed, it was optimized to one segment before packed and shipped. How about optimizing it again, with maxSegments set to ten, it should recover right? -Original message- > From:Shawn Heisey > Sent: Friday 5th January 2018 14:34 > To: s

Re: Deliver static html content via solr

2018-01-05 Thread Erik Hatcher
Rick - fair enough, indeed. However, for a “static” resource, no Velocity syntax or learning curve needed. In fact, correcting myself, VelocityResponseWriter isn’t even part of the picture for serving a static resource. Have a look at example/files - https://github.com/apache/lucene-solr/tr

Re: Solr - custom ordering

2018-01-05 Thread Erik Hatcher
Vineet - Solr’s QueryElevationComponent can do this. Or you could use a query like: q=id:C^=300 id:B^=200 id:A^=100 The ^= is a constant score syntax, so you can assign a “score” to a clause (in this case a single document with a unique id). Erik > On Jan 4, 2018, at 11:47 PM
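As a rough sketch of the constant-score approach above, a client could generate the query string from the desired order. The helper name `build_ordered_query`, the `step` size, and the default field name `id` are assumptions for illustration; the sketch also assumes the ids contain no characters that need query-syntax escaping.

```python
def build_ordered_query(ids_in_order, field="id", step=100):
    """Build a query where earlier ids get a higher constant score (^=)."""
    total = len(ids_in_order)
    clauses = [
        # The first id gets total * step, the last id gets step.
        f"{field}:{doc_id}^={(total - position) * step}"
        for position, doc_id in enumerate(ids_in_order)
    ]
    return " ".join(clauses)


print(build_ordered_query(["C", "B", "A"]))  # id:C^=300 id:B^=200 id:A^=100
```

With the default sort by score descending, documents should come back in the order the ids were listed.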

RE: 7.1.0 weird messages bad core before recovery

2018-01-05 Thread Markus Jelsma
Any thoughts on this? Thanks, Markus -Original message- > From:Markus Jelsma > Sent: Wednesday 27th December 2017 11:11 > To: Solr-user > Subject: 7.1.0 weird messages bad core before recovery > > Hello, > > I just had a bad core that needed recovery after restart, first it told me > this

Solr - how does faceting returned unstored values?

2018-01-05 Thread ruby
The Solr document states that the purpose of the stored attribute is to tell Solr to store the original text in the index somewhere. If that is true, then how is Solr able to return original texts when we facet on fields which are not stored? -- Sent from: http://lucene.472066.n3.nabble.com/S

Re: Solr - how does faceting returned unstored values?

2018-01-05 Thread Erik Hatcher
Facets return the *indexed* value. This is an important, ahem, facet to facets. Field analysis matters, so tokenized fields will have tokenized facets. Erik > On Jan 5, 2018, at 10:17 AM, ruby wrote: > > The Solr document states that the purpose of the stored attribute is to tell
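To illustrate the point that facets count indexed terms rather than stored text, here is a toy simulation in plain Python. It is not Solr code: the lowercase-and-split-on-whitespace "analysis" is an assumed stand-in for a tokenized text field, and the untokenized branch stands in for a string field.

```python
from collections import Counter


def facet_counts(values, tokenized):
    """Count documents per indexed term, mimicking facet counts."""
    counts = Counter()
    for value in values:
        if tokenized:
            # Assumed analysis: lowercase and split on whitespace.
            # A set ensures each term counts once per document.
            terms = set(value.lower().split())
        else:
            # String-like field: the whole value is a single term.
            terms = {value}
        counts.update(terms)
    return counts


docs = ["New York", "New Jersey", "New York"]
print(facet_counts(docs, tokenized=True))
print(facet_counts(docs, tokenized=False))
```

In the tokenized case the facet values are the tokens `new`, `york` and `jersey`, while in the untokenized case they are the original strings, which is why faceting usually works best on untokenized fields.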

Re: Solr - how does faceting returned unstored values?

2018-01-05 Thread Emir Arnautović
Hi Ruby, Faceting does not work on stored or “original” values but on the tokens that result from the analysis chain. Of course, depending on configuration, the tokenized value can be the same as the original. Stored values are stored as they are sent in the input document. It is probably best if you try faceting

Re: Personalized search parameters

2018-01-05 Thread marco
First of all, thank you for the reply. I understand your idea, and that would make things a lot easier; the problem is that this system is being created as a university project, and we were specifically asked to develop a personalized search system based on result reranking. In particular we have

Re: Limit search queries only to pull replicas

2018-01-05 Thread Emir Arnautović
It is interesting that ES had a similar feature to prefer primary/replica, but it is deprecating that and will remove it - I could not find an explanation why. Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training - http://sematext.com/ > On 5

Re: Solrcloud with Master/Slave

2018-01-05 Thread Shawn Heisey
On 1/4/2018 9:01 AM, Sundaram, Dinesh wrote: > Thanks Shawn for your prompt response. Assume I have solrcloud A server with > 1 node runs on 8983 port and solrcloud B server with 1 node runs on 8983, > here I want to synch up the collection between solrcloud A and B using the > below replication

Re: Solrcloud with Master/Slave

2018-01-05 Thread Erick Erickson
One slight correction. Solr will run perfectly fine with a single ZooKeeper. The recommendation for 3 is that running with a single ZooKeeper creates a single point of failure, i.e. if that node goes down for any reason your Solr cluster won't be able to update anything at all. You can still query,

Re: Limit search queries only to pull replicas

2018-01-05 Thread Erick Erickson
Actually, I think a much better option is to route queries to server load. The theory of preferring pull replicas to leaders would be that the leader will be doing the indexing work and the pull replicas would be doing less work therefore serving queries faster. But that's a fragile assumption. Le

RE: Solrcloud with Master/Slave

2018-01-05 Thread Sundaram, Dinesh
Thanks Shawn and Erick. I guess we are now on the same track. So two independent SolrCloud nodes are allowed to sync up via the master/slave method without referring to any external/embedded ZooKeepers. I need to use -cloud in the command while starting solr otherwise I'm not able to see the admin console.

Re: Very high number of deleted docs, part 2

2018-01-05 Thread Erick Erickson
I'm not 100% sure that playing with maxSegments will work. What will work is to re-index everything. You can re-index into the existing collection, no need to start with a new collection. Eventually you'll replace enough docs in the over-sized segments that they'll fall under the 2.5G live documen

Re: Solrcloud with Master/Slave

2018-01-05 Thread Erick Erickson
bq: I need to use -cloud in the command while starting solr otherwise I'm not able to see the admin console That is totally not true. Yes, you need to start SolrCloud to see the _cloud section_ of the admin UI, but the rest of the admin UI isn't dependent on SolrCloud _at all_. In master/slave set

Re: Replacing the entire schema using schema API

2018-01-05 Thread Erick Erickson
We (well, me and one of the UI guys) had the brilliant idea of editing the configs through the admin UI. Turns out it wasn't all that brilliant given security issues, see Uwe Schindler's comments here: https://issues.apache.org/jira/browse/SOLR-5287 on November 30th. So before raising a JIRA you

Re: Personalized search parameters

2018-01-05 Thread Diego Ceccarelli (BLOOMBERG/ LONDON)
From: solr-user@lucene.apache.org At: 01/05/18 15:35:46 To: solr-user@lucene.apache.org Subject: Re: Personalized search parameters In particular we have to retrieve the documents with a normal search followed by a result reranking phase where we calculate the cosine similarity between the retrie

Re: slow solr facet processing

2018-01-05 Thread Erick Erickson
Ere: This is an excellent summary, it conforms to what I think I know, it's always nice to see confirmation! I'd add two small enhancements. Your point 5 mentions sorting. The same consideration is true for grouping and faceting as well. What all three have in common is that they answer the quest

Re: SolrJ with Async Http Client

2018-01-05 Thread Gus Heck
IIRC http2 allows for multiple (non blocking) requests over a single connection, so this Jira might be relevant: https://issues.apache.org/jira/browse/SOLR-7442 -Gus On Wed, Jan 3, 2018 at 10:22 AM, Walter Underwood wrote: > HTTPClient is non-blocking. Send the request, then the client gets con

RE: problem with Solr Sorting by score and distance together

2018-01-05 Thread Deepak Udapudi
Thanks Susheel / Shawn for the suggestions. We are working on the proposed changes. Regards, Deepak -Original Message- From: Susheel Kumar [mailto:susheel2...@gmail.com] Sent: Thursday, January 04, 2018 7:19 PM To: solr-user@lucene.apache.org Cc: Venkata MR ; Segar Soundiramourthy Sub

Solr cloud issue : Solr 6.5

2018-01-05 Thread Satyaprashant Bezwada
Not sure where we are going wrong in our implementation. We have a Solr cloud environment (Solr 6.5), with 2 Solr nodes and 3 ZooKeeper servers. The environment was running without any issues, but lately we noticed that one of the Solr nodes keeps shutting down frequently. We have replication in p

document colocation

2018-01-05 Thread Steve Pruitt
I have two document types that share several fields. We currently plan a single index for both types. One of the shared fields contains a value that correlates two document instances, i.e. two documents of the two types have the same value. The values are random integers. We would like each c

Re: document colocation

2018-01-05 Thread Erick Erickson
Why do you want to do this? This feels like an XY problem, you're asking how to do X (colocate the docs) without explaining why it's valuable (the Y). I'm skeptical that this buys you enough to be worth the hassle, which is why I'm asking about Y. Theoretically at least you might be able to use c

Re: Solr cloud issue : Solr 6.5

2018-01-05 Thread Erick Erickson
If this has been working fine for a while and suddenly started this behavior my first suspicion would be excessive GC, i.e. you've been adding docs and your heap is no longer adequate. If Java needs to do a stop-the-world garbage collection you can get these kinds of errors. So I'd enable GC loggi

RE: Solr Issue

2018-01-05 Thread Lewin Joy (TMNA)
Hi Erick, I just didn't know how to handle the logic without using a streaming query. What I want is to apply the logic without streaming expressions. I want to eliminate from the result set any "item_name" that has a record with status='N'. Example: Id:1, Item Name: A , status: Y Id:2, Item Na

Re: Negative Core Node Numbers

2018-01-05 Thread Chris Ulicny
After more testing, compiling the archived source and the pre-packaged files on the archive.apache.org site for 7.1.0 keep generating the same issue with negative core node numbers. However, if I compile and run the 7.1 branch from github, it does not produce the negative numbers. When generating

Re: Solr Issue

2018-01-05 Thread Erick Erickson
Ok, I missed that you had multiple records. Pagination does not mesh well with streaming, they address quite different use cases. You might have to re-think your indexing scheme to somehow make the selection more amenable to standard Solr queries. Perhaps something with the join query parser or in

Re: Personalized search parameters

2018-01-05 Thread marco
This looks like a very good solution actually. In the meantime I started working in a different way: I created a custom query component, and from there I accessed the list of results of the query, and I was searching for a way to reorder that list, but I'd better look at the RankQuery, it surely looks

Re: Solr cloud issue : Solr 6.5

2018-01-05 Thread Satyaprashant Bezwada
Thanks a lot. It helped, I noticed the error in the solr console log. cat solr-8983-console.log I> No access restrictor found, access to any MBean is allowed Jolokia: Agent started with URL http://xxx.xx.x.xxx:8778/jolokia/ 2018-01-05 03:56:30.824 INFO (main) [ ] o.e.j.s.Server jetty-9.3.14.v20

Re: SolrJ with Async Http Client

2018-01-05 Thread Shawn Heisey
On 1/5/2018 11:35 AM, Gus Heck wrote: > IIRC http2 allows for multiple (non blocking) requests over a single > connection, so this Jira might be relevant: > https://issues.apache.org/jira/browse/SOLR-7442 HTTPClient will have http/2 support in version 5.0 -- but only in the async client, not the c

Re: Negative Core Node Numbers

2018-01-05 Thread Shawn Heisey
On 1/5/2018 1:35 PM, Chris Ulicny wrote: > After more testing, compiling the archived source and the pre-packaged > files on the archive.apache.org site for 7.1.0 keep generating the same > issue with negative core node numbers. > > However, if I compile and run the 7.1 branch from github, it does

Re: Solr cloud issue : Solr 6.5

2018-01-05 Thread Shawn Heisey
On 1/5/2018 2:35 PM, Satyaprashant Bezwada wrote: > Thanks a lot. It helped, I noticed the error in the solr console log. > # java.lang.OutOfMemoryError: Metaspace > # -XX:OnOutOfMemoryError="/usr/local/solr/bin/oom_solr.sh 8983 /var/solr/logs" > # Executing /bin/sh -c "/usr/local/solr/bin/oom_s

Re: Solr cloud issue : Solr 6.5

2018-01-05 Thread Satyaprashant Bezwada
Thanks a lot Shawn. That really helps, I believe the changes were made during our load testing phase to introduce some changes in the startup script to manage the max allocated memory. That’s where someone introduced that line. We removed it and I’ll check how it behaves now. Regards Prashan

Re: Personalized search parameters

2018-01-05 Thread marco
At the moment I have another problem: is there an efficient way to calculate the cosine similarity between documents? I'm following (with the required modifications) THIS code that calculates the cosine similarity between 2 documents, but it doesn't look t
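For reference, cosine similarity between two documents is the dot product of their term-weight vectors divided by the product of the vector norms. A minimal sketch follows, assuming each document is already available as a sparse dictionary of term weights (for example term frequencies or tf-idf values). How those vectors are obtained from the index is outside this sketch, and `cosine_similarity` is an illustrative helper name.

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    # Iterate over the smaller vector so the dot product stays cheap.
    if len(vec_a) > len(vec_b):
        vec_a, vec_b = vec_b, vec_a
    dot = sum(weight * vec_b.get(term, 0.0) for term, weight in vec_a.items())
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        # An empty vector has no direction; treat the similarity as zero.
        return 0.0
    return dot / (norm_a * norm_b)


doc_1 = {"solr": 2.0, "search": 1.0}
doc_2 = {"solr": 1.0, "lucene": 3.0}
print(cosine_similarity(doc_1, doc_2))
```

When one document (such as a user profile) is compared against many results, precomputing and caching its norm avoids repeating that part of the work for every pair.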

Re: SolrJ with Async Http Client

2018-01-05 Thread Erick Erickson
re: upgrade Jetty: See SOLR-11810 On Fri, Jan 5, 2018 at 2:08 PM, Shawn Heisey wrote: > On 1/5/2018 11:35 AM, Gus Heck wrote: > > IIRC http2 allows for multiple (non blocking) requests over a single > > connection, so this Jira might be relevant: > > https://issues.apache.org/jira/browse/SOLR-7

RE: Pass field value through function for filtering

2018-01-05 Thread Chris Hostetter
https://lucene.apache.org/solr/guide/7_2/other-parsers.html fq={!frange l=0}your(complex(func(fieldA,fieldB),fieldC)) As of 7.2, frange filters will default to being PostFilters as long as you use cache=false ... https://lucidworks.com/2017/11/27/caching-and-filters-and-post-filters/ https://i
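A small sketch of how a client might assemble such an frange filter follows. The helper `build_frange_filter` is a hypothetical name, and `sub(fieldA,fieldB)` is only a placeholder function expression over assumed field names. The `l`, `u` and `cache` local parameters are the ones referred to above.

```python
def build_frange_filter(function_expr, lower=None, upper=None, cache=False):
    """Build an fq value that keeps docs whose function value is in range."""
    local_params = ["frange"]
    if lower is not None:
        local_params.append(f"l={lower}")
    if upper is not None:
        local_params.append(f"u={upper}")
    if not cache:
        # Per the message above, cache=false lets frange run as a post filter in 7.2+.
        local_params.append("cache=false")
    return "{!" + " ".join(local_params) + "}" + function_expr


print(build_frange_filter("sub(fieldA,fieldB)", lower=0))
# {!frange l=0 cache=false}sub(fieldA,fieldB)
```

The resulting string is meant to be sent as the value of an `fq` parameter, URL-encoded by whatever HTTP client is in use.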

Re: Deliver static html content via solr

2018-01-05 Thread Rick Leir
Erik, Sorry I didn't mean to say Velocity has a security problem. I am just thinking that people will see it in action and think it is a full answer to a front end web app, though it has no input filtering or range checking (as an output template system, natch). What do you recommend for a ve

Re: trivia question: why q=*:* doesn't return same result as q.alt=*:*

2018-01-05 Thread Nawab Zada Asad Iqbal
Hi Erik Hatcher, Yes, I am using dismax. But dismax allows *:* for q.alt, which also seems like an inconsistency. On Thu, Jan 4, 2018 at 5:53 PM, Erik Hatcher wrote: > defType=??? Probably dismax. It doesn’t do *:* like edismax or lucene. > > > On Jan 4, 2018, at 20:39, Nawab Zada Asad Iqbal > w

Re: CommonGramsFilter

2018-01-05 Thread Nawab Zada Asad Iqbal
Actually, I have found that it is *not* mandatory to use phrase search with CommonGramsFilter. PS: I had some other code change (which is unnecessary) which was causing the above behavior. On Thu, Jan 4, 2018 at 6:56 PM, Nawab Zada Asad Iqbal wrote: > After some debugging, it seems that the s

Re: 7.1.0 weird messages bad core before recovery

2018-01-05 Thread S G
> > Never seen it before, bug? Already fixed? I have seen it many times before in almost all Solr versions. Do not remember the exact stack trace though. Generally a restart fixes the problem (Like almost all software :) On Fri, Jan 5, 2018 at 6:57 AM, Markus Jelsma wrote: > Any on this? > >