SQL JOIN eta

2017-03-14 Thread Damien Kamerman
Hi all, does anyone know roughly when the SQL JOIN functionality will be released? Is there a Jira for this? I'm guessing this might be in Solr 6.6. Cheers, Damien.

Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Zheng Lin Edwin Yeo
Ok, thanks for the heads up. I'll review the solrconfig.xml first. Regards, Edwin On 15 March 2017 at 00:23, Joel Bernstein wrote: > Yeah, there have been a lot of changes to configs in Solr 6. All the > streaming request handlers have now been made implicit so the > solrconfig.xml doesn't

Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread Erick Erickson
bq: If I changed the routing strategy back to composite (which it should be). is it ok? I sincerely doubt it. The docs have already been routed to the wrong place (actually, I'm not sure how it worked at all). You can't get them redistributed simply by changing the definition in ZooKeeper, they're

Re: Using fetch function with streaming expression

2017-03-14 Thread Pratik Patel
Wow, this is interesting! Is it going to be a new addition to Solr, or is it already available? I cannot find it in the documentation. I am using Solr version 6.4.1. On Tue, Mar 14, 2017 at 7:41 PM, Joel Bernstein wrote: > I'm going to add a "cartesian" function that creates a cartesian product

Re: Using fetch function with streaming expression

2017-03-14 Thread Joel Bernstein
I'm going to add a "cartesian" function that creates a cartesian product from a multi-value field. This will turn a single tuple with a multi-value field into multiple tuples with a single-value field. This will allow the fetch operation to work on ancestors. It also has many other use cases. Sample synta
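
The tuple expansion described in this message can be pictured with a small Python sketch. It only illustrates the transformation (one tuple with a multi-value field becoming several tuples with a single value each); it is not Solr code, and the helper name `expand_multivalue` and the sample tuple are made up for the illustration.

```python
def expand_multivalue(tup, field):
    """Turn one tuple with a multi-value field into one tuple per value.

    Illustration only: `expand_multivalue` is a hypothetical helper,
    not part of Solr's streaming expression API.
    """
    values = tup.get(field)
    if not isinstance(values, (list, tuple)):
        # Single-valued or missing field: pass the tuple through unchanged.
        return [dict(tup)]
    # Copy every other field and replace the multi-value field by one value.
    return [{**tup, field: value} for value in values]


# Hypothetical sample tuple with a multi-valued "ancestors" field.
event = {"id": "event1", "ancestors": ["c1", "c2", "c3"]}
expanded = expand_multivalue(event, "ancestors")
```

Each resulting tuple carries a single `ancestors` value, which is the shape a single-valued join key needs.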

Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread vbindal
I think I didn't explain properly. I have 3 data centers, each with its own SolrCloud. My original strategy was composite routing, but when one data center went down and we brought it back, somehow the routing strategy on it changed to implicit (the other 2 DCs still have composite and they are working

Re: Facet? Search problem

2017-03-14 Thread David Hastings
Glad it worked for you. I'm planning on some experimentation using that feature; it could contribute to an interface nicely if thought through well. On Tue, Mar 14, 2017 at 2:25 PM, Scott Smith wrote: > Grouping appears to be exactly what I'm looking for. I added > "group=true&group.field=category

RE: Facet? Search problem

2017-03-14 Thread Scott Smith
Thanks. I'll look at that as well. -Original Message- From: Stefan Matheis [mailto:matheis.ste...@gmail.com] Sent: Tuesday, March 14, 2017 1:20 PM To: solr-user@lucene.apache.org Subject: RE: Facet? Search problem Scott Depending on what you're looking for https://cwiki.apache.org/conf

RE: Facet? Search problem

2017-03-14 Thread Stefan Matheis
Scott Depending on what you're looking for https://cwiki.apache.org/confluence/display/solr/Collapse+and+Expand+Results might be worth a look as well. -Stefan On Mar 14, 2017 7:25 PM, "Scott Smith" wrote: > Grouping appears to be exactly what I'm looking for. I added > "group=true&group.field

Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread Erick Erickson
That would make the problem even worse. If you created the collection with implicit routing, there are no hash ranges for each shard. CompositeId requires hash ranges to be defined for each shard. Don't even try. Best, Erick On Tue, Mar 14, 2017 at 11:13 AM, vbindal wrote: > Compared it against
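
To make the point about hash ranges concrete, here is a rough Python sketch of range-based routing. It is an illustration under assumptions, not Solr's implementation: the equal-width split and the CRC32 stand-in hash are simplifications (Solr's compositeId router hashes with MurmurHash3), and all function names are hypothetical.

```python
import zlib

MIN_HASH = -2**31      # lower bound of the signed 32-bit hash space
MAX_HASH = 2**31 - 1   # upper bound of the signed 32-bit hash space


def partition_hash_space(num_shards):
    """Split the signed 32-bit space into contiguous, non-overlapping ranges."""
    step = 2**32 // num_shards
    ranges = []
    start = MIN_HASH
    for i in range(num_shards):
        # The last shard absorbs any remainder so the whole space is covered.
        end = MAX_HASH if i == num_shards - 1 else start + step - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def toy_hash(doc_id):
    """Stand-in hash: CRC32 mapped to a signed 32-bit int (assumption, not Solr's hash)."""
    h = zlib.crc32(doc_id.encode("utf-8"))
    return h - 2**32 if h >= 2**31 else h


def shard_for(doc_id, ranges):
    """Return the index of the shard whose range contains the document's hash."""
    h = toy_hash(doc_id)
    for index, (low, high) in enumerate(ranges):
        if low <= h <= high:
            return index
    raise ValueError("no shard range covers this hash")


ranges = partition_hash_space(2)
```

With no ranges defined, as in a collection created with the implicit router, a lookup like `shard_for` has nothing to match against, which is why switching the router name alone cannot work.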

RE: Facet? Search problem

2017-03-14 Thread Scott Smith
Grouping appears to be exactly what I'm looking for. I added "group=true&group.field=category" to my search and it appears that I get a list of groups, one document in each group that matches the search along with (bonus) the number of documents in the category that match that search. Perfect.
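
As a small illustration of the parameters mentioned above, the following Python sketch builds a grouped select URL. The host, the collection name `mycollection`, the query term, and the helper name are assumptions for the example; only `group=true` and `group.field=category` come from the message.

```python
from urllib.parse import urlencode


def grouped_query_url(base_url, collection, query, group_field):
    """Build a /select URL that asks Solr to group results by one field."""
    params = {
        "q": query,
        "group": "true",             # turn result grouping on
        "group.field": group_field,  # field whose values define the groups
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, and query term.
url = grouped_query_url("http://localhost:8983/solr", "mycollection", "laptop", "category")
```

By default each group in the response carries a count of matching documents plus its top document, which matches the behaviour described in the message.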

Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread vbindal
Compared it against the other 2 datacenters and they both have `compositeId`. This started happening after 1 of our ZooKeeper nodes died due to a hardware issue and we had to set up a new ZooKeeper machine, update the config on all the Solr machines, and restart the cloud. My guess is something went wrong a

Re: Iterating sorted result docs in a custom search component

2017-03-14 Thread alexpusch
I ended up using ValueSource, and FunctionValues (as used in statsComponent) FieldType fieldType = schemaField.getType(); ValueSource valueSource = fieldType.getValueSource(schemaField, null); FunctionValues values = valueSource.getValues(Collections.emptyMap(), ctx); values.strVal(docId) I hope

Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread Erick Erickson
The default router has always been compositeId, but when you created your collection you may have created it with implicit. Looking at the clusterstate.json and/or state.json in the individual collection should show you (admin UI>>cloud>>tree). But we need to be very clear about what a "duplicate"

Re: [Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-14 Thread Shawn Heisey
On 3/14/2017 10:23 AM, Elodie Sannier wrote: > The request close() method decrements the reference count on the > searcher. From what I could tell, that method decrements the reference counter, but does not actually close the searcher object. I cannot tell you what the correct procedure is to ma

Re: Inconsistent numFound in SC when querying core directly

2017-03-14 Thread vbindal
Hi Shawn, We are on version 4.10.0. Is that the default router in this version? Also, we don't see all the documents duplicated, only some of them. I have an indexer job to index data in Solr. After I delete all the records and run this job, the count is correct, but when I run the job again, we star

Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Joel Bernstein
Yeah, there have been a lot of changes to configs in Solr 6. All the streaming request handlers have now been made implicit so the solrconfig.xml doesn't include them. Something seems to be stepping on the implicit configs. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 14, 2017 at 12:20

Re: [Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-14 Thread Erick Erickson
Yeah, it's a little confusing. But SolrQueryRequestBase.getSearcher calls, in turn, core.getSearcher which explicitly says in the javadocs: * If returnSearcher==true then a SolrIndexSearcher will be returned with * the reference count incremented. It must be decremented when no longer needed. Se
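
The "increment on acquire, decrement when done" contract quoted above can be sketched in Python. This is a simplified illustration of the reference-counting pattern, not Solr's own `RefCounted` class; the class below and the `"searcher-1"` placeholder are assumptions made for the example.

```python
import threading


class RefCounted:
    """Minimal reference-counted holder (illustrative, not Solr's class)."""

    def __init__(self, resource, on_close):
        self._resource = resource
        self._on_close = on_close
        self._count = 1  # the owner's own reference
        self._lock = threading.Lock()

    @property
    def count(self):
        return self._count

    def get(self):
        return self._resource

    def incref(self):
        with self._lock:
            if self._count <= 0:
                raise RuntimeError("resource already released")
            self._count += 1
        return self

    def decref(self):
        with self._lock:
            self._count -= 1
            release = self._count == 0
        if release:
            # Last reference gone: actually close the resource.
            self._on_close(self._resource)


closed = []
holder = RefCounted("searcher-1", closed.append)

ref = holder.incref()      # a caller acquires the searcher
try:
    name = ref.get()       # ... and uses it
finally:
    ref.decref()           # every acquire needs exactly one matching release
```

If the `finally` branch were missing, the count would never reach zero and the underlying resource (and any deleted index files it still references) would stay open.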

Re: Using fetch function with streaming expression

2017-03-14 Thread Pratik Patel
Hi, Joel. Thanks for the reply. So, I need to do some graph traversal queries for my use case. In my data set, I have concepts and events. concept : {name, address, bio ..}, > event: {name, date, participantIds:[concept1, concept2...] .} Events connects two or more concepts. So, this is

Re: [Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-14 Thread Elodie Sannier
The request close() method decrements the reference count on the searcher. public abstract class SolrQueryRequestBase implements SolrQueryRequest, Closeable { // The index searcher associated with this request protected RefCounted searcherHolder; public void close() { if(this.searcher

Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Zheng Lin Edwin Yeo
Could it be because the solrconfig.xml was created in Solr 5.x, and was upgraded to Solr 6.x, and there is something which I have missed out during the upgrading? So far for this server, only the schema.xml and solrconfig.xml were carried forward and modified from Solr 5.x. The files for Solr 6.4.1

Re: Need help with date boost

2017-03-14 Thread Erick Erickson
Rick: Hmmm, try this: https://cwiki.apache.org/confluence/display/solr/Function+Queries. It's not quite as explicit, but it's the latest document. Essentially that's what the function on that page does, something like: "recip(rord (creationDate),1,1000,1000)" the "recip" function is actually rec
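
For readers unfamiliar with `recip`, Solr's function-query reference defines `recip(x,m,a,b)` as `a/(m*x+b)`. The Python sketch below evaluates that formula with the constants from the message; the plain numbers passed as `x` are stand-ins for whatever `rord(creationDate)` would return, which is an assumption for illustration.

```python
def recip(x, m, a, b):
    """Reciprocal function as documented for Solr function queries: a / (m*x + b)."""
    return a / (m * x + b)


# With m=1, a=1000, b=1000 the boost starts at 1.0 and decays as x grows.
newest = recip(0, 1, 1000, 1000)     # x = 0
older = recip(1000, 1, 1000, 1000)   # x = 1000
```

Because `a` and `b` are equal here, the boost is 1.0 at `x = 0` and halves once `m*x` equals `b`.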

Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Joel Bernstein
Yeah, something is wrong with the configuration, because /export only should be returning json. Have you changed the configurations? What were the exact steps you used in setting up the server? Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 14, 2017 at 11:50 AM, Zheng Lin Edwin Yeo wr

Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Zheng Lin Edwin Yeo
Hi Joel, This is what I get from the query: true 0 0 Regards, Edwin On 14 March 2017 at 22:33, Joel Bernstein wrote: > try running the following query: > > http://localhost:8983/solr/email/export?{!terms+f%3Dfrom}ed...@mail.com > &distrib=false&fl=from,to&sort=to+asc,from+asc&wt=json&version=

Re: Iterating sorted result docs in a custom search component

2017-03-14 Thread Erick Erickson
Then you're probably going to write your own Collector if you need to see each document and do something different with it. Do be aware that you _really_ need to be sure you get your values from docValues fields. If you use the simple get(docId).getField() method the stored fields will be read from

Re: Using fetch function with streaming expression

2017-03-14 Thread Joel Bernstein
Wow that's an interesting expression! The problem is that you are trying to fetch using the ancestors field, which is multi-valued. fetch doesn't support multi-value join keys. I never thought someone might try to do that. So, you're attempting to get the concept names for ancestors? Can you expl

Re: managing active/passive cores in Solr and Haystack

2017-03-14 Thread Erick Erickson
I don't know much about HAYSTACK, but for the Solr URL you probably want the "shards" parameter for searching, see: https://cwiki.apache.org/confluence/display/solr/Distributed+Search+with+Index+Sharding And just use the specific core you care about for update requests. But I would suggest that y
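
A minimal Python sketch of the read side of this suggestion: one query URL that fans out to several cores through the `shards` parameter. The core names `active` and `archive`, the host, the query, and the helper name are hypothetical.

```python
from urllib.parse import urlencode


def distributed_query_url(query_core_url, shard_cores, query):
    """Build a /select URL that searches several cores via the shards parameter."""
    params = {
        "q": query,
        # Comma-separated list of host:port/path entries, without the scheme.
        "shards": ",".join(shard_cores),
    }
    # Keep ':', '/' and ',' unescaped so the URL stays readable.
    return f"{query_core_url}/select?{urlencode(params, safe=':/,')}"


# Hypothetical core names for the active and archive indexes.
cores = ["localhost:8983/solr/active", "localhost:8983/solr/archive"]
url = distributed_query_url("http://localhost:8983/solr/active", cores, "title:solr")
```

Updates would still go to the single active core's own `/update` endpoint, as the message suggests.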

Re: Suggestions from different dictionaries dynamically

2017-03-14 Thread Alexandre Rafalovitch
Are you actually using spell checker functionality? If so, could you provide the solrconfig.xml segment of what that configuration looks like. Or are you just using plain search, then what is your default 'df' field? Regards, Alex. http://www.solr-start.com/ - Resources for Solr users, ne

Re: Error with Streaming Expressions - shortestPath

2017-03-14 Thread Joel Bernstein
Ok. I updated the other thread with a URL to run based on what I'm seeing in the logs. Try running that URL and let's see what comes back. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 14, 2017 at 10:26 AM, Zheng Lin Edwin Yeo wrote: > Hi Joel, > > This is a standard Solr 6.4.1 inst

Using fetch function with streaming expression

2017-03-14 Thread Pratik Patel
I have two types of documents in my index: eventLink and conceptData. eventLink { ancestors:[,] } conceptData-{ id:id1, conceptid, concept_name . } Both are in the same collection. In my query, I am doing a gatherNodes query wrapped in some other function and ultimately I am getting a b

managing active/passive cores in Solr and Haystack

2017-03-14 Thread serwah sabetghadam
Hi all, I am totally new to this group and of course so happy to join:) So my question may be repetitive but I did not find how to search all previous questions. problem in one sentence: to read from multiple cores (archive and active ones), write only to the latest active core using Solr and Ha

Re: Indexing CPU performance

2017-03-14 Thread Mahmoud Almokadem
Thanks Toke, After sorting with Self Time(CPU) I got that the FSDirectory$FSIndexOutput$1.write() is taking much of CPU time, so the bottleneck now is the IO of the hard drive? https://drive.google.com/open?id=0BwLcshoSCVcdb2I4U1RBNnI0OVU On Tue, Mar 14, 2017 at 4:19 PM, Toke Eskildsen wrote:

Re: Need help with date boost

2017-03-14 Thread Rick Leir
Hi Erick We have ten year old documents and new ones which often score about the same just based on default similarity. But the newer ones are much more relevant in our case. Suppose we de-boost proportionally to (NOW/YEAR - modifiedDate/YEAR). Thanks for the link you provided, it had not jumpe

Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Joel Bernstein
try running the following query: http://localhost:8983/solr/email/export?{!terms+f%3Dfrom}ed...@mail.com &distrib=false&fl=from,to&sort=to+asc,from+asc&wt=json&version=2.2 Let's see what comes back from this. Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 14, 2017 at 10:20 AM, Zheng L

Re: Error with Streaming Expressions - shortestPath

2017-03-14 Thread Zheng Lin Edwin Yeo
Hi Joel, This is a standard Solr 6.4.1 install. I got the same error even after I upgraded it to Solr 6.4.2. Regards, Edwin On 14 March 2017 at 21:30, Joel Bernstein wrote: > Looks like there might be something strange with your configuration. Did > you upgrade an existing install or is this a

Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Zheng Lin Edwin Yeo
Hi Joel, I have only managed to find these above the stack trace. 2017-03-14 14:08:42.819 INFO (qtp1543727556-2635) [ ] o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/info/logging params={wt=json&_=1489500479108&since=0} status=0 QTime=0 2017-03-14 14:08:43.085 INFO (qtp1543727556-2397)

SOLR Data Locality

2017-03-14 Thread Muhammad Imad Qureshi
We have a 30 node Hadoop cluster and each data node has a SOLR instance also running. Data is stored in HDFS. We are adding 10 nodes to the cluster. After adding nodes, we'll run HDFS balancer and also create SOLR replicas on new nodes. This will affect data locality. does this impact how solr w

Re: Indexing CPU performance

2017-03-14 Thread Toke Eskildsen
On Tue, 2017-03-14 at 11:51 +0200, Mahmoud Almokadem wrote: > Here is the profiler screenshot from VisualVM after upgrading > > https://drive.google.com/open?id=0BwLcshoSCVcddldVRTExaDR2dzg > > the jetty is taking the most time on CPU. Does this mean, the jetty > is the bottleneck on indexing? Y

Re: [Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-14 Thread Shawn Heisey
On 3/14/2017 3:08 AM, Gerald Reinhart wrote: > Hi, > The custom code we have is something like this : > public class MySearchHandler extends SearchHandler { > @Override public void handleRequestBody(SolrQueryRequest req, > SolrQueryResponse rsp) throws Exception { > SolrIndexSearcher sear

Re: Error with Streaming Expressions - shortestPath

2017-03-14 Thread Joel Bernstein
Looks like there might be something strange with your configuration. Did you upgrade an existing install or is this a standard Solr 6.4.1 install? Joel Bernstein http://joelsolr.blogspot.com/ On Tue, Mar 14, 2017 at 6:22 AM, Zheng Lin Edwin Yeo wrote: > Hi, > > I tried to run the following Stre

Re: Indexing CPU performance

2017-03-14 Thread Shawn Heisey
On 3/14/2017 3:35 AM, Mahmoud Almokadem wrote: > After upgrading to 6.4.2 I got 3500+ docs/sec throughput with two uploading > clients to Solr, which is good enough for me for the whole reindexing. > > I'll try Shawn's code for posting to Solr using HttpSolrClient instead of > SolrCloudClient. If the servers

Re: Error for Graph Traversal using Streaming Expressions

2017-03-14 Thread Joel Bernstein
You're getting JSON parse errors that look like you're getting an XML response. Do you see any errors in the logs other than the stack trace? I suspect there might be another error above the stack trace which shows the error from the server that's causing it to respond with XML. Joel Bernstein http

Suggestions from different dictionaries dynamically

2017-03-14 Thread vuppalasubbarao
Hi, We have two field names "teacher_field" and "school_field" along with other fields like "source". We have created single dictionary from both these fields. When I am searching with misspelling of "teacher_field", I also get the spelling suggestions from "school_field". Instead I have to get s

SortingMergePolicy in solr 6.4.2

2017-03-14 Thread Sahil Agarwal
The SortingMergePolicy does not seem to get applied. The CSV file gets indexed without errors, but when I search for a term, the results returned are not sorted by Marks. Following is a toy project in Solr 6.4.2 on which I tried to use SortingMergePolicyFactory. Just showing the changes that

Re: Modifying solrconfig.xml in solr cloud

2017-03-14 Thread Binoy Dalal
Thanks Erick. Missed that somehow. On Tue, 14 Mar 2017, 10:44 Erick Erickson, wrote: > First hit from googling "solr config API" > > https://cwiki.apache.org/confluence/display/solr/Config+API > > Best, > Erick > > On Mon, Mar 13, 2017 at 8:27 PM, Binoy Dalal > wrote: > > Is there a simpler way

Re: Indexing CPU performance

2017-03-14 Thread Mahmoud Almokadem
After upgrading to 6.4.2 I got 3500+ docs/sec throughput with two uploading clients to Solr, which is good enough for me for the whole reindexing. I'll try Shawn's code for posting to Solr using HttpSolrClient instead of SolrCloudClient. Thanks to all, Mahmoud On Tue, Mar 14, 2017 at 10:23 AM, Mahmoud Almok

Error with Streaming Expressions - shortestPath

2017-03-14 Thread Zheng Lin Edwin Yeo
Hi, I tried to run the following Streaming query with the shortestPath Stream Source. http://localhost:8983/solr/email/stream?expr=shortestPath(email, from="ed...@mail.com", to="ad...@mail.com", edge="from_address=to_address", threads="6", partitionSize="300", maxDepth="4")&indent=t

Re: Indexing CPU performance

2017-03-14 Thread Mahmoud Almokadem
Here is the profiler screenshot from VisualVM after upgrading https://drive.google.com/open?id=0BwLcshoSCVcddldVRTExaDR2dzg the jetty is taking the most time on CPU. Does this mean, the jetty is the bottleneck on indexing? Thanks, Mahmoud On Tue, Mar 14, 2017 at 11:41 AM, Mahmoud Almokadem wr

Re: Indexing CPU performance

2017-03-14 Thread Mahmoud Almokadem
Thanks Shalin, I'm posting data to Solr with SolrInputDocument using SolrJ. According to the profiler, com.codahale.metrics.Meter.mark takes much more processing than the others, as mentioned in this issue: https://issues.apache.org/jira/browse/SOLR-10130. And I think the profiler of sematext is diff

Re: Indexing CPU performance

2017-03-14 Thread Shalin Shekhar Mangar
According to the profiler output, a significant amount of cpu is being spent in JSON parsing but your previous email said that you use SolrJ. SolrJ uses the javabin binary format to send documents to Solr and it never ever uses JSON so there is definitely some other indexing process that you have n

Re: [Migration Solr5 to Solr6] Unwanted deleted files references

2017-03-14 Thread Gerald Reinhart
Hi, The custom code we have is something like this : public class MySearchHandler extends SearchHandler { @Override public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception { SolrIndexSearcher searcher = req.getSearcher(); try {

Re: Indexing CPU performance

2017-03-14 Thread Mahmoud Almokadem
Thanks Erick, I think there is something missing: the rate I'm talking about is for a one-time bulk upload, not for ongoing indexing. My dataset is about 250 million documents and I need to index them to Solr. Thanks Shawn for your clarification, I think that I got stuck on this version 6

Re: Indexing CPU performance

2017-03-14 Thread Mahmoud Almokadem
I'm using VisualVM and sematext to monitor my cluster. Below are screenshots for each of them. https://drive.google.com/open?id=0BwLcshoSCVcdWHRJeUNyekxWN28 https://drive.google.com/open?id=0BwLcshoSCVcdZzhTRGVjYVJBUzA https://drive.google.com/open?id=0BwLcshoSCVcdc0dQZGJtMWxDOFk https://drive.

Re: Iterating sorted result docs in a custom search component

2017-03-14 Thread alexpusch
Single field. I'm iterating over the results once, and need each doc in memory only for that single iteration. I need different fields from each doc according to the algorithm state.