Re: Indexed epoch time in Solr

2015-01-25 Thread Jorge Luis Betancourt González
Perhaps you could use a DocTransformer to convert the Unix time field into any representation you want? You'll need to write a custom DocTransformer, but this is not a complex task. Regards, - Original Message - From: "Ahmed Adel" To: solr-user@lucene.apache.org Sent: Monday, January 26, 2

Re: [MASSMAIL]Weighting of prominent text in HTML

2015-01-25 Thread Jorge Luis Betancourt González
Hi Dan: Agreed, this question is more Nutch related than Solr ;) Nutch doesn't send any data to the /update/extract request handler; all the text and metadata extraction happens on the Nutch side rather than relying on the ExtractingRequestHandler provided by Solr. Underneath, Nutch uses Tika the same te

Indexed epoch time in Solr

2015-01-25 Thread Ahmed Adel
Hi All, Is there a way to convert a Unix time field that is already indexed to ISO-8601 format in the query response? If this is not possible at the query level, what is the best way to copy this field to a new Solr standard date field? Thanks, -- *Ahmed Adel*
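
For the second option in the question (copying the value into a standard date field), the conversion can be done on the client before reindexing. The following is a minimal sketch, not taken from the thread: it assumes the stored value is a count of whole seconds since the Unix epoch in UTC, and the function name `epoch_to_solr_date` is hypothetical. The output uses the ISO-8601 form with a trailing `Z` that Solr date fields accept.

```python
from datetime import datetime, timezone


def epoch_to_solr_date(epoch_seconds):
    """Convert Unix epoch seconds (assumed UTC) to an ISO-8601 string ending in 'Z'."""
    # Build a timezone-aware datetime so the result does not depend on the local zone.
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    # Format to second precision, e.g. 1970-01-01T00:00:00Z.
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    print(epoch_to_solr_date(1422230400))  # -> 2015-01-26T00:00:00Z
```

If the indexed field holds milliseconds rather than seconds, the value would need to be divided by 1000 first.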

Weighting of prominent text in HTML

2015-01-25 Thread Dan Davis
By examining solr.log, I can see that Nutch is using the /update request handler rather than /update/extract. So, this may be a more appropriate question for the Nutch mailing list. OTOH, y'all may know the answer off the top of your head. Will Nutch boost text occurring in h1, h2, etc. more heavi

Re: solr replication vs. rsync

2015-01-25 Thread Erick Erickson
bq: I thought SolrCloud replicas were replication, and you imply parallel indexing Absolutely! You couldn't get near-real-time indexing if you relied on replication a-la 3x. And you also couldn't guarantee consistency. Say you have 1 shard, a leader and a follower (i.e. 2 replicas). Now you thro

Re: replicas goes in recovery mode right after update

2015-01-25 Thread Erick Erickson
Ah, OK. Whew! Because I was wondering how you were running at _all_ if all the memory was allocated to the JVM ;) What is your ZooKeeper timeout? The original default was 15 seconds and this has caused problems like this. Here's the scenario: You send a bunch of docs at the server, and eventuall

Re: Unexplained leader initiated recovery after updates - SolrCmdDistributor no longer retries on RemoteSolrException

2015-01-25 Thread sekhrivijay
Hi Lindsey, were you ever able to figure out the reason for this behavior? We are experiencing the same issue with SolrCloud version 4.10 http://lucene.472066.n3.nabble.com/jira-Commented-SOLR-7030-replicas-goes-in-recovery-mode-right-after-update-td4181881.html https://issues.apache.org/jira/br

Re: solr replication vs. rsync

2015-01-25 Thread Dan Davis
@Erick, Problem space is not constant indexing. I thought SolrCloud replicas were replication, and you imply parallel indexing. Good to know. On Sunday, January 25, 2015, Erick Erickson wrote: > @Shawn: Cool table, thanks! > > @Dan: > Just to throw a different spin on it, if you migrate to S

Re: solr replication vs. rsync

2015-01-25 Thread Dan Davis
Thanks! On Sunday, January 25, 2015, Erick Erickson wrote: > @Shawn: Cool table, thanks! > > @Dan: > Just to throw a different spin on it, if you migrate to SolrCloud, then > this question becomes moot as the raw documents are sent to each of the > replicas so you very rarely have to copy the fu

Re: replicas goes in recovery mode right after update

2015-01-25 Thread Vijay Sekhri
Thank you for the reply, Erick. I am sorry, I posted the wrong information. I posted our DEV env configuration by mistake. After double checking our stress and Prod Beta env where we found the original issue, I found all the searchers have around 50 GB of RAM available and two instances of JVM ru

Re: replicas goes in recovery mode right after update

2015-01-25 Thread Erick Erickson
Shawn directed you over here to the user list, but I see this note on SOLR-7030: "All our searchers have 12 GB of RAM available and have quad core Intel(R) Xeon(R) CPU X5570 @ 2.93GHz. There is only one java process running i.e jboss and solr in it . All 12 GB is available as heap for the java proc

Sorting on a computed value

2015-01-25 Thread tedsolr
I'll bet some super user has figured this out. How can I perform a sort on a single computed field? I have a QParserPlugin that is collapsing docs based on data from multiple fields. I am summing the values from one numerical field 'X'. I was going to use a DocTransformer to inject that summed valu

replicas goes in recovery mode right after update

2015-01-25 Thread Vijay Sekhri
We have a cluster of SolrCloud servers with 10 shards and 4 replicas in each shard in our stress environment. In our prod environment we will have 10 shards and 15 replicas in each shard. Our current commit settings are as follows ** *50* *18* ** ** *200

Re: solr replication vs. rsync

2015-01-25 Thread Erick Erickson
@Shawn: Cool table, thanks! @Dan: Just to throw a different spin on it, if you migrate to SolrCloud, then this question becomes moot as the raw documents are sent to each of the replicas so you very rarely have to copy the full index. Kind of a tradeoff between constant load because you're sending

Re: solr replication vs. rsync

2015-01-25 Thread Shawn Heisey
On 1/24/2015 10:56 PM, Dan Davis wrote: > When I polled the various projects already using Solr at my organization, I > was greatly surprised that none of them were using Solr replication, > because they had talked about "replicating" the data. > > But we are not Pinterest, and do not expect to be

RE: Facet Double Counting

2015-01-25 Thread Toke Eskildsen
harish singh [harish.sing...@gmail.com] wrote: > I tried the Faceting on the UUID field. Nice debug trick. I'll remember that for next time. > So does this mean, when I do a facet query on facet.field= loginUserName, > Solr does not look at the UUID? Yes. For faceting, Solr only uses the internal

Re: Facet Double Counting

2015-01-25 Thread harish singh
Oh yes!! :) I tried the Faceting on the UUID field. All the uuids have count = 2 ==> which probably explains why I am getting Double counting in Facet result. So does this mean, when I do a facet query on facet.field= loginUserName, Solr does not look at the UUID? And the unique field (UUID in thi
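
The numbers in this thread (a facet count of 36 against 18 search results, with every UUID showing a count of 2) are consistent with each log being present twice. The sketch below only illustrates that arithmetic on made-up data; it is not how Solr computes facets internally, and the helper names `facet_counts` and `distinct_count` are hypothetical.

```python
from collections import Counter

# Hypothetical data: 18 distinct log entries for one user, each present twice.
docs = [{"uuid": f"log-{i}", "loginUserName": "harry"} for i in range(18)] * 2


def facet_counts(docs, field):
    """Count every entry per field value, duplicates included."""
    return Counter(d[field] for d in docs)


def distinct_count(docs, key_field, field, value):
    """Count entries matching field == value after de-duplicating on key_field."""
    return len({d[key_field] for d in docs if d[field] == value})


if __name__ == "__main__":
    print(facet_counts(docs, "loginUserName")["harry"])            # 36
    print(distinct_count(docs, "uuid", "loginUserName", "harry"))  # 18
    print(set(facet_counts(docs, "uuid").values()))                # {2}
```

Counting per UUID, as done in the message above, is the same check: any value greater than 1 points to duplicated entries.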

RE: Facet Double Counting

2015-01-25 Thread Toke Eskildsen
harish singh [harish.sing...@gmail.com] wrote: > As you see, the result is showing Facet-Count for "loginUserName= harry" is > 36. > > So when I do a Solr Search for logs, I should get 36 logs. > But I am getting 18. > This is happening for all the searches now. If you have recently added or changed

Re: Facet Double Counting

2015-01-25 Thread Ahmet Arslan
Weird, optimize or expungeDeletes=true should do the trick. Can you try to optimize this time? On Sunday, January 25, 2015 11:08 AM, harish singh wrote: Still the same. Can the reason be that if there are duplicate logs/documents, then the Facet query will count them, but when I do the Search Q

Re: Facet Double Counting

2015-01-25 Thread harish singh
Still the same. Can the reason be that if there are duplicate logs/documents, then the Facet query will count them, but when I do the Search Query, Solr eliminates the duplicates? On Sat, Jan 24, 2015 at 11:47 PM, Ahmet Arslan wrote: > > > Hi Harish, > > What happens when you purge deleted ter