Re: Tokenized keywords

2013-01-20 Thread Mikhail Khludnev
Romita, That's exactly what the debugQuery output shows. If you can't find it there, paste the output here and let's try to find it together. Also pay attention to the explainOther debug parameter and the analysis page in the admin UI. 21.01.2013 10:50, user "Romita Saha" wrote: > What I am trying to achieve is

Data import handler start bulging the memory after completing 1 million

2013-01-20 Thread vijeshnair
You may refer to this snapshot to get an understanding of the resource consumption. I am trying to index a total of 13 million documents from MySQL to SOLR. The first 1 million documents completed very smoothly in the first

Re: Tokenized keywords

2013-01-20 Thread Romita Saha
What I am trying to achieve is as follows. I query "Search for all the Laptops" and my tokenized keywords are "search laptop" (I apply a stopword filter to filter out words like for, all, the, and I also use a lowercase filter). I want to display these tokenized keywords using debugQuery. Thanks and
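A minimal sketch of the request described above, in Python. The host, port, and core name (`localhost:8983`, `collection1`) are assumptions for illustration, as are the helper names `build_debug_query_url` and `parsed_query`. The sketch only builds the URL with `debugQuery=true` and shows where the parsed query would normally sit in a decoded JSON response; it does not contact a server.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host, port, and core for your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_debug_query_url(user_query, base_url=SOLR_SELECT_URL):
    """Return a select URL that asks Solr to include debug output."""
    params = {
        "q": user_query,
        "debugQuery": "true",  # requests a "debug" section in the response
        "wt": "json",
    }
    return base_url + "?" + urlencode(params)


def parsed_query(response):
    """Return debug.parsedquery from a decoded JSON response, or None if absent."""
    return response.get("debug", {}).get("parsedquery")


url = build_debug_query_url("Search for all the Laptops")
print(url)
```

The string returned by `parsed_query` reflects the query after analysis, so it is where stopword removal and lowercasing should become visible.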

Re: Tokenized keywords

2013-01-20 Thread Dikchant Sahi
Can you please elaborate a bit more on what you are trying to achieve? Tokenizers work on indexed fields and don't affect how the values will be displayed. The response value comes from the stored field. If you want to see how your query is being tokenized, you can do it using the analysis interface or enabl

Re: Long ParNew GC pauses - even when young generation is small

2013-01-20 Thread giltene
If you believe the logs, using -XX:+PrintGCApplicationStoppedTime is probably the easiest way to avoid having to try to parse pause times from various formats. But remember, GC logs can [often unintentionally] lie (I've seen them under-report by multi-second gaps). If you want to actually measure

Re: Long ParNew GC pauses - even when young generation is small

2013-01-20 Thread giltene
> I don't see any info on your website about pricing, so I can't make any > decisions about whether it would be right for me. Can you give me > long-term pricing information? As is the case with much of enterprise software (including getting a supported version of Oracle HotSpot), this is a sal

Re: Long ParNew GC pauses - even when young generation is small

2013-01-20 Thread Shawn Heisey
Unfortunately, G1 on Java 6 was a bust. Several times GC pauses made my load balancer think the server was down, just like with CMS/ParNew. Either there's something about my production query patterns that doesn't get along with any of the garbage collection methods, or I need to upgrade to Ja

Re: build CMIS compatible Solr

2013-01-20 Thread Nicholas Li
I think this might be the one you are talking about: https://github.com/sourcesense/solr-cmis But I think Alfresco already has search functionality, similar to Solr. Then why do you want to use it to index docs out of Alfresco? On Fri, Jan 18, 2013 at 8:00 PM, Upayavira wrote: > A colleagu

Re: Long ParNew GC pauses - even when young generation is small

2013-01-20 Thread Shawn Heisey
On 1/20/2013 2:13 PM, Markus Jelsma wrote: Hi Shawn, Although our heap spaces are much less than yours (256M for 2x 2.5GB cores per node) we saw decreased throughput and higher latency with G1 on Java 6. You can also expect higher CPU consumption. You can check it very well with VisualVM atta

RE: Long ParNew GC pauses - even when young generation is small

2013-01-20 Thread Markus Jelsma
Hi Shawn, Although our heap spaces are much less than yours (256M for 2x 2.5GB cores per node) we saw decreased throughput and higher latency with G1 on Java 6. You can also expect higher CPU consumption. You can check it very well with VisualVM attached. Looking forward to your results. Mar

Re: Have the SolrCloud collection REST endpoints move or changed for 4.1?

2013-01-20 Thread Brett Hoerner
Sorry, I take it back. It looks like fixing https://issues.apache.org/jira/browse/SOLR-4321 fixed my issue after all. On Sun, Jan 20, 2013 at 2:21 PM, Brett Hoerner wrote: > So the ticket I created wasn't related, there is a working patch for that > now but my original issue remains, I get 404 w

Re: Solr 4.0 - timeAllowed in distributed search

2013-01-20 Thread Walter Underwood
If you are going to request 30,000 rows, you can give up on getting good performance. It is not going to happen. Even without all the disk accesses, think about how much is sent over the network, then parsed by the client. The client cannot even start working with the data until it is all recei

Re: Solr cache considerations

2013-01-20 Thread Walter Underwood
I routinely see hit rates over 75% on the document cache. Perhaps yours is too small. Mine is set at 10240 entries. wunder On Jan 20, 2013, at 8:08 AM, Erick Erickson wrote: > About your question about document cache: Typically the document cache > has a pretty low hit-ratio. I've rarely, if ev

Re: Solr cache considerations

2013-01-20 Thread Isaac Hebsh
Wow Erick, the MMap article is a very fundamental one. It totally changed my view. It must be mentioned in SolrPerformanceFactors in SolrWiki... I'm sorry I did not know it before. Thanks a lot. I promise to share my results when my cart starts to fly :) On Sun, Jan 20, 2013 at 6:08 PM, Erick

Re: Long ParNew GC pauses - even when young generation is small

2013-01-20 Thread Shawn Heisey
On 1/18/2013 10:07 PM, Shawn Heisey wrote: I may try the G1 collector with Java 6 in production, since I am on the newest Oracle version. I am giving this a try on my secondary server set. An encouraging note: The -XX:+UnlockExperimentalVMOptions option is no longer required to use the G1 co

Re: Have the SolrCloud collection REST endpoints move or changed for 4.1?

2013-01-20 Thread Brett Hoerner
So the ticket I created wasn't related, there is a working patch for that now but my original issue remains, I get 404 when trying to post updates to a URL that worked fine in Solr 4.0. On Sat, Jan 19, 2013 at 5:56 PM, Brett Hoerner wrote: > I'm actually wondering if this other issue I've been h

Re: Long ParNew GC pauses - even when young generation is small

2013-01-20 Thread Shawn Heisey
On 1/20/2013 11:33 AM, Shawn Heisey wrote: On 1/18/2013 10:07 PM, Shawn Heisey wrote: On my dev 4.1 server with Java 7u11, I am using the G1 collector with a max pause target of 1500ms. I was thinking that this collector was producing long pauses too, but after reviewing the gc log with a close

Re: Long ParNew GC pauses - even when young generation is small

2013-01-20 Thread Shawn Heisey
On 1/18/2013 10:07 PM, Shawn Heisey wrote: On my dev 4.1 server with Java 7u11, I am using the G1 collector with a max pause target of 1500ms. I was thinking that this collector was producing long pauses too, but after reviewing the gc log with a closer eye, I see that there are lines that speci

RE: Solr 4.0 - timeAllowed in distributed search

2013-01-20 Thread Michael Ryan
(This is based on my knowledge of 3.6 - not sure if this has changed in 4.0) You are using rows=3, which requires retrieving 3 documents from disk. In a non-distributed search, the QTime will not include the time it takes to retrieve these documents, but in a distributed search, it will.
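To compare the two cases, it can help to record QTime alongside the request parameters. Below is a rough Python sketch, assuming a JSON response whose `responseHeader` carries `QTime` and, when `timeAllowed` is exceeded, a `partialResults` flag. The helper names and the sample values are hypothetical and are not taken from this thread.

```python
from urllib.parse import urlencode


def build_params(query, rows, time_allowed_ms, shards=None):
    """Build select parameters; passing shards makes the search distributed."""
    params = {
        "q": query,
        "rows": rows,
        "timeAllowed": time_allowed_ms,  # time budget in milliseconds
        "wt": "json",
    }
    if shards:
        params["shards"] = ",".join(shards)
    return urlencode(params)


def summarize_header(response):
    """Return (QTime, partial) from a decoded JSON response."""
    header = response.get("responseHeader", {})
    return header.get("QTime"), bool(header.get("partialResults", False))


# Illustrative values only: a made-up query and a made-up response header.
query_string = build_params("*:*", rows=10, time_allowed_ms=500)
qtime, partial = summarize_header({"responseHeader": {"status": 0, "QTime": 42}})
print(query_string, qtime, partial)
```

Running the same query with and without `shards` and comparing the reported QTime is one way to observe the difference described above.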

Re: Solr cache considerations

2013-01-20 Thread Erick Erickson
About your question about document cache: Typically the document cache has a pretty low hit-ratio. I've rarely, if ever, seen it get hit very often. And remember that this cache is only hit when assembling the response for a few documents (your page size). Bottom line: I wouldn't worry about this

Re: Solr load balancer

2013-01-20 Thread Erick Erickson
Hmmm, the first thing I'd look at is why you are having long GC pauses. Here's a great place to start: http://www.lucidimagination.com/blog/2011/03/27/garbage-collection-bootcamp-1-0/ and: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html I've wondered about a similar approa

Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?

2013-01-20 Thread Erick Erickson
If this was in SolrCloud mode, there was a bug in 4.0 when submitting batches of documents at once. Can't find it right now, but thought I'd mention it just in case. Submitting the docs one-at-a-time doesn't have the same problem. May not be applicable, and entirely orthogonal to the discussion ab

Re: Language Identification in index time

2013-01-20 Thread Jack Krupansky
It sounds like you want an update request processor: http://wiki.apache.org/solr/UpdateRequestProcessor But, it also sounds like you should probably be normalizing the encoding before sending the data to Solr. -- Jack Krupansky -Original Message- From: Yewint Ko Sent: Sunday, Januar

Re: Using Solr Spatial in conjunction with HBASE/Hadoop

2013-01-20 Thread ashok joshi
Have you looked at Oracle NoSQL Database http://www.oracle.com/us/products/database/nosql/overview/index.html, a scalable key-value store? Can Solr be integrated with it? Thanks and warm regards. ashok joshi oracle -- View this message in context: http://lucene.472066.n3.nabble.com/Using-Solr