Re: Can Solr handle large text files?

2012-07-27 Thread Peter Spam
Has the performance of highlighting large text documents been improved in Solr 4? Thanks! Pete On Nov 5, 2011, at 9:03 AM, Erick Erickson wrote: > Sure, if you write a custom update handler. But I'm not at all sure > this is "ideal". > You're requiring all that data to be transmitted across t

Re: leaks in solr

2012-07-27 Thread Karthick Duraisamy Soundararaj
subQueries.get(i).close() is nothing but pulling the reference from the vector and closing it. So yes, it wouldn't throw an exception. vector subQueries Please let me know if you need any more information On Fri, Jul 27, 2012 at 10:14 PM, Karthick Duraisamy Soundararaj < karthick.soundara...@gmail.co

Re: leaks in solr

2012-07-27 Thread Karthick Duraisamy Soundararaj
SimpleOrderedMap commonRequestParams; //This holds the common request params. Vector> subQueryRequestParams; // This holds the request params of sub Queries I use the above to create multiple localQueryRequests. To add a little more information, I create new ResponseBuilder for each request I al

Re: leaks in solr

2012-07-27 Thread Karthick Duraisamy Soundararaj
First no. Because i do the following for(i=0;i wrote: > A finally clause can throw exceptions. Can this throw an exception? > subQueries.get(i).close(); > > If so, each close() call should be in a try-catch block. > > On Fri, Jul 27, 2012 at 5:28 PM, Karthick Duraisamy Soundararaj >

Re: querying using filter query and lots of possible values

2012-07-27 Thread Chris Hostetter
: the list of IDs is constant for a longer time. I will take a look at : these join thematic. : Maybe another solution would be to really create a whole new : collection or set of documents containing the aggregated documents (from the : ids) from scratch and to execute queries on this collection.

Re: Deduplication in SolrCloud

2012-07-27 Thread Lance Norskog
Should the old Signature code be removed? Given that the goal is to have everyone use SolrCloud, maybe this kind of landmine should be removed? On Fri, Jul 27, 2012 at 8:43 AM, Markus Jelsma wrote: > This issue doesn't really describe your problem but a more general problem of > distributed dedu

Re: leaks in solr

2012-07-27 Thread Lance Norskog
A finally clause can throw exceptions. Can this throw an exception? subQueries.get(i).close(); If so, each close() call should be in a try-catch block. On Fri, Jul 27, 2012 at 5:28 PM, Karthick Duraisamy Soundararaj wrote: > Hello all, > While running in my eclipse and run a set of
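A minimal sketch of the advice above, assuming the sub-query requests can be treated as plain `java.io.Closeable` resources. The class `SafeClose`, the helper `closeAll`, and the lambda stand-ins are hypothetical and not part of Solr; the point is only that each close() call gets its own try-catch, so one failure does not skip the remaining closes.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not Solr API: closes every resource independently.
class SafeClose {
    /** Closes each resource in its own try-catch; returns how many close() calls failed. */
    static int closeAll(List<? extends Closeable> resources) {
        int failures = 0;
        for (Closeable resource : resources) {
            try {
                resource.close();
            } catch (IOException | RuntimeException e) {
                // Record the failure and keep going so the remaining resources are still closed.
                failures++;
                System.err.println("close failed: " + e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stand-ins for the sub-query requests (assumption: they behave like Closeable).
        final int[] closed = {0};
        List<Closeable> subQueries = new ArrayList<>();
        subQueries.add(() -> closed[0]++);
        subQueries.add(() -> { throw new IOException("simulated failure"); });
        subQueries.add(() -> closed[0]++);
        try {
            // ... run the sub-queries here ...
        } finally {
            int failures = closeAll(subQueries);
            System.out.println(closed[0] + " closed, " + failures + " failed");
        }
    }
}
```

Without the per-call try-catch, the simulated failure in the second element would leave the third resource open, which is one way references can leak.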

Re: leaks in solr

2012-07-27 Thread Karthick Duraisamy Soundararaj
Just to clarify, the leak happens every time a new searcher is opened. On Fri, Jul 27, 2012 at 8:28 PM, Karthick Duraisamy Soundararaj < karthick.soundara...@gmail.com> wrote: > Hello all, > While running in my eclipse and run a set of queries, this > works fine, but when I run it in t

Re: leaks in solr

2012-07-27 Thread Karthick Duraisamy Soundararaj
Hello all, While running in my eclipse and run a set of queries, this works fine, but when I run it in test production server, the searchers are leaked. Any hint would be appreciated. I have not used CoreContainer. Considering that the SearchHandler is running fine, I am not able to th

RE: Bulk Indexing

2012-07-27 Thread Lan
I assume you're indexing on the same server that is used to execute search queries. Adding 20K documents in bulk could cause the Solr Server to 'stop the world' where the server would stop responding to queries. My suggestion is - Set up master/slave to insulate your clients from 'stop the world'

Re: Solr edismax NOT operator behavior

2012-07-27 Thread Jack Krupansky
"can any one explain" - add the &debugQuery=true option to your request and Solr will give an explanation, including the parsed query and the Lucene scoring of documents. If you think Solr is wrong, show us a sample document that either is supposed to appear that doesn't, or doesn't appear and

Re: Bulk Indexing

2012-07-27 Thread Sohail Aboobaker
We will be using Solr 3.x version. I was wondering if we do need to worry about this as we have only 10k index entries at a time. It sounds like a very low number and we have only document type at this point. Should we worry about directly using SolrJ for indexing and searching for this low volume

Solr not getting OpenText document name and metadata

2012-07-27 Thread eShard
Hi, I'm currently using ManifoldCF (v.5.1) to crawl OpenText (v10.5) and the output is sent to Solr (4.0 alpha). All I see in the index is an id equal to the OpenText download URL and a version (a big integer value). What I don't see is the document name from OpenText or any of the OpenText metadata. D

Re: Bulk Indexing

2012-07-27 Thread Alexandre Rafalovitch
Haven't tried this but: 1) I think SOLR 4 supports on-the-fly core attach/detach/select. Can somebody confirm this? 2) If 1) is true, run everything as two cores. 3) One core is live in production 4) Second core is detached from SOLR and attached to something like SolrJ, which I believe can index w

RE: Bulk Indexing

2012-07-27 Thread Zhang, Lisheng
Hi, Previously I asked a similar question and I have not fully implemented it yet. My plan is: 1) use Solr only for search, not for indexing 2) have a separate Java process to index (calling the Lucene API directly; maybe it can call the Solr API, I need to check more details). As other people pointed earl

question(s) re lucene spatial toolkit aka LSP aka spatial4j

2012-07-27 Thread solr-user
hopefully someone is using the lucene spatial toolkit aka LSP aka spatial4j, and can answer this question we are using this spatial tool for doing searches. overall, it seems to work very well. however, finding documentation is difficult. I have a couple of questions: 1. I have a geohash field

RE: Deduplication in SolrCloud

2012-07-27 Thread Markus Jelsma
This issue doesn't really describe your problem but a more general problem of distributed deduplication: https://issues.apache.org/jira/browse/SOLR-3473 -Original message- > From:Daniel Brügge > Sent: Fri 27-Jul-2012 17:38 > To: solr-user@lucene.apache.org > Subject: Deduplication in

Deduplication in SolrCloud

2012-07-27 Thread Daniel Brügge
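Hi, in my old Solr setup I have used the deduplication feature in the update chain with a couple of fields. <bool name="enabled">true</bool> <str name="signatureField">signature</str> <bool name="overwriteDupes">false</bool> <str name="fields">uuid,type,url,content_hash</str> <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str> This worked fine. When I now use this in my 2-shard SolrCloud setup when inserti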

Problem with Solr 4.0-ALPHA and JSON response

2012-07-27 Thread Federico Valeri
Hi all, I'm new to Solr, I have a problem with JSON format, this is my Java client code: PrintWriter out = res.getWriter(); res.setContentType("text/plain"); String query = req.getParameter("query"); SolrServer solr = new HttpSolrServer(solrServer); ModifiableSolrParams params = new ModifiableSolr

how solr will apply regex fragmenter

2012-07-27 Thread meghana
I was looking at the regex fragmenter for customizing my highlight fragments. I was wondering how the regex fragmenter works within Solr and googled for it, but didn't find any results. Can anybody tell me how the regex fragmenter works within Solr? And when the regex fragmenter applies the regex on fragments, do I

Re: leaks in solr

2012-07-27 Thread Karthick Duraisamy Soundararaj
I have tons of these open. searcherName : Searcher@24be0446 main caching : true numDocs : 1331167 maxDoc : 1338549 reader : SolrIndexReader{this=5585c0de,r=ReadOnlyDirectoryReader@5585c0de ,refCnt=1,segments=18} readerDir : org.apache.lucene.store.NIOFSDirectory@ /usr/local/solr/highlander/data/...

Re: too many instances of "org.tartarus.snowball.Among" in the heap

2012-07-27 Thread Alexandre Rafalovitch
Try taking a couple of thread dumps and see where in the stack the snowball classes show up. That might give you a clue. Did you customize the parameters to the stemmer? If so, maybe it has problems with the file you gave it. Just some generic thoughts that might help. Regards, Alex. Personal

Re: too many instances of "org.tartarus.snowball.Among" in the heap

2012-07-27 Thread Bernd Fehling
It is something internal to the Snowball analyzer (stemmer). To find out more you should take a heap dump and look into it with Memory Analyzer (MAT) http://www.eclipse.org/mat/ Regards, Bernd On 27.07.2012 09:53, roz dev wrote: > Hi All > > I am trying to find out the reason for very

Re: Skip first word

2012-07-27 Thread Chantal Ackermann
You're welcome :-) C

Solr - customize Fragment using hl.fragmenter and hl.regex.pattern

2012-07-27 Thread meghana
I want Solr highlighting in a specific format. Below is the string format for which I need to provide the highlighting feature --- 130s: LISTEN! LISTEN! 138s: [THUMP] 143s: WHAT IS THAT? 144s: HEAR THAT? 152s: EVERYBODY, SHH. SHH. 156s: STAY UP THERE.

Re: Skip first word

2012-07-27 Thread Finotti Simone
Brilliant! Thank you very much :) From: Chantal Ackermann [c.ackerm...@it-agenten.com] Sent: Friday, 27 July 2012 11:20 To: solr-user@lucene.apache.org Subject: Re: Skip first word Hi Simone, no I meant that you populate the two fields with the s

dynamic EdgeNGramFilter

2012-07-27 Thread Alexander Helhorn

Re: Skip first word

2012-07-27 Thread Chantal Ackermann
Hi Simone, no I meant that you populate the two fields with the same input - best done via copyField directive. The first field will contain ngrams of size 1 and 2. The other field will contain ngrams of size 3 and longer (you might want to set a decent maxsize there). The query for the autoc
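To make the two-field split concrete, here is a small sketch of which character n-grams each field would hold for the same input. This is plain Java, not the Lucene/Solr n-gram filter itself; `NGramSplit`, its `ngrams` helper, and the chosen maximum size of 10 are hypothetical, and the real filter may emit tokens in a different order.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: enumerates character n-grams between minSize and maxSize.
class NGramSplit {
    static List<String> ngrams(String text, int minSize, int maxSize) {
        List<String> grams = new ArrayList<>();
        for (int size = minSize; size <= maxSize; size++) {
            // Slide a window of the current size across the text.
            for (int start = 0; start + size <= text.length(); start++) {
                grams.add(text.substring(start, start + size));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        String input = "solr";
        // First field: n-grams of size 1 and 2.
        System.out.println(ngrams(input, 1, 2));   // [s, o, l, r, so, ol, lr]
        // Second field: n-grams of size 3 and longer, capped at an assumed maximum of 10.
        System.out.println(ngrams(input, 3, 10));  // [sol, olr, solr]
    }
}
```

Both fields receive the same input (for example via copyField); only the size range configured on each field's analyzer differs.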

Re: Skip first word

2012-07-27 Thread Finotti Simone
Could you elaborate, please? Thanks, S From: in.abdul [in.ab...@gmail.com] Sent: Thursday, 26 July 2012 20:36 To: solr-user@lucene.apache.org Subject: Re: Skip first word That's the best option; I had also used the shingle filter factory. On Jul 26

Re: Skip first word

2012-07-27 Thread Finotti Simone
Hi Chantal, if I understand correctly, this implies that I have to populate different fields according to their length. Since I'm not aware of any logical condition you can apply to the copyField directive, it means that this logic has to be implemented by the process that populates the Solr core.

Re: Upgrade solr 1.4.1 to 3.6

2012-07-27 Thread alexander81
Yes, the index. Do you know of any link/documentation about upgrading Solr 1.4.1 -> 3.6?

too many instances of "org.tartarus.snowball.Among" in the heap

2012-07-27 Thread roz dev
Hi All, I am trying to find out the reason for very high memory use and ran jmap -histo. It is showing that I have too many instances of org.tartarus.snowball.Among. Any ideas what this is for and why I am getting so many of them? num #instances #bytes Class description ---

Re: leaks in solr

2012-07-27 Thread roz dev
in my case, I see only 1 searcher, no field cache - still Old Gen is almost full at 22 GB Does it have to do with index or some other configuration -Saroj On Thu, Jul 26, 2012 at 7:41 PM, Lance Norskog wrote: > What does the "Statistics" page in the Solr admin say? There might be > several "se