Re: Can index size increase when no updates/optimizes are happening?

2013-03-18 Thread eanand333
This is what we do: a user logs in, enters a few documents in a particular domain, say A, B or C, and logs out. Say B is the most commonly used domain. The increase in index size is drastic only in this particular domain. So unless a user logs in there's no question of documents being submitted or

Facets with 5000 facet fields

2013-03-18 Thread sivaprasad
Hi, We have configured Solr for 5000 facet fields as part of the request handler. We have 10811177 docs in the index. The Solr server machine is quad core with 12 GB of RAM. When we query with facets, we get an out of memory error. What we observed is, if we have a larger number of facets

Re: removing all fields before full import using DIH

2013-03-18 Thread Gora Mohanty
On 18 March 2013 13:09, Rohan Thakur wrote: > hi all > > how can I ensure that I have delete all the fields for solr before doing > full import in DIH only? the aim is that my database is pretty small so > full import takes only 3-4 sec. thus I do not require delta import for now > and I want to e

Re: Facets with 5000 facet fields

2013-03-18 Thread Upayavira
I'd be very surprised if this were to work. I recall one situation in which 24 facets in a request placed too much pressure on the server. In order to support faceting, Solr maintains a cache of the faceted field. You need one cache for each field you are faceting on, meaning your memory requireme
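To put the one-cache-per-field point in rough numbers, here is a back-of-the-envelope sketch. The figure of 4 bytes per document per faceted field is purely an assumption for illustration (it is not taken from this thread or from Solr's documentation); the document count, field count and RAM size come from the original question.

```python
# Rough estimate of faceting cache memory.
# BYTES_PER_DOC_PER_FIELD is an ASSUMPTION for illustration only,
# not a measured or documented Solr figure.
NUM_DOCS = 10_811_177          # from the original question
NUM_FACET_FIELDS = 5000        # from the original question
SERVER_RAM_GIB = 12            # from the original question
BYTES_PER_DOC_PER_FIELD = 4    # ASSUMPTION: one int-sized entry per document


def estimate_cache_gib(num_docs, num_fields, bytes_per_entry):
    """Return the estimated total cache size in GiB."""
    return num_docs * num_fields * bytes_per_entry / 2**30


estimated_gib = estimate_cache_gib(
    NUM_DOCS, NUM_FACET_FIELDS, BYTES_PER_DOC_PER_FIELD
)
print(f"estimated cache: {estimated_gib:.0f} GiB vs {SERVER_RAM_GIB} GiB of RAM")
```

Even if the real per-field cost differs considerably from this assumed value, the estimate scales linearly with the number of faceted fields, which is the core of the concern raised above.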

Re: Facets with 5000 facet fields

2013-03-18 Thread Toke Eskildsen
On Mon, 2013-03-18 at 08:34 +0100, sivaprasad wrote: > We have configured solr for 5000 facet fields as part of request handler. We > have 10811177 docs in the index. > > The solr server machine is quad core with 12 gb of RAM. > > When we are querying with facets, we are getting out of memory erro

Re: removing all fields before full import using DIH

2013-03-18 Thread Rohan Thakur
OK, thanks. Yes, I didn't check it before. I was using DIH full import directly, and one day I observed that my Solr search was giving duplicate results. I then deleted all the entries and re-indexed the data, and since then, to ensure that this does not happen, I always delete first and then do a full impor

Re: Fuzzy Suggester and exactMatchFirst

2013-03-18 Thread Robert Muir
On Sun, Mar 17, 2013 at 8:19 PM, Eoghan Ó Carragáin wrote: > > I can see why the Fuzzy Suggester sees "college" as a match for "colla" but > expected the exactMatchFirst parameter to ensure that suggestions beginning > with "colla" to be weighted higher than "fuzzier" matches. I > have spellcheck.

Incorrect snippets using FastVectorHighlighter

2013-03-18 Thread Jochen Just
Hi list, I have the following field type defined in my schema.xml in order to be able to do in-word search. Searching itself works as expected, though high

Re: Solr indexing binary files

2013-03-18 Thread Luis
Hi Gora, Yes, my urlpath points to a URL like that. I do not get why uncommenting the catch-all dynamic field ("*") does not work for me.

Re: Handling a closed IndexWriter in SOLR 4.0

2013-03-18 Thread Mark Miller
I'll fix it - I put up a patch last night. - Mark On Mar 18, 2013, at 1:12 AM, mark12345 wrote: > This looks similar to the issue I also have: > > * > http://lucene.472066.n3.nabble.com/Solr-4-1-4-2-SolrException-Error-opening-new-searcher-td4046543.html >

Re: Incorrect snippets using FastVectorHighlighter

2013-03-18 Thread Koji Sekiguchi
Hi Jochen, There is a restriction in FVH: it cannot deal with variable gram sizes. That is, minGramSize must equal maxGramSize in your NGramFilterFactory setting. koji -- http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html (13/03/18 22:17), Jochen Just wrote: -BEGIN

RE: SOLR - Define fields in DIH configuration file dynamically

2013-03-18 Thread Dyer, James
There are 3 approaches I can think of: 1. You can generate a new data-config.xml for each import. With Solr 4.0 and later, DIH re-parses your data-config.xml and picks up any changes automatically. 2. You can parameterize nearly anything in data-config.xml, add the parameters to your request

Re: PDF keyword searches not accurate

2013-03-18 Thread JDJ
Does this make a difference? - JDJ

Re: Mark document as hidden

2013-03-18 Thread lboutros
Thanks Jack. I finally managed to replicate the external files with my own replication handler. But now there's an issue with Solr in the Update Log replay process. The default processor chain is not used, which means that my processor, which manages the external files, is not used... I have creat

Group By and Sum

2013-03-18 Thread Adam Harris
Hello All, Pretty stuck here and I am hoping you might be the person to help me out. I am working with SOLR and JSONiq, which are totally new to me, and doing even the simplest of things is just escaping me. I know SQL pretty well; however, this simple requirement seems to escape me. I'll jump right i

Re: Group By and Sum

2013-03-18 Thread Walter Underwood
You should use a relational database. Solr is not really designed for this kind of query. wunder On Mar 18, 2013, at 9:48 AM, Adam Harris wrote: > Hello All, > > Pretty stuck here and I am hoping you might be the person to help me out. I > am working with SOLR and JSONiq which are totally new

Re: 4.0 hanging on startup on Windows after Control-C

2013-03-18 Thread Shawn Heisey
On 3/17/2013 11:51 AM, xavier jmlucjav wrote: Hi, I have an index where, if I kill solr via Control-C, it consistently hangs next time I start it. Admin does not show cores, and searches never return. If I delete the index contents and I restart again all is ok. I am on windows 7, jdk1.7 and Sol

RE: Group By and Sum

2013-03-18 Thread Adam Harris
I agree; however, the powers that be, being upper management, have decided that we need to switch to SOLR, JSONiq and JavaScript MVC for all our reporting needs. I would love to just keep using the SQL DB that we have been using, but alas I am not allowed to. Thanks, Adam -Original Message---

Re: Group By and Sum

2013-03-18 Thread Miguel
Hi Adam, Have you seen the wiki page about field collapsing? http://wiki.apache.org/solr/FieldCollapsing I think this page will help you emulate GROUP BY. On 18/03/2013 17:48, Adam Harris wrote: Hello All, Pretty stuck here and I am hoping you might be the person to help me out. I am working

Search on final value in multi-valued field

2013-03-18 Thread Annette Newton
Are multi-valued fields ordered and if so is it possible to search on the final value only? -- Annette Newton

Re: Group By and Sum

2013-03-18 Thread Walter Underwood
Dang. Well, make your estimates clear. I would not be surprised if this took a few weeks to get something that worked but was too slow. It might require new Solr or Lucene features to make it fast. It may be possible to put something together out of the existing features. If it is, the people o

Re: Search on final value in multi-valued field

2013-03-18 Thread Jack Krupansky
Yes, order is maintained, but search is simply whether any of the multiple values matches. -- Jack Krupansky -Original Message- From: Annette Newton Sent: Monday, March 18, 2013 1:40 PM To: solr-user@lucene.apache.org Subject: Search on final value in multi-valued field Are multi-val

how to deploy customization in solr that requires dependency

2013-03-18 Thread Gian Maria Ricci
Hi everyone, I want to deploy a custom filter developed in Java to Solr 4. My problem is that it needs to access SQL Server, so it depends on sqljdbc4.jar, but I get a java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver I've copied the sqljdbc.jar file in t

Re: structure of solr index

2013-03-18 Thread alxsss
---So,"search" time is in no way impacted by the existence or non-existence of stored values, What about memory? Would memory need to be increased in order to have the same QTime as in the case of indexed-only fields? For example in the case of indexed fields only index size is 5GB, a

Re: Replica is unable to recover because leader doesn't think it is the leader (Solr 4.1)

2013-03-18 Thread Mark Miller
Hmm… Sounds like it's a defensive mechanism we have where a leader will check its own state about whether it thinks it's the leader with the zk info. In this case its own state is not convinced of its leadership. That's just a volatile boolean that gets flipped on when elected. What do the

Re: how to deploy customization in solr that requires dependency

2013-03-18 Thread Dmitry Kan
Hi, See here, might help: http://wiki.apache.org/solr/SolrPlugins#How_to_Load_Plugins We don't use multicore functionality of SOLR, so we decided to bundle SOLR dependencies into the war file of the solr web app. Regards, Dmitry On Mon, Mar 18, 2013 at 7:47 PM, Gian Maria Ricci wrote: > Hi t

Surge 2013 CFP open

2013-03-18 Thread Katherine Jeschke
The Surge 2013 CFP is open. For details or to submit a paper, please visit http://surge.omniti.com/2013 -- Katherine Jeschke, Director of Marketing and Creative Services, OmniTI Computer Consulting, Inc.

Re: Replica is unable to recover because leader doesn't think it is the leader (Solr 4.1)

2013-03-18 Thread Timothy Potter
Hi Mark, Thanks for responding. Looking under /collections/solr_signal/leader_elect/shard5/election/ there are 2 nodes: 161276082334072879-ADDR1:8983_solr_solr_signal-n_53 - *Mon Mar 18 17:36:41 UTC 2013* 161276082334072880-ADDR2:8983_solr_solr_signal-n_56 - *Mon Mar 18 17:48:22

Re: structure of solr index

2013-03-18 Thread Jack Krupansky
Certainly if you are actually going to reference stored values they will add on to the TOTAL time and memory, but still have zero impact on the actual search time or memory for search. Searching for documents and returning results are two separate steps. Highlighting and faceting are other separ

Re: Replica is unable to recover because leader doesn't think it is the leader (Solr 4.1)

2013-03-18 Thread Timothy Potter
Hi Mark, I figured out what got the cluster into this bad state. I did a rolling restart and one of the JVM processes wasn't killed off before I restarted it, i.e. there were two Solr JVM processes running for the same shard. (Perhaps some things happen in Solr before Jetty fails to bind on the po

Re: Incorrect snippets using FastVectorHighlighter

2013-03-18 Thread Jochen Just
So just to be clear: there is no possibility to highlight results if I use variable gram size. Neither the original highlighter nor FVH does the job. Or am I missing something? By the way, does any documentation exist on how the FVH works? Jochen On 18.03.2013 15

RE: Query.toString printing binary in the output...

2013-03-18 Thread Andrew Lundgren
I am sorry, I don't follow what you mean by debug=query. Can you elaborate on that a bit? Thanks! -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Sunday, March 17, 2013 8:09 AM To: solr-user@lucene.apache.org Subject: Re: Query.toString printing binary in

Re: Group By and Sum

2013-03-18 Thread Alan Woodward
Hi Adam, Have a look at the stats component: http://wiki.apache.org/solr/StatsComponent. In your case, I think you'd need to add an extra field for your month, and then run a query filtered by your date range with stats.field=NetSales, stats.field=TransCount, and stats.facet=month. Make sure
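The request described above can be sketched as a URL built from the parameters named in the message. The base URL, the date field name `TransDate` and the date range are hypothetical placeholders; `NetSales`, `TransCount` and `month` are the names used in the thread, and `stats=true` is the switch that enables the StatsComponent.

```python
from urllib.parse import urlencode

# ASSUMPTIONS: the base URL and the "TransDate" field are hypothetical;
# NetSales, TransCount and month are the field names from the thread.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_stats_query(date_from, date_to):
    """Build a select URL that sums two fields, broken down by month."""
    params = [
        ("q", "*:*"),
        ("fq", f"TransDate:[{date_from} TO {date_to}]"),  # date range filter
        ("rows", "0"),                  # only the stats are of interest
        ("stats", "true"),              # enable the StatsComponent
        ("stats.field", "NetSales"),
        ("stats.field", "TransCount"),
        ("stats.facet", "month"),       # per-month breakdown of each stat
    ]
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_stats_query("2013-01-01T00:00:00Z", "2013-03-31T23:59:59Z")
print(url)
```

A list of tuples is used rather than a dict because `stats.field` appears twice, once per field to aggregate.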

Re: Making tika process mail attachments eludes me

2013-03-18 Thread Marcos Garcia
Hi Leif, I've had the same problem. I tried with 4.2.0 as well, on both Fedora 17 and CentOS 6, using Java 6 and Java 7 (OpenJDK and Oracle/Sun as well). I could NEVER use example-DIH against a mailbox having mails with attachments. Only mails without them, even if they were HTML, but as long as I in

Re: Search on final value in multi-valued field

2013-03-18 Thread Alexandre Rafalovitch
So, if you really want that you need to clone the field and keep only the final value in the clone. In 4.1, there are helpers for that: http://lucene.apache.org/solr/4_1_0/solr-core/org/apache/solr/update/processor/LastFieldValueUpdateProcessorFactory.html You don't have to store the copied value,
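The idea of cloning the field and keeping only the final value can also be illustrated on the client side, before a document is sent to Solr. This is a plain-Python sketch of the concept only, not the LastFieldValueUpdateProcessorFactory itself; the field names `tags` and `tags_last` are hypothetical.

```python
def add_last_value_field(doc, source_field, target_field):
    """Return a copy of doc with target_field set to the final value of source_field.

    If source_field is missing or empty, the copy is returned unchanged.
    """
    result = dict(doc)
    values = doc.get(source_field) or []
    if values:
        result[target_field] = values[-1]
    return result


# Hypothetical document with a multi-valued "tags" field.
doc = {"id": "1", "tags": ["first", "middle", "final"]}
indexed_doc = add_last_value_field(doc, "tags", "tags_last")
print(indexed_doc["tags_last"])
```

A query against the single-valued clone (here `tags_last`) then matches only on the final value, while the original multi-valued field keeps matching on any value.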

strange behaviour of wordbreak spellchecker in solr cloud

2013-03-18 Thread alxsss
Hello, I am trying to use the wordbreak spellchecker in solr-4.2 with the cloud feature. We have two servers with one shard on each of them. curl 'server1:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10' curl 'server2:8983/solr/test/testhandler?q=paulusoles&indent=true&rows=10' does not return any

Help getting a document by unique ID

2013-03-18 Thread Brian Hurt
So here's the problem I'm trying to solve: in my use case, all my documents have a unique id associated with them (a string), and I very often need to get them by id. Currently I'm doing a search on id, and this takes long enough it's killing my performance. Now, it looks like there is a GET call

Shingles Filter Query time behaviour

2013-03-18 Thread Catala, Francois
Hello, I am trying to have the input "darkknight" match documents containing either "dark knight" or "darkknight". The reverse should also work ("dark knight" matching "dark knight" and "darkknight") but it doesn't. Does anyone know why? When I run the following query I get the expected respo

Re: how to deploy customization in solr that requires dependency

2013-03-18 Thread Shawn Heisey
On 3/18/2013 11:47 AM, Gian Maria Ricci wrote: I want to deploy a custom filter developed in java to Solr4, my problem is that it requires to access Sql Server, so it depends from sqljdbc4.jar, but I got a java.lang.ClassNotFoundException: com.microsoft.sqlserver.jdbc.SQLServerDriver Solr has a

Re: Help getting a document by unique ID

2013-03-18 Thread Jack Krupansky
Hmmm... if query by your unique key field is killing your performance, maybe you have some larger problem to address. How bad is it? Are you using the string field type? How long are your ids? The only thing the real-time GET API gives you is more immediate access to recently added, uncommitte

Re: 4.0 hanging on startup on Windows after Control-C

2013-03-18 Thread xavier jmlucjav
Hi Shawn, I am using DIH with commit at the end...I'll investigate further to see if this is what is happening and will report back, also will check 4.2 (that I had to do anyway...). thanks for your input xavier On Mon, Mar 18, 2013 at 6:12 PM, Shawn Heisey wrote: > On 3/17/2013 11:51 AM, xavi

Re: Is Solr more CPU bound or IO bound?

2013-03-18 Thread Erick Erickson
And just to make it worse, I've seen lots of cases where the correct answer is "neither, performance is constrained by memory" ... Erick On Sun, Mar 17, 2013 at 10:44 PM, David Parks wrote: > Thank you, Manu, for that excellent discussion on the topic, I could have > been more detailed about my

Re: Incorrect snippets using FastVectorHighlighter

2013-03-18 Thread Koji Sekiguchi
So just to be clear: There is no possibility to highlight results, if I use variable gram size. Neither the original highlighter nor FVH do the job. Or am I missing something? I don't know whether the latest original highlighter still has such a restriction today, but when FVH came in 2.9, at that time,

Re: Query.toString printing binary in the output...

2013-03-18 Thread Erick Erickson
If you simply attach &debug=all to your URL, you should see the query come back in your response, XML, JSON, whatever. If that also shows bizarre characters, then that will give you some idea whether it's in Solr or not. But you haven't given us much info about how/where you call toString. You may
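As a small illustration of attaching the parameter mentioned above, the following sketch appends `debug=all` to an existing query URL, replacing any `debug` value already present. The example URL and query are hypothetical; only the `debug=all` parameter comes from the message.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_debug(url, mode="all"):
    """Return url with its debug parameter set to mode (replacing any old one)."""
    parts = urlsplit(url)
    params = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "debug"
    ]
    params.append(("debug", mode))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical query URL used only for demonstration.
debug_url = with_debug("http://localhost:8983/solr/select?q=title:solr&wt=json")
print(debug_url)
```

Comparing the parsed query in the debug section of the response with the output of `toString` would then help narrow down where the unexpected characters originate.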

Re: Group By and Sum

2013-03-18 Thread Erick Erickson
Second Walter's comment. Make really, _really_ sure that "the powers that be" recognize that they're asking for something unreasonable and it'll cost them dearly to get it. Best Erick On Mon, Mar 18, 2013 at 12:04 PM, Alan Woodward wrote: > Hi Adam, > > Have a look at the stats component: > ht

Re: DIH silently ignoring a record

2013-03-18 Thread Shalin Shekhar Mangar
That does sound perplexing. Justin, can you tell us which field in the query is your record id? What is the record id's type in database and in solr schema? What is your unique key and its type in solr schema? On Tue, Mar 19, 2013 at 5:19 AM, Justin L. wrote: > Every time I do an import, DataI

Solr Core Creation dynamically

2013-03-18 Thread Ravi_Mandala
Hi, I am trying to create a new core dynamically (programmatically) in Solr 4.0. I tried with http://localhost:7081/apache-solr-4.0.0/admin/cores?action=CREATE&name=coreX&instanceDir=coreX&config=solr-config.xml&schema=schema.xml&dataDir=data But I am not able to create a core. Is there any way to c

SolrCloud with Zookeeper ensemble : fail to restart master server

2013-03-18 Thread Patrick Mi
Hi there, I have experienced some problems starting the master server. Solr 4.2 under Tomcat 7 on CentOS 6. Configuration: 3 Solr instances running on different machines, one shard, 3 cores, 2 replicas, using the ZooKeeper that comes with Solr. The master server A has the following run option: -Dbootstr

RE: SnapPull failed - SOLR 4.1

2013-03-18 Thread Sandeep Kumar Anumalla
Hi Mark, I have upgraded to Solr 4.2, but I am still getting this exception. INFO: removing temporary index download directory files NRTCachingDirectory(org.apache.lucene.store.MMapDirectory@/data/solr-4.2.0/example/solr/collection1/data/index.20130319101506108 lockFactory=org.apache.lucene.store.Simp