Solr 1.4 - Performance Issues

2013-11-04 Thread Stephen Delano
Hi all, I wanted to share the issues we're having with Solr 1.4 to get some ideas of things we can do in the short term that will buy us enough time to validate Solr 4 before upgrading and not have 1.4 burn to the ground before we get there. We've been running Solr 1.4 in production for over 3 ye

Re: character encoding issue...

2013-11-04 Thread Chris
Sorry, I was away for a bit, hence the delay. I am inserting Java strings into a Java bean class and then calling the addBean() method to insert the POJO into Solr. When I query using either Tomcat or Jetty, I get these special characters. But I have noticed that if I change the output to "Shift-JIS" encoding then

Slow Indexing speed for csv files, multi-threaded indexing

2013-11-04 Thread Vikram Srinivasan
Hello, I know this has been discussed extensively in past posts. I have tried a bunch of suggestions and I still have a few questions. I am using Solr 4.4 on Tomcat 7 with OpenJDK 1.7, and I am using 1 Solr core. I am trying to index a bunch of csv files (total size 13GB). Each csv file

The first search is slow

2013-11-04 Thread Boole.Z.Guo (mis.cnsh04.Newegg) 41442
Hi, I am using Solr 4.3.1. When I search for something, the first search is too slow. How can I improve this? [image: the first search] [image: the second search] Best Regards, Boole Guo Software Engineer, NESC-SH.MIS +86-021-5153

Re: Does solr supports Federated search, if not what framework

2013-11-04 Thread Alexandre Rafalovitch
On Tue, Nov 5, 2013 at 6:09 AM, Susheel Kumar < susheel.ku...@thedigitalgroup.net> wrote: > Hello, > > We have a scenario where we present results to users one from solr and > other from real time web site search. The solr data we have locally > available that we are able to index but other websit

Does solr supports Federated search, if not what framework

2013-11-04 Thread Susheel Kumar
Hello, We have a scenario where we present results to users one from solr and other from real time web site search. The solr data we have locally available that we are able to index but other website search, we don't host data and it is real time. We are wondering if we can use some federated

Disjunctive Queries (OR queries) and FilterCache

2013-11-04 Thread Patanachai Tangchaisin
Hello, We are running our search system using Apache Solr 4.2.1 with a Master/Slave model. Our index has ~100M documents. The index size is ~20 GB. The machine has 24 CPUs and 48 GB of RAM. Our response time is pretty bad: the median is ~4 seconds at 25 queries/second. We noticed a couple of things

Re: Problem of facet on 170M documents

2013-11-04 Thread Mingfeng Yang
Erick, It could have more than 4M distinct values. The purpose of this facet is to display the most frequent, say top 500, urls to users. Sascha, Thanks for the info. I will look into this thread thing. Mingfeng On Mon, Nov 4, 2013 at 4:47 AM, Erick Erickson wrote: > How many unique URLs do

RE: 2 replicas with different num of documents

2013-11-04 Thread Markus Jelsma
Hi - we've seen that issue as well (SOLR-4260) and it happened many times with older versions. The good thing is that we haven't seen it for a very long time now, so I silently assumed other fixes had already solved the problem. We don't know how to reproduce the problem but in older versions it seeme

2 replicas with different num of documents

2013-11-04 Thread yriveiro
Hi, I have 2 replicas with different numbers of documents. Is that possible? I'm using Solr 4.5.1 Replica 1: version:77847 numDocs:5951879 maxDoc:5951978 deletedDocs:99 Replica 2: version:76011 numDocs:5951793 maxDoc:5951965 deletedDocs:172 Isn't the tlog supposed to ensure data consistency?
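The figures quoted in this message can be sanity-checked with a little arithmetic. The sketch below assumes the usual Lucene relation maxDoc = numDocs + deletedDocs; the dictionary layout and function names are illustrative only and are not part of any Solr API.

```python
# Figures copied from the message above; the data layout is hypothetical.
replicas = {
    "Replica 1": {"numDocs": 5951879, "maxDoc": 5951978, "deletedDocs": 99},
    "Replica 2": {"numDocs": 5951793, "maxDoc": 5951965, "deletedDocs": 172},
}


def is_internally_consistent(stats):
    """Check the assumed relation maxDoc - numDocs == deletedDocs."""
    return stats["maxDoc"] - stats["numDocs"] == stats["deletedDocs"]


def num_docs_gap(a, b):
    """Difference in live (searchable) documents between two replicas."""
    return a["numDocs"] - b["numDocs"]


for name, stats in replicas.items():
    print(name, "consistent:", is_internally_consistent(stats))

print("numDocs gap:", num_docs_gap(replicas["Replica 1"], replicas["Replica 2"]))
```

Each replica is internally consistent under that relation, so the discrepancy is in the live document count between replicas rather than in how deletions are reported on either one.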

RE: Facet question: Getting only the matched value from multivalued field

2013-11-04 Thread Susheel Kumar
Thanks, Aloke. Prefix solves this problem partially but wanted to see if we have solution which works all the time. For e.g. if we search for "Ronald Wagner" and in multivalues fields we will get result like below and I really want to get only the values facets are "Wagner, Ronald S MD ", "Wag

Re: Can't find some fields in solr result

2013-11-04 Thread Jack Krupansky
Is it possible that you added stored="true" later, after some of the documents were already indexed? Then the older documents would not have the stored values. If so, you need to reindex the older documents. -- Jack Krupansky -Original Message- From: gohome190 Sent: Monday, November

Re: Can't find some fields in solr result

2013-11-04 Thread gohome190
All fields are set to stored="true" in my schema.xml, and fl=* doesn't change the output of the response. I even checked the logs, no errors on any fields. -- View this message in context: http://lucene.472066.n3.nabble.com/Can-t-find-some-fields-in-solr-result-tp4099245p4099251.html Sent from

Re: Can't find some fields in solr result

2013-11-04 Thread gohome190
Also, adding fl=* still doesn't solve the problem; still only 19 fields are returned. And the missing fields definitely have values, because I can run a specific Solr query on a missing field and its value, and the entry shows up (with only 19 fields again, though).

Re: Can't find some fields in solr result

2013-11-04 Thread Yonik Seeley
On Mon, Nov 4, 2013 at 2:19 PM, gohome190 wrote: > I have a database that has about 25 fields for each entry. However, when I > do a solr *:* query, I can only see the first 19 fields for each entry. > However, I can successfully use the fields that don't show up as queries. > So weird! Because t

Re: Can't find some fields in solr result

2013-11-04 Thread gohome190
Also, no errors in the Logging, and all fields are in the schema.xml.

Can't find some fields in solr result

2013-11-04 Thread gohome190
Hi, I have a database that has about 25 fields for each entry. However, when I do a solr *:* query, I can only see the first 19 fields for each entry. However, I can successfully use the fields that don't show up as queries. So weird! Because that means that solr has them, but isn't sending the

SolrCloud (4.4) and CurrencyField refresh intervals

2013-11-04 Thread Michael Tracey
I've got a 4.4 solrCloud cluster running, and have an external process that rebuilds the currency.xml file and uploads to zookeeper the latest version every X minutes. It looks like with CurrencyField the OpenExchangeRatesOrgProvider provider has a refreshInterval setting, but the documentation

Re: Recherche avec et sans espaces

2013-11-04 Thread Roman Chyla
Hi Antoine, I'll permit myself to respond in English, because my written French is slower ;-) Your problem is well known among Solr users: the query parser splits tokens on whitespace, so the analyser never sees the input 'la redoute'; instead it receives 'la' and 'redoute'. You can of course enclose your se
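The behaviour described here can be illustrated with a toy model. This is only a sketch under stated assumptions: `analyze` and `parse_then_analyze` are hypothetical stand-ins, not Solr classes, and the "analyzer" simply removes whitespace to mimic a filter chain that would turn "la redoute" into "laredoute".

```python
def analyze(text):
    """Toy analyzer (hypothetical): lowercase and remove all whitespace."""
    return "".join(text.lower().split())


def parse_then_analyze(query):
    """Toy query parser (hypothetical).

    An unquoted query is split on whitespace BEFORE analysis, so the analyzer
    only ever sees one word at a time. A query wrapped in double quotes is
    handed to the analyzer as a single string.
    """
    if len(query) >= 2 and query[0] == '"' and query[-1] == '"':
        return [analyze(query[1:-1])]
    return [analyze(token) for token in query.split()]


print(parse_then_analyze('la redoute'))    # analyzer sees 'la' and 'redoute' separately
print(parse_then_analyze('"la redoute"'))  # analyzer sees the whole phrase
```

In this model the unquoted query can never produce 'laredoute', because the two words are separated before the analyzer runs; only the quoted form reaches it intact.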

RE: Recherche avec et sans espaces

2013-11-04 Thread Jean-Sebastien Vachon
Hello Antoine, I see only 2 solutions to your problem. 1) Use synonyms, but you will be limited to the cases known in advance, so it is a solution that does not scale in the long term. 2) Otherwise, you should consider having a second field (probably populated via CopyField) that would not use

Re: Performance of "rows" and "start" parameters

2013-11-04 Thread Erick Erickson
bq: start=0&rows=30 Let's see the start and rows parameters for a few of your queries, because on the surface this makes no sense. If you're always starting at 0, this shouldn't be happening. And you say "the second query is visibly slower". You're talking about the "deep paging" problem, whic

Re: SolrCloud: read only node

2013-11-04 Thread Erick Erickson
Well, I do have to question why you need to do anything. Just don't send updates to the remote machines.. But do remember that all nodes in SolrCloud can be equal, which is one of the points. FWIW, Erick On Mon, Nov 4, 2013 at 10:34 AM, Uwe Reh wrote: > F***, this is the answer, I was

Recherche avec et sans espaces

2013-11-04 Thread Antoine REBOUL
Hello, I would like searches in a text-type field to return results even when spaces are entered incorrectly (for example: "la redoute"="laredoute"). Today my text field is defined as follows: Thanks in adv

Re: Performance of "rows" and "start" parameters

2013-11-04 Thread Michael Della Bitta
The query time increases because in order to calculate the set of documents that belongs in page N, you must first calculate all the pages prior to page N, and this information is not stored in between requests. Two ways of speeding this stuff up are to request bigger pages, and/or use filter quer
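The cost described above can be modelled in a few lines. This is a simplified sketch, not Solr's actual code: it assumes that serving a page means ranking the best start + rows candidates and then discarding the first start of them. The function name and the sample scores are made up for illustration.

```python
import heapq


def fetch_page(scores, start, rows):
    """Return (page, candidates_kept) for a simplified paging model.

    scores: list of relevance scores, one per document (hypothetical data).
    The page is the slice [start, start + rows) of the ranked results, but
    the ranking step has to keep start + rows candidates to find it.
    """
    window = start + rows
    ranked = heapq.nlargest(window, enumerate(scores), key=lambda pair: pair[1])
    return ranked[start:start + rows], len(ranked)


scores = list(range(1000))  # document i has score i, for illustration

_, kept_first_page = fetch_page(scores, start=0, rows=30)
_, kept_deep_page = fetch_page(scores, start=900, rows=30)
print(kept_first_page, kept_deep_page)
```

In this model the first page keeps 30 candidates while a page starting at 900 keeps 930, and nothing is carried over between requests, so every deeper page repeats the work of all the pages before it.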

Re: SolrCloud: read only node

2013-11-04 Thread Uwe Reh
F***, this is the answer I was afraid of. ;-) I had hoped there might be something similar to http://zookeeper.apache.org/doc/trunk/zookeeperObservers.html. Nevertheless, thank you. Uwe On 04.11.2013 14:14, Erick Erickson wrote: In this situation, I'd consider going with the older master/slav

Re: Performance of "rows" and "start" parameters

2013-11-04 Thread Bill Bell
Do you want to look through them all? Have you considered the Lucene API? Not sure if that is better but it might be. Bill Bell Sent from mobile > On Nov 4, 2013, at 6:43 AM, "michael.boom" wrote: > > I saw that some time ago there was a JIRA ticket discussing this, but still I > found no relevant i

Re: Core admin: create new core

2013-11-04 Thread Bill Bell
You could pre-create a bunch of directories and base configs. Create as needed. Then use the schemaless API to set it up ... Or make changes in a script and reload the core.. Bill Bell Sent from mobile > On Nov 4, 2013, at 6:06 AM, Erick Erickson wrote: > > Right, this has been an issue for a w

Re: SolrCloud different machine sizes

2013-11-04 Thread michael.boom
Thank you, Erick! - Thanks, Michael

Performance of "rows" and "start" parameters

2013-11-04 Thread michael.boom
I saw that some time ago there was a JIRA ticket discussing this, but I still found no relevant information on how to deal with it. When working with a large number of docs (e.g. 70M in my case), I'm using start=0&rows=30 in my requests. For the first request the query time is OK; the next one is visibly slow

Re: SolrCloud: read only node

2013-11-04 Thread Erick Erickson
In this situation, I'd consider going with the older master/slave setup. The problem is that in SolrCloud, you have a lot of chatter back and forth. Presumably the connection to your local instances is rather slow, so if you're adding data to your index, each and every add has to be communicated in

Re: SolrCloud different machine sizes

2013-11-04 Thread Erick Erickson
"It Depends"(tm). As long as you're getting adequate throughput on the smaller machines, adding bigger machines won't make it any _slower_. But sometime as you add documents, the smaller machines will start having memory issues etc. and you will see an impact. Fortunately, the migrating path to la

Re: Core admin: create new core

2013-11-04 Thread Erick Erickson
Right, this has been an issue for a while, there's no current way to do this. Someday, I'll be able to work on SOLR-4779, which should go some way toward making this work more easily. It's still not exactly what you're looking for, but it might work. Of course with SolrCloud you can specify a configur

Re: Cloud issue as an issue with SolrJ?

2013-11-04 Thread Erick Erickson
Thanks for closing this off! Erick On Sun, Nov 3, 2013 at 8:24 PM, Jack Park wrote: > Issue resolved, with great thanks to Tim Casey. > The issue was based on my own poor understanding of the mechanics of > ZooKeeper. The "host" setting in solr.xml must find the correct value > and not default

Re: Lots of tlog files remained, why?

2013-11-04 Thread Erick Erickson
What is your commit strategy? A hard commit (openSearcher=true or false doesn't matter) should close the current tlog file, open a new one and delete old ones. That said, there will be enough tlog files kept around to hold at least 100 documents. So if you're committing too often (say after every d

Re: character encoding issue...

2013-11-04 Thread Erick Erickson
The problem is there are about a dozen places where the character encoding can be mis-configured. The problem you're seeing above actually looks like a problem with the character set configured in your browser; it may have nothing to do with what's actually in Solr. You might write a small SolrJ pro

Re: Store Solr OpenBitSets In Solr Indexes

2013-11-04 Thread Erick Erickson
If the bitset is something you control you can use the binary field type, although it's not a horribly efficient way to store binary data. If the bitset is bounded, you could do something with indexing N long values that will contain the set and write a custom similarity class to work with it. Be

Re: Problem of facet on 170M documents

2013-11-04 Thread Erick Erickson
How many unique URLs do you have in your 9M docs? If your 9M hits have 4M distinct URLs, then this is not very valuable to the user. Sascha: Was that speedup on a single field or were you faceting over multiple fields? Because as I remember that code spins off threads on a per-field basis, and if

Re: Simple (?) zookeeper question

2013-11-04 Thread Erick Erickson
Well, the easiest thing to do is cheat. Fire up the admin UI, should be something like http://localhost:8983/solr See if anything drops down in the "core selector" box and select it. Then select a core, the default is "collection1". Now you should see a "query" section, go there and scroll down to

SolrCloud: read only node

2013-11-04 Thread Uwe Reh
Hi, as a service provider for libraries we run a small cloud (1 collection, 1 shard, 3 replicas). To improve local reliability we want to offer the possibility of setting up their own local replicas. As far as I know, this can easily be done just by adding a new node to the cloud. But the external no

Re: Replication after re adding nodes to cluster (sleeping replicas)

2013-11-04 Thread Erick Erickson
The whole point of SolrCloud is to automatically take care of all the ugly details of synching etc. You should be able to add a node and, assuming it has been assigned to a shard, do nothing. The node will start up, synch with the leader, get registered and start handling queries without you having

SolrCloud different machine sizes

2013-11-04 Thread michael.boom
I've set up my SolrCloud using AWS and I'm currently using 2 average machines. I'm planning to add one more, bigger machine (by bigger I mean double the RAM). If they all work in a cluster and the search is distributed, will the smaller machines limit the performance the bigger machine could offer

RE: how can i disable coord?

2013-11-04 Thread Markus Jelsma
You cannot disable the coordination factor at query time at this moment, so you need to change your Similarity in the schema. The easiest way to do this is to set the SchemaSimilarityFactory. It defaults to TFIDF but without queryNorm and coord. Alternatively, use another similarity implementation. -Original messa

Re: Unable to add mahout classifier

2013-11-04 Thread lovely kasi
I didn't understand what I need to do. Should I make any changes in the CategorizeDocumentFactory or change the version of the Solr core jars? Thanks, On Thu, Oct 31, 2013 at 2:35 PM, Koji Sekiguchi wrote: > Caused by: java.lang.ClassCastException: class com.mahout.solr.classifier. >> Categoriz

Core admin: create new core

2013-11-04 Thread Bram Van Dam
The core admin CREATE function requires that the new instance dir and schema/config exist already. Is there a particular reason for this? It would be incredibly convenient if I could create a core with a new schema and new config simply by calling CREATE (maybe providing the contents of config.