Carriage Returns in XML Results

2008-02-06 Thread Mike Davies
Hi, I'm having a small problem with Solr, have had a good look for solutions on the web but nothing so far. Apologies if this has been asked before. I am indexing a text field to contain a text article, this article has some line feeds and CR's in it. I can index the field OK and if I look at t

Re: how to improve concurrent request performance and stress testing

2008-02-06 Thread zqzuk
Hi, I see there's no response to my question, maybe it's better to ask this way... If Solr receives 10 concurrent requests, does it deal with the 10 requests simultaneously using 10 (or as many as possible) searchers, or does it deal with each request sequentially, which implies that first request

Index will get change or refresh after restart the solr server?

2008-02-06 Thread nithyavembu
Hi all, I have some questions about how the Solr server indexes data. I am using Tomcat 5 to host Solr. There is a "data" folder which contains the index data. First I add and update some data and it is indexed; then I search for some text and get results too. Then shutdow

Re: Carriage Returns in XML Results

2008-02-06 Thread Mike Davies
Have resolved the problem. Turns out it was not a problem with Solr but with SolrSharp, before loading the XML stream into the parser it was removing all \n's from the server response. I've disabled this line and everything seems to be working now. For anyone's future reference, change the follo
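
A minimal sketch of the effect described above, written in Java rather than in SolrSharp's own code: if every \n is removed from the response before it reaches the XML parser, line breaks inside field values are lost. The class name, the sample response, and the "body" field are illustrative assumptions, not taken from SolrSharp or from the original message.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NewlineStrippingDemo {
    // Parses an XML string and returns the text of the first <str> element.
    static String firstStrValue(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("str").item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical response fragment with a line feed inside the field value.
        String response = "<doc><str name=\"body\">first line\nsecond line</str></doc>";

        // Parsed as received: the line feed survives in the field value.
        System.out.println(firstStrValue(response).contains("\n"));

        // With every \n removed before parsing, the two lines run together.
        System.out.println(firstStrValue(response.replace("\n", "")).contains("\n"));
    }
}
```

The first call keeps the line feed and the second does not, which matches the symptom reported in the thread.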

Re: how to improve concurrent request performance and stress testing

2008-02-06 Thread Yonik Seeley
On Feb 6, 2008 7:53 AM, zqzuk <[EMAIL PROTECTED]> wrote: > If solr receives 10 concurrent request, does it deal with the 10 requests > simultaneously It uses a thread per request, simultaneously (up to any limit configured by the app server) > using 10 (or as many as possible) searchers There is
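
The "thread per request, up to a configured limit" behaviour can be illustrated with a plain Java thread pool. This is only a sketch of the general idea, not the code of Jetty, Tomcat, or Solr; the class name, the pool size, and the 20 ms sleep that stands in for a search are all assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class BoundedRequestPool {
    // Runs `requests` simulated requests on a pool of `maxThreads` workers and
    // returns the highest number of requests observed running at the same time.
    static int peakConcurrency(int requests, int maxThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int i = 0; i < requests; i++) {
                futures.add(pool.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(20); // stand-in for the work of one search
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        running.decrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every simulated request to finish
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return peak.get();
    }

    public static void main(String[] args) {
        // 10 concurrent requests, at most 4 worker threads: the peak never exceeds 4.
        System.out.println(peakConcurrency(10, 4));
    }
}
```

Requests beyond the limit wait in the pool's queue until a worker thread is free, so the limit caps concurrency rather than rejecting work.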

Re: how to improve concurrent request performance and stress testing

2008-02-06 Thread Yonik Seeley
On Feb 6, 2008 6:37 PM, Ziqi Zhang <[EMAIL PROTECTED]> wrote: > I still do not understand why sending 100 requests (of the same query) from 100 > threads throws solr server to silence - is it because of the computational > cost to deal with same query in 100 separate threads? Yes... sending a large num

Re: how to improve concurrent request performance and stress testing

2008-02-06 Thread Ziqi Zhang
Thanks Yonik, > It uses a thread per request, simultaneously (up to any limit configured by the app server) How can I change this setting then? I suppose it is to do with Jetty or Tomcat, whichever hosts the Solr application, and not with solrconfig? I still do not understand why sending 100 r

Re: how to improve concurrent request performance and stress testing

2008-02-06 Thread Ziqi Zhang
Thanks! > Also make sure that common filters, sort fields, and facets have been warmed. I assume this is achieved by setting a large cache size and a large autowarmCount in the Solr configuration? Specifically: filterCache, queryResultCache, documentCache. Thanks!

running 1 instance of solr with multiple index?

2008-02-06 Thread Antonio Eggberg
Hi: Is it possible to run 1 instance of Solr which can be used to query multiple indexes? We have multiple customers on the same box and they will each have their own index. I see there is something called multicore. Are there any docs on multicore? Cheers Antonio

running 1 instance of solr with multiple index?

2008-02-06 Thread Antonio Eggberg
Hi: Is it possible to run 1 instance of Solr which can be used to query multiple indexes? We have multiple customers on the same box and they will each have their own index; I would like to avoid running multiple instances of Solr. Cheers Antonio

Re: running 1 instance of solr with multiple index?

2008-02-06 Thread Tobias Lohr
This page gives you an answer: http://wiki.apache.org/solr/MultiCore > Hi: Is it possible to run 1 instance of Solr which can be used to query multiple index. We have multiple customers in the same box and they will have their own index, I would like to avoid running multiple instance of Solr.

Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
There have been several proposals for a Lucene-based distributed index architecture. 1) Doug Cutting's "Index Server Project Proposal" at http://www.mail-archive.com/[EMAIL PROTECTED]/msg00338.html 2) Solr's "Distributed Search" at http://wiki.apache.org/solr/DistributedSearch 3) Mark Bu

FW: Performance of filterCache for Faceting - single value & tokenized

2008-02-06 Thread Fuad Efendi
Ciao-Ciao! I did something strange and the website (www.tokenizer.org) performs 1000 times faster now (it is still in my basement via ADSL 600kbps upload asynchronous) Thank you for supporting SOLR! filterCache: Size: 1051311 What I did: single-valued fields for Category and ItemName. Category field

Performance of filterCache for Faceting

2008-02-06 Thread Fuad Efendi
Ciao-Ciao Everyone! I did something strange and the website (www.tokenizer.org) performs 1000 times faster now (it is still in my basement via ADSL 600kbps upload asynchronous) Thank you for supporting SOLR! filterCache: Size: 1051311 What I did: single-valued fields for Category and ItemName. Catego

RE: Multiple Search in Solr

2008-02-06 Thread patrik
We're using a version of SOLR that we've customized to allow multiple indexes with the same schema to be searched. So, it is possible. The tricky part we're noticing is managing updates to the same document. If you don't need that you can get by pretty easily. patrik -Original Message-

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ian Holsman
Clay Webster wrote: There seem to be a few other players in this space too. Are you from Rackspace? (http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data) AOL also has a Hadoop/Solr project going on. CNET does not have much brewing there. Although Yo

Re: how to improve concurrent request performance and stress testing

2008-02-06 Thread Otis Gospodnetic
Imagine this type of code: synchronized (someGlobalObject) { // search } What happens when 100 threads hit this spot? The first one to get there gets in and runs the search and 99 of them wait. What happens if that "// search" also involves expensive operations, lots of IO, warming up, cac
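
The scenario in this message can be tried out with a small, self-contained sketch. It is not Solr code: the lock object, the class name, and the 1 ms sleep standing in for "// search" are assumptions made for illustration.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class GlobalLockDemo {
    private static final Object GLOBAL_LOCK = new Object();

    // Starts `threads` threads that each run a simulated search inside one
    // synchronized block. Returns {completed searches, peak threads inside the block}.
    static int[] run(int threads) {
        AtomicInteger inside = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                try {
                    synchronized (GLOBAL_LOCK) {
                        // Only the thread holding the lock can be in here.
                        peak.accumulateAndGet(inside.incrementAndGet(), Math::max);
                        try {
                            Thread.sleep(1); // stand-in for the "// search" work
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                        inside.decrementAndGet();
                        completed.incrementAndGet();
                    }
                } finally {
                    done.countDown();
                }
            }).start();
        }
        try {
            done.await(); // wait until all threads have finished
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new int[] {completed.get(), peak.get()};
    }

    public static void main(String[] args) {
        int[] result = run(100);
        System.out.println("completed=" + result[0] + " peakInsideLock=" + result[1]);
    }
}
```

All 100 searches complete, but never more than one at a time, so total time grows with the number of waiting threads multiplied by the cost of one search.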

Re: how to improve concurrent request performance and stress testing

2008-02-06 Thread Otis Gospodnetic
Sorry for the awful word wrapping in original email. The main thing is to warm those caches up (sure, by autowarming for example) before exposing the searcher. In other words, don't hit the completely cold searcher with a bunch of requests at once. Otis -- Sematext -- http://sematext.com/ -- L

Re: how to improve concurrent request performance and stress testing

2008-02-06 Thread Ziqi Zhang
Thanks Otis! I think I now have a clearer picture of the issue and its causes, thanks. Could you please elaborate on "warming up" the searcher prior to exposure to real requests? Does this mean running through as many of the most often used queries as possible such that results are cached, and also use as mu

Re: solrj search

2008-02-06 Thread Matthew Runo
There really isn't any detailed documentation on SolrJ just yet. I was able to guess my way through using it based on method names and so forth, and you can generate javadoc via ant if you get the source from SVN. Thanks! Matthew Runo Software Developer Zappos.com 702.943.7833 On Feb 5, 2

Re: Solr+XSLT output problem; trying to use "wt=xslt" fails with a NoClassDefFoundError

2008-02-06 Thread Chris Hostetter
: Error: type Status report : : message null java.lang.NoClassDefFoundError at : org.apache.solr.request.XSLTResponseWriter.getTransformer(XSLTResponseWriter.java:115) at This is pretty puzzling ... my best guess is that either: 1) something about your tomcat setup is causing the javax.xml.

search abstraction library for PHP

2008-02-06 Thread Robert Young
Hi, Thought you guys might be interested, I'm working on a search abstraction library for PHP called Forage, you can check it out at the link below. At the moment it just supports basic indexing and searching with Solr, Xapian and Zend Search Lucene but I'm hoping to add more engines and more feat

RE: Indexing Directly, searching with solr

2008-02-06 Thread Chris Hostetter
: Sending URI: http://text4:8983/solr/@10324_1_155/update first off ... i'm not sure which version of Solr you are using, but that @corename syntax isn't in the trunk now .. it's just "corename" so maybe the commit isn't getting to the core you think it is? Second: make really, really, really

solrj and multiple slaves

2008-02-06 Thread Keene, David
Hey guys, I have a quick question about using solrj to connect to multiple slaves. My application is deployed on multiple boxes that have to talk to multiple solr slaves. In order to take advantage of the queryResult cache, each request from one of my app boxes should be redirected to the same
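
One common way to get the cache-friendly behaviour asked about here is to pick the slave deterministically from a routing key, so that repeated requests with the same key land on the same slave and can reuse its queryResult cache. The sketch below assumes that this is what the truncated question is after. It does not use any SolrJ API; the class name, the routing key, and the slave URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class StickySlaveRouter {
    private final List<String> slaveUrls;

    StickySlaveRouter(List<String> slaveUrls) {
        if (slaveUrls.isEmpty()) {
            throw new IllegalArgumentException("at least one slave URL is required");
        }
        this.slaveUrls = new ArrayList<>(slaveUrls); // defensive copy
    }

    // Maps a routing key (for example the query string or an app-box name)
    // to one slave. The same key always maps to the same slave.
    String slaveFor(String routingKey) {
        int index = Math.floorMod(routingKey.hashCode(), slaveUrls.size());
        return slaveUrls.get(index);
    }

    public static void main(String[] args) {
        List<String> slaves = new ArrayList<>();
        slaves.add("http://slave1:8983/solr"); // hypothetical hosts
        slaves.add("http://slave2:8983/solr");
        StickySlaveRouter router = new StickySlaveRouter(slaves);
        // Two lookups with the same key return the same slave URL.
        System.out.println(router.slaveFor("q=ipod").equals(router.slaveFor("q=ipod")));
    }
}
```

A plain modulo mapping reshuffles most keys when a slave is added or removed; whether that matters depends on how often the slave list changes.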

Re: Performance of filterCache for Faceting

2008-02-06 Thread Mike Klaas
On 6-Feb-08, at 11:07 AM, Fuad Efendi wrote: What I did: single-valued fields for Category and ItemName. Category field is tokenized (with custom analyzer), and I updated only 30% of Lucene index, but it was more than enough for huge performance improvements. Before that, due to some mista

Re: how to improve concurrent request performance and stress testing

2008-02-06 Thread Yonik Seeley
On Feb 7, 2008 12:26 AM, Ziqi Zhang <[EMAIL PROTECTED]> wrote: > Thanks Otis! > > I think I now got a clearer picture of the issue and its causes, thanks. > > Could you please elaborate on "warming up" searcher prior exposure to real > requests, does this mean running through as many most often use

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
I work for IBM Research. I read the Rackspace article. Rackspace's Mailtrust has a similar design. Happy to see an existing application on such a system. Do they plan to open-source it? Is the AOL project an open source project? On Feb 6, 2008 11:33 AM, Clay Webster <[EMAIL PROTECTED]> wrote: > >

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
No. I'm curious too. :) On Feb 6, 2008 11:44 AM, J. Delgado <[EMAIL PROTECTED]> wrote: > I assume that Google also has distributed index over their > GFS/MapReduce implementation. Any idea how they achieve this? > > J.D. >

Re: Lucene-based Distributed Index Leveraging Hadoop

2008-02-06 Thread Ning Li
One main focus is to provide fault-tolerance in this distributed index system. Correct me if I'm wrong, I think SOLR-303 is focusing on merging results from multiple shards right now. We'd like to start an open source project for a fault-tolerant distributed index system (or join if one already exi

RE: Performance of filterCache for Faceting

2008-02-06 Thread Fuad Efendi
>Indeed the field cache method works much better when the values are >single-valued. Unfortunately, there is no way for solr to know that >the analyzer is only outputting a single token per document, else we >could apply this optimization automatically. Thanks Mike, Some clarification: *si

Re: For an "XML" fieldtype

2008-02-06 Thread Chris Hostetter
: > Is there anything wrong with just using string or text fieldType? : > If you use the XML writer, it will get returned xml encoded (> becomes &gt; : > etc). : : This is quite the only change I done to StrField, so I get back the original : XML string stored, and could directly transform it with

Re: Index replication on windows

2008-02-06 Thread Chris Hostetter
: I need to find a way to achieve effective index replication on a windows : environment. : From previous posts, I understand that the issue preventing the current : solution from working stems from windows support of hard links. it's related to hardlinks, but not that windows doesn't support the

Re: duplicate entries being returned, possible caching issue?

2008-02-06 Thread Chris Hostetter
: I've reviewed the wiki pages about snappuller : (http://wiki.apache.org/solr/SolrCollectionDistributionScripts) and : solrconfig.xml (http://wiki.apache.org/solr/SolrConfigXml) and it : seems that the snappuller is intended to be used on the slave server. : In our case, the slave servers do no up

Re: Performance of filterCache for Faceting

2008-02-06 Thread Mike Klaas
On 6-Feb-08, at 4:32 PM, Fuad Efendi wrote: Indeed the field cache method works much better when the values are single-valued. Unfortunately, there is no way for solr to know that the analyzer is only outputting a single token per document, else we could apply this optimization automatically.

RE: Performance of filterCache for Faceting

2008-02-06 Thread Fuad Efendi
> On 6-Feb-08, at 4:32 PM, Fuad Efendi wrote: > > >> Indeed the field cache method works much better when the values are > >> single-valued. Unfortunately, there is no way for solr to > know that > >> the analyzer is only outputting a single token per > document, else we > >> could apply this o

Commit strategies

2008-02-06 Thread James Brady
Hi all, So the Solr tutorial recommends batching operations to improve performance by avoiding multiple costly commits. To implement this, I originally had a couple of methods in my python app reading from or writing to Solr, with a scheduled task blindly committing every 15 seconds. Howe

Re: how to improve concurrent request performance and stress testing

2008-02-06 Thread Chris Hostetter
: > Also make sure that common filters, sort fields, and facets have been : > warmed. : : I assume these are achieved by setting large cache size and large : autowarmcount number in solr configuration? specifically autowarming seeds the caches of a new Searcher using the keys of an old searcher
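
The idea of seeding a new searcher's cache from the keys of the old one can be sketched as follows. This is a simplified illustration and not Solr's implementation: the access-ordered LinkedHashMap standing in for a cache, the class and method names, and the regenerate function (which would re-run the query against the new index) are all assumptions.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class AutowarmSketch {
    // Builds the cache for a "new searcher" by recomputing values for the most
    // recently used keys of the old cache, taking at most `autowarmCount` keys.
    // The old cached values are not copied, because they may be stale.
    static <K, V> Map<K, V> warm(LinkedHashMap<K, V> oldCache,
                                 int autowarmCount,
                                 Function<K, V> regenerate) {
        // For an access-ordered map, keys iterate from least to most recently used.
        List<K> keys = new ArrayList<>(oldCache.keySet());
        int from = Math.max(0, keys.size() - autowarmCount);
        Map<K, V> fresh = new LinkedHashMap<>(16, 0.75f, true);
        for (K key : keys.subList(from, keys.size())) {
            fresh.put(key, regenerate.apply(key));
        }
        return fresh;
    }

    public static void main(String[] args) {
        LinkedHashMap<String, Integer> old = new LinkedHashMap<>(16, 0.75f, true);
        old.put("category:books", 10);
        old.put("category:music", 20);
        old.put("category:films", 30);
        old.get("category:books"); // touch: now the most recently used key

        // Warm the two most recently used keys with a stand-in regenerator.
        Map<String, Integer> fresh = warm(old, 2, key -> key.length());
        System.out.println(fresh.keySet());
    }
}
```

With autowarmCount set to 2, only "category:films" and "category:books" are carried over; "category:music" starts cold in the new cache. A larger count warms more entries but makes each new searcher take longer to become available.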