I have seen little repeatable empirical evidence for the usual answer
"mostly no".

With respect: everyone in the Solr universe seems to answer this
question in the way Yonik has.
However, with a large number of requests, the XML
serialization/deserialization must have some impact, and likely a
significant one.
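To make that concern concrete, here is a minimal sketch of measuring just the deserialization side (the response shape below is a simplified, made-up Solr-style XML result, not a real Solr response; numbers will vary by machine and parser):

```python
import time
import xml.etree.ElementTree as ET

def fake_response(n_docs):
    """Build a simplified, hypothetical Solr-style XML result with n_docs docs."""
    docs = "".join(
        f'<doc><str name="id">{i}</str><str name="title">doc {i}</str></doc>'
        for i in range(n_docs)
    )
    return f'<response><result numFound="{n_docs}">{docs}</result></response>'

def time_parse(xml_text, runs=100):
    """Average wall-clock seconds to parse xml_text once."""
    start = time.perf_counter()
    for _ in range(runs):
        ET.fromstring(xml_text)
    return (time.perf_counter() - start) / runs

for n in (10, 100, 1000):
    print(f"{n} docs: {time_parse(fake_response(n)):.6f}s per parse")
```

Whether that per-response cost is noise or significant next to the search itself is exactly the empirical question.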

Yonik makes a valid point, which I will generalize: for some
combinations of #docs, #queries, doc size, network, hardware, disk,
etc., the XML overhead will matter, and for others it will be less
important.

Is there any chance that a simple performance framework could be
created in Solr, which runs queries directly against Solr, as well as
against the underlying Lucene index directly?
1 - Text file with one query per line (isn't there a tool out there
that will generate random queries based on a given index? Sorry, my
google fails me...)
2 - Test application: a configuration file defines the max # of
parallel queries. The query set is run multiple times at increasing
concurrency: 1, 2, 4, 8, 16, 32 ... up to that maximum, with Solr
restarted between each run. These tests are run against:
These tests are run against:
   a) Solr local
   b) Solr across the network
   c) Lucene index directly, local
   d) Lucene index directly, across the network using RMI (RemoteSearchable)
3 - Report generator: produces a report showing the results
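Lacking the query-generation tool mentioned in step 1, a crude stand-in could sample terms and combine them into boolean queries. A sketch (the term list and query shapes here are made up, not derived from a real index):

```python
import random

def random_queries(terms, n_queries, max_terms=3, seed=42):
    """Generate n_queries random boolean queries, each ORing together
    1..max_terms terms drawn from the supplied term list."""
    rng = random.Random(seed)
    queries = []
    for _ in range(n_queries):
        k = rng.randint(1, max_terms)
        queries.append(" OR ".join(rng.sample(terms, k)))
    return queries

# Example: a tiny, made-up term list standing in for terms dumped from an index.
terms = ["solr", "lucene", "index", "query", "facet", "cache"]
for q in random_queries(terms, 5):
    print(q)
```

A better version would dump real terms from the target index so the queries actually hit documents.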

It should perhaps also allow a second file with fewer queries that is
used to warm the caches and is not included in the reporting.
Oh, the configuration file should also include the network information
for remote indexes. It could also include a parameter for the
probability that a query pages through its results, fetching a random
number of pages between 1 and n, where n is also a settable parameter.
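The core of step 2 could be sketched like this (all names and the endpoint URL are illustrative assumptions, not an existing tool; restarting Solr between levels and the warm-up file are left out):

```python
import time
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed endpoint; in the proposed framework this would come from the config file.
SOLR_URL = "http://localhost:8983/solr/select"

def solr_query(q):
    """Issue one query against the assumed Solr endpoint and read the response."""
    url = SOLR_URL + "?" + urllib.parse.urlencode({"q": q})
    with urllib.request.urlopen(url) as resp:
        resp.read()

def benchmark(queries, query_fn, max_workers):
    """Run query_fn over all queries with max_workers threads; mean latency (s)."""
    def timed(q):
        start = time.perf_counter()
        query_fn(q)
        return time.perf_counter() - start
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        latencies = list(pool.map(timed, queries))
    return sum(latencies) / len(latencies)

def run_levels(queries, query_fn, max_parallel=32):
    """Repeat the query set at 1, 2, 4, ... max_parallel threads."""
    level = 1
    while level <= max_parallel:
        print(f"{level:3d} parallel: {benchmark(queries, query_fn, level):.4f}s mean")
        level *= 2
```

Passing query_fn in keeps the harness identical across the four targets (Solr local/remote, Lucene local/remote); only the query function changes.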

Just thought a more empirical framework would help all of us, as
opposed to anecdotal evidence.

Thanks,
Glen
http://zzzoot.blogspot.com/

PS. If there is a good analysis of the performance cost in large scale
instances (many documents, many queries in parallel) of the XML
marshaling/demarshaling in Solr, please share it. -g

On Fri, Mar 11, 2011 at 4:48 PM, Yonik Seeley
<yo...@lucidimagination.com> wrote:
> On Fri, Mar 11, 2011 at 4:21 PM, sivaram <yogendra.bopp...@gmail.com> wrote:
>> I searched for this but couldn't find a convincing answer.
>> I'm planning to use Lucene/Solr in a tool for indexing and searching
>> documents. I'm thinking of if I use Lucene directly instead of Solr, will it
>> improves the performance of the search?(in terms of time taken for indexing
>> or returning search results or if Solr slows down my application when
>> compared to Lucene). I have worked with Solr in small scale before but this
>> time I have to use for an index with over a million docs to get indexed and
>> searched.
>
> On a small scale (hundreds of docs or so), Solr's overhead (parsing
> parameters, etc) could matter.
> When you scale up to larger indexes, it's in the noise (i.e. the
> actual computation of searching, faceting, highlighting, etc,
> dominate).
>
> -Yonik
> http://lucidimagination.com
>


