4.1 turns on stored field compression by default. Perhaps what's happening
here is that you're seeing the spike when you fetch your very large
document and it gets uncompressed? Just a shot in the dark...

But you could test it by turning off compression...

That said, I wouldn't think that decompressing even a 6M field would take
all that long.
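
To compare runs with and without compression, or before and after a restart, a small timing script might help. What follows is only a minimal sketch in Python, not something taken from your setup: the host, the core name (collection1) and the field names (title, body) are placeholders I made up, and it assumes the JSON response writer (wt=json) is enabled. It fetches the same id with two different fl lists and prints the wall-clock time next to the QTime that Solr reports, so you can see how much of the delay falls after the search itself has finished.

```python
import json
import time
import urllib.parse
import urllib.request

# Hypothetical endpoint: adjust host, port and core name to your installation.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def build_url(base, doc_id, fields):
    """Build a select URL for one id, restricted to the given fl list."""
    params = {"q": "id:" + doc_id, "fl": ",".join(fields), "wt": "json"}
    return base + "?" + urllib.parse.urlencode(params)


def timed_fetch(url):
    """Return (wall_ms, qtime_ms, body_bytes) for a single request."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    wall_ms = (time.perf_counter() - start) * 1000.0
    # QTime covers the search only, not fetching or writing stored fields.
    qtime_ms = json.loads(body)["responseHeader"]["QTime"]
    return wall_ms, qtime_ms, len(body)


if __name__ == "__main__":
    # Placeholder fl lists: same id, different stored fields requested.
    for fields in (["id", "title"], ["id", "body"]):
        url = build_url(SOLR_SELECT, "foo", fields)
        wall_ms, qtime_ms, size = timed_fetch(url)
        print(fields, "wall=%.0fms QTime=%dms bytes=%d" % (wall_ms, qtime_ms, size))
```

If the wall-clock time is far larger than QTime only for the second fl list, that would point at the stored-field fetch or the response writing rather than at the search.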

Best
Erick


On Mon, Mar 4, 2013 at 11:52 PM, Sandeep Mirchandani <skmi...@hotmail.com> wrote:

> I work with Aditya, so this information continues from where Aditya left
> off.
>
> Here are some observations based on running a query on a particular
> unique id. The document corresponding to this uniqueId is fairly large: if
> we were to run a query without an fl list for this document, the total
> size would be in the neighborhood of 6MB. However, we are using an fl list
> to get a subset of this document. We use a script, run from a different
> box, that uses curl to call the server for the same uniqueIds but with
> different fl lists. After the first few runs of the search (something
> like q=id:foo), we change the fl list to return a different set of fields,
> perhaps larger than the set returned by the first query for the same id.
>
> 1. The curl client blocks when the fl list changes. VisualVM shows 50% CPU
> utilization.
> 2. This spin continues until the result is returned to the curl client.
> 3. We see the same thing from a browser as well. This reproduces the
> problem and helps identify that the spin occurs after the server has
> finished searching for the document (we see an entry in the solr log file
> that contains the QTime for this query) and is now trying to return the
> response. The browser waits until all the data is received and only then
> renders the page. So what is taking so long for the server to respond to
> the client?
> 4. Monitoring the sampler in VisualVM shows getFields() at the top of the
> list, so I believe it may be spinning there.
> 5. Restart the server running SOLR.
> 6. Start by running the same query from the browser; it returns in a
> couple of seconds.
> 7. Run the same curl script; it sails through the query as well, with the
> server responding almost immediately.
> 8. Monitoring the sampler this time around, you _don't_ see CPU spinning on
> getFields().
> 9. I change the firstSearcher definition in solrconfig.xml to add the
> uniqueId to the q parameter, and restart.
> 10. This time the curl script runs well.
> 11. If the server is restarted again and we run the curl script with the
> blocking (spinning) query right at the top, the script sails through again.
>
> Just from these observations, it seems like the code for SOLR 4.1 takes a
> wrong turn somewhere for large responses if it comes across the same query
> again with a different fl list. If the spinning query is pre-cached via
> the solrconfig.xml firstSearcher change or via the browser, or is run ahead
> of other queries for the same id, it seems to work fine after the first
> run. However, running it after the same search with a different fl list
> does trigger the problem. This did not happen with SOLR 3.5 and seems like
> a regression. The above is repeatable for us.
>
> Question: Why is this happening on SOLR 4.1? It seems like the workaround
> for now may be to cache the queries with large document sizes in
> solrconfig.xml.
>
> We would appreciate hearing from others facing this issue, which would
> help validate what we see. Thanks.
>
> Best regards,
> -- Sandeep
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Solr3-5-Vs-Solr4-1-Help-please-tp4043543p4044742.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
