bq: After the commit, query times are unacceptably slow

First, please quantify "unacceptable". 100ms? 10,000ms? Details matter.

Second, the purpose of autowarming is exactly to smooth out the first
few searches when a new searcher is opened. Are you doing any?

Third: What are your autocommit settings, and how are you committing in
general? How often?

LUCENE-4258 has never been implemented. Updateable DocValues are
certainly something I'd really like to see in Solr, but they're not
there yet. There are some consistency issues that have to be dealt
with, see: https://issues.apache.org/jira/browse/SOLR-5944

All that aside, lots and lots of people have solved this problem with
appropriate commit policies and autowarming, so that's what I'd look at
first.

Best,
Erick

On Wed, Apr 8, 2015 at 6:22 AM, Jack Krupansky <jack.krupan...@gmail.com> wrote:

> How much RAM do you have? Check whether your system is compute-bound or
> I/O-bound. If all or most of your index doesn't fit in the system memory
> available for file caching, you're asking for trouble.
>
> Is the indexing time also unacceptably slow, or just the query time?
>
> -- Jack Krupansky
>
> On Wed, Apr 8, 2015 at 9:03 AM, Achim Domma <ac...@uberresearch.com> wrote:
>
>> Hi,
>>
>> I have a core with about 20M documents and the size on disc is about
>> 50GB. It is running on a single EC2 instance. If the core is warmed up,
>> everything is running fine. The problem is the following:
>>
>> We assign categories (similar to tags) to documents. Those are stored in
>> a multivalue string field. After the commit, query times are
>> unacceptably slow.
>>
>> Those categories are the only field that is ever changed, so I was
>> thinking about a way to keep the information outside SOLR. I had some
>> ideas, but my knowledge of SOLR internals would need some improvement to
>> implement them.
>> Looking for other solutions, I stumbled upon this
>> comment in a JIRA issue:
>>
>> https://issues.apache.org/jira/browse/LUCENE-4258?focusedCommentId=13423159&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13423159
>>
>> The following words sound quite good to me:
>>
>> "People could instead solve this by putting their apps primary key into
>> a docvalues field, allowing them to keep these scoring factors
>> completely external to lucene (e.g. their own array or whatever),
>> indexed by their own primary key. But the problem is I think people want
>> lucene to manage this, they don't want to implement themselves whats
>> necessary to make it consistent with commits etc."
>>
>> It sounds like there is an obvious way to keep data outside SOLR
>> but still make it accessible via DocValues. But I have no idea what
>> kind of solution he is talking about.
>>
>> Could somebody give me a starting point? I would need to filter on that
>> field and facet over it.
>>
>> cheers,
>> Achim
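
As a rough illustration of the idea in the quoted JIRA comment (data kept
outside Lucene, indexed by the application's own primary key), here is a
minimal Python sketch. Everything in it is an assumption made for
illustration: the class name ExternalCategoryStore, its methods, and the
document keys are hypothetical, and it only covers the application side.
It assumes the primary keys of the matching documents have already been
fetched from Solr, and it does nothing to stay consistent with commits,
which is exactly the hard part the comment warns about.

```python
class ExternalCategoryStore:
    """Hypothetical application-side store: primary key -> set of categories."""

    def __init__(self):
        self._categories = {}

    def assign(self, key, categories):
        # Replace the categories of one document; no Solr commit is involved.
        self._categories[key] = set(categories)

    def filter_hits(self, hit_keys, category):
        # Keep only hits tagged with the category, preserving result order.
        return [k for k in hit_keys if category in self._categories.get(k, ())]

    def facet(self, hit_keys):
        # Count categories across the given hits, like a facet over the field.
        counts = {}
        for key in hit_keys:
            for category in self._categories.get(key, ()):
                counts[category] = counts.get(category, 0) + 1
        return counts


store = ExternalCategoryStore()
store.assign("doc1", ["physics", "biology"])
store.assign("doc2", ["physics"])

# Primary keys assumed to come back from a Solr query (made-up values).
hits = ["doc1", "doc2", "doc3"]
print(store.filter_hits(hits, "physics"))
print(store.facet(hits))
```

Filtering and faceting this way happens after the search, over the keys
Solr returns, so it is only practical when the result set is small
enough to post-process in the application.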