1. Just trust that Lucene will perform :) Incremental updates are stored in separate, new index segments with their own caches, so the existing data is left untouched, with its caches in place.
2. Please explain what you expect from "semantic search", which is an overloaded term.
3. On http://wiki.apache.org/solr/PublicServers the only one saying so explicitly is Jeeran - I'm sure others can fill in with more examples.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 17. apr. 2012, at 12:10, Alexandr Bocharov wrote:

> Thanks for your replies, you're a good expert :)
> I've only read the basic Solr documentation; I've been familiar with it for
> about 2 days.
> The documentation looks huge at first sight :). My company and I are
> deciding whether to use Solr or another solution.
> Maybe you're right about re-implementing our sorting functions as something
> new.
>
> 1. If the index is stored on disk, how is good performance achieved (the
> index changes frequently, ~50,000 - 100,000 records are updated every 10
> minutes, so maybe caching won't be effective)?
> 2. What can you say about Solr's semantic search capabilities? Are there any
> examples of it in production?
> 3. Can you please give some examples of projects/sites using Solr 4.0 in
> production?
>
>
> 2012/4/17 Jan Høydahl <jan....@cominvent.com>
>
>> Hi,
>>
>> You have many basic questions about search. Can I recommend one of the
>> books? http://lucene.apache.org/solr/books.html
>> Also, you'll find a lot of answers on the Solr Wiki:
>> http://wiki.apache.org/solr/ if you're not aware of it.
>>
>> I think Solr may solve your performance problems well.
>> Whether it's the right tool for the job depends on several factors.
>> Also, sometimes it is useful to step back and think fresh. Perhaps the
>> reason you implemented things the way you did was technical, driven
>> by your DB capabilities.
>> When re-implementing on top of Solr, perhaps there are better ways to do
>> what you REALLY wanted instead of limiting yourself to the ORDER BY syntax
>> etc.
>> One of Solr's strengths is relevancy and FunctionQueries, and it can do
>> amazing things :)
>>
>> Further answers below..
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>> Solr Training - www.solrtraining.com
>>
>> On 17. apr. 2012, at 07:20, Alexandr Bocharov wrote:
>>
>>> Thanks for your reply :)
>>> I have some new questions now:
>>> 1. How stable is the trunk version? Has anyone used it on any kind of
>>> high-load project in production?
>> It's stable. Used in production in many places. Alpha or beta release
>> expected soon.
>>> 2. Does version 3.6 support near-real-time index updates?
>> No
>>> 3. How does Solr store its index? Is it all in memory for each shard,
>>> or on disk with in-memory caching for frequently asked queries?
>> On disk, but with many caching optimizations
>>> 4. Is the best practice for index updating to do delta imports every 5
>>> minutes, for example, and a full index rebuild once a day? Does it take a
>>> long time for ~100 mln users? Am I right?
>> You can do deltas only, as often as you choose. Solr will handle the
>> backend details
>>> 5. Do sharding and replication have native support in Solr, so all
>>> I need to care about is the config file for the nodes? Are there any
>>> limitations on usage, such as sorting, if we use sharding?
>> Yes, sharding and replication are natively supported. See the Wiki
>>> The reason why we want to move from our DB search scheme (data is sharded
>>> into small tables on several servers and managed in code) is that:
>>> 1. the response time of our search isn't what we need (3-5 s now in
>>> production, we want <1 s)
>>> 2. the amount of data is growing
>>> 3. we want automatic clustering of any amount of data and search over it,
>>> without needing to care about how the data is stored or whether it is
>>> durable
>>>
>>> That's why we are also looking at other solutions with autosharding of huge
>>> amounts of data and the ability to run these types of queries and sorting
>>> (thinking about MySQL Cluster, but it's not stable yet, or Oracle Cluster).
>>> If anyone can give advice on such technology, I'll be glad to hear it.
>> What do you expect from "Autosharding"?
>>>
>>> 2012/4/17 Jan Høydahl <jan....@cominvent.com>
>>>
>>>>> Hi everyone :)
>>>>
>>>> Hi :)
>>>>
>>>>> So, these are my 3 questions:
>>>>> 1. Does Solr support searching across a varying number of fields of
>>>>> different types, like in a WHERE condition?
>>>>
>>>> Yes. As long as these are not full-text you should use filter queries
>>>> for these, e.g.
>>>> &q=*:*
>>>> &fq=country:USA
>>>> &fq=language:SPA
>>>> &fq=age:[30 TO 40]
>>>> &fq=(bool_field1:1 OR bool_field2:1)
>>>>
>>>> The reason why I put multiple "fq" instead of one long one is to optimize
>>>> for caching of filters
>>>>
>>>>> 2. Does Solr provide sorting that depends on other fields (like sums
>>>>> in ORDER BY)? In other words, does it provide any kind of function that
>>>>> is used to sort results from q1?
>>>>
>>>> Yes. In trunk version you can sort by function which can do sums and all
>>>> crazy things
>>>> &sort=sum(product(has_photo,10),if(exists(query($agequery)),50,0)) asc
>>>> &agequery=age:[53 TO *]
>>>> See http://wiki.apache.org/solr/FunctionQuery for more functions
>>>>
>>>> But you could also do much of this through boost queries
>>>> &sort=score desc
>>>> &bq=language:FRA^50
>>>> &bq=age:[53 TO *]^20
>>>>
>>>>> 3. Does Solr provide realtime index updating or updating every N
>>>>> minutes?
>>>>
>>>> Sure, there is Near Real-time indexing in TRUNK (coming 4.0)
>>>>
>>>> Jan
>>
>>
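
For reference, a minimal Python sketch of the filter-query advice quoted above: it builds a request URL with one fq parameter per condition, so that each filter can be cached separately, as the reply explains. The host, port, and /solr/select path are assumptions for illustration, as is the helper name build_select_url; the field names are the ones used in the thread.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def build_select_url(base_url, filters, sort=None):
    """Build a Solr select URL with one fq parameter per filter.

    build_select_url is a hypothetical helper, not part of any Solr client.
    """
    # A list of pairs (rather than a dict) lets the fq key repeat.
    params = [("q", "*:*")]
    params += [("fq", f) for f in filters]
    if sort is not None:
        params.append(("sort", sort))
    # urlencode percent-escapes characters such as ':', '[', ']' and spaces.
    return base_url + "?" + urlencode(params)


# Assumed endpoint; adjust host, port and path to your own installation.
url = build_select_url(
    "http://localhost:8983/solr/select",
    filters=[
        "country:USA",
        "language:SPA",
        "age:[30 TO 40]",
        "(bool_field1:1 OR bool_field2:1)",
    ],
    sort="score desc",
)

# Decode the query string again to show the parameters as Solr would see them.
for key, value in parse_qsl(urlsplit(url).query):
    print(key, "=", value)
```

Keeping each condition in its own fq means a filter such as country:USA can be reused from the filter cache by any later query that includes it, regardless of the other conditions.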