That seems fairly fast. We index about 3 million documents in about half that time. We are probably limited by the time it takes to get the data from MySQL.
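To put the two figures on the same footing, they can be converted to documents per second. This is only a rough sketch: the 35 minutes is an assumed reading of "about half that time", and the helper name `docs_per_second` is made up for illustration.

```python
def docs_per_second(doc_count: int, minutes: float) -> float:
    """Average indexing throughput in documents per second."""
    return doc_count / (minutes * 60)


# Figures quoted in this thread. The 35 minutes is an assumption:
# "about half" of the 70 minutes reported for the 18M-document run.
reported_run = docs_per_second(18_000_000, 70)
mysql_fed_run = docs_per_second(3_000_000, 35)

print(f"18M docs in 70 min:  {reported_run:,.0f} docs/s")
print(f"3M docs in ~35 min:  {mysql_fed_run:,.0f} docs/s")
print(f"ratio: {reported_run / mysql_fed_run:.1f}x")
```

On those assumptions the 18M-document run works out to about three times the throughput of the MySQL-fed one, which is consistent with calling it fairly fast.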
Don't optimize. Solr automatically merges index segments as needed. Optimize forces a full merge. You'll probably never notice the difference, either in disk space or speed.

It might make sense to force merge (optimize) if you reindex everything once per day and have no updates in between. But even then it may be a waste of time.

You need lots of free disk space for merging, whether a forced merge or automatic. Free space equal to the size of the index is usually enough, but worst case can need double the size of the index.

wunder

On Feb 21, 2013, at 9:20 AM, Yandong Yao wrote:

> Hi Guys,
>
> I am using Solr 4.1 and have indexed 18M documents using solrj
> ConcurrentUpdateSolrServer (each document contains 5 fields, and average
> length is less than 1k).
>
> 1) It takes 70 minutes to index those documents without optimize on my mac
> 10.8, how is the performance, slow, fast or common?
>
> 2) It takes about 40 minutes to optimize those documents, following is top
> output, and there are lots of FAULTS, what does this means?
>
> Processes: 118 total, 2 running, 8 stuck, 108 sleeping, 719 threads
> 00:56:52
> Load Avg: 1.48, 1.56, 1.73 CPU usage: 6.63% user, 6.40% sys, 86.95% idle
> SharedLibs: 31M resident, 0B data, 6712K linkedit.
> MemRegions: 34734 total, 5801M resident, 39M private, 638M shared. PhysMem:
> 982M wired, 3600M active, 3567M inactive, 8150M used, 38M free.
> VM: 254G vsize, 1285M framework vsize, 1469887(368) pageins, 1095550(0)
> pageouts. Networks: packets: 14842595/9661M in, 14777685/9395M out.
> Disks: 820048/43G read, 523814/53G written.
>
> PID COMMAND %CPU TIME #TH #WQ #POR #MRE RPRVT RSHRD RSIZE
> VPRVT VSIZE PGRP PPID STATE UID FAULTS COW MSGSENT MSGRECV SYSBSD
> SYSMACH
> 4585 java 11.7 02:52:01 32 1 483 342 3866M+ 6724K 3856M+
> 4246M 6908M 4580 4580 sleepin 501 1490340+ 402 3000781+ 231785+
> 15044055+ 10033109+
>
> 3) If I don't run optimize, what is the impact? bigger disk size or slow
> query performance?
>
> Following is my index config in solrconfig.xml:
>
> <ramBufferSizeMB>100</ramBufferSizeMB>
> <mergeFactor>10</mergeFactor>
> <autoCommit>
>   <maxDocs>100000</maxDocs> <!-- 100K docs -->
>   <maxTime>300000</maxTime> <!-- 5 minutes -->
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> Thanks very much in advance!
>
> Regards,
> Yandong
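
On the free-space rule of thumb from the reply (free space equal to the index size is usually enough, double the index size in the worst case), a quick check could look like the sketch below. It is not part of Solr; the function names and the three result labels are invented for illustration, and the index directory path is whatever your installation uses.

```python
import os
import shutil


def dir_size_bytes(path: str) -> int:
    """Total size of all regular files under path (the on-disk index size)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if os.path.isfile(file_path):
                total += os.path.getsize(file_path)
    return total


def merge_headroom(index_bytes: int, free_bytes: int) -> str:
    """Classify free space against the 1x (usual) and 2x (worst case) rule of thumb."""
    if free_bytes >= 2 * index_bytes:
        return "ok-worst-case"
    if free_bytes >= index_bytes:
        return "ok-usual-case"
    return "too-little"


def check(index_dir: str) -> str:
    """Measure the index directory and the free space on its volume."""
    index_bytes = dir_size_bytes(index_dir)
    free_bytes = shutil.disk_usage(index_dir).free
    return merge_headroom(index_bytes, free_bytes)


# Example call (the path is hypothetical, substitute your own index directory):
#   print(check("/var/solr/data/collection1/data/index"))
```

A result of `too-little` would suggest freeing disk space before any large merge, forced or automatic.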