Hello, I have tried indexing a vBulletin message board containing roughly 7 million posts.
My schema is as follows:

<field name="postid" type="int" indexed="true" stored="true" />
<field name="threadid" type="int" indexed="false" stored="true" />
<field name="username" type="string" indexed="false" stored="true" />
<field name="title" type="string" indexed="false" stored="true" />
<field name="teaser" type="string" indexed="false" stored="true" />
<field name="date" type="date" indexed="true" stored="true" omitNorms="true"/>
<field name="blob" type="text" indexed="true" stored="false" multiValued="true" omitNorms="true"/>
<uniqueKey>postid</uniqueKey>
<copyField source="username" dest="blob"/>
<copyField source="title" dest="blob"/>

I am trying to figure out whether there is anything I can do to lower the disk usage and/or increase sorting speed before we go live with the search. A few questions came to mind:

1) Sorting: I was planning to sort on the date field (i.e. append "; date desc" to the query). But would it be more efficient to sort on postid instead, since a higher postid in vBulletin means a newer post? postid already has indexed="true" because it is our unique key, so I could then set indexed="false" on date and perhaps save some disk space?

2) If we sort on postid instead, would we need to use the integer type or the sint type? I assume sint would be faster(?), but perhaps use more storage?

3) About omitNorms="true": I must admit I don't exactly understand what it does :) But I read that it saves 1 byte per document. Are there any other fields in my schema I should add it to? As far as I understand, omitNorms="true" only makes a difference for indexed="true" fields, and doesn't do anything for int fields?

Thanks in advance for any suggestions :)

/Bo

--
View this message in context: http://www.nabble.com/Optimizing-a-schema-tf2071403.html#a5702635
Sent from the Solr - User forum at Nabble.com.
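For comparison, here is a rough sketch of the variant described in questions 1 and 2, written as the relevant parts of schema.xml. It assumes a sint type declared the way the Solr example schema of this era declares it (class solr.SortableIntField), and that the int, string, date and text types are defined elsewhere exactly as in the schema above; type names and attributes should be checked against your own schema.xml rather than taken as given. One constraint worth keeping in mind: Lucene-based sorting works on a field's indexed terms, so once date is set to indexed="false" it can no longer be used as a sort field, and all sorting would have to go through postid.

```xml
<!-- Assumed type declaration, modeled on the Solr example schema's sint type -->
<types>
  <fieldtype name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
</types>

<fields>
  <!-- postid becomes the sort key (assumes higher postid = newer post), so it must stay indexed -->
  <field name="postid" type="sint" indexed="true" stored="true"/>
  <field name="threadid" type="int" indexed="false" stored="true"/>
  <field name="username" type="string" indexed="false" stored="true"/>
  <field name="title" type="string" indexed="false" stored="true"/>
  <field name="teaser" type="string" indexed="false" stored="true"/>
  <!-- date is only displayed now; omitNorms is dropped because norms exist only for indexed fields -->
  <field name="date" type="date" indexed="false" stored="true"/>
  <field name="blob" type="text" indexed="true" stored="false" multiValued="true" omitNorms="true"/>
</fields>

<uniqueKey>postid</uniqueKey>
<copyField source="username" dest="blob"/>
<copyField source="title" dest="blob"/>
```

Queries would then append "; postid desc" instead of "; date desc". Whether this actually saves disk space or speeds up sorting is exactly what the questions above ask, so it is worth measuring index size and sort times on a copy of the data before and after the change.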