Hello, 

I have tried indexing a vBulletin message board containing roughly 7
million posts.

My schema is as follows:

   <field name="postid" type="int" indexed="true" stored="true" />
   <field name="threadid" type="int" indexed="false" stored="true" />
   <field name="username" type="string" indexed="false" stored="true" />
   <field name="title" type="string" indexed="false" stored="true" />
   <field name="teaser" type="string" indexed="false" stored="true" />
   <field name="date" type="date" indexed="true" stored="true" omitNorms="true"/>
   <field name="blob" type="text" indexed="true" stored="false" multiValued="true" omitNorms="true"/>

   <uniqueKey>postid</uniqueKey>

   <copyField source="username" dest="blob"/>
   <copyField source="title" dest="blob"/>
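
The field definitions above refer to types declared in the <types> section
of schema.xml, which isn't shown. The declarations below are only a sketch.
They assume the types follow the Solr example schema of this era. The "int"
declaration is a guess, because the example schema calls its IntField type
"integer". The "text" type is left out because its analyzer chain isn't known.

   <types>
     <!-- Plain integer: indexes the text value verbatim, so lexicographic
          order differs from numeric order and range queries don't work -->
     <fieldtype name="int" class="solr.IntField" omitNorms="true"/>
     <!-- Sortable integer: encoded so that lexicographic order matches
          numeric order, which makes range queries work correctly -->
     <fieldtype name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
     <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
     <fieldtype name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
   </types>

If the types really are declared like this, omitNorms="true" is already set
at the type level for postid and date. A field-level attribute would only
restate it.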

I am trying to figure out whether there is anything I can do to lower disk
usage and/or increase sorting speed before we go live with the search. A few
questions came to mind:

1) I was planning to sort on the date field (i.e. append "; date desc" to
the query), but I was wondering if it would be more efficient to sort on
postid instead, since in vBulletin a higher postid means a newer post. The
postid field already has indexed="true" because it is our unique key, so I
could then set indexed="false" for date and perhaps save some storage space?

2) If we sort on postid instead, would we need to use the integer type or the
sint type? I assume sint would be faster (?), but would it perhaps use more
storage?

3) About omitNorms="true": I must admit I don't exactly understand what it
does :) but I read that it saves 1 byte per document. Are there any other
fields in my schema I should add it to? As far as I understand,
omitNorms="true" only makes a difference for fields with indexed="true", and
doesn't do anything for int fields?
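
Here is a sketch of how the changed fields might look if the sort moves from
date to postid, as described in question 1. It assumes nothing else needs to
search, sort, or run range queries on date, because indexed="false" rules out
all three.

Norms apply only to indexed fields. Each one holds one byte per document per
field, and that byte carries the index-time boost and length normalization
used in scoring. At roughly 7 million posts, every indexed field that keeps
its norms costs about 7 MB. Omitting norms on blob also means shorter
matching posts no longer get a length-based scoring advantage.

   <!-- postid: the sort key; the attribute restates the type default if
        the type already omits norms -->
   <field name="postid" type="int" indexed="true" stored="true" omitNorms="true"/>
   <!-- date: no longer indexed, so it can't be searched, sorted or
        range-queried; norms don't apply to unindexed fields -->
   <field name="date" type="date" indexed="false" stored="true"/>
   <!-- blob: the only full-text field; omitNorms trades scoring
        detail for index size -->
   <field name="blob" type="text" indexed="true" stored="false" multiValued="true" omitNorms="true"/>

The query would then end with "; postid desc" instead of "; date desc".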

Thanks in advance for any suggestions :)

/Bo
-- 
View this message in context: 
http://www.nabble.com/Optimizing-a-schema-tf2071403.html#a5702635
Sent from the Solr - User forum at Nabble.com.
