[ https://issues.apache.org/jira/browse/LUCENE-10035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
wuda updated LUCENE-10035:
--------------------------
    Description:
The SimpleText codec now writes skip list data (including impacts) to make the index format easier to understand. This is for debugging, curiosity, and transparency only!

When a term's docFreq is greater than or equal to SimpleTextSkipWriter.BLOCK_SIZE (default value is 8), the SimpleText codec writes a skip list, and the *.pst (simple text term dictionary file)* file looks like this:
{code:java}
field title
term args
doc 2
freq 2
pos 7
pos 10
## we omit docs for better view ......
doc 98
freq 2
pos 2
pos 6
skipList
?
level 1
skipDoc 65
skipDocFP 949
impacts
impact
freq 1
norm 2
impact
freq 2
norm 12
impact
freq 3
norm 13
impacts_end
?
level 0
skipDoc 17
skipDocFP 284
impacts
impact
freq 1
norm 2
impact
freq 2
norm 12
impacts_end
skipDoc 34
skipDocFP 624
impacts
impact
freq 1
norm 2
impact
freq 2
norm 12
impact
freq 3
norm 14
impacts_end
skipDoc 65
skipDocFP 949
impacts
impact
freq 1
norm 2
impact
freq 2
norm 12
impact
freq 3
norm 13
impacts_end
skipDoc 90
skipDocFP 1311
impacts
impact
freq 1
norm 2
impact
freq 2
norm 10
impact
freq 3
norm 13
impact
freq 4
norm 14
impacts_end
END
checksum 00000000000829315543
{code}
Compared with the previous format, this adds the *skipList, level, skipDoc, skipDocFP, impacts, impact, freq, norm* nodes. At the same time, the SimpleText codec can now support advanceShallow at search time.

h2. Why are there question mark symbols in the file?
Because *MultiLevelSkipListWriter* writes the "length" and "childPointer" values as VLong, a binary encoding that does not render as readable text.

h2. Does this speed up the search process?
No! advanceShallow works at search time, but it does not make search faster. The skip list data is written after the docs (see the file shown above), so the reader must iterate over all docs before it can read the skip list data. There is no "skipOffset" for reading the skip list data directly. As mentioned before, this is for debugging, curiosity, and transparency only! If this is a problem, I can add a "skipOffset" in a later change so that the skip list data can be read directly.
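To illustrate the layout described above, here is a minimal sketch of a reader for the skipList section. It is not the code in the attached patch or in Lucene: the class {{SimpleTextSkipSketch}}, its nested types, and the {{sample()}} helper are hypothetical names, and the parser assumes one token pair per line exactly as in the example. The "?" lines (the VLong "length" and "childPointer" values) are simply ignored. The sketch also counts how many lines must be consumed before the skipList marker is reached, which is the scan that a "skipOffset" would avoid.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical reader for the skipList section; not Lucene's SimpleText implementation. */
final class SimpleTextSkipSketch {

  /** One skip entry: its level, target doc, file pointer, and (freq, norm) impacts. */
  static final class SkipEntry {
    final int level;
    final int skipDoc;
    long skipDocFP = -1;
    final List<int[]> impacts = new ArrayList<>(); // each element is {freq, norm}

    SkipEntry(int level, int skipDoc) {
      this.level = level;
      this.skipDoc = skipDoc;
    }
  }

  /** Result of scanning the lines of one term. */
  static final class Parsed {
    final List<SkipEntry> entries = new ArrayList<>();
    int linesBeforeSkipList = 0; // lines read before the skipList marker was found
  }

  static Parsed parse(List<String> lines) {
    Parsed out = new Parsed();
    boolean inSkipList = false;
    boolean inImpacts = false;
    int level = -1;
    int pendingFreq = -1;
    SkipEntry current = null;
    for (String raw : lines) {
      String line = raw.trim();
      if (!inSkipList) {
        // Without a skipOffset, every doc/freq/pos line has to be passed first.
        if (line.equals("skipList")) {
          inSkipList = true;
        } else {
          out.linesBeforeSkipList++;
        }
        continue;
      }
      if (line.equals("END")) {
        break;
      } else if (line.startsWith("level ")) {
        level = Integer.parseInt(line.substring(6));
      } else if (line.startsWith("skipDoc ")) {
        current = new SkipEntry(level, Integer.parseInt(line.substring(8)));
        out.entries.add(current);
      } else if (line.startsWith("skipDocFP ") && current != null) {
        current.skipDocFP = Long.parseLong(line.substring(10));
      } else if (line.equals("impacts")) {
        inImpacts = true;
      } else if (line.equals("impacts_end")) {
        inImpacts = false;
      } else if (inImpacts && line.startsWith("freq ")) {
        pendingFreq = Integer.parseInt(line.substring(5));
      } else if (inImpacts && line.startsWith("norm ") && current != null) {
        current.impacts.add(new int[] {pendingFreq, Integer.parseInt(line.substring(5))});
      }
      // Anything else ("?" placeholders, the bare "impact" marker) is ignored.
    }
    return out;
  }

  /** A shortened copy of the example in this issue (two docs, two skip entries). */
  static List<String> sample() {
    return Arrays.asList(
        "doc 2", "freq 2", "pos 7", "pos 10",
        "doc 98", "freq 2", "pos 2", "pos 6",
        "skipList",
        "?", "level 1",
        "skipDoc 65", "skipDocFP 949", "impacts",
        "impact", "freq 1", "norm 2",
        "impact", "freq 2", "norm 12",
        "impact", "freq 3", "norm 13",
        "impacts_end",
        "?", "level 0",
        "skipDoc 17", "skipDocFP 284", "impacts",
        "impact", "freq 1", "norm 2",
        "impact", "freq 2", "norm 12",
        "impacts_end",
        "END");
  }

  public static void main(String[] args) {
    Parsed p = parse(sample());
    System.out.println("lines read before skipList: " + p.linesBeforeSkipList);
    for (SkipEntry e : p.entries) {
      System.out.println("level " + e.level + " skipDoc " + e.skipDoc
          + " skipDocFP " + e.skipDocFP + " impacts " + e.impacts.size());
    }
  }
}
```

In the full example, the single level 1 entry (skipDoc 65) matches the third level 0 entry. That is consistent with a higher level sampling every few entries of the level below, although the exact multiplier is not stated in this issue.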
  was:
The SimpleText codec now writes skip list data (including impacts) to make the index format easier to understand. This is for debugging, curiosity, and transparency only!

When a term's docFreq is greater than or equal to SimpleTextSkipWriter.BLOCK_SIZE (default value is 8), the SimpleText codec writes a skip list, and the *.pst (simple text term dictionary file)* file looks like this:
{code:java}
field title
term args
doc 2
freq 2
pos 7
pos 10
## we omit docs for better view ......
doc 98
freq 2
pos 2
pos 6
skipList
?
level 1
skipDoc 65
skipDocFP 949
impacts
impact
freq 1
norm 2
impact
freq 2
norm 12
impact
freq 3
norm 13
impacts_end
?
level 0
skipDoc 17
skipDocFP 284
impacts
impact
freq 1
norm 2
impact
freq 2
norm 12
impacts_end
skipDoc 34
skipDocFP 624
impacts
impact
freq 1
norm 2
impact
freq 2
norm 12
impact
freq 3
norm 14
impacts_end
skipDoc 65
skipDocFP 949
impacts
impact
freq 1
norm 2
impact
freq 2
norm 12
impact
freq 3
norm 13
impacts_end
skipDoc 90
skipDocFP 1311
impacts
impact
freq 1
norm 2
impact
freq 2
norm 10
impact
freq 3
norm 13
impact
freq 4
norm 14
impacts_end
END
checksum 00000000000829315543
{code}
Compared with the previous format, this adds the *skipList, level, skipDoc, skipDocFP, impacts, impact, freq, norm* nodes. At the same time, the SimpleText codec can now support advanceShallow at search time.

> Simple text codec add multi level skip list data
> --------------------------------------------------
>
>                 Key: LUCENE-10035
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10035
>             Project: Lucene - Core
>          Issue Type: Wish
>          Components: core/codecs
>    Affects Versions: main (9.0)
>            Reporter: wuda
>            Priority: Major
>              Labels: Impact, MultiLevelSkipList, SimpleTextCodec
>         Attachments: LUCENE-10035.patch
>
>          Time Spent: 1h 40m
>  Remaining Estimate: 0h
--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org