[ https://issues.apache.org/jira/browse/LUCENE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287748#comment-17287748 ]
ASF subversion and git services commented on LUCENE-9480:
---------------------------------------------------------

Commit c51fee9c1a59030bda61b600cca8923410f1e090 in lucene-solr's branch refs/heads/master from Robert Muir
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c51fee9 ]

LUCENE-9480: Make DataInput.skipBytes(long) abstract

skipBytes() is a "relative" version of seek(), but DataInput previously
implemented it via read() calls, because DataInput's API does not include
absolute positioning methods (seek, getFilePointer).

This resulted in inefficiencies: calls to skipBytes() would cause buffers
to be allocated, bytes copied, etc.

Instead, make the subclass implement skipBytes() explicitly. The old
DataInput implementation is marked deprecated and renamed to
skipBytesSlowly(). Some subclasses still implement skipBytes() via
skipBytesSlowly(), to be fixed in future improvements.


> investigate slow DataInput.skipBytes
> ------------------------------------
>
>                 Key: LUCENE-9480
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9480
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Priority: Major
>         Attachments: LUCENE-9480.patch, LUCENE-9480.patch, LUCENE-9480.patch
>
>
> Currently DataInput has skipBytes(), but IndexInput also adds seek().
> There isn't a clear reason about the differences in the two methods: why
> would you choose one over the other?
> It causes some performance issues: for example the default implementation
> actually reads bytes into a byte array and throws everything away. This is
> really silly for MMapDirectory: skipping bytes should only be a glorified
> {{+=}}.
> So when I look at latest LUCENE-9447 patch, I can't help but think a ton of
> waste is happening:
> * Maybe skipBytes() is only used because the stored fields compressor
> interface happens to take DataInput? Should it take IndexInput instead?
> * Should skipBytes() be overridden by MMapDirectory rather than delegating to
> super? doing real reads and byte array copies isn't free. It should be a
> {{+=}} with single bounds check.
> * Should we revisit having DataInput vs IndexInput at all? Maybe they should
> be collapsed into one thing?
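
For illustration only, here is a minimal sketch of the two skipping strategies discussed above. It is not the code from the commit and does not use Lucene's classes: SketchInput and ByteArrayInput are hypothetical stand-ins, and the 1024-byte scratch buffer is an assumed size. Only the method names skipBytes() and skipBytesSlowly() come from the commit message. The read-based version allocates a buffer and copies bytes that are then discarded, while the array-backed subclass skips with one bounds check and a {{+=}}.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical sketch; these types are NOT Lucene's DataInput/IndexInput. */
final class SkipBytesSketch {
  private SkipBytesSketch() {}

  /** Sequential reader with no absolute positioning (no seek, no getFilePointer). */
  abstract static class SketchInput {
    abstract byte readByte() throws IOException;

    abstract void readBytes(byte[] b, int offset, int len) throws IOException;

    /** Subclasses must say how to skip; there is no read-based default. */
    abstract void skipBytes(long numBytes) throws IOException;

    /** Read-based fallback: allocates a scratch buffer and copies bytes it never uses. */
    void skipBytesSlowly(long numBytes) throws IOException {
      if (numBytes < 0) {
        throw new IllegalArgumentException("numBytes must be >= 0, got " + numBytes);
      }
      byte[] scratch = new byte[1024]; // assumed size, chosen for the sketch
      long skipped = 0;
      while (skipped < numBytes) {
        int step = (int) Math.min(scratch.length, numBytes - skipped);
        readBytes(scratch, 0, step); // real copy, result thrown away
        skipped += step;
      }
    }
  }

  /** Array-backed input: skipping is one bounds check plus a position increment. */
  static final class ByteArrayInput extends SketchInput {
    private final byte[] bytes;
    private int pos;

    ByteArrayInput(byte[] bytes) {
      this.bytes = bytes;
    }

    @Override
    byte readByte() throws IOException {
      if (pos >= bytes.length) {
        throw new EOFException();
      }
      return bytes[pos++];
    }

    @Override
    void readBytes(byte[] b, int offset, int len) throws IOException {
      if (len > bytes.length - pos) {
        throw new EOFException();
      }
      System.arraycopy(bytes, pos, b, offset, len);
      pos += len;
    }

    @Override
    void skipBytes(long numBytes) throws IOException {
      if (numBytes < 0) {
        throw new IllegalArgumentException("numBytes must be >= 0, got " + numBytes);
      }
      if (numBytes > bytes.length - pos) { // the single bounds check
        throw new EOFException();
      }
      pos += (int) numBytes; // fits in an int after the check above
    }

    int position() {
      return pos;
    }
  }

  public static void main(String[] args) throws IOException {
    byte[] data = new byte[5000];
    ByteArrayInput fast = new ByteArrayInput(data);
    ByteArrayInput slow = new ByteArrayInput(data);
    fast.skipBytes(3000);       // no allocation, no copying
    slow.skipBytesSlowly(3000); // three buffer-sized reads into a scratch array
    System.out.println(fast.position() + " " + slow.position());
  }
}
```

Both calls leave the input at the same position; the difference is only in the work done to get there. Making skipBytes() abstract in the sketch mirrors the idea in the commit message that each subclass should state explicitly how it skips, with the read-based path kept as an opt-in fallback.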