[ https://issues.apache.org/jira/browse/LUCENE-9480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17287789#comment-17287789 ]
Greg Miller commented on LUCENE-9480:
-------------------------------------

Thanks [~rcmuir]! I've created LUCENE-9794 to track the remaining.

Cheers!

> investigate slow DataInput.skipBytes
> ------------------------------------
>
>                 Key: LUCENE-9480
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9480
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Robert Muir
>            Priority: Major
>             Fix For: master (9.0)
>
>         Attachments: LUCENE-9480.patch, LUCENE-9480.patch, LUCENE-9480.patch
>
>
> Currently DataInput has skipBytes(), but IndexInput also adds seek().
> There isn't a clear reason about the differences in the two methods: why
> would you choose one over the other?
> It causes some performance issues: for example the default implementation
> actually reads bytes into a byte array and throws everything away. This is
> really silly for MMapDirectory: skipping bytes should only be a glorified
> {{+=}}.
> So when I look at latest LUCENE-9447 patch, I can't help but think a ton of
> waste is happening:
> * Maybe skipBytes() is only used because the stored fields compressor
> interface happens to take DataInput? Should it take IndexInput instead?
> * Should skipBytes() be overridden by MMapDirectory rather than delegating to
> super? doing real reads and byte array copies isn't free. It should be a
> {{+=}} with single bounds check.
> * Should we revisit having DataInput vs IndexInput at all? Maybe they should
> be collapsed into one thing?
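
To make the contrast in the issue description concrete, here is a minimal, self-contained Java sketch of the two skipping strategies: one that does real reads into a scratch array and discards them, and one that only advances a position after a bounds check. This is not Lucene code. The class names (SkipBytesSketch, CopyingInput, PositionInput), the 1024-byte scratch size, and the readCalls counter are hypothetical and exist only for illustration; a heap-backed java.nio.ByteBuffer stands in for a memory-mapped file.

```java
import java.nio.ByteBuffer;

public class SkipBytesSketch {

  /** Hypothetical input whose skip performs real reads and discards the bytes. */
  static class CopyingInput {
    private static final int SKIP_BUFFER_SIZE = 1024; // assumed scratch size
    protected final ByteBuffer buffer;
    private byte[] skipBuffer; // scratch space, allocated on first skip
    int readCalls;             // counts bulk reads, only for the demo

    CopyingInput(byte[] data) {
      this.buffer = ByteBuffer.wrap(data);
    }

    void readBytes(byte[] dst, int offset, int len) {
      // ByteBuffer.get throws BufferUnderflowException when len exceeds remaining()
      buffer.get(dst, offset, len);
      readCalls++;
    }

    /** Generic skip: copies the skipped bytes into a scratch array, then ignores them. */
    void skipBytes(long numBytes) {
      if (numBytes < 0) {
        throw new IllegalArgumentException("numBytes must be >= 0: " + numBytes);
      }
      if (skipBuffer == null) {
        skipBuffer = new byte[SKIP_BUFFER_SIZE];
      }
      long skipped = 0;
      while (skipped < numBytes) {
        int step = (int) Math.min(SKIP_BUFFER_SIZE, numBytes - skipped);
        readBytes(skipBuffer, 0, step);
        skipped += step;
      }
    }

    int position() {
      return buffer.position();
    }
  }

  /** Hypothetical input whose skip is a range check followed by a position bump. */
  static final class PositionInput extends CopyingInput {
    PositionInput(byte[] data) {
      super(data);
    }

    @Override
    void skipBytes(long numBytes) {
      if (numBytes < 0 || numBytes > buffer.remaining()) {
        throw new IllegalArgumentException("cannot skip " + numBytes + " bytes");
      }
      // The "glorified +=": no bytes are read or copied.
      buffer.position(buffer.position() + (int) numBytes);
    }
  }

  public static void main(String[] args) {
    byte[] data = new byte[5000];
    CopyingInput copying = new CopyingInput(data);
    PositionInput positional = new PositionInput(data);
    copying.skipBytes(4096);
    positional.skipBytes(4096);
    System.out.println("copying:    position=" + copying.position()
        + " bulk reads=" + copying.readCalls);
    System.out.println("positional: position=" + positional.position()
        + " bulk reads=" + positional.readCalls);
  }
}
```

Both variants end at the same position. Skipping 4096 bytes with the assumed 1024-byte scratch array costs the copying variant four bulk reads, while the position-based variant performs none.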