shaie commented on PR #841: URL: https://github.com/apache/lucene/pull/841#issuecomment-1151926798
> So unless I'm totally missing something, I think we should keep the `matches(byte[])` method. I don't think you're missing something, but perhaps to give a different POV, I am thinking about users who will want to extend the API. As a user, it feels more intuitive to deal with a `long[]` than ` byte[]`. Take a look at the impls, they are much simpler and intuitive (IMO). It also _feels_ less expert and approachable. Another point which may change our minds about it -- when we give users a `byte[]`, we force them to know how the data was encoded. The two current `FSM`s need to iterate on the array with `Long.BYTES` increments. If we decided to encode the data in the BDV differently (maybe compress it), then a `byte[]` API will need to be accompanied with this information as well. So again, **purely from an API perspective**, we tell the user "You give us `long[]` at indexing time, we'll give it you back at aggregation time". It's simple, readable, intuitive. Perhaps we should include a comparison of these two variants in our benchmark? E.g. if there's no downside to using `long[]` then I assume we'll be fine with keeping it? But if it performs slower, than we can stick with the `byte[]` to keep things well optimized. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org