thongnt99 opened a new issue, #11799: URL: https://github.com/apache/lucene/issues/11799
### Description

Recent learned sparse retrieval methods ([Splade](https://github.com/naver/splade), [uniCOIL](https://github.com/castorini/pyserini/blob/master/docs/experiments-unicoil.md)) are trained to generate impact scores directly, replacing tf-idf scores. For each document, they produce a JSON file of terms and weights, e.g.

`{";": 80, "the": 161, "of": 85, "and": 27, "to": 24, "was": 47, "as": 27, "their": 96, "what": 40, "over": 123, "only": 123, "important": 186, "project": 208, "success": 215, "meant": 131, "lives": 140, "presence": 180, "scientific": 200, "communication": 235, "thousands": 142, "hundreds": 144, "truly": 170, "hanging": 141, "cloud": 187, "engineers": 127, "achievement": 192, "researchers": 137, "innocent": 181, "manhattan": 244, "impressive": 191, "equally": 163, "##rated": 132, "minds": 137, "atomic": 214, "amid": 201, "##lite": 120, "intellect": 202, "ob": 140}`

Can we add a feature that indexes this type of document efficiently?

The current [workaround](https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/collection/JsonVectorCollection.java) I am aware of is to create a fake document by repeating each term as many times as its weight, e.g. `"the the the the .... of of of of of "`. However, this becomes inefficient as the impact scores grow, because the fake document's length is the sum of all weights, and it also requires quantizing the impact scores to integers before indexing. I think it would be very useful for many people if we could index the JSON files directly with float impact scores.
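To make the cost of the repetition workaround concrete, here is a minimal sketch in plain Java. It is not the Anserini code and uses no Lucene API. The class name `ImpactPseudoDocument`, the `scale` parameter, and the round-to-nearest quantization are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper illustrating the "repeat the term" workaround.
class ImpactPseudoDocument {

    // Quantize float impacts to integer repeat counts.
    // Assumed scheme: round(weight * scale); terms that round to 0 are dropped.
    static Map<String, Integer> quantize(Map<String, Float> weights, float scale) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Map.Entry<String, Float> e : weights.entrySet()) {
            int count = Math.round(e.getValue() * scale);
            if (count > 0) {
                counts.put(e.getKey(), count);
            }
        }
        return counts;
    }

    // Build the fake document: each term repeated `count` times, space-separated.
    // Its length in tokens is the sum of all counts, which is the inefficiency.
    static String toPseudoDocument(Map<String, Integer> counts) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            for (int i = 0; i < e.getValue(); i++) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(e.getKey());
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> weights = new LinkedHashMap<>();
        weights.put("the", 1.6f);
        weights.put("of", 0.9f);
        weights.put("to", 0.2f);
        // With scale 1 the low-weight term is lost; a larger scale keeps
        // more precision but makes the fake document proportionally longer.
        System.out.println(toPseudoDocument(quantize(weights, 1.0f)));
    }
}
```

The sketch shows both drawbacks named above: precision is lost at the quantization step, and the token count of the fake document grows linearly with the chosen scale.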