[I] Inquiry regarding storage reduction by omitting NORMS during indexing [lucene]

via GitHub Wed, 01 Jan 2025 20:22:23 -0800


balaramsharma opened a new issue, #14093:
URL: https://github.com/apache/lucene/issues/14093


   ### Description
   
   Dear Developers,
   
   I learned that **omitting norms for a field during indexing saves one byte per document in Lucene** (reference: [https://lucidworks.com/post/scaling-lucene-and-solr/](https://lucidworks.com/post/scaling-lucene-and-solr/)). However, in my testing, disabling norms for string fields during indexing reduced the overall size of the Lucene index (the collection of documents) by varying amounts.
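For reference, norms are disabled per field through that field's `FieldType`. The sketch below targets the Lucene 5.x document API; the field names `id` and `body` and the `TEXT_NO_NORMS` constant are hypothetical. Note that Lucene's built-in `StringField` types already have `omitNorms` set to `true`. If the string fields in the test were indexed as `StringField`, there may be no norms left to remove. An explicit `FieldType` like this one only matters for fields that would otherwise keep norms, such as `TextField`.

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;

public class OmitNormsExample {

    // Hypothetical field type: tokenized, not stored, norms disabled.
    static final FieldType TEXT_NO_NORMS = new FieldType(TextField.TYPE_NOT_STORED);
    static {
        TEXT_NO_NORMS.setOmitNorms(true);
        TEXT_NO_NORMS.freeze(); // make it immutable, like Lucene's built-in types
    }

    static Document buildDoc(String id, String body) {
        Document doc = new Document();
        // StringField's built-in types already omit norms.
        doc.add(new StringField("id", id, Field.Store.YES));
        // "body" is a hypothetical tokenized field indexed without norms.
        doc.add(new Field("body", body, TEXT_NO_NORMS));
        return doc;
    }

    public static void main(String[] args) {
        System.out.println("StringField omits norms: " + StringField.TYPE_NOT_STORED.omitNorms());
        System.out.println("body omits norms: " + TEXT_NO_NORMS.omitNorms());
        System.out.println("fields in doc: " + buildDoc("42", "hello world").getFields().size());
    }
}
```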
   
   Here are the configuration details for reference:
   
   - **Lucene Version:** 5.3.1
   - **Java Version:** OpenJDK 17.0.8.1
   - **Indexer Configuration:**
     - `index.merge_factor`: 10
     - `index.partition_max_doc`: 5,000,000
     - `indexer.commit_interval_sec`: 60
     - `indexer.commit_max_doc`: 100,000
   - **Merge Policy:** LogByteSizeMergePolicy
   
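Of these settings, only the merge policy and, presumably, the merge factor map directly onto Lucene classes. The sketch below assumes `index.merge_factor` is passed to `LogMergePolicy#setMergeFactor`; the partition and commit settings appear to belong to the indexing application rather than to Lucene, so they are not shown. `MergePolicySetup` is a hypothetical name.

```java
import org.apache.lucene.index.LogByteSizeMergePolicy;

public class MergePolicySetup {

    // Assumed mapping of the indexer's "index.merge_factor: 10" setting.
    static LogByteSizeMergePolicy buildMergePolicy(int mergeFactor) {
        LogByteSizeMergePolicy policy = new LogByteSizeMergePolicy();
        policy.setMergeFactor(mergeFactor); // how many segments are merged together at each level
        return policy;
    }

    public static void main(String[] args) {
        LogByteSizeMergePolicy policy = buildMergePolicy(10);
        // In real code this would be passed to IndexWriterConfig#setMergePolicy.
        System.out.println("merge factor = " + policy.getMergeFactor());
    }
}
```

Because segments at several merge levels coexist at any moment, the on-disk size can differ depending on when it is measured. Comparing runs after `IndexWriter#forceMerge(1)` may give more comparable numbers.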
   **Test Results:**
   Please see the attached screenshot.
   <img width="887" alt="Screenshot 2025-01-02 at 10 05 48" src="https://github.com/user-attachments/assets/1b6729b9-fb9e-43d6-ad2f-ade0b00dad1a" />
   
   Could you please confirm whether this behavior matches the expected impact on index size, and explain why the size reduction appears to be unpredictable?
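To judge whether a measured difference is in the expected range, a rough upper bound can help. The sketch below assumes the one-byte-per-document-per-field figure from the referenced post and a hypothetical count of 10 string fields that keep norms. Treat the result as an upper bound: the actual norms encoding varies between Lucene versions, and fields that already omit norms contribute nothing.

```java
// Rough upper bound on the space saved by omitting norms.
// Assumption: one norm byte per document for each field that keeps norms.
public class NormsSavingsEstimate {

    static long upperBoundBytes(long numDocs, int fieldsWithNorms) {
        return numDocs * (long) fieldsWithNorms;
    }

    static double toMiB(long bytes) {
        return bytes / (1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        long docs = 5_000_000L; // index.partition_max_doc from the configuration above
        int fields = 10;        // hypothetical number of string fields with norms
        long bytes = upperBoundBytes(docs, fields);
        System.out.printf("Up to %d bytes (%.1f MiB)%n", bytes, toMiB(bytes));
    }
}
```

With these assumed numbers, the bound is 50,000,000 bytes (about 47.7 MiB) for a full partition.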
   
   Thank you for your assistance!
   
   
   ### Gradle command to reproduce
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

