benwtrent commented on issue #12342:
URL: https://github.com/apache/lucene/issues/12342#issuecomment-1644597999

   I updated the data-gathering script to cover the adversarial cases where the vectors are indexed in order of increasing magnitude and in reverse order.
   
   I have run the in-order version so far and am testing the rest now.
   
   ORDERED
   
   ```
   WARNING: Gnuplot module not present; will not make charts
   recall       latency nDoc    fanout  maxConn beamWidth       visited index ms
   0.741         0.33   400000  0       32      200     10      0       1.00    post-filter
   0.979         1.67   400000  90      32      200     100     0       1.00    post-filter
   0.992         2.89   400000  190     32      200     200     0       1.00    post-filter
   ```
   
   <details>
    <summary> <h2>Updated script</h2></summary>
   
   ```python
   import numpy as np
   import pyarrow.parquet as pq
   
   tb1 = pq.read_table("train-00000-of-00004-1a1932c9ca1c7152.parquet", columns=['emb'])
   tb2 = pq.read_table("train-00001-of-00004-f4a4f5540ade14b4.parquet", columns=['emb'])
   tb3 = pq.read_table("train-00002-of-00004-ff770df3ab420d14.parquet", columns=['emb'])
   tb4 = pq.read_table("train-00003-of-00004-85b3dbbc960e92ec.parquet", columns=['emb'])
   
   np1 = tb1[0].to_numpy()
   np2 = tb2[0].to_numpy()
   np4 = tb4[0].to_numpy()
   np3 = tb3[0].to_numpy()
   
   np_total = np.concatenate((np1, np2, np3, np4))
   
   # np_total is a 1-D array whose elements are the per-row vectors;
   # stack them into a single 2-D (n_vectors, dim) array
   np_flat_ds = np.stack(np_total)
   
   # Shape is (485859, 768) and dtype is float32
   np_flat_ds
   
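   # tofile writes raw bytes, so open the file in binary mode.
   # Rows 475858..485857 (10,000 query vectors); the -1 bound leaves out the final row.
   with open("wiki768.test", "wb") as out_f:
       np_flat_ds[475858:-1].tofile(out_f)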
   
   magnitudes = np.linalg.norm(np_flat_ds[0:400000], axis=1)
   indices = np.argsort(magnitudes)
   np_flat_ds_sorted = np_flat_ds[indices]
   
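   # Ascending magnitude
   with open("wiki768.ordered.train", "wb") as out_f:
       np_flat_ds_sorted.tofile(out_f)
   
   # Descending magnitude. axis=0 reverses only the row order; np.flip with
   # no axis would also reverse the components inside every vector.
   with open("wiki768.reversed.train", "wb") as out_f:
       np.flip(np_flat_ds_sorted, axis=0).tofile(out_f)
   
   # Random order. shuffle permutes the rows in place, so this must come last.
   with open("wiki768.random.train", "wb") as out_f:
       np.random.shuffle(np_flat_ds_sorted)
       np_flat_ds_sorted.tofile(out_f)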
   ```
   </details>
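
The files written by the script are headerless float32 dumps, so it can be worth reading one back and confirming the magnitude ordering before indexing it. The following is a minimal sketch of such a check, not part of the original script: the helper names `load_vectors` and `magnitude_order` are hypothetical, the dimension of 768 is taken from the shape comment in the script, and a small synthetic array written to a temporary file stands in for the real data.

```python
import os
import tempfile

import numpy as np

DIM = 768  # vector dimension, taken from the shape comment in the script


def load_vectors(path, dim=DIM):
    """Read a headerless float32 dump back into an (n, dim) array."""
    flat = np.fromfile(path, dtype=np.float32)
    if flat.size % dim != 0:
        raise ValueError(f"{path}: {flat.size} floats is not a multiple of {dim}")
    return flat.reshape(-1, dim)


def magnitude_order(vectors):
    """Classify the row order by L2 norm: ascending, descending, or unordered."""
    diffs = np.diff(np.linalg.norm(vectors, axis=1))
    if np.all(diffs >= 0):
        return "ascending"
    if np.all(diffs <= 0):
        return "descending"
    return "unordered"


# Synthetic stand-in for the real embeddings (assumption: any float32 data works here)
rng = np.random.default_rng(0)
sample = rng.standard_normal((1000, DIM)).astype(np.float32)
sample = sample[np.argsort(np.linalg.norm(sample, axis=1))]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.ordered.train")
    sample.tofile(path)  # same raw layout the script produces
    restored = load_vectors(path)

print(restored.shape, magnitude_order(restored))
```

Pointing `load_vectors` at `wiki768.ordered.train` or `wiki768.reversed.train` should apply the same check to the real files, assuming they were written as float32 with 768 dimensions.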


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

