mayya-sharipova commented on pull request #2186:
URL: https://github.com/apache/lucene-solr/pull/2186#issuecomment-782960621


   @jpountz  Adrien, thank a lot for the initial review. Sorry for the delay, 
the patch turned out to be much more involved than I expected.
   There are still few tests failing, but I will try to fix them soon.
   
   
   >  Can you give a bit more details about how this PR works at a high level? 
E.g. how does it handle the case when two threads are concurrently trying to 
add a field with different schemas?
   
   ---
   **Segment level**: ensuring a field has the same schema across all the 
documents of the segment.
   - We use `FieldSchema` for this, that's an internal field of `PerField`. It 
represents a schema of the field in the current document. 
   - With every new document we reset `FieldSchema`.  As the document fields 
are processed, we update the schema with options encountered in this document. 
Once the processing for the document is done, we compare the built 
`FieldSchema` of the current document with the corresponding `FieldInfo 
`(FieldInfo is built on a first document in which we encounter this field).
   
   Relevant code in `IndexingChaing:processDocument`:
   ```java
   ...
          if (pf.fieldGen != fieldGen) { // first time we see this field in 
this document
             ....
             pf.reset(docID);
           }
   ...
   for (int i = 0; i < fieldCount; i++) {
     PerField pf = fields[i];
     if (pf.fieldInfo == null) { // the first time we see this field in this 
segment
       initializeFieldInfo(pf);
     } else {
       pf.schema.assertSameSchema(pf.fieldInfo);
     }
   }
   ```
   
   ---
   **Index level**: ensuring a field has the same schema across all the 
documents of the index.
   
   - This check is done in `FieldInfos`. 
   - When we encounter a new field in a segment, we try to initialize it in 
`IndexingChain::initializeFieldInfo` with  the options from the current 
`FieldSchema`.  This calls `FieldInfos.Builder::add` ->  
`FieldInfos.Builder::addField`.  
   -  The  first thing in `FieldInfos.Builder::addField`  is to call 
`globalFieldNumbers.addOrGet` with the given schema options. 
`globalFieldNumbers.addOrGet`  is a synchronized method, and as I understood  
`FieldNumbers` are shared across all indexing threads of the same index.  
   - `globalFieldNumbers.addOrGet` for a field it sees the first time, will 
initialize all its maps `indexOptions`, `docValuesType` etc.  If the field 
already exists, it will check that the given in parameters schema options are 
the same as stored in its maps for the given field.
   - As `globalFieldNumbers.addOrGet` is a synchronized method only a single 
thread will be able to initialize schema options for a field. All other threads 
that deal with the same field, must confirm with the same field schema.
   
   @jpountz  Is my assumption and logic correct here? 
   
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to