[
https://issues.apache.org/jira/browse/GEODE-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16014856#comment-16014856
]
ASF GitHub Bot commented on GEODE-2913:
---------------------------------------
Github user joeymcallister commented on a diff in the pull request:
https://github.com/apache/geode/pull/518#discussion_r117123398
--- Diff: geode-docs/tools_modules/lucene_integration.html.md.erb ---
@@ -135,4 +117,164 @@ gfsh> lucene search --regionName=/orders --queryStrings="John*" --defaultField=fi
</region>
</cache>
```
+## <a id="lucene-index-query" class="no-quick-link"></a>Queries
+### <a id="gfsh-query-example" class="no-quick-link"></a>Gfsh Example to Query using a Lucene Index
+
+For details, see the [gfsh search lucene](gfsh/command-pages/search.html#search_lucene) command reference page.
+
+``` pre
+gfsh> lucene search --regionName=/orders --queryStrings="John*" --defaultField=field1 --limit=100
+```
+
+### <a id="api-query-example" class="no-quick-link"></a>Java API Example to Query using a Lucene Index
+
+``` pre
+LuceneQuery<String, Person> query = luceneService.createLuceneQueryFactory()
+    .setResultLimit(10)
+    .create(indexName, regionName, "name:John AND zipcode:97006", defaultField);
+
+Collection<Person> results = query.findValues();
+```
+
+## <a id="lucene-index-destroy" class="no-quick-link"></a>Destroying an Index
+
+A region destroy operation does not destroy any associated Lucene indexes,
+so destroy any Lucene indexes prior to destroying the associated region.
+
+### <a id="API-destroy-example" class="no-quick-link"></a>Java API Example to Destroy a Lucene Index
+
+``` pre
+luceneService.destroyIndex(indexName, regionName);
+```
+An attempt to destroy a region with a Lucene index will result in
+an `IllegalStateException`,
+issuing an error message similar to:
+
+``` pre
+java.lang.IllegalStateException: The parent region [/orders] in colocation chain cannot be destroyed,
+ unless all its children [[/indexName#_orders.files]] are destroyed
+at org.apache.geode.internal.cache.PartitionedRegion.checkForColocatedChildren(PartitionedRegion.java:7231)
+at org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7243)
+at org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:308)
+at DestroyLuceneIndexesAndRegionFunction.destroyRegion(DestroyLuceneIndexesAndRegionFunction.java:46)
+```
+### <a id="gfsh-destroy-example" class="no-quick-link"></a>Gfsh Example to Destroy a Lucene Index
+
+For details, see the [gfsh destroy lucene index](gfsh/command-pages/destroy.html#destroy_lucene_index) command reference page.
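+
+A sample invocation, reusing the illustrative index name `indexName` and region `orders`:
+
+``` pre
+gfsh> destroy lucene index --name=indexName --region=orders
+```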
+
+An attempt to destroy a region prior to destroying its associated
+Lucene index issues an error message similar to:
+
+``` pre
+Error occurred while destroying region "orders".
+ Reason: The parent region [/orders] in colocation chain cannot be destroyed,
+ unless all its children [[/indexName#_orders.files]] are destroyed
+```
+
+## <a id="lucene-index-change" class="no-quick-link"></a>Changing an Index
+
+Changing an index requires rebuilding it.
+Follow these steps in `gfsh` to change an index.
+
+1. Export all region data.
+2. Destroy the Lucene index.
+3. Destroy the region.
+4. Create a new index.
+5. Create a new region without the user-defined business logic callbacks.
+6. Import the region data, specifying the option that turns on callbacks.
+The callbacks invoke a Lucene asynchronous event listener to index the data.
+7. Alter the region to add the user-defined business logic callbacks.
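+
+The steps above can be sketched as a gfsh session. This is a hedged sketch only:
+the region, index, field, file, member, and listener names are illustrative,
+and the callback-related option names should be verified against the
+export/import data command reference pages.
+
+``` pre
+gfsh> export data --region=/orders --file=orders.gfd --member=server1
+gfsh> destroy lucene index --name=indexName --region=orders
+gfsh> destroy region --name=orders
+gfsh> create lucene index --name=indexName --region=orders --field=field1
+gfsh> create region --name=orders --type=PARTITION
+gfsh> import data --region=/orders --file=orders.gfd --member=server1 --invoke-callbacks=true
+gfsh> alter region --name=orders --cache-listener=com.example.OrderListener
+```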
+
+## <a id="addl-gfsh-api" class="no-quick-link"></a>Additional Gfsh Commands
+
+See the [gfsh describe lucene index](gfsh/command-pages/describe.html#describe_lucene_index) command reference page
+for the command that prints details about a specific index.
+
+See the [gfsh list lucene index](gfsh/command-pages/list.html#list_lucene_index) command reference page
+for the command that prints details about the Lucene indexes created for all members.
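+
+Sample invocations of these commands (the index and region names are illustrative):
+
+``` pre
+gfsh> describe lucene index --name=indexName --region=orders
+gfsh> list lucene indexes --with-stats
+```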
+
+# <a id="LuceneRandC" class="no-quick-link"></a>Requirements and Caveats
+
+- Join queries between regions are not supported.
+- Nested objects are not supported.
+- Lucene indexes will not be stored within off-heap memory.
+- Lucene queries from within transactions are not supported.
+On an attempt to query from within a transaction,
+a `LuceneQueryException` is thrown, issuing an error message
+on the client (accessor) similar to:
+
+``` pre
+Exception in thread "main" org.apache.geode.cache.lucene.LuceneQueryException:
+ Lucene Query cannot be executed within a transaction
+at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findTopEntries(LuceneQueryImpl.java:124)
+at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(LuceneQueryImpl.java:98)
+at org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(LuceneQueryImpl.java:94)
+at TestClient.executeQuerySingleMethod(TestClient.java:196)
+at TestClient.main(TestClient.java:59)
+```
+- If the Lucene index is not created prior to creating the region,
+an exception is thrown while attempting to create the region,
+issuing an error message similar to:
+
+``` pre
+[error 2017/05/02 15:19:26.018 PDT <main> tid=0x1] java.lang.IllegalStateException:
+ Must create Lucene index full_index on region /data because it is defined in another member.
+Exception in thread "main" java.lang.IllegalStateException:
+ Must create Lucene index full_index on region /data because it is defined in another member.
+at org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.handleCacheDistributionAdvisee(CreateRegionProcessor.java:478)
+at org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.process(CreateRegionProcessor.java:379)
+```
+- Invalidating a region entry does not invalidate a corresponding
+Lucene index entry.
+A query on a Lucene index that contains values which
+have been invalidated can return results that no longer exist.
+Therefore, do not combine entry invalidation with queries on Lucene indexes.
+- Lucene indexes are not supported for regions that have eviction configured
+with a local destroy.
+Eviction can be configured with overflow to disk,
+but only the region data is overflowed to disk,
+not the Lucene index.
+On an attempt to create a region with eviction configured to do local destroy
+(with a Lucene index),
+an `UnsupportedOperationException` is thrown,
+issuing an error message similar to:
+
+``` pre
+[error 2017/05/02 16:12:32.461 PDT <main> tid=0x1] java.lang.UnsupportedOperationException:
+ Lucene indexes on regions with eviction and action local destroy are not supported
+Exception in thread "main" java.lang.UnsupportedOperationException:
+ Lucene indexes on regions with eviction and action local destroy are not supported
+at org.apache.geode.cache.lucene.internal.LuceneRegionListener.beforeCreate(LuceneRegionListener.java:85)
+at org.apache.geode.internal.cache.GemFireCacheImpl.invokeRegionBefore(GemFireCacheImpl.java:3154)
+at org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3013)
+at org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2991)
+```
+- Be aware that using the same field name in different objects
+where the field has a different data type
+may have unexpected consequences.
+For example, if an index on the field SSN has the following entries:
+    - `Object_1 object_1` has String SSN = "1111"
+    - `Object_2 object_2` has Integer SSN = 1111
+    - `Object_3 object_3` has Float SSN = 1111.0
+
+    Integers and floats are not converted into strings.
+    They remain as `IntPoint` and `FloatPoint` within Lucene.
+    The standard analyzer does not try to tokenize these values;
+    it only tries to break up string values.
+    So, a string search for "SSN: 1111" returns `object_1`.
+    An `IntRangeQuery` with upper limit 1112 and lower limit 1110 returns `object_2`.
+    And a `FloatRangeQuery` with upper limit 1111.5 and lower limit 1111.0 returns `object_3`.
+- Backups should only be made for regions with Lucene indexes
+when there are no puts, updates, or deletes in progress.
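+
+The range-query behavior described in the SSN example above can be sketched
+programmatically by supplying a `LuceneQueryProvider` that builds a Lucene
+point query. This is a hedged sketch only: the `luceneService`, `indexName`,
+and `regionName` variables are assumed to exist, and the field name `SSN`
+matches the example above.
+
+``` pre
+// Sketch: a range query over the integer-typed SSN values.
+// IntPoint.newRangeQuery builds the equivalent of the IntRangeQuery above.
+LuceneQuery<String, Person> intQuery = luceneService.createLuceneQueryFactory()
+    .create(indexName, regionName,
+        index -> IntPoint.newRangeQuery("SSN", 1110, 1112));
+Collection<Person> intResults = intQuery.findValues(); // returns object_2
+```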
--- End diff --
"or deletes in progress."
> Update Lucene documentation
> ---------------------------
>
> Key: GEODE-2913
> URL: https://issues.apache.org/jira/browse/GEODE-2913
> Project: Geode
> Issue Type: Bug
> Components: docs
> Reporter: Karen Smoler Miller
> Assignee: Karen Smoler Miller
>
> Improvements to the code base that need to be reflected in the docs:
> * Change LuceneService.createIndex to use a factory pattern
> {code:java}
> luceneService.createIndex(region, index, ...)
> {code}
> changes to
> {code:java}
> luceneService.createIndexFactory()
> .addField("field1name")
> .addField("field2name")
> .create()
> {code}
> * Lucene indexes will *NOT* be stored in off-heap memory.
> * Document how to configure an index on accessors - you still need to create
> the Lucene index before creating the region, even though this member does not
> hold any region data.
> If the index is not defined on the accessor, an exception like this will be
> thrown while attempting to create the region:
> {quote}
> [error 2017/05/02 15:19:26.018 PDT <main> tid=0x1]
> java.lang.IllegalStateException: Must create Lucene index full_index on
> region /data because it is defined in another member.
> Exception in thread "main" java.lang.IllegalStateException: Must create
> Lucene index full_index on region /data because it is defined in another
> member.
> at
> org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.handleCacheDistributionAdvisee(CreateRegionProcessor.java:478)
> at
> org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.process(CreateRegionProcessor.java:379)
> {quote}
> * Do not need to create a Lucene index on a client with a Proxy cache. The
> Lucene search will always be done on the server. Besides, _you can't create
> an index on a client._
> * If you configure Invalidates for region entries (alone or as part of
> expiration), these will *NOT* invalidate the Lucene indexes.
> The problem with this is the index contains the keys, but the region doesn't,
> so the query produces results that don't exist.
> In this test, the first time the query is run, it produces N valid results.
> The second time it is run it produces N empty results:
> ** load entries
> ** run query
> ** invalidate entries
> ** run query again
> * Destroying a region will *NOT* automatically destroy any Lucene index
> associated with that region. Instead, attempting to destroy a region with a
> Lucene index will throw a colocated region exception.
> An IllegalStateException is thrown:
> {quote}
> java.lang.IllegalStateException: The parent region [/data] in colocation
> chain cannot be destroyed, unless all its children
> [[/cusip_index#_data.files]] are destroyed
> at
> org.apache.geode.internal.cache.PartitionedRegion.checkForColocatedChildren(PartitionedRegion.java:7231)
> at
> org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7243)
> at
> org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:308)
> at
> DestroyLuceneIndexesAndRegionFunction.destroyRegion(DestroyLuceneIndexesAndRegionFunction.java:46)
> {quote}
> * The process to change a Lucene index using gfsh:
> 1. export region data
> 2. destroy Lucene index, destroy region
> 3. create new index, create new region without user-defined business
> logic callbacks
> 4. import data with option to turn on callbacks (to invoke Lucene Async
> Event Listener to index the data)
> 5. alter region to add user-defined business logic callbacks
> * Make sure there are no references to replicated regions as they are not
> supported.
> * Document security implementation and defaults. If a user has security
> configured for their cluster, creating a Lucene index requires DATA:MANAGE
> privilege (similar to OQL), but doing Lucene queries requires DATA:WRITE
> privilege because a function is called (different from OQL which requires
> only DATA:READ privilege). Here are all the required privileges for the gfsh
> commands:
> ** create index requires DATA:MANAGE:region
> ** describe index requires CLUSTER:READ
> ** list indexes requires CLUSTER:READ
> ** search index requires DATA:WRITE
> ** destroy index requires DATA:MANAGE:region
> * A user cannot create a Lucene index on a region that has eviction
> configured with local destroy. If using Lucene indexing, eviction can only be
> configured with overflow to disk. In this case, only the region data is
> overflowed to disk, *NOT* the Lucene index. An UnsupportedOperationException
> is thrown:
> {quote}
> [error 2017/05/02 16:12:32.461 PDT <main> tid=0x1]
> java.lang.UnsupportedOperationException: Lucene indexes on regions with
> eviction and action local destroy are not supported
> Exception in thread "main" java.lang.UnsupportedOperationException: Lucene
> indexes on regions with eviction and action local destroy are not supported
> at
> org.apache.geode.cache.lucene.internal.LuceneRegionListener.beforeCreate(LuceneRegionListener.java:85)
> at
> org.apache.geode.internal.cache.GemFireCacheImpl.invokeRegionBefore(GemFireCacheImpl.java:3154)
> at
> org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3013)
> at
> org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2991)
> {quote}
> * We can use the same field name in different objects where the field has a
> different data type, but this may have unexpected consequences. For example,
> if I created an index on the field SSN with these following entries
> Object_1 object_1 has String SSN = "1111"
> Object_2 object_2 has Integer SSN = 1111
> Object_3 object_3 has Float SSN = 1111.0
> Integers and Floats will not be converted into strings. They remain as
> IntPoint and FloatPoint in the Lucene world. The standard analyzer will not
> try to tokenize these values. The standard analyzer will only try to break up
> string values. So,
> ** If I do a string search for "SSN: 1111" , Lucene will return object_1.
> ** If I do an IntRangeQuery for upper limit : 1112 and lower limit : 1110 ,
> Lucene will return object_2
> ** If I do a FloatRangeQuery with upper limit 1111.5 and lower limit :
> 1111.0 , Lucene will return object_3
> * Similar to OQL, Lucene queries are not supported with transactions; an
> exception will be thrown. A LuceneQueryException is thrown on the
> client/accessor:
> {quote}
> Exception in thread "main"
> org.apache.geode.cache.lucene.LuceneQueryException: Lucene Query cannot be
> executed within a transaction
> at
> org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findTopEntries(LuceneQueryImpl.java:124)
> at
> org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(LuceneQueryImpl.java:98)
> at
> org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(LuceneQueryImpl.java:94)
> at TestClient.executeQuerySingleMethod(TestClient.java:196)
> at TestClient.main(TestClient.java:59)
> {quote}
> This TransactionException is logged on the server.
> * Backups should only be done for regions with Lucene indexes when the system
> is 'quiet'; i.e. no puts, updates, or deletes are in progress. Otherwise the
> backups for Lucene indexes will not match the data in the region that is
> being indexed (i.e. incremental backups will not be consistent between the
> data region and the Lucene index region due to delayed processing associated
> with the AEQ). If the region data needs to be restored from backup, then you
> must follow the same process for changing a Lucene index in order to
> re-create the index region.
> * Update docs section on "Memory Requirements for Cached Data" to include
> conservative estimate of 737 bytes per entry overhead for a Lucene index. All
> the other caveats mentioned for OQL indexes also apply for Lucene indexes...
> your mileage may vary...
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)