[
https://issues.apache.org/jira/browse/GEODE-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16021992#comment-16021992
]
ASF GitHub Bot commented on GEODE-2913:
---------------------------------------
Github user karensmolermiller commented on the issue:
https://github.com/apache/geode/pull/518
Commit
https://github.com/apache/geode/pull/518/commits/012983e58311572d155835a95d8a51a784bf1283
adds reference information about the XSD (XML) definitions for Lucene indexes.
Could all devs with knowledge of the Lucene implementation review this
commit? The Geode devs I know of who would be good reviewers: @ladyVader
@upthewaterspout @nabarunnag @jhuynh1 @dihardman @DivineEnder @boglesby
> Update Lucene documentation
> ---------------------------
>
> Key: GEODE-2913
> URL: https://issues.apache.org/jira/browse/GEODE-2913
> Project: Geode
> Issue Type: Bug
> Components: docs
> Reporter: Karen Smoler Miller
> Assignee: Karen Smoler Miller
>
> Improvements to the code base that need to be reflected in the docs:
> * Change LuceneService.createIndex to use a factory pattern
> {code:java}
> luceneService.createIndex(region, index, ...)
> {code}
> changes to
> {code:java}
> luceneService.createIndexFactory()
> .addField("field1name")
> .addField("field2name")
> .create()
> {code}
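> A fuller sketch of the new factory pattern (hedged: the index name
> "indexName" and region name "regionName" are placeholders, and this assumes
> create() takes the index name and region path in the released API):
> {code:java}
> LuceneService luceneService = LuceneServiceProvider.get(cache);
> // Define the index via the factory before creating the region it indexes
> luceneService.createIndexFactory()
>     .addField("field1name")
>     .addField("field2name")
>     .create("indexName", "regionName");
> {code}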
> * Lucene indexes will *NOT* be stored in off-heap memory.
> * Document how to configure an index on accessors - you still need to create
> the Lucene index before creating the region, even though this member does not
> hold any region data.
> If the index is not defined on the accessor, an exception like this will be
> thrown while attempting to create the region:
> {quote}
> [error 2017/05/02 15:19:26.018 PDT <main> tid=0x1]
> java.lang.IllegalStateException: Must create Lucene index full_index on
> region /data because it is defined in another member.
> Exception in thread "main" java.lang.IllegalStateException: Must create
> Lucene index full_index on region /data because it is defined in another
> member.
> at
> org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.handleCacheDistributionAdvisee(CreateRegionProcessor.java:478)
> at
> org.apache.geode.internal.cache.CreateRegionProcessor$CreateRegionMessage.process(CreateRegionProcessor.java:379)
> {quote}
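> On an accessor member the sequence might look like this (an illustrative
> sketch; the index name "full_index", region name "data", and field name are
> assumptions taken from the error message above):
> {code:java}
> // Even though this member holds no region data, define the index first...
> LuceneService luceneService = LuceneServiceProvider.get(cache);
> luceneService.createIndexFactory()
>     .addField("field1name")
>     .create("full_index", "data");
> // ...then create the accessor (proxy) region
> Region<String, Object> data =
>     cache.<String, Object>createRegionFactory(RegionShortcut.PARTITION_PROXY)
>         .create("data");
> {code}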
> * There is no need to create a Lucene index on a client with a proxy cache.
> The Lucene search will always be executed on the server. Besides, _you can't
> create an index on a client._
> * If you configure invalidates for region entries (alone or as part of
> expiration), these will *NOT* invalidate the Lucene indexes.
> The problem is that the index still contains the keys while the region no
> longer holds the values, so the query produces results that no longer exist.
> In this test, the first time the query is run, it produces N valid results.
> The second time it is run it produces N empty results:
> ** load entries
> ** run query
> ** invalidate entries
> ** run query again
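> The test scenario above, sketched in code (illustrative only; the index name
> "full_index", region name "data", and query string are assumptions):
> {code:java}
> region.put("key1", value1);                 // load entries
> LuceneQuery<String, Object> query = luceneService.createLuceneQueryFactory()
>     .create("full_index", "data", "field1name:foo", "field1name");
> query.findValues();                         // first run: N valid results
> region.invalidate("key1");                  // does NOT invalidate the Lucene index
> query.findValues();                         // keys still match, but values are gone
> {code}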
> * Destroying a region will *NOT* automatically destroy any Lucene index
> associated with that region. Instead, attempting to destroy a region with a
> Lucene index throws a colocated-region IllegalStateException:
> {quote}
> java.lang.IllegalStateException: The parent region [/data] in colocation
> chain cannot be destroyed, unless all its children
> [[/cusip_index#_data.files]] are destroyed
> at
> org.apache.geode.internal.cache.PartitionedRegion.checkForColocatedChildren(PartitionedRegion.java:7231)
> at
> org.apache.geode.internal.cache.PartitionedRegion.destroyRegion(PartitionedRegion.java:7243)
> at
> org.apache.geode.internal.cache.AbstractRegion.destroyRegion(AbstractRegion.java:308)
> at
> DestroyLuceneIndexesAndRegionFunction.destroyRegion(DestroyLuceneIndexesAndRegionFunction.java:46)
> {quote}
> * The process to change a Lucene index using gfsh:
> 1. export region data
> 2. destroy Lucene index, destroy region
> 3. create new index, create new region without user-defined business
> logic callbacks
> 4. import data with option to turn on callbacks (to invoke Lucene Async
> Event Listener to index the data)
> 5. alter region to add user-defined business logic callbacks
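> In gfsh, those steps might look like this (a sketch; region, index, field,
> file, member, and listener names are placeholders, and the option names
> should be checked against the released gfsh):
> {code}
> gfsh> export data --region=data --file=data.gfd --member=server1
> gfsh> destroy lucene index --name=cusip_index --region=data
> gfsh> destroy region --name=data
> gfsh> create lucene index --name=cusip_index --region=data --field=cusip
> gfsh> create region --name=data --type=PARTITION
> gfsh> import data --region=data --file=data.gfd --member=server1 --invoke-callbacks=true
> gfsh> alter region --name=data --cache-listener=com.example.MyListener
> {code}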
> * Make sure there are no references to replicated regions as they are not
> supported.
> * Document security implementation and defaults. If a user has security
> configured for their cluster, creating a Lucene index requires the DATA:MANAGE
> privilege (similar to OQL), but running Lucene queries requires the DATA:WRITE
> privilege because a function is called (unlike OQL, which requires only the
> DATA:READ privilege). Here are all the required privileges for the gfsh
> commands:
> ** create index requires DATA:MANAGE:region
> ** describe index requires CLUSTER:READ
> ** list indexes requires CLUSTER:READ
> ** search index requires DATA:WRITE
> ** destroy index requires DATA:MANAGE:region
> * A user cannot create a Lucene index on a region that has eviction
> configured with local destroy. If using Lucene indexing, eviction can only be
> configured with overflow to disk. In this case, only the region data is
> overflowed to disk, *NOT* the Lucene index. An UnsupportedOperationException
> is thrown:
> {quote}
> [error 2017/05/02 16:12:32.461 PDT <main> tid=0x1]
> java.lang.UnsupportedOperationException: Lucene indexes on regions with
> eviction and action local destroy are not supported
> Exception in thread "main" java.lang.UnsupportedOperationException: Lucene
> indexes on regions with eviction and action local destroy are not supported
> at
> org.apache.geode.cache.lucene.internal.LuceneRegionListener.beforeCreate(LuceneRegionListener.java:85)
> at
> org.apache.geode.internal.cache.GemFireCacheImpl.invokeRegionBefore(GemFireCacheImpl.java:3154)
> at
> org.apache.geode.internal.cache.GemFireCacheImpl.createVMRegion(GemFireCacheImpl.java:3013)
> at
> org.apache.geode.internal.cache.GemFireCacheImpl.basicCreateRegion(GemFireCacheImpl.java:2991)
> {quote}
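> The supported combination might be configured like this (a sketch; the
> region name is illustrative, and this assumes the PARTITION_OVERFLOW
> shortcut gives overflow-to-disk eviction):
> {code:java}
> // Overflow-to-disk eviction is allowed on a region with a Lucene index;
> // eviction with action local destroy is not.
> Region<String, Object> data =
>     cache.<String, Object>createRegionFactory(RegionShortcut.PARTITION_OVERFLOW)
>         .create("data");
> {code}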
> * We can use the same field name in different objects where the field has a
> different data type, but this may have unexpected consequences. For example,
> suppose an index is created on the field SSN with the following entries:
> ** object_1 has String SSN = "1111"
> ** object_2 has Integer SSN = 1111
> ** object_3 has Float SSN = 1111.0
> Integers and Floats will not be converted into strings. They remain as
> IntPoint and FloatPoint in the Lucene world. The standard analyzer will not
> try to tokenize these values; it only breaks up string values. So:
> ** A string search for "SSN: 1111" returns object_1.
> ** An IntRangeQuery with upper limit 1112 and lower limit 1110 returns
> object_2.
> ** A FloatRangeQuery with upper limit 1111.5 and lower limit 1111.0 returns
> object_3.
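> Sketches of the first two queries (hedged: the index name "ssn_index" and
> region name "data" are placeholders, and the range query assumes the
> LuceneQueryProvider overload plus Lucene's IntPoint helper):
> {code:java}
> // String search: matches object_1 only
> LuceneQuery<String, Object> stringQuery = luceneService.createLuceneQueryFactory()
>     .create("ssn_index", "data", "SSN:1111", "SSN");
> // Int range search: matches object_2 only
> LuceneQuery<String, Object> intQuery = luceneService.createLuceneQueryFactory()
>     .create("ssn_index", "data",
>         index -> IntPoint.newRangeQuery("SSN", 1110, 1112));
> {code}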
> * Similar to OQL, Lucene queries are not supported within transactions. A
> LuceneQueryException is thrown on the client/accessor:
> {quote}
> Exception in thread "main"
> org.apache.geode.cache.lucene.LuceneQueryException: Lucene Query cannot be
> executed within a transaction
> at
> org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findTopEntries(LuceneQueryImpl.java:124)
> at
> org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(LuceneQueryImpl.java:98)
> at
> org.apache.geode.cache.lucene.internal.LuceneQueryImpl.findPages(LuceneQueryImpl.java:94)
> at TestClient.executeQuerySingleMethod(TestClient.java:196)
> at TestClient.main(TestClient.java:59)
> {quote}
> This TransactionException is logged on the server.
> * Backups should only be done for regions with Lucene indexes when the system
> is 'quiet'; i.e. no puts, updates, or deletes are in progress. Otherwise the
> backups for Lucene indexes will not match the data in the region that is
> being indexed (i.e. incremental backups will not be consistent between the
> data region and the Lucene index region due to delayed processing associated
> with the AEQ). If the region data needs to be restored from backup, then you
> must follow the same process for changing a Lucene index in order to
> re-create the index region.
> * Update docs section on "Memory Requirements for Cached Data" to include
> conservative estimate of 737 bytes per entry overhead for a Lucene index. All
> the other caveats mentioned for OQL indexes also apply for Lucene indexes...
> your mileage may vary...
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)