Hi Roger: Sorry to hear about this. There is a client-side system property to clear the PDX type registry when the client disconnects from the servers. You can find details here: https://discuss.pivotal.io/hc/en-us/articles/221351508-Getting-Stale-PDXType-error-on-client-after-clean-start-up-of-servers. I think we should clear the client's PDX registry on disconnect by default; I will file a ticket to track this issue. For the disk store issue, here are some guidelines: https://discuss.pivotal.io/hc/en-us/community/posts/208792347-Region-regionA-has-potentially-stale-data-It-is-waiting-for-another-member-to-recover-the-latest-data-. Did you try revoking the disk store?
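For reference, a sketch of both steps (from memory; please verify the property name and disk store id against the linked articles and your Geode version):

    # Client side: clear cached PDX type IDs when the client
    # disconnects, so stale ids are not reused after a server wipe
    java -Dgemfire.ON_DISCONNECT_CLEAR_PDXTYPEIDS=true ... YourClientApp

    # Server side: list disk stores blocking recovery, then revoke one
    # (the id below is a placeholder from the show command's output)
    gfsh> show missing-disk-stores
    gfsh> revoke missing-disk-store --id=<disk-store-id-from-previous-command>

Note that revoking a missing disk store tells the cluster to stop waiting for that member's copy and proceed with the most recent data it can find, so only revoke a store you are sure is obsolete.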
Thanks,
Hitesh

From: Roger Vandusen <roger.vandu...@ticketmaster.com>
To: "u...@geode.apache.org" <u...@geode.apache.org>
Sent: Thursday, March 9, 2017 12:55 PM
Subject: Unknown Pdx Type use case found, bug or expected?

Hey
Geode,

We have a 3-node server cluster running with PDX read-serialized and disk store persistence for all regions, with replication-factor=2. We do not use cluster configuration; we use these property overrides:

    # configuration settings used
    enable-cluster-configuration=false
    use-cluster-configuration=false
    cache-xml-file=geode-cache.xml

    # property default overrides
    distributed-system-id=1
    log-level=config
    enforce-unique-host=true
    locator-wait-time=60
    conserve-sockets=false
    log-file-size-limit=64
    mcast-port=0

We use these stop/start scripts:

STOP:

    gfsh -e "connect --locator=$HOSTNAME[$LOCATOR_PORT]" \
         -e "stop server --name=$SERVER_NAME"
    gfsh -e "connect --locator=$HOSTNAME[$LOCATOR_PORT]" \
         -e "stop locator --name=$LOCATOR_NAME"

START:

    gfsh start locator \
      --properties-file=$CONF_DIR/geode.properties \
      --name=$LOCATOR_NAME \
      --port=$LOCATOR_PORT \
      --log-level=config \
      --include-system-classpath=true \
      --classpath=$CLASSPATH \
      --enable-cluster-configuration=false \
      --J=-Dlog4j.configurationFile=$CONF_DIR/log4j2.xml \
      --J=-Dgemfire.jmx-manager=true \
      --J=-Dgemfire.jmx-manager-start=true \
      --J=-Xms512m \
      --J=-Xmx512m

    gfsh start server \
      --properties-file=$CONF_DIR/geode.properties \
      --cache-xml-file=$CONF_DIR/geode-cache.xml \
      --name=$SERVER_NAME \
      --server-port=$SERVER_PORT \
      --include-system-classpath=true \
      --classpath=$CLASSPATH \
      --start-rest-api=true \
      --use-cluster-configuration=false \
      --J=-Dlog4j.configurationFile=$CONF_DIR/log4j2.xml \
      --J=-Dgemfire.disk.recoverValues=false \
      --J=-Dgemfire.jmx-manager=false \
      --J=-Dgemfire.jmx-manager-start=false \
      --J=-Xms6g \
      --J=-Xmx6g

There were active proxy clients (1.0.0-incubating / GFE 9.0) connected while we proceeded to upgrade Geode from 1.0.0-incubating to 1.1.0. We did a scripted rolling version upgrade by serially stopping, redeploying, and restarting each server node. We hit the issue below, which we have seen before and still find difficult to solve:

'Region /xxxx has potentially stale data.
It is waiting for another member to recover the latest data.'

The first server (node1) hung on restart, blocking our serial rolling redeployment. After failing (again) to resolve this rolling-update problem, we decided to delete all the data (currently just cached lookup tables and dev WIP/POC data), redeploy the new Geode version, and restart from scratch. So we deleted all the disk stores (including the PDX disk store) and restarted the cluster.

REMINDER: the clients were all still connected and were not restarted! (See the link below for our awareness now of this CLIENT-SIDE error state.)

These clients then put data into the server cluster; the puts succeeded, and the server regions show they have the data. BUT a gfsh query of this server region data now gives 'Unknown pdx type', and restarting the clients fails when connecting to these regions with the same error: 'Unknown pdx type'.

We are seeking GEODE-USER feedback regarding:

1) We need a working enterprise deployment solution to the rolling-restart problem, where stale-data alerts block cluster config/version updates.

2) We don't believe the problem we saw was related to the version upgrade.

3) We find it very concerning that connected clients can CORRUPT SERVER-SIDE region data without updating the PDX registry and disk store on puts. A FAIL of the client-side proxy region.put would make more sense. Why didn't the PDX types cached on the client get re-registered and written back to the servers' disk stores? The client puts DID write data into the server regions, but that data is now corrupted and unreadable as 'Unknown pdx type'. That is a major issue, even though we acknowledge that we would NOT be deleting active disk stores from a running cluster in production, assuming we can solve the rolling-update problem.
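For context on question 3: the server-side PDX registry is only durable when cache.xml asks for it. Ours is configured roughly like the sketch below (trimmed; the disk-store name pdxDS is illustrative, not our exact file):

    <!-- geode-cache.xml (sketch): persist the PDX type registry
         to a named disk store so type ids survive restarts -->
    <cache xmlns="http://geode.apache.org/schema/cache"
           version="1.0">
      <disk-store name="pdxDS" />
      <pdx read-serialized="true" persistent="true"
           disk-store-name="pdxDS" />
    </cache>

Deleting that disk store while clients keep their cached type ids is exactly the mismatch we hit: the clients kept sending ids the rebuilt registry no longer knows.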
We are now aware of this CLIENT-SIDE error state and can see how it might relate to our redeployment use case above, but we now have corrupted SERVER-SIDE data written in server regions: https://discuss.pivotal.io/hc/en-us/articles/206357497-IllegalStateException-Unknown-PDX-Type-on-Client-Side

-Roger