Does anyone know why we introduced the property ON_DISCONNECT_CLEAR_PDXTYPEIDS for the client and then made its default FALSE?

Surely we'd actually want to reset the pdx registry when a client disconnects from the servers.... GEODE-1037
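For anyone following along: as far as I can tell the flag is read on the client as a plain JVM system property, so turning it on would look roughly like the sketch below when launching the client application (the property name is my reading of the client pool code, and the classpath / main class here are just placeholders):

java -Dgemfire.ON_DISCONNECT_CLEAR_PDXTYPEIDS=true \
     -cp $CLIENT_CLASSPATH com.example.ClientApp

With it set to true the client should drop its locally cached PDX type IDs once it loses all of its server connections, instead of reusing IDs the servers may no longer know about.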


On 3/9/17 12:55, Roger Vandusen wrote:

Hey Geode,

We have a 3 node server cluster running with pdx read serialized and disk store persistence for all regions and replication-factor=2.

We do not use cluster-configuration, we use these property overrides:

# configuration settings used
enable-cluster-configuration=false
use-cluster-configuration=false
cache-xml-file=geode-cache.xml

# property default overrides
distributed-system-id=1
log-level=config
enforce-unique-host=true
locator-wait-time=60
conserve-sockets=false
log-file-size-limit=64
mcast-port=0
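Since cluster configuration is disabled, PDX itself is configured in geode-cache.xml. For context, the relevant part of that file looks roughly like the sketch below (the disk-store name and directory are placeholders, not our real values):

<cache xmlns="http://geode.apache.org/schema/cache"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://geode.apache.org/schema/cache
                           http://geode.apache.org/schema/cache/cache-1.0.xsd"
       version="1.0">
  <!-- disk store that holds the persisted PDX type registry -->
  <disk-store name="pdxStore">
    <disk-dirs><disk-dir>/data/geode/pdx</disk-dir></disk-dirs>
  </disk-store>
  <!-- keep values in serialized (PDX) form and persist the type registry -->
  <pdx read-serialized="true" persistent="true" disk-store-name="pdxStore"/>
  <!-- region definitions follow -->
</cache>

Either way, the PDX type registry is persisted in a server disk store, which matters later when we deleted all the disk stores.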

We use these stop/start scripts:

STOP:

gfsh -e "connect --locator=$HOSTNAME[$LOCATOR_PORT]" \
     -e "stop server --name=$SERVER_NAME"

gfsh -e "connect --locator=$HOSTNAME[$LOCATOR_PORT]" \
     -e "stop locator --name=$LOCATOR_NAME"

START:

gfsh start locator \
  --properties-file=$CONF_DIR/geode.properties \
  --name=$LOCATOR_NAME \
  --port=$LOCATOR_PORT \
  --log-level=config \
  --include-system-classpath=true \
  --classpath=$CLASSPATH \
  --enable-cluster-configuration=false \
  --J=-Dlog4j.configurationFile=$CONF_DIR/log4j2.xml \
  --J=-Dgemfire.jmx-manager=true \
  --J=-Dgemfire.jmx-manager-start=true \
  --J=-Xms512m \
  --J=-Xmx512m

gfsh start server \
  --properties-file=$CONF_DIR/geode.properties \
  --cache-xml-file=$CONF_DIR/geode-cache.xml \
  --name=$SERVER_NAME \
  --server-port=$SERVER_PORT \
  --include-system-classpath=true \
  --classpath=$CLASSPATH \
  --start-rest-api=true \
  --use-cluster-configuration=false \
  --J=-Dlog4j.configurationFile=$CONF_DIR/log4j2.xml \
  --J=-Dgemfire.disk.recoverValues=false \
  --J=-Dgemfire.jmx-manager=false \
  --J=-Dgemfire.jmx-manager-start=false \
  --J=-Xms6g \
  --J=-Xmx6g

Active proxy clients (1.0.0-incubating / GFE 9.0) were still connected while we proceeded to upgrade Geode from 1.0.0-incubating to 1.1.0.

We did a ‘scripted’ rolling Geode version upgrade by serially stopping, redeploying, and restarting each server node.

We had this issue below, which we’ve seen before and still find difficult to solve:

‘Region /xxxx has potentially stale data. It is waiting for another member to recover the latest data.’

The first server (node1) hung on restart and blocked our serial rolling redeployment.
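For reference, the only way we know to get past that wait, when the un-recovered copy can be sacrificed, is the standard missing-disk-store workflow, roughly as sketched below (the disk-store id is a placeholder):

gfsh -e "connect --locator=$HOSTNAME[$LOCATOR_PORT]" \
     -e "show missing-disk-stores"

gfsh -e "connect --locator=$HOSTNAME[$LOCATOR_PORT]" \
     -e "revoke missing-disk-store --id=<missing-disk-store-id>"

Revoking obviously gives up the latest copy of that data, so it is not an acceptable routine step for a rolling upgrade.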

After failing (again) to resolve this serial rolling update problem, we decided to delete all the data (currently just cached lookup tables and dev WIP/POC data), redeploy the new Geode version, and restart from scratch: we deleted all the disk stores (including the PDX disk store) and restarted the cluster.

REMINDER: the clients were all still connected and were not restarted! (See the link below; we are now aware of this CLIENT-SIDE error state.)

These clients then put data into the server cluster; the puts succeeded, and the server regions show that they have the data.

BUT a gfsh query of this server region data now gives ‘Unknown pdx type’, and restarted clients fail when connecting to these regions with the same error: ‘Unknown pdx type’.

We are seeking GEODE-USER feedback regarding:

1) What is a working enterprise deployment procedure that resolves the rolling restart problem, where ‘stale data’ alerts block cluster config/version updates? (See the sketch after these questions.)

2) Are we right in believing that the problem we saw was not related to the version upgrade itself?

3) We find it very concerning that connected clients can CORRUPT SERVER-SIDE region data by performing puts that never update the PDX registry and its disk store.

Wouldn’t failing the client-side proxy region.put make more sense?

Why didn’t the PDX types cached on the client get registered again and written back to the servers’ disk stores?

The client PUTs DID write data into the server regions, but that data is now corrupted and unreadable (‘Unknown pdx type’).

That is a major issue, even though we acknowledge that we would NOT be deleting active disk stores from a running cluster in production, assuming we can solve the rolling update problem.
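To make question 1 concrete, the kind of per-node procedure we are trying to get right looks roughly like the sketch below; the ssh/deploy script names are placeholders wrapping the stop/start scripts shown earlier, and the open question is what check between the start and the next node reliably tells us the restarted member has finished recovering:

for NODE in node1 node2 node3; do
  ssh $NODE "$SCRIPT_DIR/stop.sh"               # gfsh stop server/locator, as above
  ssh $NODE "$SCRIPT_DIR/deploy_geode.sh 1.1.0" # swap in the new Geode binaries
  ssh $NODE "$SCRIPT_DIR/start.sh"              # gfsh start locator/server, as above
  # check here (and re-check until clean) that the restarted member has
  # rejoined and no disk stores are reported missing before moving on:
  gfsh -e "connect --locator=$NODE[$LOCATOR_PORT]" \
       -e "show missing-disk-stores" \
       -e "list members"
done

In our case node1 never got past its restart, so the loop never reached node2.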

We are now aware of this CLIENT-SIDE error state and can see how it might be related to our redeployment use case above, but we now have corrupted SERVER-SIDE data written in the server regions:

https://discuss.pivotal.io/hc/en-us/articles/206357497-IllegalStateException-Unknown-PDX-Type-on-Client-Side

-Roger

