Dear PyMVPA users,
I have an unbalanced dataset of 11 classes with a varying number of events per
class (due to a data-driven approach). I am running leave-one-subject-out
cross-validation to classify the events. I was able to implement a permutation
test in which the event labels are shuffled in the training set and the
classifier is then tested on the real data of the left-out subject. What I
would like to do is to classify all
11 classes in combination (not one versus others or all pairwise
classifications) and then assess if the classifier is able to predict each
class statistically above chance level. But because the data is unbalanced I
cannot assume that the chance level for each class is equal (naive chance level
would be 1/11~0.09). With the code below I was able to create a null
distribution of the classification accuracies for the whole model, but in this
case I cannot assess the statistical significance of each class by comparing
the class accuracies with the null distribution of the whole model.
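To illustrate why I do not trust 1/11 as the per-class chance level, here is a
small sketch with made-up class counts (three hypothetical classes, not my real
data). It assumes a classifier that guesses labels at random in proportion to
the class frequencies, which is only one possible model of "chance"; a
classifier trained on shuffled labels may well behave differently.

```python
from fractions import Fraction

# Hypothetical event counts per class (illustration only, not real data)
counts = {"A": 50, "B": 30, "C": 20}
total = sum(counts.values())

# Naive chance level that ignores the class imbalance
naive_chance = Fraction(1, len(counts))

# If labels are guessed in proportion to class frequency, the probability of
# being correct on an event of class k equals the frequency of class k
per_class_chance = {k: Fraction(n, total) for k, n in counts.items()}

# Expected overall accuracy of that guesser: sum of squared frequencies
overall_chance = sum(p * p for p in per_class_chance.values())

print("naive:", float(naive_chance))
for label, p in per_class_chance.items():
    print(label, float(p))
print("overall:", float(overall_chance))
```

With these counts the per-class chance levels are 0.5, 0.3 and 0.2 rather than
1/3 each, which is why I think each class needs its own null distribution.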
I would like to extract the confusion matrix of the predictions for each
permutation round (event labels shuffled in the training set). From these
confusion matrices I could create separate null distributions of prediction
accuracies for each class and then assess the significance of each class
separately. I assumed this would be fairly simple to do, but I could not find
anything related in the documentation or on this mailing list.
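To make clear what I would do with those matrices once I have them, here is a
minimal sketch in plain Python (not PyMVPA API; the function names and the toy
counts are made up). It assumes that rows are predicted labels and columns are
true labels, which I believe matches cv.ca.stats.matrix, but please correct me
if that is wrong.

```python
def per_class_accuracy(confmat):
    """Fraction of each true class predicted correctly (diagonal / column sum).

    Assumes rows are predicted labels and columns are true labels.
    """
    n = len(confmat)
    accs = []
    for j in range(n):
        col_total = sum(confmat[i][j] for i in range(n))
        accs.append(confmat[j][j] / col_total if col_total else 0.0)
    return accs


def per_class_p_values(observed, permuted):
    """Right-tailed permutation p-value for each class.

    observed: confusion matrix obtained with the real training labels
    permuted: list of confusion matrices, one per permutation round
    """
    obs_acc = per_class_accuracy(observed)
    null_accs = [per_class_accuracy(m) for m in permuted]
    p_values = []
    for k, acc in enumerate(obs_acc):
        n_as_extreme = sum(1 for null in null_accs if null[k] >= acc)
        # The +1 terms count the observed value as part of the null distribution
        p_values.append((n_as_extreme + 1) / (len(null_accs) + 1))
    return p_values


# Toy 3-class example with made-up counts (real use would have 1000 rounds)
observed = [[8, 1, 0],
            [1, 3, 1],
            [1, 1, 19]]
permuted = [
    [[3, 1, 6], [2, 1, 4], [5, 3, 10]],
    [[2, 2, 5], [3, 0, 5], [5, 3, 10]],
    [[4, 1, 7], [1, 2, 3], [5, 2, 10]],
]
print(per_class_accuracy(observed))
print(per_class_p_values(observed, permuted))
```

The resulting p-values would still need a correction for testing 11 classes.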
The relevant sections of the code:
import numpy as np
from scipy import stats
from sklearn.neural_network import MLPClassifier
from mvpa2.suite import *

# Classifier
clf = MLPClassifier(alpha=1, max_iter=1000)  # Neural net classifier
wrapped_clf = SKLLearnerAdapter(clf)  # Wrap the scikit-learn estimator for PyMVPA
# Compatibility shim: stats.chisqprob was removed from recent SciPy releases
stats.chisqprob = lambda chisq, df: stats.chi2.sf(chisq, df)
# Permutator
permutator = AttributePermutator('targets', count=1000)
distr_est = MCNullDist(permutator, tail='right', enable_ca=['dist_samples'])
# Cross validation
cv = CrossValidation(wrapped_clf, NFoldPartitioner(attr='subject'),
                     errorfx=lambda p, t: np.mean(p == t),
                     postproc=mean_sample(),
                     null_dist=distr_est,
                     enable_ca=['stats'])
bsc_null_results = cv(ds_mni)
# Null distribution of accuracies for the whole model
perm_accu = cv.null_dist.ca.dist_samples
# Real accuracy for the whole model
accuracy_bs = bsc_null_results.S
# Confusion matrix for classification using real data in the training set
confmat_bs = cv.ca.stats.matrix
Is it possible to extract the confusion matrices for each permutation round? If
not, I would be thankful for any advice on how to assess the significance of
the class accuracies separately in this case of unbalanced data.
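If there is no built-in way, the fallback I have in mind is to run the
permutation loop by hand and store one confusion matrix per round. Below is a
rough sketch of that idea in plain Python. Everything in it is hypothetical:
the function names, the toy one-dimensional data, and the nearest-mean
stand-in classifier (my real classifier is the wrapped MLPClassifier). The
matrices use rows for predicted labels and columns for true labels.

```python
import random


def confusion_matrix(true, pred, labels):
    """Count matrix with rows = predicted labels, columns = true labels."""
    index = {lab: i for i, lab in enumerate(labels)}
    mat = [[0] * len(labels) for _ in labels]
    for t, p in zip(true, pred):
        mat[index[p]][index[t]] += 1
    return mat


def nearest_mean_fit_predict(train_x, train_y, test_x):
    """Stand-in classifier: assign each test value to the closest class mean."""
    sums, counts = {}, {}
    for x, y in zip(train_x, train_y):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    means = {y: sums[y] / counts[y] for y in sums}
    return [min(means, key=lambda y: abs(x - means[y])) for x in test_x]


def permuted_confusion_matrices(x, y, subjects, labels, fit_predict,
                                n_perm=100, seed=0):
    """One pooled leave-one-subject-out confusion matrix per permutation."""
    rng = random.Random(seed)
    matrices = []
    for _ in range(n_perm):
        true_all, pred_all = [], []
        for held_out in sorted(set(subjects)):
            train = [i for i, s in enumerate(subjects) if s != held_out]
            test = [i for i, s in enumerate(subjects) if s == held_out]
            shuffled = [y[i] for i in train]
            rng.shuffle(shuffled)  # permute the training labels only
            pred = fit_predict([x[i] for i in train], shuffled,
                               [x[i] for i in test])
            true_all += [y[i] for i in test]  # test labels stay real
            pred_all += pred
        matrices.append(confusion_matrix(true_all, pred_all, labels))
    return matrices


# Made-up toy data: 3 subjects, 3 unbalanced classes, one feature per event
x = [0.1, 0.2, 0.9, 1.1, 2.0, 0.0, 0.3, 1.0, 2.1, 1.9,
     0.2, 1.2, 0.8, 2.2, 0.1]
y = ["a", "a", "b", "b", "c", "a", "a", "b", "c", "c",
     "a", "b", "b", "c", "a"]
subjects = [1] * 5 + [2] * 5 + [3] * 5
labels = ["a", "b", "c"]

matrices = permuted_confusion_matrices(x, y, subjects, labels,
                                       nearest_mean_fit_predict, n_perm=20)
print(len(matrices), "confusion matrices collected")
```

This would of course be much slower than MCNullDist with 1000 permutations and
a neural net, so I would prefer a solution within PyMVPA if one exists.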
Sincerely,
Severi Santavirta
_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa