Dear PyMVPA users,

I have an unbalanced dataset of 11 classes with a varying number of events per 
class (due to a data-driven approach). I am running leave-one-subject-out 
cross-validation to classify the events. I was able to implement a permutation 
test where the event labels are shuffled in the training set and the classifier 
is then tested on the real data of the left-out subject. What I would like to 
do is classify all 11 classes jointly (not one-versus-others or all pairwise 
classifications) and then assess whether the classifier predicts each class 
above chance level. Because the data are unbalanced, I cannot assume that the 
chance level is equal for each class (the naive chance level would be 
1/11 ≈ 0.09). With the code below I was able to create a null distribution of 
the classification accuracies for the whole model, but then I cannot assess 
the statistical significance of each class by comparing the per-class 
accuracies against a null distribution of the whole-model accuracy.
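To illustrate why the naive 1/11 is misleading here: when the training labels are permuted, a classifier tends to reproduce the training label frequencies, so the expected chance-level accuracy for each class is roughly that class's prevalence rather than 1/11. A minimal numpy sketch with made-up event counts (placeholders, not my real data):

```python
import numpy as np

# Hypothetical event counts per class (placeholders, not real data)
counts = np.array([120, 95, 80, 60, 55, 50, 40, 35, 30, 25, 20])

# Under training-label permutation the classifier's predictions roughly
# follow the training label frequencies, so the expected per-class
# "chance" accuracy is approximately each class's prevalence.
chance_per_class = counts / counts.sum()

print(chance_per_class)   # unequal chance levels across classes
print(1.0 / len(counts))  # naive chance ~0.09 for comparison
```

With these counts the most frequent class already has a chance level near 0.20, more than twice the naive 1/11, which is exactly why a single whole-model null distribution is not enough.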

I would like to extract the confusion matrix of the predictions for each 
permutation round (event labels shuffled in the training set). From these 
confusion matrices I could build a separate null distribution of prediction 
accuracy for each class and then assess the significance of each class 
separately. I assumed this would be fairly simple to do, but I could not find 
anything related in the documentation or in this mailing list.
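Assuming the per-permutation confusion matrices can be obtained somehow, the downstream statistics are straightforward. A self-contained numpy sketch (synthetic matrices stand in for the real ones; rows = true class, columns = predicted class, right-tailed empirical p-values with the standard +1 correction):

```python
import numpy as np

rng = np.random.default_rng(0)
n_classes, n_perms = 11, 1000

# Stand-in for the quantity in question: one confusion matrix per
# permutation round, stacked as (n_perms, n_classes, n_classes).
# Synthetic here; in practice these would come from the permutation test.
null_confmats = rng.multinomial(50, np.ones(n_classes) / n_classes,
                                size=(n_perms, n_classes))

# Per-class accuracy (recall) in each round: diagonal / row sum
null_acc = (np.diagonal(null_confmats, axis1=1, axis2=2)
            / null_confmats.sum(axis=2))          # (n_perms, n_classes)

# Observed per-class accuracies from the real-data confusion matrix
# (also synthetic in this sketch)
real_confmat = rng.multinomial(50, np.ones(n_classes) / n_classes,
                               size=n_classes)
real_acc = np.diag(real_confmat) / real_confmat.sum(axis=1)

# Right-tailed empirical p-value per class, with +1 correction
pvals = (1 + (null_acc >= real_acc).sum(axis=0)) / (n_perms + 1)
```

Each class then gets its own null distribution (`null_acc[:, k]`) and its own p-value, which is what the whole-model null distribution cannot provide.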

The relevant sections of the code:

import numpy as np
from scipy import stats
from sklearn.neural_network import MLPClassifier
from mvpa2.suite import *  # SKLLearnerAdapter, CrossValidation, etc.

# Classifier
clf = MLPClassifier(alpha=1, max_iter=1000)  # scikit-learn neural net
wrapped_clf = SKLLearnerAdapter(clf)  # wrapped for use as a PyMVPA learner
# stats.chisqprob was removed from newer scipy; restore it for PyMVPA
stats.chisqprob = lambda chisq, df: stats.chi2.sf(chisq, df)

# Permutator
permutator = AttributePermutator('targets', count=1000)
distr_est = MCNullDist(permutator, tail='right', enable_ca=['dist_samples'])

# Cross validation
cv = CrossValidation(wrapped_clf, NFoldPartitioner(attr='subject'),
                     errorfx=lambda p, t: np.mean(p == t),
                     postproc=mean_sample(),
                     null_dist=distr_est,
                     enable_ca=['stats'])

bsc_null_results = cv(ds_mni)
# Null distribution of accuracies for the whole model
perm_accu = cv.null_dist.ca.dist_samples
# Real accuracy for the whole model
accuracy_bs = bsc_null_results.samples
# Confusion matrix for classification using real data in the training set
confmat_bs = cv.ca.stats.matrix

Is it possible to extract the confusion matrices for each permutation round? If 
not, I would be grateful for any advice on how to assess the significance of 
the class accuracies separately in this case of unbalanced data.
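In case there is no built-in hook for this, my fallback idea would be to run the permutation loop by hand instead of through MCNullDist, collecting one confusion matrix per round. A minimal self-contained sketch of that loop structure, with toy data and a toy nearest-mean classifier standing in for the MLP (all names and sizes here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(42)

def nearest_mean_predict(Xtr, ytr, Xte):
    """Toy stand-in classifier: assign each test sample to the class
    whose training-set mean is closest in Euclidean distance."""
    classes = np.unique(ytr)
    means = np.stack([Xtr[ytr == c].mean(axis=0) for c in classes])
    d = ((Xte[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return classes[d.argmin(axis=1)]

def confusion(y_true, y_pred, n_classes):
    """Confusion matrix: rows = true class, columns = predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy data in place of the real train/test split of one CV fold
n_classes, n_perms = 4, 200
Xtr = rng.normal(size=(80, 5)); ytr = rng.integers(0, n_classes, 80)
Xte = rng.normal(size=(40, 5)); yte = rng.integers(0, n_classes, 40)

# One confusion matrix per permutation round: shuffle ONLY the training
# labels, refit, predict the untouched test data of the left-out subject.
null_confmats = np.empty((n_perms, n_classes, n_classes), dtype=int)
for i in range(n_perms):
    y_shuf = rng.permutation(ytr)
    pred = nearest_mean_predict(Xtr, y_shuf, Xte)
    null_confmats[i] = confusion(yte, pred, n_classes)
```

The resulting `null_confmats` stack is exactly the object needed to build per-class null distributions; in the real analysis the fit/predict step would be the wrapped MLP and the loop would run over all leave-one-subject-out folds.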

Sincerely,
Severi Santavirta


_______________________________________________
Pkg-ExpPsy-PyMVPA mailing list
[email protected]
https://alioth-lists.debian.net/cgi-bin/mailman/listinfo/pkg-exppsy-pymvpa
