We still have a long way to go with the data we were given by some drug
discovery scientists.
The problem is to select the few variables (Collective Variables), from a set
of variables sampled during a Molecular Dynamics simulation, which exhibit a
consistent and coherent relationship with the given minimum-work curve
throughout the time it takes the molecule to migrate from the initial
configuration to the final configuration.
I have already tried and ruled out a simple correlation.
Someone here has suggested looking for correlation between the variables and
the work curve in a time window, for example 20 time steps wide (every step is
equal to 50 fs). But this is meaningless (in my view) because it would dig out
a transient relationship, whereas what we need is a relationship that lasts
consistently throughout the configurational transformation period.
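
To make the distinction between a transient and a persistent relationship
concrete, here is a minimal sketch in plain Python (standard library only). It
is an illustration under stated assumptions, not the method that was actually
proposed: the data are synthetic, the windows are non-overlapping, and the
names `windowed_correlations`, `fraction_strong` and the 0.7 threshold are
hypothetical choices made for this example.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0.0 or syy == 0.0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)

def windowed_correlations(variable, work, width=20):
    """Correlation with the work curve in consecutive, non-overlapping windows."""
    return [
        pearson(variable[i:i + width], work[i:i + width])
        for i in range(0, len(work) - width + 1, width)
    ]

def fraction_strong(correlations, threshold=0.7):
    """Fraction of windows whose absolute correlation reaches the threshold."""
    return sum(abs(r) >= threshold for r in correlations) / len(correlations)

# Synthetic stand-ins (assumptions, not real MD data).
rng = random.Random(0)
n_steps = 400
work = [math.sin(t / 7.0) + 0.01 * t for t in range(n_steps)]

# Follows the work curve during the whole trajectory.
persistent = [2.0 * w + rng.gauss(0.0, 0.05) for w in work]

# Follows the work curve only during the first 40 steps, then is pure noise.
transient = [
    w + rng.gauss(0.0, 0.05) if t < 40 else rng.gauss(0.0, 1.0)
    for t, w in enumerate(work)
]

print("persistent:", fraction_strong(windowed_correlations(persistent, work)))
print("transient: ", fraction_strong(windowed_correlations(transient, work)))
```

A single window would flag both synthetic variables as related to the work
curve. Only a summary over all windows, such as the fraction computed here,
separates the variable that follows the curve for the whole transformation
from the one that does so only briefly.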
I made some progress with techniques for Dimensionality Reduction.
The problem is that such techniques do not select variables. For instance,
even if I can reduce the dimensionality, say from 100 to 8, I am still not
likely to be able to find the 8 independent variables which carry most of the
information. Very likely the basis of the 8-D embedding space will be obtained
as functions (most probably non-linear) of the original 100 variables, or at
least of a big subset of them.
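
The point can be illustrated even in the simplest linear case. The sketch
below is an assumption-laden toy, not the analysis performed on the real data:
it uses linear PCA (computed by plain power iteration on the covariance
matrix) rather than a non-linear embedding, six synthetic variables instead of
100, and hypothetical helper names `covariance_matrix` and
`leading_component`.

```python
import math
import random

def covariance_matrix(rows):
    """Sample covariance matrix of a list of observations (one row per sample)."""
    n = len(rows)
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]

def leading_component(cov, iterations=500):
    """Unit-norm leading eigenvector of a covariance matrix via power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Synthetic data (an assumption): six observed variables that all depend on
# one hidden factor z, each with its own weight plus independent noise.
rng = random.Random(1)
weights = [1.0, 0.9, 0.8, 0.7, 0.6, 0.5]
samples = []
for _ in range(500):
    z = rng.gauss(0.0, 1.0)
    samples.append([a * z + rng.gauss(0.0, 0.3) for a in weights])

loadings = leading_component(covariance_matrix(samples))
for j, value in enumerate(loadings):
    print(f"variable {j}: loading {value:+.3f}")
```

In this toy the leading component has a non-negligible loading on every
original variable, so the reduced coordinate is a mixture of all of them and
does not by itself say which variables could be dropped.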
Bottom line: Dimensionality Reduction does not directly achieve the goal of
the problem, which is to drastically reduce the number of variables sampled
during MD simulations, leaving out the ones that are unimportant for the
chemical-physical reaction in question.
I would greatly appreciate suggestions and advice concerning techniques,
methods, or models to perform Variable Selection other than simple linear
regression.
Thank you very much.
Best regards,
Maura Edelweiss M.