Hi Ben,

This question apparently has nothing to do with R and is therefore off-topic for this list. You should post it on a statistics forum or seek local help.
Best,
Ista

On Mon, Aug 26, 2013 at 1:50 AM, Ben Harrison <h...@student.unimelb.edu.au> wrote:
> Hello,
>
> I am quite a novice when it comes to predictive modelling, so I would like to see where my particular problem lies in the spectrum of problems that you have collectively seen in your experience.
>
> Background: I have been handed a piece of software that uses a Kohonen SOM network to analyse and predict data in which missing values are common. I want to compare its results with other forms of modelling and prediction (e.g. multi-layer perceptrons, perhaps random forests).
>
> My data is a conglomeration of data from hundreds of boreholes. Some measurements were made while the boreholes were being drilled (more or less continuous 'tool responses', i.e. geophysical well-logs). Others were made in the laboratory on discrete samples, at scales from 10 cm up to a metre.
>
> To some extent the data could be considered ordered series (by depth), though changes in rock type with depth can produce 'step' changes in the tool responses.
>
> My problem is not classifying the rocks but modelling and predicting a physical attribute of the rocks: thermal conductivity. It is a lab measurement and is hard to come by and expensive. I want to use the more common well-log responses to predict it.
>
> Different boreholes have different sets of well-log data, though. For example, one might have measurements from tools A and B, another from tools A, B and C, and a third from tools B and C. I can construct a decent database of about 70,000 observations of a common set of 5 tool responses, with about 100 associated measurements of thermal conductivity. I am fairly confident that the relationship between the well-log responses and thermal conductivity is non-linear, and linear regression has not proven accurate.
>
> What 'sort' of problem is this?
>
> Have you seen problems like this, and what did you use to solve them?
>
> I have papers by people using other ANN-type techniques (MLPs in particular) to model and predict thermal conductivity, but I wondered whether there was something else I could try.
>
> Some other questions I would like a little guidance on:
> - Are 100 samples of the 'target' attribute enough for confident modelling and prediction?
> - How would I quantify the certainty of the modelling results?
> - The well-log data is extensive, but if I look at the complete set of tool responses there is a LOT of missing data (because there is no common tool set). Is there a way I can still use the less common tool responses?
> - Is discretising the 100 measured thermal conductivities a silly idea? If not, how many 'bins' can I construct?
>
> Thanks for reading!
> Ben.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.