Hi Sarthak,
Thank you for your note, but I already wrote:
> Don't wait for anybody with proposal. The new GSoC site is right
> place to discuss proposals.
So I expected to see your proposal on that site and to comment on it there, if needed.
Let me remind you of the site: https://summerofcode.withgoogle.com/
Best regards,
Dmitry
25.03.2016 10:17, sarthak agarwal wrote:
The deadline is today.
Sarthak
On Thu, Mar 24, 2016 at 1:52 AM, sarthak agarwal
<sarthak0...@gmail.com> wrote:
Hello Dmitry,
I fixed the bug (I guess).
Now, coming to my proposal for GSoC: I was thinking of working
on project #4, *Auto-detection of EPSG codes from incomplete WKT*.
What I understood from the project is that we need to predict the
EPSG code of certain files on the basis of some attributes which
are available in the file.
The attributes can be extracted from the file; to learn how, I read
this tutorial:
<http://www.gdal.org/osr_tutorial.html#querying_coordinate_system>.
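To make the idea of "attributes" concrete, here is a minimal sketch that pulls a few named nodes out of a WKT string with regular expressions from the Python standard library. It is only an illustration: the function name `wkt_attributes` and the choice of keywords are assumptions, and in GDAL itself the OSR API described in the tutorial (for example `GetAttrValue`) would be the natural way to query these values.

```python
import re

def wkt_attributes(wkt):
    """Collect a few identifying attributes from a (possibly incomplete) WKT string.

    Hypothetical helper for illustration; not part of GDAL.
    """
    attrs = {}
    # The first quoted string after each keyword is that node's name.
    for key in ("PROJCS", "GEOGCS", "DATUM", "SPHEROID", "PROJECTION"):
        match = re.search(key + r'\s*\[\s*"([^"]*)"', wkt)
        if match:
            attrs[key] = match.group(1)
    # PARAMETER["name", value] pairs describe the projection.
    pattern = r'PARAMETER\s*\[\s*"([^"]*)"\s*,\s*([-+0-9.eE]+)'
    for name, value in re.findall(pattern, wkt):
        attrs["PARAMETER:" + name] = float(value)
    return attrs

# A WKT for WGS 84 / UTM zone 33N with the AUTHORITY nodes left out,
# i.e. the kind of incomplete input the project is about.
sample_wkt = (
    'PROJCS["WGS 84 / UTM zone 33N",'
    'GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563]],'
    'PRIMEM["Greenwich",0],UNIT["degree",0.0174532925199433]],'
    'PROJECTION["Transverse_Mercator"],'
    'PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",15],'
    'PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],'
    'PARAMETER["false_northing",0],UNIT["metre",1]]'
)
attributes = wkt_attributes(sample_wkt)
```

The resulting dictionary (datum name, projection name, projection parameters) is the kind of feature set a classifier could work from; nodes that are absent from the WKT are simply absent from the dictionary.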
To solve this problem I thought of a lot of methods, but I think
the best way to solve it will be to use machine learning.
ML would handle this problem as follows:
1. We need to find the EPSG code for a file (testing data)
2. We have a file with some attributes (projection, datum, etc.).
3. We need to guess the most suitable class (EPSG code) for that file.
4. Also, we have many files for which we know the attributes and
the corresponding class (training data).
The problem is now translated into an ML problem, which can be
solved using the following models:
1. Bayesian Statistics
<https://en.wikipedia.org/wiki/Posterior_probability>
where,
posterior probability = probability that this file has EPSG
code 'a', given its attributes.
prior probability = probability of occurrence of EPSG code 'a'.
likelihood = how often we saw such attributes
when the EPSG code is 'a'.
2. Or we can use a simple kNN, where the classes are the possible
EPSG codes and the dimension of the feature vector is the number of
possible attributes (we need to find a valid and promising
weight function).
3. We can use multi-class SVM.
4. Any other suggestions from the community regarding the
choice of algorithm.
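The Bayesian option (model 1) can be sketched as a small naive Bayes classifier over categorical attributes. This is a minimal illustration under stated assumptions, not a proposed implementation: the function names are hypothetical, the attributes are treated as independent given the EPSG code, add-one smoothing is my own choice, and the three training rows are toy data hand-written for the example.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples):
    """samples: list of (attributes dict, EPSG code) pairs with known codes."""
    class_counts = Counter(code for _, code in samples)
    # value_counts[code][attribute][value] = number of training files seen
    value_counts = defaultdict(lambda: defaultdict(Counter))
    vocab = defaultdict(set)  # distinct values seen per attribute
    for attrs, code in samples:
        for name, value in attrs.items():
            value_counts[code][name][value] += 1
            vocab[name].add(value)
    return class_counts, value_counts, vocab

def predict_epsg(model, attrs):
    """Return the EPSG code with the highest (log) posterior score."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_code, best_score = None, -math.inf
    for code, count in class_counts.items():
        # log prior: how often this EPSG code occurs in the training data
        score = math.log(count / total)
        for name, value in attrs.items():
            # log likelihood with add-one smoothing (one extra slot for
            # values never seen), so unseen values do not zero the posterior
            seen = value_counts[code][name][value]
            score += math.log((seen + 1) / (count + len(vocab[name]) + 1))
        if score > best_score:
            best_code, best_score = code, score
    return best_code

# Toy training data (illustrative only).
training = [
    ({"DATUM": "WGS_1984", "PROJECTION": "Transverse_Mercator",
      "central_meridian": 15.0}, 32633),
    ({"DATUM": "WGS_1984", "PROJECTION": "Transverse_Mercator",
      "central_meridian": 21.0}, 32634),
    ({"DATUM": "European_Terrestrial_Reference_System_1989",
      "PROJECTION": "Transverse_Mercator", "central_meridian": 15.0}, 25833),
]
model = train_naive_bayes(training)
# An incomplete description: the projection name is missing.
guess = predict_epsg(model, {"DATUM": "WGS_1984", "central_meridian": 15.0})
```

Because the score is summed only over the attributes that are actually present, a file with missing nodes can still be classified; it just gets a less confident answer.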
I am thinking of actually implementing all of these algorithms (I may
add more in future, depending on the suggestions) and selecting the
one which gives the best performance.
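Selecting the best algorithm could be done by scoring each candidate on files whose EPSG code is known but which were kept out of training. The sketch below compares a nearest-neighbour classifier against a trivial majority-class baseline. Everything in it is an assumption made for illustration: the function names, the toy data, and in particular the distance, which charges 1 for a conflicting value and a hypothetical 0.5 for an attribute missing on one side. Note that in the usual kNN formulation `k` is the number of neighbours consulted.

```python
from collections import Counter

def distance(a, b, missing_penalty=0.5):
    """Weighted mismatch count between two attribute dicts (hypothetical weights)."""
    total = 0.0
    for key in set(a) | set(b):
        if key not in a or key not in b:
            total += missing_penalty  # attribute absent on one side
        elif a[key] != b[key]:
            total += 1.0              # attribute present on both sides, values differ
    return total

def knn_predict(training, attrs, k=1):
    """Majority vote among the k training files closest to attrs."""
    nearest = sorted(training, key=lambda item: distance(item[0], attrs))[:k]
    return Counter(code for _, code in nearest).most_common(1)[0][0]

def majority_predict(training, attrs):
    """Baseline: always answer with the most frequent EPSG code."""
    return Counter(code for _, code in training).most_common(1)[0][0]

def accuracy(predict, training, held_out):
    """Fraction of held-out files whose EPSG code is predicted correctly."""
    hits = sum(1 for attrs, code in held_out if predict(training, attrs) == code)
    return hits / len(held_out)

ETRS89 = "European_Terrestrial_Reference_System_1989"
TM = "Transverse_Mercator"
# Toy data (illustrative only).
training = [
    ({"DATUM": "WGS_1984", "PROJECTION": TM, "central_meridian": 15.0}, 32633),
    ({"DATUM": "WGS_1984", "PROJECTION": TM, "central_meridian": 21.0}, 32634),
    ({"DATUM": "WGS_1984", "central_meridian": 21.0}, 32634),
    ({"DATUM": ETRS89, "PROJECTION": TM, "central_meridian": 15.0}, 25833),
]
held_out = [
    ({"DATUM": "WGS_1984", "central_meridian": 15.0}, 32633),
    ({"PROJECTION": TM, "central_meridian": 21.0}, 32634),
    ({"DATUM": ETRS89, "central_meridian": 15.0}, 25833),
]

candidates = {"knn": knn_predict, "majority": majority_predict}
scores = {name: accuracy(fn, training, held_out) for name, fn in candidates.items()}
best = max(scores, key=scores.get)
```

Any other candidate (a Bayesian model, an SVM) can be added to `candidates` as long as it takes the training data and an attribute dict and returns an EPSG code. With real data, a proper cross-validation split would be a sounder basis for the comparison than three hand-picked files.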
Please give me feedback on my proposal, and suggestions if I should
add or change anything.
Since very little time is left before the deadline, I would like to
convert it into a proposal ASAP with your help.
Regards,
Sarthak
_______________________________________________
gdal-dev mailing list
gdal-dev@lists.osgeo.org
http://lists.osgeo.org/mailman/listinfo/gdal-dev