Hi all,

I recently posted parts 1 & 2 of a series on extracting text features for 
machine learning:

http://www.scaleunlimited.com/2013/07/10/text-feature-selection-for-machine-learning-part-1/

http://www.scaleunlimited.com/2013/07/21/text-feature-selection-for-machine-learning-part-2/

The series uses Solr to generate terms from mailing list text, and then 
analyzes those terms to extract good features for tasks such as 
classification, similarity and clustering.

The last part will cover using Solr to implement a real-time similarity engine, 
and maybe a recommendation engine as well.

The posts undoubtedly have some things that are unclear or even incorrect, so 
please comment :)

Regards,

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr




