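Implementation pattern
The overall approach is: import the library, split the data, build the model, train it, evaluate it, and then make predictions.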
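The steps above can be sketched as follows. This is a minimal illustration that assumes scikit-learn is the library in use; the synthetic dataset, the 70/30 split, and the variable names are placeholders chosen for this sketch, not taken from the original notes.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic two-class data, used only as a stand-in for a real dataset.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# Split the data into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Build and train the model.
model = SVC(kernel="linear", C=1.0)
model.fit(X_train, y_train)

# Evaluate on the held-out data.
test_accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {test_accuracy:.3f}")

# Predict labels for new samples (here, the first two test rows).
predictions = model.predict(X_test[:2])
print(predictions)
```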
Example
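A small concrete case, as a sketch: the iris dataset bundled with scikit-learn, restricted to its first two features so the result could be plotted in 2D. The choice of dataset, the linear kernel, and the sample `new_flower` are assumptions made for illustration.

```python
from sklearn import datasets, svm

iris = datasets.load_iris()
X = iris.data[:, :2]  # sepal length and sepal width only
y = iris.target

# Fit a linear-kernel SVM on the two selected features.
clf = svm.SVC(kernel="linear", C=1.0)
clf.fit(X, y)

# Accuracy on the training data itself (an optimistic estimate).
train_accuracy = clf.score(X, y)

# Classify one hypothetical measurement (sepal length, sepal width in cm).
new_flower = [[5.0, 3.5]]
predicted_class = iris.target_names[clf.predict(new_flower)[0]]
print(train_accuracy, predicted_class)
```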
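Parameter tuning
Take three parameters with a significant effect as examples: kernel, gamma, and C. The illustrations in the original article [1] give a very intuitive picture.
Kernel type (kernel)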
Here, "rbf" and "poly" are useful for a non-linear hyperplane.
I would suggest using a linear kernel if you have a large number of features (>1000), because the data is more likely to be linearly separable in a high-dimensional space. You can also use RBF, but do not forget to cross-validate its parameters to avoid over-fitting.
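One way to cross-validate the RBF parameters is a grid search. The sketch below assumes scikit-learn's `GridSearchCV`; the iris data and the candidate values for `C` and `gamma` are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate values are illustrative; a real search would adapt them to the data.
param_grid = {
    "C": [0.1, 1, 10],
    "gamma": [0.01, 0.1, 1],
}

# Score every (C, gamma) pair with 5-fold cross-validation.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_, round(search.best_score_, 3))
```

Selecting parameters by cross-validated score rather than training accuracy is what guards against picking an over-fitted combination.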
gamma
Kernel coefficient for 'rbf', 'poly' and 'sigmoid'. The higher the value of gamma, the more closely the model tries to fit the training data set, which increases generalization error and can cause over-fitting.
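To see why a large gamma pushes the model toward fitting individual training points, it helps to evaluate the RBF kernel directly. The sketch below uses the form exp(-gamma * ||a - b||^2), which is how scikit-learn defines its 'rbf' kernel; the helper name `rbf_kernel_value` and the sample points are made up for this illustration.

```python
import math


def rbf_kernel_value(a, b, gamma):
    """Return exp(-gamma * ||a - b||^2) for two equal-length points."""
    squared_distance = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * squared_distance)


point = (0.0, 0.0)
neighbor = (1.0, 1.0)  # squared distance to `point` is 2

# Larger gamma makes the similarity between the two points decay faster.
for gamma in (0.01, 1.0, 100.0):
    print(gamma, rbf_kernel_value(point, neighbor, gamma))
```

With a large gamma the kernel value between even nearby points drops toward zero, so each training point only influences a small neighborhood and the decision boundary can bend around single samples.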
C (penalty parameter)
Penalty parameter C of the error term. It controls the trade-off between a smooth decision boundary and classifying the training points correctly.
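A rough way to observe this trade-off is to fit the same data with several values of C and compare training accuracy and the number of support vectors. This is only a sketch: the noisy synthetic dataset (with `flip_y` mislabeling some points) and the three C values are assumptions chosen for illustration, and the exact numbers depend on the data.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Noisy synthetic 2D data; flip_y randomly mislabels a fraction of points.
X, y = make_classification(
    n_samples=200, n_features=2, n_informative=2, n_redundant=0,
    flip_y=0.1, random_state=0,
)

results = []
for C in (0.01, 1.0, 100.0):
    model = SVC(kernel="rbf", C=C, gamma="scale").fit(X, y)
    # Record training accuracy and the total number of support vectors.
    results.append((C, model.score(X, y), int(model.n_support_.sum())))

for C, train_accuracy, n_support_vectors in results:
    print(f"C={C}: train accuracy={train_accuracy:.3f}, "
          f"support vectors={n_support_vectors}")
```

A small C typically tolerates more misclassified training points in exchange for a smoother boundary, while a large C tries harder to classify every training point correctly.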
To be added here:
the formula for the classifier in the non-linearly-separable case
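One standard candidate for this slot, as presented in textbooks such as [2], is the soft-margin formulation, in which C weights the slack variables. The notation ($w$, $b$, $\xi_i$, $N$ training samples, kernel $K$, multipliers $\alpha_i^{*}$) is an assumption of this sketch rather than something fixed by these notes.

$$ \min_{w,b,\xi}\ \frac{1}{2}\|w\|^{2}+C\sum_{i=1}^{N}\xi_{i} \quad \text{s.t.}\quad y_{i}(w\cdot x_{i}+b)\ge 1-\xi_{i},\quad \xi_{i}\ge 0,\quad i=1,\dots,N $$

With a kernel function, the resulting decision function can be written as

$$ f(x)=\operatorname{sign}\left(\sum_{i=1}^{N}\alpha_{i}^{*}y_{i}K(x_{i},x)+b^{*}\right) $$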
Model evaluation
Cross-validation score
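A cross-validation score can be computed along these lines. The sketch assumes scikit-learn's `cross_val_score` with 5 folds on the iris data; the model settings are illustrative defaults.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

model = SVC(kernel="rbf", C=1.0, gamma="scale")

# One accuracy value per fold; the mean summarizes overall performance.
scores = cross_val_score(model, X, y, cv=5)
print(scores, scores.mean())
```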
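Thoughts on the exercise
Imagine an auxiliary circle whose center is Q(0, y1).
The star class always lies close to the center Q, while the circle class lies farther from Q.
So the transformation should be
$$ Z=(x-0)^{2}+(y-y_{1})^{2} $$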
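The idea can be checked numerically with a few made-up points. In this sketch the value of `y1` and the coordinates of the stars and circles are hypothetical, chosen only so that the stars sit near Q and the circles farther away.

```python
def radial_feature(x, y, y1):
    """Squared distance from (x, y) to the auxiliary center Q(0, y1)."""
    return (x - 0) ** 2 + (y - y1) ** 2


y1 = 2.0  # hypothetical y-coordinate of Q

# Made-up points: stars sit near Q, circles sit farther away.
stars = [(0.0, 2.5), (0.5, 2.0), (-0.4, 1.7)]
circles = [(3.0, 2.0), (-2.5, 4.0), (0.0, -1.5)]

star_z = [radial_feature(x, y, y1) for x, y in stars]
circle_z = [radial_feature(x, y, y1) for x, y in circles]

# If every star has a smaller Z than every circle, a single threshold
# on Z separates the two classes.
separable = max(star_z) < min(circle_z)
print(max(star_z), min(circle_z), separable)
```

Under this transformation the two classes, which are not linearly separable in (x, y), become separable by a single threshold on Z.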
References
[1] Understanding Support Vector Machine algorithm from examples (along with code)
[2] Li Hang, "Statistical Learning Methods" (《统计学习方法》)