Implementation of Optimization Algorithms for Support Vector Machines (Graduation Thesis)

2021-06-08 00:22:30

Abstract (translated from the Chinese)

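The support vector machine (SVM) is a machine learning method that grew out of statistical learning theory and is based on the structural risk minimization (SRM) principle. Owing to its strong performance, it is now applied in many fields. At the same time, finding the optimal parameters of an SVM has become a new research focus. The main work of this project is as follows: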

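First, SVM theory is analyzed in detail, covering its theoretical background, statistical learning theory, and the SVM algorithm built on that theory. The two parameters to be optimized in this thesis are then introduced: the penalty factor C and the kernel function parameter.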
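To show where the two tuned parameters enter the model, the standard soft-margin SVM can be written as below. This is the textbook formulation, not one quoted from the thesis. The kernel is assumed here to be the radial basis function (RBF) kernel, which the extracted text does not name, and the symbol γ for the kernel parameter is likewise an assumption.

```latex
\[
\min_{w,\,b,\,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{subject to}\quad
y_i\bigl(w^{\top}\phi(x_i) + b\bigr) \ge 1 - \xi_i,\qquad \xi_i \ge 0,\qquad i = 1,\dots,n
\]
\[
K(x, x') = \phi(x)^{\top}\phi(x') = \exp\!\bigl(-\gamma\,\lVert x - x'\rVert^{2}\bigr)
\]
```

A larger C penalizes the slack variables more heavily and so tolerates fewer training errors, while under the RBF assumption a larger γ makes the kernel more local. These are the two quantities that the search algorithms adjust.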

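Next, common optimization algorithms are presented, namely grid search, the genetic algorithm, and particle swarm optimization. The idea behind each algorithm and the concrete steps for using it to optimize SVM parameters are analyzed.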
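As an illustration of the simplest of the three, grid search combined with k-fold cross-validation might look like the following minimal sketch. It is not the thesis's MATLAB code. The power-of-two grid, the function names, and the `toy_fit_predict` stand-in (an RBF-weighted vote that ignores C) are all assumptions made for illustration; in practice an SVM trainer would be plugged in as `fit_predict`.

```python
import itertools
import math
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(fit_predict, X, y, k, **params):
    """Mean accuracy over k folds for one parameter setting."""
    folds = k_fold_indices(len(X), k)
    scores = []
    for i, test_idx in enumerate(folds):
        # Every fold except fold i is used for training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        preds = fit_predict([X[j] for j in train_idx],
                            [y[j] for j in train_idx],
                            [X[j] for j in test_idx], **params)
        correct = sum(p == y[j] for p, j in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


def grid_search(fit_predict, X, y, k=5,
                log2_c=range(-5, 6), log2_gamma=range(-5, 6)):
    """Try every (C, gamma) pair on a power-of-two grid (an assumed grid).

    Returns (best accuracy, C, gamma). On ties the first pair found is kept.
    """
    best = (-1.0, None, None)
    for ec, eg in itertools.product(log2_c, log2_gamma):
        c, gamma = 2.0 ** ec, 2.0 ** eg
        acc = cross_val_accuracy(fit_predict, X, y, k, C=c, gamma=gamma)
        if acc > best[0]:
            best = (acc, c, gamma)
    return best


def toy_fit_predict(X_train, y_train, X_test, C, gamma):
    """Hypothetical stand-in for an SVM trainer: RBF-weighted vote.

    C is accepted only to match the interface; this toy classifier ignores it.
    """
    preds = []
    for x in X_test:
        votes = {}
        for xi, yi in zip(X_train, y_train):
            d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
            votes[yi] = votes.get(yi, 0.0) + math.exp(-gamma * d2)
        preds.append(max(votes, key=votes.get))
    return preds
```

Because ties keep the first pair found, several grid points can share the top accuracy while only one is reported. The cost grows with the number of grid points times k, which makes exhaustive search attractive mainly for small problems.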

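These three algorithms are then applied to three data sets from the UCI database: Wine, Balance Scale, and wdbc. Experiments on the MATLAB platform measure how much the algorithms' other parameters, such as the cross-validation parameter k, affect classifier performance, and record the classification accuracy and optimal parameter pair obtained by each algorithm on each data set. The results are then analyzed and the strengths and weaknesses of the three algorithms are compared.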

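Finally, the following conclusions are drawn. When the three algorithms are used to optimize the SVM penalty factor C and the kernel function parameter, several settings have essentially no effect on classifier performance: the cross-validation parameter k, the search ranges set for the two parameters being optimized, and the acceleration factors in particle swarm optimization. Their default values can therefore be used. As for the three algorithms themselves, grid search is the better choice when the data set has few attributes, because it runs quickly and achieves relatively high accuracy. For large samples or many attributes, the genetic algorithm or particle swarm optimization is preferable, but because particle swarm optimization tends to get trapped in local optima, the genetic algorithm is sometimes the more suitable of the two. In addition, when different algorithms search the same data set, they arrive at different optimal parameters even when the final classification accuracy is the same. This indicates that more than one set of optimal parameters exists and that the search path determines which one is found.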
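To make the role of the acceleration factors concrete, the sketch below implements the conventional particle swarm update with an inertia weight. It is a minimal illustration under stated assumptions, not the thesis's implementation: the names `w`, `c1`, `c2`, their default values, and `toy_objective` are hypothetical. In the SVM setting the objective would be the cross-validated accuracy, and the bounds would be the search ranges of C and the kernel parameter.

```python
import random


def pso_maximize(objective, bounds, n_particles=20, n_iters=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize objective over a box using a basic particle swarm.

    w is the inertia weight; c1 and c2 are the acceleration factors that pull
    a particle toward its own best position and the swarm's best position.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds]
           for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Clamp the new position so it stays inside the search range.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_objective(p):
    """Hypothetical stand-in for cross-validated accuracy; peak at (1, -2)."""
    return -((p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2)
```

Every particle is drawn toward the single best position found so far, which is why the swarm can settle on a local optimum when the objective has several peaks.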

Keywords: support vector machine; grid search algorithm; genetic algorithm; particle swarm algorithm

Abstract

The support vector machine (SVM) is a machine learning method that was developed within the framework of statistical learning theory and is based on the structural risk minimization principle. Owing to its excellent performance, it has been widely used in various fields. At the same time, finding the optimal parameters of an SVM has become a new research hotspot. The main contents of this paper are as follows:

First, SVM theory is analyzed in detail, including its theoretical background in statistical learning theory and the SVM algorithm based on that theory. The two parameters to be optimized are then introduced: the penalty factor C and the kernel function parameter.

Then the paper presents common optimization algorithms, including the grid search algorithm (GSA), the genetic algorithm (GA), and particle swarm optimization (PSO). It analyzes the ideas behind these algorithms and sets out the specific steps for using them to optimize the SVM parameters.

Next, the three algorithms are applied to three data sets from the UCI database: Wine, Balance Scale, and wdbc. The experiments measure the effect of the algorithms' other parameters, such as the cross-validation parameter k, on the performance of the classifier. The accuracy and optimal parameters obtained by each algorithm on each data set are then compared to identify the advantages and disadvantages of the three algorithms.

Finally, the following conclusions are drawn. When the three algorithms are used to optimize the SVM penalty factor C and the kernel function parameter, some settings have little effect on the performance of the SVM classifier, including the cross-validation parameter k, the search ranges of C and the kernel parameter, and the acceleration factors in PSO, so their default values can be used. As for the three algorithms, the grid search algorithm is the better choice when the data set has few attributes, because it takes less running time and achieves a higher accuracy rate. For a large sample or a large number of attributes, the genetic algorithm or the particle swarm algorithm is better, but because particle swarm optimization easily falls into local optima, the genetic algorithm is sometimes more suitable. In addition, when the three algorithms search the same data set, the optimal parameters they obtain differ from one another even if the classification accuracy is the same. This means that there is more than one set of optimal parameters, and that different search paths lead to different optimal parameters.

Key words: SVM; GSA; GA; PSO

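Machine learning is an interdisciplinary field that studies how computers can simulate or reproduce human behavior and thereby learn and accumulate knowledge. Building on its existing performance, a learning system computes the regularities in available data and uses those regularities to analyze and predict unknown or unobservable data^[1]. In practice the number of samples is limited, which falls short of the large-sample data that would ideally be required, and this gap led to the development of statistical learning theory (SLT).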

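Statistical learning theory is a theory of learning from finite training samples. Specifically, it analyzes the training data to find regularities in the samples that cannot be obtained through theoretical analysis, and then applies these regularities to future data in order to make predictions.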

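The SVM (Support Vector Machine) was created and developed within the SLT framework and is founded on the structural risk minimization principle^[2]. Given limited sample information, it seeks the best trade-off between model complexity (that is, the learning accuracy on the specific training samples) and learning ability (that is, the ability to classify arbitrary samples without error)^[3], so as to achieve the best generalization performance. It is a supervised learning model.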

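Thanks to these properties, and following Bell Labs' considerable success in applying support vector machines to handwritten digit recognition in the field of pattern recognition, SVM algorithms have been widely studied and applied. Research in areas such as face recognition, image processing, and data mining has produced a large body of results.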
