


 2021-10-24 15:50:06  

Abstract

With the rapid development of information technology, data volumes are growing explosively, and data mining has emerged in response. Data mining is a technology for extracting knowledge from data, so data quality is particularly important. However, manual omissions, network errors, and other causes introduce a variety of data problems, including abnormal values, duplicate records, and missing values, and such dirty data lowers the credibility of the mined information. Preprocessing the data before mining is therefore essential, and data cleaning is the key technology of data preprocessing. This paper studies data cleaning techniques in data mining, focusing on the cleaning of outliers. Outlier detection based on statistical methods and outlier detection based on clustering algorithms are both simulated, the performance of each method is compared, and the corresponding analysis is given, which provides a useful reference for numerical processing methods. Although traditional outlier detection techniques are useful to a degree, they perform poorly when their conditions of applicability are not met. Clustering algorithms are a valuable achievement of modern data mining and, compared with traditional statistics-based data cleaning methods, offer considerable improvements in both applicability and effectiveness. Studying how clustering algorithms are used, together with their advantages and disadvantages, has therefore become a top priority of modern data cleaning work.

Keywords: data cleaning, outlier, clustering algorithm
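The contrast drawn in the abstract between statistics-based and clustering-based outlier detection can be illustrated with a minimal sketch. The thesis's own simulation code is not part of this excerpt, so everything below is an assumption made for illustration: the function names (`three_sigma_outliers`, `kmeans_1d`, `cluster_outliers`), the sample data, and in particular the clustering criterion (flag a point whose distance to its cluster center exceeds `factor` times the median distance within that cluster). The statistical rule shown is the common three-sigma rule, which may or may not be one of the rejection criteria the thesis uses.

```python
import statistics


def three_sigma_outliers(values, k=3.0):
    """Indices of values more than k standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) > k * std]


def kmeans_1d(values, k, iters=100):
    """Plain Lloyd's k-means on 1-D data with a deterministic quantile init."""
    ordered = sorted(values)
    n = len(ordered)
    centers = [ordered[(2 * j + 1) * n // (2 * k)] for j in range(k)]
    labels = [0] * n
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        # Recompute each center as the mean of its members.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            new_centers.append(statistics.fmean(members) if members else centers[c])
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


def cluster_outliers(values, k=2, factor=3.0):
    """Hypothetical criterion: flag points much farther from their cluster
    center than the median member of the same cluster."""
    centers, labels = kmeans_1d(values, k)
    dists = [abs(v - centers[lab]) for v, lab in zip(values, labels)]
    flagged = []
    for c in range(k):
        member_d = [d for d, lab in zip(dists, labels) if lab == c]
        if not member_d:
            continue
        typical = statistics.median(member_d)
        if typical == 0:
            continue  # degenerate cluster; this sketch skips it
        flagged.extend(
            i for i, (d, lab) in enumerate(zip(dists, labels))
            if lab == c and d > factor * typical
        )
    return sorted(flagged)


# Made-up bimodal sample: two tight groups plus one stray reading at index 8.
group_a = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3]
group_b = [100.1, 99.8, 100.0, 100.2, 99.9, 100.3, 99.7, 100.0]
data = group_a + [60.0] + group_b

print(three_sigma_outliers(data))   # the wide spread hides the stray value
print(cluster_outliers(data, k=2))  # per-cluster distances expose it
```

On this sample the two groups inflate the overall standard deviation, so the three-sigma rule flags nothing, while the per-cluster distance check isolates the stray value. This only illustrates the kind of situation in which a global statistical rule is a poor fit; it says nothing about the results reported in the thesis.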

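Contents

Abstract (Chinese)

Abstract (English)

Chapter 1 Introduction

1.1 Research background and significance

1.2 Current state of research

1.3 Main tasks of this work

Chapter 2 Data Cleaning

2.1 Definition of data cleaning

2.2 Data cleaning methods

2.2.1 Methods based on statistical analysis

2.2.2 Methods based on data characteristics

2.3 Handling of outliers

2.3.1 Methods based on statistical analysis

2.3.2 Methods based on data characteristics

2.4 Chapter summary

Chapter 3 Data Cleaning Methods Based on Statistical Analysis

3.1 Definition of outliers

3.2 Outlier rejection criteria

3.3 Outlier handling methods

3.4 Simulation

3.5 Chapter summary

Chapter 4 Outlier Cleaning Methods Based on Clustering

4.1 Clustering algorithms

4.1.1 Definition of clustering algorithms

4.1.2 Classification of clustering algorithms

4.2 Two commonly used clustering algorithms

4.2.1 K-means clustering algorithm

4.2.2 K-nearest-neighbor (KNN) algorithm

4.3 Simulation

Chapter 5 Summary and Outlook

5.1 Summary

5.2 Future work

References

Acknowledgements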

Chapter 1 Introduction




