基于python的大规模图数据布局策略研究与实现任务书
2020-04-13 15:54:37
1. 毕业设计(论文)主要内容:
随着图规模的扩大,存储分级成为系统性能瓶颈。
本课题考虑cpu 缓存和内存的速度差异特性,对内存以邻接表或邻接矩阵表示的图数据进行布局优化。
另外,本课题分析图算法的基本遍历查询操作及处理器执行图算法时的访问顺序特征,分析cpu 缓存友好的影响因素,建立大规模图系统的典型算法和图的类型,最终通过pyhton程序予以实现。
2. 毕业设计(论文)主要任务及要求
1.要求所设计的程序符合计算机体系结构的特点,能够提高图计算过程内存读写性能。
2.在linux系统下开发基于python的图数据处理算法程序,实现希尔伯特图等图排序算法进行图的重新布局,提升图数据的空间局部性。
将未处理的图数据与处理过的图数据用需要大规模图数据的程序进行性能测试,比较图处理算法带来的性能提升。
3. 毕业设计(论文)完成任务的计划与安排
(1)2018/1/14—2018/3/5:确定选题,查阅文献,外文翻译和撰写开题报告;
(2)2018/3/6—2018/4/30:系统架构、程序设计与开发、系统测试与完善;
(3)2018/5/1—2018/5/25:撰写及修改毕业论文;
(4)2018/5/26—2018/6/6:准备答辩。
4. 主要参考文献
[1] Guo Y, Varbanescu A L, Iosup A, et al. Benchmarking graph-processing platforms: a
vision. in: Proceedings of the 5th ACM/SPEC international conference on
Performance engineering. New York. ACM, 2014: 289-292
[2] Korf R E. Depth-first iterative-deepening: An optimal admissible tree search.
Artificial intelligence, 1985, 27(1): 97-109
[3] Sun Z, Wang H, Wang H, et al. Efficient subgraph matching on billion node graphs.
Proceedings of the VLDB Endowment, 2012, 5(9): 788-799
[4] Zhu X, Han W, Chen W. GridGraph: Large-scale graph processing on a single
machine using 2-level hierarchical partitioning. Proceedings of the Usenix Annual
Technical Conference. 2015: 375-386
[5] Pearce R, Gokhale M, Amato N M. Multithreaded asynchronous graph traversal for
in-memory and semi-external memory. in: Proceedings of the 2010 ACM/IEEE
International Conference for High Performance Computing Networking Storage and
Analysis. Washington: IEEE, 2010: 1-11
[6] Dean J, Ghemawat S. Mapreduce: simplified data processing on large clusters.
Communications of the ACM, 2008, 51(1): 107-113
[7] Malewicz G, Austern M H, Bik A J, Dehnert J C, Horn I, Leiser N, Czajkowski G.
Pregel: a system for large-scale graph processing. in: Proceedings of the 2010 ACM
SIGMOD International Conference on Management of data. New York: ACM, 2010:
135-146
[8] Valiant L G. A bridging model for parallel computation. Communications of the
ACM, 1990, 33(8): 103-111
[9] Low Y, Bickson D, Gonzalez J, Guestrin C, Kyrola A, Hellerstein J M. Distributed
graphlab: a framework for machine learning and data mining in the cloud.
Proceedings of the VLDB Endowment, 2012, 5(8): 716-727
[10] Goldberg D E, Holland J H. Genetic algorithms and machine learning. Machine
learning, 1988, 3(2): 95-99
[11] Gonzalez J E, Low Y, Gu H, et al. Powergraph: Distributed graph-parallel
computation on natural graphs. in: Proceedings of the 10th USENIX conference on
Operation System Design and Implemention. New York: OSDI, 2012: 17-30
[12] McCune R R, Weninger T, Madey G. Thinking Like a Vertex: a Survey of
Vertex-Centric Frameworks for Large-Scale Distributed Graph Processing. ACM
Computing Surveys(CSUR), 2015, 48(2): 25
[13] Bertsekas D P. Parallel and distributed computation: numerical methods. ACM
SIGOPS Operating systems, 1990, 2(4): 315-339
[14] Low Y, Gonzalez J E, Kyrola A, et al. Graphlab: A new framework for parallel
machine learning. arXiv preprint, 2014, 3(1): 3-12
[15] http://www.graph500.org/
[16] http://snap.stanford.edu/data/soc-LiveJournal1.html
[17] http://snap.stanford.edu/data/com-LiveJournal.html
[18] http://snap.stanford.edu/data/com-Orkut.html