 2021-12-15 23:08:55  


Abstract



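This thesis examines in detail a shortcoming of the system's discriminative model: it can only evaluate a complete sequence. For a partially generated sequence, it is hard to weigh the score of the current prefix against the score of the final sequence. Generation also suffers from exposure bias, i.e., the accumulation of local errors. Unlike traditional approaches, policy gradient offers an alternative to value-based reinforcement learning methods such as DQN, but the REINFORCE algorithm is not perfect in this respect: as a Monte Carlo algorithm, it estimates state values from the expected return.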



This thesis focuses on methods for automatic text generation and their optimization. In essence, automatic text generation is discrete sequence generation. The generator learns the dependencies between words, and the discriminator provides gradient guidance to the generator; together the two form an adversarial network. Generative adversarial networks have achieved great success in image generation in recent years, but they remain severely limited for discrete sequence generation, chiefly because gradients cannot be propagated well through discrete outputs. In addition, the reward for a sequence can only be assigned once the sequence is complete, so we address the task by introducing ideas from reinforcement learning. For an incomplete sequence, we use Monte Carlo search together with the discriminator to pass back the reinforcement learning reward signal.
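The reward for an incomplete sequence can be illustrated with a minimal sketch in the style of SeqGAN's Monte Carlo search, assuming a rollout policy that completes the prefix N times and a discriminator that scores each completed sequence; the prefix's reward is the average score, and a complete sequence is scored by the discriminator directly. `rollout_policy` and `toy_discriminator` are hypothetical stand-ins (a uniform random token sampler and a rule that favours token 0), not the trained LSTM and CNN models described in this thesis.

```python
import random
from typing import Callable, List, Sequence

Token = int


def mc_search_reward(
    prefix: Sequence[Token],
    seq_len: int,
    rollout_policy: Callable[[List[Token]], Token],
    discriminator: Callable[[Sequence[Token]], float],
    num_rollouts: int = 16,
) -> float:
    """Estimate the reward of a (possibly partial) sequence.

    A complete sequence is scored by the discriminator directly. A partial
    one is completed num_rollouts times with the rollout policy, and the
    discriminator scores of the completions are averaged.
    """
    if len(prefix) == seq_len:
        return discriminator(prefix)
    total = 0.0
    for _ in range(num_rollouts):
        seq = list(prefix)
        while len(seq) < seq_len:
            seq.append(rollout_policy(seq))
        total += discriminator(seq)
    return total / num_rollouts


# Hypothetical stand-ins for the trained generator and discriminator.
VOCAB_SIZE = 5
rng = random.Random(0)


def rollout_policy(seq: List[Token]) -> Token:
    # Uniform random next token; a real rollout would sample from the generator.
    return rng.randrange(VOCAB_SIZE)


def toy_discriminator(seq: Sequence[Token]) -> float:
    # Toy "probability of being real": fraction of tokens equal to 0.
    return sum(1 for t in seq if t == 0) / len(seq)


if __name__ == "__main__":
    seq_len = 6
    generated = [0, 0, 3]
    # One reward per generation step t = 1..len(generated).
    rewards = [
        mc_search_reward(generated[:t], seq_len, rollout_policy, toy_discriminator)
        for t in range(1, len(generated) + 1)
    ]
    print(rewards)
```

Averaging over several rollouts trades compute for a lower-variance estimate of how a prefix is likely to be judged once it is finished.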

For the generator, we can choose the LSTM model popular in natural language processing, which retains a long memory of past tokens and better captures the local associations between words. For the discriminator, we can choose a CNN model whose final output is a probability: the probability that the input is a real sequence, which can also be read as the similarity between the input and real sequences.
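How a CNN discriminator turns a token sequence into a single probability can be sketched as follows, assuming a common text-CNN layout: embed each token, slide filters of several widths over the sequence, apply ReLU and max-pool over time, then feed the pooled features to a linear layer with a sigmoid. The embedding table `EMB`, the filters `FILTERS`, and the output parameters `OUT_W`/`OUT_B` are hypothetical, hand-picked values standing in for trained weights.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
# One convolution filter: (weights of shape width x embed_dim, bias).
Filter = Tuple[List[Vector], float]


def conv_max_feature(embs: List[Vector], weights: List[Vector], bias: float) -> float:
    """Slide one filter over the embedded sequence, apply ReLU, max-pool over time."""
    width = len(weights)
    best = 0.0  # ReLU outputs are >= 0, so 0.0 is the floor of the max-pool
    for start in range(len(embs) - width + 1):
        s = bias
        for k in range(width):
            s += sum(w * x for w, x in zip(weights[k], embs[start + k]))
        best = max(best, s)  # max(0, max_t s_t) == max_t ReLU(s_t)
    return best


def discriminator_prob(
    tokens: Sequence[str],
    emb_table: Dict[str, Vector],
    filters: List[Filter],
    out_weights: Vector,
    out_bias: float,
) -> float:
    """Return the (toy, untrained) probability that `tokens` is a real sequence."""
    if any(len(w) > len(tokens) for w, _ in filters):
        raise ValueError("sequence is shorter than the widest filter")
    embs = [emb_table[t] for t in tokens]
    features = [conv_max_feature(embs, w, b) for w, b in filters]
    logit = out_bias + sum(w * f for w, f in zip(out_weights, features))
    return 1.0 / (1.0 + math.exp(-logit))  # sigmoid -> P(real)


# Hypothetical, hand-picked parameters (a trained model would learn these).
EMB = {"the": [0.1, 0.3], "cat": [0.7, -0.2], "sat": [-0.4, 0.5]}
FILTERS: List[Filter] = [
    ([[0.5, -0.1], [0.2, 0.4]], 0.0),                # width-2 filter
    ([[0.3, 0.3], [-0.2, 0.1], [0.6, -0.5]], -0.1),  # width-3 filter
]
OUT_W = [1.2, -0.8]
OUT_B = 0.05

if __name__ == "__main__":
    print(discriminator_prob(["the", "cat", "sat"], EMB, FILTERS, OUT_W, OUT_B))
```

Max-over-time pooling makes the output independent of where in the sequence a filter's pattern occurs, which is one reason CNNs are a common choice for sequence classification.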

The discriminative model can only evaluate a complete sequence; for a partially generated sequence, it is difficult to weigh the current score against the score of the eventual full sequence. Policy gradient offers an approach different from DQN, but the Monte Carlo policy gradient algorithm (REINFORCE) has limitations: as a Monte Carlo method, it needs complete sequence samples for every iteration, and it estimates state values from the expected return.
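Why REINFORCE needs complete samples can be seen in a minimal sketch, assuming a toy state-independent softmax policy over three actions (a simplification; in sequence generation the state would be the current prefix). The return G_t = r_{t+1} + γ·G_{t+1} is computed backwards from the end of the episode, so it is unavailable until the episode finishes; each action's log-probability gradient is then weighted by its return. The reward rule in the usage loop is hypothetical.

```python
import math
import random
from typing import List, Tuple


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """G_t = r_{t+1} + gamma * G_{t+1}, computed backwards over a complete episode."""
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_update(
    logits: List[float], episode: List[Tuple[int, float]], gamma: float, lr: float
) -> List[float]:
    """One REINFORCE step; episode is a list of (action, reward) pairs."""
    returns = discounted_returns([r for _, r in episode], gamma)
    probs = softmax(logits)  # the policy is fixed while the episode is collected
    grad = [0.0] * len(logits)
    for (a, _), g in zip(episode, returns):
        for i in range(len(logits)):
            # d log pi(a) / d logit_i = 1[i == a] - pi(i)
            grad[i] += g * ((1.0 if i == a else 0.0) - probs[i])
    return [l + lr * gi for l, gi in zip(logits, grad)]  # gradient ascent


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [0.0, 0.0, 0.0]
    for _ in range(200):
        probs = softmax(logits)
        # Sample a complete 3-step episode; action 0 is rewarded (toy task).
        episode = []
        for _ in range(3):
            a = rng.choices(range(3), weights=probs)[0]
            episode.append((a, 1.0 if a == 0 else 0.0))
        logits = reinforce_update(logits, episode, gamma=0.9, lr=0.1)
    print([round(p, 3) for p in softmax(logits)])
```

Because each update uses the full sampled return, the estimate is unbiased but can have high variance, which is the practical cost of the Monte Carlo approach noted above.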

Key Words: Natural language generation, Generative adversarial network, Policy gradient, LSTM

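Contents

Abstract

Chapter 1 Introduction
1.1 Background and Significance
1.2 Research Status in China and Abroad
1.3 Main Work of This Thesis
1.4 Thesis Organization
1.5 Chapter Summary

Chapter 2 Related Theory and Techniques
2.1 Generative Adversarial Networks (GAN)
2.2 Long Short-Term Memory Networks (LSTM)
2.3 Policy Gradient
2.4 Chapter Summary

Chapter 3 System Analysis
3.1 Overview of the Model Workflow
3.2 Analysis of the Token Generation Process

Chapter 4 System Implementation
4.1 Project Code (by Module)
4.2 Results
4.3 Chapter Summary

Chapter 5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work

References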

1. Introduction


1.1 Background and Significance




