Master's Thesis, Huazhong University of Science and Technology
Abstract (in Chinese)
In recent years, artificial intelligence technologies such as deep learning have seen a wave of vigorous development and achieved fruitful results in fields such as image classification, speech recognition, and natural language processing. However, robots, which play an important role in mechanical manufacturing and other industries, still rely on manual teaching or precise programming based on camera calibration for tasks such as grasping, handling, and palletizing, and the manipulated objects are often limited to a single type, which falls far short of practical needs. It is therefore imperative to apply deep learning algorithms to raise the intelligence level of robotic applications. To address the lack of adaptability of robotic grasping to objects of different shapes, this thesis proposes and validates a series of improved algorithms for robotic hand-eye coordinated grasping.
Building on the design of the robotic hand-eye coordinated grasping system, and to address the lack of interpretability in the prediction process of the grasp success prediction model, the core module of the system, this thesis proposes a recurrent neural network model based on the visual attention mechanism. Exploiting the model's ability to explicitly represent the positions of image features, the positions of image features in the two-dimensional image coordinate system and the robot motion commands in the three-dimensional base coordinate system are fed into the model jointly, helping it learn the mapping between the two coordinate systems. This endows the model with interpretability while improving prediction accuracy and reducing computational complexity.
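As a minimal illustration of this joint-input idea, the PyTorch-style sketch below (layer sizes, the glimpse mechanism, and names such as `RecurrentAttentionGraspNet` are illustrative assumptions, not the exact architecture of this thesis) attends to one image patch per step at an explicit 2D location, embeds that location together with the 3D motion command, and feeds both into a recurrent core that outputs a grasp score:

```python
# Illustrative sketch only: a recurrent visual-attention model that fuses an explicit
# 2D glimpse location with the 3D robot motion command before predicting a grasp score.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RecurrentAttentionGraspNet(nn.Module):
    def __init__(self, glimpse_size=24, hidden=256):
        super().__init__()
        self.glimpse_size = glimpse_size
        self.glimpse_fc = nn.Linear(glimpse_size * glimpse_size * 3, 128)
        # The 2D glimpse location and the 3D motion command are embedded jointly,
        # which is what lets the model relate image coordinates to base coordinates.
        self.loc_cmd_fc = nn.Linear(2 + 3, 128)
        self.rnn = nn.GRUCell(256, hidden)
        self.loc_head = nn.Linear(hidden, 2)    # where to look next (normalized coords)
        self.score_head = nn.Linear(hidden, 1)  # grasp success score

    def extract_glimpse(self, img, loc):
        """Differentiably crop a patch centered at loc, with loc in [-1, 1]^2."""
        B, _, H, W = img.shape
        s = self.glimpse_size
        lin = torch.linspace(-1.0, 1.0, s, device=img.device)
        dy, dx = torch.meshgrid(lin, lin, indexing="ij")
        half_extent = s / max(H, W)             # patch size in normalized coordinates
        grid = torch.stack((dx, dy), dim=-1) * half_extent
        grid = grid.unsqueeze(0) + loc.view(B, 1, 1, 2)
        return F.grid_sample(img, grid, align_corners=False)

    def forward(self, img, motion_cmd, steps=4):
        B = img.size(0)
        h = img.new_zeros(B, self.rnn.hidden_size)
        loc = img.new_zeros(B, 2)               # start by looking at the image center
        for _ in range(steps):
            g = self.extract_glimpse(img, loc).flatten(1)
            g_feat = torch.relu(self.glimpse_fc(g))
            lc_feat = torch.relu(self.loc_cmd_fc(torch.cat([loc, motion_cmd], dim=1)))
            h = self.rnn(torch.cat([g_feat, lc_feat], dim=1), h)
            loc = torch.tanh(self.loc_head(h))  # explicit, visualizable 2D location
        return self.score_head(h).squeeze(1)    # raw score (logit or margin)
```

Because the attended location `loc` is produced explicitly at every step, it can be visualized directly, which is what gives this family of models its interpretability.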
To address the problem that the imbalance between positive and negative samples in the collected training data degrades model training, the distribution of loss values produced by the cross-entropy loss function paired with a sigmoid classifier is analyzed. A linear support vector machine is then adopted as the classifier of the grasp success prediction model, with its output functional margin representing the grasp success chance. The corresponding loss function and update formulas are derived, and experiments verify the resulting improvement in model training.
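A minimal NumPy sketch of such a hinge-loss objective and its subgradient is given below; the feature dimension, the L2 weight, and the learning rate are assumptions made purely for illustration, not the values used in the thesis.

```python
# Illustrative sketch only: a linear SVM head whose functional margin w.x + b is used
# directly as the grasp score, trained with the hinge loss instead of sigmoid + CE.
import numpy as np

def hinge_loss_and_grad(w, b, X, y, l2=1e-3):
    """X: (N, D) features from the network; y: (N,) labels in {-1, +1}."""
    margins = X @ w + b                        # functional margins, used as grasp scores
    slack = np.maximum(0.0, 1.0 - y * margins)
    loss = slack.mean() + 0.5 * l2 * np.dot(w, w)
    # Subgradient: samples already classified with margin >= 1 contribute nothing.
    active = (slack > 0).astype(float)
    grad_w = -(active * y) @ X / len(y) + l2 * w
    grad_b = -(active * y).mean()
    return loss, grad_w, grad_b

# One gradient-descent step on random data, just to show the shapes and the update.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 64))
y = rng.choice([-1.0, 1.0], size=32)
w, b = np.zeros(64), 0.0
loss, gw, gb = hinge_loss_and_grad(w, b, X, y)
w, b = w - 0.1 * gw, b - 0.1 * gb
```

Note that correctly classified samples with margin greater than one contribute no gradient, unlike with sigmoid cross-entropy, where every sample keeps contributing to the loss.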
In the continuous workspace, it is difficult to solve for the robot motion command that maximizes the output of the grasp success prediction model. A sampling-based cross-entropy optimization algorithm is therefore used to search for the optimal command, and the formula derivation and implementation procedure of the optimization process are given for this specific problem. For the case in which the grasp success prediction model outputs the functional margin of a support vector machine, a visual servoing mechanism is designed that bases its motion decisions on the difference between the grasp success chance at the current position and that after executing the optimal motion command, achieving a stable and efficient hand-eye coordinated grasping process.
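The search itself can be sketched in a few lines of NumPy. In the sketch below, the scoring function `predict_score`, the command dimensionality, and the CEM hyper-parameters (population size, elite fraction, number of iterations, initial standard deviation) are illustrative assumptions:

```python
# Illustrative sketch only: sampling-based cross-entropy method (CEM) for finding the
# motion command that maximizes the predicted grasp score.
import numpy as np

def cem_search(predict_score, dim=3, iters=3, pop=64, elite_frac=0.1, seed=0):
    rng = np.random.default_rng(seed)
    mean = np.zeros(dim)                       # start from "no motion"
    std = np.ones(dim) * 0.05                  # initial exploration width
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = rng.normal(mean, std, size=(pop, dim))   # candidate motion commands
        scores = np.array([predict_score(v) for v in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]     # keep the best candidates
        mean = elite.mean(axis=0)                          # refit the sampling Gaussian
        std = elite.std(axis=0) + 1e-6
    return mean                                # approximately optimal motion command

# Example with a dummy scorer whose optimum is at (0.02, -0.01, -0.03).
target = np.array([0.02, -0.01, -0.03])
best = cem_search(lambda v: -np.sum((v - target) ** 2))
```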
Based on the above research, a robotic hand-eye coordinated grasping prototype system comprising a hardware platform and a software system was developed, and data collection, model training, model evaluation, and grasping experiments were completed, verifying the effectiveness of the algorithms.
Key words: Robotic Grasping, Visual Attention, Visual Servoing, Support Vector Machine
Abstract (in English)
In recent years, artificial intelligence technologies such as deep learning have been undergoing a wave of booming development and have made fruitful achievements in fields such as image classification, speech recognition, and natural language processing. However, robots, which play an important role in mechanical manufacturing and other fields, still rely on manual teaching or precise programming based on camera calibration in tasks such as grasping, handling, and palletizing, and the manipulated objects are often limited to objects with the same appearance, which is far from meeting actual needs. Therefore, it is imperative to improve the intelligence level of robotic manipulation by introducing deep learning algorithms. This paper proposes and verifies a series of algorithmic improvements for the robotic hand-eye coordinated grasping technique.
On the basis of the design of the robotic hand-eye coordinated grasping system, this paper proposes to use a recurrent neural network based on the visual attention mechanism as the grasp success chance prediction model, which is the core module of the system. Taking advantage of the network's ability to express the positions of image features explicitly, we concatenate the positions of image features in the 2D image coordinate system with the robot motion vector in the 3D base coordinate system and feed them into the network together. This helps the model learn the mapping relationship between the two coordinate systems, endows the model with interpretability, and increases prediction accuracy with less computation.
To solve the problem in model training caused by the quantity imbalance between positive and negative samples, this paper analyzes the distribution of loss values in the cross-entropy loss function paired with the sigmoid classifier, and proposes replacing the sigmoid classifier with a linear support vector machine in the grasp success chance prediction model, regarding the functional margin as the success chance. We derive the loss function and the gradient update formula, and validate the improvement in model performance.
In the continuous operation space, it is difficult to find the robot motion vector that makes the grasp success chance prediction model output its largest value. We employ the sampling-based cross-entropy method (CEM) to search for the optimal motion vector, and give the formula derivation and implementation details for this task. Based on the functional margin output by the grasp success prediction model, we design a visual servoing mechanism that decides whether to move, taking as its criterion the difference between the success chances of grasping at the positions before and after moving along the optimal vector. Experiments show that this visual servoing mechanism produces a robust and efficient grasping procedure.
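As a rough sketch of this decision rule (the names `servo_step` and `predict_score` and the threshold value are assumptions introduced for illustration), the controller compares the score of grasping from the current pose with the score expected after executing the optimal motion vector, and moves only when the expected improvement is large enough:

```python
# Illustrative sketch only: move-or-grasp decision based on the predicted margin gain.
import numpy as np

def servo_step(predict_score, image, best_motion, threshold=0.1):
    """Decide whether to keep moving or to grasp at the current pose."""
    stay_score = predict_score(image, np.zeros_like(best_motion))  # grasp right here
    move_score = predict_score(image, best_motion)                 # grasp after moving
    # Move only while the optimal motion still promises a noticeably better margin;
    # otherwise the gripper is already well placed and we commit to the grasp.
    return "move" if move_score - stay_score > threshold else "grasp"
```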
Based on the aforementioned research, we developed the hardware and software of the robotic hand-eye coordinated grasping prototype system, and completed data collection, model training, model evaluation, and grasping experiments, validating the effectiveness of the algorithms.
Key words: Robotic Grasping, Visual Attention, Visual Servoing, Support Vector Machine
Contents
Abstract (in Chinese) (I)
Abstract (in English) (II)
Contents (IV)
1 Introduction (1)
1.1 Source of the Project (1)
1.2 Background and Significance (1)
1.3 Research Status of Related Technologies at Home and Abroad (2)
1.4 Main Research Contents of This Thesis (9)
2 Design of the Robotic Hand-Eye Coordinated Grasping System (11)
2.1 Step Decomposition of the Hand-Eye Coordinated Grasping Task (11)
2.2 Hardware System Composition (13)
2.3 Composition of the Training Data (16)
2.4 Data Collection Scheme (17)
2.5 Chapter Summary (20)
3 Grasp Success Prediction Model Based on Visual Attention (21)
3.1 Idea of the Visual Attention Algorithm (21)
3.2 Structure of the Grasp Success Prediction Model (22)
3.3 Model Training Method and Principle (27)
3.4 Application of the Support Vector Machine in the Model (28)
3.5 Chapter Summary (32)
4 Optimal Command Search Based on the Cross-Entropy Method and the Visual Servoing Mechanism (33)
4.1 Optimal Motion Command Search Algorithm Based on the Cross-Entropy Method (33)
4.2 Visual Servoing Mechanism (38)
4.3 Chapter Summary (40)
5 Experimental Validation of the Robotic Hand-Eye Coordinated Grasping System (41)
5.1 Implementation of the Grasp Success Prediction Model (41)
5.2 Validation of the Optimal Command Search Algorithm (48)
5.3 Construction of the Grasping System and Experimental Results (49)
5.4 Chapter Summary (50)
6 Conclusions and Prospects (51)
6.1 Conclusions (51)
6.2 Research Prospects (52)
Acknowledgements (53)
References (54)