<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Topics tagged with m5stack-basic]]></title><description><![CDATA[A list of topics that have been tagged with m5stack-basic]]></description><link>https://community.m5stack.com/tags/m5stack-basic</link><generator>RSS for Node</generator><lastBuildDate>Fri, 13 Mar 2026 11:45:08 GMT</lastBuildDate><atom:link href="https://community.m5stack.com/tags/m5stack-basic.rss" rel="self" type="application/rss+xml"/><pubDate>Invalid Date</pubDate><ttl>60</ttl><item><title><![CDATA[myArm M &amp; C: Advanced Solutions for Diverse Robotic Applications]]></title><description><![CDATA[<p dir="auto"><em>Special thanks to M5stack basic for a very suitable device embedded in a robotic arm!</em></p>
<h2>Introduction</h2>
<p dir="auto">In recent years, many projects similar to Stanford University’s Alopha robot project have emerged, primarily focusing on learning by mimicking human motion trajectories to achieve human-like artificial intelligence. The Alopha robot, through advanced algorithms and sensor technology, can precisely replicate human actions, collecting data and learning from it to enhance its performance in various tasks. This imitation learning approach not only enables robots to excel in industrial automation but also shows significant potential in educational and research fields. Against this backdrop, Elephant Robotics has introduced the myArm M &amp; C series robotic arms, further advancing humanoid robot technology.</p>
<h2>Innovative Solutions for the Future</h2>
<p dir="auto">With the rapid advancement of robotics technology, Elephant Robotics has introduced the groundbreaking myArm 650 M &amp; C series robotic arms, offering unprecedented flexibility and precision for educational, research, and industrial applications. These high-performance robotic arms are not only powerful but also flexible and user-friendly, making them suitable for a variety of applications and providing ideal solutions. Let’s delve into the myArm 650 M &amp; C series products!</p>
<h2>Products</h2>
<p dir="auto"><strong>myArm C650</strong></p>
<p dir="auto"><img src="/assets/uploads/files/1716194677276-3ed1d8aa-ac81-404a-abbd-1803921d0b23-image.png" alt="0_1716194676691_3ed1d8aa-ac81-404a-abbd-1803921d0b23-image.png" class=" img-fluid img-markdown" /><br />
The myArm C650 is a versatile 6-degree-of-freedom robotic motion information collection device, equipped with a fingertip controller and two intelligent buttons. The “C” stands for Controller. It can output end coordinates or joint angles at 50 Hz. Designed for education, research, and industrial data collection, its flexibility and highly modular design make it suitable for various complex operations and tasks. It can be extended to a 7-degree-of-freedom data collector.<br />
<img src="/assets/uploads/files/1716194697520-61be2f8b-05ce-480e-961c-8c5c40150726-image.png" alt="0_1716194697038_61be2f8b-05ce-480e-961c-8c5c40150726-image.png" class=" img-fluid img-markdown" /><br />
<strong>myArm M750</strong></p>
<p dir="auto"><img src="/assets/uploads/files/1716194745773-68c1ce0f-cde7-4d3e-a3a0-25c759183095-image.png" alt="0_1716194745224_68c1ce0f-cde7-4d3e-a3a0-25c759183095-image.png" class=" img-fluid img-markdown" /><br />
The myArm M750 is a general-purpose intelligent six-degree-of-freedom robotic arm with a reach of 750 mm and a rated load capacity of 500 g, with a maximum capacity of up to 1 kg. It comes with a 1-degree-of-freedom parallel end effector (gripper). The “M” stands for Master controller. It is suitable for applications requiring complex motion control, precise positioning, and high programmability.<br />
<img src="/assets/uploads/files/1716194767167-db267f91-b177-41f0-b5b5-27e17ef2397d-image.png" alt="0_1716194766688_db267f91-b177-41f0-b5b5-27e17ef2397d-image.png" class=" img-fluid img-markdown" /></p>
<h2>Product Features</h2>
<p dir="auto">● High-Speed Data Output: Both the myArm C650 and M750 can output end coordinates or joint angles at a frequency of 50 Hz, providing real-time motion data to ensure smooth operation.</p>
<p dir="auto">● Flexibility and Modular Design: The robotic arms feature a highly modular design, allowing users to customize and expand according to their needs, making them suitable for a wide range of applications.</p>
<p dir="auto">● Robust Load Capacity: The myArm M750 has a reach of 750 mm and a load capacity of 500 g, with a maximum capacity of up to 1 kg, making it suitable for complex industrial tasks.</p>
<p dir="auto">● Programming and Control: Supporting Python and ROS, the arms offer powerful programming capabilities and flexible control options, making them ideal for education and industrial automation fields.</p>
<p dir="auto">● Strong Compatibility: The arms support various sensors and end effectors, including cameras and IMUs, further enhancing data collection and processing capabilities.</p>
<h2>AI Learning and Training Project</h2>
<p dir="auto">Recently, many projects similar to Stanford University’s Alopha robot project have emerged. These projects mainly involve learning by mimicking human motion trajectories. By recording trajectory data and conducting extensive training, the Alopha robot can autonomously perform certain household chores, such as cooking, washing dishes, and folding clothes.<br />
<a href="https://youtu.be/HaaZ8ss-HP4" target="_blank" rel="noopener noreferrer nofollow ugc">https://youtu.be/HaaZ8ss-HP4</a><br />
The key elements of the entire project are the full-body remote control system, human demonstration learning, and collaborative training of datasets. To this end, we have also launched a corresponding humanoid composite suite, equipped with two sets of myArm M &amp; C robotic arms and the Elephant Robotics mobile platform for mobility.<br />
<img src="/assets/uploads/files/1716194802404-c6096d4c-32f3-4892-9e7c-f08ca065e6b1-image.png" alt="0_1716194801911_c6096d4c-32f3-4892-9e7c-f08ca065e6b1-image.png" class=" img-fluid img-markdown" /><br />
Now, let me demonstrate how to operate using a set of myArm M &amp; C.<br />
<a href="https://youtu.be/-sLegg0wSUQ" target="_blank" rel="noopener noreferrer nofollow ugc">https://youtu.be/-sLegg0wSUQ</a><br />
The myArm M750 is essentially a 6+1-axis robotic arm: 6 degrees of freedom for the arm itself and 1 for the additional end gripper controller. To ensure better data synchronization, we have also designed the myArm C as the same type of arm in a different size.<br />
<a href="https://youtu.be/AlKLbohyA4E" target="_blank" rel="noopener noreferrer nofollow ugc">https://youtu.be/AlKLbohyA4E</a><br />
You can see that the synchronous remote control operation is quite smooth. This is mainly due to the myArm's ability to output end coordinates and joint angles at a rate of 50 Hz, providing users with real-time motion data. Let's try grasping some parts to experience its effectiveness.<br />
<a href="https://youtu.be/DMZSXuJ_lA0" target="_blank" rel="noopener noreferrer nofollow ugc">https://youtu.be/DMZSXuJ_lA0</a><br />
To achieve functionality similar to the ALOHA robot, we have completed two parts: the remote control system and human demonstration learning. The most crucial part remaining is the collaborative training of datasets.</p>
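<p dir="auto">As a rough, non-authoritative sketch of what the master-follower remote control loop described above can look like in code, assuming a pymycobot-style interface with get_angles()/send_angles() (as used elsewhere in this feed) and hypothetical class names and serial ports:</p>
<pre><code>import time
# Placeholder import: the M &amp; C arms ship with their own pymycobot classes,
# so the exact class names and constructor arguments may differ.
from pymycobot.mycobot import MyCobot

leader = MyCobot("/dev/ttyACM0", 115200)    # hypothetical port of the C-series controller arm
follower = MyCobot("/dev/ttyACM1", 115200)  # hypothetical port of the M-series follower arm

PERIOD = 1.0 / 50  # the arms stream joint data at 50 Hz

while True:
    angles = leader.get_angles()           # joint angles driven by the operator
    if angles:                             # skip frames where no reading came back
        follower.send_angles(angles, 100)  # mirror them on the follower arm
    time.sleep(PERIOD)
</code></pre>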
<p dir="auto">Collaborative Training of Datasets:<br />
Collaborative training (co-training) is a machine learning technique particularly suitable for multi-task learning and ensemble learning. The core idea is to use multiple different datasets or data perspectives to jointly train a model, thereby improving the model's performance on specific tasks.<br />
Here is a detailed introduction about the dataset:<br />
<a href="https://www.inceptivemind.com/meet-mobile-aloha-your-housekeeping-robot/36176/" target="_blank" rel="noopener noreferrer nofollow ugc">https://www.inceptivemind.com/meet-mobile-aloha-your-housekeeping-robot/36176/</a><br />
The entire ALOHA project is open source, so we can find out online how their team conducted the training. The specific steps are as follows:</p>
<p dir="auto">1.Initial Model Training: First, train the initial model using an existing static dataset. This model can perform dual-hand static tasks.</p>
<p dir="auto">2.Mobile Operation Data Collection: Then, collect demonstration data of the robot performing tasks in a mobile environment using the full-body remote control system.</p>
<p dir="auto">3.Joint Training: Combine static and dynamic datasets for training. By integrating data from both scenarios, the model can learn a broader range of task<br />
characteristics, improving its generalization ability and success rate on new tasks.</p>
<p dir="auto">The data involved includes tasks such as grasping and placing objects, tidying up the table, moving speed, opening and closing cabinet doors, camera images, LiDAR data, and IMU data.<br />
To meet the needs of most projects, the myArm M &amp; C series offers significant performance advantages, including high-frequency data transmission at a rate of 50 frames per second. It also has specialized software to collect relevant data, such as the height of objects, the height of the table, and the speed of movement.<br />
Additionally, the myArm is equipped with multiple sensors, such as cameras and IMUs, with the ultimate goal of acquiring precise datasets for training.</p>
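<p dir="auto">To make the dataset-collection step more concrete, here is a minimal, hypothetical logging loop that appends time-stamped joint angles and end coordinates to a CSV file for later training. It is not the specialized collection software mentioned above; it simply assumes the same pymycobot-style get_angles()/get_coords() calls and a placeholder serial port.</p>
<pre><code>import csv
import time
from pymycobot.mycobot import MyCobot  # placeholder class, as in the sketch above

arm = MyCobot("/dev/ttyACM0", 115200)  # hypothetical port

with open("demonstration_log.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["t", "j1", "j2", "j3", "j4", "j5", "j6",
                     "x", "y", "z", "rx", "ry", "rz"])
    t0 = time.time()
    while time.time() - t0 &lt; 30:           # record a 30-second demonstration
        angles = arm.get_angles()
        coords = arm.get_coords()
        if angles and coords:
            writer.writerow([round(time.time() - t0, 3), *angles, *coords])
        time.sleep(0.02)                   # ~50 Hz, matching the arms' output rate
</code></pre>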
<h2>Application Scenarios</h2>
<p dir="auto">The project mentioned above is just one example. Our robotic arm can do much more and can be used in conjunction with quadruped robots for tasks like intelligent garbage collection.</p>
<p dir="auto"><img src="/assets/uploads/files/1716194885706-ab2d7c90-37ea-4cc8-b006-984709a3d42e-image.png" alt="0_1716194885106_ab2d7c90-37ea-4cc8-b006-984709a3d42e-image.png" class=" img-fluid img-markdown" /><br />
<strong>Standalone Robotic Arm Applications:</strong></p>
<p dir="auto">Medical and Rehabilitation<br />
● Surgery Simulation: In medical training, the myArm M &amp; C can simulate surgical procedures, helping surgeons practice and train before actual operations.<br />
● Rehabilitation Assistance: It can assist in the rehabilitation of patients by facilitating hand and arm recovery exercises, with programmable different rehabilitation movement trajectories.<br />
<img src="/assets/uploads/files/1716194934832-9424ed1f-042d-40eb-997e-a72b007a5e63-image.png" alt="0_1716194934339_9424ed1f-042d-40eb-997e-a72b007a5e63-image.png" class=" img-fluid img-markdown" /><br />
Education and Training<br />
● Robotics Programming Courses: In schools and training institutions, the myArm M &amp; C can be used as a teaching tool, helping students learn robotic programming and control. Through programming tasks, students can master how to control the robot's movements and perform complex operations.<br />
● Laboratory Experiments: Used in university and research institution laboratories for the study of robotic arm kinematics and dynamics. Students can conduct precise motion control experiments and analyze the robot's performance.<br />
<img src="/assets/uploads/files/1716194991916-37eb8ccc-a02b-4c88-8a91-726bb3d0ff2b-image.png" alt="0_1716194991417_37eb8ccc-a02b-4c88-8a91-726bb3d0ff2b-image.png" class=" img-fluid img-markdown" /><br />
Combined Applications:</p>
<p dir="auto">Integration with Mobile Robots<br />
● Automated Warehouse System: Combine the myArm M &amp; C with mobile robots (such as Elephant Robotics mobile platforms) to create an automated warehouse system. The mobile robots handle the transportation of items, while the myArm handles the picking and placing, achieving efficient item management and sorting.<br />
● Logistics and Transportation: In logistics centers, the combination of the myArm and mobile platforms can automate the sorting and transportation of packages, improving logistics efficiency.<br />
● Sentinel Patrol: Paired with quadruped robots, it can perform patrol inspections in parks and clean up trash when detected.</p>
<h2>Summary</h2>
<p dir="auto">The myArm M &amp; C series robotic arms, with their high flexibility, precision, and modular design, demonstrate broad application potential in education, research, industry, and home environments. When used independently, these robotic arms excel in tasks such as precision assembly, quality inspection, educational training, and medical rehabilitation, showcasing their exceptional performance in complex operations. When combined with other robots, such as mobile robots or other robotic arms, the myArm M &amp; C can build efficient automated systems, enabling more complex task allocation and execution. Additionally, these robotic arms can be applied in artistic creation and smart home setups, further expanding their use cases and creative possibilities.</p>
]]></description><link>https://community.m5stack.com/topic/6488/myarm-m-c-advanced-solutions-for-diverse-robotic-applications</link><guid isPermaLink="true">https://community.m5stack.com/topic/6488/myarm-m-c-advanced-solutions-for-diverse-robotic-applications</guid><dc:creator><![CDATA[ElephantRobotics]]></dc:creator><pubDate>Invalid Date</pubDate></item><item><title><![CDATA[Transforming Camera Angles with Robotic Arm Technology]]></title><description><![CDATA[<h2>Introduction</h2>
<p dir="auto">Have you ever encountered a situation like this: when you're ready to shoot a video with your smartphone or camera, you've already envisioned the perfect shot in your mind, but due to practical limitations, you can't capture the angle you desire? This situation can be frustrating. For example, if you want to shoot from the perspective of an ant on the ground, the lens needs to be on the same level as the ant, which is not only difficult in practice but often hard to achieve.</p>
<p dir="auto">Although there are many stabilizing devices on the market, such as gimbal stabilizers, to assist in achieving steady and multi-angle shooting effects, in this article, I will explore a unique solution: mounting the smartphone on the end of a robotic arm to shoot, aiming to capture those special viewpoints that traditional methods struggle to grasp. This attempt is not only designed to overcome the physical limitations encountered during the shooting process but also hopes to innovate technologically to expand our imagination and practice of photography angles.<br />
<a href="https://www.youtube.com/watch?v=xXq9_Nu9djs" target="_blank" rel="noopener noreferrer nofollow ugc">https://www.youtube.com/watch?v=xXq9_Nu9djs</a><br />
Equipment<br />
myCobot 320 M5stack<br />
The myCobot 320, a collaborative robot arm with six degrees of freedom, has become a highlight in its field thanks to its unique design and high-precision servo motors. This robotic arm has a maximum working radius of 350mm and a maximum end load capacity of 1000g, making it suitable for a wide range of application scenarios. The myCobot 320 not only supports flexible visual development applications but also provides in-depth analysis of mechanical motion principles, offering users 12 standard 24V industrial IO interfaces to meet various development needs.<br />
<img src="/assets/uploads/files/1712828816476-d37047f6-4a08-44d7-83cf-004386a82534-image-resized.png" alt="0_1712828807002_d37047f6-4a08-44d7-83cf-004386a82534-image.png" class=" img-fluid img-markdown" /><br />
Its openness is extremely high, compatible with most mainstream operating systems and programming languages, including Python and ROS, offering developers a great deal of flexibility and freedom. Whether in education, research and development, or industrial applications, the myCobot 320 provides strong support, making innovation and application development more convenient and efficient.</p>
<h2>myCobot Pro Phone Holder</h2>
<p dir="auto">It can be mounted on the end of the myCobot 320 and myCobot Pro 630 robotic arms, capable of securely holding a smartphone in place.<br />
<img src="/assets/uploads/files/1712828831755-8669db63-2b0c-4195-87c5-dd2d31eed82e-image.png" alt="0_1712828822428_8669db63-2b0c-4195-87c5-dd2d31eed82e-image.png" class=" img-fluid img-markdown" /><br />
<img src="/assets/uploads/files/1712828835301-60971631-9d33-400c-ab15-01161b1d1879-image-resized.png" alt="0_1712828826021_60971631-9d33-400c-ab15-01161b1d1879-image.png" class=" img-fluid img-markdown" /></p>
<h2>Initial Attempt</h2>
<p dir="auto">Installing the Phone Holder<br />
<img src="/assets/uploads/files/1712828848291-d8bd02fe-863b-4171-ae1b-c2b9592f44be-image.png" alt="0_1712828839009_d8bd02fe-863b-4171-ae1b-c2b9592f44be-image.png" class=" img-fluid img-markdown" /><br />
<a href="https://youtu.be/QvcSwkRnooY" target="_blank" rel="noopener noreferrer nofollow ugc">https://youtu.be/QvcSwkRnooY</a></p>
<h2>Shooting Video</h2>
<p dir="auto">Programming joint control for the robotic arm, let's see how it performs.<br />
<a href="https://youtu.be/b-FiXHTFsrw" target="_blank" rel="noopener noreferrer nofollow ugc">https://youtu.be/b-FiXHTFsrw</a><br />
Although the robotic arm's movement trajectory shown in the video appears simple, in reality, adjusting these trajectory points still requires a considerable amount of time, and the results may not always be ideal. Therefore, I am considering whether there is a better solution, such as setting several motion modes and pre-planning the robotic arm's shooting path. This would not only make effective use of the robotic arm to assist in shooting but also provide a more efficient deployment method.</p>
<h2>Programming Challenge</h2>
<p dir="auto">Development Needs Analysis<br />
The equipment to be used is as follows:<br />
<img src="/assets/uploads/files/1712829032991-3c3440ba-7ddd-4feb-af49-edca1a0bf561-image-resized.png" alt="0_1712829023682_3c3440ba-7ddd-4feb-af49-edca1a0bf561-image.png" class=" img-fluid img-markdown" /><br />
Requirements Optimization:<br />
The goal is to design a series of innovative video shooting methods for static objects. The initial plan adopts three approaches:</p>
<ol>
<li>Using the smartphone fixed at the end of the robotic arm to achieve 360° panoramic video shooting of the object.</li>
<li>Creating a visual effect that smoothly advances from a distance to close to the object, simulating a "zoom-in" effect.</li>
<li>Achieving rapid rotation and movement at the end of the robotic arm to capture dynamic scenes.</li>
</ol>
<p dir="auto">To precisely control the shooting process, the plan is to utilize OpenCV machine vision algorithms and the AVFoundation iOS framework, controlling the exact movements of the robotic arm with Python scripts. We will identify the dimensions of the object through the smartphone camera, thereby calculating the ideal distance between the end of the robotic arm and the object. Based on this distance, we will design a corresponding robotic arm movement algorithm to ensure the best visual effects during the shooting process.<br />
<img src="/assets/uploads/files/1712829052588-48693360-4537-4a64-8b99-81b3b8a1a9ac-image-resized.png" alt="0_1712829043327_48693360-4537-4a64-8b99-81b3b8a1a9ac-image.png" class=" img-fluid img-markdown" /></p>
<h2>YOLOv5 Vision Algorithm</h2>
<p dir="auto">To save time, we will not train the machine vision algorithm ourselves to recognize specific objects. Instead, we will directly use the YOLOv5 library, which has been trained and optimized by other developers, to achieve accurate detection of target objects.</p>
<pre><code>import cv2
import random
import torch
from models.experimental import attempt_load
from utils.general import non_max_suppression, scale_coords
from utils.torch_utils import select_device, time_synchronized

def detect_apples(img_path):
    device = select_device('')
    weights = 'yolov5s.pt'
    model = attempt_load(weights, map_location=device)
    img0 = cv2.imread(img_path)  # BGR
    img = img0[:, :, ::-1].copy()  # BGR to RGB (copy so the array is contiguous for torch)
    img = torch.from_numpy(img).to(device)
    img = img.permute(2, 0, 1).float()  # HWC to CHW, uint8 to fp32
    img /= 255.0  # 0 - 255 to 0.0 - 1.0
    if img.ndimension() == 3:
        img = img.unsqueeze(0)  # add batch dimension

    # Inference
    t1 = time_synchronized()
    pred = model(img)[0]
    # Apply NMS
    pred = non_max_suppression(pred, 0.4, 0.5, classes=None, agnostic=False)
    t2 = time_synchronized()
    print(f'Inference time: {(t2 - t1):.3f}s')

    # Process detections
    for i, det in enumerate(pred):  # detections per image
        gn = torch.tensor(img0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
        if len(det):
            det[:, :4] = scale_coords(img.shape[2:], det[:, :4], img0.shape).round()
            for *xyxy, conf, cls in reversed(det):
                label = f'{model.names[int(cls)]} {conf:.2f}'
                plot_one_box(xyxy, img0, label=label, color=(255, 0, 0))
    return img0

def plot_one_box(xyxy, img, color=None, label=None, line_thickness=None):
    # Plots one bounding box on image img
    tl = (
        line_thickness or round(0.002 * max(img.shape[0:2])) + 1
    )  # line/font thickness
    color = color or [random.randint(0, 255) for _ in range(3)]
    c1, c2 = (int(xyxy[0]), int(xyxy[1])), (int(xyxy[2]), int(xyxy[3]))
    cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(
            img, c1, c2, color, -1, cv2.LINE_AA
        )  # filled
        cv2.putText(
            img,
            label,
            (c1[0], c1[1] - 2),
            0,
            tl / 3,
            [225, 255, 255],
            thickness=tf,
            lineType=cv2.LINE_AA,
        )
    return img

if __name__ == '__main__':
    img_path = 'apple.jpg'  
    result_img = detect_apples(img_path)
    cv2.imshow('Result', result_img)
    cv2.waitKey(0)
    cv2.destroyAllWindows()
</code></pre>
<p dir="auto"><img src="/assets/uploads/files/1712829086049-2d0ae8c5-3a45-425e-bc32-19dee707a30a-image-resized.png" alt="0_1712829076653_2d0ae8c5-3a45-425e-bc32-19dee707a30a-image.png" class=" img-fluid img-markdown" /><br />
Robotic Arm Motion Control Algorithm<br />
Methods for controlling the robotic arm:</p>
<pre><code># This method controls the movement of the robotic arm's joints.
mc.send_angles([angle_list], speed)

# This method uses coordinates to control the robotic arm's movement in space.
mc.send_coords([coords_list], speed, mode)

# Example:
mc.send_angles([0, 0, 0, 0, 0, 0], 100)
time.sleep(2)
mc.send_coords([(-3.6), 30.32, (-45.79), (-46.84), 97.38, 0.35], 100, 1)
</code></pre>
<p dir="auto">After obtaining the dimensions of the object, define a reasonable position for the end of the robotic arm relative to the object.</p>
<pre><code>def calculate_angles_for_distance(distance):
    #Calculate the joint angles of the robotic arm based on the ideal distance.
    # The calculations here need to be adjusted based on actual conditions and physical parameters
    return [0, -10, distance * 0.1, 0, 30, 0]  

def calculate_adjusted_angles(action_angles):
    # Calculate new angles based on the coordinate adjustments that may be needed after a specific action
    # This is just an example, and the specific logic should be adjusted as needed.
    return [angle * 1.1 for angle in action_angles]  
</code></pre>
<p dir="auto">Then select the motion control corresponding to the chosen mode.</p>
<pre><code># For 360° panoramic shooting:
# Define the specific shooting mode
def shoot_mode_360(ideal_distance):
    print("execute 360")
    # First, move to the ideal shooting position.
    move_to_ideal_position(ideal_distance)
    # Handle the ideal distance.
    ideal_ratio = ratio  # ratio is assumed to be derived from ideal_distance (omitted in this sketch)
    # Perform the specific actions for 360° panoramic shooting.
    mc.send_angles([0, 0, 0, 0, 0, 0], speed=15)
    time.sleep(1)
    mc.send_coords([coord * ratio for coord in coords_list], 15)
    time.sleep(1)
    mc.send_coords([coord * ratio for coord in coords_list], 15)
    time.sleep(1)
    mc.send_coords([coord * ratio for coord in coords_list], 15)
</code></pre>
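<p dir="auto">The 360° routine above is only one of the three planned modes. A comparable sketch for the "zoom-in" mode (advancing smoothly toward the object) might look like the following; it is not taken from the project code, and the step size, speed, and starting pose are assumptions.</p>
<pre><code># Sketch of the "zoom-in" shooting mode: dolly the end of the arm toward the
# subject in small linear steps while keeping the camera orientation fixed.
def shoot_mode_zoom_in(start_coords, step_mm=10, steps=15):
    # Move to the far starting position first.
    mc.send_coords(start_coords, 20, 1)
    time.sleep(2)
    coords = list(start_coords)
    for _ in range(steps):
        coords[0] += step_mm           # advance along X toward the subject
        mc.send_coords(coords, 10, 1)  # mode 1: linear (straight-line) motion
        time.sleep(0.5)
</code></pre>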
<h2>Invoking the Smartphone Camera</h2>
<p dir="auto">In the development process, as I attempted to invoke the smartphone camera interface to realize the automatic shooting function, I encountered a series of challenges. As my first deep exploration into the AVFoundation iOS framework, with the goal of activating and controlling the smartphone's camera, I found that I had not yet fully succeeded in implementing this function. The current difficulties mainly focus on how to accurately call the camera for video shooting and how to compensate for possible image stretching during the shooting process through software adjustments, which requires precise control over the movement of the robotic arm. These issues mark the direction of my subsequent research and require me to continue delving into the use of the AVFoundation framework, especially its specific methods for controlling the camera, and explore how to integrate these controls into the adjustment of the robotic arm's movement to ensure the video quality meets expectations.</p>
<h2>Summary</h2>
<p dir="auto">As this project record comes to an end, I realize that although there are many areas for improvement, attempting to make two independent devices work together under different frameworks has been a valuable experience for me. Indeed, the entire project has not yet reached the ideal state in my mind. However, I believe the direction explored by this project has great potential, considering that there are already professional photography robotic arms on the market capable of producing impressive results, which reinforces my belief in the potential value of the project.<br />
The application of robotic arms has already permeated our daily lives, playing an increasingly important role whether in industrial production, daily services, or in the field of artistic creation. With the continuous advancement and popularization of artificial intelligence technology, the combination of AI and robotics will undoubtedly become an important trend in the future development of technology. I have great expectations for the future development of robotics technology and believe that they will display amazing capabilities and creativity in more fields in the future.<br />
If you are interested in my project, or have any ideas and suggestions, I warmly welcome your communication. Your feedback will provide valuable help for me to continue improving and perfecting this project.</p>
]]></description><link>https://community.m5stack.com/topic/6318/transforming-camera-angles-with-robotic-arm-technology</link><guid isPermaLink="true">https://community.m5stack.com/topic/6318/transforming-camera-angles-with-robotic-arm-technology</guid><dc:creator><![CDATA[ElephantRobotics]]></dc:creator><pubDate>Invalid Date</pubDate></item><item><title><![CDATA[Building AIRobot Leveraging ChatGPT for Smarter Robots myCobot 280 M5Stack-Basic]]></title><description><![CDATA[<p dir="auto">As the control core of the embedded robotic arm, M5stack-basic is very useful and helps a lot！</p>
]]></description><link>https://community.m5stack.com/topic/6283/building-airobot-leveraging-chatgpt-for-smarter-robots-mycobot-280-m5stack-basic</link><guid isPermaLink="true">https://community.m5stack.com/topic/6283/building-airobot-leveraging-chatgpt-for-smarter-robots-mycobot-280-m5stack-basic</guid><dc:creator><![CDATA[ElephantRobotics]]></dc:creator><pubDate>Invalid Date</pubDate></item><item><title><![CDATA[EchoActions: Connecting thoughts and Robotic Arm M5Stack]]></title><description><![CDATA[<p dir="auto"><em>This article is edited and reprinted with authorization from the author, kimizuka.<br />
Original article link: <a href="https://blog.kimizuka.org/entry/2024/02/01/141808" target="_blank" rel="noopener noreferrer nofollow ugc">https://blog.kimizuka.org/entry/2024/02/01/141808</a></em><br />
<img src="https://community.robotshop.com/forum/uploads/default/original/3X/0/5/05df83ba6703369f30333ac96c2b484afe279f24.gif" alt="alt text" class=" img-fluid img-markdown" /></p>
<h2>Introduction</h2>
<p dir="auto">Translate AirPods through the iPhone app to Express, then through python-shell to pymycobot, and finally synchronize with myCobot to synchronize the rotation angle of AirPods with the posture of myCobot 🤖.<br />
<img src="/assets/uploads/files/1708939784052-503ad5a1-9d02-4904-abc0-e02b369867ae-image-resized.png" alt="0_1708939783546_503ad5a1-9d02-4904-abc0-e02b369867ae-image.png" class=" img-fluid img-markdown" /><br />
Although it's unclear if there's a demand, I will provide a rough outline of the source code.</p>
<h2>Project Structure</h2>
<p dir="auto">This project primarily employs technologies such as headphone-motion, web servers, the express framework of node.js, python-shell, and pymycobot. Here is a brief introduction to these technologies:<br />
<strong>1. Headphone Motion:</strong> Headphone Motion is a project that tracks and makes use of the user's head movements. Although the specifics depend on the implementation and platform (such as iOS), it mainly demonstrates how to capture head-movement data from the device's headphones, especially smart headphones with built-in sensors. A more intuitive demonstration can be seen in the Headphone Motion Unity Plugin made by GitHub user anastasiadeana. It can track the user's head movements in real time, including tilting and rotating, which is the core technology of this project.<br />
<img src="https://community.robotshop.com/forum/uploads/default/original/3X/1/f/1f6f76ad65017b7dffd2609f2bc633ec444ef3bc.gif" alt="alt text" class=" img-fluid img-markdown" /><br />
<strong><a href="http://2.Web" target="_blank" rel="noopener noreferrer nofollow ugc">2.Web</a> Server:</strong> There are many types of servers, which provide data, services, or applications to other applications or devices. Servers perform tasks such as processing data requests, hosting websites, storing information, running business applications, and so on. In this project, the web server mainly handles receiving head movement data from the iOS application and passing these data to the script controlling the mycobot robot arm.</p>
<p dir="auto"><strong>3.Express-Node.js:</strong> Express is a fast, open, minimalistic web application framework for Node.js. It is designed for building web applications and APIs. It allows developers to set up middleware to respond to HTTP requests in a very fast and simple way, making the development of web applications quicker and easier.<br />
<a href="https://github.com/expressjs/express" target="_blank" rel="noopener noreferrer nofollow ugc">https://github.com/expressjs/express</a></p>
<p dir="auto"><strong>4.Pymycobot-Python:</strong> pymycobot is a Python library specifically designed for controlling the myCobot robot arm. This library provides a series of functions and interfaces that allow developers to communicate and control the myCobot robot arm directly through Python scripts. Using pymycobot, developers can write code to control the robot arm's movements, adjust its posture, execute preset action sequences, etc., making it widely applicable in education, research, automation, and more.</p>
<h2>iOS App</h2>
<p dir="auto">This application is based on an app I created previously, utilizing react-native-headphone-motion to access the sensors in AirPods through a React Native iOS application.</p>
<p dir="auto">It's an interesting project where, when you are detected bending your head down for a long time while wearing AirPods, a lamenting sound will remind you.<br />
<a href="https://twitter.com/i/status/1745588902981931443" target="_blank" rel="noopener noreferrer nofollow ugc">https://twitter.com/i/status/1745588902981931443</a><br />
<a href="https://blog.kimizuka.org/entry/2024/01/16/105208" target="_blank" rel="noopener noreferrer nofollow ugc">https://blog.kimizuka.org/entry/2024/01/16/105208</a></p>
<p dir="auto">One point to note is the change I made: I added a process to send POST requests to the web server in onDeviceMotionUpdates. Additionally, to avoid overburdening the server with POST requests on every update, I set it up to send a request at a minimum interval of 500ms.</p>
<p dir="auto">App.tsx</p>
<pre><code>useEffect(() =&gt; {
  const delay = 500;
  const handleDeviceMotionUpdates = onDeviceMotionUpdates((data) =&gt; {
     // If the time since the last request is less than 500ms, return
    if (Date.now() - lastUpdateTimeRef.current &lt; delay) {
      return;
    }

    // Post sensor values to the Web server
    axios.post(String(process.env.API_URL), { 
      pitch: data.attitude.pitchDeg || 0,
      roll: data.attitude.rollDeg || 0,
      yaw: data.attitude.yawDeg || 0
    }).then(() =&gt; {
      lastUpdateTimeRef.current = Date.now();
    }).catch((err) =&gt; {
      console.error(err);
      lastUpdateTimeRef.current = Date.now();
    });

    setPitch(data.attitude.pitch);
    setPitchDeg(data.attitude.pitchDeg);
    setRoll(data.attitude.roll);
    setRollDeg(data.attitude.rollDeg);
    setYaw(data.attitude.yaw);
    setYawDeg(data.attitude.yawDeg);
    setGravityX(data.gravity.x);
    setGravityY(data.gravity.y);
    setGravityZ(data.gravity.z);
    setRotationRateX(data.rotationRate.x);
    setRotationRateY(data.rotationRate.y);
    setRotationRateZ(data.rotationRate.z);
    setUserAccelerationX(data.userAcceleration.x);
    setUserAccelerationY(data.userAcceleration.y);
    setUserAccelerationZ(data.userAcceleration.z);
  });

  return () =&gt; {
    handleDeviceMotionUpdates.remove();
  };
}, []);

</code></pre>
<p dir="auto">In the POST request, I used axios, which can send asynchronous HTTP requests to REST endpoints and handle responses.<br />
Therefore, it's also necessary to add module imports.</p>
<pre><code>import axios from 'axios';
</code></pre>
<p dir="auto">Code</p>
<pre><code>import axios from 'axios'; // Added to simplify the POST request
import React, {
  useEffect,
  useRef, // Added to maintain a 500ms interval
  useState,
} from 'react';
import {Button, SafeAreaView, StyleSheet, Text} from 'react-native';
import {
  requestPermission,
  onDeviceMotionUpdates,
  startListenDeviceMotionUpdates,
  stopDeviceMotionUpdates,
} from 'react-native-headphone-motion';

const API_URL = 'http://localhost:3000'; // Fill in the URL to POST

export default function App() {
  const lastUpdateTimeRef = useRef&lt;number&gt;(0); // Added to keep track of the last update time
  const [pitch, setPitch] = useState(0);
  const [pitchDeg, setPitchDeg] = useState(0);
  const [roll, setRoll] = useState(0);
  const [rollDeg, setRollDeg] = useState(0);
  const [yaw, setYaw] = useState(0);
  const [yawDeg, setYawDeg] = useState(0);
  const [gravityX, setGravityX] = useState(0);
  const [gravityY, setGravityY] = useState(0);
  const [gravityZ, setGravityZ] = useState(0);
  const [rotationRateX, setRotationRateX] = useState(0);
  const [rotationRateY, setRotationRateY] = useState(0);
  const [rotationRateZ, setRotationRateZ] = useState(0);
  const [userAccelerationX, setUserAccelerationX] = useState(0);
  const [userAccelerationY, setUserAccelerationY] = useState(0);
  const [userAccelerationZ, setUserAccelerationZ] = useState(0);

  useEffect(() =&gt; {
    const delay = 500; // Store the update interval in a variable
    const handleDeviceMotionUpdates = onDeviceMotionUpdates(data =&gt; {
      if (Date.now() - lastUpdateTimeRef.current &lt; delay) {
        // Return if the update interval is not met
        return;
      }

      // Post sensor values to the Web server
      // Update lastUpdateTimeRef regardless of success or failure
      // For some reason, await was not used
      axios
        .post(String(API_URL), {
          pitch: data.attitude.pitchDeg || 0,
          roll: data.attitude.rollDeg || 0,
          yaw: data.attitude.yawDeg || 0,
        })
        .then(() =&gt; {
          lastUpdateTimeRef.current = Date.now();
        })
        .catch(err =&gt; {
          console.error(err);
          lastUpdateTimeRef.current = Date.now();
        });

      setPitch(data.attitude.pitch);
      setPitchDeg(data.attitude.pitchDeg);
      setRoll(data.attitude.roll);
      setRollDeg(data.attitude.rollDeg);
      setYaw(data.attitude.yaw);
      setYawDeg(data.attitude.yawDeg);
      setGravityX(data.gravity.x);
      setGravityY(data.gravity.y);
      setGravityZ(data.gravity.z);
      setRotationRateX(data.rotationRate.x);
      setRotationRateY(data.rotationRate.y);
      setRotationRateZ(data.rotationRate.z);
      setUserAccelerationX(data.userAcceleration.x);
      setUserAccelerationY(data.userAcceleration.y);
      setUserAccelerationZ(data.userAcceleration.z);
    });

    return () =&gt; {
      handleDeviceMotionUpdates.remove();
    };
  }, []);

  return (
    &lt;SafeAreaView style={styles.container}&gt;
      &lt;Button
        title={'requestPermission'}
        onPress={async () =&gt; {
          await requestPermission();
        }}
      /&gt;
      &lt;Button
        title={'startListenDeviceMotionUpdates'}
        onPress={async () =&gt; {
          await startListenDeviceMotionUpdates();
        }}
      /&gt;
      &lt;Button
        title={'stopDeviceMotionUpdates'}
        onPress={async () =&gt; {
          await stopDeviceMotionUpdates();
        }}
      /&gt;
      &lt;Text&gt;{lastUpdateTimeRef.current}&lt;/Text&gt;
      &lt;Text&gt;{`pitch: ${pitch}`}&lt;/Text&gt;
      &lt;Text&gt;{`pitchDeg: ${pitchDeg}`}&lt;/Text&gt;
      &lt;Text&gt;{`roll: ${roll}`}&lt;/Text&gt;
      &lt;Text&gt;{`rollDeg: ${rollDeg}`}&lt;/Text&gt;
      &lt;Text&gt;{`yaw: ${yaw}`}&lt;/Text&gt;
      &lt;Text&gt;{`yawDeg: ${yawDeg}`}&lt;/Text&gt;
      &lt;Text&gt;{`gravityX: ${gravityX}`}&lt;/Text&gt;
      &lt;Text&gt;{`gravityY: ${gravityY}`}&lt;/Text&gt;
      &lt;Text&gt;{`gravityZ: ${gravityZ}`}&lt;/Text&gt;
      &lt;Text&gt;{`rotationRateX: ${rotationRateX}`}&lt;/Text&gt;
      &lt;Text&gt;{`rotationRateY: ${rotationRateY}`}&lt;/Text&gt;
      &lt;Text&gt;{`rotationRateZ: ${rotationRateZ}`}&lt;/Text&gt;
      &lt;Text&gt;{`userAccelerationX: ${userAccelerationX}`}&lt;/Text&gt;
      &lt;Text&gt;{`userAccelerationY: ${userAccelerationY}`}&lt;/Text&gt;
      &lt;Text&gt;{`userAccelerationZ: ${userAccelerationZ}`}&lt;/Text&gt;
    &lt;/SafeAreaView&gt;
  );
}

const styles = StyleSheet.create({
  container: {
    flex: 1,
    alignItems: 'center',
    justifyContent: 'center',
    backgroundColor: 'white',
  },
});

</code></pre>
<p dir="auto">Actually, it would be more convenient to specify the API_URL directly in the application, but I implemented it this way considering speed.</p>
<h2>Web Server</h2>
<p dir="auto">I set up a local server on my Mac. To operate myCobot, I went through several setup steps, mainly to adapt to Mac computers. These steps included installing drivers for the robot arm, updating the firmware of mycobot 280, and other operations, all detailed in this article.<br />
<a href="https://blog.kimizuka.org/entry/2021/08/10/131812" target="_blank" rel="noopener noreferrer nofollow ugc">https://blog.kimizuka.org/entry/2021/08/10/131812</a><br />
I think creating a web server with Python would be smoother, but based on my skill set, using Node.js is the fastest method. So, I plan to quickly set up the server using Express. Communication with myCobot is conducted through Python, so for this part, I decided to use python-shell to implement it.</p>
<pre><code>require('dotenv').config(); // Used to pass the port of myCobot from outside
const express = require('express');
const { PythonShell } = require('python-shell'); // Used for communicating with myCobot
const app = express();
const http = require('http').Server(app);

const duration = 100; // If the delay (500ms) set on the application side is too small, it will cause problems

app.use(express.json());
app.post('/', (req, res) =&gt; {
  try {
    const angles = [0, 0, 0, 0, 0, 0];

    // For myCobot's joint information, refer to page 13 of https://www.elephantrobotics.com/wp-content/uploads/2021/03/myCobot-User-Mannul-EN-V20210318.pdf
    // The array stores the 6 joints in order from the bottom up
    // Each joint has a definite range of motion; ensure not to exceed this range    
    angles[0] = Math.max(-90, Math.min(req.body.yaw || 0, 90)); // J1
    angles[3] = Math.max(-90, Math.min(req.body.pitch || 0, 90)); // J4
    angles[5] = Math.max(-175, Math.min(req.body.roll || 0, 175)); // J6

    // myCobot connected via USB receives instructions from Python
    PythonShell.runString(
      `from pymycobot.mycobot import MyCobot; MyCobot('${ process.env.MY_COBOT_PORT }').send_angles([${ angles }], ${ duration })`,
      null,
      (err) =&gt; err &amp;&amp; console.error(err)
    );
  } catch (err) {
    console.error(err);
  }
  res.send(200);
});

try {
  const angles = [0, 0, 0, 0, 0, 0];

  // Resets posture at startup
  PythonShell.runString(
    `from pymycobot.mycobot import MyCobot; MyCobot('${ process.env.MY_COBOT_PORT }').send_angles([${ angles }], ${ duration })`,
    null,
    (err) =&gt; err &amp;&amp; console.error(err)
  );
} catch(err) {
  console.error(err);
}

http.listen(3000, '0.0.0.0');
</code></pre>
<p dir="auto">Because it's necessary to execute pymycobot through PythonShell, the pymycobot directory must be placed at the same level as app.js.<br />
<a href="https://github.com/elephantrobotics/pymycobot" target="_blank" rel="noopener noreferrer nofollow ugc">https://github.com/elephantrobotics/pymycobot</a><br />
Once everything is prepared and the PC is connected to myCobot,<br />
you can start the web server and pass the pitch, roll, and yaw values received through POST requests to myCobot.<br />
Although this time the sensor values of AirPods are sent from an iPhone application via POST, the source of the POST could be from anywhere. Therefore, I think setting up such a server could be potentially useful in the future.</p>
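<p dir="auto">Since the server only looks at the JSON body, any client can drive the arm. For example, a quick test from Python (with made-up pose values, assuming the server is listening on localhost:3000 as in the code above):</p>
<pre><code>import requests  # third-party package: pip install requests

# Send a single fake head pose to the Express server.
requests.post("http://localhost:3000/", json={"pitch": 20, "roll": -10, "yaw": 45})
</code></pre>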
<p dir="auto">source code:<br />
<a href="https://github.com/kimizuka/mycobot-express/tree/example/airpods" target="_blank" rel="noopener noreferrer nofollow ugc">https://github.com/kimizuka/mycobot-express/tree/example/airpods</a></p>
<h2>Summary</h2>
<p dir="auto">This project showcases the integration of headphone motion detection with robotic control, illustrating the potential of combining human movement data with robotics. By capturing head motion through smart headphones and translating it into commands for a robotic arm, it enhances human-robot interaction and introduces innovative applications in automation and assistive technologies. This approach opens up new avenues for intuitive control mechanisms in robotics, making technology more accessible and interactive.<br />
Finally, thanks again to Kimizuka for sharing this case with us. Feel free to explore further or share your insights with us.</p>
]]></description><link>https://community.m5stack.com/topic/6153/echoactions-connecting-thougths-and-robotic-arm-m5stack</link><guid isPermaLink="true">https://community.m5stack.com/topic/6153/echoactions-connecting-thougths-and-robotic-arm-m5stack</guid><dc:creator><![CDATA[ElephantRobotics]]></dc:creator><pubDate>Invalid Date</pubDate></item><item><title><![CDATA[Hand Gestures as the Remote: M5stack-Basic integrated robotic arm]]></title><description><![CDATA[<h2>Introduction</h2>
<p dir="auto">I am a freelancer specializing in machine learning and robotics technology. My passion began during a course in artificial intelligence in college, which inspired me to explore new methods of human-machine interaction. In particular, for the operation of robotic arms, I have always wanted to simplify their complexity to make them more intuitive and easier to use.<br />
The inspiration for this project stems from my love for innovative technology and the pursuit of improving the ways humans interact with machines. My goal is to develop a gesture-based robotic arm control system that allows non-professionals to operate it with ease. For this purpose, I chose Google's MediaPipe library for gesture recognition and used mycobot 320 m5 as the experimental platform.</p>
<h2>Technical Overview</h2>
<h3>Google MediaPipe</h3>
<p dir="auto">MediaPipe is an open-source cross-platform framework developed by Google, specifically designed for building various perception pipelines. This framework offers a wealth of tools and pre-built modules, enabling developers to easily build and deploy complex machine learning models and algorithms, especially in the field of image and video analysis.<br />
<img src="/assets/uploads/files/1705903584795-e88e975d-66a4-4fb1-ae66-a17af9a38989-image.png" alt="0_1705903584530_e88e975d-66a4-4fb1-ae66-a17af9a38989-image.png" class=" img-fluid img-markdown" /><br />
A notable feature of MediaPipe is its support for real-time gesture and facial recognition. It can efficiently process video streams and identify and track human gestures and facial features in real-time. This capability makes it incredibly useful in interactive applications, augmented reality (AR), virtual reality (VR), and robotics.</p>
<p dir="auto">You can try the gesture recognition online feature without needing to install anything.<br />
<a href="https://mediapipe-studio.webapps.google.com/home" target="_blank" rel="noopener noreferrer nofollow ugc">MediaPipe Studio</a><br />
Its easy-to-use API and comprehensive documentation make it easier to integrate this framework, making it very suitable for use in the fields of machine learning and computer vision.</p>
<h3>pymycobot</h3>
<p dir="auto">pymycobot is a Python API for serial communication and control of the mycobot robotic arm. This library is designed to facilitate developers in controlling the mycobot robotic arm using the Python language. It offers a series of functions and commands that allow users to control the movements and behavior of the robotic arm through programming. For example, users can use the library to get the angles of the robotic arm, send angle commands to control the movement of the arm, or get and send the coordinates of the robotic arm.<br />
The only requirement for using this library is that it must be paired with the myCobot series of robotic arms, for which it is specifically adapted.</p>
<h2>Product Introduction</h2>
<h3>myCobot 320 M5stack</h3>
<p dir="auto">The myCobot 320 M5 is a six-axis collaborative robotic arm developed by Elephant Robotics for users. It has a working radius of 350mm and a maximum load capacity of 1000g. The robotic arm is suitable for an open ROS simulation development environment and includes forward and inverse kinematics algorithms. It supports multiple programming languages, including Python, C++, Arduino, C#, and JavaScript, and is compatible with Android, Windows, Mac OSX, and Linux platforms. The versatility of the myCobot 320 M5 makes it suitable for a variety of development and integration applications.<img src="/assets/uploads/files/1705903643038-fa0820e5-486e-4975-aa08-f37d18f38009-image.png" alt="0_1705903642794_fa0820e5-486e-4975-aa08-f37d18f38009-image.png" class=" img-fluid img-markdown" /></p>
<h2>2D Camera</h2>
<p dir="auto">A 2D camera that can be mounted on the end of the mycobot320, communicating via a USB data cable. It can present the view seen from the end of the robotic arm.<br />
<img src="/assets/uploads/files/1705903657861-05668d04-0b5d-4509-b48a-e9b735cb2e45-image.png" alt="0_1705903657612_05668d04-0b5d-4509-b48a-e9b735cb2e45-image.png" class=" img-fluid img-markdown" /></p>
<h2>Development Process</h2>
<h3>Project Architecture</h3>
<p dir="auto">I have divided this project primarily into three functional modules:<br />
Gesture Recognition: This module is mainly used for the recognition of gestures, capable of returning information about what the gesture is, such as a thumbs-up, etc.<br />
Robotic Arm Control: This module handles the motion control of the robotic arm, including coordinate control, angle control, and so on.<br />
Program Logic: This is used to handle the logic of the program's operation, setting confirmation times for gestures, resetting recognition times, etc. These will be detailed further in subsequent sections.<img src="/assets/uploads/files/1705903676515-f030fd21-5f03-4406-b5c2-0cc418f9517e-image.png" alt="0_1705903676298_f030fd21-5f03-4406-b5c2-0cc418f9517e-image.png" class=" img-fluid img-markdown" /></p>
<h2>Compilation Environment</h2>
<p dir="auto">Operating System: Windows 11<br />
Programming Language: Python 3.9+<br />
Libraries: OpenCV, pymycobot, MediaPipe, time</p>
<h2>Gesture Recognition</h2>
<p dir="auto">To perform gesture recognition, we first need to obtain a camera image. Here, we use the OpenCV library to access the camera feed.</p>
<pre><code>
import cv2

# Get camera stream, default camera - 0, external cameras in order - 1, 2, 3
cap = cv2.VideoCapture(1)

# Continuously acquire camera footage
while cap.isOpened():
    # Get the current image frame
    ret, frame = cap.read()
    if not ret:
        break
    # Convert BGR image to RGB (MediaPipe expects RGB input)
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # Display the frame on the computer
    cv2.imshow('gesture control', frame)
    # Press the 'q' key to exit to avoid an infinite loop
    if cv2.waitKey(1) &amp; 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
</code></pre>
<p dir="auto"><img src="/assets/uploads/files/1705908927203-f6808bb2-e2f7-4c63-9a1a-314c498aa323-image.png" alt="0_1705908925777_f6808bb2-e2f7-4c63-9a1a-314c498aa323-image.png" class=" img-fluid img-markdown" /><br />
With this, the image capture from the camera is successful. Next, we use MediaPipe for gesture recognition.</p>
<pre><code>import mediapipe as mp

# Initialize the MediaPipe Hands module
mp_hands = mp.solutions.hands
hands = mp_hands.Hands()
mp_draw = mp.solutions.drawing_utils


# Process the image and detect hands
result = hands.process(rgb_frame)

if result.multi_hand_landmarks:
    for hand_landmarks in result.multi_hand_landmarks:
        mp_draw.draw_landmarks(frame, hand_landmarks, mp_hands.HAND_CONNECTIONS)
</code></pre>
<p dir="auto"><img src="/assets/uploads/files/1705908950913-048861da-f25e-4b6b-8a24-045e6addff45-image.png" alt="0_1705908950612_048861da-f25e-4b6b-8a24-045e6addff45-image.png" class=" img-fluid img-markdown" /><br />
The output after recognizing a gesture is precise in identifying each joint on the hand and names each joint point. MediaPipe Hands provides 21 key points (landmarks) for the hand, collectively depicting the structure of the hand, including the wrist and the joints of each finger. Taking the thumb as an example, there are four joints, which from bottom to top are CMC, MCP, IP, TIP.</p>
<pre><code>cmc: Carpometacarpal Joint
mcp: Metacarpophalangeal Joint
ip: Interphalangeal Joint
tip: Fingertip
</code></pre>
<p dir="auto">Having these landmarks alone is not enough; we need to set a method to recognize specific gestures. For example, if we want to recognize a thumbs-up gesture, we analyze that during a thumbs-up, the tip of the thumb is at the highest point above the entire palm. This makes it much easier. As long as we determine that the tip of the thumb is higher than the tips of all other fingers in the image, then the gesture is identified as a thumbs-up. (Other methods of analysis can also be used.)<br />
Generally, we can obtain three attributes for each joint: X, Y, Z, representing the position of that joint in the image.</p>
<pre><code># Get the attributes of the thumb tip
thumb_tip = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_TIP]

# Get the height of the thumb tip (y grows downward in image coordinates)
thumb_tip.y

# Determine thumbs up gesture
def is_thumb_up(hand_landmarks):
    thumb_tip = hand_landmarks.landmark[mp_hands.HandLandmark.THUMB_TIP]
    index_tip = hand_landmarks.landmark[mp_hands.HandLandmark.INDEX_FINGER_TIP]
    # Determine which joint is higher.
    if thumb_tip.y &lt; index_tip.y:
        return True

    return False
</code></pre>
<p dir="auto">If you want other gestures, you can also set a special identification method based on the characteristics of the hand shape. At this point, gesture recognition is completed.</p>
<h3>Robotic Arm Motion Control</h3>
<p dir="auto">Initially, my idea was that when the camera recognizes a gesture, it would send a control command to the robotic arm. Let's start with a simple action, setting the robotic arm to perform a nodding motion.<br />
The pymycobot library offers many functions that are very convenient for controlling the robotic arm.</p>
<pre><code>from pymycobot.mycobot import MyCobot
import time
# connect robot arm
mc = MyCobot(port, baud)

#Control the movement of the robotic arm using angles
mc.send_angles([angles_list],speed)

#Control the movement of the robotic arm using coordinates
mc.send_coords([coords_list],speed,mode)

# Nodding action
def ThumpUpAction(self):
        self.mc.send_angles([0.96, 86.22, -98.26, 10.54, 86.92, -2.37], 60)
        time.sleep(1.5)
        for count in range(3):
            self.mc.send_angles([0.79, 2.46, (-8.17), 4.3, 88.94, 0.26], 70)
            time.sleep(1)
            self.mc.send_angles([(-3.6), 30.32, (-45.79), (-46.84), 97.38, 0.35], 70)
            time.sleep(1)
        self.mc.send_angles([0.79, 2.46, (-8.17), 4.3, 88.94, 0.26], 70)
        time.sleep(1)
        self.mc.send_angles([0.96, 86.22, -98.26, 10.54, 86.92, -2.37], 60)
</code></pre>
<p dir="auto">To enhance the readability and modifiability of the overall code, it's beneficial to create a robotic arm class for easy calling and modification.</p>
<pre><code>class RobotArmController:

    def __init__(self, port):
        self.mc = MyCobot(port, 115200)
        self.init_pose = [0.96, 86.22, -98.26, 10.54, 86.92, -2.37]
        self.coords = [-40, -92.5, 392.7, -92.19, -1.91, -94.14]
        self.speed = 60
        self.mode = 0

    def ThumpUpAction(self):
        ...

    def OtherAction(self):
        ...
</code></pre>
<h2>Program Logic Processing</h2>
<p dir="auto">During debugging, some issues arose. When recognizing gestures, continuous recognition meant that if a gesture was recognized 10 times in 1 second, 10 commands would be sent to the robotic arm. This was not what I initially envisioned.<br />
Therefore, logical adjustments were needed. Here's how I addressed it:</p>
<pre><code># Set a 2-second timeframe to confirm the gesture. Only when a thumbs-up gesture is maintained for 2 seconds, the command to control the robotic arm is issued, using a control variable approach.

#init
#Variable to detect whether gesture exists
gesture_detected = False
#Variable that determines the timing after the gesture appears
gesture_start_time = None
# Set the variable 2s after the gesture appears
gesture_confirmation_time = 2

# When a specific gesture appears, gesture_start_time begins to count. During this period, continuous checks are made. If 2 seconds have passed, the gesture is confirmed, and then the corresponding robotic arm movement for that gesture is executed.

current_time = time.time()
if current_gesture:
    if not gesture_detected:
        gesture_detected = True
        gesture_start_time = current_time
    elif current_time - gesture_start_time &gt; gesture_confirmation_time and not action_triggered:
        if current_gesture == "thumb_up":
            robot_arm_action()  # placeholder for the corresponding robotic arm movement
</code></pre>
<p dir="auto">However, this is still not sufficient, as the hand maintaining the gesture for over 2 seconds would continue sending commands to the robotic arm. Here, we need to set a cooldown period to allow sufficient time for the robotic arm to complete its movement.</p>
<pre><code>action_triggered = False
cooldown_start_time = None
cooldown_period = 2

# process gesture
current_time = time.time()
if current_gesture:
    if not gesture_detected:
        gesture_detected = True
        gesture_start_time = current_time
    elif current_time - gesture_start_time &gt; gesture_confirmation_time and not action_triggered:
        # Perform corresponding actions based on gestures
        if current_gesture == "thumb_up":
            print('good good')
            mc.thum_up()
        elif current_gesture == "palm_open":
            print('forward')
            mc.increment_x_and_send()
        # You can add more gestures and corresponding action judgments
        action_triggered = True
        cooldown_start_time = current_time
else:
    gesture_detected = False
    gesture_start_time = None
    if action_triggered and current_time - cooldown_start_time &gt; cooldown_period:
        print('can continue')
        action_triggered = False
        cooldown_start_time = None

</code></pre>
<h2>Video</h2>
<p dir="auto"><a href="https://youtu.be/9vOPKO_IG9M" target="_blank" rel="noopener noreferrer nofollow ugc">https://youtu.be/9vOPKO_IG9M</a><br />
Summary<br />
This project demonstrates a method of using gesture recognition to control the myCobot 320, creating a new form of human-machine interaction. Although currently only a limited number of gestures and corresponding robotic arm movements have been implemented, it lays the groundwork for broader applications of robotic arms in the future. The innovative attempt to combine gestures with robotic arm control has not only improved my programming skills but also enhanced my problem-solving abilities, providing valuable experience for future related projects.</p>
]]></description><link>https://community.m5stack.com/topic/6004/hand-gestures-as-the-remote-m5stack-basic-integrated-robotic-arm</link><guid isPermaLink="true">https://community.m5stack.com/topic/6004/hand-gestures-as-the-remote-m5stack-basic-integrated-robotic-arm</guid><dc:creator><![CDATA[ElephantRobotics]]></dc:creator><pubDate>Invalid Date</pubDate></item><item><title><![CDATA[AI-Enhanced Robotics, GPT-4V and myCobot in Action]]></title><description><![CDATA[<p dir="auto"><em>This article on the mylangrobot project was created by neka-nat, and the author Shirokuma has authorized its editing and reproduction. A big shout out to neka-nat:)<br />
<a href="https://twitter.com/neka_nat" target="_blank" rel="noopener noreferrer nofollow ugc">https://twitter.com/neka_nat</a></em></p>
<p dir="auto"><img src="https://hackster.imgix.net/uploads/attachments/1671536/image_6gzeG2cNR1.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<h2>Introduction</h2>
<p dir="auto">This project demonstrates a simple pick-and-place operation using a robotic arm, employing GPT-4V and myCobot. The demonstration uses a method named SoM (Object Detection Method) to generate robotic movements through natural language. In simpler terms, the machine receives natural language instructions, locates the target, and then commands the robotic arm to perform the grabbing task. The highlight of this project is the combination of GPT-4V's image processing and the SoM object detection algorithm, enabling the robotic arm to move through natural language interaction with the machine.</p>
<h2>Software</h2>
<h3>SoM</h3>
<p dir="auto">Set of Mark (SoM) is a method designed to enhance the visual understanding capabilities of large language models. After an image is processed with SoM, a series of markers are added to it. These markers can be recognized and processed by language models, aiding in more accurate identification and understanding of objects and content within the image. This enhancement allows language models to conduct more precise analysis and descriptions of elements in images, thereby improving their performance in visual tasks.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1671537/_bwKXYoSNej.blob?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<h3>GPT-4V</h3>
<p dir="auto">The commonly discussed GPT is a large predictive model with which we can have conversational interactions. The new iteration, GPT-4V, is a multimodal large language model that not only processes textual information but now also handles image information. It is capable of image understanding, image generation, and image description. This large model combines the powerful natural language processing abilities of GPT-4 with current image analysis technologies, offering more efficient and accurate integrated capabilities in vision and language.<br />
Below is <a href="https://cookbook.openai.com/examples/gpt_with_vision_for_video_understanding" target="_blank" rel="noopener noreferrer nofollow ugc">an example provided by OpenAI:</a></p>
<pre><code>User:
"These are frames from a video that I want to upload. Generate a compelling description that I can upload along with the video."
</code></pre>
<p dir="auto"><img src="https://hackster.imgix.net/uploads/attachments/1671538/image_ImW1wSoFMn.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<pre><code>#GPT-4V Image description processing
GPT4:
"🐺 Survival of the Fittest: An Epic Tale in the Snow ❄️ - Witness the intense drama of nature as a pack of wolves face off against mighty bison in a harsh winter landscape. This raw footage captures the essence of the wild where every creature fights for survival. With each frame, experience the tension, the strategy, and the sheer force exerted in this life-or-death struggle. See nature's true colors in this gripping encounter on the snowy plains. 🦬"

Remember to respect wildlife and nature. This video may contain scenes that some viewers might find intense or distressing, but they depict natural animal behaviors important for ecological studies and understanding the reality of life in the wilderness.
</code></pre>
<h3>Openai-whisper</h3>
<p dir="auto"><a href="https://openai.com/research/whisper" target="_blank" rel="noopener noreferrer nofollow ugc">https://openai.com/research/whisper</a><br />
An automatic speech recognition (ASR) system developed by OpenAI. It utilizes advanced deep learning techniques to transcribe, translate, and understand human speech. Whisper is characterized by its ability to support multiple languages and its exceptional performance in recognizing voices with various accents and in noisy environments. Additionally, it can handle different audio qualities, making it suitable for various applications such as transcribing meeting records, automatically generating subtitles, and assisting in translations.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1671539/image_hRNiVtcWKF.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<h3>pymycobot</h3>
<p dir="auto"><a href="https://github.com/elephantrobotics/pymycobot/" target="_blank" rel="noopener noreferrer nofollow ugc">https://github.com/elephantrobotics/pymycobot/</a><br />
pymycobot is a Python library for the mycobot robot series. MyCobot is a compact, multifunctional collaborative robotic arm suitable for education, research, and lightweight industrial applications. The PyMyCobot library provides a simple programming interface, enabling developers to control and program MyCobot robots for operations such as movement, grabbing, and sensing. This library supports multiple operating systems and development environments, facilitating its integration into various projects, especially in robotics and automation. By using Python, a widely-used programming language, pymycobot makes operating and experimenting with MyCobot robots more accessible and flexible.</p>
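<p dir="auto">As a minimal sketch of what controlling a myCobot with pymycobot looks like (the serial port name is an assumption and will differ per machine):</p>
<pre><code># Minimal pymycobot sketch; "COM7" is a placeholder serial port.
import time
from pymycobot.mycobot import MyCobot

mc = MyCobot("COM7", 115200)
mc.send_angles([0, 0, 0, 0, 0, 0], 50)   # move all six joints to zero at speed 50
time.sleep(2)
print(mc.get_angles())                   # read back the current joint angles
</code></pre>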
<h2>Hardware</h2>
<h3>myCobot 280M5</h3>
<p dir="auto">The myCobot 280 M5 is a desktop-level compact six-axis collaborative robot produced by Elephant Robotics. Designed for compactness, it is suitable for education, research, and light industrial applications. The myCobot 280 M5 supports various programming and control methods, compatible with different operating systems and programming languages, including:<br />
●Main and auxiliary control chips: ESP32<br />
●Supports Bluetooth (2.4G/5G) and wireless (2.4G 3D Antenna)<br />
●Multiple input and output ports<br />
●Supports free movement, joint movement, Cartesian movement, trajectory recording, and wireless control<br />
●Compatible operating systems: Windows, Linux, MAC<br />
●Supported programming languages: Python, C++, C#, JavaScript<br />
●Supported programming platforms and tools: RoboFlow, myblockly, Mind+, UiFlow, Arduino, mystudio<br />
●Supported communication protocols: Serial port control protocol, TCP/IP, MODBUS<br />
These features make the myCobot 280 M5 a versatile, user-friendly robot solution suitable for a variety of application scenarios.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1671541/_jrCeeyf5SM.blob?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<h3>myCobot Vertical Suction Pump V2.0</h3>
<p dir="auto">Operates on the principle of vacuum adhesion, providing 3.3V IO control, and can be extensively used in the development of various embedded devices.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1671545/image_95m3eTMvvl.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<h3>Camera</h3>
<p dir="auto">Standard USB and LEGO interfaces. The USB interface can be used with various PC devices, and the LEGO interface can be conveniently fixed. It is applicable to machine vision, image recognition, and other applications.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1671547/image_Vsk67uhSCo.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
mylangrobot Software Analysis<br />
The specific workflow of the project described at the beginning is as follows:</p>
<ol>
<li>Audio Input: Record audio instructions first.</li>
<li>Audio Processing: Use "openai-whisper" to process the audio and convert it into text.</li>
<li>Language Model Interaction: Use the GPT-4 model to process the converted text instructions and understand the user's commands.</li>
<li>Image Processing: Use GPT-4V and the enhanced image capability of SoM to process images and find the target mentioned in the instructions.</li>
<li>Robotic Arm Control: Control the robotic arm to grab the identified target.</li>
</ol>
<p dir="auto">Audio Processing<br />
This function utilizes speech_recognition to capture audio data from the microphone, enabling the computer to recognize it.<br />
Libraries used:</p>
<pre><code>import io
import os
from enum import Enum
from typing import Protocol

import openai
import speech_recognition as sr
from pydub import AudioSegment
from pydub.playback import play
</code></pre>
<p dir="auto">Define interfaces, capture user input, and provide output to the user.</p>
<pre><code>class Interface(Protocol):
    def input(self, prefix: str = "") -&gt; str:
        return prefix + self._input_impl()

    def _input_impl(self) -&gt; str:
        ...

    def output(self, message: str) -&gt; None:
        ...
</code></pre>
<p dir="auto">Initialize the microphone for audio input and output.</p>
<pre><code>class Audio(Interface):
    def __init__(self):
        self.r = sr.Recognizer()
        self.mic = sr.Microphone()
        # openai-whisper API key
        self.client = openai.OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))
</code></pre>
<p dir="auto">Convert the input audio into text format for output.</p>
<pre><code> def _input_impl(self) -&gt; str:
        print("Please tell me your command.")
        with self.mic as source:
            self.r.adjust_for_ambient_noise(source)
            audio = self.r.listen(source)

        try:
            return self.r.recognize_whisper(audio, language="japanese")

        except sr.UnknownValueError:
            print("could not understand audio")
        except sr.RequestError as e:
            print("Could not request results from Google Speech Recognition service; {0}".format(e))
</code></pre>
<p dir="auto">The final return 'r' is the text format of the audio, which can be used for interaction with the GPT-4 model.</p>
<h3>Image Processing and GPT-4 Language Interaction</h3>
<p dir="auto">When transmitting text to the GPT-4 model for interaction, images are sent along, so image processing and interaction are discussed together.<br />
Libraries used for image processing:</p>
<pre><code>import cv2
import numpy as np
import supervision as sv
import torch
from segment_anything import SamAutomaticMaskGenerator, sam_model_registry

from .utils import download_sam_model_to_cache
</code></pre>
<p dir="auto">Primarily uses the SamAutomaticMaskGenerator feature to mark and draw markers on detected targets.</p>
<pre><code>#Convert image to RGB format
        image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
#Image processing, target detection and marker rendering
        sam_result = self.mask_generator.generate(image_rgb)
        detections = sv.Detections.from_sam(sam_result=sam_result)
        height, width, _ = image.shape
        image_area = height * width

        min_area_mask = (detections.area / image_area) &gt; self.MIN_AREA_PERCENTAGE
        max_area_mask = (detections.area / image_area) &lt; self.MAX_AREA_PERCENTAGE
        detections = detections[min_area_mask &amp; max_area_mask]
        
        
        #Returns the result of the image and detected information
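        # Note: mask_annotator and label_annotator are assumed to be supervision
        # annotators (e.g. sv.MaskAnnotator() / sv.LabelAnnotator()) created elsewhere.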
        labels = [str(i) for i in range(len(detections))]
        annotated_image = mask_annotator.annotate(scene=image_rgb.copy(), detections=detections)
        annotated_image = label_annotator.annotate(scene=annotated_image, detections=detections, labels=labels)
        return annotated_image, detections

</code></pre>
<p dir="auto"><img src="https://hackster.imgix.net/uploads/attachments/1671548/image_aiT6dtqGse.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
<img src="https://hackster.imgix.net/uploads/attachments/1671549/image_j8UHTvji8R.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
The images above show the resulting effect.<br />
Note: the function below requires a GPT-4 API key to use.<br />
The annotated image is then passed to the GPT-4 model, which requires some preparation beforehand. Through GPT-4V, the image can be processed to return information about the image content and the corresponding objects.</p>
<pre><code>def prepare_inputs(message: str, image: np.ndarray) -&gt; dict:
    # # Path to your image
    # image_path = "temp.jpg"
    # # Getting the base64 string
    base64_image = encode_image_from_cv2(image)

    payload = {
        "model": "gpt-4-vision-preview",
        "messages": [
            {"role": "system", "content": [metaprompt]},
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": message,
                    },
                    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{base64_image}"}},
                ],
            },
        ],
        "max_tokens": 800,
    }

    return payload


def request_gpt4v(message: str, image: np.ndarray) -&gt; str:
    payload = prepare_inputs(message, image)
    response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
    res = response.json()["choices"][0]["message"]["content"]
    return res
</code></pre>
<p dir="auto">Robotic Arm Control and Overall Integration<br />
After image processing and GPT-4V model processing, the interpreted instructions generate target position information. This position information is passed to the robotic arm control system, which moves to the corresponding location to perform the grabbing action.<br />
Key methods involved:<br />
Move to the target object.</p>
<pre><code>    def move_to_object(self, object_no: int, speed: Optional[float] = None) -&gt; None:
        object_no = self._check_and_correct_object_no(object_no)
        print("[MyCobotController] Move to Object No. {}".format(object_no))
        detection = (
            np.array([-self._detections[object_no][0], -self._detections[object_no][1]]) + self.capture_coord.pos[:2]
        )
        print("[MyCobotController] Object pos:", detection[0], detection[1])
        self.move_to_xy(detection[0], detection[1], speed)
</code></pre>
<pre><code># grab action
  def grab(self, speed: Optional[float] = None) -&gt; None:
        print("[MyCobotController] Grab to Object")
        current_pos = self.current_coords().pos
        self.move_to_z(self.object_height + self.end_effector_height, speed)
        self._mycobot.set_basic_output(self._suction_pin, 0)
        time.sleep(2)
        self.move_to_z(current_pos[2], speed)

# drop action
    def move_to_place(self, place_name: str, speed: Optional[float] = None) -&gt; None:
        print("[MyCobotController] Move to Place {}".format(place_name))
        self._current_position = self.positions[place_name]
        self._mycobot.sync_send_angles(
            np.array(self._current_position) + self.calc_gravity_compensation(self._current_position),
            speed or self._default_speed,
            self._command_timeout,
        )
        print("Current coords: {}".format(self.current_coords()))
</code></pre>
<p dir="auto">After each function is implemented, coordinate the entire process, streamline the workflow logic, and complete the task.<br />
The specific code can be viewed in the <a href="http://operator.py" target="_blank" rel="noopener noreferrer nofollow ugc">operator.py</a> file.</p>
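<p dir="auto">As a rough outline of how these pieces could be tied together (this is an illustrative sketch, not the actual operator.py; the camera object, detect_and_annotate, parse_object_no and the "drop_area" place name are assumptions):</p>
<pre><code># Illustrative pipeline sketch: voice to text, GPT-4V selection, then arm motion.
def run_once(audio, camera, controller):
    command_text = audio.input()                        # 1. speech to text (Audio class above)
    frame = camera.read()                               # 2. capture the scene (hypothetical camera)
    annotated, detections = detect_and_annotate(frame)  # 3. SoM-style marking (snippet above)
    reply = request_gpt4v(command_text, annotated)      # 4. ask GPT-4V which marker to pick
    object_no = parse_object_no(reply)                  #    hypothetical helper to parse the reply
    controller.move_to_object(object_no)                # 5. move, grab and place
    controller.grab()
    controller.move_to_place("drop_area")
</code></pre>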
<h2>Example</h2>
<p dir="auto">Below is an example test to observe the project's outcome. The content involves a voice input saying "pick up the chocolate," and the robotic arm executes the task.<br />
<a href="https://youtu.be/Eda1m7DnIhQ" target="_blank" rel="noopener noreferrer nofollow ugc">https://youtu.be/Eda1m7DnIhQ</a></p>
<h2>Summary</h2>
<p dir="auto">This project demonstrates how to leverage advanced artificial intelligence and robotics technologies to accomplish complex automation tasks. By integrating voice recognition, natural language processing, image analysis, and precise robotic arm control, the project has successfully created a robotic system capable of understanding and executing spoken instructions. This not only enhances the naturalness and efficiency of robot-human interaction but also opens up new possibilities for robotic technology in various practical applications, such as automated manufacturing, logistics, assistive robots, and more.<br />
Finally, thanks again to Shirokuma for sharing this case with us. If you have better examples, feel free to contact us!</p>
]]></description><link>https://community.m5stack.com/topic/5970/ai-enhanced-robotics-gpt-4v-and-mycobot-in-action</link><guid isPermaLink="true">https://community.m5stack.com/topic/5970/ai-enhanced-robotics-gpt-4v-and-mycobot-in-action</guid><dc:creator><![CDATA[ElephantRobotics]]></dc:creator><pubDate>Invalid Date</pubDate></item><item><title><![CDATA[Creating a Synchronized Robotic Arm Demo: Step-by-Step Guide]]></title><description><![CDATA[<h2>Introduction</h2>
<p dir="auto">Today, I am going to present a robotic arm model that I have independently designed and implemented. The core feature of this model is to achieve real-time gesture tracking - just a gentle drag with your hand, and the robotic arm can immediately follow your movements.</p>
<p dir="auto">The reason why I wanted to create such a model is that in some dangerous environments, we can use robotic arms to replace manual work, thereby avoiding threats to human life.</p>
<p dir="auto">You might ask, why not directly use remote keyboard control, joystick control, or APP control, but choose to manually drag? I believe that only manual operation can satisfy our need for precision to the greatest extent. Therefore, I decided to start trying to make this model and have initially completed the entire demo.</p>
<p dir="auto">I hope that through this demo, I can demonstrate the infinite possibilities of robotic arms to everyone, and at the same time, I hope to inspire everyone's infinite longing for future technology.</p>
<h2>Robotic Arm</h2>
<p dir="auto">The mechArm 270 is a 6 DOF robotic arm, with a compact structure design that can fit into a backpack for easy transportation. Importantly, it has many open control APIs, which allows you to quickly start controlling the robotic arm using Python. There are no complicated operations, and it even supports graphical programming, which allows people who are not very familiar with code to quickly get started with controlling the robotic arm.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1644514/_8bvUAicseK.blob?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
The mechArm is a desktop robotic arm with a structure that mimics industrial designs. Its maximum working radius is 270mm, it can carry a load of 250g, and its repeat positioning accuracy is controlled within ±0.5mm.</p>
<h2>Project</h2>
<p dir="auto">After introducing the basic equipment, let's start with the record of how I created this demo.</p>
<p dir="auto">Environment:<br />
Operating system: Windows 11</p>
<p dir="auto">Programming language: Python 3.9+</p>
<p dir="auto">Python libraries: pymycobot, time</p>
<p dir="auto">pymycobot is an open-source library for Elephant Robotics, specifically designed to control the robotic arm of Elephant Robotics. Here is an example of a simple control code.</p>
<p dir="auto">Code：</p>
<pre><code>#Main methods used

# Create an object to communicate with the robotic arm:
#   MyCobot(serial, baud)
# Control the robot by joint angles:
#   send_angles([list_angles], speed)
# Control the robot by coordinates:
#   send_coords([list_coords], speed, mode)

# Example:

import time
from pymycobot.mycobot import MyCobot

# create a object
mc = MyCobot("com7",115200)

# angles control
mc.send_angles([0,0,0,0,0,0],100)
time.sleep(1)
mc.send_angles([90,90,90,90,90,90],100)
time.sleep(1)
</code></pre>
<p dir="auto">I briefly introduced how to use python to control mechArm. Isn’t it easy?</p>
<h2>Problem Analysis</h2>
<p dir="auto">Before starting the project, it's important to set up a framework and understand the specific problems we need to solve. For this, I made a flowchart of the project. In the following, I will refer to the manually controlled robotic arm as R1, and the following motion robotic arm as R2.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1644515/_Glx8vYpjQD.blob?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Control Robotic Arm: As mentioned above, the robotic arm can be controlled using the methods provided by the pymycobot library.</p>
<p dir="auto">Motion Control Methods: The R1 robotic arm can be dragged by hand, returning the current angle information of the robotic arm at all times. The R2 robotic arm controls based on the angle information received from R1.</p>
<p dir="auto">Communication Between Robotic Arms: This step is quite important in the entire project. Once established, the robotic arm can easily implement information transmission.</p>
<p dir="auto">Next, I will mainly explain the Motion Control Methods and Communication Between Robotic Arms.</p>
<h2>Motion Control methods</h2>
<ol>
<li>Get real-time angle information</li>
</ol>
<p dir="auto">pymycobot provides the "get_angles()" method to obtain the angle information of the current robot arm.</p>
<pre><code># Can obtain the current angle information of the robotic arm in real time
get_angles()

# example
print("real-time angles:",mc.get_anlges())

result: real-time angles:[0,0,0,0,0,0]

# Continuously obtain the current angle
while True:
    angels = mc.get_angles()
    print(angles) 
    time.sleep(0.1) #Go to the next step every 0.1s
</code></pre>
<ol start="2">
<li>Set the Refresh Mode for the Robotic Arm</li>
</ol>
<p dir="auto">The refresh mode of the robotic arm mainly falls into two categories: interpolation mode and non-interpolation mode. These refer to the ways in which the end effector of the robotic arm is controlled during motion trajectory planning. If no mode is set, the robotic arm may not be able to perform the expected motion correctly, which may lead to the following consequences:</p>
<ol>
<li>
<p dir="auto">Unsmooth motion</p>
</li>
<li>
<p dir="auto">Inaccurate motion</p>
</li>
<li>
<p dir="auto">Discontinuous motion</p>
</li>
</ol>
<p dir="auto">Interpolation Mode: The interpolation mode can realize smooth and continuous trajectory planning, ensuring that the position and posture of the end effector of the robotic arm transition smoothly during the motion process.</p>
<p dir="auto">Non-Interpolation Mode: The non-interpolation mode means that the robotic arm only focuses on specific target points during the motion process, without performing interpolation calculations. Under the non-interpolation mode, the position and posture of the robotic arm will jump directly between key points, without undergoing a smooth transition.</p>
<p dir="auto">When multiple robotic arms use the interpolation mode for motion at the same time, there may be situations of waiting or queuing. Therefore, we choose to use the non-interpolation mode.</p>
<pre><code>#Set refresh mode
set_fresh_mode(1/0)
# 1: no interpolation
# 0: interpolation

mc.set_fresh_mode(1)
</code></pre>
<p dir="auto">Our code that integrates the previous ones is as follows.</p>
<p dir="auto">Code:</p>
<pre><code>import time
from pymycobot.mycobot import MyCobot

mc = MyCobot("COM7", 115200)    #release arm
mb = MyCobot("COM11", 115200)   #move arm

mb.set_fresh_mode(1)    #no interpolation
time.sleep(1)
mc.release_all_servos() #release robot
time.sleep(1)
speed = 100
while True:
    angles = mc.get_angles()    #get release arm angles
    mb.send_angles(angles, speed)   #send angles to move arm
    time.sleep(0.1)
</code></pre>
<h2>Communication Between Robotic Arms</h2>
<p dir="auto">Our solution is to connect the two robotic arms to the same PC and communicate with each of them through its own serial port.</p>
<pre><code># build connection
from pymycobot.mycobot import MyCobot

mc = MyCobot("COM7", 115200)
mb = MyCobot("COM11", 115200)
</code></pre>
<p dir="auto">By using the most basic USB data cables for connection, we have two serial port numbers for the robotic arms on our computer, and we can send instructions to them separately.<br />
<img src="/assets/uploads/files/1699432440269-%E5%BE%AE%E4%BF%A1%E5%9B%BE%E7%89%87_20231108163344.png" alt="0_1699432439569_微信图片_20231108163344.png" class=" img-fluid img-markdown" /><br />
<a href="https://www.youtube.com/watch?v=NByjgoqc2O4" target="_blank" rel="noopener noreferrer nofollow ugc">https://www.youtube.com/watch?v=NByjgoqc2O4</a></p>
<h2>Summary</h2>
<p dir="auto">From the content, it can be seen that although we can achieve about 70-80% synchronization, there are other factors that can cause significant delays. The reasons for the delays could be various, such as the speed of data processing and transmission, the reaction speed of the robotic arm, software optimization, hardware performance, etc. All of these are potential factors that can cause delays.</p>
<p dir="auto">In addition, there is a significant limitation in that their communication is connected through serial ports. If the distance is a bit further, this method cannot be used, and its practicality is not strong. In the future, I will try to use wireless connections such as Bluetooth and WiFi to control the robotic arm.</p>
]]></description><link>https://community.m5stack.com/topic/5785/creating-a-synchronized-robotic-arm-demo-step-by-step-guide</link><guid isPermaLink="true">https://community.m5stack.com/topic/5785/creating-a-synchronized-robotic-arm-demo-step-by-step-guide</guid><dc:creator><![CDATA[ElephantRobotics]]></dc:creator><pubDate>Invalid Date</pubDate></item><item><title><![CDATA[Controlling myCobot 320 with MyCobot Controller App]]></title><description><![CDATA[<h2>Introduction</h2>
<p dir="auto">Robotic arms are essential tools in modern industry and research fields, with widespread applications in manufacturing, healthcare, agriculture, education, and more. These robotic arms can not only perform precise operations but also work in hazardous environments inaccessible to humans, greatly enhancing work efficiency and safety.However, traditional control methods of robotic arms often require specialized knowledge and complex programming, which to some extent limits their popularization and application. Therefore, the method of controlling robotic arms via mobile apps emerged. The advantage of mobile app control is that it allows non-professionals to conveniently operate robotic arms. Moreover, through the mobile app, users can remotely control the robotic arm, greatly improving the convenience and flexibility of operation.This article aims to delve into the topic of how to use a mobile app to control robotic arms.</p>
<h2>Product</h2>
<h2>myCobot 320</h2>
<p dir="auto">The myCobot 320 is a 6-axis collaborative robotic arm designed for user-oriented programming and development. It has built-in forward and inverse kinematics algorithms, offers an open ROS simulation development environment, and comes equipped with 12 standard 24V industrial IO interfaces, which can be expanded for PLC control programming. The working radius of the myCobot 320 can reach up to 350mm, and the end can bear a maximum load of <a href="http://1kg.It" target="_blank" rel="noopener noreferrer nofollow ugc">1kg.It</a> offers precise positioning repeatability within a range of 0.5mm.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1640317/image_cNj0nvLvpB.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
myCobot Controller<br />
The "myCobot Controller" is an application launched by Elephant Robotics specifically for controlling the myCobot series of robotic arms. This app employs Bluetooth technology, enabling wireless connection between the user's smartphone or tablet and the myCobot robotic arm.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1640318/image_KXYeA0wNlY.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
The working principle of the myCobot Controller APP is mainly based on Bluetooth communication technology. After successfully pairing with the myCobot robotic arm via Bluetooth, the app can control various functions of the robotic arm. This method is not only easy to operate, but it also allows for real-time, wireless control, greatly enhancing the flexibility and convenience of operation.</p>
<p dir="auto">Steps for APP to control myCobot<br />
Next, we will introduce how to use this function while operating.</p>
<h2>Install APP</h2>
<p dir="auto">Currently, the myCobot Controller only supports Android system phones, and IOS system users will need to wait for some time. Clicking connect can directly lead to the download location.Currently, there are two ways to install the app. The first one is to directly search for "myCobot controller" on the Google Play Store for download.The second one is to download the APK directly from the official website to the mobile phone for installation. (You can install it directly after a normal download).<br />
<img src="https://hackster.imgix.net/uploads/attachments/1640319/image_heQIimwLLr.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<h2>Connect myCobot</h2>
<p dir="auto">First, we power up the myCobot and select "Transponder".<br />
<img src="https://hackster.imgix.net/uploads/attachments/1640320/_X8ESzlzOH0.blob?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
After entering, we choose the Bluetooth mode, "Bluetooth".<br />
<img src="https://hackster.imgix.net/uploads/attachments/1640321/_Eh04sZcde9.blob?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
After successful operation, the interface will appear as shown in the following image. The name of the Bluetooth is: mycobot320m5ble, and its MAC address is: 08:3a:f2:42:f0:26 (Each machine is different, which is convenient for distinguishing when two machines are on at the same time). Upon accessing this page, we simply need to wait, as the subsequent steps can be performed on a mobile phone.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1640322/_sqRGXt4aUv.blob?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Open the downloaded APP, click the Bluetooth icon to connect to myCobot.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1640323/_7WyVMRVF5T.blob?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Make sure the name of the Bluetooth corresponds to the name of the robotic arm, "mycobot320m5ble". If you are uncertain, you can verify the MAC address due to its uniqueness. If the display is as shown in the following image, the connection is successful.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1640324/_rtPfSkePd8.blob?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<h2>Robotic arm motion control</h2>
<p dir="auto">The APP controls the robotic arm using forward and inverse kinematics, mainly in two ways: one is the control of the robotic arm's angle, and the other is the control of the robotic arm's coordinates.</p>
<h2>Joint Control</h2>
<p dir="auto">Joint control is a control method of the forward kinematics algorithm. Forward kinematics refers to the process of calculating the position and posture of the robot's end effector when the parameters (Joint) of each joint of the robot are known. In other words, the spatial coordinates of the end of the robotic arm can be calculated by rotating the angles.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1640325/image_uETogXcaJA.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
The Joint (1-6) on both sides can be increased or decreased to control the corresponding myCobot joint angles. The six parameters in the middle TCL part represent the Cartesian coordinate values of the robotic arm's end at this moment.</p>
<h2>Coords Control</h2>
<p dir="auto">Coordinate control is a control method of the inverse kinematics algorithm. Inverse kinematics refers to the process of calculating the parameters (angles) of each joint of the robot when the coordinate position of the robot's end effector is known. This allows the robot to be moved by adjusting the coordinates of the robot's end, thereby calculating the parameters of the robot's joint angles under the current state.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1640326/image_bKd0Z1V2wa.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
The buttons on the left are easy to understand, corresponding to the position of the robotic arm's end in space, moving linearly up, down, left, and right. In other words, the end of the robotic arm moves forward, backward, left, right, up, and down in space, mainly in straight line movements. In Cartesian coordinates, 'Rz, Ry, Rx' are usually used to represent rotations around the z-axis, y-axis, and x-axis. These rotations typically follow the right-hand rule, that is, if your right hand's four fingers curl in the direction of rotation, then the direction your thumb points is the direction of the rotation axis.</p>
<p dir="auto">Generally speaking:</p>
<p dir="auto">● "rz": rotation around the z-axis. This will cause the end of the robotic arm to rotate within the x-y plane, that is, it will change the yaw angle of the end effector.</p>
<p dir="auto">● "ry": rotation around the y-axis. This will cause the end of the robotic arm to rotate within the z-x plane, that is, it will change the pitch angle of the end effector.</p>
<p dir="auto">● "rx": rotation around the x-axis. This will cause the end of the robotic arm to rotate within the y-z plane, that is, it will change the roll angle of the end effector.</p>
<p dir="auto">In simpler terms, it is the robotic arm rotating around the x, y, z axes.</p>
<h2>Operation video</h2>
<p dir="auto"><a href="https://youtu.be/8JBhk55pz8I" target="_blank" rel="noopener noreferrer nofollow ugc">https://youtu.be/8JBhk55pz8I</a></p>
<h2>Summary</h2>
<p dir="auto">Overall, the mobile app control provides a more intuitive and easier-to-understand operating method, allowing non-professionals to quickly get started. In addition, we can remotely control the robotic arm, which makes it possible to operate the robotic arm in some complex or dangerous environments, thus expanding the application scenarios of the robotic arm.</p>
<p dir="auto">Currently, the functionality of the app can still be expanded. What features do you think need to be added to the app? Feel free to comment below, let's discuss the optimization of the app's functionality together! If you like this article, likes and comments are the greatest support for us!</p>
]]></description><link>https://community.m5stack.com/topic/5755/controlling-mycobot-320-with-mycobot-controller-app</link><guid isPermaLink="true">https://community.m5stack.com/topic/5755/controlling-mycobot-320-with-mycobot-controller-app</guid><dc:creator><![CDATA[ElephantRobotics]]></dc:creator><pubDate>Invalid Date</pubDate></item><item><title><![CDATA[Harnessing the Power of ROS and MoveIt in Robotic Arm Manipulation]]></title><description><![CDATA[<h2>Introduction</h2>
<p dir="auto">This article will document my experience using myCobot 280 M5stack in ROS. Why use ROS? Because when it comes to robotics, it's hard to avoid the ROS operating system.<br />
This is our first time using this system, so I will start with an introduction to ROS, the configuration of the environment, and the use of the myCobot 280 in ROS.</p>
<h2>ROS</h2>
<p dir="auto">ROS (Robot Operating System) is a flexible framework for writing robot software. It is a semi-open source project that provides researchers and developers with a set of tools and libraries to help them create various robot applications. ROS is mainly used for research and development of various robot systems, such as autonomous vehicles, service robots, industrial robots, drones, etc. It makes the development of robots more efficient and enhances the functionality and flexibility of robots. The two most notable modules in the ROS operating system are navigation and robotic arm control (moveit). Moveit allows users to quickly build robotic arm models and implement robotic arm control, such as modeling, kinematic solution, motion planning, obstacle avoidance, and more. Given such a powerful system, I want to use it to simulate some functions of the robotic arm, such as obstacle avoidance, path planning, etc.<br />
<img src="https://community.robotshop.com/forum/uploads/default/original/3X/e/9/e97ac6f06d51ef35cc4eb851b239108eed920b47.png" alt="alt text" class=" img-fluid img-markdown" /></p>
<h2>Environment configuration</h2>
<p dir="auto">ROS primarily supports operating systems like Ubuntu (Linux) and Mac OS (with partial support). The most common usage is on the Ubuntu system. My computer is installed with Ubuntu 18.04 version. The corresponding ROS version for Ubuntu 18 is Melodic. Different versions of Ubuntu correspond to different ROS versions.</p>
<pre><code>ubuntu16.04-kinetic
ubuntu18.04-melodic
ubuntu20.04-noetic
</code></pre>
<p dir="auto">Since my computer has already had the system and ROS environment installed, I won’t record each step here. The main focus will be on the operations within ROS. For the installation of the Ubuntu system and the setup of the ROS environment, you can search on Google. The ROS official website also has detailed installation tutorials.<br />
The scenario I’m using today involves controlling a robotic arm with MoveIt, so the demand for computer hardware is not high. Please note that if you are using features like SLAM radar navigation, you might need a higher-spec computer to handle the computational power. The standard to check whether ROS has been successfully installed is to open the terminal and type “roscore”. If the content that appears is the same as the image below, it means the installation was successful.<br />
<img src="https://community.robotshop.com/forum/uploads/default/original/3X/8/f/8ff4cf94f5fd258d418565c044be3c87cb07c988.png" alt="alt text" class=" img-fluid img-markdown" /><br />
Let’s use the ROS!</p>
<h2>Project</h2>
<p dir="auto">To create a robotic arm in ROS and use some path planning features, you need to perform the following steps:</p>
<ul>
<li>
<p dir="auto">list itemCreate a URDF file, which is the model of the robotic arm.</p>
</li>
<li>
<p dir="auto">list itemLoad the URDF file into the ROS server.</p>
</li>
<li>
<p dir="auto">list itemUse RViz for visualization.</p>
</li>
<li>
<p dir="auto">list itemCreate a MoveIt configuration package.</p>
</li>
<li>
<p dir="auto">list itemUse MoveIt for motion planning.</p>
</li>
</ul>
<h3>Creating a URDF file</h3>
<p dir="auto">Creating a URDF file First, we need to create a URDF (Unified Robot Description Format) file for the robotic arm. This file is crucial as it describes the properties of the robotic arm, such as joints, links, and sensors, in the form of a 3D model. Since the product is the myCobot 280 M5Stack robotic arm from Elephant Robotics, Elephant Robotics has provided a URDF file for the myCobot 280 on Github, which describes some physical properties of the robotic arm.<br />
<strong>How to generate a URDF file:</strong><br />
You need to use CAD software to create the model, ensuring that the 3D model you create matches the actual size of the robotic arm. Because the exported model needs to be in a format that ROS can accept, it is usually a DAE (Collada) file.<br />
<img src="https://community.robotshop.com/forum/uploads/default/optimized/3X/b/b/bb1e983714caf20cb9f849f5cb42a225cc70b966_2_132x230.png" alt="alt text" class=" img-fluid img-markdown" /><br />
<img src="https://community.robotshop.com/forum/uploads/default/original/3X/6/8/68d5ea40ef82c5aa920c0970dd514484d1e5a0c6.png" alt="alt text" class=" img-fluid img-markdown" /></p>
<h3>Load URDF to ROS server</h3>
<p dir="auto"><strong>Create a ROS package</strong><br />
To create a package to store the URDF file, execute the following code in the terminal:</p>
<pre><code>catkin_create_pkg mycobot_description rospy std_msgs sensor_msgs
</code></pre>
<p dir="auto">Here, ‘rospy’, ‘std_msgs’, and ‘sensor_msgs’ are dependency packages in ROS. Add the generated URDF file to the mycobot_description/urdf directory.</p>
<p dir="auto"><strong>Create a launch file</strong><br />
In the mycobot_description/launch directory, create a file named load_urdf.launch and add the following content to it.</p>
<pre><code>&lt;launch&gt;
  &lt;param name="robot_description" command="$(find xacro)/xacro.py $(find mycobot_description)/urdf/mycobot.urdf.xacro" /&gt;
&lt;/launch&gt;
</code></pre>
<p dir="auto"><strong>Lanuch ROS file</strong><br />
Open the terminal and enter the following command:</p>
<pre><code>roslaunch mycobot_description load_urdf.launch
</code></pre>
<p dir="auto">After running, we can see the model we created in RViz.<br />
<img src="https://community.robotshop.com/forum/uploads/default/original/3X/1/4/14382b10b51a7b9fc990a15de195dad00d0fb271.png" alt="alt text" class=" img-fluid img-markdown" /></p>
<h2>Configure MoveIt</h2>
<p dir="auto">Start MoveIt Setup Assistant:</p>
<p dir="auto">In a terminal, run the following command to start MoveIt Setup Assistant:</p>
<pre><code>roslaunch moveit_setup_assistant setup_assistant.launch
</code></pre>
<p dir="auto"><img src="https://community.robotshop.com/forum/uploads/default/original/3X/3/2/324920697c3a5020b017261cc3c11257f408f015.png" alt="alt text" class=" img-fluid img-markdown" /><br />
Load URDF in MoveIt In the main interface of MoveIt Setup Assistant, click the “Create New MoveIt Configuration Package” button. Then, in the pop-up dialogue box, select my URDF file (in the ROS parameter robot_description). Click the “Load Files” button to load your URDF file. If the image as shown below appears, it means the file has been loaded successfully.<br />
<img src="https://community.robotshop.com/forum/uploads/default/original/3X/a/d/ad048036c1f67ce310cf95cdbae162682e7bb202.png" alt="alt text" class=" img-fluid img-markdown" /></p>
<h2>Configure Robot</h2>
<p dir="auto">Configure your robot according to your needs. Here are a few configurations briefly introduced:</p>
<ul>
<li>Self-Collisions: MoveIt will automatically calculate the self-collision matrix of your robot. You can<br />
click the “Regenerate Default Collision Matrix” button to generate a default self-collision matrix.</li>
<li>Virtual Joints: You can define virtual joints for your robot. Virtual joints are often used to connect your robot to the world.</li>
<li>Planning Groups: You can define the planning groups of your robot. Planning groups are a set of joints and links that need to move together. For example, you can create a planning group that includes all arm joints.</li>
<li>Robot Poses: You can define preset poses for your robot. These preset poses can be reused in planning.</li>
</ul>
<p dir="auto">After the configuration is completed, a .launch file will be generated, and we can use the robotic arm in moveit by running it.<br />
<img src="https://community.robotshop.com/forum/uploads/default/original/3X/8/f/8f484e96903b260c9056f27f3d17f3294a721b22.png" alt="alt text" class=" img-fluid img-markdown" /><br />
Let’s see how MoveIt performs path planning. By dragging the interactive marker at the end of the robotic arm, we can change the arm's target motion, and we can also generate random paths.<br />
<img src="https://community.robotshop.com/forum/uploads/default/original/3X/a/2/a2f8cb4d6678f2824942cc3b7a557c23cea30434.gif" alt="alt text" class=" img-fluid img-markdown" /><br />
<strong>Object obstacle avoidance</strong></p>
<p dir="auto">We add an obstacle in MoveIt for the robotic arm to avoid. We add a cube, so the robotic arm will navigate around it when it follows its trajectory.<br />
<img src="https://community.robotshop.com/forum/uploads/default/original/3X/a/5/a56b9dba256588bb104e91fca64e5e941bb6df94.gif" alt="alt text" class=" img-fluid img-markdown" /><br />
In addition to this, MoveIt has many features. For instance, motion planning, which can perform path planning for multi-joint robots, avoid collisions, and also use various motion planning algorithms like RRT, EST, LBKPIECE, etc. It also includes collision detection, which can carry out complex 3D collision detection, including self-collision and environmental collision (obstacles). Furthermore, it offers simulation capabilities, and can be integrated with the Gazebo robot simulator for realistic physical simulation, among other features.</p>
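<p dir="auto">Beyond the RViz GUI, the same planning can be driven from Python through moveit_commander. The sketch below is illustrative only; the planning group name "arm_group", the box pose, and the position target are assumptions that depend on how the MoveIt configuration package was generated.</p>
<pre><code># Illustrative moveit_commander sketch: add a box obstacle, then plan and execute a motion.
# "arm_group" and all poses/positions are placeholders depending on your MoveIt config.
import sys
import rospy
import moveit_commander
from geometry_msgs.msg import PoseStamped

moveit_commander.roscpp_initialize(sys.argv)
rospy.init_node("mycobot_moveit_demo")

scene = moveit_commander.PlanningSceneInterface()
group = moveit_commander.MoveGroupCommander("arm_group")

# Add a cube obstacle in front of the arm so the planner must route around it.
box = PoseStamped()
box.header.frame_id = group.get_planning_frame()
box.pose.position.x = 0.15
box.pose.position.z = 0.10
box.pose.orientation.w = 1.0
scene.add_box("obstacle_cube", box, size=(0.05, 0.05, 0.05))

# Plan to a placeholder position target and execute the collision-free trajectory.
group.set_position_target([0.15, 0.10, 0.20])
group.go(wait=True)
group.stop()
group.clear_pose_targets()
</code></pre>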
<h2>Summary</h2>
<p dir="auto">MoveIt provides a comprehensive set of tools for research into the motion planning and control of robotic arms, eliminating the need to build complex motion planning systems from scratch. This allows developers to focus more of their energy on implementing specific application functions, rather than on solving underlying motion planning and control problems. Overall, ROS and MoveIt provide strong support for the development of robotic technology. They not only reduce development difficulties and improve development efficiency, but also promote innovation and application in robotic technology. Whether you are a researcher in robotic technology or a developer in practical applications, ROS and MoveIt will be indispensable tools for you.</p>
<p dir="auto">In the future, I will continue to explore more robotic arm projects and share my findings and achievements from time to time. If you find any errors in this article, feel free to communicate with me. If you like this article, your likes and comments will be the greatest support for me!</p>
]]></description><link>https://community.m5stack.com/topic/5650/harnessing-the-power-of-ros-and-moveit-in-robotic-arm-manipulation</link><guid isPermaLink="true">https://community.m5stack.com/topic/5650/harnessing-the-power-of-ros-and-moveit-in-robotic-arm-manipulation</guid><dc:creator><![CDATA[Yuly12]]></dc:creator><pubDate>Invalid Date</pubDate></item><item><title><![CDATA[Revealing the Potential of myCobotM5Stack AI Kit Vision Algorithms]]></title><description><![CDATA[<h2>Introduction</h2>
<p dir="auto">In this article, we will delve deeper into understanding how the machine recognition algorithm of myCobot 320 AI Kit is implemented. In today's society, with the continuous development of artificial intelligence technology, the application of robotic arms is becoming increasingly widespread. As a robot that can simulate human arm movements, the robotic arm has a series of advantages such as efficiency, precision, flexibility, and safety. In industrial, logistics, medical, agricultural and other fields, robotic arms have become an essential part of many automated production lines and systems. For example, in scenes such as automated assembly on factory production lines, cargo handling in warehouse logistics, auxiliary operations in medical surgery, and planting and harvesting in agricultural production, robotic arms can play its unique role. This article will focus on introducing the application of robotic arms combined with vision recognition technology in the myCobot 320 AI Kit scene, and exploring the advantages and future development trends of robotic arm vision control technology.</p>
<h2>Product</h2>
<h3>myCobot 320 M5Stack</h3>
<p dir="auto"><a href="https://shop.elephantrobotics.com/en-hk/collections/mycobot-pro-320/products/artificial-intelligence-kit-2023-for-mycobot-320" target="_blank" rel="noopener noreferrer nofollow ugc">myCobot 320</a> is a 6-axis collaborative robot designed for user-independent programming and development. With a motion radius of 350mm, it can support a maximum end load of 1000g with a repetitive positioning accuracy of 0.5mm. It provides a fully open software control interface that enables users to quickly control the robotic arm using a variety of mainstream programming languages.The robotic arm uses M5Stack-Basic as the embedded control board, and ATOM as the core control center of the robotic arm.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1615663/image_8jc5GmvQ2U.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
<img src="https://static-cdn.m5stack.com/resource/docs/products/core/basic/basic_01.webp" alt="alt text" class=" img-fluid img-markdown" /></p>
<h3>myCobot Adaptive gripper</h3>
<p dir="auto"><img src="https://hackster.imgix.net/uploads/attachments/1615664/image_qE5xSuB2aB.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
The <a href="https://shop.elephantrobotics.com/en-hk/collections/grippers/products/mycobot-pro-adaptive-gripper-black-white" target="_blank" rel="noopener noreferrer nofollow ugc">myCobot adaptive gripper</a> is an end-of-arm actuator used for grasping and transporting objects of various shapes and sizes. It has high flexibility and adaptability and can automatically adjust its gripping force and position based on the shape and size of different objects. It can be combined with machine vision to adjust the gripping force and position of the gripper by obtaining information from vision algorithms. The gripper can handle objects up to 1kg and has a maximum grip distance of 90mm. It is powered by electricity and is very convenient to use. This is the equipment we are using, along with the myCobot 320 AI Kit that we will be using later.</p>
<h2>Vision algorithm</h2>
<p dir="auto">Vision algorithm is a method of analyzing and understanding images and videos using computer image processing techniques. It mainly includes several aspects such as image preprocessing, feature extraction, object detection, and pose estimation.</p>
<p dir="auto"><strong>Image preprocessing:</strong><br />
Image preprocessing is the process of processing the original image to make it more suitable for subsequent analysis and processing. Commonly used algorithms include image denoising algorithms, image enhancement algorithms, and image segmentation algorithms.<br />
<strong>Feature point extraction:</strong><br />
Feature extraction is the process of extracting key features from the image for further analysis and processing. Common algorithms include SIFT algorithm, SURF algorithm, ORB algorithm, HOG algorithm, LBP algorithm, etc.<br />
<strong>Object detection:</strong><br />
Object detection is the process of finding a specific object or target in an image. Commonly used algorithms include Haar feature classifier, HOG feature + SVM classifier, Faster R-CNN, YOLO.<br />
<strong>Pose estimation:</strong><br />
Pose estimation is the process of estimating the pose of an object by identifying its position, angle, and other information. Common algorithms include PnP algorithm, EPnP algorithm, Iterative Closest Point algorithm (ICP), etc.</p>
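<p dir="auto">Before moving on to the full example below, here is a tiny, stand-alone illustration of the feature point extraction step using OpenCV's ORB detector (the image path is a placeholder):</p>
<pre><code># Tiny illustration of feature point extraction with OpenCV's ORB detector.
# "scene.jpg" is a placeholder image path.
import cv2

img = cv2.imread('scene.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

orb = cv2.ORB_create(nfeatures=500)          # ORB keypoint detector / descriptor
keypoints, descriptors = orb.detectAndCompute(gray, None)

vis = cv2.drawKeypoints(img, keypoints, None, color=(0, 255, 0))
cv2.imshow('ORB keypoints', vis)
cv2.waitKey(0)
cv2.destroyAllWindows()
</code></pre>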
<h3>Example</h3>
<p dir="auto"><strong>Color recognition algorithm</strong><br />
That description is rather abstract, so let's demonstrate it with a practical example. How can we detect the white golf ball in the image below? We will use OpenCV's machine vision library.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1615666/_F79c0ZrKy2.blob?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Image processing:</p>
<p dir="auto">Initially, we must preprocess the image to enable the computer to swiftly locate the target object. This step involves converting the image to grayscale.</p>
<p dir="auto"><a href="https://en.wikipedia.org/wiki/Grayscale" target="_blank" rel="noopener noreferrer nofollow ugc">Grayscale image:</a></p>
<p dir="auto">A grayscale image is a method of converting a colored image to a black and white image. It depicts the brightness or gray level of each pixel in the image. In a grayscale image, the value of each pixel represents its brightness, typically ranging from 0 to 255, where 0 represents black and 255 represents white. The intermediate values represent varying degrees of grayness.</p>
<pre><code>import cv2
import numpy as np

image = cv2.imread('ball.jpg')
# turn to gray pic
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

cv2.imshow('gray', gray) 
</code></pre>
<p dir="auto"><img src="https://hackster.imgix.net/uploads/attachments/1615667/image_Oz0GaL3ChI.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Binarization:</p>
<p dir="auto">As we can observe, there is a significant color contrast between the golf ball and the background in the image. We can detect the target object through color detection. Although the golf ball is primarily white, there are some gray shadow areas caused by lighting. Therefore, while setting the pixels of the grayscale image, we must consider the gray areas as well.</p>
<pre><code>lower_white = np.array([180, 180, 180])  # Lower limit
upper_white = np.array([255, 255, 255])  # Upper limit

# find target object
mask = cv2.inRange(image, lower_white, upper_white)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
</code></pre>
<p dir="auto">This step is called binarization, which separates the target object from the background.</p>
<p dir="auto">Contour filtering:</p>
<p dir="auto">After binarization, we need to establish a filter for the contour area size. If we fail to set this filter, we may encounter the result depicted in the image below, where many areas are selected, whereas we only desire the largest one. By filtering out small regions, we can achieve our desired outcome.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1615668/image_stKydqBU8K.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<pre><code>#filter
min_area = 100
filtered_contours = [cnt for cnt in contours if cv2.contourArea(cnt) &gt; min_area]

#draw border
for cnt in filtered_contours:
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(image, (x, y), (x+w, y+h), (0, 0, 255), 2)
</code></pre>
<p dir="auto"><img src="https://hackster.imgix.net/uploads/attachments/1615669/image_ksvX00frbm.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<pre><code>import cv2
import numpy as np

image = cv2.imread('ball.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

lower_white = np.array([170, 170, 170])   
upper_white = np.array([255, 255, 255])  

mask = cv2.inRange(image, lower_white, upper_white)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
min_area = 500
filtered_contours = [cnt for cnt in contours if cv2.contourArea(cnt) &gt; min_area]

for cnt in filtered_contours:
    x, y, w, h = cv2.boundingRect(cnt)
    cv2.rectangle(image, (x, y), (x+w, y+h), (0, 0, 255), 2)


cv2.imshow('Object Detection', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
</code></pre>
<p dir="auto">It is important to note that we are utilizing a robotic arm to grasp the object. Hence, merely detecting the target object is insufficient. We must obtain the coordinate information of the object. To achieve this, we use OpenCV's Aruco markers, which are commonly used 2D barcodes for tasks such as camera calibration, pose estimation, and camera tracking in computer vision. Each Aruco marker has a unique identifier. By detecting and recognizing these markers, we can infer the position of the camera and the relationship between the camera and the markers.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1615670/image_ggxC38R8pM.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
<img src="https://hackster.imgix.net/uploads/attachments/1615672/image_Nw4l2a9AFF.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
The two unique ArUco markers in the picture are used to fix the size of the cropped region and the reference positions of the markers; from these, the coordinates of the target object can be obtained through calculation.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1615673/_mdmrUXVG2H.blob?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
With the Aruco marker's positioning, we can detect the location of the target object. We can then convert the x and y coordinates into world coordinates and provide them to the robotic arm's coordinate system. The robotic arm can then proceed with grasping the object.<br />
<img src="/assets/uploads/files/1690940876168-%E5%BE%AE%E4%BF%A1%E5%9B%BE%E7%89%87_20230802094700.png" alt="0_1690940889304_微信图片_20230802094700.png" class=" img-fluid img-markdown" /></p>
<pre><code>    # get the center points of the two ArUco markers
    def get_calculate_params(self, img):
        """
        Get the center coordinates of two ArUco codes in the image
        :param img: Image, in color image format.
        :return: If two ArUco codes are detected, returns the coordinates of the centers of the two codes; otherwise returns None.
        """
        # Convert the image to a gray image 
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        # Detect ArUco marker.
        corners, ids, rejectImaPoint = cv2.aruco.detectMarkers(
            gray, self.aruco_dict, parameters=self.aruco_params
        )

        """
        Two Arucos must be present in the picture and in the same order.
        There are two Arucos in the Corners, and each aruco contains the pixels of its four corners.
        Determine the center of the aruco by the four corners of the aruco.
        """
        if len(corners) &gt; 0:
            if ids is not None:
                if len(corners) &lt;= 1 or ids[0] == 1:
                    return None
                x1 = x2 = y1 = y2 = 0
                point_11, point_21, point_31, point_41 = corners[0][0]
                x1, y1 = int((point_11[0] + point_21[0] + point_31[0] + point_41[0]) / 4.0), int(
                    (point_11[1] + point_21[1] + point_31[1] + point_41[1]) / 4.0)
                point_1, point_2, point_3, point_4 = corners[1][0]
                x2, y2 = int((point_1[0] + point_2[0] + point_3[0] + point_4[0]) / 4.0), int(
                    (point_1[1] + point_2[1] + point_3[1] + point_4[1]) / 4.0)

                return x1, x2, y1, y2
        return None

    # set camera clipping parameters  
    def set_cut_params(self, x1, y1, x2, y2):
        self.x1 = int(x1)
        self.y1 = int(y1)
        self.x2 = int(x2)
        self.y2 = int(y2)

    # set parameters to calculate the coords between cube and mycobot320
    def set_params(self, c_x, c_y, ratio):
        self.c_x = c_x
        self.c_y = c_y
        self.ratio = 320.0 / ratio

    # calculate the coords between cube and mycobot320
    def get_position(self, x, y):
        return ((y - self.c_y) * self.ratio + self.camera_x), ((x - self.c_x) * self.ratio + self.camera_y)
</code></pre>
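<p dir="auto">To make the flow above concrete, here is a hedged sketch of how these methods might be chained together; the class name Detector, the camera index, and the use of the marker midpoint and spacing as reference values are assumptions for illustration, not the exact code of the project.</p>
<pre><code>import cv2

detect = Detector()        # hypothetical object providing the methods shown above
cap = cv2.VideoCapture(0)  # camera index is an assumption

ok, frame = cap.read()
if ok:
    params = detect.get_calculate_params(frame)
    if params is not None:
        x1, x2, y1, y2 = params
        # Use the two ArUco centers to define the cropped workspace,
        # and their horizontal pixel distance to derive the pixel-to-millimeter ratio.
        detect.set_cut_params(x1, y1, x2, y2)
        detect.set_params((x1 + x2) / 2.0, (y1 + y2) / 2.0, abs(x1 - x2))
        # Once the golf ball is detected at pixel (px, py) by the color-detection step,
        # detect.get_position(px, py) returns its (x, y) in the robotic arm's coordinate system.
</code></pre>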
<h2>Summary</h2>
<p dir="auto">Vision-based control technology for robotic arms is a rapidly developing and widely applied technology. Compared to traditional robotic arm control technology, vision-based control technology boasts advantages such as high efficiency, precision, and flexibility, and can be extensively utilized in industrial production, manufacturing, logistics, and other fields. With the constant evolution of technology such as artificial intelligence and machine learning, vision-based control technology for robotic arms will have even wider application scenarios. In the future, it will be necessary to strengthen technological research and development and innovation, constantly improving the level of technology and application capabilities.</p>
]]></description><link>https://community.m5stack.com/topic/5535/revealing-the-potential-of-mycobotm5stack-ai-kit-vision-algorithms</link><guid isPermaLink="true">https://community.m5stack.com/topic/5535/revealing-the-potential-of-mycobotm5stack-ai-kit-vision-algorithms</guid><dc:creator><![CDATA[ElephantRobotics]]></dc:creator><pubDate>Invalid Date</pubDate></item><item><title><![CDATA[Let AI play Connect4:Implementing using DQN neural network with myCobot 280 M5stack]]></title><description><![CDATA[<h2>Introduction</h2>
<p dir="auto">Today, we shall acquaint ourselves with the subject of artificial intelligence in chess, whereby we shall employ a mechanical arm as an opponent to engage in a game of chess with you. The study of artificial intelligence in chess can be traced back to the 1950s, when computer scientists began to explore the writing of programs that would enable computers to play chess. The most renowned example of this was Deep Blue, developed by IBM, which in 1997 defeated the reigning world chess champion, Gary Kasparov, with a score of 3.5-2.5.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1594447/AgAACgPnCYMgwcEQm2xCYo0ys7Ba7CI6.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Artificial intelligence in chess is akin to granting a computer a mode of deliberation that enables it to achieve victory in a match. There are many such modes, most of which stem from superior algorithms. At its core, Deep Blue's algorithm was based on brute-force search: generating every possible move, conducting searches as deeply as possible, constantly evaluating the game state, and attempting to identify the optimal move.<br />
Now, let me show you how to make the robotic arm play intelligently.</p>
<h2>Connect 4</h2>
<p dir="auto">The game I shall introduce today is known as Connect4, a strategic board game commonly referred to as "Four in a Row". The objective of Connect4 is to achieve a horizontal, vertical, or diagonal sequence of four game pieces within a vertically-oriented grid comprising six rows and seven columns. Two players take turns inserting their respective pieces from the top of the grid, with the pieces falling to the lowest available position within the selected column. Players may select any column in which to place their pieces, provided the pieces are placed only underneath existing pieces.<br />
<img src="/assets/uploads/files/1685428973920-fce141b2-119c-4ef2-8908-d6cd42dc039b-image.png" alt="0_1685428981128_fce141b2-119c-4ef2-8908-d6cd42dc039b-image.png" class=" img-fluid img-markdown" /><br />
As shown in the animation, this is Connect4.</p>
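<p dir="auto">To make the dropping rule concrete, here is a minimal sketch (not code from the project) of how a piece can be inserted into a 6x7 board represented as a 2D list, falling to the lowest empty cell of the chosen column.</p>
<pre><code>ROWS, COLS = 6, 7
EMPTY, RED, YELLOW = 0, 1, 2

def drop_piece(board, col, player):
    """Drop a piece into the given column; it falls to the lowest empty row.
    Returns the row it landed in, or None if the column is already full."""
    for row in range(ROWS - 1, -1, -1):   # scan from the bottom row upwards
        if board[row][col] == EMPTY:
            board[row][col] = player
            return row
    return None

board = [[EMPTY] * COLS for _ in range(ROWS)]
drop_piece(board, 3, RED)      # red plays column 3
drop_piece(board, 3, YELLOW)   # yellow stacks on top of it
</code></pre>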
<h2>myCobot 280</h2>
<p dir="auto">The robotic arm selected for the task is the myCobot 280 M5Stack, a powerful desktop six-axis robot arm that employs the M5Stack-Basic as its control core and supports multiple programming languages. The myCobot 280's six-axis structure grants it high flexibility and precision, enabling it to perform a variety of complex operations and movements. It supports multiple programming languages, including Python, C++, and Java, allowing developers to program and control the mechanical arm according to their specific needs. Its user-friendly interface and detailed user manual facilitate rapid familiarization, while its embedded design ensures a compact size, making it easy to carry and store.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1594449/image_M2Haf2TXGl.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
This is the scene we built: the myCobot 280 plays Connect4 against us.</p>
<h2>Algorithm for playing chess</h2>
<p dir="auto">Firstly, we must address the crucial matter of which algorithm to employ for playing chess. In other words, we must provide the mechanical arm with a cognitive brain capable of deep contemplation. Allow me to briefly present to you a few commonly used algorithms for playing chess:</p>
<h3>The Minimax Algorithm:</h3>
<p dir="auto">This is a classic game algorithm that is applicable for two-player games. It works by recursively simulating the moves of both the opponent and oneself, evaluating the score of every possible move, and selecting the action with the highest score. The Minimax Algorithm can find the best chess strategy by searching through the tree structure of the game board. This algorithm is a zero-sum game, meaning that one player chooses the option that maximizes their advantage from the available choices, while the other player selects the method that minimizes the advantage of their opponent. The total sum is zero at the beginning. Let me give a simple example of Tic-Tac-Toe to illustrate this.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1594450/AgAACgPnCYMpN7IhtYVN8bn66xZdmRMX.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Max represents us, while Min represents our opponent. We need to assign a score to every possible outcome, known as the utility. This score is evaluated from our perspective (Max): for example, in the figure above, if I win the score is +1, if I lose it is -1, and if it is a tie it is 0. Therefore, we want to maximize this score, while our opponent wants to minimize it. (During the game, this score is called the static value.) I must mention that Tic-Tac-Toe is a relatively simple game, so it is possible to enumerate all possible outcomes. For most games, however, this is impossible; given the computer's limited computational power we may only be able to look ahead 7 or 8 moves, so the score is not as simple as -1, 1, 0, and specific evaluation functions assign different scores based on the current position.</p>
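<p dir="auto">The idea can be captured in a short schematic sketch. This is plain Minimax for illustration only; evaluate(), get_legal_moves(), apply_move(), and undo_move() are hypothetical helpers, and a practical engine would add a depth limit and the Alpha-Beta pruning described next.</p>
<pre><code>def minimax(state, depth, maximizing):
    """Return the best achievable static value from this state,
    assuming both players play optimally."""
    if depth == 0 or is_game_over(state):
        return evaluate(state)          # static value from Max's point of view

    if maximizing:                      # our turn: pick the move with the highest value
        best = float("-inf")
        for move in get_legal_moves(state):
            apply_move(state, move)
            best = max(best, minimax(state, depth - 1, False))
            undo_move(state, move)
        return best
    else:                               # opponent's turn: they pick the lowest value for us
        best = float("inf")
        for move in get_legal_moves(state):
            apply_move(state, move)
            best = min(best, minimax(state, depth - 1, True))
            undo_move(state, move)
        return best
</code></pre>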
<h3>Alpha-Beta Pruning Algorithm:</h3>
<p dir="auto">This is an optimization of the Minimax Algorithm. It reduces the number of branches to be searched by pruning, thus speeding up the search process. The Alpha-Beta Pruning Algorithm uses upper and lower bounds (Alpha and Beta values) to determine which branches can be discarded, reducing the depth of the search.<br />
Neural Networks and Deep Learning:<br />
The connect4 game algorithm that we designed also uses neural networks and deep learning for gameplay.</p>
<h3>Neural Networks:</h3>
<p dir="auto">Scientists have always hoped to simulate the human brain and create machines that can think. Why can humans think? Scientists have found that the reason lies in the neural network of the human body. Neural networks are mathematical models that simulate the structure and function of the human brain's nervous system. They process information and learn by simulating the connections and signal transmission between neurons. Neural networks are the foundation of all artificial intelligence.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1594451/AgAACgPnCYO8E498675E8LLgUoIahY7y.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
The basic idea of the neural network algorithm is to pass input data to the input layer of the network, then obtain the output layer's results through a series of computations and transmissions in intermediate layers (hidden layers). The training process adjusts the connection weights to minimize the difference between the actual output and expected output, optimizing the performance of the network.<br />
Deep Learning:<br />
Deep learning is a branch of machine learning that focuses on using deep neural networks for learning and reasoning. Deep learning solves complex learning and decision-making problems by constructing deep neural networks, which have multiple intermediate layers (hidden layers). It can be said that deep learning is a learning method that uses neural networks as the core tool. Deep learning not only includes the structure and algorithm of neural networks but also involves training methods, optimization algorithms, and large-scale data processing.</p>
<h2>Project</h2>
<p dir="auto">The project is mainly divided into two parts, hardware and software:<br />
<img src="https://hackster.imgix.net/uploads/attachments/1594452/_Ck6oHA6HMP.jpg?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
The most crucial part of this project is information collection, analysis, and processing. As mentioned earlier, we also used relevant knowledge of neural algorithms and deep learning, and the specific neural network used is the DQN neural network.</p>
<h3>DQN Neural Network:</h3>
<p dir="auto">The DQN neural network was proposed by DeepMind and combines the ideas of deep learning and reinforcement learning. DQN uses a deep neural network to estimate the state-action value function (Q function), enabling optimal decision-making in complex environments. The core idea of DQN is to use a deep neural network as a function approximator to approximate the state-action value function. By taking the current state as input, the neural network outputs the corresponding Q value for each action, that is, predicting the long-term return of that action in the current state. Then, the optimal action is selected and executed based on the Q value.</p>
<h3>Environment Setup:</h3>
<p dir="auto">Firstly, we need to define the Connect4 game using a two-dimensional array to represent the game board and two types of game pieces, red (R) and yellow (Y). We also need to define the end condition of the game, which is when four game pieces of the same color are connected in a line, the game ends.</p>
<pre><code class="language-#Define">self.bgr_data_grid = &lsqb;&lsqb;None for j in range(6)] for i in range(7)]

#Used to display the state of the board
def debug_display_chess_console(self):
    for y in range(6):
        for x in range(7):
            cell = self.stable_grid[x][y]
            if cell == Board.P_RED:
                print(Board.DISPLAY_R, end="")
            elif cell == Board.P_YELLOW:
                print(Board.DISPLAY_Y, end="")
            else:
                print(Board.DISPLAY_EMPTY, end="")
        print()
    print()
</code></pre>
<p dir="auto"><img src="https://hackster.imgix.net/uploads/attachments/1594453/AgAACgPnCYNwa8LwAHFBA7hz1JWPUJ-j.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Here is the code that defines whether the game is over:</p>
<pre><code>def is_game_over(board):
    # Check if there are four consecutive identical pieces in a row.
    for row in board:
        for col in range(len(row) - 3):
            if row[col] != 0 and row[col] == row[col+1] == row[col+2] == row[col+3]:
                return True

    # Check if there are four consecutive identical pieces in a column.
    for col in range(len(board[0])):
        for row in range(len(board) - 3):
            if board[row][col] != 0 and board[row][col] == board[row+1][col] == board[row+2][col] == board[row+3][col]:
                return True

    # Examine if there are four consecutive identical pieces in a diagonal line.
    for row in range(len(board) - 3):
        for col in range(len(board[0]) - 3):
            if board[row][col] != 0 and board[row][col] == board[row+1][col+1] == board[row+2][col+2] == board[row+3][col+3]:
                return True

    for row in range(len(board) - 3):
        for col in range(3, len(board[0])):
            if board[row][col] != 0 and board[row][col] == board[row+1][col-1] == board[row+2][col-2] == board[row+3][col-3]:
                return True

    # Verify if the game board is filled completely.
    for row in board:
        if 0 in row:
            return False

    return True
</code></pre>
<p dir="auto">Building the DQN Neural Network:<br />
We need to define the input layer and output layer of the neural network. The dimension of the input layer should match the state representation of the game board, while the dimension of the output layer should match the number of legal actions. In short, the input layer receives the status information of the game board, and the output layer generates the corresponding action selection.</p>
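<p dir="auto">As a hedged sketch of what such a network could look like in PyTorch, the model below takes the flattened 6x7 board as input and outputs one Q-value per column; the layer sizes are assumptions for illustration, not the exact architecture we trained.</p>
<pre><code>import torch
import torch.nn as nn

class DQN(nn.Module):
    def __init__(self, board_rows=6, board_cols=7, n_actions=7):
        super().__init__()
        # Input: one value per board cell; output: one Q-value per column.
        self.net = nn.Sequential(
            nn.Linear(board_rows * board_cols, 128),
            nn.ReLU(),
            nn.Linear(128, 128),
            nn.ReLU(),
            nn.Linear(128, n_actions),
        )

    def forward(self, x):
        return self.net(x)

# Example: Q-values for a single (flattened) board state
model = DQN()
state = torch.zeros(1, 6 * 7)
q_values = model(state)   # shape (1, 7), one value per column
</code></pre>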
<h3>Experience Replay Buffer:</h3>
<p dir="auto">Machines need to learn, so we need to build an experience replay buffer to store the agent's experience. This can be a list or queue used to store information such as the state, action, reward, and next state during the game process.<br />
Here is the pseudocode for constructing the experience replay buffer:</p>
<pre><code>import random

class ReplayBuffer:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []

    def add_experience(self, experience):
        if len(self.buffer) &gt;= self.capacity:
            self.buffer.pop(0)
        self.buffer.append(experience)

    def sample_batch(self, batch_size):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones
</code></pre>
<h3>Decision-making:</h3>
<p dir="auto">We have defined a strategy class named EpsilonGreedyStrategy, which uses the ε-greedy strategy for action selection and exploration. In the initialization function <strong>init</strong>(), we specify the exploration rate ε. The select_action() method selects actions based on the Q-value, randomly selects actions with a probability based on the exploration rate or selects the action with the highest Q-value.</p>
<pre><code>import random

class EpsilonGreedyStrategy:
    def __init__(self, epsilon):
        self.epsilon = epsilon

    def select_action(self, q_values):
        if random.random() &lt; self.epsilon:
            action = random.randint(0, len(q_values) - 1)
        else:
            action = max(enumerate(q_values), key=lambda x: x[1])[0]
        return action
</code></pre>
<p dir="auto">Training Framework:<br />
We use the PyTorch framework in Python to construct the training and implement loop training. We regularly evaluate the performance of the agent by playing against the current DQN neural network and pre-trained or other opponents until the preset requirements are met.<br />
<a href="https://twitter.com/i/status/1651528699945291776" target="_blank" rel="noopener noreferrer nofollow ugc">demo</a></p>
<h2>Summary</h2>
<p dir="auto">This article has come to a temporary conclusion. We mainly introduced how the DQN neural algorithm is implemented in Connect4. The next article will introduce how the robotic arm executes the optimal solution. The algorithm described in this article is just the tip of the iceberg. If you are interested in game algorithms, you can refer to relevant books for further understanding.<br />
We are currently in a period of great change, where artificial intelligence is everywhere, not only capable of defeating top players in games but also has a presence in various fields. We must seize the opportunity to keep up with this technology-filled 21st century.<br />
We will soon update the next article. If you are interested, please follow us and leave a message below, which is the best support for us!</p>
]]></description><link>https://community.m5stack.com/topic/5346/let-ai-play-connect4-implementing-using-dqn-neural-network-with-mycobot-280-m5stack</link><guid isPermaLink="true">https://community.m5stack.com/topic/5346/let-ai-play-connect4-implementing-using-dqn-neural-network-with-mycobot-280-m5stack</guid><dc:creator><![CDATA[ElephantRobotics]]></dc:creator><pubDate>Invalid Date</pubDate></item><item><title><![CDATA[Object Tracking on myCobot 280 Jetson Nano: A Case Study]]></title><description><![CDATA[<h2>Introduction</h2>
<p dir="auto">When we saw a video on YouTube of someone using a robotic arm to achieve object tracking, it deeply inspired us. We became very interested in this project and decided to independently develop a similar program.</p>
<p dir="auto"><img src="https://cdn.hackaday.io/images/original/1433151684201201514.gif" alt="alt text" class=" img-fluid img-markdown" /></p>
<h2>myCobot 280 M5Stack</h2>
<p dir="auto">The robotic arm used for the operation is the myCobot280 M5Stack. This is a small 6-axis robotic arm produced by Elephant Robotics, with M5Stack-Basic as the microprocessor, ESP32 as the auxiliary control, and a UR collaborative structure. The myCobot280 M5Stack-Basic has a body weight of 800g, a payload of 250g, a working radius of 280mm, and a compact and portable design. Despite its small size, it is powerful and easy to operate, capable of collaborating with humans and working safely.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1568027/image_WbMNU21RuO.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<h2>Process</h2>
<p dir="auto">The following image is a flowchart of the project development process.<br />
<img src="https://cdn.hackaday.io/images/6695771684202822381.jpg" alt="alt text" class=" img-fluid img-markdown" /></p>
<h3>Capture the target</h3>
<p dir="auto">Before beginning development, we conducted some research and experiments. We used a camera to capture images of objects and utilized the OpenCV library for recognition. We attempted various methods, but object recognition required machine learning for the target we wanted to identify, which would increase the project development time. Ultimately, we decided to use aruco codes for identification, which allowed us to quickly capture the aruco codes and proceed to the next stage of development.<br />
<img src="https://cdn.hackaday.io/images/4776861684202919859.jpg" alt="alt text" class=" img-fluid img-markdown" /><br />
Code：</p>
<pre><code>def show_video_v2(self):
        # self.robot.init_robot()
        xyz = np.array([0,0,0])
        LIST = []
        num_count = 0
        list_len = 5
        # cmax = [180, 80, 240]
        # cmin = [130, -80, 200]
        cmax = [150, -150, 300]
        cmin = [-150, -250, 200]

        while cv2.waitKey(1) &lt; 0:
            success, img = self.cap.read()
            if not success:
                print("It seems that the image cannot be acquired correctly.")
                break
            # convert the image to grayscale
            gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
            # Detect ArUco marker.
            corners, ids, rejectImaPoint = cv2.aruco.detectMarkers(
                gray, self.aruco_dict, parameters=self.aruco_params
            )

            if len(corners) &gt; 0:
                if ids is not None:
                    # get informations of aruco
                    ret = cv2.aruco.estimatePoseSingleMarkers(
                        # '''https://stackoverflow.com/questions/53303730/what-is-the-value-for-markerlength-in-aruco-estimateposesinglemarkers'''
                        corners, 0.025, self.camera_matrix, self.dist_coeffs
                    )
                    # rvec:rotation offset,tvec:translation deviator
                    (rvec, tvec) = (ret[0], ret[1])
                    
                    (rvec - tvec).any()
                    xyz = tvec[0, 0, :] * 1000
                    rpy = rvec[0,0,:]

                    camera = np.array([xyz[0], xyz[1], xyz[2]])

                    if num_count &gt; list_len:
                        target = model_track(camera)
                        print("target", target)

                        for i in range(3):
                            if target[i] &gt; cmax[i]:
                                target[i] = cmax[i]
                            if target[i] &lt; cmin[i]:
                                target[i] = cmin[i]

                        pose = np.array([-103, 8.9, -164])
                        coord = np.concatenate((target.copy(), pose), axis=0)

                        # q1 = math.atan(xyz[0] / xyz[2])*180/np.pi
                        mc.send_coords(coord,50,0)
                        
                        
                        # print('target', coord)
                        num_count = 1
                    else:
                        num_count = num_count + 1
                    

                    for i in range(rvec.shape[0]):
                        # draw the aruco on img
                        cv2.aruco.drawDetectedMarkers(img, corners)
            cv2.imshow("show_video", img)
</code></pre>
<h2>Hand-eye calibration</h2>
<p dir="auto">Hand-eye calibration refers to the process of determining the position and orientation of the robot end effector (such as a mechanical arm) relative to the robot base coordinate system in the field of robotics. This process involves pairing the robot end effector with a camera and then determining its position and orientation in the robot base coordinate system by capturing its position and orientation in the camera's field of view.<br />
Hand-eye calibration typically involves a series of movements between the robot end effector and the camera to collect enough data to calculate the transformation matrix between them. This transformation matrix describes the position and orientation of the robot end effector relative to the camera, which can be used to control the robot's motion and accurately perform the required tasks.<br />
In "eye-to-hand" hand-eye calibration, the camera is considered a stationary observer ("eye"), while the robot end effector is considered a moving object in the camera's field of view ("hand"). As the robot end effector moves around the camera, a series of images are collected that contain information about the end effector's position and orientation at different locations and orientations. By analyzing these images, the position and orientation of the robot end effector relative to the camera can be calculated, completing the hand-eye calibration.<br />
The following is the code for processing the coordinate transformation data.</p>
<pre><code>import numpy as np

prev_similarity = None

# The function is used to calculate the similarity between camera observations.
def calculate_similarity(camera):
    n = camera.shape[0]
    total_similarity = 0
    for i in range(n):
        for j in range(i+1, n):
            vector_a = camera[i]
            vector_b = camera[j]
            dot_product = np.dot(vector_a, vector_b)
            norm_a = np.linalg.norm(vector_a)
            norm_b = np.linalg.norm(vector_b)
            similarity = dot_product / (norm_a * norm_b)
            total_similarity += similarity
    return total_similarity/n
# The function is used to calculate the rate of change in similarity.
def similarity_change_rate(new_similarity):
    global prev_similarity
    if prev_similarity is None:
        prev_similarity = new_similarity
        return 0
    else:
        change_rate = (new_similarity - prev_similarity) / prev_similarity
        prev_similarity = new_similarity
        return change_rate

#The function is used to convert a rotation matrix to Euler angles.
def CvtRotationMatrixToEulerAngle(pdtRotationMatrix):
    pdtEulerAngle = np.zeros(3)

    pdtEulerAngle[2] = np.arctan2(pdtRotationMatrix[1, 0], pdtRotationMatrix[0, 0])

    fCosRoll = np.cos(pdtEulerAngle[2])
    fSinRoll = np.sin(pdtEulerAngle[2])

    pdtEulerAngle[1] = np.arctan2(-pdtRotationMatrix[2, 0], (fCosRoll * pdtRotationMatrix[0, 0]) + (fSinRoll * pdtRotationMatrix[1, 0]))
    pdtEulerAngle[0] = np.arctan2((fSinRoll * pdtRotationMatrix[0, 2]) - (fCosRoll * pdtRotationMatrix[1, 2]), (-fSinRoll * pdtRotationMatrix[0, 1]) + (fCosRoll * pdtRotationMatrix[1, 1]))

    return pdtEulerAngle
# The function is used to convert Euler angles to a rotation matrix.
def CvtEulerAngleToRotationMatrix(ptrEulerAngle):
    ptrSinAngle = np.sin(ptrEulerAngle)
    ptrCosAngle = np.cos(ptrEulerAngle)

    ptrRotationMatrix = np.zeros((3, 3))
    ptrRotationMatrix[0, 0] = ptrCosAngle[2] * ptrCosAngle[1]
    ptrRotationMatrix[0, 1] = ptrCosAngle[2] * ptrSinAngle[1] * ptrSinAngle[0] - ptrSinAngle[2] * ptrCosAngle[0]
    ptrRotationMatrix[0, 2] = ptrCosAngle[2] * ptrSinAngle[1] * ptrCosAngle[0] + ptrSinAngle[2] * ptrSinAngle[0]
    ptrRotationMatrix[1, 0] = ptrSinAngle[2] * ptrCosAngle[1]
    ptrRotationMatrix[1, 1] = ptrSinAngle[2] * ptrSinAngle[1] * ptrSinAngle[0] + ptrCosAngle[2] * ptrCosAngle[0]
    ptrRotationMatrix[1, 2] = ptrSinAngle[2] * ptrSinAngle[1] * ptrCosAngle[0] - ptrCosAngle[2] * ptrSinAngle[0]
    ptrRotationMatrix[2, 0] = -ptrSinAngle[1]
    ptrRotationMatrix[2, 1] = ptrCosAngle[1] * ptrSinAngle[0]
    ptrRotationMatrix[2, 2] = ptrCosAngle[1] * ptrCosAngle[0]

    return ptrRotationMatrix

</code></pre>
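<p dir="auto">For reference, OpenCV also ships a direct solver for the hand-eye transform itself. The sketch below is a generic example of cv2.calibrateHandEye (available in OpenCV 4.1 and later); the pose lists are filled with random placeholder values here and would in practice come from the robot controller and from chessboard or ArUco detection at each calibration station. Note that the function solves the eye-in-hand formulation, so for a fixed camera (eye-to-hand) setup the input transforms have to be inverted accordingly.</p>
<pre><code>import cv2
import numpy as np

rng = np.random.default_rng(0)

# Placeholder calibration poses: gripper pose in the robot base frame (from the controller)
# and calibration target pose in the camera frame (from marker detection), one pair per station.
R_gripper2base, t_gripper2base = [], []
R_target2cam, t_target2cam = [], []
for _ in range(10):
    R, _ = cv2.Rodrigues(rng.uniform(-0.5, 0.5, 3))
    R_gripper2base.append(R)
    t_gripper2base.append(rng.uniform(-100, 100, (3, 1)))
    R, _ = cv2.Rodrigues(rng.uniform(-0.5, 0.5, 3))
    R_target2cam.append(R)
    t_target2cam.append(rng.uniform(-100, 100, (3, 1)))

# Solve AX = XB for the fixed transform between the camera and the gripper.
R_cam2gripper, t_cam2gripper = cv2.calibrateHandEye(
    R_gripper2base, t_gripper2base, R_target2cam, t_target2cam
)
print(R_cam2gripper)
print(t_cam2gripper)
</code></pre>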
<h2>Robotic Arm Control</h2>
<p dir="auto">After this step, object detection and control of the mechanical arm follow, which involves converting the coordinates of the recognized object into motion commands for the mechanical arm. The pymycobot library is used to control the mechanical arm.</p>
<pre><code>#The function is used for visual tracking and calculating the target position.
def Visual_tracking280(coord, camera):
    pose_camera = camera[:3]
    angle_camear = camera[3:]
    r = CvtEulerAngleToRotationMatrix(angle_camear)
    # r = np.array([[1, 0, 0],
    #                  [0, 1, 0],
    #                  [0, 0, 1]])
    euler = np.radians(coord[3:])
    R = CvtEulerAngleToRotationMatrix(euler)
    offset = np.array([0, 0, -250])
    Roff = np.array([[1, 0, 0],
                     [0, -1, 0],
                     [0, 0, -1]])
    # Roff = np.array([[1, 0, 0],
    #                  [0, 1, 0],
    #                  [0, 0, 1]])
    vector = pose_camera + offset
    # print("R", R)
    # print("r", r)

    move_pos = np.dot(np.dot(R, r), Roff).dot(vector)
    pos = coord[:3] + move_pos
    # angle = np.array(CvtRotationMatrixToEulerAngle(np.dot(np.dot(R, r), Roff))) * 180/np.pi
    angle =  coord[3:]
    target = np.concatenate((pos, angle))
    return target
    
    #Calculate the target position based on the camera coordinates.
def model_track(camera):
    model_pos = np.array([-camera[0], -camera[2], -camera[1]])
    camera_pos = np.array([-37.5, 416.6, 322.9])
    target_pos = model_pos + camera_pos
    # print("model_pos", model_pos)
    # print("target_pos", target_pos)
    return target_pos

</code></pre>
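<p dir="auto">For completeness, here is a minimal, hedged sketch of how the computed target could be sent to the arm with pymycobot; the serial port, baud rate, and example camera values are assumptions that depend on your setup rather than values prescribed by this project.</p>
<pre><code>from pymycobot.mycobot import MyCobot
import numpy as np

# Port and baud rate are assumptions; adjust them for your machine.
mc = MyCobot("/dev/ttyUSB0", 115200)

# [x, y, z, rx, ry, rz] of the marker as estimated from the ArUco detection (example values)
camera = np.array([20.0, -15.0, 300.0, 0.0, 0.0, 0.0])

current = mc.get_coords()   # current arm pose [x, y, z, rx, ry, rz]
if current:
    target = Visual_tracking280(np.array(current), camera)
    # speed 50, mode 0, matching the send_coords call in the tracking loop above
    mc.send_coords([float(v) for v in target], 50, 0)
</code></pre>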
<p dir="auto">Finally, let's summarize the logical relationship of the project.<br />
<img src="https://cdn.hackaday.io/images/4871371684203022760.jpg" alt="alt text" class=" img-fluid img-markdown" /><br />
Let's take a look at how it performs.<br />
<img src="https://cdn.hackaday.io/images/original/6687121684203051230.gif" alt="alt text" class=" img-fluid img-markdown" /><br />
It may be noticed that sometimes myCobot does not move. This is because its body was blocking the camera, preventing it from capturing the target object. When moving objects, it is important to ensure that the mechanical arm body does not block the camera.</p>
<p dir="auto">This code is applicable to the entire myCobot280 series, including Pi, Jetson Nano, and other versions. The parameters may need to be adjusted based on the specific version being used.</p>
<p dir="auto">People often compare the Jetson Nano with the Raspberry Pi. I have tried this program on two different robotic arms, and it is evident that the Jetson Nano version is much more responsive than the Raspberry Pi, owing to its superior computational power.There is a noticeable delay of approximately one second between them as observed by the naked eye.</p>
<h2>Summary</h2>
<p dir="auto">During the debugging process, we found that the tracking effect was not very smooth and responsive. We adjusted the smoothness by controlling the detection cycle, but it was necessary to slowly move the tracked object to achieve better results. There are still some shortcomings, as the body of the mechanical arm may block the camera's field of view when the camera is fixed, making it impossible to proceed with the next tracking step. One solution we thought of is to move the camera to a position where it is not blocked (which would require recalculating the coordinates). If you have any better ideas, please feel free to communicate with us! Thank you for your patience.</p>
]]></description><link>https://community.m5stack.com/topic/5302/object-tracking-on-mycobot-280-jetson-nano-a-case-study</link><guid isPermaLink="true">https://community.m5stack.com/topic/5302/object-tracking-on-mycobot-280-jetson-nano-a-case-study</guid><dc:creator><![CDATA[ElephantRobotics]]></dc:creator><pubDate>Invalid Date</pubDate></item><item><title><![CDATA[Deep Machine Vision and Random Grasping with Robotic Arms]]></title><description><![CDATA[<h2>Introduction</h2>
<p dir="auto">Today, I would like to share with you my experience using the myCobot320 M5 and FS820-E1 depth camera for an unordered object grabbing demonstration. Why did I choose to use a depth camera and robotic arm for this case study?</p>
<p dir="auto">The commonly used 2D camera can capture two-dimensional images with pixel values in the horizontal and vertical directions. These cameras are typically used to capture static scenes or moving objects, and cannot provide depth information. In machine vision applications, 2D cameras can be used for tasks such as image classification, object detection, and recognition.</p>
<p dir="auto">In contrast, depth cameras can capture depth information, allowing for the acquisition of three-dimensional information about objects. These cameras use various techniques to measure object depth, such as structured light, time-of-flight, and stereo vision. In machine vision applications, 3D cameras can be used for tasks such as point cloud segmentation, object recognition, and 3D reconstruction.</p>
<p dir="auto">The information captured by 2D cameras is inadequate for certain special cases, hence the use of depth cameras to obtain more information, such as the length, width, and height of objects.</p>
<p dir="auto">Let us begin our topic for today.</p>
<h2>FS820-E1</h2>
<h3>Environment building</h3>
<p dir="auto">Firstly, I need to set up the development environment for the FS820-E1 depth camera using RVS, which is the software used for developing this depth camera. By utilizing the visual operator in RVS, I can quickly construct the node for the grabbing function.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585620/AgAACgPnCYMlFUFWrltBlKRgPiKWk82X.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
In the resource window located at the top left corner, locate the TyCameraResource operator and add it to the ResourceGroup in the operator graph. In the operator list, search for the TyCameraAccess and trigger operators, and add them to the operator graph. Adjust the operator parameters according to the requirements. Then, click on Run and set the Trigger-&gt;true in the property panel to visualize the data.</p>
<p dir="auto">TyCameraResource operator</p>
<p dir="auto">● The start and stop operators are used to respectively initiate and terminate the thread of the resource operator. The auto_start option is also used to initiate the resource operator. If it is checked, the resource thread will only automatically start when entering the running state for the first time after opening the RVS software.</p>
<p dir="auto">● The reset option is used to reset the attribute parameters if they need to be changed after opening the resource thread.</p>
<p dir="auto">TyCameraAccess operator</p>
<p dir="auto">● Open the visualization properties for cloud, RGB, and depth, and set the cloud_color to -2, which represents the true color.<img src="https://hackster.imgix.net/uploads/attachments/1585621/AgAACgPnCYPweAllk61Dbag3YpFzPC5o.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<h2>myCobot 320-M5Stack</h2>
<p dir="auto">The myCobot 320 is a practical robot designed for user-independent programming and development. The product has a maximum effective arm span of 350mm, a maximum load of 1KG, and a repetitive positioning accuracy of ±0.5mm. It supports development in various programming languages.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585622/AgAACgPnCYNxKpXzFahKR4TfVQd-JC3G.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<h3>Environment building</h3>
<p dir="auto">For this project, Python is being used and a compiling environment for Python needs to be set up, as well as installing the Pymycobot library, which enables control of the robotic arm's movement.<br />
<code> pip install pymycobot --upgrade</code><img src="https://hackster.imgix.net/uploads/attachments/1585623/AgAACgPnCYMrJSL-AjRCIJATQ9hb-kOe.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
PS: It is recommended to use a discrete graphics card of at least GTX 1060 (2 GB) class for this project, because the image recognition and related operations benefit from a more capable GPU. The better the graphics card's performance, the faster the program will run; a 3060-class card is recommended.</p>
<h2>Random object grasping</h2>
<p dir="auto">Next, we will implement the random object grasping of the robotic arm, which can accurately grasp the object in any posture. The following image shows the overall operator graph, which is the unstacking.xml.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585624/AgAACgPnCYO3Wtp0rDFKQ4l6pcvP4g7j.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<h3>Hand-eye calibration</h3>
<p dir="auto">Using a chessboard pattern for hand-eye calibration.</p>
<p dir="auto">Preparation:</p>
<p dir="auto">● Prepare a chessboard, determine the number of rows and columns, and the length of each square (in mm).</p>
<p dir="auto">● Hand-eye calibration can be divided into two types: eye in hand and eye to hand. Depending on the situation, fix the calibration board and the camera. Here we choose eye to hand calibration.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585625/AgAACgPnCYOaF_6ceHpGF783Ys-JLcvQ.jpeg?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<h3>Data recording</h3>
<p dir="auto">open unstacking_runtime/HandEyeCalibration/HandEyeCalibration.xml<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585628/image_X8hbEK2fH9.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Correctly fill in the number of rows and columns of the calibration board, the unit length of the calibration board cells, and the file path for saving the calibration data in the property panel.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585629/image_RrmcQUEu3H.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Before starting the calibration process, make sure that the camera can fully recognize the complete chessboard, and during the calibration process, the chessboard must be fixed and cannot be moved. After running the process, you will get 18 sets of data.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585630/image_xAtqgvGGWp.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
<img src="https://hackster.imgix.net/uploads/attachments/1585632/AgAACgPnCYNuPhxlIkpNaaLzkBpAiJV7.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<h3>Calculation of indicator results</h3>
<p dir="auto">If the positional error is within 0.005 (5mm), then it is considered an ideal result.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585631/image_0E4tn1t3KF.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
<h3>Coordinate system conversion</h3>
<p dir="auto">The following steps aim to transform the coordinate system of the point cloud from the camera RGB lens coordinate system to the robot coordinate system, which involves camera extrinsic parameters and hand-eye calibration results.</p>
<p dir="auto">Steps:</p>
<ol>
<li>
<p dir="auto">Right-click in the operator graph and select "Import Group XML here" to import the HandToEye_Depth2Robot.group.xml from the RVSCommonGroup. Besides this file, there is also HandInEye_Depth2Robot.group.xml.</p>
</li>
<li>
<p dir="auto">Connect the pose port of the hand-eye calibration data group to the rgb2robot port of the HandToEye_Depth2Robot group.</p>
</li>
<li>
<p dir="auto">Drag in the LoadCalibFile operator to load the calibration file, connect the finished port to the start port of the HandToEye_Depth2Robot group, connect the extrinsic_pose port to the rgb2depth port, and connect the start port to the InitTrigger port of the finished port. The specific connections are as follows:<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585633/image_QUZuoQIjQp.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
</li>
<li>
<p dir="auto">Click on the Group, find the rgb2tcp operator, and in the properties panel, paste the hand-eye calibration results in the "pose" property.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585634/image_39T6asi5Hr.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
</li>
<li>
<p dir="auto">Through the previous steps, we have obtained the transformation matrices from the RGB camera to the robot coordinate system (rgb2robot) and from the depth camera to the robot coordinate system (depth2robot). Here, we will transform the point cloud from the depth camera coordinate system to the robot coordinate system.</p>
</li>
<li>
<p dir="auto">First, drag the Transform operator into the graph and select "PointCloud" as the type. Connect the depth2robot port to the pose input port of this operator, and connect the pointcloud port of the LoadLocalData operator group to the same-named input port of this operator.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585635/image_Jwg76GbuBc.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
</li>
</ol>
<h2>AI training</h2>
<h3>Acquire training images</h3>
<p dir="auto">To open the file unstacking_runtime/MaskRCNN/ty_ai_savedata.xml, you can use a text editor like Notepad or a code editor like Visual Studio Code. The contents of the file are similar to recording RGB images, and you just need to adjust the string parameter in EmitString to the desired file path. Once you have set the file path, you can click on Capture to record the images. It is recommended to record as many images as possible to ensure stability in the data.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585636/image_IdQwtPlLOG.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Annotate the training data<br />
Now that the RGB images have been recorded, they need to be annotated. We recommend using the labelme software for annotation, and this document provides a method for installing it.</p>
<p dir="auto">● 1. Install pip according to the official website:</p>
<p dir="auto"><a href="https://pip.pypa.io/en/stable/installation/" target="_blank" rel="noopener noreferrer nofollow ugc">https://pip.pypa.io/en/stable/installation/</a></p>
<p dir="auto">● 2. Install PyQt5:</p>
<p dir="auto"><code>pip install PyQt5</code><br />
● 3. Install labelme:</p>
<p dir="auto"><code>pip install labelme</code><br />
Preparation:</p>
<p dir="auto">First, determine the task goal and clarify which objects need to be detected during the detection process and which objects do not need to be detected, so as to carry out targeted annotation.</p>
<p dir="auto">The annotation conditions given do not need to be overly strict. Do not think according to human thinking, but whether the annotation ideas set by yourself are easy to implement in the code.</p>
<p dir="auto">Process:</p>
<p dir="auto">● Open labelme in the terminal and click "OpenDir" to select the path where our annotations are located (the Emit operator string path in step 3.2.1 for collecting training images).</p>
<p dir="auto">● Click "Create Polygons" to draw a red border around the wooden blocks.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585637/image_k12K4PQPdm.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
After finishing, a naming dialog will pop up. Please name it "wooden block" for the first time, and select it directly for subsequent boxes of the same type.</p>
<p dir="auto">● When all the boxes in the image have been labeled, click "Save" to save them with the default folder and name. Then select "Next Image" to switch to the next image.</p>
<p dir="auto">Train AI model<br />
Open the file unstacking_runtime/MaskRCNN/ty_ai_train.xml, and adjust the paths for data_directory and classnames_filepath. Then click on the "start_train" button to start the training process.</p>
<p dir="auto">After the training process is completed, a folder named "train_output" will be generated. In this folder, there will be a model file (ending in.pth) which contains the required weights for the trained model.</p>
<p dir="auto">AI reasoning</p>
<ol>
<li>
<p dir="auto">Drag in an Emit operator, select "pose" for the type attribute, rename it to "Grasp Reference Pose", and input "3.141592654" for the pose_roll. This operator will be used in subsequent operators. Connect the pose port of this operator to the down_pose port of the Compute Grasping Point group.</p>
</li>
<li>
<p dir="auto">Double-click to expand the Compute Grasping Point group. The data needs to be pre-trained using the MaskRCNN network. Change the type of the AIDetectGPU operator to MaskRCNN and modify the rest of the configuration file parameters accordingly. Since the AI inference operator needs to be initialized before running, an extra Trigger (type is InitTrigger) needs to be added before the operator.</p>
</li>
<li>
<p dir="auto">The AI inference operator will obtain the position region of the target in the 2D image (i.e., the mask image, corresponding to the obj_list port). Next, we need to convert these position regions to 3D point clouds, which is the ProjectMask operator in the Compute Grasping Point group. For the ProjectMask operator, we not only need to input the obj_list obtained by the AI inference operator, but also the 2D image corresponding point cloud, the transformation matrix between the 2D image camera coordinate system and the point cloud coordinate system, and the intrinsic parameters of the camera's RGB lens. Here, the point cloud has been converted to the robot coordinate system, so the transformation matrix from the RGB lens to the robot coordinate system needs to be input. The intrinsic parameters of the camera's RGB lens can be directly read from the camera parameter file. After the operator runs, a list of point clouds for all detected targets will be obtained.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585638/image_OgkWlkycrz.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Robotic arm positioning and grasping<br />
location and Identification<br />
According to the AI inference process, the point cloud list of all detected targets in the robot coordinate system has been obtained. Next, we need to obtain the centroid coordinates of its point cloud.</p>
</li>
<li>
<p dir="auto">Double-click to expand the Find Target group in the Calculate Grasp Point group. We need to first select the wooden block, and then sort the wooden blocks based on their Z-axis coordinates, selecting the topmost wooden block. Therefore, the FilterBoxList operator is used here and renamed to "PointCloud Height Sorting", with the following property values adjusted:</p>
</li>
<li>
<p dir="auto">Obtain the plane using the FindElement operator, select "Plane" for the type, and obtain a plane suitable for grasping in the point cloud. Adjust the distance_threshold property value of the operator to adjust the selected plane. Open the cloud visualization property to view the selected plane.</p>
</li>
<li>
<p dir="auto">Obtain the center point of the plane using the MinimumBoundingBox operator, rename it to "Get Bounding Box", select "ApproxMVBB" for the type attribute to obtain a convenient coordinate center point for the robot to grasp. Here, give the operator a ref_pose, which is connected to "TowardsDownPose" mentioned in 3.3.4 during AI inference, indicating a 180° rotation around the X-axis, so that the Z-axis faces downward for ease of robot grasping. Open the box and box_pose visualization properties in the "GetBoxCube" attribute panel to display the calculated center point of the plane.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585639/image_1oC8wkHptw.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
</li>
<li>
<p dir="auto">Adjust the direction of the wooden block using the AdjustBoxNode operator, which selects objects with a length greater than their width and changes the object pose. Here, select a yaw of 90°.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585640/image_ofu6Udhc0c.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Grasp by the robotic arm<br />
After completing the above operations, the target point coordinates have been obtained, and it is necessary to establish a TCP connection between the robot and the RVS software and carry out actual grasping.</p>
</li>
<li>
<p dir="auto">Write TCP communication code (RobotControl_Elephant.py). The following is a snippet of the code that implements TCP communication between RVS software and the robotic arm.</p>
</li>
</ol>
<pre><code>#CAPTURE
print("***get pose***%s"%time.asctime())
capture_cmd = "GET_POSES \n"
capture_bytes=bytes(capture_cmd,encoding="utf-8")
sock_rvs.send(capture_bytes)
#recv CAPTURE
data = sock_rvs.recv(socket_buf_len)
print("---------------------------get data----------------------------")
print(data)
print("***data end***%s"%data[-1:])

print("***capture_receive***%s"%time.asctime())
if int(data[-1:]) == 1:
    print("***received CAPTURE result***\n")
if int(data[-1:]) == 2:
    print("***All finished!***")
    #P_FLAG = bool(1-P_FLAG)
    #print("change poision")
    continue
    #break
</code></pre>
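<p dir="auto">The snippet above assumes that a TCP connection to the RVS server has already been opened; a minimal sketch of that setup is shown below. The IP address, port, and buffer size are placeholders and must match the HandEyeTCPServer configuration in your project.</p>
<pre><code>import socket
import time

# Placeholder address and port; they must match the HandEyeTCPServer operator settings.
RVS_IP = "127.0.0.1"
RVS_PORT = 8888
socket_buf_len = 1024

sock_rvs = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock_rvs.connect((RVS_IP, RVS_PORT))
print("connected to RVS at %s:%d  %s" % (RVS_IP, RVS_PORT, time.asctime()))
</code></pre>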
<ol start="2">
<li>
<p dir="auto">Adjust the target point coordinates using the ScalePose operator. Set the type to "Normal" and adjust the scale_rpy property to convert the roll, pitch, and yaw values from radians to degrees.</p>
</li>
<li>
<p dir="auto">Finally, connect the finished and pose_list ports of the ScalePose operator to the MirrorOutput port of the outermost operator group and connect it back to the HandEyeTCPServer operator. With this, the project file editing is complete.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1585642/image_vVPb92wlSz.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /></p>
</li>
</ol>
<h2>Show results</h2>
<p dir="auto">After completing the above steps, under the unstacking.xml project, click Run, and run the RobotControl_Elephant.py file at the same time. After identifying multiple blocks, select one of the block poses and send it to the robotic arm for gripping.<br />
<img src="https://cdn.hackaday.io/images/original/7748771682671325815.gif" alt="alt text" class=" img-fluid img-markdown" /></p>
<h2>Summary</h2>
<p dir="auto">Overall, this is just a small part of what a depth camera can do. In the future, we may even consider stacking these objects together or using other irregular shapes to demonstrate its powerful performance. By training the model in advance, we can achieve the desired results. What do you expect me to do with it? Please leave a message below. Your likes and follows will be my motivation for updating more content!</p>
]]></description><link>https://community.m5stack.com/topic/5252/deep-machine-vision-and-random-grasping-with-robotic-arms</link><guid isPermaLink="true">https://community.m5stack.com/topic/5252/deep-machine-vision-and-random-grasping-with-robotic-arms</guid><dc:creator><![CDATA[ElephantRobotics]]></dc:creator><pubDate>Invalid Date</pubDate></item><item><title><![CDATA[Reinforcement Learning for Gripping Task of myCobot using Isaac Gym]]></title><description><![CDATA[<p dir="auto">This article is republished from Tatsushi Matsubayashi [1] of ALBERT Inc.[2]<br />
Link:<a href="https://blog.albert2005.co.jp/2023/03/28/isaac-gym-mycobot/" target="_blank" rel="noopener noreferrer nofollow ugc">https://blog.albert2005.co.jp/2023/03/28/isaac-gym-mycobot/</a><br />
<img src="https://cdn.hackaday.io/images/original/1860261680848310396.gif" alt="alt text" class=" img-fluid img-markdown" /><br />
Greetings, I am from the Advanced Technology Department. Following our previous blog post titled "Controlling myCobot using RealSense D455 for Spatial Recognition", I will now introduce an experiment utilizing myCobot. This time, the experiment will be conducted using a simulator rather than a physical machine. When attempting deep reinforcement learning with robots, it can be challenging to prepare large amounts of training data on a physical machine. However, with a simulator, it is easy to collect massive datasets. Nevertheless, simulators may appear daunting to those who are unfamiliar with them. Therefore, we tried using Isaac Gym, developed by Nvidia, which allowed us to achieve everything from creating an experimental environment to conducting reinforcement learning with almost nothing but Python code. In this post, I will introduce the method we used.</p>
<ol>
<li>Introduction<br />
1.1 What is Isaac Gym?<br />
Isaac Gym is a physics simulation environment developed by Nvidia for reinforcement learning. Based on the OpenAI Gym library, the physics calculations are performed on the GPU and the results can be received as Pytorch GPU tensors, enabling fast simulation and learning. Physics simulation is carried out using PhysX, and it also supports soft body simulation using FleX (although some features are limited when using FleX).</li>
</ol>
<p dir="auto">The latest version as of January 2023 is Preview4. While earlier versions had noticeable bugs, versions 3 and onwards have seen improvements and added functionality, making it a very attractive simulation environment. A future release of Omniverse Isaac Gym, integrated with Isaac Sim, is planned. However, Isaac Gym itself is standalone and can be used for experiments in Python. In a previous blog post ("GPU Server Expansion and A6000 Benchmarking"), it was mentioned that research and development using Omniverse Isaac Simulator had begun, but Isaac Gym was prioritized for reinforcement learning simulation. The biggest benefit of integrating Isaac Gym with Omniverse would likely be the ability to use photorealistic visuals for image recognition and high-precision continuous body simulation, such as with ray tracing. It will be exciting to see how this develops in the future.</p>
<p dir="auto">PhysX is a physics engine developed by Nvidia, enabling real-time physics calculations on the simulator's GPU. Although the version used by Isaac Gym has not been specified in publicly available arXiv or documentation, it is likely based on PhysX 4, given its release timing and separation from FleX. In Omniverse, PhysX 5 is used and FleX is integrated as well.</p>
<p dir="auto">FleX is also a physics engine developed by Nvidia, but it enables the representation of soft bodies and fluids using particle-based simulation, in contrast to PhysX's rigid body simulation.</p>
<p dir="auto">1.2 Purpose of this article<br />
I will describe how I was able to easily create and train reinforcement learning tasks using Isaac Gym. As a concrete test case, I will introduce an object-grasping task created by importing the myCobot from the previous article into the simulation environment.</p>
<p dir="auto"><a href="https://blog.albert2005.co.jp/2022/12/21/realsense-d455_mycobot/" target="_blank" rel="noopener noreferrer nofollow ugc">https://blog.albert2005.co.jp/2022/12/21/realsense-d455_mycobot/</a></p>
<p dir="auto">1.3 Environment<br />
PC1: Ubuntu 20.04, Python 3.8.10, Nvidia RTX A6000</p>
<p dir="auto">PC2: Ubuntu 18.04, Python 3.8.0, Nvidia RTX 3060Ti</p>
<p dir="auto">Please note that Nvidia Driver 470 or later is required.</p>
<p dir="auto">2.Install<br />
In this chapter, we will install Isaac Gym and IsaacGymEnvs. The recommended environment is Ubuntu 18.04, 20.04, Python 3.6~3.8, and Nvidia Driver==470. Please note that since python_requires&lt;3.9 is described in Isaac Gym's <a href="http://setup.py" target="_blank" rel="noopener noreferrer nofollow ugc">setup.py</a>, it cannot be used as is for versions 3.9 and above. Testing has not been performed on Ubuntu 22.04, but it is probably okay.</p>
<p dir="auto">2.1 Isaac Gym<br />
You can download the Isaac Gym main package for free from Nvidia's developer page. The documentation is saved in HTML format in the "docs" directory of the package (please note that it is not available on the website). After downloading, you can install it with the following command:</p>
<p dir="auto">$ cd isaacgym/python$ pip install -e .<br />
However, since PyTorch is specified as "torch ==1.8.0" and "torchvision ==0.9.0", you should install it first from the official page that matches your environment when using your GPU. Docker and Conda virtual environment setup files are also available. Since I use venv to manage my Python virtual environment, I will proceed with pip. Please note that I have written "＞" in full-width characters due to a code block issue</p>
<p dir="auto">2.2 IsaacGymEnvs<br />
IsaacGymEnvs is a Python package for benchmark testing reinforcement learning environments in Isaac Gym. By referring to the implemented tasks, one can easily construct reinforcement learning environments using the reinforcement learning algorithms implemented in rl-games. Even for those who plan to write their own reinforcement learning algorithms, it is recommended to try learning with Isaac Gym using this package. Originally included in Isaac Gym, it was separated in Preview3 and is now publicly available on GitHub.</p>
<p dir="auto">$ git clone <a href="https://github.com/NVIDIA-Omniverse/IsaacGymEnvs.git" target="_blank" rel="noopener noreferrer nofollow ugc">https://github.com/NVIDIA-Omniverse/IsaacGymEnvs.git</a><br />
$ cd IsaacGymEnvs<br />
$ pip install -e .<br />
With this, the necessary installation is now complete.</p>
<ol start="3">
<li>Demo<br />
When you install Isaac Gym and take a look inside the package, you'll see that there are many sample environments available. These are also featured in the documentation, but in this article, we'll highlight some of the samples that are relevant to creating custom reinforcement learning environments in Chapter 4. If you've set up your environment, it's a good idea to try running some of these samples and see what they can do. You might even find that they provide some guidance on how to use the API to accomplish things you're interested in trying out (and if you're still not sure, don't hesitate to read through the documentation).</li>
</ol>
<p dir="auto">3.1. Isaac Gym<br />
As of Preview4, there are 27 sample environments available.</p>
<p dir="auto">● "1080_balls_of_solitude.py"</p>
<p dir="auto">The "1080_balls_of_solitude.py" script generates a pyramid-shaped group of balls that fall down. Running the script without options only allows collisions between balls within the same environment (i.e. within the same pyramid). The "--all_collisions" option enables collisions with balls from other environments, while the "--no_collisions" option disables collisions between objects within the same environment. This script also demonstrates how to configure the arguments of the "create_actor" function to add objects to the environment.</p>
<p dir="auto">● "dof_controls.py"</p>
<p dir="auto">This script features an actor that moves in 3D, which is a variation of the well-known Cartpole problem in OpenAI Gym. It demonstrates how to set the control methods for each degree of freedom (DOF) of the robot, which can be either position, velocity, or force. Once set, these control methods cannot be changed during simulation, and the actor can only be controlled through the selected methods. Forgetting to set these control methods can cause the actor to fail to move.</p>
<p dir="auto">● ”franka_nut_bolt_ik_osc.py“</p>
<p dir="auto">This script showcases Franka Robotics' multi-jointed robot arm Panda picking up a nut and screwing it onto a bolt. The arm is controlled using inverse kinematics (IK). The file name includes "OSC, " but OSC control is not implemented in this script. However, the script "franka_cube_ik_osc.py" includes OSC control.</p>
<p dir="auto">With the addition of SDF collision in Preview4, high-resolution collision file loading is possible, allowing for precise collision calculations between the nut and bolt grooves (Figure 1). While the initial SDF loading can take some time, subsequent loads are cached and will start quickly.<br />
<img src="https://cdn.hackaday.io/images/original/2400751680849320342.gif" alt="alt text" class=" img-fluid img-markdown" /><br />
Figure 1: Simulation of a panda arm driving a nut onto a bolt</p>
<p dir="auto">● interop_torch.py</p>
<p dir="auto">This script shows how to use the function get_camera_image_gpu_tensor to directly obtain sensor data from the camera on the GPU. The obtained data can be output as an image file using OpenCV, just like a regular physical camera. When executed, the script creates a directory called interop_images and saves the camera images there. Since simulation data is not exchanged between the GPU and CPU, fast image processing is possible. However, if using a multi-GPU environment, an error may occur. One solution suggested on the forum is to limit the GPU usage with CUDA_VISIBLE_DEVICES=0, but this did not work in the environment used for this script.</p>
<p dir="auto">3.2. IsaacGymEnvs<br />
There are 14 reinforcement learning tasks implemented, and benchmark tests can be performed using the scripts in the tasks directory.</p>
<p dir="auto">● About the configuration file</p>
<p dir="auto">A configuration file written in YAML is prepared for each task. Common settings are in the config.yaml in the cfg directory, and settings can be changed without changing the YAML file with command line options using Hydra. The detailed settings for each task environment and PhysX are stored in the cfg/task/ directory, and the algorithm selection and structure are stored in the cfg/train/ directory.</p>
<p dir="auto">● About algorithm implementation</p>
<p dir="auto">The reinforcement learning algorithm uses the PPO implementation in Rl-games. Although the docs/rl_examples.md mentions the option to select SAC, it is not currently included in the repository.</p>
<p dir="auto">NN layers are typically MLPs, and some models also include LSTM layers as RNN layers. Although CNN layers can also be added, there are no sample models with CNN layers included. In section 5.2, we will discuss our experience adding CNN layers to a model.</p>
<p dir="auto">The sample code can be executed in the isaacgymenvs directory where <a href="http://train.py" target="_blank" rel="noopener noreferrer nofollow ugc">train.py</a> is located.</p>
<p dir="auto">● Cartpole<br />
python <a href="http://train.py" target="_blank" rel="noopener noreferrer nofollow ugc">train.py</a> task=Cartpole [options]<br />
This is the classic cartpole task where the goal is to move the cart in a way that the pole does not fall. By default, the model is trained for 100 epochs, which takes about 2 minutes on a PC2 RTX 3060Ti environment, and only 15 seconds in headless mode (without the viewer). When testing the model with inference, it performs well and the pole remains upright (after 30 epochs of training, the model is sufficiently trained to keep the pole upright). Although it may seem simple, the fact that the model can learn to complete this task successfully is reassuring.</p>
<p dir="auto">● Franka Cube Stack<br />
python <a href="http://train.py" target="_blank" rel="noopener noreferrer nofollow ugc">train.py</a> task=FrankaCubeStack [options]<br />
This is a task where a Panda arm is used to stack boxes. The joint movements of the 7-axis arm are learned step by step. The default setting is 10,000 epochs, but the arm movements can be learned in about 1,000 epochs. On a PC1 RTX A6000 environment, 1,000 epochs of training took about 20-30 minutes to complete. Figures 2 and 3 show the before and after states of the arm, where it goes from randomly moving to successfully grabbing and stacking boxes.</p>
<p dir="auto">The action space consists of 7 dimensions for the arm joints, while the observation space is a total of 26 dimensions. The reward function is designed to scale differently for actions that involve getting closer to the box, lifting the box, moving the boxes closer to each other, and successfully completing the stacking task.</p>
<p dir="auto">It's surprising how easily the arm can learn this level of task. However, it's important to note that the learning assumes a defined world coordinate system and the known positions and orientations of the objects. Therefore, applying this learned behavior to a physical robot may not be as straightforward.</p>
<p dir="auto">Breakdown of 26-dimensional observation:</p>
<p dir="auto">● 7 dimensions for the position and orientation of the box being moved</p>
<p dir="auto">● 3 dimensions for the vector from the box being stacked to the box being moved</p>
<p dir="auto">● 7 dimensions for the gripper's grasp position and orientation</p>
<p dir="auto">● 9 dimensions for the arm joints and gripper fingers<br />
<img src="https://cdn.hackaday.io/images/original/2387001680851314018.gif" alt="alt text" class=" img-fluid img-markdown" /><br />
Figure 2: FrankaCubeStack before training<br />
<img src="https://cdn.hackaday.io/images/original/7255071680854233433.gif" alt="alt text" class=" img-fluid img-markdown" /><br />
Figure 3: FrankaCubeStack after training</p>
<p dir="auto">Some common options in <a href="http://train.py" target="_blank" rel="noopener noreferrer nofollow ugc">train.py</a> are:</p>
<p dir="auto">● headless (default: False): When set to True, the viewer is not launched. This is useful for heavy training or when capturing camera images, as the viewer can slow down the process significantly.</p>
<p dir="auto">● test (default: False): When set to True, the learning mode is turned off, allowing you to run the environment without training. This is useful for environment generation and checking the learning results.</p>
<p dir="auto">● checkpoint (default: ''): Specifies the PyTorch weight file to load. The learning results are saved in runs/&lt;task name&gt;/nn/&lt;task name&gt;.pth, and this option is used for resuming training or testing.</p>
<p dir="auto">● num_envs (default: int): Specifies the number of parallel learning environments. It's important to set an appropriate number to avoid heavy viewers during testing (this option can also be set during training, but changing it can cause errors due to batch size and interference).</p>
<p dir="auto">Note that <a href="http://train.py" target="_blank" rel="noopener noreferrer nofollow ugc">train.py</a> configures horizon_length and minibatch_size, but batch_size = horizon_length * num_actors * num_agents, and batch_size must be divisible by minibatch_size. Additionally, num_actors and num_agents are proportional to num_envs, so changing only num_envs can cause errors.</p>
<p dir="auto">Other samples can be tried easily with just the environment, so give them a try for some interesting tests.</p>
<p dir="auto">3.3 Viewer Tips<br />
● Drawing a collision mesh</p>
<p dir="auto">The simulator usually renders an object's visual mesh, but in Isaac Gym's Viewer, you can change it to render the collision mesh instead. To do this, go to the Viewer tab in the menu window and check "Render collision meshes". If an object is behaving strangely, it's a good idea to check if the collision mesh is loaded correctly (sometimes the visual and collision meshes have different orientations, or the mesh may not have loaded correctly or with enough detail in the simulator).<br />
<img src="https://hackster.imgix.net/uploads/attachments/1579029/AgAACgPnCYMqZjp57JZBS4MHtOsawYYm.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Figure 4: Drawing the collision mesh</p>
<p dir="auto">● Reduce drawing environment</p>
<p dir="auto">You can reduce the rendering environment to only one without changing any settings. By checking "Show only selected env" in the Actors menu, as shown in Figure 5, only the selected environment will be displayed. If there are any strange behaviors, you can debug by outputting the environment number and only rendering that environment. This also lightens the rendering load and can increase FPS.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1579030/AgAACgPnCYNlxDAWe11E1qT6EyGXq0Br.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Figure 5: Numbering the drawing environment</p>
<p dir="auto">● Change initial camera position</p>
<p dir="auto">The initial camera position and orientation can be set using gymapi's viewer_camera_look_at(viewer, middle_env, cam_pos, cam_target). In task scripts for training, you need to override the set_viewer function to make changes.</p>
<ol start="4">
<li>Original environment and task creation<br />
It's finally time to create the original task for the main subject.</li>
</ol>
<p dir="auto">4.1. Preparation<br />
Prepare the script and configuration files. The goal is to learn a simple task of lifting a box, using Mycobot for object picking. Therefore, we will proceed with creating a task named "MycobotPicking". We need three files:</p>
<p dir="auto">● tasks: the main Python script</p>
<p dir="auto">● cfg/task: YAML configuration file for environment and simulation parameters</p>
<p dir="auto">● cfg/train: YAML configuration file for learning algorithms, neural network layers, and parameters.</p>
<p dir="auto">We can refer to the "FrankaCubeStack" task mentioned earlier and create these files accordingly. The configuration files are especially important and we can copy and modify them according to our requirements.</p>
<p dir="auto">As shown in the demo, we can load the task script from the <a href="http://train.py" target="_blank" rel="noopener noreferrer nofollow ugc">train.py</a> file using command-line options. Therefore, we need to add an import statement for the task class in the <a href="http://init.py" target="_blank" rel="noopener noreferrer nofollow ugc">init.py</a> file in the tasks directory, along with the task name when passing arguments.</p>
<p dir="auto">4.2. Environment Creation<br />
The task class is created by inheriting from the VecTask class in the tasks/base directory, and the task has the following structure as shown in Figure 6.<br />
<img src="https://hackster.imgix.net/uploads/attachments/1579033/7_CSnshj2TYZ.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Figure 6: Task configuration. Those with an orange frame do not need to be edited, and those with a green frame are created for each task.</p>
<p dir="auto">4.2.1. <strong>init</strong> processing</p>
<ol>
<li>Creating Simulator and Environment</li>
</ol>
<p dir="auto">● create_sim: This function generates an instance of the simulator. The process itself is defined in the parent class, and the properties are set in the config file, such as gravity and step time. Similar to FrankaCubeStack, the function uses the following two functions to generate the ground plane and the actor.</p>
<p dir="auto">● create_ground_plane: This function generates a ground plane by inputting the normal direction of the plane. If you want to create an uneven terrain, you can refer to the terrain_creation example.</p>
<p dir="auto">● create_envs: This function loads and sets properties for the actor file, generates the actor and parallelizes the environment. In this task, we generated myCobot from the URDF and the target object from the create_box API. The URDF of myCobot was based on the one used in the previous experiment with MoveIt, and we added a gripper for picking (details about the gripper are explained in section 5.1).</p>
<ol start="2">
<li>Data Initialization</li>
</ol>
<p dir="auto">● init_data: This function defines environment variables from the config file and prepares a buffer for data tensors processed by Isaac Gym (PhysX). The necessary data for calculating state and reward is defined as class variables. The API loads the tensor data into the buffer, which is updated every step by calling the corresponding refresh function.</p>
<p dir="auto">4.2.2. processing steps</p>
<ol>
<li>Step Processing:</li>
</ol>
<p dir="auto">The main step function is defined in the parent class and does not need to be modified. However, the following two steps are necessary as abstract methods:</p>
<p dir="auto">● pre_physics_step: Manipulate the actor using the action. The size of the action is defined in the config as [“env”][“numActions”]. For myCobot's 6-axis arm and gripper, we set it to 7 dimensions.</p>
<p dir="auto">● post_physics_step: Calculate the observation and reward. Also check whether to reset the environment. We set it to reset after reaching a maximum of 500 steps or a successful lift.</p>
<p dir="auto">The step order is fixed to apply action → physics simulation → calculation of observation and reward to pass data for learning. Even if you only write "pass" here, you can check the environment while starting the viewer.</p>
<p dir="auto">● reset_idx: Returns the environment to its initial state. Of course, the randomness of the initial state is closely related to the generalization of learning. We set myCobot to the initial posture and randomly reset the position of the target object within myCobot's reachable range.</p>
<ol start="2">
<li>State and Reward Calculation:</li>
</ol>
<p dir="auto">● compute_observation: Update each buffer with the refresh function and put the desired state in the obs_buf. The size of obs_buf is defined in the config as [“env”][“numObservation”].</p>
<p dir="auto">● compute_reward: Calculate the reward. As the gripper approaches the target object's grip position (between the fingers), a reward is obtained, and a larger reward is obtained as the height of the target object increases.</p>
<p dir="auto">4.3. Execution of training<br />
Now that the task framework has been created, let's train the model. We can start training the model using the following command:</p>
<p dir="auto">python <a href="http://train.py" target="_blank" rel="noopener noreferrer nofollow ugc">train.py</a> task=MycobotPicking --headless<br />
After 200 epochs, the initial weights will be saved, and new weights will be saved if the reward improves. However, the task we created may not work perfectly, and the training process may stop progressing quickly. In the following section, I will discuss the adjustments I made to the task to improve its performance.</p>
<p dir="auto">4.4. Task Coordination<br />
By using the learned weights to test, you can debug why the training did not work well. You ran the command</p>
<p dir="auto">python <a href="http://train.py" target="_blank" rel="noopener noreferrer nofollow ugc">train.py</a> task=MycobotPicking test=True checkpoint=runs/MycobotPicking/nn/[checkpoint].pth<br />
to test the model. However, the gripper did not move well. Despite my efforts to resolve the issue, I concluded that URDF's lack of support for closed-loop structures made it difficult to simulate the gripper's movements accurately. As a result, I decided to control the gripper's closing and lifting actions with a rule-based approach, fixed the gripper's fingers to a fixed link, and reduced the action space from 7 to 6 dimensions. When controlling a robot arm in a simulator, it seems better to use a gripper without closed loops, such as the Panda arm's.</p>
<p dir="auto">Another issue you faced was that the agent stopped approaching the object at a certain distance and hesitated to touch it, resulting in lower rewards. You modified the reward system by increasing the reward function's value with a threshold distance as a step function, maximizing the reward when the agent reached the target point. You also removed the environment reset after task completion, as it caused the agent to stop learning before reaching the actual goal. Instead, you adjusted the maximum number of steps to the necessary number for task completion, improving the learning speed.</p>
<p dir="auto">You also found that penalizing difficult tasks too harshly made the reinforcement learning agent too conservative. This gave the agent a more human-like personality, making the learning process more interesting. Finally, you encountered a similar phenomenon in the FrankaCabinet benchmark task, where the agent would stop learning after pulling the drawer to a certain distance, even though a higher reward was available for fully pulling it out. You did not fix this issue, but instead, you removed the environment reset after task completion and adjusted the maximum number of steps to complete the task smoothly.</p>
<p dir="auto">Figure 7: myCobot keeps away from objects</p>
<p dir="auto">The self-collision of the arm was being ignored. Although I was able to reach the desired position, the arm was now in a position where self-collision was completely ignored, like a figure-eight. I tried to research whether it was possible to set up self-collision calculation in the documentation, but it didn't work well. In the first place, it is not realistic that all joint angle limits in the provided URDF are set to -3.14~3.14, so I decided to adjust the upper and lower limits of each joint angle to avoid self-collision. The reason why the joint angles moved to the largest possible values is still unknown.</p>
<p dir="auto">Figure 8: myCobot that ignores the accident collision</p>
<p dir="auto">The arm doesn't stop exactly where it's supposed to and instead wobbles around it. We wanted the action to approach 0 as it reached the target position, but it was difficult to achieve, and the arm kept vibrating around the target position. We tried penalizing the action and adjusting the reward by setting the target position precisely, but it didn't improve the results. We decided not to worry about this issue as it could be handled by rule-based control during actual operation.</p>
<p dir="auto">Although it was not a must-have, we wanted the gripper to be downward facing for better appearance. So, we added a penalty term to the reward function that penalized the gripper angle. Figure 9 shows the learning results before fine-tuning.</p>
<p dir="auto">Figure 9: MyCobot after learning before fine-tuning</p>
<p dir="auto">The result of the adjustments mentioned above is shown in Figure 10. If this level of accuracy can be achieved on the actual robot, it should be capable of lifting objects sufficiently.</p>
<p dir="auto">Figure 10: MyCobot after training after fine-tuning</p>
<ol start="5">
<li>Others<br />
Here I will cover something that did not work out, and something I would still like to try.</li>
</ol>
<p dir="auto">5.1. The story of a homemade URDF gripper that didn't work<br />
The URDF of myCobot was based on the one used in the previous attempt to move the actual robot, but it did not include the gripper. Although there was a gripper model available on the official GitHub page, it only provided a DAE file with a visual representation as shown in Figure 11(a). To create a URDF that can be used in a simulator, separate 3D models for each joint part are required. Therefore, using Blender, we divided the parts by joint (Figure 11(c)) and created simplified box-shaped parts for collisions since it is difficult to reproduce complex shapes (Figure 11(b)). Then, we described the structure of the links and joints in the URDF file to complete the model. However, since URDF does not support closed-loop link structures, we removed the collision from one of the links at the base and closed the loop on the fingertip side instead. Although this method is rough, we were able to reproduce the movement of the actual robot in the simulator by moving the six joints at the same angle. Figure 11(d) shows a comparison between the completed model and the actual robot (using the provided model, but the details are quite different). However, when we actually tried to move it, as mentioned in section 4.4, it did not work well. The reason was that it was not possible to move the joints in a coordinated manner when external forces were applied (it might have been solved if torque control had been properly implemented).<br />
<img src="https://hackster.imgix.net/uploads/attachments/1579038/AgAACgPnCYPTvvzVx81HE4fSP06FDMFr.png?auto=compress%2Cformat&amp;w=740&amp;h=555&amp;fit=max" alt="alt text" class=" img-fluid img-markdown" /><br />
Figure 11: Creating a gripper for myCobot (a) Published gripper model (b) Collision model parts created according to the model (c) Visual model parts disassembled from the gripper model (d) Isaac Gym Comparison of drawing and actual gripper</p>
<p dir="auto">5.2. Use image recognition<br />
In benchmarks and the MycobotPicking task, we use object position and orientation information in the observations, but obtaining this information in real-world tasks is not easy. Therefore, using only 2D camera information and easily obtainable servo joint angle information for reinforcement learning would be more valuable.</p>
<p dir="auto">We attempted to replace observations with images and use a CNN layer for learning in the FrankaCubeStack task. However, we only modified the algorithm to accept image input and as expected, the learning did not perform well. There is no framework to add the servo joint angle information as 1-dimensional data to the CNN layer, and using image information directly in the CNN layer increases the computational complexity and limits the parallelization of the environment. Additionally, we would need to tune hyperparameters such as learning rate and clip value, but we did not pursue this as the effect is not promising enough.</p>
<p dir="auto">In this test, we only confirmed the method of adding the CNN layer for learning. However, it may be more effective to use transfer learning to encode features for gripper and object recognition from easy-to-use object recognition models such as YOLO or ResNet, and then use the encoded features and joint angles for reinforcement learning, rather than using CNN layers directly with camera images.</p>
<p dir="auto">5.3. Using the Trained Model on the Actual Robot<br />
I attempted a Sim2Real experiment using the trained model and the myCobot and RealSense for spatial recognition, as mentioned in the previous article. However, it did not work well. While the reaching motion worked to some extent, the movement became unstable as it approached the object, and it was not possible to accurately move to the position to grab the object. Possible issues include the fact that the myCobot does not have enough power to move accurately to the target posture and the accumulation of small differences due to the fact that the simulator predicts the next target posture before reaching the current one, whereas the actual robot does not. Regarding the former, the myCobot used in this experiment is an inexpensive educational arm with a portable weight of 250g, so if you want to move more accurately, you should use a higher-end robot arm, such as those used in reinforcement learning for picking. Elephantrobotics, the company that makesthe myCobot, also sells models with stronger servo motors that can carry up to 1kg, so I would like to try those as well.</p>
<ol start="6">
<li>Summary<br />
This time, I created a reinforcement learning task using Isaac Gym and actually trained the model. I experienced the design of a robot reinforcement learning problem in a 3D physics simulator and the issues that arise when running the trained model. It was attractive to be able to test the learning environment without having to write the reinforcement learning algorithm from scratch. The availability of benchmark environments makes it easy to compare and verify new learning algorithms, which is a great advantage for researchers and analysts with various professional backgrounds.</li>
</ol>
<p dir="auto">ALBERT has researchers and analysts with various professional backgrounds who are ready to assist with any technical or business-related inquiries. Please feel free to contact us.</p>
<p dir="auto">Reference</p>
<ol>
<li>Tatsushi Matsubayashi, 2022.12.21, "Reinforcement Learning of a myCobot Grasping Task with Isaac Gym" (original in Japanese).</li>
</ol>
<p dir="auto"><a href="https://blog.albert2005.co.jp/2023/03/28/isaac-gym-mycobot/" target="_blank" rel="noopener noreferrer nofollow ugc">https://blog.albert2005.co.jp/2023/03/28/isaac-gym-mycobot/</a></p>
<ol start="2">
<li>ALBERT Inc. <a href="https://www.albert2005.co.jp/english/" target="_blank" rel="noopener noreferrer nofollow ugc">https://www.albert2005.co.jp/english/</a></li>
</ol>
]]></description><link>https://community.m5stack.com/topic/5213/reinforcement-learning-for-gripping-task-of-mycobot-using-isaac-gym</link><guid isPermaLink="true">https://community.m5stack.com/topic/5213/reinforcement-learning-for-gripping-task-of-mycobot-using-isaac-gym</guid><dc:creator><![CDATA[ElephantRobotics]]></dc:creator><pubDate>Invalid Date</pubDate></item><item><title><![CDATA[UNIT MQTT: unable to set MAC address]]></title><description><![CDATA[<p dir="auto">Hi,<br />
This is a question about UNIT MQTT (SKU:U129).<br />
I sent the following AT command to change the Unit's MAC address, but an error was returned.<br />
What did I do wrong?</p>
<p dir="auto">b'AT+NETMAC=00-02-CB-12-34-56\r\n'<br />
b'+NETMAC=ERROR:Error Param\r\n'</p>
<p dir="auto">Other commands can be executed, but only the MAC address cannot be set.</p>
<p dir="auto">&lt;firmware version&gt;<br />
+VERSION=OK:1.6</p>
<p dir="auto">Thanks</p>
]]></description><link>https://community.m5stack.com/topic/5159/unit-mqtt-unable-to-set-mac-address</link><guid isPermaLink="true">https://community.m5stack.com/topic/5159/unit-mqtt-unable-to-set-mac-address</guid><dc:creator><![CDATA[shiomee]]></dc:creator><pubDate>Invalid Date</pubDate></item><item><title><![CDATA[Smart Applications of Holography and Robotic Arms myCobot 320 M5Stack-Basic]]></title><description><![CDATA[<p dir="auto"><a class="mention plugin-mentions-user plugin-mentions-a" href="https://community.m5stack.com/uid/21791">@kkkkkk</a><br />
Hi, you can click the <a href="https://www.dseelab.com/home" target="_blank" rel="noopener noreferrer nofollow ugc">link</a> to know more about it</p>
]]></description><link>https://community.m5stack.com/topic/5096/smart-applications-of-holography-and-robotic-arms-mycobot-320-m5stack-basic</link><guid isPermaLink="true">https://community.m5stack.com/topic/5096/smart-applications-of-holography-and-robotic-arms-mycobot-320-m5stack-basic</guid><dc:creator><![CDATA[ElephantRobotics]]></dc:creator><pubDate>Invalid Date</pubDate></item></channel></rss>