Deploy the model to a local device and perform real-time inference

Plan

  • Save the trained model and download it from Kaggle (a sketch of this step follows the list).

  • Prepare the local environment.

  • Structure the inference code with object-oriented programming.

  • Run real-time inference.
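
  A minimal sketch of the save step on the Kaggle side, under assumptions: the small stand-in model below is hypothetical and only mirrors the input shape (30 frames × 66 landmark features) and class count the script uses; the real trained network would be saved the same way. The resulting .h5 file can then be downloaded from the notebook's output files.

    import tensorflow as tf

    # Hypothetical stand-in for the trained network; the real RNN trained on
    # 30-frame sequences of 66 landmark features would be saved the same way.
    trained_model = tf.keras.Sequential([
        tf.keras.layers.SimpleRNN(64, input_shape=(30, 66)),
        tf.keras.layers.Dense(5, activation="softmax"),
    ])

    # Save in HDF5 format; on Kaggle the file can then be downloaded from the
    # notebook's output files.
    trained_model.save("231008RNNV5.h5")

    # Sanity check: reload the file and confirm it round-trips.
    reloaded = tf.keras.models.load_model("231008RNNV5.h5")
    reloaded.summary()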

    Code

    import os
    import cv2
    import numpy as np
    import mediapipe as mp
    import tensorflow as tf

    CLASS_NAMES = ['Left_Swipe_new', 'Right_Swipe_new', 'Stop_new', 'Thumbs_Down_new', 'Thumbs_Up_new']
    output_directory = "temp_storage"
    os.makedirs(output_directory, exist_ok=True)  # cv2.imwrite fails silently if the folder is missing
    class Model:
      def __init__(self, CLASS_NAMES, model_path, model2_path, output_directory):
          self.CLASS_NAMES = CLASS_NAMES
          # Gesture classifier (five classes).
          self.model = tf.keras.models.load_model(model_path)
          self.model.summary()
          # Secondary classifier: detects whether the person is moving at all.
          self.model2 = tf.keras.models.load_model(model2_path)
          self.output_directory = output_directory
      def detect_landmarks(self,image_folder):
          mp_holistic = mp.solutions.holistic.Holistic()
          image_files = sorted(os.listdir(image_folder))
          num_frames = len(image_files)
          num_landmarks = 33  # Fixed number of landmarks for pose estimation
          landmarks = np.zeros((num_frames,num_landmarks, 2))
          for i, file in enumerate(image_files):
              image_path = os.path.join(image_folder, file)
              frame = cv2.imread(image_path)
              if frame is None:  # skip unreadable files
                  continue
              # Convert the frame to RGB (MediaPipe expects RGB input)
              frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
              # Detect landmarks
              results = mp_holistic.process(frame_rgb)
              if results.pose_landmarks:
                  for j, landmark in enumerate(results.pose_landmarks.landmark):
                      landmarks[i, j] = [landmark.x, landmark.y]
          mp_holistic.close()
          return landmarks
      def inference(self, X):
          # Predict the gesture class for one 30-frame landmark sequence.
          y_pred = np.argmax(self.model.predict(X), axis=1)
          print(self.CLASS_NAMES[y_pred[0]])
          return y_pred
      def inference2(self, X):
          # Predict whether the person is moving at all.
          CLASS_NAMES = ['not_move', 'move']
          y_pred = np.argmax(self.model2.predict(X), axis=1)
          print(CLASS_NAMES[y_pred[0]])
          return y_pred
      def clear_files(self, folder_path):
          # Delete every file in the folder so the next
          # 30-frame window starts from a clean buffer.
          files = os.listdir(folder_path)
          for file in files:
              file_path = os.path.join(folder_path, file)
              if os.path.isfile(file_path):
                  os.remove(file_path)
          print("-" * 50)
          print("ALL Frames Cleared")
      def run(self):
          frame_count = 0
          cap = cv2.VideoCapture(0)  # default local camera
          while True:
              ret, frame = cap.read()
              if ret:
                  # Buffer each frame to disk until a 30-frame window is full.
                  frame_name = os.path.join(self.output_directory, f"frame_{frame_count}.png")
                  cv2.imwrite(frame_name, frame)
                  frame_count += 1

                  if frame_count == 30:
                      print("=" * 50)
                      # 30 frames x 33 landmarks x 2 coordinates -> (1, 30, 66)
                      X_data = self.detect_landmarks(self.output_directory).reshape((1, 30, 66))
                      self.clear_files(self.output_directory)
                      #y_check=self.inference2(X_data)
                      frame_count = 0
                      #if y_check==1:
                      y_pred = self.inference(X_data)
                      print(f"Movement Type:{self.CLASS_NAMES[y_pred[0]]}, index:{y_pred[0]}")

                  cv2.imshow('Camera', frame)

              if cv2.waitKey(1) & 0xFF == ord('q'):
                  break
          cap.release()
          cv2.destroyAllWindows()
    def main():
      Model(CLASS_NAMES, "231008RNNV5.h5", "231012RNN_2V5.h5", output_directory).run()

    if __name__ == '__main__':
      main()
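
    Running the script opens the default webcam, buffers 30 frames to temp_storage, extracts 33 pose landmarks per frame with MediaPipe Holistic, reshapes the window to (1, 30, 66), and feeds it to the RNN; press q in the camera window to quit.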
    

Evaluation

  • The trained model can’t detect whether a person is moving at all, so it returns nonsense when the person in front of the camera performs none of the five gestures. This can be solved by training a second model that detects whether someone is moving. I did train such a model, and the results were acceptable (shown in the confusion matrix below). cm1
  • Lack of computing power: although the inference itself works, it is slow. Because data collection, processing, and inference all run in one while-loop, the pipeline’s latency causes unacceptable delay: the camera is supposed to capture a fresh window of video each second, but the landmark extraction and inference steps stall the loop, making real-time inference impossible. Note that this issue CAN NOT BE SOLVED within the given amount of time, though a threaded pipeline (see the sketch after this list) is one possible direction.
  • Normally, a GPU is required for any deep learning task: a GPU can run the massively parallel matrix operations these models rely on across thousands of cores, while a CPU offers only a handful. I trained the model on Kaggle because it offers free GPU runtime. However, the inference must be done locally to access the local camera, and my device has no suitable GPU, so GPU-accelerated inference was not possible.
  • This means I will NOT use the model to control the drone, as the delay would create hazards and could lead to property damage or personal injury. In its current state, the model cannot realistically be deployed on the drone or on my current device.
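
  One possible direction for the delay problem, sketched below purely as an assumption (it was not implemented in this project): move landmark extraction and inference into a worker thread fed by a queue, so the capture loop never blocks. The names predict_fn, worker, and frame_queue are hypothetical, and predict_fn is a placeholder for the real landmark-extraction plus model.predict pipeline.

    import queue
    import threading
    import cv2
    import numpy as np

    def predict_fn(window):
        # Placeholder for detect_landmarks + the RNN prediction.
        return np.random.randint(0, 5)

    frame_queue = queue.Queue(maxsize=2)  # holds complete 30-frame windows

    def worker():
        while True:
            window = frame_queue.get()
            if window is None:  # sentinel value: shut down
                break
            print("Predicted class index:", predict_fn(window))

    threading.Thread(target=worker, daemon=True).start()

    cap = cv2.VideoCapture(0)
    frames = []
    while True:
        ret, frame = cap.read()
        if ret:
            frames.append(frame)
            if len(frames) == 30:
                # Drop the window instead of blocking if the worker is busy.
                try:
                    frame_queue.put_nowait(frames)
                except queue.Full:
                    pass
                frames = []
            cv2.imshow('Camera', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    frame_queue.put(None)
    cap.release()
    cv2.destroyAllWindows()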

    Possible Future Approach

    I will still use the video feed from the drone’s camera as input, with the help of the djitellopy library, but I will not use the inference results to control the drone, for safety and technical reasons. A sketch of how the drone feed could replace the webcam capture follows.
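
    A minimal sketch of grabbing frames from the Tello with djitellopy, assuming the computer is connected to the drone’s Wi-Fi; it only swaps the frame source, and the rest of the pipeline would stay the same:

    import cv2
    from djitellopy import Tello

    tello = Tello()
    tello.connect()  # requires being on the Tello's Wi-Fi network
    print("Battery:", tello.get_battery())

    tello.streamon()  # start the video stream
    frame_read = tello.get_frame_read()

    while True:
        frame = frame_read.frame  # numpy array; replaces cap.read() above
        if frame is not None:
            cv2.imshow('Tello', frame)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    tello.streamoff()
    tello.end()
    cv2.destroyAllWindows()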