Deploy the model to a local device and implement real-time inference


  • Save the trained model and download it from Kaggle.

  • Prepare local environments

  • Object Oriented Programming

  • Inference


    import os
    import cv2
    import numpy as np
    import pandas as pd
    import mediapipe as mp
    import tensorflow as tf
    from tensorflow.keras import layers
    from tensorflow.keras import optimizers
    from tensorflow.keras import Sequential
    output_directory ="temp_storage"
    class model:
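      def __init__(self,CLASS_NAMES,model_pass,model2,output_directory):
          # Assumption: model_pass and model2 are loaded Keras models, or paths to saved ones.
          self.CLASS_NAMES = CLASS_NAMES
          self.model = tf.keras.models.load_model(model_pass) if isinstance(model_pass, str) else model_pass
          self.model2 = tf.keras.models.load_model(model2) if isinstance(model2, str) else model2
          self.output_directory = output_directory
          os.makedirs(self.output_directory, exist_ok=True)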
      def detect_landmarks(self,image_folder):
          # Assumption: MediaPipe Holistic with default settings
          mp_holistic = mp.solutions.holistic.Holistic()
          image_files = sorted(os.listdir(image_folder))
          num_frames = len(image_files)
          num_landmarks = 33  # Fixed number of landmarks for pose estimation
          landmarks = np.zeros((num_frames,num_landmarks, 2))
          for i, file in enumerate(image_files):
              image_path = os.path.join(image_folder, file)
              frame = cv2.imread(image_path)
              # Convert the frame to RGB
              frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
              # Detect landmarks
              results = mp_holistic.process(frame_rgb)
              if results.pose_landmarks:
                  for j, landmark in enumerate(results.pose_landmarks.landmark):
                      landmarks[i, j,] = [landmark.x, landmark.y]
          return landmarks
      def inference(self,X):
          y_pred=np.argmax(self.model.predict(X), axis=1)
          return y_pred
      def inference2(self,X):
          y_pred=np.argmax(self.model2.predict(X), axis=1)
          return y_pred
      def clear_files(self,folder_path):
          # get all file in the folder
          files = os.listdir(folder_path)
          # delete them all
          for file in files:
              file_path = os.path.join(folder_path, file)
              if os.path.isfile(file_path):
                  os.remove(file_path)
          print("ALL Frames Cleared")
      def run(self):
          frames = []
          frame_count = 0
          cap = cv2.VideoCapture(0)
          while True:
              ret, frame = cap.read()
              if ret:
                  # Zero-pad the index so sorted(os.listdir(...)) returns frames in capture order
                  frame_name = os.path.join(self.output_directory, f"frame_{frame_count:02d}.png")
                  cv2.imwrite(frame_name, frame)
                  frame_count += 1
                  if frame_count == 30:
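                      # 30 frames x 33 landmarks x (x, y) -> one sample of shape (1, 30, 66)
                      X_data = self.detect_landmarks(self.output_directory).reshape((1,30,66))
                      y_pred = self.inference(X_data)
                      # Remove the processed frames before the next window is captured
                      self.clear_files(self.output_directory)
                      frame_count = 0
                      #if y_check==1:
                      print(f"Movement Type:{self.CLASS_NAMES[y_pred[0]]}, index:{y_pred[0]}")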
                  cv2.imshow('Camera', frame)
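              # Press 'q' to stop the loop
              if cv2.waitKey(1) & 0xFF == ord('q'):
                  break
          # Release the camera and close the preview window
          cap.release()
          cv2.destroyAllWindows()
    def main():
        # Body omitted: build the class with the label list and the two trained
        # models, then start the loop, e.g.
        #   model(CLASS_NAMES, model_pass, model2, output_directory).run()
        ...
    if __name__ == '__main__':
        main()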
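The commented-out `#if y_check==1:` line hints at a two-stage check: ask the movement detector first, and only classify the movement if the person is actually moving. Below is a minimal sketch of that idea, not the code used in this project. It assumes that class 1 of the detector means "moving"; the helper names `to_model_input` and `gated_prediction` and the `"no movement"` label are made up for illustration. The two models are passed in as plain callables, so with Keras you would pass `self.model2.predict` and `self.model.predict`.

```python
import numpy as np

NUM_FRAMES = 30      # frames per window, as in run()
NUM_LANDMARKS = 33   # pose landmarks per frame, as in detect_landmarks()

def to_model_input(landmarks):
    """Flatten a (30, 33, 2) landmark array into one sample of shape (1, 30, 66)."""
    landmarks = np.asarray(landmarks, dtype=np.float32)
    if landmarks.shape != (NUM_FRAMES, NUM_LANDMARKS, 2):
        raise ValueError(f"unexpected landmark shape: {landmarks.shape}")
    return landmarks.reshape(1, NUM_FRAMES, NUM_LANDMARKS * 2)

def gated_prediction(X, movement_detector, classifier, class_names,
                     idle_label="no movement"):
    """Classify the movement only if the detector reports movement.

    movement_detector and classifier are callables that map X to class
    probabilities of shape (1, n_classes).
    Assumption: detector class 1 means "moving".
    """
    y_check = int(np.argmax(movement_detector(X), axis=1)[0])
    if y_check != 1:
        return idle_label
    y_pred = int(np.argmax(classifier(X), axis=1)[0])
    return class_names[y_pred]
```

With this gate in place, the loop would print the idle label instead of a random class whenever the detector sees no movement.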


  • The trained model cannot tell whether a person is moving at all, so it returns a nonsense result when the person in front of the camera is performing none of the movements. This can be addressed with a second model that detects whether someone is moving. I did train one such model, and the result was acceptable (shown in the confusion matrix cm1 below).
  • Lack of computing power: although inference works, it is slow. Data collection, processing, and inference all run inside one while-loop, so each step blocks the next and the accumulated latency is unacceptable. The camera is supposed to collect a new clip from the video feed every second, but the processing and inference steps take too long, which makes real-time inference impossible. Note that this issue cannot be solved within the time available.
  • A GPU is normally used for deep learning because it runs thousands of threads in parallel, which suits the large matrix operations in a neural network, whereas a CPU has far fewer cores. I trained the model on Kaggle because it offers free GPU runtime. Inference, however, has to run locally to access the local camera, so that GPU is not available for it.
  • This means I will NOT use the model to control the drone: the delay is a hazard and could cause property damage or injury. The model cannot realistically be deployed on the drone or on my current device.
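One common way to keep a capture loop responsive is to hand each 30-frame window to a background thread through a queue, so the camera never waits for inference. The sketch below uses only the standard library and is not something I tested on my device. The function names are hypothetical, `frames` stands in for the camera and `predict` for landmark detection plus the model. It does not make inference faster: predictions still arrive late, and windows are dropped when the worker falls behind.

```python
import queue
import threading

WINDOW = 30  # frames per inference window

def inference_worker(windows, results, predict, stop):
    """Take windows from the queue and run the (slow) predict function on them."""
    while not stop.is_set():
        try:
            window = windows.get(timeout=0.1)
        except queue.Empty:
            continue
        try:
            results.put(predict(window))
        finally:
            windows.task_done()

def capture_loop(frames, windows, window_size=WINDOW):
    """Group frames into windows and queue them without ever blocking."""
    buffer, dropped = [], 0
    for frame in frames:
        buffer.append(frame)
        if len(buffer) == window_size:
            try:
                windows.put_nowait(list(buffer))
            except queue.Full:
                dropped += 1  # worker still busy: skip this window
            buffer.clear()
    return dropped

def run_pipeline(frames, predict, max_pending=1):
    """Run capture and inference concurrently; return (predictions, dropped)."""
    windows = queue.Queue(maxsize=max_pending)
    results = queue.Queue()
    stop = threading.Event()
    worker = threading.Thread(
        target=inference_worker,
        args=(windows, results, predict, stop),
        daemon=True,
    )
    worker.start()
    dropped = capture_loop(frames, windows)
    windows.join()   # wait until every queued window has been processed
    stop.set()
    worker.join()
    predictions = [results.get() for _ in range(results.qsize())]
    return predictions, dropped
```

In the real loop, `capture_loop` would read from `cap.read()` and poll `results` with `get_nowait()` to print the latest prediction. Whether this is enough on a CPU-only machine depends on how long a single window takes to process.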

    Possible Future Approach

    I will still use the video feed from the drone’s camera as input with the help of the djitellopy library, but I will not use the inference result to control the drone, for safety and technical reasons.