Edge Delayed Deep Deterministic Policy Gradient: efficient continuous control for edge scenarios
Sinigaglia, Turcato, Carli et al.
Deep Reinforcement Learning is gaining increasing attention thanks to its capability to learn complex policies in high-dimensional settings. Recent advances use a dual-network architecture to learn optimal policies through the Q-learning algorithm. However, this approach has notable drawbacks, such as an overestimation bias that can disrupt the learning process and degrade the performance of the resulting policy. To address this, novel algorithms have been developed that mitigate overestimation bias by employing multiple Q-functions. Edge scenarios, which prioritize privacy, have recently gained prominence. In these settings, limited computational resources pose a significant challenge for complex Machine Learning approaches, making algorithmic efficiency crucial for performance. In this work, we introduce a novel Reinforcement Learning algorithm tailored for edge scenarios, called Edge Delayed Deep Deterministic Policy Gradient (EdgeD3). EdgeD3 enhances the Deep Deterministic Policy Gradient (DDPG) algorithm, achieving significantly improved performance with $25\%$ less Graphics Processing Unit (GPU) time while maintaining the same memory usage. Additionally, EdgeD3 consistently matches or surpasses the performance of state-of-the-art methods across various benchmarks, while using $30\%$ fewer computational resources and requiring $30\%$ less memory.
academic
Edge Delayed Deep Deterministic Policy Gradient: efficient continuous control for edge scenarios
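Deep Reinforcement Learning (DRL) is attracting attention for its ability to learn complex policies in high-dimensional input spaces. Modern DRL algorithms typically rely on a dual-network Q-learning architecture, which suffers from overestimation bias; more recent methods mitigate this bias with multiple Q-functions. At the same time, the rise of edge computing scenarios, with their privacy concerns and strict hardware constraints, calls for efficient algorithms. This paper proposes Edge Delayed Deep Deterministic Policy Gradient (EdgeD3), a new Reinforcement Learning algorithm designed specifically for edge computing environments. Compared with DDPG, EdgeD3 achieves better performance with 25% less GPU time at the same memory usage. Compared with state-of-the-art methods, it consistently matches or exceeds their performance across various benchmarks while using 30% fewer computational resources and 30% less memory.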
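The overestimation bias mentioned above can be illustrated with a small simulation. The sketch below is not EdgeD3 and is not taken from the paper; it is a toy setup, assumed here for illustration only, in which every action has a true value of 0 and each critic observes that value plus independent Gaussian noise. The function name `simulate_target_bias` and all of its parameters are hypothetical. It compares a target built from a single noisy critic with a target that takes the minimum of two critics, which is one common way of using multiple Q-functions.

```python
import random


def simulate_target_bias(num_actions=10, noise_std=1.0, trials=5000, seed=0):
    """Estimate the bias of two bootstrap targets in a toy problem.

    Assumption (illustrative only): every action has true value 0, so the
    average of a target over many trials is exactly its bias.
    Returns (single_bias, twin_bias).
    """
    rng = random.Random(seed)
    single_total = 0.0
    twin_total = 0.0
    for _ in range(trials):
        # Two independent noisy estimates of the same (all-zero) Q-values.
        q1 = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        q2 = [rng.gauss(0.0, noise_std) for _ in range(num_actions)]
        # Greedy action according to the first critic.
        best = max(range(num_actions), key=lambda a: q1[a])
        # Single-critic target: the maximum of a noisy estimate.
        single_total += q1[best]
        # Twin-critic target: minimum of both critics at the chosen action.
        twin_total += min(q1[best], q2[best])
    return single_total / trials, twin_total / trials


if __name__ == "__main__":
    single_bias, twin_bias = simulate_target_bias()
    print(f"single-critic target bias: {single_bias:+.3f}")
    print(f"twin-critic (min) target bias: {twin_bias:+.3f}")
```

Because the true value is 0 everywhere, any nonzero average is pure bias. The single-critic target comes out clearly positive, since taking a maximum over noisy estimates systematically selects upward errors. The minimum over two critics removes most of that optimism and tends to err slightly on the low side instead. The price is a second critic to store and evaluate, which is the kind of overhead that matters under the edge constraints the abstract describes.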