ORCA: Bridging Eras with Pragmatic Learning-Based Congestion Control for Seamless Internet Traffic

Paper link: https://dl.acm.org/doi/10.1145/3387514.3405892

In this article, we're taking a close look at a paper titled Classic Meets Modern: Pragmatic Learning-Based Congestion Control for the Internet, breaking down what it does well and where it could be better. We'll highlight the strong points that make the paper stand out, but also point out areas that might need some improvement. By doing this, I hope to give you a clear picture of the paper's strengths and weaknesses, making it easier to understand its importance in the world of artificial intelligence research. Some valuable insights and comments follow later.

ORCA Summary:

The paper's core idea is to combine classic congestion control strategies with modern deep reinforcement learning (DRL) techniques. The proposed scheme, Orca, is a novel hybrid congestion control (CC) algorithm for the Internet: an adaptive TCP algorithm that achieves consistently high performance in different environments without being manually tuned for each one. It avoids the costly procedure of manually engineering and tuning TCP schemes for specific scenarios, and it uses learning to replace the hard-wired mapping of predefined signals/events to predefined control actions that classic TCP algorithms have relied on for the last three decades. The choice of DRL as the base learning technique is motivated by its broad applicability to real-world problems.

Some of the key contributions of the paper: it demonstrates the problems of current clean-slate learning-based CC designs and reveals the need for a more practical learning-based design; it presents a novel distributed framework for designing and training new learning-based CC schemes, which is available to the community; and the authors built, deployed, and successfully evaluated Orca over a global testbed on the Internet, with servers located on five different continents.

One notable aspect of the structural design is normalization: instead of feeding the exact values of the gathered statistics to the agent, normalized data is used, which helps the agent generalize from the network environments observed during training to unseen environments and yields a better model. Orca's agent is based on the Deep Deterministic Policy Gradient (DDPG) algorithm, a variant of the Deterministic Policy Gradient algorithm. During training, a comparison between the clean-slate version and Orca demonstrates the benefits of Orca's pragmatic two-level control architecture even at training time: fine-grained control by the underlying TCP speeds up the training sessions dramatically compared to the clean-slate version's direct calculation of a new cwnd, as showcased in Figures 2 and 3.

Orca's ultimate goal is to learn one policy that achieves high performance in diverse environments and network conditions with a single set of parameters. Practical challenges faced during training include catastrophic forgetting and the extended training time. The scaled-out Orca architecture consists of three main components: actors, replay memory, and a learner. The different network environments are characterized by three key parameters: bottleneck link bandwidth, minimum RTT (delay), and the bottleneck link's buffer size. Orca shows consistently high performance in different networks, deployment feasibility with low overhead, and TCP friendliness.
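To make the two-level control concrete, here is a minimal Python sketch of how the agent's coarse-grained decision could sit on top of the cwnd that the underlying TCP keeps computing at fine timescales. The multiplicative 2**alpha form, the function name, and the numbers are illustrative assumptions, not the paper's exact code:

```python
def agent_period_cwnd(cwnd_tcp: int, alpha: float) -> int:
    """Coarse-grained level: once per monitoring period, the DRL agent emits
    a continuous action alpha, and the effective cwnd becomes a multiplicative
    correction of the cwnd computed by the underlying TCP (e.g. Cubic).
    The 2**alpha action semantics here is an assumption for illustration."""
    return max(1, round(cwnd_tcp * (2.0 ** alpha)))

# Example: Cubic currently reports cwnd = 80 segments; the agent nudges it.
print(agent_period_cwnd(80, 0.5))    # -> 113
print(agent_period_cwnd(80, -1.0))   # -> 40
```

Between two agent decisions, the underlying TCP keeps reacting packet-by-packet, which is what makes the clean-slate comparison in Figures 2 and 3 favor this design.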

Strengths:

Orca can achieve up to 3× lower delay compared to BBRv2. In an intercontinental test scenario, Orca achieves 5× to 20× better throughput compared to Cubic. Even with large MTP (monitoring time period) values, Orca's performance is still better than pure TCP Cubic's; for smaller MTP values, Orca's performance increases at the cost of higher overhead.

Weaknesses:

Orca requires a cluster of customized servers (e.g., patched with new kernel code or with access to underlying kernel services for generating packets) to perform training effectively. Such requirements greatly complicate the training phase and can prevent people without access to large custom clusters from having a tangible impact in this domain. In addition, there is no guarantee that Orca will perform well in all network conditions.

A Few Comments:

1- Orca addresses the congestion control (CC) problem in a reinforcement learning setting, but it is not the final solution. It is interesting how Orca manages the huge action space and its tension with the real-time nature of the problem, as discussed in Section 3.3; a sketch of the continuous-action idea follows below.
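A minimal PyTorch sketch of why a DDPG-style actor sidesteps the action-space explosion: instead of enumerating discrete cwnd choices, the network regresses a single bounded continuous action. The layer sizes, the state dimension of 7, and the action bound are hypothetical values chosen for illustration, not the paper's exact architecture:

```python
import torch
import torch.nn as nn

class OrcaStyleActor(nn.Module):
    """Minimal DDPG-style actor (hypothetical sizes, not the paper's exact
    architecture). Deterministic and continuous: one forward pass yields
    one real-valued action, with no enumeration of discrete cwnd values."""
    def __init__(self, state_dim: int = 7, action_bound: float = 2.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1), nn.Tanh(),   # squashes the output into (-1, 1)
        )
        self.action_bound = action_bound   # rescales to, e.g., (-2, 2)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.action_bound * self.net(state)

actor = OrcaStyleActor()
alpha = actor(torch.zeros(1, 7))   # one continuous action per monitoring period
```

One cheap forward pass per monitoring period is what keeps a continuous-action agent compatible with the real-time constraint.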

2- It is quite interesting how the agent in Orca receives a reward at each step, which quantifies its performance and gives it the criterion for improving its sequence of actions. Orca uses the well-defined notion of Power to evaluate efficiency: Power = Throughput / Delay, so maximizing Power reflects the objective of maximizing throughput while minimizing delay in the network. This shapes the agent's interpretation of the environment; a minimal sketch of the ratio follows below.
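A minimal sketch of the headline ratio. The paper's actual reward is built around Power but also involves normalization and handling of loss, which this hypothetical helper omits:

```python
def power_reward(throughput: float, delay: float) -> float:
    """Core of the reward signal: Power = Throughput / Delay. Higher power
    means more bytes delivered per unit of delay. Normalization and loss
    handling from the paper's full reward are deliberately omitted here."""
    return throughput / max(delay, 1e-9)   # guard against a zero delay

# A flow pushing 100 Mbps at 20 ms beats one pushing 120 Mbps at 60 ms:
print(power_reward(100e6, 0.020))   # 5.0e9
print(power_reward(120e6, 0.060))   # 2.0e9
```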

3- It is quite intuitive how the agent is motivated to gain maximum reward, especially in dynamic network environments: the agent tolerates a small queuing delay to leave extra room for observing the maximum bandwidth. The properties the paper demonstrates, such as continuous probing, convergence, more predictability, and a better performance/overhead tradeoff curve, support more efficient and faster training.

4- Orca only requires modifications on the server side, and it works smoothly when any other TCP scheme is used on the client side. The paper states that the Linux stack and real-world client-server applications, rather than network simulators, are used to generate the test traffic.

5- More than 256 actors are used to interact with different environments during Orca's training. The paper also evaluates Clean-Slate, a DRL-based model trained with the same settings as Orca but without any underlying TCP providing fine-grained control; the final results show Orca scoring roughly twice as high. Orca's consistently high performance in different network environments, together with its deployment friendliness, low overhead, TCP friendliness, and fairness, makes it a clear winner. The paper also shows that Orca's high performance is insensitive to the different AQM (active queue management) designs that might be used in the network. A sketch of the scaled-out actor/learner loop follows below.
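To make the scaled-out architecture concrete, here is a minimal Python sketch of the actor / replay memory / learner loop. ReplayMemory is a generic implementation, and actor.step() and learner.update() are hypothetical placeholders, not the paper's actual APIs:

```python
import random
from collections import deque

class ReplayMemory:
    """Shared experience buffer: actors push transitions, the learner
    samples random batches from it."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size: int):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

def training_iteration(actors, memory, learner, batch_size: int = 64):
    """One round of the scaled-out loop: each actor, running in its own
    (emulated) network environment, contributes an experience tuple, then
    the single learner updates the shared policy from a random batch.
    actor.step() and learner.update() are hypothetical placeholders."""
    for actor in actors:
        memory.push(actor.step())    # (state, action, reward, next_state)
    learner.update(memory.sample(batch_size))
```

Decoupling experience collection (many parallel actors) from policy updates (one learner) is what allows training over hundreds of heterogeneous environments at once.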

6- Orca's fairness property rests on two pillars: the choice of maximizing Power, a well-studied objective known to be optimal for the network as a whole, as the basis for the DRL agent's reward, and the fair AIMD nature of the underlying TCP scheme (Cubic); a textbook sketch of AIMD follows below. More research is needed on how Orca behaves on top of TCP schemes other than Cubic. Also, a big drawback of Orca is that it requires a cluster of customized servers, patched with new kernel code or with access to underlying kernel services for packet generation, to perform training effectively, which is not always feasible.
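For readers less familiar with why AIMD underpins fairness, here is the textbook core of the rule. Cubic's actual growth function is more elaborate; the constants below are the classic ones, chosen for illustration:

```python
def aimd_update(cwnd: float, loss_event: bool,
                a: float = 1.0, b: float = 0.5) -> float:
    """Classic AIMD: grow additively by a per RTT, cut multiplicatively by
    factor b on loss. Competing AIMD flows converge toward an equal share
    of the bottleneck, which is the fairness Orca inherits from its
    underlying TCP."""
    return cwnd * b if loss_event else cwnd + a
```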