WCNC UK PPT
First we cover the system overview and motivation; then we walk through the DRL approach and the results.
Good day everyone. My name is Sravani Kurma, and I'm here today to present a Deep Reinforcement Learning (DRL) approach to the spectral-energy efficiency tradeoff in RIS-aided FD MU-MIMO systems.
Introduction:
To begin with, reconfigurable intelligent surfaces (RIS) are planar arrays of low-cost, passive elements that can reflect and manipulate electromagnetic waves, allowing them to shape the propagation environment and enhance signal quality. This means that RIS can be used to create customized wireless channels that are optimized for specific user requirements, resulting in improved signal quality and higher data rates.
One of the main advantages of using RIS is their ability to significantly enhance the spatial multiplexing capabilities of wireless systems, especially when used in combination with full-duplex multi-user MIMO. With RIS, multiple users can transmit and receive data simultaneously without interfering with each other, which is a game-changer for wireless communication systems.
In addition, RIS can help to improve the energy efficiency of wireless systems by reducing the need for active radio components and minimizing the power consumption of wireless devices. They are also relatively low-cost and easy to deploy, making them a cost-effective solution for improving the performance of wireless networks.
Lastly, RIS can enhance the security of wireless systems by creating customized wireless channels that are resistant to eavesdropping and other types of attacks.
Channel model:
To accommodate $K$ multiple-antenna HD users on the same spectral resource with the best possible service, we assume all users (i.e., UL and DL users) are equipped with $N_k$ antennas, while the BS has $M_r$ receive and $M_t$ transmit antennas, where $M = M_t + M_r$ and $K = K_u + K_d$.
$K_d$ DL users receive messages from the BS with the aid of $R_d$ whereas $K_u$ UL users send messages to the BS with the aid of $R_u$ simultaneously.
All of the RIS elements can be programmed using a control unit in order to adjust the reflection corresponding to the incident signal.
It is presumed that an obstruction is blocking direct user-to-BS transmissions.
Furthermore, owing to the reasonably high path loss, we neglect signals reflected more than once, whose strength is minimal, and account only for first-order reflections.
Further, to control the overhead signaling requirement and demand for energy supply, we consider partial channel state information (CSI), which includes the availability of statistical CSI for all the links, i.e., RIS-to-user links and RIS-to-BS links.
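The cascaded channel just described (user-to-RIS, programmable reflection, RIS-to-BS, with the direct path blocked) can be sketched numerically. This is a minimal illustration with assumed dimensions and random Rayleigh-faded channels, not the paper's exact model: the effective UL channel is $H_{eff} = G \Theta h$ with $\Theta = \mathrm{diag}(e^{j\theta_n})$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed dimensions, for illustration only
M_r = 4   # BS receive antennas
N_R = 16  # RIS reflecting elements
N_k = 2   # antennas per UL user

# Random Rayleigh-faded component channels (statistical CSI would replace these)
G = (rng.standard_normal((M_r, N_R)) + 1j * rng.standard_normal((M_r, N_R))) / np.sqrt(2)  # RIS -> BS
h = (rng.standard_normal((N_R, N_k)) + 1j * rng.standard_normal((N_R, N_k))) / np.sqrt(2)  # user -> RIS

# Programmable phase shifts: unit-modulus diagonal reflection matrix
theta = rng.uniform(0, 2 * np.pi, N_R)
Theta = np.diag(np.exp(1j * theta))

# Effective user-to-BS channel through the RIS (direct link blocked by an obstruction)
H_eff = G @ Theta @ h
print(H_eff.shape)  # (4, 2)
```

Note that every diagonal entry of `Theta` has unit modulus: the RIS only rotates the phase of the incident signal, it does not amplify it.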
Here are some example use cases of RIS:
One example of RIS use case is in indoor wireless systems, where RIS can be used to create customized wireless channels that are optimized for specific user requirements, resulting in improved signal quality and higher data rates. This can be particularly useful in environments such as shopping malls, airports, or stadiums where there are many users and a high demand for data.
Another use case of RIS is in smart homes and buildings. Here, RIS can be used to improve the coverage and capacity of wireless networks, resulting in improved signal quality and higher data rates for smart home devices such as security cameras, smart thermostats, and lighting systems.
RIS can also be used in autonomous vehicles and drones, where they can help to improve the reliability and efficiency of wireless communication systems, making them more robust and better equipped to handle the challenges of operating in dynamic and unpredictable environments.
Finally, RIS can also be used in Internet of Things (IoT) devices, where they can help to improve the energy efficiency and security of wireless systems, making them more sustainable and less vulnerable to attacks.
Now, let's move on to the second part of this presentation, which is about why deep reinforcement learning (DRL) is an effective approach for optimizing the phase shifts of RIS.
DRL is a powerful machine learning technique that enables the RIS to learn the optimal phase shifts based on feedback from the wireless environment. This means that the RIS can adapt to changing wireless environments and optimize the phase shifts in real-time, resulting in better performance and higher data rates.
Deep Reinforcement Learning Approach:
Deep Reinforcement Learning (DRL) is a promising approach to overcome this tradeoff. It is a subfield of Machine Learning (ML) that uses an agent to learn optimal decisions by interacting with the environment. In the context of wireless communication systems, the agent learns how to allocate resources to maximize both spectral and energy efficiency.
DRL enables the agent to adapt to different wireless network conditions and achieve optimal decisions. This is done by the agent receiving feedback from the environment, allowing it to learn and improve its decision-making process.
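The agent-environment feedback loop described above can be sketched abstractly. Everything here is a toy stand-in (a one-dimensional environment whose reward is highest when the action matches the state, and a trivial policy tuned by search), not the paper's system model or its DRL agent:

```python
import numpy as np

rng = np.random.default_rng(1)

def step(state, action):
    """Toy environment: reward is highest when the action matches the state."""
    reward = -abs(state - action)
    next_state = rng.uniform(0.0, 1.0)
    return next_state, reward

# A trivially simple policy (state + offset), improved by search over the offset,
# stands in for a DRL agent learning from environment feedback.
best_offset, best_return = 0.0, -np.inf
for candidate in np.linspace(-0.5, 0.5, 21):
    state, total = 0.5, 0.0
    for _ in range(50):
        action = state + candidate           # act based on the observed state
        state, reward = step(state, action)  # environment returns feedback
        total += reward                      # accumulate reward over the episode
    if total > best_return:
        best_offset, best_return = candidate, total

print(best_offset)  # the best policy applies a (near-)zero offset
```

The point is only the loop structure: observe a state, act, receive a reward, and use the accumulated feedback to improve the policy.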
Benefits of DRL Approach:
There are several benefits of using DRL in wireless communication systems:
Adaptive: The agent can adapt to the changing network conditions, which makes it suitable for dynamic wireless environments.
Optimized Resource Allocation: The agent can allocate resources effectively to maximize spectral and energy efficiency.
Reduced Energy Consumption: The agent can make decisions that reduce energy consumption, which is important for sustainability.
Improved Performance: The agent can learn and improve its decision-making process, which results in improved system performance.
Q&A:
Qns: Why is the rate expression for half duplex multiplied by 0.5?
Ans: In telecommunications, the data transfer rate of a communication channel is expressed in bits per second (bps). In a half-duplex communication system, data can only be transmitted in one direction at a time, either from the sender to the receiver or from the receiver to the sender. This is in contrast to a full-duplex system, where data can be transmitted in both directions simultaneously.
In a half-duplex system, the channel capacity is shared between the two directions of transmission, which means that the maximum data transfer rate in one direction is only half of the channel capacity. This is why the data transfer rate expression for a half-duplex system includes a multiplication factor of 0.5, which represents the sharing of the channel capacity.
For example, if the channel capacity is 100 Mbps (million bits per second), the maximum data transfer rate in one direction in a half-duplex system would be 50 Mbps (0.5 x 100 Mbps).
In other words, the channel capacity is meant to handle data transfer in both directions; since HD transmits in only one direction at a time, it uses only half of the channel capacity.
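The 0.5 pre-log factor from the answer above can be checked with a short calculation. The bandwidth and SNR values are arbitrary assumptions for illustration:

```python
import math

bandwidth_hz = 20e6   # assumed 20 MHz channel
snr_linear = 100      # assumed SNR of 20 dB

# Shannon capacity of the link
capacity_bps = bandwidth_hz * math.log2(1 + snr_linear)

# Full duplex uses the whole capacity in each direction simultaneously;
# half duplex time-shares the channel, hence the 0.5 factor per direction.
rate_fd = capacity_bps
rate_hd = 0.5 * capacity_bps

print(rate_hd / rate_fd)  # 0.5
```

Whatever the capacity, the per-direction HD rate is exactly half of the FD rate, matching the 100 Mbps / 50 Mbps example above.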
DDPG algorithm:
DDPG, or Deep Deterministic Policy Gradient, is an algorithm used for reinforcement learning tasks with continuous action spaces. It is an extension of the Deep Q-Network (DQN) algorithm that is used for discrete action spaces.
DDPG consists of an actor-critic architecture, where the actor learns to select actions based on the current state, and the critic evaluates the quality of the action taken by the actor. The actor-critic architecture is implemented using deep neural networks, which allows DDPG to handle high-dimensional input spaces.
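The actor-critic split can be sketched with two tiny networks. The layer sizes, random (untrained) weights, and tanh squashing are illustrative assumptions; in the paper's setting the action would encode quantities such as the RIS phase shifts:

```python
import numpy as np

rng = np.random.default_rng(2)
state_dim, action_dim, hidden = 8, 4, 16  # assumed toy dimensions

# Randomly initialized weights stand in for trained parameters
W1a = rng.standard_normal((hidden, state_dim)) * 0.1
W2a = rng.standard_normal((action_dim, hidden)) * 0.1
W1c = rng.standard_normal((hidden, state_dim + action_dim)) * 0.1
W2c = rng.standard_normal((1, hidden)) * 0.1

def actor(s):
    """Deterministic policy: state -> bounded continuous action."""
    return np.tanh(W2a @ np.tanh(W1a @ s))

def critic(s, a):
    """Q-function: (state, action) -> scalar estimate of expected return."""
    return float(W2c @ np.tanh(W1c @ np.concatenate([s, a])))

s = rng.standard_normal(state_dim)
a = actor(s)          # actor selects a continuous action from the state
q = critic(s, a)      # critic scores that state-action pair
print(a.shape, np.all(np.abs(a) <= 1.0))  # (4,) True
```

The tanh output layer keeps each action component in [-1, 1], which is why DDPG suits continuous action spaces such as phase-shift selection, where DQN's discrete action enumeration would not apply.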
DRL algorithm:
In order to determine the optimal policy, $\pi^*$, we first formulate the resource efficiency (RE) maximization problem.
To use DDPG to solve this task, we first initialize two deep neural networks, the actor network and the critic network. The actor network takes the current state as input and outputs the corresponding action. The critic network takes the current state and action as input and outputs the corresponding value, which is an estimate of the expected cumulative reward.
During training, we start with a random policy and generate trajectories by selecting actions according to the current policy. We then calculate the corresponding value using the critic network and use it to update both networks: the critic is updated by minimizing the mean-squared error between its estimated value and the target value, while the actor is updated using the gradient of the value function with respect to the actor parameters.
To speed up convergence and stabilize training, the DDPG method employs an experience replay buffer and target networks. A finite-size memory $B$ stores each executed transition $<s^t, a^t, r^t, s^{t+1}>$; since $B$ has a fixed size, new samples overwrite the oldest ones. Once enough samples are available, we draw a random mini-batch $D$ of transitions from $B$ to train the neural networks; sampling past experiences at random breaks temporal correlations and improves sample efficiency.
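The replay-buffer and target-network tricks can be sketched as follows. The capacity, batch size, and soft-update rate tau are illustrative choices, and the transitions pushed here are dummy placeholders:

```python
import random
from collections import deque

random.seed(0)

class ReplayBuffer:
    """Fixed-size memory B of transitions <s, a, r, s'>; oldest samples are evicted."""
    def __init__(self, capacity):
        self.memory = deque(maxlen=capacity)

    def push(self, s, a, r, s_next):
        self.memory.append((s, a, r, s_next))

    def sample(self, batch_size):
        # A uniform random mini-batch D breaks the temporal correlation of samples
        return random.sample(self.memory, batch_size)

    def __len__(self):
        return len(self.memory)

def soft_update(target_weights, source_weights, tau=0.005):
    """Target network slowly tracks the learned weights: w' <- tau*w + (1-tau)*w'."""
    return [tau * w + (1 - tau) * w_t
            for w, w_t in zip(source_weights, target_weights)]

buffer = ReplayBuffer(capacity=100)
for t in range(250):                       # more pushes than capacity -> oldest dropped
    buffer.push(t, t % 4, float(-t), t + 1)

batch = buffer.sample(32)
print(len(buffer), len(batch))  # 100 32
```

The slow-moving target network keeps the bootstrapped training targets from chasing the rapidly changing online network, which is what stabilizes DDPG in practice.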
After training, we can use the actor network to select actions based on the current state and thus solve the optimization task.
RESULTS:
The convergence of the reward function, i.e., the RE of both FD and HD systems using the DRL algorithm, is shown in Fig. 2. The results in the figure illustrate that the agent converges effectively after adequate training episodes. It is observed that the proposed DDPG algorithm provides a 1.16-times improvement in the RE of FD mode compared to that of HD mode. Moreover, it is also noticed that FD achieves comparatively faster convergence than HD mode.
REASON:
Full Duplex mode allows for simultaneous transmission and reception of data, which can result in a higher data transfer rate and more efficient use of the communication channel. This can lead to faster convergence since the agent can receive more information from the environment in a shorter period of time.
In a Half Duplex mode, the communication channel is shared between the sender and the receiver, which can lead to collisions or delays if both try to transmit at the same time. This can result in a less stable learning process and slower convergence.
The reward function in the Full Duplex mode may have provided clearer and more consistent feedback to the agent compared to the Half Duplex mode. This can allow the agent to learn more efficiently and converge faster.
Fig. 3 illustrates the effect of CCI on the RE of the FD system, i.e., the impact of isolation among the DL-UEs and UL-UEs on FD performance. The RE of the HD system is used as a benchmark. For varying CCI, there is no variation in the HD-RE, since neither CCI nor RSI affects the HD system, whereas there is a noticeable reduction in the RE of FD for varying RSI. The FD-RE outperforms the HD-RE when RSI is well suppressed (say, −100 dBm). As can be seen, the FD system performs better than the HD system as the level of CCI cancellation rises (0 being no cancellation and 1 representing 100% cancellation). It can be noticed that at CCI = 0, the FD system has 66.7% and 30% gains in RE over the HD system with RSI of −100 dBm and −60 dBm, respectively. Moreover, at CCI = 0.27, the RE gains of the FD system over the HD system are 0% and 22% with RSI of −100 dBm and −60 dBm, respectively. However, as RSI suppression decreases (say, RSI rises to −60 dBm), for some values of CCI (say 0.27), HD starts outperforming the FD system.
Therefore, a suitable smart channel assignment at an early stage is crucial before the precoder/decoder design to ensure that UL and DL users may successfully coexist.
Qns : Why HD (Half-Duplex) starts outperforming FD (Full-Duplex) system after some value of CCI?
The reason HD (Half-Duplex) starts outperforming the FD (Full-Duplex) system beyond some value of CCI is the residual self-interference left by imperfect cancellation of the transmitted signal in the FD system.
As the level of CCI attenuation increases, the interference among co-scheduled UL and DL users is suppressed and the FD system's RE improves. However, some residual self-interference always remains, and it becomes more significant as the level of CCI attenuation approaches its limit.
At some point, the residual self-interference in the FD system becomes stronger than the interference caused by the CCI, and this degrades the performance of the FD system below that of HD. In other words, the FD system can no longer effectively cancel the residual self-interference, and it becomes the limiting factor in the system's performance.
Therefore, the FD system performs best when there is a balance between the level of CCI attenuation and the residual self-interference caused by imperfect cancellation. Beyond a certain point, the residual self-interference becomes dominant, and the HD system becomes the better option.
Fig. 4 shows the impact of RSI on the system's RE performance. As can be observed, RE decreases as RSI increases for the FD system, whereas it remains constant for the HD system. This is due to the detrimental effect of increasing RSI power on FD-UL performance. Nevertheless, the FD system retains a 15% to 30% RE improvement over the HD system even at a reasonable RSI of −70 dBm, depending on the maximum power considered. For instance, for Pmax = 20 dBm and Pmax = 30 dBm, there is approximately a 23.5% and 15% RE improvement of the FD system over the HD system, respectively. Hence, we observe that RSI dominates FD-UL performance as Pmax increases. Additionally, the HD system begins to outperform the FD system at high RSI levels (i.e., −50 dBm and −40 dBm for Pmax = 20 dBm and Pmax = 30 dBm, respectively) because the distortion increases with the number of antennas.
Fig. 5 compares the RE of the FD-DL and FD-UL of the proposed system for varying values of the base station transmit power (PBS). We observe that the RE of the FD-DL outperforms that of the FD-UL at high PBS because of the improved DL SINR. In contrast, the FD-UL RE decreases as PBS increases because of a significant increase in residual self-interference (RSI) power. We also evaluated the effect of the number of RIS elements (NR) on the SE-EE tradeoff and found that deploying a RIS with a large NR improves the RE of the proposed system. This is because increasing NR introduces a stronger reflected signal that enhances energy reception at the Kd DL users and ensures that information is received at the BS. To evaluate the effectiveness of the proposed system, we compare the results against the system without RIS. These outcomes show that the proposed model outperforms the system without RIS because RIS-assisted systems can allocate resources more flexibly, leading to larger RE gains as NR increases.
Qns: Why does the FD-UL with RIS perform worse than the FD-UL without RIS at lower DL power?
As the downlink power decreases, the RIS elements reflect less of the signal and hence the signal strength at the receiver end reduces. In addition to this, the RIS introduces an additional fading effect due to the random phase shifts of the reflected signal. This results in a double fading effect that reduces the overall signal-to-interference-plus-noise ratio (SINR) at the receiver end.
As a result, the resource efficiency of the full-duplex downlink RIS-aided multi-user MIMO system is worse than that of the case without RIS for lower values of downlink power. However, for higher values of downlink power, the RIS helps enhance the signal strength and reduce interference, thereby improving the resource efficiency of the system.