TY - JOUR
T1 - Optimal scheduling over time-varying channels with traffic admission control
T2 - Structural results and online learning algorithms
AU - Phan, Khoa T.
AU - Le-Ngoc, Tho
AU - Van Der Schaar, Mihaela
AU - Fu, Fangwen
PY - 2013
Y1 - 2013
N2 - This work studies the joint scheduling-admission control (SAC) problem for a single user over a fading channel. Specifically, the SAC problem is formulated as a constrained Markov decision process (MDP) to maximize a utility defined as a function of the throughput and queue size. The optimal throughput-queue size trade-off is investigated. Optimal policies and their structural properties (i.e., monotonicity and convexity) are derived for two models: simultaneous and sequential scheduling and admission control actions. Furthermore, we propose online learning algorithms for the optimal policies for the two models when the statistics of the time-varying traffic arrival and channel processes are unknown. The analysis and algorithm development rely on reformulating Bellman's optimality equations using suitably defined state-value functions, which can be learned online, at transmission time, using time-averaging. The learning algorithms have lower complexity and converge faster than conventional Q-learning algorithms. This work also builds a connection between the MDP-based formulation and the Lyapunov-optimization-based formulation for the SAC problem. Illustrative results demonstrate the performance of the proposed algorithms in various settings.
AB - This work studies the joint scheduling-admission control (SAC) problem for a single user over a fading channel. Specifically, the SAC problem is formulated as a constrained Markov decision process (MDP) to maximize a utility defined as a function of the throughput and queue size. The optimal throughput-queue size trade-off is investigated. Optimal policies and their structural properties (i.e., monotonicity and convexity) are derived for two models: simultaneous and sequential scheduling and admission control actions. Furthermore, we propose online learning algorithms for the optimal policies for the two models when the statistics of the time-varying traffic arrival and channel processes are unknown. The analysis and algorithm development rely on reformulating Bellman's optimality equations using suitably defined state-value functions, which can be learned online, at transmission time, using time-averaging. The learning algorithms have lower complexity and converge faster than conventional Q-learning algorithms. This work also builds a connection between the MDP-based formulation and the Lyapunov-optimization-based formulation for the SAC problem. Illustrative results demonstrate the performance of the proposed algorithms in various settings.
KW - learning
KW - Markov decision process (MDP)
KW - scheduling
KW - structural results
KW - traffic admission control
UR - http://www.scopus.com/inward/record.url?scp=84884939693&partnerID=8YFLogxK
U2 - 10.1109/TW.2013.081913.121525
DO - 10.1109/TW.2013.081913.121525
M3 - Article
AN - SCOPUS:84884939693
SN - 1536-1276
VL - 12
SP - 4434
EP - 4444
JO - IEEE Transactions on Wireless Communications
JF - IEEE Transactions on Wireless Communications
IS - 9
M1 - 6585729
ER -