Explore optimal probing and transmission strategies for multichannel systems, balancing energy consumption and information acquisition. Learn the system's adaptive policies and gain-maximizing techniques. Discover efficient solutions for different channel states.
Jointly Optimal Transmission and Probing Strategies for Multichannel Systems • Saswati Sarkar, University of Pennsylvania • Joint work with Sudipto Guha (UPenn) and Kamesh Munagala (Duke)
Multichannel Systems • Current wireless networks have access to multiple channels • Frequencies in FDMA, codes in CDMA, antennas in MIMO, polarization states of antennas, access points in systems with multiple access points • IEEE 802.11a, IEEE 802.11b, IEEE 802.11h • Future wireless networks are likely to have access to a larger number of channels • Potential deregulation of spectrum • Advances in device technology
Multichannel systems • Transmission quality in different channels varies stochastically with time. • A user is likely to find a channel with acceptable transmission quality when the number of channels is large, • provided it samples all the channels • Sampling consumes energy and generates interference for other users • The amount of information a user acquires about its channels therefore becomes an important decision variable.
Policy Decisions • Probing policy • How many channels to probe? • Which channels to probe? • In what sequence to probe them? • Selection policy • Which channel to transmit in (a probed channel or an unprobed channel)?
System Goal • Gain is the difference between the expected rate of successful transmission and the cost of probing • Maximize the system gain • Need to jointly optimize the probing and selection policies • Policies will be adaptive • Policies may depend on higher order statistics and cross statistics of channels.
Sense of déjà vu? • Stopping Time problem (Bertsekas, Dynamic Programming, Vol. 1) • Maximize reward by optimally selecting between two control actions at any given time: (a) continue or (b) stop • IID channels (Kanodia et al, 2004, Ji et al, 2004) • Generalized stopping time problems • Select between multiple control actions • Only limited results are known
Sense of déjà vu? • Stochastic multi-armed bandit problem (Bertsekas, Dynamic Programming, Vol. 2) • Observe the state of only one arm at each slot • Select the arm to observe and get reward from it • State of an arm changes only after you observe it • State changes are stochastic • Gittins index based policies maximize the reward (Gittins-Jones, 1974)
Sense of déjà vu? • Adversarial multi-armed bandit problems (Auer et al, 2002) • State of an arm may change even when you don’t observe it • State changes are adversarial • Some versions allow you to get reward from only one selected arm and subsequently observe the states of all arms in the slot • Given a class of N policies and a time horizon of T slots, the performance loss relative to the best of the policies is bounded by O(sqrt(T * log N)).
Our Contribution • Polynomial complexity optimal solution when channels are mutually and temporally independent and each channel has two states. • Sigmetrics, 2006, poster paper • Polynomial complexity approximate solution when channels have multiple states • Approximation ratio ½ for an arbitrary number of states • Approximation ratio 2/3 for 3 states
System Model • Single user with access to n channels • Each channel can be in one of K states, 0, …, K-1. • Channel i is in state j with probability p_ji • Probability of successful transmission in state j is r_j • User knows the state of channel i in a slot only if it probes it. • Cost of probing channel i is c_i • Channel states are temporally and mutually independent
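The model on this slide can be written down as a small data structure. The following is a minimal Python sketch, not the authors' code: the names `Channel`, `expected_rate`, and `slot_gain` are hypothetical, and treating the reward of an unprobed channel as its expected success probability is an assumption made here for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Channel:
    state_probs: List[float]  # state_probs[j] = probability of being in state j
    probe_cost: float         # cost of probing this channel


def expected_rate(ch: Channel, r: Sequence[float]) -> float:
    """Expected success probability when transmitting in ch without probing it."""
    return sum(p * r_j for p, r_j in zip(ch.state_probs, r))


def slot_gain(channels: Sequence[Channel], r: Sequence[float],
              probed: Sequence[int], chosen: int,
              states: Dict[int, int]) -> float:
    """Gain in one slot: success probability of the chosen channel minus probing costs.

    states maps each probed channel index to its observed state. If the chosen
    channel was not probed, its expected rate is used (assumption of this sketch).
    """
    cost = sum(channels[i].probe_cost for i in probed)
    if chosen in states:
        reward = r[states[chosen]]
    else:
        reward = expected_rate(channels[chosen], r)
    return reward - cost
```

For example, with two two-state channels and r = [0.0, 1.0], probing channel 0 at cost 0.1 and finding it in state 1 gives a slot gain of 0.9; finding it in state 0 and falling back to an unprobed channel with expected rate 0.8 gives 0.7.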
Optimal policy for K = 2 (Sigmetrics, 2006) • Exhaustive (S, i) policy: • probes all channels in set S, • if no probed channel is in state 1, transmits in channel i (backup). • Optimal policy is exhaustive (S_i, i) for some i • S_i = {j: p_1j (1 - p_1i) > c_j} • Probes all channels with high cross-uncertainty with the backup channel • Probes channels in decreasing order of p_1j / c_j • Search space for the optimal policy is restricted (n policies) • Determine the optimal policy by explicitly evaluating the gain of each policy in this class.
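The search over the n exhaustive policies can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: it assumes success probabilities r_0 = 0 and r_1 = 1, assumes probing stops at the first channel found in state 1, and excludes the backup channel from its own probe set. The function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def exhaustive_gain(p1: Sequence[float], c: Sequence[float],
                    order: Sequence[int], backup: int) -> float:
    """Expected gain of probing `order` in sequence, then using `backup`.

    p1[j] is the probability that channel j is in state 1, c[j] its probing cost.
    Assumes r_0 = 0, r_1 = 1 and that probing stops at the first state-1 channel.
    """
    gain, reach = 0.0, 1.0  # reach = probability that all earlier probes saw state 0
    for j in order:
        gain += reach * (p1[j] - c[j])  # pay c[j], succeed with probability p1[j]
        reach *= 1.0 - p1[j]
    return gain + reach * p1[backup]    # all probes failed: transmit in the backup


def best_exhaustive_policy(
    p1: Sequence[float], c: Sequence[float]
) -> Optional[Tuple[float, int, List[int]]]:
    """Evaluate the n candidate policies; return (gain, backup, probe order)."""
    best = None
    for i in range(len(p1)):
        # Candidate probe set for backup i, using the threshold from the slide.
        s_i = [j for j in range(len(p1))
               if j != i and p1[j] * (1.0 - p1[i]) > c[j]]
        # Probe in decreasing order of p1[j] / c[j]; zero-cost channels go first.
        s_i.sort(key=lambda j: p1[j] / c[j] if c[j] > 0 else float("inf"),
                 reverse=True)
        g = exhaustive_gain(p1, c, s_i, i)
        if best is None or g > best[0]:
            best = (g, i, s_i)
    return best
```

With two channels that are each in state 1 with probability 0.5 and cost 0.1 to probe, the sketch probes one channel and keeps the other as backup, for an expected gain of 0.65, compared with 0.5 for transmitting without probing.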
Additional challenges for K > 2 • For K = 2, the optimal policy is static • Actions can be determined before observation • For K > 2, the optimal policy may be adaptive • While probed channels are in state 0, focus on channels with high expected reward • After observing a channel in an intermediate state, focus on channels which have higher probabilities of being in the highest state • After observing a channel in an intermediate state, may select either the observed channel or the backup. • Optimal policy is a decision tree • Exponential number of decision trees • Storing a decision tree requires exponential space
Constant Factor Approximation Algorithm for K > 2 • Let policy A be the best among those that always transmit in a probed channel • Gain G_A • Let policy B be the best among those that always transmit in an unprobed channel • Policy B is clearly the one that never probes any channel and transmits in the channel with the highest expected transmission rate. • Gain G_B • Theorem 1: max(G_A, G_B) >= ½ × maximum gain • The better of A and B attains at least half the maximum gain
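Policy B is simple enough to compute directly. The sketch below is a hypothetical Python illustration (the names `policy_b` and `opt_upper_bound` are not from the talk): it picks the channel with the highest expected success rate, and uses Theorem 1 to turn the gains of A and B into an upper bound on the maximum gain. The gain of policy A is taken as an input, since computing it is a separate problem.

```python
from typing import Sequence, Tuple


def policy_b(p: Sequence[Sequence[float]], r: Sequence[float]) -> Tuple[int, float]:
    """Return (channel index, gain) for the never-probe policy.

    p[i][j] is the probability that channel i is in state j;
    r[j] is the success probability in state j.
    """
    rates = [sum(p_ij * r_j for p_ij, r_j in zip(p_i, r)) for p_i in p]
    best = max(range(len(rates)), key=lambda i: rates[i])
    return best, rates[best]


def opt_upper_bound(gain_a: float, gain_b: float) -> float:
    """Theorem 1 rearranged: maximum gain <= 2 * max(G_A, G_B)."""
    return 2.0 * max(gain_a, gain_b)
```

For instance, with state probabilities [[0.5, 0.3, 0.2], [0.1, 0.1, 0.8]] and r = [0.0, 0.5, 1.0], the expected rates are 0.35 and 0.85, so policy B transmits in the second channel.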
Proof of Theorem 1 • Q: Probability that the optimal policy transmits in an unprobed channel • OPT: Expected gain of the optimal policy • G: Expected gain of the optimal policy given that it transmits in an unprobed channel
Proof of Theorem 1 • Modify the optimal policy so that it transmits in the best probed channel whenever it would have used the backup • The modified policy has expected gain OPT’, which is less than or equal to G_A • x: expected gain of the modified policy given that the optimal policy uses a backup • OPT – OPT’ <= Q (G – x) <= QG <= G_B • Hence OPT <= OPT’ + G_B <= G_A + G_B <= 2 max(G_A, G_B)
Missing link • Need to compute the policy that maximizes gains among those that transmit only in probed channels. • Once a channel is observed to be in state u, probe only those channels for which the expected improvement above u exceeds cost.
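The probing rule on this slide compares an expected improvement with a cost. A minimal Python sketch of that comparison follows; it is an illustration only, the function names are hypothetical, and it shows the per-channel test rather than the full policy that achieves the best gain among probed-channel policies.

```python
from typing import Sequence


def expected_improvement(p_i: Sequence[float], r: Sequence[float], u: int) -> float:
    """Expected increase in success probability from probing a channel,
    given that the best state observed so far is u.

    p_i[j] is the probability that the channel is in state j;
    r[j] is the success probability in state j.
    Only states that beat r[u] contribute.
    """
    return sum(p_ij * max(0.0, r_j - r[u]) for p_ij, r_j in zip(p_i, r))


def worth_probing(p_i: Sequence[float], r: Sequence[float],
                  u: int, cost: float) -> bool:
    """Probe only if the expected improvement above state u exceeds the cost."""
    return expected_improvement(p_i, r, u) > cost
```

For example, with p_i = [0.2, 0.3, 0.5], r = [0.0, 0.5, 1.0], and an already observed channel in state u = 1, the expected improvement is 0.5 × (1.0 − 0.5) = 0.25, so the channel is worth probing at cost 0.2 but not at cost 0.3.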
Special case: K = 3 • Theorem 2: There exists an optimum policy which uses a unique backup channel. • There exists a channel l which satisfies the following characteristic: if the optimum policy uses a backup it uses l.
Approximation algorithm attaining a 2/3 approximation ratio • Let P(l) be the class of policies which • never probe channel l and • never use any channel other than l as a backup. • Let C be the best policy in P(l) for the best choice of l. • Theorem 3: max(G_A, G_B, G_C) >= 2/3 × maximum gain • The best among A, B, C attains at least 2/3 of the optimum gain.
Open Problems • Is the optimization problem NP-hard when K exceeds 2? • Can we get better approximation ratios? • What happens when channels are not mutually independent? • What happens when channels are not temporally independent? • What happens when you can simultaneously transmit in multiple channels? • Power control • What happens when you need not transmit in every slot? • Lazy scheduling