Quantum Information
Stephen M. Barnett
University of Strathclyde
steve@phys.strath.ac.uk
The Wolfson Foundation
1. Probability and Information
2. Elements of Quantum Theory
3. Quantum Cryptography
4. Generalized Measurements
5. Entanglement
6. Quantum Information Processing
7. Quantum Computation
8. Quantum Information Theory

Quantum Information, Stephen M. Barnett, Oxford University Press (2009)
1. Probability and Information
1.1 Introduction
1.2 Conditional probabilities
1.3 Entropy and information
1.4 Communications theory
1.1 Introduction

There is a fundamental link between probabilities and information.

Reverend Thomas Bayes (1702–1761)

Probabilities depend on what we know. If we acquire additional information, then this modifies the probability.
Entropy is a function of probabilities.

Ludwig Boltzmann (1844–1906)

The entropy depends on the number of (equiprobable) microstates W: S = k_B ln W.
The quantity of information is the entropy of the associated probability distribution.

Claude Shannon (1916–2001)

The extent to which a message can be compressed is determined by its information content. Given sufficient redundancy, all errors in transmission can be corrected.
Probabilities depend on available information. Information is a function of the probabilities.

Information theory is applicable to any statistical or probabilistic problem. Quantum theory is probabilistic, and so there must be a quantum information theory. But probability amplitudes are the primary quantity, so quantum information is different ...
1.2 Conditional probabilities

Suppose we have a single event A with possible outcomes {a_i}. Everything we know is specified by the probabilities for the possible outcomes: P(a_i).

For the coin toss the possible outcomes are "heads" and "tails": P(heads) = 1/2 and P(tails) = 1/2.
Two events: Add a second event B with outcomes {b_j} and probabilities P(b_j). A complete description is provided by the joint probabilities P(a_i, b_j).

Single-event probabilities and joint probabilities are related by:

P(a_i) = Σ_j P(a_i, b_j)  and  P(b_j) = Σ_i P(a_i, b_j)

If A and B are independent and uncorrelated, then P(a_i, b_j) = P(a_i)P(b_j).
What does learning the value of A tell us about the probabilities for the value of B? If we learn that A = a_0, then the quantities of interest are the conditional probabilities P(b_j|a_0): the probabilities for b_j given that A = a_0.

This conditional probability is proportional to the joint probability: P(b_j|a_0) ∝ P(a_0, b_j). Finding the constant of proportionality leads to Bayes' rule:

P(b_j|a_0) = P(a_0, b_j) / Σ_j P(a_0, b_j) = P(a_0, b_j) / P(a_0)
Probability tree, with prior probabilities P(a_i) and conditional probabilities P(b_j|a_i):

P(a1) = 1/2:  P(b1|a1) = 1/4,  P(b2|a1) = 1/4,  P(b3|a1) = 1/2
P(a2) = 1/3:  P(b1|a2) = 2/3,  P(b2|a2) = 1/3,  P(b3|a2) = 0
P(a3) = 1/6:  P(b1|a3) = 1/3,  P(b2|a3) = 1/3,  P(b3|a3) = 1/3
Bayes' theorem

The relation between the two types of conditional probability:

P(a_i|b_j) = P(b_j|a_i) P(a_i) / P(b_j)

or, as an alternative,

P(a_i|b_j) = P(b_j|a_i) P(a_i) / Σ_i P(b_j|a_i) P(a_i)
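As a quick numerical check, here is a minimal Python sketch of Bayes' theorem, using the priors and conditionals from the probability tree above (the dictionary names are my own; exact fractions keep the arithmetic transparent):

```python
from fractions import Fraction as F

# Priors P(a_i) and conditionals P(b_j | a_i) from the probability tree
prior = {"a1": F(1, 2), "a2": F(1, 3), "a3": F(1, 6)}
cond = {
    "a1": {"b1": F(1, 4), "b2": F(1, 4), "b3": F(1, 2)},
    "a2": {"b1": F(2, 3), "b2": F(1, 3), "b3": F(0)},
    "a3": {"b1": F(1, 3), "b2": F(1, 3), "b3": F(1, 3)},
}

# Marginal: P(b_j) = sum_i P(b_j | a_i) P(a_i)
marginal = {b: sum(prior[a] * cond[a][b] for a in prior)
            for b in ("b1", "b2", "b3")}

# Bayes' theorem: P(a_i | b1) = P(b1 | a_i) P(a_i) / P(b1)
posterior = {a: prior[a] * cond[a]["b1"] / marginal["b1"] for a in prior}
```

Observing b1 shifts the probability towards a2, which has the largest likelihood for b1: P(a2|b1) = 16/29, up from the prior 1/3.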
Example

Each day I take a long (l) or short (s) route to work and may arrive on time (O) or late (L).

[Diagram: the long and short routes from Home to Work.]

Given that you see me arrive on time, what is the probability that I took the long route?
1.3 Entropy and information

The quantity of information is the entropy of the probability distribution associated with the event.

Suppose we have a single event A with possible outcomes {a_i}. If one, a_0, is certain to occur, P(a_0) = 1, then we acquire no information by observing A. If A = a_0 is very likely then we might have confidently expected it and so learn very little. If A = a_0 is highly unlikely then we might need to drastically change our plans.
Learning the value of A provides a quantity of information h[P(a_i)] that increases as the corresponding probability decreases. We think of learning about something new as adding to the available information, so for two independent events we require:

h[P(a_i, b_j)] = h[P(a_i)P(b_j)] = h[P(a_i)] + h[P(b_j)]

This suggests logarithms: h[P(a_i)] = -K log P(a_i).
It is useful to define information as an average for the event A:

H(A) = -K Σ_i P(a_i) log P(a_i)

Entropy!! We can absorb K into the choice of base of the logarithm: log base 2 gives bits, log base e gives nats.
For two possible outcomes, with probabilities p and 1 - p:

H = -p log2 p - (1 - p) log2 (1 - p)

[Plot: H against p, reaching a maximum of one bit of information at p = 1/2.]
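A small Python sketch of this two-outcome entropy (the function name is my own):

```python
from math import log2

def binary_entropy(p: float) -> float:
    """H(p) = -p log2(p) - (1-p) log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0  # a certain outcome carries no information
    return -p * log2(p) - (1 - p) * log2(1 - p)
```

The function rises from zero at p = 0 to one bit at p = 1/2 and falls back to zero at p = 1.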
Mathematical properties of entropy

• H(A) is zero if and only if one of the probabilities is unity. Otherwise it is positive.
• If A can take n possible values a1, a2, ..., an, then the maximum value is H(A) = log n, which occurs for P(a_i) = 1/n.
• Any change towards equalising the probabilities will cause H(A) to increase.
Mutual information

A measure of the correlation between two events:

H(A:B) = H(A) + H(B) - H(A,B)

The mutual information is bounded: 0 ≤ H(A:B) ≤ min[H(A), H(B)].
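A sketch in Python of the definition and the bound in action; the joint distribution below is an invented example, not one from the lectures:

```python
from math import log2

def H(probs):
    """Entropy in bits of a probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Hypothetical joint distribution P(a_i, b_j), chosen to be correlated
joint = {("a1", "b1"): 0.4, ("a1", "b2"): 0.1,
         ("a2", "b1"): 0.1, ("a2", "b2"): 0.4}

# Marginals P(a_i) and P(b_j)
pa, pb = {}, {}
for (a, b), p in joint.items():
    pa[a] = pa.get(a, 0.0) + p
    pb[b] = pb.get(b, 0.0) + p

# Mutual information H(A:B) = H(A) + H(B) - H(A,B)
mutual = H(pa.values()) + H(pb.values()) - H(joint.values())
```

Here both marginals are uniform, H(A) = H(B) = 1 bit, and the correlation gives a mutual information of about 0.28 bits, safely inside the bound.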
Problems for Bayes' theorem?

(i) Can we interpret probabilities as frequencies of occurrence?
(ii) How should we assign prior probabilities?

(i) This is the way probabilities are usually interpreted in statistical mechanics, communications theory and quantum theory.
(ii) Minimise any bias by maximising the entropy subject to what we know. This is Jaynes' MaxEnt principle. It makes the prior probabilities as near to equal as is possible, consistent with any information we have.

Edwin Jaynes (1922–1998)
MaxEnt examples

A true die has probability 1/6 for each of the possible scores. This means a mean score of 3.5. What probabilities should we assign given only that the mean score is 3.47?

Maximise the entropy subject to the known constraints, Σ_n P(n) = 1 and Σ_n n P(n) = 3.47, by varying the probabilities P(n).
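The constrained maximisation (by Lagrange multipliers) gives probabilities of exponential form, P(n) ∝ exp(-λn); the multiplier λ is then fixed by the mean-score constraint. A sketch in Python solving for λ by bisection (the variable names and the numerical approach are my own):

```python
from math import exp

TARGET_MEAN = 3.47  # the observed mean score

def die_mean(lam):
    """Mean score for MaxEnt probabilities P(n) ~ exp(-lam * n), n = 1..6."""
    weights = [exp(-lam * n) for n in range(1, 7)]
    z = sum(weights)  # normalisation (a 'partition function')
    return sum(n * w for n, w in zip(range(1, 7), weights)) / z

# die_mean is decreasing in lam, with die_mean(0) = 3.5, so bisect:
lo, hi = 0.0, 1.0
for _ in range(200):
    mid = (lo + hi) / 2
    lo, hi = (mid, hi) if die_mean(mid) > TARGET_MEAN else (lo, mid)
lam = (lo + hi) / 2

z = sum(exp(-lam * n) for n in range(1, 7))
probs = [exp(-lam * n) / z for n in range(1, 7)]
```

Because 3.47 is only slightly below 3.5, λ is small and positive and the probabilities are nearly uniform, tilted slightly towards low scores.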
MaxEnt examples

Should we treat variables with known separate statistical properties as correlated or not? 90% of Finns have blue eyes; 94% of Finns are native Finnish speakers.

Joint probabilities, with an unknown correlation x:

             Swedish-speaking   Finnish-speaking
brown eyes        x                0.10 - x        10%
blue eyes      0.06 - x            0.84 + x        90%
                  6%                  94%

Maximising the entropy gives x = 0.006 = 6% × 10%. Independent!
MaxEnt and thermodynamics

Consider a physical system that can have a number of states labelled by n, with an associated energy E_n. What is the probability for occupying this state? Maximise the entropy subject to the constraints Σ_n P(n) = 1 and Σ_n E_n P(n) = E, the known mean energy.
The solution is

P(n) = Z⁻¹ exp(-β E_n)

We recognise this as the Boltzmann distribution: Z = Σ_n exp(-β E_n) is the partition function and β = 1/(k_B T) is the inverse temperature.
Information and thermodynamics

Information is physical: it is stored as the arrangement of physical systems.

Second law of thermodynamics:

"During real physical processes, the entropy of an isolated system always increases. In the state of equilibrium the entropy attains its maximum value." (Rudolf Clausius, 1822–1888)

"A process whose effect is the complete conversion of heat into work cannot occur." (Lord Kelvin, William Thomson, 1824–1907)
On the decrease of entropy in a thermodynamic system by the intervention of intelligent beings

Leo Szilard (1898–1964)

Box of volume V0, containing a single molecule and surrounded by a heat bath at temperature T. The expanding gas does work W, and heat Q = W is transferred from the bath.
Inserting the partition reduces the volume from V0 to V0/2. The work done by the expanding gas is

W = ∫ from V0/2 to V0 of (k_B T / V) dV = k_B T ln 2

The expansion of the gas increases the entropy by the amount

ΔS = Q/T = k_B ln 2

Kelvin's formulation of the second law is saved if the process of measuring and recording the position of the molecule produces at least this much entropy. Here entropy is clearly linked to information.
Irreversibility and heat generation in the computing process

Rolf Landauer (1927–1999)

Erasing an unknown bit of information requires the dissipation of at least k_B T ln 2 of energy. The bit value (0 or 1) is stored as the position of the single molecule in the partitioned box.
To erase, remove the partition, then push in a new partition from the right to reset the bit value to 0. The work done to compress the gas is dissipated as heat.
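Landauer's bound is tiny at everyday temperatures. A one-line Python evaluation (the 300 K figure is my own illustrative choice of "room temperature"):

```python
from math import log

k_B = 1.380649e-23   # Boltzmann constant in J/K (exact SI value)
T = 300.0            # room temperature in kelvin (illustrative choice)

# Minimum heat dissipated when erasing one unknown bit:
landauer_limit = k_B * T * log(2)   # joules per bit
```

This comes to roughly 3 × 10⁻²¹ J per bit, far below the dissipation of present-day electronics.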
1.4 Communications theory

Communications systems exist for the purpose of conveying information between two or more parties, Alice and Bob. Communications systems are necessarily probabilistic: if Bob knows the message before it is sent, then there is no need to send it! Bob may know the probability for Alice to select each of the possible messages, but not which one until the signal is received.
Shannon's communications model

Information source → Transmitter → Signal (+ Noise source) → Received signal → Receiver → Destination
(Alice prepares and transmits the signal; Bob receives it.)

• Noiseless coding theorem: data compression
• Noisy coding theorem: channel capacity
Let A denote the events in Alice's domain. The choices of message and associated prepared signals are {a_i}. The probability that Alice selects and prepares the signal a_i is P(a_i).

Let B denote the reception event: the receipt of the signal and its decoding to produce the message. The set of possible received messages is {b_j}.

The operation of the channel is described by the conditional probabilities {P(b_j|a_i)}. From Bob's perspective: {P(a_i|b_j)}. Each received signal is perfectly decodable if P(a_i|b_j) = δ_ij.
Shannon's coding theorems

Most messages have an element of redundancy and can be compressed and still be readable:

TXT MSSGS SHRTN NGLSH SNTNCS
TEXT MESSAGES SHORTEN ENGLISH SENTENCES

The eleven missing vowels were a redundant component; removing these has not impaired the message's readability. Shannon's noiseless coding theorem quantifies the redundancy. It tells us by how much the message can be shortened and still be read without error.
Shannon's coding theorems

Why do we need redundancy? To correct errors.

RQRS BN MK WSAGS NFDBL

This message has been compressed and then errors introduced. Let's try the effect of errors on the uncompressed message:

ERQORS BAN MAKE WESAAGIS UNFEADCBLE
ERRORS CAN MAKE MESSAGES UNREADABLE

Shannon's noisy channel coding theorem tells us how much redundancy we need in order to combat the errors.
Noiseless coding theorem

Consider a string of N bits, 0 and 1, with P(0) = p and P(1) = 1 - p (p > 1/2). The number of possible different strings is 2^N; the most probable is 0000 ... 0. The probability that any given string has n zeros and N - n ones is p^n (1-p)^(N-n).

In the limit of long strings (N >> 1) it is overwhelmingly likely that the number of zeros will be close to Np. Hence we need only consider typical strings with n ≈ Np; all other possibilities are sufficiently unlikely that they can be ignored.
If we consider coding only messages for which n = pN, then the number of strings reduces to the binomial coefficient N! / [(pN)! ((1-p)N)!]. Taking the logarithm of this and using Stirling's approximation gives

log2 (number of typical strings) ≈ N[-p log2 p - (1-p) log2(1-p)] = NH

so the number of equiprobable typical messages is 2^(NH). This is Shannon's noiseless coding theorem: we can compress a message by reducing the number of bits by a factor of up to H.
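Python's exact integer arithmetic lets us check the Stirling estimate directly; a sketch with N = 1000 and p = 0.9 (my own choice of numbers):

```python
from math import comb, log2

N, p = 1000, 0.9
n = round(p * N)  # typical number of zeros

# Exact log2 of the number of strings with exactly n zeros
exact = log2(comb(N, n))

# Stirling estimate: N times the binary entropy H
H = -p * log2(p) - (1 - p) * log2(1 - p)
approx = N * H
```

For these numbers the estimate is good to about one percent, and the relative discrepancy shrinks as N grows: the correction terms are only logarithmic in N.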
Example

Consider a message formed from the alphabet A, B, C and D, with probabilities P(A) = 1/2, P(B) = 1/4, P(C) = 1/8 and P(D) = 1/8. The simplest coding scheme would use two bits for each letter (in the form 00, 01, 10 and 11); hence a sequence of N letters would require 2N bits.

The information for the letter probabilities is

H = -(1/2) log2(1/2) - (1/4) log2(1/4) - 2 × (1/8) log2(1/8) = 1.75 bits

which gives a Shannon limit of 1.75N bits for the sequence of N letters.
For this simple example the Shannon limit can be reached by the following coding scheme:

A = 0, B = 10, C = 110, D = 111

The average number of bits used to encode a sequence of N letters is then:

N[(1/2)(1) + (1/4)(2) + (1/8)(3) + (1/8)(3)] = 1.75N
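This scheme is a prefix code: no codeword is the beginning of another, so the bit stream decodes unambiguously without separators. A Python sketch checking the average length and the decoding (the `encode`/`decode` helpers are my own):

```python
code = {"A": "0", "B": "10", "C": "110", "D": "111"}
probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}

# Average number of bits per letter
avg_len = sum(probs[x] * len(code[x]) for x in code)

def encode(msg):
    return "".join(code[ch] for ch in msg)

def decode(bits):
    """Read bits left to right; emit a letter whenever a codeword completes."""
    inverse = {v: k for k, v in code.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    return "".join(out)
```

The average length of 1.75 bits per letter matches the Shannon limit because each codeword length equals -log2 of its letter's probability.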
Noisy channel coding theorem

We can combat errors by introducing redundancy, but how much do we need? Let bit errors occur with probability q: a transmitted 0 or 1 arrives unchanged with probability 1 - q and flipped with probability q.

Consider an optimally compressed message comprising N0 bits, so that the probability for each string is 2^(-N0). The number of errors will be close to qN0 and we need to correct these.
The number of ways in which qN0 errors can be distributed among N0 bits is

E = N0! / [(qN0)! ((1-q)N0)!] ≈ 2^(N0 H(q))

Bob can correct the errors if he knows where they are, so Alice can supplement the N0 signal bits with log2 E ≈ N0 H(q) bits on a correction channel telling Bob the error positions.
At least N0[1 + H(q)] bits are required in the combined signal and correction channels: N0 bits of signal plus N0 H(q) bits of correction. If the correction channel also has error rate q, then we can correct the errors on it with a second correction channel.
We can correct the errors on this second channel with a third one, and so on. The total number of bits required is then:

N0[1 + H(q) + H²(q) + ...] = N0 / [1 - H(q)]

All of the correction channels have the same bit error rate, so we can replace the entire construction by the original noisy channel!! We can do this by selecting messages that are sufficiently distinct that Bob can associate each likely or typical received message with a unique original message.
Shannon's noisy channel coding theorem:

• We require at least N0/[1 - H(q)] bits in order to faithfully encode 2^(N0) messages.
• N bits of information can be used to carry faithfully not more than 2^(N[1 - H(q)]) distinct messages.

Each binary digit can carry not more than 1 - H(q) bits of information. What happens for q = 0, 1/2, 1?
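The limiting cases in the closing question can be read off from a short Python function for the per-digit capacity 1 - H(q) (the function name is my own):

```python
from math import log2

def capacity(q: float) -> float:
    """Bits carried per binary digit over a channel with error rate q."""
    if q in (0.0, 1.0):
        return 1.0  # q = 1 just inverts every bit, which Bob can undo
    return 1.0 + q * log2(q) + (1 - q) * log2(1 - q)
```

Both q = 0 and q = 1 give one full bit per digit (a channel that always flips is as good as a perfect one), while q = 1/2 gives zero: the output is then statistically independent of the input.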
Summary Probabilities depend on available information Information is a function of the probabilities It is the information that limits communications. Information is physical
Three or more events

We can add a third event C with outcomes {c_k} and probabilities P(c_k). The complete description is then given by the joint probabilities P(a_i, b_j, c_k). We can also write conditional probabilities, but need to be careful with the notation: P(a_i|b_j, c_k) vs P(a_i, b_j|c_k).

Bayes' theorem still works:

P(a_i|b_j, c_k) = P(b_j|a_i, c_k) P(a_i|c_k) / P(b_j|c_k)
Fisher's likelihood

Sir Ronald Aylmer Fisher (1890–1962)

How does learning that B = b_j affect the probability that A = a_i? Bayes' theorem tells us that

P(a_i|b_j) ∝ P(b_j|a_i) P(a_i)

The likelihood P(b_j|a_i), the probability of b_j given a_i, quantifies the effect of what we learn about a_i given b_j.
Example from genetics

Mice can be black or brown. Black, B, is the dominant gene and brown, b, is recessive: a brown mouse must be bb, while a black mouse may be BB or Bb (bB).

If we mate a black mouse with a brown one, then how does the colour of the offspring modify the probabilities for the black mouse's genes? In the absence of any information about ancestry we can only assign equal probabilities to each of the 3 possible arrangements: P(BB) = 1/3, P(Bb) = 2/3.
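As one illustrative case (observing a single black offspring is my own addition to the slide's question), a Python sketch of the Bayesian update:

```python
from fractions import Fraction as F

# Prior over the black parent's genotype; Bb and bB are combined
prior = {"BB": F(1, 3), "Bb": F(2, 3)}

# Likelihood that one offspring of a cross with a bb (brown) mouse is black:
# BB x bb -> all offspring Bb (black); Bb x bb -> half Bb (black), half bb (brown)
likelihood = {"BB": F(1, 1), "Bb": F(1, 2)}

# Bayes' theorem: P(genotype | black offspring)
evidence = sum(prior[g] * likelihood[g] for g in prior)
posterior = {g: prior[g] * likelihood[g] / evidence for g in prior}
```

A single black offspring raises P(BB) from 1/3 to 1/2, while a single brown offspring would instead identify the black parent as Bb with certainty.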