Cloud Security: New Challenges, New Opportunities. Dr. XiaoFeng Wang (Associate Professor, IUB).
Cloud Computing. The Cloud is not secure: security/privacy is the main concern; service providers avoid liability. “…We strive to keep Your Content secure, but cannot guarantee that we will be successful at doing so, given the nature of the Internet.” (Amazon) The Cloud is the future: ultra cost-effective computation; the only scalable platform for data-intensive computing.
Cloud Security: Anything New? Web security; virtual machine security; secure computing outsourcing; data storage security and privacy; decentralized access control; and more.
What We Believe. The Cloud brings us New Challenges. E.g., move applications into the Cloud: applications change, new security problems emerge. The Cloud brings us New Opportunities. E.g., design for data-intensive computing; hybrid cloud infrastructure.
What We Found. Surprising new security challenges in SaaS: fundamental vulnerability to side-channel attacks; susceptibility of API service integrations to logic flaws. Surprising new solution: secure DNA alignments can scale on the Cloud.
Challenges in Making SaaS Right
Joint work with Rui Wang, Kehuan Zhang, Zhou Li (my students) and Dr. Shuo Chen, Dr. Shaz Qadeer (MSR). Related publications: Oakland’10, CCS’10, Oakland’11.
Side-channel Leaks in Web Apps. Desktop application vs. web application: a web application is split between client and server, with state transitions driven by network traffic. Worry about privacy? Let’s do encryption.
Our Findings. The problem is serious: high-profile web applications; serious information leaks (health records, family income, investment details, search queries). The problem is fundamental: stateful communication, low-entropy input and significant traffic distinctions. Defense is non-trivial: effective defense needs to be application specific, and calls for a disciplined web programming methodology.
Impacts. “Your health, tax, and search data siphoned: Software-as-a-service springs SSL leak”, The Register, 23 March 2010. “Side-Channel Leaks in Web Applications”, Freedom to Tinker, Ed Felten's blog, 23 March 2010. “Researchers sound alarm on Web app "side channel" data leaks”, Network World, 24 March 2010. Bruce Schneier commented on our work. Someone wikied our work. “SaaS Apps May Leak Data Even When Encrypted, Study Says”, darkReading, 26 March 2010. And more: search Google for “side-channel leaks in web applications”.
OnlineTaxA. One of the two most popular online tax applications. Design: a wizard-style questionnaire that tailors the conversation based on the user’s previous input. Problem: information leaks from control flows. E.g., filing status, number of children, whether a big medical bill was paid, the adjusted gross income (AGI).
Child Credits: state machine (two-children scenario). All transitions have unique traffic patterns. States: Entry page of Deductions & Credits; Full credit; Partial credit; Not eligible; Summary of Deductions & Credits. AGI axis: full credit from $0 to $110000; partial credit from $110000 to $150000; not eligible above $150000.
Student-loan-interest Credit: state machine. States: Entry page of Deductions & Credits; Enter your paid interest; Full credit; Partial credit; Not eligible; Summary of Deductions & Credits. AGI axis: full credit from $0 to $115000; partial credit from $115000 to $145000; not eligible above $145000.
A subset of identifiable AGI thresholds (AGI axis starting at $0):
Disabled Credit: $24999
Earned Income Credit: $41646
Retirement Savings: $53000
IRA Contribution: $85000, $105000
Child credit*: $110000; $130000 or $150000 or $170000 …
Student Loan Interest: $115000, $145000
College Expense: $116000
First-time Homebuyer credit: $150000, $170000
Adoption expense: $174730, $214780
OnlineTaxA can find more than 350 credits/deductions.
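To make the inference concrete, here is a minimal sketch of how an eavesdropper could combine observed credit outcomes into an AGI range. It assumes each pair of thresholds on the slides marks the boundaries between full credit, partial credit and no eligibility, and that the outcome of each credit has already been recognized from its traffic pattern. The names `CREDITS`, `agi_range` and `infer_agi` are hypothetical, and boundary handling (inclusive or exclusive) is ignored.

```python
import math

# Thresholds copied from the slides; which side of a boundary a value
# falls on is an assumption of this sketch.
CREDITS = {
    "child_credit_two_children": (110_000, 150_000),
    "student_loan_interest": (115_000, 145_000),
}


def agi_range(credit, outcome):
    """Return the AGI interval implied by one observed outcome."""
    full_limit, partial_limit = CREDITS[credit]
    if outcome == "full":
        return (0, full_limit)
    if outcome == "partial":
        return (full_limit, partial_limit)
    if outcome == "not_eligible":
        return (partial_limit, math.inf)
    raise ValueError(f"unknown outcome: {outcome}")


def infer_agi(observations):
    """Intersect the intervals implied by all observed outcomes."""
    low, high = 0, math.inf
    for credit, outcome in observations:
        lo, hi = agi_range(credit, outcome)
        low, high = max(low, lo), min(high, hi)
    return low, high
```

Each additional credit whose outcome is visible in the traffic tightens the interval, which is why a questionnaire covering hundreds of credits and deductions can pin the AGI down quite closely.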
OnlineInvestA. Funds you invested in? Fund allocation pie charts are GIF images from MarketWatch. Just compare the image sizes! (3346 B, 3312 B, 3294 B)
Inference of Fund Allocation. Start with 80000 candidate charts. Size of day 1: 800 charts remain. Size of day 2 and prices of the day: 80 charts. Size of day 3 and prices of the day: 8 charts. Size of day 4 and prices of the day: 1 chart.
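The narrowing process can be sketched as repeated filtering of a candidate set. This is only an illustration: `toy_chart_size` is a made-up stand-in for rendering a pie chart and measuring the resulting GIF, and the candidate allocations and prices are invented for the example.

```python
def narrow_candidates(candidates, observations, chart_size):
    """Keep only allocations whose predicted chart size matches every
    observed (prices, size) pair, one day at a time."""
    remaining = list(candidates)
    for prices, observed_size in observations:
        remaining = [c for c in remaining
                     if chart_size(c, prices) == observed_size]
    return remaining


def toy_chart_size(allocation, prices):
    """Hypothetical size model: NOT a real GIF encoder, just a
    deterministic function of the allocation and the day's prices."""
    a, b = allocation
    p1, p2 = prices
    return 3000 + (a * p1 + b * p2) % 8
```

With candidates `(a, 10 - a)` for `a` in 0..10 and a true allocation of `(4, 6)`, observing the sizes for prices `(3, 7)` and then `(5, 2)` leaves six candidates after the first day and a single one after the second, mirroring the day-by-day shrinkage on the slide.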
Challenges in API Service Integrations. Hybrid web applications integrate API services from other parties. Examples: payment services (PayPal, Amazon Payment, Google Checkout, etc.); login services (Google ID, Facebook Connect, etc.); other cloud API services (Amazon EC2, Simple DB, etc.). VERY, VERY HARD TO DO IT RIGHT.
Our Study: Cashier-as-a-Service (CaaS). 3rd-party cashiers, e.g., PayPal, Amazon Payments, Google Checkout. A decision to be made jointly: the web store handles the order; the CaaS handles the payment.
Figure: checkout flow among the Shopper, Buy.com and PayPal (CaaS). Legend: RT = HTTP round-trip; Web API calls are marked separately. Round-trips shown: RT1.a, RT1.b, RT2.a, RT2.b, RT3.a (with RT3.a.a and RT3.a.b), RT3.b, RT4.a, RT4.b. Pages shown: “Please confirm: shipping address, billing address, total amount: $39.54” with a Pay Now button; “Thank you for your order! Your order #12345 will be shipped. View the order”. Many other payment services (PayPal Standard, PayPal Express, Amazon Simple Pay, Checkout By Amazon, Google Checkout, etc.) have different integrations.
Naughty kid: Mom, can I do X?
Mom: Sounds reasonable, but ask Dad to call me.
Naughty kid: Dad, Mom is ok about X’, can you call her?
Dad (to Mom): Sounds like a wacky idea. I am not sure. What do you think?
Mom: I think it is fine.
Dad: OK.
What We Studied. Software powering web stores: NopCommerce (popular open-source); Interspire (ranked #1 by Top10Reviews.com); Amazon SDKs (used by stores to integrate Amazon Payments). Closed-source big stores: JR.com, Buy.com.
Amazon Simple Pay Integrated in NopCommerce. Parties: Shopper Chuck, merchant Jeff, Amazon.
Chuck to Jeff: Jeff, I want to buy this DVD.
Jeff to Chuck: Chuck, pay in Amazon with this signed letter: “Dear Amazon, order#123 is $10, when it is paid, call me at 425-111-2222. [Jeff’s signature]”
Chuck to Amazon: Amazon, I want to pay with this letter: “Dear Amazon, order#123 is $10, when it is paid, call me at 425-111-2222. [Jeff’s signature]”
Amazon to Jeff: Hi, $10 has been paid for order#123. payeeEmail= email@example.com [Amazon’s signature]
Jeff: Great, I will ship order#123!
Note: the phone number is analogous to the returnURL field in Amazon Simple Pay.
How to Shop for Free. Parties: Shopper Chuck (who is also seller Mark), merchant Jeff, Amazon (CaaS).
Preparation: register an Amazon seller account; a $25 MasterCard gift card can work. We registered under the name “Mark Smith” with PayPal, Amazon and Google using the card.
Attack: pay to Mark, but check out from Jeff.
Chuck to Jeff: Jeff, I want to buy this DVD.
Jeff to Chuck: Chuck, pay in Amazon with this signed letter: “Dear Amazon, order#123 is $10, when it is paid, call me at 425-111-2222. [Jeff’s signature]”
Chuck to Amazon: Amazon, I want to pay with this letter: “Dear Amazon, order#123 is $10, when it is paid, call me at 425-111-2222.” with [Jeff’s signature] replaced by [Mark’s signature]
Amazon to Jeff: Hi, $10 has been paid for order#123. payeeEmail= firstname.lastname@example.org [Amazon’s signature]
Jeff: Great, I will ship order#123!
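The flaw can be illustrated with a small sketch of the merchant-side check. It assumes the payment notice carries the order ID, the amount and the payee, all covered by the cashier's signature. HMAC with a shared demo key stands in for the cashier's real signature scheme, and every name here (`make_notice`, `naive_accept`, `strict_accept`, the example e-mail addresses) is hypothetical rather than part of Amazon Simple Pay or NopCommerce.

```python
import hashlib
import hmac


def sign(key: bytes, message: str) -> str:
    """Stand-in for the cashier's signature (assumption: HMAC-SHA256)."""
    return hmac.new(key, message.encode(), hashlib.sha256).hexdigest()


def payment_message(order_id, amount, payee_email):
    return f"{order_id}|{amount}|{payee_email}"


def make_notice(caas_key, order_id, amount, payee_email):
    """What the cashier sends back once a payment has been made."""
    message = payment_message(order_id, amount, payee_email)
    return {"order_id": order_id, "amount": amount,
            "payee_email": payee_email, "signature": sign(caas_key, message)}


def signature_ok(notice, caas_key):
    message = payment_message(notice["order_id"], notice["amount"],
                              notice["payee_email"])
    return hmac.compare_digest(sign(caas_key, message), notice["signature"])


def naive_accept(notice, caas_key, order_id, amount):
    """Flawed check: a valid signature, the right order and the right
    amount, but no check of who actually received the money."""
    return (signature_ok(notice, caas_key)
            and notice["order_id"] == order_id
            and notice["amount"] == amount)


def strict_accept(notice, caas_key, order_id, amount, merchant_email):
    """Also require that the payee is the merchant itself."""
    return (naive_accept(notice, caas_key, order_id, amount)
            and notice["payee_email"] == merchant_email)
```

A notice for a payment made to Mark passes `naive_accept` because every signed field is genuine; only the payee comparison in `strict_accept` tells the merchant that the money went elsewhere.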
PayPal Express Integrated in Interspire
PayPal Express Integrated in Interspire. Session 1: pay for a cheap order (orderID1) in PayPal, but prevent the merchant (TStore.com) from updating its status to PAID. Session 2: place an expensive order (orderID2), but skip the payment step in PayPal. In session 1, (RT3.b) redirects to TStore.com/updateOrder?orderID1T*; in session 2, (RT3.b) redirects to TStore.com/updateOrder?orderID2T*. In the (RT4.a) call TStore.com/updateOrder?orderID1T*, the shopper substitutes orderID2T* for orderID1T*.
Google Checkout Integrated in Interspire The order is generated based on the cart at the payment time (RT3.a). The payment amount is calculated based on the cart at the moment when the shopper clicks “checkout” (RT2.a). Between RT2.a and RT3.a, the shopper can add more items to the cart.
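A minimal sketch of this time-of-check/time-of-use gap, and of one possible fix (freezing the cart when the amount is quoted). The `Session` class, its method names and the prices are hypothetical and do not reflect Interspire's actual code.

```python
PRICES = {"dvd": 10, "camera": 300}  # made-up catalogue


class Session:
    def __init__(self):
        self.cart = []
        self.quoted_items = None

    def add(self, item):
        self.cart.append(item)

    def checkout(self):
        """RT2.a: compute the payment amount from the current cart and
        remember exactly which items were quoted."""
        self.quoted_items = tuple(self.cart)
        return sum(PRICES[item] for item in self.quoted_items)

    def generate_order_flawed(self):
        """RT3.a as described on the slide: the order is built from the
        cart as it looks now, which may have grown since checkout."""
        return list(self.cart)

    def generate_order_fixed(self):
        """Build the order from the snapshot taken at checkout."""
        return list(self.quoted_items)
```

If the shopper checks out with one DVD and then adds a camera before the payment step completes, the flawed variant ships both items for the price of the DVD, while the snapshot variant ships only what was paid for.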
What We Found Logic flaws in 9 checkout scenarios
Dear Buy.com customer service, Last week I placed the two orders (Order Number: 54348156 Order number: 54348723) in buy.com. Both items were shipped recently, but I found that my paypal account has not been charged for the order 54348723 (the alcohol tester). My credit card information is: [xxxxxxxxx] The total of the order 54348723 is $5.99. Please charge my credit card. Thank you very much

Dear buy.com customer service, I am a Ph.D. student doing research on e-commerce security. I bumped into an unexpected technical issue in buy.com's mechanism for accepting the paypal payments. I appreciate if you can forward this email to your engineering team. The finding is regarding the order 54348723. I placed the order in an unconventional manner (by reusing a previous paypal token), which allowed me to check out the product without paying. I have received the product in the mail. Of course I need to pay for it. Here is my credit card information [xxxxxxxxxxxx]. Please charge my card. The total on the invoice is $5.99.

From: Buy.Com Support <email@example.com>
Date: Sun, Jun 13, 2010 at 3:32 PM
Subject: Re: Other questions or comments (KMM3534132I15977L0KM)
To: Test Wang firstname.lastname@example.org
Thank you for contacting us at Buy.com. Buy.com will only bill your credit card only when a product has been shipped. We authorize payment on your credit card as soon as you place an order. Once an item has shipped, your credit card is billed for that item and for a portion of the shipping and/or tax charges (if applicable). If there are items on "Back Order" status, your credit card is re-authorized for the remaining amount and all previous authorizations are removed. This is the reason you may have multiple billings for your order. …

After our refund-eligible period, we mailed the products back by certified mail. We disclosed technical details to them.
A generic reply that misunderstood the situation:
Subject: Re: Other questions or comments (KMM3545639I15977L0KM)
From: Buy.Com Support <email@example.com>
Date: Wed, Jun 16, 2010 at 6:25 PM
To: Test Wang <firstname.lastname@example.org>
Hello Test, Thank you for contacting us at Buy.com. Based on our records you were billed on 6/10/2010 for $5.99. To confirm your billing information please contact PayPal at https://www.paypal.com/helpcenter or at 1-402-935-2050.
What Do We Learn? SaaS is different: two-party and multi-party applications; openness and complexity of the web platform; formal verification: how to build the model? Need new development methodologies: treat side channels seriously; better integration support.
Defense is Nontrivial. Padding strategies: rounding; random padding. Challenges: overheads (at least one-third more bandwidth usage); implementation hurdles. Demands a change to the web-app development methodology.
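The two padding strategies can be sketched as follows. The block size, the maximum pad and the function names are illustrative assumptions, not values from the study.

```python
import random


def pad_by_rounding(size, block):
    """Round a response size up to the next multiple of `block`."""
    if block <= 0:
        raise ValueError("block must be positive")
    return -(-size // block) * block  # ceiling division


def pad_randomly(size, max_pad, rng=None):
    """Add a uniformly random number of bytes in [0, max_pad]."""
    rng = rng or random.Random()
    return size + rng.randint(0, max_pad)


def bandwidth_overhead(sizes, padded_sizes):
    """Fraction of extra bytes sent because of padding."""
    return sum(padded_sizes) / sum(sizes) - 1
```

With a 512-byte block, the three pie-chart sizes from the OnlineInvestA example (3346 B, 3312 B and 3294 B) all round to 3584 B and become indistinguishable by size, at a cost of roughly 8% extra bandwidth for those three responses. Random padding only blurs the sizes, so repeated observations can still average the noise away, which is one reason an effective defense has to be designed per application.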
Sidebuster: a side-channel detection tool for web applications. Information flow analysis to locate where leaks happen: sensitive data “taints” network traffic; content of the data is associated with different traffic features. Quantifies the information being leaked.
Secure and Scalable Read Mapping on Hybrid Clouds
Joint work with Yangyi Chen, Bo Peng (my students) and Dr. Haixu Tang (bioinformatics expert) Related Publication: NDSS’12
Read Mapping. Align a short DNA sequence (a read, about 100 bps) to a long one (the reference genome, about 6 billion bps). Prerequisite for all DNA analyses. Computation intensive (involving millions of reads).
Read Mapping on the Cloud. Technical challenges: millions of reads; a reference of billions of nucleotides; edit-distance based alignment. Cloud solutions: within organizations, the cost of sequencing is now lower than the cost of mapping; Cloud computing is the only solution. Privacy: NIH disallows reads with human DNA to be given to the public Cloud.
Secure Computation Outsourcing. Too heavyweight for data-intensive computations. E.g., aligning two sequences of length 25 takes 3 minutes via homomorphic encryption/oblivious transfer/secret sharing [Atallah03], and 4 seconds via improved SMC [Jha07]. Problems with secret sharing: communication overheads; policy concerns.
Data Anonymization. Data aggregation and noise adding. Vulnerable to re-identification attacks: given a reference population and a DNA sample, a read donor could be identified from aggregate data.
Can We Find a Better Solution? The Cloud is special: designed for simple, parallelizable tasks; good at processing a large amount of data; works in a hybrid way (Private Cloud plus Public Cloud). The problem is special: small edit distance (<= 6).
Seed and Extend. Seeding: compare the seeds of a read with the l-mers of the reference to find possible positions of the read. Extension: extend from those positions to find the right alignments. For a distance d, at least one of the d+1 seeds of a read matches an l-mer on its alignment. Example sequences from the figure: G G A A T T G G C C A A G T T G C A T A T G C T G T G C A A A G; T T G C T A.
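A minimal sketch of seed-and-extend built on the d+1 pigeonhole argument. As a simplification, the extension step here uses Hamming distance (substitutions only) rather than full edit distance, so indels are not handled. All function names are hypothetical.

```python
def split_seeds(read, d):
    """Split a read into d+1 contiguous seeds, returned as (offset, seed)."""
    n = d + 1
    base, extra = divmod(len(read), n)
    seeds, start = [], 0
    for i in range(n):
        length = base + (1 if i < extra else 0)
        seeds.append((start, read[start:start + length]))
        start += length
    return seeds


def candidate_positions(read, reference, d):
    """Seeding: every exact seed hit proposes a start position for the read."""
    positions = set()
    for offset, seed in split_seeds(read, d):
        hit = reference.find(seed)
        while hit != -1:
            positions.add(hit - offset)
            hit = reference.find(seed, hit + 1)
    return positions


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def map_read(read, reference, d):
    """Extension (simplified): keep candidates within d substitutions."""
    result = []
    for pos in sorted(candidate_positions(read, reference, d)):
        if 0 <= pos <= len(reference) - len(read):
            window = reference[pos:pos + len(read)]
            if hamming(read, window) <= d:
                result.append(pos)
    return result
```

Because d errors can touch at most d of the d+1 seeds, at least one seed is error-free and its exact hit places the true position in the candidate set; the more expensive comparison then runs only at those few candidates instead of across the whole reference.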
A Surprisingly Simple Solution. The Public Cloud does the seeding on encrypted data: simple, parallelizable, data-intensive. The Private Cloud does the extension: complicated, yet involving a relatively small amount of data. Goals: high security assurance; outsourcing the dominant portion of the workload; scalability; limited inter-cloud communication.
Our Design. Prepare encrypted reference sequences: extract subsequences (l-mers) from the reference; encrypt the unique l-mers; send the ciphertext to the public cloud. Public Cloud: match encrypted seeds to the encrypted references. Private Cloud: extend the reads according to the matches.
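The division of labor can be sketched with a keyed hash as the deterministic "encryption". This is an assumption-laden illustration: HMAC-SHA256 is used here as a stand-in for whatever keyed hash the real system uses, and the function names and the split of data between the two sides are hypothetical.

```python
import hashlib
import hmac


def keyed_hash(key: bytes, lmer: str) -> str:
    """Deterministic keyed hash of an l-mer (stand-in: HMAC-SHA256)."""
    return hmac.new(key, lmer.encode(), hashlib.sha256).hexdigest()


def prepare_reference(reference, l, key):
    """Private side: hash every l-mer and remember where it occurs.
    Only the hash values (the dict keys) would go to the public cloud."""
    table = {}
    for i in range(len(reference) - l + 1):
        table.setdefault(keyed_hash(key, reference[i:i + l]), []).append(i)
    return table


def public_cloud_match(hashed_reference, hashed_seeds):
    """Public side: compare ciphertexts only; it never sees a nucleotide."""
    return [h for h in hashed_seeds if h in hashed_reference]
```

The private side keeps the key and the position table, ships `set(table)` and the hashed seeds out, and uses the returned matches to look up positions for the extension step. The public cloud can match equal values because the hash is deterministic, which is also what exposes the scheme to frequency analysis.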
Does This Approach Work? Performance: can we move most of the workload to the public cloud? Security: can sensitive information be inferred from the encrypted sequences?
Short Seeds. Problem: when d >= 5, seeds are too short (< 20 bps). Short seeds lead to lots of matches, which lead to more workload for the private cloud. Our idea: 2-seed combinations. E.g., given d=5, a read can be divided into 7 seeds, with a 2-seed combination of 28 bps.
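The arithmetic behind the example can be checked with a short sketch. It assumes a 100-bp read split into d+2 equal seeds (dropping any leftover bases), so that d errors leave at least two seeds untouched. The function name is hypothetical.

```python
from itertools import combinations


def two_seed_combinations(read, d):
    """Split a read into d+2 equal seeds and return every pair
    (i, j, seed_i + seed_j). Leftover bases at the end are ignored,
    which is a simplification of this sketch."""
    n = d + 2
    seed_len = len(read) // n
    seeds = [read[i * seed_len:(i + 1) * seed_len] for i in range(n)]
    return [(i, j, seeds[i] + seeds[j])
            for i, j in combinations(range(n), 2)]
```

For a 100-bp read and d=5 this yields 7 seeds of 14 bps and 21 combinations of 28 bps each. A single seed under the d+1 scheme would be only 16 bps (100 // 6), so matching on pairs gives the public cloud a much more selective key at the price of hashing more items per read.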
Seed Combinations. 2-combinations of seeds (12 bps): about 2.8 GB for 10 M reads. 2-combinations of reference 12-mers: about 7 TB with SHA-1. (Figure: an example reference sequence with positions 1 to 101 marked.)
Menace of Frequency Analysis. Performance gain comes with security threats: the Public Cloud needs to know when matches happen, which requires a deterministic cryptosystem, which brings the threat of frequency analysis. How serious is the problem? Over 80% of 24-mers are unique; the remaining 20% often carry little information.
Security Analysis. What the adversary can observe: group the encrypted l-mers into bins (B1, …, Bk, …, Bh); l-mers in the same bin have the same frequency. What the adversary has: a case population (fk for Bk); a reference population (Fk); a DNA sample from a testee: k. What the adversary wants to know: is the testee in the case group? What he can do: a near-optimal test statistic.
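The binning step of this analysis can be sketched in a few lines. It only shows what an observer of deterministic ciphertexts can compute, namely how often each value occurs; the function name and the toy input are hypothetical, and the test statistic itself is not reproduced here.

```python
from collections import Counter, defaultdict


def frequency_bins(ciphertexts):
    """Group distinct ciphertexts by how many times each one occurs.
    Returns {frequency: [ciphertexts with that frequency]}."""
    counts = Counter(ciphertexts)
    bins = defaultdict(list)
    for value, n in counts.items():
        bins[n].append(value)
    return dict(bins)
```

For the toy stream `["a", "b", "a", "c", "b", "a", "d"]` the result is `{3: ["a"], 2: ["b"], 1: ["c", "d"]}`. Values that occur once fall into one large bin and cannot be told apart by frequency, which is why a high share of unique l-mers limits what frequency analysis can reveal.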
Outcomes of the Analysis. Data: YRI population from HapMap and the reference genome (40 cases, 40 for the test group). Test group: individuals who are in neither the case group nor the reference population.
Performance Evaluation. Implementation over Hadoop. Clouds: Public Cloud: 20 nodes on FutureGrid (8-core 2.93 GHz Intel, 24 GB memory, 862 GB local disk); Private Cloud: a single node. Data: 10 million reads of real human microbiome data with 4% human DNA, 250 MB in total. Reference genome: Chromosome 1 (252.4 million bps), Chromosome 22 (52.3 million bps).
Preprocessing and Seeding
Workload on the Private Cloud
New Security Challenges, New Opportunities. Side-channel leaks and API integrations: wouldn’t be such a serious problem without SaaS. Secure read mapping: made possible by the cloud’s data processing capabilities; made practical by the hybrid-cloud infrastructure. Key: the features of the cloud and of the problems it works on.
What’s Next? Hybrid web applications. Hybrid-cloud secure data-intensive computing.