
Investigating Network Access Permissions in Android Apps

Investigating Network Access Permissions in Android Apps. By: Ed Zulkoski. Permissions: Android apps must request permissions for certain features on the device. Permissions are published with the app, and users may choose not to install “suspicious” apps.


Presentation Transcript


  1. Investigating Network Access Permissions in Android Apps By: Ed Zulkoski

  2. Permissions • Android apps must request permissions for certain features on the device. • Permissions are published with the app, and users may choose not to install “suspicious” apps. • Many apps require network and internet access, but do not always say why.

  3. Why does this app need internet? • Facebook

  4. Why does this app need internet? • Words With Friends

  5. Why does this app need internet? • Super Mario Live Wallpaper

  6. Security Risk of Apps • > 800,000 apps in Google Play Store. • > 7 billion app downloads in 2009, • $4.1 billion. • In a 2012 study [1] of over 400,000 Android apps, over 100,000 were classified as potential security risks. • “26 percent of apps access private information such as email and contacts, with only 2 percent of apps being from highly trusted publishers.” [1] Bit9: "Pausing Google Play: More than 100,000 Android Apps May Pose Security Risk"

  7. Goal • Determine why an app needs network and internet permissions. • Ideal – learn exactly why an app needs internet access • Probably unrealistic • Subgoal – detect if an app uses an ad library

  8. Dataset • 281,079 free apps from the Google Play Store from 2011. • Contains multiple versions of some apps. • Does not have labels indicating if an app uses an ad service (unsupervised learning).

  9. Features • The .apk file. • The Android manifest file

  10. What we don’t have • It would be useful to have paid apps that have a corresponding free app. • Many paid apps remove ads.

  11. Feature Construction • Simple keyword search in app description • Ad, Advertisement, AdMob, etc. • Problem: a keyword match does not show whether the app actually displays ads. • Example (Ivy Leaf Wallpaper summary): “This app is provided to you for free and without AdMob ads.” “FAQ: 1. Why is there "Internet access" permission? It is for Google ads on setting screen only, nothing else. Pro version is adsfree with more features.”
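
The keyword feature on this slide could be sketched as follows. This is a minimal illustration, not the author's implementation: the keyword list and the function name `description_mentions_ads` are assumptions, since the slide only names "Ad, Advertisement, AdMob, etc." The example also reproduces the problem the slide points out, because the quoted description says the app has no AdMob ads yet still matches.

```python
import re

# Hypothetical keyword list; the slide names only "Ad, Advertisement, AdMob, etc."
AD_KEYWORDS = ["ad", "ads", "advertisement", "advertisements", "admob"]

# Word boundaries keep "ad" from matching inside words such as "download".
AD_PATTERN = re.compile(r"\b(" + "|".join(AD_KEYWORDS) + r")\b", re.IGNORECASE)


def description_mentions_ads(description):
    """Return True if any ad-related keyword appears in the description."""
    return AD_PATTERN.search(description) is not None


# Quoted on the slide: the app claims to be free of AdMob ads,
# but the keyword feature still fires.
ad_free_claim = "This app is provided to you for free and without AdMob ads."
print(description_mentions_ads(ad_free_claim))
print(description_mentions_ads("Download a simple calculator."))
```

The first call matches even though the description denies showing AdMob ads, which is why the keyword hit is better treated as one noisy feature than as a label.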

  12. Feature Construction • Similar keyword search in Android manifest. • Intents and Activities <activity android:name="com.google.ads.AdActivity" …>
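
A manifest lookup of this kind might look like the sketch below. It assumes the manifest has already been decoded to plain-text XML (inside an .apk it is stored in a binary form), and the sample manifest, the package `com.example.app`, and the function names are made up for illustration. Only the activity name `com.google.ads.AdActivity` comes from the slide.

```python
import xml.etree.ElementTree as ET

# Namespace used for android:* attributes in AndroidManifest.xml.
ANDROID_NS = "http://schemas.android.com/apk/res/android"
NAME_ATTR = "{" + ANDROID_NS + "}name"

# Hypothetical decoded manifest used only for this example.
SAMPLE_MANIFEST = """<manifest
    xmlns:android="http://schemas.android.com/apk/res/android"
    package="com.example.app">
  <application>
    <activity android:name="com.google.ads.AdActivity" />
    <activity android:name=".MainActivity" />
  </application>
</manifest>"""


def activity_names(manifest_xml):
    """Return the android:name of every <activity> element."""
    root = ET.fromstring(manifest_xml)
    return [a.get(NAME_ATTR) for a in root.iter("activity") if a.get(NAME_ATTR)]


def uses_ad_activity(manifest_xml, ad_activity="com.google.ads.AdActivity"):
    """Return True if the given ad activity is declared in the manifest."""
    return ad_activity in activity_names(manifest_xml)


print(activity_names(SAMPLE_MANIFEST))
print(uses_ad_activity(SAMPLE_MANIFEST))
```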

  13. Preprocessing • Remove duplicate app “snapshots.” • Take the latest version (from Alex’s brainstorming session). • Remove any apps without INTERNET and ACCESS_NETWORK_STATE permissions.
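
The preprocessing steps could be sketched roughly as below. The record layout (`package`, `version_code`, `permissions`) is an assumption, as the slides do not describe how snapshots are stored. The sketch also reads the last bullet as "keep only apps that request both permissions", which is one possible interpretation.

```python
# Permission strings as they appear in <uses-permission> entries.
REQUIRED_PERMISSIONS = {
    "android.permission.INTERNET",
    "android.permission.ACCESS_NETWORK_STATE",
}


def preprocess(snapshots):
    """Keep the latest snapshot per app, then drop apps lacking the permissions.

    Each snapshot is a dict with hypothetical keys:
    'package', 'version_code', and 'permissions'.
    """
    latest = {}
    for app in snapshots:
        current = latest.get(app["package"])
        if current is None or app["version_code"] > current["version_code"]:
            latest[app["package"]] = app
    # Assumption: an app is kept only if it requests BOTH permissions.
    return [
        app for app in latest.values()
        if REQUIRED_PERMISSIONS <= set(app["permissions"])
    ]


snapshots = [
    {"package": "com.example.a", "version_code": 1,
     "permissions": ["android.permission.INTERNET"]},
    {"package": "com.example.a", "version_code": 2,
     "permissions": ["android.permission.INTERNET",
                     "android.permission.ACCESS_NETWORK_STATE"]},
    {"package": "com.example.b", "version_code": 5,
     "permissions": ["android.permission.CAMERA"]},
]
kept = preprocess(snapshots)
print([(app["package"], app["version_code"]) for app in kept])
```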

  14. Why use ML? • Why not just search for specific activity names in the manifest file? • AdMob: <activity android:name="com.google.ads.AdActivity" … > • Many ad services with different requirements: • AdMob, Millennial Media, MobClix, Tapjoy, AdWhirl, Greystripe, InMobi, Airpush, Startapp, Leadbolt, Pontiflex, MobFox, Komli Mobile, MoPub, MdotM, inneractive, Adlantis, Smaato, Daum, AppLift, Mediba, Cauly, YouMi, AdMarvel, madvertise, Sellaring, etc.

  15. Approach 1 • Clustering • Perform supervised learning using the cluster as a feature on a subset of apps. • Still need to know whether these apps use ad services (time consuming).

  16. Approach 2: Active Learning • We don’t want to hand label 200,000+ apps. • A machine learning algorithm can achieve greater accuracy with fewer training labels if it is allowed to choose the data from which it learns. • The learner selects a data instance that it wants to know the value of and queries an oracle (e.g. a human annotator).

  17. Pool-based Active Learning • Query posed to the oracle: “Does this app use ads?” • Figure source: Burr Settles, "Active Learning Literature Survey"

  18. Query Strategy • Method to determine the informativeness of an unlabeled instance. • Uncertainty Sampling – choose the instance the model is least certain about. • Example: binary classification – choose the instance whose predicted probability under the current model is closest to 0.5.
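
For the binary case, the selection rule reduces to a one-line search over the model's predicted probabilities. The sketch below assumes those probabilities are already available as a list; the function name `most_uncertain` and the example values are illustrative only.

```python
def most_uncertain(probabilities):
    """Return the index of the prediction closest to 0.5."""
    return min(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )


# Hypothetical P(app uses ads) for four unlabeled apps.
predicted = [0.95, 0.10, 0.48, 0.70]
print(most_uncertain(predicted))  # index 2, since 0.48 is nearest to 0.5
```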

  19. Pool-Based Uncertainty Sampling

  20. Pool-Based Uncertainty Sampling • Seed the learner with known instances.

  21. Pool-Based Uncertainty Sampling • Query the label of the instance the model is least certain about.

  22. Pool-Based Uncertainty Sampling • Query the label of the instance the model is least certain about.

  23. Pool-Based Uncertainty Sampling • Query the label of the instance the model is least certain about.

  24. Pool-Based Uncertainty Sampling • Query the label of the instance the model is least certain about.

  25. Toy Pool-Based Example • Figure source: Burr Settles, "Active Learning Literature Survey"

  26. Text Classification Example • Figure source: Burr Settles, "Active Learning Literature Survey"

  27. Approach 2 • Start with a small set of labeled apps. • Use pool-based active learning (with an underlying logistic regression model) to select new apps to query. • Tell the learner the correct label for the query. • Repeat until I am tired? (discussion point) • Or the model has stabilized.
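
The loop on this slide could be sketched as below, using only the standard library. Everything concrete here is an assumption made for illustration: the two binary features, the toy pool, the gradient-descent settings, and the simulated oracle (in the talk the oracle would be a human labeler). The stopping rule is a fixed query budget, which is only one of the options the slide raises.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    """P(label = 1 | x); the last entry of w is the bias term."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])


def train_logreg(X, y, epochs=500, lr=0.5):
    """Fit logistic regression with plain batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def active_learn(pool, oracle, seed_indices, budget):
    """Pool-based uncertainty sampling with a fixed number of queries."""
    labeled = {i: oracle(pool[i]) for i in seed_indices}
    for _ in range(budget):
        w = train_logreg([pool[i] for i in labeled], list(labeled.values()))
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        # Query the instance whose predicted probability is closest to 0.5.
        query = min(unlabeled, key=lambda i: abs(predict_proba(w, pool[i]) - 0.5))
        labeled[query] = oracle(pool[query])
    w = train_logreg([pool[i] for i in labeled], list(labeled.values()))
    return w, labeled


# Hypothetical features: [ad activity in manifest, ad keyword in description].
pool = [[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0], [0, 1], [1, 0]]


def oracle(x):
    """Simulated annotator: says 'uses ads' when the manifest feature is set."""
    return x[0]


weights, labels = active_learn(pool, oracle, seed_indices=[0, 3], budget=3)
print(sorted(labels))
print(round(predict_proba(weights, [1, 0]), 2), round(predict_proba(weights, [0, 0]), 2))
```

Replacing the fixed budget with a check on how much the weights change between rounds would correspond to the "model has stabilized" option.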

  28. Conclusion • Many apps require internet access, but the app’s true intentions may be unknown. • It would be useful to determine why apps require these permissions. • Use pool-based active learning to approach this problem.

  29. Discussion • What is the best way to evaluate performance? • How should “skewed” data be handled? • Possibly many more free apps with ads than without. • High number of apps using AdMob. • When do I stop labeling new instances?
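
As one illustration of why skew matters for evaluation, the sketch below compares accuracy with per-class recall on made-up labels. The 9-to-1 class ratio and the always-"ads" classifier are assumptions chosen for the example, and per-class recall is only one common option, not a method proposed in the slides.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, positive):
    """Fraction of instances of class `positive` that were predicted as such."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == positive]
    if not relevant:
        return 0.0
    return sum(p == positive for p in relevant) / len(relevant)


# Hypothetical skewed sample: 9 apps with ads (1), 1 app without (0).
y_true = [1] * 9 + [0]
# A trivial classifier that always predicts "uses ads".
y_pred = [1] * 10

print(accuracy(y_true, y_pred))   # 0.9
print(recall(y_true, y_pred, 1))  # 1.0
print(recall(y_true, y_pred, 0))  # 0.0
```

The trivial classifier reaches 90% accuracy while never identifying an ad-free app, so accuracy alone would overstate performance on data this skewed.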
