Secure Search Engine Ivan Zhou Xinyi Dong
Project Overview • The Secure Search Engine (SSE) project is a search engine that uses special modules to test the validity of a web page. • These tests consist of verifying the web page's certificates and determining whether the page is a phishing site. • Changed goal: Set up a proxy in the cloud that acts as the intermediary between the client and the SSE.
Detailed architecture • Components: • Browser / Phone • Proxy • SSE • Certificate verification module • Phishing status verification module
Detailed architecture (diagram; labels only): Cell Phone, Browser, Internet, Proxy, Certificate Verification, SSE, Phishing Verification
Project Description • Migrate a different SSE project to Mobicloud. • Test the SSE in this new environment and modify it if necessary. • Set up a proxy, and write the code it needs to communicate with the SSE server and act on the results.
Task Allocations Ivan: • Migration of another version of SSE • Modify and test new SSE Xinyi: • Set up proxy • Communication between SSE & Proxy
Technical Details for task 1 • Task 1: Migrate the existing SSE project from a local environment to Mobicloud • All software installation: Apache Tomcat, MySQL, Netbeans, SVN, Java JDK, Jython. • Configuration: VM’s Internet connection, VNC configuration, PATH for Java/Tomcat/SVN, connection for MySQL server • Publish website to Apache Tomcat
Technical Details for new task 2 • Two parts need to be tested carefully • Phishing Filter • Crawler • Phishing Filter • Checks the database to see whether the site is a known phishing site • Checks whether a third-party site (PhishTank) has reported it as a phishing site • Computes the confidence ourselves.
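The three phishing-filter checks above can be sketched as a simple cascade. This is only an illustration under stated assumptions: the function name `is_phishing`, its parameters, and the 0.5 threshold are hypothetical, and the confidence score is assumed to mean "how likely the page is phishing". The slides do not describe the real lookup or scoring code.

```python
def is_phishing(url, known_phishing, third_party_lookup, compute_confidence,
                threshold=0.5):
    """Hypothetical cascade: local database, third-party report, own score."""
    # 1. Local database: assumed here to be a set of known phishing URLs.
    if url in known_phishing:
        return True
    # 2. Third-party report: a callable standing in for a PhishTank lookup.
    if third_party_lookup(url):
        return True
    # 3. Own confidence: assumed to be a score in [0, 1], higher = more suspicious.
    return compute_confidence(url) >= threshold
```

Passing the lookups in as callables keeps the sketch testable without a database or network access.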
Technical Details for task 2 • Crawler.py: A Python implementation of Java code that crawls a web page's information • Seeds in database • Crawl domain • Crawl domain path • Crawl child links • Difficulties encountered: • Web pages' particularities (localhost) (solved) • Only connects on port 443. Port 80? (solved) • Unreasonable logic in crawler.py (depth..) (exploring) • Other problems (exploring)
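The crawl order above (seeds, then the domain's pages, then child links, bounded by a depth) can be sketched as a breadth-first traversal. This is not the project's crawler.py: `crawl`, `LinkParser`, the `fetch` callable, and `max_depth` are hypothetical names, and restricting child links to the seed's host is an assumption.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkParser(HTMLParser):
    """Collects href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_depth=1):
    """Visit seeds breadth-first; follow same-host links up to max_depth."""
    seen = set()
    order = []
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        url, depth = queue.popleft()
        if url in seen:
            continue
        seen.add(url)
        order.append(url)
        if depth >= max_depth:
            continue  # depth limit reached: record the page, do not expand it
        parser = LinkParser()
        parser.feed(fetch(url))  # fetch is assumed to return the page HTML
        host = urlparse(url).netloc
        for href in parser.links:
            child = urljoin(url, href)  # resolve relative links
            if urlparse(child).netloc == host:
                queue.append((child, depth + 1))
    return order
```

An explicit depth carried with each queued URL makes the stopping rule easy to check, which is where the slides report problems.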
Technical Details for task 3 • Develop a background process that updates the bank database for the crawler on a regular schedule. • crontab -e • Syntax: min | hour | day | month | weekday | command (fields are separated by spaces) • 0 0 * * * /sse/crawler.py (runs daily at midnight)
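To make the crontab field layout concrete, here is a small sketch that splits an entry into its five time fields and the command. The helper `parse_cron_line` is hypothetical and only handles the plain whitespace-separated form, not cron's special strings or environment lines.

```python
def parse_cron_line(line):
    """Split a crontab entry into its five schedule fields plus the command."""
    names = ("minute", "hour", "day_of_month", "month", "weekday", "command")
    # Split on whitespace at most 5 times so the command keeps its own spaces.
    fields = line.split(None, 5)
    if len(fields) != len(names):
        raise ValueError("expected 5 schedule fields followed by a command")
    return dict(zip(names, fields))
```

For example, `parse_cron_line("0 0 * * * /sse/crawler.py")` gives minute `0` and hour `0`, with `*` for the remaining schedule fields.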
Technical Details for task 4 • Create an Android component to integrate SSE into a mobile device (tentative). • All applications are written using the Java programming language. • Android SDK. • Eclipse: ADT Plugin. • Current firmware v2.1 update 1 on Droid. • Newest firmware available v2.2.1
Technical Details for task 5 Migration of another version of SSE • Reasons: • Previous SSE was buggy and therefore not stable. • Previous SSE’s phishing filter was not working. • Previous SSE was not working properly on some sites. • Same procedure as last version, but use Eclipse IDE instead of Netbeans IDE.
Technical Details for task 6 Modify and test new SSE • Clean up multiple copies of code. • Broken PhishingFilter / Google PageRank • Used to point to: http://zquery.com/api?q= • Now uses (limited): http://webinfodb.net/a/pr.php?url= • Additional: http://api.exslim.net/pagerank/check • Change of threshold value
Technical Details for task 7 Set up proxy • We set up another VM in our Mobicloud system as the proxy. • The proxy runs c-icap and forwards requests to the web server. • The client connects to the proxy over a VPN.
Technical Details for task 8 Communication between SSE & Proxy • At the proxy, add code to the check_url module to: • Request the SSE server with cURL and read the returned value. • Parse the returned web page and determine what kind of site it is (hasCertificate, isPhishing). • Warn about and block sites that have no certificate or are phishing sites.
Conclusion • The project is completed. • The SSE server, which previously had no phishing check, can now check both the certificate and the phishing status of a site. • The proxy takes the computation load off the client side: the requests to the SSE, and the parsing and analysis of the results, can all be done at the proxy level.
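The parse-and-decide step at the proxy can be sketched as follows. The slides do not show the format of the page returned by the SSE, so this assumes, purely for illustration, that it contains tokens such as `hasCertificate=true` and `isPhishing=false`. The names `parse_sse_response` and `should_block` are hypothetical, treating a missing flag as unsafe is also an assumption, and the HTTP request itself is omitted.

```python
import re

# Assumed token format in the SSE's returned page, e.g. "hasCertificate=true".
_FLAG = re.compile(r"\b(hasCertificate|isPhishing)\s*=\s*(true|false)\b")


def parse_sse_response(body):
    """Extract the two status flags from the (assumed) response text."""
    flags = {}
    for name, value in _FLAG.findall(body):
        flags[name] = (value == "true")
    return flags


def should_block(flags):
    """Block when the site lacks a certificate or is reported as phishing."""
    has_certificate = flags.get("hasCertificate", False)  # missing = unsafe
    is_phishing = flags.get("isPhishing", True)           # missing = unsafe
    return (not has_certificate) or is_phishing
```

Keeping the parsing separate from the decision means only `parse_sse_response` needs to change once the real response format is known.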
Thank you! • Comments & Questions.