1 / 9

Selecting and Combining Tools

Chapter 14. Selecting and Combining Tools. F. Duveau 02/03/12. Your toolkit. - Regular expressions. To search and replace. - Shell commands. To interact with your computer at the command line. - Shell scripts. To combine / automate command-line operations.

aldan
Télécharger la présentation

Selecting and Combining Tools

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Chapter 14 Selecting and Combining Tools F. Duveau 02/03/12

  2. Yourtoolkit - Regular expressions To search and replace - Shell commands To interactwithyour computer at the command line - Shell scripts To combine / automate command-lineoperations - Python programs For more advancedprocessing How to chose the convenienttools for a particulartask? It depends on the kind and number of input data and the amount and frequency of work to perform

  3. Type of input data - Data from user input raw-input() sys.argv[] - Data from the Internet curl() or wget() commands, urllib2 module - Data fromother programs Output of other programs canbecapturedusing redirection (> or >>) or the pipe operator (|) - Data from hardware Chapter 22 gives background on ho to interactdirectlywith the physical world - Data fromspreadsheets Not suitable to manage large and complexdatasets

  4. Gathering data from the Internet

  5. Data fromspreadsheets More suited to smallone-offprojects and simple record-keeping The first stepisusually to resave the data in a basic text format (eg CSV) xlrd Python package to accessspreadsheet file contents .xlsxfiles are compressed archives containing XML files

  6. Parsing one set of text files

  7. Parsingmanytext files

  8. Correction of the exercise - Input data: html files obtainedfrom the MWG website - Output: csv file containingnames, sequences and ordernumbers STEP 1: Create a list of primer sequences and namesfrom 1 file STEP 2: Extendthislist to multiple files (fixednumber of files) STEP 3: Retrieve the filenamesfrom the ordernumbers in manage-order.hmtl Automaticallyloop over all orders STEP 4: Create the list of ordernumbers STEP 5: Write the output file

  9. Collecting data from the Internet The ideais to adapt the script to find the information directlyfromwebpagesinstead of manuallydownloading html files Problem: The websiteissecuredwith HTTPS protocol. https://ecom.mwgdna.com/services/track/manage-orders.tcl Strategy to solvethisproblem(to beimplemented in your Python script): 1) Change the default User Agent of Python to simulate a classical web browser 2) Use a Python module (eg. httplib) thatcansend HTTP orders to the server 3) Send the username and password to the authentication HTTPS address 4) The server shouldsend a responseallowingyour program to accessyour primer data

More Related