Join our Coding for Humanists workshop, designed for those interested in leveraging programming for digital humanities. In the morning, we’ll cover Ruby installation, basic coding principles, and text processing. The afternoon session will focus on web scraping techniques using Wikipedia. The following day dives into intermediate Ruby concepts, API integration with Twitter and Google Maps, culminating in hands-on exercises. Whether you are a beginner or looking to enhance your skills, this workshop offers valuable knowledge for applying coding in humanities research.
Topics - Today • Morning • Introduction and Justification • Installing Ruby • Small Code Practice Examples • Text Processing • Afternoon • Web scraping Wikipedia • Text Parsing
Topics - Tomorrow • Morning • Intermediate Ruby • Methods • Conditionals • Loops • Afternoon • Working with APIs • Twitter • Google Maps API • Google Charts API
Introductions • Who you are • What you hope to learn • What tools you currently use
Digital Humanities • Data Mining • Data Visualization • Text Analysis • Geographic Information Systems • Multimedia
Why Ruby • Open-Source • Community • English-like syntax • Interpreted • Ruby on Rails – for web apps
Shorter Code • C++ • Ruby
Ruby Installation • Windows • rubyinstaller • Mac • Comes pre-installed • railsinstaller
Text Editor • Any text editor will do. • Preferences: • Windows • SciTE • Notepad++ • Mac • TextWrangler • Sublime Text
Running Ruby Code • Write code using text editor • Save file as ____________.rb • Open Terminal (Command Prompt) • Type “ruby ____________.rb”
IRB • Interactive Ruby • Can run commands directly from terminal • Start by typing “irb” • “exit” to return to terminal
First, some terminal commands • ls / dir • cd • .. • tab (auto complete) • mkdir • rmdir • rm
You knew it was coming… • Hello World. • First in irb • Then as a .rb file
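A minimal sketch of the Hello World step. The file name hello.rb is only an assumption for illustration; any name ending in .rb should work.

```ruby
# Save this as hello.rb (a hypothetical file name), then run: ruby hello.rb
# The same line can also be typed directly at the irb prompt.
puts "Hello World"
```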
Resources • The Bastards Book of Ruby • Codecademy • Ruby-lang.org
Writing to a file • open("hello-world.txt", 'w') • Creates a new file, or empties an existing one, at a path relative to the directory you run the program from.
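The slide shows only the open call, so here is one way the full write might look. The variable name my_file and the text written are assumptions chosen for illustration.

```ruby
# "w" opens the file for writing: it is created if missing, emptied if it exists.
my_file = open("hello-world.txt", "w")
my_file.write("Hello World\n")
# Closing the file makes sure the text is actually saved to disk.
my_file.close

# Read the file back to confirm what was written.
puts File.read("hello-world.txt")
```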
Reading a web page
require "open-uri"
# Ruby 3.0 and later need URI.open here; a bare open(...) only opens local files.
puts URI.open("http://en.wikipedia.org/wiki/Ada_Lovelace").read
What if you wanted to grab the text of the New York Times home page?
Data Types • Strings • In quotes • Just characters. • Numbers • Integers • Whole numbers with no decimals • 5 • 458 • -7 • Floats • Numbers with decimals • 1.4 • 0.5 • 3.0
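The types above can be checked in irb by asking a value for its class. This sketch assumes Ruby 2.4 or later, where whole numbers report the class Integer; the sample values are arbitrary.

```ruby
# Assumes Ruby 2.4 or later (older versions report Fixnum for whole numbers).
puts "Ada Lovelace".class   # a String: characters in quotes
puts 458.class              # an Integer: no decimal point
puts 3.0.class              # a Float: has a decimal point

# The type affects arithmetic.
puts 5 / 2                  # integer division drops the remainder: 2
puts 5 / 2.0                # a Float on either side gives a Float: 2.5
```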
Variables • Words or letters that stand in for something else. • Numbers • Strings • Objects • Examples • Arithmetic • String concatenation • Mixed
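The three kinds of example listed above might look like the following. All variable names and values here are assumptions made up for illustration.

```ruby
# Arithmetic: variables standing in for numbers.
a = 5
b = 458
puts a + b                    # 463

# String concatenation: variables standing in for strings.
first_name = "Ada"
last_name = "Lovelace"
full_name = first_name + " " + last_name
puts full_name                # Ada Lovelace

# Mixed: a number must be converted with to_s before joining it to a string.
puts full_name + " has " + full_name.length.to_s + " characters"
```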
String Example
x = "http://en.wikipedia.org/wiki"
y = "Ada_Lovelace"
z = x + "/" + y
puts x
puts y
puts z
Another way…
require "open-uri"
remote_base_url = "http://en.wikipedia.org/wiki"
remote_page_name = "Ada_Lovelace"
remote_full_url = remote_base_url + "/" + remote_page_name
puts URI.open(remote_full_url).read
Another way…
require "open-uri"
remote_base_url = "http://en.wikipedia.org/wiki"
remote_page_name = "Ruby"
remote_full_url = remote_base_url + "/" + remote_page_name
puts URI.open(remote_full_url).read
Saving remote files
require "open-uri"
remote_base_url = "http://en.wikipedia.org/wiki"
remote_page_name = "Ada_Lovelace"
remote_full_url = remote_base_url + "/" + remote_page_name
# Download the page text, then write it to a local file.
remote_data = URI.open(remote_full_url).read
my_local_file = open("my-downloaded-page.html", "w")
my_local_file.write(remote_data)
my_local_file.close
Collections and Loops
(1..10).each do |a_number|
  puts a_number
end
Strings and Numbers • What a difference quotes make. • "42" is different from 42 • Try arithmetic
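A short sketch of the arithmetic to try in irb. The conversions with to_i and to_s are standard Ruby methods; the values are the ones from the slide.

```ruby
puts 42 + 42          # numbers are added: 84
puts "42" + "42"      # strings are joined end to end: 4242

# Convert between the two types when you need to mix them.
puts "42".to_i + 42   # string to integer, then added: 84
puts 42.to_s + "42"   # integer to string, then joined: 4242

# Mixing them without converting is an error:
# puts "42" + 42      # raises a TypeError
```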
Looping
remote_base_url = "http://en.wikipedia.org/wiki"
(1..3).each do |some_number|
  r_url = remote_base_url + "/" + some_number.to_s
  puts r_url
end
Exercise • Visit one of the URLs generated from our numerical loop, such as http://en.wikipedia.org/wiki/1. It's the Wikipedia entry for Year 1.
Exercise • Specify two numbers, representing a start and end year • Use that range to create a loop • Retrieve the Wikipedia entry that corresponds to each iteration of the loop • Save that Wikipedia page to a corresponding file on your hard drive • In a second loop, combine all those year entries into one file, with the name of "start_year-end_year.html"
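One possible shape for a solution, offered as a sketch rather than the intended answer. The names year_url, download_years, and fetcher are hypothetical. It wraps the steps in methods, which the workshop covers later, so that the download step can be swapped for a stand-in when trying it offline. Files are named after the year, for example 1.html, which is an assumption about what "corresponding file" means.

```ruby
require "open-uri"

# Hypothetical helper: builds the Wikipedia URL for a year,
# the same way the Looping slide does.
def year_url(year)
  "http://en.wikipedia.org/wiki" + "/" + year.to_s
end

# Downloads one page per year, then combines them into a single file.
# `fetcher` is an assumption added for this sketch: anything that responds
# to call(url) and returns the page text. The default uses open-uri.
def download_years(start_year, end_year, fetcher = ->(url) { URI.open(url).read })
  # First loop: save each year's page to its own file, such as 1.html.
  (start_year..end_year).each do |year|
    File.write(year.to_s + ".html", fetcher.call(year_url(year)))
  end

  # Second loop: append every saved page to one combined file.
  combined_name = start_year.to_s + "-" + end_year.to_s + ".html"
  File.open(combined_name, "w") do |combined_file|
    (start_year..end_year).each do |year|
      combined_file.write(File.read(year.to_s + ".html"))
    end
  end

  # Return the combined file's name so the caller can report it.
  combined_name
end
```

Calling download_years(1, 3) would fetch three pages over the network and write 1.html, 2.html, 3.html, and 1-3.html to the current directory.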