Web Publishing Architecture • Look at the various components of Web publishing, many of which are common to most Web applications. • HTML Document Publishing • CGI Scripting Applications • Content Management Systems
The Web Browser is… • A program available everywhere. • A generalized information interface. • A client that connects to distributed servers. • A single point of control over the Web fought over by Microsoft and Netscape.
Key Challenges Were on the Client How to present information in a Web browser.
Developed by Pei Wei in 1992, Viola was an application toolkit, built on top of the X Window System. Its www browser was a sample application, integrating styled text and graphics.
In this example, the Viola browser embedded another application and its controls.
World Wide Web Wizards Workshop (July 1993) • Early attempt to forge a common development agenda. • Tension between slow-moving standards development and seat-of-the-pants innovation.
HTML • Hypertext Markup Language • A simple SGML vocabulary or tagset • Control content and layout of presentation. • Human readable data format.
The Web, Circa 1995 • Publication Models
Key Challenges Were on the Server • Publishing Becomes a Server-side Application • Apache, mod_perl and Perl. • Didn’t Much Depend On Client-Side Capabilities • Development of Custom Content Management Systems • Manage the publishing process
HyperText Transfer Protocol (HTTP) • HTTP is a Request/Response Protocol • "HTTP is a protocol with the lightness and speed necessary for a distributed collaborative hypermedia information system. " Tim Berners-Lee, 1992, Basic HTTP • Achieves a loose coupling of client and servers. • References: HTTP 1.1 Spec
Anatomy of a Request • Browser locates server (oreilly.com) and makes a connection to port number 80 (in a typical configuration) on that machine.
Full Request GET /index.html HTTP/1.1 Host: localhost Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/xbm, */* Accept-Language: en Connection: Keep-Alive User-Agent: Mozilla/4.0 (compatible; MSIE 4.5; Mac_PowerPC)
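A minimal sketch of how a client might assemble the request text shown above. The helper name `build_request` is hypothetical and only a subset of the headers is included; a real browser sends this text over a TCP connection to the server.

```python
def build_request(path, host, headers=None):
    """Return the raw text of an HTTP/1.1 GET request (illustrative helper)."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # HTTP lines end in CRLF; an empty line marks the end of the headers.
    return "\r\n".join(lines) + "\r\n\r\n"


request = build_request(
    "/index.html",
    "localhost",
    {"Accept-Language": "en", "Connection": "Keep-Alive"},
)
print(request)
```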
Server • Returns status of request. HTTP/1.1 200 OK • Sends header info followed by a blank line. Content-type: text/html Content-length: 3896 • Sends document or data from a CGI program. • Objects embedded in document such as images generate new requests to the server.
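The response has the same shape as the request: a status line, headers, a blank line, then the document. The sketch below splits a response into those parts. The function name `parse_response` and the placeholder body are assumptions for illustration, not part of any server API.

```python
def parse_response(raw):
    """Split a raw HTTP response into status code, reason, headers and body."""
    # The first blank line separates the header block from the document.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Content-type: text/html\r\n"
    "Content-length: 3896\r\n"
    "\r\n"
    "<html>...</html>"  # placeholder body, not a real 3896-byte document
)
code, reason, headers, body = parse_response(raw)
print(code, reason, headers["content-type"])
```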
The Apache Web Server • The Apache Group, an Open Source software project, has developed the leading Web server with over 50% of all servers. • Web servers are fairly stable technology. • Reference: Apache.org, Netcraft survey • Apache: The Definitive Guide
Apache Directories • Have you set up a Web server? • /usr/local/apache is the Unix/Linux installation directory. • /htdocs is the directory for HTML files. • /cgi-bin is for scripts. • /conf is the configuration directory where the file httpd.conf lives.
Configuring a Web Server • Site administrator usually takes care of the following server configuration issues by editing httpd.conf: • Document and content type mapping • Authentication and Access Control • Logging • Virtual Servers
URL Management • Decision about URLs: • Relative vs. Absolute links on the site. • Permanent addressing vs. current addressing • /98/09/21/document.html • today.html • What are you going to do when things change? • URLs can be brittle.
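To see why the relative vs. absolute choice matters, here is a small sketch of how relative links resolve against a document's own URL, using Python's standard `urllib.parse.urljoin`. The host `www.example.com` and the file name `figure1.gif` are placeholders.

```python
from urllib.parse import urljoin

# URL of the page that contains the links (dated, "permanent" address).
base = "http://www.example.com/98/09/21/document.html"

# A relative link resolves against the directory of the current page.
sibling = urljoin(base, "figure1.gif")

# A root-relative link ignores the dated directory entirely.
current = urljoin(base, "/today.html")

# Moving the page to another directory changes what "../" points at,
# which is one way URLs turn out to be brittle.
next_day = urljoin(base, "../22/document.html")

print(sibling)
print(current)
print(next_day)
```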
Authentication • Authentication is asking a user to provide identification, usually a user name and password. • Basic Authentication uses the .htaccess file. More sophisticated applications will manage this information in a user database. • Apache section
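With Basic Authentication the browser sends the user name and password in an `Authorization` header, base64-encoded but not encrypted. The sketch below shows the encoding and decoding. The helper names are hypothetical, and the sample credentials are the well-known example from the HTTP authentication RFC.

```python
import base64


def basic_auth_header(user, password):
    """Build the Authorization header a browser sends for Basic Authentication."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Authorization: Basic {token}"


def decode_basic_auth(header):
    """Recover (user, password) from the header, as a server would."""
    token = header.split("Basic ", 1)[1]
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


header = basic_auth_header("Aladdin", "open sesame")
print(header)
print(decode_basic_auth(header))
```

Because the encoding is trivially reversible, Basic Authentication only protects credentials when the connection itself is protected.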
Logs Found in logs directory: access.log Log entry tells you: IP number – Date/Time – Request • 188.8.131.52 - - [20/Sep/2001:02:10:08 -0700] "GET / HTTP/1.0" 200 8087
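A minimal sketch of parsing the log entry above with a regular expression. The pattern assumes the fields appear in the order shown in the sample line (host, identity, user, timestamp, request, status, size); logs in other formats would need a different pattern.

```python
import re

# One named group per field of the sample entry.
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)

line = '188.8.131.52 - - [20/Sep/2001:02:10:08 -0700] "GET / HTTP/1.0" 200 8087'
match = LOG_PATTERN.match(line)
entry = match.groupdict()

print(entry["host"], entry["time"], entry["request"], entry["status"])
```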
Log Processing • Some of the tasks surrounding logs: • Log rotation (Day, week, month) • Log compression (files grow large) • Log file parsing and reporting • Reverse DNS lookup • References: Lincoln Stein, Yahoo's list of tools, Marketwave's Hitlist Examples
Server Hardware and OS • Server farms or hosting services are set up to manage the hardware, the OS and the network for 24/7 operation. • Properly configured PCs can be powerful enough to handle sizable load, obviating the need for more expensive servers from Sun. • Small dedicated Web server devices such as the Cobalt server with embedded Linux and Web administration.
Web Publishing • HTML Authoring Systems • Server Side Includes • CGI Applications • Templates
Authoring Systems • Debate over whether to show HTML to authors or hide it from them. • Page Creation Tools • HTML Editors • Homesite; BBEdit. • Web Site Authoring Systems • FrontPage; GoLive; NetObjects; Dreamweaver • Market share estimate of authoring tools. (Security Space)
Server Side Includes • Insert dynamic information such as date or time. • Include file shared by a set of documents. • One way to create a consistent page layout across the site. • Example: Use server-side include to put common information for a page header or footer in a separate file and source it from all documents.
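The header/footer example can be sketched as a simple text substitution. The directive syntax matches Apache's `<!--#include file="..." -->` form, but the function `expand_includes` and the in-memory `files` dictionary are illustrative stand-ins for what the server does with files on disk.

```python
import re

# Matches directives such as <!--#include file="header.html" -->
INCLUDE = re.compile(r'<!--#include file="([^"]+)"\s*-->')


def expand_includes(html, files):
    """Replace each include directive with the contents of the named file."""
    return INCLUDE.sub(lambda m: files[m.group(1)], html)


# Shared fragments, kept in a dict here instead of on disk.
files = {
    "header.html": "<h1>My Site</h1>",
    "footer.html": "<p>Contact the webmaster</p>",
}
page = (
    '<!--#include file="header.html" -->'
    "<p>Body</p>"
    '<!--#include file="footer.html" -->'
)
result = expand_includes(page, files)
print(result)
```

Changing `header.html` once then changes every page that includes it, which is the point of using includes for a consistent layout.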
CGI Applications Common Gateway Interface A web server passes control to an application, which generates a dynamic HTML document and returns it to the server. • Forms-based Input and Interaction • Session management • Transactions
Scripting • Perl became the favored scripting language for Web applications. • CGI modules in Perl and Python provide a higher-level interface for the programmer and hide the low level details. • Script installed in server's cgi-bin directory. • HTML document containing form references the CGI script.
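A hedged sketch of what a CGI script does, written in Python without any CGI helper module. The server passes form data to the script through environment variables such as `QUERY_STRING`, and the script writes headers, a blank line, and an HTML document to standard output. The function name `handle_request` and the form field `LastName` are assumptions for illustration.

```python
import os
from html import escape
from urllib.parse import parse_qs


def handle_request(environ):
    """Build a CGI response (headers, blank line, HTML) from the environment."""
    params = parse_qs(environ.get("QUERY_STRING", ""))
    last_name = params.get("LastName", [""])[0]
    # Escape user input before echoing it back into the page.
    body = f"<H2>Results</H2><P>You searched for {escape(last_name)}</P>"
    return "Content-type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # When installed in cgi-bin, the server sets os.environ for each request.
    print(handle_request(os.environ), end="")
```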
Stateless Transactions • HTTP is a stateless protocol. Each interaction is independent of the others. • Maintaining state or session tracking is necessary for a number of applications such as shopping carts.
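One common way to layer session tracking on top of a stateless protocol is to hand the browser a cookie containing a session ID and keep the state on the server. The sketch below assumes an in-memory store and a cookie named `sid`; both are illustrative choices, not a standard.

```python
import secrets
from http.cookies import SimpleCookie

# Server-side state, keyed by session ID (in-memory for illustration only).
SESSIONS = {}


def get_session(cookie_header):
    """Return (session_id, state) for a request's Cookie header, creating one if needed."""
    cookie = SimpleCookie(cookie_header or "")
    sid = cookie["sid"].value if "sid" in cookie else None
    if sid not in SESSIONS:
        # Unknown or missing ID: start a new session with an empty cart.
        sid = secrets.token_hex(16)
        SESSIONS[sid] = {"cart": []}
    return sid, SESSIONS[sid]


# First request: no cookie yet, so a session is created.
sid, state = get_session(None)
state["cart"].append("book")

# Later request: the browser sends the cookie back and the cart is found again.
same_sid, same_state = get_session(f"sid={sid}")
print(same_state["cart"])
```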
Characteristics • Embed programming code inside of HTML documents. • Languages like PHP, Cold Fusion and ASP can be viewed as extensions to HTML. • One consideration is whether there’s clean separation between code and documents.
Cold Fusion • Cold Fusion from Allaire/Macromedia is a Windows/NT/2000 application. • Server is configured so that files ending in .cfm are passed to the Cold Fusion application server.
Cold Fusion and HTML file <H2>New Form</H2> <FORM ACTION="searchquery.cfm" METHOD="Post"> Last Name: <Input Type="text" Name="LastName"> <Input Type="Submit" Value="Search"> </FORM>
Application file (.cfm) <CFQUERY Name="EmployeeList" Datasource="Examples"> Select * From Employees WHERE LastName = '#LastName#' </CFQUERY> <body> <H2>Results</H2> <CFOUTPUT> <P>The search for #Form.LastName# returned the following: </CFOUTPUT> <CFOUTPUT QUERY="EmployeeList"> <HR> #FirstName# #LastName# (Phone: #PhoneNumber#) <BR> </CFOUTPUT>
Database Servers • Flat-file database, dbm files • Free • MySQL and Postgres • Mid-range • MS Access and SQL Server • Commercial High-end • Oracle 8i, Sybase, IBM’s DB2
Database Woes • Generating pages dynamically can impact a site’s performance and administration. • Many applications find ways of generating static pages and caching them • Should documents be stored in the database?
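The idea of generating a page once and caching it can be sketched as follows. `render_page` stands in for an expensive database-driven render, and the dictionary cache and render counter are hypothetical; a real site would more likely write static files to disk.

```python
# Counts how often the expensive render actually runs.
STATS = {"renders": 0}
_cache = {}


def render_page(doc_id):
    """Stand-in for a slow, database-backed page render."""
    STATS["renders"] += 1
    return f"<html><body>Document {doc_id}</body></html>"


def get_page(doc_id):
    """Serve the cached copy if one exists, otherwise render and store it."""
    if doc_id not in _cache:
        _cache[doc_id] = render_page(doc_id)
    return _cache[doc_id]


def invalidate(doc_id):
    """Drop the cached copy when the underlying document changes."""
    _cache.pop(doc_id, None)
```

The trade-off is that cached pages must be invalidated whenever the underlying content changes, which is part of the administrative cost the slide mentions.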
Databases • The standard application interfaces to the database are through SQL and/or ODBC. • SQL can be used to create or modify data records in the database as well as to select sets of data from it.
SQL Example: • SELECT NAME, ADDR FROM EMPLOYEES WHERE NAME = 'DALE DOUGHERTY' • Languages such as Perl, Python and Java all provide fairly standard interfaces for accessing databases. • The earlier Cold Fusion example simply embeds an SQL statement in an HTML document. The CF application passes the query to the database server, which processes the request and returns the data to the application, which passes it back to the web server.
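As a sketch of one such standard interface, the query can be run from Python using the built-in `sqlite3` module. The in-memory database and the address value are made up for the example. The query uses the standard `=` comparison and a bound parameter rather than pasting user input into the SQL text.

```python
import sqlite3

# In-memory database with a hypothetical EMPLOYEES table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEES (NAME TEXT, ADDR TEXT)")
conn.execute(
    "INSERT INTO EMPLOYEES VALUES (?, ?)",
    ("DALE DOUGHERTY", "123 Example St"),  # placeholder address
)

# The "?" placeholder lets the driver handle quoting of the value.
rows = conn.execute(
    "SELECT NAME, ADDR FROM EMPLOYEES WHERE NAME = ?",
    ("DALE DOUGHERTY",),
).fetchall()

print(rows)
conn.close()
```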
Application Server Issues • What degree of technical expertise is required to build applications? • How portable is the application? How much does it tie you to one OS or Web server or language? • Is the server API proprietary or standardized?
Application Service Provider (ASP) • A Web site is increasingly put together as a set of components that could be software or services sourced from different sites. • ASPs are providers of services rather than software. They take away the burden of owning and maintaining software.
Content Management • A specialized application server • A system for managing the production, development and delivery of content by a team of producers.
CMS Features • Manages "metadata" to build collections of documents and create different views. • Generates content from database • Provides for staging of content; replication. • Administrative interface to manage scheduling and workflow • Manage interactions with customers and keep track of vital information. • Allow for distribution of information in multiple formats.
Implementing Layouts in CMS • Which Layout Strategy Will You Use? • Server Side Includes (SSI) • Style sheets (CSS) • Table layout vs block positioning • Templates • XSLT (transformation of XML into HTML)
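Of the strategies listed, the template approach is the easiest to sketch: the layout lives in one place and content is substituted into it. This example uses Python's standard `string.Template`. The page skeleton and field names are assumptions, not the layout of any particular CMS.

```python
from string import Template

# One shared layout; $title and $body are filled in per document.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1>$body</body></html>"
)


def render(title, body):
    """Fill the shared layout with one document's content."""
    return PAGE.substitute(title=title, body=body)


html = render("Web Publishing", "<p>Templates keep layout in one place.</p>")
print(html)
```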
CS (Community Server) • Content Management System written using Apache, Perl, MySQL • Used for O’Reilly Network, XML.com and Perl.com. • Demo
Other CMS • Vignette • Expensive, commercial CMS • ArsDigita • Java-based platform. • Zope • Python-based
Advantages of CMS • A cost-effective way to manage information and users. • A consistent administrative interface for building and managing complex Web sites. • A robust development platform that provides common publishing functionality and allows customization.
Other Major Components • Advertising Server • Search Engine • Conferencing System
Ad Server • Software or Service? • The ad server provides for the dynamic rotation of advertising banners on a site, and the collection of data to track impressions and click-throughs. • Ad traffic administrator sets up campaigns to run on the server. • Advertisers use the server to get real-time reporting on how an ad is doing.
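A minimal sketch of banner rotation with impression and click-through tracking. The class `AdServer`, its method names, and the simple round-robin policy are all assumptions for illustration; real ad servers weight campaigns and persist their counts.

```python
from itertools import cycle


class AdServer:
    """Rotate banners in round-robin order and count impressions and clicks."""

    def __init__(self, banners):
        self._rotation = cycle(banners)
        self.impressions = {banner: 0 for banner in banners}
        self.clicks = {banner: 0 for banner in banners}

    def next_banner(self):
        """Pick the next banner to show and record one impression for it."""
        banner = next(self._rotation)
        self.impressions[banner] += 1
        return banner

    def record_click(self, banner):
        """Record that a visitor clicked the given banner."""
        self.clicks[banner] += 1

    def click_through_rate(self, banner):
        """Clicks divided by impressions, or 0.0 if the banner was never shown."""
        shown = self.impressions[banner]
        return self.clicks[banner] / shown if shown else 0.0


server = AdServer(["books.gif", "conference.gif"])
shown = [server.next_banner() for _ in range(4)]
server.record_click("books.gif")
print(shown, server.click_through_rate("books.gif"))
```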