Internet and Web Programming Basics Web Programming
Networking • Early computers were highly centralized. • Single point of failure • Users had to go to the computer to use it. • Low-cost computers made it possible to overcome these two primary disadvantages with a network. • Network – “ ... communication system for connecting end-systems”
Networking • End-systems also known as “hosts” • PCs, workstations • dedicated computers • network components • Advantages of networking • Sharing of resources • Price/Performance • Centralized administration • Computers as communication tools
Networking • Mechanisms by which software running on two or more endpoints can exchange messages • Java is a network-centric programming language • Java abstracts the details of the network implementation behind a standard API
Networking - Traditional Uses • Communication (email) • Resource Sharing • File exchange, disk sharing • Sharing peripherals (printers, tape drives) • Remote execution • …
Networking - New(er) Uses • Information sharing • Peer-to-Peer computing • Entertainment, distributed games • E-Commerce • Collaborative computing • Forums • Chats • WWW
LAN - Local Area Network • Connects computers that are physically close together ( < 1 mile). • high speed • Technologies: • Ethernet 10 Mbps, 100Mbps • Token Ring 16 Mbps
WAN - Wide Area Network • Connects computers that are physically far apart (long-haul network). • typically slower than a LAN. • typically less reliable than a LAN. • Technologies: • telephone lines • satellite communications
Client/Server Architecture • Classical network architecture is a client/server (C/S) architecture • A server is a process (not a machine!) waiting for requests from a client. • A client is a process (not a machine!) sending requests to a server and waiting for a reply. • Both client and server are software entities
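The request/reply roles described above can be sketched in Java with the standard `java.net` socket classes. This is a minimal illustration, not a production server: the class name `LoopbackEcho`, the "echo: " reply format, and running both roles as threads inside one JVM (instead of as two separate processes) are assumptions made to keep the example self-contained.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class LoopbackEcho {
    /** Server role: wait for one request, then send one reply. */
    static void serveOnce(ServerSocket listener) {
        try (Socket peer = listener.accept();
             BufferedReader in = reader(peer);
             PrintWriter out = writer(peer)) {
            out.println("echo: " + in.readLine());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Client role: send one request and wait for the reply. */
    static String request(int port, String message) {
        try (Socket peer = new Socket(InetAddress.getLoopbackAddress(), port);
             PrintWriter out = writer(peer);
             BufferedReader in = reader(peer)) {
            out.println(message);
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs both roles on the loopback interface and returns the client's reply. */
    static String roundTrip(String message) {
        // Port 0 asks the operating system for any free port.
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread server = new Thread(() -> serveOnce(listener));
            server.start();
            String reply = request(listener.getLocalPort(), message);
            server.join();
            return reply;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    private static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    private static PrintWriter writer(Socket s) throws IOException {
        // autoFlush = true so println() pushes the line onto the network immediately.
        return new PrintWriter(new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }
}
```

Calling `LoopbackEcho.roundTrip("hello")` exercises both sides once. Neither role depends on which machine it runs on, which is the point of calling clients and servers processes.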
Client/Server Architecture • Server examples: • finds a document. • prints a file for client. • records a transaction. • Servers are generally more complex • Two basic types of servers: • Iterative - handles one client at a time. • Concurrent - handles many clients in parallel.
Iterative Server • Naïve server implementation is sequential. • handles one request at a time. • Consider a server that needs to read data from a disk • Reading a file from disk takes a long time • The server will be idle while it waits for the data to be read • Other clients will be waiting
[Figure: Iterative Server. Timeline (axis: time) showing the phases of each request: start request loading, awaiting disk availability, deliver the data across network.]
Concurrent Server • Threaded servers can process several requests at once. • Each request is handled by a separate thread. • This does not increase the overall amount of work done, but reduces the wastage! • Threaded operation is worthwhile when threads are expected to block, awaiting I/O operations
[Figure: Concurrent Server. Timeline (axis: time) showing the phases of each request: start request loading, awaiting disk availability, deliver the data across network.]
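The difference between the two server types can be made concrete with a small Java simulation. This is a sketch under stated assumptions: the slow disk read is replaced by `Thread.sleep`, and the class and method names (`ServerStyles`, `handle`, `iterative`, `concurrent`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ServerStyles {
    /** Simulates one request whose handling blocks on slow I/O (here: a sleep). */
    static String handle(int requestId, long ioMillis) {
        try {
            Thread.sleep(ioMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "reply-" + requestId;
    }

    /** Iterative server: requests are handled one after another. Returns elapsed milliseconds. */
    static long iterative(int requests, long ioMillis) {
        long start = System.nanoTime();
        for (int i = 0; i < requests; i++) {
            handle(i, ioMillis);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    /** Concurrent server: each request gets its own thread. Returns elapsed milliseconds. */
    static long concurrent(int requests, long ioMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(requests);
        long start = System.nanoTime();
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                final int id = i;
                pending.add(pool.submit(() -> handle(id, ioMillis)));
            }
            for (Future<String> reply : pending) {
                reply.get(); // wait until this request has been answered
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

With four requests that each block for 100 ms, `iterative(4, 100)` should take roughly four times as long as `concurrent(4, 100)`. The total work is the same in both cases; the threaded version only overlaps the waiting.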
Networking Models • Using a formal model allows us to deal with various aspects of networking abstractly. • We will look at a popular model – OSI reference model. • ISO proposal for the standardization of the various networking protocols (1984) • The OSI reference model is a layered model.
Layering • Divide a task into pieces and then solve each piece independently (or nearly so). • Establish a well defined interface between layers . • Major advantages: • Independence • Extensibility
Layered System Example – Unix OS • Applications • Libraries • System Calls • Kernel
OSI 7-Layer Model • 7 Application (high-level protocols) • 6 Presentation • 5 Session • 4 Transport • 3 Network • 2 Data-Link • 1 Physical (low-level protocols)
Communication Protocols • Communication between two sides is defined through a protocol • Protocol – An agreed upon convention for communication • Both sides need to understand the protocol. • Examples: TCP/IP, UDP, others • Protocols must be formally defined and unambiguous • Tons of documentation
Interface vs. Peer-to-Peer Protocols • Interface protocols describe the communication between layers on the same side. • Peer-to-peer protocols describe the communication between the sides at the same layer. [Figure: two stacks, each with the layers Process, Transport, Network, Data Link; interface protocols run between adjacent layers within one stack, peer-to-peer protocols between matching layers of the two stacks.]
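One way to picture layering is header encapsulation: on the sending side each layer passes its data down and the layer below adds its own header; on the receiving side each layer removes only the header written by its peer. The Java sketch below is purely illustrative. The textual header tags and the class name `LayerStack` are assumptions, and real protocols use binary headers.

```java
import java.util.List;

class LayerStack {
    // Hypothetical header tags, one per layer, ordered from the top of the stack to the bottom.
    static final List<String> LAYERS = List.of("TRANSPORT", "NETWORK", "DATALINK");

    /** Sending side: data moves down the stack and every layer prepends its header. */
    static String send(String payload) {
        String unit = payload;
        for (String layer : LAYERS) {
            unit = layer + "|" + unit;
        }
        return unit;
    }

    /** Receiving side: data moves up the stack and every layer strips its peer's header. */
    static String receive(String frame) {
        String unit = frame;
        for (int i = LAYERS.size() - 1; i >= 0; i--) {
            String header = LAYERS.get(i) + "|";
            if (!unit.startsWith(header)) {
                throw new IllegalArgumentException("missing " + LAYERS.get(i) + " header");
            }
            unit = unit.substring(header.length());
        }
        return unit;
    }
}
```

Here `send("hi")` yields `DATALINK|NETWORK|TRANSPORT|hi`, and `receive` reverses it. Each layer touches only its own header, which is what makes the layers independent of one another.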
Layers • Physical Layer: transmission of raw bits over a communication channel • Data Link Layer: divides data into frames and presents an error-free communication link to the layer above • Network Layer: selects the path between the two sides, fragmentation & reassembly, connection between network types
The Transport Layer • Transport Layer: provides virtual end-to-end links between peer processes • TCP – Transmission Control Protocol • Connection oriented • Reliable, keeps order • UDP – User Datagram Protocol • Connectionless • Unreliable • Fast
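The connectionless nature of UDP shows up directly in Java's `DatagramSocket` API: there is no connect/accept step, and each packet carries its own destination address and port. The sketch below sends one datagram over the loopback interface. The class name `UdpLoopback`, the 1024-byte buffer, and the two-second timeout are assumptions for this example. On a real network the datagram could be lost, which is why the receiver sets a timeout instead of waiting forever.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

class UdpLoopback {
    /** Sends one datagram to a receiver on the loopback interface and returns what arrived. */
    static String sendAndReceive(String message) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the operating system pick a free port for the receiver.
        try (DatagramSocket receiver = new DatagramSocket(0, loopback);
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // give up after 2 s: UDP gives no delivery guarantee

            byte[] data = message.getBytes(StandardCharsets.UTF_8);
            // No connection setup: the packet itself names the destination host and port.
            sender.send(new DatagramPacket(data, data.length, loopback, receiver.getLocalPort()));

            byte[] buffer = new byte[1024];
            DatagramPacket incoming = new DatagramPacket(buffer, buffer.length);
            receiver.receive(incoming);
            return new String(incoming.getData(), 0, incoming.getLength(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A TCP exchange would instead use `ServerSocket` and `Socket`, which establish a connection first and then deliver bytes reliably and in order.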
Layers • Session Layer: establishes, manages, and terminates sessions between applications • Presentation Layer: responsible for data compression and encryption • Application Layer: anything above previous layers (specific applications)
The Internet • A worldwide network connecting millions of hosts • WAN interconnecting many LANs of various types • Applications • World-Wide Web • Email • FTP • … more and more
The Web • The term World-Wide Web (or simply Web) describes a collection of pieces of information that • are stored as files on particular hosts • can be reached by other connected hosts • These hosts are called Web servers
Web or Internet? • They are not the same thing. • The Internet is a collection of computers and networking devices that are connected together and communicate with each other. • The Web is a collection of documents that are interconnected by hyperlinks. • These documents are provided by Web servers and accessed through Web browsers.
How does the Web Work? • Web information is stored in Web pages • In HTML format. • Web pages are stored on hosts called Web servers • In the Web server's file system. • The computers reading the pages are called Web clients; they use a Web browser • Most commonly Internet Explorer or Netscape. • The Web server waits for requests from Web clients over the Internet • For example, Internet Information Server (IIS) or Apache.
HTML • Much of the information found on the Web is stored as HTML files. • HTML is a markup language for storing formatted text. • Allows other types of information (such as images) to be combined in the documents. • Allows interconnection (links) between the documents.
Browsers • Are used to display HTML documents. • The browser is responsible for • fetching the documents • displaying them according to the HTML rules. • Browsing refers to the activity of viewing Web documents through following the links.
Addresses • Each communication endpoint must have an address. • Consider 2 computers communicating over a network: • the communication protocol must be specified • the name of the host (end-system) must be specified • the specific process of the host must be specified.
URLs • Each Web document has a unique identifying address called a URL (Uniform Resource Locator). • A URL takes the following form: http://cs.vu.ac.at/courses/webp/index.htm • Here http is the protocol, cs.vu.ac.at is the host, and /courses/webp/index.htm is the file. • General URL structure: <scheme>://<user>:<password>@<host>:<port>/<path>;<params>?<query>#<frag>
URL fields • The protocol field specifies the way in which the information should be accessed. • The host field specifies the host on which the information is found. • The file field specifies the particular location on the host’s disk (path) where the file is found and the name of the file • There could be more complex forms of URLs but we do not discuss them
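The protocol, host, and file fields can be pulled out of a URL with the standard `java.net.URI` class. The sketch below is one possible way to do it. The class name `UrlFields`, the map keys, and the `example.org` URL in the usage note are assumptions for illustration; only `http://cs.vu.ac.at/courses/webp/index.htm` comes from the slides.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

class UrlFields {
    /** Splits a URL into the fields named on the slide, plus port, query and fragment. */
    static Map<String, String> split(String url) {
        URI uri = URI.create(url);
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("protocol", uri.getScheme());
        fields.put("host", uri.getHost());
        // getPort() returns -1 when the URL does not name a port explicitly.
        fields.put("port", uri.getPort() == -1 ? "(default)" : String.valueOf(uri.getPort()));
        fields.put("file", uri.getPath());
        fields.put("query", uri.getQuery());       // null when absent
        fields.put("fragment", uri.getFragment()); // null when absent
        return fields;
    }
}
```

For the slide's URL, `split` reports protocol `http`, host `cs.vu.ac.at`, and file `/courses/webp/index.htm`. A longer form such as `http://example.org:8080/a/b?x=1#top` additionally fills in the port, query, and fragment entries.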
IP Addresses • Hostnames (e.g., www.vu.ac.at) are used by people. • Network mechanisms use IP addresses instead. • Every host connected to the Internet has a unique IP address that identifies it. • IP (version 4) addresses are • 32-bit (4-byte) numbers • usually written as four decimal numbers separated by dots, e.g. 18.104.22.168, where each number is the value of one of the 4 bytes.
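The relationship between the dotted form and the underlying 32-bit number can be shown with a few shifts. This is a minimal sketch: the class name `Ipv4` and its method names are hypothetical, and a `long` is used so the value is never negative (Java's `int` is signed).

```java
class Ipv4 {
    /** Packs a dotted address such as "18.104.22.168" into one 32-bit value. */
    static long toNumber(String dotted) {
        String[] parts = dotted.split("\\.", -1);
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected four numbers: " + dotted);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("each number must fit in one byte: " + part);
            }
            value = (value << 8) | octet; // append the next byte
        }
        return value;
    }

    /** Unpacks a 32-bit value back into the dotted form, most significant byte first. */
    static String toDotted(long value) {
        return ((value >> 24) & 0xFF) + "." + ((value >> 16) & 0xFF) + "."
             + ((value >> 8) & 0xFF) + "." + (value & 0xFF);
    }
}
```

For the address on the slide, `toNumber("18.104.22.168")` is 18·2^24 + 104·2^16 + 22·2^8 + 168 = 308811432, and `toDotted(308811432L)` gives the dotted form back.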
Ports • As data traverses the Internet, each packet carries not only the address of the host but also the port on that host to which it is addressed. • 65,536 ports are available at each host. • A port does not represent anything physical like a serial or parallel port. • Hosts are responsible for reading the port number from the packets they receive to decide which program should process that data.
Ports • On Unix systems, ports between 1 and 1023 are reserved for OS (privileged) processes. • Any process can listen for connections on ports 1024 to 65,535, as long as the port is not already occupied. • In Windows and Mac-OS, any process can listen on any port.
Well-Known Ports • Many services run on well-known ports. • Web HTTP servers listen for connections on port 80. • SMTP servers listen on port 25. • Echo servers listen on port 7. • FTP servers listen on port 21. • Telnet servers listen on port 23. • DayTime servers listen on port 13. • whois servers listen on port 43. • finger servers listen on port 79.
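The port ranges and the well-known assignments listed above fit in a small lookup. The sketch below is illustrative only: the class name `PortInfo` and the wording of the returned labels are assumptions, and the table contains just the eight services named on the slide.

```java
import java.util.Map;

class PortInfo {
    // A port number is a 16-bit field: 2^16 = 65,536 values, numbered 0 to 65,535.
    static final int MAX_PORT = 65_535;

    // The well-known services listed on the slide.
    static final Map<Integer, String> WELL_KNOWN = Map.of(
            7, "echo", 13, "daytime", 21, "ftp", 23, "telnet",
            25, "smtp", 43, "whois", 79, "finger", 80, "http");

    static boolean isValid(int port) {
        return port >= 0 && port <= MAX_PORT;
    }

    /** Ports 1 to 1023 are the ones Unix reserves for privileged processes. */
    static boolean isReserved(int port) {
        return port >= 1 && port <= 1023;
    }

    static String describe(int port) {
        if (!isValid(port)) {
            return "invalid";
        }
        String range = isReserved(port) ? "reserved" : "unreserved";
        String service = WELL_KNOWN.get(port);
        return service == null ? range : range + " (" + service + ")";
    }
}
```

For example, `describe(80)` returns `reserved (http)`, `describe(1024)` returns `unreserved`, and `describe(70000)` returns `invalid`.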
[Figure: Client-Server Model. Server application on the server machine (22.214.171.124); client application on the client machine (126.96.36.199); port 5746.]
Hostnames • However it is inconvenient for people to remember IP addresses and ports. • Many hosts have in addition to IP address a human readable hostname. • www.vu.ac.at • www.cnn.com
Hostnames • Hostnames have hierarchical structure. • Hostname www.cs.vu.ac.at, refers to the host www in the computer science (cs) department of the Vienna University, which is an Academic Campus (ac) in Austria (at). • The rightmost part describes the main domain of the host. Left to it, a sub-domain, and further left more specific sub-domains.
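Reading a hostname from right to left can be sketched as follows. The class name `HostnameHierarchy` and the method `domainChain` are hypothetical; the code only splits on dots and does no validation.

```java
import java.util.ArrayList;
import java.util.List;

class HostnameHierarchy {
    /** Lists the domains a hostname belongs to, from the main (rightmost) domain to the full name. */
    static List<String> domainChain(String hostname) {
        String[] labels = hostname.split("\\.");
        List<String> chain = new ArrayList<>();
        String suffix = "";
        for (int i = labels.length - 1; i >= 0; i--) {
            // Each step to the left adds one more specific sub-domain.
            suffix = suffix.isEmpty() ? labels[i] : labels[i] + "." + suffix;
            chain.add(suffix);
        }
        return chain;
    }
}
```

For `www.cs.vu.ac.at` the chain is `at`, `ac.at`, `vu.ac.at`, `cs.vu.ac.at`, `www.cs.vu.ac.at`.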
Domains • There are generic domains • com commercial organizations • edu educational institutions • gov U.S. governmental organizations • Most countries use country domains: • il Israel • uk United Kingdom • jp Japan
DNS Servers • The mapping between the hostnames and the corresponding IP address is done by DNS. • It is not feasible for the Web browser to hold a table mapping all the hostnames to their IP-addresses. • New hosts are added to the Web every day • Hosts change their names and IP addresses.
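In Java, a program normally does not talk to DNS itself; it asks the platform resolver through `java.net.InetAddress`. The sketch below wraps that call. The class name `NameLookup` and the choice to return an empty list for unknown names are assumptions for this example.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

class NameLookup {
    /** Returns the textual IP addresses for a hostname, or an empty list if it cannot be resolved. */
    static List<String> resolve(String host) {
        try {
            List<String> addresses = new ArrayList<>();
            // A hostname may map to several addresses. An IP literal is
            // returned as-is without any lookup.
            for (InetAddress address : InetAddress.getAllByName(host)) {
                addresses.add(address.getHostAddress());
            }
            return addresses;
        } catch (UnknownHostException e) {
            return List.of();
        }
    }
}
```

Calling `resolve("www.vu.ac.at")` needs a working network and resolver, and its result can change over time, which is exactly why browsers look names up instead of keeping a fixed table.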
Web Protocols • A protocol is a set of rules that endpoints (both clients and servers) on the Web use to handle communication. • Transmission Control Protocol (TCP) – To exchange messages with other endpoints at the information packet level. • Internet Protocol (IP) – To send and receive messages at the address level. • Hypertext Transfer Protocol (HTTP) – To deliver HTML, audio, and other files on the World Wide Web.
HTTP Protocol • HyperText Transfer Protocol • Used between Web-clients (e.g., browsers) and Web-servers • Text based • Built on top of TCP protocol • Stateless protocol • No data about the communicating sides is stored
HTTP Transaction - Request • Client sends a request that looks like • GET /index.html HTTP/1.0 • GET is a keyword (the request method) • /index.html is the requested document • HTTP/1.0 is the protocol version that the client understands • The request always terminates with \r\n\r\n. • Client may send optional information • For example, a <keyword>: <value> list • User-Agent: browser name • Accept: formats the browser understands
HTTP Transaction - Request • Request example:
GET /index.html HTTP/1.0
User-Agent: Lynx/2.4 libwww/2.1.4
Accept: text/html
Accept: text/plain
• In addition to GET, clients can request • HEAD – Retrieve only the header for the file • POST – Send data to the server • PUT – Upload a file to the server
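Because HTTP is text based, a request is just a string with a fixed layout: a request line, optional `keyword: value` header lines, and an empty line, each ended by `\r\n`. The sketch below assembles such a string. The class name `HttpRequestText` and the method `build` are hypothetical. On the wire the version token is written with a slash, as `HTTP/1.0`.

```java
import java.util.Map;

class HttpRequestText {
    /** Builds the text of an HTTP/1.0 request without a body. */
    static String build(String method, String path, Map<String, String> headers) {
        StringBuilder request = new StringBuilder();
        // Request line: method, requested document, protocol version.
        request.append(method).append(' ').append(path).append(" HTTP/1.0\r\n");
        // Optional header lines, one "keyword: value" pair per line.
        for (Map.Entry<String, String> header : headers.entrySet()) {
            request.append(header.getKey()).append(": ").append(header.getValue()).append("\r\n");
        }
        // The empty line marks the end of the request, so the text ends with \r\n\r\n.
        request.append("\r\n");
        return request.toString();
    }
}
```

Passing a `LinkedHashMap` keeps the headers in insertion order. The resulting string can be written unchanged to a TCP socket connected to port 80 of a Web server.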
HTTP Transaction - Response • Server response • sends status line HTTP/1.0 200 OK • sends header information Content-type: text/html Content-length: 3022 ... • sends a blank line (\r\n) • sends document contents (e.g., html file)
HTTP Transaction - Response
HTTP Header:
HTTP/1.1 200 OK
Date: Fri, 16 Apr 2004 18:48:13 GMT
Server: Apache/1.3.29 (Darwin)
Last-Modified: Fri, 16 Apr 2004 10:15:59 GMT
ETag: "58db37-89-407fb25f"
Accept-Ranges: bytes
Content-Length: 137
Connection: close
Content-Type: text/html
(Blank line)
Data:
<html>
<body>
<p>Welcome</p>
<img src="smiley.gif">
</body>
</html>
HTTP/1.0 response codes • 2xx Successful • response codes between 200 and 299 indicate that the request was understood and accepted • 200 OK – the most common response, indicating success • 201 Created – response to a successful POST request • 202 Accepted – response to a POST request, meaning that processing is not finished yet • 204 No Content – the server successfully processed the request but has no content to send back
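A client reads a response in the same order the server sends it: status line, header lines, blank line, document. The sketch below parses a complete response held in a string and checks whether the code falls in the 2xx range. It assumes lines end in `\r\n` and does no real error recovery; the class name `HttpResponseText` and its field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class HttpResponseText {
    final int status;
    final String reason;
    final Map<String, String> headers = new LinkedHashMap<>();
    final String body;

    HttpResponseText(String raw) {
        // The first blank line separates the header section from the document.
        int split = raw.indexOf("\r\n\r\n");
        if (split < 0) {
            throw new IllegalArgumentException("no blank line after the headers");
        }
        String[] lines = raw.substring(0, split).split("\r\n");

        // Status line, e.g. "HTTP/1.0 200 OK": version, numeric code, reason phrase.
        String[] statusLine = lines[0].split(" ", 3);
        status = Integer.parseInt(statusLine[1]);
        reason = statusLine.length > 2 ? statusLine[2] : "";

        // Remaining lines are "keyword: value" headers.
        for (int i = 1; i < lines.length; i++) {
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(), lines[i].substring(colon + 1).trim());
            }
        }
        body = raw.substring(split + 4);
    }

    /** True for the 2xx codes: the request was understood and accepted. */
    boolean isSuccess() {
        return status >= 200 && status <= 299;
    }
}
```

Given a response that starts with `HTTP/1.1 200 OK`, the parsed object reports `status` 200 and `isSuccess()` true; a `404` status line would make `isSuccess()` false.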