APV Technical Training

APV Technical Training Chapter 04 – Application Health Check

Objectives • To understand the general concept of health checks • To understand the health check features offered by the Array ADC appliance

Health Check Concepts (I) • To select a server, a load balancer must know which servers are available. For this, the load balancer periodically sends them pings, connection attempts, requests, or anything else the administrator considers a valid measure of their state. These tests are called "health checks". • A crashed server might respond to ping but not to TCP connections, and a hung server might respond to TCP connections but not to HTTP requests. • When a multi-layer Web application is involved, some HTTP requests will get instant responses while others will fail: the Web front end might be up and running while the application server or back-end database is down. So it pays to choose the most representative tests that the application and the load balancer allow.

Health Check Concepts (II) • Health checks also need to cover other application servers, such as DNS, FTP, RADIUS, LDAP, SIP, RTSP, MS OCS, etc. • Health checks have to be spaced far enough apart in time to avoid loading the servers too much, but close enough together to detect a dead server quickly. • Health check results mark servers down or up, which directly affects the server farm. Should a single failed check mark a server down? What if health checks fail but application traffic is still OK? • Application health checks can be complicated, and it is very common that, after a few tests, the application developers finally implement a special request dedicated to the load balancer, which performs a number of representative internal tests.

Topics: Array ADC Appliance Health Checks • Unit 1: Basic Health Check Features • Unit 2: Additional Health Checks • Unit 3: Script Health Check • Unit 4: Examples • Unit 5: Known Limitations of Health Check Features

Basic Health Check Features • Basic health check • Basic health checks determine the Up/Down availability status of an application, service or server using a specified network protocol. Array application delivery systems support basic health check types including ARP, ICMP (ping), TCP, TCPS, UDP, DNS, HTTP and HTTPS. The default health check is assigned as part of each real server's configuration (a real server represents any back-end physical or virtual server in the server farm). The Array appliance performs the basic health check on the real server's IP/port pair to decide whether the real service is available. This is also called the main health check. • Health Check Interval • Real Service Health Check Types • Health Check Statistics

Basic Health Checking • When an SLB real service is defined, the default basic health check for that real service is assigned. • A basic health check is typically a single port/protocol check. • ArrayOS continuously monitors SLB real services for application availability through various health checks, running them at the health check interval. • If a real service fails its health check, it is marked DOWN, and SLB will not dispatch new connections or requests to it. • Reliable Health Check • Under heavy traffic, some health check messages may be lost while real traffic is still reaching the real service. In that case, the real service's health check status stays UP even though the health check daemon got no response from it.
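The reliable health check decision can be sketched as a small function. This is a minimal Python model, not ArrayOS code: it encodes the rule stated on the Known Limitations slide (UP is kept only when kernel traffic exists for the real service and the user-land check failed because of a timeout). The names `ProbeResult` and `reliable_status` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ProbeResult:
    succeeded: bool
    timed_out: bool  # True when the probe failed only because no reply arrived in time


def reliable_status(probe: ProbeResult, has_kernel_traffic: bool) -> str:
    """Return "UP" or "DOWN" for one real service (hypothetical model)."""
    if probe.succeeded:
        return "UP"
    # A timed-out probe is forgiven only while live traffic still reaches the service.
    if probe.timed_out and has_kernel_traffic:
        return "UP"
    # Any other failure (refused, reset, bad reply) marks the service DOWN.
    return "DOWN"
```

A failure that is not a timeout, such as a refused connection, still marks the service DOWN even under traffic, which matches the narrow scope the limitation describes.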

Health Check Interval • Health check interval and timeout (unit-wide; the default interval and timeout are both 5 seconds): health interval <interval> <timeout>

Real Service Basic Health Check
• Each real service has its own health check type, set by the “slb real …” command:
slb real <type> <name> <ip> [port] [conn] [hc_type] [hc_up] [hc_down] [time_out]
• hc_type - The default is icmp for UDP; tcp for FTP, HTTP, TCP, HTTPS and TCPS; dns for DNS; rtsp_tcp for RTSP; sip_tcp for SIP TCP; and sip_udp for SIP UDP. When the port is 0, the real service can only use the “icmp” or “none” health check.
• hc_up - The number of health checks that must return a positive result before the service is marked “up”. The default value is 3.
• hc_down - The number of health checks that must return a negative result before the service is marked “down”. The default value is 3.
• time_out - Timeout period in seconds. This parameter is only required when creating a UDP real service. The default timeout for a UDP real service is 60 seconds.
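The effect of hc_up and hc_down is a hysteresis between the two states. The Python sketch below assumes that the counts refer to consecutive results and that a result agreeing with the current state resets the count; both points, the initial state, and the class name `HealthState` are illustrative assumptions rather than documented ArrayOS behavior.

```python
class HealthState:
    """Hysteresis between UP and DOWN driven by hc_up / hc_down (hypothetical model)."""

    def __init__(self, hc_up: int = 3, hc_down: int = 3, initial: str = "DOWN"):
        self.hc_up = hc_up
        self.hc_down = hc_down
        self.status = initial
        self._streak = 0  # consecutive results that disagree with the current status

    def record(self, passed: bool) -> str:
        """Feed one health check result and return the (possibly new) status."""
        agrees = (passed and self.status == "UP") or (not passed and self.status == "DOWN")
        if agrees:
            self._streak = 0  # assumption: an agreeing result restarts the count
            return self.status
        self._streak += 1
        needed = self.hc_up if passed else self.hc_down
        if self._streak >= needed:
            self.status = "UP" if passed else "DOWN"
            self._streak = 0
        return self.status
```

With the defaults (3/3), a single lost probe never flips an UP service; it takes three failures in a row, which is the trade-off between quick detection and false alarms mentioned in the concepts section.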

Health Check Type – Basic I • ARP Health Check • Limited to L2 (MAC-based) service health checks. • ICMP Health Check • ICMP health check is a limited method that simply sends an ICMP echo (ping) to the server. If the server responds with an ICMP echo reply, it is marked “up”; otherwise it is marked “down”. This does not determine whether a service is running, or the quality of that service. • TCP Health Check • TCP health check opens a TCP connection to the real service. If the connection succeeds, the real service is marked “up”; if it fails, the real service is marked “down”. TCP health check does not tell you whether the service is actually functioning. • TCPS Health Check • TCPS health check provides an SSL health check for SLB real servers. It checks the availability of the real service by opening an SSL connection to a specified port on the real server (443 by default). If the SSL handshake succeeds, the server is marked “up”; if it fails, the server is marked “down”.
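The TCP and TCPS checks can be approximated with the Python standard library. This is a sketch, assuming that "up" means only "connection opened" (TCP) or "TLS handshake completed" (TCPS); certificate validation is switched off because the slide describes the handshake alone as the pass criterion. The function names are hypothetical.

```python
import socket
import ssl


def tcp_check(host: str, port: int, timeout: float = 5.0) -> bool:
    """UP if a TCP connection can be opened; says nothing about the service itself."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def tcps_check(host: str, port: int = 443, timeout: float = 5.0) -> bool:
    """UP if an SSL/TLS handshake completes on the port (443 by default)."""
    ctx = ssl.create_default_context()
    # Only handshake success matters here, so certificate checks are disabled.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as raw:
            # wrap_socket performs the handshake immediately on a connected socket.
            with ctx.wrap_socket(raw, server_hostname=host):
                return True
    except OSError:  # ssl.SSLError is a subclass of OSError
        return False
```

A plain listener that never speaks TLS passes `tcp_check` but fails `tcps_check`, illustrating why the TCP check "does not tell you if the service is actually functioning".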

Health Check Type – Basic II • HTTP Health Check • The basic built-in HTTP health check opens a TCP connection and sends an HTTP request using one of the HTTP methods predefined in the health request table. The Array appliance expects the health response defined in the response table. By default, index 0 of the request/response tables is used. If the response does not satisfy the conditions configured in the response table, the server is marked “down”; otherwise it is marked “up”. • HTTPS Health Check • HTTPS health check provides an SSL health check for real servers. If the SSL handshake succeeds, the Array appliance sends a predefined HTTP request with the proper method format to the real server. If the response from the real server matches the expected response, the real server is marked “up”; otherwise, it is marked “down”. When using HTTPS health check, users should predefine the HTTP requests/methods and the corresponding expected responses for matching. When client authentication is used, the imported client certificate must be DER-encoded. • DNS Health Check • Sends a DNS request to the real service and expects a DNS reply. • Available only for DNS real services.
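The request/response matching behind the HTTP health check can be sketched as "send the stored request string, look for the stored response string". This Python version assumes the response condition is a plain substring match on the raw reply; the appliance's response table may support richer conditions, and `http_health_check` is a hypothetical name.

```python
import socket


def http_health_check(host: str, port: int, request: str, expected: str,
                      timeout: float = 5.0) -> bool:
    """Send a raw request string; mark UP if the reply contains `expected`."""
    chunks = []
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(request.encode("ascii"))
            while True:
                data = sock.recv(4096)
                if not data:  # server closed the connection (HTTP/1.0 style)
                    break
                chunks.append(data)
    except socket.timeout:
        pass  # evaluate whatever arrived before the deadline
    except OSError:  # refused or reset
        return False
    reply = b"".join(chunks).decode("latin-1")
    return expected in reply
```

For example, `http_health_check("10.3.16.188", 80, "GET /health.html HTTP/1.0\r\n\r\n", "200 OK")` mirrors a request/response pair stored in the tables. An HTTPS variant would wrap the socket with `ssl` before sending, as in a TCPS check.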

Health Check Type – Basic III • RADIUS-Auth & RADIUS-Acct Health Checks • Radius-Auth and Radius-Acct health checks are provided for checking the availability of RADIUS servers. • RTSP-TCP Health Check • RTSP health check opens a TCP connection and sends an RTSP "OPTIONS" request (which asks for the methods available on the streaming server) to an RTSP real server. If the real server responds with any of the RFC-defined RTSP status codes, the server is marked "up"; otherwise, it is marked "down". • SIP-UDP & SIP-TCP Health Check • SIP health check sends a SIP "OPTIONS" request to a real server over UDP or TCP. This request asks the SIP server for the list of SIP methods it supports; the response may also describe the responding server's capabilities (e.g. audio/video codecs). If the real server responds with RFC-defined methods, the server is marked "up"; otherwise, it is marked "down".
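The RTSP rule ("any RFC-defined status code means up") can be sketched as building an OPTIONS request and checking the status line of the reply against the status codes listed in RFC 2326. The helper names, the default port 554, and the exact request layout are illustrative assumptions, not the appliance's implementation.

```python
# Status codes defined in RFC 2326 (RTSP/1.0), section 7.1.1.
RFC2326_STATUS_CODES = frozenset({
    100, 200, 201, 250, 300, 301, 302, 303, 304, 305,
    400, 401, 402, 403, 404, 405, 406, 407, 408, 410,
    411, 412, 413, 414, 415, 451, 452, 453, 454, 455,
    456, 457, 458, 459, 460, 461, 462, 500, 501, 502,
    503, 504, 505, 551,
})


def rtsp_options_request(host: str, port: int = 554, cseq: int = 1) -> bytes:
    """Build a minimal RTSP OPTIONS request (CSeq is mandatory in RTSP/1.0)."""
    return f"OPTIONS rtsp://{host}:{port}/ RTSP/1.0\r\nCSeq: {cseq}\r\n\r\n".encode("ascii")


def rtsp_reply_is_up(reply: bytes) -> bool:
    """UP if the status line carries any status code defined in RFC 2326."""
    status_line = reply.split(b"\r\n", 1)[0].decode("latin-1")
    parts = status_line.split(" ", 2)
    if len(parts) < 2 or not parts[0].startswith("RTSP/"):
        return False
    return parts[1].isdigit() and int(parts[1]) in RFC2326_STATUS_CODES
```

Note that an error code such as 454 still counts as "up" under this rule: the check proves the RTSP stack answers, not that a given stream is playable.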

HTTP Request/Response HC
• For health checks based on application message exchange, 1000 request/response entries are available.
• To configure HTTP request/response health checks:
• health request <request_index> <request_string> - Define an HTTP request.
• health response <response_index> <response_string> - Define an HTTP response.
• health service <real_name> <request_index> <response_index> - Assign a request and response to a real service.

HTTP Request/Response HC • Configuration Example • Request the “health.html” page from the real service. • Expect a “200 OK” response from the real service. • Assign the above request and response strings to real services “web1” and “web2”.

HTTP Request/Response HC • Configuration Example 2 (virtual website hosting on the real service) • Request the “health.html” page from the “web1a” and “web1b” virtual hosts on the real service. • Expect a “200 OK” response from the real service. • Assign the above request and response strings to real services “web1” and “web2”.
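The request/response tables and the "health service" assignment can be modeled in a few lines, together with the one detail that makes virtual-host checks work: the Host header. This is a hypothetical Python model; the class and helper names, the index values, and the assumption that valid indexes run from 0 to 999 are illustrative only.

```python
MAX_ENTRIES = 1000  # the slide states 1000 request/response entries


class HealthTables:
    """Hypothetical model of 'health request', 'health response' and 'health service'."""

    def __init__(self):
        self.requests = {}
        self.responses = {}
        self.services = {}  # real service name -> (request string, expected response)

    def request(self, index: int, text: str) -> None:
        self._check_index(index)
        self.requests[index] = text

    def response(self, index: int, text: str) -> None:
        self._check_index(index)
        self.responses[index] = text

    def service(self, real_name: str, req_index: int, resp_index: int) -> None:
        if req_index not in self.requests or resp_index not in self.responses:
            raise KeyError("request/response index not defined")
        self.services[real_name] = (self.requests[req_index], self.responses[resp_index])

    @staticmethod
    def _check_index(index: int) -> None:
        # Assumption: indexes 0..999 are valid (index 0 is the default).
        if not 0 <= index < MAX_ENTRIES:
            raise ValueError(f"index must be 0..{MAX_ENTRIES - 1}")


def vhost_request(path: str, host: str) -> str:
    """HTTP/1.1 GET for one virtual host; the Host header selects the site."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
```

Without a Host header, a server hosting several sites on one IP/port answers with its default site, so the check might pass even though "web1a" or "web1b" is broken.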

Health Check Setting - WebUI

Health Check Setting – HC Index

Health Check Statistics • To show health check statistics: • show statistics health


Additional Health Check • A real service's application availability may need to be checked against multiple sources; for example, a Web service that depends on a database. • Additional health checks can be added to a real service's basic health check with “slb real health …”: slb real health <real_name> <ip> <port> <hc_type> [hc_up] [hc_down] • This command defines an additional health check for the <real_name> real service. • All defined health checks must pass for the real service to be marked UP. • health relation <real name> <relationship> - Define the relationship (and/or) among the different health check configurations.

Additional Health Checks Example
• Original health check: slb real http "google" 74.125.19.103 80 1000 http 3 3
• Health relationship: health relation google and
• Additional health check defined: slb real health "google" 192.168.0.4 3389 tcp 3 3
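How the health relation combines the main and additional checks can be sketched as follows. This Python function is a simplification that assumes the relation applies uniformly to all checks of one real service; `combine` is a hypothetical name.

```python
def combine(results: list[bool], relation: str = "and") -> bool:
    """Combine main and additional health check results for one real service."""
    if relation == "and":
        return all(results)  # every check must pass
    if relation == "or":
        return any(results)  # one passing check is enough
    raise ValueError("relation must be 'and' or 'or'")
```

In the “google” example, the results would be `[http_80_ok, tcp_3389_ok]` with relation "and", so a failure of the TCP 3389 check on 192.168.0.4 marks the Web real service DOWN even though its own HTTP check passes.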

Script Health Check • Script health check verifies application availability by simulating application protocol exchanges with the back-end application. • Two script health check types are provided for the generic script health check: • “script-tcp” • “script-udp” • Script health check currently supports the following advanced application health checks: FTP, SMTP, LDAP, RADIUS, POP3, DNS and Telnet.

Script Health Check Concepts
1. Outline Request and Response - CLI: health request & health response (Request Index “n”, Response Index “m”)
2. Combine Request and Response - CLI: health checker (Checker “C1”, Checker “C2”, Checker “C3” ... Checker “Cn”)
3. Define & Add Checkers to List - CLI: health list, health member (List “Listx”)
4. Give List to Real Service - CLI: health app (Real Service “web1”)

Request and Response Index • The health check index table contains 1,000 entries each for requests and responses. • Use “health request …” and “health response …”, or “Health Check Setting” under “Real Service” in the WebUI, to edit the desired request and response. • You may also import requests/responses into the table.

Health Checker – Join Req/Resp
• Create a checker:
health checker <checker name> <request index> <response index> [time out] [flag]
Checker Name: User-defined name
Request Index: User-assigned
Response Index: User-assigned
Time out: How long to wait for the response
Flag: Numeric value that defines success/failure and binary/ASCII format. Default is 1.
• Bit 0 of the flag: the result when the response matches the expected response pattern. 1: HC success; 0: HC fail.
• Bit 1 of the flag: the input/response format. 1: binary (the request and response are input in binary format); 0: ASCII (the request and response are input in ASCII format).
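The two flag bits can be decoded and applied as below. This is a Python sketch under two assumptions not stated on the slide: binary entries are written as hex strings (as in the RADIUS example later in this chapter), and "match" means the expected bytes occur anywhere in the reply. The function names are hypothetical.

```python
def decode_flag(flag: int = 1) -> tuple[bool, bool]:
    """Return (match_means_success, binary_format) from a checker flag."""
    match_means_success = bool(flag & 0b01)  # bit 0
    binary_format = bool(flag & 0b10)        # bit 1
    return match_means_success, binary_format


def entry_bytes(text: str, binary: bool) -> bytes:
    """Binary entries are assumed to be hex strings (e.g. "02f1"); ASCII is sent as-is."""
    return bytes.fromhex(text) if binary else text.encode("ascii")


def checker_result(expected: str, reply: bytes, flag: int = 1) -> bool:
    """Apply one checker's response pattern and flag to a server reply."""
    match_means_success, binary = decode_flag(flag)
    matched = entry_bytes(expected, binary) in reply
    # Bit 0 = 1: a match is a success. Bit 0 = 0: a match is a failure.
    return matched if match_means_success else not matched
```

Flag 0 is useful for negative patterns: a checker with expected text "Error" and flag 0 passes only when "Error" does not appear in the reply.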

Define & Add Checkers to List • Create a check list: health list <list name> • Add a checker to a check list: health member <list name> <checker name> [place index]

Give HC List to Real Service
• Assign the health checker list to a real service:
health app <real name> <ip> <port> <list name> [<hc_frequence>] [<hc_localIP>] [<hc_localPort>]
• NOTE: To use script health check for a real service, “script_tcp” or “script_udp” must be specified as the <hc_type> when the real service is configured:
slb real udp <name> <…….> <hc_up> <hc_down> <time out> <hc_type>
hc_type can be: icmp, script_tcp, script_udp, radius-auth, radius-acct
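Running a check list can be sketched as executing its checkers in place-index order over one connection and stopping at the first failure. The "all checkers must pass" rule, the short-circuit, and the single-connection model are assumptions inferred from the examples and the multi-checker limitation in this chapter; `run_check_list` and the `exchange` callable are hypothetical.

```python
from typing import Callable

Exchange = Callable[[bytes], bytes]  # sends one request on the open connection, returns the reply


def run_check_list(members: list[tuple[int, bytes, bytes, bool]], exchange: Exchange) -> bool:
    """members: (place_index, request, expected, match_means_success) per checker."""
    for _, request, expected, match_ok in sorted(members, key=lambda m: m[0]):
        try:
            reply = exchange(request)
        except OSError:
            return False  # connection dropped or timed out mid-list
        if (expected in reply) != match_ok:
            return False  # this checker failed, so the whole list fails
    return True
```

If the server closes the connection after its first reply, the second `exchange` call raises and the list fails, which is the situation the multi-checker limitation warns about.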

Health Check Examples • Web Application HC – Keyword match • Web Application HC – TCP Script Health Check • Radius HC – UDP Script Health Check

Keyword Health Check - Web Page • HTTP health check supports Web page keyword matching: the HTTP health check sends a request and looks for a keyword in the server's response content. If the keyword is found, the health check succeeds; otherwise, it fails. • Configuration example – • Check for the keyword “work” in the response from “/cgi-bin/check.pl”. • Use request index 10 and response index 15 for web1.
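Keyword matching differs from status matching in that it looks at the page content. This Python sketch assumes "response content" means the HTTP body, i.e. everything after the blank line that ends the headers; whether the appliance also searches headers is not stated. `keyword_found` is a hypothetical name.

```python
def keyword_found(http_reply: bytes, keyword: str) -> bool:
    """Search only the response body (after the blank line) for the keyword."""
    _headers, sep, body = http_reply.partition(b"\r\n\r\n")
    if not sep:
        return False  # no complete header block: treat as a failed check
    return keyword.encode("utf-8") in body
```

A script such as “/cgi-bin/check.pl” can then print “work” only after its own internal tests succeed, turning the keyword into an application-level health signal.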

Script Health Check Configuration
• Simple script-tcp health check for a Web application
• Configure real service “web1” with script-tcp HC:
#slb real http "web1" 10.3.16.188 80 1000 script-tcp 1 1
• Build a script check list “AppCheck1” to check DB and App availability.
Use the HC index to define the requests and responses:
#health request 100 "GET /cgi-bin/dbup.pl HTTP/1.0\r\n\r\n"
#health response 100 "database is up"
#health request 101 "GET /cgi-bin/appup.pl HTTP/1.0\r\n\r\n"
#health response 101 "application is up"
Define the checker(s):
#health checker "C100" 100 100 3 1
#health checker "C101" 101 101 3 1
Create the HC list and assign the checker(s):
#health list "AppCheck1"
#health member "AppCheck1" "C100" 1
#health member "AppCheck1" "C101" 2
• Set the HTTP real server to use this check list:
#health app "web1" 10.3.16.188 80 "AppCheck1"

Script Health Check Configuration
• Sample RADIUS application health check using script-udp
• Configure a RADIUS real server:
#slb real udp radius-server 10.3.53.15 1812 1 1 60 script-udp
• Configure a script health check list for the server (request index 9 and response index 84; the binary input and response are cut from a sniffer trace):
#health request 9 "01f1002e18bf9e07114f560d6b6a7e75730212d9459c8a5ce34273c32ca74798f9fd1c"
#health response 84 "02f1"
#health checker radius-checker 9 84 5 3
#health list radius
#health member radius radius-checker
• The checker flag is set to 3: match = OK, binary format.
• Set the RADIUS real server to use this check list:
#health app radius-server 10.3.53.15 1812 radius
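The hex strings in this example are raw RADIUS packets, and their first bytes explain why "02f1" works as the expected response. Per RFC 2865, a RADIUS packet starts with a 1-byte code, a 1-byte identifier and a 2-byte length, so the stored request begins with code 1 (Access-Request), identifier 0xf1, length 0x2e, and "02f1" means Access-Accept for the same identifier. The Python sketch below assumes the binary checker does a prefix match; the helper names are hypothetical.

```python
import struct

RADIUS_CODES = {1: "Access-Request", 2: "Access-Accept", 3: "Access-Reject",
                4: "Accounting-Request", 5: "Accounting-Response"}


def radius_header(packet: bytes) -> tuple[int, int, int]:
    """Return (code, identifier, length) from the 4-byte RADIUS header (RFC 2865)."""
    if len(packet) < 4:
        raise ValueError("packet shorter than the RADIUS header")
    return struct.unpack("!BBH", packet[:4])  # network byte order


def radius_reply_ok(reply: bytes, expected_hex: str = "02f1") -> bool:
    """Assumed binary prefix match: code 2 (Access-Accept) plus the request's identifier."""
    return reply.startswith(bytes.fromhex(expected_hex))
```

An Access-Reject (code 3) for the same identifier fails the check, which is the desired result if the test account is supposed to authenticate.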

Real Time Alert • Real Time Alerts – Email/Log • Application health check provides real-time alerts on application availability, which help IT managers keep track of the performance of all applications. Such a monitoring system lets IT managers set thresholds and generates real-time alerts when those thresholds are crossed. IT managers can then take the appropriate steps to remove the bottlenecks before they start affecting end users.

Server Down Email Alert

Known Limitations • Limitations about Reliable Health Check • Limitations about Script Health Check

Reliable Health Check - Limitation (1) Reliable health check senses the real service's production traffic; it is not aware of additional health checks. (2) Reliable health check does not support FTP real services. (3) Reliable health check marks the real service “UP” only when there is kernel traffic for the real service and the user-land health check failed because of a timeout.

Script Health Check - Limitation • Interval and timeout in the original and script health check types • For the original health check types, the health check interval and timeout are configured with the CLI ‘health interval’; the default interval and timeout are both 5 seconds. • For the script health check types (script-tcp and script-udp), the CLI ‘health app’ sets the health check interval (default 2 seconds), and the CLI ‘health checker’ sets the timeout of each check (default 3 seconds).

Script Health Check - Limitation • Multiple checkers are not allowed in the following case: the server receives the client's request, sends a reply, and then closes the connection (for example, HTTP/1.0 without keep-alive). In this case, the health check list should contain only one checker. Example: an HTTP server handles a request such as ‘HEAD / HTTP/1.0\r\n\r\n’, sends ‘200 OK’ and closes the connection. • Script health checks for some advanced applications can be tedious, because the request may be specific to the server. Example: to do a script health check on a RADIUS server, you need to capture the RADIUS client's request packet and the RADIUS server's response packet, extract the RADIUS protocol data, and turn them into the standard request and response entries.

Script Health Check - Limitation • Be aware of the effects of the ‘no *’ and ‘clear *’ CLI commands: • When ‘no checker’ or ‘clear checker’ is executed, the checker(s) are removed from the list and deleted. • When ‘no list’ or ‘clear list’ is executed, the list(s) are detached from AppHC and deleted. • The CLI commands ‘clear conf all/second’ and ‘conf mem/file’ should not be executed alternately in rapid succession.

LDAP & Radius health check support
• Overview: To enhance the SLB health check function, both LDAP and RADIUS health check types are now supported.
• LDAP health check: APV supports health checks on popular LDAP servers such as Windows AD, OpenLDAP and SunOne Directory, meeting customers' need to health-check LDAP bind and search operations. Note: LDAP health check is only supported for “TCP type” real servers.
• RADIUS health check: Radius-Auth and Radius-Acct health checks are provided for checking the availability of RADIUS servers.
• New CLI Commands (1)
• health ldap {real_name|add_hc_name} [bind_dn] [password] [search_dn] [filter_keyword] - Add an LDAP health check configuration to a specified real server.
• no health ldap {real_name|add_hc_name} - Remove a specified LDAP health check configuration.
• show health ldap [real_name|add_hc_name] - Display LDAP health check configurations.
• clear health ldap - Clear all the LDAP health check configurations.

LDAP & Radius health check support (continued)
• New CLI Commands (2)
• health radius auth {real_name|add_hc_name} <secret_string> <username> <password> [resp_code] [attr_list] - Configure an authentication health check for the RADIUS server.
• no health radius auth {real_name|add_hc_name} - Remove the specified RADIUS authentication health check configuration.
• show health radius auth [real_name|add_hc_name] - Show the RADIUS authentication health check configurations.
• clear health radius auth - Remove all the RADIUS authentication health check configurations.
• health radius acct {real_name|add_hc_name} <secret_string> [resp_code] - Configure a RADIUS accounting health check for the RADIUS server.
• no health radius acct {real_name|add_hc_name} - Remove the specified RADIUS accounting health check configuration.
• show health radius acct [real_name|add_hc_name] - Show the RADIUS accounting health check configurations.
• clear health radius acct - Remove all RADIUS accounting health check configurations.
• clear health radius all - Remove all the RADIUS accounting and authentication health check configurations.

LDAP & Radius health check support (continued) • Configuration Example • Configure LDAP health check → Configure the basic health check type as “ldap” for a real server → Configure the additional health check type as “ldap” for a real server • Configure RADIUS health check → Configure the basic health check type as “radius-auth” for a real server → Configure the additional health check type as “radius-auth” for a real server Note: Follow the same steps to configure the “radius-acct” health check for a real server.

L2 SLB health check enhancement • Overview: In ArrayOS™ 8.2, a special reflector function is added to support L2 SLB health checks using all the L4/L7 basic network protocol check methods. The reflector returns each received packet to its sender (by MAC). This keeps health check request and response packets bound to the same L2 bridge-like real service/device, so the entire link is checked, which provides more realistic application/bridge health information. • New CLI Commands • health ipreflect <reflector_name> <ip_address> <port> [protocol] - Configure a reflector for L2 SLB TCP health check. This reflector is set up and runs on another APV appliance. For the parameter “protocol”, only “TCP” is supported. • no health ipreflect <reflector_name> - Remove the specified reflector configuration. • clear health ipreflect - Clear all the reflector configurations. • show health ipreflect - Display all the reflector configurations. • Modified CLI Command • slb real l2ip <real_name> <real_ip> - Create L2IP-based real services for load balancing operations and protocols. The old parameters “hc_up” and “hc_down” were removed from this command. Note: L2 SLB real services only support additional health checks.
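"Return the received packet to the sender by MAC" amounts to swapping the destination and source MAC addresses of the Ethernet frame. This Python sketch illustrates only that frame-level idea; it is not how the APV reflector is implemented, and `reflect_frame` is a hypothetical name.

```python
def reflect_frame(frame: bytes) -> bytes:
    """Swap destination and source MAC so the frame goes back to its sender."""
    if len(frame) < 14:  # 6-byte dst MAC + 6-byte src MAC + 2-byte EtherType
        raise ValueError("shorter than an Ethernet header")
    dst, src, rest = frame[0:6], frame[6:12], frame[12:]
    return src + dst + rest  # EtherType and payload are left untouched
```

Because the reflected frame is addressed to the MAC it came from, it travels back through the same bridge-like device, so a successful round trip confirms the whole path through that device.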

L2 SLB health check enhancement (continued)
• Configuration Example
• Topology: Client ----------- APV1 ----------- Firewall1 ----------- APV2 ----------- Real Server ----------- Firewall2 -----------
• APV1
→ Step 1 Configure the system interfaces' IP addresses
→ Step 2 Configure L2 real servers with the firewall MAC, no route
→ Step 3 Configure additional health checks for the L2 real servers
• APV2
→ Step 1 Configure L2 real servers with the firewall MAC, no route
→ Step 2 Configure additional health checks for the L2 real servers
→ Step 3 Enable the health check function
→ Step 4 Configure the reflector for L2 SLB TCP health check

L2 SLB health check enhancement (continued) • Case II -- L2 SLB with a single APV • The real services are bridge-like devices, such as SMTP mail scanners, IPS/IDS, UTM, etc. • Neither the APV nor the real services change IP addresses. • Incoming packets arriving at APV Port3 are spread across R1 and R2 (on LAN1/Port1) by the CHI method (SRC+DST IP hash, IP flow). • Packets that pass through R1 (or R2) from LAN1 to LAN2 arrive at APV Port2, and the APV routes them to the LAN4 router (toward the internal server). • Return traffic arriving at Port4 is CHI load-balanced to R1/R2 and forwarded to Port1, from where it is routed to the LAN3 router. • CHI maintains the real service table sorted by real service name.
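Why an SRC+DST IP hash keeps both directions of a flow on the same bridge device can be shown with a toy selector. This Python sketch uses the modulo of the sum of the two addresses as a stand-in hash; ArrayOS's actual CHI hash function is not documented here, and `chi_pick` is a hypothetical name. What it does share with the slide is the symmetric SRC+DST key and the table sorted by real service name.

```python
import ipaddress


def chi_pick(real_services: list[str], src_ip: str, dst_ip: str) -> str:
    """Pick a real service from SRC+DST IP; the sum makes both directions agree."""
    table = sorted(real_services)  # real service table ordered by name
    key = int(ipaddress.ip_address(src_ip)) + int(ipaddress.ip_address(dst_ip))
    return table[key % len(table)]
```

Swapping source and destination leaves the sum unchanged, so return traffic entering Port4 is sent to the same R1 or R2 that carried the original request, and sorting by name keeps the table order independent of configuration order.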

L2 SLB health check enhancement (continued)
• Case II -- Configuration Example
→ Step 1 Configure the system's interface IP addresses, static route and default route
→ Step 2 Configure L2 SLB real servers, and configure additional health checks for each L2 real server
→ Step 3 Configure SLB groups, and add the L2 real servers as members of the groups
→ Step 4 Configure L2 virtual services
→ Step 5 Associate the L2 virtual services with the SLB groups
→ Step 6 Configure the reflector for L2 SLB TCP health check

Early warning support in health check • Overview: With the ArrayOS™ 8.2 release, the APV appliance supports early warning health status for SLB real servers, which helps better check the status and service quality of real servers. The APV appliance detects when a real server's response time exceeds the specified threshold and uses a counter to record how many times this happens. Based on these records, the APV appliance creates “Warning” logs to notify administrators of the real server's abnormal status. • New CLI Commands • health earlywarning [threshold] - Enable the health early warning feature. By default, this feature is off. threshold: The response time threshold, in milliseconds, ranging from 0 to 60000; 0 disables the feature. • clear health earlywarning - Reset the early warning threshold and the early warning counter. • show health earlywarning - Display the early warning threshold configuration.

Early warning support in health check (continued) • Usage Notes • This feature is not available for real servers that have no health check configured. • The APV appliance records a “Warning” log only when the number of consecutive times a real server's response time has exceeded the threshold is a power of 2 (1, 2, 4, 8...). Once the response time drops below the threshold, the old records are cleared and the counter resets to begin collecting new records. • The counter holds at most 1024 records. If the number of records exceeds 1024, the counter is reset to 0 and starts counting again. • If the health check function is disabled, executing the command “health on” resets the health check early warning counter.
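The logging rule in these usage notes can be modeled as a small counter. This is a Python sketch, not ArrayOS code: it assumes a response time equal to the threshold does not count as exceeding it, and that the event which overflows the 1024-record limit is the first event of the new count. The class name `EarlyWarning` is hypothetical.

```python
class EarlyWarning:
    """Hypothetical model of the early warning counter and its power-of-2 logging."""

    MAX_RECORDS = 1024

    def __init__(self, threshold_ms: int):
        self.threshold_ms = threshold_ms  # 0 disables the feature
        self.count = 0  # consecutive responses slower than the threshold

    def observe(self, response_ms: float) -> bool:
        """Record one response time; return True when a "Warning" log is due."""
        if self.threshold_ms == 0:
            return False
        if response_ms <= self.threshold_ms:
            self.count = 0  # back under the threshold: old records are cleared
            return False
        if self.count >= self.MAX_RECORDS:
            self.count = 0  # counter full: reset and recount (assumed to include this event)
        self.count += 1
        return (self.count & (self.count - 1)) == 0  # 1, 2, 4, 8, ... 1024
```

The power-of-2 rule keeps the log quiet during a long slowdown: a server that stays slow for 1000 consecutive checks produces only 10 warnings (at 1, 2, 4, ..., 512) instead of 1000.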

Array Network ADC Product Technical Training Thank You
