HTTP Protocol Overview:

The Hypertext Transfer Protocol (HTTP) is an application-level protocol. HTTP is used for collaborative and distributed information systems. It is the foundation of data communication for the World Wide Web. HTTP is a generic protocol which can be used for other purposes by using extensions of its request methods, error codes, and headers. HTTP is a TCP/IP-based protocol.
HTTP is used to deliver data such as HTML and image files, query results, sound, video, and other multimedia files on the WWW. The default port number for HTTP is 80.

HTTP's development was initiated by Tim Berners-Lee at CERN. Its standards development was coordinated by the Internet Engineering Task Force (IETF) and the World Wide Web Consortium (W3C), culminating in the publication of a series of Requests for Comments (RFCs). The first definition of HTTP/1.1 appeared in RFC 2068 in 1997; it was obsoleted by RFC 2616 in 1999, which was in turn obsoleted by RFC 7230 and its companion RFCs in 2014. Its successor, HTTP/2, was standardized in 2015 and is now supported by major web servers.

HTTP works as a request–response protocol in the client–server computing model. For example, a web browser may be the client, and an application running on a computer hosting a website may be the server. The client sends an HTTP request message to the server. The server, which provides resources such as HTML files and other content or performs other functions on behalf of the client, returns a response message to the client.
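The request–response exchange described above can be tried out locally. The following is a minimal sketch using only Python's standard library; the handler class `HelloHandler`, the function `demo_exchange`, and the plain-text body are illustrative names and values, not part of HTTP itself. It starts a throwaway server on the loopback interface, sends one GET request, and returns what came back.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler: answers every GET with a short text body."""

    def do_GET(self):
        body = b"Hello from the server\n"
        self.send_response(200)                        # writes the status line
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()                             # empty line ends the headers
        self.wfile.write(body)                         # message body

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def demo_exchange(path="/hello.htm"):
    """Run one request/response round trip against a local server."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0: any free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
        conn.request("GET", path)        # client sends the request message
        response = conn.getresponse()    # server returns the response message
        result = (response.status, response.reason, response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo_exchange())
```

The response arrives on the same TCP connection that carried the request; the server never connects back to the client.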
The response contains completion status information about the request and may also contain the requested content in its message body. HTTP is designed to permit intermediate network elements to improve or enable communications between clients and servers. HTTP is an application layer protocol designed within the framework of the Internet protocol suite. HTTP resources are identified and located on the network by Uniform Resource Locators (URLs), using the Uniform Resource Identifier (URI) schemes http and https.

Basic Features

There are three basic features that make HTTP a simple but powerful protocol:

HTTP is connectionless: In its basic form, the HTTP client opens a connection, sends a request, and waits for the response. The server processes the request and sends the response back over that same connection, after which the connection can be closed. The server does not open a new connection back to the client. HTTP/1.1 keeps connections open by default so that several requests can reuse one connection.

HTTP is media independent: Data of any type can be sent by HTTP as long as both the client and the server know how to handle the content. The client and the server must specify the type of content using the appropriate MIME type.
HTTP is stateless: HTTP's connectionless behaviour is a direct result of it being a stateless protocol. The server and client are aware of each other only during the current request. Afterwards, both of them forget about each other.
Due to this nature of the protocol, neither the client nor the server retains information between different requests across web pages by means of the protocol itself.

Basic Architecture

The following diagram shows the basic architecture of HTTP. HTTP is a request/response protocol based on a client/server architecture, in which web browsers, robots, search engines, and so on act as HTTP clients, and the web server acts as the server.

Client: The HTTP client sends a request to the server in the form of a request method, URI, and protocol version, followed by a MIME-like message containing request modifiers, client information, and possible body content over a TCP/IP connection.

Server: The HTTP server responds with a status line, including the message's protocol version and a success or error code, followed by a MIME-like message containing server information, entity meta-information, and possible entity-body content.

HTTP Version

HTTP uses a "major.minor" numbering scheme to indicate versions of the protocol. The version of an HTTP message is given by an HTTP-Version field in its first line:

    HTTP-Version = "HTTP" "/" 1*DIGIT "." 1*DIGIT

Example:

    HTTP/1.1

Uniform Resource Identifiers

Uniform Resource Identifiers (URIs) are simply formatted strings containing a name, location, etc. to identify a resource, for example, a website or a web service. The scheme and host name are case-insensitive; the path and query may be case-sensitive.
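As a small illustration of the pieces an HTTP URI carries (scheme, host, port, path, query), Python's standard `urllib.parse.urlsplit` can take one apart. This is only a sketch: the helper name `describe_http_url` and the sample URLs are assumptions made for the example, and the fallback to port 80 for http (443 for https) reflects the well-known default ports.

```python
from urllib.parse import urlsplit

# Well-known default ports, used when the URI does not name one.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_http_url(url):
    """Split an http/https URL into its main components."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,          # lower-cased by urlsplit
        "host": parts.hostname,          # lower-cased by urlsplit
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",       # path is left as written
        "query": parts.query,
    }


if __name__ == "__main__":
    print(describe_http_url("http://abc.com/users/web.html"))
    print(describe_http_url("HTTP://ABC.com:8080/Users?id=7"))
```

Note that the scheme and host are normalized to lower case while the path keeps its original capitalization.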
The syntax of a URI used for HTTP is as follows:

    URI = "http:" "//" host [ ":" port ] [ abs_path [ "?" query ]]

If the port is empty or not given, port 80 is assumed for HTTP.

Example:

    http://abc.com/users/web.html

Date/Time Formats

All HTTP date/time stamps must be represented in Greenwich Mean Time (GMT), without exception. HTTP applications are allowed to use any of the following three representations of date/time stamps:

    Mon, 18 Dec 2003 03:30:15 GMT   ; RFC 822, updated by RFC 1123
    Monday, 18-Dec-03 03:30:15 GMT  ; RFC 850, obsoleted by RFC 1036
    Mon Dec 18 18:30:15 2003        ; ANSI C's asctime() format

Character Sets

Character sets are used to specify the character encodings that the client prefers.
Multiple character sets can be listed, separated by commas. If a value is not specified, the default is US-ASCII.

Example:

    US-ASCII

HTTP-Message

HTTP makes use of the Uniform Resource Identifier (URI) to identify a given resource and to establish a connection. Once the connection is established, HTTP messages are passed in a format similar to that used by Internet mail (RFC 5322) and the Multipurpose Internet Mail Extensions (MIME, RFC 2045). These messages consist of requests from client to server and responses from server to client, which have the following format:

    HTTP-message = start-line *( message-header CRLF ) CRLF [ message-body ]
1. Message Start-Line

The message start-line has the following syntax:

    Start-line = Request-Line | Status-Line

Example:

    GET /web.html HTTP/1.0    (Request-Line sent by the client)
    HTTP/1.0 200 OK           (Status-Line sent by the server)

2. Header Fields

HTTP header fields provide required information about the request or response, or about the object sent in the message body. There are four types of HTTP message headers:

General-header: These header fields apply to both request and response messages.
Request-header: These header fields apply only to request messages.
Response-header: These header fields apply only to response messages.
Entity-header: These header fields define meta-information about the entity-body or, if no body is present, about the resource identified by the request.

All of the above headers follow the same generic format: each header field consists of a name followed by a colon (:) and the field value, as follows:

    Message-header = field-name ":" [ field-value ]

3. Message Body

The message body is optional in an HTTP message. When present, it carries the entity-body associated with the request or response. If an entity-body is associated, the Content-Type and Content-Length header lines usually specify the nature of the body.

HTTP Request:

An HTTP client sends an HTTP request to a server in the form of a request message, which has the following format:

- A Request-Line
- Zero or more header (general/request/entity) fields, each followed by CRLF
- An empty line indicating the end of the header fields
- Optionally, a message-body

Request-Line

The Request-Line begins with a method token, followed by the Request-URI and the protocol version, and ends with CRLF. The elements are separated by space (SP) characters.

    Request-Line = Method SP Request-URI SP HTTP-Version CRLF

Request Methods:

There are 9 main methods in HTTP. These methods indicate what action has to be taken on the resource, so that the client gets the expected result, i.e. the resource it requested. Generally, resources are the server's files or any executable running on the server.
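Before the individual methods are described, the Request-Line grammar given earlier (Method SP Request-URI SP HTTP-Version CRLF) can be made concrete with a small parser. This is a simplified sketch rather than a complete implementation; the function name `parse_request_line` is an assumption for the example, and the code does not check that the method is one of the known method tokens.

```python
import re


def parse_request_line(line):
    """Split a Request-Line into (method, request_uri, (major, minor))."""
    if not line.endswith("\r\n"):
        raise ValueError("Request-Line must end with CRLF")
    parts = line[:-2].split(" ")          # elements are separated by single SP
    if len(parts) != 3:
        raise ValueError("expected: Method SP Request-URI SP HTTP-Version")
    method, request_uri, version = parts
    match = re.fullmatch(r"HTTP/([0-9]+)\.([0-9]+)", version)
    if match is None:
        raise ValueError("malformed HTTP-Version: " + version)
    return method, request_uri, (int(match.group(1)), int(match.group(2)))


if __name__ == "__main__":
    print(parse_request_line("GET /web.html HTTP/1.0\r\n"))
```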
The HTTP/1.0 specification defined three main methods:

1) GET
2) HEAD
3) POST

The HTTP/1.1 specification later defined five more:

4) OPTIONS
5) PUT
6) DELETE
7) TRACE
8) CONNECT

RFC 5789 added the PATCH method:

9) PATCH

GET: This method is used to retrieve the resource identified by the Request-URI. It is the most commonly used method for retrieving information. Parameters for the requested data are added in the query string itself. The response to a GET request can be cached. GET is not suitable for transmitting confidential information such as passwords.

A conditional GET can be made by adding constraints in the If-Match, If-None-Match, If-Range, If-Modified-Since, or If-Unmodified-Since header fields. A conditional GET transfers the resource only if the constraints in those header fields are fulfilled.
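The server-side decision behind a conditional GET can be sketched as follows. This is a simplified illustration, not a complete implementation: the function name `conditional_get_status`, the plain dictionary of headers, and the sample entity tag are assumptions for the example, only If-None-Match and If-Modified-Since are handled, and entity tags are compared as plain strings.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_get_status(request_headers, etag, last_modified):
    """Return 304 if the client's cached copy is still valid, else 200.

    request_headers: dict of header name -> value (exact names assumed)
    etag:            current entity tag of the resource, e.g. '"abc"'
    last_modified:   timezone-aware datetime of the last change
    """
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        return 304 if ("*" in candidates or etag in candidates) else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200                      # unparseable date: ignore it
        if last_modified <= since:
            return 304                      # not modified since that time
    return 200                              # send the full representation


if __name__ == "__main__":
    changed = datetime(2003, 12, 18, 3, 30, 15, tzinfo=timezone.utc)
    headers = {"If-Modified-Since": "Thu, 18 Dec 2003 03:30:15 GMT"}
    print(conditional_get_status(headers, '"abc"', changed))
```

A 304 (Not Modified) answer carries no body, so the client reuses its cached copy and the transfer is avoided.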
Excess network usage can be avoided with a partial GET, which specifies a Range header field in the request.

HEAD: This method is identical to GET except that the server must not return a message body in the response. The response to a HEAD request may be cacheable; if it differs from a previously cached version, the cached entry is treated as stale. This method is commonly used to fetch meta-information from the headers.

POST: This method is used to send data to the server, generally to store it; the data is enclosed in the request body. Web forms are a typical example. The POST method places no restriction on the amount of data sent, because the data travels in the request message body, whereas with GET the length of the query string is limited in practice. POST is also preferable for transferring confidential information such as passwords, because the URL-encoded data does not appear in the browser's address bar. It is not an idempotent method.

OPTIONS: This method is used to retrieve information about the HTTP service options for the target resource that are available at the server or at an intervening intermediary.
The server's general capabilities can also be queried by sending OPTIONS with '*' as the Request-URI. Responses to this method are not cacheable.

PUT: This method stores the enclosed representation under the given Request-URI. If a resource already exists at that URI, the enclosed entity must be considered the updated version of the resource at the origin server. If no such resource exists, a resource is created at the specified URI and a 201 (Created) response is returned to the user agent; otherwise 200 or 204 is sent to the user agent. Responses to this method are not cacheable.

DELETE: This method is used to delete the resource identified by the Request-URI. The server may override this method, and there is no assurance that the requested resource will actually be deleted even when the server returns a success response.
Responses to this method are not cacheable.

TRACE: This method echoes the server-side view of the request back to the client for diagnostic or debugging purposes, i.e. it shows the request exactly as the server received it from the client.

CONNECT: This method is used for tunnelling. A tunnel is established between the client and the server through one or more proxies, and it can additionally be secured using TLS.

PATCH: This method is used to apply changes to the resource identified by the Request-URI, as directed in the request entity. It is similar to PUT, but the two differ in how the server processes the enclosed entity. In PUT, the enclosed entity is considered to be the modified version of the resource, whereas in PATCH the enclosed entity contains a set of instructions describing how the resource on the origin server should be modified. A PATCH response can be cached under certain suitable circumstances.

Request-URI:

The Request-URI is a Uniform Resource Identifier that identifies the resource upon which to apply the request.

    Request-URI = "*" | absoluteURI | abs_path | authority

Request Header Fields:

The request-header fields allow the client to pass additional information about the request, and about the client itself, to the server. These fields act as request modifiers.
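To make the role of request-header fields concrete, the sketch below assembles a request message by hand: a Request-Line, one line per header field, an empty line, and an optional body. The function name `build_request` and the host `www.example.com` are hypothetical, and automatically adding Content-Length when a body is present is a convenience chosen for this example.

```python
def build_request(method, request_uri, headers, body=b"", version="HTTP/1.1"):
    """Serialize a request message to bytes, using CRLF line endings."""
    headers = dict(headers)                       # do not mutate the caller's dict
    if body:
        headers.setdefault("Content-Length", str(len(body)))
    lines = [f"{method} {request_uri} {version}"]                     # Request-Line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # header fields
    head = "\r\n".join(lines) + "\r\n\r\n"        # empty line ends the header section
    return head.encode("ascii") + body


if __name__ == "__main__":
    message = build_request(
        "GET",
        "/hello.htm",
        {"Host": "www.example.com", "Accept-Language": "en-us"},
    )
    print(message.decode("ascii"))
```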
Here is a list of some important request-header fields that can be used as required: Accept-Charset, Accept-Encoding, Accept-Language, Authorization, Expect, From, Host, If-Match, If-Modified-Since, If-None-Match, If-Range, If-Unmodified-Since, Max-Forwards, Proxy-Authorization, Range, Referer, TE, User-Agent.

You can introduce your own custom fields if you are writing your own client and web server.

For example, the HTTP request for fetching the hello.htm page looks like this:

    GET /hello.htm HTTP/1.1
    User-Agent: Mozilla/4.0
    Host: www.httpreqexa.com
    Accept-Language: en-us
    Accept-Encoding: gzip, deflate
    Connection: Keep-Alive

HTTP RESPONSE:

Basically, a response message is the message that the server sends in response to a request message.
The picture below shows the general format of an HTTP response message. Let us take an example to understand it better.

    HTTP/1.1 200 OK
    Connection: close
    Date: Sat, 07 Jul 2010 12:00:15 GMT
    Server: Apache/1.3.0 (Unix)
    Last-Modified: Sun, 6 May 2010 09:23:24 GMT
    Content-Length: 5428
    Content-Type: text/html

    (data data data data ...)

The response message has three sections:

(i) an initial status line
(ii) six header lines
(iii) the entity body

Status Line:

The start line of an HTTP response is called the status line. It has three fields:

(1) The protocol version field
(2) A status code, which indicates whether the request succeeded or failed
(3) A corresponding status message

In the above example, the status line indicates that the server is using HTTP/1.1 as the protocol version, 200 is the status code, and OK is the corresponding status message, which indicates that everything is OK.

Header lines:

Each header line follows the same structure as any other header: a case-insensitive name followed by a colon (:) and a value whose structure depends upon the type of the header. The whole header, including its value, is presented on a single line.
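The structure just described (a status line, then header lines, then an empty line and the body) can be sketched as a small parser. This is a simplified illustration under stated assumptions: the function name `parse_response` is hypothetical, header names are lower-cased to reflect their case-insensitivity, and repeated or folded header lines are not handled.

```python
def parse_response(raw):
    """Split a raw response into (version, code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")    # empty line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")

    # Status line: protocol version, status code, status message.
    parts = lines[0].split(" ", 2)
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""

    # Header lines: name ":" value; names compare case-insensitively.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, code, reason, headers, body


if __name__ == "__main__":
    raw = (
        b"HTTP/1.1 200 OK\r\n"
        b"Connection: close\r\n"
        b"Content-Length: 6\r\n"
        b"Content-Type: text/html\r\n"
        b"\r\n"
        b"(data)"
    )
    print(parse_response(raw))
```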
In the above example, the server uses the Connection: close header line to tell the client that it is going to close the TCP connection after sending the message.

The Date header line indicates the time and date when the HTTP response message was created and sent by the server. Note that this is not the time when the object was created or last modified; it is the time when the server retrieves the object from its file system, inserts it into the HTTP response, and sends it.

The Server header line indicates that the message was generated by an Apache web server. The Server header line is the response-side analogue of the User-Agent header line in a request.

The Last-Modified header line indicates the date and time when the object was created or last modified. It is critical for object caching.

The Content-Length header line indicates the number of bytes in the object being sent, and the Content-Type header line indicates that the object in the entity body is HTML text.

Many other response headers are available. They can be divided into several groups:

General headers, e.g. Via, which apply to the whole message.
Response headers, e.g. Vary and Accept-Ranges, which give additional information about the server that does not fit in the status line.
Entity headers, e.g. Content-Length, which apply to the body of the response. No such headers are transmitted when the response has no body.

Entity Body:

The last part of a response message is the body. Not all responses have one: responses with a status code such as 201 or 204 usually don't.

Bodies can be divided into three categories:

Single-resource bodies consisting of a single file of known length, defined by the two headers Content-Type and Content-Length.
Single-resource bodies consisting of a single file of unknown length, encoded in chunks with Transfer-Encoding set to chunked.
Multiple-resource bodies consisting of a multipart body, in which each part contains a different section of information.
These are relatively rare.

HTTP STATUS CODE:

HTTP status codes are standard response codes given by web servers on the Internet. They help to identify the cause of the problem when a page or other resource does not load properly. An HTTP status code is the server's response in the form of a 3-digit integer, where the first digit represents the class of the response.

1xx: Informational

The status codes in this class indicate that the request has been received and the process is continuing. The server may agree to switch protocols, or it may have received the request headers and ask the client to continue with the request body. This class contains the following status codes:

100: Only a part of the request has been received by the server, but as it has not been rejected, the client should continue with the request.
101: The server is switching protocols.

2xx: Successful

The status codes in this class indicate that the action was successfully received, understood, and accepted. This class contains the following status codes:

200: The request is okay.
201: The request is complete and a resource has been created.
202: The request is accepted for processing, but the processing is not complete.
203: The information in the entity header is not from the original server but from a local or third-party copy.
204: A status code and headers are given in the response, but there is no entity body in the reply.
205: The browser should reset the content of the form used for the transaction.
206: The server is returning partial data.

3xx: Redirection

The status codes in this class inform the client that further action is needed to complete the request, typically because the requested URL has been moved to a different URL, either permanently or temporarily. This class contains the following status codes:

300: Multiple choices are available; the user can select a link and go to that location.
301: The requested page has been moved permanently.
302: The requested page has been moved temporarily.
303: The requested page can be found under a different URL.
304: The resource has not been modified since the date or version specified in the request.
305: The requested page must be accessed through a proxy.
306: The code is reserved and no longer used.
4xx: Client Error

The status codes in this class represent client-side errors or restrictions on requesting a page: the server is unable to understand the client's request, the client's access to the requested page is forbidden, or authentication is required to access the requested page. They indicate that the request contains incorrect syntax or cannot be fulfilled. This class contains the following status codes:

400: The server did not understand the request.
401: The requested page requires authentication.
402: Payment required; this code is reserved for future use.
403: Access to the requested page is forbidden.
404: The server cannot find the requested page.
405: The method in the request is not allowed.
406: The server can only generate a response that is not acceptable to the client.
407: Proxy authentication is required.
408: The request timed out.
409: The request cannot be completed because of a conflict.
410: The requested page is no longer available.
411: The server will not accept the request without a Content-Length.
412: The precondition given in the request evaluated to false on the server.
413: The request entity is too large, so the server will not accept the request.
414: The URL is too long, so the server will not accept the request.
415: The request is not accepted because the media type is not supported.
416: The requested byte range is not available or is out of bounds.
417: The expectation given in the request could not be met by the server.

5xx: Server Error

The status codes in this class cover errors where the server understood the request but is incapable of fulfilling it for some reason. They indicate that the server failed to fulfil an apparently valid request. This class contains the following status codes:

500: The server met an unexpected condition.
501: The server does not support the functionality required.
502: The server received an invalid response from the upstream server.
503: The server is temporarily overloaded or down.
504: The gateway has timed out.
505: The server does not support the HTTP protocol version.
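Because the first digit of a status code gives its class, a client can react to a whole class without knowing every individual code. The sketch below shows one way to do that in Python; the names `STATUS_CLASSES`, `status_class`, and `describe` are assumptions for the example, while `http.HTTPStatus` is the standard library's table of registered codes and reason phrases.

```python
from http import HTTPStatus

# Class names as used in the section headings above.
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Return the class name for a 3-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]     # first digit selects the class


def describe(code):
    """Return 'code phrase (class)', with 'Unknown' for unlisted codes."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown"
    return f"{code} {phrase} ({status_class(code)})"


if __name__ == "__main__":
    for code in (200, 304, 404, 503):
        print(describe(code))
```

A code the client does not recognize can still be handled by its class, which is why the fallback in `describe` keeps the class name.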