What is WebSocket?
WebSocket is a communications protocol that enables two endpoints, typically a client and a server, to establish a persistent, bidirectional, full-duplex channel over a single TCP connection. The protocol’s goal is to provide a method for browser-based applications to carry out two-way communications without needing to open multiple HTTP connections. For this reason, a WebSocket connection is typically between a web browser and a web server, although it can be between any two endpoints that support the protocol.
The WebSocket protocol addresses many HTTP protocol limitations. With HTTP, if an application such as an instant messaging app requires bidirectional communication, the client must issue repeated HTTP requests, and the server must maintain multiple TCP connections for each client. Each request also carries its own HTTP headers, so this approach results in greater overhead, latency and bandwidth usage.
The WebSocket protocol provides an alternative to the continuous polling HTTP requires for two-way communications. With WebSocket, the client and server need to maintain only a single connection that supports bidirectional communications.
WebSocket first appeared in the Hypertext Markup Language 5 specification, where it was referred to as TCPConnection — a placeholder for a TCP-based socket application programming interface (API).
Ian Hickson and Michael Carter initially conceived the WebSocket protocol in 2008. The Internet Engineering Task Force standardized it in RFC 6455 in 2011.
All major web browsers support the WebSocket protocol. Browsers typically use the WebSocket API to establish WebSocket connections with host servers. The API provides an interface that web applications can use to initiate and manage their WebSocket connections.
The WebSocket protocol supports a wide range of applications that require bidirectional communications, such as live chats, instant messaging, online games and real-time collaboration tools.
Establishing a WebSocket connection
The WebSocket protocol is commonly used within the existing HTTP infrastructure and supports HTTP proxies and intermediaries. By default, it uses port 80 for unencrypted connections (ws:// URIs) and port 443 for connections secured with TLS (wss:// URIs). However, WebSocket is not limited to HTTP. Although today’s implementations are deployed in the context of the HTTP framework, the protocol could potentially be used in other ways to help simplify bidirectional communications.
A WebSocket connection typically begins when a browser, having opened a TCP connection to the server, sends an opening handshake over it. The handshake is an HTTP/1.1 GET request that contains multiple HTTP header fields, including those specific to establishing a WebSocket connection, such as the following:
- Connection. List of connection options. For a WebSocket connection, the field must specify the value Upgrade.
- Upgrade. Indicates that the connection should be upgraded to a specific protocol. For a WebSocket connection, the field must specify the value websocket, which is matched case-insensitively.
- Sec-WebSocket-Key. A Base64-encoded, randomly generated 16-byte value. The server derives its response from this value to prove that it received a valid WebSocket handshake.
- Sec-WebSocket-Version. The protocol version, which currently must be 13.
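The header fields above can be assembled into an opening handshake roughly as follows. This is a minimal sketch, not a full client: the function name `build_opening_handshake`, the host `example.com` and the path `/chat` are illustrative assumptions, and a real request may carry further headers such as Origin.

```python
import base64
import os
from typing import Tuple


def build_opening_handshake(host: str, path: str = "/") -> Tuple[str, str]:
    """Return (request_text, key) for a client opening handshake.

    Hypothetical helper for illustration; host and path are placeholders.
    """
    # Sec-WebSocket-Key is a random 16-byte value, Base64-encoded.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    # HTTP header lines end with CRLF; an empty line ends the header block.
    return "\r\n".join(lines) + "\r\n\r\n", key


request, key = build_opening_handshake("example.com", "/chat")
print(request)
```

The key is returned alongside the request text because the client needs it later to check the server's reply.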
After receiving the opening handshake, the server returns a responding handshake, an HTTP response with status code 101 (Switching Protocols), that completes the switch to the WebSocket protocol. That handshake includes the Sec-WebSocket-Accept header field. The field’s value is a Base64-encoded SHA-1 hash derived from the Sec-WebSocket-Key value specified in the opening handshake. When the client finds the expected value, it knows that the server has accepted the connection.
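As a sketch of how the accept value is computed: RFC 6455 specifies that the server appends a fixed GUID to the received key, takes the SHA-1 hash of the result, and Base64-encodes the digest. The function names below are illustrative assumptions; the sample key and expected output are the example given in the RFC.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the accept-value computation.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def build_server_handshake(sec_websocket_key: str) -> str:
    """Return a minimal 101 response (hypothetical helper, no extensions)."""
    lines = [
        "HTTP/1.1 101 Switching Protocols",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Accept: {compute_accept(sec_websocket_key)}",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


# Sample key from RFC 6455.
print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

A client can run the same computation on the key it sent and compare the result with the header it receives.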
After the handshakes complete successfully, the client and server can exchange data, which is transmitted as WebSocket messages. Each message is made up of one or more frames that contain the same type of data. At their highest level, message frames fall into one of three categories:
- Textual data frames. Frames containing textual data interpreted according to the UTF-8 encoding standard.
- Binary data frames. Frames containing binary data whose interpretation is left up to the application.
- Control frames. Frames used for protocol-level signaling, such as to ping the other endpoint or specify that the connection should be closed.
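On the wire, a frame's category is identified by a 4-bit opcode in its first byte. The sketch below maps the opcodes defined in RFC 6455 to the three categories and encodes a simple unmasked frame. It is illustrative only: the function names are assumptions, and it covers only server-to-client frames, because frames sent by a client must additionally be masked.

```python
import struct

# Opcodes defined in RFC 6455.
OPCODE_CONTINUATION = 0x0
OPCODE_TEXT = 0x1
OPCODE_BINARY = 0x2
OPCODE_CLOSE = 0x8
OPCODE_PING = 0x9
OPCODE_PONG = 0xA


def frame_category(opcode: int) -> str:
    """Map an opcode to the category names used in the text."""
    if opcode == OPCODE_TEXT:
        return "text"
    if opcode == OPCODE_BINARY:
        return "binary"
    if opcode in (OPCODE_CLOSE, OPCODE_PING, OPCODE_PONG):
        return "control"
    if opcode == OPCODE_CONTINUATION:
        return "continuation"  # later fragment of a text or binary message
    raise ValueError(f"reserved opcode: {opcode:#x}")


def encode_unmasked_frame(opcode: int, payload: bytes, fin: bool = True) -> bytes:
    """Encode one unmasked frame (the form a server sends to a client)."""
    if opcode >= 0x8 and (len(payload) > 125 or not fin):
        raise ValueError("control frames must be unfragmented and at most 125 bytes")
    first = (0x80 if fin else 0x00) | opcode  # FIN bit plus opcode
    length = len(payload)
    if length <= 125:
        header = bytes([first, length])
    elif length <= 0xFFFF:
        header = bytes([first, 126]) + struct.pack("!H", length)  # 16-bit length
    else:
        header = bytes([first, 127]) + struct.pack("!Q", length)  # 64-bit length
    return header + payload


frame = encode_unmasked_frame(OPCODE_TEXT, "Hello".encode("utf-8"))
print(frame.hex())  # 810548656c6c6f
```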
When the WebSocket protocol is used for communications, the web server is aware of all WebSocket connections and can communicate with each individually. Communication can be initiated at either endpoint, which facilitates event-driven web programming. In standard HTTP, only clients can request new data.
While the connection remains open, the server and client can send messages to each other anytime until one of them closes the session. To close the connection, either endpoint starts a closing handshake by sending a Close control frame, and the other endpoint replies with a Close frame of its own.
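A Close frame may carry a payload that starts with a 2-byte status code in network byte order, optionally followed by a UTF-8 reason string. The following is a minimal sketch of building and reading such a payload; the function names are assumptions, and the frame shown is unmasked, as a server would send it. Status code 1000 indicates a normal closure.

```python
import struct
from typing import Optional, Tuple

OPCODE_CLOSE = 0x8  # opcode for Close frames in RFC 6455


def build_close_frame(code: int = 1000, reason: str = "") -> bytes:
    """Build an unmasked Close frame carrying a status code and optional reason."""
    payload = struct.pack("!H", code) + reason.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("control frame payload must be at most 125 bytes")
    # 0x80 sets the FIN bit; control frames are never fragmented.
    return bytes([0x80 | OPCODE_CLOSE, len(payload)]) + payload


def parse_close_payload(payload: bytes) -> Tuple[Optional[int], str]:
    """Return (status_code, reason); the code is None when the payload is empty."""
    if not payload:
        return None, ""
    if len(payload) < 2:
        raise ValueError("close payload must be empty or at least 2 bytes")
    (code,) = struct.unpack("!H", payload[:2])
    return code, payload[2:].decode("utf-8")


print(build_close_frame(1000).hex())  # 880203e8
```

An endpoint that receives a Close frame would typically echo the status code back in its own Close frame before the TCP connection is shut down.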