Spoofing TLS fingerprinting

New security challenges lead to the evolution of anti-bot measures, and sometimes even legitimate scrapers get caught in the crossfire. One such technique is TLS fingerprinting: a subtle, not yet widely known, and quite effective method of identifying the client software that connects to a server. This article explains why it is hard to bypass and what you can do to keep scraping the data you need.

What is TLS?

To understand TLS fingerprinting, let’s first look at TLS itself. TLS, or Transport Layer Security (the successor to SSL, Secure Sockets Layer), is a protocol that encrypts the connection between your device and a server. HTTP runs on top of it, which is what turns HTTP into HTTPS. When you see a lock icon next to a URL, or a website’s address starts with https, TLS is active. Today, it’s hard to find a website that doesn’t use TLS. If you visit a site whose address starts with plain “http” (without the “s”), your browser will typically mark the connection as not secure, and some browsers show a warning before loading the page at all.

TLS encrypts your data before sending it to the target server, turning it into what looks like a string of random symbols (such as 9823h4brjhvbi), so even if malicious actors intercept it, they can’t read or use it. However, before any data is sent, the client must make sure it is talking to the legitimate server, and both sides must agree on how the data will be encrypted and decrypted. That’s why TLS first establishes a secure session: the client exchanges a series of messages with the server to verify the server’s identity, agree on an encryption method, and generate shared session keys that protect the data for the rest of the communication.

The entire process is referred to as a TLS handshake, which begins with a “ClientHello” message. This message is what interests us in terms of TLS fingerprinting, as it is the source of data that servers then use to identify you.  
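To see a real ClientHello without any extra tools, here is a minimal Python sketch that captures the one Python’s own ssl module would send, without opening a network connection. The TLS engine writes into an in-memory buffer (ssl.MemoryBIO), and the handshake stops as soon as the client waits for the server’s reply. The function name and the example.com hostname are illustrative assumptions.

```python
import ssl


def capture_client_hello(hostname: str = "example.com") -> bytes:
    """Return the raw ClientHello that Python's ssl module would send to `hostname`.

    No network connection is made: the TLS engine writes into an in-memory
    buffer, and the handshake stops once the client needs the server's reply.
    """
    ctx = ssl.create_default_context()
    incoming = ssl.MemoryBIO()  # data "from the server"; stays empty here
    outgoing = ssl.MemoryBIO()  # data the client wants to send
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # expected: ClientHello is written, client now waits for ServerHello
    return outgoing.read()


if __name__ == "__main__":
    hello = capture_client_hello()
    # Byte 0 is the record type (0x16 = handshake),
    # byte 5 is the handshake message type (0x01 = ClientHello).
    print(len(hello), hex(hello[0]), hex(hello[5]))
```

Because the ClientHello is not encrypted, the requested hostname appears in the captured bytes in plain text.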

What does the “ClientHello” message consist of?

Because the “ClientHello” message contains no sensitive data, such as credentials, and is sent unencrypted, you can easily capture it and view its contents with network analysis tools like Wireshark. It contains the following fields:

  • TLS version – the highest version the client supports. (In TLS 1.3, this field stays at TLS 1.2 for compatibility, and the versions the client actually supports are listed in the supported_versions extension.)
  • Random – a 32-byte random value generated by the client.
  • Session ID – used to resume an earlier session. It is empty on a first handshake, although TLS 1.3 clients often fill it with a random value for compatibility with middleboxes.
  • Cipher suites – the encryption algorithms the client supports, listed in order of preference.
  • Compression method – today it is almost always 0 (no compression).
  • Supported extensions – additional features the client may want to use, for example:
      • Server Name Indication (SNI) – specifies the hostname the client is requesting.
      • Application-Layer Protocol Negotiation (ALPN) – lets the client and server agree, during the handshake, on which application protocol will run over TLS. Web browsers typically offer HTTP/2 (h2) and HTTP/1.1.
      • Signature algorithms – the algorithms the client accepts for verifying the server’s identity (its certificate and handshake signatures).
      • Supported groups (elliptic curves) – the elliptic curves and other groups the client can use for key exchange, such as X25519 or P-256, in order of preference.
      • Elliptic curve point formats – how points on those curves may be encoded.

There are other extensions as well. The message does not name the TLS library the client uses, but because each library (and each browser) builds its ClientHello differently, the combination of fields effectively reveals it. With all of that, a website can infer a significant amount about you from a “ClientHello” message, including your software stack (for example, the browser family or HTTP library) and sometimes your device type and security posture.
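To make the field list above concrete, here is a minimal Python sketch that parses those fields out of a raw ClientHello. It assumes the whole message fits in a single TLS record and does no error recovery; the function names are illustrative. For a self-contained demo, it captures Python’s own ClientHello offline via ssl.MemoryBIO.

```python
import ssl
import struct


def capture_client_hello(hostname: str = "example.com") -> bytes:
    """Offline capture of the ClientHello that Python's ssl module would send."""
    ctx = ssl.create_default_context()
    incoming, outgoing = ssl.MemoryBIO(), ssl.MemoryBIO()
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass  # ClientHello written; the client now waits for the server
    return outgoing.read()


def u16(buf: bytes, pos: int) -> int:
    """Read a big-endian 16-bit integer (TLS uses network byte order)."""
    return struct.unpack_from("!H", buf, pos)[0]


def parse_client_hello(record: bytes) -> dict:
    """Extract the main ClientHello fields from a single TLS record."""
    if record[0] != 0x16 or record[5] != 0x01:
        raise ValueError("not a ClientHello handshake record")
    pos = 9  # skip record header (5 bytes) and handshake header (4 bytes)

    version = u16(record, pos)
    pos += 2
    random = record[pos:pos + 32]
    pos += 32
    session_id = record[pos + 1:pos + 1 + record[pos]]  # 1-byte length prefix
    pos += 1 + len(session_id)

    cs_len = u16(record, pos)  # cipher suites: 2-byte length, 2 bytes per suite
    ciphers = [u16(record, pos + 2 + i) for i in range(0, cs_len, 2)]
    pos += 2 + cs_len

    comp_len = record[pos]
    compression = list(record[pos + 1:pos + 1 + comp_len])
    pos += 1 + comp_len

    # Extensions: total length, then (type, length, data) entries in wire order.
    end = pos + 2 + u16(record, pos)
    pos += 2
    extensions = {}
    while pos < end:
        ext_type, ext_len = u16(record, pos), u16(record, pos + 2)
        extensions[ext_type] = record[pos + 4:pos + 4 + ext_len]
        pos += 4 + ext_len

    groups, point_formats = [], []
    if 10 in extensions:  # supported_groups, formerly called "elliptic_curves"
        data = extensions[10]
        groups = [u16(data, 2 + i) for i in range(0, u16(data, 0), 2)]
    if 11 in extensions:  # ec_point_formats: 1-byte length, then 1-byte values
        data = extensions[11]
        point_formats = list(data[1:1 + data[0]])

    return {
        "version": version,
        "random": random,
        "session_id": session_id,
        "ciphers": ciphers,
        "compression": compression,
        "extensions": list(extensions),  # extension types, in the order sent
        "groups": groups,
        "point_formats": point_formats,
        "sni_present": 0 in extensions,  # extension type 0 = server_name
    }


if __name__ == "__main__":
    for key, value in parse_client_hello(capture_client_hello()).items():
        print(key, value)
```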

JA3 fingerprinting method

Alright, a server has received a “ClientHello” message. What’s next? This is where the JA3 technique comes into play. Developed and open-sourced by Salesforce, it is one of the most widely used TLS fingerprinting methods. It focuses on five fields of the message: TLS version, cipher suites, extensions, supported groups (elliptic curves), and elliptic curve point formats. Each value is written as a decimal number; values within a field are joined with hyphens, and the five fields are joined with commas. The resulting string is long, so for convenience it is hashed with MD5, which produces a fixed 128-bit hash (32 hexadecimal characters). The server, or the anti-bot system in front of it, then compares this hash with known fingerprints, usually from an internal database and sometimes from public ones.
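The JA3 recipe described above fits in a few lines of Python. This is a sketch, not Salesforce’s reference implementation, but it follows the published JA3 format, including one detail not mentioned above: GREASE values (RFC 8701), placeholder values that clients such as Chrome insert so servers stay tolerant of unknown values, are dropped before hashing. The field values in the example are made up for illustration.

```python
import hashlib
from typing import Iterable


def is_grease(value: int) -> bool:
    """GREASE values (RFC 8701) have the form 0x?A?A with both bytes equal: 0x0A0A ... 0xFAFA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)


def _join(values: Iterable[int]) -> str:
    """Decimal values joined by hyphens, skipping GREASE placeholders."""
    return "-".join(str(v) for v in values if not is_grease(v))


def ja3_string(version: int, ciphers: Iterable[int], extensions: Iterable[int],
               groups: Iterable[int], point_formats: Iterable[int]) -> str:
    """The five JA3 fields, separated by commas (an empty field stays empty)."""
    return ",".join([str(version), _join(ciphers), _join(extensions),
                     _join(groups), _join(point_formats)])


def ja3_hash(ja3: str) -> str:
    """The JA3 fingerprint: MD5 of the JA3 string, 128 bits as 32 hex characters."""
    return hashlib.md5(ja3.encode("ascii")).hexdigest()


if __name__ == "__main__":
    s = ja3_string(
        771,                              # 0x0303, the ClientHello version field
        [0x0A0A, 4865, 4866, 49195],      # cipher suites (first one is GREASE)
        [0x1A1A, 0, 23, 65281, 10, 11],   # extension types, in the order sent
        [0x2A2A, 29, 23, 24],             # supported groups: X25519, P-256, P-384
        [0],                              # point formats: uncompressed
    )
    print(s)  # 771,4865-4866-49195,0-23-65281-10-11,29-23-24,0
    print(ja3_hash(s))
```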

The goal is to spot differences: does this client look like a typical web browser, or is there something suspicious about it? The catch for scrapers is that a regular browser, such as Chrome, and a programming library used for web scraping build different ClientHello messages and therefore produce different hashes. And because JA3 considers only a few fields, mainstream browsers share a relatively small number of well-known fingerprints, so a client whose hash isn’t among them, or matches a known scraping library, is relatively easy to tell apart from a regular user.

This makes TLS fingerprinting effective against scrapers and a serious headache for anyone who relies on web scraping.

There are databases, including ones maintained by anti-bot services, that collect known JA3 fingerprints. You can calculate your client’s fingerprint and check whether it matches a common browser or is flagged as a bot, which helps you avoid getting blocked.
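The lookup itself is simple. The Python sketch below shows the kind of check an anti-bot system might run, and which you can run against your own client’s hash: is the fingerprint known, and does it belong to the browser the User-Agent claims to be? The allowlist contents, the browser-family labels, and the substring test on the User-Agent are simplifying assumptions, not any real vendor’s logic.

```python
from typing import Mapping


def classify_client(ja3_hash: str, user_agent: str, known: Mapping[str, str]) -> str:
    """Classify a connection by its JA3 hash and the User-Agent it sends.

    `known` maps JA3 hashes to a browser family name (hypothetical data).
    """
    family = known.get(ja3_hash)
    if family is None:
        return "unknown fingerprint"  # not a browser we recognize
    if family.lower() not in user_agent.lower():
        return "mismatch"  # e.g. a Firefox hash sent with a non-Firefox User-Agent
    return "consistent"


if __name__ == "__main__":
    # Placeholder keys stand in for real 32-character JA3 hashes.
    known = {"<hash-of-a-firefox-profile>": "Firefox"}
    ua = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"
    print(classify_client("<hash-of-a-firefox-profile>", ua, known))  # consistent
    print(classify_client("<hash-of-a-firefox-profile>", "python-requests/2.32", known))  # mismatch
    print(classify_client("<some-other-hash>", ua, known))  # unknown fingerprint
```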

JA3 hashes are not the only thing that can give a scraper away, though. Individual values in the ClientHello message may also trigger anti-bot systems. For example:

  • A missing SNI extension may be a red flag, as browsers send it whenever they connect to a hostname.
  • A missing ALPN extension, or one that offers only HTTP/1.1, may look suspicious, as modern browsers also offer HTTP/2.
  • The elliptic curves (supported groups) offered by browsers like Chrome differ from those that programming libraries offer by default, which may raise suspicion.
  • The cipher suite list is ordered by priority and should match that of a common web browser, including the order. The same goes for extensions, although recent Chrome versions randomize the extension order, so detection systems may compare extensions as a set instead.
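The checks in this list can be expressed as simple rules. Below is a minimal Python sketch that compares a parsed ClientHello with a reference browser profile. The dictionary keys, the reference values, and the choice to compare extensions as a set (recent Chrome versions shuffle their extension order) are assumptions for illustration; real anti-bot systems weigh many more signals.

```python
from typing import Any, List, Mapping


def red_flags(client: Mapping[str, Any], reference: Mapping[str, Any]) -> List[str]:
    """Return reasons a ClientHello looks unlike the reference browser profile."""
    flags = []
    if not client.get("sni"):
        flags.append("no SNI")
    if "h2" not in client.get("alpn", []):
        flags.append("ALPN does not offer HTTP/2")
    if list(client["groups"]) != list(reference["groups"]):
        flags.append("supported groups differ from the reference browser")
    if list(client["ciphers"]) != list(reference["ciphers"]):
        flags.append("cipher suites or their order differ")  # order matters here
    if set(client["extensions"]) != set(reference["extensions"]):
        flags.append("extension set differs")  # compared as a set: order may be shuffled
    return flags


if __name__ == "__main__":
    # Hypothetical reference profile and client values (IANA code points).
    browser = {"ciphers": [4865, 4866, 4867], "groups": [29, 23, 24],
               "extensions": [0, 10, 11, 13, 16, 23, 43, 51]}
    script = {"sni": "example.com", "alpn": ["http/1.1"],
              "ciphers": [4866, 4867, 4865], "groups": [29, 23, 24],
              "extensions": [0, 10, 11, 13, 16, 43, 51]}
    for flag in red_flags(script, browser):
        print(flag)
```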

Ways to bypass TLS fingerprinting 

When it comes to TLS fingerprinting, your goal is to make your scraper’s fingerprint match that of a common web browser. It’s not an easy task, but there are several approaches you can try, and you may want to combine some of them.

  • Use real browsers through automation tools

When you drive a browser with automation tools like Puppeteer, Playwright, or Selenium, the connection is made by a real browser, so you get an authentic TLS fingerprint. Running the browser in headless mode doesn’t change its ClientHello, so at the TLS level a website cannot tell a regular browser from a headless one. That’s one loophole less, although other signals can still give headless browsers away.
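For example, fetching a page through Playwright’s Python API looks roughly like this. It is a minimal sketch that assumes Playwright and its bundled Chromium are installed (pip install playwright, then playwright install chromium); the function name is hypothetical, and the import sits inside the function so the module still loads where Playwright is missing.

```python
def fetch_with_real_browser(url: str, timeout_ms: int = 30_000) -> str:
    """Load `url` in headless Chromium and return the rendered HTML.

    The TLS handshake is performed by Chromium itself, so the ClientHello
    is the browser's own rather than that of a Python HTTP library.
    """
    from playwright.sync_api import sync_playwright  # lazy import: optional dependency

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            page.goto(url, timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()
```

Passing channel="chrome" to launch() uses an installed Google Chrome instead of the bundled Chromium, which may bring the fingerprint even closer to what regular users present.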

  • Use a collection of browsers and operating system versions

This approach spreads your connections across multiple genuine fingerprints, which is helpful if you run a large web scraping project.

  • Use a libcurl-based HTTP client

You can switch such clients to curl-impersonate, a modified build of curl and libcurl that adjusts the TLS (and HTTP/2) fingerprint to match that of a regular web browser. Libraries built on libcurl include Typhoeus (Ruby), Guzzle (PHP), PycURL (Python), and the curl and crul packages (R).
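In Python, a convenient route is curl_cffi, a third-party binding to curl-impersonate with a requests-like interface (it is not one of the libraries listed above). The sketch below assumes curl_cffi is installed; the function name is hypothetical, and the accepted impersonation targets (such as "chrome" or versioned names) depend on the curl_cffi release, so check its documentation.

```python
def fetch_impersonated(url: str, target: str = "chrome") -> str:
    """Fetch `url` with a browser-like TLS and HTTP/2 fingerprint via curl_cffi.

    `target` names the browser profile to impersonate; valid values depend on
    the installed curl_cffi version.
    """
    from curl_cffi import requests as cffi_requests  # lazy import: optional dependency

    response = cffi_requests.get(url, impersonate=target, timeout=30)
    response.raise_for_status()
    return response.text
```

Calling fetch_impersonated("https://example.com") should then present a Chrome-like ClientHello instead of the one plain libcurl would send.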

  • Customize the TLS stack

If you use Go, you are in luck: it is one of the languages with mature libraries for TLS spoofing, such as Refraction Networking’s utls, ja3transport, and CycleTLS.

Things are less convenient in Java or Python. In Java, you can reconfigure the list of enabled cipher suites, for example with SSLParameters.setCipherSuites() or SSLSocket.setEnabledCipherSuites(). In Python, about the most you can do with requests and httpx is configure the cipher suites and TLS versions. Spoofing those values may help you avoid bans, but your fingerprint still won’t look fully authentic, because the rest of the ClientHello (such as the extensions and their order) stays under the control of the underlying TLS library.
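As a rough illustration of the Python side, the sketch below builds an ssl.SSLContext with a restricted TLS version range and a custom TLS 1.2 cipher order. The cipher list is a hypothetical browser-like choice, not a copy of any real browser’s, and the function name is illustrative. Note that set_ciphers() only affects TLS 1.2 and older; Python’s ssl module offers no control over TLS 1.3 suites or extension order.

```python
import ssl

# Hypothetical browser-like TLS 1.2 preference order (OpenSSL cipher names).
CIPHERS_TLS12 = ":".join([
    "ECDHE-ECDSA-AES128-GCM-SHA256",
    "ECDHE-RSA-AES128-GCM-SHA256",
    "ECDHE-ECDSA-AES256-GCM-SHA384",
    "ECDHE-RSA-AES256-GCM-SHA384",
    "ECDHE-ECDSA-CHACHA20-POLY1305",
    "ECDHE-RSA-CHACHA20-POLY1305",
])


def browser_like_context() -> ssl.SSLContext:
    """SSLContext with a restricted version range and a custom TLS 1.2 cipher order."""
    ctx = ssl.create_default_context()  # keeps certificate verification on
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.maximum_version = ssl.TLSVersion.TLSv1_3
    ctx.set_ciphers(CIPHERS_TLS12)  # TLS 1.3 suites are not affected
    return ctx


if __name__ == "__main__":
    for cipher in browser_like_context().get_ciphers():
        print(cipher["protocol"], cipher["name"])
```

With httpx, the context can be passed as httpx.Client(verify=browser_like_context()); with requests, it is commonly injected through a custom HTTPAdapter that hands ssl_context to urllib3’s pool manager.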

  • Rotate other elements 

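When all your traffic shares the same TLS fingerprint (even a genuine-looking one), it may look suspicious. Rotate what can be rotated: cookies, User-Agent strings, other headers, and IP addresses. Note that these are HTTP- and network-level details, so rotating them does not change your ClientHello or its JA3 hash. To vary the TLS fingerprint itself, rotate among several genuine browser and operating system combinations, and keep each User-Agent consistent with the TLS fingerprint it is paired with, since a mismatch is an easy red flag. Anti-detection browsers and high-quality proxies can help you with this.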
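One way to keep the rotated details consistent is to rotate whole client profiles rather than individual values. The Python sketch below is a minimal round-robin rotator; the ClientProfile fields, the idea of pairing a TLS impersonation target with a matching User-Agent and proxy, and all example values are hypothetical.

```python
import itertools
from dataclasses import dataclass
from typing import Iterator, Sequence


@dataclass(frozen=True)
class ClientProfile:
    """Everything that should change together when switching identities."""
    tls_target: str  # which browser's TLS fingerprint to present
    user_agent: str  # must name the same browser as tls_target
    proxy: str       # exit IP for this profile (hypothetical URL)


class ProfileRotator:
    """Hands out profiles round-robin so traffic spreads across identities."""

    def __init__(self, profiles: Sequence[ClientProfile]) -> None:
        if not profiles:
            raise ValueError("at least one profile is required")
        self._cycle: Iterator[ClientProfile] = itertools.cycle(profiles)

    def next_profile(self) -> ClientProfile:
        return next(self._cycle)


if __name__ == "__main__":
    # User-Agent strings are abbreviated placeholders.
    rotator = ProfileRotator([
        ClientProfile("chrome", "Mozilla/5.0 ... Chrome/...", "http://proxy-a.example:8080"),
        ClientProfile("firefox", "Mozilla/5.0 ... Firefox/...", "http://proxy-b.example:8080"),
    ])
    for _ in range(3):
        print(rotator.next_profile().tls_target)  # chrome, firefox, chrome
```

In a real scraper, each profile would also keep its own cookie jar and header set, so that every identity’s requests stay internally consistent.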

Final word

Last but not least, bypassing TLS fingerprinting can be a challenge, because every website applies its own detection rules, criteria, and fingerprint databases to allow or restrict access. To spoof TLS fingerprints successfully, study each website individually: find out what triggers its fingerprinting mechanisms and which signals it checks.

Moreover, websites today combine network-level signals, such as JA3 fingerprints, with browser-level signals like fonts and screen resolution, so relying solely on TLS spoofing won’t be enough; every detail of your scraper needs to look genuine. Also, review your setup regularly: websites frequently update their policies and detection schemes, and new anti-bot approaches keep emerging, so you need to keep up to keep getting accurate data.

Of course, you should always stay within allowed limits, scrape only publicly available data, avoid overloading target websites, and use trustworthy tools, such as legally sourced proxies. DataImpulse can certainly help with the last part: press the “Try now” button or contact us by email if you have any questions.

Jennifer R.

Content Editor

Content Manager at DataImpulse. Jennifer’s degree in philology and translation and several years of experience in content writing help her create easy-to-understand copy, even on tangled tech topics. With every text, her goal is to provide an in-depth look at the topic and answer every possible question. Subscribe to our newsletter to stay up to date on the best technologies for your business.