The Web
Security model & threats
Paul Krzyzanowski
November 13, 2024
Introduction
When the web browser was first created, it was relatively simple: it parsed static content for display and presented it to the user. The only page layout options supported were headings, paragraphs, and lists. The content could contain links to other pages. Images were the only media that could be included in the text. Even tables weren’t supported.
As such, the browser was not an interesting security target. Any dynamic modification of pages was done on servers and all security attacks were focused on those servers. These attacks included things such as using malformed URLs, buffer overflows, root paths, and Unicode characters.
The situation is vastly different now. Browsers have become incredibly complex as more features were introduced:
- JavaScript to execute arbitrary downloaded code.
- The Document Object Model (DOM), which allows JavaScript code to change the content and appearance of a web page, along with Cascading Style Sheets (CSS), which define the formatting properties of each HTML element in the document.
- XMLHttpRequest, which enables JavaScript to make HTTP requests back to the server and fetch content asynchronously.
- WebSockets, which use the existing TCP connection between the browser and server to allow JavaScript on the page to send and receive data without the need to send HTTP requests and wait for responses.
- Multimedia support: HTML5 added direct support for <audio>, <video>, and <track> tags, as well as MediaStream recording of both audio and video and even speech recognition and synthesis.
- Access to on-device sensors, including geolocation and device tilt. Programs can call the getCurrentPosition method to get the user’s current location (a brief sketch follows this list).
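For example, a page’s JavaScript can request the device’s location through the geolocation API. This is a minimal sketch; the browser prompts the user for permission before the success callback ever runs.
// Ask the browser for the device's current position (requires user permission).
navigator.geolocation.getCurrentPosition(
  (position) => {
    console.log("latitude:", position.coords.latitude,
                "longitude:", position.coords.longitude);
  },
  (error) => console.error("location request failed:", error.message)
);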
The computing environment available to web pages changed as well. The introduction of JavaScript into the browser gave the browser the ability to process data on the client rather than simply submitting forms to the server.
JavaScript appeared early in the history of browsers (late 1995 in Netscape Navigator) but the versions on different browsers – Netscape’s SpiderMonkey and Microsoft’s JScript – were not always compatible. The language got standardized with the skin-disease-sounding name of ECMAScript and continued to evolve. Late 1999 saw the introduction of regular expressions and numeric data formatting. The 2004 version supported XMLHttpRequest, classes, packages, type definitions, and many features to turn it into a viable language for larger projects. By 2015, JavaScript finally evolved into a general-purpose programming language (typed arrays, classes, module syntax, etc.).
JavaScript remains an interpreted language. Source code files are specified in the HTML content of a web page and are downloaded from servers. Despite optimizations, there are times when web apps want better performance.
Google designed the Chromium Native Client, NaCl, to allow browsers to run native applications securely on Chrome. Unlike other browser plug-ins for running apps, NaCl assumes that all downloaded code may be malicious and thus runs all code in a sandbox managed by the browser. Use of NaCl has been limited since the modules are only distributed through the Chrome web store and run on Chrome browsers. Users of other browsers are left out and developers may not be interested unless they can reach a broader audience.
In a quest to provide a high-performing but portable environment for running binary code within a browser, WebAssembly (Wasm) was developed. WebAssembly is a portable binary format for code compiled from various high-level languages, like C and C++, that targets a processor virtual machine: a simple stack-based architecture that can be translated to native code with a just-in-time or ahead-of-time compiler in the browser. It uses a sandboxed architecture, a limited API, and no direct access to system calls. WebAssembly’s design was finalized in 2017 and it became a standards recommendation in 2019. All four major browsers support WebAssembly.
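From JavaScript, loading and running a WebAssembly module takes only a few lines. This is a minimal sketch that assumes the server hosts a compiled module at /module.wasm exporting an add function; those names are illustrative, not part of any standard.
// Fetch, compile, and instantiate a WebAssembly module, then call an exported function.
WebAssembly.instantiateStreaming(fetch("/module.wasm"))
  .then(({ instance }) => {
    console.log(instance.exports.add(2, 3));   // call a function exported by the module
  });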
Aside from vulnerabilities that will no doubt be found and fixed over time, a concern for software delivered as NaCl or Wasm modules is that it becomes much more difficult to identify malware when it arrives as a precompiled module. One popular use of Wasm has been for malicious web pages to run cryptomining software on browsers much more efficiently than they could with JavaScript. The browser has no way of knowing what any code it downloads will do.
The interaction model of the web evolved from simple page presentation to running a complete client-server application. The introduction of a rich set of features and code execution capabilities provides a broader attack surface. The fact that many features are relatively new and more continue to be developed increases the likelihood of more bugs and therefore more vulnerabilities. Many browser features are complex, and browser developers won’t always pay attention to or implement every detail of the specs (see quirksmode.org). This leads to an environment where certain aspects of a feature may have bugs or security holes on certain browsers.
The web security model
Traditional software is installed as a single application. The application may use external libraries, but these are linked in by the author and tested. Web apps, on the other hand, dynamically load components from different places. These include fonts, images, scripts, and video, as well as embedded iFrames that embed HTML documents within each other, each of which can contain scripts and other content. JavaScript code may issue XMLHttpRequests to download new content or redirect a page to a different site.
One security concern is that of software stability. If you import JavaScript from several different places, will your page still display correctly and work properly in the future as those scripts are updated and web standards change? Do those scripts attempt to do anything malicious? Might they be modified by their authors to do something malicious in the future?
Then there’s the question of how elements on a page should be allowed to interact. Can analytics code access JavaScript variables that come from a script that was downloaded from jQuery.com on the same web page? The scripts came from different places, but the page author selected them for the page, so maybe it’s ok for them to interact. Should analytics scripts be permitted to interact with event handlers? If the author wanted to measure mouse movements and keystrokes, perhaps it’s ok for a downloaded script to use the event handler. How about embedded frames? To the user, the content within a frame looks like it is simply part of the rest of the page. Should scripts behave any differently?
These are some of the questions faced when constructing a security model for web apps. Before we get to the model, we need to first understand the concept of frames in a browser.
Frames and iFrames
A browser window may contain a collection of documents from different sources. Each document is rendered inside a frame. In the most basic case, there is just one frame: the document window.
A frame is a rigid division that is part of a frameset, a collection of frames. Frames are no longer supported in recent versions of the HTML standard but many browsers still implement support for them.
An iFrame is a floating inline frame that moves with the surrounding content. iFrames are supported and very widely used. When we talk about frames, we will be referring to the floating frames created with an iFrame tag.
Frames are generally invisible to users and are used to delegate screen area to content that is loaded from another source. Think of a frame as a web page within a web page.
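For example, a page can delegate a rectangle of its display to a document loaded from elsewhere with a single tag (the URL here is just a placeholder):
<iframe src="https://ads.example.com/banner.html" width="300" height="250"></iframe>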
A core goal of browser security is to isolate visits to separate pages in distinct windows or tabs. If you visit a.com and b.com in two separate tabs, the address bar will identify each of them and they will not be able to share information with each other. Alternatively, a.com may have frames defined within its page (e.g., to show ads from other sites) and b.com may be a frame within a.com. Here, too, we would like the browser to provide isolation between a.com and b.com even though b.com is not visible as a distinct site to the user. A script running in a.com should not read or manipulate data in b.com, and vice versa.
Same-origin policy
The security model used by web browsers is the same-origin policy. A browser permits scripts loaded from one page to interact only with data that was loaded from the same origin. An origin is defined to be the triple comprising the URI scheme (protocol, such as http vs. https), the hostname, and the port number:
- The protocol (scheme) has to be the same because a web server can be configured to provide different content for http vs. https.
- The hostname has to match because it is common for one web server to host content for multiple hostnames. Also, subdomains may resolve to different systems and thus run different servers. For instance, www.rutgers.edu may be a different host running different content from rutgers.edu.
- The port must also be the same since you can configure separate web servers to listen on different ports.
For example, https://www.poopybrain.com/419/test.html and https://www.poopybrain.com/index.html share the same origin because they both use https, both use port 443 (the default https port since no port is specified), and the same hostname (www.poopybrain.com). If any of those components were different, the origin would not be the same. For instance, www.poopybrain.com is not the same hostname as poopybrain.com.
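JavaScript’s built-in URL API exposes the origin directly, which makes the rule easy to check. A small sketch using the URLs above:
// Parse the two URLs and compare their origins (scheme + hostname + port).
const a = new URL("https://www.poopybrain.com/419/test.html");
const b = new URL("https://www.poopybrain.com/index.html");
console.log(a.origin === b.origin);   // true: same scheme, host, and port
console.log(a.origin === new URL("https://poopybrain.com/").origin);   // false: different hostname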
Under the same-origin policy, each origin has access to client-side resources, local to that origin, that include:
- Cookies: Key-value data that clients or servers can set. Cookies associated with the origin are sent with each HTTP request.
- JavaScript namespace: Any functions and variables defined or downloaded into a frame share that frame’s origin.
- DOM tree: This is the JavaScript definition of the HTML structure of the page.
- DOM storage: Local key-value storage.
Each browser window, frame, or object gets the origin of the URL that was used to load the object. Web pages will often embed frames as page content (and frames may embed other frames). Each embedded frame will not have the origin of the outer frame but rather that of the URL of the frame’s contents. Objects such as images, which may be loaded from other places, get the origin of the URL that was used to load them. Any JavaScript code downloaded into a frame will execute with the authority of its frame’s origin. For instance, if cnn.com loads JavaScript from jQuery.com, the script runs with the authority of cnn.com, meaning it can access stored cookies, set variables, or call JavaScript methods loaded from other servers within the page.
Passive content, which is non-executable content such as CSS files and images, has no authority. This normally should not matter since passive content does not contain executable code, but there have been attacks in the past that embedded code in passive content and turned that passive content active.
Cross-origin content
As we saw, it is common for a page to load content from multiple origins. The same-origin policy states that JavaScript code from anywhere runs with the authority of the frame’s origin. Content from other origins is not readable or writable by JavaScript.
JavaScript in two different frames cannot communicate under the same-origin policy. However, the postMessage mechanism can be used to allow a script to send a message to the parent window that could then be received by another frame. This requires both frames to be explicitly programmed to do this, so it cannot happen accidentally. For example:
Frame A:
window.parent.postMessage("message", "https://target-domain.com");
Frame B:
window.addEventListener("message", (event) => {
    if (event.origin === "https://trusted-domain.com") {
        console.log("Received:", event.data);
    }
});
The same-origin policy may have a few unexpected properties:
- A frame can load images from other origins but cannot inspect the image. However, it can infer the size of the image by examining the changes to surrounding elements after it is rendered.
- A frame may embed and use CSS (cascading stylesheets) files from any origin but cannot inspect the CSS content. However, JavaScript in the frame can discover what the stylesheet does by creating new DOM nodes (e.g., a heading tag) and seeing how the styling changes.
- A frame can load JavaScript, which executes with the authority of the frame’s origin. If the source is downloaded from another origin, it is executable but not readable. However, one can use JavaScript’s toString method to decompile the function and get a string representation of the function’s declaration.
These forms of enforcement of the same-origin policy seem somewhat odd since a curious user can download any of that content directly (e.g., via the curl command) and inspect it.
MIME sniffing attack
Passive content, such as images, videos, and stylesheets, is considered to have no authority because it cannot execute scripts or interact with the Document Object Model (DOM). This means it cannot directly alter a webpage’s behavior or compromise user data.
However, problems arise when browsers perform MIME sniffing, a feature that tries to guess the content type of a resource based on its actual content rather than its declared content (MIME) type.
Attackers can exploit this by serving malicious scripts disguised as passive content (e.g., declaring JavaScript as an image). If a browser incorrectly interprets the resource and executes it, the attacker can inject and execute malicious code, potentially leading to data theft, session hijacking, or other security breaches. To mitigate this, web servers should include headers like X-Content-Type-Options: nosniff, which instruct browsers not to perform MIME sniffing and to strictly interpret content based on its declared type.
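As a rough illustration, a server can attach that header to every response. This is a minimal sketch using Node.js; the port and the empty body are placeholders:
// Serve a response whose declared type must be honored: the nosniff header
// tells the browser not to second-guess the Content-Type by inspecting the body.
const http = require("http");
http.createServer((req, res) => {
  res.setHeader("X-Content-Type-Options", "nosniff");
  res.setHeader("Content-Type", "image/png");
  res.end();   // the image bytes would be written here
}).listen(8080);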
Cross-Origin Resource Sharing (CORS)
Even though content may be loaded from different origins, browsers restrict cross-origin HTTP requests that are initiated from scripts (e.g., via XMLHttpRequest or Fetch). This can be problematic at times since sites such as poopybrain.com and www.poopybrain.com are treated as distinct origins, as are http://poopybrain.com and https://poopybrain.com.
Cross-Origin Resource Sharing (CORS) was created to allow web servers to specify cross-domain access permissions. This allows scripts on a page to issue HTTP requests to approved sites. It also allows access to web fonts, inspectable images, and stylesheets. CORS is enabled by an HTTP header from the server that identifies allowable origins. For example, if the server at https://service.example.com sends content with an HTTP header that contains
Access-Control-Allow-Origin: http://www.example.com
then the browser will treat the URL http://www.example.com as having the same origin as the frame’s URL (https://service.example.com).
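From the browser’s side, a cross-origin request looks like any other fetch; whether the script may read the response depends on the server’s CORS header. A minimal sketch reusing the hypothetical hosts above (the path is illustrative):
// Script running on a page served from http://www.example.com.
// The read succeeds only if https://service.example.com responds with an
// Access-Control-Allow-Origin header that covers this page's origin.
fetch("https://service.example.com/api/data")
  .then((response) => response.json())
  .then((data) => console.log(data))
  .catch((err) => console.error("request blocked or failed:", err));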
Cookies
Cookies are name-value pairs designed to maintain state between a web browser and a server. Web browsers send all cookies that are applicable for a particular page with each HTTP request:
GET /mypage.html HTTP/2.0
Host: www.poopybrain.com
Cookie: username=paul; uid=501
Servers set cookies via an HTTP header as part of an HTTP response. For example, the headers
Set-Cookie: username=paul
Set-Cookie: uid=501
tell the browser to set two cookies: username=paul and uid=501.
Cookies are used for three purposes:
1. Session management (authentication cookies)
Cookies used for session management, often referred to as authentication cookies, identify a user’s login session.
When a user logs in, the server sends a cookie with a session ID to identify this logged-in user. This cookie is sent with every subsequent request from the browser to the web server so the server can identify the page with that user.
It allows sites such as Amazon, eBay, Facebook, and Instagram to not prompt you for logins whenever you visit their sites.
Cookies used for session management may also pass shopping cart identifiers even if a user isn’t logged in. That identifier identifies a shopping cart in a database and can be associated with the user when the user logs in.
2. Personalization
Cookies used for personalization can identify various user preferences. These preferences may specify font sizes, or types of content to present. Personalization cookies may also include data that will be pre-filled into web forms.
3. Tracking
Tracking cookies are used to monitor a user’s activity. If a browser doesn’t send a cookie on a page request, the server assumes this is the user’s first visit to the site so it creates a new cookie with a unique identifier for the user and sends that with the page contents. The server logs the page visit with that user’s identifier.
This tracking cookie will be sent by the browser every time a page from that web site is requested. The server can now track the requested URL and time of the request with the user ID it assigned to the cookie.
If the user is logged in, logs in later, or creates an account in the future, the server can then associate all the tracked data with that specific user.
Even though we may refer to cookies as authentication cookies or tracking cookies, they all use the same mechanism and syntax. It’s just a matter of how applications make use of them. Cookies can be one of two types: session or persistent (don’t confuse a session cookie with a cookie used for session management!).
Session cookies are stored in memory. They disappear when the browser exits. Persistent cookies are stored to disk and continue to exist when the browser restarts.
If a browser gets a Set-Cookie directive from a server and it does not contain an expiration date, then the cookie will be handled as a session cookie. For example:
Set-Cookie: name=paul
If the server attaches an expiration date to the Set-Cookie header, then the browser treats the cookie as a persistent cookie and saves its contents. For example:
Set-Cookie: name=paul; expires=Tue, 01 Apr 2025 17:30:00 GMT
Websites often present a checkbox on a login screen asking whether you want to save your login information. If you check the box, the server will simply add the expires option to the cookie.
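The same distinction shows up when a script sets cookies through document.cookie. A minimal sketch (the names and date are illustrative):
// Without an expiration date the cookie lives only for the browser session;
// adding "expires" (or "max-age") makes it a persistent cookie saved to disk.
document.cookie = "username=paul";                                    // session cookie
document.cookie = "uid=501; expires=Tue, 01 Apr 2025 17:30:00 GMT";   // persistent cookie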
Now the question is: which cookies should be sent to a server when a browser makes an HTTP request?
The scope of a cookie is defined by its domain and path. Unlike the same-origin policy, the scheme (http or https) is ignored by default, as is the port number. Unless otherwise defined by the server, the default domain and path are those of the URL in the request.
A client cannot set cookies for a different domain. A server, however, can specify top-level or deeper domains. Setting a cookie for a domain example.com will cause that cookie to be sent whenever example.com or any domain under example.com is accessed (e.g., www.example.com):
Set-Cookie: name=paul; domain=example.com
For the cookie to be accepted by the browser, the domain must include the origin domain of the frame. For instance, if you are on the page www.example.com, your browser will accept a cookie for example.com but will not accept a cookie for foo.example.com or for poopybrain.com.
The path is the path under the root URL, which is ignored for determining origins but is used with cookies. The browser does a character-by-character comparison of the path, and if the cookie’s path is a prefix of the requested URL’s path, then the cookie will be sent. For example:
Set-Cookie: name=paul; path=/
will send that cookie for any path under the root of the domain while
Set-Cookie: name=paul; path=/419
will send that cookie for any path whose name starts with the string “/419”.
Cookies often contain user names, complete authentication information, or shopping cart contents. If malicious code running on the web page could access those cookies or if an eavesdropper could see unencrypted cookies on the network, the attacker could modify your cart, get your login credentials, or even modify cookies related to cloud-based services to access your documents or email. This is a very real problem and two safeguards were put in place:
- A server can tag a cookie with an HttpOnly flag. This will not allow scripts on the page to access the cookie, so it is useful for keeping scripts from modifying or reading user identities or session state.
- HTTP messages are sent via TCP and the data is not encrypted. An attacker that has access to the data stream (e.g., via a man-in-the-middle attack or a packet sniffer) can freely read or even modify cookies. A Secure flag was added to cookies to specify that they can be sent only over an HTTPS connection, which uses TLS (Transport Layer Security) to encrypt content:
Set-Cookie: username=paul; path=/; HttpOnly; Secure
If a user is making requests via HTTP, Secure cookies will not be transmitted.
Third-party cookies: tracking cookies
Each cookie in a browser is associated with a domain and a path. If the domain of the cookie is the same as the domain of the web page that the user sees (the URL in the title bar), the cookie is a first-party cookie. The server hosting the page sets these first-party cookies.
Components loaded from other domains (images, scripts, or content in iFrames such as ads or social-media plugins) can also set cookies. These are called third-party cookies. Third-party cookies are usually used for tracking users and are called tracking cookies.
The server that sent the content along with the third-party cookie gets the cookie from your browser every time your browser sends an HTTP request to get content from that server. By assigning a unique ID to each cookie, servers can track your requests across multiple pages and multiple types of content.
For example, a Facebook Like button on rutgers.edu will have a facebook.com origin. When your browser requests this content from facebook.com, the request will contain any cookies it has for the facebook.com domain, which can include your Facebook session cookie and identify you. The Facebook server can track which web sites you visit that have any Facebook content, such as like buttons, images, or ads.
If you don’t have a Facebook account or are not logged in, the server can track you anonymously by creating a cookie with a unique ID for you that your browser will then send on every site you go to that has content from Facebook.
Browsers allow the blocking of third-party cookies but that can cause some components, like social media widgets, to not work properly.
Mixed HTTP and HTTPS content
A web page that was served via HTTPS might request content, such as a script, via a URL that specifies HTTP:
<script src="http://www.mysite.com/script.js"> </script>
The browser would follow the protocol (scheme) in the URL and download that content via HTTP rather than over the secure link. An active network attacker has the opportunity to eavesdrop on that session or hijack the session and modify the content. A safer approach is to not specify the scheme for same-source content. This directs the browser to request the content over the same protocol as its embedding frame.
<script src="//www.mysite.com/script.js"> </script>
Most browsers either disallow or warn of mixed content, but users may not be aware of the risks or understand what is really going on.
Web-based attacks
Malicious JavaScript
As the JavaScript environment evolved, various vulnerabilities have been discovered and fixed. Some of these enabled arbitrary command execution from the browser.
Malicious pages contain JavaScript code that will perform malicious functions, but malicious content can also be embedded in content that is served by a legitimate site. For example, an ad might be presented in an iFrame, which acts like an embedded web page. This ad will load its own JavaScript and other content. An accidental visit to a malicious page is called a drive-by download.
A drive-by download will typically run a script to redirect the page to load a new page from a malicious server. This new page will download an exploit kit that will probe the operating system and browser to determine their versions and possible vulnerabilities. The exploit kit then sends a request to its server to download a malware payload that is specifically designed to exploit those vulnerabilities.
Malicious JavaScript code will not always try to exploit vulnerabilities. Since JavaScript is a complete programming language, the code can perform a variety of operations that the user did not expect. For instance, it can present additional ads, mine cryptocurrency, fake clicks on ads, and activate likes for TikTok, Facebook, Instagram, and other social media.
Cross-site request forgery (CSRF)
Cross-site request forgery is an attack that gets a victim to send unauthorized requests to a web server for which the user has authentication cookies set.
Let’s consider an example from back when Netflix rented DVDs. You previously logged into Netflix. Because of that, the Netflix server sent an authentication cookie to your browser so you will not have to log in the next time you visit netflix.com. Now you happen to go to another website that contains a malicious link or JavaScript code to access a URL. The URL is:
http://www.netflix.com/JSON/AddToQueue?movieid=860103
By hitting this link on this other website, the attacker added Plan 9 from Outer Space to your movie queue (this attack really worked with Netflix but has been fixed). This may be a minor annoyance, but the same attack could create more malicious outcomes. Instead of Netflix, the attack could take place against an e-commerce site that accepts your credentials but allows the attacker to add a different shipping address in the URL. More dangerously, a banking site may use your stored credentials and account number (this was the case with ING bank). Getting you to simply click on a link or visit the wrong web page may enable the attacker to request a funds transfer to another account:
http://www.bank.com/action=transfer&amount=1000000&to_account=417824919
Note that the attack works because of how cookies work. You visited some website or clicked on a link you got via a text message or email. This directed your browser to another site. Your browser dutifully sends an HTTP GET request to that site to fetch the URL specified in the link and also sends all the cookies for that site. The attacker never steals your cookies and does not intercept any traffic. This attack is simply the creation of a URL that makes it look like you requested some action.
Preventing CSRF attacks
Cross-site request forgery attacks occur because the browser sends authentication cookies to the targeted web server. The victim can avoid these attacks by ensuring these cookies are not present. This can be done through the following actions:
- Log off sites when you’re done with them. Logging off will delete the authentication cookies.
- Do not allow browsers to store persistent authentication cookies. This means not selecting the remember me option that is presented on an authentication web page. Authentication cookies will then only be stored as session cookies.
There are several techniques that web server administrators can take to avoid CSRF attacks, including different forms of tokens. A few of these are:
- The server can create a unique random token (an anti-CSRF token) for each session. This token is sent to the server with each page request submitted by the user and verified by the server. The token must be sent through hidden fields or HTTP headers so that it will not be part of a URL.
- If it is possible for the server and browser to negotiate a shared key, then each browser request can contain a token that is an HMAC of the request and a timestamp. An attacker can create URLs containing commands but will not be able to forge a valid HMAC (a minimal sketch of this approach appears after this list).
- When you send a URL to a server, an Origin or Referer header identifies the URL of the page that issued the request. The server can validate that the Origin matches the server.
- The interaction with the server can use HTTP POST requests instead of GET requests, placing all parameters into the body of the request rather than in the URL. State information can be passed via hidden input fields instead of cookies. This doesn’t solve the problem but gives the attacker the challenge of getting the victim to click on a malicious web page that can run a script to post a request rather than simply present a URL that contains parameters for the desired action.
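Here is a minimal sketch of the HMAC-based token idea in JavaScript (Node.js). The key value and the request fields covered by the MAC are illustrative assumptions, not a specific standard.
const crypto = require("crypto");

// Secret shared between the server and the browser session (assumed to have been
// negotiated already; how that happens is outside this sketch).
const key = "negotiated-shared-secret";

// Compute an HMAC over the request and a timestamp.
function makeToken(method, path, timestamp) {
  return crypto.createHmac("sha256", key)
               .update(`${method} ${path} ${timestamp}`)
               .digest("hex");
}

// The server recomputes the HMAC and compares it to the token sent with the request.
function verifyToken(method, path, timestamp, token) {
  const expected = makeToken(method, path, timestamp);
  return expected.length === token.length &&
         crypto.timingSafeEqual(Buffer.from(expected), Buffer.from(token));
}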
Screen sharing
HTML5 added a screen-sharing API. This was designed with applications like video conferencing in mind, where you might want to share screen content. Normally, no cross-origin communication is permitted between client and server. The screen-sharing API violates this security model. If a user grants screen-sharing permission to a frame, the frame can take a screenshot of the entire display (the entire monitor, all windows, and the browser). It can also get screenshots of pages hidden by tabs in a browser.
This is not a security hole and there are no exploits (yet) to enable screen sharing without the user’s explicit opt-in. However, it is a security risk because the user might not be aware of the scope or duration of screen sharing. If you believe that you are sharing one browser window, you may be surprised to discover that the server was examining all your screen content.
Clickjacking
Clickjacking is a deception attack where the attacker overlays an image to have the user believe that he is clicking some legitimate link or image but is really requesting something else. For example, a site may present a “win a free iPad” image. However, malicious JavaScript in the page can place an invisible frame over this image that contains a link. Nothing is displayed to obstruct the “win a free iPad” image but when a user clicks on it, the link that is processed is the one in the invisible frame. This malicious link could download malware, change security settings for a browser plug-in, or confirm a bank transfer that was issued via a CSRF attack.
One defense for clickjacking is to use defensive JavaScript in the legitimate code to check that the content is at the topmost layer:
window.self == window.top
If it isn’t, then the content is obstructed, possibly by an invisible clickjacking attack. Another defense is to have the server send an X-Frame-Options HTTP header, which instructs the browser not to render the page inside a frame on another site.
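A minimal sketch of the JavaScript defense, sometimes called a frame-buster (the recovery behavior shown is one possible choice):
// If this document is not the topmost frame, it is being framed, possibly
// invisibly for clickjacking: blank the page and try to break out of the frame.
if (window.self !== window.top) {
    document.body.textContent = "";               // refuse to render inside a frame
    window.top.location = window.self.location;   // attempt to become the top-level page
}
The header defense is a single line in the HTTP response, such as X-Frame-Options: DENY.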
Input sanitization attacks
We saw how user input that becomes a part of database queries or commands can alter those commands and, in many cases, enable an attacker to add arbitrary queries or commands. This was the basis of code injection and command injection attacks.
The same applies to URLs, HTML content, and JavaScript. Any user input needs to be parsed carefully before it can be made part of a URL, HTML content, or JavaScript. Consider a script that is generated with some in-line data that came from a malicious user:
<script> var x = "untrusted_data"; </script>
The malicious user might define untrusted_data to be:
Hi"; </script> <h1> Hey, some text! </h1> <script> malicious code... x="Bye
The resulting script to set the variable x now becomes:
<script> var x = "Hi"; </script> <h1> Hey, some text! </h1> <script> malicious code... x="Bye"; </script>
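One common way to prevent this is to encode user-supplied data before it is placed into HTML or a script. A minimal JavaScript sketch (the function name is arbitrary):
// Escape HTML-significant characters so untrusted text cannot close a tag or
// start a script; the encoded string renders as plain text.
function escapeHtml(untrusted) {
  return untrusted
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#x27;");
}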
Injection attacks have been rated as one of the top three web application security risks by the OWASP Foundation.
SQL injection
We previously saw that SQL injection is an issue in any software that uses user input as part of an SQL query. Web browsers are quite possibly the dominant software for this attack vector. Many web services have databases behind them, and links often contain queries mixed with user input. If input is not properly sanitized, it can alter the SQL query to modify the database, bypass user authentication, or return the wrong data.
Suppose a web form collects a user name and password in the HTML variables uname and passwd. Code on the server stores them in the variables username and pwd and then composes an SQL query:
username = getRequestString("uname");
pwd = getRequestString("passwd");
query = 'select * from Users where name = "' + username + '" and pwd = "' + pwd + '"'
When a user supplies a name and password, a query such as this is created:
select * from Users where name = "ramesh" and pwd = "letmein"
The query selects a record from the Users table for a specific name and matching password. But if the user supplies a string such as this:
" or ""="
for the username and password, then this somewhat odd-looking query will be composed:
select * from Users where name = "" or ""="" and pwd = "" or ""=""
The expression or ""="" will always evaluate to true in SQL, so the query will return all rows from the Users table. If the return data is presented to the user, the user will see data for all users on the return page.
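The usual fix is to keep user input out of the query text entirely by using parameterized queries. A minimal sketch, assuming db is an already-opened connection from a driver (such as mysql2) whose query() accepts ? placeholders, and reusing getRequestString from the example above:
const username = getRequestString("uname");   // untrusted input
const pwd = getRequestString("passwd");       // untrusted input

// The values are passed separately from the SQL text, so input such as " or ""="
// is treated as literal data rather than query syntax.
db.query(
  "select * from Users where name = ? and pwd = ?",
  [username, pwd],
  (err, rows) => { /* rows contains only the matching user, never the whole table */ }
);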
Cross-site scripting
Cross-site Scripting (XSS) is a code injection attack that allows an attacker to inject client-side scripts into web pages. It can be used to bypass the same-origin policy and other access controls. Cross-site scripting remains one of the most popular browser attacks.
The attack may be carried out in two ways: via a URL that a user clicks on, which returns a page containing the malicious code, or by visiting a page that contains user-supplied content that may include scripts.
In a Reflected XSS attack, all malicious content is in a page request, typically a link that an unsuspecting user will click on. The server will accept the request without sanitizing the user input and present a page in response. This page will include that original content. A common example is a search page that will display the search string before presenting the results (or a “not found” message). Another example is an invalid login request that will return with the name of the user and a “not found” message.
Consider a case where the search string or the login name is not just a bunch of characters but the text of a script. The server treats it as a string, does the query, cannot find the result, and sends back a page that contains that string, which is now processed as inline JavaScript code.
www.mysite.com/login.asp?user=<script>malicious_code(…) </script>
In a Persistent XSS attack, user input is stored at a site and later presented to other users. Consider online forums or comment sections for news postings and blogs. If a user enters inline JavaScript as part of the posting, it will be placed into the page that the server constructs for any future people who view the article. The victim will not even have to click a link to run the malicious payload.
Cross-site scripting is a problem due to improper input sanitization. Servers will need to parse input that is expected to be a string to ensure that it does not contain embedded HTML or JavaScript. The problem is more challenging with HTML because of its support for encoded characters. A parser will need to check not only for “script” but also for “%3cscript%3e”. As we saw earlier, there may be several acceptable Unicode encodings for the same character.
With the ability to run arbitrary injected JavaScript code, cross-site scripting may be able to perform operations such as:
- Access cookies belonging to that website.
- Hijack a session with the site, taking advantage of the user’s authentication cookies.
- Create arbitrary HTTP requests with arbitrary content via XMLHttpRequest.
- Make arbitrary modifications to the HTML document by changing the DOM structure.
- Install keyloggers to capture user input.
- Download malware – or run JavaScript ransomware.
- Perform a phishing attack by manipulating the DOM to create a frame with content from the attacker’s server that asks for login credentials. Users will assume they are interacting with the trusted service.
The main defense against cross-site scripting is to sanitize all input. Some web frameworks do this automatically. For instance, Django templates allow the author to specify where generated content is inserted (for example, with syntax such as: <b> hello, {{name}} </b>) and perform the necessary sanitization to ensure it does not modify the HTML or add JavaScript.
Other defenses against cross-site scripting are:
- Use a less-expressive markup language for user input, such as Markdown, if you want to give users the ability to enter rich text. However, input sanitization is still needed to ensure there are no HTML or JavaScript escapes.
- Employ a form of privilege separation by placing untrusted content inside a frame with a different origin. For example, user comments may be placed in a separate domain. This does not stop XSS damage but limits it to that domain.
- Use the Content Security Policy (CSP). The content security policy was designed to defend against XSS and clickjacking attacks. It allows website owners to tell clients what content is allowed, whether inline code is permitted, and whether the origin should be redefined to be unique (an example header appears after this list).
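For example, a server might send a policy such as the one below; the script host is a placeholder. It limits content to the page’s own origin, allows scripts from one additional host, and forbids the page from being framed:
Content-Security-Policy: default-src 'self'; script-src 'self' https://scripts.example.com; frame-ancestors 'none'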
Homograph (or homoglyph) attacks
While we have been looking at issues resulting from Unicode, let us take a brief digression from system attacks and consider some deception attacks that are enabled by Unicode.
Unicode was designed to represent practically all of the world’s glyphs1 and contains over 128,000 characters. It includes scripts for Latin, Greek, Cyrillic, Armenian, Hebrew, Arabic, Syriac, Thaana, Devanagari, Bengali, Gurmukhi, Oriya, Tamil, Telugu, Kannada, Malayalam, Sinhala, Thai, Lao, Tibetan, Myanmar, Georgian, Hangul, Ethiopic, Cherokee, Canadian Aboriginal Syllabics, Khmer, Mongolian, Han (Japanese, Chinese, Korean ideographs), Hiragana, Katakana, and Yi, as well as emojis and ancient scripts.
If we consider the lowly slash character, there are several variations with different representations:
/ = solidus (slash) = U+002F
⁄ = fraction slash = U+2044
∕ = division slash = U+2215
̷ = combining short solidus overlay = U+0337
̸ = combining long solidus overlay = U+0338
／ = fullwidth solidus = U+FF0F
Only one of these is a valid pathname separator (the solidus). Using others will create strings that look like pathnames but are not. Some characters may have multiple representations. For example, an accented a (á) is a distinct Unicode character, U+00E1, but can also be written as a two-character sequence, U+0061 U+0301. The latter is not a single Unicode character but rather an “a” followed by a combining acute accent.
Situations like this make string comparisons a nightmare.
Moreover, some characters look similar. In the Latin alphabet, depending on the font, certain characters may look identical or similar. The number one (1), lowercase L (l), and capital i (I) can look virtually identical in some fonts. Zero (0) and the letter O may be confusing.
A homograph attack (sometimes more accurately called a homoglyph attack) is deception based on the fact that different characters may look similar to a user.
We can create a simple deception attack by registering the website paypai.com and writing the last letter as a capital I to create paypaI.com, which may confuse people with paypal.com in a phishing message.
The deception attack became more insidious with the introduction of internationalized domain names (IDN), which made Unicode characters valid elements of a domain name. While Unicode represents virtually all of the world’s scripts, many characters look identical across those scripts. For example, the Greek letters A, B, and E (and many others!) look identical to the Latin A, B, and E as well as to the Cyrillic A, B, and E but have different encodings:
| | Latin | Greek | Cyrillic |
|---|---|---|---|
| A | U+0041 | U+0391 | U+0410 |
| B | U+0042 | U+0392 | U+0412 |
| E | U+0045 | U+0395 | U+0415 |
| K | U+004B | U+039A | U+041A |
| X | U+0058 | U+03A7 | U+0425 |
As an example, we can spell out wikipedia.org using the following non-Latin characters:
Cyrillic a (U+0430), e (U+0435), p (U+0440)
Belarusian-Ukrainian i (U+0456)
Or we can spell out paypal.com using Cyrillic lookalikes for p, a, and y.
Typosquatting and combosquatting
Like homograph attacks, typosquatting and combosquatting are attacks that exploit domain names to deceive users and redirect them to malicious websites, often for phishing or malware distribution. These attacks rely on user trust and visual similarity to legitimate websites, making them highly effective in tricking unsuspecting users.
Typosquatting targets users who accidentally make typing errors or misspell a website’s URL. Attackers register domain names that closely resemble legitimate ones by altering a character, adding or removing a letter, or swapping adjacent characters. For example, a typosquatted version of example.com
might be exmaple.com
or examplle.com
. These fraudulent sites often mimic the appearance of the original website to trick users into entering sensitive information, such as passwords or payment details, or downloading harmful files.
Combosquatting involves registering domains that combine a legitimate brand or keyword with additional words, numbers, or characters to create misleading URLs. Unlike typosquatting, combosquatting does not depend on user mistakes but instead relies on domains that appear legitimate. For example, a combosquatted version of bank.com
might be secure-bank.com
or bank-login.com
. These domains are created to look like official extensions of a trusted brand and are frequently used in phishing campaigns or for distributing malware.
The main difference between typosquatting and combosquatting is that typosquatting exploits user errors, while combosquatting involves intentionally misleading domain names designed to appear authentic. Both techniques are commonly used for phishing attacks to steal sensitive information, distributing malware, redirecting traffic for ad fraud, or damaging the reputation of a legitimate brand.
To defend against these threats, organizations try to register common misspellings and common combinations, but it is costly and ultimately impossible to grab all variations. They can also monitor for suspicious domain registrations that resemble their brand and educate users to carefully examine URLs before clicking.
Tracking via images
The same-origin policy treats images as static content with no authority. It would seem that images should not cause problems. However, an image tag (IMG) can pass parameters to the server, just like any other URL:
<img src="http://evil.com/images/balloons.jpg?extra_information" height="300" width="400"/>
The parameter can be used to notify the server that the image was requested from a specific page. Unlike cookies, which can sometimes be disabled, users will usually not block images from loading.
When a browser loads an image, it contacts the server that hosts the image with an HTTP GET request for the content. This request contains:
- Referrer: The URL of the page where the pixel is embedded.
- IP Address: Which can be used to approximate the user’s location.
- Device Details: Browser type, operating system, and screen resolution.
- Timestamp: When the pixel was loaded, indicating the user’s visit time.
- Cookies: Any cookies that were previously set by the server hosting the image.
- Any extra information that’s part of the image URL will be sent. This information can, for example, identify the website or page that is hosting the content or the user ID for a logged-in user.
An image itself can be hidden by setting its size to a single pixel … and even making it invisible:
<img src="https://attacker.com/onebyone.png" height="1" width="1" />
These tiny invisible images are called tracking pixels, web beacons, or spy pixels.
Tracking pixels can be embedded in websites or emails to monitor user activity. These pixels are hosted on third-party servers. When loaded, the HTTP request headers send metadata such as the user’s IP address, browser type, and device information to the server, along with any cookies the browser has stored for that service.
Cross-site tracking and ad retargeting
Tracking pixels enable cross-site tracking – monitoring user activity across multiple websites. For instance, an ad network might embed tracking pixels across multiple websites to monitor user behavior, enabling retargeting ads. If a user visits a shoe store’s site:
<img src="https://ad-network.com/pixel?item=shoes123" width="1" height="1" />
The network records their interest in shoes. Later, the user might see shoe ads while browsing other sites using the same ad network.
Each tracking pixel URL includes a unique identifier, such as a userID, stored in a cookie on the user’s browser. When a user visits multiple websites with the same tracking pixel, the server recognizes the same unique identifier. Even though the user visits unrelated sites, the tracking server knows it’s the same user based on the userID.
For example, if a user browses a shoe store and later a tech blog, the tracking server records visits to both, associating them with the same user. The server can then compile the data into a comprehensive profile, in this case noting that a user has an interest in shoes and reads tech blogs.
Later, when the user visits a different site (a news site with ads from the same network, for example), the ad network can display ads for shoes based on the user’s earlier activity. This process is called retargeting or, as Google calls it, remarketing.
The download of the pixel allows the retargeting service to find out if you already have a cookie for that service and, if not, create a cookie with a unique ID and send that cookie with the response.
Every pixel is associated with a unique ID, so the cookie identifies the browser (user) and the ID in the pixel identifies the specific page that has the pixel. The service doesn’t know who you are, but it can distinguish you from other users by the unique ID of the cookie.
In the simplest case, the ID in the cookie can be an index into a database that will associate the user with the item they looked at but did not purchase. The database can also create a list of all the pixel IDs you visited on all web sites that use the same pixel tracking service and when you visited them. Now the service tracks what pages you’ve visited across many web sites.
If the goal is to present ads, this data can be used as input to the ad selection server to prioritize ads the service thinks you’d like based on your browsing behavior.
Facebook, for example, advertises their Facebook Pixel service. You add a snippet of code into your web page:
“It tracks the people and the types of actions they take when they engage with your brand, including any of your Facebook ads they saw before going to your website, the pages of your site they visit and the items they add to their carts.”
Tracking pixels are also commonly used in email marketing, where they track whether an email was opened and when:
<img src="https://tracking-server.com/open?emailID=abcd1234" width="1" height="1" />
If you receive HTML-formatted mail that contains a one-pixel image, you will not notice the image, but the server that hosts the image will be sent the request for it. If the IMG tag contains a parameter to identify the specific mail message, the server can track when the message was read.
While tracking pixels are useful for analyzing user behavior, targeting ads, and measuring ad performance, they raise privacy concerns since users often have no awareness of being tracked. Tools like ad blockers, email clients that block external images, and browser settings to limit third-party cookies can help reduce unwanted tracking.
Images for deception
Images can also be used for social engineering: to disguise a site by appropriating logos from well-known brands or adding certification logos. If an attacker’s page states that they are a “Microsoft Gold Partner” or their site is “100% secure”, a visitor might assign them some credibility that they do not deserve.
Browser status bar
Most browsers offer an option to display a status bar that shows the URL of a link before you click it.
This bar is trivial to spoof by adding an onclick attribute to the link that invokes JavaScript to take the page to a different link. In this example, hovering over the PayPal link will show a link to http://www.paypal.com/signin, which appears to be a legitimate PayPal login page. Clicking on that link, however, will take the user to http://www.evil.com.
<a href="http://www.paypal.com/signin"
onclick="this.href = 'http://www.evil.com/';">PayPal</a>
References
Web Design & Development I: A Brief History of HTML, University of Washington, 2020.
Sebastian Peyrott, A Brief History of JavaScript, auth0.com, January 16, 2017
Yangren Kelsang, Same-Origin Policy: From birth until today, Aura Research Division, April 4, 2019: clear example of the same-origin policy, CSRF, and CORS
Quirksmode — browser compatibility information
Andra Zaharia, JavaScript Malware – a Growing Trend Explained for Everyday Users, Heimdal Security.
KirstenS et al., Cross Site Request Forgery (CSRF), The OWASP Foundation
Yuan Tian, Ying Chuan Liu, et al., All Your Screens Are Belong to Us: Attacks Exploiting the HTML5 Screen Sharing API, Proceedings of the 2014 IEEE Symposium on Security and Privacy, pages 34–48, September 2014. full paper
KirstenS et al., Cross Site Scripting (XSS), The OWASP Foundation
Cross site scripting (XSS) attacks, Imperva.
What is a Tracking Pixel?, adQuadrant, October 28, 2020.
Michal Wlosik, What is Ad Retargeting and How Does It Work?, Clearcode, October 4, 2017, Updated on November 27, 2020.
Aaron Sankin and Surya Mattu, The High Privacy Cost of a “Free” Website, themarkup.org, September 22, 2020
Web beacon, Wikipedia.
Selcuk Uluagac, Cybersecurity researchers spotlight a new ransomware threat – be careful where you upload files, The Conversation, April 26, 2024.
An article that discusses more recent attacks possible by exploiting a browser’s File System Access API but also discusses some issues of increased browser complexity and risks.
1. A glyph is a printable character. Unicode is designed around the concept of scripts rather than languages since multiple languages often share the same set of scripts. ↩︎