What is rate limiting?

Posted by ChuckLu on 2024-08-26

What is rate limiting?

Rate limiting is a strategy for limiting network traffic. It puts a cap on how often someone can repeat an action within a certain timeframe – for instance, trying to log in to an account. Rate limiting can help stop certain kinds of malicious bot activity. It can also reduce strain on web servers. However, rate limiting is not a complete solution for managing bot activity.

What kinds of bot attacks are stopped by rate limiting?

Rate limiting is often employed to stop bad bots from negatively impacting a website or application. Bot attacks that rate limiting can help mitigate include:

  • Brute force attacks
  • DoS and DDoS attacks
  • Web scraping

Rate limiting also protects against API overuse, which is not necessarily malicious or due to bot activity, but is important to prevent nonetheless.

How does rate limiting work?

Rate limiting usually runs within an application, rather than on the web server itself. It is typically based on tracking the IP addresses that requests are coming from, and tracking how much time elapses between each request. The IP address is the main way an application identifies who or what is making the request.

A rate limiting solution measures the amount of time between each request from each IP address, and also measures the number of requests within a specified timeframe. If there are too many requests from a single IP within the given timeframe, the rate limiting solution will not fulfill the IP address's requests for a certain amount of time.
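The behavior described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production design: the class name `IpRateLimiter`, its parameters, and the explicit `now` argument (used in place of a real clock so the example is deterministic) are all assumptions made for this example.

```python
from collections import defaultdict, deque


class IpRateLimiter:
    """Allow at most `limit` requests per `window` seconds per IP address.

    An IP that exceeds the limit is refused for `block_for` seconds.
    All names here are hypothetical, chosen for illustration.
    """

    def __init__(self, limit, window, block_for):
        self.limit = limit
        self.window = window
        self.block_for = block_for
        self.history = defaultdict(deque)  # ip -> timestamps of recent requests
        self.blocked_until = {}            # ip -> time at which the block lifts

    def allow(self, ip, now):
        # Refuse immediately while the IP is still blocked.
        if now < self.blocked_until.get(ip, 0.0):
            return False
        recent = self.history[ip]
        # Forget requests that have fallen out of the timeframe.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.limit:
            # Too many requests: stop serving this IP for a while.
            self.blocked_until[ip] = now + self.block_for
            return False
        recent.append(now)
        return True


limiter = IpRateLimiter(limit=3, window=10, block_for=30)
results = [limiter.allow("198.41.129.1", t) for t in (0, 1, 2, 3, 20, 33)]
print(results)
```

In this run the first three requests are served, the fourth triggers a 30-second block, the request at second 20 is still refused, and the request at second 33 is served again.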

Essentially, a rate-limited application will say, "Hey, slow down," to unique users that are making requests at a rapid rate. This is comparable to a police officer who pulls over a driver for exceeding the road's speed limit, or to a parent who tells their child not to eat so much candy in such a short span of time.

What is an IP address?

An IP address is the unique numerical (or, in IPv6, alphanumerical) identifier assigned to any device that connects to the Internet. Every device will have its own IP address for as long as it's online, and like a physical street address or a phone number, this enables devices to send messages back and forth. A traditional (IPv4) address looks like this: 198.41.129.1

For user devices, IP addresses are typically not permanent, because there are not enough IP addresses to go around in IPv4. Instead, the user's Internet service provider (ISP) will dynamically assign addresses as devices connect to the Internet.

A rate limiting solution may use an IP address as a basis for determining which devices are making too many requests and should be temporarily blocked.

How does rate limiting work with user logins?

Users may find themselves locked out of an account if they unsuccessfully attempt to log in too many times in a short amount of time. This occurs when a website has login rate limiting in place.

This precaution exists not to frustrate users who have forgotten their passwords, but to block brute force attacks, in which a bot tries thousands of different passwords in order to guess the correct one and break into the account. If a bot can only make 3 or 4 login attempts an hour, then such an attack is statistically unlikely to be successful.

Rate limiting on a login page can be applied according to the IP address of the user trying to log in, or according to the user's username. Ideally it would use a combination of the two, because:

  • If rate limiting is only applied by IP address, brute force attackers could bypass this by attempting logins from multiple IP addresses (perhaps by using a botnet).
  • If it's only done by username, any attacker that has a list of known usernames can try a variety of commonly used passwords with those usernames and is likely to successfully break into at least a few accounts, all from the same IP address.
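A rough sketch of the combined approach follows. It counts failed logins separately per IP address and per username, and refuses an attempt if either counter is over the limit. The class `LoginThrottle`, its method names, and the explicit `now` argument are assumptions for illustration, not part of any particular product.

```python
from collections import defaultdict, deque


class LoginThrottle:
    """Limit failed logins per IP address and per username (hypothetical sketch)."""

    def __init__(self, max_failures, window):
        self.max_failures = max_failures
        self.window = window
        self.failures = defaultdict(deque)  # key -> timestamps of failed attempts

    def _count(self, key, now):
        attempts = self.failures[key]
        # Drop failures that are older than the window.
        while attempts and now - attempts[0] >= self.window:
            attempts.popleft()
        return len(attempts)

    def allowed(self, ip, username, now):
        # Both counters must be under the limit for the attempt to proceed.
        return (self._count(("ip", ip), now) < self.max_failures
                and self._count(("user", username), now) < self.max_failures)

    def record_failure(self, ip, username, now):
        self.failures[("ip", ip)].append(now)
        self.failures[("user", username)].append(now)


throttle = LoginThrottle(max_failures=3, window=3600)

# A botnet guessing one account from several addresses trips the username counter.
for ip in ("203.0.113.1", "203.0.113.2", "203.0.113.3"):
    throttle.record_failure(ip, "alice", now=0)
print(throttle.allowed("203.0.113.4", "alice", now=10))  # False

# One address trying many usernames trips the IP counter.
for user in ("u1", "u2", "u3"):
    throttle.record_failure("203.0.113.9", user, now=20)
print(throttle.allowed("203.0.113.9", "u4", now=30))  # False
```

Keying on both values means neither bypass from the list above works on its own, at the cost of occasionally locking out a legitimate user whose account is being targeted.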

Because rate limiting is necessary to prevent these brute force attacks, users who can't remember their passwords may be rate limited along with malicious bots. Users will likely see a "too many login attempts" message of some sort and be prompted to try again within a specified timeframe, or be advised that they are locked out of their accounts altogether.

How does rate limiting work for APIs?

An API, or application programming interface, is a way to request functionality from a program. APIs are invisible to most users, but they're extremely important for applications to function properly. For example, a restaurant's website could rely upon the API of a table reservation service to enable customers to make reservations online. Or, an ecommerce platform could integrate a shipping company's API to provide users with accurate shipping costs.

Every time an API responds to a request, the owner of that API has to pay for compute time: the server resources required for code to run and produce a response to that API request. In the example above, the restaurant's API integration will cause the table reservation service to pay for compute time whenever a restaurant customer makes a reservation.

For this reason, applications and services that offer an API for developers typically limit how many API calls each unique user can make per hour or day. This keeps third-party developers from overusing the API.

Rate limiting can also motivate developers to pay more for leveraging the API: often they can only make so many API calls before paying more for the API service.

Rate limiting for APIs helps protect against malicious bot attacks as well. An attacker can use bots to make so many repeated calls to an API that it renders the service unavailable for anyone else, or crashes the service altogether. This is a type of DoS or DDoS attack.
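As an illustration of per-user API quotas, the sketch below tracks calls per API key per day and applies a larger quota to a paid plan. The plan names, the quota numbers, and the `ApiQuota` class are invented for this example. Answering over-quota calls with HTTP status 429 (Too Many Requests) and a `Retry-After` header is a common convention, assumed here rather than taken from the text above.

```python
import math

# Hypothetical plans and daily quotas, kept tiny so the example is easy to follow.
PLAN_QUOTAS = {"free": 3, "paid": 5}


class ApiQuota:
    """Count API calls per key within a fixed period (one day by default)."""

    def __init__(self, period=86400):
        self.period = period
        self.usage = {}  # api_key -> (period_index, calls_used)

    def check(self, api_key, plan, now):
        index = int(now // self.period)
        seen_index, used = self.usage.get(api_key, (index, 0))
        if seen_index != index:
            used = 0  # a new period has started, so the quota resets
        if used >= PLAN_QUOTAS[plan]:
            # Tell the client how many seconds remain until the quota resets.
            retry_after = math.ceil((index + 1) * self.period - now)
            return 429, {"Retry-After": str(retry_after)}
        self.usage[api_key] = (index, used + 1)
        return 200, {}


quota = ApiQuota()
statuses = [quota.check("key-a", "free", now=0)[0] for _ in range(3)]
print(statuses)                                 # [200, 200, 200]
print(quota.check("key-a", "free", now=100))    # (429, {'Retry-After': '86300'})
print(quota.check("key-a", "free", now=86400))  # (200, {})
```

A paying client would simply be looked up under a plan with a higher quota, which is how rate limits can double as pricing tiers.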

How do social media platforms like X (Twitter) and Instagram use rate limiting?

Social media platform rate limiting is often similar to API rate limiting. Any third-party application that integrates X (formerly known as Twitter), for instance, can only refresh to look for new posts or messages a certain number of times per hour. Instagram has similar limits for third-party apps. This is why users may occasionally encounter "rate limit exceeded" messages.

What is the difference between bot management and rate limiting?

Rate limiting is fairly one-dimensional: while useful, it can only stop very specific types of bot activity. Additionally, rate limiting is not just for bots, but for limiting usage in general. Cloudflare Rate Limiting, for instance, protects against DDoS attacks, API abuse, and brute force attacks, but it doesn't necessarily mitigate other forms of malicious bot activity, and it doesn't distinguish between good bots and bad bots.

In contrast, bot management can holistically detect bot activity in general. For instance, Cloudflare Bot Management uses machine learning to identify likely bots, which enables it to block a wider variety of bot attacks (like credential stuffing, spam posting, inventory hoarding, etc.). Smaller organizations can also block bad bots with Super Bot Fight Mode, available on the Cloudflare Pro and Business plans.

Everything You Need to Know About Rate Limiting for APIs

What is rate limiting?

Rate limiting is a technique used to control the rate at which requests are made to a network, server, or other resource. It is used to prevent excessive or abusive use of a resource and to ensure that the resource is available to all users.

Rate limiting is often used to protect against denial-of-service (DoS) attacks, which are designed to overwhelm a network or server with a high volume of requests, rendering it unavailable to legitimate users. It can also be used to limit the number of requests made by individual users, to ensure that a resource is not monopolized by a single user or group of users.

There are several ways to implement rate limiting. One common approach is to set a maximum number of requests that a user or client can make within a given time period, such as a minute or an hour. If the user exceeds this limit, their subsequent requests may be denied or delayed until the rate limit is reset.

Rate limiting can also be implemented at the network level, by setting limits on the number of requests that can be made to a specific network resource or by limiting the overall rate of traffic on a network.

Why is rate limiting important?

Rate limiting is an important tool for managing network resources and ensuring the availability and performance of networks and servers. It is used widely on the internet and in other types of networks. Here are some key benefits of rate limiting.

Prevent DoS attacks

Rate limiting is often used to protect against denial-of-service attacks, which are designed to overwhelm a network or server with a high volume of requests, rendering it unavailable to legitimate users. By limiting the rate of requests, it is more difficult for an attacker to successfully execute a DoS attack.

Manage resource utilization

Rate limiting can help to ensure that a network or server is not overloaded by a high volume of requests, which can negatively impact performance and availability. By limiting the rate of requests, it is possible to better manage resource utilization, prevent resource starvation, and ensure that the resources are available to all users.

Prevent abuse

Rate limiting can be used to prevent a single user or group of users from monopolizing a resource and to ensure that the resource is available to all users. It can also be used to prevent users from making excessive or unnecessary requests, which can waste resources and impact the performance of a network or server.

Improve user experience

By limiting the rate of requests, it is possible to improve the user experience by reducing delays and improving the responsiveness of a network or server. This can be particularly important for applications that require real-time or near real-time responses, such as online gaming or voice-over-IP communication.

Reduce costs

Rate limiting can help to avoid extra costs by preventing the overuse of a resource. If a resource is overloaded by a high volume of requests, it may require additional resources or capacity to handle the load, which can incur additional costs. By limiting the rate of requests, it is possible to reduce the demand on a resource and avoid the need for additional capacity.

How does rate limiting work?

Rate limiting tools track and throttle requests by monitoring the rate at which requests are made to a network, server, or resource and enforcing limits on this rate. There are several ways to implement rate limiting:

  • Request rate limit: A maximum number of requests that a user or client can make within a given time period, such as a minute or an hour. If the user exceeds this limit, their subsequent requests may be denied or delayed until the rate limit is reset.
  • Traffic rate limit: A maximum rate of traffic that can be transmitted over a network or between networks. This can be used to limit the overall rate of traffic on a network or to prioritize certain types of traffic, such as real-time or mission-critical data, over other types of traffic.
  • Resource-based rate limit: A maximum number of requests that can be made to a specific resource on a network or server. This can be used to ensure that a resource is not overwhelmed by a high volume of requests and is available to all users.

Rate limiting can be implemented at the network level, by setting limits on the rate of traffic or on the number of requests made to specific resources, or at the application level, by setting limits on the number of requests made by individual users or clients.

Rate limiting vs. API throttling

Rate limiting and API throttling are techniques used to control the rate at which requests are made to a network, server, or resource. However, there are some differences between the two:

  • Scope: Rate limiting is a general term that refers to the practice of limiting the rate of requests made to a network, server, or resource. API throttling specifically refers to the practice of limiting the rate of requests made to an application programming interface (API). APIs are used to enable communication between different software applications and systems, and API throttling is used to ensure that the API is not overwhelmed by a high volume of requests.
  • Purpose: The main purpose of rate limiting is to prevent excessive or abusive use of a network, server, or resource and to ensure that the resource is available to all users. API throttling is used for similar purposes, but specifically to protect the API from being overwhelmed by a high volume of requests and to ensure that the API is available to all users.
  • Implementation: Rate limiting and API throttling can be implemented in similar ways, such as by setting limits on the rate of requests or traffic and enforcing these limits using algorithms or other techniques. However, API throttling may involve additional considerations, such as setting different rate limits for different API endpoints or for different types of API clients.

Common rate limiting algorithms

Leaky bucket

The leaky bucket algorithm treats incoming requests like water poured into a bucket with a small hole in the bottom. Requests are added to the bucket (typically a fixed-size queue) as they arrive, and they leak out, that is, are processed, at a constant rate. If the bucket is full when a request arrives, that request is discarded or throttled until space becomes available. The leak rate therefore controls the rate at which requests are served.

This algorithm is simple and easy to understand and implement. It allows a fixed amount of data to be transmitted at a consistent rate, which can be useful for applications that require a steady flow of data.

However, the leaky bucket algorithm can be less accurate than other algorithms in tracking and enforcing rate limits, as it relies on a fixed rate of data transmission rather than a fixed number of requests. It can also result in delays for some requests, as the rate at which data is transmitted may not match the rate at which requests are made.
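A minimal sketch of a leaky bucket follows, using the common "meter" formulation: each request adds one unit to the bucket, the bucket drains at a constant rate, and a request that would make the bucket overflow is rejected rather than queued. The class name, parameters, and explicit `now` argument are assumptions for illustration.

```python
class LeakyBucket:
    """Leaky bucket used as a meter: requests fill it, time drains it."""

    def __init__(self, capacity, leak_rate):
        self.capacity = capacity    # how many units the bucket can hold
        self.leak_rate = leak_rate  # units drained per second
        self.level = 0.0
        self.last = 0.0

    def allow(self, now):
        # Drain the bucket for the time elapsed since the previous call.
        elapsed = now - self.last
        self.level = max(0.0, self.level - elapsed * self.leak_rate)
        self.last = now
        if self.level + 1 > self.capacity:
            return False  # the request would overflow the bucket
        self.level += 1
        return True


bucket = LeakyBucket(capacity=2, leak_rate=1.0)
print([bucket.allow(0.0) for _ in range(3)])  # [True, True, False]
print(bucket.allow(1.0))                      # True: one unit has leaked out
print(bucket.allow(1.0))                      # False: the bucket is full again
```

A queue-based variant would hold the overflowing requests and release them at the leak rate instead of rejecting them, which smooths output at the cost of added latency.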

Token bucket

The token bucket algorithm is a common method used by rate limiting tools to track and throttle requests. In this algorithm, a bucket is used to store a fixed number of tokens, each of which represents a request that can be made. As requests are made, tokens are removed from the bucket. If the bucket is empty, requests are throttled until more tokens become available. The rate at which tokens are added to the bucket can be used to control the rate of requests.

One advantage of the token bucket algorithm is that it is memory efficient: it only needs to keep a token count and the time of the last refill for each bucket. This can be important in systems with limited memory resources. However, the token bucket algorithm is susceptible to race conditions when multiple threads or processes update the same bucket simultaneously, so updates need to be synchronized.
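The token bucket can be sketched as follows. Rather than adding tokens on a timer, this version computes how many tokens would have accumulated since the last call, which is a common implementation shortcut. The class name, parameters, and explicit `now` argument are assumptions for illustration, and the lock is one simple way to avoid race conditions within a single process.

```python
import threading


class TokenBucket:
    """Token bucket: tokens refill at a steady rate and each request spends one."""

    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # maximum number of tokens the bucket holds
        self.refill_rate = refill_rate  # tokens added per second
        self.tokens = float(capacity)   # start with a full bucket
        self.last = 0.0
        self._lock = threading.Lock()   # guards tokens and last against races

    def allow(self, now):
        with self._lock:
            # Add the tokens earned since the last call, capped at capacity.
            elapsed = now - self.last
            self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False


bucket = TokenBucket(capacity=3, refill_rate=0.5)
print([bucket.allow(0.0) for _ in range(4)])  # [True, True, True, False]
print(bucket.allow(2.0))                      # True: one token refilled in 2 seconds
```

Because the bucket starts full, a client can send a short burst of up to `capacity` requests before being held to the refill rate.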

Fixed window

The fixed window algorithm is a method used by rate limiting tools to track and throttle requests by dividing time into fixed intervals, or windows. Requests are counted within each window and if the number of requests exceeds a predetermined limit, subsequent requests are throttled until the next window.
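Here is a small sketch of a fixed window counter, with hypothetical names and an explicit `now` argument in place of a real clock. The usage lines also show a well-known weakness of this approach: a client can send a full allowance just before a window boundary and another just after it.

```python
class FixedWindow:
    """Count requests per client within fixed, non-overlapping time windows."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.counts = {}  # client -> (window_index, requests_counted)

    def allow(self, client, now):
        index = int(now // self.window)
        seen_index, count = self.counts.get(client, (index, 0))
        if seen_index != index:
            count = 0  # a new window has started, so the counter resets
        if count >= self.limit:
            return False
        self.counts[client] = (index, count + 1)
        return True


limiter = FixedWindow(limit=5, window=60)
print([limiter.allow("client-1", 59) for _ in range(6)])  # five True, then False
print([limiter.allow("client-1", 60) for _ in range(5)])  # all True: new window
```

In this run, ten requests are accepted within about one second even though the limit is five per minute.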

Sliding log

This rate limiting technique keeps a timestamped log of the requests made by each client. When a new request arrives, entries older than the time period (a sliding window of fixed size) are discarded, and the request is allowed only if the number of remaining entries is below the limit. It can be useful for more advanced rate limiting scenarios, such as when it is necessary to distinguish between different types of clients or to implement more complex rules for limiting the rate of requests.

However, it is also more resource-intensive, as it requires the server to maintain a larger and more detailed log of requests.
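A sliding log can be sketched with one queue of timestamps per client. The names and the explicit `now` argument are assumptions for illustration. Memory use grows with the number of requests kept in each log, which is the cost mentioned above.

```python
from collections import defaultdict, deque


class SlidingLog:
    """Keep a timestamp for every recent request and count them on each call."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.logs = defaultdict(deque)  # client -> timestamps of accepted requests

    def allow(self, client, now):
        log = self.logs[client]
        # Discard entries that have slid out of the window.
        while log and log[0] <= now - self.window:
            log.popleft()
        if len(log) >= self.limit:
            return False
        log.append(now)
        return True


limiter = SlidingLog(limit=5, window=60)
print([limiter.allow("client-1", 59) for _ in range(5)])  # all True
print(limiter.allow("client-1", 60))                      # False: still 5 in the last 60s
print(limiter.allow("client-1", 119))                     # True: old entries expired
```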

Sliding window

The sliding window algorithm is a method used by rate limiting tools to track and throttle requests by dividing time into a series of overlapping windows and counting the number of requests made within each window.

It works by keeping track of the number of requests made by a client within a specific time period, using a window of fixed size. The size of the window determines the time period over which requests are counted, while a separate limit sets the maximum number of requests allowed within it. The window slides forward with each passing moment, discarding old request counts and allowing new ones to be recorded.

The sliding window algorithm is more accurate than the fixed window algorithm in tracking and enforcing rate limits, because the window is always measured back from the current moment. A client therefore cannot exceed its allowance by clustering requests on either side of a fixed window boundary. However, this algorithm may be more complex to implement and maintain than other algorithms.
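One common way to approximate a sliding window without storing every request is a sliding window counter: keep counts for the current and previous fixed windows, and weight the previous count by how much of that window the sliding window still covers. The sketch below uses this approximation for a single client. The names and the explicit `now` argument are assumptions for illustration.

```python
class SlidingWindowCounter:
    """Approximate a sliding window from two adjacent fixed-window counts."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        self.index = 0     # index of the current fixed window
        self.current = 0   # requests counted in the current window
        self.previous = 0  # requests counted in the window before it

    def allow(self, now):
        index = int(now // self.window)
        if index != self.index:
            # Roll forward; the old count only matters if the windows are adjacent.
            self.previous = self.current if index == self.index + 1 else 0
            self.current = 0
            self.index = index
        # Fraction of the previous window still covered by the sliding window.
        overlap = 1.0 - (now % self.window) / self.window
        estimated = self.previous * overlap + self.current
        if estimated + 1 > self.limit:
            return False
        self.current += 1
        return True


limiter = SlidingWindowCounter(limit=10, window=60)
print([limiter.allow(30) for _ in range(11)])  # ten True, then False
print(limiter.allow(60))                       # False: previous window fully counted
print([limiter.allow(90) for _ in range(6)])   # five True, then False
```

At second 90 half of the previous window still overlaps, so its ten requests count as five and only five more are accepted. The estimate assumes requests were spread evenly across the previous window, so it trades a little accuracy for constant memory per client.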

Requirements of an efficient rate limiting system design

Functional requirements are the specific capabilities or features that a system must have in order to perform its intended functions. These include:

  • Tracking the rate at which requests are made to a network, server, or resource.
  • Enforcing limits on the rate of requests made to a network, server, or resource.
  • Handling requests that exceed the rate limit, either by denying the request or by delaying it until the rate limit is reset.
  • Distinguishing between different types of requests and applying different rate limits to different types of requests.
  • Applying rate limits to individual users or clients, as well as to specific resources.

Non-functional requirements are the characteristics of a system that describe how it should behave, but do not directly relate to its specific functions. These include:

  • Handling a high volume of requests without experiencing delays or failures.
  • Scaling up or down as the volume of requests changes.
  • Adapting to changes in the rate of requests in real-time.
  • Maintaining accuracy and consistency in tracking and enforcing rate limits.
  • Withstanding attacks or other forms of abuse that may attempt to bypass or circumvent the rate limits.
  • Being easy to manage and maintain.

Rate limiting challenges and best practices

Here are some challenges that can arise when implementing rate limiting:

  • Identifying the appropriate rate limit: Determining the appropriate rate limit for a system can be difficult, as it depends on a number of factors, such as the available bandwidth, the number of users, and the type of traffic. Setting the rate limit too high may result in network congestion, while setting it too low may prevent users from accessing the system.
  • Dealing with bursty traffic: Rate limiting is designed to handle steady traffic flows, but it can be challenging to handle bursty traffic, where the rate of traffic can fluctuate rapidly. This can result in some traffic being dropped even when the network is not congested.
  • Avoiding false positives: Rate limiting algorithms may sometimes flag legitimate traffic as malicious, and block it as a result. This can lead to problems for legitimate users, and it may require additional monitoring and adjustments to the rate limiting algorithm to avoid false positives.
  • Ensuring fairness: When multiple users or applications are sharing a network, it is important to ensure that the rate limiting is fair and that each user or application gets an appropriate share of the bandwidth. This can be challenging, especially in dynamic environments where the number of users and the amount of traffic can vary over time.
  • Scaling to handle large volumes of traffic: As the volume of traffic increases, the rate limiting system may need to be scaled to handle the increased load. This can be challenging, and it may require additional hardware and software resources to ensure that the rate limiting system can handle the increased traffic.

Here are some best practices for rate limiting that can help solve the above challenges:

  • Identify the needs of the system: Before implementing rate limiting, it is important to understand the requirements of the system and the goals of the rate limiting. This will help to ensure that the rate limiting is implemented in a way that meets the needs of the system.
  • Choose an appropriate algorithm: Many algorithms can be used for rate limiting, several of which were discussed above. It is important to choose an algorithm that is appropriate for the needs of the system and that can be implemented effectively.
  • Set appropriate limits: The rate limit should be set at a level that is appropriate for the needs of the system. This may involve setting different limits for different types of traffic, or for different times of day.
  • Monitor and adjust the rate limit as needed: The rate limit should be monitored to ensure that it is effective and that it is not causing problems for the system. If necessary, the rate limit can be adjusted to ensure that it is providing the desired level of protection.
  • Combine with other traffic management techniques: Rate limiting should be used in conjunction with other traffic management techniques, such as traffic prioritization, to ensure that important traffic is able to get through even when the network is busy. This can help to ensure that the system remains available and responsive even under heavy load.

Products and tools that can be used to implement API rate limiting:

Kong is a popular open-source API gateway that also includes rate limiting functionality.

Tyk is another open-source API gateway that offers rate limiting and other security features.

RateLimiter.io is a cloud-based rate limiting service that can be used to protect APIs from abuse.

AWS WAF is a web application firewall (WAF) service that can be used to protect APIs from a variety of threats. It supports rate limiting through rate-based rules.

Azure Front Door is a content delivery network (CDN) service that can also be used to protect APIs by applying rate limiting and defending against other threats.

Gloo Gateway is a cloud-native API gateway that can be used to protect APIs from abuse. It offers a variety of features, including rate limiting, authentication, and authorization.

Apigee is a platform for developing, managing, and securing APIs. It offers a variety of features, including rate limiting, authentication, and authorization. Apigee is a good option for enterprises that need to manage a large number of APIs.
