Rate limit fix #863

Open · wants to merge 2 commits into master

Conversation

ricnava00

If $elapsed_seconds > $this->intervalSeconds the first time hasRequestQuota() is called, rateLimitReached is never set to true and the limit is not enforced.
This can happen if the rate limit is set in a callback function some time after starting.

I completely removed $rateLimitReached, since its only use was in the removed condition check.

Alternatively, it should be possible to reinitialize currentStartTime when setRateLimit is called, but I still don't see the usefulness of resetting the timer only when the limit was reached in the previous interval.
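To make the failure mode concrete, here is a minimal, hypothetical reconstruction of the pre-fix logic based on the description above. The names ($rateLimitReached, $currentStartTime, $currentRequestCount, hasRequestQuota()) mirror the discussion, but this is a sketch, not the library's actual source:

```php
// Hypothetical sketch of the pre-fix fixed-window logic described above.
// NOT the library's actual source; names mirror the PR discussion.
class BrokenFixedWindowLimiter
{
    private $maxRequests;
    private $intervalSeconds;
    private $currentStartTime;
    private $currentRequestCount = 0;
    private $rateLimitReached = false;

    public function __construct($maxRequests, $intervalSeconds)
    {
        $this->maxRequests = $maxRequests;
        $this->intervalSeconds = $intervalSeconds;
        $this->currentStartTime = microtime(true);
    }

    public function hasRequestQuota()
    {
        $elapsedSeconds = microtime(true) - $this->currentStartTime;

        // The window is only reset when the limit was reached in the
        // previous interval...
        if ($this->rateLimitReached && $elapsedSeconds > $this->intervalSeconds) {
            $this->rateLimitReached = false;
            $this->currentStartTime = microtime(true);
            $this->currentRequestCount = 0;
        }

        // ...but the limit is only enforced inside the interval, so if the
        // first call arrives after the interval has already elapsed,
        // $rateLimitReached is never set to true and every call succeeds.
        if ($elapsedSeconds <= $this->intervalSeconds
            && $this->currentRequestCount >= $this->maxRequests) {
            $this->rateLimitReached = true;
            return false;
        }

        $this->currentRequestCount++;
        return true;
    }
}
```

With this logic, calling hasRequestQuota() for the first time after a delay longer than $intervalSeconds (for example because setRateLimit was invoked from a callback) makes every subsequent call return true.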
@zachborboa
Member

Hi. I don't believe the rate limit is (currently) intended to be set after starting requests.

Here's an example of how setRateLimit can be used:

$multi_curl = new MultiCurl();
$multi_curl->setRateLimit('60/1m');

$multi_curl->addGet('https://httpbin.org/ip');
// etc.

$multi_curl->start();

What's the use case for modifying the rate limit in a callback?

@ricnava00
Author

I use it like this:

$multi_curl=new MultiCurl();
//$multi_curl->error(...);
//$multi_curl->setRetry(...);
//Other shared settings
foreach(["fast","slow"] as $site)
{
	for($n=0;$n<10;$n++)
	{
		$ch=$multi_curl->addGet("http://".$site."-site.com/?page=".$n);
		if($n==0&&$site=="slow")
		{
			$ch->beforeSend(function() use ($multi_curl)
			{
				$multi_curl->setRateLimit('60/1m');
			});
		}
	}
}
$multi_curl->start();

If I had to call setRateLimit only while $multi_curl is not running, I would either have to create another MultiCurl, copy all the callbacks, decide which one to use in the loop, and start them one after the other, or I would have to duplicate the loop for the fast and slow sites and set the rate limit between the two.

@ricnava00
Author

I was having another look at this issue, and I found a more realistic problem that this fix solves.

Usually, I handle 429 errors by sleeping for some time. This breaks the rate limit if the total sleeping time exceeds the check interval.

use Curl\Curl;
use Curl\MultiCurl;

$mch=new MultiCurl();
$mch->setRateLimit("4/1s");
$mch->setConcurrency(4);
$mch->setRetry(function(Curl $ch)
{
	if($ch->errorCode>=429)
	{
		echo "Too many requests, sleeping for 1 second\n";
		sleep(1);
		return false; // In a real case you would handle the retry logic here
	}
	return false;
});
$startTime=microtime(true);
$mch->beforeSend(function($ch) use ($startTime)
{
	echo "Sending ".$ch->url." at ".round((microtime(true)-$startTime)*1000,1)."ms\n";
});

foreach(array_merge(array_fill(0,10,200),array_fill(0,8,429),array_fill(0,20,200)) as $i)
{
	$mch->addGet("https://httpstat.us/".$i);
}
$mch->start();

Output:

Sending https://httpstat.us/200 at 2ms
Sending https://httpstat.us/200 at 2.4ms
Sending https://httpstat.us/200 at 2.5ms
Sending https://httpstat.us/200 at 2.5ms
Sending https://httpstat.us/200 at 1078.4ms
Sending https://httpstat.us/200 at 1078.5ms
Sending https://httpstat.us/200 at 1078.6ms
Sending https://httpstat.us/200 at 1078.6ms
Sending https://httpstat.us/200 at 2210.3ms
Sending https://httpstat.us/200 at 2210.4ms
Sending https://httpstat.us/429 at 2210.4ms
Sending https://httpstat.us/429 at 2210.5ms
Too many requests, sleeping for 1 second
Too many requests, sleeping for 1 second
Sending https://httpstat.us/429 at 4340.8ms
Sending https://httpstat.us/429 at 4340.9ms
Sending https://httpstat.us/429 at 4341ms
Sending https://httpstat.us/429 at 4341ms
Too many requests, sleeping for 1 second
Too many requests, sleeping for 1 second
Too many requests, sleeping for 1 second
Too many requests, sleeping for 1 second
Sending https://httpstat.us/429 at 8473.8ms
Sending https://httpstat.us/429 at 8473.9ms
Sending https://httpstat.us/200 at 8474ms
Sending https://httpstat.us/200 at 8474ms
Too many requests, sleeping for 1 second
Too many requests, sleeping for 1 second
Sending https://httpstat.us/200 at 10607.2ms
Sending https://httpstat.us/200 at 10607.2ms
Sending https://httpstat.us/200 at 10607.3ms
Sending https://httpstat.us/200 at 10607.4ms
Sending https://httpstat.us/200 at 10736.1ms
Sending https://httpstat.us/200 at 10736.2ms
Sending https://httpstat.us/200 at 10736.3ms
Sending https://httpstat.us/200 at 10736.3ms
Sending https://httpstat.us/200 at 10864.5ms
Sending https://httpstat.us/200 at 10864.6ms
Sending https://httpstat.us/200 at 10864.7ms
Sending https://httpstat.us/200 at 10864.8ms
Sending https://httpstat.us/200 at 10992.4ms
Sending https://httpstat.us/200 at 10992.5ms
Sending https://httpstat.us/200 at 10992.6ms
Sending https://httpstat.us/200 at 10992.6ms
Sending https://httpstat.us/200 at 11122.7ms
Sending https://httpstat.us/200 at 11122.8ms

As you can see, the first 200s are sent in batches of four per second, which is correct. After the sleeps, the batches are sent back to back and no longer respect the rate limit.
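For comparison, here is a minimal sketch of the post-fix behavior under the same assumptions as above (hypothetical names, not the library's actual source): with $rateLimitReached removed, the window is reset unconditionally once it has elapsed, so the limit is enforced again even after a pause longer than the interval, such as the sleeps in the log above.

```php
// Hypothetical sketch of the post-fix logic: the window is reset whenever
// it has elapsed, regardless of whether the limit was reached in the
// previous interval. NOT the library's actual source.
class FixedWindowLimiter
{
    private $maxRequests;
    private $intervalSeconds;
    private $currentStartTime;
    private $currentRequestCount = 0;

    public function __construct($maxRequests, $intervalSeconds)
    {
        $this->maxRequests = $maxRequests;
        $this->intervalSeconds = $intervalSeconds;
        $this->currentStartTime = microtime(true);
    }

    public function hasRequestQuota()
    {
        $elapsedSeconds = microtime(true) - $this->currentStartTime;

        // Reset the window unconditionally once it has elapsed.
        if ($elapsedSeconds > $this->intervalSeconds) {
            $this->currentStartTime = microtime(true);
            $this->currentRequestCount = 0;
        }

        if ($this->currentRequestCount >= $this->maxRequests) {
            return false;
        }

        $this->currentRequestCount++;
        return true;
    }
}
```

Because there is no $rateLimitReached flag to gate the reset, a first call arriving after the interval simply starts a fresh window and counts against the quota as usual.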
