Reduce redundant calls of prepareClientToWrite when addReply* is called continuously. #670
base: unstable
Conversation
3ec1437 to 4549e46
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@             Coverage Diff              @@
##            unstable     #670     +/-  ##
============================================
- Coverage      70.17%   70.04%    -0.14%
============================================
  Files            110      110
  Lines          60077    60089       +12
============================================
- Hits           42160    42087       -73
- Misses         17917    18002       +85
Ping @valkey-io/core-team, could you help take a look? Is this optimization sensible?
Having a couple of unusual optimizations that are dangerous to get wrong (we won't install the correct handler) feels weird. Do we know why that specific function is taking 7% of the CPU? Is there some cache thrashing going on that maybe we can fix more generically? I read through the code in prepareClientToWrite and it seemed fairly straightforward.
Is this just an estimate, or measured using the same tests? Can you share the raw test numbers before and after the change?
+1.
src/module.c
Outdated
if (prepareClientToWrite(c) != C_OK) return VALKEYMODULE_OK;
addReplyProto(c, "+", 1, 1);
addReplyProto(c, msg, strlen(msg), 1);
addReplyProto(c, "\r\n", 2, 1);
I don't like delegating this internal detail to the caller.
Can we make a variadic version of addReplyProto, so these three calls can be replaced with one call that prepares the client only once?
Sure, I will update it.
@zuiderkwast To keep this patch small and clean, I removed unrelated logic and only optimized addReplyBulkCBuffer, which is in the critical path.
- Remove addReplyLongLongWithPrefix from the header file; it is only used in networking.c.
- Use _addReplyToBufferOrList to replace addReplyProto in addReplyLongLongWithPrefix, and move the prepare-client logic to the parent.
@madolson @PingXie @zuiderkwast Thanks for your review on this.
@madolson I did some checking; the write handler should only be installed on the first call.
The cost may be caused by L1 cache misses (see the profiling result below).
@PingXie The 7% improvement is based on this test suite: https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1key-set-1K-elements-smembers.yml.
I did some profiling of the cache misses.
Your code looks correct. However, there are other cases, such as hitting the CoB limit, that will cause your implementation to behave slightly differently.
This is interesting. I don't have a dedicated bare-metal server to test on, but we are just calling this function a lot.
Some other observations from looking at the code more:
I wonder if we should try optimizing the codepath first?
9b5d3a8 to b7b83bd
…tinuously. Signed-off-by: Lipeng Zhu <[email protected]> Co-authored-by: Wangyang Guo <[email protected]>
@madolson Thanks for your suggestions.
You mean we could move the logic below outside: if (!clientHasPendingReplies(c) && io_threads_op == IO_THREADS_OP_IDLE) putClientInPendingWriteQueue(c);
Got it.
Do you want to combine them with this patch, or keep them separate?
I assume these are data cache misses?
+1 on 1) and 3), but not sure about 2).
LGTM. Thanks @lipzhu!
Can you confirm that you get the expected performance boost with these changes, and how much it is?
Yes.
@PingXie I tested 4 test suites that may benefit from this patch; they are all from the https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites repo.
Description
While profiling some benchmark workloads for hotspots, we noticed a high cycle ratio in prepareClientToWrite, which takes about 9% of the CPU time of the smembers and lrange commands. After a deep dive into the code logic, we found we can gain performance by reducing redundant calls of prepareClientToWrite when addReply* is called continuously. For example, in https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L1080-L1082, prepareClientToWrite is called three times in a row.

Profiling of Hotspots
Test Environment
Performance Result
QPS can increase by up to ~7%.
Test Scenario
1. Start server
2. Prepare test data
3. Run benchmark
Reference test suites
memtier_benchmark-1key-set-1K-elements-smembers.yml
memtier_benchmark-1key-set-100-elements-smembers.yml
memtier_benchmark-1key-list-1K-elements-lrange-all-elements.yml
memtier_benchmark-1key-list-100-elements-lrange-all-elements.yml
memtier_benchmark-1key-list-10-elements-lrange-all-elements.yml