Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce redundant call of prepareClientToWrite when call addReply* continuously. #670

Merged
merged 2 commits into from
Jun 25, 2024

Conversation

lipzhu
Copy link
Contributor

@lipzhu lipzhu commented Jun 19, 2024

Description

While exploring hotspots with profiling some benchmark workloads, we noticed the high cycles ratio of prepareClientToWrite, taking about 9% of the CPU of smembers, lrange commands. After deep dive the code logic, we thought we can gain the performance by reducing the redundant call of prepareClientToWrite when call addReply* continuously.

For example: In https://github.com/valkey-io/valkey/blob/unstable/src/networking.c#L1080-L1082, prepareClientToWrite is called three times in a row.

Profiling of Hotspots

image

Test Environment

  • OPERATING SYSTEM: CentOS Stream 9
  • Kernel: 6.2.0
  • PROCESSOR: Intel Xeon Platinum 8380
  • Server and Client in same socket.

Performance Result

Perf Boost

Test Name Perf Boost
memtier_benchmark-1key-list-100-elements-lrange-all-elements.yml 3.5%
memtier_benchmark-1key-list-1K-elements-lrange-all-elements.yml 13.1%
memtier_benchmark-1key-set-100-elements-smembers.yml 4.9%
memtier_benchmark-1key-list-1K-elements-lrange-all-elements.yml 10.0%

Test Scenario

1. Start server

taskset -c 0-3 ~/valkey/src/valkey-server /tmp/valkey.conf

port 9001
bind * -::*
daemonize no
protected-mode no
save ""

2. Prepare test data

valkey-cli -h 127.0.0.1 -p 9001 "SADD" "set:1000" "tocawprsxz" "csqcfxyzsy" "ipubipttcb" "jqurtadjro" "zdulfflfqx" "bbfvuialin" "ifpfyncdfe" "kjeevccyof" "stttkrbfqs" "qatkvfuttq" "ltvfnuuwil" "znartcywze" "bzursuzuei" "jlghqxtvok" "osncqcuest" "uxvibjduto" "giubyhvaav" "joijmgposs" "lqxpnplleo" "bnatichltp" "nnfxoqebys" "lyhqvxolfw" "caaqjozcqh" "nlqtadqinl" "hfjxrrsszf" "fussukqrph" "cbjovvgqqy" "wcpbfslakk" "slskrnekbc" "nvonwipkta" "jhdcicttmm" "rpyroriegs" "lnuuootxmp" "ufdxqlonpg" "qgasrnjrld" "jhocasnttw" "smwbxeqbed" "kapxglqccs" "hhwvslfxmf" "rbdabbmnrf" "ltgidziwzm" "qpffifpdiz" "sadqcfniau" "bqoabrqwwj" "wghyakzbit" "bnxtlmiwup" "akzsgaeqon" "zwlhpcahwu" "kfselnpkim" "dxtzxeguoi" "roczxpuisd" "myzjxvtvjh" "alfftdxaxc" "vyiqkfoics" "dygkzcpakt" "ukprrucywq" "afzlyodwiz" "vdaebbupfe" "wemmvswznk" "xzbqjpzqlm" "lqqivzlppd" "rrzcqyzdzf" "ncckxlmsvg" "cpjveufsvk" "babfjxxabw" "btcvhacldb" "mqqrgbacfa" "eqaxrccwjq" "erahoeivfw" "omeatkwwtc" "mjwrbndexu" "gktcjcfxbb" "tfonhwnuxj" "pheajlhymx" "vefgwelnfo" "gayysuldha" "tqpqihwjtl" "eirhwkdgfq" "rnfodijavx" "erqgyscser" "nnnxouavyp" "yhejmjwwni" "mwmcwqzbld" "ofcurtthcs" "idmjjbjvni" "ovqohpxjft" "ocoflktdhp" "kgopxvsdah" "pyjpxqnavq" "nubsytpfao" "ddpgrvwowd" "glynpmsjcf" "whsxmqffqg" "sstqpivwip" "cqfnhujrbj" "gsvkmnluiz" "zdmgjjyukl" "gcfcbjybkx" "qmhyoyctod" "kdodndexvr" "tkgihmsrha" "kmifjielrw" "gefoharnza" "tcpwkimype" "nxllkzroin" "bpvbnmpekh" "ljinllovsw" "mugdxqnjxj" "tqqmmvwact" "uidvmrqyjd" "dthtfrqkce" "efhynoxlul" "iosqxoobrk" "sujbwndgwl" "btxehrokkw" "pmaagvqldo" "evuxmkrrfl" "dclualrzqb" "jfqxkxgqhj" "fvemodlpgz" "lawrpikwsk" "socoxaegfa" "snomfrutha" "yxsnreuepl" "vjihaakiof" "nnhrgirrtw" "jccorylnjg" "gehuriygwq" "icqjxcermo" "ocgjeuljxf" "qslrwqmixc" "rhzpguhsws" "zxlbhyeckf" "iziwqojsoq" "qlorevyltp" "gbjzsedhag" "mkxysrkpug" "bhrvnadcdk" "qxxinxaqxn" "ctnaggdbru" "fsthobmdxk" "cvnnitrrow" "vlhtdpqavh" "vhjaphfdpj" "yhdgqenmwv" "ysntbzffxq" "emfjcnujqn" "qnqzibcmip" "ngcxqjjpdm" "gkneclxnnt" "rhkpfsuhoq" "dgodkfjzos" "isqymcuffe" "ripecixnpr" "dxpepbctea" "gbeizdzdyb" "aqlapmghln" "yhlalzusch" "xglrugpjkt" "ngwifjdpha" "jvekvvldai" "hmdzsuuyrn" "ncabqesziv" "whdftyqojv" "rhzqdtxucc" "ftyxhyfokj" "vqtixjkcbb" "krfosgrmjb" "ahcaaodvgi" "ooeswhfdnj" "jhctncrzlw" "haxesjafmh" "vxrtzngznb" "fidsuuizcf" "mamtueyrqn" "quemrlmwod" "pkgpjwyfbh" "ckibsdtfff" "tjnjhejnju" "puvgjfjyaf" "cvmicoarvv" "mxpzuzrzuo" "rrrfhnclbv" "xeurpmfdmo" "yqvkykgjbe" "behdxlfdho" "dyzedskzkq" "rfhlttsuqy" "pkehotsmka" "alokvrpbih" "mobwpcyxuk" "umwunfzsvo" "naggqdxcjm" "rakustfykw" "dtkfydidli" "kohpozxkhl" "usjmfkopln" "axhoxkubdv" "asretszbav" "tmkoxwdgpx" "wjhaavxfge" "pcuaesomdc" "vjhpmffzxc" "qwxzqlqter" "jjumoixniz" "ruxsmttpak" "pjdundsxrd" "kdklhpxntt" "muhewfzihs" "dplonqlliz" "wjibkklezg" "dfemamyevk" "nryvfijxhj" "bqqohkuylc" "wiqhmhkiel" "lftmqoxhfc" "sjbaedopjb" "dlomhvkoxg" "jhkdwtqvwl" "vqashxkrik" "mupcilqfjg" "suahxaebee" "rqkcyxiwhz" "jqgtbgbybq" "ygbfgfefac" "kjblkrvknt" "yajpmxmuwz" "wwowdvybjj" "bdtbaxnuko" "adfhfatarh" "vfcpevtekf" "fiugzrozky" "spogjykkfs" "tdggmsxysk" "aoqlctikzg" "nwywtydqew" "qjrhtqgwjc" "dhzgpwewsx" "outdlyeqvq" "trwzipsers" "qtpcwuafar" "scgjdkyetq" "aqyfvxyjqr" "xkvgnzjgrm" "hhbceuegvh" "paitaeqrpb" "yfdsmhtria" "bxwvqvndcc" "dpyjoihqrs" "tnratexlre" "hplvvuoscb" "muocqqypmt" "pxzkuasjek" "flrsaczxzc" "pubqtzzzko" "vpqlxtfkjz" "fiafoggekm" "qtewhixedb" "iijjcabgak" "tqjpijliii" "uttazeawix" "hxbmykkugi" "bekchebgys" "ffrviosqzo" "rjrptuhkav" "sldzewoxas" "uesalivsis" "maxylirjgh" "vpzsmbjkvy" "eiziligjfr" "tqblforkpa" "nszbrpweoz" "rzanpefsfy" "cejkfhuykf" "abinkgshoi" "gqybtjuhvq" "oqdlpaubsc" "nrbfkysxaf" "mhxojehvxx" "vuqlqdpfdn" "orqqevpmca" "xigznrdgqy" "jzceexkqam" "szupcnvvij" "btgeubdzbb" "nojnedgabk" "sdnkjddyut" "lbjarnpxhh" "wevfinjbqk" "dvgqwzignk" "ejzwnidqwr" "nlxwjmzwln" "brrlblrxwa" "hyikggurti" "wybmlpqblt" "hertbwuzyw" "rwhzzytdsq" "symbgeyple" "zbfeyptemz" "pghbwbtfmk" "mxydilgynv" "bhwytqsafu" "ecsburyjhh" "cvohdragtx" "lscjhgztom" "giswndixdf" "etsngvbrff" "lgqazzajpx" "pypepewjvq" "nswjopvtqv" "tuajnnqtcq" "bvvoibkfrt" "kjqeujfkoh" "diwmfuckel" "bwizktcwmb" "ughnpilqqm" "ihealvwnxb" "thqttakyzy" "auwfujaoya" "rofnkytnhm" "ilkuddrdvh" "hmwfncgzxg" "pzrchtwaaw" "ffksbrtbfq" "ethxaycsil" "uwiqrvcqvu" "fgcehqgsso" "yoblelzlkd" "gjiwldcfqh" "sbrjnwxdip" "nenhiiibwx" "ebhhhgabjd" "xpkwqbfban" "pupmdjgyed" "aejnvyfdst" "krxneqolle" "nouncgkoik" "kamgfgbxel" "fffylsswky" "agswwrfabr" "pkvcbelpos" "mxapzqqqsw" "ywmqoaztmy" "sfuvzzxbxq" "kdcvbkrbsj" "twpiiaedpc" "egmgddriry" "nmfihtnkel" "kqzjnkdlxd" "eovsizpcjp" "bsavjyaksg" "xlmvatfsly" "dlhjfafskj" "wmvhvwnowp" "vjjozwrovk" "gbazuqnmit" "ubwlcefgqb" "jttqzbazgz" "dozecfsvue" "pgdhjrxhga" "gzekysdunp" "ygoiannoht" "hklchdenoe" "sotbjzlsvz" "qjwrnhooax" "cdghgcsoth" "mjlpvuoghe" "qclkaeciey" "oownjpxrov" "nvqfyljbef" "tsnawydcru" "wrrgxxkxkc" "ylulwsnjay" "lxsinouutc" "ozpyyaznsh" "cmhkstsjok" "ybckvbeoib" "fsoardckcw" "ltkauvxggz" "sqwhsgboef" "wgtjxahmef" "spoqshzjoi" "pfvfxrrfhl" "nahweurftw" "fojjpqmbck" "zexblqeora" "qsoiwsugdv" "ksppwhhqzj" "otadcihtmd" "imnjbkmsls" "zzenkvuesw" "kbfqdppnfa" "igehetokzq" "koujdppfua" "wqsqzzbqhm" "tglieutcis" "owovlhorvw" "nraylduhut" "nwnyjkugcf" "kpfqxroqbs" "xwxwosqkhm" "ollacusjzj" "wcouaiatsu" "nvkfnfzoki" "fgjnsosfrp" "pltsnzqvpi" "rhnkzlsjtk" "ysnndkycix" "bpnfopinub" "blujwnyluy" "wgtmckqknh" "zorzyqtjtr" "hvtlkrungk" "rgtondctpo" "mjgvtydjtm" "kcbotffyca" "gybxnvwchp" "gazojexans" "hmcpcrjumm" "zejhycldyy" "iiissmznfe" "qvpuudyuks" "gviypfayfm" "plqbwsiuzw" "nunchscyqc" "qocjpufxio" "iqbyikqjmx" "omwbgglqsp" "nywteueaig" "ntmgbzaivy" "ijdgnlzprg" "rnlaakgsrf" "fpdflprzvn" "azkdbpnshy" "mvfnirshbd" "sotsxznskx" "uzktwqcdeb" "myrrmvflyw" "jgaieawkcu" "utymwhxigo" "vtaiyncmyg" "gpodilvrnm" "xgfzndhodu" "saqilljaid" "jxiewthqls" "nbwksmwxpx" "rwfykeeqgx" "tlnkrncpwi" "ogyvxbgcwi" "ffcqkkzllx" "rtnhivnxtb" "vzcclamtun" "jjlefkekuw" "xjksnqifds" "ctusqixohm" "osaekeukqx" "irlduoinie" "nifzrybfuh" "ctqxoyxbwc" "vsvhjrymqc" "bzwxqcpftf" "ltghdkluqq" "vklwhyzqhk" "ghwcrdlbjj" "lzzptujbjp" "qlvgfplbod" "ghepftfjgk" "aiqqyusnuv" "rspghuhpbp" "lfkqrtxocm" "iibgagtkpg" "ywiurvfbpg" "tdceweesxh" "pvwvdaorrl" "ejlunxlwxn" "ymqxhmnidz" "lydebbpmfb" "ztjuqomjck" "eyrbqexkff" "oqmuhlruqy" "gnrmnwaxls" "mumhqarhgg" "skbzfbeziu" "hnnfmyurhx" "yrsizkbbwz" "azpwrzovza" "txhllnvudv" "aslibwggrp" "ubghghklvj" "jqqogagqni" "emfqsjraia" "ctgwmawlgl" "mivctgaajt" "knycrcrsbm" "ubtiscdgrn" "ulepgommyy" "qbhdjhoohc" "cctlfgicpv" "phfuspevwk" "oeawjlqnyg" "jpphbjtbrh" "ofykgotycd" "csjfbpjyzq" "thmmmlqluk" "buzhjxsbkm" "pisgqibyae" "skkawcmqqt" "mmqblvrscy" "dpkiubfzbx" "yivxcecwlp" "kbnjiilaqd" "rwrxxrnwtq" "veegnotgmj" "pbfijwccjp" "expefhkisx" "ynnhyctikq" "bhfmhanvxe" "otclvmbilg" "hskkmrluuf" "ftnbjymlll" "nbkaxrojqq" "qydrgilxxt" "dxufcyurjx" "fgygwdazbm" "tivnqailcl" "jwvqixjhho" "oglqutqfcx" "wvrlxfoxff" "ropuqidkxv" "qcsxjrjcfc" "twuvkpjzzw" "fqtktfghcv" "suhwnartid" "wvsnfinuil" "rngtndwjyg" "tsmzfswaxo" "uvlswctlhx" "llamjvxyqo" "wovoupawzt" "caxgjftjyj" "gwzqcetcji" "yzrdbalexf" "fnpdsuozxt" "dbtbtvkqss" "pwgjoppmgc" "wxjdgbugeu" "qchpfcigwa" "lxzdcbveuy" "bwjyghaztz" "uedehyieof" "pfaytznuaa" "lspvrnxnjo" "zkbqvttlzy" "fkdmuxraqf" "nbizrabfuo" "fgzwwaedjy" "gkmwutvars" "bwsdzrxzse" "txgjxzovte" "cbtpbbfrdd" "vqgztpmzhz" "rdipvyeqoi" "bovkdabcdo" "fhobhpwwkp" "mkbkflixkr" "mjifqzmtsd" "pkcqdokojd" "dtgjnddwch" "uboipezuni" "dfdodbelzn" "fzsoiryhfn" "krtsiucvvu" "aieekmivcb" "aeafusfzdn" "ehnrizfmfo" "dcjlwhfstw" "wksgvbkbyw" "hvfprkjlbc" "jlgepeyhpc" "ljklggibcy" "mhrvuemywb" "wdqygrxkya" "ystnkbogee" "flvftlpbjq" "vgfgbsbnwy" "rsivptwulz" "bzjzucrypq" "bweysooxiv" "mmcunsiwad" "mszjkgsrio" "bvurseeqmh" "wtcpliaxmk" "ndwiompimr" "mdcwoblmkl" "dflxukffgl" "mcojdazpfq" "tctgzmjads" "dewdgfrhos" "iwqanwtvcd" "nfucelqjfe" "wgtrwefdsw" "skstqdgbos" "rwllkdzxrj" "qwozutlufu" "fmpdixcckx" "jybzltmwrs" "ossjrvqmaa" "adlxahxsbq" "mbewprqunw" "xbvbujurqw" "rnvhfxbuoi" "pyrpwxalpc" "adlryhdbpr" "gritvkzfgw" "aufhfrhccf" "umoicweaab" "kgirldeylz" "nknlysgviv" "plbxaamppj" "ikpikupjoi" "eioxaswdee" "imexfccbxk" "ouroipthpq" "jbzyfznpdn" "asidljmwgb" "jeazfmhrcb" "dablvesuho" "zuoqjiciij" "qmxxfyuodo" "vkqalcokst" "jhibapuhga" "cmqraybrlw" "beqsnrixhl" "rmqxtqcxua" "ndltyojjxj" "hyanpicfan" "yzutuazhmh" "tumnalubch" "eksvvoxziw" "weqhfkosif" "wwfbpjatrp" "lrhrkuyzry" "uvbtcgtopw" "fmyleefltp" "kkrxiaiife" "gbkqhfumyu" "tdmjyuitvv" "jvtalmlkng" "rdsfcdvkqz" "xqvjnlpssl" "fuftndsnim" "keklddczkd" "wrqnytptzm" "rwzijctxzs" "btakuczlec" "fuipidfbjt" "kjiqagynco" "ahjawbsqcw" "iehxaaneev" "ezbiwqnabg" "pnnzqcutoq" "wlogkzxkpo" "xzswnnldvs" "qqfnugftmr" "zuccleayil" "ckqebhazel" "brwlqbfoat" "anmcogawkg" "roqzbzpbbt" "dxnprfawun" "fffreqppjj" "gfdzgxfdcg" "sshbuxfljd" "shckmujxzo" "rqurawzebz" "vpehhmoxva" "vldwfdnicm" "tzhjrlfvfp" "ymwwctfodg" "qsxfnsicrx" "gfhrrjczsp" "gtqrsktbaa" "dniplpxfof" "htawohddyn" "dbcxnsiacw" "dhfundvlpn" "uewpgskfpu" "cuuytorpnp" "vlcnbfqvox" "jbqibabrmv" "xhspgwheck" "fsuovvpgng" "gcjruttnno" "wxswusqpeo" "qhhhipzncq" "mcbuftndrr" "owjfgjqqjc" "vvmkjgajwa" "wvlvshnhmx" "ekponflaeq" "kuiumwomxi" "aoydkdfrpe" "cglxptkcsz" "uqbpcvkipa" "ubzgvzputq" "wmyphdckda" "ukdnaklmcp" "ramoirrdyd" "vwayaqmtid" "ltomuspfzc" "wzxdkpehwf" "yzcspfvcot" "cgpvvnbvlk" "farwqgfyjf" "lbxvlwzony" "ocesqguvym" "yzviqaobku" "cnngbbpowp" "ucxeoqcssr" "zcffhzusrl" "yzmodbpsnb" "aryiyaltqw" "xkaailrpns" "lpahctqgna" "cnbqnvxmjp" "nugjvhftma" "xsgcuvxzor" "xwtwtwmbgu" "emdwpvauyc" "ahfktrqmgh" "jznackjcrd" "etcsjxoqab" "kpzmuwqbnt" "dspznsgszk" "rcwbzvwbva" "mlznoaajqq" "iwuuxdactm" "zujobawsct" "snepgcispg" "cgmivhyskk" "snunzlgfkd" "ppdxnadmje" "wtzqqecgfy" "ncremxgfdb" "cblsafugqk" "hjekcxfyds" "faxedqgskm" "jjczogqdwz" "jfbgmhtjke" "nehqnkqnld" "lcdchjadll" "llimzyabsp" "iwapedwyle" "iobkwbwceu" "twmbtaxdro" "nmtmjmhmdl" "ewoqykjbkc" "tmyuncyoyd" "dcepfcdddn" "dnvwyhyhsn" "nrencopzqn" "yjyffpgoop" "uvqtefqdhk" "yjhypaonqq" "uqvzpcvugl" "cakvxrdpmj" "tvzacklhdz" "higdkhodzy" "ormdblyhhn" "wbouqpojzl" "eyhgspybnr" "lywsezpzgf" "usykkwszvh" "bcwncpnibg" "jgcqryhsvk" "yfvwesgulw" "geizujxrkg" "zknlteeaxq" "nqwjivcosg" "qmnxipsiga" "pthacnunjj" "afamsavgsi" "bzfzxzecrs" "sxcihybfci" "padscbypdo" "gaotvjctjh" "beicnwdryg" "xsueeljljp" "mkrrypcfzy" "ekjgqnjxyl" "iyeiercbxr" "rkwlgzhvvy" "hmnaoeeasz" "aquymkrswt" "ulnnuwyptq" "xftfzsoiwc" "urkkyscfti" "wabroeeoop" "qpzkuxsipr" "dxdngrmypg" "icatrrbcjs" "fhuptkhkzm" "apyzwvajot" "vealtjlqyc" "khkkfmzkow" "trzqdcaqdw" "itmekixthv" "pudgkcbwdx" "zuibhuihtz" "kzuywkxlku" "ogtqmpnzie" "jetamrlglx" "fjdjumschq" "kprzbyngsw" "xeyxlxiqch" "dtuhvpszzt" "fpmbbgiaao" "hjlhurakwh" "mshexjmkmn" "cynhehkcxs" "cvbbbdzmie" "cvnlzjdfgf" "ifhkjgmxrd" "audguegpmo" "jzstgleeby" "eafrzhdhhq" "fmmammvdyj" "uncqdpbhwb" "fzatoyblsr" "xtwlklqdna" "ydqppngxvh" "mkngszsxeu" "vyewicgjio" "tstbluhyhj" "qzxxwlfeki" "ocmtsfpsgh" "xmknbbmdbf" "pdjmftsmob" "ygrpkpstxq" "hrhiqcarju" "aadzbodres" "curhymvwsx" "tbqidtevrl" "avchkjnlwm" "tyephutkmb" "lxoaezrdxs" "ctkwlhmgfz" "xkiuuciwrn" "irrovfyshb" "hwuofuftlr" "mhbfsuaovv" "wzuhzzdezi" "jlpobgvouj" "qbpmtomqpu" "shlwywnxpk" "srkvjhetmj" "hvxefqtmqu" "fazsvkljef" "bstezdkmig" "asbtvfzien" "vewfxcxkpf" "tqkprkoixe" "rcaatkjyur" "euleuicawb" "ifiizdeong" "cjcrpmggtu" "kxggjpatkd" "klwqsggtob" "mnsaklzgob" "xfxlervrgn" "eraxdyjftw" "xrvonyieqa" "fswhywqxhy" "iqzxblqkeo" "rxvhmzvbcv" "wvdmobfisx" "ujybghjfnf" "yufagalzhk" "qxbqbfcgjp" "vorgqhmaoq" "zewylkylsy" "vvmaucizkv" "bgcoyoduda" "vnsufnurol" "rtskokvklv" "svvdufedug" "qgdgujdvtg" "rjrtvpntke" "shgetgsird" "ywgeotcect" "zsikdzycyt" "gcsswbksnc" "qgobfhgspy" "pbxrbaxnor" "viwarrumob" "eaetplspga" "jqmscuprwq" "nkyuframnm" "gygftrsdbm" "qzlfnntjar" "fzzcioobeb" "ydigxptqbl" "bgtxhxkhvv" "hggqmlgwha" "ywlqbjqeug" "qwowxqzrkz" "zybosgbtxt" "cflarkquuv" "klaeknlbrm" "ccnbldglgl" "dpauqcpgyi" "ylxiwiesps" "xyxmlrdbui" "arqfxfqkzh" "byrkeibrfb" "laepwenqmc" "kluswgtjsf" "mgldvzleyy" "yqmzmmzwpd" "tvlckdoyfe" "dmxcbvzrxg" "qquwyuyvvw" "pmytvtksfi" "umttshfkpk" "rmdayyptch" "glwrmjpotx" "bgcnzgcmza" "ivinvxopgz" "dmbarohbfj" "rncdgqxqfq" "zmmwzkjrjl" "gdlztbhpeq" "zrwgpknaop" "powzkcrtvv" "cszvzbrmoy" "dtjljhzqcm" "anznywecwk" "amuwlfaxwv" "ajdkqflpen" "evjrybtwww" "oxsdmrdbit" "yafipxfsip" "xekxarmwcq" "dgcesswkvc" "gdqgmwxkmt" "spdyueanru" "yrvmdhnnfc" "aexxjlgwuo" "xpcpytommm" "gjutzwoxlf" "stnfirydgi" "snpuvnebpy" "rfxibyjmpg" "ortxlvmdoc" "gdozstnglr" "eqiukbyscu" "qzcrpbvatq" "dwzqowbrsd" "iesbitdnjd" "inboyxgoqa" "lfojnetxdc" "njmufqrykx" "ybcdthmgws" "igwekdegcw" "ajkgxmtamu" "qkyfpamste" "nwybjbhgep" "arqqmfmmbz" "rqiyxwpuyv" "nsdvirehqh" "qckueiqiwh" "tjnbsybxws" "jphvxuqipp" "ghtoyhrfxh" "erglflfnql" "kngwkkzwts" "nmguhcygct" "jigyicdeft" "gamcdtywne" "nunpqugdit" "ghqwxaqlef" "nqxdrqigvf" "xepfvvcovk" "ezgxjiwwig" "izizuzzjuv" "mallnshtok" "tctrsxgnrc" "exhjfssojj" "yilvzcevlj" "nepxmyiuhr" "dqqfcdugde" "iamjlqlznh" "mvmsikqfxu" "kmqlwfbsex" "pribqncfuf" "zavrjnezrf" "kmcwshsbye" "uzaejrbwue" "olezxlliej" "hjjxyybxiv"

3. Run benchmark

numactl --cpunodebind 1 memtier_benchmark -s 127.0.0.1 -p 9001 --command="SMEMBERS set:1000"  --hide-histogram --test-time 180

@lipzhu lipzhu force-pushed the redundant_prepareClientToWrite branch from 3ec1437 to 4549e46 Compare June 19, 2024 06:41
Copy link

codecov bot commented Jun 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 70.20%. Comparing base (5a51bf5) to head (f57e7d3).
Report is 12 commits behind head on unstable.

Additional details and impacted files
@@             Coverage Diff              @@
##           unstable     #670      +/-   ##
============================================
+ Coverage     70.17%   70.20%   +0.02%     
============================================
  Files           110      110              
  Lines         60077    60100      +23     
============================================
+ Hits          42160    42194      +34     
+ Misses        17917    17906      -11     
Files Coverage Δ
src/networking.c 85.29% <100.00%> (-0.09%) ⬇️
src/server.h 100.00% <ø> (ø)

... and 14 files with indirect coverage changes

@lipzhu
Copy link
Contributor Author

lipzhu commented Jun 21, 2024

Ping @valkey-io/core-team, could you help to take a look, is this optimization sensible?

@madolson
Copy link
Member

madolson commented Jun 21, 2024

Having a couple of weird optimizations that are dangerous to get wrong (we won't install the correct handler) feels weird. Do we know why that specific function is taking 7% of the CPU? Is there some cache thrashing going on maybe we can fix more generically? I read through the code in prepareClientToWrite and it seemed fairly straight forward.

@PingXie
Copy link
Member

PingXie commented Jun 21, 2024

QPS can increase by up to ~7%.

Is this just an estimate or measured using the same tests? Can you share the row test numbers before and after the change?

Do we know why that specific function is taking 7% of the CPU?

+1. prepareClientToWrite doesn't involve any system calls. Do you mind sharing cache misses metrics (for both data and instruction) for the tests?

src/module.c Outdated Show resolved Hide resolved
@lipzhu
Copy link
Contributor Author

lipzhu commented Jun 21, 2024

@madolson @PingXie @zuiderkwast Thanks for your review on this.

Having a couple of weird optimizations that are dangerous to get wrong (we won't install the correct handler) feels weird

@madolson I did some check, the write handler should only be installed in the first call of prepareClientToWrite. Am I missing anything?

Do we know why that specific function is taking 7% of the CPU?

Maybe caused by L1 cache miss(see below profiling result).

QPS can increase by up to ~7%.

Is this just an estimate or measured using the same tests? Can you share the row test numbers before and after the change?

@PingXie The 7% improvement is based on this test-suite https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites/memtier_benchmark-1key-set-1K-elements-smembers.yml.

Do you mind sharing cache misses metrics (for both data and instruction) for the tests?

I did some profiling for cache missing
perf record -p 'pidof valkey-server' -e mem_load_retired.l1_miss,mem_load_retired.l2_miss,mem_load_retired.l3_miss sleep 10

We can find the the getClientType (inlined) and its associated symbols contribute more than 17% L1 cache miss, but didn't observe the prepareClientToWrite.

image

image

@madolson
Copy link
Member

I did some check, the write handler should only be installed in the first call of prepareClientToWrite. Am I missing anything?

Your code looks correct. However there are other cases, such as hitting the CoB limit, that will cause your implementation to behavior slightly differently.

@madolson
Copy link
Member

We can find the the getClientType (inlined) and its associated symbols contribute more than 17% L1 cache miss, but didn't observe the prepareClientToWrite.

This is interesting. I don't have a dedicated bare metal server to test on, but we are just calling this function a lot.

@madolson
Copy link
Member

madolson commented Jun 21, 2024

Some other observations while looking at the code more:

  1. putClientInPendingWriteQueue doesn't need to be called every time. It could just be called at the end of every command processed. We just need to set a flag that we need to do it.
  2. We do a bunch of tests that are basically immutable for the entire duration of the command. CLIENT_MASTER, CLIENTER_REPLY_OFF, CLIENT_SCRIPT, CLIENT_MODULE never change. Maybe we refactor this into a single flag we check. It should give solid branch prediction while in the inner loop of the command.
  3. We check c->conn, which is in a different cache line. This is also an immutable property and we should only need to check this once. Maybe we just add a flag for this? This shouldn't be an issue

I wonder if we try to optimize the codepath first?

@lipzhu lipzhu force-pushed the redundant_prepareClientToWrite branch 2 times, most recently from 9b5d3a8 to b7b83bd Compare June 22, 2024 03:03
…tinuously.

Signed-off-by: Lipeng Zhu <[email protected]>
Co-authored-by: Wangyang Guo <[email protected]>
@lipzhu
Copy link
Contributor Author

lipzhu commented Jun 22, 2024

@madolson Thanks for your suggestions.

Some other observations while looking at the code more:

  1. putClientInPendingWriteQueue doesn't need to be called every time. It could just be called at the end of every command processed. We just need to set a flag that we need to do it.

You mean we cloud move below logic outside prepareClientToWrite?

if (!clientHasPendingReplies(c) && io_threads_op == IO_THREADS_OP_IDLE) putClientInPendingWriteQueue(c);
  1. We do a bunch of tests that are basically immutable for the entire duration of the command. CLIENT_MASTER, CLIENTER_REPLY_OFF, CLIENT_SCRIPT, CLIENT_MODULE never change. Maybe we refactor this into a single flag we check. It should give solid branch prediction while in the inner loop of the command.

Could you be more specific?

  1. We check c->conn, which is in a different cache line. This is also an immutable property and we should only need to check this once. Maybe we just add a flag for this?

Got it. @madolson The c->conn is in the same cache line with c->flags, in the prepareClientToWrite, the memory latency should be ignorable. Am I misunderstanding?

I wonder if we try to optimize the codepath first?

Do you want to combine them together with this patch or separate?

@PingXie
Copy link
Member

PingXie commented Jun 22, 2024

We can find the the getClientType (inlined) and its associated symbols contribute more than 17% L1 cache miss, but didn't observe the prepareClientToWrite.

I assume this is the data cache misses?

Some other observations while looking at the code more:

  1. putClientInPendingWriteQueue doesn't need to be called every time. It could just be called at the end of every command processed. We just need to set a flag that we need to do it.
  2. We do a bunch of tests that are basically immutable for the entire duration of the command. CLIENT_MASTER, CLIENTER_REPLY_OFF, CLIENT_SCRIPT, CLIENT_MODULE never change. Maybe we refactor this into a single flag we check. It should give solid branch prediction while in the inner loop of the command.
  3. We check c->conn, which is in a different cache line. This is also an immutable property and we should only need to check this once. Maybe we just add a flag for this?

I wonder if we try to optimize the codepath first?

+1 on 1) and 3) but not sure about 2). flags should be hot in L1 d-cache so is the idea here about reducing the instruction count (via a combined flag), @madolson?

Copy link
Member

@PingXie PingXie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @lipzhu!

Can you confirm that you get the expected performance boost with these changes? and how much is that?

@lipzhu
Copy link
Contributor Author

lipzhu commented Jun 23, 2024

We can find the the getClientType (inlined) and its associated symbols contribute more than 17% L1 cache miss, but didn't observe the prepareClientToWrite.

I assume this is the data cache misses?

Yes.

LGTM. Thanks @lipzhu!

Can you confirm that you get the expected performance boost with these changes? and how much is that?

@PingXie I test 4 test suites which may benefit from this patch, they are all from https://github.com/redis/redis-benchmarks-specification/blob/main/redis_benchmarks_specification/test-suites repo.

Test Name Perf Boost
memtier_benchmark-1key-list-100-elements-lrange-all-elements.yml 3.5%
memtier_benchmark-1key-list-1K-elements-lrange-all-elements.yml 13.1%
memtier_benchmark-1key-set-100-elements-smembers.yml 4.9%
memtier_benchmark-1key-list-1K-elements-lrange-all-elements.yml 10.0%

@madolson
Copy link
Member

Do you want to combine them together with this patch or separate?

This change is much simpler, so I'm OK merging this if we are still seeing a good performance boost.

src/networking.c Outdated Show resolved Hide resolved
src/networking.c Outdated Show resolved Hide resolved
@madolson
Copy link
Member

flags should be hot in L1 d-cache so is the idea here about reducing the instruction count (via a combined flag), @madolson?

Yes. I disassembled it for my x86 machine, and there were a lot of conditional jumps which might have been causing some issues on the instruction cache. I'm surprised there wasn't a better way to optimize it though.

@madolson The c->conn is in the same cache line with c->flags, in the prepareClientToWrite, the memory latency should be ignorable. Am I misunderstanding?

You're right, I glanced at it briefly and did the math wrong. This shouldn't be a concern.

@lipzhu
Copy link
Contributor Author

lipzhu commented Jun 25, 2024

Do you want to combine them together with this patch or separate?

This change is much simpler, so I'm OK merging this if we are still seeing a good performance boost.

@madolson , Thanks for you help, I already updated the performance boost info in top comments. For the suggestion listed in #687, I‘d like to search a benchmark test suite later which can help us to prove it really works. What do you think?

Signed-off-by: Lipeng Zhu <[email protected]>
Co-authored-by: Wangyang Guo <[email protected]>
@madolson
Copy link
Member

madolson commented Jun 25, 2024

For the suggestion listed in #687, I‘d like to search a benchmark test suite later which can help us to prove it really works. What do you think?

I'm honestly okay just dropping that issue. I prefer an approach where we start with a benchmark we think is useful, and then try to optimize that case, like you did here. My suggestions were more to see if there was a way to have a more general optimization that might help the case you brought up.

@madolson madolson merged commit 4d3d6c0 into valkey-io:unstable Jun 25, 2024
19 checks passed
@lipzhu lipzhu deleted the redundant_prepareClientToWrite branch June 25, 2024 01:42
@zuiderkwast zuiderkwast added release-notes This issue should get a line item in the release notes labels Jun 25, 2024
@zuiderkwast
Copy link
Contributor

Worth mentioning in the release notes as one of the performance optimizations?

@madolson
Copy link
Member

madolson commented Jun 25, 2024

Worth mentioning in the release notes as one of the performance optimizations?

There are always going to be a lot of little performance optimizations. In Redis we usually only listed large ones (maybe this counts as large?).

I guess it's probably better to mark it and then we can decide later to remove them.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance release-notes This issue should get a line item in the release notes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants