What a year in 2023 - come join us in 2024!
2023 was a breakout year for AI, and that was especially true for us in Meta DC Networking. I'm so proud of the many teams who develop and operate the network for the high-performance AI clusters that support our ranking, recommendation, and latest generative AI models like Llama 2, and I can't wait to share what we've got going in 2024!
Looking back on 2023
* Meta has always been very open with our technology, and last year was no exception with our AI-related infrastructure. For example, see our AI Infra overview (https://lnkd.in/gvjBhQpf); our AI networking overview (https://lnkd.in/gpfuHQ45); and many talks from OCP Global Summit last October (https://lnkd.in/gjY3KgMP).
* As someone in networking, I've always enjoyed working closely with the developers of applications that critically rely on network performance. It was the Web in the 1990s, audio/video streaming and mobile apps in the 2000s, and hyper-scale service/app deployments in the 2010s. Now, it’s AI that needs high-performance networking, and we have been constantly working with those teams over the last several years to optimize our networks for AI. For example, see our published work with the ML Commons community (https://lnkd.in/g9TY4FpT).
* There's a lot of important work going on across Meta in the broader AI space that's worth learning about. For example, just in December alone, we announced Purple Llama to help the community move toward more open trust and safety; we launched the AI Alliance as a community of developers, researchers, and adopters to advance open, safe, and responsible AI; and our FAIR team shared an overview of the last decade of their open research (https://ai.meta.com/blog/).
** We’re hiring! **
In 2024, our DC and AI Networking teams are most urgently looking for (1) firmware/driver engineers to work on networking for Meta’s AI silicon (https://lnkd.in/gwyZzJxg) and (2) engineers with HPC and distributed communication library expertise (e.g., with NCCL), ideally in the context of large ML clusters.
More broadly, several of our teams are looking for network/software/production engineers, network TPMs, and planners to work on all aspects of running DC and AI networking @scale, including physical/logical network design and delivery; intent-based networking, test infra, and automation; switch fabric platforms, NOS, and control plane; NICs, kernel networking, and transport protocols; large-scale, real-time fault detection and remediation; and much more.
As you think about your own plans for 2024, please reach out if you want to know more about what we’re doing here in Meta DC Networking and how you might fit in. We’re having a great time working across our product roadmap for 2024 and way beyond, and we’d love to have you join us!