JetStream - A throughput and memory optimized engine for LLM inference on TPU and GPU

About

JetStream is a throughput and memory optimized engine for LLM inference and serving on XLA devices, starting with TPUs (and GPUs in the future -- PRs welcome).

Getting Started

Run local server & testing

Use the following commands to run a mock server locally and test it:

# Start a server
python -m jetstream.core.implementations.mock.server

# Test local mock server
python -m jetstream.core.tools.requester

# Load test local mock server
python -m jetstream.core.tools.load_tester
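
The server is gRPC based, and the requester and load tester above are thin clients around it. The sketch below shows what a minimal hand-written client could look like; the port, the generated proto modules (jetstream.core.proto.jetstream_pb2 / jetstream_pb2_grpc), the Orchestrator service, and the DecodeRequest fields are assumptions made for illustration, so check the proto definitions and jetstream.core.tools.requester for the actual names before relying on them.

# minimal_client.py -- a hedged sketch of a gRPC client for the local server.
import grpc

from jetstream.core.proto import jetstream_pb2        # assumed proto module path
from jetstream.core.proto import jetstream_pb2_grpc   # assumed stub module path

def main() -> None:
    channel = grpc.insecure_channel("localhost:9000")    # assumed default port
    stub = jetstream_pb2_grpc.OrchestratorStub(channel)  # assumed service name

    # Assumed request shape: a text prompt plus a token budget.
    request = jetstream_pb2.DecodeRequest(
        additional_text="Hello, JetStream!",
        max_tokens=32,
    )

    # Decode is assumed to be a server-streaming RPC that yields generated chunks.
    for response in stub.Decode(request):
        print(response)

if __name__ == "__main__":
    main()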

Test core modules

# Test JetStream core orchestrator
python -m jetstream.core.orchestrator_test

# Test JetStream core server library
python -m jetstream.core.server_test

# Test mock JetStream engine implementation
python -m jetstream.engine.mock_engine_test

# Test JetStream token utils
python -m jetstream.engine.utils_test
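
The modules above can also be run in a single pass with Python's built-in unittest discovery. The sketch below assumes the repository root as the working directory and that all test modules follow the *_test.py naming shown above.

# run_all_tests.py -- convenience sketch; running the modules individually works just as well.
import unittest

suite = unittest.defaultTestLoader.discover("jetstream", pattern="*_test.py")
unittest.TextTestRunner(verbosity=2).run(suite)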