Perhaps the most important aspect of the Django ORM to understand is how QuerySets work. Because QuerySets are lazily evaluated, you can chain filter() and exclude() all day without hitting the database. Know what triggers evaluation so that you evaluate QuerySets only when you actually need to.
- When QuerySets are evaluated:
# Iteration
for person in Person.objects.all():
    pass  # Some logic
# Slicing/Indexing
Person.objects.all()[0]
# Pickling (i.e. serialization)
pickle.dumps(Person.objects.all())
# Evaluation functions
repr(Person.objects.all())
len(Person.objects.all())
list(Person.objects.all())
bool(Person.objects.all())
# Other
[person for person in Person.objects.all()] # List comprehensions
person in Person.objects.all() # `in` checks
- When QuerySets are not cached:
# Not reusing evaluated QuerySets
print([p.name for p in Person.objects.all()]) # QuerySet evaluated and cached
print([p.name for p in Person.objects.all()]) # New QuerySet is evaluated and cached
# Slicing/indexing unevaluated QuerySets
queryset = Person.objects.all()
print(queryset[0]) # Queries the database
print(queryset[0]) # Queries the database again
# Printing
print(Person.objects.all())
- When QuerySets are cached:
# Reusing an evaluated QuerySet
queryset = Person.objects.all()
print([p.name for p in queryset]) # QuerySet evaluated and cached
print([p.name for p in queryset]) # Cached results are used
# Slicing/indexing evaluated QuerySets
queryset = Person.objects.all()
list(queryset) # Queryset evaluated and cached
print(queryset[0]) # Cache used
print(queryset[0]) # Cache used
- When Django evaluates a QuerySet, foreign-key relationships and reverse relationships are not included in the query, and thus not included in the cache, unless specified otherwise.
# Foreign-key related objects
person = Person.objects.get(id=1)
person.father # foreign object is retrieved and cached
person.father # cached version is used
## Never cached
# Callable attributes
person = Person.objects.get(id=1)
person.children.all() # Database hit
person.children.all() # Another database hit
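The evaluation and caching rules above can be modelled with a toy class. This is only a sketch of the idea, not Django's implementation: `LazyQuery`, its `fetches` counter, and the in-memory `people` list are all hypothetical names invented for illustration, and the "database" is just a Python list.

```python
class LazyQuery:
    """Toy stand-in for a QuerySet: records filters, runs them only on demand."""

    def __init__(self, rows, predicates=()):
        self._rows = rows                  # pretend this is the database table
        self._predicates = tuple(predicates)
        self._result_cache = None          # filled on the first full evaluation
        self.fetches = 0                   # how often the "database" was hit

    def filter(self, predicate):
        # Chaining returns a new, unevaluated object; nothing is fetched here.
        return LazyQuery(self._rows, self._predicates + (predicate,))

    def _fetch(self):
        self.fetches += 1
        return [r for r in self._rows if all(p(r) for p in self._predicates)]

    def __iter__(self):
        # Full iteration evaluates once, then reuses the cached result.
        if self._result_cache is None:
            self._result_cache = self._fetch()
        return iter(self._result_cache)

    def __getitem__(self, index):
        # Indexing an unevaluated query fetches again and does not fill the cache.
        if self._result_cache is not None:
            return self._result_cache[index]
        return self._fetch()[index]


people = [{"name": "Ann", "age": 30}, {"name": "Bob", "age": 12}]
adults = LazyQuery(people).filter(lambda p: p["age"] >= 18)
assert adults.fetches == 0                 # chaining filter() fetched nothing

first = adults[0]                          # fetch 1: indexing before evaluation
first_again = adults[0]                    # fetch 2: still no cache to reuse
names = [p["name"] for p in adults]        # fetch 3: full evaluation fills the cache
names_again = [p["name"] for p in adults]  # served from the cache
cached_first = adults[0]                   # served from the cache
assert adults.fetches == 3
```

The counter mirrors the examples above: indexing before evaluation hits the data source every time, while anything done after a full evaluation is answered from the cache.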
Use select_related() and prefetch_related() when you will need foreign-key/reverse related objects.
- These tools tell Django that you actually will need these objects, so that it will go ahead and query and cache them for you. The common pitfall here is to not use these when they are needed. This results in a lot of unnecessary database queries.
# DON'T
queryset = Person.objects.all()
for person in queryset:
    person.father # Foreign key relationship results in a database hit each iteration
# DO
queryset = Person.objects.all().select_related('father') # Foreign key object is included in query and cached
for person in queryset:
    person.father # Hits the cache instead of the database
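To make the cost difference concrete, here is a rough sketch using the standard-library sqlite3 module instead of Django. The `person` table, its sample rows, and the hand-written SQL are assumptions for illustration; the SQL only approximates what the ORM would generate for each approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE person (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        father_id INTEGER REFERENCES person(id)
    );
    INSERT INTO person (id, name, father_id) VALUES
        (1, 'Abe', NULL), (2, 'Ben', 1), (3, 'Cal', 1), (4, 'Dan', 2);
""")

statements = []
conn.set_trace_callback(statements.append)  # record every SQL statement that runs

# Without select_related(): one query for the people, then one more per father.
people = conn.execute("SELECT id, name, father_id FROM person").fetchall()
for person_id, name, father_id in people:
    if father_id is not None:
        conn.execute(
            "SELECT id, name FROM person WHERE id = ?", (father_id,)
        ).fetchone()
queries_without_join = len(statements)

statements.clear()

# With select_related('father'): a single query that joins the father in.
joined_rows = conn.execute("""
    SELECT p.id, p.name, f.id, f.name
    FROM person AS p
    LEFT OUTER JOIN person AS f ON p.father_id = f.id
""").fetchall()
queries_with_join = len(statements)

conn.set_trace_callback(None)
```

The first approach issues one extra query per person that has a father, so its cost grows with the size of the table, while the joined version stays at one query.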
- Querying inside a loop is a pitfall you will most likely run into, because trying to write clean code often leads to it. Calling get() or evaluating a QuerySet on every iteration can be very bad for performance. Instead, do as much of the database work as you can before entering the loop.
# DON'T (contrived example)
filtered = Person.objects.filter(first_name='Shallan', last_name='Davar')
for age in range(18):
    person = filtered.get(age=age) # Database query on each iteration
# DO (contrived example)
filtered = Person.objects.filter( # Narrow down the QuerySet to only what you need
    first_name='Shallan',
    last_name='Davar',
    age__gte=0, # Field lookups use a double underscore
    age__lt=18, # Matches range(18), i.e. ages 0 through 17
)
lookup = {person.age: person for person in filtered} # Evaluate the QuerySet and construct lookup
for age in range(18):
    person = lookup[age] # No database query
- If you know your QuerySet could be very large and you only need to iterate over it once, it makes sense to skip the cache to save memory and avoid its overhead. iterator() does exactly this.
# Save memory by not caching anything
for person in Person.objects.iterator():
    pass  # Some logic
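The trade-off is similar to the difference between fetching every row up front and looping over a cursor. The sketch below uses the standard-library sqlite3 module as an analogy only; the `person` table and its generated ages are made up for illustration, and how much Django actually streams with iterator() depends on the database backend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, age INTEGER)")
conn.executemany(
    "INSERT INTO person (age) VALUES (?)",
    [(i % 90,) for i in range(10_000)],
)

# Cached style: materialize every row first, then loop over the list.
all_rows = conn.execute("SELECT age FROM person").fetchall()
total_from_list = sum(age for (age,) in all_rows)

# Streaming style: loop over the cursor and keep only a running total.
total_streamed = 0
for (age,) in conn.execute("SELECT age FROM person"):
    total_streamed += age
```

Both loops compute the same total, but only the first one keeps all 10,000 rows in memory at once. That is worth paying for if you will loop again, and wasteful if you will not.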
- Your database can do almost anything data-related much faster than Python can. If at all possible, do your work in the database. Django provides many tools to make this possible.
- Use filter() and exclude() for filtering:
# DON'T
for person in Person.objects.all():
    if person.age >= 18:
        pass  # Do something
# DO
for person in Person.objects.filter(age__gte=18):
    pass  # Do something
- Use F expressions:
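# DON'T
for person in Person.objects.all():
    person.age += 1
    person.save()
# DO
from django.db.models import F
Person.objects.update(age=F('age') + 1) # The increment happens in the database; no rows are loaded into Python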
- Do aggregation in the database:
# DON'T
max_age = 0
for person in Person.objects.all():
    if person.age > max_age:
        max_age = person.age
# DO
from django.db.models import Max
max_age = Person.objects.aggregate(Max('age'))['age__max']
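It can help to see the kind of SQL these ORM calls stand for. The sketch below runs hand-written statements through the standard-library sqlite3 module; the table and sample rows are assumptions, and the SQL is only an approximation of what Django would emit, not its exact output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO person (name, age) VALUES (?, ?)",
    [("Ann", 30), ("Bob", 12), ("Cy", 17)],
)

# Roughly what filter(age__gte=18) asks for: the database discards non-matching rows.
adults = conn.execute("SELECT name FROM person WHERE age >= 18").fetchall()

# Roughly what update(age=F('age') + 1) asks for: one statement, no rows sent to Python.
updated = conn.execute("UPDATE person SET age = age + 1").rowcount

# Roughly what aggregate(Max('age')) asks for: a single value comes back.
(max_age,) = conn.execute("SELECT MAX(age) FROM person").fetchone()
```

In each case the database returns only the answer, rather than every row for Python to loop over.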
- values() and values_list() return dictionaries and tuples, respectively, containing only the fields you specify, so Django skips building full model instances.
- Use values():
# DON'T
age_lookup = {
    person.name: person.age
    for person in Person.objects.all()
}
# DO
age_lookup = {
    person['name']: person['age']
    for person in Person.objects.values('name', 'age')
}
- Use values_list():
# DON'T
person_ids = [person.id for person in Person.objects.all()]
# DO
person_ids = Person.objects.values_list('id', flat=True) # Still a lazy QuerySet; wrap it in list() if you need a list
Caveats:
- Use defer() and only() instead of values() when you need model instances rather than dicts.
- They may only make a difference if the fields you are excluding require a lot of processing to be converted to a Python object.
- Use defer():
queryset = Person.objects.defer('age') # Imagine age is computationally expensive
for person in queryset:
    print(person.id)
    print(person.name)
- Use only():
queryset = Person.objects.only('name')
for person in queryset:
    print(person.name)
Caveats:
- Only use count() and exists() when you don't need to evaluate the QuerySet for other reasons. If you will iterate over the results anyway, the extra query is wasted.
- Use count():
# DON'T
count = len(Person.objects.all()) # Evaluates the entire queryset
# DO
count = Person.objects.count() # Executes more efficient SQL to determine count
- Use exists():
# DON'T
exists = len(Person.objects.all()) > 0
# DO
exists = Person.objects.exists()
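As a rough illustration of why these are cheaper, the sketch below compares the approaches with the standard-library sqlite3 module. The table, the generated names, and the SQL are assumptions chosen for the example; Django's generated queries differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO person (name) VALUES (?)",
    [(f"person-{i}",) for i in range(1000)],
)

# len(queryset) style: every row travels to Python just to be counted.
count_via_len = len(conn.execute("SELECT id, name FROM person").fetchall())

# count() style: the database counts and returns a single number.
(count_via_sql,) = conn.execute("SELECT COUNT(*) FROM person").fetchone()

# exists() style: ask for at most one row and check whether anything came back.
exists = conn.execute("SELECT 1 FROM person LIMIT 1").fetchone() is not None
```

All three give the right answer; the difference is how much data has to be read and transferred to get it.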
- Instead of updating model instances one at a time, delete() and update() allow you to do this in bulk.
- Use delete():
# DON'T
for person in Person.objects.all():
    person.delete()
# DO
Person.objects.all().delete()
- Use update():
# DON'T
for person in Person.objects.all():
    person.age = 0
    person.save()
# DO
Person.objects.update(age=0)
Caveats:
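- bulk_create() works a bit differently than calling create(): the model's save() method is not called, and the pre_save and post_save signals are not sent.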
- Read more about it in the Django docs.
names = ['Jeff', 'Beth', 'Tim']
creates = [Person(name=name, age=0) for name in names] # Unsaved instances
Person.objects.bulk_create(creates) # Inserts them in bulk instead of one query per object
- Similarly, bulk-add to many-to-many fields:
person = Person.objects.get(id=1)
person.jobs.add(job1, job2, job3)
- The Django ORM loads a foreign key's raw value along with the row and stores it on the instance as <field>_id, so read that attribute instead of following the relation, which causes a needless database query.
# DON'T
father_id = Person.objects.get(id=1).father.id # Causes a needless database query
# DO
father_id = Person.objects.get(id=1).father_id # The foreign key is already