Skip to main content

batch select many-to-many and one-to-many fields (to help avoid n+1 query problem)

Project description

====================
Django Batch Select
====================

The idea of Django Batch Select is to provide an equivalent to Django's select_related_ functionality. As of such it's another handy tool for avoiding the "n+1 query problem".

select_related_ is handy for minimizing the number of queries that need to be made in certain situations. However it is only usual for pre-selecting ForeignKey_ relations.

batch_select is handy for pre-selecting ManyToManyField_ relations and reverse ForeignKey_ relations.

It works by performing a single extra SQL query after a QuerySet_ has been evaluated to stitch in the the extra fields asked for. This requires the addition of a custom Manager_, which in turn returns a custom QuerySet_ with extra methods attached.

Example Usage
=============

Assuming we have models defined as the following:

::

from batch_select.models import BatchManager

class Tag(models.Model):
name = models.CharField(max_length=32)

class Section(models.Model):
name = models.CharField(max_length=32)

objects = BatchManager()

class Entry(models.Model):
title = models.CharField(max_length=255)
section = models.ForeignKey(Section, blank=True, null=True)
tags = models.ManyToManyField(Tag)

objects = BatchManager()

I'll also define a helper function to show the SQL queries generated:

::

from django import db

def show_queries():
for query in db.connection.queries:
print query["sql"]
db.reset_queries()

Here are a few example (with generated sql queries):

::

>>> Entry.objects.batch_select('tags').all()
[]
>>> show_queries() # no results, so no 2nd query
SELECT "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id" FROM "batch_select_entry"
>>> Entry.objects.create()
>>> Entry.objects.create()
>>> tag1 = Tag.objects.create(name='tag1')
>>> tag2 = Tag.objects.create(name='tag2')
>>> db.reset_queries()
>>> entries = Entry.objects.batch_select('tags').all()
>>> entry = entries[0]
>>> print entry.tags_all
[]
>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id" FROM "batch_select_entry" LIMIT 1
SELECT (`batch_select_entry_tags`.`entry_id`) AS "entry_id", "batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags".entry_id IN (1)
>>> entry.tags.add(tag1)
>>> db.reset_queries()
>>> entries = Entry.objects.batch_select('tags').all()
>>> entry = entries[0]
>>> print entry.tags_all
[<Tag: Tag object>]
>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id" FROM "batch_select_entry" LIMIT 1
SELECT (`batch_select_entry_tags`.`entry_id`) AS "entry_id", "batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags".entry_id IN (1)
>>> entries = Entry.objects.batch_select('tags').all()
>>> for entry in entries:
.... print entry.tags_all
....
[<Tag: Tag object>]
[]
>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id" FROM "batch_select_entry"
SELECT (`batch_select_entry_tags`.`entry_id`) AS "entry_id", "batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags".entry_id IN (1, 2)

Re-running that same last for loop without using batch_select generate three queries instead of two (n+1 queries):

::

>>> entries = Entry.objects.all()
>>> for entry in entries:
.... print entry.tags.all()
....
[<Tag: Tag object>]
[]

>>> show_queries()
SELECT "batch_select_entry"."id", "batch_select_entry"."title", "batch_select_entry"."section_id" FROM "batch_select_entry"
SELECT "batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags"."entry_id" = 1
SELECT "batch_select_tag"."id", "batch_select_tag"."name" FROM "batch_select_tag" INNER JOIN "batch_select_entry_tags" ON ("batch_select_tag"."id" = "batch_select_entry_tags"."tag_id") WHERE "batch_select_entry_tags"."entry_id" = 2

More Advanced Usage
=========================

By default the batch fields are inserted into fields named <name>_all, on each object. So:

::

Entry.objects.batch_select('tags').all()

Results in the Entry instances having fields called 'tags_all' containing the Tag objects associated with that Entry.

If you want to give the field a different name just use a named parameter - in the same way as using the Aggregation_ API:

::

Entry.objects.batch_select(selected_tags='tags').all()

Would means the Tag objects would be assigned to fields called 'selected_tags'.

If you want to perform filtering of the related objects you will need to use a Batch object. By doing this you can pass extra named parameters in the same way as when using the filter method of a QuerySet:

::

from batch_select.models import Batch

Entry.objects.batch_select(tags_containing_blue=Batch('tags', name__contains='blue'))

Would return Entry objects with fields called 'tags_containing_name' with only those Tags whose name contains 'blue'.

In addition to filtering by names parameters you can also the following methods on a Batch object, with their effects being passed on to the underlying QuerySet_ object:

* filter_
* exclude_
* annotate_
* order_by_
* reverse_
* select_related_
* extra_
* defer_
* only_

(Note that distinct(), values() etc are not included as they would have side-effects on how the extra query is associated with the original query)
So for example to achieve the same effect as the filter above you could do the following:

::

from batch_select.models import Batch

Entry.objects.batch_select(tags_containing_blue=Batch('tags').filter(name__contains='blue'))

Whereas the following would exclude tags containing "blue" and order by name:

::

from batch_select.models import Batch

batch = Batch('tags').exclude(name__contains='blue').order_by('name')
Entry.objects.batch_select(tags_not_containing_blue=batch)


TODOs and BUGS
==============
See: http://github.com/lilspikey/django-batch-select/issues

.. _select_related: http://docs.djangoproject.com/en/dev/ref/models/querysets/#id4
.. _ForeignKey: http://docs.djangoproject.com/en/dev/ref/models/fields/#foreignkey
.. _ManyToManyField: http://docs.djangoproject.com/en/dev/ref/models/fields/#manytomanyfield
.. _QuerySet: http://docs.djangoproject.com/en/dev/ref/models/querysets/
.. _Manager: http://docs.djangoproject.com/en/dev/topics/db/managers/
.. _Aggregation: http://docs.djangoproject.com/en/dev/topics/db/aggregation/
.. _filter: http://docs.djangoproject.com/en/dev/ref/models/querysets/#filter-kwargs
.. _exclude: http://docs.djangoproject.com/en/dev/ref/models/querysets/#exclude-kwargs
.. _annotate: http://docs.djangoproject.com/en/dev/ref/models/querysets/#annotate-args-kwargs
.. _order_by: http://docs.djangoproject.com/en/dev/ref/models/querysets/#order-by-fields
.. _reverse: http://docs.djangoproject.com/en/dev/ref/models/querysets/#reverse
.. _extra: http://docs.djangoproject.com/en/dev/ref/models/querysets/#extra-select-none-where-none-params-none-tables-none-order-by-none-select-params-none
.. _defer: http://docs.djangoproject.com/en/dev/ref/models/querysets/#defer-fields
.. _only: http://docs.djangoproject.com/en/dev/ref/models/querysets/#only-fields

Project details


Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page