Haystack and Whoosh notes 2009/04/26
Real search is always better than running LIKE queries against MySQL, so today I picked up Haystack [1] and Whoosh [2]. I chose this combination for the low barrier to entry and the easy upgrade path, should that be required. Both are pure Python and speak setup.py.
The first problem I ran into has actually been fixed but not yet committed. The Gist embedded below patches Whoosh as recommended in the bug report [3]. The bug manifests itself as “IOError: [Errno 24] Too many open files” when you try to load even modestly sized datasets all at once. I can’t make my laptop give me more than 8192 file descriptors, and my Slice will only give me 1024, so I could never see just how bad things got on a 1.5-million-row sample. With the patch, though, everything is golden.
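To see which limit you are up against before loading a large dataset, you can ask the operating system for the per-process open-file limits. This is a minimal sketch using Python’s standard resource module (Unix only); the function names are hypothetical, and raising the soft limit only helps up to whatever hard limit your system allows. It works around the symptom, not the Whoosh bug itself.

```python
import resource


def nofile_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_nofile_soft_limit(target):
    """Try to raise the soft open-file limit to `target`, capped at the hard limit.

    Returns the soft limit in effect afterwards. Some systems cap per-process
    descriptors below an unlimited hard limit; if the OS refuses, the current
    soft limit is left unchanged.
    """
    soft, hard = nofile_limits()
    if hard != resource.RLIM_INFINITY:
        target = min(target, hard)
    if soft != resource.RLIM_INFINITY and soft < target:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            pass  # The OS refused the new limit; keep the current one.
    return nofile_limits()[0]


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("open files: soft=%s hard=%s" % (soft, hard))
```

Calling raise_nofile_soft_limit(8192) at the top of an indexing script would, under these assumptions, get you as close to 8192 descriptors as the machine permits.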
The second and last problem I encountered was more of a documentation problem. Some of the official tutorial is a bit overkill, so here’s the fastest get-up-and-go tutorial I can distill:
Add 'haystack' to INSTALLED_APPS in your settings.py. Also add these two lines to let Haystack know where to keep your Whoosh index files:

    HAYSTACK_SEARCH_ENGINE = 'whoosh'
    HAYSTACK_WHOOSH_PATH = '/path/to/server/writable/directory'
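Since Whoosh writes its index files into HAYSTACK_WHOOSH_PATH, that directory has to be writable by whatever user runs your Django process. A small sanity check like the following can catch permission problems before the first indexing run; ensure_writable_dir is a hypothetical helper, not part of Haystack or Whoosh.

```python
import os


def ensure_writable_dir(path):
    """Create `path` if it is missing and confirm this process can write to it.

    Returns the absolute path; raises PermissionError if the directory
    exists but is not writable (and traversable) by the current user.
    """
    path = os.path.abspath(path)
    os.makedirs(path, exist_ok=True)  # no-op if the directory already exists
    if not os.access(path, os.W_OK | os.X_OK):
        raise PermissionError("index directory is not writable: %s" % path)
    return path
```

Run it as the same user as the web server (for example, from a management shell started under that account), since os.access checks the permissions of the calling process.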
Add two lines to your global urls.py:

    import haystack
    haystack.autodiscover()
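As far as I can tell, autodiscover() works in the same spirit as Django’s admin.autodiscover(): it imports each installed app’s search_indexes module so that the registration calls inside those modules run. The sketch below shows that idea with importlib; it is a simplified illustration with a hypothetical signature, not Haystack’s actual implementation.

```python
import importlib


def autodiscover(installed_apps, submodule="search_indexes"):
    """Import `<app>.<submodule>` for every app that has one.

    Importing the module is what runs its registration calls. Apps without
    the submodule are skipped; any other missing module (a typo in the app
    name, or a failed import inside an existing submodule) still raises.
    Returns the names of the modules that were imported.
    """
    imported = []
    for app in installed_apps:
        name = "%s.%s" % (app, submodule)
        try:
            importlib.import_module(name)
        except ModuleNotFoundError as exc:
            if exc.name != name:
                raise  # something other than the submodule itself is missing
            continue  # this app simply has no search_indexes module
        imported.append(name)
    return imported
```

Re-raising everything except “this exact module does not exist” keeps a broken search_indexes.py from failing silently.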
Create a file called search_indexes.py next to models.py. This file will contain model-like classes defining your search schema. It is important to list every field you will want in your search results (the primary key, for example) in the search schema. The field defined with document=True and the prepare method determine the searchable data.

Update: Daniel Lindsley pointed out that Haystack reserves id for itself, so I’ve changed my example to use a slug field. The same point applies; just don’t use an id field in your subclasses of SearchIndex.

    from haystack import indexes
    from haystack.sites import site
    from models import Foo

    class FooIndex(indexes.SearchIndex):
        # The document field holds the searchable text.
        text = indexes.CharField(document=True)
        # Fields copied from the model so they come back with each result.
        slug = indexes.CharField(model_attr='slug')
        name = indexes.CharField(model_attr='name')
        city = indexes.CharField(model_attr='city')
        state = indexes.CharField(model_attr='state')

        def prepare(self, obj):
            # Let SearchIndex fill in the model_attr fields first...
            self.prepared_data = super(FooIndex, self).prepare(obj)
            # ...then make the searchable text the name alone.
            self.prepared_data['text'] = obj.name
            return self.prepared_data

    site.register(Foo, FooIndex)
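To make the effect of prepare concrete, here is a toy, dependency-free imitation of the data FooIndex builds for one object. It is not Haystack’s code: prepare_document, MODEL_ATTRS, and the plain-dict “document” are assumptions for illustration. The model_attr fields are copied off the object, and the document field is then overwritten with the name, so only the name is searchable while the other fields travel along with the result.

```python
from types import SimpleNamespace

# Hypothetical mirror of FooIndex's declarations: index field -> model attribute.
MODEL_ATTRS = {"slug": "slug", "name": "name", "city": "city", "state": "state"}


def prepare_document(obj, model_attrs=MODEL_ATTRS, document_field="text"):
    """Toy version of the dict FooIndex.prepare() produces for `obj`."""
    # Step 1: copy each model_attr field off the object (the superclass's job).
    data = {field: getattr(obj, attr) for field, attr in model_attrs.items()}
    # Step 2: the override, so the searchable document is just the name.
    data[document_field] = obj.name
    return data


foo = SimpleNamespace(slug="joes-diner", name="Joe's Diner", city="Austin", state="TX")
print(prepare_document(foo))
```

Note there is no id key in the result; Haystack reserves id for itself, which is why the index exposes slug instead.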
I’ve called the indexable data “text” and use the prepare method to explicitly allow searching by name only. The official documentation asks for a template file to use during preparation, but I think this is overkill.

Replace your old ORM-based search view with something like this:
    from haystack.views import SearchView

    def search(req):
        # Build a SearchView for this request and render the results with search.html.
        return SearchView(template='search.html')(req)
Replace your search page’s template with something like this:
    {% extends 'layout.html' %}
    {% block content %}
    {# Set base inside the block: a child template only renders its blocks. #}
    {% url core.views.search as base %}
    <form action="{{ base }}" method="get">
      <h1><label for="query">{% block title %}Search{% endblock %}</label> for
      <input id="query" name="query" type="text" value="{{ query }}" />
      <input type="submit" value="Search" class="button" /></h1>
    </form>
    {% if page.object_list %}
      <ol start="{{ page.start_index }}">
        {% for o in page.object_list %}
          <li><a href="{{ base }}/{{ o.slug }}">{{ o.name }}</a></li>
        {% endfor %}
      </ol>
      <p>Page {{ page.number }} of {{ page.paginator.num_pages }}</p>
      <ul>
        {% if page.has_previous %}
          <li><a href="{{ base }}?query={{ query|urlencode }}&page={{ page.previous_page_number }}">← Previous</a></li>
        {% endif %}
        {% if page.has_next %}
          <li><a href="{{ base }}?query={{ query|urlencode }}&page={{ page.next_page_number }}">Next →</a></li>
        {% endif %}
      </ul>
    {% else %}
      {% if query %}
        <p>We couldn’t find anything named <strong>{{ query }}</strong></p>
      {% endif %}
    {% endif %}
    {% endblock %}
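The template leans on a handful of page attributes (start_index, number, has_previous, has_next, the previous and next page numbers) plus the urlencode filter. The toy re-implementation below may help when checking what a given page should display. ToyPage and page_links are hypothetical names, not Django APIs; ToyPage ignores Paginator options such as orphans and raises ValueError where Django would raise its own EmptyPage. The query is escaped with quote(query, safe='/') because that matches the urlencode filter’s default escaping, so spaces become %20 rather than +.

```python
from urllib.parse import quote


class ToyPage(object):
    """Toy model of one page of results, mirroring Django's Page attributes."""

    def __init__(self, count, per_page, number):
        self.count = count  # total number of hits
        self.per_page = per_page
        # Integer ceiling division; an empty result set still has one page.
        self.num_pages = max(1, (count + per_page - 1) // per_page)
        if not 1 <= number <= self.num_pages:
            raise ValueError("page %d out of range" % number)
        self.number = number

    def start_index(self):
        """1-based index of the first hit on this page (0 when there are none)."""
        if self.count == 0:
            return 0
        return self.per_page * (self.number - 1) + 1

    def end_index(self):
        """1-based index of the last hit on this page."""
        if self.number == self.num_pages:
            return self.count
        return self.number * self.per_page

    def has_previous(self):
        return self.number > 1

    def has_next(self):
        return self.number < self.num_pages


def page_links(base, query, page):
    """Build the Previous/Next hrefs the template emits (None when absent)."""
    q = quote(query, safe="/")  # same escaping as the urlencode filter
    prev_url = next_url = None
    if page.has_previous():
        prev_url = "%s?query=%s&page=%d" % (base, q, page.number - 1)
    if page.has_next():
        next_url = "%s?query=%s&page=%d" % (base, q, page.number + 1)
    return prev_url, next_url
```

For example, 45 hits at 20 per page gives three pages, and page 2 is rendered as an ol starting at 21.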
The view hands the template the query, a page from a regular Django Paginator, and the paginator itself. A form comes along too, but I prefer to ignore it. If you defined your own User model that lives at req.user, it must implement get_and_delete_messages [4], because django.contrib.auth.models.User leaks into django.core.context_processors a bit: the auth context processor there calls user.get_and_delete_messages() when it builds the template context.
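For a home-grown user object, one minimal way to satisfy that requirement is a message queue that empties itself when read. The class below is a hypothetical sketch (SiteUser, add_message, and the in-memory list are assumptions; Django’s own User kept its messages in the database), and auth_context is a simplified stand-in for the call the auth context processor makes, not Django’s code.

```python
class SiteUser(object):
    """Hypothetical custom user object attached to the request as req.user."""

    def __init__(self, username):
        self.username = username
        self._messages = []  # pending one-time notices, kept in memory here

    def add_message(self, text):
        self._messages.append(text)

    def get_and_delete_messages(self):
        """Return all pending messages and clear them, as Django's User did."""
        messages, self._messages = self._messages, []
        return messages


def auth_context(user):
    """Simplified stand-in for what the auth context processor asks of the user."""
    return {"user": user, "messages": user.get_and_delete_messages()}
```

If your site never uses messages, a get_and_delete_messages that simply returns an empty list would also satisfy the processor.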
Here’s the previously mentioned patch-and-install script (updated to reflect bugfixes merged into the trunk of Whoosh!):