Haystack and Whoosh notes 2009/04/26
Real search is always better than running LIKE queries against MySQL, so today I picked up Haystack [1] and Whoosh [2]. I chose this combination for the low barrier to entry and the easy upgrade path should that be required. Both are pure Python and speak setup.py.
The first problem I ran into has actually been fixed but not committed. The Gist embedded below patches Whoosh as recommended in the bug report [3]. The bug manifests itself as “IOError: [Errno 24] Too many open files” when you try to load even modestly sized datasets all at once. I can’t make my laptop give me more than 8192 file descriptors and my Slice will only give me 1024 so I could never see just how bad things got on a 1.5 million row sample. With the patch, though, everything is golden.
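To see how many file descriptors your own machine will hand out before trying a big import, something like the following should do. This is only a sketch using Python's standard resource module (Unix only); the helper name open_file_limits is mine, not something from Whoosh or Haystack.

```python
import resource


def open_file_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = open_file_limits()
# The soft limit is what triggers "Too many open files"; the hard limit is
# the ceiling the soft limit can be raised to without extra privileges.
print("soft limit: %s, hard limit: %s" % (soft, hard))
```

The soft number is the one that matters for the IOError above.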
The second and last problem I encountered was more of a documentation problem. Some of the official tutorial is a bit overkill, so here’s the fastest get-up-and-go tutorial I can distill:
Add 'haystack', to INSTALLED_APPS in your settings.py. Also add these two lines to let Haystack know where to keep your Whoosh index files:

    HAYSTACK_SEARCH_ENGINE = 'whoosh'
    HAYSTACK_WHOOSH_PATH = '/path/to/server/writable/directory'
Add two lines to your global urls.py:

    import haystack
    haystack.autodiscover()
Create a file called search_indexes.py next to models.py. This file will contain model-like classes defining your search schema. It is important to list every field you will want in your search results (the primary key, for example) in the search schema. The field defined with document=True and the prepare method determine the searchable data.

Update: Daniel Lindsley pointed out that Haystack reserves id for itself so I’ve changed my example to use a slug field. Same point applies, just don’t use an id field in your subclasses of SearchIndex.

    from haystack import indexes
    from haystack.sites import site
    from models import Foo

    class FooIndex(indexes.SearchIndex):
        text = indexes.CharField(document=True)
        slug = indexes.CharField(model_attr='slug')
        name = indexes.CharField(model_attr='name')
        city = indexes.CharField(model_attr='city')
        state = indexes.CharField(model_attr='state')

        def prepare(self, obj):
            self.prepared_data = super(FooIndex, self).prepare(obj)
            self.prepared_data['text'] = obj.name
            return self.prepared_data

    site.register(Foo, FooIndex)
I’ve called the indexable data “text” and use the prepare method to explicitly allow searching by name only. The official documentation asks for a template file to use during preparation but I think this is overkill.

Replace your old ORM-based search view with something like this:

    from haystack.views import SearchView

    def search(req):
        return SearchView(template='search.html')(req)
Replace your search page’s template with something like this:

    {% extends 'layout.html' %}
    {% url core.views.search as base %}
    {% block content %}
    <form action="{{ base }}" method="get">
        <h1><label for="query">{% block title %}Search{% endblock %}</label> for
        <input id="query" name="query" type="text" value="{{ query }}" />
        <input type="submit" value="Search" class="button" /></h1>
    </form>
    {% if page.object_list %}
    <ol start="{{ page.start_index }}">
    {% for o in page.object_list %}
        <li><a href="{{ base }}/{{ o.slug }}">{{ o.name }}</a></li>
    {% endfor %}
    </ol>
    <p>Page {{ page.number }} of {{ page.paginator.num_pages }}</p>
    <ul>
    {% if page.has_previous %}
        <li><a href="{{ base }}?query={{ query|urlencode }}&page={{ page.previous_page_number }}">← Previous</a></li>
    {% endif %}
    {% if page.has_next %}
        <li><a href="{{ base }}?query={{ query|urlencode }}&page={{ page.next_page_number }}">Next →</a></li>
    {% endif %}
    </ul>
    {% else %}
    {% if query %}
        <p>We couldn’t find anything named <strong>{{ query }}</strong></p>
    {% endif %}
    {% endif %}
    {% endblock %}

The template gets the query, a page from a regular Django Paginator and the paginator itself. A form comes along too but I prefer to ignore this. If you defined your own User model that lives at req.user, it must implement get_and_delete_messages [4] because django.contrib.auth.models.User leaks into django.core.context_processors a bit.
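For that last point, here is roughly what a hand-rolled user class needs. Treat it as a sketch: SiteUser and its _messages list are hypothetical names of my own, and the only thing taken from Django is the method name get_and_delete_messages, which I’m assuming should return the pending messages and clear them.

```python
class SiteUser(object):
    """Hypothetical custom user class that might live at req.user."""

    def __init__(self, name):
        self.name = name
        self._messages = []  # pending one-time messages for this user

    def get_and_delete_messages(self):
        """Return pending messages and clear them, as the method name implies."""
        messages, self._messages = self._messages, []
        return messages


user = SiteUser("alice")
user._messages.append("Welcome back")
print(user.get_and_delete_messages())
```

If you don’t use messages at all, returning an empty list every time should be enough to keep the template context happy.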
Here’s the previously mentioned patch-and-install script (updated to reflect bugfixes merged into the trunk of Whoosh!):