#33459: Explain how to optimize full text search with SearchVectorField and
GinIndex
-------------------------------------+-------------------------------------
Reporter: Thomas Aglassinger | Owner: nobody
Type: Cleanup/optimization | Status: new
Component: Documentation | Version: 4.0
Severity: Normal | Resolution:
Keywords: postgres | Triage Stage: Unreviewed
Has patch: 1 | Needs documentation: 0
Needs tests: 0 | Patch needs improvement: 0
Easy pickings: 0 | UI/UX: 0
-------------------------------------+-------------------------------------
Old description:
> The current documentation section on `SearchVectorField` at
> https://docs.djangoproject.com/en/dev/ref/contrib/postgres/search/#searchvectorfield
> does not explain how to use `GinIndex` or `GistIndex` to increase the
> performance of the search. It currently only describes how to add a
> `SearchVectorField`. To my understanding, this does improve performance
> somewhat, on a linear scale, by removing the need to parse the searched
> fields with Postgres' full text search parser. However, also indexing
> this field would typically improve performance by an order of magnitude.
>
> I eventually managed to piece this together from an article found at
> http://logan.tw/posts/2017/12/30/full-text-search-with-django-and-postgresql/
> but believe this fairly standard use case should be covered in the Django
> documentation already.
>
> So I propose to add a few paragraphs that show how to add a
> `SearchVectorField` to a model with a `GinIndex`, compute a search vector
> from multiple fields and then perform a ranked search on it.
>
> I don't consider the current patch to be final, things to discuss:
> - Should the section on "SearchVectorField" be renamed to
> "SearchVectorField and indexing"?
> - Should the section on "Performance" be included in the section on
> "SearchVectorField"? Currently it describes the problem well, but I found
> the solution of pointing to the Postgres documentation unhelpful. If
> GinIndex is mentioned later anyway, the pointer to the Postgres
> documentation could be added afterwards for further reading.
> - Is it alright to extend the `Entry` model from the previous chapter, or
> should I add a separate model like `SearchableEntry`? The first approach
> might confuse readers if they skim over the part where `Entry` gets
> redefined and think it's the same model as in other chapters.
>
> Also it might be helpful to include a "full text search how-to" for
> example describing how to efficiently search a database of news articles
> in multiple languages. While the current reference documentation explains
> search configurations well enough, the later examples (rightfully) omit
> it to keep the explanations focused. This however limits their usefulness
> for skimming and copying the examples.
>
> If you are interested, I could write such a how-to.
New description:

The current documentation section on `SearchVectorField` at
https://docs.djangoproject.com/en/dev/ref/contrib/postgres/search/#searchvectorfield
does not explain how to use `GinIndex` or `GistIndex` to increase the
performance of the search. It currently only describes how to add a
`SearchVectorField`. To my understanding, this does improve performance
somewhat, on a linear scale, by removing the need to parse the searched
fields with Postgres' full text search parser. However, also indexing this
field would typically improve performance by an order of magnitude.

I eventually managed to piece this together from an article found at
http://logan.tw/posts/2017/12/30/full-text-search-with-django-and-postgresql/
but believe this fairly standard use case should be covered in the Django
documentation already.

So I propose to add a few paragraphs that show how to add a
`SearchVectorField` to a model with a `GinIndex`, compute a search vector
from multiple fields and then perform a ranked search on it.

For the related pull request, see
<https://github.com/django/django/pull/15350>

I don't consider the current patch to be final, things to discuss:
- Should the section on "SearchVectorField" be renamed to
"SearchVectorField and indexing"?
- Should the section on "Performance" be included in the section on
"SearchVectorField"? Currently it describes the problem well, but I found
the solution of pointing to the Postgres documentation unhelpful. If
GinIndex is mentioned later anyway, the pointer to the Postgres
documentation could be added afterwards for further reading.
- Is it alright to extend the `Entry` model from the previous chapter, or
should I add a separate model like `SearchableEntry`? The first approach
might confuse readers if they skim over the part where `Entry` gets
redefined and think it's the same model as in other chapters.

Also it might be helpful to include a "full text search how-to", for
example describing how to efficiently search a database of news articles
in multiple languages. While the current reference documentation explains
search configurations well enough, the later examples (rightfully) omit
them to keep the explanations focused. This, however, limits their
usefulness for skimming and copying the examples.

If you are interested, I could write such a how-to.
--
Comment (by Thomas Aglassinger):
Added link to PR: https://github.com/django/django/pull/15350
--
Ticket URL: <https://code.djangoproject.com/ticket/33459#comment:1>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.