Hi,

here is a first version of the new mandoc.db(5) file format, the
program makewhatis(8) writing it, and the programs man(1) and
apropos(1) reading it. This is not yet intended for commit; i have
not done sufficient testing yet, so i'm sure there are still bugs.
But i want to show it around early because feedback usually improves
development. Kraut backup, too.

The main goal is to get away without SQLite in order to be able to
move that growing beast to ports land. I hope to commit right after
unlock, such that SQLite removal can be done quickly and lots of time
remains for ports to stabilize before coming anywhere close to 6.1.

At this point, i'm not aware of any loss of functionality. Normal
man(1) - including the priority of file names over NAME names over
SYNOPSIS names -, normal apropos(1), semantic apropos including
regular expressions, complex apropos(1) queries with or- and
and-operators and parentheses, normal makewhatis(8) and incremental
makewhatis(8) -u and -d all seem to work.

If you want to play, you can use the old man(1) and the new one in
parallel and compare the results provided by both. Until we put it
in, the new one uses the file name "mandoc.new.db" instead of
"mandoc.db". You can do something like this (at your own risk):

   cd /usr/src/usr.bin
   cp -pR mandoc mandoc.new
   cd mandoc.new
   rm obj
   make cleandir
   patch < ...
   make obj
   make depend
   make
   cd obj/
   ln mandoc man
   ln mandoc apropos
   ln mandoc makewhatis
   doas ./makewhatis  # If you trust me. I guarantee nothing but
                      # that it won't break with your system man(1).
   ./man foo
   ./apropos bar

Here are a few comparisons:

Source code size:

   libsqlite3/src  119 *.c files with 151,733 lines
                    25 *.h files with  17,098 lines
   new code          7 *.c files with   1,582 lines (-99.0%)
                     5 *.h files with     217 lines (-98.7%)
   makewhatis      old 2507 lines; new 2232 lines (-11%)
   apropos         old  840 lines; new  773 lines (-8%)

The new glue code is not only smaller by about 350 lines but also
much easier to understand. The old code constructing SQL query
strings with strlcat(3) was a nightmarish mixture of programming
languages.
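
The parallel comparison of old and new man(1) suggested above can be
scripted. What follows is only a sketch under stated assumptions: the
helper name compare_man is made up for illustration and is not part
of the patch, and it assumes that both binaries accept -w and print
the path of the manual they would display.

```shell
#!/bin/sh
# compare_man OLD NEW NAME...
# Hypothetical helper (not part of the patch): run "OLD -w NAME" and
# "NEW -w NAME" for each NAME and report the names for which the two
# commands print different output.  Returns 1 if anything differed.
compare_man() {
	old_man=$1
	new_man=$2
	shift 2
	rc=0
	for name in "$@"; do
		# A failed lookup is treated as empty output.
		old_out=$("$old_man" -w "$name" 2>/dev/null) || old_out=''
		new_out=$("$new_man" -w "$name" 2>/dev/null) || new_out=''
		if [ "$old_out" != "$new_out" ]; then
			printf 'differs: %s\n' "$name"
			rc=1
		fi
	done
	return "$rc"
}

# Example, run from the obj/ directory after ./makewhatis:
#   compare_man man ./man cat ls pledge
```

No output means that both versions resolved every name to the same
file; each "differs:" line is a candidate for a closer look.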

Database sizes (with a random set of ports installed):

   /usr/share/man      old 3.87 MB  new 1.66 MB  (-57%)
   /usr/X11R6/man      old 0.39 MB  new 0.18 MB  (-53%)
   /usr/local/man      old 1.39 MB  new 0.76 MB  (-45%)
   /usr/share/man -Q   old 0.99 MB  new 0.41 MB  (-59%)
   /usr/X11R6/man -Q   old 0.38 MB  new 0.18 MB  (-54%)
   /usr/local/man -Q   old 1.26 MB  new 0.68 MB  (-46%)

Database build times (examples on my old i386 notebook):

   /usr/share/man      old 17.96s  new 26.52s  (+48%)
   /usr/X11R6/man      old  2.86s  new  1.60s  (-44%)
   /usr/local/man      old  8.77s  new  6.11s  (-30%)
   /usr/share/man -Q   old  2.28s  new  1.41s  (-38%)
   /usr/X11R6/man -Q   old  1.04s  new  0.44s  (-58%)
   /usr/local/man -Q   old  2.13s  new  1.24s  (-42%)

Database editing times:

   time doas makewhatis -d /usr/share/man man1/cat.1
   old 50ms  new 1180ms  (+2350%)

Actually, that's expected: SQLite is a professional database, so it
is optimized for quickly doing small changes in large files. But
fortunately, we don't need high performance for that use case: it
only occurs once at the end of each run of pkg_add(1), and spending
a second there is hardly a problem. Besides, as soon as the number
of files grows, the benefit of SQLite appears to vanish:

   cd /usr/share/man
   ls man2/*.2 | time xargs doas makewhatis -d .
   old 2.01s  new 1.75s  (-13%)
   ls man3/*.3 | time xargs doas makewhatis -d .
   old 10.9s  new 4.86s  (-55%)

So for a port with about 100 manuals, the new code already catches
up with the old code, and when installing a bunch of ports containing
about 1000 manuals, the advantage of the new code is roughly the same
as when rebuilding the database from scratch.

Lookup times by name:

   time ls /usr/share/man/man*/*.[1-9] | \
     sed 's/.*\///;s/\.[^.]*$//' | xargs -n 1 man -w
   old 54.28s  new 26.38s  (-51%)

Time to build a huge search result:

   time man -k Nd~. > /dev/null
   old 1.65s  new 0.13s  (-92%)

Time for searching by macro values:

   time man -k Xr=pledge
   old 125ms  new 8.4ms  (-93%)

Complex search query:

   time man -s 4 -k \( virtual -a arch=sparc64 \) -o An=kettenis
   old 24s  new 2s  (-92%)

Note that these comparisons are not quite fair. I did NOT develop
this code optimized for size or speed efficiency, but for simplicity,
and i did not do any substantial work on any kind of optimization
yet. But back in 2014, i spent several weeks optimizing the code
for SQLite for database size, for database build time, and for
lookup time. Besides, all these are just random examples; i did not
search for cases that might make either the old or the new code look
silly. But i believe this indicates that the new concept might be
viable.

A few notes on the code:

 * I did not go back to the ancient dbopen(3) code. I did not use
   any other third-party database engine. Not only because the last
   try of the kind with SQLite turned out to be less than pleasing
   in the end, but also because there is not much need. Why import
   a third-party project for something that can be done with on the
   order of a thousand lines of pure POSIX code?

 * The concept is designed to be most efficient for read-only access
   because that's what users do most of the time, acceptably
   efficient for creating databases, and deliberately not optimized
   for editing existing databases.

 * The code deliberately supports very few data types in the
   database: Basically only int32_t (for counters and pointers),
   strings, and pointers to structs containing pointers. More simply
   isn't needed, and stopping there limits complexity.

 * There are no fancy algorithms in the database code whatsoever,
   only linear searches, simply because anything more complicated
   wouldn't help in practice. Almost all practical searches use
   substring matching (and some even regex matching), and none of
   these allow optimizations using binary, tree-based, or hashed
   searches.

 * As before, the high-level code in makewhatis(8) and apropos(1)
   proper continues to use ohash(3) for assembly of information and
   deduplication where appropriate, but the new mini-database code
   does not need or use ohash(3).

 * The files dba* implement an allocation-based version of the
   database needed for read-write access, based on system malloc(3).

 * The files dbm* implement an mmap(2)-based version of the database
   for quick read-only access to parts of files without reading them
   completely, also profiting from the buffer cache for keeping
   those parts around.

 * The files db[am].[hc] and dba_read.c contain mandoc-specific code.

 * The files db[am]_(array|write|map).[hc] contain generic,
   potentially reusable code. That's not very important though;
   these files only have 500 lines grand total.

 * The file dbm_dump.c may eventually go away after sufficient
   debugging.

 * The database format uses network byte order, or more specifically,
   big endian 32 bit signed integers throughout, both for countable
   quantities and to represent pointers. So i hope the database
   files can be shared across architectures. Using 16 bit integers
   would have been insufficient for common database sizes. Using
   64 bit integers would have been nothing but a waste. Anything in
   between would have been gratuitously asking for alignment trouble.

 * I generally tried to avoid unsigned types wherever possible,
   except that of course i represent strings using the native char
   type (even if unsigned) and that i use size_t in a few places
   where POSIX functions like strlen(3) require it.

 * Do any conventions exist for choosing magic bytes? See dba.c,
   function dba_write(), and dbm_map.c, function dbm_map().

 * The code for atomically replacing a database can be centralized
   into one single function, dbwrite(), and is no longer spread out
   all across mandocdb.c.

 * The function names_check() is no longer useful after getting rid
   of MLINKS and can be deleted without replacement.

 * The function exprdump() will of course go away when debugging of
   the new recursive descent expression parser is done.

Any thoughts?
  Ingo


Index: Makefile
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/Makefile,v
retrieving revision 1.101
diff -u -p -r1.101 Makefile
--- Makefile	30 Mar 2016 06:38:46 -0000	1.101
+++ Makefile	1 Jul 2016 03:19:31 -0000
@@ -4,7 +4,7 @@
 
 CFLAGS  += -W -Wall -Wstrict-prototypes -Wno-unused-parameter
 DPADD += ${LIBUTIL}
-LDADD	+= -lsqlite3 -lutil -lz
+LDADD	+= -lutil -lz
 
 SRCS=	mandoc.c mandoc_aux.c mandoc_ohash.c preconv.c read.c \
 	roff.c tbl.c tbl_opts.c tbl_layout.c tbl_data.c eqn.c
@@ -15,7 +15,9 @@ SRCS+=	main.c mdoc_term.c tag.c chars.c
 SRCS+=	mdoc_man.c
 SRCS+=	html.c mdoc_html.c man_html.c out.c eqn_html.c
 SRCS+=	term_ps.c term_ascii.c tbl_term.c tbl_html.c
-SRCS+=	manpath.c mandocdb.c mansearch_const.c mansearch.c
+SRCS+=	dbm_map.c dbm.c dbm_dump.c
+SRCS+=	dba_write.c dba_array.c dba.c dba_read.c
+SRCS+=	manpath.c mandocdb.c mansearch.c
 
 PROG=	mandoc
 
Index: dba.c
===================================================================
RCS file: dba.c
diff -N dba.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ dba.c	1 Jul 2016 01:57:57 -0000
@@ -0,0 +1,401 @@
+/*	$OpenBSD$ */
+/*
+ * Copyright (c) 2016 Ingo Schwarze <schwa...@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ *
+ * Allocation-based version of the mandoc database, for read-write access.
+ * The interface is defined in "dba.h".
+ */
+#include <sys/types.h>
+#include <errno.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+#include <unistd.h>
+
+#include "mandoc_aux.h"
+#include "mansearch.h"
+#include "dba_write.h"
+#include "dba_array.h"
+#include "dba.h"
+
+static void	*prepend(const char *, char);
+static void	 dba_pages_write(struct dba_array *);
+static void	 dba_macros_write(struct dba_array *);
+static void	 dba_macro_write(struct dba_array *);
+
+
+/*** top-level functions **********************************************/
+
+struct dba *
+dba_new(int32_t npages)
+{
+	struct dba	*dba;
+	int32_t		 im;
+
+	dba = mandoc_malloc(sizeof(*dba));
+	dba->pages = dba_array_new(npages, DBA_GROW);
+	dba->macros = dba_array_new(MACRO_MAX, 0);
+	for (im = 0; im < MACRO_MAX; im++)
+		dba_array_set(dba->macros, im, dba_array_new(128, DBA_GROW));
+	return dba;
+}
+
+void
+dba_free(struct dba *dba)
+{
+	struct dba_array	*page, *macro, *entry;
+
+	dba_array_FOREACH(dba->macros, macro) {
+		dba_array_undel(macro);
+		dba_array_FOREACH(macro, entry) {
+			free(dba_array_get(entry, 0));
+			dba_array_free(dba_array_get(entry, 1));
+			dba_array_free(entry);
+		}
+		dba_array_free(macro);
+	}
+	dba_array_free(dba->macros);
+
+	dba_array_undel(dba->pages);
+	dba_array_FOREACH(dba->pages, page) {
+		dba_array_free(dba_array_get(page, DBP_NAME));
+		dba_array_free(dba_array_get(page, DBP_SECT));
+		dba_array_free(dba_array_get(page, DBP_ARCH));
+		free(dba_array_get(page, DBP_DESC));
+		dba_array_free(dba_array_get(page, DBP_FILE));
+		dba_array_free(page);
+	}
+	dba_array_free(dba->pages);
+
+	free(dba);
+}
+
+/*
+ * Write the complete mandoc database to disk; the format is:
+ * - One integer each for magic and version.
+ * - One pointer each to the macros table and to the final magic.
+ * - The pages table.
+ * - The macros table.
+ * - And at the very end, the magic integer again.
+ */
+int
+dba_write(const char *fname, struct dba *dba)
+{
+	int	 save_errno;
+	int32_t	 pos_end, pos_macros, pos_macros_ptr;
+
+	if (dba_open(fname) == -1)
+		return -1;
+	dba_int_write(MANDOCDB_MAGIC);
+	dba_int_write(MANDOCDB_VERSION);
+	pos_macros_ptr = dba_skip(1, 2);
+	dba_pages_write(dba->pages);
+	pos_macros = dba_tell();
+	dba_macros_write(dba->macros);
+	pos_end = dba_tell();
+	dba_int_write(MANDOCDB_MAGIC);
+	dba_seek(pos_macros_ptr);
+	dba_int_write(pos_macros);
+	dba_int_write(pos_end);
+	if (dba_close() == -1) {
+		save_errno = errno;
+		unlink(fname);
+		errno = save_errno;
+		return -1;
+	}
+	return 0;
+}
+
+
+/*** functions for handling pages *************************************/
+
+/*
+ * Create a new page and append it to the pages table.
+ */
+struct dba_array *
+dba_page_new(struct dba_array *pages, const char *name, const char *sect,
+    const char *arch, const char *desc, const char *file, enum form form)
+{
+	struct dba_array	*page, *entry;
+
+	page = dba_array_new(DBP_MAX, 0);
+	entry = dba_array_new(1, DBA_STR | DBA_GROW);
+	dba_array_add(entry, prepend(name, NAME_FILE & NAME_MASK));
+	dba_array_add(page, entry);
+	entry = dba_array_new(1, DBA_STR | DBA_GROW);
+	dba_array_add(entry, (void *)sect);
+	dba_array_add(page, entry);
+	if (arch != NULL && *arch != '\0') {
+		entry = dba_array_new(1, DBA_STR | DBA_GROW);
+		dba_array_add(entry, (void *)arch);
+	} else
+		entry = NULL;
+	dba_array_add(page, entry);
+	dba_array_add(page, mandoc_strdup(desc));
+	entry = dba_array_new(1, DBA_STR | DBA_GROW);
+	dba_array_add(entry, prepend(file, form));
+	dba_array_add(page, entry);
+	dba_array_add(pages, page);
+	return page;
+}
+
+/*
+ * Add a section, architecture, or file name to an existing page.
+ * Passing the NULL pointer for the architecture makes the page MI.
+ * In that case, any earlier or later architectures are ignored.
+ */
+void
+dba_page_add(struct dba_array *page, int32_t ie, const char *str)
+{
+	struct dba_array	*entries;
+	char			*entry;
+
+	entries = dba_array_get(page, ie);
+	if (ie == DBP_ARCH) {
+		if (entries == NULL)
+			return;
+		if (str == NULL) {
+			dba_array_free(entries);
+			dba_array_set(page, DBP_ARCH, NULL);
+			return;
+		}
+	}
+	if (*str == '\0')
+		return;
+	dba_array_FOREACH(entries, entry)
+		if (strcmp(entry, str) == 0)
+			return;
+	dba_array_add(entries, (void *)str);
+}
+
+/*
+ * Add an additional name to an existing page.
+ */
+void
+dba_page_alias(struct dba_array *page, const char *name, uint64_t mask)
+{
+	struct dba_array	*entries;
+	char			*entry;
+	char			 maskbyte;
+
+	if (*name == '\0')
+		return;
+	maskbyte = mask & NAME_MASK;
+	entries = dba_array_get(page, DBP_NAME);
+	dba_array_FOREACH(entries, entry) {
+		if (strcmp(entry + 1, name) == 0) {
+			*entry |= maskbyte;
+			return;
+		}
+	}
+	dba_array_add(entries, prepend(name, maskbyte));
+}
+
+/*
+ * Return a pointer to a temporary copy of instr with inbyte prepended.
+ */
+static void *
+prepend(const char *instr, char inbyte)
+{
+	static char	*outstr = NULL;
+	static size_t	 outlen = 0;
+	size_t		 newlen;
+
+	newlen = strlen(instr);
+	if (newlen > outlen)
+		outstr = mandoc_realloc(outstr, newlen + 2);
+	*outstr = inbyte;
+	memcpy(outstr + 1, instr, newlen + 1);
+	return outstr;
+}
+
+/*
+ * Write the pages table to disk; the format is:
+ * - One integer containing the number of pages.
+ * - For each page, five pointers to the names, sections,
+ *   architectures, description, and file names of the page.
+ *   MI pages write 0 instead of the architecture pointer.
+ * - One list each for names, sections, architectures, descriptions and
+ *   file names.  The description for each page ends with a NUL byte.
+ *   For all the other lists, each string ends with a NUL byte,
+ *   and the last string for a page ends with two NUL bytes.
+ * - To assure alignment of following integers,
+ *   the end is padded with NUL bytes up to a multiple of four bytes.
+ */
+static void
+dba_pages_write(struct dba_array *pages)
+{
+	struct dba_array	*page;
+	void			*entry;
+	int32_t			 pos_pages, pos_end;
+
+	pos_pages = dba_array_writelen(pages, 5);
+	dba_array_FOREACH(pages, page) {
+		dba_array_setpos(page, DBP_NAME, dba_tell());
+		dba_array_writelst(dba_array_get(page, DBP_NAME));
+	}
+	dba_array_FOREACH(pages, page) {
+		dba_array_setpos(page, DBP_SECT, dba_tell());
+		dba_array_writelst(dba_array_get(page, DBP_SECT));
+	}
+	dba_array_FOREACH(pages, page) {
+		if ((entry = dba_array_get(page, DBP_ARCH)) != NULL) {
+			dba_array_setpos(page, DBP_ARCH, dba_tell());
+			dba_array_writelst(entry);
+		} else
+			dba_array_setpos(page, DBP_ARCH, 0);
+	}
+	dba_array_FOREACH(pages, page) {
+		dba_array_setpos(page, DBP_DESC, dba_tell());
+		dba_str_write(dba_array_get(page, DBP_DESC));
+	}
+	dba_array_FOREACH(pages, page) {
+		dba_array_setpos(page, DBP_FILE, dba_tell());
+		dba_array_writelst(dba_array_get(page, DBP_FILE));
+	}
+	pos_end = dba_align();
+	dba_seek(pos_pages);
+	dba_array_FOREACH(pages, page)
+		dba_array_writepos(page);
+	dba_seek(pos_end);
+}
+
+
+/*** functions for handling macros ************************************/
+
+/*
+ * Create a new macro entry and append it to one of the macro tables.
+ */
+void
+dba_macro_new(struct dba *dba, int32_t im, const char *value,
+    const int32_t *pp)
+{
+	struct dba_array	*entry, *pages;
+	const int32_t		*ip;
+	int32_t			 np;
+
+	np = 0;
+	for (ip = pp; *ip; ip++)
+		np++;
+	pages = dba_array_new(np, DBA_GROW);
+	for (ip = pp; *ip; ip++)
+		dba_array_add(pages, dba_array_get(dba->pages,
+		    be32toh(*ip) / 5 / sizeof(*ip) - 1));
+
+	entry = dba_array_new(2, 0);
+	dba_array_add(entry, mandoc_strdup(value));
+	dba_array_add(entry, pages);
+
+	dba_array_add(dba_array_get(dba->macros, im), entry);
+}
+
+/*
+ * Look up a macro entry by value and add a reference to a new page to it.
+ * If the value does not yet exist, create a new macro entry
+ * and add it to the macro table in question.
+ */
+void
+dba_macro_add(struct dba_array *macros, int32_t im, const char *value,
+    struct dba_array *page)
+{
+	struct dba_array	*macro, *entry, *pages;
+
+	if (*value == '\0')
+		return;
+	macro = dba_array_get(macros, im);
+	dba_array_FOREACH(macro, entry)
+		if (strcmp(value, dba_array_get(entry, 0)) == 0)
+			break;
+	if (entry == NULL) {
+		entry = dba_array_new(2, 0);
+		dba_array_add(entry, mandoc_strdup(value));
+		pages = dba_array_new(1, DBA_GROW);
+		dba_array_add(entry, pages);
+		dba_array_add(macro, entry);
+	} else
+		pages = dba_array_get(entry, 1);
+	dba_array_add(pages, page);
+}
+
+/*
+ * Write the macros table to disk; the format is:
+ * - The number of macro tables (actually, MACRO_MAX).
+ * - That number of pointers to the individual macro tables.
+ * - The individual macro tables.
+ */
+static void
+dba_macros_write(struct dba_array *macros)
+{
+	struct dba_array	*macro;
+	int32_t			 im, pos_macros, pos_end;
+
+	pos_macros = dba_array_writelen(macros, 1);
+	im = 0;
+	dba_array_FOREACH(macros, macro) {
+		dba_array_setpos(macros, im++, dba_tell());
+		dba_macro_write(macro);
+	}
+	pos_end = dba_tell();
+	dba_seek(pos_macros);
+	dba_array_writepos(macros);
+	dba_seek(pos_end);
+}
+
+/*
+ * Write one individual macro table to disk; the format is:
+ * - The number of entries in the table.
+ * - For each entry, two pointers, the first one to the value
+ *   and the second one to the list of pages.
+ * - A list of values, each ending in a NUL byte.
+ * - To assure alignment of following integers,
+ *   padding with NUL bytes up to a multiple of four bytes.
+ * - A list of pointers to pages, each list ending in a 0 integer.
+ */
+static void
+dba_macro_write(struct dba_array *macro)
+{
+	struct dba_array	*entry, *pages, *page;
+	int			 empty;
+	int32_t			 addr, pos_macro, pos_end;
+
+	dba_array_FOREACH(macro, entry) {
+		pages = dba_array_get(entry, 1);
+		empty = 1;
+		dba_array_FOREACH(pages, page)
+			if (dba_array_getpos(page))
+				empty = 0;
+		if (empty)
+			dba_array_del(macro);
+	}
+	pos_macro = dba_array_writelen(macro, 2);
+	dba_array_FOREACH(macro, entry) {
+		dba_array_setpos(entry, 0, dba_tell());
+		dba_str_write(dba_array_get(entry, 0));
+	}
+	dba_align();
+	dba_array_FOREACH(macro, entry) {
+		dba_array_setpos(entry, 1, dba_tell());
+		pages = dba_array_get(entry, 1);
+		dba_array_FOREACH(pages, page)
+			if ((addr = dba_array_getpos(page)))
+				dba_int_write(addr);
+		dba_int_write(0);
+	}
+	pos_end = dba_tell();
+	dba_seek(pos_macro);
+	dba_array_FOREACH(macro, entry)
+		dba_array_writepos(entry);
+	dba_seek(pos_end);
+}
Index: dba.h
===================================================================
RCS file: dba.h
diff -N dba.h
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ dba.h	1 Jul 2016 01:57:57 -0000
@@ -0,0 +1,51 @@
+/*	$OpenBSD$ */
+/*
+ * Copyright (c) 2016 Ingo Schwarze <schwa...@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ *
+ * Public interface of the allocation-based version
+ * of the mandoc database, for read-write access.
+ * To be used by dba.c, dba_read.c, and makewhatis(8).
+ */
+
+#define	DBP_NAME	0
+#define	DBP_SECT	1
+#define	DBP_ARCH	2
+#define	DBP_DESC	3
+#define	DBP_FILE	4
+#define	DBP_MAX		5
+
+struct dba_array;
+
+struct dba {
+	struct dba_array	*pages;
+	struct dba_array	*macros;
+};
+
+
+struct dba	*dba_new(int32_t);
+void		 dba_free(struct dba *);
+struct dba	*dba_read(const char *);
+int		 dba_write(const char *, struct dba *);
+
+struct dba_array *dba_page_new(struct dba_array *, const char *,
+			const char *, const char *, const char *,
+			const char *, enum form);
+void		 dba_page_add(struct dba_array *, int32_t, const char *);
+void		 dba_page_alias(struct dba_array *, const char *, uint64_t);
+
+void		 dba_macro_new(struct dba *, int32_t,
+			const char *, const int32_t *);
+void		 dba_macro_add(struct dba_array *, int32_t,
+			const char *, struct dba_array *);
Index: dba_array.c
===================================================================
RCS file: dba_array.c
diff -N dba_array.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ dba_array.c	1 Jul 2016 01:57:57 -0000
@@ -0,0 +1,181 @@
+/*	$OpenBSD$ */
+/*
+ * Copyright (c) 2016 Ingo Schwarze <schwa...@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ *
+ * Allocation-based arrays for the mandoc database, for read-write access.
+ * The interface is defined in "dba_array.h".
+ */
+#include <assert.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "mandoc_aux.h"
+#include "dba_write.h"
+#include "dba_array.h"
+
+struct dba_array {
+	void	**ep;	/* Array of entries. */
+	int32_t	 *em;	/* Array of map positions. */
+	int	  flags;
+	int32_t	  ea;	/* Entries allocated. */
+	int32_t	  eu;	/* Entries used (including deleted). */
+	int32_t	  ed;	/* Entries deleted. */
+	int32_t	  ec;	/* Currently active entry. */
+	int32_t	  pos;	/* Map position of this array. */
+};
+
+
+struct dba_array *
+dba_array_new(int32_t ea, int flags)
+{
+	struct dba_array	*array;
+
+	assert(ea > 0);
+	array = mandoc_malloc(sizeof(*array));
+	array->ep = mandoc_reallocarray(NULL, ea, sizeof(*array->ep));
+	array->em = mandoc_reallocarray(NULL, ea, sizeof(*array->em));
+	array->ea = ea;
+	array->eu = 0;
+	array->ed = 0;
+	array->ec = 0;
+	array->flags = flags;
+	array->pos = 0;
+	return array;
+}
+
+void
+dba_array_free(struct dba_array *array)
+{
+	int32_t	 ie;
+
+	if (array == NULL)
+		return;
+	if (array->flags & DBA_STR)
+		for (ie = 0; ie < array->eu; ie++)
+			free(array->ep[ie]);
+	free(array->ep);
+	free(array->em);
+	free(array);
+}
+
+void
+dba_array_set(struct dba_array *array, int32_t ie, void *entry)
+{
+	assert(ie >= 0);
+	assert(ie < array->ea);
+	assert(ie <= array->eu);
+	if (ie == array->eu)
+		array->eu++;
+	if (array->flags & DBA_STR)
+		entry = mandoc_strdup(entry);
+	array->ep[ie] = entry;
+	array->em[ie] = 0;
+}
+
+void
+dba_array_add(struct dba_array *array, void *entry)
+{
+	if (array->eu == array->ea) {
+		assert(array->flags & DBA_GROW);
+		array->ep = mandoc_reallocarray(array->ep,
+		    2, sizeof(*array->ep) * array->ea);
+		array->em = mandoc_reallocarray(array->em,
+		    2, sizeof(*array->em) * array->ea);
+		array->ea *= 2;
+	}
+	dba_array_set(array, array->eu, entry);
+}
+
+void *
+dba_array_get(struct dba_array *array, int32_t ie)
+{
+	if (ie < 0 || ie >= array->eu || array->em[ie] == -1)
+		return NULL;
+	return array->ep[ie];
+}
+
+void
+dba_array_start(struct dba_array *array)
+{
+	array->ec = array->eu;
+}
+
+void *
+dba_array_next(struct dba_array *array)
+{
+	if (array->ec < array->eu)
+		array->ec++;
+	else
+		array->ec = 0;
+	while (array->ec < array->eu && array->em[array->ec] == -1)
+		array->ec++;
+	return array->ec < array->eu ? array->ep[array->ec] : NULL;
+}
+
+void
+dba_array_del(struct dba_array *array)
+{
+	if (array->ec < array->eu && array->em[array->ec] != -1) {
+		array->em[array->ec] = -1;
+		array->ed++;
+	}
+}
+
+void
+dba_array_undel(struct dba_array *array)
+{
+	memset(array->em, 0, sizeof(*array->em) * array->eu);
+}
+
+void
+dba_array_setpos(struct dba_array *array, int32_t ie, int32_t pos)
+{
+	array->em[ie] = pos;
+}
+
+int32_t
+dba_array_getpos(struct dba_array *array)
+{
+	return array->pos;
+}
+
+int32_t
+dba_array_writelen(struct dba_array *array, int32_t nmemb)
+{
+	dba_int_write(array->eu - array->ed);
+	return dba_skip(nmemb, array->eu - array->ed);
+}
+
+void
+dba_array_writepos(struct dba_array *array)
+{
+	int32_t	 ie;
+
+	array->pos = dba_tell();
+	for (ie = 0; ie < array->eu; ie++)
+		if (array->em[ie] != -1)
+			dba_int_write(array->em[ie]);
+}
+
+void
+dba_array_writelst(struct dba_array *array)
+{
+	const char	*str;
+
+	dba_array_FOREACH(array, str)
+		dba_str_write(str);
+	dba_char_write('\0');
+}
Index: dba_array.h
===================================================================
RCS file: dba_array.h
diff -N dba_array.h
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ dba_array.h	1 Jul 2016 01:57:57 -0000
@@ -0,0 +1,44 @@
+/*	$OpenBSD$ */
+/*
+ * Copyright (c) 2016 Ingo Schwarze <schwa...@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ *
+ * Public interface for allocation-based arrays
+ * for the mandoc database, for read-write access.
+ * To be used by dba*.c and by makewhatis(8).
+ */
+
+struct dba_array;
+
+#define	DBA_STR		0x01	/* Map contains strings, not pointers. */
+#define	DBA_GROW	0x02	/* Allow the array to grow. */
+
+#define	dba_array_FOREACH(a, e) \
+	dba_array_start(a); \
+	while (((e) = dba_array_next(a)) != NULL)
+
+struct dba_array *dba_array_new(int32_t, int);
+void		 dba_array_free(struct dba_array *);
+void		 dba_array_set(struct dba_array *, int32_t, void *);
+void		 dba_array_add(struct dba_array *, void *);
+void		*dba_array_get(struct dba_array *, int32_t);
+void		 dba_array_start(struct dba_array *);
+void		*dba_array_next(struct dba_array *);
+void		 dba_array_del(struct dba_array *);
+void		 dba_array_undel(struct dba_array *);
+void		 dba_array_setpos(struct dba_array *, int32_t, int32_t);
+int32_t		 dba_array_getpos(struct dba_array *);
+int32_t		 dba_array_writelen(struct dba_array *, int32_t);
+void		 dba_array_writepos(struct dba_array *);
+void		 dba_array_writelst(struct dba_array *);
Index: dba_read.c
===================================================================
RCS file: dba_read.c
diff -N dba_read.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ dba_read.c	1 Jul 2016 01:57:57 -0000
@@ -0,0 +1,74 @@
+/*	$OpenBSD$ */
+/*
+ * Copyright (c) 2016 Ingo Schwarze <schwa...@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ *
+ * Function to read the mandoc database from disk into RAM,
+ * such that data can be added or removed.
+ * The interface is defined in "dba.h".
+ * This file is separate from dba.c because this also uses "dbm.h".
+ */
+#include <regex.h>
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+#include <string.h>
+
+#include "mandoc_aux.h"
+#include "mansearch.h"
+#include "dba_array.h"
+#include "dba.h"
+#include "dbm.h"
+
+
+struct dba *
+dba_read(const char *fname)
+{
+	struct dba		*dba;
+	struct dba_array	*page;
+	struct dbm_page		*pdata;
+	struct dbm_macro	*mdata;
+	const char		*cp;
+	int32_t			 im, ip, iv, npages;
+
+	if (dbm_open(fname) == -1)
+		return NULL;
+	npages = dbm_page_count();
+	dba = dba_new(npages);
+	for (ip = 0; ip < npages; ip++) {
+		pdata = dbm_page_get(ip);
+		page = dba_page_new(dba->pages, pdata->name, pdata->sect,
+		    pdata->arch, pdata->desc, pdata->file + 1, *pdata->file);
+		cp = pdata->name;
+		while (*(cp = strchr(cp, '\0') + 1) != '\0')
+			dba_page_add(page, DBP_NAME, cp);
+		cp = pdata->sect;
+		while (*(cp = strchr(cp, '\0') + 1) != '\0')
+			dba_page_add(page, DBP_SECT, cp);
+		if ((cp = pdata->arch) != NULL)
+			while (*(cp = strchr(cp, '\0') + 1) != '\0')
+				dba_page_add(page, DBP_ARCH, cp);
+		cp = pdata->file;
+		while (*(cp = strchr(cp, '\0') + 1) != '\0')
+			dba_page_add(page, DBP_FILE, cp);
+	}
+	for (im = 0; im < MACRO_MAX; im++) {
+		for (iv = 0; iv < dbm_macro_count(im); iv++) {
+			mdata = dbm_macro_get(im, iv);
+			dba_macro_new(dba, im, mdata->value, mdata->pp);
+		}
+	}
+	dbm_close();
+	return dba;
+}
Index: dba_write.c
===================================================================
RCS file: dba_write.c
diff -N dba_write.c
--- /dev/null	1 Jan 1970 00:00:00 -0000
+++ dba_write.c	1 Jul 2016 01:57:57 -0000
@@ -0,0 +1,117 @@
+/*	$OpenBSD$ */
+/*
+ * Copyright (c) 2016 Ingo Schwarze <schwa...@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + * + * Low-level functions for serializing allocation-based data to disk. + * The interface is defined in "dba_write.h". + */ +#include <assert.h> +#include <endian.h> +#include <err.h> +#include <errno.h> +#include <fcntl.h> +#include <stdint.h> +#include <stdio.h> + +#include "dba_write.h" + +static FILE *ofp; + + +int +dba_open(const char *fname) +{ + ofp = fopen(fname, "w"); + return ofp == NULL ? -1 : 0; +} + +int +dba_close(void) +{ + return fclose(ofp) == EOF ? -1 : 0; +} + +int32_t +dba_tell(void) +{ + long pos; + + if ((pos = ftell(ofp)) == -1) + err(1, "ftell"); + if (pos >= INT32_MAX) { + errno = EOVERFLOW; + err(1, "ftell = %ld", pos); + } + return pos; +} + +void +dba_seek(int32_t pos) +{ + if (fseek(ofp, pos, SEEK_SET) == -1) + err(1, "fseek(%d)", pos); +} + +int32_t +dba_align(void) +{ + int32_t pos; + + pos = dba_tell(); + while (pos & 3) { + dba_char_write('\0'); + pos++; + } + return pos; +} + +int32_t +dba_skip(int32_t nmemb, int32_t sz) +{ + const int32_t out[5] = {0, 0, 0, 0, 0}; + int32_t i, pos; + + assert(sz >= 0); + assert(nmemb > 0); + assert(nmemb <= 5); + pos = dba_tell(); + for (i = 0; i < sz; i++) + if (nmemb - fwrite(&out, sizeof(out[0]), nmemb, ofp)) + err(1, "fwrite"); + return pos; +} + +void +dba_char_write(int c) +{ + if (putc(c, ofp) == EOF) + err(1, "fputc"); +} + +void +dba_str_write(const char *str) +{ + if (fputs(str, ofp) == EOF) + err(1, "fputs"); + dba_char_write('\0'); +} + +void +dba_int_write(int32_t i) +{ + i 
= htobe32(i); + if (fwrite(&i, sizeof(i), 1, ofp) != 1) + err(1, "fwrite"); +} Index: dba_write.h =================================================================== RCS file: dba_write.h diff -N dba_write.h --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ dba_write.h 1 Jul 2016 01:57:57 -0000 @@ -0,0 +1,30 @@ +/* $OpenBSD$ */ +/* + * Copyright (c) 2016 Ingo Schwarze <schwa...@openbsd.org> + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + * + * Internal interface to low-level functions + * for serializing allocation-based data to disk. + * For use by dba_array.c and dba.c only. 
+ */ + +int dba_open(const char *); +int dba_close(void); +int32_t dba_tell(void); +void dba_seek(int32_t); +int32_t dba_align(void); +int32_t dba_skip(int32_t, int32_t); +void dba_char_write(int); +void dba_str_write(const char *); +void dba_int_write(int32_t); Index: dbm.c =================================================================== RCS file: dbm.c diff -N dbm.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ dbm.c 1 Jul 2016 01:57:57 -0000 @@ -0,0 +1,443 @@ +/* $OpenBSD$ */ +/* + * Copyright (c) 2016 Ingo Schwarze <schwa...@openbsd.org> + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + * + * Map-based version of the mandoc database, for read-only access. + * The interface is defined in "dbm.h". 
+ */ +#include <assert.h> +#include <endian.h> +#include <err.h> +#include <errno.h> +#include <regex.h> +#include <stdint.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +#include "mansearch.h" +#include "dbm_map.h" +#include "dbm.h" + +struct macro { + int32_t value; + int32_t pages; +}; + +struct page { + int32_t name; + int32_t sect; + int32_t arch; + int32_t desc; + int32_t file; +}; + +enum iter { + ITER_NONE = 0, + ITER_NAME, + ITER_SECT, + ITER_ARCH, + ITER_DESC, + ITER_MACRO +}; + +static struct macro *macros[MACRO_MAX]; +static int32_t nvals[MACRO_MAX]; +static struct page *pages; +static int32_t npages; +static enum iter iteration; + +static struct dbm_res page_bytitle(enum iter, const struct dbm_match *); +static struct dbm_res page_byarch(const struct dbm_match *); +static struct dbm_res page_bymacro(int32_t, const struct dbm_match *); +static char *macro_bypage(int32_t, int32_t); + + +/*** top level functions **********************************************/ + +/* + * Open a disk-based mandoc database for read-only access. + * Map the pages and macros[] arrays. + * Return 0 on success. Return -1 and set errno on failure. 
+ */ +int +dbm_open(const char *fname) +{ + const int32_t *mp, *ep; + int32_t im; + + if (dbm_map(fname) == -1) + return -1; + + if ((npages = be32toh(*dbm_getint(4))) < 0) { + warnx("Invalid number of pages: %d", npages); + goto fail; + } + pages = (struct page *)dbm_getint(5); + + mp = dbm_get(*dbm_getint(2)); + if (be32toh(*mp) != MACRO_MAX) { + warnx("Invalid number of macros: %d", be32toh(*mp)); + goto fail; + } + for (im = 0; im < MACRO_MAX; im++) { + ep = dbm_get(*++mp); + nvals[im] = be32toh(*ep); + macros[im] = (struct macro *)++ep; + } + return 0; + +fail: + dbm_unmap(); + errno = EFTYPE; + return -1; +} + +void +dbm_close(void) +{ + dbm_unmap(); +} + + +/*** functions for handling pages *************************************/ + +int32_t +dbm_page_count(void) +{ + return npages; +} + +/* + * Give the caller pointers to the data for one manual page. + */ +struct dbm_page * +dbm_page_get(int32_t ip) +{ + static struct dbm_page res; + + assert(ip >= 0); + assert(ip < npages); + res.name = dbm_get(pages[ip].name); + res.sect = dbm_get(pages[ip].sect); + res.arch = pages[ip].arch ? dbm_get(pages[ip].arch) : NULL; + res.desc = dbm_get(pages[ip].desc); + res.file = dbm_get(pages[ip].file); + res.addr = dbm_addr(pages + ip); + return &res; +} + +/* + * Functions to start filtered iterations over manual pages. 
+ */ +void +dbm_page_byname(const struct dbm_match *match) +{ + assert(match != NULL); + page_bytitle(ITER_NAME, match); +} + +void +dbm_page_bysect(const struct dbm_match *match) +{ + assert(match != NULL); + page_bytitle(ITER_SECT, match); +} + +void +dbm_page_byarch(const struct dbm_match *match) +{ + assert(match != NULL); + page_byarch(match); +} + +void +dbm_page_bydesc(const struct dbm_match *match) +{ + assert(match != NULL); + page_bytitle(ITER_DESC, match); +} + +void +dbm_page_bymacro(int32_t im, const struct dbm_match *match) +{ + assert(im >= 0); + assert(im < MACRO_MAX); + assert(match != NULL); + page_bymacro(im, match); +} + +/* + * Return the number of the next manual page in the current iteration. + */ +struct dbm_res +dbm_page_next(void) +{ + struct dbm_res res = {-1, 0}; + + switch(iteration) { + case ITER_NONE: + return res; + case ITER_ARCH: + return page_byarch(NULL); + case ITER_MACRO: + return page_bymacro(0, NULL); + default: + return page_bytitle(iteration, NULL); + } +} + +/* + * Functions implementing the iteration over manual pages. + */ +static struct dbm_res +page_bytitle(enum iter arg_iter, const struct dbm_match *arg_match) +{ + static const struct dbm_match *match; + static const char *cp; + static int32_t ip; + struct dbm_res res = {-1, 0}; + + assert(arg_iter == ITER_NAME || arg_iter == ITER_DESC || + arg_iter == ITER_SECT); + + /* Initialize for a new iteration. */ + + if (arg_match != NULL) { + iteration = arg_iter; + match = arg_match; + switch (iteration) { + case ITER_NAME: + cp = dbm_get(pages[0].name); + break; + case ITER_SECT: + cp = dbm_get(pages[0].sect); + break; + case ITER_DESC: + cp = dbm_get(pages[0].desc); + break; + default: + abort(); + } + ip = 0; + return res; + } + + /* Search for a name. 
*/ + + while (ip < npages) { + if (iteration == ITER_NAME) + cp++; + if (dbm_match(match, cp)) + break; + cp = strchr(cp, '\0') + 1; + if (iteration == ITER_DESC) + ip++; + else if (*cp == '\0') { + cp++; + ip++; + } + } + + /* Reached the end without a match. */ + + if (ip == npages) { + iteration = ITER_NONE; + match = NULL; + cp = NULL; + return res; + } + + /* Found a match; save the quality for later retrieval. */ + + res.page = ip; + res.bits = iteration == ITER_NAME ? cp[-1] : 0; + + /* Skip the remaining names of this page. */ + + if (++ip < npages) { + do { + cp++; + } while (cp[-1] != '\0' || + (iteration != ITER_DESC && cp[-2] != '\0')); + } + return res; +} + +static struct dbm_res +page_byarch(const struct dbm_match *arg_match) +{ + static const struct dbm_match *match; + struct dbm_res res = {-1, 0}; + static int32_t ip; + const char *cp; + + /* Initialize for a new iteration. */ + + if (arg_match != NULL) { + iteration = ITER_ARCH; + match = arg_match; + ip = 0; + return res; + } + + /* Search for an architecture. */ + + for ( ; ip < npages; ip++) + if (pages[ip].arch) + for (cp = dbm_get(pages[ip].arch); + *cp != '\0'; + cp = strchr(cp, '\0') + 1) + if (dbm_match(match, cp)) { + res.page = ip++; + return res; + } + + /* Reached the end without a match. */ + + iteration = ITER_NONE; + match = NULL; + return res; +} + +static struct dbm_res +page_bymacro(int32_t im, const struct dbm_match *match) +{ + static const int32_t *pp; + struct dbm_res res = {-1, 0}; + const char *cp; + int32_t iv; + + assert(im >= 0); + assert(im < MACRO_MAX); + + /* Initialize for a new iteration. */ + + if (match != NULL) { + iteration = ITER_MACRO; + cp = nvals[im] ? dbm_get(macros[im]->value) : NULL; + for (iv = 0; iv < nvals[im]; iv++) { + if (dbm_match(match, cp)) + break; + cp = strchr(cp, '\0') + 1; + } + pp = iv == nvals[im] ? NULL : dbm_get(macros[im][iv].pages); + return res; + } + if (iteration != ITER_MACRO) + return res; + + /* No more matches. 
*/ + + if (pp == NULL || *pp == 0) { + iteration = ITER_NONE; + pp = NULL; + return res; + } + + /* Found a match. */ + + res.page = (struct page *)dbm_get(*pp++) - pages; + return res; +} + + +/*** functions for handling macros ************************************/ + +int32_t +dbm_macro_count(int32_t im) +{ + assert(im >= 0); + assert(im < MACRO_MAX); + return nvals[im]; +} + +struct dbm_macro * +dbm_macro_get(int32_t im, int32_t iv) +{ + static struct dbm_macro macro; + + assert(im >= 0); + assert(im < MACRO_MAX); + assert(iv >= 0); + assert(iv < nvals[im]); + macro.value = dbm_get(macros[im][iv].value); + macro.pp = dbm_get(macros[im][iv].pages); + return &macro; +} + +/* + * Filtered iteration over macro entries. + */ +void +dbm_macro_bypage(int32_t im, int32_t ip) +{ + assert(im >= 0); + assert(im < MACRO_MAX); + assert(ip > 0); + macro_bypage(im, ip); +} + +char * +dbm_macro_next(void) +{ + return macro_bypage(MACRO_MAX, 0); +} + +static char * +macro_bypage(int32_t arg_im, int32_t arg_ip) +{ + static const int32_t *pp; + static int32_t im, ip, iv; + + /* Initialize for a new iteration. */ + + if (arg_im < MACRO_MAX && arg_ip > 0) { + im = arg_im; + ip = arg_ip; + pp = dbm_get(macros[im]->pages); + iv = 0; + return NULL; + } + if (im >= MACRO_MAX) + return NULL; + + /* Search for the next value. */ + + while (iv < nvals[im]) { + if (*pp == ip) + break; + if (*pp == 0) + iv++; + pp++; + } + + /* Reached the end without a match. */ + + if (iv == nvals[im]) { + im = MACRO_MAX; + ip = -1; + pp = NULL; + return NULL; + } + + /* Found a match; skip the remaining pages of this entry. 
*/ + + if (++iv < nvals[im]) + while (*pp++ != 0) + continue; + + return dbm_get(macros[im][iv - 1].value); +} Index: dbm.h =================================================================== RCS file: dbm.h diff -N dbm.h --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ dbm.h 1 Jul 2016 01:57:57 -0000 @@ -0,0 +1,68 @@ +/* $OpenBSD$ */ +/* + * Copyright (c) 2016 Ingo Schwarze <schwa...@openbsd.org> + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + * + * Public interface for the map-based version + * of the mandoc database, for read-only access. + * To be used by dbm*.c, dba_read.c, and man(1) and apropos(1). 
+ */ + +enum dbm_mtype { + DBM_EXACT = 0, + DBM_SUB, + DBM_REGEX +}; + +struct dbm_match { + regex_t *re; + const char *str; + enum dbm_mtype type; +}; + +struct dbm_res { + int32_t page; + int32_t bits; +}; + +struct dbm_page { + const char *name; + const char *sect; + const char *arch; + const char *desc; + const char *file; + int32_t addr; +}; + +struct dbm_macro { + const char *value; + const int32_t *pp; +}; + +int dbm_open(const char *); +void dbm_close(void); + +int32_t dbm_page_count(void); +struct dbm_page *dbm_page_get(int32_t); +void dbm_page_byname(const struct dbm_match *); +void dbm_page_bysect(const struct dbm_match *); +void dbm_page_byarch(const struct dbm_match *); +void dbm_page_bydesc(const struct dbm_match *); +void dbm_page_bymacro(int32_t, const struct dbm_match *); +struct dbm_res dbm_page_next(void); + +int32_t dbm_macro_count(int32_t); +struct dbm_macro *dbm_macro_get(int32_t, int32_t); +void dbm_macro_bypage(int32_t, int32_t); +char *dbm_macro_next(void); Index: dbm_dump.c =================================================================== RCS file: dbm_dump.c diff -N dbm_dump.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ dbm_dump.c 1 Jul 2016 01:57:57 -0000 @@ -0,0 +1,197 @@ +/* $OpenBSD$ */ +/* + * Copyright (c) 2016 Ingo Schwarze <schwa...@openbsd.org> + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. 
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + * + * Function to dump an on-disk read-only mandoc database + * in diff(1)able format for debugging purposes. + */ +#include <regex.h> +#include <stdint.h> +#include <stdio.h> +#include <string.h> + +#include "mansearch.h" +#include "dbm_map.h" +#include "dbm.h" + +static void str_dump(const char **); +static void lst_dump(const char **); +static void pchk(const char *, const char **, const char *); +static void pchk_align(const char *, const char **, const char *); + + +void dbm_dump(void) +{ + const int32_t *ip1, *ip2; + const int32_t *mp, *ep, *p0, *pp; + const char *name0, *sect0, *arch0, *desc0, *file0; + const char *namep, *sectp, *archp, *descp, *filep; + int32_t pc, mc, ec, i1, i2; + + ip1 = dbm_getint(0); + printf("magic 0x%08x\n", be32toh(*ip1)); + printf("version 0x%08x\n", be32toh(*++ip1)); + printf("macro offset 0x%08x\n", be32toh(*++ip1)); + mp = dbm_get(*ip1); + printf("end offset 0x%08x\n", be32toh(*++ip1)); + ep = dbm_get(*ip1); + pc = be32toh(*++ip1); + printf("page count %04d\n", pc); + namep = name0 = dbm_get(ip1[1]); + sectp = sect0 = dbm_get(ip1[2]); + archp = arch0 = ip1[3] == 0 ? 
NULL : dbm_get(ip1[3]); + descp = desc0 = dbm_get(ip1[4]); + filep = file0 = dbm_get(ip1[5]); + printf(" === PAGES ===\n"); + for (i1 = 0; i1 < pc; i1++) { + pchk(dbm_get(*++ip1), &namep, "name"); + printf("page name "); + lst_dump(&namep); + pchk(dbm_get(*++ip1), &sectp, "sect"); + printf("page sect "); + lst_dump(&sectp); + if (*++ip1) { + if (arch0 == NULL) + archp = arch0 = dbm_get(*ip1); + else + pchk(dbm_get(*ip1), &archp, "arch"); + printf("page arch "); + lst_dump(&archp); + } + pchk(dbm_get(*++ip1), &descp, "desc"); + printf("page desc # "); + str_dump(&descp); + printf("\npage file "); + pchk(dbm_get(*++ip1), &filep, "file"); + switch(*filep++) { + case 1: + printf("src "); + break; + case 2: + printf("cat "); + break; + default: + printf("UNKNOWN FORMAT %d ", filep[-1]); + break; + } + lst_dump(&filep); + } + printf(" === END OF PAGES ===\n"); + ip1++; + pchk(name0, (const char **)&ip1, "name0"); + pchk(sect0, &namep, "sect0"); + if (arch0 != NULL) { + pchk(arch0, &sectp, "arch0"); + pchk(desc0, &archp, "desc0"); + } else + pchk(desc0, &sectp, "desc0"); + pchk(file0, &descp, "file0"); + pchk_align((const char *)mp, &filep, "macros"); + ip1 = mp; + mc = be32toh(*ip1); + printf("macro count %d at 0x%08x\n", mc, be32toh(dbm_addr(ip1))); + pp = mp = dbm_get(ip1[1]); + printf(" === MACROS ===\n"); + for (i1 = 0; i1 < mc; i1++) { + ip2 = dbm_get(*++ip1); + pchk((const char *)ip2, (const char **)&pp, "macro"); + pp++; + ec = be32toh(*ip2); + printf("macro %d entry count %d\n", i1, ec); + if (ec == 0) + continue; + namep = name0 = dbm_get(ip2[1]); + pp = p0 = dbm_get(ip2[2]); + printf(" === MACRO %d ===\n", i1); + for (i2 = 0; i2 < ec; i2++) { + pchk(dbm_get(*++ip2), &namep, "value"); + printf("macro %d # ", i1); + str_dump(&namep); + pchk(dbm_get(*++ip2), (const char **)&pp, "pages"); + while (*pp != 0) { + printf("# %s ", (char *)dbm_get( + *(int32_t *)dbm_get(*pp)) + 1); + pp++; + } + printf("\n"); + pp++; + } + printf(" === END OF MACRO %d ===\n", i1); + ip2++; + 
pchk(name0, (const char **)&ip2, "value0"); + pchk_align((const char *)p0, &namep, "page0"); + } + printf(" === END OF MACROS ===\n"); + ip1++; + pchk((const char *)mp, (const char **)&ip1, "macro0"); + pchk((const char *)ep, (const char **)&pp, "end"); + printf("magic 0x%08x at 0x%08x\n", + be32toh(*ep), be32toh(dbm_addr(ep))); +} + +static void +str_dump(const char **cp) +{ + if (**cp <= (char)NAME_MASK) { + putchar('['); + if (**cp & NAME_FILE) + putchar('f'); + if (**cp & NAME_HEAD) + putchar('h'); + if (**cp & NAME_FIRST) + putchar('1'); + if (**cp & NAME_TITLE) + putchar('t'); + if (**cp & NAME_SYN) + putchar('s'); + putchar(']'); + (*cp)++; + } + while (**cp != '\0') + putchar(*(*cp)++); + putchar(' '); + (*cp)++; +} + +static void +lst_dump(const char **cp) +{ + while (**cp != '\0') { + printf("# "); + str_dump(cp); + } + (*cp)++; + printf("\n"); +} + +static void +pchk(const char *want, const char **got, const char *name) +{ + if (*got == want) + return; + printf("{MISMATCH %s got %d want %d}", + name, be32toh(dbm_addr(*got)), be32toh(dbm_addr(want))); + *got = want; +} + +static void +pchk_align(const char *want, const char **got, const char *name) +{ + if (*got <= want && *got + 4 > want) + return; + printf("MISMATCH %s got %d want %d # ", + name, be32toh(dbm_addr(*got)), be32toh(dbm_addr(want))); + *got = want; +} Index: dbm_map.c =================================================================== RCS file: dbm_map.c diff -N dbm_map.c --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ dbm_map.c 1 Jul 2016 01:57:57 -0000 @@ -0,0 +1,170 @@ +/* $OpenBSD$ */ +/* + * Copyright (c) 2016 Ingo Schwarze <schwa...@openbsd.org> + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. 
+ * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + * + * Low-level routines for the map-based version + * of the mandoc database, for read-only access. + * The interface is defined in "dbm_map.h". + */ +#include <sys/mman.h> +#include <sys/stat.h> +#include <sys/types.h> + +#include <endian.h> +#include <err.h> +#include <errno.h> +#include <fcntl.h> +#include <regex.h> +#include <stdint.h> +#include <stdlib.h> +#include <string.h> +#include <unistd.h> + +#include "mansearch.h" +#include "dbm_map.h" +#include "dbm.h" + +static struct stat st; +static char *dbm_base; +static int ifd; +static int32_t max_offset; + +/* + * Open a disk-based database for read-only access. + * Validate the file format as far as it is not mandoc-specific. + * Return 0 on success. Return -1 and set errno on failure. 
+ */ +int +dbm_map(const char *fname) +{ + int save_errno; + const int32_t *magic; + + if ((ifd = open(fname, O_RDONLY)) == -1) + return -1; + if (fstat(ifd, &st) == -1) + goto fail; + if (st.st_size < 5) { + warnx("File too short"); + errno = EFTYPE; + goto fail; + } + if (st.st_size > INT32_MAX) { + errno = EFBIG; + goto fail; + } + if ((dbm_base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, + ifd, 0)) == MAP_FAILED) + goto fail; + magic = dbm_getint(0); + if (be32toh(*magic) != MANDOCDB_MAGIC) { + warnx("Bad initial magic of %x (expected %x)", + be32toh(*magic), MANDOCDB_MAGIC); + errno = EFTYPE; + goto fail; + } + magic = dbm_getint(1); + if (be32toh(*magic) != MANDOCDB_VERSION) { + warnx("Bad version number %d (expected %d)", + be32toh(*magic), MANDOCDB_VERSION); + errno = EFTYPE; + goto fail; + } + max_offset = be32toh(*dbm_getint(3)) + sizeof(int32_t); + if (st.st_size != max_offset) { + warnx("Inconsistent file size of %llu (expected %d)", + st.st_size, max_offset); + errno = EFTYPE; + goto fail; + } + magic = dbm_get(*dbm_getint(3)); + if (be32toh(*magic) != MANDOCDB_MAGIC) { + warnx("Bad final magic of %x (expected %x)", + be32toh(*magic), MANDOCDB_MAGIC); + errno = EFTYPE; + goto fail; + } + return 0; + +fail: + save_errno = errno; + close(ifd); + errno = save_errno; + return -1; +} + +void +dbm_unmap(void) +{ + if (munmap(dbm_base, st.st_size) == -1) + err(1, "munmap"); + if (close(ifd) == -1) + err(1, "close"); + dbm_base = (char *)-1; +} + +/* + * Take a raw integer as it was read from the database. + * Interpret it as an offset into the database file + * and return a pointer to that place in the file. + */ +void * +dbm_get(int32_t offset) +{ + offset = be32toh(offset); + if (offset < 0 || offset >= max_offset) { + warnx("Database corrupt: offset %d > %d", offset, max_offset); + return NULL; + } + return dbm_base + offset; +} + +/* + * Assume the database starts with some integers. + * Assume they are numbered starting from 0, increasing. 
+ * Get a pointer to one with the number "offset". + */ +int32_t * +dbm_getint(int32_t offset) +{ + return (int32_t *)dbm_base + offset; +} + +/* + * The reverse of dbm_get(). + * Take pointer into the database file + * and convert it to the raw integer + * that would be used to refer to that place in the file. + */ +int32_t +dbm_addr(const void *p) +{ + return htobe32((char *)p - dbm_base); +} + +int +dbm_match(const struct dbm_match *match, const char *str) +{ + switch (match->type) { + case DBM_EXACT: + return strcmp(str, match->str) == 0; + case DBM_SUB: + return strcasestr(str, match->str) != NULL; + case DBM_REGEX: + return regexec(match->re, str, 0, NULL, 0) == 0; + default: + abort(); + } +} Index: dbm_map.h =================================================================== RCS file: dbm_map.h diff -N dbm_map.h --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ dbm_map.h 1 Jul 2016 01:57:57 -0000 @@ -0,0 +1,29 @@ +/* $OpenBSD$ */ +/* + * Copyright (c) 2016 Ingo Schwarze <schwa...@openbsd.org> + * + * Permission to use, copy, modify, and distribute this software for any + * purpose with or without fee is hereby granted, provided that the above + * copyright notice and this permission notice appear in all copies. + * + * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES + * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF + * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR + * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN + * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF + * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. + * + * Private interface for low-level routines for the map-based version + * of the mandoc database, for read-only access. + * To be used by dbm*.c only. 
+ */ + +struct dbm_match; + +int dbm_map(const char *); +void dbm_unmap(void); +void *dbm_get(int32_t); +int32_t *dbm_getint(int32_t); +int32_t dbm_addr(const void *); +int dbm_match(const struct dbm_match *, const char *); Index: main.c =================================================================== RCS file: /cvs/src/usr.bin/mandoc/main.c,v retrieving revision 1.172 diff -u -p -r1.172 main.c --- main.c 14 Apr 2016 20:54:15 -0000 1.172 +++ main.c 1 Jul 2016 01:57:58 -0000 @@ -315,9 +321,6 @@ main(int argc, char *argv[]) /* man(1), whatis(1), apropos(1) */ if (search.argmode != ARG_FILE) { - if (argc == 0) - usage(search.argmode); - if (search.argmode == ARG_NAME && outmode == OUTMODE_ONE) search.firstmatch = 1; @@ -325,7 +328,6 @@ main(int argc, char *argv[]) /* Access the mandoc database. */ manconf_parse(&conf, conf_file, defpaths, auxpaths); - mansearch_setup(1); if ( ! mansearch(&search, &conf.manpath, argc, argv, &res, &sz)) usage(search.argmode); @@ -429,7 +431,7 @@ main(int argc, char *argv[]) if (resp == NULL) parse(&curp, fd, *argv); - else if (resp->form & FORM_SRC) { + else if (resp->form == FORM_SRC) { /* For .so only; ignore failure. 
*/ chdir(conf.manpath.paths[resp->ipath]); parse(&curp, fd, resp->file); @@ -478,7 +480,6 @@ out: if (search.argmode != ARG_FILE) { manconf_free(&conf); mansearch_free(res, sz); - mansearch_setup(0); } free(defos); @@ -582,7 +583,8 @@ fs_lookup(const struct manpaths *paths, glob_t globinfo; struct manpage *page; char *file; - int form, globres; + int globres; + enum form form; form = FORM_SRC; mandoc_asprintf(&file, "%s/man%s/%s.%s", Index: mandocdb.c =================================================================== RCS file: /cvs/src/usr.bin/mandoc/mandocdb.c,v retrieving revision 1.169 diff -u -p -r1.169 mandocdb.c --- mandocdb.c 15 Mar 2016 20:50:23 -0000 1.169 +++ mandocdb.c 1 Jul 2016 01:57:58 -0000 @@ -27,6 +27,7 @@ #include <fts.h> #include <getopt.h> #include <limits.h> +#include <stdarg.h> #include <stddef.h> #include <stdio.h> #include <stdint.h> @@ -34,8 +35,6 @@ #include <string.h> #include <unistd.h> -#include <sqlite3.h> - #include "mandoc_aux.h" #include "mandoc_ohash.h" #include "mandoc.h" @@ -44,29 +43,11 @@ #include "man.h" #include "manconf.h" #include "mansearch.h" +#include "dba_array.h" +#include "dba.h" -extern int mansearch_keymax; extern const char *const mansearch_keynames[]; -#define SQL_EXEC(_v) \ - if (SQLITE_OK != sqlite3_exec(db, (_v), NULL, NULL, NULL)) \ - say("", "%s: %s", (_v), sqlite3_errmsg(db)) -#define SQL_BIND_TEXT(_s, _i, _v) \ - if (SQLITE_OK != sqlite3_bind_text \ - ((_s), (_i)++, (_v), -1, SQLITE_STATIC)) \ - say(mlink->file, "%s", sqlite3_errmsg(db)) -#define SQL_BIND_INT(_s, _i, _v) \ - if (SQLITE_OK != sqlite3_bind_int \ - ((_s), (_i)++, (_v))) \ - say(mlink->file, "%s", sqlite3_errmsg(db)) -#define SQL_BIND_INT64(_s, _i, _v) \ - if (SQLITE_OK != sqlite3_bind_int64 \ - ((_s), (_i)++, (_v))) \ - say(mlink->file, "%s", sqlite3_errmsg(db)) -#define SQL_STEP(_s) \ - if (SQLITE_DONE != sqlite3_step((_s))) \ - say(mlink->file, "%s", sqlite3_errmsg(db)) - enum op { OP_DEFAULT = 0, /* new dbs from dir list or default config 
*/ OP_CONFFILE, /* new databases from custom config file */ @@ -88,14 +69,14 @@ struct inodev { struct mpage { struct inodev inodev; /* used for hashing routine */ - int64_t pageid; /* pageid in mpages SQL table */ + struct dba_array *dba; char *sec; /* section from file content */ char *arch; /* architecture from file content */ char *title; /* title from file content */ char *desc; /* description from file content */ struct mlink *mlinks; /* singly linked list */ - int form; /* format from file content */ int name_head_done; + enum form form; /* format from file content */ }; struct mlink { @@ -106,19 +87,9 @@ struct mlink { char *fsec; /* section from file name suffix */ struct mlink *next; /* singly linked list */ struct mpage *mpage; /* parent */ - int dform; /* format from directory */ - int fform; /* format from file name suffix */ int gzip; /* filename has a .gz suffix */ -}; - -enum stmt { - STMT_DELETE_PAGE = 0, /* delete mpage */ - STMT_INSERT_PAGE, /* insert mpage */ - STMT_INSERT_LINK, /* insert mlink */ - STMT_INSERT_NAME, /* insert name */ - STMT_SELECT_NAME, /* retrieve existing name flags */ - STMT_INSERT_KEY, /* insert parsed key */ - STMT__MAX + enum form dform; /* format from directory */ + enum form fform; /* format from file name suffix */ }; typedef int (*mdoc_fp)(struct mpage *, const struct roff_meta *, @@ -129,20 +100,17 @@ struct mdoc_handler { uint64_t mask; /* set unless handler returns 0 */ }; -static void dbclose(int); -static void dbadd(struct mpage *); +static void dbadd(struct dba *, struct mpage *); static void dbadd_mlink(const struct mlink *mlink); -static void dbadd_mlink_name(const struct mlink *mlink); -static int dbopen(int); -static void dbprune(void); +static void dbprune(struct dba *); +static void dbwrite(struct dba *); static void filescan(const char *); static void mlink_add(struct mlink *, const struct stat *); static void mlink_check(struct mpage *, struct mlink *); static void mlink_free(struct mlink *); static void 
mlinks_undupe(struct mpage *); static void mpages_free(void); -static void mpages_merge(struct mparse *); -static void names_check(void); +static void mpages_merge(struct dba *, struct mparse *); static void parse_cat(struct mpage *, int); static void parse_man(struct mpage *, const struct roff_meta *, const struct roff_node *); @@ -177,7 +145,6 @@ static int set_basedir(const char *, in static int treescan(void); static size_t utf8(unsigned int, char [7]); -static char tempfilename[32]; static int nodb; /* no database changes */ static int mparse_options; /* abort the parse early */ static int use_all; /* use all found files */ @@ -191,8 +158,6 @@ static struct ohash mpages; /* table of static struct ohash mlinks; /* table of directory entries */ static struct ohash names; /* table of all names */ static struct ohash strings; /* table of all strings */ -static sqlite3 *db = NULL; /* current database */ -static sqlite3_stmt *stmts[STMT__MAX]; /* current statements */ static uint64_t name_mask; static const struct mdoc_handler mdocs[MDOC_MAX] = { @@ -327,6 +292,7 @@ mandocdb(int argc, char *argv[]) { struct manconf conf; struct mparse *mp; + struct dba *dba; const char *path_arg, *progname; size_t j, sz; int ch, i; @@ -337,7 +303,6 @@ mandocdb(int argc, char *argv[]) } memset(&conf, 0, sizeof(conf)); - memset(stmts, 0, STMT__MAX * sizeof(sqlite3_stmt *)); /* * We accept a few different invocations. @@ -436,7 +401,7 @@ mandocdb(int argc, char *argv[]) if (OP_TEST != op && 0 == set_basedir(path_arg, 1)) goto out; - if (dbopen(1)) { + if ((dba = dba_read(MANDOC_DB)) != NULL) { /* * The existing database is usable. Process * all files specified on the command-line. @@ -452,7 +417,7 @@ mandocdb(int argc, char *argv[]) for (i = 0; i < argc; i++) filescan(argv[i]); if (OP_TEST != op) - dbprune(); + dbprune(dba); } else { /* * Database missing or corrupt. 
@@ -462,12 +427,13 @@ mandocdb(int argc, char *argv[]) op = OP_DEFAULT; if (0 == treescan()) goto out; - if (0 == dbopen(0)) - goto out; + dba = dba_new(128); } if (OP_DELETE != op) - mpages_merge(mp); - dbclose(OP_DEFAULT == op ? 0 : 1); + mpages_merge(dba, mp); + if (nodb == 0) + dbwrite(dba); + dba_free(dba); } else { /* * If we have arguments, use them as our manpaths. @@ -512,14 +478,11 @@ mandocdb(int argc, char *argv[]) continue; if (0 == treescan()) continue; - if (0 == dbopen(0)) - continue; - - mpages_merge(mp); - if (warnings && !nodb && - ! (MPARSE_QUICK & mparse_options)) - names_check(); - dbclose(0); + dba = dba_new(128); + mpages_merge(dba, mp); + if (nodb == 0) + dbwrite(dba); + dba_free(dba); if (j + 1 < conf.manpath.sz) { mpages_free(); @@ -569,7 +532,8 @@ treescan(void) FTS *f; FTSENT *ff; struct mlink *mlink; - int dform, gzip; + int gzip; + enum form dform; char *dsec, *arch, *fsec, *cp; const char *path; const char *argv[2]; @@ -935,6 +899,7 @@ mlink_add(struct mlink *mlink, const str mpage = mandoc_calloc(1, sizeof(struct mpage)); mpage->inodev.st_ino = inodev.st_ino; mpage->inodev.st_dev = inodev.st_dev; + mpage->form = FORM_NONE; ohash_insert(&mpages, slot, mpage); } else mlink->next = mpage->mlinks; @@ -1083,7 +1048,7 @@ mlink_check(struct mpage *mpage, struct * and filename to determine whether the file is parsable or not. */ static void -mpages_merge(struct mparse *mp) +mpages_merge(struct dba *dba, struct mparse *mp) { char any[] = "any"; struct mpage *mpage, *mpage_dest; @@ -1094,9 +1059,6 @@ mpages_merge(struct mparse *mp) int fd; unsigned int pslot; - if ( ! nodb) - SQL_EXEC("BEGIN TRANSACTION"); - mpage = ohash_first(&mpages, &pslot); while (mpage != NULL) { mlinks_undupe(mpage); @@ -1153,8 +1115,8 @@ mpages_merge(struct mparse *mp) * to the target. 
*/ - if (mpage_dest->pageid) - dbadd_mlink_name(mlink); + if (mpage_dest->dba != NULL) + dbadd_mlink(mlink); if (mlink->next == NULL) break; @@ -1219,7 +1181,7 @@ mpages_merge(struct mparse *mp) mlink = mlink->next) mlink_check(mpage, mlink); - dbadd(mpage); + dbadd(dba, mpage); mlink = mpage->mlinks; nextpage: @@ -1227,44 +1189,6 @@ nextpage: ohash_delete(&names); mpage = ohash_next(&mpages, &pslot); } - - if (0 == nodb) - SQL_EXEC("END TRANSACTION"); -} - -static void -names_check(void) -{ - sqlite3_stmt *stmt; - const char *name, *sec, *arch, *key; - - sqlite3_prepare_v2(db, - "SELECT name, sec, arch, key FROM (" - "SELECT name AS key, pageid FROM names " - "WHERE bits & ? AND NOT EXISTS (" - "SELECT pageid FROM mlinks " - "WHERE mlinks.pageid == names.pageid " - "AND mlinks.name == names.name" - ")" - ") JOIN (" - "SELECT sec, arch, name, pageid FROM mlinks " - "GROUP BY pageid" - ") USING (pageid);", - -1, &stmt, NULL); - - if (sqlite3_bind_int64(stmt, 1, NAME_TITLE) != SQLITE_OK) - say("", "%s", sqlite3_errmsg(db)); - - while (sqlite3_step(stmt) == SQLITE_ROW) { - name = (const char *)sqlite3_column_text(stmt, 0); - sec = (const char *)sqlite3_column_text(stmt, 1); - arch = (const char *)sqlite3_column_text(stmt, 2); - key = (const char *)sqlite3_column_text(stmt, 3); - say("", "%s(%s%s%s) lacks mlink \"%s\"", name, sec, - '\0' == *arch ? "" : "/", - '\0' == *arch ? 
"" : arch, key); - } - sqlite3_finalize(stmt); } static void @@ -1788,7 +1712,7 @@ putkeys(const struct mpage *mpage, char } else { htab = &strings; if (debug > 1) - for (i = 0; i < mansearch_keymax; i++) + for (i = 0; i < KEY_MAX; i++) if ((uint64_t)1 << i & v) say(mpage->mlinks->file, "Adding key %s=%*s", @@ -1994,53 +1918,23 @@ render_string(char **public, size_t *psz static void dbadd_mlink(const struct mlink *mlink) { - size_t i; - - i = 1; - SQL_BIND_TEXT(stmts[STMT_INSERT_LINK], i, mlink->dsec); - SQL_BIND_TEXT(stmts[STMT_INSERT_LINK], i, mlink->arch); - SQL_BIND_TEXT(stmts[STMT_INSERT_LINK], i, mlink->name); - SQL_BIND_INT64(stmts[STMT_INSERT_LINK], i, mlink->mpage->pageid); - SQL_STEP(stmts[STMT_INSERT_LINK]); - sqlite3_reset(stmts[STMT_INSERT_LINK]); -} - -static void -dbadd_mlink_name(const struct mlink *mlink) -{ - uint64_t bits; - size_t i; - - dbadd_mlink(mlink); - - i = 1; - SQL_BIND_INT64(stmts[STMT_SELECT_NAME], i, mlink->mpage->pageid); - bits = NAME_FILE & NAME_MASK; - if (sqlite3_step(stmts[STMT_SELECT_NAME]) == SQLITE_ROW) { - bits |= sqlite3_column_int64(stmts[STMT_SELECT_NAME], 0); - sqlite3_reset(stmts[STMT_SELECT_NAME]); - } - - i = 1; - SQL_BIND_INT64(stmts[STMT_INSERT_NAME], i, bits); - SQL_BIND_TEXT(stmts[STMT_INSERT_NAME], i, mlink->name); - SQL_BIND_INT64(stmts[STMT_INSERT_NAME], i, mlink->mpage->pageid); - SQL_STEP(stmts[STMT_INSERT_NAME]); - sqlite3_reset(stmts[STMT_INSERT_NAME]); + dba_page_alias(mlink->mpage->dba, mlink->name, NAME_FILE); + dba_page_add(mlink->mpage->dba, DBP_SECT, mlink->dsec); + dba_page_add(mlink->mpage->dba, DBP_ARCH, mlink->arch); + dba_page_add(mlink->mpage->dba, DBP_FILE, mlink->file); } /* * Flush the current page's terms (and their bits) into the database. - * Wrap the entire set of additions in a transaction to make sqlite be a - * little faster. * Also, handle escape sequences at the last possible moment. 
*/ static void -dbadd(struct mpage *mpage) +dbadd(struct dba *dba, struct mpage *mpage) { struct mlink *mlink; struct str *key; char *cp; + uint64_t mask; size_t i; unsigned int slot; int mustfree; @@ -2085,111 +1979,87 @@ dbadd(struct mpage *mpage) cp = mpage->desc; i = strlen(cp); mustfree = render_string(&cp, &i); - i = 1; - SQL_BIND_TEXT(stmts[STMT_INSERT_PAGE], i, cp); - SQL_BIND_INT(stmts[STMT_INSERT_PAGE], i, mpage->form); - SQL_STEP(stmts[STMT_INSERT_PAGE]); - mpage->pageid = sqlite3_last_insert_rowid(db); - sqlite3_reset(stmts[STMT_INSERT_PAGE]); + mpage->dba = dba_page_new(dba->pages, mlink->name, + mlink->dsec, mlink->arch, cp, mlink->file, mpage->form); if (mustfree) free(cp); - while (NULL != mlink) { + while ((mlink = mlink->next) != NULL) dbadd_mlink(mlink); - mlink = mlink->next; - } - mlink = mpage->mlinks; for (key = ohash_first(&names, &slot); NULL != key; key = ohash_next(&names, &slot)) { assert(key->mpage == mpage); - i = 1; - SQL_BIND_INT64(stmts[STMT_INSERT_NAME], i, key->mask); - SQL_BIND_TEXT(stmts[STMT_INSERT_NAME], i, key->key); - SQL_BIND_INT64(stmts[STMT_INSERT_NAME], i, mpage->pageid); - SQL_STEP(stmts[STMT_INSERT_NAME]); - sqlite3_reset(stmts[STMT_INSERT_NAME]); + dba_page_alias(mpage->dba, key->key, key->mask); free(key); } for (key = ohash_first(&strings, &slot); NULL != key; key = ohash_next(&strings, &slot)) { assert(key->mpage == mpage); - i = 1; - SQL_BIND_INT64(stmts[STMT_INSERT_KEY], i, key->mask); - SQL_BIND_TEXT(stmts[STMT_INSERT_KEY], i, key->key); - SQL_BIND_INT64(stmts[STMT_INSERT_KEY], i, mpage->pageid); - SQL_STEP(stmts[STMT_INSERT_KEY]); - sqlite3_reset(stmts[STMT_INSERT_KEY]); + i = 0; + for (mask = TYPE_Xr; mask <= TYPE_Lb; mask *= 2) { + if (key->mask & mask) + dba_macro_add(dba->macros, i, + key->key, mpage->dba); + i++; + } free(key); } } static void -dbprune(void) +dbprune(struct dba *dba) { - struct mpage *mpage; - struct mlink *mlink; - size_t i; - unsigned int slot; - - if (0 == nodb) - SQL_EXEC("BEGIN 
TRANSACTION"); + struct dba_array *page, *files; + char *file; - for (mpage = ohash_first(&mpages, &slot); NULL != mpage; - mpage = ohash_next(&mpages, &slot)) { - mlink = mpage->mlinks; - if (debug) - say(mlink->file, "Deleting from database"); - if (nodb) - continue; - for ( ; NULL != mlink; mlink = mlink->next) { - i = 1; - SQL_BIND_TEXT(stmts[STMT_DELETE_PAGE], - i, mlink->dsec); - SQL_BIND_TEXT(stmts[STMT_DELETE_PAGE], - i, mlink->arch); - SQL_BIND_TEXT(stmts[STMT_DELETE_PAGE], - i, mlink->name); - SQL_STEP(stmts[STMT_DELETE_PAGE]); - sqlite3_reset(stmts[STMT_DELETE_PAGE]); + dba_array_FOREACH(dba->pages, page) { + files = dba_array_get(page, DBP_FILE); + dba_array_FOREACH(files, file) { + if (*file < ' ') + file++; + if (ohash_find(&mlinks, ohash_qlookup(&mlinks, + file)) != NULL) { + if (debug) + say(file, "Deleting from database"); + dba_array_del(dba->pages); + break; + } } } - - if (0 == nodb) - SQL_EXEC("END TRANSACTION"); } /* - * Close an existing database and its prepared statements. - * If "real" is not set, rename the temporary file into the real one. + * Write the database from memory to disk. 
*/ static void -dbclose(int real) +dbwrite(struct dba *dba) { - size_t i; + char tfn[32]; int status; pid_t child; - if (nodb) + if (dba_write(MANDOC_DB "~", dba) != -1) { + if (rename(MANDOC_DB "~", MANDOC_DB) == -1) { + exitcode = (int)MANDOCLEVEL_SYSERR; + say(MANDOC_DB, "&rename"); + unlink(MANDOC_DB "~"); + } return; - - for (i = 0; i < STMT__MAX; i++) { - sqlite3_finalize(stmts[i]); - stmts[i] = NULL; } - sqlite3_close(db); - db = NULL; - - if (real) + (void)strlcpy(tfn, "/tmp/mandocdb.XXXXXXXX", sizeof(tfn)); + if (mkdtemp(tfn) == NULL) { + exitcode = (int)MANDOCLEVEL_SYSERR; + say("", "&%s", tfn); return; + } - if ('\0' == *tempfilename) { - if (-1 == rename(MANDOC_DB "~", MANDOC_DB)) { - exitcode = (int)MANDOCLEVEL_SYSERR; - say(MANDOC_DB, "&rename"); - } - return; + (void)strlcat(tfn, "/" MANDOC_DB, sizeof(tfn)); + if (dba_write(tfn, dba) == -1) { + exitcode = (int)MANDOCLEVEL_SYSERR; + say(tfn, "&dba_write"); + goto out; } switch (child = fork()) { @@ -2198,14 +2068,13 @@ dbclose(int real) say("", "&fork cmp"); return; case 0: - execlp("cmp", "cmp", "-s", - tempfilename, MANDOC_DB, (char *)NULL); + execlp("cmp", "cmp", "-s", tfn, MANDOC_DB, (char *)NULL); say("", "&exec cmp"); exit(0); default: break; } - if (-1 == waitpid(child, &status, 0)) { + if (waitpid(child, &status, 0) == -1) { exitcode = (int)MANDOCLEVEL_SYSERR; say("", "&wait cmp"); } else if (WIFSIGNALED(status)) { @@ -2217,171 +2086,27 @@ dbclose(int real) "Data changed, but cannot replace database"); } - *strrchr(tempfilename, '/') = '\0'; +out: + *strrchr(tfn, '/') = '\0'; switch (child = fork()) { case -1: exitcode = (int)MANDOCLEVEL_SYSERR; say("", "&fork rm"); return; case 0: - execlp("rm", "rm", "-rf", tempfilename, (char *)NULL); + execlp("rm", "rm", "-rf", tfn, (char *)NULL); say("", "&exec rm"); exit((int)MANDOCLEVEL_SYSERR); default: break; } - if (-1 == waitpid(child, &status, 0)) { + if (waitpid(child, &status, 0) == -1) { exitcode = (int)MANDOCLEVEL_SYSERR; say("", "&wait rm"); } 
else if (WIFSIGNALED(status) || WEXITSTATUS(status)) { exitcode = (int)MANDOCLEVEL_SYSERR; - say("", "%s: Cannot remove temporary directory", - tempfilename); - } -} - -/* - * This is straightforward stuff. - * Open a database connection to a "temporary" database, then open a set - * of prepared statements we'll use over and over again. - * If "real" is set, we use the existing database; if not, we truncate a - * temporary one. - * Must be matched by dbclose(). - */ -static int -dbopen(int real) -{ - const char *sql; - int rc, ofl; - - if (nodb) - return 1; - - *tempfilename = '\0'; - ofl = SQLITE_OPEN_READWRITE; - - if (real) { - rc = sqlite3_open_v2(MANDOC_DB, &db, ofl, NULL); - if (SQLITE_OK != rc) { - exitcode = (int)MANDOCLEVEL_SYSERR; - if (SQLITE_CANTOPEN != rc) - say(MANDOC_DB, "%s", sqlite3_errstr(rc)); - return 0; - } - goto prepare_statements; - } - - ofl |= SQLITE_OPEN_CREATE | SQLITE_OPEN_EXCLUSIVE; - - remove(MANDOC_DB "~"); - rc = sqlite3_open_v2(MANDOC_DB "~", &db, ofl, NULL); - if (SQLITE_OK == rc) - goto create_tables; - if (MPARSE_QUICK & mparse_options) { - exitcode = (int)MANDOCLEVEL_SYSERR; - say(MANDOC_DB "~", "%s", sqlite3_errstr(rc)); - return 0; + say("", "%s: Cannot remove temporary directory", tfn); } - - (void)strlcpy(tempfilename, "/tmp/mandocdb.XXXXXX", - sizeof(tempfilename)); - if (NULL == mkdtemp(tempfilename)) { - exitcode = (int)MANDOCLEVEL_SYSERR; - say("", "&%s", tempfilename); - return 0; - } - (void)strlcat(tempfilename, "/" MANDOC_DB, - sizeof(tempfilename)); - rc = sqlite3_open_v2(tempfilename, &db, ofl, NULL); - if (SQLITE_OK != rc) { - exitcode = (int)MANDOCLEVEL_SYSERR; - say("", "%s: %s", tempfilename, sqlite3_errstr(rc)); - return 0; - } - -create_tables: - sql = "CREATE TABLE \"mpages\" (\n" - " \"desc\" TEXT NOT NULL,\n" - " \"form\" INTEGER NOT NULL,\n" - " \"pageid\" INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL\n" - ");\n" - "\n" - "CREATE TABLE \"mlinks\" (\n" - " \"sec\" TEXT NOT NULL,\n" - " \"arch\" TEXT NOT 
NULL,\n" - " \"name\" TEXT NOT NULL,\n" - " \"pageid\" INTEGER NOT NULL REFERENCES mpages(pageid) " - "ON DELETE CASCADE\n" - ");\n" - "CREATE INDEX mlinks_pageid_idx ON mlinks (pageid);\n" - "\n" - "CREATE TABLE \"names\" (\n" - " \"bits\" INTEGER NOT NULL,\n" - " \"name\" TEXT NOT NULL,\n" - " \"pageid\" INTEGER NOT NULL REFERENCES mpages(pageid) " - "ON DELETE CASCADE,\n" - " UNIQUE (\"name\", \"pageid\") ON CONFLICT REPLACE\n" - ");\n" - "\n" - "CREATE TABLE \"keys\" (\n" - " \"bits\" INTEGER NOT NULL,\n" - " \"key\" TEXT NOT NULL,\n" - " \"pageid\" INTEGER NOT NULL REFERENCES mpages(pageid) " - "ON DELETE CASCADE\n" - ");\n" - "CREATE INDEX keys_pageid_idx ON keys (pageid);\n"; - - if (SQLITE_OK != sqlite3_exec(db, sql, NULL, NULL, NULL)) { - exitcode = (int)MANDOCLEVEL_SYSERR; - say(MANDOC_DB, "%s", sqlite3_errmsg(db)); - sqlite3_close(db); - return 0; - } - -prepare_statements: - if (SQLITE_OK != sqlite3_exec(db, - "PRAGMA foreign_keys = ON", NULL, NULL, NULL)) { - exitcode = (int)MANDOCLEVEL_SYSERR; - say(MANDOC_DB, "PRAGMA foreign_keys: %s", - sqlite3_errmsg(db)); - sqlite3_close(db); - return 0; - } - - sql = "DELETE FROM mpages WHERE pageid IN " - "(SELECT pageid FROM mlinks WHERE " - "sec=? AND arch=? 
AND name=?)"; - sqlite3_prepare_v2(db, sql, -1, &stmts[STMT_DELETE_PAGE], NULL); - sql = "INSERT INTO mpages " - "(desc,form) VALUES (?,?)"; - sqlite3_prepare_v2(db, sql, -1, &stmts[STMT_INSERT_PAGE], NULL); - sql = "INSERT INTO mlinks " - "(sec,arch,name,pageid) VALUES (?,?,?,?)"; - sqlite3_prepare_v2(db, sql, -1, &stmts[STMT_INSERT_LINK], NULL); - sql = "SELECT bits FROM names where pageid = ?"; - sqlite3_prepare_v2(db, sql, -1, &stmts[STMT_SELECT_NAME], NULL); - sql = "INSERT INTO names " - "(bits,name,pageid) VALUES (?,?,?)"; - sqlite3_prepare_v2(db, sql, -1, &stmts[STMT_INSERT_NAME], NULL); - sql = "INSERT INTO keys " - "(bits,key,pageid) VALUES (?,?,?)"; - sqlite3_prepare_v2(db, sql, -1, &stmts[STMT_INSERT_KEY], NULL); - - /* - * When opening a new database, we can turn off - * synchronous mode for much better performance. - */ - - if (real && SQLITE_OK != sqlite3_exec(db, - "PRAGMA synchronous = OFF", NULL, NULL, NULL)) { - exitcode = (int)MANDOCLEVEL_SYSERR; - say(MANDOC_DB, "PRAGMA synchronous: %s", - sqlite3_errmsg(db)); - sqlite3_close(db); - return 0; - } - - return 1; } static int Index: mansearch.c =================================================================== RCS file: /cvs/src/usr.bin/mandoc/mansearch.c,v retrieving revision 1.49 diff -u -p -r1.49 mansearch.c --- mansearch.c 8 Jan 2016 15:01:58 -0000 1.49 +++ mansearch.c 1 Jul 2016 01:57:58 -0000 @@ -34,118 +34,52 @@ #include <string.h> #include <unistd.h> -#include <sqlite3.h> - #include "mandoc.h" #include "mandoc_aux.h" #include "mandoc_ohash.h" #include "manconf.h" #include "mansearch.h" - -extern int mansearch_keymax; -extern const char *const mansearch_keynames[]; - -#define SQL_BIND_TEXT(_db, _s, _i, _v) \ - do { if (SQLITE_OK != sqlite3_bind_text \ - ((_s), (_i)++, (_v), -1, SQLITE_STATIC)) \ - errx((int)MANDOCLEVEL_SYSERR, "%s", sqlite3_errmsg((_db))); \ - } while (0) -#define SQL_BIND_INT64(_db, _s, _i, _v) \ - do { if (SQLITE_OK != sqlite3_bind_int64 \ - ((_s), (_i)++, (_v))) \ - 
errx((int)MANDOCLEVEL_SYSERR, "%s", sqlite3_errmsg((_db))); \ - } while (0) -#define SQL_BIND_BLOB(_db, _s, _i, _v) \ - do { if (SQLITE_OK != sqlite3_bind_blob \ - ((_s), (_i)++, (&_v), sizeof(_v), SQLITE_STATIC)) \ - errx((int)MANDOCLEVEL_SYSERR, "%s", sqlite3_errmsg((_db))); \ - } while (0) +#include "dbm.h" struct expr { - regex_t regexp; /* compiled regexp, if applicable */ - const char *substr; /* to search for, if applicable */ - struct expr *next; /* next in sequence */ - uint64_t bits; /* type-mask */ - int equal; /* equality, not subsring match */ - int open; /* opening parentheses before */ - int and; /* logical AND before */ - int close; /* closing parentheses after */ + /* Used for terms: */ + struct dbm_match match; /* Match type and expression. */ + uint64_t bits; /* Type mask. */ + /* Used for OR and AND groups: */ + struct expr *next; /* Next child in the parent group. */ + struct expr *child; /* First child in this group. */ + enum { EXPR_TERM, EXPR_OR, EXPR_AND } type; }; -struct match { - uint64_t pageid; /* identifier in database */ - uint64_t bits; /* name type mask */ - char *desc; /* manual page description */ - int form; /* bit field: formatted, zipped? 
*/ +const char *const mansearch_keynames[KEY_MAX] = { + "arch", "sec", "Xr", "Ar", "Fa", "Fl", "Dv", "Fn", + "Ic", "Pa", "Cm", "Li", "Em", "Cd", "Va", "Ft", + "Tn", "Er", "Ev", "Sy", "Sh", "In", "Ss", "Ox", + "An", "Mt", "St", "Bx", "At", "Nx", "Fx", "Lk", + "Ms", "Bsx", "Dx", "Rs", "Vt", "Lb", "Nm", "Nd" }; -static void buildnames(const struct mansearch *, - struct manpage *, sqlite3 *, - sqlite3_stmt *, uint64_t, - const char *, int form); -static char *buildoutput(sqlite3 *, sqlite3_stmt *, - uint64_t, uint64_t); +void dbm_dump(void); + +static struct ohash *manmerge(struct expr *, struct ohash *); +static struct ohash *manmerge_term(struct expr *, struct ohash *); +static struct ohash *manmerge_or(struct expr *, struct ohash *); +static struct ohash *manmerge_and(struct expr *, struct ohash *); +static char *buildnames(const struct dbm_page *); +static char *buildoutput(size_t, int32_t); +static size_t lstlen(const char *); +static void lstcat(char *, size_t *, const char *); +static int lstmatch(const char *, const char *); static struct expr *exprcomp(const struct mansearch *, - int, char *[]); + int, char *[], int *); +static struct expr *expr_and(const struct mansearch *, + int, char *[], int *); +static struct expr *exprterm(const struct mansearch *, + int, char *[], int *); +static void exprdump(struct expr *, int); static void exprfree(struct expr *); -static struct expr *exprterm(const struct mansearch *, char *, int); static int manpage_compare(const void *, const void *); -static void sql_append(char **sql, size_t *sz, - const char *newstr, int count); -static void sql_match(sqlite3_context *context, - int argc, sqlite3_value **argv); -static void sql_regexp(sqlite3_context *context, - int argc, sqlite3_value **argv); -static char *sql_statement(const struct expr *); - - -int -mansearch_setup(int start) -{ - static void *pagecache; - int c; - -#define PC_PAGESIZE 1280 -#define PC_NUMPAGES 256 - - if (start) { - if (NULL != pagecache) { - 
warnx("pagecache already enabled"); - return (int)MANDOCLEVEL_BADARG; - } - - pagecache = mmap(NULL, PC_PAGESIZE * PC_NUMPAGES, - PROT_READ | PROT_WRITE, - MAP_SHARED | MAP_ANON, -1, 0); - if (MAP_FAILED == pagecache) { - warn("mmap"); - pagecache = NULL; - return (int)MANDOCLEVEL_SYSERR; - } - - c = sqlite3_config(SQLITE_CONFIG_PAGECACHE, - pagecache, PC_PAGESIZE, PC_NUMPAGES); - - if (SQLITE_OK == c) - return (int)MANDOCLEVEL_OK; - - warnx("pagecache: %s", sqlite3_errstr(c)); - - } else if (NULL == pagecache) { - warnx("pagecache missing"); - return (int)MANDOCLEVEL_BADARG; - } - - if (-1 == munmap(pagecache, PC_PAGESIZE * PC_NUMPAGES)) { - warn("munmap"); - pagecache = NULL; - return (int)MANDOCLEVEL_SYSERR; - } - - pagecache = NULL; - return (int)MANDOCLEVEL_OK; -} int mansearch(const struct mansearch *search, @@ -153,41 +87,50 @@ mansearch(const struct mansearch *search int argc, char *argv[], struct manpage **res, size_t *sz) { - int64_t pageid; - uint64_t outbit, iterbit; char buf[PATH_MAX]; - char *sql; + struct dbm_res *rp; + struct expr *e; + struct dbm_page *page; struct manpage *mpage; - struct expr *e, *ep; - sqlite3 *db; - sqlite3_stmt *s, *s2; - struct match *mp; - struct ohash htab; - unsigned int idx; - size_t i, j, cur, maxres; - int c, chdir_status, getcwd_status, indexbit; + struct ohash *htab; + size_t cur, i, maxres, outkey; + unsigned int slot; + int argi, chdir_status, getcwd_status, im; + + if (argc == 0) { + if (chdir(paths->paths[0]) == -1) + warn("%s", paths->paths[0]); + else if (dbm_open(MANDOC_DB) == -1) + warn("%s/%s", paths->paths[0], MANDOC_DB); + else { + dbm_dump(); + dbm_close(); + } + *sz = 0; + return 0; + } - if (argc == 0 || (e = exprcomp(search, argc, argv)) == NULL) { + argi = 0; + if (argc == 0 || (e = exprcomp(search, argc, argv, &argi)) == NULL) { *sz = 0; return 0; } + if (argi < argc) + warnx("ignoring unmatched right parenthesis%s", + argi + 1 < argc ? 
" and all following arguments" : ""); + exprdump(e, 0); cur = maxres = 0; *res = NULL; - if (NULL != search->outkey) { - outbit = TYPE_Nd; - for (indexbit = 0, iterbit = 1; - indexbit < mansearch_keymax; - indexbit++, iterbit <<= 1) { + outkey = KEY_Nd; + if (search->outkey != NULL) + for (im = 0; im < KEY_MAX; im++) if (0 == strcasecmp(search->outkey, - mansearch_keynames[indexbit])) { - outbit = iterbit; + mansearch_keynames[im])) { + outkey = im; break; } - } - } else - outbit = 0; /* * Remember the original working directory, if possible. @@ -203,8 +146,6 @@ mansearch(const struct mansearch *search } else getcwd_status = 1; - sql = sql_statement(e); - /* * Loop over the directories (containing databases) for us to * search. @@ -230,123 +171,48 @@ mansearch(const struct mansearch *search } chdir_status = 1; - c = sqlite3_open_v2(MANDOC_DB, &db, - SQLITE_OPEN_READONLY, NULL); - - if (SQLITE_OK != c) { + if (dbm_open(MANDOC_DB) == -1) { warn("%s/%s", paths->paths[i], MANDOC_DB); - sqlite3_close(db); continue; } - /* - * Define the SQL functions for substring - * and regular expression matching. - */ - - c = sqlite3_create_function(db, "match", 2, - SQLITE_UTF8 | SQLITE_DETERMINISTIC, - NULL, sql_match, NULL, NULL); - assert(SQLITE_OK == c); - c = sqlite3_create_function(db, "regexp", 2, - SQLITE_UTF8 | SQLITE_DETERMINISTIC, - NULL, sql_regexp, NULL, NULL); - assert(SQLITE_OK == c); - - j = 1; - c = sqlite3_prepare_v2(db, sql, -1, &s, NULL); - if (SQLITE_OK != c) - errx((int)MANDOCLEVEL_SYSERR, - "%s", sqlite3_errmsg(db)); - - for (ep = e; NULL != ep; ep = ep->next) { - if (NULL == ep->substr) { - SQL_BIND_BLOB(db, s, j, ep->regexp); - } else - SQL_BIND_TEXT(db, s, j, ep->substr); - if (0 == ((TYPE_Nd | TYPE_Nm) & ep->bits)) - SQL_BIND_INT64(db, s, j, ep->bits); + if ((htab = manmerge(e, NULL)) == NULL) { + dbm_close(); + continue; } - mandoc_ohash_init(&htab, 4, offsetof(struct match, pageid)); - - /* - * Hash each entry on its [unique] document identifier. 
- * This is a uint64_t. - * Instead of using a hash function, simply convert the - * uint64_t to a uint32_t, the hash value's type. - * This gives good performance and preserves the - * distribution of buckets in the table. - */ - while (SQLITE_ROW == (c = sqlite3_step(s))) { - pageid = sqlite3_column_int64(s, 2); - idx = ohash_lookup_memory(&htab, - (char *)&pageid, sizeof(uint64_t), - (uint32_t)pageid); + for (rp = ohash_first(htab, &slot); rp != NULL; + rp = ohash_next(htab, &slot)) { + page = dbm_page_get(rp->page); - if (NULL != ohash_find(&htab, idx)) + if (lstmatch(search->sec, page->sect) == 0 || + lstmatch(search->arch, page->arch) == 0) continue; - mp = mandoc_calloc(1, sizeof(struct match)); - mp->pageid = pageid; - mp->form = sqlite3_column_int(s, 1); - mp->bits = sqlite3_column_int64(s, 3); - if (TYPE_Nd == outbit) - mp->desc = mandoc_strdup((const char *) - sqlite3_column_text(s, 0)); - ohash_insert(&htab, idx, mp); - } - - if (SQLITE_DONE != c) - warnx("%s", sqlite3_errmsg(db)); - - sqlite3_finalize(s); - - c = sqlite3_prepare_v2(db, - "SELECT sec, arch, name, pageid FROM mlinks " - "WHERE pageid=? ORDER BY sec, arch, name", - -1, &s, NULL); - if (SQLITE_OK != c) - errx((int)MANDOCLEVEL_SYSERR, - "%s", sqlite3_errmsg(db)); - - c = sqlite3_prepare_v2(db, - "SELECT bits, key, pageid FROM keys " - "WHERE pageid=? AND bits & ?", - -1, &s2, NULL); - if (SQLITE_OK != c) - errx((int)MANDOCLEVEL_SYSERR, - "%s", sqlite3_errmsg(db)); - - for (mp = ohash_first(&htab, &idx); - NULL != mp; - mp = ohash_next(&htab, &idx)) { if (cur + 1 > maxres) { maxres += 1024; *res = mandoc_reallocarray(*res, - maxres, sizeof(struct manpage)); + maxres, sizeof(**res)); } mpage = *res + cur; + mandoc_asprintf(&mpage->file, "%s/%s", + paths->paths[i], page->file + 1); + mpage->names = buildnames(page); + mpage->output = (int)outkey == KEY_Nd ? 
+ mandoc_strdup(page->desc) : + buildoutput(outkey, page->addr); mpage->ipath = i; - mpage->bits = mp->bits; - mpage->sec = 10; - mpage->form = mp->form; - buildnames(search, mpage, db, s, mp->pageid, - paths->paths[i], mp->form); - if (mpage->names != NULL) { - mpage->output = TYPE_Nd & outbit ? - mp->desc : outbit ? - buildoutput(db, s2, mp->pageid, outbit) : - NULL; - cur++; - } - free(mp); - } - - sqlite3_finalize(s); - sqlite3_finalize(s2); - sqlite3_close(db); - ohash_delete(&htab); + mpage->bits = rp->bits; + mpage->sec = *page->sect - '0'; + if (mpage->sec < 0 || mpage->sec > 9) + mpage->sec = 10; + mpage->form = *page->file; + free(rp); + cur++; + } + ohash_delete(htab); + free(htab); + dbm_close(); /* * In man(1) mode, prefer matches in earlier trees @@ -360,291 +226,300 @@ mansearch(const struct mansearch *search if (chdir_status && getcwd_status && chdir(buf) == -1) warn("%s", buf); exprfree(e); - free(sql); *sz = cur; return 1; } -void -mansearch_free(struct manpage *res, size_t sz) +/* + * Merge the results for the expression tree rooted at e + * into the result list htab. + */ +static struct ohash * +manmerge(struct expr *e, struct ohash *htab) { - size_t i; - - for (i = 0; i < sz; i++) { - free(res[i].file); - free(res[i].names); - free(res[i].output); + switch (e->type) { + case EXPR_TERM: + return manmerge_term(e, htab); + case EXPR_OR: + return manmerge_or(e->child, htab); + case EXPR_AND: + return manmerge_and(e->child, htab); + default: + abort(); } - free(res); } -static int -manpage_compare(const void *vp1, const void *vp2) +static struct ohash * +manmerge_term(struct expr *e, struct ohash *htab) { - const struct manpage *mp1, *mp2; - int diff; + struct dbm_res res, *rp; + uint64_t ib; + unsigned int slot; + int im; - mp1 = vp1; - mp2 = vp2; - return (diff = mp2->bits - mp1->bits) ? diff : - (diff = mp1->sec - mp2->sec) ? 
diff : - strcasecmp(mp1->names, mp2->names); -} + if (htab == NULL) { + htab = mandoc_malloc(sizeof(*htab)); + mandoc_ohash_init(htab, 4, offsetof(struct dbm_res, page)); + } -static void -buildnames(const struct mansearch *search, struct manpage *mpage, - sqlite3 *db, sqlite3_stmt *s, - uint64_t pageid, const char *path, int form) -{ - glob_t globinfo; - char *firstname, *newnames, *prevsec, *prevarch; - const char *oldnames, *sep1, *name, *sec, *sep2, *arch, *fsec; - size_t i; - int c, globres; - - mpage->file = NULL; - mpage->names = NULL; - firstname = prevsec = prevarch = NULL; - i = 1; - SQL_BIND_INT64(db, s, i, pageid); - while (SQLITE_ROW == (c = sqlite3_step(s))) { - - /* Decide whether we already have some names. */ - - if (NULL == mpage->names) { - oldnames = ""; - sep1 = ""; - } else { - oldnames = mpage->names; - sep1 = ", "; + for (im = 0, ib = 1; im < KEY_MAX; im++, ib <<= 1) { + if ((e->bits & ib) == 0) + continue; + + switch (ib) { + case TYPE_arch: + dbm_page_byarch(&e->match); + break; + case TYPE_sec: + dbm_page_bysect(&e->match); + break; + case TYPE_Nm: + dbm_page_byname(&e->match); + break; + case TYPE_Nd: + dbm_page_bydesc(&e->match); + break; + default: + dbm_page_bymacro(im - 2, &e->match); + break; } - /* Fetch the next name, rejecting sec/arch mismatches. */ + /* + * When hashing for deduplication, use the unique + * page ID itself instead of a hash function; + * that is quite efficient. 
+ */ - sec = (const char *)sqlite3_column_text(s, 0); - if (search->sec != NULL && strcasecmp(sec, search->sec)) - continue; - arch = (const char *)sqlite3_column_text(s, 1); - if (search->arch != NULL && *arch != '\0' && - strcasecmp(arch, search->arch)) - continue; - name = (const char *)sqlite3_column_text(s, 2); + for (;;) { + res = dbm_page_next(); + if (res.page == -1) + break; + slot = ohash_lookup_memory(htab, + (char *)&res, sizeof(res), res.page); + if ((rp = ohash_find(htab, slot)) != NULL) { + rp->bits |= res.bits; + continue; + } + rp = mandoc_malloc(sizeof(*rp)); + *rp = res; + ohash_insert(htab, slot, rp); + } + } + return htab; +} - /* Remember the first section found. */ +static struct ohash * +manmerge_or(struct expr *e, struct ohash *htab) +{ + while (e != NULL) { + htab = manmerge(e, htab); + e = e->next; + } + return htab; +} - if (9 < mpage->sec && '1' <= *sec && '9' >= *sec) - mpage->sec = (*sec - '1') + 1; +static struct ohash * +manmerge_and(struct expr *e, struct ohash *htab) +{ + struct ohash *hand, *h1, *h2; + int32_t *pp; + unsigned int slot1, slot2; - /* If the section changed, append the old one. */ + /* Evaluate the first term of the AND clause. */ - if (NULL != prevsec && - (strcmp(sec, prevsec) || - strcmp(arch, prevarch))) { - sep2 = '\0' == *prevarch ? "" : "/"; - mandoc_asprintf(&newnames, "%s(%s%s%s)", - oldnames, prevsec, sep2, prevarch); - free(mpage->names); - oldnames = mpage->names = newnames; - free(prevsec); - free(prevarch); - prevsec = prevarch = NULL; - } + hand = manmerge(e, NULL); - /* Save the new section, to append it later. */ + while ((e = e->next) != NULL) { - if (NULL == prevsec) { - prevsec = mandoc_strdup(sec); - prevarch = mandoc_strdup(arch); - } + /* Evaluate the next term and prepare for ANDing. */ - /* Append the new name. 
*/ + h2 = manmerge(e, NULL); + if (ohash_entries(h2) < ohash_entries(hand)) { + h1 = h2; + h2 = hand; + } else + h1 = hand; + hand = mandoc_malloc(sizeof(*hand)); + mandoc_ohash_init(hand, 4, offsetof(struct dbm_res, page)); - mandoc_asprintf(&newnames, "%s%s%s", - oldnames, sep1, name); - free(mpage->names); - mpage->names = newnames; + /* Keep all pages that are in both result sets. */ - /* Also save the first file name encountered. */ + for (pp = ohash_first(h1, &slot1); pp != NULL; + pp = ohash_next(h1, &slot1)) { + if (ohash_find(h2, ohash_lookup_memory(h2, + (char *)pp, sizeof(*pp), *pp)) == NULL) + free(pp); + else + ohash_insert(hand, ohash_lookup_memory(hand, + (char *)pp, sizeof(*pp), *pp), pp); + } - if (mpage->file != NULL) - continue; + /* Discard the merged results. */ - if (form & FORM_SRC) { - sep1 = "man"; - fsec = sec; - } else { - sep1 = "cat"; - fsec = "0"; - } - sep2 = *arch == '\0' ? "" : "/"; - mandoc_asprintf(&mpage->file, "%s/%s%s%s%s/%s.%s", - path, sep1, sec, sep2, arch, name, fsec); - if (access(mpage->file, R_OK) != -1) - continue; + for (pp = ohash_first(h2, &slot2); pp != NULL; + pp = ohash_next(h2, &slot2)) + free(pp); + ohash_delete(h2); + free(h2); + ohash_delete(h1); + free(h1); + } + + /* Merge the result of the AND into htab. */ - /* Handle unusual file name extensions. */ + if (htab == NULL) + return hand; - if (firstname == NULL) - firstname = mpage->file; + for (pp = ohash_first(hand, &slot1); pp != NULL; + pp = ohash_next(hand, &slot1)) { + slot2 = ohash_lookup_memory(htab, + (char *)pp, sizeof(*pp), *pp); + if (ohash_find(htab, slot2) == NULL) + ohash_insert(htab, slot2, pp); else - free(mpage->file); - mandoc_asprintf(&mpage->file, "%s/%s%s%s%s/%s.*", - path, sep1, sec, sep2, arch, name); - globres = glob(mpage->file, 0, NULL, &globinfo); - free(mpage->file); - mpage->file = globres ? 
NULL : - mandoc_strdup(*globinfo.gl_pathv); - globfree(&globinfo); - } - if (c != SQLITE_DONE) - warnx("%s", sqlite3_errmsg(db)); - sqlite3_reset(s); - - /* If none of the files is usable, use the first name. */ - - if (mpage->file == NULL) - mpage->file = firstname; - else if (mpage->file != firstname) - free(firstname); - - /* Append one final section to the names. */ - - if (prevsec != NULL) { - sep2 = *prevarch == '\0' ? "" : "/"; - mandoc_asprintf(&newnames, "%s(%s%s%s)", - mpage->names, prevsec, sep2, prevarch); - free(mpage->names); - mpage->names = newnames; - free(prevsec); - free(prevarch); + free(pp); } + + /* Discard the merged result. */ + + ohash_delete(hand); + free(hand); + return htab; } -static char * -buildoutput(sqlite3 *db, sqlite3_stmt *s, uint64_t pageid, uint64_t outbit) +void +mansearch_free(struct manpage *res, size_t sz) { - char *output, *newoutput; - const char *oldoutput, *sep1, *data; - size_t i; - int c; + size_t i; - output = NULL; - i = 1; - SQL_BIND_INT64(db, s, i, pageid); - SQL_BIND_INT64(db, s, i, outbit); - while (SQLITE_ROW == (c = sqlite3_step(s))) { - if (NULL == output) { - oldoutput = ""; - sep1 = ""; - } else { - oldoutput = output; - sep1 = " # "; - } - data = (const char *)sqlite3_column_text(s, 1); - mandoc_asprintf(&newoutput, "%s%s%s", - oldoutput, sep1, data); - free(output); - output = newoutput; + for (i = 0; i < sz; i++) { + free(res[i].file); + free(res[i].names); + free(res[i].output); } - if (SQLITE_DONE != c) - warnx("%s", sqlite3_errmsg(db)); - sqlite3_reset(s); - return output; + free(res); } -/* - * Implement substring match as an application-defined SQL function. - * Using the SQL LIKE or GLOB operators instead would be a bad idea - * because that would require escaping metacharacters in the string - * being searched for. 
- */
-static void
-sql_match(sqlite3_context *context, int argc, sqlite3_value **argv)
+static int
+manpage_compare(const void *vp1, const void *vp2)
 {
+	const struct manpage *mp1, *mp2;
+	int diff;
+
+	mp1 = vp1;
+	mp2 = vp2;
+	return (diff = mp2->bits - mp1->bits) ? diff :
+	    (diff = mp1->sec - mp2->sec) ? diff :
+	    strcasecmp(mp1->names, mp2->names);
+}
+
+static char *
+buildnames(const struct dbm_page *page)
+{
+	char *buf;
+	size_t i, sz;

-	assert(2 == argc);
-	sqlite3_result_int(context, NULL != strcasestr(
-	    (const char *)sqlite3_value_text(argv[1]),
-	    (const char *)sqlite3_value_text(argv[0])));
+	sz = lstlen(page->name) + 1 + lstlen(page->sect) +
+	    (page->arch == NULL ? 0 : 1 + lstlen(page->arch)) + 2;
+	buf = mandoc_malloc(sz);
+	i = 0;
+	lstcat(buf, &i, page->name);
+	buf[i++] = '(';
+	lstcat(buf, &i, page->sect);
+	if (page->arch != NULL) {
+		buf[i++] = '/';
+		lstcat(buf, &i, page->arch);
+	}
+	buf[i++] = ')';
+	buf[i++] = '\0';
+	assert(i == sz);
+	return buf;
 }

 /*
- * Implement regular expression match
- * as an application-defined SQL function.
+ * Count the buffer space needed to print the NUL-terminated
+ * list of NUL-terminated strings, when printing two separator
+ * characters between strings.
  */
-static void
-sql_regexp(sqlite3_context *context, int argc, sqlite3_value **argv)
+static size_t
+lstlen(const char *cp)
 {
+	size_t sz;

-	assert(2 == argc);
-	sqlite3_result_int(context, !regexec(
-	    (regex_t *)sqlite3_value_blob(argv[0]),
-	    (const char *)sqlite3_value_text(argv[1]),
-	    0, NULL, 0));
+	for (sz = 0;; sz++) {
+		if (cp[0] == '\0') {
+			if (cp[1] == '\0')
+				break;
+			sz++;
+		}
+		cp++;
+	}
+	return sz;
 }

+/*
+ * Print the NUL-terminated list of NUL-terminated strings
+ * into the buffer, separating strings with a comma and a blank.
+ */
 static void
-sql_append(char **sql, size_t *sz, const char *newstr, int count)
+lstcat(char *buf, size_t *i, const char *cp)
 {
-	size_t newsz;
+	for (;;) {
+		if (cp[0] == '\0') {
+			if (cp[1] == '\0')
+				break;
+			buf[(*i)++] = ',';
+			buf[(*i)++] = ' ';
+		} else
+			buf[(*i)++] = cp[0];
+		cp++;
+	}
+}

-	newsz = 1 < count ? (size_t)count : strlen(newstr);
-	*sql = mandoc_realloc(*sql, *sz + newsz + 1);
-	if (1 < count)
-		memset(*sql + *sz, *newstr, (size_t)count);
-	else
-		memcpy(*sql + *sz, newstr, newsz);
-	*sz += newsz;
-	(*sql)[*sz] = '\0';
+/*
+ * Return 1 if the string *want occurs in any of the strings
+ * in the NUL-terminated string list *have, or 0 otherwise.
+ * If either argument is NULL or empty, assume no filtering
+ * is desired and return 1.
+ */
+static int
+lstmatch(const char *want, const char *have)
+{
+	if (want == NULL || have == NULL || *have == '\0')
+		return 1;
+	while (*have != '\0') {
+		if (strcasestr(have, want) != NULL)
+			return 1;
+		have = strchr(have, '\0') + 1;
+	}
+	return 0;
 }

 /*
- * Prepare the search SQL statement.
+ * Build a list of values taken by the macro im
+ * in the manual page with big-endian address addr.
  */
 static char *
-sql_statement(const struct expr *e)
+buildoutput(size_t im, int32_t addr)
 {
-	char *sql;
-	size_t sz;
-	int needop;
-
-	sql = mandoc_strdup(e->equal ?
-	    "SELECT desc, form, pageid, bits "
-		"FROM mpages NATURAL JOIN names WHERE " :
-	    "SELECT desc, form, pageid, 0 FROM mpages WHERE ");
-	sz = strlen(sql);
-
-	for (needop = 0; NULL != e; e = e->next) {
-		if (e->and)
-			sql_append(&sql, &sz, " AND ", 1);
-		else if (needop)
-			sql_append(&sql, &sz, " OR ", 1);
-		if (e->open)
-			sql_append(&sql, &sz, "(", e->open);
-		sql_append(&sql, &sz,
-		    TYPE_Nd & e->bits
-		    ? (NULL == e->substr
-			? "desc REGEXP ?"
-			: "desc MATCH ?")
-		    : TYPE_Nm == e->bits
-		    ? (NULL == e->substr
-			? "pageid IN (SELECT pageid FROM names "
-			  "WHERE name REGEXP ?)"
-			: e->equal
-			? "name = ? "
-			: "pageid IN (SELECT pageid FROM names "
-			  "WHERE name MATCH ?)")
-		    : (NULL == e->substr
-			? "pageid IN (SELECT pageid FROM keys "
-			  "WHERE key REGEXP ? AND bits & ?)"
-			: "pageid IN (SELECT pageid FROM keys "
-			  "WHERE key MATCH ? AND bits & ?)"), 1);
-		if (e->close)
-			sql_append(&sql, &sz, ")", e->close);
-		needop = 1;
-	}
+	char *output, *value;
+	char *oldoutput, *sep, *newoutput;

-	return sql;
+	output = NULL;
+	dbm_macro_bypage(im - 2, addr);
+	while ((value = dbm_macro_next()) != NULL) {
+		if (output == NULL) {
+			oldoutput = "";
+			sep = "";
+		} else {
+			oldoutput = output;
+			sep = " # ";
+		}
+		mandoc_asprintf(&newoutput, "%s%s%s", oldoutput, sep, value);
+		free(output);
+		output = newoutput;
+	}
+	return output;
 }

 /*
@@ -653,188 +528,246 @@ sql_statement(const struct expr *e)
  * "(", "foo=bar", etc.).
  */
 static struct expr *
-exprcomp(const struct mansearch *search, int argc, char *argv[])
+exprcomp(const struct mansearch *search, int argc, char *argv[], int *argi)
 {
-	uint64_t mask;
-	int i, toopen, logic, igncase, toclose;
-	struct expr *first, *prev, *cur, *next;
-
-	first = cur = NULL;
-	logic = igncase = toopen = toclose = 0;
-
-	for (i = 0; i < argc; i++) {
-		if (0 == strcmp("(", argv[i])) {
-			if (igncase)
-				goto fail;
-			toopen++;
-			toclose++;
-			continue;
-		} else if (0 == strcmp(")", argv[i])) {
-			if (toopen || logic || igncase || NULL == cur)
-				goto fail;
-			cur->close++;
-			if (0 > --toclose)
-				goto fail;
-			continue;
-		} else if (0 == strcmp("-a", argv[i])) {
-			if (toopen || logic || igncase || NULL == cur)
-				goto fail;
-			logic = 1;
-			continue;
-		} else if (0 == strcmp("-o", argv[i])) {
-			if (toopen || logic || igncase || NULL == cur)
-				goto fail;
-			logic = 2;
+	struct expr *parent, *child;
+	int needterm;
+
+	needterm = 1;
+	parent = child = NULL;
+	while (*argi < argc) {
+		if (strcmp(")", argv[*argi]) == 0) {
+			if (needterm)
+				warnx("missing term "
+				    "before closing parenthesis");
+			needterm = 0;
+			break;
+		}
+		if (strcmp("-o", argv[*argi]) == 0) {
+			if (needterm) {
+				if (*argi > 0)
+					warnx("ignoring -o after %s",
+					    argv[*argi - 1]);
+				else
+					warnx("ignoring initial -o");
+			}
+			needterm = 1;
+			++*argi;
 			continue;
-		} else if (0 == strcmp("-i", argv[i])) {
-			if (igncase)
-				goto fail;
-			igncase = 1;
+		}
+		needterm = 0;
+		if (child == NULL) {
+			child = expr_and(search, argc, argv, argi);
 			continue;
 		}
-		next = exprterm(search, argv[i], !igncase);
-		if (NULL == next)
-			goto fail;
-		if (NULL == first)
-			first = next;
-		else
-			cur->next = next;
-		prev = cur = next;
+		if (parent == NULL) {
+			parent = mandoc_malloc(sizeof(*parent));
+			parent->type = EXPR_OR;
+			parent->next = NULL;
+			parent->child = child;
+		}
+		child->next = expr_and(search, argc, argv, argi);
+		child = child->next;
+	}
+	if (needterm && *argi)
+		warnx("ignoring trailing %s", argv[*argi - 1]);
+	return parent == NULL ? child : parent;
+}

-		/*
-		 * Searching for descriptions must be split out
-		 * because they are stored in the mpages table,
-		 * not in the keys table.
-		 */
+static struct expr *
+expr_and(const struct mansearch *search, int argc, char *argv[], int *argi)
+{
+	struct expr *parent, *child;
+	int needterm;

-		for (mask = TYPE_Nm; mask <= TYPE_Nd; mask <<= 1) {
-			if (mask & cur->bits && ~mask & cur->bits) {
-				next = mandoc_calloc(1,
-				    sizeof(struct expr));
-				memcpy(next, cur, sizeof(struct expr));
-				prev->open = 1;
-				cur->bits = mask;
-				cur->next = next;
-				cur = next;
-				cur->bits &= ~mask;
+	needterm = 1;
+	parent = child = NULL;
+	while (*argi < argc) {
+		if (strcmp(")", argv[*argi]) == 0) {
+			if (needterm)
+				warnx("missing term "
+				    "before closing parenthesis");
+			needterm = 0;
+			break;
+		}
+		if (strcmp("-o", argv[*argi]) == 0)
+			break;
+		if (strcmp("-a", argv[*argi]) == 0) {
+			if (needterm) {
+				if (*argi > 0)
+					warnx("ignoring -a after %s",
+					    argv[*argi - 1]);
+				else
+					warnx("ignoring initial -a");
 			}
+			needterm = 1;
+			++*argi;
+			continue;
 		}
-		prev->and = (1 == logic);
-		prev->open += toopen;
-		if (cur != prev)
-			cur->close = 1;
-
-		toopen = logic = igncase = 0;
-	}
-	if ( ! (toopen || logic || igncase || toclose))
-		return first;
-
-fail:
-	if (NULL != first)
-		exprfree(first);
-	return NULL;
+		if (needterm == 0)
+			break;
+		if (child == NULL) {
+			child = exprterm(search, argc, argv, argi);
+			if (child != NULL)
+				needterm = 0;
+			continue;
+		}
+		needterm = 0;
+		if (parent == NULL) {
+			parent = mandoc_malloc(sizeof(*parent));
+			parent->type = EXPR_AND;
+			parent->next = NULL;
+			parent->child = child;
+		}
+		child->next = exprterm(search, argc, argv, argi);
+		if (child->next != NULL) {
+			child = child->next;
+			needterm = 0;
+		}
+	}
+	if (needterm && *argi)
+		warnx("ignoring trailing %s", argv[*argi - 1]);
+	return parent == NULL ? child : parent;
 }

 static struct expr *
-exprterm(const struct mansearch *search, char *buf, int cs)
+exprterm(const struct mansearch *search, int argc, char *argv[], int *argi)
 {
 	char errbuf[BUFSIZ];
 	struct expr *e;
 	char *key, *val;
 	uint64_t iterbit;
-	int i, irc;
+	int cs, i, irc;

-	if ('\0' == *buf)
-		return NULL;
+	if (strcmp("(", argv[*argi]) == 0) {
+		++*argi;
+		e = exprcomp(search, argc, argv, argi);
+		if (*argi < argc) {
+			assert(strcmp(")", argv[*argi]) == 0);
+			++*argi;
+		} else
+			warnx("unclosed parenthesis");
+		return e;
+	}

-	e = mandoc_calloc(1, sizeof(struct expr));
+	e = mandoc_malloc(sizeof(*e));
+	e->type = EXPR_TERM;
+	e->bits = 0;
+	e->next = NULL;
+	e->child = NULL;

 	if (search->argmode == ARG_NAME) {
 		e->bits = TYPE_Nm;
-		e->substr = buf;
-		e->equal = 1;
+		e->match.type = DBM_EXACT;
+		e->match.str = argv[(*argi)++];
 		return e;
 	}

 	/*
 	 * Separate macro keys from search string.
-	 * If needed, request regular expression handling
-	 * by setting e->substr to NULL.
+	 * If needed, request regular expression handling.
 	 */

 	if (search->argmode == ARG_WORD) {
 		e->bits = TYPE_Nm;
-		e->substr = NULL;
-		mandoc_asprintf(&val, "[[:<:]]%s[[:>:]]", buf);
+		e->match.type = DBM_REGEX;
+		mandoc_asprintf(&val, "[[:<:]]%s[[:>:]]", argv[*argi]);
 		cs = 0;
-	} else if ((val = strpbrk(buf, "=~")) == NULL) {
+	} else if ((val = strpbrk(argv[*argi], "=~")) == NULL) {
 		e->bits = TYPE_Nm | TYPE_Nd;
-		e->substr = buf;
+		e->match.type = DBM_SUB;
+		e->match.str = argv[*argi];
 	} else {
-		if (val == buf)
+		if (val == argv[*argi])
 			e->bits = TYPE_Nm | TYPE_Nd;
-		if ('=' == *val)
-			e->substr = val + 1;
+		if (*val == '=') {
+			e->match.type = DBM_SUB;
+			e->match.str = val + 1;
+		} else
+			e->match.type = DBM_REGEX;
 		*val++ = '\0';
-		if (NULL != strstr(buf, "arch"))
+		if (strstr(argv[*argi], "arch") != NULL)
 			cs = 0;
 	}

 	/* Compile regular expressions. */

-	if (NULL == e->substr) {
-		irc = regcomp(&e->regexp, val,
+	if (e->match.type == DBM_REGEX) {
+		e->match.re = mandoc_malloc(sizeof(*e->match.re));
+		irc = regcomp(e->match.re, val,
 		    REG_EXTENDED | REG_NOSUB | (cs ? 0 : REG_ICASE));
+		if (irc) {
+			regerror(irc, e->match.re, errbuf, sizeof(errbuf));
+			warnx("regcomp /%s/: %s", val, errbuf);
+		}
 		if (search->argmode == ARG_WORD)
 			free(val);
 		if (irc) {
-			regerror(irc, &e->regexp, errbuf, sizeof(errbuf));
-			warnx("regcomp: %s", errbuf);
+			free(e->match.re);
 			free(e);
+			++*argi;
 			return NULL;
 		}
 	}

-	if (e->bits)
+	if (e->bits) {
+		++*argi;
 		return e;
+	}

 	/*
 	 * Parse out all possible fields.
 	 * If the field doesn't resolve, bail.
 	 */

-	while (NULL != (key = strsep(&buf, ","))) {
+	while (NULL != (key = strsep(&argv[*argi], ","))) {
 		if ('\0' == *key)
 			continue;
-		for (i = 0, iterbit = 1;
-		     i < mansearch_keymax;
-		     i++, iterbit <<= 1) {
-			if (0 == strcasecmp(key,
-			    mansearch_keynames[i])) {
+		for (i = 0, iterbit = 1; i < KEY_MAX; i++, iterbit <<= 1) {
+			if (0 == strcasecmp(key, mansearch_keynames[i])) {
 				e->bits |= iterbit;
 				break;
 			}
 		}
-		if (i == mansearch_keymax) {
-			if (strcasecmp(key, "any")) {
-				free(e);
-				return NULL;
-			}
+		if (i == KEY_MAX) {
+			if (strcasecmp(key, "any"))
+				warnx("treating unknown key "
+				    "\"%s\" as \"any\"", key);
 			e->bits |= ~0ULL;
 		}
 	}

+	++*argi;
 	return e;
 }

 static void
-exprfree(struct expr *p)
+exprdump(struct expr *e, int indent)
 {
-	struct expr *pp;
-
-	while (NULL != p) {
-		pp = p->next;
-		free(p);
-		p = pp;
+	while (e != NULL) {
+		switch (e->type) {
+		case EXPR_TERM:
+			printf("%*s%010llx %d %p %s\n", indent, "", e->bits,
+			    e->match.type, e->match.re, e->match.str);
+			break;
+		case EXPR_OR:
+			printf("%*sOR\n", indent, "");
+			break;
+		case EXPR_AND:
+			printf("%*sAND\n", indent, "");
+			break;
+		}
+		exprdump(e->child, indent + 2);
+		e = e->next;
 	}
+}
+
+static void
+exprfree(struct expr *e)
+{
+	if (e->next != NULL)
+		exprfree(e->next);
+	if (e->child != NULL)
+		exprfree(e->child);
+	free(e);
 }
Index: mansearch.h
===================================================================
RCS file: /cvs/src/usr.bin/mandoc/mansearch.h,v
retrieving revision 1.20
diff -u -p -r1.20 mansearch.h
--- mansearch.h 7 Nov 2015 13:57:55 -0000 1.20
+++ mansearch.h 1 Jul 2016 01:57:58 -0000
@@ -16,7 +16,13 @@
  * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
  */

-#define MANDOC_DB "mandoc.db"
+#define MANDOC_DB "mandoc.new.db"
+#define MANDOCDB_MAGIC 0x3a7d0cdb
+#define MANDOCDB_VERSION 0 /* XXX Start counting in production. */
+
+#define MACRO_MAX 36
+#define KEY_Nd 39
+#define KEY_MAX 40

 #define TYPE_arch 0x0000000000000001ULL
 #define TYPE_sec 0x0000000000000002ULL
@@ -66,9 +72,11 @@
 #define NAME_FILE 0x0000004000000010ULL
 #define NAME_MASK 0x000000000000001fULL

-#define FORM_CAT 0 /* manual page is preformatted */
-#define FORM_SRC 1 /* format is mdoc(7) or man(7) */
-#define FORM_NONE 4 /* format is unknown */
+enum form {
+	FORM_SRC = 1, /* Format is mdoc(7) or man(7). */
+	FORM_CAT, /* Manual page is preformatted. */
+	FORM_NONE /* Format is unknown. */
+};

 enum argmode {
 	ARG_FILE = 0,
@@ -84,7 +92,7 @@ struct manpage {
 	size_t ipath; /* number of the manpath */
 	uint64_t bits; /* name type mask */
 	int sec; /* section number, 10 means invalid */
-	int form; /* 0 == catpage */
+	enum form form;
 };

 struct mansearch {
@@ -98,7 +106,6 @@ struct mansearch {

 struct manpaths;

-int mansearch_setup(int);
 int mansearch(const struct mansearch *cfg, /* options */
 		const struct manpaths *paths, /* manpaths */
 		int argc, /* size of argv */
Index: mansearch_const.c
===================================================================
RCS file: mansearch_const.c
diff -N mansearch_const.c
--- mansearch_const.c 1 Dec 2014 08:05:02 -0000 1.6
+++ /dev/null 1 Jan 1970 00:00:00 -0000
@@ -1,31 +0,0 @@
-/* $OpenBSD: mansearch_const.c,v 1.6 2014/12/01 08:05:02 schwarze Exp $ */
-/*
- * Copyright (c) 2014 Ingo Schwarze <schwa...@openbsd.org>
- *
- * Permission to use, copy, modify, and distribute this software for any
- * purpose with or without fee is hereby granted, provided that the above
- * copyright notice and this permission notice appear in all copies.
- *
- * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
- * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
- * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
- * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
- * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
- * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
- * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
- */
-#include <sys/types.h>
-
-#include <stdint.h>
-
-#include "mansearch.h"
-
-const int mansearch_keymax = 40;
-
-const char *const mansearch_keynames[40] = {
-	"arch", "sec", "Xr", "Ar", "Fa", "Fl", "Dv", "Fn",
-	"Ic", "Pa", "Cm", "Li", "Em", "Cd", "Va", "Ft",
-	"Tn", "Er", "Ev", "Sy", "Sh", "In", "Ss", "Ox",
-	"An", "Mt", "St", "Bx", "At", "Nx", "Fx", "Lk",
-	"Ms", "Bsx", "Dx", "Rs", "Vt", "Lb", "Nm", "Nd"
-};
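For readers who want to try the key parsing at the end of exprterm() in isolation, here is a minimal standalone sketch of how a comma-separated key list maps onto the 64-bit type mask. The table is copied from the removed mansearch_const.c and KEY_MAX from the new mansearch.h. The function names key_to_bits() and keylist_to_bits() are made up for this illustration and do not exist in the patch. Unlike the patch, the sketch prints no warning for unknown keys and does not modify its argument.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <strings.h>

#define KEY_MAX 40

/* Same order as the table in the removed mansearch_const.c. */
static const char *const keynames[KEY_MAX] = {
	"arch", "sec", "Xr", "Ar", "Fa", "Fl", "Dv", "Fn",
	"Ic", "Pa", "Cm", "Li", "Em", "Cd", "Va", "Ft",
	"Tn", "Er", "Ev", "Sy", "Sh", "In", "Ss", "Ox",
	"An", "Mt", "St", "Bx", "At", "Nx", "Fx", "Lk",
	"Ms", "Bsx", "Dx", "Rs", "Vt", "Lb", "Nm", "Nd"
};

/*
 * Hypothetical helper: return the single bit belonging to one key
 * name, compared without regard to case.  "any" and unknown names
 * both yield all bits set, like the fallback in the patch.
 */
static uint64_t
key_to_bits(const char *key)
{
	uint64_t iterbit;
	int i;

	assert(key != NULL);
	for (i = 0, iterbit = 1; i < KEY_MAX; i++, iterbit <<= 1)
		if (strcasecmp(key, keynames[i]) == 0)
			return iterbit;
	return ~0ULL;
}

/*
 * Hypothetical helper: OR together the bits of all keys in a
 * comma-separated list.  Empty fields are skipped, and so are
 * fields too long to fit into the local buffer.
 */
static uint64_t
keylist_to_bits(const char *list)
{
	char buf[32];
	uint64_t bits;
	size_t len;

	bits = 0;
	while (*list != '\0') {
		len = strcspn(list, ",");
		if (len > 0 && len < sizeof(buf)) {
			memcpy(buf, list, len);
			buf[len] = '\0';
			bits |= key_to_bits(buf);
		}
		list += len;
		if (*list == ',')
			list++;
	}
	return bits;
}
```

With this table, keylist_to_bits("arch,sec") sets the two lowest bits, matching TYPE_arch and TYPE_sec in mansearch.h, and a list containing "any" or an unknown key sets every bit.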