> That looks promising.  Would you like to do some more empirical tests for
> tuning?  Make the allocation page-cluster size a global variable, and then
> use gdb to adjust its value in between test runs.  If you'd like to do
> that, I'd be happy to put that form of your changes in right away while you
> (and/or others) experiment to find the optimal tuning.
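For concreteness, the proposal above amounts to something like the following
sketch (the name page_cluster_size and its default are invented for
illustration):

/----------
| /* Hypothetical tunable: how many pages to grab per allocation.
|    Being a plain global, it can be changed from gdb between test
|    runs with, e.g., `set var page_cluster_size = 8'.  */
| int page_cluster_size = 4;
\----------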
I gave up trying to do 200 40-minute compiles.  Instead, I used the
following two pieces of code to test the buffer allocation algorithm.

Here, we do raw allocation, i.e. we always call mmap for each page that we
need:

/----------
| #include <sys/mman.h>
| #include <stdio.h>
| #include <stdlib.h>   /* atoi */
| #include <string.h>   /* memset */
| #include <assert.h>
| #include <error.h>
| #include <cthreads.h>
|
| /* Returns a single page page-aligned buffer.  */
| static void *
| get_page_buf ()
| {
|   void *buf = mmap (0, vm_page_size,
|                     PROT_READ|PROT_WRITE, MAP_ANON, 0, 0);
|   if (buf == MAP_FAILED)
|     buf = 0;
|
|   return buf;
| }
|
| /* Frees (or recycles) a block returned by get_page_buf.  */
| static inline void
| free_page_buf (void *buf)
| {
|   munmap (buf, vm_page_size);
| }
|
|
| int
| main (int argc, char *argv[])
| {
|   int i;
|   int ta;
| #define ALLOCS 100
|   void *allocs[ALLOCS];
|
|
|   if (argc < 2)
|     error (1, 0,
|            "Usage: %s total-allocations",
|            argv[0]);
|
|   ta = atoi (argv[1]);
|
|   printf ("Total allocations: %d\n",
|           ta);
|
|   for (; ta > 0; ta -= ALLOCS + ALLOCS / 2)
|     {
|       for (i = 0; i < ALLOCS; i ++)
|         {
|           void *p = allocs[i] = get_page_buf ();
|           memset (p, 1, vm_page_size);
|         }
|
|       for (i = 0; i < ALLOCS / 2; i ++)
|         free_page_buf (allocs[i]);
|
|       for (i = 0; i < ALLOCS / 2; i ++)
|         {
|           void *p = allocs[i] = get_page_buf ();
|           memset (p, 1, vm_page_size);
|         }
|
|       for (i = 0; i < ALLOCS; i ++)
|         free_page_buf (allocs[i]);
|     }
|
|   return 0;
| }
\----------
Program 1: Using MMAP Directly
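As an aside, on the Hurd an anonymous mmap like the one above ultimately
turns into a Mach VM call.  The following is only a hand-written sketch of
that underlying path (assuming the standard Mach vm_allocate/vm_deallocate
interface; the real mmap may well go through vm_map instead), not code from
either test program:

/----------
| #include <mach.h>
|
| /* Sketch: allocate and release one zero-filled, page-aligned page
|    directly through the Mach VM interface.  */
| static void *
| get_page_buf_mach (void)
| {
|   vm_address_t addr;
|
|   if (vm_allocate (mach_task_self (), &addr, vm_page_size, TRUE))
|     return 0;
|   return (void *) addr;
| }
|
| static void
| free_page_buf_mach (void *buf)
| {
|   vm_deallocate (mach_task_self (), (vm_address_t) buf, vm_page_size);
| }
\----------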
Here we do allocation from a pool of buffers.  The number of pages to
allocate each time is a tunable parameter (from the command line):

/----------
| #include <sys/mman.h>
| #include <stdio.h>
| #include <stdlib.h>   /* atoi */
| #include <string.h>   /* memset */
| #include <cthreads.h>
| #include <assert.h>
| #include <error.h>
|
| int FREE_PAGE_BUFS;
|
| /* Returns a single page page-aligned buffer.  */
| static void *
| get_page_buf ()
| {
|   static struct mutex free_page_bufs_lock = MUTEX_INITIALIZER;
|   static void *free_page_bufs;
|   static int num_free_page_bufs;
|   void *buf;
|
|   mutex_lock (&free_page_bufs_lock);
|   if (num_free_page_bufs > 0)
|     {
|       buf = free_page_bufs;
|       num_free_page_bufs --;
|       if (num_free_page_bufs > 0)
|         free_page_bufs += vm_page_size;
| #ifndef NDEBUG
|       else
|         free_page_bufs = 0;
| #endif /* ! NDEBUG */
|     }
|   else
|     {
|       assert (free_page_bufs == 0);
|       buf = mmap (0, vm_page_size * FREE_PAGE_BUFS,
|                   PROT_READ|PROT_WRITE, MAP_ANON, 0, 0);
|       if (buf == MAP_FAILED)
|         buf = 0;
|       else
|         {
|           free_page_bufs = buf + vm_page_size;
|           num_free_page_bufs = FREE_PAGE_BUFS - 1;
| #ifndef NDEBUG
|           if (num_free_page_bufs == 0)
|             free_page_bufs = 0;
| #endif
|         }
|     }
|
|   mutex_unlock (&free_page_bufs_lock);
|   return buf;
| }
|
| /* Frees (or recycles) a block returned by get_page_buf.  */
| static inline void
| free_page_buf (void *buf)
| {
|   munmap (buf, vm_page_size);
| }
|
|
| int
| main (int argc, char *argv[])
| {
|   int i;
|   int ta;
| #define ALLOCS 100
|   void *allocs[ALLOCS];
|
|
|   if (argc < 3)
|     error (1, 0,
|            "Usage: %s total-allocations get-free-buf-preallocation-size",
|            argv[0]);
|
|   ta = atoi (argv[1]);
|   FREE_PAGE_BUFS = atoi (argv[2]);
|
|   printf ("Total allocations: %d\nPreallocation size: %d\n",
|           ta, FREE_PAGE_BUFS);
|
|   for (; ta > 0; ta -= ALLOCS + ALLOCS / 2)
|     {
|       for (i = 0; i < ALLOCS; i ++)
|         {
|           void *p = allocs[i] = get_page_buf ();
|           memset (p, 1, vm_page_size);
|         }
|
|       for (i = 0; i < ALLOCS / 2; i ++)
|         free_page_buf (allocs[i]);
|
|       for (i = 0; i < ALLOCS / 2; i ++)
|         {
|           void *p = allocs[i] = get_page_buf ();
|           memset (p, 1, vm_page_size);
|         }
|
|       for (i = 0; i < ALLOCS; i ++)
|         free_page_buf (allocs[i]);
|     }
|
|   return 0;
| }
\----------
Program 2: Using Preallocation
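Note that in both programs free_page_buf simply unmaps the page: despite its
comment, nothing is ever recycled back into the pool; the pool only batches
fresh allocations.  A variant that actually recycles freed pages might look
roughly like this untested sketch (it threads a free list through the first
word of each freed page, using the same cthreads primitives as Program 2;
note that, unlike a fresh anonymous mapping, a recycled page is not
zero-filled):

/----------
| #include <cthreads.h>
| #include <sys/mman.h>
|
| static struct mutex recycled_lock = MUTEX_INITIALIZER;
| static void *recycled;         /* Head of the free list.  */
|
| /* Instead of calling munmap, push the page onto a free list; the
|    link is stored in the first word of the page itself.  */
| static inline void
| free_page_buf_recycling (void *buf)
| {
|   mutex_lock (&recycled_lock);
|   *(void **) buf = recycled;
|   recycled = buf;
|   mutex_unlock (&recycled_lock);
| }
|
| /* The matching fast path, to be tried at the top of get_page_buf
|    before consulting the preallocation pool.  Returns 0 if the
|    free list is empty.  */
| static inline void *
| try_recycled_buf (void)
| {
|   void *buf;
|
|   mutex_lock (&recycled_lock);
|   buf = recycled;
|   if (buf)
|     recycled = *(void **) buf;
|   mutex_unlock (&recycled_lock);
|   return buf;
| }
\----------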
I ran the tests on a 450 MHz Xeon workstation with 768MB of RAM.  Each of
the programs was compiled with gcc 2.95.3 using the `-O2' optimization
flag.  The tests were timed to see how long it took to do between 65k and 1
million allocations while varying the size of the preallocation pool from
nothing to 65k pages.  The value that I chose for the total number of
allocations was based on previous data that I had gathered: during a
compile of the Hurd, get_page_buf is called approximately 65k times.  The
test marked `unbuffered' used the former program, while the rest of the
tests used the latter.  Note that preallocating a single buffer is
(effectively) the equivalent of doing no preallocation at all.  I ran each
combination several times and, for certain buffer preallocation sizes,
computed the standard deviation of at least ten runs.

Here is the function that I used to compute the standard deviation:

/----------
| (defun std (d)
|   (let* ((average (/ (apply '+ d) (length d)))
|          (l (length d)))
|     (sqrt
|      (/
|       (apply '+
|              (mapcar
|               (lambda (e)
|                 (* e e))
|               (mapcar (lambda (e)
|                         (- average e))
|                       d)))
|       (- (length d) 1)))))
\----------
Function 1: Standard Deviation

Here are the results:

+------------+--------------------------------------------------------+
| Buffer pre-|              Total Number of Allocations               |
| allocation +----------+----------+----------+-----------+-----------+
| count      | 65536    | 131072   | 262144   | 524288    | 1048576   |
+------------+----------+----------+----------+-----------+-----------+
| unbuffered | 0m2.360s | 0m4.690s | 0m9.410s | 0m18.590s | 0m37.200s |
| 1          | 0m2.350s | 0m4.670s | 0m9.290s | 0m18.540s | 0m36.890s |
| 2          | 0m2.160s | 0m4.340s | 0m8.700s | 0m17.310s | 0m34.670s |
| 4          | 0m2.100s | 0m4.150s | 0m8.260s | 0m16.440s | 0m32.720s |
| 6          | 0m2.010s | 0m4.000s | 0m7.920s | 0m15.840s | 0m31.900s |
| 7          | 0m2.000s | 0m3.930s | 0m7.890s | 0m15.640s | 0m31.400s |
| 8          | 0m1.990s | 0m3.920s | 0m7.810s | 0m15.690s | 0m31.260s |
| 9          | 0m2.010s | 0m3.990s | 0m8.000s | 0m15.940s | 0m31.570s |
| 10         | 0m1.970s | 0m3.960s | 0m7.890s | 0m15.750s | 0m31.380s |
| 11         | 0m2.000s | 0m3.950s | 0m7.920s | 0m15.720s | 0m31.220s |
| 12         | 0m1.980s | 0m3.860s | 0m7.740s | 0m15.370s | 0m30.990s |
| 13         | 0m2.070s | 0m4.070s | 0m8.130s | 0m16.190s | 0m32.390s |
| 14         | 0m1.950s | 0m3.950s | 0m7.820s | 0m15.530s | 0m31.040s |
| 15         | 0m1.960s | 0m3.930s | 0m7.820s | 0m15.500s | 0m31.040s |
| 16         | 0m1.980s | 0m3.960s | 0m7.910s | 0m15.670s | 0m31.380s |
| 17         | 0m2.100s | 0m4.190s | 0m8.340s | 0m16.650s | 0m33.030s |
| 18         | 0m2.100s | 0m4.210s | 0m8.350s | 0m16.690s | 0m33.240s |
| 19         | 0m1.980s | 0m3.950s | 0m7.850s | 0m15.690s | 0m31.280s |
| 20         | 0m1.970s | 0m3.930s | 0m7.840s | 0m15.620s | 0m31.260s |
| 21         | 0m2.020s | 0m4.020s | 0m7.980s | 0m15.950s | 0m31.780s |
| 22         | 0m1.970s | 0m3.970s | 0m7.860s | 0m15.730s | 0m31.320s |
| 23         | 0m2.020s | 0m4.030s | 0m8.020s | 0m15.970s | 0m32.000s |
| 24         | 0m1.940s | 0m3.820s | 0m7.660s | 0m15.260s | 0m30.500s |
| 26         | 0m2.090s | 0m4.140s | 0m8.260s | 0m16.500s | 0m32.810s |
| 28         | 0m2.010s | 0m4.030s | 0m8.000s | 0m15.920s | 0m31.870s |
| 30         | 0m1.940s | 0m3.840s | 0m7.650s | 0m15.190s | 0m30.220s |
| 32         | 0m2.000s | 0m3.990s | 0m7.980s | 0m15.820s | 0m31.670s |
| 64         | 0m2.050s | 0m4.010s | 0m8.050s | 0m16.010s | 0m32.130s |
| 128        | 0m2.090s | 0m4.180s | 0m8.340s | 0m16.550s | 0m33.250s |
| 256        | 0m2.150s | 0m4.310s | 0m8.600s | 0m17.060s | 0m34.210s |
| 512        | 0m2.210s | 0m4.440s | 0m8.810s | 0m17.560s | 0m35.110s |
| 1024       | 0m2.250s | 0m4.470s | 0m8.850s | 0m17.710s | 0m35.320s |
| 2048       | 0m2.320s | 0m4.560s | 0m9.140s | 0m18.280s | 0m36.370s |
| 4096       | 0m2.350s | 0m4.680s | 0m9.280s | 0m18.560s | 0m37.050s |
| 8192       | 0m2.350s | 0m4.710s | 0m9.410s | 0m18.740s | 0m37.380s |
| 16384      | 0m2.340s | 0m4.710s | 0m9.360s | 0m18.670s | 0m37.270s |
| 32768      | 0m2.380s | 0m4.730s | 0m9.410s | 0m18.800s | 0m37.530s |
| 65536      | 0m2.370s | 0m4.890s | 0m9.410s | 0m18.810s | 0m37.390s |
+------------+----------+----------+----------+-----------+-----------+
|                         Standard Deviation                          |
+------------+----------+----------+----------+-----------+-----------+
|            | 65536    | 131072   | 262144   | 524288    | 1048576   |
+------------+----------+----------+----------+-----------+-----------+
| unbuffered | 0m0.0250s| 0m0.0269s| 0m0.0590s| 0m0.1132s | 0m0.2480s |
| 1          | 0m0.0158s| 0m0.0181s| 0m0.0239s| 0m0.0473s | 0m0.1502s |
| 2          | 0m0.0126s| 0m0.0135s| 0m0.0643s| 0m0.1164s | 0m0.1887s |
| 17         | 0m0.0055s| 0m0.0118s| 0m0.0094s| 0m0.0230s | 0m0.1041s |
| 18         | 0m0.0054s| 0m0.0151s| 0m0.0248s| 0m0.0419s | 0m0.0791s |
| 19         | 0m0.0126s| 0m0.0122s| 0m0.0118s| 0m0.0300s | 0m0.0624s |
| 256        | 0m0.0164s| 0m0.0134s| 0m0.0340s| 0m0.0471s | 0m0.1526s |
| 1024       | 0m0.0258s| 0m0.0446s| 0m0.1221s| 0m0.1879s | 0m0.4676s |
+------------+----------+----------+----------+-----------+-----------+
Table 1: The Results of the Tests
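As an aside, the standard errors in Table 2 below are consistent with
dividing the standard deviations above by the square root of exactly ten
runs (the text above only promises `at least ten', so the run count is an
inference).  For example, in C:

/----------
| #include <math.h>
| #include <stdio.h>
|
| /* Standard error of the mean: the standard deviation of RUNS
|    samples divided by sqrt (RUNS).  */
| static double
| std_err (double std_dev, int runs)
| {
|   return std_dev / sqrt (runs);
| }
|
| int
| main (void)
| {
|   /* The unbuffered row for one million allocations:
|      0.2480 / sqrt (10) = 0.0784, matching Table 2.  */
|   printf ("%.4f\n", std_err (0.2480, 10));
|   return 0;
| }
\----------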
As we can see, the results are relatively consistent, especially given that
the standard error for a million page allocations is very small:

+------------+-----------+
| Pool size  | 1M allocs.|
+------------+-----------+
| unbuffered | 0m0.0784s |
| 1          | 0m0.0475s |
| 2          | 0m0.0597s |
| 17         | 0m0.0329s |
| 18         | 0m0.0250s |
| 19         | 0m0.0197s |
| 256        | 0m0.0483s |
| 1024       | 0m0.1479s |
+------------+-----------+
Table 2: Standard error

In fact, even in the worst case, the standard error is no greater than 0.5%
of the total run time.

Returning to Table 1, we observe that using mmap directly is equal (within
error) to using a pool size of one.  This is important, as it demonstrates
that the added overhead of the get_page_buf function is minimal.  We also
see that the optimal pool sizes are 14 to 16 pages (inclusive), 19 pages,
20 pages, 22 pages and 24 pages.  Pool sizes near these can perform
noticeably worse; the slowdown at, for instance, 17 or 18 pages suggests
that either vm_map or mmap is taking a nonoptimal path.  However, this is
only speculation and has not yet been investigated.

It is interesting to note that performance starts to degrade steadily once
the pool size exceeds approximately 64 pages.  Also, once the pool size
exceeds about 4000 pages, the performance is in fact worse than just using
mmap by itself (i.e. unbuffered allocation).  I am not entirely sure why
this is; however, I am interested in theories.