Hi Karsten,

On 2015-09-28 12:39, Karsten Blees wrote:
> Different git variants record file times in the index with different
> precisions, according to their capabilities. E.g. git compiled with NO_NSEC
> records seconds only, JGit records the mtime in milliseconds, but leaves
> ctime blank (because ctime is unavailable in Java).
> 
> This causes performance issues in git compiled with USE_NSEC, because index
> entries with such 'incomplete' timestamps are considered dirty, triggering
> unnecessary content checks.
> 
> Add a file time comparison function that auto-detects the precision based
> on the number of trailing 0 digits, and compares with the lower precision
> of both values. This initial version supports the known precisions seconds
> (git + NO_NSEC), milliseconds (JGit) and nanoseconds (git + USE_NSEC), but
> can be easily extended to e.g. microseconds.
> 
> Use the new comparison function in both dirty and racy checks. As a side
> effect, this fixes racy detection in USE_NSEC-enabled git with
> core.checkStat=minimal, as the coreStat setting now affects racy checks as
> well.
> 
> Finally, do not check ctime if ctime.sec is 0 (as recorded by JGit).

Great analysis, and nice patch. I would like to offer one suggestion in 
addition:

> diff --git a/read-cache.c b/read-cache.c
> index 87204a5..3a4e6cd 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -99,23 +99,50 @@ void fill_stat_data(struct stat_data *sd, struct stat *st)
>       sd->sd_size = st->st_size;
>  }
>  
> +/*
> + * Compares two file times. Returns 0 if equal, <0 if t1 < t2, >0 if t1 > t2.
> + * Auto-detects precision based on trailing 0 digits. Compares seconds only 
> if
> + * core.checkStat=minimal.
> + */
> +static inline int cmp_filetime(uint32_t t1_sec, uint32_t t1_nsec,
> +                            uint32_t t2_sec, uint32_t t2_nsec) {
> +#ifdef USE_NSEC
> +     /*
> +      * Compare seconds and return result if different, or checkStat=mimimal,
> +      * or one of the time stamps has second precision only (nsec == 0).
> +      */
> +     int diff = t1_sec - t2_sec;
> +     if (diff || !check_stat || !t1_nsec || !t2_nsec)
> +             return diff;
> +
> +     /*
> +      * Check if one of the time stamps has millisecond precision only (i.e.
> +      * the trailing 6 digits are 0). First check the trailing 6 bits so that
> +      * we only do (slower) modulo division if necessary.
> +      */
> +     if ((!(t1_nsec & 0x3f) && !(t1_nsec % 1000000)) ||
> +         (!(t2_nsec & 0x3f) && !(t2_nsec % 1000000)))
> +             /* Compare milliseconds. */
> +             return (t1_nsec - t2_nsec) / 1000000;
> +
> +     /* Compare nanoseconds */
> +     return t1_nsec - t2_nsec;
> +#else
> +     return t1_sec - t2_sec;
> +#endif
> +}

As this affects only setups where the same repository is accessed via clients 
with different precision, would it make sense to hide this behind a config 
option? I.e. something like

static int cmp_filetime_precise(uint32_t t1_sec, uint32_t t1_nsec,
                                uint32_t t2_sec, uint32_t t2_nsec)
{
#ifdef USE_NSEC
        return t1_sec != t2_sec ? t1_sec - t2_sec : t1_nsec - t2_nsec;
#else
        return t1_sec - t2_sec;
#endif
}

static int cmp_filetime_mixed(uint32_t t1_sec, uint32_t t1_nsec,
                              uint32_t t2_sec, uint32_t t2_nsec)
{
#ifdef USE_NSEC
        ... detect lower precision and compare with lower precision only...
#else
        return t1_sec - t2_sec;
#endif
}

static (int *)cmp_filetime(uint32_t t1_sec, uint32_t t1_nsec,
                           uint32_t t2_sec, uint32_t t2_nsec)
        = cmp_filetime_precise;

... modify cmp_filetime_precise if core.mixedTimeSpec = true...

Otherwise there would be that little loop-hole where (nsec % 1000) == 0 *by 
chance* and we assume the timestamps to be identical even if they are not.

Ciao,
Dscho
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Reply via email to