I was asked, in the previous discussion about code formatting, to provide some research on the subject. I'm sorry I didn't get back to that fast enough, but here we are.

We have a lot of data on this: Michael Hansen at Indiana University has done a series of eye-tracking experiments regarding code formatting standards and programmer efficiency and comprehension.

He's written a blog post about it here: http://synesthesiam.com/posts/what-makes-code-hard-to-understand.html

...and the longer publication is here: http://arxiv.org/pdf/1304.5257v2.pdf - linked from http://arxiv.org/abs/1304.5257 if you want citation information or other formats.

You should read the whole thing, obviously, but a few choice observations include the fact that small inconsistencies in formatting have been unequivocally proven to cause dramatic drops in comprehension and productivity. Likewise, dead code or even just code that doesn't follow logically from the code above it.

Another observation is that experienced programmers just don't read code the same way novices do. Novices go word-by-word, line-by-line, and veterans tend to jump around scanning for keywords and code blocks; the kicker is that when experienced software engineers see a file or code block that's in an inconsistent _or just unfamiliar_ format, their brains kick them out of their block-analysis mode and they have to go back to reading line by line.

You can actually _see this happen_ with eye tracking software, is the amazing thing.

The implication, relevant to individual employees and community contributors as well as Mozilla as a fast-moving organization, is that:

- by not having a single coding standard we are making it a lot more expensive and error-prone for contributors and employees to move from one part of the project to another, and
- those costs are very real and largely hidden.

There's a long history of research - Solloway and Ehrlich as early as 1984 is one example - that says pretty much the same thing. Experienced programmers have very strong, learned expectations about what a codebase _should_ look like, and when it does not look like that, or when they have to get used to a new system, productivity and comprehension drop sharply.

Their paper's available here: http://www.ics.uci.edu/~redmiles/inf233-FQ07/oldpapers/SollowayEhrlich.pdf - but it's pretty typically hard-on-the-eyes early IEEE stuff, a scan of a printout.

Coding standards are nice, and we should have them for sure, but they should inform the design of a formatting tool, not be a style coach. We should have a standard, and it should be enforced by a machine before any of this ever hits a human - particularly a reviewer - in the eyes. And from a community-engagement perspective, this matters. Every time we type "nit" and send a patch back instead of cleaned up and ship-ready by a tool is one more chance to miss an bunch of opportunities, to have somebody get fed up and leave and let that patch rot on the vine. It's one more small, annoying barrier to participation, and it'd be great if we could have automation in place to make it go away.


- mhoye


On 1/6/2014, 12:55 PM, Martin Thomson wrote:
I think that this is a good start, but it doesn’t go quite far enough.

Part of the problem with a policy that requires people to avoid reformatting of 
stuff they don’t touch is that it propagates formatting divergence.  Sometimes 
because it’s better to conform to the style immediately adjacent the changed, 
but more so because it prevents the use of tools that reformat entire files 
(here, I’m not talking about things like names, but more whitespace 
conventions).

For whitespace at least, I think that we need to do the following:

  1. pick a style

I really, really don’t care what this is.  I’m thinking that we pick whatever 
people think that the current style is and give folks a fixed period to debate 
changes.

  2. create some tools

These tools should help people conform to the style.

Primarily, what is needed is a tool with appropriate configuration that runs on 
the command line — e.g., mach reformat …  clang-format is looking like a good 
candidate for C/C++, it just needs a configuration.  For JavaScript, I’ve used 
the python js-beautify, but it’s pretty light on configuration options, it 
might need some hacking to make it better.

Ideally though, there should be set of configuration files for common editors.  
I’m certain there are plenty out there already.  Let’s bring them all together 
(attaching the files 
https://developer.mozilla.org/en-US/docs/Developer_Guide/Coding_Style might be 
enough).

  3. reformat everything

Take the command line tool and run it over all the code.  I realise that this 
is the contentious part, but you don’t get the real benefits until you do this.

Once you do this, it’s safe to reformat files at any time without messing with 
parts of files that you haven’t touched.  This is important because many tools 
only reformat entire files.

As for `hg blame` and related tools, I have a workaround for that.  I’ve 
knocked together a tool that takes a file and a reformatter command line, and 
churns out a series of patches that retain blame: 
https://github.com/martinthomson/blame-bridge

The patch bitrot problem is not easy to work around, but that will depend on 
how closely the affected files already conformed to the style guide.

4. enforce compliance

This is probably a step for the future, but if there was - for example - a 
commit bot that waited for a clean build+test run, adding a format check to 
that run would allow the bot to block patches that screwed up the formatting of 
files.

—Martin


On 2014-01-05, at 18:34, Nicholas Nethercote <n.netherc...@gmail.com> wrote:

We've had some recent discussions about code style. I have a propasal

For the purpose of this proposal I will assume that there is consensus on the
following ideas.

- Having multiple code styles is bad.

- Therefore, reducing the number of code styles in our code is a win (though
  there are some caveats relating to how we get to that state, which I discuss
  below).

- The standard Mozilla style is good enough. (It's not perfect, and it should
  continue to evolve, but if you have any pet peeves please mention them in a
  different thread to this one.)

With these ideas in mind, a goal is clear: convert non-Mozilla-style code to
Mozilla-style code, within reason.

There are two notions that block this goal.

- Our rule of thumb is to follow existing style in a file. From the style
  guide:

  "The following norms should be followed for new code, and for Tower of Babel
  code that needs cleanup. For existing code, use the prevailing style in a
  file or module, or ask the owner if you are on someone else's turf and it's
  not clear what style to use."

  This implies that large-scale changes to convert existing code to standard
  style are discouraged. (I'd be interested to hear if people think this
  implication is incorrect, though in my experience it is not.)

  I propose that we officially remove this implicit discouragement, and even
  encourage changes that convert non-Mozilla-style code to Mozilla-style (with
  some exceptions; see below). When modifying badly-styled code, following
  existing style is still probably best.

  However, large-scale style fixes have the following downsides.

  - They complicate |hg blame|, but plenty of existing refactorings (e.g.
    removing old types) have done likewise, and these are bearable if they
    aren't too common. Therefore, style conversions should do entire files in
    a single patch, where possible, and such patches should not make any
    non-style changes. (However, to ease reviewing, it might be worth
    putting fixes to separate style problems in separate patches. E.g. all
    indentation fixes could be in one patch, separate from other changes.
    These would be combined before landing. See bug 956199 for an example.)

  - They can bitrot patches. This is hard to avoid.

  However, I imagine changes would happen in a piecemeal fashion, e.g. one
  module or directory at a time, or even one file at a time. (Again, see bug
  956199 for an example.) A gigantic change-all-the-code patch seems
  unrealistic.

- There is an semi-official policy that the owner of a module can dictate its
  style. Examples: SpiderMonkey, Storage, MFBT.

  There appears to be no good reason for this and I propose we remove it.
  Possibly with the exception of SpiderMonkey (and XPConnect?), due to it being
  an old and large module with its own well-established style.

  Also, we probably shouldn't change the style of imported third-party code;
  even if we aren't tracking upstream, we might still want to trade patches.
  (Indeed, it might even be worth having some kind of marking at the top of
  files to indicate this, a bit like a modeline?)

Finally, this is a proposal only to reduce the number of styles in our
codebase. There are other ideas floating around, such as using automated tools
to enforce consistency, but I consider them orthogonal to or
follow-ups/refinements of this proposal -- nothing can happen unless we agree
on a direction (fewer styles!) and a way to move in that direction (non-trivial
style changes are ok!)

Nick
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform

_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform

Reply via email to