I was asked, in the previous discussion about code formatting, to
provide some research on the subject. I'm sorry I didn't get back to
that fast enough, but here we are.
We have a lot of data on this: Michael Hansen at Indiana University has
done a series of eye-tracking experiments regarding code formatting
standards and programmer efficiency and comprehension.
He's written a blog post about it here:
http://synesthesiam.com/posts/what-makes-code-hard-to-understand.html
...and the longer publication is here:
http://arxiv.org/pdf/1304.5257v2.pdf - linked from
http://arxiv.org/abs/1304.5257 if you want citation information or other
formats.
You should read the whole thing, obviously, but a few choice
observations include the fact that small inconsistencies in formatting
have been unequivocally proven to cause dramatic drops in comprehension
and productivity. Likewise, dead code or even just code that doesn't
follow logically from the code above it.
Another observation is that experienced programmers just don't read code
the same way novices do. Novices go word-by-word, line-by-line, and
veterans tend to jump around scanning for keywords and code blocks; the
kicker is that when experienced software engineers see a file or code
block that's in an inconsistent _or just unfamiliar_ format, their
brains kick them out of their block-analysis mode and they have to go
back to reading line by line.
You can actually _see this happen_ with eye tracking software, is the
amazing thing.
The implication, relevant to individual employees and community
contributors as well as Mozilla as a fast-moving organization, is that:
- by not having a single coding standard we are making it a lot more
expensive and error-prone for contributors and employees to move from
one part of the project to another, and
- those costs are very real and largely hidden.
There's a long history of research - Solloway and Ehrlich as early as
1984 is one example - that says pretty much the same thing. Experienced
programmers have very strong, learned expectations about what a codebase
_should_ look like, and when it does not look like that, or when they
have to get used to a new system, productivity and comprehension drop
sharply.
Their paper's available here:
http://www.ics.uci.edu/~redmiles/inf233-FQ07/oldpapers/SollowayEhrlich.pdf
- but it's pretty typically hard-on-the-eyes early IEEE stuff, a scan of
a printout.
Coding standards are nice, and we should have them for sure, but they
should inform the design of a formatting tool, not be a style coach. We
should have a standard, and it should be enforced by a machine before
any of this ever hits a human - particularly a reviewer - in the eyes.
And from a community-engagement perspective, this matters. Every time we
type "nit" and send a patch back instead of cleaned up and ship-ready by
a tool is one more chance to miss an bunch of opportunities, to have
somebody get fed up and leave and let that patch rot on the vine. It's
one more small, annoying barrier to participation, and it'd be great if
we could have automation in place to make it go away.
- mhoye
On 1/6/2014, 12:55 PM, Martin Thomson wrote:
I think that this is a good start, but it doesn’t go quite far enough.
Part of the problem with a policy that requires people to avoid reformatting of
stuff they don’t touch is that it propagates formatting divergence. Sometimes
because it’s better to conform to the style immediately adjacent the changed,
but more so because it prevents the use of tools that reformat entire files
(here, I’m not talking about things like names, but more whitespace
conventions).
For whitespace at least, I think that we need to do the following:
1. pick a style
I really, really don’t care what this is. I’m thinking that we pick whatever
people think that the current style is and give folks a fixed period to debate
changes.
2. create some tools
These tools should help people conform to the style.
Primarily, what is needed is a tool with appropriate configuration that runs on
the command line — e.g., mach reformat … clang-format is looking like a good
candidate for C/C++, it just needs a configuration. For JavaScript, I’ve used
the python js-beautify, but it’s pretty light on configuration options, it
might need some hacking to make it better.
Ideally though, there should be set of configuration files for common editors.
I’m certain there are plenty out there already. Let’s bring them all together
(attaching the files
https://developer.mozilla.org/en-US/docs/Developer_Guide/Coding_Style might be
enough).
3. reformat everything
Take the command line tool and run it over all the code. I realise that this
is the contentious part, but you don’t get the real benefits until you do this.
Once you do this, it’s safe to reformat files at any time without messing with
parts of files that you haven’t touched. This is important because many tools
only reformat entire files.
As for `hg blame` and related tools, I have a workaround for that. I’ve
knocked together a tool that takes a file and a reformatter command line, and
churns out a series of patches that retain blame:
https://github.com/martinthomson/blame-bridge
The patch bitrot problem is not easy to work around, but that will depend on
how closely the affected files already conformed to the style guide.
4. enforce compliance
This is probably a step for the future, but if there was - for example - a
commit bot that waited for a clean build+test run, adding a format check to
that run would allow the bot to block patches that screwed up the formatting of
files.
—Martin
On 2014-01-05, at 18:34, Nicholas Nethercote <n.netherc...@gmail.com> wrote:
We've had some recent discussions about code style. I have a propasal
For the purpose of this proposal I will assume that there is consensus on the
following ideas.
- Having multiple code styles is bad.
- Therefore, reducing the number of code styles in our code is a win (though
there are some caveats relating to how we get to that state, which I discuss
below).
- The standard Mozilla style is good enough. (It's not perfect, and it should
continue to evolve, but if you have any pet peeves please mention them in a
different thread to this one.)
With these ideas in mind, a goal is clear: convert non-Mozilla-style code to
Mozilla-style code, within reason.
There are two notions that block this goal.
- Our rule of thumb is to follow existing style in a file. From the style
guide:
"The following norms should be followed for new code, and for Tower of Babel
code that needs cleanup. For existing code, use the prevailing style in a
file or module, or ask the owner if you are on someone else's turf and it's
not clear what style to use."
This implies that large-scale changes to convert existing code to standard
style are discouraged. (I'd be interested to hear if people think this
implication is incorrect, though in my experience it is not.)
I propose that we officially remove this implicit discouragement, and even
encourage changes that convert non-Mozilla-style code to Mozilla-style (with
some exceptions; see below). When modifying badly-styled code, following
existing style is still probably best.
However, large-scale style fixes have the following downsides.
- They complicate |hg blame|, but plenty of existing refactorings (e.g.
removing old types) have done likewise, and these are bearable if they
aren't too common. Therefore, style conversions should do entire files in
a single patch, where possible, and such patches should not make any
non-style changes. (However, to ease reviewing, it might be worth
putting fixes to separate style problems in separate patches. E.g. all
indentation fixes could be in one patch, separate from other changes.
These would be combined before landing. See bug 956199 for an example.)
- They can bitrot patches. This is hard to avoid.
However, I imagine changes would happen in a piecemeal fashion, e.g. one
module or directory at a time, or even one file at a time. (Again, see bug
956199 for an example.) A gigantic change-all-the-code patch seems
unrealistic.
- There is an semi-official policy that the owner of a module can dictate its
style. Examples: SpiderMonkey, Storage, MFBT.
There appears to be no good reason for this and I propose we remove it.
Possibly with the exception of SpiderMonkey (and XPConnect?), due to it being
an old and large module with its own well-established style.
Also, we probably shouldn't change the style of imported third-party code;
even if we aren't tracking upstream, we might still want to trade patches.
(Indeed, it might even be worth having some kind of marking at the top of
files to indicate this, a bit like a modeline?)
Finally, this is a proposal only to reduce the number of styles in our
codebase. There are other ideas floating around, such as using automated tools
to enforce consistency, but I consider them orthogonal to or
follow-ups/refinements of this proposal -- nothing can happen unless we agree
on a direction (fewer styles!) and a way to move in that direction (non-trivial
style changes are ok!)
Nick
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform
_______________________________________________
dev-platform mailing list
dev-platform@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-platform