Repository: accumulo
Updated Branches:
  refs/heads/1.6.0-SNAPSHOT 4fabfbaa1 -> 0297276e6

ACCUMULO-2441 outline the file prefix conventions

Project: http://git-wip-us.apache.org/repos/asf/accumulo/repo
Commit: http://git-wip-us.apache.org/repos/asf/accumulo/commit/0297276e
Tree: http://git-wip-us.apache.org/repos/asf/accumulo/tree/0297276e
Diff: http://git-wip-us.apache.org/repos/asf/accumulo/diff/0297276e

Branch: refs/heads/1.6.0-SNAPSHOT
Commit: 0297276e692d117cd515ec31d1ca1412570e4785
Parents: 4fabfba
Author: Eric Newton <eric.new...@gmail.com>
Authored: Fri Mar 7 19:05:56 2014 -0500
Committer: Eric Newton <eric.new...@gmail.com>
Committed: Fri Mar 7 19:05:56 2014 -0500

----------------------------------------------------------------------
 .../chapters/troubleshooting.tex | 29 ++++++++++++++++++++
 1 file changed, 29 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/accumulo/blob/0297276e/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex b/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex
index 8ba7176..18d472f 100644
--- a/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex
+++ b/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex
@@ -599,3 +599,32 @@ but the basic approach is:
 \item Recreate tables, users and permissions
 \item Import the directories under \texttt{/corrupt/tables/<id>} into the new instance
 \end{itemize}
+
+\section{File Naming Conventions}
+
+Q. Why are files named like they are? Why do some start with ``C'' and others with ``F''?
+
+A. The file names give you a basic idea of the source of the file.
+
+The base of the filename is a base-36 unique number. All filenames in Accumulo are coordinated
+with a counter in ZooKeeper, so they are always unique, which is useful for debugging.
+
+The leading letter gives you an idea of how the file was created:
+
+\begin{itemize}
+  \item F - Flush: entries in memory were written to a file (Minor Compaction)
+  \item M - Merging compaction: entries in memory were combined with the smallest file to create one new file
+  \item C - Several files, but not all files, were combined to produce this file (Major Compaction)
+  \item A - All files were compacted; delete entries were dropped
+  \item I - Bulk import: complete, sorted index files. Always in a directory starting with ``b-''
+\end{itemize}
+
+This simple file naming convention allows you to see the basic structure of the files from just
+their filenames, and to reason about what should happen to them next, just
+by scanning their entries in the metadata tables.
+
+For example, if you see multiple files with ``M'' prefixes, the tablet is, or was, up against its
+maximum file limit, so it began merging memory updates with files to keep the file count reasonable. This
+slows down ingest performance, so knowing there are many files like this tells you that the
+compaction strategy, which reduces the number of files, is struggling to keep up with ingest.
+
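The prefix convention described in the new section can be sketched as a small lookup. This is only an illustration and not part of the commit or of Accumulo itself: the helper names `describe_file` and `count_prefixes`, the sample paths, and the `.rf` extension are all assumptions made for the example. The one-letter prefixes and the base-36 counter come from the text above.

```python
from collections import Counter

# Prefix meanings as listed in the "File Naming Conventions" section.
PREFIX_MEANING = {
    "F": "flush (minor compaction)",
    "M": "merging compaction (memory merged with the smallest file)",
    "C": "major compaction of several, but not all, files",
    "A": "compaction of all files, delete entries dropped",
    "I": "bulk import",
}


def describe_file(path):
    """Return (prefix, meaning, counter) for one file path.

    Hypothetical helper: the ".rf" extension in the samples is an assumption.
    """
    name = path.rsplit("/", 1)[-1]   # drop any directory part
    stem = name.split(".", 1)[0]     # drop the assumed extension
    prefix, digits = stem[:1], stem[1:]
    meaning = PREFIX_MEANING.get(prefix, "unknown prefix")
    try:
        counter = int(digits, 36)    # the base of the name is a base-36 number
    except ValueError:
        counter = None               # nothing parseable after the prefix
    return prefix, meaning, counter


def count_prefixes(paths):
    """Count files per leading letter, e.g. to spot many 'M' files."""
    return Counter(describe_file(p)[0] for p in paths)


if __name__ == "__main__":
    # Made-up paths for illustration only.
    sample = [
        "/accumulo/tables/2/t-0001/F0000abc.rf",
        "/accumulo/tables/2/t-0001/M0000abd.rf",
        "/accumulo/tables/2/t-0001/M0000abe.rf",
        "/accumulo/tables/2/b-0002/I0000abf.rf",
    ]
    for p in sample:
        print(p, describe_file(p))
    print(count_prefixes(sample))
```

A high count for ``M'' in such a tally corresponds to the situation described in the last paragraph of the section: the tablet is, or was, at its maximum file limit.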