http://git-wip-us.apache.org/repos/asf/accumulo/blob/900d6abb/docs/src/main/latex/accumulo_user_manual/chapters/table_design.tex
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/table_design.tex b/docs/src/main/latex/accumulo_user_manual/chapters/table_design.tex
deleted file mode 100644
index ff1cebd..0000000
--- a/docs/src/main/latex/accumulo_user_manual/chapters/table_design.tex
+++ /dev/null
@@ -1,343 +0,0 @@
-
-% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements. See the NOTICE file distributed with
-% this work for additional information regarding copyright ownership.
-% The ASF licenses this file to You under the Apache License, Version 2.0
-% (the "License"); you may not use this file except in compliance with
-% the License. You may obtain a copy of the License at
-%
-%     http://www.apache.org/licenses/LICENSE-2.0
-%
-% Unless required by applicable law or agreed to in writing, software
-% distributed under the License is distributed on an "AS IS" BASIS,
-% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-% See the License for the specific language governing permissions and
-% limitations under the License.
-
-\chapter{Table Design}
-
-\section{Basic Table}
-
-Since Accumulo tables are sorted by row ID, each table can be thought of as being
-indexed by the row ID. Lookups performed by row ID can be executed quickly by doing
-a binary search, first across the tablets and then within a tablet. Clients should
-choose a row ID carefully in order to support their desired application. A simple rule
-is to select a unique identifier as the row ID for each entity to be stored and to
-store all the other attributes to be tracked as columns under this row ID. For example,
-if we have the following data in a comma-separated file:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-    userid,age,address,account-balance
-\end{verbatim}\endgroup
-
-We might choose to store this data using the userid as the rowID, the column
-name in the column family, and a blank column qualifier:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-Mutation m = new Mutation(userid);
-final String column_qualifier = "";
-m.put("age", column_qualifier, age);
-m.put("address", column_qualifier, address);
-m.put("balance", column_qualifier, account_balance);
-
-writer.addMutation(m);
-\end{verbatim}\endgroup
-
-We could then retrieve any of the columns for a specific userid by specifying the
-userid as the range of a scanner and fetching specific columns:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-Range r = new Range(userid, userid); // single row
-Scanner s = conn.createScanner("userdata", auths);
-s.setRange(r);
-s.fetchColumnFamily(new Text("age"));
-
-for(Entry<Key,Value> entry : s)
-    System.out.println(entry.getValue().toString());
-\end{verbatim}\endgroup
-
-\section{RowID Design}
-
-Often it is necessary to transform the rowID in order to have rows ordered in a way
-that is optimal for anticipated access patterns. A good example of this is reversing
-the order of components of internet domain names in order to group rows of the
-same parent domain together:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-com.google.code
-com.google.labs
-com.google.mail
-com.yahoo.mail
-com.yahoo.research
-\end{verbatim}\endgroup
-
-Some data may result in the creation of very large rows - rows with many columns.
-In this case the table designer may wish to split up these rows for better load
-balancing while keeping them sorted together for scanning purposes. This can be
-done by appending a random substring at the end of the row:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-com.google.code_00
-com.google.code_01
-com.google.code_02
-com.google.labs_00
-com.google.mail_00
-com.google.mail_01
-\end{verbatim}\endgroup
-
-It could also be done by appending a string representation of some period of time,
-such as the date rounded to the week or month:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-com.google.code_201003
-com.google.code_201004
-com.google.code_201005
-com.google.labs_201003
-com.google.mail_201003
-com.google.mail_201004
-\end{verbatim}\endgroup
-
-Appending dates provides the additional capability of restricting a scan to a given
-date range.
-
-\section{Lexicoders}
-Since Keys in Accumulo are sorted lexicographically by default, it's often useful to encode
-common data types into a byte format in which their sort order corresponds to the sort order
-in their native form. An example of this is encoding dates and numerical data so that ranges
-of them can be searched efficiently.
-
-The lexicoders are a standard and extensible way of encoding Java types. Here's an example
-of a lexicoder that encodes a Java Date object so that it sorts lexicographically:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// create new date lexicoder
-DateLexicoder dateEncoder = new DateLexicoder();
-
-// truncate the time to hours
-long epoch = System.currentTimeMillis();
-Date hour = new Date(epoch - (epoch % 3600000));
-
-// encode the rowId so that it is sorted lexicographically
-Mutation mutation = new Mutation(dateEncoder.encode(hour));
-mutation.put(new Text("colf"), new Text("colq"), new Value(new byte[]{}));
-\end{verbatim}\endgroup
-
-If we want to return the most recent date first, we can reverse the sort order
-with the reverse lexicoder:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// create new date lexicoder and reverse lexicoder
-DateLexicoder dateEncoder = new DateLexicoder();
-ReverseLexicoder<Date> reverseEncoder = new ReverseLexicoder<Date>(dateEncoder);
-
-// truncate the time to hours
-long epoch = System.currentTimeMillis();
-Date hour = new Date(epoch - (epoch % 3600000));
-
-// encode the rowId so that it sorts in reverse lexicographic order
-Mutation mutation = new Mutation(reverseEncoder.encode(hour));
-mutation.put(new Text("colf"), new Text("colq"), new Value(new byte[]{}));
-\end{verbatim}\endgroup
-
-
-\section{Indexing}
-In order to support lookups via more than one attribute of an entity, additional
-indexes can be built. However, because Accumulo tables can support any number of
-columns without specifying them beforehand, a single additional index will often
-suffice for supporting lookups of records in the main table. Here, the index has, as
-the rowID, the Value or Term from the main table, the column families are the same,
-and the column qualifier of the index table contains the rowID from the main table.
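-
-A minimal sketch of writing one such index entry follows (the names term,
-fieldName, mainRowID, and indexWriter are placeholders rather than a
-prescribed API); the table below summarizes the schema:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// index entry: Term as rowID, field name as family, main rowID as qualifier
-Mutation indexMutation = new Mutation(new Text(term));
-indexMutation.put(new Text(fieldName), new Text(mainRowID), new Value(new byte[]{}));
-indexWriter.addMutation(indexMutation);
-\end{verbatim}\endgroup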
-
-\begin{center}
-$\begin{array}{|c|c|c|c|c|c|} \hline
-\multicolumn{5}{|c|}{\mbox{Key}} & \multirow{3}{*}{\mbox{Value}}\\ \cline{1-5}
-\multirow{2}{*}{\mbox{Row ID}}& \multicolumn{3}{|c|}{\mbox{Column}} & \multirow{2}{*}{\mbox{Timestamp}} & \\ \cline{2-4}
-& \mbox{Family} & \mbox{Qualifier} & \mbox{Visibility} & & \\ \hline \hline
-\mbox{Term} & \mbox{Field Name} & \mbox{MainRowID} & & &\\ \hline
-\end{array}$
-\end{center}
-
-Note: We store rowIDs in the column qualifier rather than the Value so that we can
-have more than one rowID associated with a particular term within the index. If we
-stored this in the Value we would only see one of the rows in which the value
-appears, since Accumulo is configured by default to return only the most recent
-value associated with a key.
-
-Lookups can then be done by scanning the Index Table first for occurrences of the
-desired values in the columns specified, which returns a list of row IDs from the main
-table. These can then be used to retrieve each matching record, in its entirety or as a
-subset of its columns, from the Main Table.
-
-To support efficient lookups of multiple rowIDs from the same table, the Accumulo
-client library provides a BatchScanner. Users specify a set of Ranges to the
-BatchScanner, which performs the lookups in multiple threads to multiple servers
-and returns an Iterator over all the rows retrieved. The rows returned are NOT in
-sorted order, as is the case with the basic Scanner interface.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// first we scan the index for IDs of rows matching our query
-
-Text term = new Text("mySearchTerm");
-
-HashSet<Range> matchingRows = new HashSet<Range>();
-
-Scanner indexScanner = conn.createScanner("index", auths);
-indexScanner.setRange(new Range(term, term));
-
-// we retrieve the matching rowIDs and create a set of ranges
-for(Entry<Key,Value> entry : indexScanner)
-    matchingRows.add(new Range(entry.getKey().getColumnQualifier()));
-
-// now we pass the set of rowIDs to the batch scanner to retrieve them
-BatchScanner bscan = conn.createBatchScanner("table", auths, 10);
-
-bscan.setRanges(matchingRows);
-bscan.fetchColumnFamily(new Text("attributes"));
-
-for(Entry<Key,Value> entry : bscan)
-    System.out.println(entry.getValue());
-\end{verbatim}\endgroup
-
-One advantage of the dynamic schema capabilities of Accumulo is that different
-fields may be indexed into the same physical table. However, it may be necessary to
-create different index tables if the terms must be formatted differently in order to
-maintain proper sort order. For example, real numbers must be formatted
-differently than their usual notation in order to be sorted correctly. In these cases,
-usually one index per unique data type will suffice.
-
-\section{Entity-Attribute and Graph Tables}
-
-Accumulo is ideal for storing entities and their attributes, especially if the
-attributes are sparse. It is often useful to join several datasets together on common
-entities within the same table. This can allow for the representation of graphs,
-including nodes, their attributes, and connections to other nodes.
-
-Rather than storing individual events, Entity-Attribute or Graph tables store
-aggregate information about the entities involved in the events and the
-relationships between entities. This is often preferable when single events aren't
-very useful and when a continuously updated summarization is desired.
-
-The physical schema for an entity-attribute or graph table is as follows:
-
-\begin{center}
-$\begin{array}{|c|c|c|c|c|c|} \hline
-\multicolumn{5}{|c|}{\mbox{Key}} & \multirow{3}{*}{\mbox{Value}}\\ \cline{1-5}
-\multirow{2}{*}{\mbox{Row ID}}& \multicolumn{3}{|c|}{\mbox{Column}} & \multirow{2}{*}{\mbox{Timestamp}} & \\ \cline{2-4}
-& \mbox{Family} & \mbox{Qualifier} & \mbox{Visibility} & & \\ \hline \hline
-\mbox{EntityID} & \mbox{Attribute Name} & \mbox{Attribute Value} & & & \mbox{Weight} \\ \hline
-\mbox{EntityID} & \mbox{Edge Type} & \mbox{Related EntityID} & & & \mbox{Weight} \\ \hline
-\end{array}$
-\end{center}
-
-For example, to keep track of employees, managers and products the following
-entity-attribute table could be used. Note that the weights are not always necessary
-and are set to 0 when not used.
-
-$\begin{array}{llll}
-\bf{RowID} & \bf{Column Family} & \bf{Column Qualifier} & \bf{Value} \\
-\\
-E001 & name & bob & 0 \\
-E001 & department & sales & 0 \\
-E001 & hire\_date & 20030102 & 0 \\
-E001 & units\_sold & P001 & 780 \\
-\\
-E002 & name & george & 0 \\
-E002 & department & sales & 0 \\
-E002 & manager\_of & E001 & 0 \\
-E002 & manager\_of & E003 & 0 \\
-\\
-E003 & name & harry & 0 \\
-E003 & department & accounts\_recv & 0 \\
-E003 & hire\_date & 20000405 & 0 \\
-E003 & units\_sold & P002 & 566 \\
-E003 & units\_sold & P001 & 232 \\
-\\
-P001 & product\_name & nike\_airs & 0 \\
-P001 & product\_type & shoe & 0 \\
-P001 & in\_stock & germany & 900 \\
-P001 & in\_stock & brazil & 200 \\
-\\
-P002 & product\_name & basic\_jacket & 0 \\
-P002 & product\_type & clothing & 0 \\
-P002 & in\_stock & usa & 3454 \\
-P002 & in\_stock & germany & 700 \\
-\end{array}$
-\vspace{5mm}
-
-To allow efficient updating of edge weights, an aggregating iterator can be
-configured to add the value of all mutations applied with the same key. These types
-of tables can easily be created from raw events by simply extracting the entities,
-attributes, and relationships from individual events and inserting the keys into
-Accumulo each with a count of 1. The aggregating iterator will take care of
-maintaining the edge weights.
-
-\section{Document-Partitioned Indexing}
-
-Using a simple index as described above works well when looking for records that
-match one of a set of given criteria. When looking for records that match more than
-one criterion simultaneously, such as when looking for documents that contain all of
-the words `the' and `white' and `house', there are several issues.
-
-First is that the set of all records matching any one of the search terms must be sent
-to the client, which incurs a lot of network traffic. The second problem is that the
-client is responsible for performing set intersection on the sets of records returned
-to eliminate all but the records matching all search terms. The memory of the client
-may easily be overwhelmed during this operation.
-
-For these reasons Accumulo includes support for a scheme known as sharded
-indexing, in which these set operations can be performed at the TabletServers and
-decisions about which records to include in the result set can be made without
-incurring network traffic.
-
-This is accomplished via partitioning records into bins that each reside on at most
-one TabletServer, and then creating an index of terms per record within each bin as
-follows:
-
-\begin{center}
-$\begin{array}{|c|c|c|c|c|c|} \hline
-\multicolumn{5}{|c|}{\mbox{Key}} & \multirow{3}{*}{\mbox{Value}}\\ \cline{1-5}
-\multirow{2}{*}{\mbox{Row ID}}& \multicolumn{3}{|c|}{\mbox{Column}} & \multirow{2}{*}{\mbox{Timestamp}} & \\ \cline{2-4}
-& \mbox{Family} & \mbox{Qualifier} & \mbox{Visibility} & & \\ \hline \hline
-\mbox{BinID} & \mbox{Term} & \mbox{DocID} & & & \mbox{Weight} \\ \hline
-\end{array}$
-\end{center}
-
-Documents or records are mapped into bins by a user-defined ingest application. By
-storing the BinID as the RowID we ensure that all the information for a particular
-bin is contained in a single tablet and hosted on a single TabletServer since
-Accumulo never splits rows across tablets. Storing the Terms as column families
-serves to enable fast lookups of all the documents within this bin that contain the
-given term.
-
-Finally, we perform set intersection operations on the TabletServer via a special
-iterator called the Intersecting Iterator. Since documents are partitioned into many
-bins, a search of all documents must search every bin. We can use the BatchScanner
-to scan all bins in parallel. The Intersecting Iterator should be enabled on a
-BatchScanner within user query code as follows:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-Text[] terms = {new Text("the"), new Text("white"), new Text("house")};
-
-BatchScanner bs = conn.createBatchScanner(table, auths, 20);
-IteratorSetting iter = new IteratorSetting(20, "ii", IntersectingIterator.class);
-IntersectingIterator.setColumnFamilies(iter, terms);
-bs.addScanIterator(iter);
-bs.setRanges(Collections.singleton(new Range()));
-
-for(Entry<Key,Value> entry : bs) {
-    System.out.println(" " + entry.getKey().getColumnQualifier());
-}
-\end{verbatim}\endgroup
-
-This code effectively has the BatchScanner scan all tablets of a table, looking for
-documents that match all the given terms. Because all tablets are being scanned for
-every query, each query is more expensive than other Accumulo scans, which
-typically involve a small number of TabletServers. This reduces the number of
-concurrent queries supported and is subject to what is known as the `straggler'
-problem in which every query runs as slow as the slowest server participating.
-
-Of course, fast servers will return their results to the client, which can display them
-to the user immediately while waiting for the rest of the results to arrive. If the
-results are unordered this is quite effective as the first results to arrive are as good
-as any others to the user.
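-
-The ingest side of this scheme is not shown above. A minimal sketch, assuming a
-user-defined assignBin() mapping and the placeholder names docId and terms, might
-look like:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-// map the document to a bin; assignBin() is user-defined, not an Accumulo API
-Text binId = assignBin(docId);
-
-// one mutation per document: BinID as rowID, one column per term
-Mutation m = new Mutation(binId);
-for(String term : terms)
-    m.put(new Text(term), new Text(docId), new Value(new byte[]{}));
-
-writer.addMutation(m);
-\end{verbatim}\endgroup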
http://git-wip-us.apache.org/repos/asf/accumulo/blob/900d6abb/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex b/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex
deleted file mode 100644
index 203fe0c..0000000
--- a/docs/src/main/latex/accumulo_user_manual/chapters/troubleshooting.tex
+++ /dev/null
@@ -1,794 +0,0 @@
-
-% Licensed to the Apache Software Foundation (ASF) under one or more
-% contributor license agreements. See the NOTICE file distributed with
-% this work for additional information regarding copyright ownership.
-% The ASF licenses this file to You under the Apache License, Version 2.0
-% (the "License"); you may not use this file except in compliance with
-% the License. You may obtain a copy of the License at
-%
-%     http://www.apache.org/licenses/LICENSE-2.0
-%
-% Unless required by applicable law or agreed to in writing, software
-% distributed under the License is distributed on an "AS IS" BASIS,
-% WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-% See the License for the specific language governing permissions and
-% limitations under the License.
-
-\chapter{Troubleshooting}
-
-\section{Logs}
-
-Q. The tablet server does not seem to be running!? What happened?
-
-Accumulo is a distributed system. It is supposed to run on remote
-equipment, across hundreds of computers. Each program that runs on
-these remote computers writes down events as they occur, into a local
-file. By default, this is defined in
-\texttt{\$ACCUMULO\_HOME}/conf/accumulo-env.sh as ACCUMULO\_LOG\_DIR.
-
-A. Look in the \texttt{\$ACCUMULO\_LOG\_DIR}/tserver*.log file. Specifically, check the end of the file.
-
-Q. The tablet server did not start and the debug log does not exist! What happened?
-
-When the individual programs are started, the stdout and stderr output
-of these programs is written to ``.out'' and ``.err'' files in
-\texttt{\$ACCUMULO\_LOG\_DIR}. Often, when configuration options, files
-or permissions are missing, messages will be left in these files.
-
-A. Probably a start-up problem. Look in \texttt{\$ACCUMULO\_LOG\_DIR}/tserver*.err
-
-\section{Monitor}
-
-Q. Accumulo is not working, what's wrong?
-
-There's a small web server that collects information about all the
-components that make up a running Accumulo instance. It will highlight
-unusual or unexpected conditions.
-
-A. Point your browser to the monitor (typically the master host, on port 50095). Is anything red or yellow?
-
-Q. My browser is reporting connection refused, and I cannot get to the monitor.
-
-The monitor program's output is also written to .err and .out files in
-the \texttt{\$ACCUMULO\_LOG\_DIR}. Look for problems in this file if the
-\texttt{\$ACCUMULO\_LOG\_DIR/monitor*.log} file does not exist.
-
-A. The monitor program is probably not running. Check the log files for errors.
-
-Q. My browser hangs trying to talk to the monitor.
-
-Your browser needs to be able to reach the monitor program. Often
-large clusters are firewalled, or use a VPN for internal
-communications. You can use SSH to proxy your browser to the cluster,
-or consult with your system administrator to gain access to the server
-from your browser.
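-
-For example, standard OpenSSH port forwarding (the host names here are
-placeholders) can make a remote monitor reachable as
-http://localhost:50095 on your workstation:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-    $ ssh -N -L 50095:monitor-host:50095 user@cluster-gateway
-\end{verbatim}\endgroup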
-
-It is sometimes helpful to use a text-only browser to sanity-check the
-monitor while on the machine running the monitor:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-    $ links http://localhost:50095
-\end{verbatim}\endgroup
-
-A. Verify that you are not firewalled from the monitor if it is running on a remote host.
-
-Q. The monitor responds, but there are no numbers for tservers and tables. The summary page says the master is down.
-
-The monitor program gathers all the details about the master and the
-tablet servers through the master. It will be mostly blank if the
-master is down.
-
-A. Check for a running master.
-
-\section{HDFS}
-
-Accumulo reads and writes to the Hadoop Distributed File System.
-Accumulo needs this file system available at all times for normal operations.
-
-Q. Accumulo is having problems ``getting a block blk\_1234567890123.'' How do I fix it?
-
-This troubleshooting guide does not cover HDFS, but in general, you
-want to make sure that all the datanodes are running and an fsck check
-finds the file system clean:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-    $ hadoop fsck /accumulo
-\end{verbatim}\endgroup
-
-You can use:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-    $ hadoop fsck /accumulo/path/to/corrupt/file -locations -blocks -files
-\end{verbatim}\endgroup
-
-to locate the block references of individual corrupt files, use those
-references to search the name node and individual data node logs to
-determine to which servers those blocks have been assigned, and then try
-to fix any underlying file system issues on those nodes.
-
-On a larger cluster, you may need to increase the number of Xceivers:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-  <property>
-    <name>dfs.datanode.max.xcievers</name>
-    <value>4096</value>
-  </property>
-\end{verbatim}\endgroup
-
-A. Verify HDFS is healthy, check the datanode logs.
-
-\section{Zookeeper}
-
-Q. \texttt{accumulo init} is hanging. It says something about talking to zookeeper.
-
-Zookeeper is also a distributed service. You will need to ensure that
-it is up. You can run the zookeeper command line tool to connect to
-any one of the zookeeper servers:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-    $ zkCli.sh -server zoohost
-...
-[zk: zoohost:2181(CONNECTED) 0]
-\end{verbatim}\endgroup
-
-It is important to see the word \texttt{CONNECTED}! If you only see
-\texttt{CONNECTING} you will need to diagnose zookeeper errors.
-
-A. Check to make sure that zookeeper is up, and that
-\texttt{\$ACCUMULO\_HOME/conf/accumulo-site.xml} has been pointed to
-your zookeeper server(s).
-
-Q. Zookeeper is running, but it does not say \texttt{CONNECTED}
-
-Zookeeper processes talk to each other to elect a leader. All updates
-go through the leader and propagate to a majority of all the other
-nodes. If a majority of the nodes cannot be reached, zookeeper will
-not allow updates. Zookeeper also limits the number of connections to a
-server from any other single host. By default, this limit can be as small as 10
-and can be reached in some everything-on-one-machine test configurations.
-
-You can check the election status and connection status of clients by
-asking the zookeeper nodes for their status.
-You connect to zookeeper
-and ask it with the four-letter ``stat'' command:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-$ nc zoohost 2181
-stat
-Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
-Clients:
- /127.0.0.1:58289[0](queued=0,recved=1,sent=0)
- /127.0.0.1:60231[1](queued=0,recved=53910,sent=53915)
-
-Latency min/avg/max: 0/5/3008
-Received: 1561459
-Sent: 1561592
-Connections: 2
-Outstanding: 0
-Zxid: 0x621a3b
-Mode: standalone
-Node count: 22524
-$
-\end{verbatim}\endgroup
-
-
-A. Check zookeeper status, verify that it has a quorum, and has not exceeded maxClientCnxns.
-
-Q. My tablet server crashed! The logs say that it lost its zookeeper lock.
-
-Tablet servers reserve a lock in zookeeper to maintain their ownership
-over the tablets that have been assigned to them. Part of their
-responsibility for keeping the lock is to send zookeeper a keep-alive
-message periodically. If the tablet server fails to send a message in
-a timely fashion, zookeeper will remove the lock and notify the tablet
-server. If the tablet server does not receive a message from
-zookeeper, it will assume its lock has been lost, too. If a tablet
-server loses its lock, it kills itself: everything assumes it is dead
-already.
-
-A. Investigate why the tablet server did not send a timely message to
-zookeeper.
-
-\subsection{Keeping the tablet server lock}
-
-Q. My tablet server lost its lock. Why?
-
-The primary reason a tablet server loses its lock is that it has been pushed into swap.
-
-A large java program (like the tablet server) may have a large portion
-of its memory image unused. The operating system will favor pushing
-this allocated, but unused, memory into swap so that the memory can be
-re-used as a disk buffer. When the java virtual machine decides to
-access this memory, the OS will begin flushing disk buffers to return that
-memory to the VM. This can cause the entire process to block long
-enough for the zookeeper lock to be lost.
-
-A. Configure your system to reduce the kernel parameter ``swappiness'' from the default (60) to zero.
-
-Q. My tablet server lost its lock, and I have already set swappiness to
-zero. Why?
-
-Be careful not to over-subscribe memory. This can be easy to do if
-your accumulo processes run on the same nodes as hadoop's map-reduce
-framework. Remember to add up:
-
-\begin{itemize}
-\item{size of the JVM for the tablet server}
-\item{size of the in-memory map, if using the native map implementation}
-\item{size of the JVM for the data node}
-\item{size of the JVM for the task tracker}
-\item{size of the JVM times the maximum number of mappers and reducers}
-\item{size of the kernel and any support processes}
-\end{itemize}
-
-If a 16G node can run 2 mappers and 2 reducers, and each can be 2G,
-then there is only 8G for the data node, tserver, task tracker and OS.
-
-A. Reduce the memory footprint of each component until it fits comfortably.
-
-Q. My tablet server lost its lock, swappiness is zero, and my node has lots of unused memory!
-
-The JVM memory garbage collector may fall behind and cause a
-``stop-the-world'' garbage collection. On a large memory virtual
-machine, this collection can take a long time. This happens more
-frequently when the JVM is getting low on free memory. Check the logs
-of the tablet server.
-You will see lines like this:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-2013-06-20 13:43:20,607 [tabletserver.TabletServer] DEBUG: gc ParNew=0.00(+0.00) secs
-    ConcurrentMarkSweep=0.00(+0.00) secs freemem=1,868,325,952(+1,868,325,952) totalmem=2,040,135,680
-\end{verbatim}\endgroup
-
-When ``freemem'' becomes small relative to the amount of memory
-needed, the JVM will spend more time finding free memory than
-performing work. This can cause long delays in sending keep-alive
-messages to zookeeper.
-
-A. Ensure the tablet server JVM is not running low on memory.
-
-\section{Tools}
-
-The accumulo script can be used to run classes from the command line.
-This section shows how a few of the utilities work, but there are many
-more.
-
-There's a class that will examine an accumulo storage file and print
-out basic metadata.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-$ ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo /accumulo/tables/1/default_tablet/A000000n.rf
-2013-07-16 08:17:14,778 [util.NativeCodeLoader] INFO : Loaded the native-hadoop library
-Locality group         : <DEFAULT>
-        Start block          : 0
-        Num   blocks         : 1
-        Index level 0        : 62 bytes  1 blocks
-        First key            : 288be9ab4052fe9e span:34078a86a723e5d3:3da450f02108ced5 [] 1373373521623 false
-        Last key             : start:13fc375709e id:615f5ee2dd822d7a [] 1373373821660 false
-        Num entries          : 466
-        Column families      : [waitForCommits, start, md major compactor 1, md major compactor 2, md major compactor 3,
-                                 bringOnline, prep, md major compactor 4, md major compactor 5, md root major compactor 3,
-                                 minorCompaction, wal, compactFiles, md root major compactor 4, md root major compactor 1,
-                                 md root major compactor 2, compact, id, client:update, span, update, commit, write,
-                                 majorCompaction]
-
-Meta block     : BCFile.index
-      Raw size             : 4 bytes
-      Compressed size      : 12 bytes
-      Compression type     : gz
-
-Meta block     : RFile.index
-      Raw size             : 780 bytes
-      Compressed size      : 344 bytes
-      Compression type     : gz
-\end{verbatim}\endgroup
-
-When trying to diagnose problems related to key size, the PrintInfo tool can provide a histogram of the individual key sizes:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-$ ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo --histogram /accumulo/tables/1/default_tablet/A000000n.rf
-...
-Up to size      count      %-age
-         10 :        222  28.23%
-        100 :        244  71.77%
-       1000 :          0   0.00%
-      10000 :          0   0.00%
-     100000 :          0   0.00%
-    1000000 :          0   0.00%
-   10000000 :          0   0.00%
-  100000000 :          0   0.00%
- 1000000000 :          0   0.00%
-10000000000 :          0   0.00%
-\end{verbatim}\endgroup
-
-Likewise, PrintInfo will dump the key-value pairs and show you the contents of the RFile:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-$ ./bin/accumulo org.apache.accumulo.core.file.rfile.PrintInfo --dump /accumulo/tables/1/default_tablet/A000000n.rf
-row columnFamily:columnQualifier [visibility] timestamp deleteFlag -> Value
-...
-\end{verbatim}\endgroup
-
-Q. Accumulo is not showing me any data!
-
-A. Do you have your auths set so that it matches your visibilities?
-
-Q. What are my visibilities?
-
-A. Use ``PrintInfo'' on a representative file to get some idea of the visibilities in the underlying data.
-
-Note that PrintInfo is an administrative tool and can only be used by
-someone who can access the underlying Accumulo data. It does not
-provide the normal access controls in Accumulo.
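-
-To check and adjust a user's authorizations from within the Accumulo shell
-(the user name and authorization strings here are placeholders):
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-user@instance> getauths -u myuser
-user@instance> setauths -u myuser -s vis1,vis2
-\end{verbatim}\endgroup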
-
-If you would like to back up, or otherwise examine, the contents of Zookeeper, there are commands to dump and load to/from XML.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-$ ./bin/accumulo org.apache.accumulo.server.util.DumpZookeeper --root /accumulo >dump.xml
-$ ./bin/accumulo org.apache.accumulo.server.util.RestoreZookeeper --overwrite < dump.xml
-\end{verbatim}\endgroup
-
-Q. How can I get the information in the monitor page for my cluster monitoring system?
-
-A. Use GetMasterStats:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-$ ./bin/accumulo org.apache.accumulo.test.GetMasterStats | grep Load
- OS Load Average: 0.27
-\end{verbatim}\endgroup
-
-Q. The monitor page is showing an offline tablet. How can I find out which tablet it is?
-
-A. Use FindOfflineTablets:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-$ ./bin/accumulo org.apache.accumulo.server.util.FindOfflineTablets
-2<<@(null,null,localhost:9997) is UNASSIGNED  #walogs:2
-\end{verbatim}\endgroup
-
-Here's what the output means:
-
-\begin{enumerate}
-\item{\texttt{2<<} This is the tablet from (-inf, +inf) for the
-  table with id 2. ``tables -l'' in the shell will show table ids for
-  tables.}
-\item{@(null, null, localhost:9997)} Location information. The
-  format is \texttt{@(assigned, hosted, last)}. In this case, the
-  tablet has not been assigned, is not hosted anywhere, and was once
-  hosted on localhost.
-\item{\#walogs:2} The number of write-ahead logs that this tablet requires for recovery.
-\end{enumerate}
-
-An unassigned tablet with write-ahead logs is probably waiting for
-logs to be sorted for efficient recovery.
-
-Q. How can I be sure that the metadata tables are up and consistent?
-
-A. \texttt{CheckForMetadataProblems} will verify that the start/end rows of
-every tablet match up, and that the first tablet's start row and the last
-tablet's end row for each table are empty:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-$ ./bin/accumulo org.apache.accumulo.server.util.CheckForMetadataProblems -u root --password
-Enter the connection password:
-All is well for table !0
-All is well for table 1
-\end{verbatim}\endgroup
-
-Q. My hadoop cluster has lost a file due to a NameNode failure. How can I remove the file?
-
-A. There's a utility that will check every file reference and ensure
-that the file exists in HDFS. Optionally, it will remove the
-reference:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-$ ./bin/accumulo org.apache.accumulo.server.util.RemoveEntriesForMissingFiles -u root --password
-Enter the connection password:
-2013-07-16 13:10:57,293 [util.RemoveEntriesForMissingFiles] INFO : File /accumulo/tables/2/default_tablet/F0000005.rf
-    is missing
-2013-07-16 13:10:57,296 [util.RemoveEntriesForMissingFiles] INFO : 1 files of 3 missing
-\end{verbatim}\endgroup
-
-Q. I have many entries in zookeeper for old instances I no longer need. How can I remove them?
-
-A. Use CleanZookeeper:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-$ ./bin/accumulo org.apache.accumulo.server.util.CleanZookeeper
-\end{verbatim}\endgroup
-
-This command will not delete the instance pointed to by the local \texttt{conf/accumulo-site.xml} file.
-
-Q. I need to decommission a node. How do I stop the tablet server on it?
-
-A. Use the admin command:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-$ ./bin/accumulo admin stop hostname:9997
-2013-07-16 13:15:38,403 [util.Admin] INFO : Stopping server 12.34.56.78:9997
-\end{verbatim}\endgroup
-
-Q. I cannot log in to a tablet server host, and the tablet server will not shut down. How can I kill the server?
-
-A. Sometimes you can kill a ``stuck'' tablet server by deleting its lock in zookeeper:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-$ ./bin/accumulo org.apache.accumulo.server.util.TabletServerLocks --list
-                  127.0.0.1:9997 TSERV_CLIENT=127.0.0.1:9997
-$ ./bin/accumulo org.apache.accumulo.server.util.TabletServerLocks -delete 127.0.0.1:9997
-$ ./bin/accumulo org.apache.accumulo.server.util.TabletServerLocks -list
-                  127.0.0.1:9997           null
-\end{verbatim}\endgroup
-
-You can find the master and instance id for any accumulo instances using the same zookeeper instance:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-$ ./bin/accumulo org.apache.accumulo.server.util.ListInstances
-INFO : Using ZooKeepers localhost:2181
-
- Instance Name       | Instance ID                          | Master
----------------------+--------------------------------------+-------------------------------
-              "test" | 6140b72e-edd8-4126-b2f5-e74a8bbe323b | 127.0.0.1:9999
-\end{verbatim}\endgroup
-
-\section{System Metadata Tables}
-\label{sec:metadata}
-
-Accumulo tracks information about tables in metadata tables. The metadata for
-most tables is contained within the metadata table in the accumulo namespace,
-while the metadata for the metadata table itself is contained in the root table
-in the accumulo namespace. The root table is composed of a single tablet, which
-does not split, so it is also called the root tablet. Information about the root
-table, such as its location and write-ahead logs, is stored in ZooKeeper.
-
-Let's create a table and put some data into it:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-shell> createtable test
-shell> tables -l
-accumulo.metadata    =>        !0
-accumulo.root        =>        +r
-test                 =>         2
-trace                =>         1
-shell> insert a b c d
-shell> flush -w
-\end{verbatim}\endgroup
-
-Now let's take a look at the metadata for this table:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-shell> table accumulo.metadata
-shell> scan -b 3; -e 3<
-3< file:/default_tablet/F000009y.rf []    186,1
-3< last:13fe86cd27101e5 []    127.0.0.1:9997
-3< loc:13fe86cd27101e5 []    127.0.0.1:9997
-3< log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 []    127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995|6
-3< srv:dir []    /default_tablet
-3< srv:flush []    1
-3< srv:lock []    tservers/127.0.0.1:9997/zlock-0000000001$13fe86cd27101e5
-3< srv:time []    M1373998392323
-3< ~tab:~pr []    \x00
-\end{verbatim}\endgroup
-
-Let's decode this little session:
-
-\begin{enumerate}
-\item{\texttt{scan -b 3; -e 3<}\\
-  Every tablet gets its own row. Every row starts with the table id, followed by
-  ``;'' or ``<'', followed by the end row split point for that tablet.}
-\item{\texttt{file:/default\_tablet/F000009y.rf [] 186,1}\\
-  File entry for this tablet. This tablet contains a single file reference. The
-  file is ``/accumulo/tables/3/default\_tablet/F000009y.rf''. It contains 1
-  key/value pair, and is 186 bytes long.}
-\item{\texttt{last:13fe86cd27101e5 [] 127.0.0.1:9997}\\
-  Last location for this tablet. It was last held on 127.0.0.1:9997, and the
-  unique tablet server lock data was ``13fe86cd27101e5''. The default balancer
-  will tend to put tablets back on their last location.}
-\item{\texttt{loc:13fe86cd27101e5 [] 127.0.0.1:9997}\\
-  The current location of this tablet.}
-\item{\texttt{log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 [] 127.0. ...}\\
-  This tablet has a reference to a single write-ahead log.
-  This file can be found in\\
-  /accumulo/wal/127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995. The value
-  of this entry could refer to multiple files. This tablet's data is encoded as
-  ``6'' within the log.}
-\item{\texttt{srv:dir [] /default\_tablet}\\
-  Files written for this tablet will be placed into
-  /accumulo/tables/3/default\_tablet.}
-\item{\texttt{srv:flush [] 1}\\
-  Flush id. This table has successfully completed the flush with the id of
-  ``1''.}
-\item{\texttt{srv:lock [] tservers/127.0.0.1:9997/zlock-0000000001\$13fe86cd27101e5}\\
-  This is the lock information for the tablet server holding the lock on this
-  tablet. This information is checked against zookeeper whenever it is updated,
-  which prevents a metadata update from a tablet server that no longer holds its
-  lock.}
-\item{\texttt{srv:time [] M1373998392323}\\
-  The tablet's time type and most recent time. The leading ``M'' indicates
-  milliseconds since the epoch; an ``L'' would indicate logical time.}
-\item{\texttt{\textasciitilde{}tab:\textasciitilde{}pr [] \textbackslash{}x00}\\
-  The end-row marker for the previous tablet (prev-row). The first byte
-  indicates the presence of a prev-row. This tablet has the range (-inf, +inf),
-  so it has no prev-row (or end row).}
-\end{enumerate}
-
-Besides these columns, you may see:
-
-\begin{enumerate}
-\item{\texttt{rowId future:zooKeeperID location} The tablet has been assigned to a tablet server, but not yet loaded.}
-\item{\texttt{\textasciitilde{}del:filename} When a tablet server is done using a file, it will create a delete marker in the appropriate metadata table, unassociated with any tablet. The garbage collector will remove the marker, and the file, when no other reference to the file exists.}
-\item{\texttt{\textasciitilde{}blip:txid} Bulk-Load In Progress marker}
-\item{\texttt{rowId loaded:filename} A file has been bulk-loaded into this tablet; however, the bulk load has not yet completed on other tablets, so this marker prevents the file from being loaded multiple times.}
-\item{\texttt{rowId !cloned} A marker that indicates that this tablet has been successfully cloned.}
-\item{\texttt{rowId splitRatio:ratio} A marker that indicates a split is in progress, and the files are being split at the given ratio.}
-\item{\texttt{rowId chopped} A marker that indicates that the files in the tablet do not contain keys outside the range of the tablet.}
-\item{\texttt{rowId scan} A marker that prevents a file from being removed while there are still active scans using it.}
-
-\end{enumerate}
-
-\section{Simple System Recovery}
-
-Q. One of my Accumulo processes died. How do I bring it back?
-
-The easiest way to bring all services online for an Accumulo instance is to run the ``start-all.sh`` script.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-    $ bin/start-all.sh
-\end{verbatim}\endgroup
-
-This process will check the process listing using ``jps`` on each host before attempting to restart a service on that host.
-Typically, this check is sufficient except in the face of a hung/zombie process. For large clusters, it may be
-undesirable to ssh to every node in the cluster to ensure that all hosts are running the appropriate processes, and ``start-here.sh`` may be of use.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-    $ ssh host_with_dead_process
-    $ bin/start-here.sh
-\end{verbatim}\endgroup
-
-``start-here.sh`` should be invoked on the host which is missing a given process. Like start-all.sh, it will start all
-necessary processes that are not currently running, but only on the current host and not cluster-wide. Tools such as ``pssh`` or
-``pdsh`` can be used to automate this process.
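-
-For example, assuming a hostfile ``tservers.txt`` listing the hosts to check
-(the file name is a placeholder), something like the following could run
-``start-here.sh`` everywhere it is needed:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-    $ pssh -h tservers.txt '$ACCUMULO_HOME/bin/start-here.sh'
-\end{verbatim}\endgroup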
-
-``start-server.sh`` can also be used to start a process on a given host; however, it is not generally recommended for
-users to issue this directly as the ``start-all.sh`` and ``start-here.sh`` scripts provide the same functionality with
-more automation and are less prone to user error.
-
-A. Use ``start-all.sh`` or ``start-here.sh``.
-
-Q. My process died again. Should I restart it via ``cron`` or tools like ``supervisord``?
-
-A. A repeatedly dying Accumulo process is a sign of a larger problem. Typically these problems are due to a
-misconfiguration of Accumulo or over-saturation of resources. Blindly automating service restarts inside of Accumulo
-is generally undesirable, as it masks and ignores the underlying problem. Accumulo
-processes should be stable on the order of months and not require frequent restart.
-
-
-\section{Advanced System Recovery}
-
-\subsection{HDFS Failure}
-Q. I had a disastrous HDFS failure. After bringing everything back up, several tablets refuse to go online.
-
-Data written to tablets is written into memory before being written into indexed files. In case the server
-is lost before the data is saved into an indexed file, all data stored in memory is first written into a
-write-ahead log (WAL). When a tablet is re-assigned to a new tablet server, the write-ahead logs are read to
-recover any mutations that were in memory when the tablet was last hosted.
-
-If a write-ahead log cannot be read, then the tablet is not re-assigned. All it takes is for one of
-the blocks in the write-ahead log to be missing. This is unlikely unless multiple data nodes in HDFS have been
-lost.
-
-A. Get the WAL files online and healthy. Restore any data nodes that may be down.
-
-Q. How do I find out which tablets are offline?
-
-A. Use ``accumulo admin checkTablets''
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-    $ bin/accumulo admin checkTablets
-\end{verbatim}\endgroup
-
-Q. I lost three data nodes, and I'm missing blocks in a WAL. I don't care about data loss; how
-can I get those tablets online?
-
-See the discussion in section~\ref{sec:metadata}, which shows a typical metadata table listing.
-The entries with a column family of ``log'' are references to the WAL for that tablet.
-If you know which WAL is bad, you can find all the references with a grep in the shell:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-shell> grep 0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995
-3< log:127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995 []    127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995|6
-\end{verbatim}\endgroup
-
-A. You can remove the WAL references in the metadata table.
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-shell> grant -u root Table.WRITE -t accumulo.metadata
-shell> delete 3< log 127.0.0.1+9997/0cb7ce52-ac46-4bf7-ae1d-acdcfaa97995
-\end{verbatim}\endgroup
-
-Note: the colon (``:'') is omitted when specifying the ``row cf cq'' for the delete command.
-
-The master will automatically discover that the tablet no longer has a bad WAL reference and will
-assign the tablet. You will need to remove the reference from all the affected tablets to get them
-online.
-
-
-Q. The metadata (or root) table has references to a corrupt WAL.
-
-This is a much more serious state, since losing updates to the metadata table will result
-in references to old files which may not exist, or lost references to new files, resulting
-in tablets that cannot be read, or large amounts of data loss.
-
-The best hope is to restore the WAL by fixing HDFS data nodes and bringing the data back online.
-If this is not possible, the best approach is to re-create the instance and bulk import all files from
-the old instance into new tables.
-
-A complete set of instructions for doing this is outside the scope of this guide,
-but the basic approach is:
-
-\begin{itemize}
-  \item Use ``tables -l'' in the shell to discover the table name to table id mapping
-  \item Stop all accumulo processes on all nodes
-  \item Move the accumulo directory in HDFS out of the way:
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-    $ hadoop fs -mv /accumulo /corrupt
-\end{verbatim}\endgroup
-  \item Re-initialize accumulo
-  \item Recreate tables, users and permissions
-  \item Import the directories under \texttt{/corrupt/tables/<id>} into the new instance
-\end{itemize}
-
-Q. One or more HDFS files under /accumulo/tables are corrupt
-
-Accumulo maintains multiple references to the tablet files in the METADATA
-table and within the tablet server hosting the file; this makes it difficult to
-reliably just remove those references.
-
-The directory structure in HDFS for tables will follow the general structure:
-
-\small
-\begin{verbatim}
-  /accumulo
-  /accumulo/tables/
-  /accumulo/tables/!0
-  /accumulo/tables/!0/default_tablet/A000001.rf
-  /accumulo/tables/!0/t-00001/A000002.rf
-  /accumulo/tables/1
-  /accumulo/tables/1/default_tablet/A000003.rf
-  /accumulo/tables/1/t-00001/A000004.rf
-  /accumulo/tables/1/t-00001/A000005.rf
-  /accumulo/tables/2/default_tablet/A000006.rf
-  /accumulo/tables/2/t-00001/A000007.rf
-\end{verbatim}
-\normalsize
-
-If files under /accumulo/tables are corrupt, the best course of action is to
-recover those files in HDFS (see the section on HDFS). Once these recovery efforts
-have been exhausted, the next step depends on where the missing file(s) are
-located. Different actions are required when the bad files are in Accumulo data
-table files or if they are metadata table files.
-
-{\bf Data File Corruption}
-
-When an Accumulo data file is corrupt, the most reliable way to restore Accumulo
-operations is to replace the missing file with an ``empty'' file so that
-references to the file in the METADATA table and within the tablet server
-hosting the file can be resolved by Accumulo. An empty file can be created using
-the CreateEmpty utility:
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-    $ accumulo org.apache.accumulo.core.file.rfile.CreateEmpty /path/to/empty/file/empty.rf
-\end{verbatim}\endgroup
-
-The process is to delete the corrupt file and then move the empty file into its
-place (the generated empty file can be copied and used multiple times if necessary and does not need
-to be regenerated each time):
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-    $ hadoop fs -rm /accumulo/tables/corrupt/file/thename.rf; \
-    hadoop fs -mv /path/to/empty/file/empty.rf /accumulo/tables/corrupt/file/thename.rf
-\end{verbatim}\endgroup
-
-{\bf Metadata File Corruption}
-
-If the corrupt files are metadata files (under the path
-\texttt{/accumulo/tables/!0}; see \ref{sec:metadata}), then you will need to rebuild
-the metadata table by initializing a new instance of Accumulo and then importing
-all of the existing data into the new instance.
-This is the same procedure as
-recovering from a zookeeper failure (see \ref{ZooKeeper Failure}), except that
-you will have the benefit of having the existing user and table authorizations
-that are maintained in zookeeper.
-
-You can use the DumpZookeeper utility to save this information for reference
-before creating the new instance. You will not be able to use RestoreZookeeper
-because the table names and references are likely to be different between the
-original and the new instances, but it can serve as a reference.
-
-A. If the files cannot be recovered, replace corrupt data files with empty
-RFiles to allow references in the metadata table and in the tablet servers to be
-resolved. Rebuild the metadata table if the corrupt files are metadata files.
-
-\subsection{ZooKeeper Failure}
-Q. I lost my ZooKeeper quorum (hardware failure), but HDFS is still intact. How can I recover my Accumulo instance?
-
-ZooKeeper, in addition to its lock-service capabilities, also serves to bootstrap an Accumulo
-instance from some location in HDFS. It contains the pointer to the root tablet in HDFS, which
-is then used to load the Accumulo metadata tablets, which in turn load all user tables. ZooKeeper
-also stores all namespace and table configuration, the user database, the mapping of table IDs to
-table names, and more across Accumulo restarts.
-
-Presently, the only way to recover such an instance is to initialize a new instance and import all
-of the old data into the new instance. The easiest way to tackle this problem is to first recreate
-the mapping of table ID to table name and then recreate each of those tables in the new instance.
-Set any necessary configuration on the new tables and add split points to approximate the splits
-of the old tables, since the new tables start with none.
-
-The directory structure in HDFS for tables will follow the general structure:
-
-\small
-\begin{verbatim}
-  /accumulo
-  /accumulo/tables/
-  /accumulo/tables/1
-  /accumulo/tables/1/default_tablet/A000001.rf
-  /accumulo/tables/1/t-00001/A000002.rf
-  /accumulo/tables/1/t-00001/A000003.rf
-  /accumulo/tables/2/default_tablet/A000004.rf
-  /accumulo/tables/2/t-00001/A000005.rf
-\end{verbatim}
-\normalsize
-
-For each table, make a new directory that you can move (or copy if you have the HDFS space to do so)
-all of the RFiles for a given table into. For example, to process the table with an ID of ``1``, make a new directory,
-say ``/new-table-1`` and then copy all files from ``/accumulo/tables/1/*/*.rf`` into that directory. Additionally,
-make a directory, ``/new-table-1-failures``, for any failures during the import process. Then, issue the import
-command using the Accumulo shell into the new table, telling Accumulo to not re-set the timestamp:
-
-\small
-\begin{verbatim}
-user@instance new_table> importdirectory /new-table-1 /new-table-1-failures false
-\end{verbatim}
-\normalsize
-
-Any RFiles which failed to load will be placed in ``/new-table-1-failures``. RFiles that were successfully
-imported will no longer exist in ``/new-table-1``. For failures, move them back to the import directory and retry
-the ``importdirectory`` command.
-
-It is \textbf{extremely} important to note that this approach may introduce stale data back into
-the tables. For a few reasons, RFiles may exist in the table directory which are candidates for deletion but have
-not yet been deleted.
-Additionally, deleted data which was not compacted away, but which still exists in write-ahead logs, will
-be re-introduced in the new instance if the original instance was somehow recoverable. Table splits and merges
-(which also include the deleteRows API call on TableOperations) are also vulnerable to this problem. This process should
-\textbf{not} be used if these are unacceptable risks. It is possible to try to re-create a view of the ``accumulo.metadata``
-table to prune out files that are candidates for deletion, but this is a difficult task that also may not be entirely accurate.
-
-Likewise, it is also possible that data loss may occur from write-ahead log (WAL) files which existed on the old table but
-were not minor-compacted into an RFile. Again, it may be possible to reconstruct the state of these WAL files to
-replay data not yet in an RFile; however, this is a difficult task and is not implemented in any automated fashion.
-
-A. The ``importdirectory`` shell command can be used to import RFiles from the old instance into a newly created instance,
-but extreme care should go into the decision to do this as it may result in reintroduction of stale data or the
-omission of new data.
-
-\section{File Naming Conventions}
-
-Q. Why are files named like they are? Why do some start with ``C'' and others with ``F''?
-
-A. The file names give you a basic idea for the source of the file.
-
-The base of the filename is a base-36 unique number. All filenames in accumulo are coordinated
-with a counter in zookeeper, so they are always unique, which is useful for debugging.
-
-The leading letter gives you an idea of how the file was created:
-
-\begin{itemize}
-  \item F - Flush: entries in memory were written to a file (Minor Compaction)
-  \item M - Merging compaction: entries in memory were combined with the smallest file to create one new file
-  \item C - Several files, but not all files, were combined to produce this file (Major Compaction)
-  \item A - All files were compacted, delete entries were dropped
-  \item I - Bulk import, complete, sorted index files. Always in a directory starting with "b-"
-\end{itemize}
-
-This simple file naming convention allows you to see the basic structure of the files from just
-their filenames, and reason about what should be happening to them next, just
-by scanning their entries in the metadata tables.
-
-For example, if you see multiple files with ``M'' prefixes, the tablet is, or was, up against its
-maximum file limit, so it began merging memory updates with files to keep the file count reasonable. This
-slows down ingest performance, so many files like this tell you that the system is struggling to
-compact files as quickly as they are being ingested.
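-
-For example, to see which file names (and thus which prefixes) a table's tablets
-currently reference, you can scan the ``file'' column family of the metadata
-table as shown earlier (the table id ``3'' here is a placeholder):
-
-\begingroup\fontsize{8pt}{8pt}\selectfont\begin{verbatim}
-shell> table accumulo.metadata
-shell> scan -b 3; -e 3< -c file
-\end{verbatim}\endgroup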
-
http://git-wip-us.apache.org/repos/asf/accumulo/blob/900d6abb/docs/src/main/latex/accumulo_user_manual/images/data_distribution.png
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/images/data_distribution.png b/docs/src/main/latex/accumulo_user_manual/images/data_distribution.png
deleted file mode 100644
index 7f18d3f..0000000
Binary files a/docs/src/main/latex/accumulo_user_manual/images/data_distribution.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/accumulo/blob/900d6abb/docs/src/main/latex/accumulo_user_manual/images/failure_handling.png
----------------------------------------------------------------------
diff --git a/docs/src/main/latex/accumulo_user_manual/images/failure_handling.png b/docs/src/main/latex/accumulo_user_manual/images/failure_handling.png
deleted file mode 100644
index c131de6..0000000
Binary files a/docs/src/main/latex/accumulo_user_manual/images/failure_handling.png and /dev/null differ

http://git-wip-us.apache.org/repos/asf/accumulo/blob/900d6abb/pom.xml
----------------------------------------------------------------------
diff --git a/pom.xml b/pom.xml
index 78af9c5..2b92402 100644
--- a/pom.xml
+++ b/pom.xml
@@ -230,7 +230,7 @@
       <artifactId>accumulo-docs</artifactId>
       <version>${project.version}</version>
       <classifier>user-manual</classifier>
-      <type>pdf</type>
+      <type>html</type>
     </dependency>
     <dependency>
       <groupId>org.apache.accumulo</groupId>
@@ -594,6 +594,11 @@
         </configuration>
       </plugin>
       <plugin>
+        <groupId>org.asciidoctor</groupId>
+        <artifactId>asciidoctor-maven-plugin</artifactId>
+        <version>0.1.4</version>
+      </plugin>
+      <plugin>
         <groupId>org.codehaus.mojo</groupId>
         <artifactId>build-helper-maven-plugin</artifactId>
         <version>1.8</version>
@@ -621,11 +626,6 @@
         <version>1.2.1</version>
       </plugin>
       <plugin>
-        <groupId>org.codehaus.mojo</groupId>
-        <artifactId>latex-maven-plugin</artifactId>
-        <version>1.1</version>
-      </plugin>
-      <plugin>
         <groupId>org.eclipse.m2e</groupId>
         <artifactId>lifecycle-mapping</artifactId>
         <version>1.0.0</version>