http://www.sleuthkit.org/informer/sleuthkit-informer-16.html

The Sleuth Kit Informer
http://www.sleuthkit.org/informer
http://sleuthkit.sourceforge.net/informer
Brian Carrier
Issue #16 Contents
Introduction

What's New?

http://www.sleuthkit.org/sleuthkit/

Autopsy 2.02 was released on July 30, 2004, with improved support for NTFS deleted files. Version 2.03 was released on Sept 7, 2004, and included updates for Unicode searching and a few other minor changes.

http://www.sleuthkit.org/autopsy/

I added a link to the 'comeforth' script by Dan Higgens, which helps you to process unallocated space. Here is a paragraph from the readme:

Parse raw filesystem blocks, or block image data produced by "dls", found in the Sleuth Kit. This was inspired by lazarus (www.porcupine.org/forensics) but provides a bit more flexibility for processing very large data sets. Blocks of certain file types or matching certain regular expressions are first found and saved in a scan phase. After scanning, blocks that have been saved can be viewed, and based on their contents files can be reassembled from various other blocks. An auto-assemble feature is provided which can reassemble a complete file in many cases, knowing only the first block in the file (only for ext2/ext3 filesystems).

http://www.sleuthkit.org/sleuthkit/download.php

Comments from Readers

In response to the article about TestDisk in the last issue of The Informer, Daniel Sedory mentioned that he has a page that gives screen shots and an example of using TestDisk to recover a partition.
http://therdcom.com/testdisk.html

Call For Papers
http://www.sleuthkit.org/informer/cfp.html

Searchtools, Indexed Searching in Forensic Images
Paul Bakker <p.j.bakker at brainspark dot nl>

Description

The collection of tools is called 'Searchtools' and can be found on http://www.brainspark.nl. This article gives an overview of how Searchtools works.

The workings of Searchtools

Index Parameters

Because it is not viable to create an index containing all strings that are present inside an image, a number of parameters control the size of the index files and which strings they contain. The most important of these parameters are:
The minimum and maximum string length determine the lengths of the strings that will be indexed. Indexing strings shorter than 4 characters results in a large amount of rubbish, because 3 indexable characters are very likely to occur in succession in a piece of random or binary data. Indexing strings longer than 15 characters does not yield more useful information.

The characters that are indexed should depend on the needs of the investigator. If only words have to be searched, it is wise to index only alphabetic characters in order to limit the size of the index and thus the time it takes to generate it.

The folding parameter specifies whether diacritic characters should map to their non-diacritic equivalents in the index. This makes it easier to search for words that contain diacritic characters. Currently only folding of diacritic iso_8859-1 characters is supported.

Index Types

Currently Searchtools is able to create two different types of indexes:
The Raw index type contains all the strings that are located in the raw image. This means that this index type does not take into account any form of structure that might be available or present on the image.

If a string is located in a fragmented file and spans non-consecutive sectors, then it will not be found using the raw index. To find this string, the data on the image is indexed using the original file system structure. This index is called the raw fragment index. To reduce the raw fragment index size and prevent duplicate entries in the indexes, this index contains only the strings that start in one fragment and end in a non-consecutive fragment.

Simplified Index Example

In order to visualize the data that is contained in an index, a small example is presented. The example creates a simplified raw index of a file containing only the string "This looks like a sentence: look looks looked". The default parameters are used, thus only strings with a length of 4 to 15 are indexed. A simplified parsing of the file results in the following index information:

 0  this        22  ence
 5  looks       28  look
 6  ooks        33  looks
11  like        34  ooks
18  sentence    39  looked
19  entence     40  ooked
20  ntence      41  oked
21  tence

Note: All locations are zero-based.

Internally though the information is represented in a tree. So a more accurate representation would be the following simplified drawing:

                  e - n - c - e(22)
                 /     \
                /       t - e - n - c - e(19)
               /
              /- l - i - k - e(11)
             /    \
            /      o - o - k(28) - s(5,33)
           /                \
          /                  e - d(39)
         /
        /----- n - t - e - n - c - e(20)
root
        \          k - e - d(41)
         \        /
          \----- o - o - k - s(6,34)
           \              \
            \              e - d(40)
             \
              \--- s - e - n - t - e - n - c - e(18)
               \
                -- t - e - n - c - e(21)
                    \
                     h - i - s(0)
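The listing and the tree for the example sentence can be reproduced with a short Python sketch. This is not how Searchtools is implemented; it is a minimal illustration that assumes only ASCII letters are indexable, that strings are folded to lowercase, and that strings longer than the maximum are cut off at 15 characters. The names `index_entries`, `build_trie` and `search` are made up for this example.

```python
import re

MIN_LEN, MAX_LEN = 4, 15  # default string lengths mentioned in the article


def index_entries(text, min_len=MIN_LEN, max_len=MAX_LEN):
    """Yield (offset, string) for every suffix of each run of letters
    that is at least min_len long (assumption: letters only, lowercased)."""
    for run in re.finditer(r"[A-Za-z]+", text):
        word = run.group().lower()
        for i in range(len(word) - min_len + 1):
            # Assumption: overlong strings are truncated to max_len characters.
            yield run.start() + i, word[i:i + max_len]


class Node:
    """One letter in the tree; offsets are stored at the final letter."""

    def __init__(self):
        self.children = {}
        self.offsets = []


def build_trie(entries):
    root = Node()
    for offset, string in entries:
        node = root
        for ch in string:
            node = node.children.setdefault(ch, Node())
        node.offsets.append(offset)
    return root


def search(root, term, exact=False):
    """Walk the letters of term from the root; unless exact is set,
    also collect the offsets of every node beneath the final letter."""
    node = root
    for ch in term.lower():
        node = node.children.get(ch)
        if node is None:
            return []
    found = list(node.offsets)
    if not exact:
        stack = list(node.children.values())
        while stack:
            child = stack.pop()
            found.extend(child.offsets)
            stack.extend(child.children.values())
    return sorted(found)


TEXT = "This looks like a sentence: look looks looked"
root = build_trie(index_entries(TEXT))
print(search(root, "look"))              # [5, 28, 33, 39]
print(search(root, "look", exact=True))  # [28]
```

Storing every suffix of a run of letters, rather than only whole words, is what lets a search for "ooks" or "tence" succeed without scanning the image again.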
As can be seen, the internal representation uses the letters of the indexed strings as nodes in a tree. The node of the final letter of an indexed string holds the offsets of that string. To search for the string "look", only the letters of the string have to be walked in the tree from the root node to see that the string is present at location 28. If all strings starting with "look" are to be found, all nodes beneath that node have to be visited too, resulting in the locations 5, 28, 33 and 39.

Index Directories

In order to facilitate indexed searches a directory is created: the index directory. This directory contains the resulting index for one image. The index itself currently consists of three different file types:
Exactly one index configuration file is located in an index directory. This file contains the general information used for creating the index itself. It is used by the different tools of the searchtools and contains a binary form of the configuration, so it is not meant to be read by human beings.

The actual index is split into a number of raw index files and a number of raw fragment index files. The reason for not using a single large index file is simple: the current process of generating an index requires a lot of memory, and each file represents one full piece of memory dumped into a file. Only if the generating computer had an immense amount of memory would a single index file be the result.

The index files contain very compact and optimized tree representations created during the indexing process. As described above, each file contains the contents that could fit in one full piece of memory. The tree in memory is optimized for in-memory use. The tree in file format is optimized for searching with the fewest possible lookups and thus disk seeks.

Different Searchtools

Overview

The collection of searchtools consists of:
Demonstration

This section continues with a short description and demonstration of the most used tools. Not all options will be demonstrated and this is definitely not a complete manpage for these tools, but the demonstration will give a general idea of the possibilities and capabilities of the Searchtools. Some commands are timed using the standard 'time' command integrated in most shells.

The image that we are using is a dd image of a 50 MB Linux ext2 partition that is packed with data, meaning that almost all of the 50 MB is used by the files present on the partition.

# ls -l test.img
-rw-r--r-- 1 paul paul 50M Jul 27 21:10 test.img

First we will create a standard index of the image (with the most important parameters as specified above).

# time indexer -v test.img idx_std
Starting raw indexing.
Done 100.0 percent: 282 kNodes 6447 kOffsets 27M Mem
Saving.
Read 52428800 bytes. Total nodes 369063. Total offsets 6447387.
Starting raw fragment indexing.
Done 100.0 percent: 1 kNodes 0 kOffsets 0M Mem 12824/12824 Inodes
Saving.
Total nodes 1380. Total offsets 437.

real 0m35.398s
user 0m28.750s
sys 0m1.180s

The output of the indexer command shows us that, using these index parameters, a total of 6,447,387 raw index entries were indexed, along with a small total of 437 raw fragment index entries. The total time to index this small 50 MB image is around 35 seconds on this 2.4 GHz PC. A rough extrapolation gives about 12 minutes per gigabyte of image. Note though that whenever memory is filled (250 MB by default), the contents have to be written to disk in order to continue.

The resulting index directory contains the following files:

# ls -l idx_std
516 index.cnf
20437 raw_frag_idx.000
17672517 raw_idx.000
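The figures printed so far allow a quick back-of-the-envelope check. The sketch below only does arithmetic on numbers taken from the indexer output and the directory listing. The per-gigabyte figure assumes that indexing time scales linearly with image size, which ignores the extra disk writes when memory fills up, and the bytes-per-offset figure is an average over the whole file (nodes included), not a statement about the on-disk format.

```python
IMAGE_BYTES = 52_428_800      # "Read 52428800 bytes" in the indexer output
ELAPSED_SECONDS = 35.398      # "real" time reported by the time command
RAW_OFFSETS = 6_447_387       # "Total offsets" of the raw index
RAW_INDEX_BYTES = 17_672_517  # size of raw_idx.000 in the listing

GIB = 1024 ** 3


def minutes_per_gib(image_bytes, seconds):
    """Linear extrapolation of the indexing time to one gigabyte."""
    return seconds * (GIB / image_bytes) / 60


def bytes_per_offset(index_bytes, offsets):
    """Average index file size per stored offset."""
    return index_bytes / offsets


def index_to_image_ratio(index_bytes, image_bytes):
    """Size of the raw index relative to the image it describes."""
    return index_bytes / image_bytes


print(f"{minutes_per_gib(IMAGE_BYTES, ELAPSED_SECONDS):.1f} minutes per GiB")
print(f"{bytes_per_offset(RAW_INDEX_BYTES, RAW_OFFSETS):.2f} bytes per offset")
print(f"{index_to_image_ratio(RAW_INDEX_BYTES, IMAGE_BYTES):.0%} of the image size")
```

With these inputs the extrapolation lands at roughly 12 minutes per gigabyte, and the raw index takes up about a third of the size of the image.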
As can be seen, the raw index file is about 17 MB and this file contains all the 6,447,387 raw index entries that were found during the previous step.

Now that the index is created we will search for 'notifications', which occurs once in the image.

# time searcher test.img idx_std notifications
Type: Raw
50712898 notifications
real 0m0.003s
user 0m0.010s
sys 0m0.000s
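The number printed next to the keyword is a byte offset into the image, so a hit can be cross-checked without the index by reading the image at that position. The following Python sketch does just that; it is not part of Searchtools, the function names are invented for illustration, and it assumes the keyword is plain ASCII.

```python
def read_at(image_path, offset, length):
    """Return length bytes of the image starting at byte offset."""
    with open(image_path, "rb") as image:
        image.seek(offset)
        return image.read(length)


def hit_matches(image_path, offset, keyword, ignore_case=True):
    """Check that the bytes at offset spell out keyword (ASCII only)."""
    expected = keyword.encode("ascii")
    found = read_at(image_path, offset, len(expected))
    if ignore_case:
        return found.lower() == expected.lower()
    return found == expected
```

For the hit above, `hit_matches("test.img", 50712898, "notifications")` would be expected to return True.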
The output of the searcher command shows us that the string 'notifications' is located at byte offset 50,712,898 of the image and that the search took only a fraction of a second.

This time the image is searched for the string 'data', which occurs 23,270 times in the image. (The -i flag is used for case-insensitive searching, -p for a more parsable output format.)

# time searcher test.img idx_std data -i -p
raw 253452 database
raw 254271 data
<snipped lots of results>
raw 52357097 DataBase
raw_frag 1913 12274 DATA

real 0m0.132s
user 0m0.070s
sys 0m0.060s

The search and retrieval of these results took much less than one second and returned 28,189 results. Wait a minute! Didn't I just point out that the string 'data' occurs only 23,270 times in this image? By default the searcher returns all strings starting with the search string. By specifying the '-w' flag, only the keywords that exactly match the search string are returned.

Sometimes just searching will not do. In order to find special words you want to be able to look at the number of occurrences of a specific keyword, or at all keywords that are present within the image. The print_keywords command prints all the keywords in an index directory or in a specific index file. In order to facilitate scripting it is possible to let print_keywords skip the count that is appended to the end by default.

# time print_keywords -d idx_std
0000 2307
00000 1982
<snipped lots of results>
priorities 37
prioritized 76
prioritizing 2
priority 1824
prioritydata 42
prioritynames 2
<snipped lots of results>
zzzvz 1
zzzz 2
zzzzz 2

real 0m2.761s
user 0m1.650s
sys 0m0.250s

This concludes the small demonstration of the searchtools.

Conclusion

This article only lightly discusses the internal workings of the searchtools, but I hope it is able to shed a little light on the subject for people interested in it.
If any of you require extra information, don't hesitate to ask, as I probably want to create the documentation anyway and would then have an incentive to actually do it. Almost all functionality described in this article is also available in some way from the Autopsy interface. This article used only the command-line versions of the tools in order to show the actions performed under the hood by the Autopsy interface.
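Because the '-p' output of the searcher is line-oriented, it is easy to post-process in a script. The Python sketch below parses lines of the form shown in the demonstration and counts how many hits match the search term exactly versus only as a prefix, which is the distinction the '-w' flag makes. The function names are hypothetical, and since the article does not explain the two numbers on a raw_frag line, they are kept as plain integers without interpretation.

```python
def parse_hits(output):
    """Parse lines such as 'raw 253452 database' into
    (index_type, numbers, keyword) tuples, skipping everything else."""
    hits = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 3 or not fields[0].startswith("raw"):
            continue  # blank lines, timing output, snipped markers
        *numbers, keyword = fields[1:]
        if not all(number.isdigit() for number in numbers):
            continue
        hits.append((fields[0], [int(number) for number in numbers], keyword))
    return hits


def count_matches(hits, term):
    """Return (exact, total): hits equal to term ignoring case, and all hits."""
    exact = sum(1 for _, _, keyword in hits if keyword.lower() == term.lower())
    return exact, len(hits)


SAMPLE = """\
raw 253452 database
raw 254271 data
raw 52357097 DataBase
raw_frag 1913 12274 DATA

real 0m0.132s
"""

hits = parse_hits(SAMPLE)
print(count_matches(hits, "data"))  # (2, 4)
```

Run over the full output of the 'data' search, the same counting would be expected to separate the exact occurrences from the 28,189 prefix matches.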
sstrings and Unicode Searching
Brian Carrier

Overview

Extracting ASCII-based Unicode Strings

Integration With Autopsy

Summary

NTFS Orphan Files
Brian Carrier

# istat -f ntfs img.dd 180
MFT Entry Header Values:
Entry: 180 Sequence: 4
$LogFile Sequence Number: 1608100
Not Allocated File
...
$FILE_NAME Attribute Values:
Flags: Archive
Name: FILE1.DAT
Parent MFT Entry: 31 Sequence: 1
...

# ffind -f ntfs img.dd 31
/DIR1

# ifind -f ntfs -p 31 img.dd
-/r * 180: FILE1.DAT

Copyright © 2004 by Brian Carrier. All Rights Reserved

This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike License.

