Microsoft Windows uses a set of Registry keys
known as “shellbags” to maintain the size, view, icon, and position of a
folder when using Explorer. These keys are useful to a forensic
investigator. Shellbags persist information for directories even after
the directory is removed, which means that they can be used to enumerate
past mounted volumes, deleted files, and user actions.
Yuandong Zhu, Pavel Gladyshev, and Joshua James provided a nice overview of the investigative value of shellbags in “Using shellbag information to reconstruct user activities”
[pdf]; however, they do not describe how to programmatically access the
data. Allan S Hay went into greater detail in his December, 2004
document “MiTeC Registry Analyser”
[pdf], although he also leaves out a thorough analysis of the format.
TZWorks provides an effective closed-source shellbag parser, sbag,
but does not explain its algorithm. Yogesh Khatri first described the
basic structure of Windows Shell Items in his blog post for 42 LLC
entitled Shell BAG Format Analysis. Joachim Metz went on to describe the binary format of the Windows Shell Item structures in great detail in Windows Shell Item format specification
[pdf]. This page documents an approach to parsing shellbags in detail
and introduces an open-source, cross-platform shellbag parser.
Shellbag locations
Shellbags may be found in a few locations, depending on operating system version
and user profile. On a Windows XP system, shellbags may be found under:
The UsrClass.dat hive file persists the registry key HKEY_USERS\{USERID}\.
Shellbag Parsing
Let us begin with the Shell\ key. The Shell\ key does not have any values. Under the Shell\ key are two keys: Shell\Bags\ and Shell\BagMRU\.
FOLDERDATA
The subkeys under Shell\Bags\ are named with increasing integers starting from one, such as Shell\Bags\1\ or Shell\Bags\2\. Let us call these subkeys FOLDERDATA,
since they each represent one item viewed in Explorer, and this is
usually a folder. FOLDERDATA subkeys do not have any values, but often
have subkeys. The most common subkey is Shell\Bags\{Int}\Shell\, but there are a few other possibilities (ComDlg, Desktop,
etc.). The subkeys under a FOLDERDATA describe the settings, position,
and icon when viewing the folder in Explorer. In particular, a Registry
value whose name begins with ItemPos specifies the location of the icons for a given desktop resolution. For example, on my Windows 7 system, the Registry key HKEY_USERS\{USERID}\Local Settings\Software\Microsoft\Windows\Shell\Bags\6\Shell\{5C4F28B5-F869-4E84-8E60-F11DB97C5CC7} has 12 values that record various configurations. This set includes the value ItemPos1427x820(1) that has type REG_BINARY with length 0x120:
With no tools beyond Regedit (or Regview.py), Windows 8.3 filenames (e.g. MOZILL~1.LNK)
and Unicode filenames (e.g. Mozilla Firefox.lnk) stand out. Fortunately,
by applying the formats found in Joachim’s paper, more details can be
extracted. Throughout this document, I refer to this Registry value type
as an ITEMPOS value.
ITEMPOS values
The ITEMPOS value’s structure is a list of Windows File Entry Shell Items (SHITEM_FILEENTRY)
terminated by an entry whose size field is zero. The list begins at
offset 0x10. Items are preceded by 0x8 bytes whose meaning is unknown.
The minimum size of a SHITEM_FILEENTRY structure is 0x15 bytes, so entries whose size field is less than 0x15 should be skipped. The valid SHITEM_FILEENTRY items have the following structure (in pseudo-C / 010 Editor template format):
FILEREFERENCE is a 64-bit MFT file reference structure (48 bits for the MFT record number, 16 bits for the MFT sequence number). FILEATTRS
is a 16-bit set of flags that specifies attributes such as whether the item
is read-only or a system file. Applying this template to the ITEMPOS
Registry value, we see there are four list items: one invalid entry and
three SHITEM_FILEENTRY items.
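As a rough illustration of the list layout just described, the following Python sketch walks an ITEMPOS value and yields the offset and raw bytes of each plausible entry. It assumes that the 0x8 unknown bytes come immediately before every item (including the first one after offset 0x10) and that the size field is a little-endian 16-bit word counting the whole item; the function and constant names are made up for this example and are not taken from any existing tool.

```python
import struct

HEADER_SIZE = 0x10         # the item list begins at offset 0x10
PREFIX_SIZE = 0x08         # unknown bytes assumed to precede every item
MIN_FILEENTRY_SIZE = 0x15  # anything smaller cannot be a SHITEM_FILEENTRY


def iter_itempos_entries(blob):
    """Yield (offset, raw_item) for each plausible SHITEM_FILEENTRY."""
    offset = HEADER_SIZE
    while offset + PREFIX_SIZE + 2 <= len(blob):
        offset += PREFIX_SIZE                      # skip the unknown bytes
        (size,) = struct.unpack_from("<H", blob, offset)
        if size == 0:                              # empty item ends the list
            break
        if size >= MIN_FILEENTRY_SIZE:             # undersized items are skipped
            yield offset, blob[offset:offset + size]
        offset += size


# Synthetic value: one undersized (invalid) item, one 0x46-byte item, terminator.
invalid = struct.pack("<H", 0x14) + bytes(0x12)
valid = struct.pack("<H", 0x46) + bytes(0x44)
blob = (bytes(HEADER_SIZE) + bytes(PREFIX_SIZE) + invalid
        + bytes(PREFIX_SIZE) + valid + bytes(PREFIX_SIZE) + b"\x00\x00")
offsets = [off for off, _ in iter_itempos_entries(blob)]
```

With this synthetic input the only entry reported sits at offset 0x34, which is where the first valid entry of the example value is found.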
Taking the first valid entry from offset 0x34, let’s parse out
the fields from the binary. The following block visually maps out the
relevant bytes, while the table translates each field into a human
readable value.
00 00 00 00 --> SHITEM_FILEENTRY size
00 00 00 00 --> filesize
00 00 00 00 --> timestamp
00 00 00 00 --> filename
0000 46 00 3A 00 02 02 00 00 10 3D 0C 8E 20 00 43 79 F.:......=.Ž .Cy
0010 67 77 69 6E 2E 6C 6E 6B 00 00 2C 00 03 00 04 00 gwin.lnk..,.....
0020 EF BE 10 3D 0C 8E 10 3D 0C 8E 14 00 00 00 43 00 ï¾.=.Ž.=.Ž....C.
0030 79 00 67 00 77 00 69 00 6E 00 2E 00 6C 00 6E 00 y.g.w.i.n...l.n.
0040 6B 00 00 00 1A 00                               k.....
Offset  Field                  Value
------  ---------------------  ---------------------------
0x00    SHITEM_FILEENTRY size  0x46
0x04    Filesize               0x202
0x08    Modified Date          August 16, 2010 at 17:48:24
0x0E    8.3 Filename           Cygwin.lnk
0x22    Created Date           August 16, 2010 at 17:48:24
0x26    Accessed Date          August 16, 2010 at 17:48:24
0x2E    Unicode Filename       Cygwin.lnk
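To make the mapping concrete, here is a small Python sketch that decodes the same 0x46 bytes. It assumes the timestamps are packed DOS date/time words (a little-endian date word followed by a time word), and it hard-codes the offsets 0x22, 0x26, and 0x2E from this particular sample; in other entries those offsets shift with the length of the 8.3 name. The helper names are invented for this example.

```python
import struct
from datetime import datetime

# The 0x46 bytes of the sample entry shown in the hex dump.
RAW = bytes.fromhex(
    "46003a0002020000103d0c8e20004379"
    "6777696e2e6c6e6b00002c0003000400"
    "efbe103d0c8e103d0c8e140000004300"
    "79006700770069006e002e006c006e00"
    "6b0000001a00"
)


def dos_datetime(buf, offset):
    """Decode a DOS date word followed by a DOS time word."""
    date, time = struct.unpack_from("<HH", buf, offset)
    return datetime(1980 + (date >> 9), (date >> 5) & 0x0F, date & 0x1F,
                    time >> 11, (time >> 5) & 0x3F, (time & 0x1F) * 2)


def parse_sample_fileentry(buf):
    """Parse the sample entry; offsets past the 8.3 name fit this sample only."""
    (size,) = struct.unpack_from("<H", buf, 0x00)
    (filesize,) = struct.unpack_from("<I", buf, 0x04)
    short_end = buf.index(b"\x00", 0x0E)           # 8.3 name is NUL-terminated
    long_end = 0x2E
    while long_end + 2 <= len(buf) and buf[long_end:long_end + 2] != b"\x00\x00":
        long_end += 2                              # UTF-16LE, two bytes per unit
    return {
        "size": size,
        "filesize": filesize,
        "modified": dos_datetime(buf, 0x08),
        "short_name": buf[0x0E:short_end].decode("ascii"),
        "created": dos_datetime(buf, 0x22),
        "accessed": dos_datetime(buf, 0x26),
        "long_name": buf[0x2E:long_end].decode("utf-16-le"),
    }


entry = parse_sample_fileentry(RAW)
print(entry["short_name"], entry["filesize"], entry["modified"])
```

DOS timestamps have a two-second resolution and carry no time zone, so the decoded values are naive datetimes.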
At this point, it is easy to write a parser that explores the FOLDERDATA
keys under the Shell registry key. For each FOLDERDATA, the parser might
enumerate each ITEMPOS value and consider the binary blob. By applying
the binary template above, the tool could identify filenames, MACB
timestamps, and other metadata independent of the filesystem MFT.
Unfortunately, we’re still missing a key piece of information: the full
file path.
BagMRU tree
To recover file paths from Shellbags, we’ll need to consider the Registry keys under BagMRU. The subkeys under Shell\BagMRU form a recursive, tree-like structure that mirrors the file system on disk. Shell\BagMRU
is the root of the tree. Each subkey is a node representing a folder
and, like a folder, may contain child nodes. Yet, unlike (most)
folders, the nodes are named with increasing integers starting from zero. For
example, the branch Shell\BagMRU\0 might have the children 0, 1, and 2.
All nodes in this tree have a value named MRUListEx, and many have a value named NodeSlot. NodeSlot is what interests us, as it forms the link between the filesystem tree structure and the FOLDERDATA keys. A NodeSlot value has type REG_DWORD and should be interpreted as a pointer to the FOLDERDATA key with the same name. For example, on my workstation, the key Shell\BagMRU\1\1\3\0 has a NodeSlot value of 144.
This means that the FOLDERDATA Shell\Bags\144\ corresponds to a folder with a path of four components. What are they? The components are described by the values at Shell\BagMRU\1, Shell\BagMRU\1\1, Shell\BagMRU\1\1\3, and Shell\BagMRU\1\1\3\0.
SHITEMLIST
In addition to the values MRUListEx and NodeSlot, nodes of the Shell\BagMRU
tree have one value for each subkey. The values have the same name as
the subkey; since the subkeys are named as increasing integers, so are
the values. Each value records metadata about the filesystem path
component associated with the subkey. The values have type REG_BINARY and have an internal binary structure known as an SHITEMLIST. An SHITEMLIST is formed by contiguous items terminated by an empty item. Practically, though, the SHITEMLIST of a BagMRU node will have two entries: a relevant entry and the empty terminator item. The first word of each SHITEM gives the item’s size.
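A minimal Python sketch of splitting such a value into its items might look like the following. It assumes the size word is little-endian and counts the whole item including the size field itself; the function name is illustrative only.

```python
import struct


def split_shitemlist(blob):
    """Return the raw bytes of each SHITEM up to the empty terminator."""
    items = []
    offset = 0
    while offset + 2 <= len(blob):
        (size,) = struct.unpack_from("<H", blob, offset)
        if size == 0:                       # the empty item ends the list
            break
        if size < 2 or offset + size > len(blob):
            break                           # malformed size: stop rather than guess
        items.append(blob[offset:offset + size])
        offset += size
    return items


# A BagMRU-style value: one 6-byte item followed by the empty terminator.
sample = struct.pack("<H", 6) + b"\xAA\xBB\xCC\xDD" + b"\x00\x00"
items = split_shitemlist(sample)
```

Stopping on a malformed size is a design choice for this sketch; a forensic tool may prefer to log the offending offset instead.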
Joachim’s paper on Windows shell items is the best resource for understanding the variations among SHITEM entries. From a high level, there are at least ten types of items that range from SHITEM_FILEENTRY and SHITEM_FOLDERENTRY to SHITEM_CONTROLPANELENTRY.
For each of these types, we can extract at least a path component such
as “My Documents” or “\\myserver”. Fortunately, most items have type SHITEM_FOLDERENTRY,
which provides additional metadata including MAC timestamps. A small
number of items do not conform to the known structure, although these do
not usually contain any human readable strings or hints.
Putting it all together
With the SHITEMLIST
structure in hand, we now have enough information to comprehensively
parse Windows shellbags. To do this, first recurse down the Shell\BagMRU
keys while building complete directory paths. At each node, record any available
metadata and look up the associated FOLDERDATA. Recall that the
FOLDERDATA may indicate some of the items contained by the directory, so
record this metadata, too. Finally, format and enjoy!
The following code block lists the algorithm in a Pythonish language for the programmers in the room.
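As a rough, self-contained sketch of that traversal (not the shellbags.py implementation), the Python below models Registry keys as nested dictionaries so it can run without a hive. The dictionary layout, the `decode_component` callback, and the sample data are all assumptions made for illustration; a real parser would read keys from the hive and decode each SHITEMLIST value instead of plain ASCII bytes.

```python
def walk_bagmru(node, bags, decode_component, prefix=""):
    """Yield (path, node_slot, folderdata) for every node below a BagMRU key."""
    values = node.get("values", {})
    for name, child in node.get("subkeys", {}).items():
        # The value named like the subkey describes that path component.
        component = decode_component(values[name])
        path = prefix + "\\" + component if prefix else component
        # NodeSlot, when present, names the matching key under Shell\Bags.
        slot = child.get("values", {}).get("NodeSlot")
        yield path, slot, bags.get(slot)
        yield from walk_bagmru(child, bags, decode_component, path)


# Hypothetical stand-in for Shell\BagMRU: values hold ASCII names, not SHITEMLISTs.
BAGMRU = {
    "values": {"MRUListEx": b"", "0": b"My Computer"},
    "subkeys": {
        "0": {
            "values": {"MRUListEx": b"", "NodeSlot": 1, "0": b"C:"},
            "subkeys": {
                "0": {"values": {"MRUListEx": b"", "NodeSlot": 2}, "subkeys": {}},
            },
        },
    },
}
# Hypothetical stand-in for Shell\Bags: slot number -> items seen in ITEMPOS values.
BAGS = {1: ["C:"], 2: ["Cygwin.lnk"]}

rows = list(walk_bagmru(BAGMRU, BAGS, lambda raw: raw.decode("ascii")))
for path, slot, folderdata in rows:
    print(path, slot, folderdata)
```

Each yielded row pairs a reconstructed path with its NodeSlot and whatever FOLDERDATA was found for that slot, which is the information a Bodyfile line would later be built from.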
Shellbags.py
Using these concepts, I’ve implemented a cross-platform shellbag parser for
Windows XP and greater in the Python programming language. The code is
freely available here,
so all algorithms and structures are accessible to interested parties.
I’ve licensed the code under the Apache 2.0 license, so please feel
encouraged to take and improve the routines as you see fit. As a
benchmark, shellbags.py tends to identify at least the items returned by
the sbag utility, and in some cases returns more. Shellbags.py
accepts the path to a raw Registry hive acquired forensically as a
command line argument. To ensure interoperability, output is formatted
according to the Bodyfile specification by default. The following block
lists a demonstration of me running shellbags.py against a Windows XP NTUSER.dat Registry hive.
To improve
readability, I ran the output through the mactime utility to generate a
timeline of activity. The following block lists a portion of this
sample.
Help
For reference, the
following code block lists the command line parameters accepted by
shellbags.py. Now get going and try it out!