Processing File Changes with CBFS Filter

One of the frequent tasks for filesystem filter drivers is to detect a change in file data and to process new data.

Two approaches can be followed to deal with this task: (1) reacting to each change or (2) postponing the processing until the file is closed. Both approaches have pros and cons, which we will review next.

What Are File Changes?

With CBFilter, applications can track many types of changes to file data and to the metadata of files and directories, including changes to data in alternate data streams. In this article, we focus on changes in the file data, specifically in the main data stream of a file. With other types of changes, the approaches are the same, but the events used to track the change are different.

File data can be altered by one of the following operations:

* writing to a file;
* changing file size with a corresponding call, without writing to a file;
* opening of a file in the mode that immediately changes file size; this includes truncation of a file and superseding of a file; and
* renaming of a file when replacing an existing file (new file.txt replaces the existing file.txt, effectively presenting new data under the same filename to a user).

Writing to File

In most cases, writing data to a file is done in two steps: first, the application writes data to the system cache, and second, the cached data are flushed to the medium (usually, a local drive because mounted network drives do not use caching). More rarely, an application or a system component can perform direct writing to a file.

File writes can be tracked by using _BeforeWriteFile_, _AfterWriteFile_, or _NotifyWriteFile_ events. Each event includes the Direction parameter that specifies whether the data go from the application or system component to the cache or from the cache to the medium.

Why Is the Direction Important?

Depending on your task, you may need to react to any change instantly, even if this means slower data writes because of the synchronous processing of file data on each write operation. Consider, for example, security checks or data leak prevention. If process A writes the data to the cache and process B is allowed to read this data, your security solution may need to verify that process B does not access the "forbidden" data right when they are in the cache. In this case, you need to track all writes to the cache and handle the BeforeWriteFile event with the direction set to "Cached" (data are written to the cache or are read from the cache).

Note: To receive cached reads and writes, set the ProcessCachedIORequests property to True.

The disadvantage of reacting to cached writes is that an application may do several writes or even many small writes, and your application would need to react to each of them. Some applications have a bad habit of writing to files without internal buffering, meaning that they write very small chunks of data, relying on the system cache to do the buffering for them.

Writes of the cached data to a medium usually occur asynchronously, several seconds after the application is done writing a block of data. Also, such writes include large blocks of data, usually not less than the size of the system memory page (4 KB), unless the last block of the file is being written.

Reacting only to noncached writes, or even processing changed file data only when the file is being closed, makes such processing more efficient, reduces CPU usage, and improves overall system performance.
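If your application reacts to many small writes but wants to process the changed data later, one option is to record only the changed byte ranges and merge them as they arrive. The following is a minimal sketch of such merging, not part of the CBFilter API; the function name `add_range` and the idea of feeding it the offset and length of each write are assumptions made for illustration.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open byte range: (start, end)


def add_range(ranges: List[Range], offset: int, length: int) -> List[Range]:
    """Merge the write [offset, offset + length) into a sorted list of
    disjoint ranges and return the new list. Adjacent ranges are joined."""
    if length <= 0:
        return list(ranges)
    start, end = offset, offset + length
    merged: List[Range] = []
    for s, e in ranges:
        if e < start or s > end:
            # No overlap and not adjacent: keep the existing range as is.
            merged.append((s, e))
        else:
            # Overlapping or touching: grow the new range to cover both.
            start, end = min(start, s), max(end, e)
    merged.append((start, end))
    merged.sort()
    return merged


dirty: List[Range] = []
for off, size in [(0, 10), (10, 5), (100, 4), (12, 90)]:
    dirty = add_range(dirty, off, size)
print(dirty)  # [(0, 104)]
```

Four small writes collapse into a single range here, so the postponed processing would read the changed region once instead of reacting four times.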

Changing File Size

A process can change file size using Win32 or Native API functions. Both methods come down to sending a SetFileInformation request to the filesystem. This request supports many different subtypes (please refer to the description of information classes in the NtSetInformationFile topic).

For a "regular" file size, CBFilter offers a dedicated set of events (_BeforeSetFileSize_, _AfterSetFileSize_, _NotifySetFileSize_). Two more size-related properties of files exist: (1) the allocation size and (2) the "valid data size". The allocation size has no effect on file contents, and changing it should not be considered a change in the file data. The valid data size, however, tells the filesystem which part of a file contains valid data. There are no dedicated events for the valid data size in CBFilter. If you need to track its change, your application should handle one of the _BeforeSetFileInfo_, _AfterSetFileInfo_, or _NotifySetFileInfo_ events and check the information class. The class you need is _FileValidDataLengthInformation_; it carries the new valid data length.
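The check of the information class can be sketched as follows. The numeric values come from the Windows FILE_INFORMATION_CLASS enumeration, and each of the three structures holds a single 64-bit integer. How CBFilter passes the information class and the buffer to your event handler is not shown here; the function `classify_size_request` and its plain arguments are hypothetical and used for illustration only.

```python
import struct
from typing import Optional, Tuple

# Values from the Windows FILE_INFORMATION_CLASS enumeration.
FILE_ALLOCATION_INFORMATION = 19
FILE_END_OF_FILE_INFORMATION = 20
FILE_VALID_DATA_LENGTH_INFORMATION = 39

# Information class -> (description, whether it changes file data).
_SIZE_CLASSES = {
    FILE_ALLOCATION_INFORMATION: ("allocation size", False),
    FILE_END_OF_FILE_INFORMATION: ("end of file", True),
    FILE_VALID_DATA_LENGTH_INFORMATION: ("valid data length", True),
}


def classify_size_request(
    info_class: int, buffer: bytes
) -> Optional[Tuple[str, int, bool]]:
    """Return (kind, new_value, affects_data) for a size-related request,
    or None if the class is not size-related or the buffer is too short."""
    if info_class not in _SIZE_CLASSES or len(buffer) < 8:
        return None
    # Each of these structures is one little-endian signed 64-bit value.
    (value,) = struct.unpack_from("<q", buffer, 0)
    kind, affects_data = _SIZE_CLASSES[info_class]
    return kind, value, affects_data


request = struct.pack("<q", 4096)
print(classify_size_request(FILE_VALID_DATA_LENGTH_INFORMATION, request))
# ('valid data length', 4096, True)
```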

File Opening

As neither the caller nor the operating system knows in advance whether a file exists, there is no separation in the OS between file creation and opening requests. Instead, a Create File request made by a process carries a parameter that tells the filesystem what it should do if a file already exists or does not yet exist. This parameter is named CreateDisposition or CreationDisposition depending on the API function. Some of its values tell the filesystem to truncate the file if it exists or to replace an existing file (the difference is that replacing removes all old information of a file, including its metadata). When this happens, the size of the file becomes zero (0) after opening, and the file contains no data until something is written into it.
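The disposition values that discard existing data can be told apart with a simple lookup. The constants below are the documented values of the CreateDisposition parameter of NtCreateFile and of the dwCreationDisposition parameter of the Win32 CreateFile function. The two helper functions are hypothetical names for illustration, and which event parameter delivers the disposition to your handler is not covered by this sketch.

```python
# Native API: CreateDisposition values of NtCreateFile.
FILE_SUPERSEDE = 0     # replace the file if it exists, create otherwise
FILE_OPEN = 1          # open only if the file exists
FILE_CREATE = 2        # create only if the file does not exist
FILE_OPEN_IF = 3       # open if the file exists, create otherwise
FILE_OVERWRITE = 4     # overwrite only if the file exists
FILE_OVERWRITE_IF = 5  # overwrite if the file exists, create otherwise

# Win32: dwCreationDisposition values of CreateFile.
CREATE_NEW = 1
CREATE_ALWAYS = 2
OPEN_EXISTING = 3
OPEN_ALWAYS = 4
TRUNCATE_EXISTING = 5


def nt_disposition_discards_data(disposition: int) -> bool:
    """True if an existing file would lose its data with this value."""
    return disposition in (FILE_SUPERSEDE, FILE_OVERWRITE, FILE_OVERWRITE_IF)


def win32_disposition_discards_data(disposition: int) -> bool:
    """True if an existing file would lose its data with this value."""
    return disposition in (CREATE_ALWAYS, TRUNCATE_EXISTING)


print(nt_disposition_discards_data(FILE_OVERWRITE_IF))   # True
print(win32_disposition_discards_data(OPEN_EXISTING))    # False
```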

CBFilter has two sets of events: _BeforeCreateFile_, _AfterCreateFile_, _NotifyCreateFile_, and _BeforeOpenFile_, _AfterOpenFile_, _NotifyOpenFile_. In most scenarios, both *CreateFile and the corresponding *OpenFile event should be handled in the same way. CBFilter fires *CreateFile events when CreateDisposition indicates that the file would lose its data if opened; in other cases, *OpenFile events are fired. Your application may make use of this difference.

File Renaming

If a file should be renamed or moved, but another file exists at the target location, the application can act in two ways: (1) it can use the ReplaceIfExists flag in the rename/move request to tell the filesystem to replace the existing file, or (2) it can move the target file away and put the new file in its place (after which the old target file would be deleted or archived).

CBFilter offers _BeforeRenameOrMoveFile_, _AfterRenameOrMoveFile_, and _NotifyRenameOrMoveFile_ events to track file renaming. The ReplaceIfExists flag matters because, when your application handles BeforeRenameOrMoveFile, it may need to know whether an existing target file would be replaced. When handling AfterRenameOrMoveFile or NotifyRenameOrMoveFile, your application already knows the outcome of the operation, but the data are already at the new location, which may be too late for some scenarios. If the purpose of your application is to prevent some data from getting into a certain location, you may need to deal with BeforeRenameOrMoveFile.

Immediate and Postponed Handling of Changes

At several points in time, your application may need to take an action:

* before an operation is performed;
* right after an operation is performed, before the corresponding request is returned to the caller;
* a short time after an operation is performed, asynchronously; and
* when a file is closed by the process that modified it or when a modified file is closed completely (all handles and kernel file objects are closed).

If the purpose of tracking is to prevent an operation, the right place to handle it is an event handler for the appropriate *Before event. Doing this makes it possible to deny the request or change its parameters if needed.

By handling an *After event, an application can be informed about the outcome of the operation as well as post-process it before execution returns to the process that initiated this operation. One of the scenarios, illustrated in the Directory Hide sample, is hiding a directory from a directory listing. There, the enumeration request first reaches the filesystem, and then the event handler post-processes the result.

When it comes to changing a file, your application may need to know that the change did take place and did not fail for whatever reason. For this, an *After event handler is appropriate. A *Notify event handler may be used too, depending on the specific needs.

*Notify events are called asynchronously. Therefore, the corresponding event handler may be executed before, after, or at the time when the request processing is completed and execution returns to the initiator of the request. In general, *After and *Notify events are very similar, but the NotifyWriteFile event does not carry the block of data that was written, although it does include other parameters of the operation.

Finally, a change can be handled when the file is closed, in one of *CloseFile events. In this case, the decision should be made about whether the file should be processed when each handle and file object is closed or only when all of the currently existing handles and file objects are closed. In the second case, an application should track all open and close operations to have a reference counter and should process the file when the reference count drops to zero.

Handling a change in the close event has the benefit of lowering the pressure on the system. At the same time, some applications keep files open for a long time, and they may change file contents many times before the file is closed, which may be undesirable for your needs.

Another aspect to consider is how to deal with the delayed closing of a file. After the last file handle is closed, the system does not send the final close request to the filesystem for some time. If some process opens the file again during this time, it may occur that the open operation comes first, and the delayed close request comes later. In this case, the reference counter would be increased to two, and then decreased to one, but it would not reach zero until the second process closes the file and the file gets closed. This means that a situation is possible in which a file’s use count does not reach zero for a long time, thus letting processes get to the modified data before your application processes it.
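The effect of a delayed close on the reference counter can be shown with a small simulation. This is only a model of the ordering problem described above; the function `replay` and the event names "open" and "close" are made up for illustration and do not correspond to CBFilter calls.

```python
from typing import List, Tuple


def replay(events: List[str]) -> Tuple[int, List[int]]:
    """Replay "open"/"close" events and return the final counter value
    together with the indexes at which the counter dropped to zero."""
    count = 0
    zero_points: List[int] = []
    for index, event in enumerate(events):
        if event == "open":
            count += 1
        elif event == "close":
            count -= 1
            if count == 0:
                zero_points.append(index)
        else:
            raise ValueError(f"unknown event: {event}")
    return count, zero_points


# Process A closes before process B opens: the counter reaches zero twice.
in_order = ["open", "close", "open", "close"]
# The close for A is delayed and arrives after B has opened the file.
delayed = ["open", "open", "close", "close"]

print(replay(in_order))  # (0, [1, 3])
print(replay(delayed))   # (0, [3])
```

In the delayed sequence, the counter reaches zero only once, after process B is done, so the data modified by process A are not processed before process B can read them.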

How to Implement Postponed Handling of Changes

If a change is registered while a file is being used, but processing should take place when the file is closed, how should the information about changes be transferred?

CBFilter, in its *Before and *After events, has a parameter called _FileContext_. This is a holder for the application-provided data. An application can use it to store a reference or an identifier of some object in one event and access it in another event. The holder exists while the file is open (has one or more handles to it or associated kernel file objects). When the file is fully closed and has no open handles or kernel file objects, the holder goes away. The next time the file is opened, a new holder is allocated, which is empty.

Therefore, your application can allocate an object in any event handler when a change is registered, can store the details of the change in this object, and can place the reference to this object in the FileContext parameter. If another change is registered, event handlers can reach the object through the value in FileContext or can allocate a new object if FileContext is empty. Then, they can update the object, replacing or adjusting the stored information. Finally, in the BeforeCloseFile or AfterCloseFile event handler, your application uses the information referenced by FileContext to obtain details of the file change and deal with that change.

If your application processes file data only when the file is fully closed, and you need a reference counter, the object referenced by FileContext is the right place to store this reference counter. For this scheme to work, you need to handle AfterCreateFile and AfterOpenFile events and allocate an object for FileContext there (unless it is already allocated) and increase the reference counter. In the BeforeCloseFile or AfterCloseFile event handler, your application decreases the reference counter; if it reaches zero, the application should process file data.
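The scheme described above can be sketched as follows. This is a model under stated assumptions, not CBFilter code: the `ContextSlot` class stands in for the FileContext holder, and the handler names `on_after_open`, `on_after_write`, and `on_close` as well as the `ChangeRecord` class are hypothetical. In a real application, the slot would hold a reference or an identifier of your object, as the article explains.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ChangeRecord:
    """Per-file object that the application stores via FileContext."""
    open_count: int = 0
    modified: bool = False
    writes: List[Tuple[int, int]] = field(default_factory=list)


@dataclass
class ContextSlot:
    """Stand-in for the FileContext holder; it starts out empty."""
    value: Optional[ChangeRecord] = None


def on_after_open(slot: ContextSlot) -> None:
    # Allocate the record on first open, then count this open.
    if slot.value is None:
        slot.value = ChangeRecord()
    slot.value.open_count += 1


def on_after_write(slot: ContextSlot, offset: int, length: int) -> None:
    # Register the change; allocate the record if the slot is empty.
    if slot.value is None:
        slot.value = ChangeRecord()
    slot.value.modified = True
    slot.value.writes.append((offset, length))


def on_close(slot: ContextSlot,
             process: Callable[[ChangeRecord], None]) -> bool:
    """Decrease the counter; process the file once it is fully closed.
    Returns True if processing was triggered."""
    record = slot.value
    if record is None:
        return False
    record.open_count -= 1
    if record.open_count > 0:
        return False
    slot.value = None  # the holder is empty for the next open
    if record.modified:
        process(record)
        return True
    return False


slot = ContextSlot()
processed: List[ChangeRecord] = []
on_after_open(slot)
on_after_open(slot)
on_after_write(slot, 0, 512)
print(on_close(slot, processed.append))  # False
print(on_close(slot, processed.append))  # True
```

Processing is triggered only by the second close, when the counter drops to zero, and only because a write was registered in between.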

Getting Started with CBFilter

You can find an evaluation version of the SDK for your platform and programming language in the Download Center. After downloading and installing the SDK, you will receive a library, sample code, and comprehensive documentation on your system. The documentation includes a "Getting Started" section with programming instructions. Additionally, free technical support is available during the evaluation phase.

We appreciate your feedback. If you have any questions, comments, or suggestions about this article, please contact our support team at support@callback.com.