Improving Speed of Scanning Files Before Opening
This article explores effective strategies for managing a high volume of file open requests while ensuring that file data are scanned before being accessed by any process. Learn how to optimize performance and maintain system efficiency in scenarios that require pre-access file scanning.
When designing an application that handles file open requests, you may need to process the BeforeOpenFile or AfterOpenFile event and block certain requests based on file content. This means that the decision to permit or block the request must be made immediately within the event handler, with no option to defer processing.
Which event to use
Using the BeforeOpenFile event seems the obvious choice because you want to prevent the file from being opened; however, AfterOpenFile has its own benefits. It is fired after the file has been opened but before the open operation completes and execution returns to the caller. So, if a decision is made to cancel this particular open request, the event handler can return a non-zero completion status or an error, and the CBFilter driver will close the already opened file. The benefits of using AfterOpenFile are described below.
Addressing Slowdowns and Blockages
Performance issues arise when file open requests arrive faster than they can be processed, causing system slowdowns or blockages. The solution, although challenging to implement, is straightforward: reduce the "processing fee." This involves either speeding up request handling or minimizing the number of requests that require processing.
Optimizations at the Kernel Level
At the kernel level, you can exclude specific files from processing using two methods:
- Extended Rules (added through the AddFilterRuleEx method): These allow for complex inclusion/exclusion conditions but are generally less effective for performance optimization.
- Passthrough Rules (added through the AddPassthroughRule method): These enable the exclusion of specific files, such as registry-related files, and are more practical for skipping unnecessary scans.
Optimizations at the User-Mode Level
In user mode (i.e., in event handlers), your application can use the DesiredAccess and CreateDisposition parameters of the BeforeOpenFile or AfterOpenFile event, or leverage information about the requesting process (obtained by calling the GetOriginatorProcessName and GetOriginatorProcessId methods) and the file type (based on the file extension), to skip unnecessary processing.
Example 1: Skipping Web-Related Files
For web-related files (*.js, *.css, *.html, *.png, *.jpg), the application can identify browser processes as the request originator and skip scanning, assuming the browser already provides protection through plugins or built-in scanning features. This skipping can be made optional, allowing users to decide whether or not to avoid duplicate scans.
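The check described in this example can be sketched as follows. This is a minimal illustration, not part of the CBFilter API: the function name, the set of browser executables, and the `skip_duplicates` switch are hypothetical, and the process name is assumed to have been obtained already (for example, via GetOriginatorProcessName).

```python
import ntpath

# Illustrative values only (assumptions); a real product would make both
# sets configurable.
BROWSER_PROCESSES = {"chrome.exe", "firefox.exe", "msedge.exe"}
WEB_EXTENSIONS = {".js", ".css", ".html", ".png", ".jpg"}


def should_skip_web_file(process_name, file_name, skip_duplicates=True):
    """Return True when a browser opens a web-related file and the user
    has chosen to avoid duplicate scans."""
    if not skip_duplicates:
        return False
    # ntpath handles Windows-style paths regardless of the host platform.
    exe = ntpath.basename(process_name).lower()
    ext = ntpath.splitext(file_name)[1].lower()
    return exe in BROWSER_PROCESSES and ext in WEB_EXTENSIONS
```

Keeping the decision in a small pure function makes it easy to call from either event handler and to expose the opt-out as a user setting.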
Key Parameters to Evaluate
- __CreateDisposition__: If the file will be truncated or replaced, scanning is unnecessary because existing data will be deleted.
- __DesiredAccess__: Indicates the type of operation that the requesting process intends to perform. If a request is made only to retrieve attributes or security attributes, or if the file is opened only for writing, file data scanning can be skipped.
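The two checks above could be combined into a single helper, sketched below. The access-mask bits are the standard Windows SDK values. The disposition values are an assumption: they follow Win32 CreateFile numbering (CREATE_ALWAYS = 2, TRUNCATE_EXISTING = 5), so verify them against the CBFilter documentation for your edition. The function name is hypothetical.

```python
# Access-mask bits (Windows SDK values) that imply reading file data.
FILE_READ_DATA = 0x0001
FILE_EXECUTE = 0x0020
MAXIMUM_ALLOWED = 0x02000000
GENERIC_ALL = 0x10000000
GENERIC_EXECUTE = 0x20000000
GENERIC_READ = 0x80000000

# Treated conservatively: any of these bits means the data may be read.
DATA_READ_MASK = (FILE_READ_DATA | FILE_EXECUTE | MAXIMUM_ALLOWED |
                  GENERIC_ALL | GENERIC_EXECUTE | GENERIC_READ)

# ASSUMPTION: CreateDisposition uses Win32 CreateFile numbering.
CREATE_ALWAYS = 2
TRUNCATE_EXISTING = 5


def can_skip_scan(desired_access, create_disposition):
    """Return True when the existing file data cannot reach the caller."""
    # The existing content is discarded, so there is nothing to scan.
    if create_disposition in (CREATE_ALWAYS, TRUNCATE_EXISTING):
        return True
    # Attribute-only, security-only, and write-only opens request no
    # data-read bits.
    return (desired_access & DATA_READ_MASK) == 0
```

The mask deliberately errs on the side of scanning: execute access and MAXIMUM_ALLOWED are counted as data reads.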
Inspecting the attributes
Your application will likely skip directories. In a BeforeOpenFile event handler, information about file attributes is not available unless you enable the AlwaysRequestAttributesOnOpen configuration setting. Without that setting, the handler can only use the Options parameter, as described below, or explicitly call the CreateFileDirect method to request the attributes of the file being opened. In an AfterOpenFile event handler, the application has one more option: inspect the ExistingAttributes parameter of the event and bypass processing of directories.
Leveraging the Options Parameter
The Options parameter may help determine whether the request is for a file or a directory. It may include one of the following flags (although they are not always present in the request):
- FILE_NON_DIRECTORY_FILE (0x40): Indicates the request is specifically for a file.
- FILE_DIRECTORY_FILE (0x1): Indicates the request is specifically for a directory.

If a process explicitly asks to open a directory (FILE_DIRECTORY_FILE is set), your application may skip processing: if the entry is actually a file, the request will fail regardless, and if it is a directory, your application is not interested in it.
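A directory check along these lines might look like the sketch below. The two option flags are the values quoted above, and FILE_ATTRIBUTE_DIRECTORY (0x10) is the standard Windows attribute bit. The function name and the three-state return value are illustrative choices, and `existing_attributes` is assumed to be passed only when the handler actually has attributes (for example, from ExistingAttributes in AfterOpenFile).

```python
FILE_DIRECTORY_FILE = 0x1        # request is explicitly for a directory
FILE_NON_DIRECTORY_FILE = 0x40   # request is explicitly for a file
FILE_ATTRIBUTE_DIRECTORY = 0x10  # standard Windows attribute bit


def is_directory_request(options, existing_attributes=None):
    """Return True or False when the target kind is known, else None."""
    # Attributes, when available, are the most reliable source.
    if existing_attributes is not None:
        return bool(existing_attributes & FILE_ATTRIBUTE_DIRECTORY)
    if options & FILE_DIRECTORY_FILE:
        return True
    if options & FILE_NON_DIRECTORY_FILE:
        return False
    # Neither flag is present: the handler needs another way to decide.
    return None
```

A `None` result signals that the handler has to fall back to another method, such as requesting the attributes explicitly.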
Caution with Extended Rules
Extended rules can help you exclude directories from processing in the kernel. Be careful with such rules, however: they, as well as the AlwaysRequestAttributesOnOpen configuration setting, make the driver try to open each filesystem item (file or directory) whose name matches the rule's mask in order to obtain the item's attributes and, for files, size. Thus, the performance benefit of excluding files via an extended rule may be negated by the slow speed of opening a file or directory. Performance may be higher when a file or directory is opened in the event handler using CreateFileDirect, and the obtained handle is used to perform all operations, starting with checking the attributes and file size.
Using Rules Effectively
Because the bottleneck stems from an imbalance between incoming requests and processing speed, adding multiple filter instances will not help. However, using multiple rules in one instance can reduce the load on your application.
- __Efficiency of Rules__: Rules are stored in a tree structure for fast processing, so even a relatively large number of rules (e.g., 30 to 40 rules) will not dramatically affect performance. At the same time, performance effects of using a larger number of rules must be analyzed and compared with the performance of firing the BeforeOpenFile event and checking file-related information in the event handler.
- __Rule Ordering__: As rules are not kept in a linear list, do not expect the order in which filter and passthrough rules are added to affect processing. First, all filter rules are evaluated. If a file or directory matches any rule, then all passthrough rules are evaluated. If a passthrough rule is matched, it cancels the result of a filter rule.
- __Dynamic Rule Management__: Rules in CBFilter can be dynamically added or removed, even within event handlers, provided proper synchronization is implemented (as rule lists are not thread-safe).
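The evaluation order described above can be modeled with a few lines of code. This is only a toy model for reasoning about rule sets: it uses Python's `fnmatch` as a stand-in for the driver's own mask matching, ignores per-rule flags, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def is_request_reported(path, filter_masks, passthrough_masks):
    """Toy model: filter rules first, then passthrough rules cancel a match."""
    name = path.lower()

    def matches(masks):
        # Order inside each collection does not matter: any match counts.
        return any(fnmatchcase(name, mask.lower()) for mask in masks)

    if not matches(filter_masks):
        return False          # no filter rule matched, nothing is reported
    return not matches(passthrough_masks)
```

Because each stage only asks whether any mask matches, reordering the masks within either collection cannot change the outcome, which mirrors the point about rule ordering.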
Example 2: Browser-Specific Rules
Instead of preloading rules for all browsers, the application can dynamically detect which browsers are being used and add process-specific rules during file open events. This approach is faster than analyzing the originator process name for every request.
Caching the Knowledge About Files
For advanced solutions in which performance is of the highest concern, it is reasonable to keep a cache of checked files so that subsequent openings of the same file don't require a complete scan (unless the file content has changed).
With this approach, once the file is scanned in a BeforeOpenFile event handler, the file information (path, size, last modification date) is added to an application-wide cache. Next, changes to the file are tracked; if any are detected, the file entry is removed from the cache. In subsequent BeforeOpenFile events, the cache is consulted, and if the file is known, no data scanning takes place. The Knowledge Base contains an article about processing file changes with CBFilter; it describes how changes in file content can be tracked with CBFS Filter.
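A minimal, thread-safe version of such a cache might look like the sketch below. The class and method names are hypothetical, and the sketch assumes that the caller supplies the size and modification time and calls `invalidate` whenever a change to the file is detected.

```python
import threading


class ScanCache:
    """Remembers files that were already scanned and allowed."""

    def __init__(self):
        self._entries = {}              # normalized path -> (size, mtime)
        self._lock = threading.Lock()   # event handlers run on many threads

    @staticmethod
    def _key(path):
        # Assumption: paths are compared case-insensitively.
        return path.lower()

    def mark_scanned(self, path, size, mtime):
        with self._lock:
            self._entries[self._key(path)] = (size, mtime)

    def is_known(self, path, size, mtime):
        # A size or timestamp mismatch counts as a miss and forces a rescan.
        with self._lock:
            return self._entries.get(self._key(path)) == (size, mtime)

    def invalidate(self, path):
        with self._lock:
            self._entries.pop(self._key(path), None)
```

Comparing size and timestamp on lookup is a second line of defense in case a change notification is missed; it does not replace change tracking.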
Avoiding Concurrent Operations
It is not uncommon for a file to be opened twice concurrently. For instance, a second file open request may arrive while your application is still processing the first request for the same file. If you use a cache and the file information is not yet cached, a second scan is triggered, leading to the file being opened again.
This can result in two potential issues:
- The second file open operation may fail.
- If the second operation succeeds, it creates duplicate and redundant processing, wasting resources.
To prevent these inefficiencies, applications should implement proper synchronization mechanisms. A BeforeOpenFile event handler should detect when a scan for the same file is already in progress. If so, it should wait for the previous scan to complete before proceeding with any further actions.
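One way to implement this waiting is to track scans that are in progress and let later requests for the same file wait for the first result. The sketch below is an illustration under stated assumptions: `ScanCoordinator` and the `scan` callable (which takes a path and returns True to allow the open) are hypothetical, and waiting without a timeout is a simplification.

```python
import threading


class ScanCoordinator:
    """Ensures that only one scan per file runs at a time."""

    def __init__(self, scan):
        self._scan = scan       # hypothetical callable: path -> bool
        self._lock = threading.Lock()
        self._in_flight = {}    # normalized path -> shared entry

    def scan_once(self, path):
        key = path.lower()
        with self._lock:
            entry = self._in_flight.get(key)
            owner = entry is None
            if owner:
                entry = {"done": threading.Event(), "result": None}
                self._in_flight[key] = entry
        if not owner:
            # Another handler is already scanning this file: wait for it.
            entry["done"].wait()
            return entry["result"]
        try:
            entry["result"] = self._scan(path)
        finally:
            # Unregister first, then wake the waiters. If the scan raised,
            # waiters receive None and should treat it as "not verified".
            with self._lock:
                del self._in_flight[key]
            entry["done"].set()
        return entry["result"]
```

In combination with a cache, the owner would store the result before the entry is released, so that requests arriving later hit the cache instead of starting a new scan.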
By implementing these strategies, you can optimize the handling of file open requests, minimize delays, and maintain system performance even when processing high volumes of file-scanning operations.
Getting Started with CBFilter
You can find an evaluation version of the SDK for your platform and programming language in the Download Center. After downloading and installing the SDK, you will receive a library, sample code, and comprehensive documentation on your system. The documentation includes a "Getting Started" section with programming instructions. Additionally, free technical support is available during the evaluation phase.
We appreciate your feedback. If you have any questions, comments, or suggestions about this article, please contact our support team at support@callback.com.