Getting instant download statistics
Submitted on August 23, 2004
One of the most important measures of website activity is resource access statistics. This information serves many purposes: optimizing website content, improving marketing campaigns, and running diagnostic tests. The web server saves detailed information about each resource access to its log file(s).
There are many applications and tools, such as "WebTrends Log Analyser" (by http://www.webtrends.com), that can parse web server activity logs, compile the statistics, and display them in a user-friendly format. Most of these programs report resource access statistics for a fixed time interval. Such report generators also need some time to process the log files and prepare the reports.
In this article we present a simple ASP.NET application that walks through the web server activity logs, parses them on the fly, and displays a summary statistics report for each fixed time interval (day, month, year) in chronological order.
Log File Parsing
The ASP.NET application needs read access to the web server activity log files in order to parse them. For demo purposes we assume that the test web server is configured to save all log files on the same PC where the ASP.NET application runs. All we need to do is read the log files in the appropriate order, parse each of them, and count all occurrences of the given key phrase, lexeme, or resource name.
We also assume that the web server creates one log file per day and names it using the file mask "exYYYYMMDD.log", where YYYY denotes the year of the log file creation date, MM the month, and DD the day. This lets us take the creation date from the file name instead of parsing the contents of each log file to extract it.
Finally, the algorithm for iterating through the log files and finding all occurrences of the specified phrase is shown below:
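To illustrate how the date can be recovered from the file name alone, here is a minimal sketch in Python (not the article's ASP.NET code). The names LOG_NAME and log_date_from_name are hypothetical, and the pattern assumes exactly the "exYYYYMMDD.log" mask described above.

```python
import re
from datetime import date

# Assumed mask from the article: "ex" + YYYYMMDD + ".log"
LOG_NAME = re.compile(r"^ex(\d{4})(\d{2})(\d{2})\.log$", re.IGNORECASE)


def log_date_from_name(file_name):
    """Return the date encoded in a log file name, or None if it does not fit the mask."""
    match = LOG_NAME.match(file_name)
    if match is None:
        return None
    year, month, day = (int(part) for part in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # Eight digits that do not form a real calendar date, e.g. month 13.
        return None
```

A file that does not follow the mask, or whose digits are not a valid date, yields None so the caller can simply skip it.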
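A minimal sketch of this loop, written in Python rather than ASP.NET and not the article's own listing, could look like the following. The function name count_occurrences is hypothetical. The sketch assumes logs in the W3C extended format, where lines starting with "#" are directives rather than records, and it counts each matching log record once.

```python
import os
import re

# Assumed mask from the article: "ex" + YYYYMMDD + ".log"
LOG_NAME = re.compile(r"^ex\d{8}\.log$", re.IGNORECASE)


def count_occurrences(log_dir, phrase):
    """Count log records containing `phrase` across all daily logs in `log_dir`."""
    needle = phrase.lower()
    total = 0
    names = [name for name in os.listdir(log_dir) if LOG_NAME.match(name)]
    # Zero-padded YYYYMMDD names sort chronologically as plain strings.
    for name in sorted(names, key=str.lower):
        path = os.path.join(log_dir, name)
        # latin-1 maps every byte to a character, so decoding never fails.
        with open(path, encoding="latin-1") as log_file:
            for line in log_file:
                if line.startswith("#"):
                    continue  # directive line (assumed W3C extended format)
                if needle in line.lower():
                    total += 1
    return total
```

Reading each file line by line keeps memory use flat regardless of log size, which matters when the report is produced on every page request.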
Displaying the statistics on the web page
Resource access statistics can be displayed chronologically for each time interval. This representation is helpful when you want to know how often a given resource was downloaded in each interval (e.g., daily). The code below is a modified version of the file enumeration algorithm from the previous section:
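As a rough illustration of the per-interval grouping, and not the article's own listing, the Python sketch below derives the interval key from the file name and sums the matching records per key. The names hits_per_interval and KEY_LENGTH are hypothetical, and the same assumptions apply as before: the "exYYYYMMDD.log" mask and "#"-prefixed directive lines.

```python
import os
import re
from collections import Counter

# Assumed mask from the article: "ex" + YYYYMMDD + ".log"
LOG_NAME = re.compile(r"^ex(\d{4})(\d{2})(\d{2})\.log$", re.IGNORECASE)

# How many date parts (year, month, day) make up the key for each interval.
KEY_LENGTH = {"year": 1, "month": 2, "day": 3}


def hits_per_interval(log_dir, phrase, interval="day"):
    """Return (interval key, hit count) pairs in chronological order."""
    needle = phrase.lower()
    parts = KEY_LENGTH[interval]
    totals = Counter()
    for name in os.listdir(log_dir):
        match = LOG_NAME.match(name)
        if match is None:
            continue
        # "2004-08-23", "2004-08" or "2004", depending on the interval.
        key = "-".join(match.groups()[:parts])
        with open(os.path.join(log_dir, name), encoding="latin-1") as log_file:
            hits = sum(
                1
                for line in log_file
                if not line.startswith("#") and needle in line.lower()
            )
        totals[key] += hits
    # Zero-padded keys sort chronologically as plain strings.
    return sorted(totals.items())
```

The returned list of pairs can be bound directly to a table or grid on the page, one row per interval.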
Statistics for multithreaded downloads
Many users rely on special programs to download large files more efficiently. Such programs (download managers) usually fetch a single web resource over several simultaneous download threads. The web server writes a separate log record for each of these threads. To keep our log parser from counting such duplicate records, we extract the user IP from each log record and compare it with all previously extracted IPs:
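One possible way to do this check, sketched in Python and not taken from the article's own listing, is to keep the IPs already seen in a set. The function name count_unique_clients is hypothetical. The sketch assumes the W3C extended log format, in which a "#Fields:" directive names the space-separated columns and the client address is the "c-ip" field. The caller decides the scope of the check, for example by passing the lines of one daily log file.

```python
def count_unique_clients(lines, phrase):
    """Count distinct client IPs among the log records that contain `phrase`."""
    needle = phrase.lower()
    ip_index = None   # column of the client IP, learned from "#Fields:"
    seen_ips = set()  # IPs already counted
    for line in lines:
        if line.startswith("#Fields:"):
            # e.g. "#Fields: date time c-ip cs-method cs-uri-stem sc-status"
            fields = line.split()[1:]
            ip_index = fields.index("c-ip") if "c-ip" in fields else None
            continue
        if line.startswith("#") or ip_index is None:
            continue  # other directive, or column layout not known yet
        if needle not in line.lower():
            continue
        columns = line.split()
        if ip_index < len(columns):
            seen_ips.add(columns[ip_index])  # a set ignores repeated IPs
    return len(seen_ips)
```

A set makes the "already seen" test a constant-time lookup instead of a scan over every earlier IP. Note that this approach also merges distinct users who share one address, such as visitors behind the same proxy.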
Source Code and working sample
The full source code of all classes described in this article can be downloaded as statistic.zip.
This code is constantly being refined and improved and your comments and suggestions are always welcome.
With best regards,