
MIT Lincoln Laboratory DARPA 1998 Offline Intrusion Detection Evaluation

Preliminary Results 18 November 1998


Scores for Participant Stolfo

Please look over these results and, if you have any questions or comments, send email to Isaac Graf at igraf@ll.mit.edu and Richard Lippmann at rpl@ll.mit.edu. Please note that these results are preliminary: the method used to determine whether an attack was detected and the method of presentation may change before the final presentation.
This page provides links to several tables that present your scores for the test data in different ways. For the purpose of analysis, the attacks were assigned to one of four categories: denial of service, probe, user to root, or remote to local. In the tables, these categories are abbreviated as dos, probe, u2r, and r2l. Attacks used during testing are described in attack descriptions. Details concerning the evaluation are described at http://ideval.ll.mit.edu/ with username "ideval" and password "daRpa98!".

Raw Scores
Raw Scores: tcpdump
This table shows the raw data for each attack that successfully ran in the test data. The columns of the table provide the following information about each attack:
1) name of the attack
2) category of the attack
3,4,5) when the attack occurred (week, day, time of day)
6) whether the attack appeared in the training data (old) or is new to the test data (new)
7) whether the attack was implemented in the clear (clear) or the attacker attempted to be stealthy (stealthy)
8) whether the attack was run by hand (human) or was automated in a script (script)
9) session number of the attack (a few attacks extended over several sessions)
10) whether the attack succeeded (only attacks that succeeded are included in the tables; attacks that did not succeed were ignored, and a positive score on their sessions was not counted as a false alarm)
11) destination machine of the attack
12) whether the attack appears in BSM data as well as tcpdump data
13) The P50 average. This metric is calculated by selecting the 50% of the attack's sessions with the highest scores and taking the mean of those top scores.
14) Histogram. A list of all the scores assigned to the sessions of the attack and the number of sessions with each score. For a given score "x" that was assigned to the attack "N" times, the corresponding histogram entry is N:x. For example, if 10 sessions were assigned the score 0.9, 5 sessions were assigned the score 0.4, and 1234 sessions were assigned the score 0.0, the histogram entry would be "10:0.9 5:0.4 1234:0.0". (A sketch of both computations follows this list.)
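
For concreteness, here is a minimal Python sketch of both computations. The function names are illustrative, and the rounding of the 50% cut (here: rounding down, keeping at least one session) is an assumption, since the description above does not specify it:

    from collections import Counter

    def p50_average(scores):
        # Mean of the top-scoring half of an attack's session scores.
        # Rounding of the 50% cut is an assumption (floor, minimum of 1).
        ranked = sorted(scores, reverse=True)
        top_half = ranked[:max(1, len(ranked) // 2)]
        return sum(top_half) / len(top_half)

    def histogram_entry(scores):
        # Render the "N:x" histogram notation, highest score first.
        counts = Counter(scores)
        return " ".join(f"{n}:{x}" for x, n in sorted(counts.items(), reverse=True))

    # The example from item 14 above:
    scores = [0.9] * 10 + [0.4] * 5 + [0.0] * 1234
    print(histogram_entry(scores))   # -> 10:0.9 5:0.4 1234:0.0
    print(p50_average(scores))       # mean of the 624 highest scores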

Attack-Type Scores
Scores for each attack-type: Low False Alarm Rate tcpdump
Scores for each attack-type: High False Alarm Rate tcpdump
This table collapses the raw data into rows for each "attack, old/new, clear/stealthy" tuple. It indicates the number of instances and detections for each attack, and also the number of times (if any) that the attack name was incorrectly attributed to a session (false alarms).

Whether an attack was detected was determined as follows. First, a threshold was set such that any score at or above the threshold is scored as an attack, and any score below the threshold is scored as a normal session. Performance at two different thresholds was analyzed: one corresponds to a low tolerance of false alarms (one false alarm allowed per attack); the other, lower threshold corresponds to a high tolerance of false alarms (up to 10 false alarms per attack). For each of these thresholds (or allowed rates of false alarms), the criterion for detection depended on the category of the attack. For denial of service and probe attacks, credit was proportional to the percentage of the attack's sessions scored at or above the threshold; if 25% of the sessions were labeled as attack sessions, 25% credit was given. For user to root and remote to local attacks, the attack counted as detected if any session associated with it was assigned a score at or above the threshold.

In many cases there was no threshold that produced exactly the desired number of false alarms. If the threshold used produced more than the prescribed number, the fraction of false alarms that would have to be thrown out to reach the desired number was calculated, and the number of detections at the threshold was multiplied by this same fraction. (This method corresponds to setting a threshold above which a session is scored as an attack and keeping a certain percentage of the sessions whose scores fall exactly on the threshold.)
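
The scoring rule just described can be summarized in a short Python sketch. The function names are illustrative, and the fractional false-alarm adjustment reflects one plausible reading of the interpolation above (scaling detections by the fraction of false alarms that would be kept):

    def detection_credit(session_scores, category, threshold):
        # dos/probe: partial credit equal to the fraction of the attack's
        # sessions scored at or above the threshold.
        # u2r/r2l: full credit if any session reaches the threshold.
        hits = sum(1 for s in session_scores if s >= threshold)
        if category in ("dos", "probe"):
            return hits / len(session_scores)
        return 1.0 if hits else 0.0

    def adjusted_detections(detections, false_alarms, allowed):
        # When no threshold yields exactly the allowed number of false
        # alarms, scale the detections (assumed here: by allowed/false_alarms,
        # i.e. the fraction of false alarms that would be kept).
        if false_alarms <= allowed:
            return detections
        return detections * (allowed / false_alarms)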

Attack Scores
Scores for each attack: Low False Alarm Rate tcpdump
Scores for each attack: High False Alarm Rate tcpdump
This table is similar to the previous table except that each row corresponds to a single attack name (collapsed across the old/new and clear/stealthy categories).

Old Versus New
Comparison of Old/New attacks: Low False Alarm Rate tcpdump
Comparison of Old/New attacks: High False Alarm Rate tcpdump
This table shows the results for each category for the old attacks and the new attacks separately.

Clear Versus Stealthy
Comparison of Clear/Stealthy attacks: Low False Alarm Rate tcpdump
Comparison of Clear/Stealthy attacks: High False Alarm Rate tcpdump
This table shows the results for each category for the clear attacks and the stealthy attacks separately.

Category Scores
Scores for each category: Low False Alarm Rate tcpdump
Scores for each category: High False Alarm Rate tcpdump
This table collapses the scores within the four categories. The last line shows the total score collapsed across all attacks and also the total number of false alarms (including those that were not assigned to any specific attack name).

ROC by Category
ROCs for each category: tcpdump
This table shows the ROCs by attack category.

Total ROC
Total ROC: tcpdump
This table shows the ROC that includes all attacks.
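
As a rough illustration, an ROC like those linked above can be traced from per-session scores by sweeping the threshold over all observed score values; a minimal Python sketch follows. The actual evaluation may normalize false alarms differently (for example, per day), so the units here are an assumption:

    def roc_points(attack_scores, normal_scores):
        # One (false alarms, detection rate) point per candidate threshold,
        # from the highest observed score down to the lowest.
        points = []
        for t in sorted(set(attack_scores) | set(normal_scores), reverse=True):
            detected = sum(1 for s in attack_scores if s >= t) / len(attack_scores)
            false_alarms = sum(1 for s in normal_scores if s >= t)
            points.append((false_alarms, detected))
        return points

    # Example: three attack sessions, five normal sessions
    print(roc_points([0.9, 0.6, 0.2], [0.7, 0.3, 0.1, 0.0, 0.0]))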