Prev : Definitions Next: Two Straightforward Solutions
|
Some more Notations & Concepts Before we move on to Discuss the algorithm, let us be introduced with some notations and concepts that will be useful in describing the algorithm. Dkn Outlier: Given an Input data set with N points, parameters n and k, a point p is a Dkn Outlier if there are no more than n-1 other points p' such that Dk(p') > Dk(p). In other words, if we rank the points in the decreasing order of the distances from their respective nearest neighbors, the topmost n points are called the Dkn Outlier. Minimum Bounding Rectangle ( MBR ): The Minimum Bounding Rectangle ( MBR ) for a set of point P in dimension d, as the name suggests, is the minimum d dimensional rectangular region with sides parallel to the axis of the data space that encloses all of the points in P. It is represented by the two endpoints of its major diagonal. Let a point p in d dimensional space is denoted as [p1, p2, . . . ,pd]. Then an MBR is defined by the 2 endpoints of the major diagonal r = [r1, r2, . . . ,rd] and r'= [r'1, r'2, . . . ,r'd] , such that ri ≤ r'i for i= 1 to d. Click on the figure below to see the MBR of the points.
Figure 1: Minimum Bounding Rectangle for the Points The points in green denote r' and r respectively. In the two dimensional case, r1' > r1 and r2' > r2. The green and red points together represent the set of points P. Note that, generally r and r' may or may not belong to P. But,in this example, they belong to P. Outlier detection is aided by computation is minimum and maximum distance between two MBRs. How ? Well, we can approximate a set of points by their MBR. Then we may prune the entire MBR by computing the lower and upper bound for Dk(p) for each point p in the MBR, if we can identify that the MBR cannot possibly contain any of the n Outliers. However, a number of notation is to be defined for the detailed understanding of the process. Also, assume that the distance metric is the square of the Euclidean Distance. MINDIST(p,R): MINDIST ( p, R) is the minimum distance between point p and rectangle R. MINDIST ( p, R) = ∑ 1≤i ≤d xi2 where
ri - pi if ri > pi xi = pi - r'i if r'i < pi 0 otherwise
For example, in the figure below, the minimum distance between a point marked in red and the MBR of the blue and green points is shown. The red points have coordinate ( p1, p2). The MINDIST is denoted as D in the figure for different positions of p. It follows the above definition.
Figure 2: MINDIST(p,R) for various p. The red points show different positions for p(p1,p2). The endpoints of the major diagonal of the MBR are r (r1,r2) and r'(r1',r2')
MAXDIST(p,R):
MAXDIST ( p, R) is the maximum distance between point p and rectangle R. MAXDIST ( p, R) = ∑ 1≤i ≤d xi2 where
r'i - pi if pi < (r'i + r'i ) / 2 xi = pi - ri if otherwise For example, in the figure below, the maximum distance between a point marked in red and the MBR of the blue and green points is shown. The red points have coordinate ( p1, p2). The MAXDIST is denoted as D in the figure for different positions of p. It follows the above definition.
Figure 3: MAXDIST(p,R) for various p. The red points show different positions for p(p1,p2). The endpoints of the major diagonal of the MBR are r (r1,r2) and r'(r1',r2') Now, we are going to define MAXDIST and MINDIST between 2 MBRs.
MINDIST(R,S):
MINDIST ( R, S ) is the minimum distance between rectangle R and rectangle S. MINDIST ( R, S) = ∑ 1≤i ≤d xi2 where
ri - s'i if ri > s'i xi = si - r'i if r'i < si 0 otherwise In other words, for each direction in d-dimension, the MINDIST is zero if the two rectangles overlap in that direction. Otherwise, the distance between the nearest points between the rectangles is taken. MAXDIST(R,S):
MAXDIST ( R, S ) is the maximum distance between rectangle R and rectangle S. MAXDIST ( R, S) = ∑ 1≤i ≤d xi2 where xi = max { | ri - s'i| , | ri' - si| } In other words, for each direction in d-dimension, the maximum distance between any two endpoints of the major diagonal of the two rectangles is taken into the calculation of MAXDIST between the two rectangles. In the following figure, the minimum and maximum distance between rectangle R and different positions of rectangle S is shown. MINDIST are shown in blue and MAXDIST are shown in red.
Figure 4: MAXDIST (R,S) is shown in red. MINDIST ( R,S) is shown in blue for 3 separate positions of S , S1, S2 and S3 Let us examine 3 different cases based on different positions of S. Let, the endpoints of the major diagonal of S (in all the 3 positions S1, S2 and S3) is s ( s1,s2) and s' ( s1',s2'). ; [ s1' > s1 and s2' > s2] CASE 1: For S1 MINDIST( R,S1) = (r2-s2') ^2 ; [As the two rectangles overlap in the x direction] MAXDIST (R,S1) = (s1'-r1) ^2 + (r2'-s2) ^2 CASE 2: For S2 MINDIST( R,S1) = (s1-r1') ^2 ; [As the two rectangles overlap in the y direction] MAXDIST (R,S1) = (s1'-r1) ^2 + (s2'-r2) ^2 CASE 3: For S3 MINDIST( R,S1) = (s1'-r1) ^2 + (s2- r2') ^2 MAXDIST (R,S1) = (s1-r1') ^2 + (s2'-r2) ^2 |
|
Prev : Definitions Next: Two Straightforward Solutions | ||
Fireworks Photo Caption |
|