Friday, March 23, 2012

Stream: median in stream of numbers.


Find the median in stream of numbers.


You have to maintain two heaps one max heap ( which contains the smallest N/2 elements seen till now) and one min heap ( which contains the largest N/2 elements).
Always make sure following holds
Math.abs(sizeof (max-heap) - sizeof(min-heap)) <= 1


Get Median: O(1)
If N is odd, the median is root of the heap max elements among max/min heap.
If N is even, the median is the average between the tops of the 2 heaps.


Lets trace with following stream
S = 9, 8, 2, 7, 6, 5, 3 ...


data: 9
Max heap = 9
Min heap =      -
current median = 9


data: 8
Max heap =    
     9
---------
8
Now Max heap size (2) - min-heap size (0) is > 1
So lets remove the root of max and send it to min-heap
Max heap now: 8
Min heap: 9
current median = (8+9)/2


data: 2 (is less than 8 so we will insert in max-heap)
Max heap =          8
                          2--------
Min heap = 9
current median: 8


data 7 :
is less than max-heap root (8) but inserting there would make is size greater than min heap size by 2, so we move 8 to min heap and insert 7 into max heap.
Max heap =          7
                         2---------
Min heap =             8
                             9---------
current median: (7+8)/2


data 6:
is less than max heap root (7), lets insert it there since our equality Math.abs(sizeof (max-heap) - sizeof(min-heap)) <= 1 holds.
Max heap =            7
                          2-------6
Min heap =           8
                         9-------
Current median: 7


data 5:
is less than max heap root (7) but inserting it there would violate Math.abs(sizeof (max-heap) - sizeof(min-heap)) <= 1 so, we will move 7 to min heap and insert 5 in max heap.
Max heap =         6
                       2-------5
Min heap =         7
                        8-------9
current median: (6+7)/2


data 3:
is less than max heap root (3) and inserting it there would not violate Math.abs(sizeof (max-heap) - sizeof(min-heap)) <= 1
Max heap =            6
                          5--------2
                    3--------
Min heap =             7
                            8-------9
current median = 6


Time complexity: O(nlogn) - considering we have seen n elements so far.

1 comment: