Saturday, December 31, 2011

Finding repeated elements in an array.


Design an algorithm that, given a list of n elements in an array, finds all the elements that appear more than n/3 times in the list.
The algorithm should run in linear time. (n >=0 )
You are expected to use comparisons and achieve linear time. No hashing/excessive space/ and don't use standard linear time deterministic selection algo.

Approach:
Let N = size of the input array A.
Now we need to find all the elements in the array which occur more than (N/K) times in the input Array A.

maintain a Map (Key=A[i], and value is the frequency of Key in array A so far).
as soon as the size of the Map reaches K, decrement the value of all the Keys by 1, if the value is 1, then after decrementing it will be 0 - Delete those keys from Map.
At max you will have to do (N/K) deletions. 
In the end you will be left with atmost (K-1) unique Keys - which are candidate answers.
Now again re-iterate through the array A and report the Keys whose frequency is greater than (N/K).

Time Complexity: O(nlogk).
space complexity: O(k).

Ex:
int [] A = {5, 4, 4, 3, 2, 3, 4, 5, 5, 8};
N = 10;
K = 5;
we need to find all the elements in A which occurs more than (N/K) = 2 times.
As you iterate through the Array A, the steps would be - Imagine you are playing tetris, where same keys stack upon each other and you put the different keys in the adjacent column, when you have a row of size K - you delete that row:
When 8 is coming, the tetris would look like:
5 4
5 4 3
5 4 3 2

When 8 has dropped tetris would look like:
5 4
5 4 3
5 4 3 2 8 <= This row is now full (means its size has K=5)


After deleting the tetris snapshot would be:
5 4
5 4 3

So now 3,4,5 are your candidate elements which can occur more than (N/K) = (10/5) = 2 times in A.
Again iterate through the Array A, and report elements among 3,4,5 which occur more than 2 times.
Ans is:
Key: 4 Value: 3
Key: 5 Value: 3
you can prove that in the end at most (K-1) unique values would be remaining as candidate answer.


output for the below code:

Key: 4 Value: 6
Key: 10 Value: 6
End!






7 comments:

  1. you say no hashing , But yet use a hash table?

    ReplyDelete
  2. you say no hashing , But yet use a hash table?

    ReplyDelete
  3. what are you doing in b/w lines 79-89 ?

    In the explanation you say that

    >> Again iterate through the Array A, and report elements among 3,4,5 which occur more than 2 times. >>

    So, shouldn't this give you time = n*k ? Or you are again modifying the counts in kBagMap<> ?

    ReplyDelete


  4. heres the python code:
    let l=[1,2,3,4,5,6,1,1,2,3,4,5,6,2,1,2,47,7,8,5,5,5,5,5,5,5,6,4,12,7,78,4,5,5,5,5]
    c=max(l)

    b=[]
    for i in range(c+1):
    b.append(0)

    for i in l:
    b[i]=b[i]+1
    if b[i]>=(len(l))/3:
    print i

    ReplyDelete
  5. why there is log factor in time complexity ,can u explain that??/

    ReplyDelete
  6. Extreme cleverness! How does that come to your mind? :-)

    ReplyDelete