Saturday, April 9, 2011

Median of Numbers, which is distributed across N machines.

Approach (Based upon quick sort):
  1. Designate one computer as the master and others as slave.
  2. Let the Master choose one random number, say X from its array of numbers and ask all the other slave nodes to partition their arrays based on Pivot X.
  3. Then Slave nodes report back the count of numbers less than X and count of numbers more than X (can be done in parallel).
  4. The Master node then add all the numbers received less than X and more than X and checks if the median is less than or greater than X.
  5. Now choose X2 based upon the partition data depending upon where the median might be ie in the left partition or the right partition.
  6. Continue till we get median.

1 comment: