Two Sum | fedemengo

The two sum problem require us to find two elements in an arrays that sum to a given value. There are different approach to solve the problem depending on whether the arrays is sort or not.

A naive solution looks something like this

bool two_sum_brute_force(const vector<int> &v, int sum){
    for(int i=0; i<v.size(); ++i)
        for(int j=i+1; j<v.size(); ++j)
            if(v[i] + v[j] == sum)
                return true;
    return false;
}

This solution checks all combinations between two number and compare the result to the sum value, the time complexity is \(O(N^2)\).

Sorted array

If we are given a sorted array we can experiment and solve the problem with a binary search. For each element \(x\) in the array we need to check if an element \(sum - x\) exists.

bool two_sum_binary_search(const vector<int> &v, int sum){		// IF SORTED
    for(int i=0; i<v.size(); ++i)
        if(std::binary_search(v.begin(), v.end(), sum - v[i]))
            return true;
    return false;
}

With this we are performing a binary search for every elements, the total running time would be \(O(N\log N)\).

If the arrays is sorted we can also solve code a linear solution for the problem. This new approach use a sub-interval of the array and compare the sum of its leftmost and rightmost element (the limits of the interval) to the target sum. This require the arrays to be sorted because

If the sum is greater that the target, we can obtain a smaller sum by shrinking the interval on the right (excluding larger value)
If the sum is smaller the interval is shrunk on the left (to leave out smaller elements)

bool linear(const vector<int> &v, int sum){				// IF SORTED
    int low = 0, high = v.size()-1;
    while(low < high){
        int s = v[low] + v[high];
        if(s < sum) ++low;
        else if(s > sum) --high;
        else return true;
    }
    return false;
}

The running time for this solution is obviously \(O(N)\)

General case

If the array is not sorted, a general approach to solve the problem could be using an hash set. While iterating over all elements for each element \(x\) I check if an element with value \(sum-x\) exists, if it doesn’t exists I just put the element in the set.

bool hashing(const vector<int> &v, int sum){
    unordered_set<int> S;
    for(int x: v)
        if(S.count(sum-x)) return true;
        else S.insert(x);
    return false;
}

The running time is still “linear”, although some overhead is caused by the hash set/table, both for the time and memory usage.

Count pairs

The solutions above only return if two elements that sum up to a certain value exists. The following snippet counts how many of those pair exists

std::pair<int, bool> count(std::vector<int> &v, int index, int m) {
    std::vector<int>::iterator low, up;

    int count = 0;
    bool same = false;

    low = std::lower_bound(v.begin() + index + 1, v.end(), m - v[index]);
    up = std::upper_bound(v.begin() + index + 1, v.end(), m - v[index]);

    if(low != v.end()) {
        count = up - low;
        // if for the current element v[index] is true that 'v[index] * 2 = m' it's necessary to account for that
        // as we skip the following equal element by removing 1 from all possible values
        if(v[index] == m - v[index]) {
            same = true;
        }
    }

    return {count, same};
}

long countPair(std::vector<int> &v, int m) {
    // n log n
    std::sort(v.begin(), v.end());

    int lastE = -1;
    long total = 0;
    std::pair<int, bool> r = {0, false};
    // n * (log n)
    for(int i=0; i<v.size(); ++i) {
        int x = v[i];
        if(x == lastE) {
            if(r.second) {
                r.first--;
            }
            total += r.first;
            continue;
        }

        lastE = x;
        // log n
        r = count(v, i, m);
        total += r.first;
    }

    return total;
}

The idea is pretty straightforward: for every element, perform a binary search of the complementary element that sums up to the target value. Although the algorithm itself has a running time of \(O(N\cdot \log N)\) because of sorting, it’s possible to make it run a little bit faster by avoid recalculating the result for repeating elements. The only thing to be careful about is the case when \(element + element = target\), in that case, as the following repeating element are considered, is necessary to decrease the partial solution by one, to account for the element that is before the current.