
Most votes on algorithm questions 9

  • #81 How to sort in-place using the merge sort algorithm?
  • #82 Finding all possible combinations of numbers to reach a given sum
  • #83 Mapping two integers to one, in a unique and deterministic way
  • #84 Efficient Algorithm for Bit Reversal (from MSB->LSB to LSB->MSB) in C
  • #85 How are ssl certificates verified?
  • #86 Are there any cases where you would prefer a higher big-O time complexity algorithm over the lower one?
  • #87 Recursion or Iteration?
  • #88 What is the difference between LL and LR parsing?
  • #89 Get a random number focused on center
  • #90 Good Java graph algorithm library?

Read all the top-voted questions and answers on a single page.

#81: How to sort in-place using the merge sort algorithm? (Score: 260)

Created: 2010-04-03 Last updated: 2012-10-20

Tags: arrays, algorithm, sorting, mergesort, in-place

I know the question is not too specific.

All I want is someone to tell me how to convert a normal merge sort into an in-place merge sort (or a merge sort with constant extra space overhead).

All I can find (on the net) is pages saying “it is too complex” or “out of scope of this text”.

The only known ways to merge in-place (without any extra space) are too complex to be reduced to a practical program. (taken from here)

Even if it is too complex, what is the basic concept of how to make the merge sort in-place?

#81 Best answer 1 of How to sort in-place using the merge sort algorithm? (Score: 151)

Created: 2013-03-27 Last updated: 2020-12-22

Knuth left this as an exercise (Vol 3, 5.2.5). There do exist in-place merge sorts. They must be implemented carefully.

First, a naive in-place merge such as the one described here isn’t the right solution: it degrades the performance to O(N^2).

The idea is to sort part of the array while using the rest as working area for merging.

For example, consider the following merge function.

/* Merge the sorted ranges xs[i, m) and xs[j, n), writing the result into the
 * working area starting at w by swapping (not copying) elements. */
void wmerge(Key* xs, int i, int m, int j, int n, int w) {
    while (i < m && j < n)
        swap(xs, w++, xs[i] < xs[j] ? i++ : j++);
    while (i < m)
        swap(xs, w++, i++);
    while (j < n)
        swap(xs, w++, j++);
}

It takes the array xs; the two sorted sub-arrays are represented as the ranges [i, m) and [j, n) respectively, and the working area starts at w. Compared with the standard merge algorithm given in most textbooks, this one exchanges the contents of the sorted sub-arrays with the working area. As a result, the working area ends up holding the merged, sorted elements, while the elements previously stored in the working area are moved into the two sub-arrays.

However, there are two constraints that must be satisfied:

  1. The work area must lie within the bounds of the array. In other words, it must be big enough to hold the elements exchanged into it without causing any out-of-bounds access.
  2. The work area may overlap with either of the two sorted sub-arrays; however, the merge order must ensure that no unmerged element is overwritten.

With this merge function defined, it’s easy to imagine a solution that sorts half of the array. The next question is how to deal with the rest, the unsorted part stored in the work area, as shown below:

... unsorted 1/2 array ... | ... sorted 1/2 array ...

One intuitive idea is to recursively sort half of the working area, so that only 1/4 of the elements have not yet been sorted:

... unsorted 1/4 array ... | sorted 1/4 array B | sorted 1/2 array A ...

The key point at this stage is that we must merge the sorted 1/4 elements B with the sorted 1/2 elements A sooner or later.

Is the remaining working area, which holds only 1/4 of the elements, big enough to merge A and B? Unfortunately, it isn’t.

However, the second constraint mentioned above gives us a hint: we can exploit it by arranging the working area to overlap with either sub-array, as long as we can ensure a merging sequence in which the unmerged elements won’t be overwritten.

Actually, instead of sorting the second half of the working area, we can sort the first half, and put the working area between the two sorted arrays like this:

... sorted 1/4 array B | unsorted work area | ... sorted 1/2 array A ...

This setup effectively arranges the work area to overlap with sub-array A. This idea is proposed in [Jyrki Katajainen, Tomi Pasanen, Jukka Teuhola. ``Practical in-place mergesort''. Nordic Journal of Computing, 1996].

So the only thing left is to repeat the above step, which reduces the working area from 1/2 to 1/4, 1/8, … of the array. When the working area becomes small enough (for example, only two elements are left), we can switch to a trivial insertion sort to finish the algorithm.

Here is the implementation in ANSI C based on this paper.

/* Key is the element type of the array, e.g. typedef int Key; */
void imsort(Key* xs, int l, int u);

void swap(Key* xs, int i, int j) {
    Key tmp = xs[i]; xs[i] = xs[j]; xs[j] = tmp;
}

/*
 * Sort xs[l, u), and put the result into the working area starting at w.
 * Constraint: the working area must have length u - l.
 */
void wsort(Key* xs, int l, int u, int w) {
    int m;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        imsort(xs, l, m);
        imsort(xs, m, u);
        wmerge(xs, l, m, m, u, w);
    }
    else
        while (l < u)
            swap(xs, l++, w++);
}

void imsort(Key* xs, int l, int u) {
    int m, n, w;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        w = l + u - m;
        wsort(xs, l, m, w); /* the last half contains sorted elements */
        while (w - l > 2) {
            n = w;
            w = l + (n - l + 1) / 2;
            wsort(xs, w, n, l);  /* the first half of the previous working area contains sorted elements */
            wmerge(xs, l, l + n - w, n, u, w);
        }
        for (n = w; n > l; --n) /*switch to insertion sort*/
            for (m = n; m < u && xs[m] < xs[m-1]; ++m)
                swap(xs, m, m - 1);
    }
}

Here, wmerge is the function defined previously.

The full source code can be found here, and the detailed explanation can be found here.

By the way, this version isn’t the fastest merge sort, because it needs more swap operations. According to my test, it’s faster than the standard version, which allocates extra space in every recursion, but slower than the optimized version, which doubles the original array in advance and uses it for all further merging.

#81 Best answer 2 of How to sort in-place using the merge sort algorithm? (Score: 59)

Created: 2010-04-03 Last updated: 2019-01-24

This paper describes a couple of variants of in-place merge sort, including its “big result” (PDF):

http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.5514&rep=rep1&type=pdf

In-place sorting with fewer moves

Jyrki Katajainen, Tomi A. Pasanen

It is shown that an array of n elements can be sorted using O(1) extra space, O(n log n / log log n) element moves, and n log₂ n + O(n log log n) comparisons. This is the first in-place sorting algorithm requiring o(n log n) moves in the worst case while guaranteeing O(n log n) comparisons, but due to the constant factors involved the algorithm is predominantly of theoretical interest.

I think this is relevant too. I have a printout of it lying around, passed on to me by a colleague, but I haven’t read it. It seems to cover basic theory, but I’m not familiar enough with the topic to judge how comprehensively:

http://comjnl.oxfordjournals.org/cgi/content/abstract/38/8/681

Optimal Stable Merging

Antonios Symvonis

This paper shows how to stably merge two sequences A and B of sizes m and n, m ≤ n, respectively, with O(m+n) assignments, O(mlog(n/m+1)) comparisons and using only a constant amount of additional space. This result matches all known lower bounds…

See also original question in stackoverflow

#82: Finding all possible combinations of numbers to reach a given sum (Score: 259)

Created: 2011-01-08 Last updated: 2020-06-07

Tags: algorithm, search, language-agnostic, combinations, subset-sum

How would you go about testing all possible combinations of additions from a given set N of numbers so they add up to a given final number?

A brief example:

  • Set of numbers to add: N = {1,5,22,15,0,...}
  • Desired result: 12345

#82 Best answer 1 of Finding all possible combinations of numbers to reach a given sum (Score: 270)

Created: 2011-01-08 Last updated: 2016-11-11

This problem can be solved by recursively building combinations of all possible sums and filtering out those that reach the target. Here is the algorithm in Python:

def subset_sum(numbers, target, partial=[]):
    s = sum(partial)

    # check if the partial sum equals the target
    if s == target:
        print("sum(%s)=%s" % (partial, target))
    if s >= target:
        return  # if we reach the number why bother to continue

    for i in range(len(numbers)):
        n = numbers[i]
        remaining = numbers[i+1:]
        subset_sum(remaining, target, partial + [n])


if __name__ == "__main__":
    subset_sum([3,9,8,4,5,7,10],15)

    # Outputs:
    # sum([3, 8, 4])=15
    # sum([3, 5, 7])=15
    # sum([8, 7])=15
    # sum([5, 10])=15

This type of algorithm is very well explained in the following Stanford Abstract Programming lecture - this video is highly recommended for understanding how recursion is used to generate permutations of solutions.

Edit

The above as a generator function, making it a bit more useful. Requires Python 3.3+ because of yield from.

def subset_sum(numbers, target, partial=[], partial_sum=0):
    if partial_sum == target:
        yield partial
    if partial_sum >= target:
        return
    for i, n in enumerate(numbers):
        remaining = numbers[i + 1:]
        yield from subset_sum(remaining, target, partial + [n], partial_sum + n)
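
For example, the generator version can be collected into a list or consumed lazily; a small usage sketch (not part of the original answer):

if __name__ == "__main__":
    # collect every combination that sums to 15
    solutions = list(subset_sum([3, 9, 8, 4, 5, 7, 10], 15))
    print(solutions)  # [[3, 8, 4], [3, 5, 7], [8, 7], [5, 10]]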

Here is the Java version of the same algorithm:

package tmp;

import java.util.ArrayList;
import java.util.Arrays;

class SumSet {
    static void sum_up_recursive(ArrayList<Integer> numbers, int target, ArrayList<Integer> partial) {
       int s = 0;
       for (int x: partial) s += x;
       if (s == target)
            System.out.println("sum("+Arrays.toString(partial.toArray())+")="+target);
       if (s >= target)
            return;
       for(int i=0;i<numbers.size();i++) {
             ArrayList<Integer> remaining = new ArrayList<Integer>();
             int n = numbers.get(i);
             for (int j=i+1; j<numbers.size();j++) remaining.add(numbers.get(j));
             ArrayList<Integer> partial_rec = new ArrayList<Integer>(partial);
             partial_rec.add(n);
             sum_up_recursive(remaining,target,partial_rec);
       }
    }
    static void sum_up(ArrayList<Integer> numbers, int target) {
        sum_up_recursive(numbers,target,new ArrayList<Integer>());
    }
    public static void main(String args[]) {
        Integer[] numbers = {3,9,8,4,5,7,10};
        int target = 15;
        sum_up(new ArrayList<Integer>(Arrays.asList(numbers)),target);
    }
}

It is exactly the same heuristic. My Java is a bit rusty, but I think it is easy to understand.

C# conversion of Java solution: (by @JeremyThompson)

public static void Main(string[] args)
{
    List<int> numbers = new List<int>() { 3, 9, 8, 4, 5, 7, 10 };
    int target = 15;
    sum_up(numbers, target);
}

private static void sum_up(List<int> numbers, int target)
{
    sum_up_recursive(numbers, target, new List<int>());
}

private static void sum_up_recursive(List<int> numbers, int target, List<int> partial)
{
    int s = 0;
    foreach (int x in partial) s += x;

    if (s == target)
        Console.WriteLine("sum(" + string.Join(",", partial.ToArray()) + ")=" + target);

    if (s >= target)
        return;

    for (int i = 0; i < numbers.Count; i++)
    {
        List<int> remaining = new List<int>();
        int n = numbers[i];
        for (int j = i + 1; j < numbers.Count; j++) remaining.Add(numbers[j]);

        List<int> partial_rec = new List<int>(partial);
        partial_rec.Add(n);
        sum_up_recursive(remaining, target, partial_rec);
    }
}

Ruby solution: (by @emaillenin)

def subset_sum(numbers, target, partial=[])
  s = partial.inject 0, :+
# check if the partial sum is equals to target

  puts "sum(#{partial})=#{target}" if s == target

  return if s >= target # if we reach the number why bother to continue

  (0..(numbers.length - 1)).each do |i|
    n = numbers[i]
    remaining = numbers.drop(i+1)
    subset_sum(remaining, target, partial + [n])
  end
end

subset_sum([3,9,8,4,5,7,10],15)

Edit: complexity discussion

As others mention, this is an NP-hard problem. It can be solved in exponential time O(2^n); for instance, for n = 10 there will be 1024 candidate subsets to examine. If the targets you are trying to reach are in a low range, then this algorithm works. So for instance:

subset_sum([1,2,3,4,5,6,7,8,9,10],100000) generates 1024 branches because the target never gets to filter out possible solutions.

On the other hand subset_sum([1,2,3,4,5,6,7,8,9,10],10) generates only 175 branches, because the target to reach 10 gets to filter out many combinations.

If N and Target are big numbers, one should move to an approximate version of the solution.

#82 Best answer 2 of Finding all possible combinations of numbers to reach a given sum (Score: 37)

Created: 2012-11-20 Last updated: 2017-08-23

The solution to this problem has been given a million times on the Internet. The problem is called the coin change problem. One can find solutions at http://rosettacode.org/wiki/Count_the_coins and a mathematical model of it at http://jaqm.ro/issues/volume-5,issue-2/pdfs/patterson_harmel.pdf (or Google “coin change problem”).

By the way, the Scala solution by Tsagadai is interesting. That example produces either 1 or 0. As a side effect, it lists all possible solutions on the console. It displays the solution, but fails to make it usable in any way.

To be as useful as possible, the code should return a List[List[Int]] in order to allow getting the number of solutions (the length of the list of lists), the “best” solution (the shortest list), or all the possible solutions.

Here is an example. It is very inefficient, but it is easy to understand.

object Sum extends App {

  def sumCombinations(total: Int, numbers: List[Int]): List[List[Int]] = {
    
    def add(x: (Int, List[List[Int]]), y: (Int, List[List[Int]])): (Int, List[List[Int]]) = {
      (x._1 + y._1, x._2 ::: y._2)
    }
    
    def sumCombinations(resultAcc: List[List[Int]], sumAcc: List[Int], total: Int, numbers: List[Int]): (Int, List[List[Int]]) = {
      if (numbers.isEmpty || total < 0) {
        (0, resultAcc)
      } else if (total == 0) {
        (1, sumAcc :: resultAcc)
      } else {
        add(sumCombinations(resultAcc, sumAcc, total, numbers.tail), sumCombinations(resultAcc, numbers.head :: sumAcc, total - numbers.head, numbers))
      }
    }
    
    sumCombinations(Nil, Nil, total, numbers.sortWith(_ > _))._2
  }
  
  println(sumCombinations(15, List(1, 2, 5, 10)) mkString "\n")
}

When run, it displays:

List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2)
List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2)
List(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2)
List(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
List(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
List(1, 1, 1, 2, 2, 2, 2, 2, 2)
List(1, 2, 2, 2, 2, 2, 2, 2)
List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5)
List(1, 1, 1, 1, 1, 1, 1, 1, 2, 5)
List(1, 1, 1, 1, 1, 1, 2, 2, 5)
List(1, 1, 1, 1, 2, 2, 2, 5)
List(1, 1, 2, 2, 2, 2, 5)
List(2, 2, 2, 2, 2, 5)
List(1, 1, 1, 1, 1, 5, 5)
List(1, 1, 1, 2, 5, 5)
List(1, 2, 2, 5, 5)
List(5, 5, 5)
List(1, 1, 1, 1, 1, 10)
List(1, 1, 1, 2, 10)
List(1, 2, 2, 10)
List(5, 10)

The sumCombinations() function may be used by itself, and the result may be further analyzed to display the “best” solution (the shortest list), or the number of solutions (the number of lists).

Note that even like this, the requirements may not be fully satisfied. It might happen that the order of elements within each solution is significant. In such a case, each list would have to be duplicated as many times as there are permutations of its elements. Or we might be interested only in the combinations that are different.

For example, we might consider that List(5, 10) should give two combinations: List(5, 10) and List(10, 5). For List(5, 5, 5) it could give three combinations or one only, depending on the requirements. For integers, the three permutations are equivalent, but if we are dealing with coins, like in the “coin changing problem”, they are not.

Also not stated in the requirements is the question of whether each number (or coin) may be used only once or many times. We could (and we should!) generalize the problem to a list of lists of occurrences of each number. This translates in real life into “what are the possible ways to make a certain amount of money with a set of coins (and not a set of coin values)”. The original problem is just a particular case of this one, where we have as many occurrences of each coin as needed to make the total amount with each single coin value.

See also original question in stackoverflow

#83: Mapping two integers to one, in a unique and deterministic way (Score: 253)

Created: 2009-05-28 Last updated: 2009-05-28

Tags: algorithm, mapping, integer, deterministic, math

Imagine two positive integers A and B. I want to combine these two into a single integer C.

There can be no other integers D and E which combine to C. So combining them with the addition operator doesn’t work, e.g. 30 + 10 = 40 = 40 + 0 = 39 + 1. Neither does concatenation work, e.g. “31” + “2” = “312” = “3” + “12”.

This combination operation should also be deterministic (always yield the same result with the same inputs) and should always yield an integer on either the positive or the negative side of integers.

#83 Best answer 1 of Mapping two integers to one, in a unique and deterministic way (Score: 241)

Created: 2009-05-28 Last updated: 2017-08-09

You’re looking for a bijective NxN -> N mapping. These are used for e.g. dovetailing. Have a look at this PDF for an introduction to so-called pairing functions. Wikipedia introduces a specific pairing function, namely the Cantor pairing function:

pi(k1, k2) = 1/2(k1 + k2)(k1 + k2 + 1) + k2
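
As a quick illustration (a minimal Python sketch of my own, not from the answer), the Cantor pairing function and its standard inverse look like this:

import math

def cantor_pair(k1, k2):
    # pi(k1, k2) = (k1 + k2)(k1 + k2 + 1)/2 + k2, for k1, k2 >= 0
    return (k1 + k2) * (k1 + k2 + 1) // 2 + k2

def cantor_unpair(z):
    # recover the diagonal index w first, then k2 and k1
    w = (math.isqrt(8 * z + 1) - 1) // 2
    t = w * (w + 1) // 2
    k2 = z - t
    k1 = w - k2
    return k1, k2

assert cantor_pair(47, 32) == 3192
assert cantor_unpair(3192) == (47, 32)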

Three remarks:

  • As others have made clear, if you plan to implement a pairing function, you may soon find you need arbitrarily large integers (bignums).
  • If you don’t want to make a distinction between the pairs (a, b) and (b, a), then sort a and b before applying the pairing function.
  • Actually I lied. You are looking for a bijective ZxZ -> N mapping. Cantor’s function only works on non-negative numbers. This is not a problem however, because it’s easy to define a bijection f : Z -> N, like so:
  • f(n) = n * 2 if n >= 0
  • f(n) = -n * 2 - 1 if n < 0

#83 Best answer 2 of Mapping two integers to one, in a unique and deterministic way (Score: 240)

Created: 2012-12-14 Last updated: 2012-12-25

The Cantor pairing function is really one of the better ones out there considering it’s simple, fast and space efficient, but there is something even better, published at Wolfram by Matthew Szudzik, here. The (relative) limitation of the Cantor pairing function is that the range of encoded results doesn’t always stay within the limits of a 2N-bit integer when the inputs are two N-bit integers. That is, if my inputs are two 16-bit integers ranging from 0 to 2^16 - 1, then there are 2^16 * 2^16 = 2^32 possible input combinations, so by the obvious pigeonhole principle we need an output range of size at least 2^32; in other words, a mapping onto 32-bit numbers should ideally be feasible. This is not of merely theoretical importance in the programming world.

Cantor pairing function:

(a + b) * (a + b + 1) / 2 + a; where a, b >= 0

The mapping for the two largest 16-bit unsigned integers (65535, 65535) will be 8589803520, which, as you can see, cannot fit into 32 bits.

Enter Szudzik’s function:

a >= b ? a * a + a + b : a + b * b;  where a, b >= 0

The mapping for (65535, 65535) will now be 4294967295, which, as you can see, is a 32-bit (0 to 2^32 - 1) integer. This is where this solution is ideal: it simply utilizes every single point in that space, so nothing can be more space efficient.


Now, considering the fact that we typically deal with signed implementations of numbers of various sizes in languages/frameworks, let’s consider signed 16-bit integers ranging from -(2^15) to 2^15 - 1 (later we’ll see how to extend even the output to span the signed range). Since a and b have to be non-negative, they range from 0 to 2^15 - 1.

Cantor pairing function:

The mapping for the two largest 16-bit signed integers (32767, 32767) will be 2147418112, which is just short of the maximum value for a signed 32-bit integer.

Now Szudzik’s function:

(32767, 32767) => 1073741823, much smaller..

Let’s account for negative integers. That’s beyond the original question I know, but just elaborating to help future visitors.

Cantor pairing function:

A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
(A + B) * (A + B + 1) / 2 + A;

(-32768, -32768) => 8589803520, which needs an Int64. A 64-bit output for 16-bit inputs may be unpardonable!!

Szudzik’s function:

A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
A >= B ? A * A + A + B : A + B * B;

(-32768, -32768) => 4294967295, which fits in 32 bits as an unsigned value (or needs 64 bits if the result must be signed), but is still better.

So far the output has always been positive. In the signed world, it is even more space-saving if we can transfer half of the output to the negative axis. You could do it like this for Szudzik’s:

A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
C = (A >= B ? A * A + A + B : A + B * B) / 2;
a < 0 && b < 0 || a >= 0 && b >= 0 ? C : -C - 1;

(-32768, 32767) => -2147483648

(32767, -32768) => -2147450880

(0, 0) => 0	

(32767, 32767) => 2147418112

(-32768, -32768) => 2147483647

What I do: after applying a weight of 2 to the inputs and going through the function, I then divide the output by two and take some of the results to the negative axis by multiplying by -1.

See the results: for any input in the range of a signed 16-bit number, the output lies within the limits of a signed 32-bit integer, which is cool. I’m not sure how to do the same for the Cantor pairing function, but I didn’t try as hard since it’s not as efficient. Furthermore, the extra calculations involved in the Cantor pairing function mean it’s slower too.

Here is a C# implementation.

public static long PerfectlyHashThem(int a, int b)
{
    var A = (ulong)(a >= 0 ? 2 * (long)a : -2 * (long)a - 1);
    var B = (ulong)(b >= 0 ? 2 * (long)b : -2 * (long)b - 1);
    var C = (long)((A >= B ? A * A + A + B : A + B * B) / 2);
    return a < 0 && b < 0 || a >= 0 && b >= 0 ? C : -C - 1;
}

public static int PerfectlyHashThem(short a, short b)
{
	var A = (uint)(a >= 0 ? 2 * a : -2 * a - 1);
	var B = (uint)(b >= 0 ? 2 * b : -2 * b - 1);
	var C = (int)((A >= B ? A * A + A + B : A + B * B) / 2);
	return a < 0 && b < 0 || a >= 0 && b >= 0 ? C : -C - 1;
}

Since the intermediate calculations can exceed the limits of a 2N-bit signed integer, I have used a 4N-bit integer type (the final division by 2 brings the result back into 2N bits).

The link I have provided on the alternate solution nicely depicts a graph of the function utilizing every single point in the space. It’s amazing to see that you can uniquely and reversibly encode a pair of coordinates into a single number! Magic world of numbers!!
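
For completeness, here is a minimal Python sketch (my own addition, not from the answer) of Szudzik’s function and its inverse for non-negative inputs, showing that the encoding really is reversible:

import math

def szudzik_pair(a, b):
    # a, b >= 0
    return a * a + a + b if a >= b else a + b * b

def szudzik_unpair(z):
    s = math.isqrt(z)             # s = max(a, b)
    if z - s * s < s:
        return z - s * s, s       # came from z = a + b*b with a < b
    return s, z - s * s - s       # came from z = a*a + a + b with b <= a

assert szudzik_pair(65535, 65535) == 4294967295
assert szudzik_unpair(4294967295) == (65535, 65535)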

See also original question in stackoverflow

#84: Efficient Algorithm for Bit Reversal (from MSB->LSB to LSB->MSB) in C (Score: 250)

Created: 2009-04-14 Last updated: 2020-01-29

Tags: c, algorithm, bit-manipulation

What is the most efficient algorithm to achieve the following:

0010 0000 => 0000 0100

The conversion is from MSB->LSB to LSB->MSB. All bits must be reversed; that is, this is not endianness-swapping.

#84 Best answer 1 of Efficient Algorithm for Bit Reversal (from MSB->LSB to LSB->MSB) in C (Score: 506)

Created: 2009-04-14 Last updated: 2018-10-01

NOTE: All algorithms below are in C, but should be portable to your language of choice (just don’t look at me when they’re not as fast :)

Options

Low Memory (32-bit int, 32-bit machine) (from here):

unsigned int
reverse(register unsigned int x)
{
	x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1));
	x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2));
	x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4));
	x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8));
	return((x >> 16) | (x << 16));

}

From the famous Bit Twiddling Hacks page:

Fastest (lookup table):

static const unsigned char BitReverseTable256[] = 
{
  0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0, 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0, 
  0x08, 0x88, 0x48, 0xC8, 0x28, 0xA8, 0x68, 0xE8, 0x18, 0x98, 0x58, 0xD8, 0x38, 0xB8, 0x78, 0xF8, 
  0x04, 0x84, 0x44, 0xC4, 0x24, 0xA4, 0x64, 0xE4, 0x14, 0x94, 0x54, 0xD4, 0x34, 0xB4, 0x74, 0xF4, 
  0x0C, 0x8C, 0x4C, 0xCC, 0x2C, 0xAC, 0x6C, 0xEC, 0x1C, 0x9C, 0x5C, 0xDC, 0x3C, 0xBC, 0x7C, 0xFC, 
  0x02, 0x82, 0x42, 0xC2, 0x22, 0xA2, 0x62, 0xE2, 0x12, 0x92, 0x52, 0xD2, 0x32, 0xB2, 0x72, 0xF2, 
  0x0A, 0x8A, 0x4A, 0xCA, 0x2A, 0xAA, 0x6A, 0xEA, 0x1A, 0x9A, 0x5A, 0xDA, 0x3A, 0xBA, 0x7A, 0xFA,
  0x06, 0x86, 0x46, 0xC6, 0x26, 0xA6, 0x66, 0xE6, 0x16, 0x96, 0x56, 0xD6, 0x36, 0xB6, 0x76, 0xF6, 
  0x0E, 0x8E, 0x4E, 0xCE, 0x2E, 0xAE, 0x6E, 0xEE, 0x1E, 0x9E, 0x5E, 0xDE, 0x3E, 0xBE, 0x7E, 0xFE,
  0x01, 0x81, 0x41, 0xC1, 0x21, 0xA1, 0x61, 0xE1, 0x11, 0x91, 0x51, 0xD1, 0x31, 0xB1, 0x71, 0xF1,
  0x09, 0x89, 0x49, 0xC9, 0x29, 0xA9, 0x69, 0xE9, 0x19, 0x99, 0x59, 0xD9, 0x39, 0xB9, 0x79, 0xF9, 
  0x05, 0x85, 0x45, 0xC5, 0x25, 0xA5, 0x65, 0xE5, 0x15, 0x95, 0x55, 0xD5, 0x35, 0xB5, 0x75, 0xF5,
  0x0D, 0x8D, 0x4D, 0xCD, 0x2D, 0xAD, 0x6D, 0xED, 0x1D, 0x9D, 0x5D, 0xDD, 0x3D, 0xBD, 0x7D, 0xFD,
  0x03, 0x83, 0x43, 0xC3, 0x23, 0xA3, 0x63, 0xE3, 0x13, 0x93, 0x53, 0xD3, 0x33, 0xB3, 0x73, 0xF3, 
  0x0B, 0x8B, 0x4B, 0xCB, 0x2B, 0xAB, 0x6B, 0xEB, 0x1B, 0x9B, 0x5B, 0xDB, 0x3B, 0xBB, 0x7B, 0xFB,
  0x07, 0x87, 0x47, 0xC7, 0x27, 0xA7, 0x67, 0xE7, 0x17, 0x97, 0x57, 0xD7, 0x37, 0xB7, 0x77, 0xF7, 
  0x0F, 0x8F, 0x4F, 0xCF, 0x2F, 0xAF, 0x6F, 0xEF, 0x1F, 0x9F, 0x5F, 0xDF, 0x3F, 0xBF, 0x7F, 0xFF
};

unsigned int v; // reverse 32-bit value, 8 bits at time
unsigned int c; // c will get v reversed

// Option 1:
c = (BitReverseTable256[v & 0xff] << 24) | 
    (BitReverseTable256[(v >> 8) & 0xff] << 16) | 
    (BitReverseTable256[(v >> 16) & 0xff] << 8) |
    (BitReverseTable256[(v >> 24) & 0xff]);

// Option 2:
unsigned char * p = (unsigned char *) &v;
unsigned char * q = (unsigned char *) &c;
q[3] = BitReverseTable256[p[0]]; 
q[2] = BitReverseTable256[p[1]]; 
q[1] = BitReverseTable256[p[2]]; 
q[0] = BitReverseTable256[p[3]];

You can extend this idea to 64-bit ints, or trade off memory for speed (assuming your L1 Data Cache is large enough), and reverse 16 bits at a time with a 64K-entry lookup table.
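
As a rough sketch of that trade-off (a Python illustration of mine, not from the answer): build the 64K-entry table once, then reverse a 32-bit value 16 bits at a time:

def _reverse16(x):
    # bit-by-bit reversal of a 16-bit value, used only to build the table
    r = 0
    for _ in range(16):
        r = (r << 1) | (x & 1)
        x >>= 1
    return r

REVERSE16 = [_reverse16(i) for i in range(1 << 16)]  # 64K-entry lookup table

def reverse32(v):
    # reverse each 16-bit half via the table, then swap the halves
    return (REVERSE16[v & 0xFFFF] << 16) | REVERSE16[(v >> 16) & 0xFFFF]

assert reverse32(0x00000001) == 0x80000000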


Others

Simple

unsigned int v;     // input bits to be reversed
unsigned int r = v & 1; // r will be reversed bits of v; first get LSB of v
int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end

for (v >>= 1; v; v >>= 1)
{   
  r <<= 1;
  r |= v & 1;
  s--;
}
r <<= s; // shift when v's highest bits are zero

Faster (32-bit processor)

unsigned char b = x;
b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16; 

Faster (64-bit processor)

unsigned char b; // reverse this (8-bit) byte
b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;

If you want to do this on a 32-bit int, just reverse the bits in each byte, and reverse the order of the bytes. That is:

unsigned int toReverse;
unsigned int reversed;
unsigned char inByte0 = (toReverse & 0xFF);
unsigned char inByte1 = (toReverse & 0xFF00) >> 8;
unsigned char inByte2 = (toReverse & 0xFF0000) >> 16;
unsigned char inByte3 = (toReverse & 0xFF000000) >> 24;
reversed = (reverseBits(inByte0) << 24) | (reverseBits(inByte1) << 16) | (reverseBits(inByte2) << 8) | reverseBits(inByte3);

Results

I benchmarked the two most promising solutions, the lookup table, and bitwise-AND (the first one). The test machine is a laptop w/ 4GB of DDR2-800 and a Core 2 Duo T7500 @ 2.4GHz, 4MB L2 Cache; YMMV. I used gcc 4.3.2 on 64-bit Linux. OpenMP (and the GCC bindings) were used for high-resolution timers.

reverse.c

#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

unsigned int
reverse(register unsigned int x)
{
    x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1));
    x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2));
    x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4));
    x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8));
    return((x >> 16) | (x << 16));

}

int main()
{
    unsigned int *ints = malloc(100000000*sizeof(unsigned int));
    unsigned int *ints2 = malloc(100000000*sizeof(unsigned int));
    for(unsigned int i = 0; i < 100000000; i++)
      ints[i] = rand();
    
    unsigned int *inptr = ints;
    unsigned int *outptr = ints2;
    unsigned int *endptr = ints + 100000000;
    // Starting the time measurement
    double start = omp_get_wtime();
    // Computations to be measured
    while(inptr != endptr)
    {
      (*outptr) = reverse(*inptr);
      inptr++;
      outptr++;
    }
    // Measuring the elapsed time
    double end = omp_get_wtime();
    // Time calculation (in seconds)
    printf("Time: %f seconds\n", end-start);
    
    free(ints);
    free(ints2);
    
    return 0;
}

reverse_lookup.c

#include <stdlib.h>
#include <stdio.h>
#include <omp.h>

static const unsigned char BitReverseTable256[] = 
{
  0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0, 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0, 
  0x08, 0x88, 0x48, 0xC8, 0x28, 0xA8, 0x68, 0xE8, 0x18, 0x98, 0x58, 0xD8, 0x38, 0xB8, 0x78, 0xF8, 
  0x04, 0x84, 0x44, 0xC4, 0x24, 0xA4, 0x64, 0xE4, 0x14, 0x94, 0x54, 0xD4, 0x34, 0xB4, 0x74, 0xF4, 
  0x0C, 0x8C, 0x4C, 0xCC, 0x2C, 0xAC, 0x6C, 0xEC, 0x1C, 0x9C, 0x5C, 0xDC, 0x3C, 0xBC, 0x7C, 0xFC, 
  0x02, 0x82, 0x42, 0xC2, 0x22, 0xA2, 0x62, 0xE2, 0x12, 0x92, 0x52, 0xD2, 0x32, 0xB2, 0x72, 0xF2, 
  0x0A, 0x8A, 0x4A, 0xCA, 0x2A, 0xAA, 0x6A, 0xEA, 0x1A, 0x9A, 0x5A, 0xDA, 0x3A, 0xBA, 0x7A, 0xFA,
  0x06, 0x86, 0x46, 0xC6, 0x26, 0xA6, 0x66, 0xE6, 0x16, 0x96, 0x56, 0xD6, 0x36, 0xB6, 0x76, 0xF6, 
  0x0E, 0x8E, 0x4E, 0xCE, 0x2E, 0xAE, 0x6E, 0xEE, 0x1E, 0x9E, 0x5E, 0xDE, 0x3E, 0xBE, 0x7E, 0xFE,
  0x01, 0x81, 0x41, 0xC1, 0x21, 0xA1, 0x61, 0xE1, 0x11, 0x91, 0x51, 0xD1, 0x31, 0xB1, 0x71, 0xF1,
  0x09, 0x89, 0x49, 0xC9, 0x29, 0xA9, 0x69, 0xE9, 0x19, 0x99, 0x59, 0xD9, 0x39, 0xB9, 0x79, 0xF9, 
  0x05, 0x85, 0x45, 0xC5, 0x25, 0xA5, 0x65, 0xE5, 0x15, 0x95, 0x55, 0xD5, 0x35, 0xB5, 0x75, 0xF5,
  0x0D, 0x8D, 0x4D, 0xCD, 0x2D, 0xAD, 0x6D, 0xED, 0x1D, 0x9D, 0x5D, 0xDD, 0x3D, 0xBD, 0x7D, 0xFD,
  0x03, 0x83, 0x43, 0xC3, 0x23, 0xA3, 0x63, 0xE3, 0x13, 0x93, 0x53, 0xD3, 0x33, 0xB3, 0x73, 0xF3, 
  0x0B, 0x8B, 0x4B, 0xCB, 0x2B, 0xAB, 0x6B, 0xEB, 0x1B, 0x9B, 0x5B, 0xDB, 0x3B, 0xBB, 0x7B, 0xFB,
  0x07, 0x87, 0x47, 0xC7, 0x27, 0xA7, 0x67, 0xE7, 0x17, 0x97, 0x57, 0xD7, 0x37, 0xB7, 0x77, 0xF7, 
  0x0F, 0x8F, 0x4F, 0xCF, 0x2F, 0xAF, 0x6F, 0xEF, 0x1F, 0x9F, 0x5F, 0xDF, 0x3F, 0xBF, 0x7F, 0xFF
};

int main()
{
    unsigned int *ints = malloc(100000000*sizeof(unsigned int));
    unsigned int *ints2 = malloc(100000000*sizeof(unsigned int));
    for(unsigned int i = 0; i < 100000000; i++)
      ints[i] = rand();
    
    unsigned int *inptr = ints;
    unsigned int *outptr = ints2;
    unsigned int *endptr = ints + 100000000;
    // Starting the time measurement
    double start = omp_get_wtime();
    // Computations to be measured
    while(inptr != endptr)
    {
    unsigned int in = *inptr;  
    
    // Option 1:
    //*outptr = (BitReverseTable256[in & 0xff] << 24) | 
    //    (BitReverseTable256[(in >> 8) & 0xff] << 16) | 
    //    (BitReverseTable256[(in >> 16) & 0xff] << 8) |
    //    (BitReverseTable256[(in >> 24) & 0xff]);
    
    // Option 2:
    unsigned char * p = (unsigned char *) &(*inptr);
    unsigned char * q = (unsigned char *) &(*outptr);
    q[3] = BitReverseTable256[p[0]]; 
    q[2] = BitReverseTable256[p[1]]; 
    q[1] = BitReverseTable256[p[2]]; 
    q[0] = BitReverseTable256[p[3]];
    
      inptr++;
      outptr++;
    }
    // Measuring the elapsed time
    double end = omp_get_wtime();
    // Time calculation (in seconds)
    printf("Time: %f seconds\n", end-start);
    
    free(ints);
    free(ints2);
    
    return 0;
}

I tried both approaches at several different optimizations, ran 3 trials at each level, and each trial reversed 100 million random unsigned ints. For the lookup table option, I tried both schemes (options 1 and 2) given on the bitwise hacks page. Results are shown below.

Bitwise AND

$ gcc -fopenmp -std=c99 -o reverse reverse.c
$ ./reverse
Time: 2.000593 seconds
$ ./reverse
Time: 1.938893 seconds
$ ./reverse
Time: 1.936365 seconds
$ gcc -fopenmp -std=c99 -O2 -o reverse reverse.c
$ ./reverse
Time: 0.942709 seconds
$ ./reverse
Time: 0.991104 seconds
$ ./reverse
Time: 0.947203 seconds
$ gcc -fopenmp -std=c99 -O3 -o reverse reverse.c
$ ./reverse
Time: 0.922639 seconds
$ ./reverse
Time: 0.892372 seconds
$ ./reverse
Time: 0.891688 seconds

Lookup Table (option 1)

$ gcc -fopenmp -std=c99 -o reverse_lookup reverse_lookup.c
$ ./reverse_lookup
Time: 1.201127 seconds
$ ./reverse_lookup
Time: 1.196129 seconds
$ ./reverse_lookup
Time: 1.235972 seconds
$ gcc -fopenmp -std=c99 -O2 -o reverse_lookup reverse_lookup.c
$ ./reverse_lookup
Time: 0.633042 seconds
$ ./reverse_lookup
Time: 0.655880 seconds
$ ./reverse_lookup
Time: 0.633390 seconds
$ gcc -fopenmp -std=c99 -O3 -o reverse_lookup reverse_lookup.c
$ ./reverse_lookup
Time: 0.652322 seconds
$ ./reverse_lookup
Time: 0.631739 seconds
$ ./reverse_lookup
Time: 0.652431 seconds

Lookup Table (option 2)

$ gcc -fopenmp -std=c99 -o reverse_lookup reverse_lookup.c
$ ./reverse_lookup
Time: 1.671537 seconds
$ ./reverse_lookup
Time: 1.688173 seconds
$ ./reverse_lookup
Time: 1.664662 seconds
$ gcc -fopenmp -std=c99 -O2 -o reverse_lookup reverse_lookup.c
$ ./reverse_lookup
Time: 1.049851 seconds
$ ./reverse_lookup
Time: 1.048403 seconds
$ ./reverse_lookup
Time: 1.085086 seconds
$ gcc -fopenmp -std=c99 -O3 -o reverse_lookup reverse_lookup.c
$ ./reverse_lookup
Time: 1.082223 seconds
$ ./reverse_lookup
Time: 1.053431 seconds
$ ./reverse_lookup
Time: 1.081224 seconds

Conclusion

Use the lookup table, with option 1 (byte addressing is unsurprisingly slow) if you’re concerned about performance. If you need to squeeze every last byte of memory out of your system (and you might, if you care about the performance of bit reversal), the optimized versions of the bitwise-AND approach aren’t too shabby either.

Caveat

Yes, I know the benchmark code is a complete hack. Suggestions on how to improve it are more than welcome. Things I know about:

  • I don’t have access to ICC. This may be faster (please respond in a comment if you can test this out).
  • A 64K lookup table may do well on some modern microarchitectures with large L1D.
  • -mtune=native didn’t work for -O2/-O3 (ld blew up with some crazy symbol redefinition error), so I don’t believe the generated code is tuned for my microarchitecture.
  • There may be a way to do this slightly faster with SSE. I have no idea how, but with fast replication, packed bitwise AND, and swizzling instructions, there’s got to be something there.
  • I know only enough x86 assembly to be dangerous; here’s the code GCC generated at -O3 for option 1, so somebody more knowledgeable than myself can check it out:

32-bit

.L3:
movl    (%r12,%rsi), %ecx
movzbl  %cl, %eax
movzbl  BitReverseTable256(%rax), %edx
movl    %ecx, %eax
shrl    $24, %eax
mov     %eax, %eax
movzbl  BitReverseTable256(%rax), %eax
sall    $24, %edx
orl     %eax, %edx
movzbl  %ch, %eax
shrl    $16, %ecx
movzbl  BitReverseTable256(%rax), %eax
movzbl  %cl, %ecx
sall    $16, %eax
orl     %eax, %edx
movzbl  BitReverseTable256(%rcx), %eax
sall    $8, %eax
orl     %eax, %edx
movl    %edx, (%r13,%rsi)
addq    $4, %rsi
cmpq    $400000000, %rsi
jne     .L3

EDIT: I also tried using uint64_t types on my machine to see if there was any performance boost. Performance was about 10% faster than 32-bit, and was nearly identical whether you were just using 64-bit types to reverse bits on two 32-bit int types at a time, or whether you were actually reversing bits in half as many 64-bit values. The assembly code is shown below (for the former case, reversing bits for two 32-bit int types at a time):

.L3:
movq    (%r12,%rsi), %rdx
movq    %rdx, %rax
shrq    $24, %rax
andl    $255, %eax
movzbl  BitReverseTable256(%rax), %ecx
movzbq  %dl,%rax
movzbl  BitReverseTable256(%rax), %eax
salq    $24, %rax
orq     %rax, %rcx
movq    %rdx, %rax
shrq    $56, %rax
movzbl  BitReverseTable256(%rax), %eax
salq    $32, %rax
orq     %rax, %rcx
movzbl  %dh, %eax
shrq    $16, %rdx
movzbl  BitReverseTable256(%rax), %eax
salq    $16, %rax
orq     %rax, %rcx
movzbq  %dl,%rax
shrq    $16, %rdx
movzbl  BitReverseTable256(%rax), %eax
salq    $8, %rax
orq     %rax, %rcx
movzbq  %dl,%rax
shrq    $8, %rdx
movzbl  BitReverseTable256(%rax), %eax
salq    $56, %rax
orq     %rax, %rcx
movzbq  %dl,%rax
shrq    $8, %rdx
movzbl  BitReverseTable256(%rax), %eax
andl    $255, %edx
salq    $48, %rax
orq     %rax, %rcx
movzbl  BitReverseTable256(%rdx), %eax
salq    $40, %rax
orq     %rax, %rcx
movq    %rcx, (%r13,%rsi)
addq    $8, %rsi
cmpq    $400000000, %rsi
jne     .L3

#84 Best answer 2 of Efficient Algorithm for Bit Reversal (from MSB->LSB to LSB->MSB) in C (Score: 85)

Created: 2014-06-05 Last updated: 2018-10-01

This thread caught my attention since it deals with a simple problem that requires a lot of work (CPU cycles) even for a modern CPU. And one day I also stood there with the same ¤#%"#" problem. I had to flip millions of bytes. However I know all my target systems are modern Intel-based so let’s start optimizing to the extreme!!!

So I used Matt J’s lookup code as the base. The system I’m benchmarking on is an i7 Haswell 4700EQ.

Matt J’s lookup bitflipping 400 000 000 bytes: Around 0.272 seconds.

I then went ahead and tried to see if Intel’s ISPC compiler could vectorise the arithmetic in reverse.c.

I’m not going to bore you with my findings here, since I tried a lot of things to help the compiler find stuff; anyhow, I ended up with performance of around 0.15 seconds to bitflip 400 000 000 bytes. It’s a great reduction, but for my application that’s still way, way too slow..

So people let me present the fastest Intel based bitflipper in the world. Clocked at:

Time to bitflip 400000000 bytes: 0.050082 seconds !!!!!

// Bitflip using AVX2 - The fastest Intel based bitflip in the world!!
// Made by Anders Cedronius 2014 (anders.cedronius (you know what) gmail.com)

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>

using namespace std;

#define DISPLAY_HEIGHT	4
#define DISPLAY_WIDTH 	32
#define NUM_DATA_BYTES	400000000

// Constants (first we got the mask, then the high order nibble look up table and last we got the low order nibble lookup table)
__attribute__ ((aligned(32))) static unsigned char k1[32*3]={
        0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,
        0x00,0x08,0x04,0x0c,0x02,0x0a,0x06,0x0e,0x01,0x09,0x05,0x0d,0x03,0x0b,0x07,0x0f,0x00,0x08,0x04,0x0c,0x02,0x0a,0x06,0x0e,0x01,0x09,0x05,0x0d,0x03,0x0b,0x07,0x0f,
        0x00,0x80,0x40,0xc0,0x20,0xa0,0x60,0xe0,0x10,0x90,0x50,0xd0,0x30,0xb0,0x70,0xf0,0x00,0x80,0x40,0xc0,0x20,0xa0,0x60,0xe0,0x10,0x90,0x50,0xd0,0x30,0xb0,0x70,0xf0
};

// The data to be bitflipped (+32 to avoid the quantization out of memory problem)
__attribute__ ((aligned(32))) static unsigned char data[NUM_DATA_BYTES+32]={};

extern "C" {
void bitflipbyte(unsigned char[],unsigned int,unsigned char[]);
}

int main()
{

	for(unsigned int i = 0; i < NUM_DATA_BYTES; i++)
	{
        data[i] = rand();
	}

	printf ("\r\nData in(start):\r\n");
	for (unsigned int j = 0; j < 4; j++)
	{
        for (unsigned int i = 0; i < DISPLAY_WIDTH; i++)
        {
            printf ("0x%02x,",data[i+(j*DISPLAY_WIDTH)]);
        }
        printf ("\r\n");
	}

	printf ("\r\nNumber of 32-byte chunks to convert: %d\r\n",(unsigned int)ceil(NUM_DATA_BYTES/32.0));

	double start_time = omp_get_wtime();
	bitflipbyte(data,(unsigned int)ceil(NUM_DATA_BYTES/32.0),k1);
	double end_time = omp_get_wtime();

	printf ("\r\nData out:\r\n");
	for (unsigned int j = 0; j < 4; j++)
	{
        for (unsigned int i = 0; i < DISPLAY_WIDTH; i++)
        {
            printf ("0x%02x,",data[i+(j*DISPLAY_WIDTH)]);
        }
        printf ("\r\n");
	}
	printf("\r\n\r\nTime to bitflip %d bytes: %f seconds\r\n\r\n",NUM_DATA_BYTES, end_time-start_time);

	// return with no errors
	return 0;
}

The printf’s are for debugging..

Here is the workhorse:

bits 64
global bitflipbyte

bitflipbyte:	
        vmovdqa 	ymm2, [rdx]
        add 		rdx, 20h
        vmovdqa 	ymm3, [rdx]
        add 		rdx, 20h
        vmovdqa 	ymm4, [rdx]
bitflipp_loop:
        vmovdqa 	ymm0, [rdi] 
        vpand 		ymm1, ymm2, ymm0 
        vpandn 		ymm0, ymm2, ymm0 
        vpsrld 		ymm0, ymm0, 4h 
        vpshufb 	ymm1, ymm4, ymm1 
        vpshufb 	ymm0, ymm3, ymm0		 
        vpor 		ymm0, ymm0, ymm1
        vmovdqa 	[rdi], ymm0
        add		rdi, 20h
        dec		rsi
        jnz		bitflipp_loop
        ret

The code takes 32 bytes then masks out the nibbles. The high nibble gets shifted right by 4. Then I use vpshufb and ymm4 / ymm3 as lookup tables. I could use a single lookup table but then I would have to shift left before ORing the nibbles together again.
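
For readers who don’t speak AVX2, here is a scalar Python sketch (my own illustration, not the author’s code) of the nibble idea: reverse each 4-bit nibble through a 16-entry table and swap the nibbles. This is the single-table variant the author mentions; his AVX2 version keeps two pre-shifted tables so that vpshufb needs no extra shift afterwards.

# Reverse each 4-bit nibble through a 16-entry table, then swap the two nibbles.
REV_NIBBLE = [0x0, 0x8, 0x4, 0xC, 0x2, 0xA, 0x6, 0xE,
              0x1, 0x9, 0x5, 0xD, 0x3, 0xB, 0x7, 0xF]

def flip_byte(b):
    lo = b & 0x0F          # low-order nibble
    hi = (b >> 4) & 0x0F   # high-order nibble
    # the reversed low nibble becomes the high nibble of the result, and vice versa
    return (REV_NIBBLE[lo] << 4) | REV_NIBBLE[hi]

assert flip_byte(0b00100000) == 0b00000100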

There are even faster ways of flipping the bits. But I’m bound to single thread and CPU so this was the fastest I could achieve. Can you make a faster version?

Please make no comments about using the Intel C/C++ Compiler Intrinsic Equivalent commands…

See also original question in stackoverflow

#85: How are ssl certificates verified? (Score: 248)

Created: 2008-10-09 Last updated: 2008-10-16

Tags: algorithm, security, ssl, certificate

What is the series of steps needed to securely verify an SSL certificate? My (very limited) understanding is that when you visit an https site, the server sends a certificate to the client (the browser), the browser gets the certificate’s issuer information from that certificate, then uses that to contact the issuer, and somehow compares certificates for validity.

  • How exactly is this done?
  • What about the process makes it immune to man-in-the-middle attacks?
  • What prevents some random person from setting up their own verification service to use in man-in-the-middle attacks, so everything “looks” secure?

#85 Best answer 1 of How are ssl certificates verified? (Score: 337)

Created: 2008-10-09 Last updated: 2016-11-30

Here is a very simplified explanation:

  1. Your web browser downloads the web server’s certificate, which contains the public key of the web server. This certificate is signed with the private key of a trusted certificate authority.

  2. Your web browser comes installed with the public keys of all of the major certificate authorities. It uses this public key to verify that the web server’s certificate was indeed signed by the trusted certificate authority.

  3. The certificate contains the domain name and/or IP address of the web server. Your web browser confirms that the address listed in the certificate is the one to which it has an open connection.

  4. Your web browser generates a shared symmetric key which will be used to encrypt the HTTP traffic on this connection; this is much more efficient than using public/private key encryption for everything. Your browser encrypts the symmetric key with the public key of the web server then sends it back, thus ensuring that only the web server can decrypt it, since only the web server has its private key.

Note that the certificate authority (CA) is essential to preventing man-in-the-middle attacks. However, even an unsigned certificate will prevent someone from passively listening in on your encrypted traffic, since they have no way to gain access to your shared symmetric key.
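
To make this concrete, here is a minimal Python sketch (my addition, not part of the original answer) in which the standard library’s ssl module performs the chain and hostname checks described in steps 1-3 before any application data is exchanged:

import socket
import ssl

hostname = "example.com"
context = ssl.create_default_context()   # loads trusted CA certificates, enables hostname checking

with socket.create_connection((hostname, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=hostname) as tls:
        # the handshake has already verified the certificate chain and the hostname
        cert = tls.getpeercert()
        print(tls.version(), cert["subject"])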

#85 Best answer 2 of How are ssl certificates verified? (Score: 63)

Created: 2009-02-05 Last updated: 2020-04-16

It’s worth noting that in addition to purchasing a certificate (as mentioned above), you can also create your own for free; this is referred to as a “self-signed certificate”. The difference between a self-signed certificate and one that’s purchased is simple: the purchased one has been signed by a Certificate Authority that your browser already knows about. In other words, your browser can easily validate the authenticity of a purchased certificate.

Unfortunately this has led to a common misconception that self-signed certificates are inherently less secure than those sold by commercial CAs like GoDaddy and Verisign, and that you have to live with browser warnings/exceptions if you use them; this is incorrect.

If you securely distribute a self-signed certificate (or CA cert, as bobince suggested) and install it in the browsers that will use your site, it’s just as secure as one that’s purchased and is not vulnerable to man-in-the-middle attacks and cert forgery. Obviously this means that it’s only feasible if only a few people need secure access to your site (e.g., internal apps, personal blogs, etc.).
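
A sketch of that deployment model in Python (hypothetical host and file names, assumed for illustration): instead of relying on the built-in CAs, you explicitly trust the certificate you distributed:

import socket
import ssl

# "internal-ca.pem" is the securely distributed self-signed/CA certificate (hypothetical file name)
context = ssl.create_default_context(cafile="internal-ca.pem")

with socket.create_connection(("intranet.example", 443)) as sock:
    with context.wrap_socket(sock, server_hostname="intranet.example") as tls:
        print(tls.getpeercert()["subject"])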

See also original question in stackoverflow

#86: Are there any cases where you would prefer a higher big-O time complexity algorithm over the lower one? (Score: 245)

Created: 2015-12-09 Last updated: 2015-12-29

Tags: algorithm, big-o, time-complexity

Are there any cases where you would prefer O(log n) time complexity to O(1) time complexity? Or O(n) to O(log n)?

Do you have any examples?

#86 Best answer 1 of Are there any cases where you would prefer a higher big-O time complexity algorithm over the lower one? (Score: 273)

Created: 2015-12-12 Last updated: 2017-12-12

There can be many reasons to prefer an algorithm with higher big O time complexity over the lower one:

  • most of the time, lower big-O complexity is harder to achieve and requires skilled implementation, a lot of knowledge and a lot of testing.

  • big-O hides the details about a constant: an algorithm that runs in 10^5 steps is better from a big-O point of view than one that runs in 1/10^5 * log(n) steps (O(1) vs O(log(n))), but for any reasonable n the O(log n) one will actually run faster. For example, the best known complexity for matrix multiplication is O(n^2.373), but the constant is so high that no (to my knowledge) computational libraries use it.

  • big-O makes sense when you compute over something big. If you need to sort an array of three numbers, it matters really little whether you use an O(n*log(n)) or an O(n^2) algorithm.

  • sometimes the advantage of the lower time complexity can be really negligible. For example, there is a data structure called a tango tree which gives O(log log N) time complexity to find an item, but there is also a binary search tree which finds the same in O(log n). Even for huge numbers of n = 10^20 the difference is negligible.

  • time complexity is not everything. Imagine an algorithm that runs in O(n^2) time and requires O(n^2) memory. It might be preferable to one with O(n^3) time and O(1) space when n is not really big. The problem is that you can wait for a long time, but I highly doubt you can find RAM big enough to run your algorithm.

  • parallelization is a good feature in our distributed world. There are algorithms that are easily parallelizable, and there are some that do not parallelize at all. Sometimes it makes sense to run an algorithm on 1000 commodity machines with a higher complexity than using one machine with a slightly better complexity.

  • in some places (security) a complexity can be a requirement. No one wants a hash algorithm that can hash blazingly fast (because then other people can brute-force your hashes way faster).

  • although this is not really a switch of complexity class, some security functions should be written in a manner that prevents timing attacks. They mostly stay in the same complexity class, but are modified so that it always takes the worst case to do something. One example is comparing strings for equality. In most applications it makes sense to break out fast once the first bytes differ, but in security you still wait until the very end to deliver the bad news (see the constant-time comparison sketch after this list).

  • somebody patented the lower-complexity algorithm and it is more economical for a company to use higher complexity than to pay money.

  • some algorithms adapt well to particular situations. Insertion sort, for example, has an average time-complexity of O(n^2), worse than quicksort or mergesort, but as an online algorithm it can efficiently sort a list of values as they are received (as user input) where most other algorithms can only efficiently operate on a complete list of values.
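
The constant-time comparison point above can be illustrated with a small Python sketch (mine, not from the original answer):

import hmac

def constant_time_equal(a: bytes, b: bytes) -> bool:
    # examine every byte regardless of where the first mismatch occurs,
    # so the running time does not leak the length of the matching prefix
    if len(a) != len(b):
        return False
    result = 0
    for x, y in zip(a, b):
        result |= x ^ y
    return result == 0

# the standard library already provides an equivalent:
assert constant_time_equal(b"secret", b"secret") == hmac.compare_digest(b"secret", b"secret")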

#86 Best answer 2 of Are there any cases where you would prefer a higher big-O time complexity algorithm over the lower one? (Score: 228)

Created: 2015-12-09 Last updated: 2015-12-11

There is always the hidden constant, which can be lower on the O(log n) algorithm. So it can work faster in practice for real-life data.

There are also space concerns (e.g. running on a toaster).

There’s also the developer-time concern - the O(log n) version may be 1000× easier to implement and verify.

See also original question in stackoverflow

#87: Recursion or Iteration? (Score: 241)

Created: 2008-09-16 Last updated: 2019-05-15

Tags: performance, algorithm, language-agnostic, recursion

Is there a performance hit if we use a loop instead of recursion, or vice versa, in algorithms where both can serve the same purpose? E.g.: check if the given string is a palindrome. I have seen many programmers using recursion as a means to show off when a simple iteration algorithm can fit the bill. Does the compiler play a vital role in deciding what to use?

#87 Best answer 1 of Recursion or Iteration? (Score: 398)

Created: 2008-09-16 Last updated: 2014-03-04

Loops may achieve a performance gain for your program. Recursion may achieve a performance gain for your programmer. Choose which is more important in your situation!

#87 Best answer 2 of Recursion or Iteration? (Score: 186)

Created: 2008-09-16 Last updated: 2019-05-15

It is possible that recursion will be more expensive, depending on whether the recursive function is tail recursive (i.e. the last operation is the recursive call). Tail recursion should be recognized by the compiler and optimized into its iterative counterpart (while maintaining the concise, clear implementation you have in your code).

I would write the algorithm in the way that makes the most sense and is the clearest for the poor sucker (be it yourself or someone else) that has to maintain the code in a few months or years. If you run into performance issues, then profile your code, and then and only then look into optimizing by moving over to an iterative implementation. You may want to look into memoization and dynamic programming.
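
Taking the question’s palindrome example, here is a small Python sketch (my illustration, not from the original answers) of the same check written both ways; the recursive form reads naturally, while the iterative form avoids growing the call stack:

def is_palindrome_recursive(s):
    # base case: empty or single-character strings are palindromes
    if len(s) <= 1:
        return True
    return s[0] == s[-1] and is_palindrome_recursive(s[1:-1])

def is_palindrome_iterative(s):
    i, j = 0, len(s) - 1
    while i < j:
        if s[i] != s[j]:
            return False
        i += 1
        j -= 1
    return True

assert is_palindrome_recursive("racecar") and is_palindrome_iterative("racecar")
assert not is_palindrome_recursive("hello") and not is_palindrome_iterative("hello")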

See also original question in stackoverflow

#88: What is the difference between LL and LR parsing? (Score: 241)

Created: 2011-05-12 Last updated: 2020-01-03

Tags: algorithm, parsing, ll, lr

Can anyone give me a simple example of LL parsing versus LR parsing?

#88 Best answer 1 of What is the difference between LL and LR parsing? (Score: 507)

Created: 2011-07-26 Last updated: 2015-08-10

At a high level, the difference between LL parsing and LR parsing is that LL parsers begin at the start symbol and try to apply productions to arrive at the target string, whereas LR parsers begin at the target string and try to arrive back at the start symbol.

An LL parse is a left-to-right, leftmost derivation. That is, we consider the input symbols from the left to the right and attempt to construct a leftmost derivation. This is done by beginning at the start symbol and repeatedly expanding out the leftmost nonterminal until we arrive at the target string. An LR parse is a left-to-right, rightmost derivation, meaning that we scan from the left to right and attempt to construct a rightmost derivation. The parser continuously picks a substring of the input and attempts to reverse it back to a nonterminal.

During an LL parse, the parser continuously chooses between two actions:

  1. Predict: Based on the leftmost nonterminal and some number of lookahead tokens, choose which production ought to be applied to get closer to the input string.
  2. Match: Match the leftmost guessed terminal symbol with the leftmost unconsumed symbol of input.

As an example, given this grammar:

  • S → E
  • E → T + E
  • E → T
  • T → int

Then given the string int + int + int, an LL(2) parser (which uses two tokens of lookahead) would parse the string as follows:

Production       Input              Action
---------------------------------------------------------
S                int + int + int    Predict S -> E
E                int + int + int    Predict E -> T + E
T + E            int + int + int    Predict T -> int
int + E          int + int + int    Match int
+ E              + int + int        Match +
E                int + int          Predict E -> T + E
T + E            int + int          Predict T -> int
int + E          int + int          Match int
+ E              + int              Match +
E                int                Predict E -> T
T                int                Predict T -> int
int              int                Match int
                                    Accept

Notice that in each step we look at the leftmost symbol in our production. If it’s a terminal, we match it, and if it’s a nonterminal, we predict what it’s going to be by choosing one of the rules.
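
As a tiny hands-on illustration (a Python sketch of my own, not from the original answer), a hand-written top-down (recursive-descent) parser for the same grammar makes the predict/match flavor tangible; it reports whether a token stream derives from S:

def parse(tokens):
    # Top-down parse of the grammar: S -> E, E -> T + E | T, T -> int.
    pos = 0

    def match(expected):
        nonlocal pos
        if pos < len(tokens) and tokens[pos] == expected:
            pos += 1
            return True
        return False

    def parse_T():
        return match("int")                   # T -> int

    def parse_E():
        if not parse_T():                     # every E starts with a T
            return False
        if pos < len(tokens) and tokens[pos] == "+":
            match("+")                        # E -> T + E
            return parse_E()
        return True                           # E -> T

    return parse_E() and pos == len(tokens)   # S -> E, and all input must be consumed

assert parse(["int", "+", "int", "+", "int"])
assert not parse(["int", "+"])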

In an LR parser, there are two actions:

  1. Shift: Add the next token of input to a buffer for consideration.
  2. Reduce: Reduce a collection of terminals and nonterminals in this buffer back to some nonterminal by reversing a production.

As an example, an LR(1) parser (with one token of lookahead) might parse that same string as follows:

Workspace        Input              Action
---------------------------------------------------------
                 int + int + int    Shift
int              + int + int        Reduce T -> int
T                + int + int        Shift
T +              int + int          Shift
T + int          + int              Reduce T -> int
T + T            + int              Shift
T + T +          int                Shift
T + T + int                         Reduce T -> int
T + T + T                           Reduce E -> T
T + T + E                           Reduce E -> T + E
T + E                               Reduce E -> T + E
E                                   Reduce S -> E
S                                   Accept

The two parsing algorithms you mentioned (LL and LR) are known to have different characteristics. LL parsers tend to be easier to write by hand, but they are less powerful than LR parsers and accept a much smaller set of grammars. LR parsers come in many flavors (LR(0), SLR(1), LALR(1), LR(1), IELR(1), GLR(0), etc.) and are far more powerful. They also tend to be much more complex and are almost always generated by tools like yacc or bison. LL parsers also come in many flavors (including LL(*), which is used by the ANTLR tool), though in practice LL(1) is the most widely used.

As a shameless plug, if you’d like to learn more about LL and LR parsing, I just finished teaching a compilers course and have some handouts and lecture slides on parsing on the course website. I’d be glad to elaborate on any of them if you think it would be useful.

#88 Best answer 2 of What is the difference between LL and LR parsing? (Score: 61)

Created: 2013-08-14 Last updated: 2015-09-29

Josh Haberman in his article LL and LR Parsing Demystified claims that LL parsing directly corresponds with the Polish Notation, whereas LR corresponds to Reverse Polish Notation. The difference between PN and RPN is the order of traversing the binary tree of the equation:

(figure: binary tree of the equation 1 + 2 * 3)

+ 1 * 2 3  // Polish (prefix) expression; pre-order traversal.
1 2 3 * +  // Reverse Polish (postfix) expression; post-order traversal.

According to Haberman, this illustrates the main difference between LL and LR parsers:

The primary difference between how LL and LR parsers operate is that an LL parser outputs a pre-order traversal of the parse tree and an LR parser outputs a post-order traversal.

For the in-depth explanation, examples and conclusions check out Haberman’s article.

See also original question in stackoverflow

#89: Get a random number focused on center (Score: 238)

Created: 2015-05-27 Last updated: 2015-07-29

Tags: javascript, algorithm, random, numbers

Is it possible to get a random number between 1-100 and keep the results mainly within the 40-60 range? I mean, it will go out of that range rarely, but I want it to be mainly within that range… Is it possible with JavaScript/jQuery?

Right now I’m just using the basic Math.random() * 100 + 1.

#89 Best answer 1 of Get a random number focused on center (Score: 396)

Created: 2015-05-28 Last updated: 2020-07-29

The simplest way would be to generate two random numbers from 0-50 and add them together.

This gives a distribution biased towards 50, in the same way rolling two dice biases towards 7.

In fact, by using a larger number of “dice” (as @Falco suggests), you can make a closer approximation to a bell-curve:

function weightedRandom(max, numDice) {
    let num = 0;
    for (let i = 0; i < numDice; i++) {
        num += Math.random() * (max/numDice);
    }    
    return num;
}

(figure: distribution of weighted random numbers)

JSFiddle: http://jsfiddle.net/797qhcza/1/

#89 Best answer 2 of Get a random number focused on center (Score: 48)

Created: 2015-05-29

You have some good answers here that give specific solutions; let me describe for you the general solution. The problem is:

  • I have a source of more-or-less uniformly distributed random numbers between 0 and 1.
  • I wish to produce a sequence of random numbers that follow a different distribution.

The general solution to this problem is to work out the quantile function of your desired distribution, and then apply the quantile function to the output of your uniform source.

The quantile function is the inverse of the integral of your desired distribution function. The distribution function is the function where the area under a portion of the curve is equal to the probability that the randomly-chosen item will be in that portion.

I give an example of how to do so here:

http://ericlippert.com/2012/02/21/generating-random-non-uniform-data/

The code in there is in C#, but the principles apply to any language; it should be straightforward to adapt the solution to JavaScript.
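
For instance, here is a rough Python sketch of that recipe (the mean, spread, and clamping below are my assumptions, not values from the answer): feed the uniform source through the quantile function of a normal distribution centered on 50 (requires Python 3.8+ for statistics.NormalDist):

import random
from statistics import NormalDist

dist = NormalDist(mu=50, sigma=10)   # bell curve centered on 50 (assumed parameters)

def centered_random():
    u = min(max(random.random(), 1e-12), 1 - 1e-12)  # uniform source, kept strictly inside (0, 1)
    x = dist.inv_cdf(u)                               # quantile function of the desired distribution
    return min(100, max(1, x))                        # clamp the rare outliers into 1-100

print([round(centered_random()) for _ in range(10)])  # most values land near 40-60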

See also original question in stackoverflow

#90: Good Java graph algorithm library? (Score: 237)

Created: 2008-09-09 Last updated: 2013-12-07

Tags: java, algorithm, graph

Has anyone had good experiences with any Java libraries for graph algorithms? I’ve tried JGraph and found it OK, and there are a lot of different ones on Google. Are there any that people are actually using successfully in production code, or would recommend?

To clarify, I’m not looking for a library that produces graphs/charts; I’m looking for one that helps with graph algorithms, e.g. minimum spanning tree, Kruskal’s algorithm, Nodes, Edges, etc. Ideally one with some good algorithms/data structures in a nice Java OO API.

#90 Best answer 1 of Good Java graph algorithm library? (Score: 110)

Created: 2008-09-09 Last updated: 2015-05-03

If you were using JGraph, you should give JGraphT a try, which is designed for algorithms. One of its features is visualization using the JGraph library. It’s still being developed, but it is pretty stable. I analyzed the complexity of the JGraphT algorithms some time ago. Some of them aren’t the quickest, but if you’re going to implement them on your own and need to display your graph, then it might be the best choice. I really liked using its API when I quickly had to write an app that worked on a graph and displayed it later.

#90 Best answer 2 of Good Java graph algorithm library? (Score: 66)

Created: 2009-05-12 Last updated: 2017-07-26

Summary:

See also original question in stackoverflow


Notes:
  1. This page uses an API to get the relevant data from the Stack Overflow community.
  2. Content license on this page is CC BY-SA 3.0.
  3. score = up votes - down votes.