Most votes on algorithm questions 9
Read all the top-voted questions and answers on a single page.
#81: How to sort in-place using the merge sort algorithm? (Score: 260)
Created: 2010-04-03 Last updated: 2012-10-20
Tags: arrays, algorithm, sorting, mergesort, in-place
I know the question is not too specific.
All I want is for someone to tell me how to convert a normal merge sort into an in-place merge sort (or a merge sort with constant extra space overhead).
All I can find (on the net) are pages saying “it is too complex” or “out of scope of this text”.
The only known ways to merge in-place (without any extra space) are too complex to be reduced to a practical program. (taken from here)
Even if it is too complex, what is the basic concept of how to make the merge sort in-place?
#81 Best answer 1 of How to sort in-place using the merge sort algorithm? (Score: 151)
Created: 2013-03-27 Last updated: 2020-12-22
Knuth left this as an exercise (Vol 3, 5.2.5). In-place merge sorts do exist. They must be implemented carefully.
First, a naive in-place merge such as the one described here isn’t the right solution. It downgrades the performance to O(N^2).
The idea is to sort part of the array while using the rest as working area for merging.
For example, consider the following merge function:
void wmerge(Key* xs, int i, int m, int j, int n, int w) {
    while (i < m && j < n)
        swap(xs, w++, xs[i] < xs[j] ? i++ : j++);
    while (i < m)
        swap(xs, w++, i++);
    while (j < n)
        swap(xs, w++, j++);
}
It takes the array xs; the two sorted subarrays are represented as the ranges [i, m) and [j, n) respectively, and the working area starts from w. Compared with the standard merge algorithm given in most textbooks, this one exchanges the contents of the sorted subarrays with the working area. As a result, the previous working area contains the merged sorted elements, while the elements previously stored in the working area are moved into the two subarrays.
However, there are two constraints that must be satisfied:
 The work area should be within the bounds of the array. In other words, it should be big enough to hold the elements exchanged in without causing any out-of-bounds error.
 The work area can overlap with either of the two sorted arrays; however, it must be ensured that none of the unmerged elements are overwritten.
With this merging algorithm defined, it’s easy to imagine a solution that sorts half of the array. The next question is how to deal with the rest, the unsorted part stored in the work area, as shown below:
... unsorted 1/2 array ... | ... sorted 1/2 array ...
One intuitive idea is to recursively sort the other half of the working area; then only 1/4 of the elements haven’t been sorted yet.
... unsorted 1/4 array ... | sorted 1/4 array B | ... sorted 1/2 array A ...
The key point at this stage is that we must merge the sorted 1/4 elements B with the sorted 1/2 elements A sooner or later.
Is the working area left over, which holds only 1/4 of the elements, big enough to merge A and B? Unfortunately, it isn’t.
However, the second constraint mentioned above gives us a hint: we can exploit it by arranging the working area to overlap with either subarray, as long as we can ensure a merging sequence in which the unmerged elements won’t be overwritten.
Actually, instead of sorting the second half of the working area, we can sort the first half and put the working area between the two sorted arrays, like this:
... sorted 1/4 array B | unsorted work area | ... sorted 1/2 array A ...
This setup effectively arranges the work area to overlap with subarray A. This idea is proposed in [Jyrki Katajainen, Tomi Pasanen, Jukka Teuhola. “Practical in-place mergesort”. Nordic Journal of Computing, 1996].
So the only thing left is to repeat the above step, which reduces the working area to 1/2, 1/4, 1/8, … of the array. When the working area becomes small enough (for example, only two elements are left), we can switch to a trivial insertion sort to end the algorithm.
Here is the implementation in ANSI C based on this paper.
void imsort(Key* xs, int l, int u);

void swap(Key* xs, int i, int j) {
    Key tmp = xs[i]; xs[i] = xs[j]; xs[j] = tmp;
}

/*
 * sort xs[l, u), and put result to working area w.
 * constraint, len(w) == u - l
 */
void wsort(Key* xs, int l, int u, int w) {
    int m;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        imsort(xs, l, m);
        imsort(xs, m, u);
        wmerge(xs, l, m, m, u, w);
    }
    else
        while (l < u)
            swap(xs, l++, w++);
}
void imsort(Key* xs, int l, int u) {
    int m, n, w;
    if (u - l > 1) {
        m = l + (u - l) / 2;
        w = l + u - m;
        wsort(xs, l, m, w); /* the last half contains sorted elements */
        while (w - l > 2) {
            n = w;
            w = l + (n - l + 1) / 2;
            wsort(xs, w, n, l); /* the first half of the previous working area contains sorted elements */
            wmerge(xs, l, l + n - w, n, u, w);
        }
        for (n = w; n > l; --n) /* switch to insertion sort */
            for (m = n; m < u && xs[m] < xs[m-1]; ++m)
                swap(xs, m, m - 1);
    }
}
Where wmerge is defined previously.
The full source code can be found here and the detailed explanation can be found here.
By the way, this version isn’t the fastest merge sort because it needs more swap operations. According to my test, it’s faster than the standard version, which allocates extra space in every recursion, but slower than the optimized version, which doubles the original array in advance and uses it for all further merging.
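For experimentation outside the C scaffolding, the same structure can be ported to Python. This is a rough sketch mirroring the wmerge/wsort/imsort names above, not the author’s code:

```python
def wmerge(xs, i, m, j, n, w):
    # Merge sorted ranges xs[i:m] and xs[j:n] into the work area at w,
    # swapping merged elements with the work area's current contents.
    while i < m and j < n:
        if xs[i] < xs[j]:
            xs[w], xs[i] = xs[i], xs[w]
            i += 1
        else:
            xs[w], xs[j] = xs[j], xs[w]
            j += 1
        w += 1
    while i < m:
        xs[w], xs[i] = xs[i], xs[w]
        i += 1
        w += 1
    while j < n:
        xs[w], xs[j] = xs[j], xs[w]
        j += 1
        w += 1

def wsort(xs, l, u, w):
    # Sort xs[l:u], leaving the result in the work area starting at w.
    if u - l > 1:
        m = l + (u - l) // 2
        imsort(xs, l, m)
        imsort(xs, m, u)
        wmerge(xs, l, m, m, u, w)
    else:
        while l < u:
            xs[l], xs[w] = xs[w], xs[l]
            l += 1
            w += 1

def imsort(xs, l, u):
    # In-place merge sort of xs[l:u] using only index variables as extra space.
    if u - l > 1:
        m = l + (u - l) // 2
        w = l + u - m
        wsort(xs, l, m, w)            # upper half now holds sorted elements
        while w - l > 2:
            n = w
            w = l + (n - l + 1) // 2
            wsort(xs, w, n, l)        # sorted part of the work area lands at the front
            wmerge(xs, l, l + n - w, n, u, w)
        for n in range(w, l, -1):     # tiny remainder: insertion sort
            m = n
            while m < u and xs[m] < xs[m - 1]:
                xs[m], xs[m - 1] = xs[m - 1], xs[m]
                m += 1
```

Calling imsort(xs, 0, len(xs)) sorts xs in place; this is convenient for tracing how the work area shrinks to 1/2, 1/4, 1/8, … of the array.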
#81 Best answer 2 of How to sort in-place using the merge sort algorithm? (Score: 59)
Created: 2010-04-03 Last updated: 2019-01-24
Including its “big result”, this paper describes a couple of variants of in-place merge sort (PDF):
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.5514&rep=rep1&type=pdf
In-place sorting with fewer moves
Jyrki Katajainen, Tomi A. Pasanen
It is shown that an array of n elements can be sorted using O(1) extra space, O(n log n / log log n) element moves, and n log₂ n + O(n log log n) comparisons. This is the first in-place sorting algorithm requiring o(n log n) moves in the worst case while guaranteeing O(n log n) comparisons, but due to the constant factors involved the algorithm is predominantly of theoretical interest.
I think this is relevant too. I have a printout of it lying around, passed on to me by a colleague, but I haven’t read it. It seems to cover basic theory, but I’m not familiar enough with the topic to judge how comprehensively:
http://comjnl.oxfordjournals.org/cgi/content/abstract/38/8/681
Optimal Stable Merging
Antonios Symvonis
This paper shows how to stably merge two sequences A and B of sizes m and n, m ≤ n, respectively, with O(m+n) assignments, O(m log(n/m + 1)) comparisons, and using only a constant amount of additional space. This result matches all known lower bounds…
See also the original question on Stack Overflow.
#82: Finding all possible combinations of numbers to reach a given sum (Score: 259)
Created: 2011-01-08 Last updated: 2020-06-07
Tags: algorithm, search, language-agnostic, combinations, subset-sum
How would you go about testing all possible combinations of additions from a given set N of numbers so they add up to a given final number?
A brief example:
 Set of numbers to add:
N = {1,5,22,15,0,...}
 Desired result:
12345
#82 Best answer 1 of Finding all possible combinations of numbers to reach a given sum (Score: 270)
Created: 2011-01-08 Last updated: 2016-11-11
This problem can be solved with a recursive combination of all possible sums, filtering out those that reach the target. Here is the algorithm in Python:
def subset_sum(numbers, target, partial=[]):
    s = sum(partial)
    # check if the partial sum equals the target
    if s == target:
        print("sum(%s)=%s" % (partial, target))
    if s >= target:
        return  # if we reach the number why bother to continue

    for i in range(len(numbers)):
        n = numbers[i]
        remaining = numbers[i+1:]
        subset_sum(remaining, target, partial + [n])

if __name__ == "__main__":
    subset_sum([3,9,8,4,5,7,10],15)

# Outputs:
# sum([3, 8, 4])=15
# sum([3, 5, 7])=15
# sum([8, 7])=15
# sum([5, 10])=15
This type of algorithm is very well explained in this Stanford Abstract Programming lecture; the video is highly recommended for understanding how recursion generates permutations of solutions.
Edit
The above as a generator function, making it a bit more useful. Requires Python 3.3+ because of yield from.
def subset_sum(numbers, target, partial=[], partial_sum=0):
    if partial_sum == target:
        yield partial
    if partial_sum >= target:
        return
    for i, n in enumerate(numbers):
        remaining = numbers[i + 1:]
        yield from subset_sum(remaining, target, partial + [n], partial_sum + n)
Here is the Java version of the same algorithm:
package tmp;

import java.util.ArrayList;
import java.util.Arrays;

class SumSet {
    static void sum_up_recursive(ArrayList<Integer> numbers, int target, ArrayList<Integer> partial) {
        int s = 0;
        for (int x : partial) s += x;
        if (s == target)
            System.out.println("sum(" + Arrays.toString(partial.toArray()) + ")=" + target);
        if (s >= target)
            return;
        for (int i = 0; i < numbers.size(); i++) {
            ArrayList<Integer> remaining = new ArrayList<Integer>();
            int n = numbers.get(i);
            for (int j = i + 1; j < numbers.size(); j++) remaining.add(numbers.get(j));
            ArrayList<Integer> partial_rec = new ArrayList<Integer>(partial);
            partial_rec.add(n);
            sum_up_recursive(remaining, target, partial_rec);
        }
    }
    static void sum_up(ArrayList<Integer> numbers, int target) {
        sum_up_recursive(numbers, target, new ArrayList<Integer>());
    }
    public static void main(String args[]) {
        Integer[] numbers = {3,9,8,4,5,7,10};
        int target = 15;
        sum_up(new ArrayList<Integer>(Arrays.asList(numbers)), target);
    }
}
It is exactly the same heuristic. My Java is a bit rusty, but I think it is easy to understand.
C# conversion of Java solution: (by @JeremyThompson)
public static void Main(string[] args)
{
    List<int> numbers = new List<int>() { 3, 9, 8, 4, 5, 7, 10 };
    int target = 15;
    sum_up(numbers, target);
}

private static void sum_up(List<int> numbers, int target)
{
    sum_up_recursive(numbers, target, new List<int>());
}

private static void sum_up_recursive(List<int> numbers, int target, List<int> partial)
{
    int s = 0;
    foreach (int x in partial) s += x;
    if (s == target)
        Console.WriteLine("sum(" + string.Join(",", partial.ToArray()) + ")=" + target);
    if (s >= target)
        return;
    for (int i = 0; i < numbers.Count; i++)
    {
        List<int> remaining = new List<int>();
        int n = numbers[i];
        for (int j = i + 1; j < numbers.Count; j++) remaining.Add(numbers[j]);
        List<int> partial_rec = new List<int>(partial);
        partial_rec.Add(n);
        sum_up_recursive(remaining, target, partial_rec);
    }
}
Ruby solution: (by @emaillenin)
def subset_sum(numbers, target, partial=[])
  s = partial.inject 0, :+
  # check if the partial sum equals the target
  puts "sum(#{partial})=#{target}" if s == target
  return if s >= target # if we reach the number why bother to continue
  (0..(numbers.length - 1)).each do |i|
    n = numbers[i]
    remaining = numbers.drop(i + 1)
    subset_sum(remaining, target, partial + [n])
  end
end

subset_sum([3,9,8,4,5,7,10],15)
Edit: complexity discussion
As others mention, this is an NP-hard problem. It can be solved in exponential time O(2^n); for instance, for n=10 there are 1024 possible subsets to examine. If the targets you are trying to reach are in a low range, then this algorithm works. For example:
subset_sum([1,2,3,4,5,6,7,8,9,10],100000)
generates 1024 branches because the target never gets to filter out possible solutions.
On the other hand, subset_sum([1,2,3,4,5,6,7,8,9,10],10) generates only 175 branches, because the target of 10 gets to filter out many combinations.
If N and Target are big numbers, one should move to an approximate version of the solution.
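If only the count of solutions is needed (rather than the solutions themselves), a pseudo-polynomial dynamic program over the target value avoids enumerating every branch when the numbers are positive integers. This is a sketch with a hypothetical helper name, not part of the answer above:

```python
def count_subset_sums(numbers, target):
    # ways[t] = number of subsets of the numbers seen so far that sum to t
    ways = [1] + [0] * target
    for n in numbers:
        # iterate downwards so each number is used at most once
        for t in range(target, n - 1, -1):
            ways[t] += ways[t - n]
    return ways[target]
```

For the example above, count_subset_sums([3, 9, 8, 4, 5, 7, 10], 15) matches the four solutions printed by the recursive version, in O(len(numbers) * target) time instead of exponential time.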
#82 Best answer 2 of Finding all possible combinations of numbers to reach a given sum (Score: 37)
Created: 2012-11-20 Last updated: 2017-08-23
The solution to this problem has been given a million times on the Internet. The problem is called the coin change problem. One can find solutions at http://rosettacode.org/wiki/Count_the_coins and a mathematical model of it at http://jaqm.ro/issues/volume-5,issue-2/pdfs/patterson_harmel.pdf (or Google “coin change problem”).
By the way, the Scala solution by Tsagadai is interesting. This example produces either 1 or 0. As a side effect, it lists on the console all possible solutions. It displays the solutions, but fails to make them usable in any way.
To be as useful as possible, the code should return a List[List[Int]] in order to allow getting the number of solutions (length of the list of lists), the “best” solution (the shortest list), or all the possible solutions.
Here is an example. It is very inefficient, but it is easy to understand.
object Sum extends App {

  def sumCombinations(total: Int, numbers: List[Int]): List[List[Int]] = {

    def add(x: (Int, List[List[Int]]), y: (Int, List[List[Int]])): (Int, List[List[Int]]) = {
      (x._1 + y._1, x._2 ::: y._2)
    }

    def sumCombinations(resultAcc: List[List[Int]], sumAcc: List[Int], total: Int, numbers: List[Int]): (Int, List[List[Int]]) = {
      if (numbers.isEmpty || total < 0) {
        (0, resultAcc)
      } else if (total == 0) {
        (1, sumAcc :: resultAcc)
      } else {
        add(sumCombinations(resultAcc, sumAcc, total, numbers.tail), sumCombinations(resultAcc, numbers.head :: sumAcc, total - numbers.head, numbers))
      }
    }

    sumCombinations(Nil, Nil, total, numbers.sortWith(_ > _))._2
  }

  println(sumCombinations(15, List(1, 2, 5, 10)) mkString "\n")
}
When run, it displays:
List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1)
List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2)
List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2)
List(1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2)
List(1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2)
List(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
List(1, 1, 1, 2, 2, 2, 2, 2, 2)
List(1, 2, 2, 2, 2, 2, 2, 2)
List(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 5)
List(1, 1, 1, 1, 1, 1, 1, 1, 2, 5)
List(1, 1, 1, 1, 1, 1, 2, 2, 5)
List(1, 1, 1, 1, 2, 2, 2, 5)
List(1, 1, 2, 2, 2, 2, 5)
List(2, 2, 2, 2, 2, 5)
List(1, 1, 1, 1, 1, 5, 5)
List(1, 1, 1, 2, 5, 5)
List(1, 2, 2, 5, 5)
List(5, 5, 5)
List(1, 1, 1, 1, 1, 10)
List(1, 1, 1, 2, 10)
List(1, 2, 2, 10)
List(5, 10)
The sumCombinations() function may be used by itself, and the result may be further analyzed to display the “best” solution (the shortest list), or the number of solutions (the number of lists).
Note that even like this, the requirements may not be fully satisfied. It might happen that the order of each list in the solution is significant. In such a case, each list would have to be duplicated as many times as there are permutations of its elements. Or we might be interested only in the combinations that are different.
For example, we might consider that List(5, 10) should give two combinations: List(5, 10) and List(10, 5). For List(5, 5, 5) it could give three combinations or only one, depending on the requirements. For integers, the three permutations are equivalent, but if we are dealing with coins, like in the “coin change problem”, they are not.
Also not stated in the requirements is the question of whether each number (or coin) may be used only once or many times. We could (and we should!) generalize the problem to a list of lists of occurrences of each number. This translates in real life into “what are the possible ways to make a certain amount of money with a set of coins (and not a set of coin values)”. The original problem is just a particular case of this one, where we have as many occurrences of each coin as needed to make the total amount with each single coin value.
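The unlimited-coins variant that the Rosetta Code task describes can be counted with a classic dynamic program over the amount. This is a sketch with a hypothetical helper name, not Tsagadai’s Scala code:

```python
def count_change(amount, coins):
    # ways[t] = number of multisets of coin values that sum to t
    ways = [1] + [0] * amount
    for c in coins:
        # iterate upwards so each coin value may be reused any number of times
        for t in range(c, amount + 1):
            ways[t] += ways[t - c]
    return ways[amount]
```

count_change(15, [1, 2, 5, 10]) returns 22, the number of lists in the Scala output above; processing one coin value at a time is what makes List(5, 10) and List(10, 5) count as one combination.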
See also the original question on Stack Overflow.
#83: Mapping two integers to one, in a unique and deterministic way (Score: 253)
Created: 2009-05-28 Last updated: 2009-05-28
Tags: algorithm, mapping, integer, deterministic, math
Imagine two positive integers A and B. I want to combine these two into a single integer C.
There can be no other integers D and E which combine to C. So combining them with the addition operator doesn’t work. E.g. 30 + 10 = 40 = 40 + 0 = 39 + 1. Neither does concatenation work. E.g. “31” + “2” = 312 = “3” + “12”.
This combination operation should also be deterministic (always yield the same result with the same inputs) and should always yield an integer on either the positive or the negative side of integers.
#83 Best answer 1 of Mapping two integers to one, in a unique and deterministic way (Score: 241)
Created: 2009-05-28 Last updated: 2017-08-09
You’re looking for a bijective NxN > N
mapping. These are used for e.g. dovetailing. Have a look at this PDF for an introduction to socalled pairing functions. Wikipedia introduces a specific pairing function, namely the Cantor pairing function:
Three remarks:
 As others have made clear, if you plan to implement a pairing function, you may soon find you need arbitrarily large integers (bignums).
 If you don’t want to make a distinction between the pairs (a, b) and (b, a), then sort a and b before applying the pairing function.
 Actually I lied. You are looking for a bijective Z×Z → N mapping. Cantor’s function only works on non-negative numbers. This is not a problem, however, because it’s easy to define a bijection f : Z → N, like so:
 f(n) = n * 2 if n >= 0
 f(n) = -n * 2 - 1 if n < 0
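Composing that Z → N bijection with the Cantor pairing function gives the desired bijective Z×Z → N mapping. This is a quick sketch (hypothetical helper names, not from the answer) that also doubles as a sanity check:

```python
def fold(n):
    # the Z -> N bijection from the remark: 0, -1, 1, -2, 2, ... -> 0, 1, 2, 3, 4, ...
    return 2 * n if n >= 0 else -n * 2 - 1

def cantor_pair(k1, k2):
    # Cantor pairing function, a bijection N x N -> N
    return (k1 + k2) * (k1 + k2 + 1) // 2 + k2

def pair_integers(a, b):
    # bijective Z x Z -> N mapping (note: grows quickly, hence the bignum remark)
    return cantor_pair(fold(a), fold(b))
```

Checking that every pair in a range maps to a distinct natural number confirms injectivity on that range.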
#83 Best answer 2 of Mapping two integers to one, in a unique and deterministic way (Score: 240)
Created: 2012-12-14 Last updated: 2012-12-25
The Cantor pairing function is really one of the better ones out there considering it’s simple, fast and space-efficient, but there is something even better published at Wolfram by Matthew Szudzik, here. The limitation of the Cantor pairing function (relatively speaking) is that the range of encoded results doesn’t always stay within the limits of a 2N-bit integer if the inputs are two N-bit integers. That is, if my inputs are two 16-bit integers ranging from 0 to 2^16 - 1, then there are 2^16 * (2^16 - 1) combinations of inputs possible, so by the obvious Pigeonhole Principle, we need an output of size at least 2^16 * (2^16 - 1), which is equal to 2^32 - 2^16; in other words, a map to 32-bit numbers should ideally be feasible. This may be of little practical importance in the programming world.
Cantor pairing function:
(a + b) * (a + b + 1) / 2 + a; where a, b >= 0
The mapping for the two largest 16-bit integers (65535, 65535) will be 8589803520, which, as you can see, cannot fit into 32 bits.
Enter Szudzik’s function:
a >= b ? a * a + a + b : a + b * b; where a, b >= 0
The mapping for (65535, 65535) will now be 4294967295, which, as you can see, is a 32-bit (0 to 2^32 - 1) integer. This is where this solution is ideal; it simply utilizes every single point in that space, so nothing can be more space-efficient.
Now considering the fact that we typically deal with the signed implementations of numbers of various sizes in languages/frameworks, let’s consider signed 16-bit integers ranging from -(2^15) to 2^15 - 1 (later we’ll see how to extend even the output to span the signed range). Since a and b have to be positive, they range from 0 to 2^15 - 1.
Cantor pairing function:
The mapping for the two largest 16-bit signed integers (32767, 32767) will be 2147418112, which is just short of the maximum value for a signed 32-bit integer.
Now Szudzik’s function:
(32767, 32767) => 1073741823, much smaller.
Let’s account for negative integers. That’s beyond the original question I know, but just elaborating to help future visitors.
Cantor pairing function:
A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
(A + B) * (A + B + 1) / 2 + A;
(-32768, -32768) => 8589803520, which is Int64. A 64-bit output for 16-bit inputs may be unpardonable!!
Szudzik’s function:
A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
A >= B ? A * A + A + B : A + B * B;
(-32768, -32768) => 4294967295, which is 32-bit for the unsigned range or 64-bit for the signed range, but still better.
Now all this while the output has always been positive. In signed world, it will be even more space saving if we could transfer half the output to negative axis. You could do it like this for Szudzik’s:
A = a >= 0 ? 2 * a : -2 * a - 1;
B = b >= 0 ? 2 * b : -2 * b - 1;
C = (A >= B ? A * A + A + B : A + B * B) / 2;
a < 0 && b < 0 || a >= 0 && b >= 0 ? C : -C - 1;
(-32768, 32767) => -2147483648
(32767, -32768) => -2147450880
(0, 0) => 0
(32767, 32767) => 2147418112
(-32768, -32768) => 2147483647
What I do: after applying a weight of 2 to the inputs and going through the function, I divide the output by two and take some of the outputs to the negative axis by multiplying by -1.
See the results: for any input in the range of a signed 16-bit number, the output lies within the limits of a signed 32-bit integer, which is cool. I’m not sure how to go about the same for the Cantor pairing function, but I didn’t try as much since it’s not as efficient. Furthermore, the extra calculations involved in the Cantor pairing function mean it’s slower, too.
Here is a C# implementation.
public static long PerfectlyHashThem(int a, int b)
{
    var A = (ulong)(a >= 0 ? 2 * (long)a : -2 * (long)a - 1);
    var B = (ulong)(b >= 0 ? 2 * (long)b : -2 * (long)b - 1);
    var C = (long)((A >= B ? A * A + A + B : A + B * B) / 2);
    return a < 0 && b < 0 || a >= 0 && b >= 0 ? C : -C - 1;
}

public static int PerfectlyHashThem(short a, short b)
{
    var A = (uint)(a >= 0 ? 2 * a : -2 * a - 1);
    var B = (uint)(b >= 0 ? 2 * b : -2 * b - 1);
    var C = (int)((A >= B ? A * A + A + B : A + B * B) / 2);
    return a < 0 && b < 0 || a >= 0 && b >= 0 ? C : -C - 1;
}
Since the intermediate calculations can exceed the limits of a 2N-bit signed integer, I have used the 4N-bit integer type (the final division by 2 brings the result back to 2N bits).
The link I have provided on the alternate solution nicely depicts a graph of the function utilizing every single point in space. It’s amazing to see that you could uniquely encode a pair of coordinates into a single number, reversibly! Magic world of numbers!!
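The reversibility is easy to demonstrate for the unsigned form of Szudzik’s function. The inverse below is a sketch (szudzik_unpair is a hypothetical name; math.isqrt needs Python 3.8+), exploiting the fact that the integer square root of the encoded value recovers max(a, b):

```python
import math

def szudzik_pair(a, b):
    # Szudzik's pairing function for a, b >= 0
    return a * a + a + b if a >= b else a + b * b

def szudzik_unpair(z):
    # s = isqrt(z) equals max(a, b); the remainder z - s*s tells which case applied
    s = math.isqrt(z)
    if z - s * s < s:
        return (z - s * s, s)      # a < b case: z = a + b*b
    return (s, z - s * s - s)      # a >= b case: z = a*a + a + b
```

As in the answer, szudzik_pair(65535, 65535) == 4294967295, and unpairing recovers the original pair exactly.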
See also the original question on Stack Overflow.
#84: Efficient Algorithm for Bit Reversal (from MSB->LSB to LSB->MSB) in C (Score: 250)
Created: 2009-04-14 Last updated: 2020-01-29
Tags: c, algorithm, bit-manipulation
What is the most efficient algorithm to achieve the following:
0010 0000 => 0000 0100
The conversion is from MSB->LSB to LSB->MSB. All bits must be reversed; that is, this is not endianness swapping.
#84 Best answer 1 of Efficient Algorithm for Bit Reversal (from MSB->LSB to LSB->MSB) in C (Score: 506)
Created: 2009-04-14 Last updated: 2018-10-01
NOTE: All algorithms below are in C, but should be portable to your language of choice (just don’t look at me when they’re not as fast :)
Options
Low Memory (32-bit int, 32-bit machine) (from here):
unsigned int
reverse(register unsigned int x)
{
    x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1));
    x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2));
    x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4));
    x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8));
    return((x >> 16) | (x << 16));
}
From the famous Bit Twiddling Hacks page:
Fastest (lookup table):
static const unsigned char BitReverseTable256[] =
{
0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0, 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0,
0x08, 0x88, 0x48, 0xC8, 0x28, 0xA8, 0x68, 0xE8, 0x18, 0x98, 0x58, 0xD8, 0x38, 0xB8, 0x78, 0xF8,
0x04, 0x84, 0x44, 0xC4, 0x24, 0xA4, 0x64, 0xE4, 0x14, 0x94, 0x54, 0xD4, 0x34, 0xB4, 0x74, 0xF4,
0x0C, 0x8C, 0x4C, 0xCC, 0x2C, 0xAC, 0x6C, 0xEC, 0x1C, 0x9C, 0x5C, 0xDC, 0x3C, 0xBC, 0x7C, 0xFC,
0x02, 0x82, 0x42, 0xC2, 0x22, 0xA2, 0x62, 0xE2, 0x12, 0x92, 0x52, 0xD2, 0x32, 0xB2, 0x72, 0xF2,
0x0A, 0x8A, 0x4A, 0xCA, 0x2A, 0xAA, 0x6A, 0xEA, 0x1A, 0x9A, 0x5A, 0xDA, 0x3A, 0xBA, 0x7A, 0xFA,
0x06, 0x86, 0x46, 0xC6, 0x26, 0xA6, 0x66, 0xE6, 0x16, 0x96, 0x56, 0xD6, 0x36, 0xB6, 0x76, 0xF6,
0x0E, 0x8E, 0x4E, 0xCE, 0x2E, 0xAE, 0x6E, 0xEE, 0x1E, 0x9E, 0x5E, 0xDE, 0x3E, 0xBE, 0x7E, 0xFE,
0x01, 0x81, 0x41, 0xC1, 0x21, 0xA1, 0x61, 0xE1, 0x11, 0x91, 0x51, 0xD1, 0x31, 0xB1, 0x71, 0xF1,
0x09, 0x89, 0x49, 0xC9, 0x29, 0xA9, 0x69, 0xE9, 0x19, 0x99, 0x59, 0xD9, 0x39, 0xB9, 0x79, 0xF9,
0x05, 0x85, 0x45, 0xC5, 0x25, 0xA5, 0x65, 0xE5, 0x15, 0x95, 0x55, 0xD5, 0x35, 0xB5, 0x75, 0xF5,
0x0D, 0x8D, 0x4D, 0xCD, 0x2D, 0xAD, 0x6D, 0xED, 0x1D, 0x9D, 0x5D, 0xDD, 0x3D, 0xBD, 0x7D, 0xFD,
0x03, 0x83, 0x43, 0xC3, 0x23, 0xA3, 0x63, 0xE3, 0x13, 0x93, 0x53, 0xD3, 0x33, 0xB3, 0x73, 0xF3,
0x0B, 0x8B, 0x4B, 0xCB, 0x2B, 0xAB, 0x6B, 0xEB, 0x1B, 0x9B, 0x5B, 0xDB, 0x3B, 0xBB, 0x7B, 0xFB,
0x07, 0x87, 0x47, 0xC7, 0x27, 0xA7, 0x67, 0xE7, 0x17, 0x97, 0x57, 0xD7, 0x37, 0xB7, 0x77, 0xF7,
0x0F, 0x8F, 0x4F, 0xCF, 0x2F, 0xAF, 0x6F, 0xEF, 0x1F, 0x9F, 0x5F, 0xDF, 0x3F, 0xBF, 0x7F, 0xFF
};
unsigned int v; // reverse 32-bit value, 8 bits at a time
unsigned int c; // c will get v reversed

// Option 1:
c = (BitReverseTable256[v & 0xff] << 24) |
    (BitReverseTable256[(v >> 8) & 0xff] << 16) |
    (BitReverseTable256[(v >> 16) & 0xff] << 8) |
    (BitReverseTable256[(v >> 24) & 0xff]);

// Option 2:
unsigned char * p = (unsigned char *) &v;
unsigned char * q = (unsigned char *) &c;
q[3] = BitReverseTable256[p[0]];
q[2] = BitReverseTable256[p[1]];
q[1] = BitReverseTable256[p[2]];
q[0] = BitReverseTable256[p[3]];
You can extend this idea to 64-bit ints, or trade off memory for speed (assuming your L1 data cache is large enough), and reverse 16 bits at a time with a 64K-entry lookup table.
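When checking results from the C snippets, a language-independent reference is handy. This Python sketch builds the same 256-entry table from a naive per-bit loop and applies the byte-at-a-time scheme of option 1:

```python
def rev8(b):
    # naive per-bit reversal of an 8-bit value, used to build the table
    r = 0
    for _ in range(8):
        r = (r << 1) | (b & 1)
        b >>= 1
    return r

# the same 256-entry table as BitReverseTable256, generated instead of typed in
TABLE = [rev8(i) for i in range(256)]

def reverse32(v):
    # reverse a 32-bit value one byte at a time via the table
    return ((TABLE[v & 0xff] << 24) |
            (TABLE[(v >> 8) & 0xff] << 16) |
            (TABLE[(v >> 16) & 0xff] << 8) |
            TABLE[(v >> 24) & 0xff])
```

For example, reverse32(0x01) gives 0x80000000, and applying reverse32 twice returns the original value, since bit reversal is an involution.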
Others
Simple
unsigned int v;                   // input bits to be reversed
unsigned int r = v & 1;           // r will be reversed bits of v; first get LSB of v
int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end

for (v >>= 1; v; v >>= 1)
{
    r <<= 1;
    r |= v & 1;
    s--;
}
r <<= s; // shift when v's highest bits are zero
Faster (32bit processor)
unsigned char b = x;
b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16;
Faster (64bit processor)
unsigned char b; // reverse this (8bit) byte
b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023;
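Both 8-bit multiply tricks can be cross-checked against a naive loop. This sketch reproduces them in Python, where arbitrary-precision ints stand in for the C unsigned types (hence the explicit & 0xFF truncation that the unsigned char assignment performs in C):

```python
def rev8_naive(b):
    # reverse the 8 bits of b with a simple loop
    r = 0
    for _ in range(8):
        r = (r << 1) | (b & 1)
        b >>= 1
    return r

def rev8_mul_mod(b):
    # the 64-bit multiply-and-modulus trick from above
    return (b * 0x0202020202 & 0x010884422010) % 1023

def rev8_mul_or(b):
    # the 32-bit multiply trick from above; & 0xFF mimics the unsigned char store
    return ((b * 0x0802 & 0x22110) | (b * 0x8020 & 0x88440)) * 0x10101 >> 16 & 0xFF
```

Looping over all 256 byte values confirms that the three functions agree.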
If you want to do this on a 32-bit int, just reverse the bits in each byte, and reverse the order of the bytes. That is:
unsigned int toReverse;
unsigned int reversed;
unsigned char inByte0 = (toReverse & 0xFF);
unsigned char inByte1 = (toReverse & 0xFF00) >> 8;
unsigned char inByte2 = (toReverse & 0xFF0000) >> 16;
unsigned char inByte3 = (toReverse & 0xFF000000) >> 24;
reversed = (reverseBits(inByte0) << 24) | (reverseBits(inByte1) << 16) | (reverseBits(inByte2) << 8) | reverseBits(inByte3);
Results
I benchmarked the two most promising solutions: the lookup table, and bitwise-AND (the first one). The test machine is a laptop with 4GB of DDR2-800 and a Core 2 Duo T7500 @ 2.4GHz with 4MB L2 cache; YMMV. I used gcc 4.3.2 on 64-bit Linux. OpenMP (and the GCC bindings) were used for high-resolution timers.
reverse.c
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>
unsigned int
reverse(register unsigned int x)
{
    x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1));
    x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2));
    x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4));
    x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8));
    return((x >> 16) | (x << 16));
}
int main()
{
    unsigned int *ints = malloc(100000000*sizeof(unsigned int));
    unsigned int *ints2 = malloc(100000000*sizeof(unsigned int));
    for(unsigned int i = 0; i < 100000000; i++)
        ints[i] = rand();

    unsigned int *inptr = ints;
    unsigned int *outptr = ints2;
    unsigned int *endptr = ints + 100000000;

    // Starting the time measurement
    double start = omp_get_wtime();
    // Computations to be measured
    while(inptr != endptr)
    {
        (*outptr) = reverse(*inptr);
        inptr++;
        outptr++;
    }
    // Measuring the elapsed time
    double end = omp_get_wtime();
    // Time calculation (in seconds)
    printf("Time: %f seconds\n", end - start);

    free(ints);
    free(ints2);
    return 0;
}
reverse_lookup.c
#include <stdlib.h>
#include <stdio.h>
#include <omp.h>
static const unsigned char BitReverseTable256[] =
{
0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0, 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0,
0x08, 0x88, 0x48, 0xC8, 0x28, 0xA8, 0x68, 0xE8, 0x18, 0x98, 0x58, 0xD8, 0x38, 0xB8, 0x78, 0xF8,
0x04, 0x84, 0x44, 0xC4, 0x24, 0xA4, 0x64, 0xE4, 0x14, 0x94, 0x54, 0xD4, 0x34, 0xB4, 0x74, 0xF4,
0x0C, 0x8C, 0x4C, 0xCC, 0x2C, 0xAC, 0x6C, 0xEC, 0x1C, 0x9C, 0x5C, 0xDC, 0x3C, 0xBC, 0x7C, 0xFC,
0x02, 0x82, 0x42, 0xC2, 0x22, 0xA2, 0x62, 0xE2, 0x12, 0x92, 0x52, 0xD2, 0x32, 0xB2, 0x72, 0xF2,
0x0A, 0x8A, 0x4A, 0xCA, 0x2A, 0xAA, 0x6A, 0xEA, 0x1A, 0x9A, 0x5A, 0xDA, 0x3A, 0xBA, 0x7A, 0xFA,
0x06, 0x86, 0x46, 0xC6, 0x26, 0xA6, 0x66, 0xE6, 0x16, 0x96, 0x56, 0xD6, 0x36, 0xB6, 0x76, 0xF6,
0x0E, 0x8E, 0x4E, 0xCE, 0x2E, 0xAE, 0x6E, 0xEE, 0x1E, 0x9E, 0x5E, 0xDE, 0x3E, 0xBE, 0x7E, 0xFE,
0x01, 0x81, 0x41, 0xC1, 0x21, 0xA1, 0x61, 0xE1, 0x11, 0x91, 0x51, 0xD1, 0x31, 0xB1, 0x71, 0xF1,
0x09, 0x89, 0x49, 0xC9, 0x29, 0xA9, 0x69, 0xE9, 0x19, 0x99, 0x59, 0xD9, 0x39, 0xB9, 0x79, 0xF9,
0x05, 0x85, 0x45, 0xC5, 0x25, 0xA5, 0x65, 0xE5, 0x15, 0x95, 0x55, 0xD5, 0x35, 0xB5, 0x75, 0xF5,
0x0D, 0x8D, 0x4D, 0xCD, 0x2D, 0xAD, 0x6D, 0xED, 0x1D, 0x9D, 0x5D, 0xDD, 0x3D, 0xBD, 0x7D, 0xFD,
0x03, 0x83, 0x43, 0xC3, 0x23, 0xA3, 0x63, 0xE3, 0x13, 0x93, 0x53, 0xD3, 0x33, 0xB3, 0x73, 0xF3,
0x0B, 0x8B, 0x4B, 0xCB, 0x2B, 0xAB, 0x6B, 0xEB, 0x1B, 0x9B, 0x5B, 0xDB, 0x3B, 0xBB, 0x7B, 0xFB,
0x07, 0x87, 0x47, 0xC7, 0x27, 0xA7, 0x67, 0xE7, 0x17, 0x97, 0x57, 0xD7, 0x37, 0xB7, 0x77, 0xF7,
0x0F, 0x8F, 0x4F, 0xCF, 0x2F, 0xAF, 0x6F, 0xEF, 0x1F, 0x9F, 0x5F, 0xDF, 0x3F, 0xBF, 0x7F, 0xFF
};
int main()
{
    unsigned int *ints = malloc(100000000*sizeof(unsigned int));
    unsigned int *ints2 = malloc(100000000*sizeof(unsigned int));
    for(unsigned int i = 0; i < 100000000; i++)
        ints[i] = rand();

    unsigned int *inptr = ints;
    unsigned int *outptr = ints2;
    unsigned int *endptr = ints + 100000000;

    // Starting the time measurement
    double start = omp_get_wtime();
    // Computations to be measured
    while(inptr != endptr)
    {
        unsigned int in = *inptr;

        // Option 1:
        //*outptr = (BitReverseTable256[in & 0xff] << 24) |
        //    (BitReverseTable256[(in >> 8) & 0xff] << 16) |
        //    (BitReverseTable256[(in >> 16) & 0xff] << 8) |
        //    (BitReverseTable256[(in >> 24) & 0xff]);

        // Option 2:
        unsigned char * p = (unsigned char *) &(*inptr);
        unsigned char * q = (unsigned char *) &(*outptr);
        q[3] = BitReverseTable256[p[0]];
        q[2] = BitReverseTable256[p[1]];
        q[1] = BitReverseTable256[p[2]];
        q[0] = BitReverseTable256[p[3]];

        inptr++;
        outptr++;
    }
    // Measuring the elapsed time
    double end = omp_get_wtime();
    // Time calculation (in seconds)
    printf("Time: %f seconds\n", end - start);

    free(ints);
    free(ints2);
    return 0;
}
I tried both approaches at several different optimization levels, ran 3 trials at each level, and each trial reversed 100 million random unsigned ints. For the lookup table option, I tried both schemes (options 1 and 2) given on the bitwise hacks page. Results are shown below.
Bitwise AND
$ gcc -fopenmp -std=c99 -o reverse reverse.c
$ ./reverse
Time: 2.000593 seconds
$ ./reverse
Time: 1.938893 seconds
$ ./reverse
Time: 1.936365 seconds
$ gcc -fopenmp -std=c99 -O2 -o reverse reverse.c
$ ./reverse
Time: 0.942709 seconds
$ ./reverse
Time: 0.991104 seconds
$ ./reverse
Time: 0.947203 seconds
$ gcc -fopenmp -std=c99 -O3 -o reverse reverse.c
$ ./reverse
Time: 0.922639 seconds
$ ./reverse
Time: 0.892372 seconds
$ ./reverse
Time: 0.891688 seconds
Lookup Table (option 1)
$ gcc -fopenmp -std=c99 -o reverse_lookup reverse_lookup.c
$ ./reverse_lookup
Time: 1.201127 seconds
$ ./reverse_lookup
Time: 1.196129 seconds
$ ./reverse_lookup
Time: 1.235972 seconds
$ gcc -fopenmp -std=c99 -O2 -o reverse_lookup reverse_lookup.c
$ ./reverse_lookup
Time: 0.633042 seconds
$ ./reverse_lookup
Time: 0.655880 seconds
$ ./reverse_lookup
Time: 0.633390 seconds
$ gcc -fopenmp -std=c99 -O3 -o reverse_lookup reverse_lookup.c
$ ./reverse_lookup
Time: 0.652322 seconds
$ ./reverse_lookup
Time: 0.631739 seconds
$ ./reverse_lookup
Time: 0.652431 seconds
Lookup Table (option 2)
$ gcc -fopenmp -std=c99 -o reverse_lookup reverse_lookup.c
$ ./reverse_lookup
Time: 1.671537 seconds
$ ./reverse_lookup
Time: 1.688173 seconds
$ ./reverse_lookup
Time: 1.664662 seconds
$ gcc -fopenmp -std=c99 -O2 -o reverse_lookup reverse_lookup.c
$ ./reverse_lookup
Time: 1.049851 seconds
$ ./reverse_lookup
Time: 1.048403 seconds
$ ./reverse_lookup
Time: 1.085086 seconds
$ gcc -fopenmp -std=c99 -O3 -o reverse_lookup reverse_lookup.c
$ ./reverse_lookup
Time: 1.082223 seconds
$ ./reverse_lookup
Time: 1.053431 seconds
$ ./reverse_lookup
Time: 1.081224 seconds
Conclusion
Use the lookup table, with option 1 (byte addressing is unsurprisingly slow) if you’re concerned about performance. If you need to squeeze every last byte of memory out of your system (and you might, if you care about the performance of bit reversal), the optimized versions of the bitwise-AND approach aren’t too shabby either.
Caveat
Yes, I know the benchmark code is a complete hack. Suggestions on how to improve it are more than welcome. Things I know about:
- I don’t have access to ICC. This may be faster (please respond in a comment if you can test this out).
- A 64K lookup table may do well on some modern microarchitectures with a large L1D.
- -mtune=native didn’t work for -O2/-O3 (ld blew up with some crazy symbol redefinition error), so I don’t believe the generated code is tuned for my microarchitecture.
- There may be a way to do this slightly faster with SSE. I have no idea how, but with fast replication, packed bitwise AND, and swizzling instructions, there’s got to be something there.
- I know only enough x86 assembly to be dangerous; here’s the code GCC generated at -O3 for option 1, so somebody more knowledgeable than myself can check it out:
32-bit
.L3:
movl (%r12,%rsi), %ecx
movzbl %cl, %eax
movzbl BitReverseTable256(%rax), %edx
movl %ecx, %eax
shrl $24, %eax
mov %eax, %eax
movzbl BitReverseTable256(%rax), %eax
sall $24, %edx
orl %eax, %edx
movzbl %ch, %eax
shrl $16, %ecx
movzbl BitReverseTable256(%rax), %eax
movzbl %cl, %ecx
sall $16, %eax
orl %eax, %edx
movzbl BitReverseTable256(%rcx), %eax
sall $8, %eax
orl %eax, %edx
movl %edx, (%r13,%rsi)
addq $4, %rsi
cmpq $400000000, %rsi
jne .L3
EDIT: I also tried using uint64_t types on my machine to see if there was any performance boost. Performance was about 10% faster than 32-bit, and was nearly identical whether you were just using 64-bit types to reverse bits on two 32-bit ints at a time, or whether you were actually reversing bits in half as many 64-bit values. The assembly code is shown below (for the former case, reversing bits for two 32-bit ints at a time):
.L3:
movq (%r12,%rsi), %rdx
movq %rdx, %rax
shrq $24, %rax
andl $255, %eax
movzbl BitReverseTable256(%rax), %ecx
movzbq %dl,%rax
movzbl BitReverseTable256(%rax), %eax
salq $24, %rax
orq %rax, %rcx
movq %rdx, %rax
shrq $56, %rax
movzbl BitReverseTable256(%rax), %eax
salq $32, %rax
orq %rax, %rcx
movzbl %dh, %eax
shrq $16, %rdx
movzbl BitReverseTable256(%rax), %eax
salq $16, %rax
orq %rax, %rcx
movzbq %dl,%rax
shrq $16, %rdx
movzbl BitReverseTable256(%rax), %eax
salq $8, %rax
orq %rax, %rcx
movzbq %dl,%rax
shrq $8, %rdx
movzbl BitReverseTable256(%rax), %eax
salq $56, %rax
orq %rax, %rcx
movzbq %dl,%rax
shrq $8, %rdx
movzbl BitReverseTable256(%rax), %eax
andl $255, %edx
salq $48, %rax
orq %rax, %rcx
movzbl BitReverseTable256(%rdx), %eax
salq $40, %rax
orq %rax, %rcx
movq %rcx, (%r13,%rsi)
addq $8, %rsi
cmpq $400000000, %rsi
jne .L3
#84 Best answer 2 of Efficient Algorithm for Bit Reversal (from MSB->LSB to LSB->MSB) in C (Score: 85)
Created: 20140605 Last updated: 20181001
This thread caught my attention since it deals with a simple problem that requires a lot of work (CPU cycles) even for a modern CPU. And one day I also stood there with the same ¤#%"#" problem. I had to flip millions of bytes. However, I know all my target systems are modern Intel-based, so let’s start optimizing to the extreme!!!
So I used Matt J’s lookup code as the base. The system I’m benchmarking on is an i7 Haswell 4700eq.
Matt J’s lookup bit-flipping of 400 000 000 bytes: around 0.272 seconds.
I then went ahead and tried to see if Intel’s ISPC compiler could vectorize the arithmetic in reverse.c.
I’m not going to bore you with my findings here since I tried a lot to help the compiler find stuff; anyhow, I ended up with performance of around 0.15 seconds to bit-flip 400 000 000 bytes. It’s a great reduction, but for my application that’s still way, way too slow.
So people let me present the fastest Intel based bitflipper in the world. Clocked at:
Time to bitflip 400000000 bytes: 0.050082 seconds !!!!!
// Bitflip using AVX2 - The fastest Intel based bitflip in the world!!
// Made by Anders Cedronius 2014 (anders.cedronius (you know what) gmail.com)
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <omp.h>
using namespace std;
#define DISPLAY_HEIGHT 4
#define DISPLAY_WIDTH 32
#define NUM_DATA_BYTES 400000000
// Constants (first we got the mask, then the high order nibble look up table and last we got the low order nibble lookup table)
__attribute__ ((aligned(32))) static unsigned char k1[32*3]={
0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,0x0f,
0x00,0x08,0x04,0x0c,0x02,0x0a,0x06,0x0e,0x01,0x09,0x05,0x0d,0x03,0x0b,0x07,0x0f,0x00,0x08,0x04,0x0c,0x02,0x0a,0x06,0x0e,0x01,0x09,0x05,0x0d,0x03,0x0b,0x07,0x0f,
0x00,0x80,0x40,0xc0,0x20,0xa0,0x60,0xe0,0x10,0x90,0x50,0xd0,0x30,0xb0,0x70,0xf0,0x00,0x80,0x40,0xc0,0x20,0xa0,0x60,0xe0,0x10,0x90,0x50,0xd0,0x30,0xb0,0x70,0xf0
};
// The data to be bitflipped (+32 to avoid the quantization out of memory problem)
__attribute__ ((aligned(32))) static unsigned char data[NUM_DATA_BYTES+32]={};
extern "C" {
void bitflipbyte(unsigned char[],unsigned int,unsigned char[]);
}
int main()
{
for(unsigned int i = 0; i < NUM_DATA_BYTES; i++)
{
data[i] = rand();
}
printf ("\r\nData in(start):\r\n");
for (unsigned int j = 0; j < 4; j++)
{
for (unsigned int i = 0; i < DISPLAY_WIDTH; i++)
{
printf ("0x%02x,",data[i+(j*DISPLAY_WIDTH)]);
}
printf ("\r\n");
}
printf ("\r\nNumber of 32-byte chunks to convert: %d\r\n",(unsigned int)ceil(NUM_DATA_BYTES/32.0));
double start_time = omp_get_wtime();
bitflipbyte(data,(unsigned int)ceil(NUM_DATA_BYTES/32.0),k1);
double end_time = omp_get_wtime();
printf ("\r\nData out:\r\n");
for (unsigned int j = 0; j < 4; j++)
{
for (unsigned int i = 0; i < DISPLAY_WIDTH; i++)
{
printf ("0x%02x,",data[i+(j*DISPLAY_WIDTH)]);
}
printf ("\r\n");
}
printf("\r\n\r\nTime to bitflip %d bytes: %f seconds\r\n\r\n",NUM_DATA_BYTES, end_time - start_time);
// return with no errors
return 0;
}
The printf’s are for debugging..
Here is the workhorse:
bits 64
global bitflipbyte
bitflipbyte:
vmovdqa ymm2, [rdx]
add rdx, 20h
vmovdqa ymm3, [rdx]
add rdx, 20h
vmovdqa ymm4, [rdx]
bitflipp_loop:
vmovdqa ymm0, [rdi]
vpand ymm1, ymm2, ymm0
vpandn ymm0, ymm2, ymm0
vpsrld ymm0, ymm0, 4h
vpshufb ymm1, ymm4, ymm1
vpshufb ymm0, ymm3, ymm0
vpor ymm0, ymm0, ymm1
vmovdqa [rdi], ymm0
add rdi, 20h
dec rsi
jnz bitflipp_loop
ret
The code takes 32 bytes at a time, then masks out the nibbles. The high nibble gets shifted right by 4. Then I use vpshufb with ymm4/ymm3 as lookup tables. I could use a single lookup table, but then I would have to shift left before ORing the nibbles together again.
There are even faster ways of flipping the bits. But I’m bound to single thread and CPU so this was the fastest I could achieve. Can you make a faster version?
Please make no comments about using the Intel C/C++ Compiler Intrinsic Equivalent commands…
See also original question in stackoverflow
#85: How are ssl certificates verified? (Score: 248)
Created: 20081009 Last updated: 20081016
Tags: algorithm, security, ssl, certificate
What is the series of steps needed to securely verify an SSL certificate? My (very limited) understanding is that when you visit an https site, the server sends a certificate to the client (the browser), the browser gets the certificate’s issuer information from that certificate, then uses that to contact the issuer, and somehow compares certificates for validity.
- How exactly is this done?
- What about the process makes it immune to man-in-the-middle attacks?
- What prevents some random person from setting up their own verification service to use in man-in-the-middle attacks, so everything “looks” secure?
#85 Best answer 1 of How are ssl certificates verified? (Score: 337)
Created: 20081009 Last updated: 20161130
Here is a very simplified explanation:

1. Your web browser downloads the web server’s certificate, which contains the public key of the web server. This certificate is signed with the private key of a trusted certificate authority.
2. Your web browser comes installed with the public keys of all of the major certificate authorities. It uses these public keys to verify that the web server’s certificate was indeed signed by a trusted certificate authority.
3. The certificate contains the domain name and/or IP address of the web server. Your web browser verifies that the address listed in the certificate is the one to which it has an open connection.
4. Your web browser generates a shared symmetric key which will be used to encrypt the HTTP traffic on this connection; this is much more efficient than using public/private key encryption for everything. Your browser encrypts the symmetric key with the public key of the web server and then sends it back, thus ensuring that only the web server can decrypt it, since only the web server has its private key.

Note that the certificate authority (CA) is essential to preventing man-in-the-middle attacks. However, even an unsigned certificate will prevent someone from passively listening in on your encrypted traffic, since they have no way to gain access to your shared symmetric key.
#85 Best answer 2 of How are ssl certificates verified? (Score: 63)
Created: 20090205 Last updated: 20200416
It’s worth noting that in addition to purchasing a certificate (as mentioned above), you can also create your own for free; this is referred to as a “self-signed certificate”. The difference between a self-signed certificate and one that’s purchased is simple: the purchased one has been signed by a Certificate Authority that your browser already knows about. In other words, your browser can easily validate the authenticity of a purchased certificate.
Unfortunately, this has led to a common misconception that self-signed certificates are inherently less secure than those sold by commercial CAs like GoDaddy and Verisign, and that you have to live with browser warnings/exceptions if you use them; this is incorrect.
If you securely distribute a self-signed certificate (or CA cert, as bobince suggested) and install it in the browsers that will use your site, it’s just as secure as one that’s purchased and is not vulnerable to man-in-the-middle attacks and cert forgery. Obviously this means that it’s only feasible if only a few people need secure access to your site (e.g., internal apps, personal blogs, etc.).
See also original question in stackoverflow
#86: Are there any cases where you would prefer a higher big-O time complexity algorithm over the lower one? (Score: 245)
Created: 20151209 Last updated: 20151229
Tags: algorithm, bigo, timecomplexity
Are there any cases where you would prefer O(log n) time complexity to O(1) time complexity? Or O(n) to O(log n)?
Do you have any examples?
#86 Best answer 1 of Are there any cases where you would prefer a higher big-O time complexity algorithm over the lower one? (Score: 273)
Created: 20151212 Last updated: 20171212
There can be many reasons to prefer an algorithm with higher big-O time complexity over the lower one:

- Most of the time, lower big-O complexity is harder to achieve and requires skilled implementation, a lot of knowledge, and a lot of testing.
- Big-O hides the details about a constant: an algorithm that performs in 10^5 is better from a big-O point of view than one that performs in 1/10^5 * log(n) (O(1) vs O(log(n))), but for any reasonable n the second one will perform better in practice. For example, the best known complexity for matrix multiplication is O(n^2.373), but the constant is so high that no (to my knowledge) computational libraries use it.
- Big-O makes sense when you calculate over something big. If you need to sort an array of three numbers, it matters really little whether you use an O(n*log(n)) or an O(n^2) algorithm.
- Sometimes the advantage of the lower time complexity can be really negligible. For example, there is a data structure, the tango tree, which gives O(log log N) time complexity to find an item, but there is also a binary tree which finds the same in O(log n). Even for huge numbers of n = 10^20 the difference is negligible.
- Time complexity is not everything. Imagine an algorithm that runs in O(n^2) and requires O(n^2) memory. It might be preferable over O(n^3) time and O(1) space when n is not really big. The problem is that you can wait for a long time, but I highly doubt you can find RAM big enough to use your algorithm on it.
- Parallelization is a good feature in our distributed world. There are algorithms that are easily parallelizable, and there are some that do not parallelize at all. Sometimes it makes sense to run an algorithm on 1000 commodity machines with a higher complexity than to use one machine with a slightly better complexity.
- In some places (security) a complexity can be a requirement. No one wants a hash algorithm that can hash blazingly fast (because then other people can brute-force you way faster).
- Although this is not related to a switch of complexity, some security functions should be written in a manner that prevents timing attacks. They mostly stay in the same complexity class but are modified in a way that always takes the worst case to do something. One example is comparing strings for equality. In most applications it makes sense to break fast if the first bytes are different, but in security you will still wait for the very end to tell the bad news.
- Somebody patented the lower-complexity algorithm and it is more economical for a company to use the higher complexity than to pay money.
- Some algorithms adapt well to particular situations. Insertion sort, for example, has an average time complexity of O(n^2), worse than quicksort or mergesort, but as an online algorithm it can efficiently sort a list of values as they are received (as user input), where most other algorithms can only operate efficiently on a complete list of values.
#86 Best answer 2 of Are there any cases where you would prefer a higher big-O time complexity algorithm over the lower one? (Score: 228)
Created: 20151209 Last updated: 20151211
There is always the hidden constant, which can be lower on the O(log n) algorithm. So it can work faster in practice for reallife data.
There are also space concerns (e.g. running on a toaster).
There’s also the developer time concern: O(log n) may be 1000× easier to implement and verify.
See also original question in stackoverflow
#87: Recursion or Iteration? (Score: 241)
Created: 20080916 Last updated: 20190515
Tags: performance, algorithm, languageagnostic, recursion
Is there a performance hit if we use a loop instead of recursion, or vice versa, in algorithms where both can serve the same purpose? E.g.: check if the given string is a palindrome. I have seen many programmers using recursion as a means to show off when a simple iteration algorithm can fit the bill. Does the compiler play a vital role in deciding what to use?
#87 Best answer 1 of Recursion or Iteration? (Score: 398)
Created: 20080916 Last updated: 20140304
Loops may achieve a performance gain for your program. Recursion may achieve a performance gain for your programmer. Choose which is more important in your situation!
#87 Best answer 2 of Recursion or Iteration? (Score: 186)
Created: 20080916 Last updated: 20190515
It is possible that recursion will be more expensive, depending on whether the recursive function is tail recursive (i.e., the last operation is the recursive call). Tail recursion should be recognized by the compiler and optimized into its iterative counterpart (while maintaining the concise, clear implementation you have in your code).
I would write the algorithm in the way that makes the most sense and is the clearest for the poor sucker (be it yourself or someone else) that has to maintain the code in a few months or years. If you run into performance issues, then profile your code, and then and only then look into optimizing by moving over to an iterative implementation. You may want to look into memoization and dynamic programming.
See also original question in stackoverflow
#88: What is the difference between LL and LR parsing? (Score: 241)
Created: 20110512 Last updated: 20200103
Tags: algorithm, parsing, ll, lr
Can anyone give me a simple example of LL parsing versus LR parsing?
#88 Best answer 1 of What is the difference between LL and LR parsing? (Score: 507)
Created: 20110726 Last updated: 20150810
At a high level, the difference between LL parsing and LR parsing is that LL parsers begin at the start symbol and try to apply productions to arrive at the target string, whereas LR parsers begin at the target string and try to arrive back at the start symbol.
An LL parse is a lefttoright, leftmost derivation. That is, we consider the input symbols from the left to the right and attempt to construct a leftmost derivation. This is done by beginning at the start symbol and repeatedly expanding out the leftmost nonterminal until we arrive at the target string. An LR parse is a lefttoright, rightmost derivation, meaning that we scan from the left to right and attempt to construct a rightmost derivation. The parser continuously picks a substring of the input and attempts to reverse it back to a nonterminal.
During an LL parse, the parser continuously chooses between two actions:
- Predict: Based on the leftmost nonterminal and some number of lookahead tokens, choose which production ought to be applied to get closer to the input string.
- Match: Match the leftmost guessed terminal symbol with the leftmost unconsumed symbol of input.
As an example, given this grammar:
- S → E
- E → T + E
- E → T
- T → int

Then given the string int + int + int, an LL(2) parser (which uses two tokens of lookahead) would parse the string as follows:
Production Input Action

S int + int + int Predict S > E
E int + int + int Predict E > T + E
T + E int + int + int Predict T > int
int + E int + int + int Match int
+ E + int + int Match +
E int + int Predict E > T + E
T + E int + int Predict T > int
int + E int + int Match int
+ E + int Match +
E int Predict E > T
T int Predict T > int
int int Match int
Accept
Notice that in each step we look at the leftmost symbol in our production. If it’s a terminal, we match it, and if it’s a nonterminal, we predict what it’s going to be by choosing one of the rules.
In an LR parser, there are two actions:
- Shift: Add the next token of input to a buffer for consideration.
- Reduce: Reduce a collection of terminals and nonterminals in this buffer back to some nonterminal by reversing a production.
As an example, an LR(1) parser (with one token of lookahead) might parse that same string as follows:
Workspace Input Action

int + int + int Shift
int + int + int Reduce T > int
T + int + int Shift
T + int + int Shift
T + int + int Reduce T > int
T + T + int Shift
T + T + int Shift
T + T + int Reduce T > int
T + T + T Reduce E > T
T + T + E Reduce E > T + E
T + E Reduce E > T + E
E Reduce S > E
S Accept
The two parsing algorithms you mentioned (LL and LR) are known to have different characteristics. LL parsers tend to be easier to write by hand, but they are less powerful than LR parsers and accept a much smaller set of grammars than LR parsers do. LR parsers come in many flavors (LR(0), SLR(1), LALR(1), LR(1), IELR(1), GLR(0), etc.) and are far more powerful. They also tend to have much more complex machinery and are almost always generated by tools like yacc or bison. LL parsers also come in many flavors (including LL(*), which is used by the ANTLR tool), though in practice LL(1) is the most widely used.
As a shameless plug, if you’d like to learn more about LL and LR parsing, I just finished teaching a compilers course and have some handouts and lecture slides on parsing on the course website. I’d be glad to elaborate on any of them if you think it would be useful.
#88 Best answer 2 of What is the difference between LL and LR parsing? (Score: 61)
Created: 20130814 Last updated: 20150929
Josh Haberman in his article LL and LR Parsing Demystified claims that LL parsing directly corresponds with the Polish Notation, whereas LR corresponds to Reverse Polish Notation. The difference between PN and RPN is the order of traversing the binary tree of the equation:
+ 1 * 2 3 // Polish (prefix) expression; preorder traversal.
1 2 3 * + // Reverse Polish (postfix) expression; postorder traversal.
According to Haberman, this illustrates the main difference between LL and LR parsers:
The primary difference between how LL and LR parsers operate is that an LL parser outputs a preorder traversal of the parse tree and an LR parser outputs a postorder traversal.
For the in-depth explanation, examples, and conclusions, check out Haberman’s article.
See also original question in stackoverflow
#89: Get a random number focused on center (Score: 238)
Created: 20150527 Last updated: 20150729
Tags: javascript, algorithm, random, numbers
Is it possible to get a random number between 1-100 and keep the results mainly within the 40-60 range? I mean, it will go out of that range rarely, but I want it to be mainly within that range… Is it possible with JavaScript/jQuery?
Right now I’m just using the basic Math.random() * 100 + 1.
#89 Best answer 1 of Get a random number focused on center (Score: 396)
Created: 20150528 Last updated: 20200729
The simplest way would be to generate two random numbers from 0-50 and add them together.
This gives a distribution biased towards 50, in the same way rolling two dice biases towards 7.
In fact, by using a larger number of “dice” (as @Falco suggests), you can make a closer approximation to a bell curve:
function weightedRandom(max, numDice) {
  let num = 0;
  for (let i = 0; i < numDice; i++) {
    num += Math.random() * (max / numDice);
  }
  return num;
}
JSFiddle: http://jsfiddle.net/797qhcza/1/
#89 Best answer 2 of Get a random number focused on center (Score: 48)
Created: 20150529
You have some good answers here that give specific solutions; let me describe for you the general solution. The problem is:
- I have a source of more-or-less uniformly distributed random numbers between 0 and 1.
- I wish to produce a sequence of random numbers that follow a different distribution.
The general solution to this problem is to work out the quantile function of your desired distribution, and then apply the quantile function to the output of your uniform source.
The quantile function is the inverse of the integral of your desired distribution function. The distribution function is the function where the area under a portion of the curve is equal to the probability that the randomlychosen item will be in that portion.
I give an example of how to do so here:
http://ericlippert.com/2012/02/21/generatingrandomnonuniformdata/
The code in there is in C#, but the principles apply to any language; it should be straightforward to adapt the solution to JavaScript.
See also original question in stackoverflow
#90: Good Java graph algorithm library? (Score: 237)
Created: 20080909 Last updated: 20131207
Tags: java, algorithm, graph
Has anyone had good experiences with any Java libraries for graph algorithms? I’ve tried JGraph and found it OK, and there are a lot of different ones on Google. Are there any that people are actually using successfully in production code or would recommend?
To clarify, I’m not looking for a library that produces graphs/charts; I’m looking for one that helps with graph algorithms, e.g. minimum spanning tree, Kruskal’s algorithm, Nodes, Edges, etc. Ideally one with some good algorithms/data structures in a nice Java OO API.
#90 Best answer 1 of Good Java graph algorithm library? (Score: 110)
Created: 20080909 Last updated: 20150503
If you were using JGraph, you should give JGraphT a try; it is designed for algorithms. One of its features is visualization using the JGraph library. It’s still being developed, but it is pretty stable. I analyzed the complexity of JGraphT algorithms some time ago. Some of them aren’t the quickest, but if you’re going to implement them on your own and need to display your graph, then it might be the best choice. I really liked using its API when I quickly had to write an app that worked on a graph and displayed it later.
#90 Best answer 2 of Good Java graph algorithm library?(Score: 66)
Created: 20090512 Last updated: 20170726
Summary:
- JGraphT if you are more interested in data structures and algorithms.
- JGraph if your primary focus is visualization.
- Jung, yWorks, and BFG are other things people tried using.
- Prefuse is a no-no since one has to rewrite most of it.
- Google Guava if you need good data structures only.
- Apache Commons Graph. Currently dormant, but provides implementations for many algorithms. See https://issues.apache.org/jira/browse/SANDBOX-458 for a list of implemented algorithms, also compared with Jung, GraphT, Prefuse, jBPT.
See also original question in stackoverflow
Notes:
- This page uses the API to get the relevant data from the stackoverflow community.
- Content license on this page is CC BY-SA 3.0.
- score = up votes - down votes.