Most votes on algorithm questions 3

Most votes on algorithm questions 3. #21 Generate an integer that is not among four billion given ones #22 How to generate all permutations of a list? #23 What is the difference between a generative and a discriminative algorithm? #24 How to check if a number is a power of 2 #25 How can building a heap be O(n) time complexity? #26 Algorithm to return all combinations of k elements from n #27 What is the most effective way for float and double comparison? #28 A simple explanation of Naive Bayes Classification #29 What algorithms compute directions from point A to point B on a map? #30 How do you detect Credit card type based on number?

Read all the top votes questions and answers in a single page.

#21: Generate an integer that is not among four billion given ones (Score: 702)

Created: 2011-08-22 Last updated: 2020-03-10

Tags: algorithm, file, search, out-of-memory, memory-limit

I have been given this interview question:

Given an input file with four billion integers, provide an algorithm to generate an integer which is not contained in the file. Assume you have 1 GB memory. Follow up with what you would do if you have only 10 MB of memory.

My analysis:

The size of the file is 4×109×4 bytes = 16 GB.

We can do external sorting, thus letting us know the range of the integers.

My question is what is the best way to detect the missing integer in the sorted big integer sets?

My understanding (after reading all the answers):

Assuming we are talking about 32-bit integers, there are 232 = 4*109 distinct integers.

Case 1: we have 1 GB = 1 * 109 * 8 bits = 8 billion bits memory.


If we use one bit representing one distinct integer, it is enough. we don’t need sort.


int radix = 8;
byte[] bitfield = new byte[0xffffffff/radix];
void F() throws FileNotFoundException{
    Scanner in = new Scanner(new FileReader("a.txt"));
        int n = in.nextInt();
        bitfield[n/radix] |= (1 << (n%radix));

    for(int i = 0; i< bitfield.lenght; i++){
        for(int j =0; j<radix; j++){
            if( (bitfield[i] & (1<<j)) == 0) System.out.print(i*radix+j);

Case 2: 10 MB memory = 10 * 106 * 8 bits = 80 million bits


For all possible 16-bit prefixes, there are 216 number of integers = 65536, we need 216 * 4 * 8 = 2 million bits. We need build 65536 buckets. For each bucket, we need 4 bytes holding all possibilities because the worst case is all the 4 billion integers belong to the same bucket.

  1. Build the counter of each bucket through the first pass through the file.
  2. Scan the buckets, find the first one who has less than 65536 hit.
  3. Build new buckets whose high 16-bit prefixes are we found in step2 through second pass of the file
  4. Scan the buckets built in step3, find the first bucket which doesnt have a hit.

The code is very similar to above one.

Conclusion: We decrease memory through increasing file pass.

A clarification for those arriving late: The question, as asked, does not say that there is exactly one integer that is not contained in the file—at least that's not how most people interpret it. Many comments in the comment thread **are** about that variation of the task, though. Unfortunately the comment that **introduced** it to the comment thread was later deleted by its author, so now it looks like the orphaned replies to it just misunderstood everything. It's very confusing, sorry.

#21 Best answer 1 of Generate an integer that is not among four billion given ones (Score: 536)

Created: 2011-08-22 Last updated: 2020-03-10

Assuming that “integer” means 32 bits: 10 MB of space is more than enough for you to count how many numbers there are in the input file with any given 16-bit prefix, for all possible 16-bit prefixes in one pass through the input file. At least one of the buckets will have be hit less than 216 times. Do a second pass to find of which of the possible numbers in that bucket are used already.

If it means more than 32 bits, but still of bounded size: Do as above, ignoring all input numbers that happen to fall outside the (signed or unsigned; your choice) 32-bit range.

If “integer” means mathematical integer: Read through the input once and keep track of the largest number length of the longest number you’ve ever seen. When you’re done, output the maximum plus one a random number that has one more digit. (One of the numbers in the file may be a bignum that takes more than 10 MB to represent exactly, but if the input is a file, then you can at least represent the length of anything that fits in it).

#21 Best answer 2 of Generate an integer that is not among four billion given ones(Score: 202)

Created: 2011-08-23 Last updated: 2014-12-28

Statistically informed algorithms solve this problem using fewer passes than deterministic approaches.

If very large integers are allowed then one can generate a number that is likely to be unique in O(1) time. A pseudo-random 128-bit integer like a GUID will only collide with one of the existing four billion integers in the set in less than one out of every 64 billion billion billion cases.

If integers are limited to 32 bits then one can generate a number that is likely to be unique in a single pass using much less than 10 MB. The odds that a pseudo-random 32-bit integer will collide with one of the 4 billion existing integers is about 93% (4e9 / 2^32). The odds that 1000 pseudo-random integers will all collide is less than one in 12,000 billion billion billion (odds-of-one-collision ^ 1000). So if a program maintains a data structure containing 1000 pseudo-random candidates and iterates through the known integers, eliminating matches from the candidates, it is all but certain to find at least one integer that is not in the file.

See also original question in stackoverflow

#22: How to generate all permutations of a list? (Score: 687)

Created: 2008-09-19 Last updated: 2019-12-05

Tags: python, algorithm, permutation, combinatorics, python-2.5

How do you generate all the permutations of a list in Python, independently of the type of elements in that list?

For example:



permutations([1, 2])
[1, 2]
[2, 1]

permutations([1, 2, 3])
[1, 2, 3]
[1, 3, 2]
[2, 1, 3]
[2, 3, 1]
[3, 1, 2]
[3, 2, 1]

#22 Best answer 1 of How to generate all permutations of a list? (Score: 597)

Created: 2008-09-19 Last updated: 2021-01-13

There’s a function in the standard-library for this: itertools.permutations.

import itertools
list(itertools.permutations([1, 2, 3]))

If for some reason you want to implement it yourself or are just curious to know how it works, here’s one nice approach, taken from

def all_perms(elements):
    if len(elements) <=1:
        yield elements
        for perm in all_perms(elements[1:]):
            for i in range(len(elements)):
                # nb elements[0:1] works in both string and list contexts
                yield perm[:i] + elements[0:1] + perm[i:]

A couple of alternative approaches are listed in the documentation of itertools.permutations. Here’s one:

def permutations(iterable, r=None):
    # permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC
    # permutations(range(3)) --> 012 021 102 120 201 210
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    if r > n:
    indices = range(n)
    cycles = range(n, n-r, -1)
    yield tuple(pool[i] for i in indices[:r])
    while n:
        for i in reversed(range(r)):
            cycles[i] -= 1
            if cycles[i] == 0:
                indices[i:] = indices[i+1:] + indices[i:i+1]
                cycles[i] = n - i
                j = cycles[i]
                indices[i], indices[-j] = indices[-j], indices[i]
                yield tuple(pool[i] for i in indices[:r])

And another, based on itertools.product:

def permutations(iterable, r=None):
    pool = tuple(iterable)
    n = len(pool)
    r = n if r is None else r
    for indices in product(range(n), repeat=r):
        if len(set(indices)) == r:
            yield tuple(pool[i] for i in indices)

#22 Best answer 2 of How to generate all permutations of a list?(Score: 347)

Created: 2008-09-19 Last updated: 2015-05-16

And in Python 2.6 onwards:

import itertools

(returned as a generator. Use list(permutations(l)) to return as a list.)

See also original question in stackoverflow

#23: What is the difference between a generative and a discriminative algorithm? (Score: 628)

Created: 2009-05-18 Last updated: 2020-12-05

Tags: algorithm, machine-learning, generative

What is the difference between a generative and a discriminative algorithm?

#23 Best answer 1 of What is the difference between a generative and a discriminative algorithm? (Score: 883)

Created: 2009-05-18 Last updated: 2018-11-03

Let’s say you have input data x and you want to classify the data into labels y. A generative model learns the joint probability distribution p(x,y) and a discriminative model learns the conditional probability distribution p(y|x) - which you should read as “the probability of y given x".

Here’s a really simple example. Suppose you have the following data in the form (x,y):

(1,0), (1,0), (2,0), (2, 1)

p(x,y) is

      y=0   y=1
x=1 | 1/2   0
x=2 | 1/4   1/4

p(y|x) is

      y=0   y=1
x=1 | 1     0
x=2 | 1/2   1/2

If you take a few minutes to stare at those two matrices, you will understand the difference between the two probability distributions.

The distribution p(y|x) is the natural distribution for classifying a given example x into a class y, which is why algorithms that model this directly are called discriminative algorithms. Generative algorithms model p(x,y), which can be transformed into p(y|x) by applying Bayes rule and then used for classification. However, the distribution p(x,y) can also be used for other purposes. For example, you could use p(x,y) to generate likely (x,y) pairs.

From the description above, you might be thinking that generative models are more generally useful and therefore better, but it’s not as simple as that. This paper is a very popular reference on the subject of discriminative vs. generative classifiers, but it’s pretty heavy going. The overall gist is that discriminative models generally outperform generative models in classification tasks.

#23 Best answer 2 of What is the difference between a generative and a discriminative algorithm?(Score: 306)

Created: 2009-12-05 Last updated: 2018-05-03

A generative algorithm models how the data was generated in order to categorize a signal. It asks the question: based on my generation assumptions, which category is most likely to generate this signal?

A discriminative algorithm does not care about how the data was generated, it simply categorizes a given signal.

See also original question in stackoverflow

#24: How to check if a number is a power of 2 (Score: 628)

Created: 2009-03-01 Last updated: 2015-03-29

Tags: c#, algorithm, math

Today I needed a simple algorithm for checking if a number is a power of 2.

The algorithm needs to be:

  1. Simple
  2. Correct for any ulong value.

I came up with this simple algorithm:

private bool IsPowerOfTwo(ulong number)
    if (number == 0)
        return false;

    for (ulong power = 1; power > 0; power = power << 1)
        // This for loop used shifting for powers of 2, meaning
        // that the value will become 0 after the last shift
        // (from binary 1000...0000 to 0000...0000) then, the 'for'
        // loop will break out.

        if (power == number)
            return true;
        if (power > number)
            return false;
    return false;

But then I thought, how about checking if log2 x is an exactly round number? But when I checked for 2^63+1, Math.Log returned exactly 63 because of rounding. So I checked if 2 to the power 63 is equal to the original number - and it is, because the calculation is done in doubles and not in exact numbers:

private bool IsPowerOfTwo_2(ulong number)
    double log = Math.Log(number, 2);
    double pow = Math.Pow(2, Math.Round(log));
    return pow == number;

This returned true for the given wrong value: 9223372036854775809.

Is there a better algorithm?

#24 Best answer 1 of How to check if a number is a power of 2 (Score: 1323)

Created: 2009-03-01 Last updated: 2018-05-16

There’s a simple trick for this problem:

bool IsPowerOfTwo(ulong x)
    return (x & (x - 1)) == 0;

Note, this function will report true for 0, which is not a power of 2. If you want to exclude that, here’s how:

bool IsPowerOfTwo(ulong x)
    return (x != 0) && ((x & (x - 1)) == 0);


First and foremost the bitwise binary & operator from MSDN definition:

Binary & operators are predefined for the integral types and bool. For integral types, & computes the logical bitwise AND of its operands. For bool operands, & computes the logical AND of its operands; that is, the result is true if and only if both its operands are true.

Now let’s take a look at how this all plays out:

The function returns boolean (true / false) and accepts one incoming parameter of type unsigned long (x, in this case). Let us for the sake of simplicity assume that someone has passed the value 4 and called the function like so:

bool b = IsPowerOfTwo(4)

Now we replace each occurrence of x with 4:

return (4 != 0) && ((4 & (4-1)) == 0);

Well we already know that 4 != 0 evals to true, so far so good. But what about:

((4 & (4-1)) == 0)

This translates to this of course:

((4 & 3) == 0)

But what exactly is 4&3?

The binary representation of 4 is 100 and the binary representation of 3 is 011 (remember the & takes the binary representation of these numbers). So we have:

100 = 4
011 = 3

Imagine these values being stacked up much like elementary addition. The & operator says that if both values are equal to 1 then the result is 1, otherwise it is 0. So 1 & 1 = 1, 1 & 0 = 0, 0 & 0 = 0, and 0 & 1 = 0. So we do the math:


The result is simply 0. So we go back and look at what our return statement now translates to:

return (4 != 0) && ((4 & 3) == 0);

Which translates now to:

return true && (0 == 0);
return true && true;

We all know that true && true is simply true, and this shows that for our example, 4 is a power of 2.

#24 Best answer 2 of How to check if a number is a power of 2(Score: 101)

Created: 2009-03-01 Last updated: 2016-02-28

Some sites that document and explain this and other bit twiddling hacks are:

And the grandaddy of them, the book “Hacker’s Delight” by Henry Warren, Jr.:

As Sean Anderson’s page explains, the expression ((x & (x - 1)) == 0) incorrectly indicates that 0 is a power of 2. He suggests to use:

(!(x & (x - 1)) && x)

to correct that problem.

See also original question in stackoverflow

#25: How can building a heap be O(n) time complexity? (Score: 616)

Created: 2012-03-18 Last updated: 2021-04-30

Tags: algorithm, big-o, heap, complexity-theory, construction

Can someone help explain how can building a heap be O(n) complexity?

Inserting an item into a heap is O(log n), and the insert is repeated n/2 times (the remainder are leaves, and can’t violate the heap property). So, this means the complexity should be O(n log n), I would think.

In other words, for each item we “heapify”, it has the potential to have to filter down (i.e., sift down) once for each level for the heap so far (which is log n levels).

What am I missing?

#25 Best answer 1 of How can building a heap be O(n) time complexity? (Score: 609)

Created: 2013-09-11 Last updated: 2021-01-19

I think there are several questions buried in this topic:

  • How do you implement buildHeap so it runs in O(n) time?
  • How do you show that buildHeap runs in O(n) time when implemented correctly?
  • Why doesn’t that same logic work to make heap sort run in O(n) time rather than O(n log n)?

How do you implement buildHeap so it runs in O(n) time?

Often, answers to these questions focus on the difference between siftUp and siftDown. Making the correct choice between siftUp and siftDown is critical to get O(n) performance for buildHeap, but does nothing to help one understand the difference between buildHeap and heapSort in general. Indeed, proper implementations of both buildHeap and heapSort will only use siftDown. The siftUp operation is only needed to perform inserts into an existing heap, so it would be used to implement a priority queue using a binary heap, for example.

I’ve written this to describe how a max heap works. This is the type of heap typically used for heap sort or for a priority queue where higher values indicate higher priority. A min heap is also useful; for example, when retrieving items with integer keys in ascending order or strings in alphabetical order. The principles are exactly the same; simply switch the sort order.

The heap property specifies that each node in a binary heap must be at least as large as both of its children. In particular, this implies that the largest item in the heap is at the root. Sifting down and sifting up are essentially the same operation in opposite directions: move an offending node until it satisfies the heap property:

  • siftDown swaps a node that is too small with its largest child (thereby moving it down) until it is at least as large as both nodes below it.
  • siftUp swaps a node that is too large with its parent (thereby moving it up) until it is no larger than the node above it.

The number of operations required for siftDown and siftUp is proportional to the distance the node may have to move. For siftDown, it is the distance to the bottom of the tree, so siftDown is expensive for nodes at the top of the tree. With siftUp, the work is proportional to the distance to the top of the tree, so siftUp is expensive for nodes at the bottom of the tree. Although both operations are O(log n) in the worst case, in a heap, only one node is at the top whereas half the nodes lie in the bottom layer. So it shouldn’t be too surprising that if we have to apply an operation to every node, we would prefer siftDown over siftUp.

The buildHeap function takes an array of unsorted items and moves them until they all satisfy the heap property, thereby producing a valid heap. There are two approaches one might take for buildHeap using the siftUp and siftDown operations we’ve described.

  1. Start at the top of the heap (the beginning of the array) and call siftUp on each item. At each step, the previously sifted items (the items before the current item in the array) form a valid heap, and sifting the next item up places it into a valid position in the heap. After sifting up each node, all items satisfy the heap property.

  2. Or, go in the opposite direction: start at the end of the array and move backwards towards the front. At each iteration, you sift an item down until it is in the correct location.

Which implementation for buildHeap is more efficient?

Both of these solutions will produce a valid heap. Unsurprisingly, the more efficient one is the second operation that uses siftDown.

Let h = log n represent the height of the heap. The work required for the siftDown approach is given by the sum

(0 * n/2) + (1 * n/4) + (2 * n/8) + ... + (h * 1).

Each term in the sum has the maximum distance a node at the given height will have to move (zero for the bottom layer, h for the root) multiplied by the number of nodes at that height. In contrast, the sum for calling siftUp on each node is

(h * n/2) + ((h-1) * n/4) + ((h-2)*n/8) + ... + (0 * 1).

It should be clear that the second sum is larger. The first term alone is hn/2 = 1/2 n log n, so this approach has complexity at best O(n log n).

How do we prove the sum for the siftDown approach is indeed O(n)?

One method (there are other analyses that also work) is to turn the finite sum into an infinite series and then use Taylor series. We may ignore the first term, which is zero:

Taylor series for buildHeap complexity

If you aren’t sure why each of those steps works, here is a justification for the process in words:

  • The terms are all positive, so the finite sum must be smaller than the infinite sum.
  • The series is equal to a power series evaluated at x=1/2.
  • That power series is equal to (a constant times) the derivative of the Taylor series for f(x)=1/(1-x).
  • x=1/2 is within the interval of convergence of that Taylor series.
  • Therefore, we can replace the Taylor series with 1/(1-x), differentiate, and evaluate to find the value of the infinite series.

Since the infinite sum is exactly n, we conclude that the finite sum is no larger, and is therefore, O(n).

Why does heap sort require O(n log n) time?

If it is possible to run buildHeap in linear time, why does heap sort require O(n log n) time? Well, heap sort consists of two stages. First, we call buildHeap on the array, which requires O(n) time if implemented optimally. The next stage is to repeatedly delete the largest item in the heap and put it at the end of the array. Because we delete an item from the heap, there is always an open spot just after the end of the heap where we can store the item. So heap sort achieves a sorted order by successively removing the next largest item and putting it into the array starting at the last position and moving towards the front. It is the complexity of this last part that dominates in heap sort. The loop looks likes this:

for (i = n - 1; i > 0; i--) {
    arr[i] = deleteMax();

Clearly, the loop runs O(n) times (n - 1 to be precise, the last item is already in place). The complexity of deleteMax for a heap is O(log n). It is typically implemented by removing the root (the largest item left in the heap) and replacing it with the last item in the heap, which is a leaf, and therefore one of the smallest items. This new root will almost certainly violate the heap property, so you have to call siftDown until you move it back into an acceptable position. This also has the effect of moving the next largest item up to the root. Notice that, in contrast to buildHeap where for most of the nodes we are calling siftDown from the bottom of the tree, we are now calling siftDown from the top of the tree on each iteration! Although the tree is shrinking, it doesn’t shrink fast enough: The height of the tree stays constant until you have removed the first half of the nodes (when you clear out the bottom layer completely). Then for the next quarter, the height is h - 1. So the total work for this second stage is

h*n/2 + (h-1)*n/4 + ... + 0 * 1.

Notice the switch: now the zero work case corresponds to a single node and the h work case corresponds to half the nodes. This sum is O(n log n) just like the inefficient version of buildHeap that is implemented using siftUp. But in this case, we have no choice since we are trying to sort and we require the next largest item be removed next.

In summary, the work for heap sort is the sum of the two stages: O(n) time for buildHeap and O(n log n) to remove each node in order, so the complexity is O(n log n). You can prove (using some ideas from information theory) that for a comparison-based sort, O(n log n) is the best you could hope for anyway, so there’s no reason to be disappointed by this or expect heap sort to achieve the O(n) time bound that buildHeap does.

#25 Best answer 2 of How can building a heap be O(n) time complexity?(Score: 333)

Created: 2012-03-18 Last updated: 2021-04-18

Your analysis is correct. However, it is not tight.

It is not really easy to explain why building a heap is a linear operation, you should better read it.

A great analysis of the algorithm can be seen here.

The main idea is that in the build_heap algorithm the actual heapify cost is not O(log n)for all elements.

When heapify is called, the running time depends on how far an element might move down in the tree before the process terminates. In other words, it depends on the height of the element in the heap. In the worst case, the element might go down all the way to the leaf level.

Let us count the work done level by level.

At the bottommost level, there are 2^(h)nodes, but we do not call heapify on any of these, so the work is 0. At the next level there are 2^(h − 1) nodes, and each might move down by 1 level. At the 3rd level from the bottom, there are 2^(h − 2) nodes, and each might move down by 2 levels.

As you can see not all heapify operations are O(log n), this is why you are getting O(n).

See also original question in stackoverflow

#26: Algorithm to return all combinations of k elements from n (Score: 601)

Created: 2008-09-24 Last updated: 2011-12-13

Tags: algorithm, combinations

I want to write a function that takes an array of letters as an argument and a number of those letters to select.

Say you provide an array of 8 letters and want to select 3 letters from that. Then you should get:

8! / ((8 - 3)! * 3!) = 56

Arrays (or words) in return consisting of 3 letters each.

#26 Best answer 1 of Algorithm to return all combinations of k elements from n (Score: 428)

Created: 2008-09-24 Last updated: 2019-05-15

Art of Computer Programming Volume 4: Fascicle 3 has a ton of these that might fit your particular situation better than how I describe.

Gray Codes

An issue that you will come across is of course memory and pretty quickly, you’ll have problems by 20 elements in your set – 20C3 = 1140. And if you want to iterate over the set it’s best to use a modified gray code algorithm so you aren’t holding all of them in memory. These generate the next combination from the previous and avoid repetitions. There are many of these for different uses. Do we want to maximize the differences between successive combinations? minimize? et cetera.

Some of the original papers describing gray codes:

  1. Some Hamilton Paths and a Minimal Change Algorithm
  2. Adjacent Interchange Combination Generation Algorithm

Here are some other papers covering the topic:

  1. An Efficient Implementation of the Eades, Hickey, Read Adjacent Interchange Combination Generation Algorithm (PDF, with code in Pascal)
  2. Combination Generators
  3. Survey of Combinatorial Gray Codes (PostScript)
  4. An Algorithm for Gray Codes

Chase’s Twiddle (algorithm)

Phillip J Chase, `Algorithm 382: Combinations of M out of N Objects' (1970)

The algorithm in C

Index of Combinations in Lexicographical Order (Buckles Algorithm 515)

You can also reference a combination by its index (in lexicographical order). Realizing that the index should be some amount of change from right to left based on the index we can construct something that should recover a combination.

So, we have a set {1,2,3,4,5,6}… and we want three elements. Let’s say {1,2,3} we can say that the difference between the elements is one and in order and minimal. {1,2,4} has one change and is lexicographically number 2. So the number of ‘changes’ in the last place accounts for one change in the lexicographical ordering. The second place, with one change {1,3,4} has one change but accounts for more change since it’s in the second place (proportional to the number of elements in the original set).

The method I’ve described is a deconstruction, as it seems, from set to the index, we need to do the reverse – which is much trickier. This is how Buckles solves the problem. I wrote some C to compute them, with minor changes – I used the index of the sets rather than a number range to represent the set, so we are always working from 0…n. Note:

  1. Since combinations are unordered, {1,3,2} = {1,2,3} –we order them to be lexicographical.
  2. This method has an implicit 0 to start the set for the first difference.

Index of Combinations in Lexicographical Order (McCaffrey)

There is another way:, its concept is easier to grasp and program but it’s without the optimizations of Buckles. Fortunately, it also does not produce duplicate combinations:

The set x_k…x_1 in N that maximizes i = C(x_1,k) + C(x_2,k-1) + … + C(x_k,1), where C(n,r) = {n choose r}.

For an example: 27 = C(6,4) + C(5,3) + C(2,2) + C(1,1). So, the 27th lexicographical combination of four things is: {1,2,5,6}, those are the indexes of whatever set you want to look at. Example below (OCaml), requires choose function, left to reader:

(* this will find the [x] combination of a [set] list when taking [k] elements *)
let combination_maccaffery set k x =
    (* maximize function -- maximize a that is aCb              *)
    (* return largest c where c < i and choose(c,i) <= z        *)
    let rec maximize a b x =
        if (choose a b ) <= x then a else maximize (a-1) b x
    let rec iterate n x i = match i with
        | 0 -> []
        | i ->
            let max = maximize n i x in
            max :: iterate n (x - (choose max i)) (i-1)
    if x < 0 then failwith "errors" else
    let idxs =  iterate (List.length set) x k in (List.nth set) (List.sort (-) idxs)

A small and simple combinations iterator

The following two algorithms are provided for didactic purposes. They implement an iterator and (a more general) folder overall combinations. They are as fast as possible, having the complexity O(nCk). The memory consumption is bound by k.

We will start with the iterator, which will call a user provided function for each combination

let iter_combs n k f =
  let rec iter v s j =
    if j = k then f v
    else for i = s to n - 1 do iter (i::v) (i+1) (j+1) done in
  iter [] 0 0

A more general version will call the user provided function along with the state variable, starting from the initial state. Since we need to pass the state between different states we won’t use the for-loop, but instead, use recursion,

let fold_combs n k f x =
  let rec loop i s c x =
    if i < n then
      loop (i+1) s c @@
      let c = i::c and s = s + 1 and i = i + 1 in
      if s < k then loop i s c x else f c x
    else x in
  loop 0 0 [] x

#26 Best answer 2 of Algorithm to return all combinations of k elements from n(Score: 201)

Created: 2009-12-14 Last updated: 2016-04-20

In C#:

public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int k)
  return k == 0 ? new[] { new T[0] } :
    elements.SelectMany((e, i) =>
      elements.Skip(i + 1).Combinations(k - 1).Select(c => (new[] {e}).Concat(c)));


var result = Combinations(new[] { 1, 2, 3, 4, 5 }, 3);



See also original question in stackoverflow

#27: What is the most effective way for float and double comparison? (Score: 575)

Created: 2008-08-20 Last updated: 2016-12-21

Tags: c++, algorithm, optimization, floating-point

What would be the most efficient way to compare two double or two float values?

Simply doing this is not correct:

bool CompareDoubles1 (double A, double B)
   return A == B;

But something like:

bool CompareDoubles2 (double A, double B) 
   diff = A - B;
   return (diff < EPSILON) && (-diff < EPSILON);

Seems to waste processing.

Does anyone know a smarter float comparer?

#27 Best answer 1 of What is the most effective way for float and double comparison? (Score: 488)

Created: 2008-09-16 Last updated: 2016-07-19

Be extremely careful using any of the other suggestions. It all depends on context.

I have spent a long time tracing a bugs in a system that presumed a==b if |a-b|<epsilon. The underlying problems were:

  1. The implicit presumption in an algorithm that if a==b and b==c then a==c.

  2. Using the same epsilon for lines measured in inches and lines measured in mils (.001 inch). That is a==b but 1000a!=1000b. (This is why AlmostEqual2sComplement asks for the epsilon or max ULPS).

  3. The use of the same epsilon for both the cosine of angles and the length of lines!

  4. Using such a compare function to sort items in a collection. (In this case using the builtin C++ operator == for doubles produced correct results.)

Like I said: it all depends on context and the expected size of a and b.

BTW, std::numeric_limits<double>::epsilon() is the “machine epsilon”. It is the difference between 1.0 and the next value representable by a double. I guess that it could be used in the compare function but only if the expected values are less than 1. (This is in response to @cdv’s answer…)

Also, if you basically have int arithmetic in doubles (here we use doubles to hold int values in certain cases) your arithmetic will be correct. For example 4.0/2.0 will be the same as 1.0+1.0. This is as long as you do not do things that result in fractions (4.0/3.0) or do not go outside of the size of an int.

#27 Best answer 2 of What is the most effective way for float and double comparison?(Score: 189)

Created: 2008-08-20 Last updated: 2015-11-20

The comparison with an epsilon value is what most people do (even in game programming).

You should change your implementation a little though:

bool AreSame(double a, double b)
    return fabs(a - b) < EPSILON;

Edit: Christer has added a stack of great info on this topic on a recent blog post. Enjoy.

See also original question in stackoverflow

#28: A simple explanation of Naive Bayes Classification (Score: 563)

Created: 2012-04-08 Last updated: 2016-12-26

Tags: algorithm, machine-learning, dataset, classification, naivebayes

I am finding it hard to understand the process of Naive Bayes, and I was wondering if someone could explain it with a simple step by step process in English. I understand it takes comparisons by times occurred as a probability, but I have no idea how the training data is related to the actual dataset.

Please give me an explanation of what role the training set plays. I am giving a very simple example for fruits here, like banana for example

training set---


#28 Best answer 1 of A simple explanation of Naive Bayes Classification (Score: 1073)

Created: 2013-12-12 Last updated: 2021-03-16

The accepted answer has many elements of k-NN (k-nearest neighbors), a different algorithm.

Both k-NN and NaiveBayes are classification algorithms. Conceptually, k-NN uses the idea of “nearness” to classify new entities. In k-NN ‘nearness’ is modeled with ideas such as Euclidean Distance or Cosine Distance. By contrast, in NaiveBayes, the concept of ‘probability’ is used to classify new entities.

Since the question is about Naive Bayes, here’s how I’d describe the ideas and steps to someone. I’ll try to do it with as few equations and in plain English as much as possible.

First, Conditional Probability & Bayes' Rule

Before someone can understand and appreciate the nuances of Naive Bayes', they need to know a couple of related concepts first, namely, the idea of Conditional Probability, and Bayes' Rule. (If you are familiar with these concepts, skip to the section titled Getting to Naive Bayes')

Conditional Probability in plain English: What is the probability that something will happen, given that something else has already happened.

Let’s say that there is some Outcome O. And some Evidence E. From the way these probabilities are defined: The Probability of having both the Outcome O and Evidence E is: (Probability of O occurring) multiplied by the (Prob of E given that O happened)

One Example to understand Conditional Probability:

Let say we have a collection of US Senators. Senators could be Democrats or Republicans. They are also either male or female.

If we select one senator completely randomly, what is the probability that this person is a female Democrat? Conditional Probability can help us answer that.

Probability of (Democrat and Female Senator)= Prob(Senator is Democrat) multiplied by Conditional Probability of Being Female given that they are a Democrat.

  P(Democrat & Female) = P(Democrat) * P(Female | Democrat) 

We could compute the exact same thing, the reverse way:

  P(Democrat & Female) = P(Female) * P(Democrat | Female) 

Understanding Bayes Rule

Conceptually, this is a way to go from P(Evidence| Known Outcome) to P(Outcome|Known Evidence). Often, we know how frequently some particular evidence is observed, given a known outcome. We have to use this known fact to compute the reverse, to compute the chance of that outcome happening, given the evidence.

P(Outcome given that we know some Evidence) = P(Evidence given that we know the Outcome) times Prob(Outcome), scaled by the P(Evidence)

The classic example to understand Bayes' Rule:

Probability of Disease D given Test-positive = 

               P(Test is positive|Disease) * P(Disease)
     (scaled by) P(Testing Positive, with or without the disease)

Now, all this was just preamble, to get to Naive Bayes.

Getting to Naive Bayes'

So far, we have talked only about one piece of evidence. In reality, we have to predict an outcome given multiple evidence. In that case, the math gets very complicated. To get around that complication, one approach is to ‘uncouple’ multiple pieces of evidence, and to treat each of piece of evidence as independent. This approach is why this is called naive Bayes.

P(Outcome|Multiple Evidence) = 
P(Evidence1|Outcome) * P(Evidence2|outcome) * ... * P(EvidenceN|outcome) * P(Outcome)
scaled by P(Multiple Evidence)

Many people choose to remember this as:

                      P(Likelihood of Evidence) * Prior prob of outcome
P(outcome|evidence) = _________________________________________________

Notice a few things about this equation:

  • If the Prob(evidence|outcome) is 1, then we are just multiplying by 1.
  • If the Prob(some particular evidence|outcome) is 0, then the whole prob. becomes 0. If you see contradicting evidence, we can rule out that outcome.
  • Since we divide everything by P(Evidence), we can even get away without calculating it.
  • The intuition behind multiplying by the prior is so that we give high probability to more common outcomes, and low probabilities to unlikely outcomes. These are also called base rates and they are a way to scale our predicted probabilities.

How to Apply NaiveBayes to Predict an Outcome?

Just run the formula above for each possible outcome. Since we are trying to classify, each outcome is called a class and it has a class label. Our job is to look at the evidence, to consider how likely it is to be this class or that class, and assign a label to each entity. Again, we take a very simple approach: The class that has the highest probability is declared the “winner” and that class label gets assigned to that combination of evidences.

Fruit Example

Let’s try it out on an example to increase our understanding: The OP asked for a ‘fruit’ identification example.

Let’s say that we have data on 1000 pieces of fruit. They happen to be Banana, Orange or some Other Fruit. We know 3 characteristics about each fruit:

  1. Whether it is Long
  2. Whether it is Sweet and
  3. If its color is Yellow.

This is our ‘training set.’ We will use this to predict the type of any new fruit we encounter.

Type           Long | Not Long || Sweet | Not Sweet || Yellow |Not Yellow|Total
Banana      |  400  |    100   || 350   |    150    ||  450   |  50      |  500
Orange      |    0  |    300   || 150   |    150    ||  300   |   0      |  300
Other Fruit |  100  |    100   || 150   |     50    ||   50   | 150      |  200
Total       |  500  |    500   || 650   |    350    ||  800   | 200      | 1000

We can pre-compute a lot of things about our fruit collection.

The so-called “Prior” probabilities. (If we didn’t know any of the fruit attributes, this would be our guess.) These are our base rates.

 P(Banana)      = 0.5 (500/1000)
 P(Orange)      = 0.3
 P(Other Fruit) = 0.2

Probability of “Evidence”

p(Long)   = 0.5
P(Sweet)  = 0.65
P(Yellow) = 0.8

Probability of “Likelihood”

P(Long|Banana) = 0.8
P(Long|Orange) = 0  [Oranges are never long in all the fruit we have seen.]

P(Yellow|Other Fruit)     =  50/200 = 0.25
P(Not Yellow|Other Fruit) = 0.75

Given a Fruit, how to classify it?

Let’s say that we are given the properties of an unknown fruit, and asked to classify it. We are told that the fruit is Long, Sweet and Yellow. Is it a Banana? Is it an Orange? Or Is it some Other Fruit?

We can simply run the numbers for each of the 3 outcomes, one by one. Then we choose the highest probability and ‘classify’ our unknown fruit as belonging to the class that had the highest probability based on our prior evidence (our 1000 fruit training set):

P(Banana|Long, Sweet and Yellow) 
      P(Long|Banana) * P(Sweet|Banana) * P(Yellow|Banana) * P(banana)
    = _______________________________________________________________
                      P(Long) * P(Sweet) * P(Yellow)
    = 0.8 * 0.7 * 0.9 * 0.5 / P(evidence)

    = 0.252 / P(evidence)

P(Orange|Long, Sweet and Yellow) = 0

P(Other Fruit|Long, Sweet and Yellow)
      P(Long|Other fruit) * P(Sweet|Other fruit) * P(Yellow|Other fruit) * P(Other Fruit)
    = ____________________________________________________________________________________

    = (100/200 * 150/200 * 50/200 * 200/1000) / P(evidence)

    = 0.01875 / P(evidence)

By an overwhelming margin (0.252 >> 0.01875), we classify this Sweet/Long/Yellow fruit as likely to be a Banana.

Look at what it eventually comes down to. Just some counting and multiplication. We can pre-compute all these terms, and so classifying becomes easy, quick and efficient.

Let z = 1 / P(evidence). Now we quickly compute the following three quantities.

P(Banana|evidence) = z * Prob(Banana) * Prob(Evidence1|Banana) * Prob(Evidence2|Banana) ...
P(Orange|Evidence) = z * Prob(Orange) * Prob(Evidence1|Orange) * Prob(Evidence2|Orange) ...
P(Other|Evidence)  = z * Prob(Other)  * Prob(Evidence1|Other)  * Prob(Evidence2|Other)  ...

Assign the class label of whichever is the highest number, and you are done.

Despite the name, Naive Bayes turns out to be excellent in certain applications. Text classification is one area where it really shines.

Hope that helps in understanding the concepts behind the Naive Bayes algorithm.

#28 Best answer 2 of A simple explanation of Naive Bayes Classification(Score: 680)

Created: 2012-04-08 Last updated: 2019-04-05

Your question as I understand it is divided in two parts, part one being you need a better understanding of the Naive Bayes classifier & part two being the confusion surrounding Training set.

In general all of Machine Learning Algorithms need to be trained for supervised learning tasks like classification, prediction etc. or for unsupervised learning tasks like clustering.

During the training step, the algorithms are taught with a particular input dataset (training set) so that later on we may test them for unknown inputs (which they have never seen before) for which they may classify or predict etc (in case of supervised learning) based on their learning. This is what most of the Machine Learning techniques like Neural Networks, SVM, Bayesian etc. are based upon.

So in a general Machine Learning project basically you have to divide your input set to a Development Set (Training Set + Dev-Test Set) & a Test Set (or Evaluation set). Remember your basic objective would be that your system learns and classifies new inputs which they have never seen before in either Dev set or test set.

The test set typically has the same format as the training set. However, it is very important that the test set be distinct from the training corpus: if we simply reused the training set as the test set, then a model that simply memorized its input, without learning how to generalize to new examples, would receive misleadingly high scores.

In general, for an example, 70% of our data can be used as training set cases. Also remember to partition the original set into the training and test sets randomly.

Now I come to your other question about Naive Bayes.

To demonstrate the concept of Naïve Bayes Classification, consider the example given below:

enter image description here

As indicated, the objects can be classified as either GREEN or RED. Our task is to classify new cases as they arrive, i.e., decide to which class label they belong, based on the currently existing objects.

Since there are twice as many GREEN objects as RED, it is reasonable to believe that a new case (which hasn’t been observed yet) is twice as likely to have membership GREEN rather than RED. In the Bayesian analysis, this belief is known as the prior probability. Prior probabilities are based on previous experience, in this case the percentage of GREEN and RED objects, and often used to predict outcomes before they actually happen.

Thus, we can write:

Prior Probability of GREEN: number of GREEN objects / total number of objects

Prior Probability of RED: number of RED objects / total number of objects

Since there is a total of 60 objects, 40 of which are GREEN and 20 RED, our prior probabilities for class membership are:

Prior Probability for GREEN: 40 / 60

Prior Probability for RED: 20 / 60

Having formulated our prior probability, we are now ready to classify a new object (WHITE circle in the diagram below). Since the objects are well clustered, it is reasonable to assume that the more GREEN (or RED) objects in the vicinity of X, the more likely that the new cases belong to that particular color. To measure this likelihood, we draw a circle around X which encompasses a number (to be chosen a priori) of points irrespective of their class labels. Then we calculate the number of points in the circle belonging to each class label. From this we calculate the likelihood:

enter image description here

enter image description here

From the illustration above, it is clear that Likelihood of X given GREEN is smaller than Likelihood of X given RED, since the circle encompasses 1 GREEN object and 3 RED ones. Thus:

enter image description here

enter image description here

Although the prior probabilities indicate that X may belong to GREEN (given that there are twice as many GREEN compared to RED) the likelihood indicates otherwise; that the class membership of X is RED (given that there are more RED objects in the vicinity of X than GREEN). In the Bayesian analysis, the final classification is produced by combining both sources of information, i.e., the prior and the likelihood, to form a posterior probability using the so-called Bayes' rule (named after Rev. Thomas Bayes 1702-1761).

enter image description here

Finally, we classify X as RED since its class membership achieves the largest posterior probability.

See also original question in stackoverflow

#29: What algorithms compute directions from point A to point B on a map? (Score: 554)

Created: 2009-01-09 Last updated: 2020-12-03

Tags: algorithm, path-finding

How do map providers (such as Google or Yahoo! Maps) suggest directions?

I mean, they probably have real-world data in some form, certainly including distances but also perhaps things like driving speeds, presence of sidewalks, train schedules, etc. But suppose the data were in a simpler format, say a very large directed graph with edge weights reflecting distances. I want to be able to quickly compute directions from one arbitrary point to another. Sometimes these points will be close together (within one city) while sometimes they will be far apart (cross-country).

Graph algorithms like Dijkstra’s algorithm will not work because the graph is enormous. Luckily, heuristic algorithms like A* will probably work. However, our data is very structured, and perhaps some kind of tiered approach might work? (For example, store precomputed directions between certain “key” points far apart, as well as some local directions. Then directions for two far-away points will involve local directions to a key points, global directions to another key point, and then local directions again.)

What algorithms are actually used in practice?

PS. This question was motivated by finding quirks in online mapping directions. Contrary to the triangle inequality, sometimes Google Maps thinks that X-Z takes longer and is farther than using an intermediate point as in X-Y-Z. But maybe their walking directions optimize for another parameter, too?

PPS. Here’s another violation of the triangle inequality that suggests (to me) that they use some kind of tiered approach: X-Z versus X-Y-Z. The former seems to use prominent Boulevard de Sebastopol even though it’s slightly out of the way.

Edit: Neither of these examples seem to work anymore, but both did at the time of the original post.

#29 Best answer 1 of What algorithms compute directions from point A to point B on a map? (Score: 532)

Created: 2009-01-11 Last updated: 2010-05-09

Speaking as someone who spent 18 months working at a mapping company, which included working on the routing algorithm… yes, Dijkstra’s does work, with a couple of modifications:

  • Instead of doing Dijkstra’s once from source to dest, you start at each end, and expand both sides until they meet in the middle. This eliminates roughly half the work (2pi(r/2)^2 vs pi*r^2).
  • To avoid exploring the back-alleys of every city between your source and destination, you can have several layers of map data: A ‘highways’ layer that contains only highways, a ‘secondary’ layer that contains only secondary streets, and so forth. Then, you explore only smaller sections of the more detailed layers, expanding as necessary. Obviously this description leaves out a lot of detail, but you get the idea.

With modifications along those lines, you can do even cross-country routing in a very reasonable timeframe.

#29 Best answer 2 of What algorithms compute directions from point A to point B on a map?(Score: 114)

Created: 2009-02-21 Last updated: 2009-11-02

This question has been an active area of research in the last years. The main idea is to do a preprocessing on the graph once, to speed up all following queries. With this additional information itineraries can be computed very fast. Still, Dijkstra’s Algorithm is the basis for all optimisations.

Arachnid described the usage of bidirectional search and edge pruning based on hierarchical information. These speedup techniques work quite well, but the most recent algorithms outperform these techniques by all means. With current algorithms a shortest paths can be computed in considerable less time than one millisecond on a continental road network. A fast implementation of the unmodified algorithm of Dijkstra needs about 10 seconds.

The article Engineering Fast Route Planning Algorithms gives an overview of the progress of research in that field. See the references of that paper for further information.

The fastest known algorithms do not use information about the hierarchical status of the road in the data, i.e. if it is a highway or a local road. Instead, they compute in a preprocessing step an own hierarchy that optimised to speed up route planning. This precomputation can then be used to prune the search: Far away from start and destination slow roads need not be considered during Dijkstra’s Algorithm. The benefits are very good performance and a correctness guarantee for the result.

The first optimised route planning algorithms dealt only with static road networks, that means an edge in the graph has a fixed cost value. This not true in practice, since we want to take dynamic information like traffic jams or vehicle dependent restrictrions into account. Latest algorithms can also deal with such issues, but there are still problems to solve and the research is going on.

If you need the shortest path distances to compute a solution for the TSP, then you are probably interested in matrices that contain all distances between your sources and destinations. For this you could consider Computing Many-to-Many Shortest Paths Using Highway Hierarchies. Note, that this has been improved by newer approaches in the last 2 years.

See also original question in stackoverflow

#30: How do you detect Credit card type based on number? (Score: 549)

Created: 2008-09-16 Last updated: 2011-12-30

Tags: algorithm, language-agnostic, e-commerce

I’m trying to figure out how to detect the type of credit card based purely on its number. Does anyone know of a definitive, reliable way to find this?

#30 Best answer 1 of How do you detect Credit card type based on number? (Score: 812)

Created: 2008-09-16 Last updated: 2019-05-15

The credit/debit card number is referred to as a PAN, or Primary Account Number. The first six digits of the PAN are taken from the IIN, or Issuer Identification Number, belonging to the issuing bank (IINs were previously known as BIN — Bank Identification Numbers — so you may see references to that terminology in some documents). These six digits are subject to an international standard, ISO/IEC 7812, and can be used to determine the type of card from the number.

Unfortunately the actual ISO/IEC 7812 database is not publicly available, however, there are unofficial lists, both commercial and free, including on Wikipedia.

Anyway, to detect the type from the number, you can use a regular expression like the ones below: Credit for original expressions

Visa: ^4[0-9]{6,}$ Visa card numbers start with a 4.

MasterCard: ^5[1-5][0-9]{5,}|222[1-9][0-9]{3,}|22[3-9][0-9]{4,}|2[3-6][0-9]{5,}|27[01][0-9]{4,}|2720[0-9]{3,}$ Before 2016, MasterCard numbers start with the numbers 51 through 55, but this will only detect MasterCard credit cards; there are other cards issued using the MasterCard system that do not fall into this IIN range. In 2016, they will add numbers in the range (222100-272099).

American Express: ^3[47][0-9]{5,}$ American Express card numbers start with 34 or 37.

Diners Club: ^3(?:0[0-5]|[68][0-9])[0-9]{4,}$ Diners Club card numbers begin with 300 through 305, 36 or 38. There are Diners Club cards that begin with 5 and have 16 digits. These are a joint venture between Diners Club and MasterCard and should be processed like a MasterCard.

Discover: ^6(?:011|5[0-9]{2})[0-9]{3,}$ Discover card numbers begin with 6011 or 65.

JCB: ^(?:2131|1800|35[0-9]{3})[0-9]{3,}$ JCB cards begin with 2131, 1800 or 35.

Unfortunately, there are a number of card types processed with the MasterCard system that do not live in MasterCard’s IIN range (numbers starting 51…55); the most important case is that of Maestro cards, many of which have been issued from other banks’ IIN ranges and so are located all over the number space. As a result, it may be best to assume that any card that is not of some other type you accept must be a MasterCard.

Important: card numbers do vary in length; for instance, Visa has in the past issued cards with 13 digit PANs and cards with 16 digit PANs. Visa’s documentation currently indicates that it may issue or may have issued numbers with between 12 and 19 digits. Therefore, you should not check the length of the card number, other than to verify that it has at least 7 digits (for a complete IIN plus one check digit, which should match the value predicted by the Luhn algorithm).

One further hint: before processing a cardholder PAN, strip any whitespace and punctuation characters from the input. Why? Because it’s typically much easier to enter the digits in groups, similar to how they’re displayed on the front of an actual credit card, i.e.

4444 4444 4444 4444

is much easier to enter correctly than


There’s really no benefit in chastising the user because they’ve entered characters you don’t expect here.

This also implies making sure that your entry fields have room for at least 24 characters, otherwise users who enter spaces will run out of room. I’d recommend that you make the field wide enough to display 32 characters and allow up to 64; that gives plenty of headroom for expansion.

Here’s an image that gives a little more insight:

UPDATE (2014): The checksum method no longer appears to be a valid way of verifying a card’s authenticity as noted in the comments on this answer.

UPDATE (2016): Mastercard is to implement new BIN ranges starting Ach Payment.

Credit Card Verification

#30 Best answer 2 of How do you detect Credit card type based on number?(Score: 81)

Created: 2013-10-02 Last updated: 2016-07-20

In javascript:

function detectCardType(number) {
    var re = {
        electron: /^(4026|417500|4405|4508|4844|4913|4917)\d+$/,
        maestro: /^(5018|5020|5038|5612|5893|6304|6759|6761|6762|6763|0604|6390)\d+$/,
        dankort: /^(5019)\d+$/,
        interpayment: /^(636)\d+$/,
        unionpay: /^(62|88)\d+$/,
        visa: /^4[0-9]{12}(?:[0-9]{3})?$/,
        mastercard: /^5[1-5][0-9]{14}$/,
        amex: /^3[47][0-9]{13}$/,
        diners: /^3(?:0[0-5]|[68][0-9])[0-9]{11}$/,
        discover: /^6(?:011|5[0-9]{2})[0-9]{12}$/,
        jcb: /^(?:2131|1800|35\d{3})\d{11}$/

    for(var key in re) {
        if(re[key].test(number)) {
            return key

Unit test:

describe('CreditCard', function() {
    describe('#detectCardType', function() {

        var cards = {
            '8800000000000000': 'UNIONPAY',

            '4026000000000000': 'ELECTRON',
            '4175000000000000': 'ELECTRON',
            '4405000000000000': 'ELECTRON',
            '4508000000000000': 'ELECTRON',
            '4844000000000000': 'ELECTRON',
            '4913000000000000': 'ELECTRON',
            '4917000000000000': 'ELECTRON',
            '5019000000000000': 'DANKORT',

            '5018000000000000': 'MAESTRO',
            '5020000000000000': 'MAESTRO',
            '5038000000000000': 'MAESTRO',
            '5612000000000000': 'MAESTRO',
            '5893000000000000': 'MAESTRO',
            '6304000000000000': 'MAESTRO',
            '6759000000000000': 'MAESTRO',
            '6761000000000000': 'MAESTRO',
            '6762000000000000': 'MAESTRO',
            '6763000000000000': 'MAESTRO',
            '0604000000000000': 'MAESTRO',
            '6390000000000000': 'MAESTRO',

            '3528000000000000': 'JCB',
            '3589000000000000': 'JCB',
            '3529000000000000': 'JCB',

            '6360000000000000': 'INTERPAYMENT',

            '4916338506082832': 'VISA',
            '4556015886206505': 'VISA',
            '4539048040151731': 'VISA',
            '4024007198964305': 'VISA',
            '4716175187624512': 'VISA',

            '5280934283171080': 'MASTERCARD',
            '5456060454627409': 'MASTERCARD',
            '5331113404316994': 'MASTERCARD',
            '5259474113320034': 'MASTERCARD',
            '5442179619690834': 'MASTERCARD',

            '6011894492395579': 'DISCOVER',
            '6011388644154687': 'DISCOVER',
            '6011880085013612': 'DISCOVER',
            '6011652795433988': 'DISCOVER',
            '6011375973328347': 'DISCOVER',

            '345936346788903': 'AMEX',
            '377669501013152': 'AMEX',
            '373083634595479': 'AMEX',
            '370710819865268': 'AMEX',
            '371095063560404': 'AMEX'

        Object.keys(cards).forEach(function(number) {
            it('should detect card ' + number + ' as ' + cards[number], function() {

See also original question in stackoverflow

  1. This page use API to get the relevant data from stackoverflow community.
  2. Content license on this page is CC BY-SA 3.0.
  3. score = up votes - down votes.