Most votes on algorithm questions 6

Most votes on algorithm questions 6. #51 When is it practical to use Depth-First Search (DFS) vs Breadth-First Search (BFS)? #52 How to find list of possible words from a letter matrix [Boggle Solver] #53 Difference between Big-O and Little-O Notation #54 List of Big-O for PHP functions #55 Determine if two rectangles overlap each other? #56 What is stability in sorting algorithms and why is it important? #57 Equation for testing if a point is inside a circle #58 How to implement classic sorting algorithms in modern C++? #59 What's the best way to build a string of delimited items in Java? #60 What is the fastest way to get the value of π?

Read all the top votes questions and answers in a single page.

#51: When is it practical to use Depth-First Search (DFS) vs Breadth-First Search (BFS)? (Score: 382)

Created: 2010-07-26 Last updated: 2018-03-11

Tags: algorithm, graph-algorithm, graph-theory, depth-first-search, breadth-first-search

I understand the differences between DFS and BFS, but I’m interested to know when it’s more practical to use one over the other?

Could anyone give any examples of how DFS would trump BFS and vice versa?

#51 Best answer 1 of When is it practical to use Depth-First Search (DFS) vs Breadth-First Search (BFS)? (Score: 400)

Created: 2010-07-26 Last updated: 2020-09-14

That heavily depends on the structure of the search tree and the number and location of solutions (aka searched-for items).

  • If you know a solution is not far from the root of the tree, a breadth first search (BFS) might be better.

  • If the tree is very deep and solutions are rare, depth first search (DFS) might take an extremely long time, but BFS could be faster.

  • If the tree is very wide, a BFS might need too much memory, so it might be completely impractical.

  • If solutions are frequent but located deep in the tree, BFS could be impractical.

  • If the search tree is very deep you will need to restrict the search depth for depth first search (DFS), anyway (for example with iterative deepening).

But these are just rules of thumb; you’ll probably need to experiment.

Another issue is parallelism: if you want to parallelize BFS you would need a shared datastructure between threads, which is a bad thing. DFS might be easier to distribute even between connected machines if you don’t insist on the exact order of visiting the nodes.

#51 Best answer 2 of When is it practical to use Depth-First Search (DFS) vs Breadth-First Search (BFS)?(Score: 179)

Created: 2016-05-06 Last updated: 2017-07-03

Depth-first searches are often used in simulations of games (and game-like situations in the real world). In a typical game you can choose one of several possible actions. Each choice leads to further choices, each of which leads to further choices, and so on into an ever-expanding tree-shaped graph of possibilities.

enter image description here

For example in games like Chess, tic-tac-toe when you are deciding what move to make, you can mentally imagine a move, then your opponent’s possible responses, then your responses, and so on. You can decide what to do by seeing which move leads to the best outcome.

Only some paths in a game tree lead to your win. Some lead to a win by your opponent, when you reach such an ending, you must back up, or backtrack, to a previous node and try a different path. In this way you explore the tree until you find a path with a successful conclusion. Then you make the first move along this path.

The breadth-first search has an interesting property: It first finds all the vertices that are one edge away from the starting point, then all the vertices that are two edges away, and so on. This is useful if you’re trying to find the shortest path from the starting vertex to a given vertex. You start a BFS, and when you find the specified vertex, you know the path you’ve traced so far is the shortest path to the node. If there were a shorter path, the BFS would have found it already.

Breadth-first search can be used for finding the neighbour nodes in peer to peer networks like BitTorrent, GPS systems to find nearby locations, social networking sites to find people in the specified distance and things like that.

See also original question in stackoverflow

#52: How to find list of possible words from a letter matrix [Boggle Solver] (Score: 379)

Created: 2009-04-14 Last updated: 2019-06-07

Tags: algorithm, puzzle, boggle

Lately I have been playing a game on my iPhone called Scramble. Some of you may know this game as Boggle. Essentially, when the game starts you get a matrix of letters like so:


The goal of the game is to find as many words as you can that can be formed by chaining letters together. You can start with any letter, and all the letters that surround it are fair game, and then once you move on to the next letter, all the letters that surround that letter are fair game, except for any previously used letters. So in the grid above, for example, I could come up with the words LOB, TUX, SEA, FAME, etc. Words must be at least 3 characters, and no more than NxN characters, which would be 16 in this game but can vary in some implementations. While this game is fun and addictive, I am apparently not very good at it and I wanted to cheat a little bit by making a program that would give me the best possible words (the longer the word the more points you get).

Sample Boggle

I am, unfortunately, not very good with algorithms or their efficiencies and so forth. My first attempt uses a dictionary such as this one (~2.3MB) and does a linear search trying to match combinations with dictionary entries. This takes a very long time to find the possible words, and since you only get 2 minutes per round, it is simply not adequate.

I am interested to see if any Stackoverflowers can come up with more efficient solutions. I am mostly looking for solutions using the Big 3 Ps: Python, PHP, and Perl, although anything with Java or C++ is cool too, since speed is essential.


#52 Best answer 1 of How to find list of possible words from a letter matrix [Boggle Solver] (Score: 145)

Created: 2009-04-15 Last updated: 2012-11-11

My answer works like the others here, but I’ll post it because it looks a bit faster than the other Python solutions, from setting up the dictionary faster. (I checked this against John Fouhy’s solution.) After setup, the time to solve is down in the noise.

grid = "fxie amlo ewbx astu".split()
nrows, ncols = len(grid), len(grid[0])

# A dictionary word that could be a solution must use only the grid's
# letters and have length >= 3. (With a case-insensitive match.)
import re
alphabet = ''.join(set(''.join(grid)))
bogglable = re.compile('[' + alphabet + ']{3,}$', re.I).match

words = set(word.rstrip('\n') for word in open('words') if bogglable(word))
prefixes = set(word[:i] for word in words
               for i in range(2, len(word)+1))

def solve():
    for y, row in enumerate(grid):
        for x, letter in enumerate(row):
            for result in extending(letter, ((x, y),)):
                yield result

def extending(prefix, path):
    if prefix in words:
        yield (prefix, path)
    for (nx, ny) in neighbors(path[-1]):
        if (nx, ny) not in path:
            prefix1 = prefix + grid[ny][nx]
            if prefix1 in prefixes:
                for result in extending(prefix1, path + ((nx, ny),)):
                    yield result

def neighbors((x, y)):
    for nx in range(max(0, x-1), min(x+2, ncols)):
        for ny in range(max(0, y-1), min(y+2, nrows)):
            yield (nx, ny)

Sample usage:

# Print a maximal-length word and its path:
print max(solve(), key=lambda (word, path): len(word))

Edit: Filter out words less than 3 letters long.

Edit 2: I was curious why Kent Fredric’s Perl solution was faster; it turns out to use regular-expression matching instead of a set of characters. Doing the same in Python about doubles the speed.

#52 Best answer 2 of How to find list of possible words from a letter matrix [Boggle Solver](Score: 116)

Created: 2009-04-14 Last updated: 2009-04-14

The fastest solution you’re going to get will probably involve storing your dictionary in a trie. Then, create a queue of triplets (x, y, s), where each element in the queue corresponds to a prefix s of a word which can be spelled in the grid, ending at location (x, y). Initialize the queue with N x N elements (where N is the size of your grid), one element for each square in the grid. Then, the algorithm proceeds as follows:

While the queue is not empty:
  Dequeue a triple (x, y, s)
  For each square (x', y') with letter c adjacent to (x, y):
    If s+c is a word, output s+c
    If s+c is a prefix of a word, insert (x', y', s+c) into the queue

If you store your dictionary in a trie, testing if s+c is a word or a prefix of a word can be done in constant time (provided you also keep some extra metadata in each queue datum, such as a pointer to the current node in the trie), so the running time of this algorithm is O(number of words that can be spelled).

[Edit] Here’s an implementation in Python that I just coded up:


class TrieNode:
    def __init__(self, parent, value):
        self.parent = parent
        self.children = [None] * 26
        self.isWord = False
        if parent is not None:
            parent.children[ord(value) - 97] = self

def MakeTrie(dictfile):
    dict = open(dictfile)
    root = TrieNode(None, '')
    for word in dict:
        curNode = root
        for letter in word.lower():
            if 97 <= ord(letter) < 123:
                nextNode = curNode.children[ord(letter) - 97]
                if nextNode is None:
                    nextNode = TrieNode(curNode, letter)
                curNode = nextNode
        curNode.isWord = True
    return root

def BoggleWords(grid, dict):
    rows = len(grid)
    cols = len(grid[0])
    queue = []
    words = []
    for y in range(cols):
        for x in range(rows):
            c = grid[y][x]
            node = dict.children[ord(c) - 97]
            if node is not None:
                queue.append((x, y, c, node))
    while queue:
        x, y, s, node = queue[0]
        del queue[0]
        for dx, dy in ((1, 0), (1, -1), (0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1)):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < cols and 0 <= y2 < rows:
                s2 = s + grid[y2][x2]
                node2 = node.children[ord(grid[y2][x2]) - 97]
                if node2 is not None:
                    if node2.isWord:
                    queue.append((x2, y2, s2, node2))

    return words

Example usage:

d = MakeTrie('/usr/share/dict/words')
print(BoggleWords(['fxie','amlo','ewbx','astu'], d))


[‘fa’, ‘xi’, ‘ie’, ‘io’, ‘el’, ‘am’, ‘ax’, ‘ae’, ‘aw’, ‘mi’, ‘ma’, ‘me’, ‘lo’, ‘li’, ‘oe’, ‘ox’, ‘em’, ‘ea’, ‘ea’, ‘es’, ‘wa’, ‘we’, ‘wa’, ‘bo’, ‘bu’, ‘as’, ‘aw’, ‘ae’, ‘st’, ‘se’, ‘sa’, ‘tu’, ‘ut’, ‘fam’, ‘fae’, ‘imi’, ‘eli’, ‘elm’, ‘elb’, ‘ami’, ‘ama’, ‘ame’, ‘aes’, ‘awl’, ‘awa’, ‘awe’, ‘awa’, ‘mix’, ‘mim’, ‘mil’, ‘mam’, ‘max’, ‘mae’, ‘maw’, ‘mew’, ‘mem’, ‘mes’, ‘lob’, ‘lox’, ‘lei’, ‘leo’, ‘lie’, ‘lim’, ‘oil’, ‘olm’, ‘ewe’, ‘eme’, ‘wax’, ‘waf’, ‘wae’, ‘waw’, ‘wem’, ‘wea’, ‘wea’, ‘was’, ‘waw’, ‘wae’, ‘bob’, ‘blo’, ‘bub’, ‘but’, ‘ast’, ‘ase’, ‘asa’, ‘awl’, ‘awa’, ‘awe’, ‘awa’, ‘aes’, ‘swa’, ‘swa’, ‘sew’, ‘sea’, ‘sea’, ‘saw’, ‘tux’, ‘tub’, ‘tut’, ‘twa’, ‘twa’, ‘tst’, ‘utu’, ‘fama’, ‘fame’, ‘ixil’, ‘imam’, ‘amli’, ‘amil’, ‘ambo’, ‘axil’, ‘axle’, ‘mimi’, ‘mima’, ‘mime’, ‘milo’, ‘mile’, ‘mewl’, ‘mese’, ‘mesa’, ‘lolo’, ‘lobo’, ‘lima’, ‘lime’, ‘limb’, ‘lile’, ‘oime’, ‘oleo’, ‘olio’, ‘oboe’, ‘obol’, ‘emim’, ‘emil’, ‘east’, ‘ease’, ‘wame’, ‘wawa’, ‘wawa’, ‘weam’, ‘west’, ‘wese’, ‘wast’, ‘wase’, ‘wawa’, ‘wawa’, ‘boil’, ‘bolo’, ‘bole’, ‘bobo’, ‘blob’, ‘bleo’, ‘bubo’, ‘asem’, ‘stub’, ‘stut’, ‘swam’, ‘semi’, ‘seme’, ‘seam’, ‘seax’, ‘sasa’, ‘sawt’, ‘tutu’, ‘tuts’, ‘twae’, ‘twas’, ‘twae’, ‘ilima’, ‘amble’, ‘axile’, ‘awest’, ‘mamie’, ‘mambo’, ‘maxim’, ‘mease’, ‘mesem’, ‘limax’, ‘limes’, ‘limbo’, ‘limbu’, ‘obole’, ‘emesa’, ‘embox’, ‘awest’, ‘swami’, ‘famble’, ‘mimble’, ‘maxima’, ‘embolo’, ‘embole’, ‘wamble’, ‘semese’, ‘semble’, ‘sawbwa’, ‘sawbwa’]

Notes: This program doesn’t output 1-letter words, or filter by word length at all. That’s easy to add but not really relevant to the problem. It also outputs some words multiple times if they can be spelled in multiple ways. If a given word can be spelled in many different ways (worst case: every letter in the grid is the same (e.g. ‘A’) and a word like ‘aaaaaaaaaa’ is in your dictionary), then the running time will get horribly exponential. Filtering out duplicates and sorting is trivial to due after the algorithm has finished.

See also original question in stackoverflow

#53: Difference between Big-O and Little-O Notation (Score: 367)

Created: 2009-09-01 Last updated: 2017-01-30

Tags: algorithm, time-complexity, big-o, asymptotic-complexity, little-o

What is the difference between Big-O notation O(n) and Little-O notation o(n)?

#53 Best answer 1 of Difference between Big-O and Little-O Notation (Score: 483)

Created: 2009-09-01 Last updated: 2017-12-16

f ∈ O(g) says, essentially

For at least one choice of a constant k > 0, you can find a constant a such that the inequality 0 <= f(x) <= k g(x) holds for all x > a.

Note that O(g) is the set of all functions for which this condition holds.

f ∈ o(g) says, essentially

For every choice of a constant k > 0, you can find a constant a such that the inequality 0 <= f(x) < k g(x) holds for all x > a.

Once again, note that o(g) is a set.

In Big-O, it is only necessary that you find a particular multiplier k for which the inequality holds beyond some minimum x.

In Little-o, it must be that there is a minimum x after which the inequality holds no matter how small you make k, as long as it is not negative or zero.

These both describe upper bounds, although somewhat counter-intuitively, Little-o is the stronger statement. There is a much larger gap between the growth rates of f and g if f ∈ o(g) than if f ∈ O(g).

One illustration of the disparity is this: f ∈ O(f) is true, but f ∈ o(f) is false. Therefore, Big-O can be read as “f ∈ O(g) means that f’s asymptotic growth is no faster than g’s”, whereas “f ∈ o(g) means that f’s asymptotic growth is strictly slower than g’s”. It’s like <= versus <.

More specifically, if the value of g(x) is a constant multiple of the value of f(x), then f ∈ O(g) is true. This is why you can drop constants when working with big-O notation.

However, for f ∈ o(g) to be true, then g must include a higher power of x in its formula, and so the relative separation between f(x) and g(x) must actually get larger as x gets larger.

To use purely math examples (rather than referring to algorithms):

The following are true for Big-O, but would not be true if you used little-o:

  • x² ∈ O(x²)
  • x² ∈ O(x² + x)
  • x² ∈ O(200 * x²)

The following are true for little-o:

  • x² ∈ o(x³)
  • x² ∈ o(x!)
  • ln(x) ∈ o(x)

Note that if f ∈ o(g), this implies f ∈ O(g). e.g. x² ∈ o(x³) so it is also true that x² ∈ O(x³), (again, think of O as <= and o as <)

#53 Best answer 2 of Difference between Big-O and Little-O Notation(Score: 213)

Created: 2009-09-01 Last updated: 2017-12-16

Big-O is to little-o as is to <. Big-O is an inclusive upper bound, while little-o is a strict upper bound.

For example, the function f(n) = 3n is:

  • in O(n²), o(n²), and O(n)
  • not in O(lg n), o(lg n), or o(n)

Analogously, the number 1 is:

  • ≤ 2, < 2, and ≤ 1
  • not ≤ 0, < 0, or < 1

Here’s a table, showing the general idea:

Big o table

(Note: the table is a good guide but its limit definition should be in terms of the superior limit instead of the normal limit. For example, 3 + (n mod 2) oscillates between 3 and 4 forever. It’s in O(1) despite not having a normal limit, because it still has a lim sup: 4.)

I recommend memorizing how the Big-O notation converts to asymptotic comparisons. The comparisons are easier to remember, but less flexible because you can’t say things like nO(1) = P.

See also original question in stackoverflow

#54: List of Big-O for PHP functions (Score: 356)

Created: 2010-03-18 Last updated: 2019-04-16

Tags: php, performance, algorithm, arrays, big-o

After using PHP for a while now, I’ve noticed that not all built-in PHP functions are as fast as expected. Consider these two possible implementations of a function that finds if a number is prime using a cached array of primes.

//very slow for large $prime_array
$prime_array = array( 2, 3, 5, 7, 11, 13, .... 104729, ... );
$result_array = array();
foreach( $prime_array => $number ) {
    $result_array[$number] = in_array( $number, $large_prime_array );

//speed is much less dependent on size of $prime_array, and runs much faster.
$prime_array => array( 2 => NULL, 3 => NULL, 5 => NULL, 7 => NULL,
                       11 => NULL, 13 => NULL, .... 104729 => NULL, ... );
foreach( $prime_array => $number ) {
    $result_array[$number] = array_key_exists( $number, $large_prime_array );

This is because in_array is implemented with a linear search O(n) which will linearly slow down as $prime_array grows. Where the array_key_exists function is implemented with a hash lookup O(1) which will not slow down unless the hash table gets extremely populated (in which case it’s only O(n)).

So far I’ve had to discover the big-O’s via trial and error, and occasionally looking at the source code. Now for the question…

Is there a list of the theoretical (or practical) big O times for all the built-in PHP functions?*

*or at least the interesting ones

For example, I find it very hard to predict the big O of functions listed because the possible implementation depends on unknown core data structures of PHP: array_merge, array_merge_recursive, array_reverse, array_intersect, array_combine, str_replace (with array inputs), etc.

#54 Best answer 1 of List of Big-O for PHP functions (Score: 674)

Created: 2010-03-20 Last updated: 2018-06-01

Since it doesn’t seem like anyone has done this before I thought it’d be good idea to have it for reference somewhere. I’ve gone though and either via benchmark or code-skimming to characterize the array_* functions. I’ve tried to put the more interesting Big-O near the top. This list is not complete.

Note: All the Big-O where calculated assuming a hash lookup is O(1) even though it’s really O(n). The coefficient of the n is so low, the ram overhead of storing a large enough array would hurt you before the characteristics of lookup Big-O would start taking effect. For example the difference between a call to array_key_exists at N=1 and N=1,000,000 is ~50% time increase.

Interesting Points:

  1. isset/array_key_exists is much faster than in_array and array_search
  2. +(union) is a bit faster than array_merge (and looks nicer). But it does work differently so keep that in mind.
  3. shuffle is on the same Big-O tier as array_rand
  4. array_pop/array_push is faster than array_shift/array_unshift due to re-index penalty


array_key_exists O(n) but really close to O(1) - this is because of linear polling in collisions, but because the chance of collisions is very small, the coefficient is also very small. I find you treat hash lookups as O(1) to give a more realistic big-O. For example the different between N=1000 and N=100000 is only about 50% slow down.

isset( $array[$index] ) O(n) but really close to O(1) - it uses the same lookup as array_key_exists. Since it’s language construct, will cache the lookup if the key is hardcoded, resulting in speed up in cases where the same key is used repeatedly.

in_array O(n) - this is because it does a linear search though the array until it finds the value.

array_search O(n) - it uses the same core function as in_array but returns value.

Queue functions:

array_push O(∑ var_i, for all i)

array_pop O(1)

array_shift O(n) - it has to reindex all the keys

array_unshift O(n + ∑ var_i, for all i) - it has to reindex all the keys

Array Intersection, Union, Subtraction:

array_intersect_key if intersection 100% do O(Max(param_i_size)*∑param_i_count, for all i), if intersection 0% intersect O(∑param_i_size, for all i)

array_intersect if intersection 100% do O(n^2*∑param_i_count, for all i), if intersection 0% intersect O(n^2)

array_intersect_assoc if intersection 100% do O(Max(param_i_size)*∑param_i_count, for all i), if intersection 0% intersect O(∑param_i_size, for all i)

array_diff O(π param_i_size, for all i) - That’s product of all the param_sizes

array_diff_key O(∑ param_i_size, for i != 1) - this is because we don’t need to iterate over the first array.

array_merge O( ∑ array_i, i != 1 ) - doesn’t need to iterate over the first array

+ (union) O(n), where n is size of the 2nd array (ie array_first + array_second) - less overhead than array_merge since it doesn’t have to renumber

array_replace O( ∑ array_i, for all i )


shuffle O(n)

array_rand O(n) - Requires a linear poll.

Obvious Big-O:

array_fill O(n)

array_fill_keys O(n)

range O(n)

array_splice O(offset + length)

array_slice O(offset + length) or O(n) if length = NULL

array_keys O(n)

array_values O(n)

array_reverse O(n)

array_pad O(pad_size)

array_flip O(n)

array_sum O(n)

array_product O(n)

array_reduce O(n)

array_filter O(n)

array_map O(n)

array_chunk O(n)

array_combine O(n)

I’d like to thank Eureqa for making it easy to find the Big-O of the functions. It’s an amazing free program that can find the best fitting function for arbitrary data.


For those who doubt that PHP array lookups are O(N), I’ve written a benchmark to test that (they are still effectively O(1) for most realistic values).

php array lookup graph

$tests = 1000000;
$max = 5000001;

for( $i = 1; $i <= $max; $i += 10000 ) {
    //create lookup array
    $array = array_fill( 0, $i, NULL );
    //build test indexes
    $test_indexes = array();
    for( $j = 0; $j < $tests; $j++ ) {
        $test_indexes[] = rand( 0, $i-1 );
    //benchmark array lookups
    $start = microtime( TRUE );
    foreach( $test_indexes as $test_index ) {
        $value = $array[ $test_index ];
        unset( $value );
    $stop = microtime( TRUE );
    unset( $array, $test_indexes, $test_index );
    printf( "%d,%1.15f\n", $i, $stop - $start ); //time per 1mil lookups
    unset( $stop, $start );

#54 Best answer 2 of List of Big-O for PHP functions(Score: 5)

Created: 2010-03-18

The explanation for the case you specifically describe is that associative arrays are implemented as hash tables - so lookup by key (and correspondingly, array_key_exists) is O(1). However, arrays aren’t indexed by value, so the only way in the general case to discover whether a value exists in the array is a linear search. There’s no surprise there.

I don’t think there’s specific comprehensive documentation of the algorithmic complexity of PHP methods. However, if it’s a big enough concern to warrant the effort, you can always look through the source code.

See also original question in stackoverflow

#55: Determine if two rectangles overlap each other? (Score: 349)

Created: 2008-11-20 Last updated: 2017-05-23

Tags: c++, algorithm, geometry, overlap, rectangles

I am trying to write a C++ program that takes the following inputs from the user to construct rectangles (between 2 and 5): height, width, x-pos, y-pos. All of these rectangles will exist parallel to the x and the y axis, that is all of their edges will have slopes of 0 or infinity.

I’ve tried to implement what is mentioned in this question but I am not having very much luck.

My current implementation does the following:

// Gets all the vertices for Rectangle 1 and stores them in an array -> arrRect1
// point 1 x: arrRect1[0], point 1 y: arrRect1[1] and so on...
// Gets all the vertices for Rectangle 2 and stores them in an array -> arrRect2

// rotated edge of point a, rect 1
int rot_x, rot_y;
rot_x = -arrRect1[3];
rot_y = arrRect1[2];
// point on rotated edge
int pnt_x, pnt_y;
pnt_x = arrRect1[2]; 
pnt_y = arrRect1[3];
// test point, a from rect 2
int tst_x, tst_y;
tst_x = arrRect2[0];
tst_y = arrRect2[1];

int value;
value = (rot_x * (tst_x - pnt_x)) + (rot_y * (tst_y - pnt_y));
cout << "Value: " << value;  

However I’m not quite sure if (a) I’ve implemented the algorithm I linked to correctly, or if I did exactly how to interpret this?

Any suggestions?

#55 Best answer 1 of Determine if two rectangles overlap each other? (Score: 744)

Created: 2008-11-20 Last updated: 2020-01-22

if (RectA.Left < RectB.Right && RectA.Right > RectB.Left &&
     RectA.Top > RectB.Bottom && RectA.Bottom < RectB.Top ) 

or, using Cartesian coordinates

(With X1 being left coord, X2 being right coord, increasing from left to right and Y1 being Top coord, and Y2 being Bottom coord, increasing from bottom to top – if this is not how your coordinate system [e.g. most computers have the Y direction reversed], swap the comparisons below) …

if (RectA.X1 < RectB.X2 && RectA.X2 > RectB.X1 &&
    RectA.Y1 > RectB.Y2 && RectA.Y2 < RectB.Y1) 

Say you have Rect A, and Rect B. Proof is by contradiction. Any one of four conditions guarantees that no overlap can exist:

  • Cond1. If A’s left edge is to the right of the B’s right edge, - then A is Totally to right Of B
  • Cond2. If A’s right edge is to the left of the B’s left edge, - then A is Totally to left Of B
  • Cond3. If A’s top edge is below B’s bottom edge, - then A is Totally below B
  • Cond4. If A’s bottom edge is above B’s top edge, - then A is Totally above B

So condition for Non-Overlap is

NON-Overlap => Cond1 Or Cond2 Or Cond3 Or Cond4

Therefore, a sufficient condition for Overlap is the opposite.

Overlap => NOT (Cond1 Or Cond2 Or Cond3 Or Cond4)

De Morgan’s law says
Not (A or B or C or D) is the same as Not A And Not B And Not C And Not D
so using De Morgan, we have

Not Cond1 And Not Cond2 And Not Cond3 And Not Cond4

This is equivalent to:

  • A’s Left Edge to left of B’s right edge, [RectA.Left < RectB.Right], and
  • A’s right edge to right of B’s left edge, [RectA.Right > RectB.Left], and
  • A’s top above B’s bottom, [RectA.Top > RectB.Bottom], and
  • A’s bottom below B’s Top [RectA.Bottom < RectB.Top]

Note 1: It is fairly obvious this same principle can be extended to any number of dimensions.
Note 2: It should also be fairly obvious to count overlaps of just one pixel, change the < and/or the > on that boundary to a <= or a >=.
Note 3: This answer, when utilizing Cartesian coordinates (X, Y) is based on standard algebraic Cartesian coordinates (x increases left to right, and Y increases bottom to top). Obviously, where a computer system might mechanize screen coordinates differently, (e.g., increasing Y from top to bottom, or X From right to left), the syntax will need to be adjusted accordingly/

#55 Best answer 2 of Determine if two rectangles overlap each other?(Score: 121)

Created: 2008-11-20 Last updated: 2014-04-11

struct rect
    int x;
    int y;
    int width;
    int height;

bool valueInRange(int value, int min, int max)
{ return (value >= min) && (value <= max); }

bool rectOverlap(rect A, rect B)
    bool xOverlap = valueInRange(A.x, B.x, B.x + B.width) ||
                    valueInRange(B.x, A.x, A.x + A.width);
    bool yOverlap = valueInRange(A.y, B.y, B.y + B.height) ||
                    valueInRange(B.y, A.y, A.y + A.height);

    return xOverlap && yOverlap;

See also original question in stackoverflow

#56: What is stability in sorting algorithms and why is it important? (Score: 348)

Created: 2009-10-05 Last updated: 2017-12-27

Tags: algorithm, sorting, language-agnostic, stability

I’m very curious, why stability is or is not important in sorting algorithms?

#56 Best answer 1 of What is stability in sorting algorithms and why is it important? (Score: 443)

Created: 2009-10-05 Last updated: 2017-04-04

A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input array to be sorted. Some sorting algorithms are stable by nature like Insertion sort, Merge Sort, Bubble Sort, etc. And some sorting algorithms are not, like Heap Sort, Quick Sort, etc.

Background: a “stable” sorting algorithm keeps the items with the same sorting key in order. Suppose we have a list of 5-letter words:


If we sort the list by just the first letter of each word then a stable-sort would produce:


In an unstable sort algorithm, straw or spork may be interchanged, but in a stable one, they stay in the same relative positions (that is, since straw appears before spork in the input, it also appears before spork in the output).

We could sort the list of words using this algorithm: stable sorting by column 5, then 4, then 3, then 2, then 1. In the end, it will be correctly sorted. Convince yourself of that. (by the way, that algorithm is called radix sort)

Now to answer your question, suppose we have a list of first and last names. We are asked to sort “by last name, then by first”. We could first sort (stable or unstable) by the first name, then stable sort by the last name. After these sorts, the list is primarily sorted by the last name. However, where last names are the same, the first names are sorted.

You can’t stack unstable sorts in the same fashion.

#56 Best answer 2 of What is stability in sorting algorithms and why is it important?(Score: 72)

Created: 2017-05-06 Last updated: 2020-05-19

A stable sorting algorithm is the one that sorts the identical elements in their same order as they appear in the input, whilst unstable sorting may not satisfy the case. - I thank my algorithm lecturer Didem Gozupek to have provided insight into algorithms.

Stable Sorting Algorithms:

  • Insertion Sort
  • Merge Sort
  • Bubble Sort
  • Tim Sort
  • Counting Sort
  • Block Sort
  • Quadsort
  • Library Sort
  • Cocktail shaker Sort
  • Gnome Sort
  • Odd–even Sort

Unstable Sorting Algorithms:

  • Heap sort
  • Selection sort
  • Shell sort
  • Quick sort
  • Introsort (subject to Quicksort)
  • Tree sort
  • Cycle sort
  • Smoothsort
  • Tournament sort(subject to Hesapsort)

enter image description here

See also original question in stackoverflow

#57: Equation for testing if a point is inside a circle (Score: 339)

Created: 2009-01-26 Last updated: 2013-06-23

Tags: algorithm, language-agnostic, geometry

If you have a circle with center (center_x, center_y) and radius radius, how do you test if a given point with coordinates (x, y) is inside the circle?

#57 Best answer 1 of Equation for testing if a point is inside a circle (Score: 522)

Created: 2009-01-26 Last updated: 2013-06-23

In general, x and y must satisfy (x - center_x)^2 + (y - center_y)^2 < radius^2.

Please note that points that satisfy the above equation with < replaced by == are considered the points on the circle, and the points that satisfy the above equation with < replaced by > are considered the outside the circle.

#57 Best answer 2 of Equation for testing if a point is inside a circle(Score: 141)

Created: 2011-08-29 Last updated: 2015-02-07

Mathematically, Pythagoras is probably a simple method as many have already mentioned.

(x-center_x)^2 + (y - center_y)^2 < radius^2

Computationally, there are quicker ways. Define:

dx = abs(x-center_x)
dy = abs(y-center_y)
R = radius

If a point is more likely to be outside this circle then imagine a square drawn around it such that it’s sides are tangents to this circle:

if dx>R then 
    return false.
if dy>R then 
    return false.

Now imagine a square diamond drawn inside this circle such that it’s vertices touch this circle:

if dx + dy <= R then 
    return true.

Now we have covered most of our space and only a small area of this circle remains in between our square and diamond to be tested. Here we revert to Pythagoras as above.

if dx^2 + dy^2 <= R^2 then 
    return true
    return false.

If a point is more likely to be inside this circle then reverse order of first 3 steps:

if dx + dy <= R then 
    return true.
if dx > R then 
    return false.
if dy > R 
    then return false.
if dx^2 + dy^2 <= R^2 then 
    return true
    return false.

Alternate methods imagine a square inside this circle instead of a diamond but this requires slightly more tests and calculations with no computational advantage (inner square and diamonds have identical areas):

k = R/sqrt(2)
if dx <= k and dy <= k then 
    return true.


For those interested in performance I implemented this method in c, and compiled with -O3.

I obtained execution times by time ./a.out

I implemented this method, a normal method and a dummy method to determine timing overhead.

Normal: 21.3s This: 19.1s Overhead: 16.5s

So, it seems this method is more efficient in this implementation.

// compile gcc -O3 <filename>.c
// run: time ./a.out

#include <stdio.h>
#include <stdlib.h>

#define TRUE  (0==0)
#define FALSE (0==1)

#define ABS(x) (((x)<0)?(0-(x)):(x))

int xo, yo, R;

int inline inCircle( int x, int y ){  // 19.1, 19.1, 19.1
  int dx = ABS(x-xo);
  if (    dx >  R ) return FALSE;
  int dy = ABS(y-yo);
  if (    dy >  R ) return FALSE;
  if ( dx+dy <= R ) return TRUE;
  return ( dx*dx + dy*dy <= R*R );

int inline inCircleN( int x, int y ){  // 21.3, 21.1, 21.5
  int dx = ABS(x-xo);
  int dy = ABS(y-yo);
  return ( dx*dx + dy*dy <= R*R );

int inline dummy( int x, int y ){  // 16.6, 16.5, 16.4
  int dx = ABS(x-xo);
  int dy = ABS(y-yo);
  return FALSE;

#define N 1000000000

int main(){
  int x, y;
  xo = rand()%1000; yo = rand()%1000; R = 1;
  int n = 0;
  int c;
  for (c=0; c<N; c++){
    x = rand()%1000; y = rand()%1000;
//    if ( inCircle(x,y)  ){
    if ( inCircleN(x,y) ){
//    if ( dummy(x,y) ){
  printf( "%d of %d inside circle\n", n, N);

See also original question in stackoverflow

#58: How to implement classic sorting algorithms in modern C++? (Score: 335)

Created: 2014-07-09 Last updated: 2017-05-23

Tags: c++, algorithm, sorting, c++14, c++-faq

The std::sort algorithm (and its cousins std::partial_sort and std::nth_element) from the C++ Standard Library is in most implementations a complicated and hybrid amalgamation of more elementary sorting algorithms, such as selection sort, insertion sort, quick sort, merge sort, or heap sort.

There are many questions here and on sister sites such as related to bugs, complexity and other aspects of implementations of these classic sorting algorithms. Most of the offered implementations consist of raw loops, use index manipulation and concrete types, and are generally non-trivial to analyse in terms of correctness and efficiency.

Question: how can the above mentioned classic sorting algorithms be implemented using modern C++?

  • no raw loops, but combining the Standard Library’s algorithmic building blocks from <algorithm>
  • iterator interface and use of templates instead of index manipulation and concrete types
  • C++14 style, including the full Standard Library, as well as syntactic noise reducers such as auto, template aliases, transparent comparators and polymorphic lambdas.


  • for further references on implementations of sorting algorithms see Wikipedia, Rosetta Code or
  • according to Sean Parent’s conventions (slide 39), a raw loop is a for-loop longer than composition of two functions with an operator. So f(g(x)); or f(x); g(x); or f(x) + g(x); are not raw loops, and neither are the loops in selection_sort and insertion_sort below.
  • I follow Scott Meyers’s terminology to denote the current C++1y already as C++14, and to denote C++98 and C++03 both as C++98, so don’t flame me for that.
  • As suggested in the comments by @Mehrdad, I provide four implementations as a Live Example at the end of the answer: C++14, C++11, C++98 and Boost and C++98.
  • The answer itself is presented in terms of C++14 only. Where relevant, I denote the syntactic and library differences where the various language versions differ.

#58 Best answer 1 of How to implement classic sorting algorithms in modern C++? (Score: 399)

Created: 2014-07-09 Last updated: 2018-07-17

Algorithmic building blocks

We begin by assembling the algorithmic building blocks from the Standard Library:

#include <algorithm>    // min_element, iter_swap, 
                        // upper_bound, rotate, 
                        // partition, 
                        // inplace_merge,
                        // make_heap, sort_heap, push_heap, pop_heap,
                        // is_heap, is_sorted
#include <cassert>      // assert 
#include <functional>   // less
#include <iterator>     // distance, begin, end, next
  • the iterator tools such as non-member std::begin() / std::end() as well as with std::next() are only available as of C++11 and beyond. For C++98, one needs to write these himself. There are substitutes from Boost.Range in boost::begin() / boost::end(), and from Boost.Utility in boost::next().
  • the std::is_sorted algorithm is only available for C++11 and beyond. For C++98, this can be implemented in terms of std::adjacent_find and a hand-written function object. Boost.Algorithm also provides a boost::algorithm::is_sorted as a substitute.
  • the std::is_heap algorithm is only available for C++11 and beyond.

Syntactical goodies

C++14 provides transparent comparators of the form std::less<> that act polymorphically on their arguments. This avoids having to provide an iterator’s type. This can be used in combination with C++11’s default function template arguments to create a single overload for sorting algorithms that take < as comparison and those that have a user-defined comparison function object.

template<class It, class Compare = std::less<>>
void xxx_sort(It first, It last, Compare cmp = Compare{});

In C++11, one can define a reusable template alias to extract an iterator’s value type which adds minor clutter to the sort algorithms' signatures:

template<class It>
using value_type_t = typename std::iterator_traits<It>::value_type;

template<class It, class Compare = std::less<value_type_t<It>>>
void xxx_sort(It first, It last, Compare cmp = Compare{});

In C++98, one needs to write two overloads and use the verbose typename xxx<yyy>::type syntax

template<class It, class Compare>
void xxx_sort(It first, It last, Compare cmp); // general implementation

template<class It>
void xxx_sort(It first, It last)
    xxx_sort(first, last, std::less<typename std::iterator_traits<It>::value_type>());
  • Another syntactical nicety is that C++14 facilitates wrapping user-defined comparators through polymorphic lambdas (with auto parameters that are deduced like function template arguments).
  • C++11 only has monomorphic lambdas, that require the use of the above template alias value_type_t.
  • In C++98, one either needs to write a standalone function object or resort to the verbose std::bind1st / std::bind2nd / std::not1 type of syntax.
  • Boost.Bind improves this with boost::bind and _1 / _2 placeholder syntax.
  • C++11 and beyond also have std::find_if_not, whereas C++98 needs std::find_if with a std::not1 around a function object.

C++ Style

There is no generally acceptable C++14 style yet. For better or for worse, I closely follow Scott Meyers’s draft Effective Modern C++ and Herb Sutter’s revamped GotW. I use the following style recommendations:

  • Herb Sutter’s “Almost Always Auto” and Scott Meyers’s “Prefer auto to specific type declarations” recommendation, for which the brevity is unsurpassed, although its clarity is sometimes disputed.
  • Scott Meyers’s “Distinguish () and {} when creating objects” and consistently choose braced-initialization {} instead of the good old parenthesized initialization () (in order to side-step all most-vexing-parse issues in generic code).
  • Scott Meyers’s “Prefer alias declarations to typedefs”. For templates this is a must anyway, and using it everywhere instead of typedef saves time and adds consistency.
  • I use a for (auto it = first; it != last; ++it) pattern in some places, in order to allow for loop invariant checking for already sorted sub-ranges. In production code, the use of while (first != last) and a ++first somewhere inside the loop might be slightly better.

Selection sort

Selection sort does not adapt to the data in any way, so its runtime is always O(N²). However, selection sort has the property of minimizing the number of swaps. In applications where the cost of swapping items is high, selection sort very well may be the algorithm of choice.

To implement it using the Standard Library, repeatedly use std::min_element to find the remaining minimum element, and iter_swap to swap it into place:

template<class FwdIt, class Compare = std::less<>>
void selection_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
    for (auto it = first; it != last; ++it) {
        auto const selection = std::min_element(it, last, cmp);
        std::iter_swap(selection, it); 
        assert(std::is_sorted(first, std::next(it), cmp));

Note that selection_sort has the already processed range [first, it) sorted as its loop invariant. The minimal requirements are forward iterators, compared to std::sort’s random access iterators.

Details omitted:

  • selection sort can be optimized with an early test if (std::distance(first, last) <= 1) return; (or for forward / bidirectional iterators: if (first == last || std::next(first) == last) return;).
  • for bidirectional iterators, the above test can be combined with a loop over the interval [first, std::prev(last)), because the last element is guaranteed to be the minimal remaining element and doesn’t require a swap.

Insertion sort

Although it is one of the elementary sorting algorithms with O(N²) worst-case time, insertion sort is the algorithm of choice either when the data is nearly sorted (because it is adaptive) or when the problem size is small (because it has low overhead). For these reasons, and because it is also stable, insertion sort is often used as the recursive base case (when the problem size is small) for higher overhead divide-and-conquer sorting algorithms, such as merge sort or quick sort.

To implement insertion_sort with the Standard Library, repeatedly use std::upper_bound to find the location where the current element needs to go, and use std::rotate to shift the remaining elements upward in the input range:

template<class FwdIt, class Compare = std::less<>>
void insertion_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
    for (auto it = first; it != last; ++it) {
        auto const insertion = std::upper_bound(first, it, *it, cmp);
        std::rotate(insertion, it, std::next(it)); 
        assert(std::is_sorted(first, std::next(it), cmp));

Note that insertion_sort has the already processed range [first, it) sorted as its loop invariant. Insertion sort also works with forward iterators.

Details omitted:

  • insertion sort can be optimized with an early test if (std::distance(first, last) <= 1) return; (or for forward / bidirectional iterators: if (first == last || std::next(first) == last) return;) and a loop over the interval [std::next(first), last), because the first element is guaranteed to be in place and doesn’t require a rotate.
  • for bidirectional iterators, the binary search to find the insertion point can be replaced with a reverse linear search using the Standard Library’s std::find_if_not algorithm.

Four Live Examples (C++14, C++11, C++98 and Boost, C++98) for the fragment below:

using RevIt = std::reverse_iterator<BiDirIt>;
auto const insertion = std::find_if_not(RevIt(it), RevIt(first), 
    [=](auto const& elem){ return cmp(*it, elem); }
  • For random inputs this gives O(N²) comparisons, but this improves to O(N) comparisons for almost sorted inputs. The binary search always uses O(N log N) comparisons.
  • For small input ranges, the better memory locality (cache, prefetching) of a linear search might also dominate a binary search (one should test this, of course).

Quick sort

When carefully implemented, quick sort is robust and has O(N log N) expected complexity, but with O(N²) worst-case complexity that can be triggered with adversarially chosen input data. When a stable sort is not needed, quick sort is an excellent general-purpose sort.

Even for the simplest versions, quick sort is quite a bit more complicated to implement using the Standard Library than the other classic sorting algorithms. The approach below uses a few iterator utilities to locate the middle element of the input range [first, last) as the pivot, then use two calls to std::partition (which are O(N)) to three-way partition the input range into segments of elements that are smaller than, equal to, and larger than the selected pivot, respectively. Finally the two outer segments with elements smaller than and larger than the pivot are recursively sorted:

template<class FwdIt, class Compare = std::less<>>
void quick_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
    auto const N = std::distance(first, last);
    if (N <= 1) return;
    auto const pivot = *std::next(first, N / 2);
    auto const middle1 = std::partition(first, last, [=](auto const& elem){ 
        return cmp(elem, pivot); 
    auto const middle2 = std::partition(middle1, last, [=](auto const& elem){ 
        return !cmp(pivot, elem);
    quick_sort(first, middle1, cmp); // assert(std::is_sorted(first, middle1, cmp));
    quick_sort(middle2, last, cmp);  // assert(std::is_sorted(middle2, last, cmp));

However, quick sort is rather tricky to get correct and efficient, as each of the above steps has to be carefully checked and optimized for production level code. In particular, for O(N log N) complexity, the pivot has to result into a balanced partition of the input data, which cannot be guaranteed in general for an O(1) pivot, but which can be guaranteed if one sets the pivot as the O(N) median of the input range.

Details omitted:

  • the above implementation is particularly vulnerable to special inputs, e.g. it has O(N^2) complexity for the “organ pipe” input 1, 2, 3, ..., N/2, ... 3, 2, 1 (because the middle is always larger than all other elements).
  • median-of-3 pivot selection from randomly chosen elements from the input range guards against almost sorted inputs for which the complexity would otherwise deteriorate to O(N^2).
  • 3-way partitioning (separating elements smaller than, equal to and larger than the pivot) as shown by the two calls to std::partition is not the most efficient O(N) algorithm to achieve this result.
  • for random access iterators, a guaranteed O(N log N) complexity can be achieved through median pivot selection using std::nth_element(first, middle, last), followed by recursive calls to quick_sort(first, middle, cmp) and quick_sort(middle, last, cmp).
  • this guarantee comes at a cost, however, because the constant factor of the O(N) complexity of std::nth_element can be more expensive than that of the O(1) complexity of a median-of-3 pivot followed by an O(N) call to std::partition (which is a cache-friendly single forward pass over the data).

Merge sort

If using O(N) extra space is of no concern, then merge sort is an excellent choice: it is the only stable O(N log N) sorting algorithm.

It is simple to implement using Standard algorithms: use a few iterator utilities to locate the middle of the input range [first, last) and combine two recursively sorted segments with a std::inplace_merge:

template<class BiDirIt, class Compare = std::less<>>
void merge_sort(BiDirIt first, BiDirIt last, Compare cmp = Compare{})
    auto const N = std::distance(first, last);
    if (N <= 1) return;                   
    auto const middle = std::next(first, N / 2);
    merge_sort(first, middle, cmp); // assert(std::is_sorted(first, middle, cmp));
    merge_sort(middle, last, cmp);  // assert(std::is_sorted(middle, last, cmp));
    std::inplace_merge(first, middle, last, cmp); // assert(std::is_sorted(first, last, cmp));

Merge sort requires bidirectional iterators, the bottleneck being the std::inplace_merge. Note that when sorting linked lists, merge sort requires only O(log N) extra space (for recursion). The latter algorithm is implemented by std::list<T>::sort in the Standard Library.

Heap sort

Heap sort is simple to implement, performs an O(N log N) in-place sort, but is not stable.

The first loop, O(N) “heapify” phase, puts the array into heap order. The second loop, the O(N log N) “sortdown” phase, repeatedly extracts the maximum and restores heap order. The Standard Library makes this extremely straightforward:

template<class RandomIt, class Compare = std::less<>>
void heap_sort(RandomIt first, RandomIt last, Compare cmp = Compare{})
    lib::make_heap(first, last, cmp); // assert(std::is_heap(first, last, cmp));
    lib::sort_heap(first, last, cmp); // assert(std::is_sorted(first, last, cmp));

In case you consider it “cheating” to use std::make_heap and std::sort_heap, you can go one level deeper and write those functions yourself in terms of std::push_heap and std::pop_heap, respectively:

namespace lib {

// NOTE: is O(N log N), not O(N) as std::make_heap
template<class RandomIt, class Compare = std::less<>>
void make_heap(RandomIt first, RandomIt last, Compare cmp = Compare{})
    for (auto it = first; it != last;) {
        std::push_heap(first, ++it, cmp); 
        assert(std::is_heap(first, it, cmp));           

template<class RandomIt, class Compare = std::less<>>
void sort_heap(RandomIt first, RandomIt last, Compare cmp = Compare{})
    for (auto it = last; it != first;) {
        std::pop_heap(first, it--, cmp);
        assert(std::is_heap(first, it, cmp));           

}   // namespace lib

The Standard Library specifies both push_heap and pop_heap as complexity O(log N). Note however that the outer loop over the range [first, last) results in O(N log N) complexity for make_heap, whereas std::make_heap has only O(N) complexity. For the overall O(N log N) complexity of heap_sort it doesn’t matter.

Details omitted**: O(N) implementation of make_heap**


Here are four Live Examples (C++14, C++11, C++98 and Boost, C++98) testing all five algorithms on a variety of inputs (not meant to be exhaustive or rigorous). Just note the huge differences in the LOC: C++11/C++14 need around 130 LOC, C++98 and Boost 190 (+50%) and C++98 more than 270 (+100%).

#58 Best answer 2 of How to implement classic sorting algorithms in modern C++?(Score: 15)

Created: 2016-05-08 Last updated: 2017-10-05

Another small and rather elegant one originally found on code review. I thought it was worth sharing.

Counting sort

While it is rather specialized, counting sort is a simple integer sorting algorithm and can often be really fast provided the values of the integers to sort are not too far apart. It’s probably ideal if one ever needs to sort a collection of one million integers known to be between 0 and 100 for example.

To implement a very simple counting sort that works with both signed and unsigned integers, one needs to find the smallest and greatest elements in the collection to sort; their difference will tell the size of the array of counts to allocate. Then, a second pass through the collection is done to count the number of occurrences of every element. Finally, we write back the required number of every integer back to the original collection.

template<typename ForwardIterator>
void counting_sort(ForwardIterator first, ForwardIterator last)
    if (first == last || std::next(first) == last) return;

    auto minmax = std::minmax_element(first, last);  // avoid if possible.
    auto min = *minmax.first;
    auto max = *minmax.second;
    if (min == max) return;

    using difference_type = typename std::iterator_traits<ForwardIterator>::difference_type;
    std::vector<difference_type> counts(max - min + 1, 0);

    for (auto it = first ; it != last ; ++it) {
        ++counts[*it - min];

    for (auto count: counts) {
        first = std::fill_n(first, count, min++);

While it is only useful when the range of the integers to sort is known to be small (generally not larger than the size of the collection to sort), making counting sort more generic would make it slower for its best cases. If the range is not known to be small, another algorithm such a radix sort, ska_sort or spreadsort can be used instead.

Details omitted:

  • We could have passed the bounds of the range of values accepted by the algorithm as parameters to totally get rid of the first std::minmax_element pass through the collection. This will make the algorithm even faster when a usefully-small range limit is known by other means. (It doesn’t have to be exact; passing a constant 0 to 100 is still much better than an extra pass over a million elements to find out that the true bounds are 1 to 95. Even 0 to 1000 would be worth it; the extra elements are written once with zero and read once).

  • Growing counts on the fly is another way to avoid a separate first pass. Doubling the counts size each time it has to grow gives amortized O(1) time per sorted element (see hash table insertion cost analysis for the proof that exponential grown is the key). Growing at the end for a new max is easy with std::vector::resize to add new zeroed elements. Changing min on the fly and inserting new zeroed elements at the front can be done with std::copy_backward after growing the vector. Then std::fill to zero the new elements.

  • The counts increment loop is a histogram. If the data is likely to be highly repetitive, and the number of bins is small, it can be worth unrolling over multiple arrays to reduce the serializing data dependency bottleneck of store/reload to the same bin. This means more counts to zero at the start, and more to loop over at the end, but should be worth it on most CPUs for our example of millions of 0 to 100 numbers, especially if the input might already be (partially) sorted and have long runs of the same number.

  • In the algorithm above, we use a min == max check to return early when every element has the same value (in which case the collection is sorted). It is actually possible to instead fully check whether the collection is already sorted while finding the extreme values of a collection with no additional time wasted (if the first pass is still memory bottlenecked with the extra work of updating min and max). However such an algorithm does not exist in the standard library and writing one would be more tedious than writing the rest of counting sort itself. It is left as an exercise for the reader.

  • Since the algorithm only works with integer values, static assertions could be used to prevent users from making obvious type mistakes. In some contexts, a substitution failure with std::enable_if_t might be preferred.

  • While modern C++ is cool, future C++ could be even cooler: structured bindings and some parts of the Ranges TS would make the algorithm even cleaner.

See also original question in stackoverflow

#59: What's the best way to build a string of delimited items in Java? (Score: 332)

Created: 2008-09-15 Last updated: 2020-04-21

Tags: java, string, algorithm

While working in a Java app, I recently needed to assemble a comma-delimited list of values to pass to another web service without knowing how many elements there would be in advance. The best I could come up with off the top of my head was something like this:

public String appendWithDelimiter( String original, String addition, String delimiter ) {
	if ( original.equals( "" ) ) {
        return addition;
	} else {
        return original + delimiter + addition;

String parameterString = "";
if ( condition ) parameterString = appendWithDelimiter( parameterString, "elementName", "," );
if ( anotherCondition ) parameterString = appendWithDelimiter( parameterString, "anotherElementName", "," );

I realize this isn’t particularly efficient, since there are strings being created all over the place, but I was going for clarity more than optimization.

In Ruby, I can do something like this instead, which feels much more elegant:

parameterArray = [];
parameterArray << "elementName" if condition;
parameterArray << "anotherElementName" if anotherCondition;
parameterString = parameterArray.join(",");

But since Java lacks a join command, I couldn’t figure out anything equivalent.

So, what’s the best way to do this in Java?

#59 Best answer 1 of What's the best way to build a string of delimited items in Java? (Score: 578)

Created: 2008-09-15 Last updated: 2015-08-21

Pre Java 8:

Apache’s commons lang is your friend here - it provides a join method very similar to the one you refer to in Ruby:


Java 8:

Java 8 provides joining out of the box via StringJoiner and String.join(). The snippets below show how you can use them:


StringJoiner joiner = new StringJoiner(",");
String joinedString = joiner.toString(); // "01,02,03"

String.join(CharSequence delimiter, CharSequence... elements))

String joinedString = String.join(" - ", "04", "05", "06"); // "04 - 05 - 06"

String.join(CharSequence delimiter, Iterable<? extends CharSequence> elements)

List<String> strings = new LinkedList<>();
String message = String.join(" ", strings);
//message returned is: "Java is cool"

#59 Best answer 2 of What's the best way to build a string of delimited items in Java?(Score: 52)

Created: 2008-09-15

You could write a little join-style utility method that works on java.util.Lists

public static String join(List<String> list, String delim) {

    StringBuilder sb = new StringBuilder();

    String loopDelim = "";

    for(String s : list) {


        loopDelim = delim;

    return sb.toString();

Then use it like so:

    List<String> list = new ArrayList<String>();

    if( condition )        list.add("elementName");
    if( anotherCondition ) list.add("anotherElementName");

    join(list, ",");

See also original question in stackoverflow

#60: What is the fastest way to get the value of π? (Score: 330)

Created: 2008-08-01 Last updated: 2019-05-15

Tags: performance, algorithm, language-agnostic, unix, pi

I’m looking for the fastest way to obtain the value of π, as a personal challenge. More specifically, I’m using ways that don’t involve using #define constants like M_PI, or hard-coding the number in.

The program below tests the various ways I know of. The inline assembly version is, in theory, the fastest option, though clearly not portable. I’ve included it as a baseline to compare against the other versions. In my tests, with built-ins, the 4 * atan(1) version is fastest on GCC 4.2, because it auto-folds the atan(1) into a constant. With -fno-builtin specified, the atan2(0, -1) version is fastest.

Here’s the main testing program (pitimes.c):

#include <math.h>
#include <stdio.h>
#include <time.h>

#define ITERS 10000000
#define TESTWITH(x) {                                                       \
    diff = 0.0;                                                             \
    time1 = clock();                                                        \
    for (i = 0; i < ITERS; ++i)                                             \
        diff += (x) - M_PI;                                                 \
    time2 = clock();                                                        \
    printf("%s\t=> %e, time => %f\n", #x, diff, diffclock(time2, time1));   \

static inline double
diffclock(clock_t time1, clock_t time0)
    return (double) (time1 - time0) / CLOCKS_PER_SEC;

    int i;
    clock_t time1, time2;
    double diff;

    /* Warmup. The atan2 case catches GCC's atan folding (which would
     * optimise the ``4 * atan(1) - M_PI'' to a no-op), if -fno-builtin
     * is not used. */
    TESTWITH(4 * atan(1))
    TESTWITH(4 * atan2(1, 1))

#if defined(__GNUC__) && (defined(__i386__) || defined(__amd64__))
    extern double fldpi();

    /* Actual tests start here. */
    TESTWITH(atan2(0, -1))
    TESTWITH(2 * asin(1))
    TESTWITH(4 * atan2(1, 1))
    TESTWITH(4 * atan(1))

    return 0;

And the inline assembly stuff (fldpi.c) that will only work for x86 and x64 systems:

    double pi;
    asm("fldpi" : "=t" (pi));
    return pi;

And a build script that builds all the configurations I’m testing (

gcc -O3 -Wall -c           -m32 -o fldpi-32.o fldpi.c
gcc -O3 -Wall -c           -m64 -o fldpi-64.o fldpi.c

gcc -O3 -Wall -ffast-math  -m32 -o pitimes1-32 pitimes.c fldpi-32.o
gcc -O3 -Wall              -m32 -o pitimes2-32 pitimes.c fldpi-32.o -lm
gcc -O3 -Wall -fno-builtin -m32 -o pitimes3-32 pitimes.c fldpi-32.o -lm
gcc -O3 -Wall -ffast-math  -m64 -o pitimes1-64 pitimes.c fldpi-64.o -lm
gcc -O3 -Wall              -m64 -o pitimes2-64 pitimes.c fldpi-64.o -lm
gcc -O3 -Wall -fno-builtin -m64 -o pitimes3-64 pitimes.c fldpi-64.o -lm

Apart from testing between various compiler flags (I’ve compared 32-bit against 64-bit too because the optimizations are different), I’ve also tried switching the order of the tests around. But still, the atan2(0, -1) version still comes out on top every time.

#60 Best answer 1 of What is the fastest way to get the value of π? (Score: 208)

Created: 2008-08-02 Last updated: 2018-09-05

The Monte Carlo method, as mentioned, applies some great concepts but it is, clearly, not the fastest, not by a long shot, not by any reasonable measure. Also, it all depends on what kind of accuracy you are looking for. The fastest π I know of is the one with the digits hard coded. Looking at Pi and Pi[PDF], there are a lot of formulae.

Here is a method that converges quickly — about 14 digits per iteration. PiFast, the current fastest application, uses this formula with the FFT. I’ll just write the formula, since the code is straightforward. This formula was almost found by Ramanujan and discovered by Chudnovsky. It is actually how he calculated several billion digits of the number — so it isn’t a method to disregard. The formula will overflow quickly and, since we are dividing factorials, it would be advantageous then to delay such calculations to remove terms.

enter image description here

enter image description here


enter image description here

Below is the Brent–Salamin algorithm. Wikipedia mentions that when a and b are “close enough” then (a + b)² / 4t will be an approximation of π. I’m not sure what “close enough” means, but from my tests, one iteration got 2 digits, two got 7, and three had 15, of course this is with doubles, so it might have an error based on its representation and the true calculation could be more accurate.

let pi_2 iters =
    let rec loop_ a b t p i =
        if i = 0 then a,b,t,p
            let a_n = (a +. b) /. 2.0 
            and b_n = sqrt (a*.b)
            and p_n = 2.0 *. p in
            let t_n = t -. (p *. (a -. a_n) *. (a -. a_n)) in
            loop_ a_n b_n t_n p_n (i - 1)
    let a,b,t,p = loop_ (1.0) (1.0 /. (sqrt 2.0)) (1.0/.4.0) (1.0) iters in
    (a +. b) *. (a +. b) /. (4.0 *. t)

Lastly, how about some pi golf (800 digits)? 160 characters!

int a=10000,b,c=2800,d,e,f[2801],g;main(){for(;b-c;)f[b++]=a/5;for(;d=0,g=c*2;c-=14,printf("%.4d",e+d/a),e=d%a)for(b=c;d+=f[b]*a,f[b]=d%--g,d/=g--,--b;d*=b);}

#60 Best answer 2 of What is the fastest way to get the value of π?(Score: 119)

Created: 2008-09-02 Last updated: 2018-09-03

I really like this program, because it approximates π by looking at its own area.

IOCCC 1988 : westley.c

#define _ -F<00||--F-OO--;
int F=00,OO=00;main(){F_OO();printf("%1.3f\n",4.*-F/OO/OO);}F_OO()

See also original question in stackoverflow

  1. This page use API to get the relevant data from stackoverflow community.
  2. Content license on this page is CC BY-SA 3.0.
  3. score = up votes - down votes.