Most votes on algorithm questions 6
Read all the top votes questions and answers in a single page.
#51: When is it practical to use Depth-First Search (DFS) vs Breadth-First Search (BFS)? (Score: 382)
Created: 2010-07-26 Last updated: 2018-03-11
Tags: algorithm, graph-algorithm, graph-theory, depth-first-search, breadth-first-search
I understand the differences between DFS and BFS, but I’m interested to know when it’s more practical to use one over the other?
Could anyone give any examples of how DFS would trump BFS and vice versa?
#51 Best answer 1 of When is it practical to use Depth-First Search (DFS) vs Breadth-First Search (BFS)? (Score: 400)
Created: 2010-07-26 Last updated: 2020-09-14
That heavily depends on the structure of the search tree and the number and location of solutions (aka searched-for items).

If you know a solution is not far from the root of the tree, a breadth first search (BFS) might be better.

If the tree is very deep and solutions are rare, depth first search (DFS) might take an extremely long time, but BFS could be faster.

If the tree is very wide, a BFS might need too much memory, so it might be completely impractical.

If solutions are frequent but located deep in the tree, BFS could be impractical.

If the search tree is very deep you will need to restrict the search depth for depth first search (DFS), anyway (for example with iterative deepening).
But these are just rules of thumb; you’ll probably need to experiment.
Another issue is parallelism: if you want to parallelize BFS you would need a shared data structure between threads, which is a bad thing. DFS might be easier to distribute, even between connected machines, if you don’t insist on the exact order of visiting the nodes.
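The iterative deepening mentioned above can be sketched as follows. This is an illustrative sketch (the function names and graph are my own, not from the answer): depth-limited DFS restarted with a growing limit finds the shallowest solution like BFS, while using only DFS-like memory.

```python
def depth_limited_dfs(graph, node, goal, limit, path=()):
    """Depth-first search that gives up below a fixed depth limit."""
    path = path + (node,)
    if node == goal:
        return path
    if limit == 0:
        return None
    for neighbor in graph.get(node, ()):
        if neighbor not in path:  # avoid revisiting nodes on the current path
            found = depth_limited_dfs(graph, neighbor, goal, limit - 1, path)
            if found:
                return found
    return None

def iterative_deepening(graph, start, goal, max_depth=20):
    """Restart depth-limited DFS with growing limits: discovers the
    shallowest solution (like BFS) with only O(depth) memory (like DFS)."""
    for limit in range(max_depth + 1):
        found = depth_limited_dfs(graph, start, goal, limit)
        if found:
            return found
    return None

graph = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': ['e']}
print(iterative_deepening(graph, 'a', 'e'))  # ('a', 'b', 'd', 'e')
```

The trade-off is that each restart re-explores the shallow levels, but for trees with a branching factor above 1 the deepest level dominates the total work anyway.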
#51 Best answer 2 of When is it practical to use Depth-First Search (DFS) vs Breadth-First Search (BFS)? (Score: 179)
Created: 2016-05-06 Last updated: 2017-07-03
Depth-first Search
Depth-first searches are often used in simulations of games (and game-like situations in the real world). In a typical game you can choose one of several possible actions. Each choice leads to further choices, each of which leads to further choices, and so on into an ever-expanding tree-shaped graph of possibilities.
For example, in games like chess or tic-tac-toe, when you are deciding what move to make, you can mentally imagine a move, then your opponent’s possible responses, then your responses, and so on. You can decide what to do by seeing which move leads to the best outcome.
Only some paths in a game tree lead to your win. Some lead to a win by your opponent; when you reach such an ending, you must back up, or backtrack, to a previous node and try a different path. In this way you explore the tree until you find a path with a successful conclusion. Then you make the first move along this path.
Breadth-first search
The breadthfirst search has an interesting property: It first finds all the vertices that are one edge away from the starting point, then all the vertices that are two edges away, and so on. This is useful if you’re trying to find the shortest path from the starting vertex to a given vertex. You start a BFS, and when you find the specified vertex, you know the path you’ve traced so far is the shortest path to the node. If there were a shorter path, the BFS would have found it already.
Breadth-first search can be used for finding neighbour nodes in peer-to-peer networks like BitTorrent, in GPS systems to find nearby locations, on social networking sites to find people within a specified distance, and things like that.
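The shortest-path property described above can be sketched directly (an illustrative sketch, with my own function name and sample graph): the first time BFS dequeues the goal, the path it traced uses the fewest possible edges.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search: the first time we reach `goal`, the
    path traced so far is a shortest one (fewest edges)."""
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [neighbor]))
    return None

graph = {'a': ['b', 'c'], 'b': ['d'], 'c': ['d'], 'd': []}
print(shortest_path(graph, 'a', 'd'))  # ['a', 'b', 'd']
```

Marking nodes visited when they are enqueued (not when dequeued) keeps each node in the queue at most once, which is what bounds the memory at the width of the widest level.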
See also original question in stackoverflow
#52: How to find list of possible words from a letter matrix [Boggle Solver] (Score: 379)
Created: 2009-04-14 Last updated: 2019-06-07
Tags: algorithm, puzzle, boggle
Lately I have been playing a game on my iPhone called Scramble. Some of you may know this game as Boggle. Essentially, when the game starts you get a matrix of letters like so:
F X I E
A M L O
E W B X
A S T U
The goal of the game is to find as many words as you can that can be formed by chaining letters together. You can start with any letter, and all the letters that surround it are fair game, and then once you move on to the next letter, all the letters that surround that letter are fair game, except for any previously used letters. So in the grid above, for example, I could come up with the words LOB, TUX, SEA, FAME, etc. Words must be at least 3 characters, and no more than NxN characters, which would be 16 in this game but can vary in some implementations. While this game is fun and addictive, I am apparently not very good at it and I wanted to cheat a little bit by making a program that would give me the best possible words (the longer the word the more points you get).
I am, unfortunately, not very good with algorithms or their efficiencies and so forth. My first attempt uses a dictionary such as this one (~2.3MB) and does a linear search trying to match combinations with dictionary entries. This takes a very long time to find the possible words, and since you only get 2 minutes per round, it is simply not adequate.
I am interested to see if any Stackoverflowers can come up with more efficient solutions. I am mostly looking for solutions using the Big 3 Ps: Python, PHP, and Perl, although anything with Java or C++ is cool too, since speed is essential.
CURRENT SOLUTIONS:
- Adam Rosenfield, Python, ~20s
- John Fouhy, Python, ~3s
- Kent Fredric, Perl, ~1s
- Darius Bacon, Python, ~1s
- rvarcher, VB.NET (live link), ~1s
- Paolo Bergantino, PHP (live link), ~5s (~2s locally)
#52 Best answer 1 of How to find list of possible words from a letter matrix [Boggle Solver] (Score: 145)
Created: 2009-04-15 Last updated: 2012-11-11
My answer works like the others here, but I’ll post it because it looks a bit faster than the other Python solutions, since it sets up its dictionary faster. (I checked this against John Fouhy’s solution.) After setup, the time to solve is down in the noise.
grid = "fxie amlo ewbx astu".split()
nrows, ncols = len(grid), len(grid[0])

# A dictionary word that could be a solution must use only the grid's
# letters and have length >= 3. (With a case-insensitive match.)
import re
alphabet = ''.join(set(''.join(grid)))
bogglable = re.compile('[' + alphabet + ']{3,}$', re.I).match

words = set(word.rstrip('\n') for word in open('words') if bogglable(word))
prefixes = set(word[:i] for word in words
               for i in range(2, len(word)+1))

def solve():
    for y, row in enumerate(grid):
        for x, letter in enumerate(row):
            for result in extending(letter, ((x, y),)):
                yield result

def extending(prefix, path):
    if prefix in words:
        yield (prefix, path)
    for (nx, ny) in neighbors(path[-1]):
        if (nx, ny) not in path:
            prefix1 = prefix + grid[ny][nx]
            if prefix1 in prefixes:
                for result in extending(prefix1, path + ((nx, ny),)):
                    yield result

def neighbors((x, y)):
    for nx in range(max(0, x-1), min(x+2, ncols)):
        for ny in range(max(0, y-1), min(y+2, nrows)):
            yield (nx, ny)
Sample usage:
# Print a maximallength word and its path:
print max(solve(), key=lambda (word, path): len(word))
Edit: Filter out words less than 3 letters long.
Edit 2: I was curious why Kent Fredric’s Perl solution was faster; it turns out to use regular-expression matching instead of a set of characters. Doing the same in Python about doubles the speed.
#52 Best answer 2 of How to find list of possible words from a letter matrix [Boggle Solver] (Score: 116)
Created: 2009-04-14 Last updated: 2009-04-14
The fastest solution you’re going to get will probably involve storing your dictionary in a trie. Then, create a queue of triplets (x, y, s), where each element in the queue corresponds to a prefix s of a word which can be spelled in the grid, ending at location (x, y). Initialize the queue with N x N elements (where N is the size of your grid), one element for each square in the grid. Then, the algorithm proceeds as follows:
While the queue is not empty:
  Dequeue a triple (x, y, s)
  For each square (x', y') with letter c adjacent to (x, y):
    If s+c is a word, output s+c
    If s+c is a prefix of a word, insert (x', y', s+c) into the queue
If you store your dictionary in a trie, testing if s+c is a word or a prefix of a word can be done in constant time (provided you also keep some extra metadata in each queue datum, such as a pointer to the current node in the trie), so the running time of this algorithm is O(number of words that can be spelled).
[Edit] Here’s an implementation in Python that I just coded up:
#!/usr/bin/python

class TrieNode:
    def __init__(self, parent, value):
        self.parent = parent
        self.children = [None] * 26
        self.isWord = False
        if parent is not None:
            parent.children[ord(value) - 97] = self

def MakeTrie(dictfile):
    dict = open(dictfile)
    root = TrieNode(None, '')
    for word in dict:
        curNode = root
        for letter in word.lower():
            if 97 <= ord(letter) < 123:
                nextNode = curNode.children[ord(letter) - 97]
                if nextNode is None:
                    nextNode = TrieNode(curNode, letter)
                curNode = nextNode
        curNode.isWord = True
    return root

def BoggleWords(grid, dict):
    rows = len(grid)
    cols = len(grid[0])
    queue = []
    words = []
    for y in range(cols):
        for x in range(rows):
            c = grid[y][x]
            node = dict.children[ord(c) - 97]
            if node is not None:
                queue.append((x, y, c, node))
    while queue:
        x, y, s, node = queue[0]
        del queue[0]
        for dx, dy in ((1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)):
            x2, y2 = x + dx, y + dy
            if 0 <= x2 < cols and 0 <= y2 < rows:
                s2 = s + grid[y2][x2]
                node2 = node.children[ord(grid[y2][x2]) - 97]
                if node2 is not None:
                    if node2.isWord:
                        words.append(s2)
                    queue.append((x2, y2, s2, node2))
    return words
Example usage:
d = MakeTrie('/usr/share/dict/words')
print(BoggleWords(['fxie','amlo','ewbx','astu'], d))
Output:
[‘fa’, ‘xi’, ‘ie’, ‘io’, ‘el’, ‘am’, ‘ax’, ‘ae’, ‘aw’, ‘mi’, ‘ma’, ‘me’, ‘lo’, ‘li’, ‘oe’, ‘ox’, ‘em’, ‘ea’, ‘ea’, ‘es’, ‘wa’, ‘we’, ‘wa’, ‘bo’, ‘bu’, ‘as’, ‘aw’, ‘ae’, ‘st’, ‘se’, ‘sa’, ‘tu’, ‘ut’, ‘fam’, ‘fae’, ‘imi’, ‘eli’, ‘elm’, ‘elb’, ‘ami’, ‘ama’, ‘ame’, ‘aes’, ‘awl’, ‘awa’, ‘awe’, ‘awa’, ‘mix’, ‘mim’, ‘mil’, ‘mam’, ‘max’, ‘mae’, ‘maw’, ‘mew’, ‘mem’, ‘mes’, ‘lob’, ‘lox’, ‘lei’, ‘leo’, ‘lie’, ‘lim’, ‘oil’, ‘olm’, ‘ewe’, ‘eme’, ‘wax’, ‘waf’, ‘wae’, ‘waw’, ‘wem’, ‘wea’, ‘wea’, ‘was’, ‘waw’, ‘wae’, ‘bob’, ‘blo’, ‘bub’, ‘but’, ‘ast’, ‘ase’, ‘asa’, ‘awl’, ‘awa’, ‘awe’, ‘awa’, ‘aes’, ‘swa’, ‘swa’, ‘sew’, ‘sea’, ‘sea’, ‘saw’, ‘tux’, ‘tub’, ‘tut’, ‘twa’, ‘twa’, ‘tst’, ‘utu’, ‘fama’, ‘fame’, ‘ixil’, ‘imam’, ‘amli’, ‘amil’, ‘ambo’, ‘axil’, ‘axle’, ‘mimi’, ‘mima’, ‘mime’, ‘milo’, ‘mile’, ‘mewl’, ‘mese’, ‘mesa’, ‘lolo’, ‘lobo’, ‘lima’, ‘lime’, ‘limb’, ‘lile’, ‘oime’, ‘oleo’, ‘olio’, ‘oboe’, ‘obol’, ‘emim’, ‘emil’, ‘east’, ‘ease’, ‘wame’, ‘wawa’, ‘wawa’, ‘weam’, ‘west’, ‘wese’, ‘wast’, ‘wase’, ‘wawa’, ‘wawa’, ‘boil’, ‘bolo’, ‘bole’, ‘bobo’, ‘blob’, ‘bleo’, ‘bubo’, ‘asem’, ‘stub’, ‘stut’, ‘swam’, ‘semi’, ‘seme’, ‘seam’, ‘seax’, ‘sasa’, ‘sawt’, ‘tutu’, ‘tuts’, ‘twae’, ‘twas’, ‘twae’, ‘ilima’, ‘amble’, ‘axile’, ‘awest’, ‘mamie’, ‘mambo’, ‘maxim’, ‘mease’, ‘mesem’, ‘limax’, ‘limes’, ‘limbo’, ‘limbu’, ‘obole’, ‘emesa’, ‘embox’, ‘awest’, ‘swami’, ‘famble’, ‘mimble’, ‘maxima’, ‘embolo’, ‘embole’, ‘wamble’, ‘semese’, ‘semble’, ‘sawbwa’, ‘sawbwa’]
Notes: This program doesn’t output 1-letter words, or filter by word length at all. That’s easy to add but not really relevant to the problem. It also outputs some words multiple times if they can be spelled in multiple ways. If a given word can be spelled in many different ways (worst case: every letter in the grid is the same (e.g. ‘A’) and a word like ‘aaaaaaaaaa’ is in your dictionary), then the running time will get horribly exponential. Filtering out duplicates and sorting is trivial to do after the algorithm has finished.
See also original question in stackoverflow
#53: Difference between Big-O and Little-O Notation (Score: 367)
Created: 2009-09-01 Last updated: 2017-01-30
Tags: algorithm, time-complexity, big-o, asymptotic-complexity, little-o
What is the difference between Big-O notation O(n) and Little-O notation o(n)?
#53 Best answer 1 of Difference between Big-O and Little-O Notation (Score: 483)
Created: 2009-09-01 Last updated: 2017-12-16
f ∈ O(g) says, essentially
For at least one choice of a constant k > 0, you can find a constant a such that the inequality 0 <= f(x) <= k g(x) holds for all x > a.
Note that O(g) is the set of all functions for which this condition holds.
f ∈ o(g) says, essentially
For every choice of a constant k > 0, you can find a constant a such that the inequality 0 <= f(x) < k g(x) holds for all x > a.
Once again, note that o(g) is a set.
In Big-O, it is only necessary that you find a particular multiplier k for which the inequality holds beyond some minimum x.
In Little-o, it must be that there is a minimum x after which the inequality holds no matter how small you make k, as long as it is not negative or zero.
These both describe upper bounds, although somewhat counterintuitively, Little-o is the stronger statement. There is a much larger gap between the growth rates of f and g if f ∈ o(g) than if f ∈ O(g).
One illustration of the disparity is this: f ∈ O(f) is true, but f ∈ o(f) is false. Therefore, Big-O can be read as “f ∈ O(g) means that f’s asymptotic growth is no faster than g’s”, whereas “f ∈ o(g) means that f’s asymptotic growth is strictly slower than g’s”. It’s like <= versus <.
More specifically, if the value of g(x) is a constant multiple of the value of f(x), then f ∈ O(g) is true. This is why you can drop constants when working with big-O notation.
However, for f ∈ o(g) to be true, then g must include a higher power of x in its formula, and so the relative separation between f(x) and g(x) must actually get larger as x gets larger.
To use purely math examples (rather than referring to algorithms):
The following are true for Big-O, but would not be true if you used little-o:
 x² ∈ O(x²)
 x² ∈ O(x² + x)
 x² ∈ O(200 * x²)
The following are true for little-o:
 x² ∈ o(x³)
 x² ∈ o(x!)
 ln(x) ∈ o(x)
Note that if f ∈ o(g), this implies f ∈ O(g). e.g. x² ∈ o(x³), so it is also true that x² ∈ O(x³). (Again, think of O as <= and o as <.)
#53 Best answer 2 of Difference between Big-O and Little-O Notation (Score: 213)
Created: 2009-09-01 Last updated: 2017-12-16
Big-O is to little-o as ≤ is to <. Big-O is an inclusive upper bound, while little-o is a strict upper bound.
For example, the function f(n) = 3n is:
- in O(n²), o(n²), and O(n)
- not in O(lg n), o(lg n), or o(n)
Analogously, the number 1 is:
- ≤ 2, < 2, and ≤ 1
- not ≤ 0, < 0, or < 1
Here’s a table, showing the general idea:
(Note: the table is a good guide, but its limit definition should be in terms of the superior limit instead of the normal limit. For example, 3 + (n mod 2) oscillates between 3 and 4 forever. It’s in O(1) despite not having a normal limit, because it still has a lim sup: 4.)
I recommend memorizing how the Big-O notation converts to asymptotic comparisons. The comparisons are easier to remember, but less flexible because you can’t say things like n^{O(1)} = P.
See also original question in stackoverflow
#54: List of Big-O for PHP functions (Score: 356)
Created: 2010-03-18 Last updated: 2019-04-16
Tags: php, performance, algorithm, arrays, big-o
After using PHP for a while now, I’ve noticed that not all built-in PHP functions are as fast as expected. Consider these two possible implementations of a function that finds if a number is prime using a cached array of primes.
//very slow for large $prime_array
$prime_array = array( 2, 3, 5, 7, 11, 13, .... 104729, ... );
$result_array = array();
foreach( $prime_array as $number ) {
    $result_array[$number] = in_array( $number, $large_prime_array );
}

//speed is much less dependent on size of $prime_array, and runs much faster.
$prime_array = array( 2 => NULL, 3 => NULL, 5 => NULL, 7 => NULL,
                      11 => NULL, 13 => NULL, .... 104729 => NULL, ... );
foreach( $prime_array as $number => $null ) {
    $result_array[$number] = array_key_exists( $number, $large_prime_array );
}
This is because in_array is implemented with a linear search, O(n), which slows down linearly as $prime_array grows, whereas the array_key_exists function is implemented with a hash lookup, O(1), which will not slow down unless the hash table gets extremely populated (in which case it’s only O(n)).
So far I’ve had to discover the big-Os via trial and error, and occasionally by looking at the source code. Now for the question…
Is there a list of the theoretical (or practical) big O times for all the builtin PHP functions?*
*or at least the interesting ones
For example, I find it very hard to predict the big O of the functions listed because the possible implementation depends on unknown core data structures of PHP: array_merge, array_merge_recursive, array_reverse, array_intersect, array_combine, str_replace (with array inputs), etc.
#54 Best answer 1 of List of Big-O for PHP functions (Score: 674)
Created: 2010-03-20 Last updated: 2018-06-01
Since it doesn’t seem like anyone has done this before, I thought it’d be a good idea to have it for reference somewhere. I’ve gone through and, either via benchmark or code-skimming, characterized the array_* functions. I’ve tried to put the more interesting Big-Os near the top. This list is not complete.
Note: All the Big-Os were calculated assuming a hash lookup is O(1), even though it’s really O(n). The coefficient of the n is so low that the RAM overhead of storing a large enough array would hurt you before the characteristics of the lookup Big-O would start taking effect. For example, the difference between a call to array_key_exists at N=1 and N=1,000,000 is a ~50% time increase.
Interesting Points:
- isset/array_key_exists is much faster than in_array and array_search
- + (union) is a bit faster than array_merge (and looks nicer). But it does work differently, so keep that in mind.
- shuffle is on the same Big-O tier as array_rand
- array_pop/array_push is faster than array_shift/array_unshift due to the re-index penalty
Lookups:
- array_key_exists: O(n), but really close to O(1). This is because of linear polling in collisions, but because the chance of collisions is very small, the coefficient is also very small. I find you can treat hash lookups as O(1) to give a more realistic big-O. For example, the difference between N=1000 and N=100000 is only about a 50% slowdown.
- isset( $array[$index] ): O(n), but really close to O(1). It uses the same lookup as array_key_exists. Since it’s a language construct, it will cache the lookup if the key is hardcoded, resulting in a speedup in cases where the same key is used repeatedly.
- in_array: O(n). This is because it does a linear search through the array until it finds the value.
- array_search: O(n). It uses the same core function as in_array, but returns the key of the matching value.
Queue functions:
- array_push: O(∑ var_i, for all i)
- array_pop: O(1)
- array_shift: O(n), since it has to re-index all the keys
- array_unshift: O(n + ∑ var_i, for all i), since it has to re-index all the keys
Array Intersection, Union, Subtraction:
- array_intersect_key: if intersection is 100%, O(Max(param_i_size) * ∑param_i_count, for all i); if intersection is 0%, O(∑param_i_size, for all i)
- array_intersect: if intersection is 100%, O(n^2 * ∑param_i_count, for all i); if intersection is 0%, O(n^2)
- array_intersect_assoc: if intersection is 100%, O(Max(param_i_size) * ∑param_i_count, for all i); if intersection is 0%, O(∑param_i_size, for all i)
- array_diff: O(∏ param_i_size, for all i), that is, the product of all the param sizes
- array_diff_key: O(∑ param_i_size, for i != 1), since we don’t need to iterate over the first array
- array_merge: O(∑ array_i, i != 1), since it doesn’t need to iterate over the first array
- + (union): O(n), where n is the size of the 2nd array (i.e. array_first + array_second); less overhead than array_merge since it doesn’t have to renumber
- array_replace: O(∑ array_i, for all i)
Random:
- shuffle: O(n)
- array_rand: O(n), since it requires a linear poll
Obvious Big-Os:
- array_fill: O(n)
- array_fill_keys: O(n)
- range: O(n)
- array_splice: O(offset + length)
- array_slice: O(offset + length), or O(n) if length = NULL
- array_keys: O(n)
- array_values: O(n)
- array_reverse: O(n)
- array_pad: O(pad_size)
- array_flip: O(n)
- array_sum: O(n)
- array_product: O(n)
- array_reduce: O(n)
- array_filter: O(n)
- array_map: O(n)
- array_chunk: O(n)
- array_combine: O(n)
I’d like to thank Eureqa for making it easy to find the Big-O of the functions. It’s an amazing free program that can find the best-fitting function for arbitrary data.
EDIT:
For those who doubt that PHP array lookups are O(N), I’ve written a benchmark to test that (they are still effectively O(1) for most realistic values).
$tests = 1000000;
$max = 5000001;

for( $i = 1; $i <= $max; $i += 10000 ) {
    //create lookup array
    $array = array_fill( 0, $i, NULL );

    //build test indexes
    $test_indexes = array();
    for( $j = 0; $j < $tests; $j++ ) {
        $test_indexes[] = rand( 0, $i - 1 );
    }

    //benchmark array lookups
    $start = microtime( TRUE );
    foreach( $test_indexes as $test_index ) {
        $value = $array[ $test_index ];
        unset( $value );
    }
    $stop = microtime( TRUE );
    unset( $array, $test_indexes, $test_index );

    printf( "%d,%1.15f\n", $i, $stop - $start ); //time per 1mil lookups
    unset( $stop, $start );
}
#54 Best answer 2 of List of Big-O for PHP functions (Score: 5)
Created: 2010-03-18
The explanation for the case you specifically describe is that associative arrays are implemented as hash tables, so lookup by key (and correspondingly, array_key_exists) is O(1). However, arrays aren’t indexed by value, so the only way in the general case to discover whether a value exists in the array is a linear search. There’s no surprise there.
I don’t think there’s specific comprehensive documentation of the algorithmic complexity of PHP methods. However, if it’s a big enough concern to warrant the effort, you can always look through the source code.
See also original question in stackoverflow
#55: Determine if two rectangles overlap each other? (Score: 349)
Created: 2008-11-20 Last updated: 2017-05-23
Tags: c++, algorithm, geometry, overlap, rectangles
I am trying to write a C++ program that takes the following inputs from the user to construct rectangles (between 2 and 5): height, width, xpos, ypos. All of these rectangles will exist parallel to the x and the y axis, that is all of their edges will have slopes of 0 or infinity.
I’ve tried to implement what is mentioned in this question but I am not having very much luck.
My current implementation does the following:
// Gets all the vertices for Rectangle 1 and stores them in an array -> arrRect1
// point 1 x: arrRect1[0], point 1 y: arrRect1[1] and so on...
// Gets all the vertices for Rectangle 2 and stores them in an array -> arrRect2

// rotated edge of point a, rect 1
int rot_x, rot_y;
rot_x = -arrRect1[3];
rot_y = arrRect1[2];
// point on rotated edge
int pnt_x, pnt_y;
pnt_x = arrRect1[2];
pnt_y = arrRect1[3];
// test point, a from rect 2
int tst_x, tst_y;
tst_x = arrRect2[0];
tst_y = arrRect2[1];

int value;
value = (rot_x * (tst_x - pnt_x)) + (rot_y * (tst_y - pnt_y));
cout << "Value: " << value;
However, I’m not quite sure (a) whether I’ve implemented the algorithm I linked to correctly, or (b) exactly how to interpret the result.
Any suggestions?
#55 Best answer 1 of Determine if two rectangles overlap each other? (Score: 744)
Created: 2008-11-20 Last updated: 2020-01-22
if (RectA.Left < RectB.Right && RectA.Right > RectB.Left &&
RectA.Top > RectB.Bottom && RectA.Bottom < RectB.Top )
or, using Cartesian coordinates
(With X1 being left coord, X2 being right coord, increasing from left to right and Y1 being Top coord, and Y2 being Bottom coord, increasing from bottom to top – if this is not how your coordinate system [e.g. most computers have the Y direction reversed], swap the comparisons below) …
if (RectA.X1 < RectB.X2 && RectA.X2 > RectB.X1 &&
RectA.Y1 > RectB.Y2 && RectA.Y2 < RectB.Y1)
Say you have Rect A, and Rect B. Proof is by contradiction. Any one of four conditions guarantees that no overlap can exist:
- Cond1. If A’s left edge is to the right of B’s right edge, then A is totally to the right of B
- Cond2. If A’s right edge is to the left of B’s left edge, then A is totally to the left of B
- Cond3. If A’s top edge is below B’s bottom edge, then A is totally below B
- Cond4. If A’s bottom edge is above B’s top edge, then A is totally above B
So the condition for Non-Overlap is
Non-Overlap => Cond1 Or Cond2 Or Cond3 Or Cond4
Therefore, a sufficient condition for Overlap is the opposite.
Overlap => NOT (Cond1 Or Cond2 Or Cond3 Or Cond4)
De Morgan’s law says Not (A or B or C or D) is the same as Not A And Not B And Not C And Not D, so using De Morgan, we have
Not Cond1 And Not Cond2 And Not Cond3 And Not Cond4
This is equivalent to:
- A’s left edge to left of B’s right edge [RectA.Left < RectB.Right], and
- A’s right edge to right of B’s left edge [RectA.Right > RectB.Left], and
- A’s top above B’s bottom [RectA.Top > RectB.Bottom], and
- A’s bottom below B’s top [RectA.Bottom < RectB.Top]
Note 1: It is fairly obvious this same principle can be extended to any number of dimensions.
Note 2: It should also be fairly obvious that to count overlaps of just one pixel, you change the < and/or the > on that boundary to a <= or a >=.
Note 3: This answer, when utilizing Cartesian coordinates (X, Y), is based on standard algebraic Cartesian coordinates (x increases left to right, and y increases bottom to top). Obviously, where a computer system might mechanize screen coordinates differently (e.g., increasing Y from top to bottom, or X from right to left), the syntax will need to be adjusted accordingly.
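The four comparisons above translate directly into code. Here is an illustrative sketch (the function name and tuple layout are my own), using the Cartesian convention where y increases upward:

```python
def overlaps(a, b):
    """Axis-aligned rectangle overlap test.
    Each rect is (left, bottom, right, top), with y increasing upward."""
    a_left, a_bottom, a_right, a_top = a
    b_left, b_bottom, b_right, b_top = b
    # All four "not separated" conditions must hold at once.
    return (a_left < b_right and a_right > b_left and
            a_top > b_bottom and a_bottom < b_top)

print(overlaps((0, 0, 2, 2), (1, 1, 3, 3)))  # True: they share a corner region
print(overlaps((0, 0, 1, 1), (2, 2, 3, 3)))  # False: fully separated
```

With strict < and >, rectangles that merely touch along an edge do not count as overlapping; switch to <= and >= to count shared boundaries, as Note 2 describes.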
#55 Best answer 2 of Determine if two rectangles overlap each other? (Score: 121)
Created: 2008-11-20 Last updated: 2014-04-11
struct rect
{
    int x;
    int y;
    int width;
    int height;
};

bool valueInRange(int value, int min, int max)
{ return (value >= min) && (value <= max); }

bool rectOverlap(rect A, rect B)
{
    bool xOverlap = valueInRange(A.x, B.x, B.x + B.width) ||
                    valueInRange(B.x, A.x, A.x + A.width);

    bool yOverlap = valueInRange(A.y, B.y, B.y + B.height) ||
                    valueInRange(B.y, A.y, A.y + A.height);

    return xOverlap && yOverlap;
}
See also original question in stackoverflow
#56: What is stability in sorting algorithms and why is it important? (Score: 348)
Created: 2009-10-05 Last updated: 2017-12-27
Tags: algorithm, sorting, language-agnostic, stability
I’m very curious: why is stability important (or not) in sorting algorithms?
#56 Best answer 1 of What is stability in sorting algorithms and why is it important? (Score: 443)
Created: 2009-10-05 Last updated: 2017-04-04
A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input array to be sorted. Some sorting algorithms are stable by nature like Insertion sort, Merge Sort, Bubble Sort, etc. And some sorting algorithms are not, like Heap Sort, Quick Sort, etc.
Background: a “stable” sorting algorithm keeps the items with the same sorting key in order. Suppose we have a list of 5letter words:
peach
straw
apple
spork
If we sort the list by just the first letter of each word then a stablesort would produce:
apple
peach
straw
spork
In an unstable sort algorithm, straw or spork may be interchanged, but in a stable one, they stay in the same relative positions (that is, since straw appears before spork in the input, it also appears before spork in the output).
We could sort the list of words using this algorithm: stable sorting by column 5, then 4, then 3, then 2, then 1. In the end, it will be correctly sorted. Convince yourself of that. (By the way, that algorithm is called radix sort.)
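The column-by-column procedure leans on stability at every pass; here is an illustrative sketch (function name is my own) using Python's sorted, which is guaranteed stable:

```python
def radix_sort_words(words):
    """Sort fixed-length words by stable-sorting on the last column,
    then the next-to-last, ..., then the first. Stability is what lets
    each earlier pass survive the later ones."""
    width = len(words[0])
    for col in reversed(range(width)):  # columns 5, 4, 3, 2, 1 for 5-letter words
        words = sorted(words, key=lambda w: w[col])
    return words

print(radix_sort_words(['peach', 'straw', 'apple', 'spork']))
# ['apple', 'peach', 'spork', 'straw']
```

If the per-column sort were unstable, a later pass on column 1 could scramble the column-2-through-5 order established by the earlier passes, and the final list would not be fully sorted.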
Now to answer your question, suppose we have a list of first and last names. We are asked to sort “by last name, then by first”. We could first sort (stable or unstable) by the first name, then stable sort by the last name. After these sorts, the list is primarily sorted by the last name. However, where last names are the same, the first names are sorted.
You can’t stack unstable sorts in the same fashion.
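The two-pass name sort described above can be sketched with Python's stable sorted (the sample names are my own):

```python
people = [
    ("Grace", "Smith"),
    ("Alan", "Turing"),
    ("Ada", "Smith"),
]

# Pass 1: sort by first name (this pass need not be stable).
by_first = sorted(people, key=lambda p: p[0])

# Pass 2: stable sort by last name; among equal last names,
# the first-name order from pass 1 is preserved.
by_last_then_first = sorted(by_first, key=lambda p: p[1])

print(by_last_then_first)
# [('Ada', 'Smith'), ('Grace', 'Smith'), ('Alan', 'Turing')]
```

("Ada", "Smith") ends up before ("Grace", "Smith") only because the second pass is stable; an unstable second pass could emit the two Smiths in either order.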
#56 Best answer 2 of What is stability in sorting algorithms and why is it important? (Score: 72)
Created: 2017-05-06 Last updated: 2020-05-19
A stable sorting algorithm is one that sorts identical elements in the same order as they appear in the input, whilst an unstable sort may not. (I thank my algorithms lecturer Didem Gozupek for providing insight into algorithms.)
Stable Sorting Algorithms:
- Insertion Sort
- Merge Sort
- Bubble Sort
- Tim Sort
- Counting Sort
- Block Sort
- Quadsort
- Library Sort
- Cocktail shaker Sort
- Gnome Sort
- Odd–even Sort
Unstable Sorting Algorithms:
- Heap sort
- Selection sort
- Shell sort
- Quick sort
- Introsort (subject to Quicksort)
- Tree sort
- Cycle sort
- Smoothsort
- Tournament sort (subject to Heapsort)
See also original question in stackoverflow
#57: Equation for testing if a point is inside a circle (Score: 339)
Created: 2009-01-26 Last updated: 2013-06-23
Tags: algorithm, language-agnostic, geometry
If you have a circle with center (center_x, center_y) and radius radius, how do you test if a given point with coordinates (x, y) is inside the circle?
#57 Best answer 1 of Equation for testing if a point is inside a circle (Score: 522)
Created: 2009-01-26 Last updated: 2013-06-23
In general, x and y must satisfy (x - center_x)^2 + (y - center_y)^2 < radius^2.
Please note that points that satisfy the above equation with < replaced by == are considered the points on the circle, and the points that satisfy the above equation with < replaced by > are considered to be outside the circle.
#57 Best answer 2 of Equation for testing if a point is inside a circle (Score: 141)
Created: 2011-08-29 Last updated: 2015-02-07
Mathematically, Pythagoras is probably a simple method as many have already mentioned.
(x - center_x)^2 + (y - center_y)^2 < radius^2
Computationally, there are quicker ways. Define:
dx = abs(x - center_x)
dy = abs(y - center_y)
R = radius
If a point is more likely to be outside this circle, then imagine a square drawn around it such that its sides are tangent to this circle:
if dx > R then return false.
if dy > R then return false.
Now imagine a square diamond drawn inside this circle such that its vertices touch this circle:
if dx + dy <= R then return true.
Now we have covered most of our space and only a small area of this circle remains in between our square and diamond to be tested. Here we revert to Pythagoras as above.
if dx^2 + dy^2 <= R^2 then return true else return false.
If a point is more likely to be inside this circle, then reverse the order of the first 3 steps:
if dx + dy <= R then return true.
if dx > R then return false.
if dy > R then return false.
if dx^2 + dy^2 <= R^2 then return true else return false.
Alternate methods imagine a square inside this circle instead of a diamond, but this requires slightly more tests and calculations with no computational advantage (the inner square and diamond have identical areas):
k = R/sqrt(2)
if dx <= k and dy <= k then return true.
Update:
For those interested in performance, I implemented this method in C and compiled with -O3. I obtained execution times with time ./a.out.
I implemented this method, a normal method and a dummy method to determine timing overhead.
Normal: 21.3s
This: 19.1s
Overhead: 16.5s
So, it seems this method is more efficient in this implementation.
// compile: gcc -O3 <filename>.c
// run: time ./a.out
#include <stdio.h>
#include <stdlib.h>
#define TRUE  (0==0)
#define FALSE (0==1)
#define ABS(x) (((x)<0)?(0-(x)):(x))

int xo, yo, R;

int inline inCircle( int x, int y ){  // 19.1, 19.1, 19.1
    int dx = ABS(x-xo);
    if ( dx > R ) return FALSE;
    int dy = ABS(y-yo);
    if ( dy > R ) return FALSE;
    if ( dx+dy <= R ) return TRUE;
    return ( dx*dx + dy*dy <= R*R );
}

int inline inCircleN( int x, int y ){  // 21.3, 21.1, 21.5
    int dx = ABS(x-xo);
    int dy = ABS(y-yo);
    return ( dx*dx + dy*dy <= R*R );
}

int inline dummy( int x, int y ){  // 16.6, 16.5, 16.4
    int dx = ABS(x-xo);
    int dy = ABS(y-yo);
    return FALSE;
}

#define N 1000000000
int main(){
    int x, y;
    xo = rand()%1000; yo = rand()%1000; R = 1;
    int n = 0;
    int c;
    for (c=0; c<N; c++){
        x = rand()%1000; y = rand()%1000;
        // if ( inCircle(x,y) ){
        if ( inCircleN(x,y) ){
        // if ( dummy(x,y) ){
            n++;
        }
    }
    printf( "%d of %d inside circle\n", n, N);
}
See also original question in stackoverflow
#58: How to implement classic sorting algorithms in modern C++? (Score: 335)
Created: 20140709 Last updated: 20170523
Tags: c++, algorithm, sorting, c++14, c++faq
The std::sort
algorithm (and its cousins std::partial_sort
and std::nth_element
) from the C++ Standard Library is in most implementations a complicated and hybrid amalgamation of more elementary sorting algorithms, such as selection sort, insertion sort, quick sort, merge sort, or heap sort.
There are many questions here and on sister sites such as https://codereview.stackexchange.com/ related to bugs, complexity and other aspects of implementations of these classic sorting algorithms. Most of the offered implementations consist of raw loops, use index manipulation and concrete types, and are generally nontrivial to analyse in terms of correctness and efficiency.
Question: how can the above mentioned classic sorting algorithms be implemented using modern C++?
- no raw loops, but combining the Standard Library's algorithmic building blocks from <algorithm>
- iterator interface and use of templates instead of index manipulation and concrete types
- C++14 style, including the full Standard Library, as well as syntactic noise reducers such as auto, template aliases, transparent comparators and polymorphic lambdas.
Notes:
- for further references on implementations of sorting algorithms see Wikipedia, Rosetta Code or http://www.sorting-algorithms.com/
- according to Sean Parent's conventions (slide 39), a raw loop is a for loop longer than a composition of two functions with an operator. So f(g(x)); or f(x); g(x); or f(x) + g(x); are not raw loops, and neither are the loops in selection_sort and insertion_sort below.
- I follow Scott Meyers's terminology to denote the current C++1y already as C++14, and to denote C++98 and C++03 both as C++98, so don't flame me for that.
- as suggested in the comments by @Mehrdad, I provide four implementations as a Live Example at the end of the answer: C++14, C++11, C++98 and Boost, and C++98.
- the answer itself is presented in terms of C++14 only. Where relevant, I denote the syntactic and library differences where the various language versions differ.
#58 Best answer 1 of How to implement classic sorting algorithms in modern C++? (Score: 399)
Created: 20140709 Last updated: 20180717
Algorithmic building blocks
We begin by assembling the algorithmic building blocks from the Standard Library:
#include <algorithm> // min_element, iter_swap,
// upper_bound, rotate,
// partition,
// inplace_merge,
// make_heap, sort_heap, push_heap, pop_heap,
// is_heap, is_sorted
#include <cassert> // assert
#include <functional> // less
#include <iterator> // distance, begin, end, next
- the iterator tools such as non-member std::begin() / std::end() as well as std::next() are only available as of C++11 and beyond. For C++98, one needs to write these oneself. There are substitutes from Boost.Range in boost::begin() / boost::end(), and from Boost.Utility in boost::next().
- the std::is_sorted algorithm is only available for C++11 and beyond. For C++98, this can be implemented in terms of std::adjacent_find and a hand-written function object. Boost.Algorithm also provides a boost::algorithm::is_sorted as a substitute.
- the std::is_heap algorithm is only available for C++11 and beyond.
Syntactical goodies
C++14 provides transparent comparators of the form std::less<> that act polymorphically on their arguments. This avoids having to provide an iterator's type. It can be used in combination with C++11's default function template arguments to create a single overload for sorting algorithms that take < as comparison and those that have a user-defined comparison function object.
template<class It, class Compare = std::less<>>
void xxx_sort(It first, It last, Compare cmp = Compare{});
In C++11, one can define a reusable template alias to extract an iterator’s value type which adds minor clutter to the sort algorithms' signatures:
template<class It>
using value_type_t = typename std::iterator_traits<It>::value_type;
template<class It, class Compare = std::less<value_type_t<It>>>
void xxx_sort(It first, It last, Compare cmp = Compare{});
In C++98, one needs to write two overloads and use the verbose typename xxx<yyy>::type
syntax
template<class It, class Compare>
void xxx_sort(It first, It last, Compare cmp); // general implementation
template<class It>
void xxx_sort(It first, It last)
{
xxx_sort(first, last, std::less<typename std::iterator_traits<It>::value_type>());
}
- Another syntactical nicety is that C++14 facilitates wrapping user-defined comparators through polymorphic lambdas (with auto parameters that are deduced like function template arguments).
- C++11 only has monomorphic lambdas, which require the use of the above template alias value_type_t.
- In C++98, one either needs to write a standalone function object or resort to the verbose std::bind1st / std::bind2nd / std::not1 type of syntax.
- Boost.Bind improves this with boost::bind and _1 / _2 placeholder syntax.
- C++11 and beyond also have std::find_if_not, whereas C++98 needs std::find_if with a std::not1 around a function object.
C++ Style
There is no generally acceptable C++14 style yet. For better or for worse, I closely follow Scott Meyers’s draft Effective Modern C++ and Herb Sutter’s revamped GotW. I use the following style recommendations:
- Herb Sutter's "Almost Always Auto" and Scott Meyers's "Prefer auto to specific type declarations" recommendation, for which the brevity is unsurpassed, although its clarity is sometimes disputed.
- Scott Meyers's "Distinguish () and {} when creating objects" and consistently choose braced-initialization {} instead of the good old parenthesized initialization () (in order to side-step all most-vexing-parse issues in generic code).
- Scott Meyers's "Prefer alias declarations to typedefs". For templates this is a must anyway, and using it everywhere instead of typedef saves time and adds consistency.
- I use a for (auto it = first; it != last; ++it) pattern in some places, in order to allow for loop invariant checking for already sorted sub-ranges. In production code, the use of while (first != last) and a ++first somewhere inside the loop might be slightly better.
Selection sort
Selection sort does not adapt to the data in any way, so its runtime is always O(N²)
. However, selection sort has the property of minimizing the number of swaps. In applications where the cost of swapping items is high, selection sort very well may be the algorithm of choice.
To implement it using the Standard Library, repeatedly use std::min_element
to find the remaining minimum element, and iter_swap
to swap it into place:
template<class FwdIt, class Compare = std::less<>>
void selection_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last; ++it) {
        auto const selection = std::min_element(it, last, cmp);
        std::iter_swap(selection, it);
        assert(std::is_sorted(first, std::next(it), cmp));
    }
}
Note that selection_sort
has the already processed range [first, it)
sorted as its loop invariant. The minimal requirements are forward iterators, compared to std::sort
’s random access iterators.
Details omitted:
- selection sort can be optimized with an early test if (std::distance(first, last) <= 1) return; (or for forward / bidirectional iterators: if (first == last || std::next(first) == last) return;).
- for bidirectional iterators, the above test can be combined with a loop over the interval [first, std::prev(last)), because the last element is guaranteed to be the minimal remaining element and doesn't require a swap.
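Combining the two details above, a sketch of the optimized variant for bidirectional iterators might look like this (the code and the name selection_sort_opt are mine, not from the answer):

```cpp
#include <algorithm>   // min_element, iter_swap, is_sorted
#include <cassert>
#include <functional>  // less
#include <iterator>    // next, prev
#include <vector>

// Sketch: selection sort with the early test and a loop over
// [first, prev(last)); the final element is the maximum remaining
// one and needs no swap.
template<class BiDirIt, class Compare = std::less<>>
void selection_sort_opt(BiDirIt first, BiDirIt last, Compare cmp = Compare{})
{
    if (first == last || std::next(first) == last) return;  // 0 or 1 element
    for (auto it = first; it != std::prev(last); ++it) {
        auto const selection = std::min_element(it, last, cmp);
        std::iter_swap(selection, it);
    }
}
```

As before, the minimal iterator requirements stay well below std::sort's random access iterators.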
Insertion sort
Although it is one of the elementary sorting algorithms with O(N²)
worstcase time, insertion sort is the algorithm of choice either when the data is nearly sorted (because it is adaptive) or when the problem size is small (because it has low overhead). For these reasons, and because it is also stable, insertion sort is often used as the recursive base case (when the problem size is small) for higher overhead divideandconquer sorting algorithms, such as merge sort or quick sort.
To implement insertion_sort
with the Standard Library, repeatedly use std::upper_bound
to find the location where the current element needs to go, and use std::rotate
to shift the remaining elements upward in the input range:
template<class FwdIt, class Compare = std::less<>>
void insertion_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last; ++it) {
        auto const insertion = std::upper_bound(first, it, *it, cmp);
        std::rotate(insertion, it, std::next(it));
        assert(std::is_sorted(first, std::next(it), cmp));
    }
}
Note that insertion_sort
has the already processed range [first, it)
sorted as its loop invariant. Insertion sort also works with forward iterators.
Details omitted:
- insertion sort can be optimized with an early test if (std::distance(first, last) <= 1) return; (or for forward / bidirectional iterators: if (first == last || std::next(first) == last) return;) and a loop over the interval [std::next(first), last), because the first element is guaranteed to be in place and doesn't require a rotate.
- for bidirectional iterators, the binary search to find the insertion point can be replaced with a reverse linear search using the Standard Library's std::find_if_not algorithm.
Four Live Examples (C++14, C++11, C++98 and Boost, C++98) for the fragment below:
using RevIt = std::reverse_iterator<BiDirIt>;
auto const insertion = std::find_if_not(RevIt(it), RevIt(first),
[=](auto const& elem){ return cmp(*it, elem); }
).base();
- For random inputs this gives O(N²) comparisons, but this improves to O(N) comparisons for almost sorted inputs. The binary search always uses O(N log N) comparisons.
- For small input ranges, the better memory locality (cache, prefetching) of a linear search might also dominate a binary search (one should test this, of course).
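Putting the fragment above into a complete function, a sketch (my code, not the answer's) of insertion sort with the reverse linear search might read:

```cpp
#include <algorithm>   // find_if_not, rotate, is_sorted
#include <cassert>
#include <functional>  // less
#include <iterator>    // next, reverse_iterator
#include <vector>

// Sketch: insertion sort for bidirectional iterators, replacing
// std::upper_bound with the reverse linear search shown above.
template<class BiDirIt, class Compare = std::less<>>
void insertion_sort_linear(BiDirIt first, BiDirIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last; ++it) {
        using RevIt = std::reverse_iterator<BiDirIt>;
        // scan right-to-left for the first element not greater than *it;
        // .base() converts back to the forward insertion position
        auto const insertion = std::find_if_not(RevIt(it), RevIt(first),
            [=](auto const& elem) { return cmp(*it, elem); }
        ).base();
        std::rotate(insertion, it, std::next(it));
    }
}
```

Because the scan stops at the first element not greater than *it, equal elements keep their relative order, so stability is preserved.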
Quick sort
When carefully implemented, quick sort is robust and has O(N log N)
expected complexity, but with O(N²)
worstcase complexity that can be triggered with adversarially chosen input data. When a stable sort is not needed, quick sort is an excellent generalpurpose sort.
Even for the simplest versions, quick sort is quite a bit more complicated to implement using the Standard Library than the other classic sorting algorithms. The approach below uses a few iterator utilities to locate the middle element of the input range [first, last)
as the pivot, then use two calls to std::partition
(which are O(N)
) to threeway partition the input range into segments of elements that are smaller than, equal to, and larger than the selected pivot, respectively. Finally the two outer segments with elements smaller than and larger than the pivot are recursively sorted:
template<class FwdIt, class Compare = std::less<>>
void quick_sort(FwdIt first, FwdIt last, Compare cmp = Compare{})
{
    auto const N = std::distance(first, last);
    if (N <= 1) return;
    auto const pivot = *std::next(first, N / 2);
    auto const middle1 = std::partition(first, last, [=](auto const& elem){
        return cmp(elem, pivot);
    });
    auto const middle2 = std::partition(middle1, last, [=](auto const& elem){
        return !cmp(pivot, elem);
    });
    quick_sort(first, middle1, cmp);  // assert(std::is_sorted(first, middle1, cmp));
    quick_sort(middle2, last, cmp);   // assert(std::is_sorted(middle2, last, cmp));
}
However, quick sort is rather tricky to get correct and efficient, as each of the above steps has to be carefully checked and optimized for production-level code. In particular, for O(N log N) complexity, the pivot has to result in a balanced partition of the input data, which cannot be guaranteed in general for an O(1) pivot, but which can be guaranteed if one sets the pivot as the O(N) median of the input range.
Details omitted:
- the above implementation is particularly vulnerable to special inputs, e.g. it has O(N²) complexity for the "organ pipe" input 1, 2, 3, ..., N/2, ..., 3, 2, 1 (because the middle is always larger than all other elements).
- median-of-3 pivot selection from randomly chosen elements from the input range guards against almost sorted inputs for which the complexity would otherwise deteriorate to O(N²).
- 3-way partitioning (separating elements smaller than, equal to and larger than the pivot) as shown by the two calls to std::partition is not the most efficient O(N) algorithm to achieve this result.
- for random access iterators, a guaranteed O(N log N) complexity can be achieved through median pivot selection using std::nth_element(first, middle, last), followed by recursive calls to quick_sort(first, middle, cmp) and quick_sort(middle, last, cmp).
- this guarantee comes at a cost, however, because the constant factor of the O(N) complexity of std::nth_element can be more expensive than that of the O(1) complexity of a median-of-3 pivot followed by an O(N) call to std::partition (which is a cache-friendly single forward pass over the data).
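The guaranteed O(N log N) variant described in the last two bullets can be sketched as follows (my code; the answer only describes it in prose, and the name quick_sort_nth is mine):

```cpp
#include <algorithm>   // nth_element, is_sorted
#include <cassert>
#include <functional>  // less
#include <iterator>    // distance, next
#include <vector>

// Sketch: quick sort with median pivot selection via std::nth_element.
// Each recursion level does O(N) work over O(log N) levels, because the
// median pivot always splits the range in half.
template<class RandomIt, class Compare = std::less<>>
void quick_sort_nth(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    auto const N = std::distance(first, last);
    if (N <= 1) return;
    auto const middle = std::next(first, N / 2);
    std::nth_element(first, middle, last, cmp);  // median lands at middle
    quick_sort_nth(first, middle, cmp);          // elements <= median
    quick_sort_nth(middle, last, cmp);           // median and elements >= it
}
```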
Merge sort
If using O(N)
extra space is of no concern, then merge sort is an excellent choice: it is the only stable O(N log N)
sorting algorithm.
It is simple to implement using Standard algorithms: use a few iterator utilities to locate the middle of the input range [first, last)
and combine two recursively sorted segments with a std::inplace_merge
:
template<class BiDirIt, class Compare = std::less<>>
void merge_sort(BiDirIt first, BiDirIt last, Compare cmp = Compare{})
{
    auto const N = std::distance(first, last);
    if (N <= 1) return;
    auto const middle = std::next(first, N / 2);
    merge_sort(first, middle, cmp);  // assert(std::is_sorted(first, middle, cmp));
    merge_sort(middle, last, cmp);   // assert(std::is_sorted(middle, last, cmp));
    std::inplace_merge(first, middle, last, cmp);  // assert(std::is_sorted(first, last, cmp));
}
Merge sort requires bidirectional iterators, the bottleneck being the std::inplace_merge
. Note that when sorting linked lists, merge sort requires only O(log N)
extra space (for recursion). The latter algorithm is implemented by std::list<T>::sort
in the Standard Library.
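A small usage sketch (mine) of that linked-list merge sort:

```cpp
#include <algorithm>  // is_sorted
#include <cassert>
#include <list>

// std::list<T>::sort is the Standard Library's merge sort for linked
// lists: stable, O(N log N) comparisons, and it needs no random access
// iterators (std::sort would not even compile on std::list iterators).
inline std::list<int> sorted_list_demo()
{
    std::list<int> l{3, 1, 4, 1, 5, 9, 2, 6};
    l.sort();  // member sort, not std::sort
    return l;
}
```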
Heap sort
Heap sort is simple to implement, performs an O(N log N)
inplace sort, but is not stable.
The first loop, the O(N) "heapify" phase, puts the array into heap order. The second loop, the O(N log N) "sort-down" phase, repeatedly extracts the maximum and restores heap order. The Standard Library makes this extremely straightforward:
template<class RandomIt, class Compare = std::less<>>
void heap_sort(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    lib::make_heap(first, last, cmp);  // assert(std::is_heap(first, last, cmp));
    lib::sort_heap(first, last, cmp);  // assert(std::is_sorted(first, last, cmp));
}
In case you consider it “cheating” to use std::make_heap
and std::sort_heap
, you can go one level deeper and write those functions yourself in terms of std::push_heap
and std::pop_heap
, respectively:
namespace lib {

// NOTE: is O(N log N), not O(N) as std::make_heap
template<class RandomIt, class Compare = std::less<>>
void make_heap(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    for (auto it = first; it != last;) {
        std::push_heap(first, ++it, cmp);
        assert(std::is_heap(first, it, cmp));
    }
}

template<class RandomIt, class Compare = std::less<>>
void sort_heap(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    for (auto it = last; it != first;) {
        std::pop_heap(first, it--, cmp);
        assert(std::is_heap(first, it, cmp));
    }
}

} // namespace lib
The Standard Library specifies both push_heap
and pop_heap
as complexity O(log N)
. Note however that the outer loop over the range [first, last)
results in O(N log N)
complexity for make_heap
, whereas std::make_heap
has only O(N)
complexity. For the overall O(N log N)
complexity of heap_sort
it doesn’t matter.
Details omitted: the O(N) implementation of make_heap.
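That omitted O(N) construction is classically done bottom-up (Floyd's method); here is a sketch for random access iterators (my code and naming, not the answer's):

```cpp
#include <algorithm>   // iter_swap, is_heap
#include <cassert>
#include <functional>  // less
#include <vector>

// Sketch: Floyd's bottom-up heap construction. Sifting down each internal
// node costs O(height of that node); summed over the whole tree this
// telescopes to O(N), unlike the O(N log N) push_heap loop above.
template<class RandomIt, class Compare = std::less<>>
void make_heap_floyd(RandomIt first, RandomIt last, Compare cmp = Compare{})
{
    auto const n = last - first;
    for (auto start = n / 2; start-- > 0; ) {
        auto root = start;                        // sift down first[root]
        for (;;) {
            auto child = 2 * root + 1;
            if (child >= n) break;                // no children left
            if (child + 1 < n && cmp(first[child], first[child + 1]))
                ++child;                          // pick the larger child
            if (!cmp(first[root], first[child])) break;
            std::iter_swap(first + root, first + child);
            root = child;
        }
    }
}
```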
Testing
Here are four Live Examples (C++14, C++11, C++98 and Boost, C++98) testing all five algorithms on a variety of inputs (not meant to be exhaustive or rigorous). Just note the huge differences in the LOC: C++11/C++14 need around 130 LOC, C++98 and Boost 190 (+50%) and C++98 more than 270 (+100%).
#58 Best answer 2 of How to implement classic sorting algorithms in modern C++?(Score: 15)
Created: 20160508 Last updated: 20171005
Another small and rather elegant one originally found on code review. I thought it was worth sharing.
Counting sort
While it is rather specialized, counting sort is a simple integer sorting algorithm and can often be really fast provided the values of the integers to sort are not too far apart. It’s probably ideal if one ever needs to sort a collection of one million integers known to be between 0 and 100 for example.
To implement a very simple counting sort that works with both signed and unsigned integers, one needs to find the smallest and greatest elements in the collection to sort; their difference will tell the size of the array of counts to allocate. Then, a second pass through the collection is done to count the number of occurrences of every element. Finally, we write back the required number of every integer back to the original collection.
template<typename ForwardIterator>
void counting_sort(ForwardIterator first, ForwardIterator last)
{
    if (first == last || std::next(first) == last) return;

    auto minmax = std::minmax_element(first, last);  // avoid if possible.
    auto min = *minmax.first;
    auto max = *minmax.second;
    if (min == max) return;

    using difference_type = typename std::iterator_traits<ForwardIterator>::difference_type;
    std::vector<difference_type> counts(max - min + 1, 0);

    for (auto it = first ; it != last ; ++it) {
        ++counts[*it - min];
    }

    for (auto count: counts) {
        first = std::fill_n(first, count, min++);
    }
}
While it is only useful when the range of the integers to sort is known to be small (generally not larger than the size of the collection to sort), making counting sort more generic would make it slower for its best cases. If the range is not known to be small, another algorithm such as radix sort, ska_sort or spreadsort can be used instead.
Details omitted:
- We could have passed the bounds of the range of values accepted by the algorithm as parameters to totally get rid of the first std::minmax_element pass through the collection. This will make the algorithm even faster when a usefully-small range limit is known by other means. (It doesn't have to be exact; passing a constant 0 to 100 is still much better than an extra pass over a million elements to find out that the true bounds are 1 to 95. Even 0 to 1000 would be worth it; the extra elements are written once with zero and read once).
- Growing counts on the fly is another way to avoid a separate first pass. Doubling the counts size each time it has to grow gives amortized O(1) time per sorted element (see hash table insertion cost analysis for the proof that exponential growth is the key). Growing at the end for a new max is easy with std::vector::resize to add new zeroed elements. Changing min on the fly and inserting new zeroed elements at the front can be done with std::copy_backward after growing the vector. Then std::fill to zero the new elements.
- The counts increment loop is a histogram. If the data is likely to be highly repetitive, and the number of bins is small, it can be worth unrolling over multiple arrays to reduce the serializing data dependency bottleneck of store/reload to the same bin. This means more counts to zero at the start, and more to loop over at the end, but should be worth it on most CPUs for our example of millions of 0 to 100 numbers, especially if the input might already be (partially) sorted and have long runs of the same number.
- In the algorithm above, we use a min == max check to return early when every element has the same value (in which case the collection is sorted). It is actually possible to instead fully check whether the collection is already sorted while finding the extreme values of a collection with no additional time wasted (if the first pass is still memory bottlenecked with the extra work of updating min and max). However such an algorithm does not exist in the standard library and writing one would be more tedious than writing the rest of counting sort itself. It is left as an exercise for the reader.
- Since the algorithm only works with integer values, static assertions could be used to prevent users from making obvious type mistakes. In some contexts, a substitution failure with std::enable_if_t might be preferred.
- While modern C++ is cool, future C++ could be even cooler: structured bindings and some parts of the Ranges TS would make the algorithm even cleaner.
See also original question in stackoverflow
#59: What's the best way to build a string of delimited items in Java? (Score: 332)
Created: 20080915 Last updated: 20200421
Tags: java, string, algorithm
While working in a Java app, I recently needed to assemble a commadelimited list of values to pass to another web service without knowing how many elements there would be in advance. The best I could come up with off the top of my head was something like this:
public String appendWithDelimiter( String original, String addition, String delimiter ) {
    if ( original.equals( "" ) ) {
        return addition;
    } else {
        return original + delimiter + addition;
    }
}
String parameterString = "";
if ( condition ) parameterString = appendWithDelimiter( parameterString, "elementName", "," );
if ( anotherCondition ) parameterString = appendWithDelimiter( parameterString, "anotherElementName", "," );
I realize this isn’t particularly efficient, since there are strings being created all over the place, but I was going for clarity more than optimization.
In Ruby, I can do something like this instead, which feels much more elegant:
parameterArray = [];
parameterArray << "elementName" if condition;
parameterArray << "anotherElementName" if anotherCondition;
parameterString = parameterArray.join(",");
But since Java lacks a join command, I couldn’t figure out anything equivalent.
So, what’s the best way to do this in Java?
#59 Best answer 1 of What's the best way to build a string of delimited items in Java? (Score: 578)
Created: 20080915 Last updated: 20150821
Pre Java 8:
Apache’s commons lang is your friend here  it provides a join method very similar to the one you refer to in Ruby:
StringUtils.join(java.lang.Iterable,char)
Java 8:
Java 8 provides joining out of the box via StringJoiner
and String.join()
. The snippets below show how you can use them:
StringJoiner joiner = new StringJoiner(",");
joiner.add("01").add("02").add("03");
String joinedString = joiner.toString(); // "01,02,03"
String.join(CharSequence delimiter, CharSequence... elements)
String joinedString = String.join("  ", "04", "05", "06"); // "04  05  06"
String.join(CharSequence delimiter, Iterable<? extends CharSequence> elements)
List<String> strings = new LinkedList<>();
strings.add("Java");
strings.add("is");
strings.add("cool");
String message = String.join(" ", strings);
//message returned is: "Java is cool"
#59 Best answer 2 of What's the best way to build a string of delimited items in Java?(Score: 52)
Created: 20080915
You could write a little joinstyle utility method that works on java.util.Lists
public static String join(List<String> list, String delim) {
    StringBuilder sb = new StringBuilder();
    String loopDelim = "";
    for(String s : list) {
        sb.append(loopDelim);
        sb.append(s);
        loopDelim = delim;
    }
    return sb.toString();
}
Then use it like so:
List<String> list = new ArrayList<String>();
if( condition ) list.add("elementName");
if( anotherCondition ) list.add("anotherElementName");
join(list, ",");
See also original question in stackoverflow
#60: What is the fastest way to get the value of π? (Score: 330)
Created: 20080801 Last updated: 20190515
Tags: performance, algorithm, languageagnostic, unix, pi
I’m looking for the fastest way to obtain the value of π, as a personal challenge. More specifically, I’m using ways that don’t involve using #define
constants like M_PI
, or hardcoding the number in.
The program below tests the various ways I know of. The inline assembly version is, in theory, the fastest option, though clearly not portable. I’ve included it as a baseline to compare against the other versions. In my tests, with builtins, the 4 * atan(1)
version is fastest on GCC 4.2, because it auto-folds the atan(1) into a constant. With -fno-builtin specified, the atan2(0, -1) version is fastest.
Here’s the main testing program (pitimes.c
):
#include <math.h>
#include <stdio.h>
#include <time.h>

#define ITERS 10000000
#define TESTWITH(x) { \
    diff = 0.0; \
    time1 = clock(); \
    for (i = 0; i < ITERS; ++i) \
        diff += (x) - M_PI; \
    time2 = clock(); \
    printf("%s\t=> %e, time => %f\n", #x, diff, diffclock(time2, time1)); \
}

static inline double
diffclock(clock_t time1, clock_t time0)
{
    return (double) (time1 - time0) / CLOCKS_PER_SEC;
}

int
main()
{
    int i;
    clock_t time1, time2;
    double diff;

    /* Warmup. The atan2 case catches GCC's atan folding (which would
     * optimise the ``4 * atan(1) - M_PI'' to a no-op), if -fno-builtin
     * is not used. */
    TESTWITH(4 * atan(1))
    TESTWITH(4 * atan2(1, 1))

#if defined(__GNUC__) && (defined(__i386__) || defined(__amd64__))
    extern double fldpi();
    TESTWITH(fldpi())
#endif

    /* Actual tests start here. */
    TESTWITH(atan2(0, -1))
    TESTWITH(acos(-1))
    TESTWITH(2 * asin(1))
    TESTWITH(4 * atan2(1, 1))
    TESTWITH(4 * atan(1))

    return 0;
}
And the inline assembly stuff (fldpi.c
) that will only work for x86 and x64 systems:
double
fldpi()
{
    double pi;
    asm("fldpi" : "=t" (pi));
    return pi;
}
And a build script that builds all the configurations I’m testing (build.sh
):
#!/bin/sh
gcc -O3 -Wall -c -m32 -o fldpi-32.o fldpi.c
gcc -O3 -Wall -c -m64 -o fldpi-64.o fldpi.c

gcc -O3 -Wall -ffast-math -m32 -o pitimes1-32 pitimes.c fldpi-32.o
gcc -O3 -Wall -m32 -o pitimes2-32 pitimes.c fldpi-32.o -lm
gcc -O3 -Wall -fno-builtin -m32 -o pitimes3-32 pitimes.c fldpi-32.o -lm
gcc -O3 -Wall -ffast-math -m64 -o pitimes1-64 pitimes.c fldpi-64.o -lm
gcc -O3 -Wall -m64 -o pitimes2-64 pitimes.c fldpi-64.o -lm
gcc -O3 -Wall -fno-builtin -m64 -o pitimes3-64 pitimes.c fldpi-64.o -lm
Apart from testing between various compiler flags (I've compared 32-bit against 64-bit too, because the optimizations are different), I've also tried switching the order of the tests around. But still, the atan2(0, -1) version comes out on top every time.
#60 Best answer 1 of What is the fastest way to get the value of π? (Score: 208)
Created: 20080802 Last updated: 20180905
The Monte Carlo method, as mentioned, applies some great concepts but it is, clearly, not the fastest, not by a long shot, not by any reasonable measure. Also, it all depends on what kind of accuracy you are looking for. The fastest π I know of is the one with the digits hard coded. Looking at Pi and Pi[PDF], there are a lot of formulae.
Here is a method that converges quickly — about 14 digits per iteration. PiFast, the current fastest application, uses this formula with the FFT. I’ll just write the formula, since the code is straightforward. This formula was almost found by Ramanujan and discovered by Chudnovsky. It is actually how he calculated several billion digits of the number — so it isn’t a method to disregard. The formula will overflow quickly and, since we are dividing factorials, it would be advantageous then to delay such calculations to remove terms.
$$\frac{1}{\pi} = 12 \sum_{k=0}^{\infty} \frac{(-1)^k \, (6k)! \, (13591409 + 545140134k)}{(3k)! \, (k!)^3 \, 640320^{3k + 3/2}}$$
Below is the Brent–Salamin algorithm. Wikipedia mentions that when a and b are "close enough", then (a + b)² / 4t will be an approximation of π. I'm not sure what "close enough" means, but from my tests, one iteration got 2 digits, two got 7, and three had 15. Of course this is with doubles, so it might have an error based on its representation, and the true calculation could be more accurate.
let pi_2 iters =
  let rec loop_ a b t p i =
    if i = 0 then a,b,t,p
    else
      let a_n = (a +. b) /. 2.0
      and b_n = sqrt (a *. b)
      and p_n = 2.0 *. p in
      let t_n = t -. (p *. (a -. a_n) *. (a -. a_n)) in
      loop_ a_n b_n t_n p_n (i - 1)
  in
  let a,b,t,p = loop_ (1.0) (1.0 /. (sqrt 2.0)) (1.0 /. 4.0) (1.0) iters in
  (a +. b) *. (a +. b) /. (4.0 *. t)
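The digit counts claimed above are easy to check with a direct C++ port (mine, not from the answer) of the OCaml iteration:

```cpp
#include <cassert>
#include <cmath>

// Sketch: the Brent–Salamin (Gauss–Legendre) iteration ported from the
// OCaml code above; the number of correct digits roughly doubles with
// each iteration, until double precision runs out.
double pi_brent_salamin(int iters)
{
    double a = 1.0, b = 1.0 / std::sqrt(2.0), t = 0.25, p = 1.0;
    for (int i = 0; i < iters; ++i) {
        double const a_n = (a + b) / 2.0;
        b = std::sqrt(a * b);
        t -= p * (a - a_n) * (a - a_n);
        p *= 2.0;
        a = a_n;
    }
    return (a + b) * (a + b) / (4.0 * t);
}
```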
Lastly, how about some pi golf (800 digits)? 160 characters!
int a=10000,b,c=2800,d,e,f[2801],g;main(){for(;b-c;)f[b++]=a/5;for(;d=0,g=c*2;c-=14,printf("%.4d",e+d/a),e=d%a)for(b=c;d+=f[b]*a,f[b]=d%--g,d/=g--,b--;d*=b);}
#60 Best answer 2 of What is the fastest way to get the value of π?(Score: 119)
Created: 20080902 Last updated: 20180903
I really like this program, because it approximates π by looking at its own area.
IOCCC 1988 : westley.c
#define _ -F<00||--F-OO--;
int F=00,OO=00;main(){F_OO();printf("%1.3f\n",4.*-F/OO/OO);}F_OO()
{
            _-_-_-_
        _-_-_-_-_-_-_-_-_
     _-_-_-_-_-_-_-_-_-_-_-_
   _-_-_-_-_-_-_-_-_-_-_-_-_-_
  _-_-_-_-_-_-_-_-_-_-_-_-_-_-_
  _-_-_-_-_-_-_-_-_-_-_-_-_-_-_
 _-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
 _-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
 _-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
 _-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
  _-_-_-_-_-_-_-_-_-_-_-_-_-_-_
  _-_-_-_-_-_-_-_-_-_-_-_-_-_-_
   _-_-_-_-_-_-_-_-_-_-_-_-_-_
     _-_-_-_-_-_-_-_-_-_-_-_
        _-_-_-_-_-_-_-_
            _-_-_-_
}
See also original question in stackoverflow
Notes:
 This page use API to get the relevant data from stackoverflow community.
 Content license on this page is CC BYSA 3.0.
 score = up votes - down votes.