Most votes on algorithm questions 2
Read all the top-voted questions and answers on a single page.
#11: Ukkonen's suffix tree algorithm in plain English (Score: 1151)
Created: 2012-02-26 Last updated: 2020-07-29
Tags: string, algorithm, data-structures, language-agnostic, suffix-tree
I feel a bit thick at this point. I’ve spent days trying to fully wrap my head around suffix tree construction, but because I don’t have a mathematical background, many of the explanations elude me as they start to make excessive use of mathematical symbology. The closest to a good explanation that I’ve found is Fast String Searching With Suffix Trees, but he glosses over various points and some aspects of the algorithm remain unclear.
A step-by-step explanation of this algorithm here on Stack Overflow would be invaluable for many others besides me, I’m sure.
For reference, here’s Ukkonen’s paper on the algorithm: http://www.cs.helsinki.fi/u/ukkonen/SuffixT1withFigs.pdf
My basic understanding, so far:
- I need to iterate through each prefix P of a given string T
- I need to iterate through each suffix S in prefix P and add that to the tree
- To add suffix S to the tree, I need to iterate through each character in S, with the iterations consisting of either walking down an existing branch that starts with the same set of characters C in S, potentially splitting an edge into descendant nodes when I reach a differing character in the suffix, or, if there was no matching edge to walk down for C, creating a new leaf edge.
The basic algorithm appears to be O(n²), as is pointed out in most explanations, as we need to step through all of the prefixes, then we need to step through each of the suffixes for each prefix. Ukkonen’s algorithm is apparently unique because of the suffix pointer technique he uses, though I think that is what I’m having trouble understanding.
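For concreteness, the naive O(n²) procedure described above can be sketched in Python (illustrative code of mine, not part of the question; the names are made up):

```python
# Naive suffix tree construction, matching the procedure described above:
# for each suffix, walk down matching edges, split an edge where the suffix
# diverges, and add a new leaf edge for the unmatched rest.

def naive_suffix_tree(text):
    root = {}                        # a node is a dict: edge label -> child node
    for i in range(len(text)):       # insert each suffix text[i:]
        node, s = root, text[i:]
        while s:
            # find an edge whose label starts with the suffix's first character
            edge = next((e for e in node if e[0] == s[0]), None)
            if edge is None:
                node[s] = {}         # no match: new leaf edge for the rest
                break
            # length of the common prefix of edge label and suffix
            k = 0
            while k < min(len(edge), len(s)) and edge[k] == s[k]:
                k += 1
            if k < len(edge):        # suffix diverges mid-edge: split the edge
                child = node.pop(edge)
                node[edge[:k]] = {edge[k:]: child}
                node = node[edge[:k]]
            else:
                node = node[edge]    # edge fully matched: walk down
            s = s[k:]
    return root

def find(node, s):
    """Check whether s occurs in the tree (i.e. is a substring of the text)."""
    while s:
        edge = next((e for e in node if e[0] == s[0]), None)
        if edge is None:
            return False
        k = 0
        while k < min(len(edge), len(s)) and edge[k] == s[k]:
            k += 1
        if k < len(edge) and k < len(s):
            return False             # mismatch in the middle of an edge
        if k == len(s):
            return True              # pattern consumed entirely
        node, s = node[edge], s[k:]
    return True
```

This is the quadratic baseline that Ukkonen’s algorithm improves on; the terminator character (e.g. `$`) guarantees every suffix ends at a leaf.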
I’m also having trouble understanding:
- exactly when and how the “active point” is assigned, used and changed
- what is going on with the canonization aspect of the algorithm
- why the implementations I’ve seen need to “fix” bounding variables that they are using
Here is the completed C# source code. It not only works correctly, but supports automatic canonization and renders a nicer-looking text graph of the output. Source code and sample output are at:
Update 2017-11-04
After many years I’ve found a new use for suffix trees, and have implemented the algorithm in JavaScript. The Gist is linked below. It should be bug-free. Dump it into a .js file, npm install chalk from the same location, and then run with node.js to see some colourful output. There’s a stripped-down version in the same Gist, without any of the debugging code.
https://gist.github.com/axefrog/c347bf0f5e0723cbd09b1aaed6ec6fc6
#11 Best answer 1 of Ukkonen's suffix tree algorithm in plain English (Score: 2444)
Created: 2012-03-01 Last updated: 2019-11-08
The following is an attempt to describe the Ukkonen algorithm by first showing what it does when the string is simple (i.e. does not contain any repeated characters), and then extending it to the full algorithm.
First, a few preliminary statements.

What we are building is basically like a search trie. So there is a root node, edges going out of it leading to new nodes, and further edges going out of those, and so forth.

But: Unlike in a search trie, the edge labels are not single characters. Instead, each edge is labeled using a pair of integers [from,to]. These are pointers into the text. In this sense, each edge carries a string label of arbitrary length, but takes only O(1) space (two pointers).
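To make the pair-of-pointers idea concrete, here is a tiny Python sketch (my own code and names, not from the answer; note the answer uses inclusive [from,to] pairs, while this sketch uses Python’s half-open slices):

```python
# Sketch of the pointer-pair edge representation: an edge stores two indices
# into the text instead of a copied substring, so it takes O(1) space no
# matter how long its label is. END stands in for the "current end" symbol #.

text = "abcabxabcd"
END = len(text)                 # the global "current end" (the # in the answer)

class Edge:
    def __init__(self, start, end=None):
        self.start = start      # index of the first label character
        self.end = end          # None means "open": runs to the current end #

    def label(self):
        """Materialize the string label only when we actually need it."""
        return text[self.start:self.end if self.end is not None else END]

e = Edge(0)                     # the edge created for the first suffix
```

Because open edges store `None` instead of a number, advancing the global end implicitly extends every leaf edge at once, which is the trick the walkthrough below relies on.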
Basic principle
I would like to first demonstrate how to create the suffix tree of a particularly simple string, a string with no repeated characters:
abc
The algorithm works in steps, from left to right. There is one step for every character of the string. Each step might involve more than one individual operation, but we will see (see the final observations at the end) that the total number of operations is O(n).
So, we start from the left, and first insert only the single character a by creating an edge from the root node (on the left) to a leaf, and labeling it as [0,#], which means the edge represents the substring starting at position 0 and ending at the current end. I use the symbol # to mean the current end, which is at position 1 (right after a).
So we have an initial tree, which looks like this:
And what it means is this:
Now we progress to position 2 (right after b). Our goal at each step is to insert all suffixes up to the current position. We do this by
- expanding the existing a edge to ab
- inserting one new edge for b
In our representation this looks like
And what it means is:
We observe two things:
- The edge representation for ab is the same as it used to be in the initial tree: [0,#]. Its meaning has automatically changed because we updated the current position # from 1 to 2.
- Each edge consumes O(1) space, because it consists of only two pointers into the text, regardless of how many characters it represents.
Next we increment the position again and update the tree by appending a c to every existing edge and inserting one new edge for the new suffix c.
In our representation this looks like
And what it means is:
We observe:
- The tree is the correct suffix tree up to the current position after each step
- There are as many steps as there are characters in the text
- The amount of work in each step is O(1), because all existing edges are updated automatically by incrementing #, and inserting the one new edge for the final character can be done in O(1) time. Hence for a string of length n, only O(n) time is required.
First extension: Simple repetitions
Of course this works so nicely only because our string does not contain any repetitions. We now look at a more realistic string:
abcabxabcd
It starts with abc as in the previous example, then ab is repeated and followed by x, and then abc is repeated followed by d.
Steps 1 through 3: After the first 3 steps we have the tree from the previous example:
Step 4: We move # to position 4. This implicitly updates all existing edges to this:
and we need to insert the final suffix of the current step, a, at the root.
Before we do this, we introduce two more variables (in addition to #), which of course have been there all the time but we haven’t used them so far:
- The active point, which is a triple (active_node, active_edge, active_length)
- The remainder, which is an integer indicating how many new suffixes we need to insert
The exact meaning of these two will become clear soon, but for now let’s just say:
- In the simple abc example, the active point was always (root,'\0x',0), i.e. active_node was the root node, active_edge was specified as the null character '\0x', and active_length was zero. The effect of this was that the one new edge that we inserted in every step was inserted at the root node as a freshly created edge. We will see soon why a triple is necessary to represent this information.
- The remainder was always set to 1 at the beginning of each step. The meaning of this was that the number of suffixes we had to actively insert at the end of each step was 1 (always just the final character).
Now this is going to change. When we insert the current final character a at the root, we notice that there is already an outgoing edge starting with a, specifically: abca. Here is what we do in such a case:
- We do not insert a fresh edge [4,#] at the root node. Instead we simply notice that the suffix a is already in our tree. It ends in the middle of a longer edge, but we are not bothered by that. We just leave things the way they are.
- We set the active point to (root,'a',1). That means the active point is now somewhere in the middle of the outgoing edge of the root node that starts with a, specifically, after position 1 on that edge. We notice that the edge is specified simply by its first character a. That suffices because there can be only one edge starting with any particular character (confirm that this is true after reading through the entire description).
- We also increment remainder, so at the beginning of the next step it will be 2.
Observation: When the final suffix we need to insert is found to exist in the tree already, the tree itself is not changed at all (we only update the active point and remainder). The tree is then not an accurate representation of the suffix tree up to the current position any more, but it contains all suffixes (because the final suffix a is contained implicitly). Hence, apart from updating the variables (which are all of fixed length, so this is O(1)), there was no work done in this step.
Step 5: We update the current position # to 5. This automatically updates the tree to this:
And because remainder is 2, we need to insert two final suffixes of the current position: ab and b. This is basically because:
- The a suffix from the previous step has never been properly inserted. So it has remained, and since we have progressed one step, it has now grown from a to ab.
- And we need to insert the new final edge b.
In practice this means that we go to the active point (which points to behind the a on what is now the abcab edge), and insert the current final character b. But: Again, it turns out that b is also already present on that same edge.
So, again, we do not change the tree. We simply:
- Update the active point to (root,'a',2) (same node and edge as before, but now we point to behind the b)
- Increment the remainder to 3, because we still have not properly inserted the final edge from the previous step, and we don’t insert the current final edge either.
To be clear: We had to insert ab and b in the current step, but because ab was already found, we updated the active point and did not even attempt to insert b. Why? Because if ab is in the tree, every suffix of it (including b) must be in the tree, too. Perhaps only implicitly, but it must be there, because of the way we have built the tree so far.
We proceed to step 6 by incrementing #. The tree is automatically updated to:
Because remainder is 3, we have to insert abx, bx and x. The active point tells us where ab ends, so we only need to jump there and insert the x. Indeed, x is not there yet, so we split the abcabx edge and insert an internal node:
The edge representations are still pointers into the text, so splitting and inserting an internal node can be done in O(1) time.
So we have dealt with abx and decrement remainder to 2. Now we need to insert the next remaining suffix, bx. But before we do that we need to update the active point. The rule for this, after splitting and inserting an edge, will be called Rule 1 below, and it applies whenever the active_node is root (we will learn rule 3 for other cases further below). Here is rule 1:
After an insertion from root,
- active_node remains root
- active_edge is set to the first character of the new suffix we need to insert, i.e. b
- active_length is reduced by 1
Hence, the new active-point triple (root,'b',1) indicates that the next insert has to be made at the bcabx edge, behind 1 character, i.e. behind b. We can identify the insertion point in O(1) time and check whether x is already present or not. If it was present, we would end the current step and leave everything the way it is. But x is not present, so we insert it by splitting the edge:
Again, this took O(1) time and we update remainder to 1 and the active point to (root,'x',0) as rule 1 states.
But there is one more thing we need to do. We’ll call this Rule 2:
If we split an edge and insert a new node, and if that is not the first node created during the current step, we connect the previously inserted node and the new node through a special pointer, a suffix link. We will later see why that is useful. Here is what we get, the suffix link is represented as a dotted edge:
We still need to insert the final suffix of the current step, x. Since the active_length component of the active point has fallen to 0, the final insert is made at the root directly. Since there is no outgoing edge at the root node starting with x, we insert a new edge:
As we can see, in the current step all remaining inserts were made.
We proceed to step 7 by setting # = 7, which automatically appends the next character, a, to all leaf edges, as always. Then we attempt to insert the new final character at the active point (the root), and find that it is there already. So we end the current step without inserting anything and update the active point to (root,'a',1).
In step 8, # = 8, we append b, and as seen before, this only means we update the active point to (root,'a',2) and increment remainder without doing anything else, because b is already present. However, we notice (in O(1) time) that the active point is now at the end of an edge. We reflect this by resetting it to (node1,'\0x',0). Here, I use node1 to refer to the internal node the ab edge ends at.
Then, in step # = 9, we need to insert ‘c’, and this will help us to understand the final trick:
Second extension: Using suffix links
As always, the # update appends c automatically to the leaf edges, and we go to the active point to see if we can insert ‘c’. It turns out ‘c’ exists already at that edge, so we set the active point to (node1,'c',1), increment remainder and do nothing else.
Now in step # = 10, remainder is 4, and so we first need to insert abcd (which remains from 3 steps ago) by inserting d at the active point.
Attempting to insert d at the active point causes an edge split in O(1) time:
The active_node, from which the split was initiated, is marked in red above. Here is the final rule, Rule 3:
After splitting an edge from an active_node that is not the root node, we follow the suffix link going out of that node, if there is any, and reset the active_node to the node it points to. If there is no suffix link, we set the active_node to the root. active_edge and active_length remain unchanged.
So the active point is now (node2,'c',1), and node2 is marked in red below:
Since the insertion of abcd is complete, we decrement remainder to 3 and consider the next remaining suffix of the current step, bcd. Rule 3 has set the active point to just the right node and edge, so inserting bcd can be done by simply inserting its final character d at the active point.
Doing this causes another edge split, and because of rule 2, we must create a suffix link from the previously inserted node to the new one:
We observe: Suffix links enable us to reset the active point so we can make the next remaining insert with O(1) effort. Look at the graph above to confirm that indeed the node at label ab is linked to the node at b (its suffix), and the node at abc is linked to bc.
The current step is not finished yet. remainder is now 2, and we need to follow rule 3 to reset the active point again. Since the current active_node (red above) has no suffix link, we reset to root. The active point is now (root,'c',1).
Hence the next insert occurs at the one outgoing edge of the root node whose label starts with c: cabxabcd, behind the first character, i.e. behind c. This causes another split:
And since this involves the creation of a new internal node, we follow rule 2 and set a new suffix link from the previously created internal node:
(I am using Graphviz Dot for these little graphs. The new suffix link caused dot to rearrange the existing edges, so check carefully to confirm that the only thing that was inserted above is a new suffix link.)
With this, remainder can be set to 1, and since the active_node is root, we use rule 1 to update the active point to (root,'d',0). This means the final insert of the current step is to insert a single d at root:
That was the final step and we are done. There are a number of final observations, though:
- In each step we move # forward by 1 position. This automatically updates all leaf nodes in O(1) time.
- But it does not deal with a) any suffixes remaining from previous steps, and b) with the one final character of the current step.
- remainder tells us how many additional inserts we need to make. These inserts correspond one-to-one to the final suffixes of the string that ends at the current position #. We consider one after the other and make the insert. Important: Each insert is done in O(1) time, since the active point tells us exactly where to go, and we need to add only one single character at the active point. Why? Because the other characters are contained implicitly (otherwise the active point would not be where it is).
- After each such insert, we decrement remainder and follow the suffix link if there is any. If not, we go to root (rule 3). If we are at root already, we modify the active point using rule 1. In any case, it takes only O(1) time.
- If, during one of these inserts, we find that the character we want to insert is already there, we don’t do anything and end the current step, even if remainder > 0. The reason is that any inserts that remain will be suffixes of the one we just tried to make. Hence they are all implicit in the current tree. The fact that remainder > 0 makes sure we deal with the remaining suffixes later.
- What if at the end of the algorithm remainder > 0? This will be the case whenever the end of the text is a substring that occurred somewhere before. In that case we must append one extra character at the end of the string that has not occurred before. In the literature, usually the dollar sign $ is used as a symbol for that. Why does that matter? If later we use the completed suffix tree to search for suffixes, we must accept matches only if they end at a leaf. Otherwise we would get a lot of spurious matches, because there are many strings implicitly contained in the tree that are not actual suffixes of the main string. Forcing remainder to be 0 at the end is essentially a way to ensure that all suffixes end at a leaf node. However, if we want to use the tree to search for general substrings, not only suffixes of the main string, this final step is indeed not required, as suggested by the OP’s comment below.
- So what is the complexity of the entire algorithm? If the text is n characters in length, there are obviously n steps (or n+1 if we add the dollar sign). In each step we either do nothing (other than updating the variables), or we make remainder inserts, each taking O(1) time. Since remainder indicates how many times we have done nothing in previous steps, and is decremented for every insert that we make now, the total number of times we do something is exactly n (or n+1). Hence, the total complexity is O(n).
- However, there is one small thing that I did not properly explain: It can happen that we follow a suffix link, update the active point, and then find that its active_length component does not work well with the new active_node. For example, consider a situation like this:
(The dashed lines indicate the rest of the tree. The dotted line is a suffix link.)
Now let the active point be (red,'d',3), so it points to the place behind the f on the defg edge. Now assume we made the necessary updates and now follow the suffix link to update the active point according to rule 3. The new active point is (green,'d',3). However, the d edge going out of the green node is de, so it has only 2 characters. In order to find the correct active point, we obviously need to follow that edge to the blue node and reset to (blue,'f',1).
In a particularly bad case, the active_length could be as large as remainder, which can be as large as n. And it might very well happen that to find the correct active point, we need to jump over not only one internal node, but perhaps many, up to n in the worst case. Does that mean the algorithm has a hidden O(n²) complexity, because in each step remainder is generally O(n), and the post-adjustments to the active node after following a suffix link could be O(n), too?
No. The reason is that if indeed we have to adjust the active point (e.g. from green to blue as above), that brings us to a new node that has its own suffix link, and active_length will be reduced. As we follow the chain of suffix links to make the remaining inserts, active_length can only decrease, and the number of active-point adjustments we can make on the way can’t be larger than active_length at any given time. Since active_length can never be larger than remainder, and remainder is O(n) not only in every single step, but the total sum of increments ever made to remainder over the course of the entire process is O(n) too, the number of active-point adjustments is also bounded by O(n).
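To tie the walkthrough together, here is a compact sketch of the whole algorithm in Python (my own code and names, not the OP’s C#/JavaScript implementations): the active point, remainder, suffix links and the three rules appear exactly where the text above introduces them.

```python
# Ukkonen's algorithm as described above: open leaf edges grow implicitly,
# the active point tells us where to insert, remainder counts pending
# suffixes, and suffix links let us hop between insertion points in O(1).

def build(text):
    s = text + '$'                         # unique terminator, see above
    INF = len(s)                           # leaf edges run to the global end #
    start, end, slink = [0], [0], [0]      # node 0 is the root
    children = [{}]                        # per node: first char -> child id

    def node(st, en):
        start.append(st); end.append(en); slink.append(0); children.append({})
        return len(start) - 1

    an, ae, al = 0, 0, 0                   # active_node, active_edge, active_length
    remainder = 0
    for pos, c in enumerate(s):
        remainder += 1
        pending = 0                        # internal node waiting for a suffix link
        while remainder > 0:
            if al == 0:
                ae = pos
            a = s[ae]
            if a not in children[an]:
                children[an][a] = node(pos, INF)           # new leaf edge
                if pending: slink[pending] = an            # rule 2
                pending = an
            else:
                nxt = children[an][a]
                length = min(end[nxt], pos + 1) - start[nxt]
                if al >= length:                           # canonize: walk down
                    an, ae, al = nxt, ae + length, al - length
                    continue
                if s[start[nxt] + al] == c:                # already in the tree
                    al += 1
                    if pending: slink[pending] = an
                    break                                  # end the step early
                split = node(start[nxt], start[nxt] + al)  # split the edge
                children[an][a] = split
                children[split][c] = node(pos, INF)        # new leaf
                start[nxt] += al
                children[split][s[start[nxt]]] = nxt
                if pending: slink[pending] = split         # rule 2
                pending = split
            remainder -= 1
            if an == 0 and al > 0:                         # rule 1 (at root)
                al -= 1
                ae = pos - remainder + 1
            else:                                          # rule 3 (follow link)
                an = slink[an]
    return s, start, end, children

def find(tree, pattern):
    """Walk the tree from the root; True iff pattern is a substring."""
    s, start, end, children = tree
    v, i = 0, 0
    while i < len(pattern):
        if pattern[i] not in children[v]:
            return False
        v = children[v][pattern[i]]
        for ch in s[start[v]:end[v]]:
            if i == len(pattern):
                return True
            if pattern[i] != ch:
                return False
            i += 1
    return True
```

Running `build("abcabxabcd")` reproduces the example string from the walkthrough; every suffix of the terminated string is then reachable from the root.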
#11 Best answer 2 of Ukkonen's suffix tree algorithm in plain English (Score: 134)
Created: 2013-01-29 Last updated: 2016-10-10
I tried to implement the suffix tree with the approach given in jogojapan’s answer, but it didn’t work for some cases due to the wording used for the rules. Moreover, I noticed that nobody had managed to implement an absolutely correct suffix tree using this approach. Below I will write an “overview” of jogojapan’s answer with some modifications to the rules. I will also describe the case when we forget to create important suffix links.
Additional variables used
- active point - a triple (active_node; active_edge; active_length), showing from where we must start inserting a new suffix.
- remainder - shows the number of suffixes we must add explicitly. For instance, if our word is ‘abcaabca’ and remainder = 3, it means we must process the 3 last suffixes: bca, ca and a.
Let’s use the concept of an internal node: all nodes except the root and the leaves are internal nodes.
Observation 1
When the final suffix we need to insert is found to exist in the tree already, the tree itself is not changed at all (we only update the active point and remainder).
Observation 2
If at some point active_length is greater than or equal to the length of the current edge (edge_length), we move our active point down until edge_length is strictly greater than active_length.
Now, let’s redefine the rules:
Rule 1
If after an insertion from the active node = root, the active length is greater than 0, then:
- the active node is not changed
- the active length is decremented
- the active edge is shifted right (to the first character of the next suffix we must insert)
Rule 2
If we create a new internal node OR make an insertion from an internal node, and this is not the first SUCH internal node in the current step, then we link the previous SUCH node with THIS one through a suffix link.
This definition of Rule 2 is different from jogojapan’s, as here we take into account not only the newly created internal nodes, but also the internal nodes from which we make an insertion.
Rule 3
After an insert from the active node which is not the root node, we must follow the suffix link and set the active node to the node it points to. If there is no suffix link, set the active node to the root node. Either way, active edge and active length stay unchanged.
In this definition of Rule 3 we also consider inserts at leaf nodes (not only split nodes).
And finally, Observation 3:
When the symbol we want to add to the tree is already on the edge, we, according to Observation 1, update only the active point and remainder, leaving the tree unchanged. BUT if there is an internal node marked as needing a suffix link, we must connect that node with our current active node through a suffix link.
Let’s look at the example of a suffix tree for cdddcdc, comparing what happens if we add a suffix link in such a case and if we don’t:

If we DON’T connect the nodes through a suffix link:
- before adding the last letter c:
- after adding the last letter c:

If we DO connect the nodes through a suffix link:
- before adding the last letter c:
- after adding the last letter c:
Seems like there is no significant difference: in the second case there are two more suffix links. But these suffix links are correct, and one of them (from the blue node to the red one) is very important for our approach with the active point. The problem is that if we don’t put a suffix link here, later, when we add some new letters to the tree, we might omit adding some nodes to the tree due to Rule 3, because, according to it, if there’s no suffix link, then we must set the active_node to the root.
When we were adding the last letter to the tree, the red node already existed before we made an insert from the blue node (the edge labeled ‘c’). As there was an insert from the blue node, we mark it as needing a suffix link. Then, relying on the active point approach, the active node was set to the red node. But we don’t make an insert from the red node, as the letter ‘c’ is already on the edge. Does that mean the blue node must be left without a suffix link? No, we must connect the blue node with the red one through a suffix link. Why is this correct? Because the active point approach guarantees that we get to the right place, i.e., to the next place where we must process an insert of a shorter suffix.
Finally, here are my implementations of the Suffix Tree:
Hope that this “overview”, combined with jogojapan’s detailed answer, will help somebody implement their own suffix tree.
See also the original question on Stack Overflow.
#12: Calculate distance between two latitude-longitude points? (Haversine formula) (Score: 983)
Created: 2008-08-26 Last updated: 2017-09-11
Tags: algorithm, math, maps, latitude-longitude, haversine
How do I calculate the distance between two points specified by latitude and longitude?
For clarification, I’d like the distance in kilometers; the points use the WGS84 system and I’d like to understand the relative accuracies of the approaches available.
#12 Best answer 1 of Calculate distance between two latitude-longitude points? (Haversine formula) (Score: 1227)
Created: 2008-08-26 Last updated: 2014-07-31
This link might be helpful to you, as it details the use of the Haversine formula to calculate the distance.
Excerpt:
This script [in Javascript] calculates greatcircle distances between the two points – that is, the shortest distance over the earth’s surface – using the ‘Haversine’ formula.
function getDistanceFromLatLonInKm(lat1, lon1, lat2, lon2) {
  var R = 6371; // Radius of the earth in km
  var dLat = deg2rad(lat2 - lat1); // deg2rad below
  var dLon = deg2rad(lon2 - lon1);
  var a =
    Math.sin(dLat / 2) * Math.sin(dLat / 2) +
    Math.cos(deg2rad(lat1)) * Math.cos(deg2rad(lat2)) *
    Math.sin(dLon / 2) * Math.sin(dLon / 2);
  var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
  var d = R * c; // Distance in km
  return d;
}

function deg2rad(deg) {
  return deg * (Math.PI / 180);
}
#12 Best answer 2 of Calculate distance between two latitude-longitude points? (Haversine formula) (Score: 414)
Created: 2014-02-07 Last updated: 2020-03-13
I needed to calculate a lot of distances between points for my project, so I went ahead and tried to optimize the code I had found here. On average, across different browsers, my new implementation runs 2 times faster than the most upvoted answer.
function distance(lat1, lon1, lat2, lon2) {
  var p = 0.017453292519943295; // Math.PI / 180
  var c = Math.cos;
  var a = 0.5 - c((lat2 - lat1) * p) / 2 +
          c(lat1 * p) * c(lat2 * p) *
          (1 - c((lon2 - lon1) * p)) / 2;
  return 12742 * Math.asin(Math.sqrt(a)); // 2 * R; R = 6371 km
}
You can play with my jsPerf and see the results here.
Recently I needed to do the same in Python, so here is a Python implementation:
from math import cos, asin, sqrt, pi

def distance(lat1, lon1, lat2, lon2):
    p = pi / 180
    a = 0.5 - cos((lat2 - lat1) * p) / 2 + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2
    return 12742 * asin(sqrt(a))  # 2 * R * asin...
And for the sake of completeness: Haversine on wiki.
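As a sanity check (my own snippet, not part of either answer), the compact formula above can be compared against a textbook haversine implementation; the two agree to floating-point precision:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # Textbook haversine with R = 6371 km, for comparison
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def distance(lat1, lon1, lat2, lon2):
    # The optimized form from the answer above
    p = 0.017453292519943295  # pi / 180
    a = 0.5 - cos((lat2 - lat1) * p) / 2 + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2
    return 12742 * asin(sqrt(a))

# One degree of longitude on the equator is about 111.19 km
print(round(distance(0, 0, 0, 1), 2))   # 111.19
```

The optimized version avoids squaring half-angle sines by using the identity sin²(x/2) = (1 - cos x)/2, which is why it needs only cosine calls inside the formula.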
See also the original question on Stack Overflow.
#13: How to find time complexity of an algorithm (Score: 952)
Created: 2012-06-14 Last updated: 2017-10-27
Tags: algorithm, time-complexity, complexity-theory
The Question
How do you find the time complexity of an algorithm?
What have I done before posting a question on SO?
I have gone through this, this and many other links.
But nowhere was I able to find a clear and straightforward explanation of how to calculate time complexity.
What do I know?
Say, for code as simple as the one below:
char h = 'y'; // This will be executed 1 time
int abc = 0; // This will be executed 1 time
Say for a loop like the one below:
for (int i = 0; i < N; i++) {
Console.Write('Hello World !');
}
int i=0; This will be executed only once. The time is actually counted for the assignment i=0, not the declaration.
i < N; This will be executed N+1 times.
i++; This will be executed N times.
So the number of operations required by this loop is {1 + (N+1) + N} = 2N + 2.
Note: This still may be wrong, as I am not confident about my understanding of calculating time complexity.
What do I want to know?
Ok, so I think I know these small basic calculations, but in most cases I have seen the time complexity given as O(N), O(n²), O(log n), O(n!) and many others. Can anyone help me understand how one calculates the time complexity of an algorithm? I am sure there are plenty of newbies like me wanting to know this.
#13 Best answer 1 of How to find time complexity of an algorithm (Score: 424)
Created: 2012-06-14 Last updated: 2017-03-29
How to find time complexity of an algorithm
You add up how many machine instructions it will execute as a function of the size of its input, and then simplify the expression to the largest term (the one that dominates when N is very large), dropping any constant factor.
For example, let’s see how we simplify 2N + 2 machine instructions to describe this as just O(N).
Why do we remove the two 2s?
We are interested in the performance of the algorithm as N becomes large.
Consider the two terms 2N and 2.
What is the relative influence of these two terms as N becomes large? Suppose N is a million.
Then the first term is 2 million and the second term is only 2.
For this reason, we drop all but the largest terms for large N.
So, now we have gone from 2N + 2 to 2N.
Traditionally, we are only interested in performance up to constant factors.
This means that we don’t really care if there is some constant multiple of difference in performance when N is large. The unit of 2N is not welldefined in the first place anyway. So we can multiply or divide by a constant factor to get to the simplest expression.
So 2N becomes just N.
#13 Best answer 2 of How to find time complexity of an algorithm (Score: 417)
Created: 2013-01-18 Last updated: 2020-05-04
This is an excellent article: http://www.daniweb.com/software-development/computer-science/threads/13488/time-complexity-of-algorithm
The below answer is copied from above (in case the excellent link goes bust)
The most common metric for calculating time complexity is Big O notation. This removes all constant factors so that the running time can be estimated in relation to N as N approaches infinity. In general you can think of it like this:
statement;
Is constant. The running time of the statement will not change in relation to N.
for ( i = 0; i < N; i++ )
statement;
Is linear. The running time of the loop is directly proportional to N. When N doubles, so does the running time.
for ( i = 0; i < N; i++ ) {
for ( j = 0; j < N; j++ )
statement;
}
Is quadratic. The running time of the two loops is proportional to the square of N. When N doubles, the running time increases by a factor of four.
while ( low <= high ) {
    mid = ( low + high ) / 2;
    if ( target < list[mid] )
        high = mid - 1;
    else if ( target > list[mid] )
        low = mid + 1;
    else
        break;
}
Is logarithmic. The running time of the algorithm is proportional to the number of times N can be divided by 2. This is because the algorithm divides the working area in half with each iteration.
void quicksort ( int list[], int left, int right )
{
    if ( left >= right )  /* base case: sub-array of size 0 or 1 */
        return;
    int pivot = partition ( list, left, right );
    quicksort ( list, left, pivot - 1 );
    quicksort ( list, pivot + 1, right );
}
Is N * log ( N ). The running time consists of N loops (iterative or recursive) that are logarithmic, thus the algorithm is a combination of linear and logarithmic.
In general, doing something with every item in one dimension is linear, doing something with every item in two dimensions is quadratic, and dividing the working area in half is logarithmic. There are other Big O measures such as cubic, exponential, and square root, but they’re not nearly as common. Big O notation is described as O ( <type> ) where <type> is the measure. The quicksort algorithm would be described as O ( N * log ( N ) ).
Note that none of this has taken into account best, average, and worst case measures. Each would have its own Big O notation. Also note that this is a VERY simplistic explanation. Big O is the most common, but it’s also more complex than I’ve shown. There are also other notations such as big omega, little o, and big theta. You probably won’t encounter them outside of an algorithm analysis course. ;)
See also original question in stackoverflow
#14: Big O, how do you calculate/approximate it? (Score: 918)
Created: 20080806 Last updated: 20191219
Tags: algorithm, optimization, complexity-theory, big-o, performance
Most people with a degree in CS will certainly know what Big O stands for. It helps us to measure how well an algorithm scales.
But I’m curious, how do you calculate or approximate the complexity of your algorithms?
#14 Best answer 1 of Big O, how do you calculate/approximate it? (Score: 1504)
Created: 20110131 Last updated: 20190305
I’ll do my best to explain it here in simple terms, but be warned that this topic takes my students a couple of months to finally grasp. You can find more information in Chapter 2 of the Data Structures and Algorithms in Java book.
There is no mechanical procedure that can be used to get the BigOh.
As a “cookbook”, to obtain the BigOh from a piece of code you first need to realize that you are creating a math formula to count how many steps of computations get executed given an input of some size.
The purpose is simple: to compare algorithms from a theoretical point of view, without the need to execute the code. The fewer the steps, the faster the algorithm.
For example, let’s say you have this piece of code:
int sum(int* data, int N) {
    int result = 0;               // 1
    for (int i = 0; i < N; i++) { // 2
        result += data[i];        // 3
    }
    return result;                // 4
}
This function returns the sum of all the elements of the array, and we want to create a formula to count the computational complexity of that function:
Number_Of_Steps = f(N)
So we have f(N), a function to count the number of computational steps. The input of the function is the size of the structure to process. It means that this function is called such as:
Number_Of_Steps = f(data.length)
The parameter N takes the data.length value. Now we need the actual definition of the function f(). This is done from the source code, in which each interesting line is numbered from 1 to 4.
There are many ways to calculate the BigOh. From this point forward we are going to assume that every sentence that doesn’t depend on the size of the input data takes a constant C number of computational steps.
We are going to add the individual number of steps of the function, and neither the local variable declaration nor the return statement depends on the size of the data array.
That means that lines 1 and 4 take C amount of steps each, and the function is somewhat like this:
f(N) = C + ??? + C
The next part is to define the value of the for statement. Remember that we are counting the number of computational steps, meaning that the body of the for statement gets executed N times. That’s the same as adding C, N times:
f(N) = C + (C + C + ... + C) + C = C + N * C + C
There is no mechanical rule to count how many times the body of the for gets executed; you need to count it by looking at what the code does. To simplify the calculations, we are ignoring the variable initialization, condition and increment parts of the for statement.
To get the actual BigOh we need the Asymptotic analysis of the function. This is roughly done like this:
- Take away all the constants C.
- From f() get the polynomial in its standard form.
- Divide the terms of the polynomial and sort them by the rate of growth.
- Keep the one that grows bigger when N approaches infinity.
Our f() has two terms:
f(N) = 2 * C * N ^ 0 + 1 * C * N ^ 1
Taking away all the C constants and redundant parts:
f(N) = 1 + N ^ 1
Since the last term is the one which grows bigger when N approaches infinity (think of limits), this is the BigOh argument, and the sum() function has a BigOh of:
O(N)
There are a few tricks to solve some tricky ones: use summations whenever you can.
As an example, this code can be easily solved using summations:
for (i = 0; i < 2 * n; i += 2) { // 1
    for (j = n; j > i; j--) {    // 2
        foo();                   // 3
    }
}
The first thing you need to ask is the order of execution of foo(). While the usual is for it to be O(1), you need to ask your professors about it. O(1) means (almost, mostly) constant C, independent of the size N.
The for statement on sentence number one is tricky. While the index ends at 2 * N, the increment is done by two. That means that the first for gets executed only N steps, and we need to divide the count by two.
f(N) = Summation(i from 1 to 2 * N / 2)( ... ) =
= Summation(i from 1 to N)( ... )
Sentence number two is even trickier since it depends on the value of i. Take a look: the index i takes the values 0, 2, 4, 6, 8, ..., 2 * N, and the second for gets executed: N times the first one, N - 2 the second, N - 4 the third... up to the N / 2 stage, on which the second for never gets executed.
On formula, that means:
f(N) = Summation(i from 1 to N)( Summation(j = ???)( ) )
Again, we are counting the number of steps. And by definition, every summation should always start at one, and end at a number bigger than or equal to one.
f(N) = Summation(i from 1 to N)( Summation(j = 1 to (N - (i - 1) * 2))( C ) )
(We are assuming that foo() is O(1) and takes C steps.)
We have a problem here: when i takes the value N / 2 + 1 upwards, the inner Summation ends at a negative number! That’s impossible and wrong. We need to split the summation in two, being the pivotal point the moment i takes N / 2 + 1.
f(N) = Summation(i from 1 to N / 2)( Summation(j = 1 to (N - (i - 1) * 2))( C ) ) + Summation(i from 1 to N / 2)( C )
Since from the pivotal moment onwards (i > N / 2) the inner for won’t get executed, we assume a constant C execution complexity on its body.
Now the summations can be simplified using some identity rules:
- Summation(w from 1 to N)( C ) = N * C
- Summation(w from 1 to N)( A (+/-) B ) = Summation(w from 1 to N)( A ) (+/-) Summation(w from 1 to N)( B )
- Summation(w from 1 to N)( w * C ) = C * Summation(w from 1 to N)( w ) (C is a constant, independent of w)
- Summation(w from 1 to N)( w ) = (N * (N + 1)) / 2
Applying some algebra:
f(N) = Summation(i from 1 to N / 2)( (N - (i - 1) * 2) * ( C ) ) + (N / 2)( C )
f(N) = C * Summation(i from 1 to N / 2)( (N - (i - 1) * 2) ) + (N / 2)( C )
f(N) = C * (Summation(i from 1 to N / 2)( N ) - Summation(i from 1 to N / 2)( (i - 1) * 2 )) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - 2 * Summation(i from 1 to N / 2)( i - 1 )) + (N / 2)( C )
=> Summation(i from 1 to N / 2)( i - 1 ) = Summation(i from 1 to N / 2 - 1)( i )
f(N) = C * (( N ^ 2 / 2 ) - 2 * Summation(i from 1 to N / 2 - 1)( i )) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - 2 * ( (N / 2 - 1) * (N / 2 - 1 + 1) / 2 )) + (N / 2)( C )
=> (N / 2 - 1) * (N / 2 - 1 + 1) / 2 =
   (N / 2 - 1) * (N / 2) / 2 =
   ((N ^ 2 / 4) - (N / 2)) / 2 =
   (N ^ 2 / 8) - (N / 4)
f(N) = C * (( N ^ 2 / 2 ) - 2 * ( (N ^ 2 / 8) - (N / 4) )) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - ( (N ^ 2 / 4) - (N / 2) )) + (N / 2)( C )
f(N) = C * (( N ^ 2 / 2 ) - (N ^ 2 / 4) + (N / 2)) + (N / 2)( C )
f(N) = C * ( N ^ 2 / 4 ) + C * (N / 2) + C * (N / 2)
f(N) = C * ( N ^ 2 / 4 ) + 2 * C * (N / 2)
f(N) = C * ( N ^ 2 / 4 ) + C * N
f(N) = C * 1/4 * N ^ 2 + C * N
And the BigOh is:
O(N²)
#14 Best answer 2 of Big O, how do you calculate/approximate it? (Score: 205)
Created: 20080806
Big O gives the upper bound for time complexity of an algorithm. It is usually used in conjunction with processing data sets (lists) but can be used elsewhere.
A few examples of how it’s used in C code.
Say we have an array of n elements
int array[n];
If we wanted to access the first element of the array this would be O(1) since it doesn’t matter how big the array is, it always takes the same constant time to get the first item.
x = array[0];
If we wanted to find a number in the list:
for(int i = 0; i < n; i++){
    if(array[i] == numToFind){ return i; }
}
This would be O(n) since at most we would have to look through the entire list to find our number. The BigO is still O(n) even though we might find our number the first try and run through the loop once because BigO describes the upper bound for an algorithm (omega is for lower bound and theta is for tight bound).
When we get to nested loops:
for(int i = 0; i < n; i++){
    for(int j = i; j < n; j++){
        array[j] += 2;
    }
}
This is O(n^2) since for each pass of the outer loop ( O(n) ) we have to go through the entire list again so the n’s multiply leaving us with n squared.
This is barely scratching the surface but when you get to analyzing more complex algorithms complex math involving proofs comes into play. Hope this familiarizes you with the basics at least though.
See also original question in stackoverflow
#15: How to count the number of set bits in a 32-bit integer? (Score: 918)
Created: 20080920 Last updated: 20140918
Tags: algorithm, binary, bit-manipulation, hammingweight, iec10967
8 bits representing the number 7 look like this:
00000111
Three bits are set.
What are algorithms to determine the number of set bits in a 32-bit integer?
#15 Best answer 1 of How to count the number of set bits in a 32-bit integer? (Score: 898)
Created: 20080920 Last updated: 20210313
This is known as the ‘Hamming Weight’, ‘popcount’ or ‘sideways addition’.
Some CPUs have a single built-in instruction to do it and others have parallel instructions which act on bit vectors. Instructions like x86’s popcnt (on CPUs where it’s supported) will almost certainly be fastest for a single integer. Some other architectures may have a slow instruction implemented with a microcoded loop that tests a bit per cycle (citation needed - hardware popcount is normally fast if it exists at all).
The ‘best’ algorithm really depends on which CPU you are on and what your usage pattern is.
Your compiler may know how to do something that’s good for the specific CPU you’re compiling for, e.g. C++20 std::popcount(), or C++ std::bitset<32>::count(), as a portable way to access builtin / intrinsic functions (see another answer on this question). But your compiler’s choice of fallback for target CPUs that don’t have hardware popcnt might not be optimal for your use-case. Or your language (e.g. C) might not expose any portable function that could use a CPU-specific popcount when there is one.
Portable algorithms that don’t need (or benefit from) any HW support
A prepopulated table lookup method can be very fast if your CPU has a large cache and you are doing lots of these operations in a tight loop. However it can suffer because of the expense of a ‘cache miss’, where the CPU has to fetch some of the table from main memory. (Look up each byte separately to keep the table small.) If you want popcount for a contiguous range of numbers, only the low byte is changing for groups of 256 numbers, making this very good.
If you know that your bytes will be mostly 0’s or mostly 1’s then there are efficient algorithms for these scenarios, e.g. clearing the lowest set bit with a bithack in a loop until it becomes zero.
I believe a very good general purpose algorithm is the following, known as ‘parallel’ or ‘variable-precision SWAR algorithm’. I have expressed this in a C-like pseudo language; you may need to adjust it to work for a particular language (e.g. using uint32_t for C++ and >>> in Java):
int numberOfSetBits(uint32_t i)
{
    // Java: use int, and use >>> instead of >>
    // C or C++: use uint32_t
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    return (((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}
For JavaScript: coerce to integer with |0 for performance; change the first line to i = (i|0) - ((i >> 1) & 0x55555555);
This has the best worstcase behaviour of any of the algorithms discussed, so will efficiently deal with any usage pattern or values you throw at it. (Its performance is not datadependent on normal CPUs where all integer operations including multiply are constanttime. It doesn’t get any faster with “simple” inputs, but it’s still pretty decent.)
References:
- https://graphics.stanford.edu/~seander/bithacks.html
- https://en.wikipedia.org/wiki/Hamming_weight
- http://gurmeet.net/puzzles/fast-bit-counting-routines/
- http://aggregate.ee.engr.uky.edu/MAGIC/#Population%20Count%20(Ones%20Count)
How this SWAR bithack works:
i = i - ((i >> 1) & 0x55555555);
The first step is an optimized version of masking to isolate the odd / even bits, shifting to line them up, and adding. This effectively does 16 separate additions in 2-bit accumulators (SWAR = SIMD Within A Register), like (i & 0x55555555) + ((i >> 1) & 0x55555555).
The next step takes the odd/even eight of those 16x 2-bit accumulators and adds again, producing 8x 4-bit sums. The i - ... optimization isn’t possible this time, so it just masks before / after shifting. Using the same 0x33... constant both times instead of 0xcc... before shifting is a good thing when compiling for ISAs that need to construct 32-bit constants in registers separately.
The final shift-and-add step of (i + (i >> 4)) & 0x0F0F0F0F widens to 4x 8-bit accumulators. It masks after adding instead of before, because the maximum value in any 4-bit accumulator is 4, if all 4 bits of the corresponding input bits were set. 4 + 4 = 8, which still fits in 4 bits, so carry between nibble elements is impossible in i + (i >> 4).
So far this is just fairly normal SIMD using SWAR techniques with a few clever optimizations. Continuing on with the same pattern for 2 more steps could widen to 2x 16-bit then 1x 32-bit counts. But there is a more efficient way on machines with fast hardware multiply:
Once we have few enough “elements”, a multiply with a magic constant can sum all the elements into the top element. In this case, byte elements. Multiply is done by left-shifting and adding, so a multiply of x * 0x01010101 results in x + (x<<8) + (x<<16) + (x<<24). Our 8-bit elements are wide enough (and holding small enough counts) that this doesn’t produce carry into the top 8 bits.
A 64-bit version of this can do 8x 8-bit elements in a 64-bit integer with a 0x0101010101010101 multiplier, and extract the high byte with >> 56. So it doesn’t take any extra steps, just wider constants. This is what GCC uses for __builtin_popcountll on x86 systems when the hardware popcnt instruction isn’t enabled. If you can use builtins or intrinsics for this, do so to give the compiler a chance to do target-specific optimizations.
With full SIMD for wider vectors (e.g. counting a whole array)
This bitwise-SWAR algorithm could parallelize to be done in multiple vector elements at once, instead of in a single integer register, for a speedup on CPUs with SIMD but no usable popcount instruction. (e.g. x86-64 code that has to run on any CPU, not just Nehalem or later.)
However, the best way to use vector instructions for popcount is usually by using a variableshuffle to do a tablelookup for 4 bits at a time of each byte in parallel. (The 4 bits index a 16 entry table held in a vector register).
On Intel CPUs, the hardware 64-bit popcnt instruction can outperform an SSSE3 PSHUFB bit-parallel implementation by about a factor of 2, but only if your compiler gets it just right. Otherwise SSE can come out significantly ahead. Newer compiler versions are aware of the popcnt false dependency problem on Intel.

- https://github.com/WojciechMula/sse-popcount - state-of-the-art x86 SIMD popcount for SSSE3, AVX2, AVX512BW, AVX512VBMI, or AVX512 VPOPCNT. Using Harley-Seal across vectors to defer popcount within an element. (Also ARM NEON)
- related: https://github.com/mklarqvist/positional-popcount - separate counts for each bit-position of multiple 8, 16, 32, or 64-bit integers. (Again, x86 SIMD including AVX512, which is really good at this, with vpternlogd making Harley-Seal very good.)
#15 Best answer 2 of How to count the number of set bits in a 32-bit integer? (Score: 225)
Created: 20080920 Last updated: 20210313
Some languages portably expose the operation in a way that can use efficient hardware support if available, otherwise some library fallback that’s hopefully decent.
For example (from a table by language):
- C++ has std::bitset<>::count(), or C++20 std::popcount(T x)
- Java has java.lang.Integer.bitCount() (also for Long or BigInteger)
- C# has System.Numerics.BitOperations.PopCount()
- Python has int.bit_count()

Not all compilers / libraries actually manage to use HW support when it’s available, though. (Notably MSVC, even with options that make std::popcount inline as x86 popcnt, its std::bitset::count still always uses a lookup table. This will hopefully change in future versions.)
Also consider the builtin functions of your compiler when the portable language doesn’t have this basic bit operation. In GNU C for example:
int __builtin_popcount (unsigned int x);
int __builtin_popcountll (unsigned long long x);
In the worst case (no single-instruction HW support) the compiler will generate a call to a function (which in current GCC uses a shift/and bithack like this answer, at least for x86). In the best case the compiler will emit a CPU instruction to do the job. (Just like a * or / operator - GCC will use a hardware multiply or divide instruction if available, otherwise will call a libgcc helper function.) Or even better, if the operand is a compile-time constant after inlining, it can do constant-propagation to get a compile-time-constant popcount result.
The GCC builtins even work across multiple platforms. Popcount has almost become mainstream in the x86 architecture, so it makes sense to start using the builtin now so you can recompile to let it inline a hardware instruction when you compile with -mpopcnt or something that includes that (e.g. https://godbolt.org/z/Ma5e5a). Other architectures have had popcount for years, but in the x86 world there are still some ancient Core 2 and similar vintage AMD CPUs in use.
On x86, you can tell the compiler that it can assume support for the popcnt instruction with -mpopcnt (also implied by -msse4.2). See GCC x86 options. -march=nehalem -mtune=skylake (or -march= whatever CPU you want your code to assume and to tune for) could be a good choice. Running the resulting binary on an older CPU will result in an illegal-instruction fault.
To make binaries optimized for the machine you build them on, use -march=native (with gcc, clang, or ICC).
MSVC provides an intrinsic for the x86 popcnt instruction, but unlike gcc it’s really an intrinsic for the hardware instruction and requires hardware support.
Using std::bitset<>::count() instead of a builtin
In theory, any compiler that knows how to popcount efficiently for the target CPU should expose that functionality through ISO C++ std::bitset<>. In practice, you might be better off with the bit-hack AND/shift/ADD in some cases for some target CPUs.
For target architectures where hardware popcount is an optional extension (like x86), not all compilers have a std::bitset that takes advantage of it when available. For example, MSVC has no way to enable popcnt support at compile time, and its std::bitset<>::count always uses a table lookup, even with /Ox /arch:AVX (which implies SSE4.2, which in turn implies the popcnt feature). (Update: see below; that does get MSVC’s C++20 std::popcount to use x86 popcnt, but still not its bitset<>::count. MSVC could fix that by updating their standard library headers to use std::popcount when available.)
But at least you get something portable that works everywhere, and with gcc/clang with the right target options, you get hardware popcount for architectures that support it.
#include <bitset>
#include <limits>
#include <type_traits>

template<typename T>
//static inline  // static if you want to compile with -mpopcnt in one compilation unit but not others
typename std::enable_if<std::is_integral<T>::value, unsigned>::type
popcount(T x)
{
    static_assert(std::numeric_limits<T>::radix == 2, "non-binary type");

    // sizeof(x)*CHAR_BIT
    constexpr int bitwidth = std::numeric_limits<T>::digits + std::numeric_limits<T>::is_signed;

    // std::bitset constructor was only unsigned long before C++11. Beware if porting to C++03.
    static_assert(bitwidth <= std::numeric_limits<unsigned long long>::digits, "arg too wide for std::bitset() constructor");

    typedef typename std::make_unsigned<T>::type UT; // probably not needed, bitset width chops after sign-extension
    std::bitset<bitwidth> bs(static_cast<UT>(x));
    return bs.count();
}
See asm from gcc, clang, icc, and MSVC on the Godbolt compiler explorer.
x86-64 gcc -O3 -std=gnu++11 -mpopcnt emits this:
unsigned test_short(short a) { return popcount(a); }
movzx eax, di # note zero-extension, not sign-extension
popcnt rax, rax
ret
unsigned test_int(int a) { return popcount(a); }
mov eax, edi
popcnt rax, rax # unnecessary 64-bit operand size
ret
unsigned test_u64(unsigned long long a) { return popcount(a); }
xor eax, eax # gcc avoids false dependencies for Intel CPUs
popcnt rax, rdi
ret
PowerPC64 gcc -O3 -std=gnu++11 emits (for the int arg version):
rldicl 3,3,0,32 # zero-extend from 32 to 64-bit
popcntd 3,3 # popcount
blr
This source isn’t x86-specific or GNU-specific at all, but only compiles well with gcc/clang/icc, at least when targeting x86 (including x86-64).
Also note that gcc’s fallback for architectures without single-instruction popcount is a byte-at-a-time table lookup. This isn’t wonderful for ARM, for example.
C++20 has std::popcount(T)
Current libstdc++ headers unfortunately define it with a special case if (x == 0) return 0; at the start, which clang doesn’t optimize away when compiling for x86:
#include <bit>
int bar(unsigned x) {
return std::popcount(x);
}
clang 11.0.1 -O3 -std=gnu++20 -march=nehalem (https://godbolt.org/z/arMe5a):
# clang 11
bar(unsigned int): # @bar(unsigned int)
popcnt eax, edi
cmove eax, edi # redundant: if popcnt result is 0, return the original 0 instead of the popcntgenerated 0...
ret
But GCC compiles nicely:
# gcc 10
xor eax, eax # break false dependency on Intel SnBfamily before Ice Lake.
popcnt eax, edi
ret
Even MSVC does well with it, as long as you use /arch:AVX or later (and enable C++20 with /std:c++latest). https://godbolt.org/z/7K4Gef
int bar(unsigned int) PROC ; bar, COMDAT
popcnt eax, ecx
ret 0
int bar(unsigned int) ENDP ; bar
See also original question in stackoverflow
#16: What is tail call optimization? (Score: 883)
Created: 20081122 Last updated: 20200418
Tags: algorithm, recursion, language-agnostic, tail-recursion, tail-call-optimization
Very simply, what is tail-call optimization?
More specifically, what are some small code snippets where it could be applied, and where not, with an explanation of why?
#16 Best answer 1 of What is tail call optimization? (Score: 798)
Created: 20081122 Last updated: 20210212
Tail-call optimization is where you are able to avoid allocating a new stack frame for a function because the calling function will simply return the value that it gets from the called function. The most common use is tail-recursion, where a recursive function written to take advantage of tail-call optimization can use constant stack space.
Scheme is one of the few programming languages that guarantee in the spec that any implementation must provide this optimization, so here are two examples of the factorial function in Scheme:
(define (fact x)
  (if (= x 0) 1
      (* x (fact (- x 1)))))

(define (fact x)
  (define (fact-tail x accum)
    (if (= x 0) accum
        (fact-tail (- x 1) (* x accum))))
  (fact-tail x 1))
The first function is not tail recursive because when the recursive call is made, the function needs to keep track of the multiplication it needs to do with the result after the call returns. As such, the stack looks as follows:
(fact 3)
(* 3 (fact 2))
(* 3 (* 2 (fact 1)))
(* 3 (* 2 (* 1 (fact 0))))
(* 3 (* 2 (* 1 1)))
(* 3 (* 2 1))
(* 3 2)
6
In contrast, the stack trace for the tail recursive factorial looks as follows:
(fact 3)
(fact-tail 3 1)
(fact-tail 2 3)
(fact-tail 1 6)
(fact-tail 0 6)
6
As you can see, we only need to keep track of the same amount of data for every call to fact-tail because we are simply returning the value we get right through to the top. This means that even if I were to call (fact 1000000), I need only the same amount of space as (fact 3). This is not the case with the non-tail-recursive fact, and as such large values may cause a stack overflow.
#16 Best answer 2 of What is tail call optimization? (Score: 588)
Created: 20120322 Last updated: 20150708
Let’s walk through a simple example: the factorial function implemented in C.
We start with the obvious recursive definition
unsigned fac(unsigned n)
{
    if (n < 2) return 1;
    return n * fac(n - 1);
}
A function ends with a tail call if the last operation before the function returns is another function call. If this call invokes the same function, it is tailrecursive.
Even though fac() looks tail-recursive at first glance, it is not, as what actually happens is
unsigned fac(unsigned n)
{
    if (n < 2) return 1;
    unsigned acc = fac(n - 1);
    return n * acc;
}
i.e. the last operation is the multiplication, not the function call.
However, it’s possible to rewrite fac() to be tail-recursive by passing the accumulated value down the call chain as an additional argument and passing only the final result up again as the return value:
unsigned fac_tailrec(unsigned acc, unsigned n); // forward declaration

unsigned fac(unsigned n)
{
    return fac_tailrec(1, n);
}

unsigned fac_tailrec(unsigned acc, unsigned n)
{
    if (n < 2) return acc;
    return fac_tailrec(n * acc, n - 1);
}
Now, why is this useful? Because we immediately return after the tail call, we can discard the previous stack frame before invoking the function in tail position, or, in the case of recursive functions, reuse the stack frame as-is.
The tailcall optimization transforms our recursive code into
unsigned fac_tailrec(unsigned acc, unsigned n)
{
TOP:
    if (n < 2) return acc;
    acc = n * acc;
    n = n - 1;
    goto TOP;
}
This can be inlined into fac() and we arrive at
unsigned fac(unsigned n)
{
    unsigned acc = 1;
TOP:
    if (n < 2) return acc;
    acc = n * acc;
    n = n - 1;
    goto TOP;
}
which is equivalent to
unsigned fac(unsigned n)
{
    unsigned acc = 1;
    for (; n > 1; --n)
        acc *= n;
    return acc;
}
As we can see here, a sufficiently advanced optimizer can replace tail-recursion with iteration, which is far more efficient as you avoid function call overhead and only use a constant amount of stack space.
See also original question in stackoverflow
#17: How do I determine whether my calculation of pi is accurate? (Score: 780)
Created: 20130111 Last updated: 20180406
Tags: algorithm, math, language-agnostic, pi
I was trying various methods to implement a program that gives the digits of pi sequentially. I tried the Taylor series method, but it proved to converge extremely slowly (when I compared my result with the online values after some time). Anyway, I am trying better algorithms.
So, while writing the program I got stuck on a problem, as with all algorithms: how do I know that the n digits that I’ve calculated are accurate?
#17 Best answer 1 of How do I determine whether my calculation of pi is accurate? (Score: 1639)
Created: 20130111 Last updated: 20170831
Since I’m the current world record holder for the most digits of pi, I’ll add my two cents:
Unless you’re actually setting a new world record, the common practice is just to verify the computed digits against the known values. So that’s simple enough.
In fact, I have a webpage that lists snippets of digits for the purpose of verifying computations against them: http://www.numberworld.org/digits/Pi/
But when you get into worldrecord territory, there’s nothing to compare against.
Historically, the standard approach for verifying that computed digits are correct is to recompute the digits using a second algorithm. So if either computation goes bad, the digits at the end won’t match.
This typically more than doubles the amount of time needed (since the second algorithm is usually slower). But it’s the only way to verify the computed digits once you’ve wandered into the uncharted territory of never-before-computed digits and a new world record.
Back in the days when supercomputers were setting the records, two different AGM algorithms were commonly used: the Gauss-Legendre algorithm and Borwein's algorithm.
These are both O(N log(N)^2) algorithms that were fairly easy to implement.
However, nowadays, things are a bit different. In the last three world records, instead of performing two computations, we performed only one computation using the fastest known formula, the Chudnovsky formula.
This algorithm is much harder to implement, but it is a lot faster than the AGM algorithms.
Then we verify the binary digits using the BBP formulas for digit extraction.
This formula allows you to compute arbitrary binary digits without computing all the digits before it. So it is used to verify the last few computed binary digits. Therefore it is much faster than a full computation.
The advantage of this is:
 Only one expensive computation is needed.
The disadvantage is:
 An implementation of the Bailey–Borwein–Plouffe (BBP) formula is needed.
 An additional step is needed to verify the radix conversion from binary to decimal.
(I’ve glossed over some details of why verifying the last few digits implies that all the digits are correct. But it is easy to see this since any computation error will propagate to the last digits.)
Now this last step (verifying the conversion) is actually fairly important. One of the previous world record holders actually called us out on this because, initially, I didn’t give a sufficient description of how it worked.
So I’ve pulled this snippet from my blog:
N = # of decimal digits desired
p = 64-bit prime number
Compute A using base 10 arithmetic and B using binary arithmetic.
If A = B, then with “extremely high probability” the conversion is correct.
For further reading, see my blog post Pi - 5 Trillion Digits.
#17 Best answer 2 of How do I determine whether my calculation of pi is accurate? (Score: 48)
Created: 20130111 Last updated: 20200607
Undoubtedly, for your purposes (which I assume is just a programming exercise), the best thing is to check your results against any of the listings of the digits of pi on the web.
And how do we know that those values are correct? Well, I could say that there are computer-science-y ways to prove that an implementation of an algorithm is correct.
More pragmatically, if different people use different algorithms, and they all agree to (pick a number) a thousand (million, whatever) decimal places, that should give you a warm fuzzy feeling that they got it right.
Historically, William Shanks published pi to 707 decimal places in 1873. Poor guy, he made a mistake starting at the 528th decimal place.
Very interestingly, in 1995 an algorithm was published that could directly calculate the nth digit (base 16) of pi without having to calculate all the previous digits!
Finally, I hope your initial algorithm wasn’t pi/4 = 1 - 1/3 + 1/5 - 1/7 + ... That may be the simplest to program, but it’s also one of the slowest ways to do so. Check out the pi article on Wikipedia for faster approaches.
See also original question in stackoverflow
#18: Sorting 1 million 8-decimal-digit numbers with 1 MB of RAM (Score: 754)
Created: 20121005 Last updated: 20200330
Tags: algorithm, sorting, embedded, ram
I have a computer with 1 MB of RAM and no other local storage. I must use it to accept 1 million 8-digit decimal numbers over a TCP connection, sort them, and then send the sorted list out over another TCP connection.
The list of numbers may contain duplicates, which I must not discard. The code will be placed in ROM, so I need not subtract the size of my code from the 1 MB. I already have code to drive the Ethernet port and handle TCP/IP connections, and it requires 2 KB for its state data, including a 1 KB buffer via which the code will read and write data. Is there a solution to this problem?
#18 Best answer 1 of Sorting 1 million 8-decimal-digit numbers with 1 MB of RAM (Score: 773)
Created: 2012-10-21 Last updated: 2020-03-30
There is one rather sneaky trick not mentioned here so far. We assume that you have no extra way to store data, but that is not strictly true.
One way around your problem is to do the following horrible thing, which should not be attempted by anyone under any circumstances: Use the network traffic to store data. And no, I don’t mean NAS.
You can sort the numbers with only a few bytes of RAM in the following way:
- First take 2 variables: COUNTER and VALUE.
- First set all registers to 0;
- Every time you receive an integer I, increment COUNTER and set VALUE to max(VALUE, I);
- Then send an ICMP echo request packet with data set to I to the router. Erase I and repeat.
- Every time you receive the returned ICMP packet, you simply extract the integer and send it back out again in another echo request. This produces a huge number of ICMP requests scuttling backward and forward containing the integers.

Once COUNTER reaches 1000000, you have all of the values stored in the incessant stream of ICMP requests, and VALUE now contains the maximum integer. Pick some threshold T >> 1000000. Set COUNTER to zero. Every time you receive an ICMP packet, increment COUNTER and send the contained integer I back out in another echo request, unless I = VALUE, in which case transmit it to the destination for the sorted integers. Once COUNTER = T, decrement VALUE by 1, reset COUNTER to zero and repeat. Once VALUE reaches zero you should have transmitted all integers in order from largest to smallest to the destination, and have only used about 47 bits of RAM for the two persistent variables (and whatever small amount you need for the temporary values).
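The counting-down selection loop above can be simulated in a few lines of Python, with an in-memory queue standing in for the packets circulating on the network (the function name and the queue model are my own illustration, not part of the answer):

```python
from collections import deque

def icmp_sort(values):
    # The "network" of in-flight ICMP packets, modelled as a queue.
    network = deque(values)
    value = max(values)      # VALUE: the current maximum being drained
    out = []                 # the destination for sorted integers
    while network:
        # One full circulation: re-emit every packet, except that
        # integers equal to VALUE go to the destination instead.
        for _ in range(len(network)):
            i = network.popleft()
            if i == value:
                out.append(i)
            else:
                network.append(i)
        value -= 1           # then hunt for the next-largest value
    return out               # largest to smallest, duplicates kept

print(icmp_sort([3, 1, 4, 1, 5, 9, 2, 6]))  # → [9, 6, 5, 4, 3, 2, 1, 1]
```

The simulation skips the threshold T (which in the answer only bounds how long one circulation takes) by draining one full pass of the queue per candidate value.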
I know this is horrible, and I know there can be all sorts of practical issues, but I thought it might give some of you a laugh or at least horrify you.
#18 Best answer 2 of Sorting 1 million 8-decimal-digit numbers with 1 MB of RAM (Score: 428)
Created: 2012-10-25 Last updated: 2015-02-25
Here’s some working C++ code which solves the problem.
Proof that the memory constraints are satisfied:
Editor: There is no proof of the maximum memory requirements offered by the author, either in this post or in his blogs. Since the number of bits necessary to encode a value depends on the values previously encoded, such a proof is likely nontrivial. The author notes that the largest encoded size he could stumble upon empirically was 1011732, and chose the buffer size 1013000 arbitrarily.
typedef unsigned int u32;
namespace WorkArea
{
    static const u32 circularSize = 253250;
    u32 circular[circularSize] = { 0 }; // consumes 1013000 bytes

    static const u32 stageSize = 8000;
    u32 stage[stageSize];               // consumes 32000 bytes

    ...
Together, these two arrays take 1045000 bytes of storage. That leaves 1048576 - 1045000 - 2×1024 = 1528 bytes for remaining variables and stack space.
It runs in about 23 seconds on my Xeon W3520. You can verify that the program works using the following Python script, assuming a program name of sort1mb.exe.
from subprocess import Popen, PIPE
import random

sequence = [random.randint(0, 99999999) for i in range(1000000)]
sorter = Popen('sort1mb.exe', stdin=PIPE, stdout=PIPE, text=True)
for value in sequence:
    sorter.stdin.write('%08d\n' % value)
sorter.stdin.close()
result = [int(line) for line in sorter.stdout]
print('OK!' if result == sorted(sequence) else 'Error!')
A detailed explanation of the algorithm can be found in the following series of posts:
- 1MB Sorting Explained
- Arithmetic Coding and the 1MB Sorting Problem
- Arithmetic Encoding Using Fixed-Point Math
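The editor's note above can be sanity-checked against the information-theoretic lower bound: a sorted multiset of 10^6 values drawn from [0, 10^8) needs at least log2 C(10^8 + 10^6 - 1, 10^6) bits to encode. A small Python sketch (function name mine) computes this via log-gamma to avoid enormous integers:

```python
import math

def log2_binomial(n, k):
    """log2 of C(n, k), computed with log-gamma to avoid huge integers."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1)
            - math.lgamma(n - k + 1)) / math.log(2)

# Multisets of size k drawn from a universe of n values: C(n + k - 1, k).
n, k = 10**8, 10**6
min_bytes = log2_binomial(n + k - 1, k) / 8

# Just over 1.01 million bytes — close to the author's empirically
# observed maximum encoded size of 1011732, and under the 1 MB budget.
print(round(min_bytes))
```

This is why the 1013000-byte buffer, however arbitrary, has so little slack: the data itself is barely compressible below about a million bytes.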
See also the original question on Stack Overflow
#19: Expand a random range from 1–5 to 1–7 (Score: 703)
Created: 2008-09-26 Last updated: 2012-09-14
Tags: algorithm, random, puzzle
Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.
- What is a simple solution?
- What is an effective solution to reduce memory usage or run on a slower CPU?
#19 Best answer 1 of Expand a random range from 1–5 to 1–7 (Score: 581)
Created: 2009-05-08 Last updated: 2010-10-05
This is equivalent to Adam Rosenfield’s solution, but may be a bit more clear for some readers. It assumes rand5() is a function that returns a statistically random integer in the range 1 through 5 inclusive.
int rand7()
{
    int vals[5][5] = {
        { 1, 2, 3, 4, 5 },
        { 6, 7, 1, 2, 3 },
        { 4, 5, 6, 7, 1 },
        { 2, 3, 4, 5, 6 },
        { 7, 0, 0, 0, 0 }
    };

    int result = 0;
    while (result == 0)
    {
        int i = rand5();
        int j = rand5();
        result = vals[i - 1][j - 1]; // pick a random cell; 0 means "throw again"
    }
    return result;
}
How does it work? Think of it like this: imagine printing out this two-dimensional array on paper, tacking it up to a dart board and randomly throwing darts at it. If you hit a nonzero value, it’s a statistically random value between 1 and 7, since there are an equal number of nonzero values to choose from. If you hit a zero, just keep throwing the dart until you hit a nonzero. That’s what this code is doing: the i and j indexes randomly select a location on the dart board, and if we don’t get a good result, we keep throwing darts.
Like Adam said, this can run forever in the worst case, but statistically the worst case never happens. :)
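The dart-board argument can also be checked deterministically: of the 25 cells, exactly 21 are nonzero and each value 1 through 7 fills exactly three of them, so the accepted outcomes are uniform. A Python port (the random-module stub for rand5 is my own assumption for illustration):

```python
import random

VALS = [
    [1, 2, 3, 4, 5],
    [6, 7, 1, 2, 3],
    [4, 5, 6, 7, 1],
    [2, 3, 4, 5, 6],
    [7, 0, 0, 0, 0],
]

def rand5():
    return random.randint(1, 5)   # stand-in for the given rand5()

def rand7():
    result = 0
    while result == 0:            # re-throw the dart on a zero cell
        result = VALS[rand5() - 1][rand5() - 1]
    return result

# Each of 1..7 occupies exactly 3 of the 21 nonzero cells.
cells = [v for row in VALS for v in row if v != 0]
print(all(cells.count(v) == 3 for v in range(1, 8)))  # → True
```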
#19 Best answer 2 of Expand a random range from 1–5 to 1–7 (Score: 360)
Created: 2008-09-26
There is no (exactly correct) solution which will run in a constant amount of time, since 1/7 is an infinite decimal in base 5. One simple solution would be to use rejection sampling, e.g.:
int i;
do
{
    i = 5 * (rand5() - 1) + rand5(); // i is now uniformly random between 1 and 25
} while (i > 21);
// i is now uniformly random between 1 and 21
return i % 7 + 1; // result is now uniformly random between 1 and 7
This has an expected runtime of 25/21 = 1.19 iterations of the loop, but there is an infinitesimally small probability of looping forever.
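The uniformity of this rejection sampler can be verified exhaustively over all 25 equally likely (rand5, rand5) pairs, with no randomness involved (the helper function and its name are my own illustration):

```python
def rand7_from_pair(a, b):
    """Map one (rand5, rand5) draw to 1..7, or None to signal a re-roll."""
    i = 5 * (a - 1) + b          # uniform over 1..25
    if i > 21:
        return None              # rejected: roll both again
    return i % 7 + 1             # 21 accepted outcomes, 3 per value

# Exhaustive check: of the 25 pairs, 21 are accepted and each of
# 1..7 appears exactly 3 times, so accepted outcomes are uniform.
outcomes = [rand7_from_pair(a, b) for a in range(1, 6) for b in range(1, 6)]
accepted = [x for x in outcomes if x is not None]
print(len(accepted), all(accepted.count(v) == 3 for v in range(1, 8)))  # → 21 True
```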
See also the original question on Stack Overflow
#20: How do I create a URL shortener? (Score: 703)
Created: 2009-04-12 Last updated: 2018-10-17
Tags: algorithm, url
I want to create a URL shortener service where you can write a long URL into an input field and the service shortens the URL to “http://www.example.org/abcdef”.
Instead of “abcdef” there can be any other string with six characters containing a-z, A-Z and 0-9. That makes 56~57 billion possible strings.
My approach:
I have a database table with three columns:
- id, integer, auto-increment
- long, string, the long URL the user entered
- short, string, the shortened URL (or just the six characters)
I would then insert the long URL into the table. Then I would select the auto-increment value for “id” and build a hash of it. This hash should then be inserted as “short”. But what sort of hash should I build? Hash algorithms like MD5 create strings that are too long, so I don’t think I’ll use those. A self-built algorithm will work, too.
My idea:
For “http://www.google.de/” I get the auto-increment id 239472. Then I do the following steps:
short = '';
if divisible by 2, add "a"+the result to short
if divisible by 3, add "b"+the result to short
... until I have divisors for az and AZ.
That could be repeated until the number isn’t divisible any more. Do you think this is a good approach? Do you have a better idea?
Due to the ongoing interest in this topic, I’ve published an efficient solution to GitHub, with implementations for JavaScript, PHP, Python and Java. Add your solutions if you like :)
#20 Best answer 1 of How do I create a URL shortener? (Score: 862)
Created: 2009-04-12 Last updated: 2019-07-08
I would continue your “convert number to string” approach. However, you will realize that your proposed algorithm fails if your ID is a prime and greater than 52.
Theoretical background
You need a Bijective Function f. This is necessary so that you can find an inverse function g(‘abc’) = 123 for your f(123) = ‘abc’ function. This means:
- There must be no x1, x2 (with x1 ≠ x2) that will make f(x1) = f(x2),
- and for every y you must be able to find an x so that f(x) = y.
How to convert the ID to a shortened URL

1. Think of an alphabet we want to use. In your case, that’s [a-zA-Z0-9]. It contains 62 letters.
2. Take an auto-generated, unique numerical key (the auto-incremented id of a MySQL table, for example). For this example, I will use 125_{10} (125 with a base of 10).
3. Now you have to convert 125_{10} to X_{62} (base 62): 125_{10} = 2×62^{1} + 1×62^{0} = [2,1]
This requires the use of integer division and modulo. A pseudocode example:
digits = []
while num > 0
    remainder = modulo(num, 62)
    digits.push(remainder)
    num = divide(num, 62)
digits = digits.reverse
4. Now map the indices 2 and 1 to your alphabet. This is how your mapping (with an array, for example) could look:
0 → a
1 → b
...
25 → z
...
52 → 0
61 → 9
With 2 → c and 1 → b, you will receive cb_{62} as the shortened URL: http://shor.ty/cb
How to resolve a shortened URL to the initial ID
The reverse is even easier. You just do a reverse lookup in your alphabet.

e9a_{62} will be resolved to “the 4th, 61st, and 0th letters in the alphabet”:
e9a_{62} = [4,61,0] = 4×62^{2} + 61×62^{1} + 0×62^{0} = 19158_{10}
Now find your database record with WHERE id = 19158 and do the redirect.
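The encode/decode steps above translate almost directly into Python. A minimal sketch (the function names are my own; the alphabet order matches the 0 → a … 52 → 0 … 61 → 9 mapping above):

```python
# Lowercase, then uppercase, then digits: index 0 is 'a', index 61 is '9'.
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def encode(num):
    """Convert a database id to its base-62 short string."""
    if num == 0:
        return ALPHABET[0]
    digits = []
    while num > 0:
        num, remainder = divmod(num, 62)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))

def decode(short):
    """Convert a short string back to the database id."""
    num = 0
    for ch in short:
        num = num * 62 + ALPHABET.index(ch)
    return num

print(encode(125), decode("e9a"))  # → cb 19158
```

Because encode and decode are inverses, no hash collisions are possible: every id maps to exactly one short string and back.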
#20 Best answer 2 of How do I create a URL shortener? (Score: 57)
Created: 2009-04-12 Last updated: 2020-07-24
Why would you want to use a hash?
You can just use a simple translation of your auto-increment value to an alphanumeric value. You can do that easily by using some base conversion. Say your character space (A-Z, a-z, 0-9, etc.) has 62 characters: convert the id to a base-62 number and use the characters as the digits.
See also the original question on Stack Overflow
Notes:
- This page uses an API to get the relevant data from the stackoverflow community.
- Content license on this page is CC BY-SA 3.0.
- score = up votes - down votes.