
Most-voted algorithm questions, part 5

  • #41 Check if all elements in a list are identical
  • #42 Constant Amortized Time
  • #43 Generating all permutations of a given string
  • #44 How to implement a queue using two stacks?
  • #45 Getting the closest string match
  • #46 Image comparison - fast algorithm
  • #47 Best algorithm for detecting cycles in a directed graph
  • #48 Fastest sort of fixed length 6 int array
  • #49 Efficiency of purely functional programming
  • #50 Why is quicksort better than mergesort?

Read all of the top-voted questions and answers on a single page.

#41: Check if all elements in a list are identical (Score: 445)

Created: 2010-10-02 Last updated: 2020-11-01

Tags: python, algorithm, comparison

I need a function which takes in a list and outputs True if all elements in the input list evaluate as equal to each other using the standard equality operator and False otherwise.

I feel it would be best to iterate through the list comparing adjacent elements and then AND all the resulting Boolean values. But I’m not sure what’s the most Pythonic way to do that.

#41 Best answer 1 of Check if all elements in a list are identical (Score: 477)

Created: 2010-10-02 Last updated: 2021-03-12

Use itertools.groupby (see the itertools recipes):

from itertools import groupby

def all_equal(iterable):
    g = groupby(iterable)                          # groupby collapses runs of equal values
    return next(g, True) and not next(g, False)    # at most one group means all equal (or empty)

or without groupby:

def all_equal(iterator):
    iterator = iter(iterator)
    try:
        first = next(iterator)
    except StopIteration:
        return True
    return all(first == x for x in iterator)

There are a number of alternative one-liners you might consider:

  1. Converting the input to a set and checking that it has only one item (or zero items, if the input is empty)

    def all_equal2(iterator):
        return len(set(iterator)) <= 1
    
  2. Comparing against the input list without the first item

    def all_equal3(lst):
        return lst[:-1] == lst[1:]
    
  3. Counting how many times the first item appears in the list

    def all_equal_ivo(lst):
        return not lst or lst.count(lst[0]) == len(lst)
    
  4. Comparing against a list of the first element repeated

    def all_equal_6502(lst):
        return not lst or [lst[0]]*len(lst) == lst
    

But they have some downsides, namely:

  1. all_equal and all_equal2 accept any iterable, but the others require a sequence input, typically a concrete container like a list or tuple.
  2. all_equal and all_equal3 stop as soon as a difference is found (they "short-circuit"), whereas the other alternatives iterate over the entire list, even if the answer can be determined from the first two elements alone.
  3. In all_equal2 the contents must be hashable; a list of lists, for example, will raise a TypeError.
  4. all_equal2 (in the worst case) and all_equal_6502 create a copy of the list, meaning you need to use double the memory.

On Python 3.9, using perfplot, we get these timings (lower Runtime [s] is better):

[Benchmark plots omitted.] For a list with a difference in the first two elements, groupby is fastest; for a list with no differences, count(l[0]) is fastest.

#41 Best answer 2 of Check if all elements in a list are identical (Score: 326)

Created: 2010-10-02 Last updated: 2021-05-03

A solution faster than using set() that works on sequences (but not arbitrary iterables) is to simply count the first element. This assumes the list is non-empty (that's trivial to check; decide for yourself what the outcome should be for an empty list):

x.count(x[0]) == len(x)

some simple benchmarks:

>>> timeit.timeit('len(set(s1))<=1', 's1=[1]*5000', number=10000)
1.4383411407470703
>>> timeit.timeit('len(set(s1))<=1', 's1=[1]*4999+[2]', number=10000)
1.4765670299530029
>>> timeit.timeit('s1.count(s1[0])==len(s1)', 's1=[1]*5000', number=10000)
0.26274609565734863
>>> timeit.timeit('s1.count(s1[0])==len(s1)', 's1=[1]*4999+[2]', number=10000)
0.25654196739196777

See also original question in stackoverflow

#42: Constant Amortized Time (Score: 443)

Created: 2008-10-14 Last updated: 2008-10-14

Tags: algorithm, complexity-theory, big-o

What is meant by “Constant Amortized Time” when talking about time complexity of an algorithm?

#42 Best answer 1 of Constant Amortized Time (Score: 822)

Created: 2008-10-30 Last updated: 2009-05-05

Amortised time explained in simple terms:

If you do an operation say a million times, you don’t really care about the worst-case or the best-case of that operation - what you care about is how much time is taken in total when you repeat the operation a million times.

So it doesn’t matter if the operation is very slow once in a while, as long as “once in a while” is rare enough for the slowness to be diluted away. Essentially amortised time means “average time taken per operation, if you do many operations”. Amortised time doesn’t have to be constant; you can have linear and logarithmic amortised time or whatever else.

Let’s take mats' example of a dynamic array, to which you repeatedly add new items. Normally adding an item takes constant time (that is, O(1)). But each time the array is full, you allocate twice as much space, copy your data into the new region, and free the old space. Assuming allocations and frees run in constant time, this enlargement process takes O(n) time, where n is the current size of the array.

So each time you enlarge, you take about twice as much time as the last enlarge. But you’ve also waited twice as long before doing it! The cost of each enlargement can thus be “spread out” among the insertions. This means that in the long term, the total time taken for adding m items to the array is O(m), and so the amortised time (i.e. time per insertion) is O(1).
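To make the arithmetic concrete, here is a small Python sketch (my illustration, not part of the answer) that simulates a doubling dynamic array and counts how many element copies the enlargements cause over m appends. The total stays proportional to m, so the cost per append is O(1) amortised even though individual appends are occasionally O(n).

def count_copies(m):
    """Simulate appending m items to a doubling dynamic array.

    Returns the total number of element copies caused by resizing.
    """
    capacity, size, copies = 1, 0, 0
    for _ in range(m):
        if size == capacity:      # array is full: allocate twice the space
            copies += size        # every existing element is copied over
            capacity *= 2
        size += 1                 # the append itself
    return copies

for m in (10, 1_000, 1_000_000):
    c = count_copies(m)
    print(f"{m} appends -> {c} copies ({c / m:.2f} copies per append)")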

#42 Best answer 2 of Constant Amortized Time (Score: 55)

Created: 2008-10-14 Last updated: 2017-05-23

It means that over time, the worst-case scenario averages out to O(1), or constant time. A common example is the dynamic array. If we have already allocated memory for a new entry, adding it will be O(1). If we haven’t, we allocate, say, twice the current amount; that particular insertion will not be O(1), but rather something else.

What is important is that the algorithm guarantees that after a sequence of operations the expensive operations will be amortised, rendering the cost per operation O(1).

Or in more strict terms,

There is a constant c, such that for every sequence of operations (also one ending with a costly operation) of length L, the time is not greater than c*L (Thanks Rafał Dowgird)

See also original question in stackoverflow

#43: Generating all permutations of a given string (Score: 439)

Created: 2010-11-21 Last updated: 2020-02-01

Tags: java, algorithm

What is an elegant way to find all the permutations of a string? E.g. the permutations of ba would be ba and ab, but what about a longer string such as abcdefgh? Is there any Java implementation example?

#43 Best answer 1 of Generating all permutations of a given string (Score: 628)

Created: 2010-11-21 Last updated: 2016-01-21

public static void permutation(String str) { 
    permutation("", str); 
}
   
private static void permutation(String prefix, String str) {
    int n = str.length();
    if (n == 0) System.out.println(prefix);
    else {
        for (int i = 0; i < n; i++)
            permutation(prefix + str.charAt(i), str.substring(0, i) + str.substring(i+1, n));
    }
}

(via Introduction to Programming in Java)

#43 Best answer 2 of Generating all permutations of a given string (Score: 201)

Created: 2010-11-21 Last updated: 2010-11-21

Use recursion.

  • Try each of the letters in turn as the first letter and then find all the permutations of the remaining letters using a recursive call.
  • The base case is when the input is an empty string: the only permutation is the empty string. (A Python sketch of this idea follows below.)
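A minimal Python sketch of the same recursion (my code, not part of the original answer): pick each character in turn as the first letter and recurse on the remaining characters.

def permutations(s):
    """Return all permutations of the string s (duplicates included)."""
    if s == "":
        return [""]                        # base case: only permutation of "" is ""
    result = []
    for i, ch in enumerate(s):
        rest = s[:i] + s[i + 1:]           # the string without the i-th character
        for tail in permutations(rest):    # permute the remaining letters
            result.append(ch + tail)
    return result

print(permutations("ba"))    # ['ba', 'ab']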

See also original question in stackoverflow

#44: How to implement a queue using two stacks? (Score: 420)

Created: 2008-09-16 Last updated: 2016-08-23

Tags: algorithm, data-structures, stack, queue

Suppose we have two stacks and no other temporary variable.

Is it possible to "construct" a queue data structure using only the two stacks?

#44 Best answer 1 of How to implement a queue using two stacks? (Score: 741)

Created: 2008-09-16 Last updated: 2016-12-22

Keep 2 stacks, let’s call them inbox and outbox.

Enqueue:

  • Push the new element onto inbox

Dequeue:

  • If outbox is empty, refill it by popping each element from inbox and pushing it onto outbox

  • Pop and return the top element from outbox

Using this method, each element will be in each stack exactly once - meaning each element will be pushed twice and popped twice, giving amortized constant time operations.

Here’s an implementation in Java:

import java.util.Stack;

public class Queue<E>
{
	
    private Stack<E> inbox = new Stack<E>();
    private Stack<E> outbox = new Stack<E>();
	
    public void queue(E item) {
        inbox.push(item);
    }
	
    public E dequeue() {
        if (outbox.isEmpty()) {
            while (!inbox.isEmpty()) {
               outbox.push(inbox.pop());
            }
        }
        return outbox.pop();
    }

}

#44 Best answer 2 of How to implement a queue using two stacks? (Score: 267)

Created: 2016-08-22 Last updated: 2016-08-23

A - How To Reverse A Stack

To understand how to construct a queue using two stacks, you should clearly understand how to reverse a stack. Remember how a stack works: it is very similar to a stack of dishes in your kitchen. The last washed dish will be on top of the clean pile, which is called Last In First Out (LIFO) in computer science.

Let's imagine our stack as a bottle, as below:

[Image: an empty stack drawn as a bottle]

If we push the integers 1, 2, 3 respectively, then 3 will be on the top of the stack: 1 is pushed first, then 2 is put on top of 1, and lastly 3 is put on top of the stack. The latest state of our stack, represented as a bottle, is shown below:

[Image: the bottle holding 3, 2, 1 from top to bottom]

Now our stack, represented as a bottle, is populated with the values 3, 2, 1. We want to reverse the stack so that the top element will be 1 and the bottom element will be 3. What can we do? We could take the bottle and hold it upside down so that all the values reverse in order:

[Image: the bottle held upside down, reversing the order of the values]

Yes, we can do that, but that's a bottle. To do the same with a stack, we need a second stack that stores the first stack's elements in reverse order. Let's put our populated stack on the left and our new, empty stack on the right. To reverse the order of the elements, we pop each element from the left stack and push it onto the right stack. You can see what happens as we do so in the image below:

[Image: popping each element from the left stack and pushing it onto the right stack]

So we know how to reverse a stack.

B - Using Two Stacks As A Queue

In the previous part, I explained how we can reverse the order of stack elements. This was important, because if we push and pop elements to and from a stack, the output will be in exactly the reverse order of a queue. Thinking of an example, let's push the array of integers {1, 2, 3, 4, 5} onto a stack. If we pop the elements and print them until the stack is empty, we get the array in the reverse of the push order, which is {5, 4, 3, 2, 1}. Remember that for the same input, if we dequeue a queue until it is empty, the output will be {1, 2, 3, 4, 5}. So it is obvious that for the same input order of elements, the output of a queue is exactly the reverse of the output of a stack. As we know how to reverse a stack using an extra stack, we can construct a queue using two stacks.

Our queue model will consist of two stacks. One stack will be used for the enqueue operation (stack #1 on the left, called the Input Stack); the other stack will be used for the dequeue operation (stack #2 on the right, called the Output Stack). Check out the image below:

[Image: two empty stacks side by side, the Input Stack (Stack #1) on the left and the Output Stack (Stack #2) on the right]

Our pseudo-code is as below:


Enqueue Operation

Push every input element to the Input Stack

Dequeue Operation

If ( Output Stack is Empty)
    pop every element in the Input Stack
    and push them to the Output Stack until Input Stack is Empty

pop from Output Stack

Let’s enqueue the integers {1, 2, 3} respectively. The integers will be pushed onto the Input Stack (Stack #1), which is located on the left:

[Image: the Input Stack holding 3, 2, 1 from top to bottom; the Output Stack empty]

Then what happens if we execute a dequeue operation? Whenever a dequeue operation is executed, the queue checks whether the Output Stack is empty (see the pseudo-code above). If the Output Stack is empty, the Input Stack is emptied into the Output Stack, so the elements of the Input Stack end up reversed. Before a value is returned, the state of the queue will be as below:

[Image: the Input Stack empty; the Output Stack holding 1, 2, 3 from top to bottom]

Check out the order of the elements in the Output Stack (Stack #2). Clearly we can pop the elements from the Output Stack and the output will be the same as if we had dequeued from a queue. Thus, if we execute two dequeue operations, we first get {1, 2} respectively. Then element 3 is the only element in the Output Stack, and the Input Stack is empty. If we enqueue the elements 4 and 5, the state of the queue will be as follows:

[Image: the Output Stack holding only 3; the Input Stack holding 5, 4 from top to bottom]

Now the Output Stack is not empty, so if we execute a dequeue operation, only 3 will be popped from the Output Stack. The state will then be as below:

[Image: the Output Stack empty; the Input Stack holding 5, 4 from top to bottom]

Again, let's execute two more dequeue operations. On the first one, the queue checks whether the Output Stack is empty, which is true, so it pops the elements of the Input Stack and pushes them onto the Output Stack until the Input Stack is empty. The state of the queue will then be as below:

[Image: the Input Stack empty; the Output Stack holding 4, 5 from top to bottom]

It is easy to see that the output of the two dequeue operations will be {4, 5}.

C - Implementation Of Queue Constructed with Two Stacks

Here is an implementation in Java. I’m not going to use the existing implementation of Stack, so the example here reinvents the wheel:

C - 1) MyStack class : A Simple Stack Implementation

public class MyStack<T> {
	
    // inner generic Node class
    private class Node<T> {
        T data;
        Node<T> next;
		
        public Node(T data) {
            this.data = data;
        }
    }
	
    private Node<T> head;
    private int size;
	
    public void push(T e) {
        Node<T> newElem = new Node<>(e);
		
        if(head == null) {
            head = newElem;
        } else {
            newElem.next = head;
            head = newElem;		// new elem on the top of the stack
        }
		
        size++;
    }
	
    public T pop() {
        if(head == null)
            return null;
		
        T elem = head.data;
        head = head.next;	// top of the stack is head.next
		
        size--;
		
        return elem;
    }
	
    public int size() {
        return size;
    }
	
    public boolean isEmpty() {
        return size == 0;
    }
	
    public void printStack() {
        System.out.print("Stack: ");
		
        if(size == 0)
            System.out.print("Empty !");
        else
            for(Node<T> temp = head; temp != null; temp = temp.next)
                System.out.printf("%s ", temp.data);
		
        System.out.printf("\n");
    }
}

C - 2) MyQueue class : Queue Implementation Using Two Stacks

public class MyQueue<T> {
	
    private MyStack<T> inputStack;		// for enqueue
    private MyStack<T> outputStack;		// for dequeue
    private int size;
	
    public MyQueue() {
        inputStack = new MyStack<>();
        outputStack = new MyStack<>();
    }
	
    public void enqueue(T e) {
        inputStack.push(e);
        size++;
    }
	
    public T dequeue() {
        // fill out all the Input if output stack is empty
        if(outputStack.isEmpty())
            while(!inputStack.isEmpty())
                outputStack.push(inputStack.pop());
		
        T temp = null;
        if(!outputStack.isEmpty()) {
            temp = outputStack.pop();
            size--;
        }
			
        return temp;
    }
	
    public int size() {
        return size;
    }
	
    public boolean isEmpty() {
        return size == 0;
    }
	
}

C - 3) Demo Code

public class TestMyQueue {

    public static void main(String[] args) {
        MyQueue<Integer> queue = new MyQueue<>();
		
        // enqueue integers 1..3
        for(int i = 1; i <= 3; i++)
            queue.enqueue(i);
		
        // execute 2 dequeue operations 
        for(int i = 0; i < 2; i++)
            System.out.println("Dequeued: " + queue.dequeue());
		
        // enqueue integers 4..5
        for(int i = 4; i <= 5; i++)
            queue.enqueue(i);
		
        // dequeue the rest
        while(!queue.isEmpty())
            System.out.println("Dequeued: " + queue.dequeue());
    }

}

C - 4) Sample Output

Dequeued: 1
Dequeued: 2
Dequeued: 3
Dequeued: 4
Dequeued: 5

See also original question in stackoverflow

#45: Getting the closest string match (Score: 417)

Created: 2011-05-02 Last updated: 2014-11-14

Tags: algorithm, language-agnostic, string-comparison, levenshtein-distance

I need a way to compare multiple strings to a test string and return the string that closely resembles it:

TEST STRING: THE BROWN FOX JUMPED OVER THE RED COW

CHOICE A   : THE RED COW JUMPED OVER THE GREEN CHICKEN
CHOICE B   : THE RED COW JUMPED OVER THE RED COW
CHOICE C   : THE RED FOX JUMPED OVER THE BROWN COW

(If I did this correctly) The closest string to the “TEST STRING” should be “CHOICE C”. What is the easiest way to do this?

I plan on implementing this into multiple languages including VB.net, Lua, and JavaScript. At this point, pseudo code is acceptable. If you can provide an example for a specific language, this is appreciated too!

#45 Best answer 1 of Getting the closest string match (Score: 983)

Created: 2011-05-02 Last updated: 2017-04-12

I was presented with this problem about a year ago when it came to looking up user-entered information about an oil rig in a database of miscellaneous information. The goal was to do some sort of fuzzy string search that could identify the database entry with the most common elements.

Part of the research involved implementing the Levenshtein distance algorithm, which determines how many changes must be made to a string or phrase to turn it into another string or phrase.

The implementation I came up with was relatively simple, and involved a weighted comparison of the length of the two phrases, the number of changes between each phrase, and whether each word could be found in the target entry.

The article is on a private site so I’ll do my best to append the relevant contents here:


Fuzzy String Matching is the process of performing a human-like estimation of the similarity of two words or phrases. In many cases, it involves identifying words or phrases which are most similar to each other. This article describes an in-house solution to the fuzzy string matching problem and its usefulness in solving a variety of problems which can allow us to automate tasks which previously required tedious user involvement.

Introduction

The need to do fuzzy string matching originally came about while developing the Gulf of Mexico Validator tool. What existed was a database of known Gulf of Mexico oil rigs and platforms, and people buying insurance would give us some badly typed-out information about their assets, which we had to match against the database of known platforms. When there was very little information given, the best we could do was rely on an underwriter to “recognize” the one they were referring to and call up the proper information. This is where this automated solution comes in handy.

I spent a day researching methods of fuzzy string matching, and eventually stumbled upon the very useful Levenshtein distance algorithm on Wikipedia.

Implementation

After reading about the theory behind it, I implemented it and found ways to optimize it. This is what my code looks like in VBA:

'Calculate the Levenshtein Distance between two strings (the number of insertions,
'deletions, and substitutions needed to transform the first string into the second)
Public Function LevenshteinDistance(ByRef S1 As String, ByVal S2 As String) As Long
    Dim L1 As Long, L2 As Long, D() As Long 'Length of input strings and distance matrix
    Dim i As Long, j As Long, cost As Long 'loop counters and cost of substitution for current letter
    Dim cI As Long, cD As Long, cS As Long 'cost of next Insertion, Deletion and Substitution
    L1 = Len(S1): L2 = Len(S2)
    ReDim D(0 To L1, 0 To L2)
    For i = 0 To L1: D(i, 0) = i: Next i
    For j = 0 To L2: D(0, j) = j: Next j
    
    For j = 1 To L2
        For i = 1 To L1
            cost = Abs(StrComp(Mid$(S1, i, 1), Mid$(S2, j, 1), vbTextCompare))
            cI = D(i - 1, j) + 1
            cD = D(i, j - 1) + 1
            cS = D(i - 1, j - 1) + cost
            If cI <= cD Then 'Insertion or Substitution
                If cI <= cS Then D(i, j) = cI Else D(i, j) = cS
            Else 'Deletion or Substitution
                If cD <= cS Then D(i, j) = cD Else D(i, j) = cS
            End If
        Next i
    Next j
    LevenshteinDistance = D(L1, L2)
End Function

Simple, speedy, and a very useful metric. Using this, I created two separate metrics for evaluating the similarity of two strings. One I call “valuePhrase” and one I call “valueWords”. valuePhrase is just the Levenshtein distance between the two phrases, and valueWords splits the string into individual words, based on delimiters such as spaces, dashes, and anything else you’d like, and compares each word to each other word, summing up the shortest Levenshtein distance connecting any two words. Essentially, it measures whether the information in one ‘phrase’ is really contained in another, just as a word-wise permutation. I spent a few days as a side project coming up with the most efficient way possible of splitting a string based on delimiters.

valueWords, valuePhrase, and Split function:

Public Function valuePhrase#(ByRef S1$, ByRef S2$)
    valuePhrase = LevenshteinDistance(S1, S2)
End Function

Public Function valueWords#(ByRef S1$, ByRef S2$)
    Dim wordsS1$(), wordsS2$()
    wordsS1 = SplitMultiDelims(S1, " _-")
    wordsS2 = SplitMultiDelims(S2, " _-")
    Dim word1%, word2%, thisD#, wordbest#
    Dim wordsTotal#
    For word1 = LBound(wordsS1) To UBound(wordsS1)
        wordbest = Len(S2)
        For word2 = LBound(wordsS2) To UBound(wordsS2)
            thisD = LevenshteinDistance(wordsS1(word1), wordsS2(word2))
            If thisD < wordbest Then wordbest = thisD
            If thisD = 0 Then GoTo foundbest
        Next word2
foundbest:
        wordsTotal = wordsTotal + wordbest
    Next word1
    valueWords = wordsTotal
End Function

''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
' SplitMultiDelims
' This function splits Text into an array of substrings, each substring
' delimited by any character in DelimChars. Only a single character
' may be a delimiter between two substrings, but DelimChars may
' contain any number of delimiter characters. It returns a single element
' array containing all of text if DelimChars is empty, or a 1 or greater
' element array if the Text is successfully split into substrings.
' If IgnoreConsecutiveDelimiters is true, empty array elements will not occur.
' If Limit greater than 0, the function will only split Text into 'Limit'
' array elements or less. The last element will contain the rest of Text.
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Function SplitMultiDelims(ByRef Text As String, ByRef DelimChars As String, _
        Optional ByVal IgnoreConsecutiveDelimiters As Boolean = False, _
        Optional ByVal Limit As Long = -1) As String()
    Dim ElemStart As Long, N As Long, M As Long, Elements As Long
    Dim lDelims As Long, lText As Long
    Dim Arr() As String
    
    lText = Len(Text)
    lDelims = Len(DelimChars)
    If lDelims = 0 Or lText = 0 Or Limit = 1 Then
        ReDim Arr(0 To 0)
        Arr(0) = Text
        SplitMultiDelims = Arr
        Exit Function
    End If
    ReDim Arr(0 To IIf(Limit = -1, lText - 1, Limit))
    
    Elements = 0: ElemStart = 1
    For N = 1 To lText
        If InStr(DelimChars, Mid(Text, N, 1)) Then
            Arr(Elements) = Mid(Text, ElemStart, N - ElemStart)
            If IgnoreConsecutiveDelimiters Then
                If Len(Arr(Elements)) > 0 Then Elements = Elements + 1
            Else
                Elements = Elements + 1
            End If
            ElemStart = N + 1
            If Elements + 1 = Limit Then Exit For
        End If
    Next N
    'Get the last token terminated by the end of the string into the array
    If ElemStart <= lText Then Arr(Elements) = Mid(Text, ElemStart)
    'Since the end of string counts as the terminating delimiter, if the last character
    'was also a delimiter, we treat the two as consecutive, and so ignore the last element
    If IgnoreConsecutiveDelimiters Then If Len(Arr(Elements)) = 0 Then Elements = Elements - 1
    
    ReDim Preserve Arr(0 To Elements) 'Chop off unused array elements
    SplitMultiDelims = Arr
End Function

Measures of Similarity

Using these two metrics, and a third which simply computes the distance between two strings, I have a series of variables over which I can run an optimization algorithm to achieve the greatest number of matches. Fuzzy string matching is, itself, a fuzzy science, and so by creating linearly independent metrics for measuring string similarity, and having a known set of strings we wish to match to each other, we can find the parameters that, for our specific styles of strings, give the best fuzzy match results.

Initially, the goal of the metric was to have a low search value for an exact match, and increasing search values for increasingly permuted measures. In an impractical case, this was fairly easy to define using a set of well defined permutations, and engineering the final formula so that they had increasing search values as desired.

[Screenshot: Fuzzy String Matching Permutations]

In the above screenshot, I tweaked my heuristic to come up with something that I felt scaled nicely to my perceived difference between the search term and the result. The heuristic I used for Value Phrase in the above spreadsheet was =valuePhrase(A2,B2)-0.8*ABS(LEN(B2)-LEN(A2)). I was effectively reducing the penalty of the Levenshtein distance by 80% of the difference in the length of the two “phrases”. This way, “phrases” that have the same length suffer the full penalty, but “phrases” which contain ‘additional information’ (are longer) but aside from that still mostly share the same characters suffer a reduced penalty. I used the Value Words function as is, and then my final SearchVal heuristic was defined as =MIN(D2,E2)*0.8+MAX(D2,E2)*0.2, a weighted average: whichever of the two scores was lower got an 80% weight, and the higher score got 20%. This was just a heuristic that suited my use case and gave a good match rate. These weights are something one could then tweak to get the best match rate against their own test data.

[Screenshot: Fuzzy String Matching Value Phrase]

[Screenshot: Fuzzy String Matching Value Words]

As you can see, the last two metrics, which are fuzzy string matching metrics, already have a natural tendency to give low scores to strings that are meant to match (down the diagonal). This is very good.

Application

To allow the optimization of fuzzy matching, I weight each metric. As such, every application of fuzzy string matching can weight the parameters differently. The formula that defines the final score is simply a combination of the metrics and their weights:

value = Min(phraseWeight*phraseValue, wordsWeight*wordsValue)*minWeight
      + Max(phraseWeight*phraseValue, wordsWeight*wordsValue)*maxWeight
      + lengthWeight*lengthValue
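In Python, that score is just the following (the function name and argument order are mine; the weights and metric values are whatever the optimizer supplies):

def combined_score(phrase_value, words_value, length_value,
                   phrase_weight, words_weight, length_weight,
                   min_weight, max_weight):
    """Weighted combination of the fuzzy-matching metrics described above."""
    a = phrase_weight * phrase_value
    b = words_weight * words_value
    return (min(a, b) * min_weight
            + max(a, b) * max_weight
            + length_weight * length_value)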

Using an optimization algorithm (a neural network is best here because it is a discrete, multi-dimensional problem), the goal is now to maximize the number of matches. I created a function that detects the number of correct matches of each set to each other, as can be seen in this final screenshot. A column or row gets a point if the lowest score is assigned to the string that was meant to be matched, and partial points are given if there is a tie for the lowest score and the correct match is among the tied strings. I then optimized it. You can see that a green cell is the column that best matches the current row, and a blue square around a cell marks the row that best matches the current column. The score in the bottom corner is roughly the number of successful matches, and this is what we tell our optimization problem to maximize.

[Screenshot: Fuzzy String Matching Optimized Metric]

The algorithm was a wonderful success, and the solution parameters say a lot about this type of problem. You’ll notice the optimized score was 44, and the best possible score is 48. The 5 columns at the end are decoys, and do not have any match at all to the row values. The more decoys there are, the harder it will naturally be to find the best match.

In this particular matching case, the length of the strings is irrelevant, because we are expecting abbreviations that represent longer words, so the optimal weight for length is -0.3, which means we do not penalize strings which vary in length. We reduce the score in anticipation of these abbreviations, giving more room for partial word matches to supersede non-word matches that simply require fewer substitutions because the string is shorter.

The word weight is 1.0 while the phrase weight is only 0.5, which means that we penalize whole words missing from one string and place more value on the entire phrase being intact. This is useful because a lot of these strings have one word in common (the peril), where what really matters is whether or not the combination (region and peril) is maintained.

Finally, the min weight is optimized at 10 and the max weight at 1. What this means is that if the best of the two scores (value phrase and value words) isn’t very good, the match is greatly penalized, but we don’t greatly penalize the worst of the two scores. Essentially, this puts emphasis on requiring either the valueWord or valuePhrase to have a good score, but not both. A sort of “take what we can get” mentality.

It’s really fascinating what the optimized values of these 5 weights say about the sort of fuzzy string matching taking place. For completely different practical cases of fuzzy string matching, these parameters are very different. I’ve used it for 3 separate applications so far.

While unused in the final optimization, a benchmarking sheet was established which matches columns to themselves, giving perfect results down the diagonal, and lets the user change parameters to control the rate at which scores diverge from 0 and to note innate similarities between search phrases (which could in theory be used to offset false positives in the results).

[Screenshot: Fuzzy String Matching Benchmark]

Further Applications

This solution has potential to be used anywhere where the user wishes to have a computer system identify a string in a set of strings where there is no perfect match. (Like an approximate match vlookup for strings).


So what you should take from this, is that you probably want to use a combination of high level heuristics (finding words from one phrase in the other phrase, length of both phrases, etc) along with the implementation of the Levenshtein distance algorithm. Because deciding which is the “best” match is a heuristic (fuzzy) determination - you’ll have to come up with a set of weights for any metrics you come up with to determine similarity.

With the appropriate set of heuristics and weights, you’ll have your comparison program quickly making the decisions that you would have made.
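Since the question asked for pseudo-code and ports to other languages, here is a compact Python version of the standard Levenshtein dynamic program described above (my sketch, not the answer's VBA code); the word-splitting and weighting heuristics from the answer would be layered on top of a routine like this.

def levenshtein(s1, s2):
    """Insertions, deletions and substitutions needed to turn s1 into s2."""
    prev = list(range(len(s2) + 1))               # distances from "" to prefixes of s2
    for i, c1 in enumerate(s1, start=1):
        curr = [i]                                # distance from s1[:i] to ""
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

test = "THE BROWN FOX JUMPED OVER THE RED COW"
choices = ["THE RED COW JUMPED OVER THE GREEN CHICKEN",
           "THE RED COW JUMPED OVER THE RED COW",
           "THE RED FOX JUMPED OVER THE BROWN COW"]
# Raw edit distance alone; as the answer explains, it is only one ingredient.
print(min(choices, key=lambda c: levenshtein(test, c)))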

#45 Best answer 2 of Getting the closest string match (Score: 92)

Created: 2012-05-04 Last updated: 2012-05-04

This problem turns up all the time in bioinformatics. The accepted answer above (which was great by the way) is known in bioinformatics as the Needleman-Wunsch (compare two strings) and Smith-Waterman (find an approximate substring in a longer string) algorithms. They work great and have been workhorses for decades.

But what if you have a million strings to compare? That’s a trillion pairwise comparisons, each of which is O(n*m)! Modern DNA sequencers easily generate a billion short DNA sequences, each about 200 DNA “letters” long. Typically, we want to find, for each such string, the best match against the human genome (3 billion letters). Clearly, the Needleman-Wunsch algorithm and its relatives will not do.

This so-called “alignment problem” is a field of active research. The most popular algorithms are currently able to find inexact matches between 1 billion short strings and the human genome in a matter of hours on reasonable hardware (say, eight cores and 32 GB RAM).

Most of these algorithms work by quickly finding short exact matches (seeds) and then extending these to the full string using a slower algorithm (for example, the Smith-Waterman). The reason this works is that we are really only interested in a few close matches, so it pays off to get rid of the 99.9…% of pairs that have nothing in common.

How does finding exact matches help finding inexact matches? Well, say we allow only a single difference between the query and the target. It is easy to see that this difference must occur in either the right or left half of the query, and so the other half must match exactly. This idea can be extended to multiple mismatches and is the basis for the ELAND algorithm commonly used with Illumina DNA sequencers.

There are many very good algorithms for doing exact string matching. Given a query string of length 200, and a target string of length 3 billion (the human genome), we want to find any place in the target where there is a substring of length k that matches a substring of the query exactly. A simple approach is to begin by indexing the target: take all k-long substrings, put them in an array and sort them. Then take each k-long substring of the query and search the sorted index. Sorting the index is a one-time preprocessing cost; each search can then be done in O(log n) time.
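A toy Python sketch of that indexing idea (mine, not from the answer): put every k-long substring of the target into a sorted list, then binary-search it for each k-long substring of the query to find seed matches.

import bisect

def build_index(target, k):
    """Sorted list of (k-mer, position) pairs over every k-long substring of target."""
    return sorted((target[i:i + k], i) for i in range(len(target) - k + 1))

def find_seeds(query, index, k):
    """Yield (query_offset, target_position) for every exact k-mer match."""
    kmers = [kmer for kmer, _ in index]
    for q in range(len(query) - k + 1):
        seed = query[q:q + k]
        i = bisect.bisect_left(kmers, seed)         # binary search in the sorted index
        while i < len(kmers) and kmers[i] == seed:  # walk over duplicate k-mers
            yield q, index[i][1]
            i += 1

index = build_index("ACGTACGTGACCT", k=4)
print(list(find_seeds("TACGTG", index, k=4)))       # seeds to extend with a slower aligner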

But storage can be a problem. An index of the 3 billion letter target would need to hold 3 billion pointers and 3 billion k-long words. It would seem hard to fit this in less than several tens of gigabytes of RAM. But amazingly we can greatly compress the index, using the Burrows-Wheeler transform, and it will still be efficiently queryable. An index of the human genome can fit in less than 4 GB RAM. This idea is the basis of popular sequence aligners such as Bowtie and BWA.

Alternatively, we can use a suffix array, which stores only the pointers, yet represents a simultaneous index of all suffixes in the target string (essentially, a simultaneous index for all possible values of k; the same is true of the Burrows-Wheeler transform). A suffix array index of the human genome will take 12 GB of RAM if we use 32-bit pointers.

The links above contain a wealth of information and links to primary research papers. The ELAND link goes to a PDF with useful figures illustrating the concepts involved, and shows how to deal with insertions and deletions.

Finally, while these algorithms have basically solved the problem of (re)sequencing single human genomes (a billion short strings), DNA sequencing technology improves even faster than Moore’s law, and we are fast approaching trillion-letter datasets. For example, there are currently projects underway to sequence the genomes of 10,000 vertebrate species, each a billion letters long or so. Naturally, we will want to do pairwise inexact string matching on the data…

See also original question in stackoverflow

#46: Image comparison - fast algorithm (Score: 417)

Created: 2009-05-09 Last updated: 2013-08-28

Tags: image, algorithm, comparison, computer-vision

I’m looking to create a base table of images and then compare any new images against that to determine if the new image is an exact (or close) duplicate of the base.

For example: if the same image is stored hundreds of times, you could store one copy of it and provide reference links to it. When a new image is entered, you want to compare it to the existing images to make sure it’s not a duplicate … ideas?

One idea of mine was to reduce to a small thumbnail and then randomly pick 100 pixel locations and compare.

#46 Best answer 1 of Image comparison - fast algorithm (Score: 477)

Created: 2009-05-09 Last updated: 2020-04-15

Below are three approaches to solving this problem (and there are many others).

  • The first is a standard approach in computer vision, keypoint matching. This may require some background knowledge to implement, and can be slow.

  • The second method uses only elementary image processing, and is potentially faster than the first approach, and is straightforward to implement. However, what it gains in understandability, it lacks in robustness – matching fails on scaled, rotated, or discolored images.

  • The third method is both fast and robust, but is potentially the hardest to implement.

Keypoint Matching

Better than picking 100 random points is picking 100 important points. Certain parts of an image have more information than others (particularly at edges and corners), and these are the ones you’ll want to use for smart image matching. Google “keypoint extraction” and “keypoint matching” and you’ll find quite a few academic papers on the subject. These days, SIFT keypoints are arguably the most popular, since they can match images under different scales, rotations, and lighting. Some SIFT implementations can be found here.

One downside to keypoint matching is the running time of a naive implementation: O(n^2m), where n is the number of keypoints in each image, and m is the number of images in the database. Some clever algorithms might find the closest match faster, like quadtrees or binary space partitioning.
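For concreteness, here is a minimal keypoint-matching sketch in Python, assuming OpenCV built with SIFT support (opencv-python 4.4 or later); the function name and the 0.75 ratio are my choices, and it illustrates the general approach rather than any particular answer's code. It counts "good" SIFT matches between two images using Lowe's ratio test; more good matches suggests a closer duplicate.

import cv2

def count_sift_matches(path_a, path_b, ratio=0.75):
    """Count SIFT matches that pass Lowe's ratio test (higher = more similar)."""
    img_a = cv2.imread(path_a, cv2.IMREAD_GRAYSCALE)
    img_b = cv2.imread(path_b, cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    _, desc_a = sift.detectAndCompute(img_a, None)
    _, desc_b = sift.detectAndCompute(img_b, None)
    pairs = cv2.BFMatcher().knnMatch(desc_a, desc_b, k=2)
    # keep a match only if it is clearly better than its runner-up
    return sum(1 for p in pairs
               if len(p) == 2 and p[0].distance < ratio * p[1].distance)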


Alternative solution: Histogram method

Another less robust but potentially faster solution is to build feature histograms for each image, and choose the image with the histogram closest to the input image’s histogram. I implemented this as an undergrad, and we used 3 color histograms (red, green, and blue), and two texture histograms, direction and scale. I’ll give the details below, but I should note that this only worked well for matching images VERY similar to the database images. Re-scaled, rotated, or discolored images can fail with this method, but small changes like cropping won’t break the algorithm.

Computing the color histograms is straightforward – just pick the range for your histogram buckets, and for each range, tally the number of pixels with a color in that range. For example, consider the “green” histogram, and suppose we choose 4 buckets for our histogram: 0-63, 64-127, 128-191, and 192-255. Then for each pixel, we look at the green value, and add a tally to the appropriate bucket. When we’re done tallying, we divide each bucket total by the number of pixels in the entire image to get a normalized histogram for the green channel.

For the texture direction histogram, we started by performing edge detection on the image. Each edge point has a normal vector pointing in the direction perpendicular to the edge. We quantized the normal vector’s angle into one of 6 buckets between 0 and PI (since edges have 180-degree symmetry, we converted angles between -PI and 0 to be between 0 and PI). After tallying up the number of edge points in each direction, we have an un-normalized histogram representing texture direction, which we normalized by dividing each bucket by the total number of edge points in the image.

To compute the texture scale histogram, for each edge point, we measured the distance to the next-closest edge point with the same direction. For example, if edge point A has a direction of 45 degrees, the algorithm walks in that direction until it finds another edge point with a direction of 45 degrees (or within a reasonable deviation). After computing this distance for each edge point, we dump those values into a histogram and normalize it by dividing by the total number of edge points.

Now you have 5 histograms for each image. To compare two images, you take the absolute value of the difference between each histogram bucket, and then sum these values. For example, to compare images A and B, we would compute

|A.green_histogram.bucket_1 - B.green_histogram.bucket_1| 

for each bucket in the green histogram, and repeat for the other histograms, and then sum up all the results. The smaller the result, the better the match. Repeat for all images in the database, and the match with the smallest result wins. You’d probably want to have a threshold, above which the algorithm concludes that no match was found.
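As a small Python sketch of that comparison (mine, not from the answer): build a normalized 4-bucket histogram from a channel's 8-bit pixel values and score two images by the summed absolute difference of their buckets.

def channel_histogram(pixels, buckets=4):
    """Normalized histogram of 8-bit channel values (e.g. the green channel)."""
    hist = [0] * buckets
    for value in pixels:                     # value is in 0..255
        hist[value * buckets // 256] += 1    # 4 buckets: 0-63, 64-127, 128-191, 192-255
    return [count / len(pixels) for count in hist]

def histogram_distance(hist_a, hist_b):
    """Sum of absolute bucket differences; smaller means a better match."""
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))

a = channel_histogram([10, 20, 200, 220, 130])
b = channel_histogram([15, 25, 210, 100, 140])
print(histogram_distance(a, b))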


Third Choice - Keypoints + Decision Trees

A third approach that is probably much faster than the other two is using semantic texton forests (PDF). This involves extracting simple keypoints and using a collection of decision trees to classify the image. This is faster than simple SIFT keypoint matching, because it avoids the costly matching process, and the keypoints are much simpler than SIFT, so keypoint extraction is much faster. However, it preserves the SIFT method’s invariance to rotation, scale, and lighting, an important feature that the histogram method lacked.

Update:

My mistake: the Semantic Texton Forests paper isn’t specifically about image matching, but rather region labeling. The original paper that does matching is this one: Keypoint Recognition using Randomized Trees. Also, later papers continue to develop these ideas and represent the state of the art (c. 2010).

#46 Best answer 2 of Image comparison - fast algorithm (Score: 91)

Created: 2011-11-09

The best method I know of is to use a Perceptual Hash. There appears to be a good open source implementation of such a hash available at:

http://phash.org/

The main idea is that each image is reduced down to a small hash code or ‘fingerprint’ by identifying salient features in the original image file and hashing a compact representation of those features (rather than hashing the image data directly). This means that the false positives rate is much reduced over a simplistic approach such as reducing images down to a tiny thumbprint sized image and comparing thumbprints.

phash offers several types of hash and can be used for images, audio or video.
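pHash builds its fingerprint from a DCT of the image; purely to illustrate the "reduce to a small hash and compare" idea, here is a much cruder toy average hash in Python (my sketch, not pHash's algorithm). It assumes the image has already been downscaled to a small grayscale grid such as 8x8.

def average_hash(gray):
    """Toy fingerprint: one bit per pixel, set if the pixel is above the mean.

    gray is a small 2-D list of grayscale values (e.g. an 8x8 thumbnail).
    """
    flat = [p for row in gray for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(h1, h2):
    """Number of differing bits; a small distance suggests near-duplicate images."""
    return bin(h1 ^ h2).count("1")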

See also original question in stackoverflow

#47: Best algorithm for detecting cycles in a directed graph (Score: 413)

Created: 2008-11-04 Last updated: 2014-06-27

Tags: algorithm, graph-theory, directed-graph

What is the most efficient algorithm for detecting all cycles within a directed graph?

I have a directed graph representing a schedule of jobs that need to be executed, a job being a node and a dependency being an edge. I need to detect the error case of a cycle within this graph leading to cyclic dependencies.

#47 Best answer 1 of Best algorithm for detecting cycles in a directed graph (Score: 201)

Created: 2008-11-04 Last updated: 2015-07-28

Tarjan’s strongly connected components algorithm has O(|E| + |V|) time complexity.

For other algorithms, see Strongly connected components on Wikipedia.

#47 Best answer 2 of Best algorithm for detecting cycles in a directed graph (Score: 79)

Created: 2008-11-04 Last updated: 2015-07-28

Given that this is a schedule of jobs, I suspect that at some point you are going to sort them into a proposed order of execution.

If that’s the case, then a topological sort implementation may in any case detect cycles. UNIX tsort certainly does. I think it is likely that it is therefore more efficient to detect cycles at the same time as tsorting, rather than in a separate step.

So the question might become "how do I most efficiently tsort", rather than "how do I most efficiently detect loops". To which the answer is probably "use a library", but failing that, the following Wikipedia article:

http://en.wikipedia.org/wiki/Topological_sorting

has the pseudo-code for one algorithm, and a brief description of another from Tarjan. Both have O(|V| + |E|) time complexity.
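As a sketch of detecting the cycle while sorting (my illustration; the graph encoding is assumed): Kahn's algorithm repeatedly removes jobs that have no remaining prerequisites, and any jobs left over must lie on or behind a cycle.

from collections import deque

def topo_sort(graph):
    """graph maps each job to the list of jobs that depend on it.

    Returns a valid execution order, or raises ValueError if a cycle exists.
    """
    indegree = {node: 0 for node in graph}
    for dependents in graph.values():
        for d in dependents:
            indegree[d] = indegree.get(d, 0) + 1
    ready = deque(n for n, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        n = ready.popleft()
        order.append(n)
        for d in graph.get(n, []):
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)
    if len(order) < len(indegree):
        unresolved = [n for n, deg in indegree.items() if deg > 0]
        raise ValueError("cycle detected; unresolved jobs: %s" % unresolved)
    return order

print(topo_sort({"a": ["b"], "b": ["c"], "c": []}))   # ['a', 'b', 'c']
# topo_sort({"a": ["b"], "b": ["a"]}) raises ValueError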

See also original question in stackoverflow

#48: Fastest sort of fixed length 6 int array (Score: 405)

Created: 2010-05-07 Last updated: 2019-04-08

Tags: algorithm, sorting, optimization, gpgpu, sorting-network

Answering another Stack Overflow question (this one), I stumbled upon an interesting sub-problem. What is the fastest way to sort an array of 6 integers?

As the question is very low level:

  • we can’t assume libraries are available (and the call itself has its cost), only plain C
  • to avoid emptying the instruction pipeline (which has a very high cost) we should probably minimize branches, jumps, and every other kind of control-flow break (like those hidden behind sequence points in && or ||).
  • space is constrained, and minimizing register and memory use is an issue; an in-place sort is probably best.

Really this question is a kind of golf where the goal is not to minimize source length but execution time. I call it 'Zening' code, after the title of the book Zen of Code Optimization by Michael Abrash and its sequels.

As for why it is interesting, there are several layers:

  • the example is simple and easy to understand and measure, not much C skill involved
  • it shows the effect of choosing a good algorithm for the problem, but also the effects of the compiler and the underlying hardware.

Here is my reference (naive, not optimized) implementation and my test set.

#include <stdio.h>

static __inline__ int sort6(int * d){

    char j, i, imin;
    int tmp;
    for (j = 0 ; j < 5 ; j++){
        imin = j;
        for (i = j + 1; i < 6 ; i++){
            if (d[i] < d[imin]){
                imin = i;
            }
        }
        tmp = d[j];
        d[j] = d[imin];
        d[imin] = tmp;
    }
}

static __inline__ unsigned long long rdtsc(void)
{
  unsigned long long int x;
     __asm__ volatile (".byte 0x0f, 0x31" : "=A" (x));
     return x;
}

int main(int argc, char ** argv){
    int i;
    int d[6][6] = {
        {1, 2, 3, 4, 5, 6},
        {6, 5, 4, 3, 2, 1},
        {100, 2, 300, 4, 500, 6},
        {100, 2, 3, 4, 500, 6},
        {1, 200, 3, 4, 5, 600},
        {1, 1, 2, 1, 2, 1}
    };

    unsigned long long cycles = rdtsc();
    for (i = 0; i < 6 ; i++){
        sort6(d[i]);
        /*
         * printf("d%d : %d %d %d %d %d %d\n", i,
         *  d[i][0], d[i][1], d[i][2],
         *  d[i][3], d[i][4], d[i][5]);
        */
    }
    cycles = rdtsc() - cycles;
    printf("Time is %d\n", (unsigned)cycles);
}

Raw results

As the number of variants has become large, I gathered them all in a test suite that can be found here. The actual tests used are a bit less naive than those shown above, thanks to Kevin Stock. You can compile and execute it in your own environment. I’m quite interested in the behavior on different target architectures/compilers. (OK guys, put it in answers, I will +1 every contributor of a new result set.)

I gave the accepted answer to Daniel Stutzbach (for golfing) one year ago, as he was at the source of the fastest solution at that time (sorting networks).

Linux 64 bits, gcc 4.6.1 64 bits, Intel Core 2 Duo E8400, -O2

  • Direct call to qsort library function : 689.38
  • Naive implementation (insertion sort) : 285.70
  • Insertion Sort (Daniel Stutzbach) : 142.12
  • Insertion Sort Unrolled : 125.47
  • Rank Order : 102.26
  • Rank Order with registers : 58.03
  • Sorting Networks (Daniel Stutzbach) : 111.68
  • Sorting Networks (Paul R) : 66.36
  • Sorting Networks 12 with Fast Swap : 58.86
  • Sorting Networks 12 reordered Swap : 53.74
  • Sorting Networks 12 reordered Simple Swap : 31.54
  • Reordered Sorting Network w/ fast swap : 31.54
  • Reordered Sorting Network w/ fast swap V2 : 33.63
  • Inlined Bubble Sort (Paolo Bonzini) : 48.85
  • Unrolled Insertion Sort (Paolo Bonzini) : 75.30

Linux 64 bits, gcc 4.6.1 64 bits, Intel Core 2 Duo E8400, -O1

  • Direct call to qsort library function : 705.93
  • Naive implementation (insertion sort) : 135.60
  • Insertion Sort (Daniel Stutzbach) : 142.11
  • Insertion Sort Unrolled : 126.75
  • Rank Order : 46.42
  • Rank Order with registers : 43.58
  • Sorting Networks (Daniel Stutzbach) : 115.57
  • Sorting Networks (Paul R) : 64.44
  • Sorting Networks 12 with Fast Swap : 61.98
  • Sorting Networks 12 reordered Swap : 54.67
  • Sorting Networks 12 reordered Simple Swap : 31.54
  • Reordered Sorting Network w/ fast swap : 31.24
  • Reordered Sorting Network w/ fast swap V2 : 33.07
  • Inlined Bubble Sort (Paolo Bonzini) : 45.79
  • Unrolled Insertion Sort (Paolo Bonzini) : 80.15

I included both -O1 and -O2 results because, surprisingly, for several programs -O2 is less efficient than -O1. I wonder what specific optimization has this effect?

Comments on proposed solutions

Insertion Sort (Daniel Stutzbach)

As expected, minimizing branches is indeed a good idea.

Sorting Networks (Daniel Stutzbach)

Better than insertion sort. I wondered if the main effect was not simply from avoiding the outer loop. I tried an unrolled insertion sort to check, and indeed we get roughly the same figures (code is here).

Sorting Networks (Paul R)

The best so far. The actual code I used to test is here. I don’t know yet why it is nearly two times as fast as the other sorting network implementation. Parameter passing? Fast max?

Sorting Networks 12 SWAP with Fast Swap

As suggested by Daniel Stutzbach, I combined his 12-swap sorting network with a branchless fast swap (code is here). It is indeed faster, the best so far by a small margin (roughly 5%), as could be expected from using one less swap.

It is also interesting to notice that the branchless swap seems to be much (4 times) less efficient than the simple one using if on PPC architecture.

Calling Library qsort

To give another reference point, I also tried, as suggested, just calling the library qsort (code is here). As expected it is much slower: 10 to 30 times slower. As became obvious with the new test suite, the main problem is the initial load of the library at the first call; after that it compares not so poorly with the other versions. It is just between 3 and 20 times slower on my Linux. On some architectures used for tests by others it even seems to be faster (I’m really surprised by that one, as library qsort uses a more complex API).

Rank order

Rex Kerr proposed a completely different method: for each item of the array, compute its final position directly. This is efficient because computing the rank order needs no branches. The drawback of this method is that it takes three times the memory of the array (one copy of the array, plus variables to store the rank orders). The performance results are very surprising (and interesting). On my reference architecture (a 32-bit OS and an Intel Core2 Quad E8300), the cycle count was slightly below 1000 (like sorting networks with a branching swap). But when compiled and executed on my 64-bit box (Intel Core2 Duo) it performed much better: it became the fastest so far. I finally found out the true reason: my 32-bit box uses gcc 4.4.1 and my 64-bit box gcc 4.4.3, and the latter seems much better at optimizing this particular code (there was very little difference for the other proposals).

update:

As the figures published above show, this effect was further enhanced by later versions of gcc, and Rank Order became consistently twice as fast as any other alternative.

Sorting Networks 12 with reordered Swap

The amazing efficiency of the Rex Kerr proposal with gcc 4.4.3 made me wonder: how could a program with 3 times the memory usage be faster than branchless sorting networks? My hypothesis was that it had fewer read-after-write dependencies, allowing for better use of the superscalar instruction scheduler of the x86. That gave me an idea: reorder the swaps to minimize read-after-write dependencies. More simply put: when you do SWAP(1, 2); SWAP(0, 2); you have to wait for the first swap to finish before performing the second one, because both access a common memory cell. When you do SWAP(1, 2); SWAP(4, 5); the processor can execute both in parallel. I tried it and it works as expected: the sorting networks run about 10% faster.

Sorting Networks 12 with Simple Swap

One year after the original post, Steinar H. Gunderson suggested that we should not try to outsmart the compiler and should keep the swap code simple. It’s indeed a good idea, as the resulting code is about 40% faster! He also proposed a swap optimized by hand using x86 inline assembly that can still spare some more cycles. The most surprising part (it says volumes about programmer psychology) is that one year ago none of us had tried that version of the swap. The code I used to test is here. Others suggested other ways to write a fast C swap, but it yields the same performance as the simple one with a decent compiler.

The “best” code is now as follows:

static inline void sort6_sorting_network_simple_swap(int * d){
#define min(x, y) (x<y?x:y)
#define max(x, y) (x<y?y:x) 
#define SWAP(x,y) { const int a = min(d[x], d[y]); \
                    const int b = max(d[x], d[y]); \
                    d[x] = a; d[y] = b; }
    SWAP(1, 2);
    SWAP(4, 5);
    SWAP(0, 2);
    SWAP(3, 5);
    SWAP(0, 1);
    SWAP(3, 4);
    SWAP(1, 4);
    SWAP(0, 3);
    SWAP(2, 5);
    SWAP(1, 3);
    SWAP(2, 4);
    SWAP(2, 3);
#undef SWAP
#undef min
#undef max
}

If we believe our test set (and, yes, it is quite poor; its mere benefit is being short, simple and easy to understand what we are measuring), the average number of cycles of the resulting code for one sort is below 40 cycles (6 tests are executed). That puts each swap at an average of 4 cycles. I call that amazingly fast. Any other improvements possible?

#48 Best answer 1 of Fastest sort of fixed length 6 int array (Score: 164)

Created: 2010-05-07 Last updated: 2011-08-24

For any optimization, it’s always best to test, test, test. I would try at least sorting networks and insertion sort. If I were betting, I’d put my money on insertion sort based on past experience.

Do you know anything about the input data? Some algorithms will perform better with certain kinds of data. For example, insertion sort performs better on sorted or almost-sorted data, so it will be the better choice if there’s an above-average chance of almost-sorted data.

The algorithm you posted is similar to an insertion sort, but it looks like you’ve minimized the number of swaps at the cost of more comparisons. Comparisons are far more expensive than swaps, though, because branches can cause the instruction pipeline to stall.

Here’s an insertion sort implementation:

static __inline__ int sort6(int *d){
        int i, j;
        for (i = 1; i < 6; i++) {
                int tmp = d[i];
                for (j = i; j >= 1 && tmp < d[j-1]; j--)
                        d[j] = d[j-1];
                d[j] = tmp;
        }
}

Here’s how I’d build a sorting network. First, use this site to generate a minimal set of SWAP macros for a network of the appropriate length. Wrapping that up in a function gives me:

static __inline__ int sort6(int * d){
#define SWAP(x,y) if (d[y] < d[x]) { int tmp = d[x]; d[x] = d[y]; d[y] = tmp; }
    SWAP(1, 2);
    SWAP(0, 2);
    SWAP(0, 1);
    SWAP(4, 5);
    SWAP(3, 5);
    SWAP(3, 4);
    SWAP(0, 3);
    SWAP(1, 4);
    SWAP(2, 5);
    SWAP(2, 4);
    SWAP(1, 3);
    SWAP(2, 3);
#undef SWAP
}

#48 Best answer 2 of Fastest sort of fixed length 6 int array (Score: 63)

Created: 2010-05-07 Last updated: 2017-05-23

Here’s an implementation using sorting networks:

inline void Sort2(int *p0, int *p1)
{
    const int temp = min(*p0, *p1);
    *p1 = max(*p0, *p1);
    *p0 = temp;
}

inline void Sort3(int *p0, int *p1, int *p2)
{
    Sort2(p0, p1);
    Sort2(p1, p2);
    Sort2(p0, p1);
}

inline void Sort4(int *p0, int *p1, int *p2, int *p3)
{
    Sort2(p0, p1);
    Sort2(p2, p3);
    Sort2(p0, p2);	
    Sort2(p1, p3);	
    Sort2(p1, p2);	
}

inline void Sort6(int *p0, int *p1, int *p2, int *p3, int *p4, int *p5)
{
    Sort3(p0, p1, p2);
    Sort3(p3, p4, p5);
    Sort2(p0, p3);	
    Sort2(p2, p5);	
    Sort4(p1, p2, p3, p4);	
}

You really need very efficient branchless min and max implementations for this, since that is effectively what this code boils down to - a sequence of min and max operations (13 of each, in total). I leave this as an exercise for the reader.

Note that this implementation lends itself easily to vectorization (e.g. SIMD - most SIMD ISAs have vector min/max instructions) and also to GPU implementations (e.g. CUDA - being branchless there are no problems with warp divergence etc).

See also: Fast algorithm implementation to sort very small list

See also original question in stackoverflow

#49: Efficiency of purely functional programming (Score: 403)

Created: 2010-01-02 Last updated: 2010-01-23

Tags: algorithm, functional-programming, performance

Does anyone know what the worst possible asymptotic slowdown is when programming purely functionally, as opposed to imperatively (i.e. allowing side effects)?

Clarification from comment by itowlson: is there any problem for which the best known non-destructive algorithm is asymptotically worse than the best known destructive algorithm, and if so by how much?

#49 Best answer 1 of Efficiency of purely functional programming (Score: 538)

Created: 2010-01-02 Last updated: 2010-05-23

According to Pippenger [1996], when comparing a Lisp system that is purely functional (and has strict evaluation semantics, not lazy) to one that can mutate data, an algorithm written for the impure Lisp that runs in O(n) can be translated to an algorithm in the pure Lisp that runs in O(n log n) time (based on work by Ben-Amram and Galil [1992] about simulating random access memory using only pointers). Pippenger also establishes that there are algorithms for which that is the best you can do; there are problems which are O(n) in the impure system which are Ω(n log n) in the pure system.

There are a few caveats to be made about this paper. The most significant is that it does not address lazy functional languages, such as Haskell. Bird, Jones and De Moor [1997] demonstrate that the problem constructed by Pippenger can be solved in a lazy functional language in O(n) time, but they do not establish (and as far as I know, no one has) whether or not a lazy functional language can solve all problems in the same asymptotic running time as a language with mutation.

The problem Pippenger constructs, which requires Ω(n log n) in the pure system, is specifically designed to achieve this result and is not necessarily representative of practical, real-world problems. There are a few restrictions on the problem that are a bit unexpected but necessary for the proof to work; in particular, the problem requires that results are computed on-line, without being able to access future input, and that the input consists of a sequence of atoms from an unbounded set of possible atoms, rather than a fixed-size set. And the paper only establishes (lower bound) results for an impure algorithm of linear running time; for problems that require a greater running time, it is possible that the extra O(log n) factor seen in the linear problem can be “absorbed” by the extra operations necessary for algorithms with greater running times. These clarifications and open questions are explored briefly by Ben-Amram [1996].

In practice, many algorithms can be implemented in a pure functional language at the same efficiency as in a language with mutable data structures. For a good reference on techniques to use for implementing purely functional data structures efficiently, see Chris Okasaki’s “Purely Functional Data Structures” [Okasaki 1998] (which is an expanded version of his thesis [Okasaki 1996]).

Anyone who needs to implement algorithms on purely-functional data structures should read Okasaki. You can always get at worst an O(log n) slowdown per operation by simulating mutable memory with a balanced binary tree, but in many cases you can do considerably better than that, and Okasaki describes many useful techniques, from amortized techniques to real-time ones that do the amortized work incrementally. Purely functional data structures can be a bit difficult to work with and analyze, but they provide many benefits like referential transparency that are helpful in compiler optimization, in parallel and distributed computing, and in implementation of features like versioning, undo, and rollback.
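
As a concrete, simplified illustration of that balanced-tree simulation (a minimal sketch, not code from Okasaki or the answer; balancing is omitted for brevity, though adding it, e.g. as an AVL or red-black tree, is what gives the O(log n) bound), a path-copying persistent binary search tree rebuilds only the nodes on the search path and shares everything else, so every “update” leaves earlier versions intact:

#include <cstdio>
#include <memory>

struct Node {
    int key, val;
    std::shared_ptr<const Node> left, right;   // children are never mutated once built
};
using Tree = std::shared_ptr<const Node>;

// Returns a new root; only the nodes along the search path are rebuilt.
Tree insert(const Tree& t, int key, int val) {
    if (!t) return std::make_shared<const Node>(Node{key, val, nullptr, nullptr});
    if (key < t->key)
        return std::make_shared<const Node>(Node{t->key, t->val,
                                                 insert(t->left, key, val), t->right});
    if (key > t->key)
        return std::make_shared<const Node>(Node{t->key, t->val,
                                                 t->left, insert(t->right, key, val)});
    return std::make_shared<const Node>(Node{key, val, t->left, t->right});  // "overwrite"
}

const Node* find(const Tree& t, int key) {
    const Node* cur = t.get();
    while (cur && cur->key != key)
        cur = (key < cur->key) ? cur->left.get() : cur->right.get();
    return cur;
}

int main() {
    Tree v1 = insert(insert(nullptr, 1, 10), 2, 20);
    Tree v2 = insert(v1, 2, 99);                   // "update" key 2
    std::printf("%d %d\n", find(v1, 2)->val,       // old version still sees 20
                           find(v2, 2)->val);      // new version sees 99
}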

Note also that all of this discusses only asymptotic running times. Many techniques for implementing purely functional data structures give you a certain amount of constant factor slowdown, due to extra bookkeeping necessary for them to work, and implementation details of the language in question. The benefits of purely functional data structures may outweigh these constant factor slowdowns, so you will generally need to make trade-offs based on the problem in question.


#49 Best answer 2 of Efficiency of purely functional programming (Score: 44)

Created: 2010-05-23 Last updated: 2012-04-12

There are indeed several algorithms and data structures for which no asymptotically efficient purely functional solution (i.e. one implementable in pure lambda calculus) is known, even with laziness.

  • The aforementioned union-find
  • Hash tables
  • Arrays
  • Some graph algorithms

However, this assumes that in “imperative” languages access to memory is O(1), whereas in theory that can’t hold asymptotically (i.e. for unbounded problem sizes); access to memory within a huge dataset is always O(log n), and that can be emulated in a functional language.

Also, we must remember that actually all modern functional languages provide mutable data, and Haskell even provides it without sacrificing purity (the ST monad).

See also original question in stackoverflow

#50: Why is quicksort better than mergesort? (Score: 383)

Created: 2008-09-16 Last updated: 2013-07-09

Tags: algorithm, sorting, language-agnostic, quicksort, mergesort

I was asked this question during an interview. They’re both O(nlogn) and yet most people use Quicksort instead of Mergesort. Why is that?

#50 Best answer 1 of Why is quicksort better than mergesort? (Score: 300)

Created: 2008-09-16 Last updated: 2014-05-19

Quicksort has O(n²) worst-case runtime and O(nlogn) average-case runtime. However, it’s superior to merge sort in many scenarios because many factors influence an algorithm’s runtime, and, when they are all taken together, quicksort wins out.

In particular, the often-quoted runtime of sorting algorithms refers to the number of comparisons or the number of swaps needed to sort the data. This is indeed a good measure of performance, especially since it’s independent of the underlying hardware design. However, other things, such as locality of reference (i.e. do we read lots of elements which are probably in cache?), also play an important role on current hardware. Quicksort in particular requires little additional space and exhibits good cache locality, and this makes it faster than merge sort in many cases.

In addition, it’s very easy to avoid quicksort’s worst-case run time of O(n²) almost entirely by using an appropriate choice of the pivot, such as picking it at random (this is an excellent strategy).

In practice, many modern implementations of quicksort (in particular libstdc++’s std::sort) are actually introsort, whose theoretical worst case is O(nlogn), the same as merge sort. It achieves this by limiting the recursion depth and switching to a different algorithm (heapsort) once it exceeds logn.
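
A compact sketch of that idea (an illustrative version, not libstdc++’s actual implementation): quicksort with a random pivot, falling back to heapsort once a depth budget of roughly 2·log2(n) is exhausted, which caps the worst case at O(nlogn):

#include <algorithm>
#include <cmath>
#include <cstdlib>
#include <vector>

// Introsort-style sort of v[lo, hi): quicksort until 'depth' hits zero,
// then heapsort the remaining range.
void introsort(std::vector<int>& v, int lo, int hi, int depth)
{
    while (hi - lo > 1) {
        if (depth-- == 0) {                           // too deep: switch to heapsort
            std::make_heap(v.begin() + lo, v.begin() + hi);
            std::sort_heap(v.begin() + lo, v.begin() + hi);
            return;
        }
        std::swap(v[lo + std::rand() % (hi - lo)], v[hi - 1]);  // random pivot
        const int pivot = v[hi - 1];
        int i = lo;                                   // Lomuto partition
        for (int j = lo; j < hi - 1; ++j)
            if (v[j] < pivot) std::swap(v[i++], v[j]);
        std::swap(v[i], v[hi - 1]);                   // pivot now at index i
        introsort(v, lo, i, depth);                   // recurse on the left part...
        lo = i + 1;                                   // ...loop on the right part
    }
}

// Typical call, with a depth budget of roughly 2 * log2(n):
// introsort(v, 0, (int)v.size(), 2 * (int)std::log2(v.size() + 1));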

#50 Best answer 2 of Why is quicksort better than mergesort? (Score: 298)

Created: 2008-09-18

As many people have noted, the average case performance for quicksort is faster than mergesort. But this is only true if you are assuming constant time to access any piece of memory on demand.

In RAM this assumption is generally not too bad (it is not always true because of caches, but it is not too bad). However, if your data structure is big enough to live on disk, then quicksort gets killed by the fact that your average disk does something like 200 random seeks per second. But that same disk has no trouble reading or writing megabytes per second of data sequentially, which is exactly what mergesort does.

Therefore if data has to be sorted on disk, you really, really want to use some variation on mergesort. (Generally you quicksort sublists, then start merging them together above some size threshold.)
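
A toy in-memory sketch of that pattern (not from the answer): sort fixed-size runs independently, then k-way merge the runs with a min-heap. On a real disk the runs would be temporary files streamed sequentially, but the structure is the same:

#include <algorithm>
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Sort fixed-size runs in memory, then k-way merge them (C++17, run_size > 0).
std::vector<int> run_and_merge_sort(std::vector<int> data, std::size_t run_size)
{
    // Phase 1: sort each run independently (the "quicksort sublists" step).
    std::vector<std::pair<std::size_t, std::size_t>> runs;   // [begin, end) per run
    for (std::size_t b = 0; b < data.size(); b += run_size) {
        const std::size_t e = std::min(b + run_size, data.size());
        std::sort(data.begin() + b, data.begin() + e);
        runs.push_back({b, e});
    }

    // Phase 2: merge the sorted runs with a min-heap of (current value, run index).
    using Cursor = std::pair<int, std::size_t>;
    std::priority_queue<Cursor, std::vector<Cursor>, std::greater<Cursor>> heap;
    std::vector<std::size_t> pos(runs.size());
    for (std::size_t r = 0; r < runs.size(); ++r) {
        pos[r] = runs[r].first;
        if (pos[r] < runs[r].second) heap.push({data[pos[r]], r});
    }
    std::vector<int> out;
    out.reserve(data.size());
    while (!heap.empty()) {
        const auto [val, r] = heap.top();
        heap.pop();
        out.push_back(val);
        if (++pos[r] < runs[r].second) heap.push({data[pos[r]], r});
    }
    return out;
}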

Furthermore, if you have to do anything with datasets of that size, think hard about how to avoid seeks to disk. For instance, this is why it is standard advice that you drop indexes before doing large data loads in databases, and then rebuild the index later. Maintaining the index during the load means constantly seeking to disk. By contrast, if you drop the indexes, then the database can rebuild the index by first sorting the information to be dealt with (using a mergesort, of course!) and then loading it into a B-tree data structure for the index. (B-trees are naturally kept in order, so you can load one from a sorted dataset with few seeks to disk.)

There have been a number of occasions where understanding how to avoid disk seeks has let me make data processing jobs take hours rather than days or weeks.

See also original question in stackoverflow


Notes:
  1. This page uses an API to get the relevant data from the Stack Overflow community.
  2. Content license on this page is CC BY-SA 3.0.
  3. score = up votes - down votes.