Sorting

Fachhochschule Braunschweig / Wolfenbüttel
- University of Applied Sciences -
Data Structures and Algorithm Design
- CSCI 340 -
Friedhelm Seutter
Institut für Angewandte Informatik
© Friedhelm Seutter, 2008
Contents
1. Analyzing Algorithms and Problems
2. Data Abstraction
3. Recursion and Induction
4. Sorting
6. Dynamic Sets and Searching
7. Graphs and Graph Traversals
8. Optimization and Greedy Algorithms
10. Dynamic Programming
13. NP-Complete Problems
4. Sorting
• Insertion Sort
• Quicksort
• Mergesort
• Heapsort
Sorting Problem
Let
A = (a_1, a_2, ..., a_n)
be an array of (nonnegative) integers, called keys.
The problem is to find a permutation π such that
the integers are sorted in nondecreasing order.
Solution:
A = (a_π(1), a_π(2), ..., a_π(n))
with a_π(1) ≤ a_π(2) ≤ ... ≤ a_π(n)
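
The permutation π can be made concrete with a small Python sketch. The function name sorting_permutation and the sample keys are assumptions chosen for illustration, and indices are 0-based here, whereas the slide counts from 1.

```python
def sorting_permutation(keys):
    """Return pi (0-based) such that keys[pi[0]] <= keys[pi[1]] <= ..."""
    # Sort the index positions by the key stored at each position.
    return sorted(range(len(keys)), key=lambda i: keys[i])


A = [15, 2, 12, 4, 14]                  # hypothetical sample keys
pi = sorting_permutation(A)             # [1, 3, 2, 4, 0]
sorted_A = [A[i] for i in pi]           # [2, 4, 12, 14, 15]
print(pi, sorted_A)
```

Applying π to the original array yields the keys in nondecreasing order, which is exactly the solution condition stated above.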
Analysis of Complexity
General strategy: Sorting by comparison of keys
• Time: Number of key comparisons (basic operations)
• Space: Amount of extra space (in addition to the input)
Insertion Sort
Assume that the elements at the left end of the array
are already sorted. Take the first of the unexamined elements
and insert it at its correct position among the sorted elements.
To make room for it, all greater elements must be shifted one
position to the right.
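
A minimal Python sketch of this procedure follows. It is not the pseudocode from the slides, so line numbers quoted on the slides do not refer to it; the function name insertion_sort and the 0-based indexing are assumptions.

```python
def insertion_sort(a):
    """Sort the list a in place in nondecreasing order and return it."""
    for j in range(1, len(a)):
        key = a[j]                 # first unexamined element
        i = j - 1                  # a[0..j-1] is already sorted
        while i >= 0 and a[i] > key:
            a[i + 1] = a[i]        # shift the greater element one position right
            i -= 1
        a[i + 1] = key             # insert the key into the vacant position
    return a


print(insertion_sort([2, 4, 12, 14, 15, 7, 19, 11]))
# prints [2, 4, 7, 11, 12, 14, 15, 19]
```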
Insertion Sort - Example
< sorted >              < not examined >
2  4  12  14  15        7  19  11
2  4  12  14  15        _  19  11      (7 is being inserted)
2  4  7  12  14  15     19  11
Worst-case Complexity
Basic operation: Key comparison in line 4
W(n) = Σ_{2 ≤ j ≤ n} (j - 1)
     = Σ_{1 ≤ j ≤ n-1} j
     = ½ n (n - 1) ∈ Θ(n^2)
Average-case Complexity
Basic operation: Key comparison in line 4
Assumptions: All permutations are equally likely as input
and the keys are distinct.
A(n) ≈ ¼ n^2 ∈ Θ(n^2)
Best-case Complexity
Basic operation: Key comparison in line 4
B(n) = n – 1 ∈ Θ(n)
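
The best-case and worst-case counts can be checked with a sketch that counts key comparisons. It assumes a common formulation of Insertion Sort, which need not match the slide's pseudocode line by line; the function name and the choice n = 8 are illustrative.

```python
def insertion_sort_comparisons(keys):
    """Sort a copy of keys by insertion and return the number of key comparisons."""
    a = list(keys)
    comparisons = 0
    for j in range(1, len(a)):
        key = a[j]
        i = j - 1
        while i >= 0:
            comparisons += 1       # one comparison of a[i] with key
            if a[i] <= key:
                break              # correct position found
            a[i + 1] = a[i]        # shift the greater element right
            i -= 1
        a[i + 1] = key
    return comparisons


n = 8
best = insertion_sort_comparisons(range(1, n + 1))    # already sorted input
worst = insertion_sort_comparisons(range(n, 0, -1))   # reverse-sorted input
print(best, worst)   # prints 7 28, i.e. n - 1 and n (n - 1) / 2
```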
Space Complexity
Insertion Sort sorts in place.
The amount of extra space is Θ(1): it is independent
of the number of elements to sort.
Divide and Conquer
Quicksort
• Divide: Choose one element to be the pivot.
Partition the array into two subarrays around the pivot:
elements less than or equal to the pivot go to the left,
greater elements to the right, and the pivot in between.
• Conquer: An array of length 1 is already sorted;
longer subarrays are sorted recursively.
• Combine: Concatenate the two sorted subarrays with
the pivot in between.
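
The three steps can be sketched in Python as follows. This is one possible formulation, not the slide's pseudocode: choosing the first element as pivot, the names partition and quicksort, and the 0-based indices are all assumptions.

```python
def partition(a, first, last):
    """Partition a[first..last] around the pivot a[first]; return the pivot's final index."""
    pivot = a[first]
    i = first                              # a[first+1..i] holds keys <= pivot
    for j in range(first + 1, last + 1):   # a[j..last] is still unexamined
        if a[j] <= pivot:                  # key comparison
            i += 1
            a[i], a[j] = a[j], a[i]
    a[first], a[i] = a[i], a[first]        # put the pivot between the two parts
    return i


def quicksort(a, first=0, last=None):
    """Sort a[first..last] in place and return a."""
    if last is None:
        last = len(a) - 1
    if first < last:                       # ranges of length <= 1 are sorted
        q = partition(a, first, last)      # divide
        quicksort(a, first, q - 1)         # conquer left part
        quicksort(a, q + 1, last)          # conquer right part
    return a                               # combine: nothing to do, the sort is in place


print(quicksort([4, 1, 12, 2, 16, 11, 13, 14, 8, 7]))
# prints [1, 2, 4, 7, 8, 11, 12, 13, 14, 16]
```

Because the exchange is done in place, the combine step needs no extra work: once both parts are sorted, the whole range is sorted.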
Quicksort-Partition
Array regions during Partition (range f .. l):
Start:         entire range f .. l unexamined
Intermediate:  pivot, ≤ pivot, > pivot, unexamined (boundaries marked by i and j)
End:           ≤ pivot, pivot at position q, > pivot
[Figures: Divide, Combine]
Complexity
Basic operation:
Key comparison in line 4 of Partition
W(n) ∈ Θ(n^2)
A(n) ∈ Θ(n log n)
B(n) ∈ Θ(n log n)
Space Complexity
Amount of space needed: Θ(n)
Keys are exchanged in place, but in the worst case
there are n nested recursive procedure calls, each of
which needs space for its local variables.
A careful implementation reduces the space
complexity to Θ(log n), for example by recursing only on
the smaller subarray and handling the larger one in a loop.
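
One well-known way to reach logarithmic stack depth is sketched below: recurse only into the smaller part and continue with the larger part in a loop. Whether this is the implementation the slide has in mind is an assumption; the names and the first-element pivot are illustrative.

```python
def partition(a, first, last):
    """Partition a[first..last] around the pivot a[first]; return the pivot's final index."""
    pivot = a[first]
    i = first
    for j in range(first + 1, last + 1):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[first], a[i] = a[i], a[first]
    return i


def quicksort_bounded(a, first=0, last=None):
    """Quicksort whose recursion depth stays at about lg n."""
    if last is None:
        last = len(a) - 1
    while first < last:
        q = partition(a, first, last)
        if q - first < last - q:
            quicksort_bounded(a, first, q - 1)   # recurse on the smaller left part
            first = q + 1                        # loop on the larger right part
        else:
            quicksort_bounded(a, q + 1, last)    # recurse on the smaller right part
            last = q - 1                         # loop on the larger left part
    return a


# Sorted input is a worst case for a first-element pivot. Plain recursion
# would nest about 1500 calls here; this version stays shallow.
print(quicksort_bounded(list(range(1500))) == list(range(1500)))   # prints True
```

Each recursive call handles at most half of the current range, so the nesting depth is bounded by about lg n even when the running time is quadratic.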
Mergesort
• Divide: Split the array into two halves, recursively.
• Conquer: An array of length 1 is already sorted.
• Combine: Merge two sorted subarrays into
one sorted array.
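
A short Python sketch of these steps is given below. It returns a new list rather than sorting in place, which is a simplification chosen for readability; the function names are assumptions and the code is not the slide's pseudocode.

```python
def merge(left, right):
    """Merge two sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:          # key comparison
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])              # at most one of these two is non-empty
    merged.extend(right[j:])
    return merged


def mergesort(a):
    """Return a sorted copy of a."""
    if len(a) <= 1:                      # conquer: length 0 or 1 is sorted
        return list(a)
    mid = len(a) // 2                    # divide into two halves
    return merge(mergesort(a[:mid]), mergesort(a[mid:]))   # combine


print(mergesort([4, 1, 12, 2, 16, 11, 13, 14, 8, 7]))
# prints [1, 2, 4, 7, 8, 11, 12, 13, 14, 16]
```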
Mergesort-Merge
[Figures: Divide, Combine]
Complexity
Basic operation:
Key comparison in line 6 of Merge
W(n) ∈ Θ(n log n)
A(n) ∈ Θ(n log n)
B(n) ∈ Θ(n log n)
Space Complexity
Amount of space needed: Θ(n)
Keys are not exchanged in place; instead all n keys are
copied and merged into an extra array.
A careful implementation reduces the extra space
to n/2, but this is still in Θ(n).
Lower Bounds for Sorting by Comparison of Keys
What is the minimum number of key comparisons
needed by any sorting algorithm that is based on
comparisons of keys?
Consider an array of n distinct keys. The result of
sorting the keys is a permutation of the keys.
Thus there are n! possible solutions.
Decision Tree for Sorting
All possible sorting solutions can be represented
in a decision tree.
Each inner node represents a comparison of two
keys with the possible outcomes true or false, so
every inner node has two successors. If the outcome
is false, the keys have to be exchanged.
The leaves are the possible sorting solutions.
[Figure: Decision Tree for Sorting]
Each run of a comparison sort corresponds to a path in
the decision tree from the root to a leaf.
The length of the longest path corresponds to the
number of comparisons in the worst case.
Therefore a lower bound on the height of the
decision tree is a worst-case lower bound for the
number of key comparisons.
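
The argument can be made numeric with a standard fact: a binary tree with n! leaves has height at least ⌈lg(n!)⌉, and lg(n!) ∈ Θ(n log n). The Python sketch below computes this bound with exact integer arithmetic; the function name is an assumption.

```python
import math


def comparison_lower_bound(n):
    """Return ceil(lg(n!)), the minimum height of a binary tree with n! leaves."""
    leaves = math.factorial(n)
    # For an integer m >= 1, (m - 1).bit_length() equals ceil(log2(m)).
    return (leaves - 1).bit_length()


for n in (3, 4, 5, 10):
    print(n, comparison_lower_bound(n))
# prints 3 3, then 4 5, then 5 7, then 10 22
```

For n = 3 the bound says that no comparison sort can always finish with fewer than 3 key comparisons, since 2 comparisons distinguish at most 4 of the 6 possible orders.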
Heapsort
The algorithm uses a data structure called a heap,
which is a binary tree with some special properties.
Heap structure: A complete binary tree, possibly with some
of the rightmost leaves removed.
Partial order tree property: The key at any node is
greater than or equal to the keys at each of its children
(max-heap; for a min-heap, less than or equal to).
[Figures: Heap]
Heap Implementation
• As a linked structure with each node containing
pointers (references) to the roots of its subtrees.
• As an array:
  - The root is in A[1].
  - Let i be the index of a node other than the root; then
    the index of its parent is ⌊i/2⌋.
  - Let i be the index of a node other than a leaf; then 2i is
    the index of its left child and 2i+1 is the index of its
    right child.
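
The index arithmetic of the array representation can be sketched in Python as follows. Python lists are 0-based, so this sketch leaves A[0] unused to keep the slide's 1-based formulas; the helper names and the check function is_max_heap are assumptions.

```python
def parent(i):
    return i // 2          # floor(i / 2)


def left(i):
    return 2 * i


def right(i):
    return 2 * i + 1


def is_max_heap(A, heapsize):
    """Check the partial order tree property for A[1..heapsize]."""
    return all(A[parent(i)] >= A[i] for i in range(2, heapsize + 1))


# A[0] is a placeholder so that the root sits at index 1.
A = [None, 16, 14, 13, 8, 7, 11, 12, 2, 4, 1]
print(parent(5), left(4), right(4))   # prints 2 8 9
print(is_max_heap(A, 10))             # prints True
```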
Heap: Array-Implementation
Index:  1   2   3   4   5   6   7   8   9   10  | 11  12  13  14
Key:    16  14  13  8   7   11  12  2   4   1   | (unused)
heapsize[A] = 10, length[A] = 14
Heapsort Strategy
• The root contains the largest key in the heap.
• Build a sorted sequence in reverse order by
repeatedly removing the root element from the heap.
• After each removal step the heap properties
have to be reestablished by bringing the next
largest key to the root.
Fixing a Heap
• A node violates the partial order tree property,
i.e. its key is less than at least one of the keys
of its children.
• This node must be exchanged with the child that
has the larger key; the same step is then repeated
recursively at the child's position.
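
A recursive Python sketch of this repair step for a max-heap is shown below. It is not the slide's FixHeap pseudocode, so the line numbers cited on the slides do not apply to it; the name fix_heap, the parameter order, and the unused A[0] slot are assumptions.

```python
def fix_heap(A, i, heapsize):
    """Restore max-heap order at node i of A[1..heapsize].

    Assumes both subtrees of node i are already heaps. A[0] is unused.
    """
    largest = i
    l, r = 2 * i, 2 * i + 1
    if l <= heapsize and A[l] > A[largest]:    # compare with left child
        largest = l
    if r <= heapsize and A[r] > A[largest]:    # compare with right child
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]    # exchange with the larger child
        fix_heap(A, largest, heapsize)         # continue in that subtree


A = [None, 4, 16, 13, 14, 7, 11, 12, 2, 8, 1]   # only the root violates the order
fix_heap(A, 1, 10)
print(A[1:])   # prints [16, 14, 13, 8, 7, 11, 12, 2, 4, 1]
```

At most two key comparisons are made per level, which matches a worst case proportional to the height of the heap.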
Complexity of FixHeap
Basic operation:
Key comparisons in lines 3 and 6
W(n) = 2h = 2 ⌊lg n⌋ ∈ Θ(log n)
(h: height of the heap, n: number of nodes)
Constructing a Heap
• Given an unordered array of keys, the corresponding binary tree has heap structure, but the
partial order tree property may be violated.
• The leaves A[⌊n/2⌋+1], ..., A[n] are already heaps.
• For the subtrees with roots A[⌊n/2⌋] down to
A[1], the partial order tree property must be
established, in this order.
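
The bottom-up construction can be sketched in Python as follows. The helper fix_heap is repeated so that the block runs on its own; both function names and the unused A[0] slot are assumptions, and the code is not the slide's ConstructHeap pseudocode.

```python
def fix_heap(A, i, heapsize):
    """Restore max-heap order at node i of A[1..heapsize]; A[0] is unused."""
    largest = i
    l, r = 2 * i, 2 * i + 1
    if l <= heapsize and A[l] > A[largest]:
        largest = l
    if r <= heapsize and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        fix_heap(A, largest, heapsize)


def construct_heap(A, n):
    """Turn A[1..n] into a max-heap by fixing subtrees from the last inner node up."""
    for i in range(n // 2, 0, -1):   # leaves A[n//2 + 1 .. n] are heaps already
        fix_heap(A, i, n)


A = [None, 4, 1, 12, 2, 16, 11, 13, 14, 8, 7]
construct_heap(A, 10)
print(A[1:])   # prints [16, 14, 13, 8, 7, 11, 12, 2, 4, 1]
```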
Constructing a Heap - Example
Given:
A = (4, 1, 12, 2, 16, 11, 13, 14, 8, 7)
ConstructHeap:
A = (16, 14, 13, 8, 7, 11, 12, 2, 4, 1)
Complexity of ConstructHeap
Basic operation:
Call of FixHeap in line 3
W(n) ≤ n ⌊lg n⌋ ∈ Θ(n log n)
But this upper bound is poor!
[Figure: Heights of subtrees for FixHeap]
Complexity of ConstructHeap
Basic operation:
Call of FixHeap in line 3
W(n) ≤ Σ_{0 ≤ k ≤ h} 2^k (h - k)
     = 2^(h+1) - h - 2
     ≤ 2n - ⌊lg n⌋ - 2        (h = ⌊lg n⌋, so 2^h ≤ n)
     ∈ Θ(n)
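
The sum over the levels of the heap can be checked numerically. It has the closed form 2^(h+1) - h - 2, which stays below 2n because 2^h ≤ n for a heap of height h. The Python sketch below verifies both statements for small heights; the function name is an assumption.

```python
def construct_heap_bound(h):
    """Sum over levels k = 0..h of (2^k nodes) * (subtree height h - k)."""
    return sum(2**k * (h - k) for k in range(h + 1))


for h in range(1, 6):
    n_min = 2**h                       # smallest heap of height h
    print(h, construct_heap_bound(h), 2**(h + 1) - h - 2, 2 * n_min)
# first line printed: 1 1 1 4
```

The bound grows linearly in n, which is why the n lg n estimate for ConstructHeap is far from tight.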
Heapsort
Given:
A = (4, 1, 12, 2, 16, 11, 13, 14, 8, 7)
ConstructHeap:
A = (16, 14, 13, 8, 7, 11, 12, 2, 4, 1)
HeapSort:
A = (1, 2, 4, 7, 8, 11, 12, 13, 14, 16)
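
The example can be reproduced with a compact Python sketch of the whole algorithm. It follows the strategy described in this chapter (construct a heap, then repeatedly move the root to the end and repair the rest), but it is not the slide's pseudocode; the function names and the internal 1-based copy are assumptions.

```python
def fix_heap(A, i, heapsize):
    """Restore max-heap order at node i of A[1..heapsize]; A[0] is unused."""
    largest = i
    l, r = 2 * i, 2 * i + 1
    if l <= heapsize and A[l] > A[largest]:
        largest = l
    if r <= heapsize and A[r] > A[largest]:
        largest = r
    if largest != i:
        A[i], A[largest] = A[largest], A[i]
        fix_heap(A, largest, heapsize)


def heapsort(keys):
    """Return the keys in nondecreasing order."""
    A = [None] + list(keys)              # 1-based working copy
    n = len(keys)
    for i in range(n // 2, 0, -1):       # construct the heap bottom-up
        fix_heap(A, i, n)
    for heapsize in range(n, 1, -1):
        A[1], A[heapsize] = A[heapsize], A[1]   # largest key goes to its final place
        fix_heap(A, 1, heapsize - 1)            # repair the shrunken heap
    return A[1:]


print(heapsort([4, 1, 12, 2, 16, 11, 13, 14, 8, 7]))
# prints [1, 2, 4, 7, 8, 11, 12, 13, 14, 16]
```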
Complexity of Heapsort
Add up the complexities of ConstructHeap and
FixHeap in the loop:
W(n) = Θ(n) + (n - 1) Θ(lg n)
     ∈ Θ(n lg n)
Space Complexity
Heapsort sorts in place.
The space needed for recursion is limited to a
depth of about lg n. The recursive procedures can also be
recoded as iterative procedures, which removes the
recursion stack entirely.
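
As an illustration of such a recoding, the repair step can be written with a loop instead of recursion. This is a sketch under the same assumptions as before (max-heap, 1-based array with A[0] unused); the name fix_heap_iterative is hypothetical.

```python
def fix_heap_iterative(A, i, heapsize):
    """Restore max-heap order at node i of A[1..heapsize] without recursion."""
    while True:
        largest = i
        l, r = 2 * i, 2 * i + 1
        if l <= heapsize and A[l] > A[largest]:
            largest = l
        if r <= heapsize and A[r] > A[largest]:
            largest = r
        if largest == i:
            return                            # order holds at node i, done
        A[i], A[largest] = A[largest], A[i]   # exchange with the larger child
        i = largest                           # continue one level further down


A = [None, 4, 16, 13, 14, 7, 11, 12, 2, 8, 1]
fix_heap_iterative(A, 1, 10)
print(A[1:])   # prints [16, 14, 13, 8, 7, 11, 12, 2, 4, 1]
```

The loop uses only a constant number of local variables, so the extra space no longer depends on the height of the heap.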
Comparison of Sorting Algorithms
Algorithm        Worst case   Average       Extra space
Insertion Sort   n^2/2        Θ(n^2)        Θ(1)
Quicksort        n^2/2        Θ(n log n)    Θ(log n)
Mergesort        n lg n       Θ(n log n)    Θ(n)
Heapsort         2n lg n      Θ(n log n)    Θ(1)