Data Structures

Data Structures
Associate Professor 翁志祁, Room 義0321 (Wednesday, periods 02-04)

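Course Overview
This course introduces common data structures, such as stacks, queues, linked lists, trees, hash tables, and graphs, together with the algorithms for searching, reading and writing, inserting, and deleting within them. More importantly, students learn to evaluate the space (memory) and time complexity of a solution built from these data structures and algorithms. The ultimate goal is for students to choose the appropriate data structure and algorithm for a given problem and to solve it correctly and efficiently.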

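References and Course Website
• Data Structures and Program Design in C, Kruse, Tondo, and Leung. 全華圖書, tel. 2597-1300 ext. 338
• 資料結構使用C語言 (Data Structures Using C), 吳勁樺. 金禾資訊, tel. (02) 2789-0561
• Course materials can be downloaded from: http://faculty.pccu.edu.tw/~cweng/
• The materials are still being updated, so download only the chapters needed for upcoming classes.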

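Grading and Office Hours
• Midterm exam: 30%
• Final exam: 40%
• Coursework: 30% (attendance, assignments, quizzes)
Office hours: Monday 12:00-13:00, Tuesday 14:00-16:00, Wednesday 12:00-13:00, Thursday 15:00-17:00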

Chapter 1: Introduction to Data Structures
Key topics in this chapter:
• Data and information
• Algorithms
• Programs and programming
• Performance analysis of algorithms

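1. Data and Information
Data are raw, unprocessed text (words), numbers, symbols, or graphics. On their own they are only basic elements or items that have not yet been evaluated for any purpose. A name, a class timetable, or an address book can all be loosely called "data".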

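Data Processing
Data processing means using people or machines to organize data systematically (recording, sorting, merging, integrating, calculating, compiling statistics, and so on) so that the raw data meet a need and become useful information.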

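Data Structures
• A data structure is a method (algorithm) and logic for analyzing and organizing data during data processing. It takes into account the characteristics of the data and the relationships among them.
• A programmer must choose a data structure for adding, modifying, deleting, and storing data. A wrong choice can make the program run very inefficiently.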

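Are the Roles of Data and Information Fixed?
• Not necessarily. The same document may be data in one situation and information in another.
• For example, a report of battle casualties in the US-Iraq war is merely "data" to ordinary civilians like us, but to the commander of the US-UK coalition forces the same report is invaluable "information".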

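2. Algorithms
• Data structures plus algorithms make an executable program.
• Webster's dictionary defines an algorithm as "a procedure for solving a mathematical problem in a finite number of steps."
• In computing, an algorithm can be defined as "the finite number of mechanical or repetitive instructions and computational steps needed to accomplish a task or solve a problem."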

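The Five Conditions of an Algorithm
• Input: zero or more inputs, each clearly described or defined.
• Output: at least one output; an algorithm may not produce nothing.
• Definiteness: every instruction or step must be concise, clear, and unambiguous.
• Finiteness: the algorithm must terminate after a finite number of steps and must not fall into an infinite loop.
• Effectiveness: each step is clear and feasible, so that a person could carry it out with pencil and paper and reach the answer.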
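As an illustration of the five conditions, here is a minimal sketch using Euclid's method for the greatest common divisor. The example and the function name gcd are our own choices and do not come from the course material.

```c
/* Hypothetical illustration: Euclid's greatest common divisor.
 * Input:         two non-negative integers a and b.
 * Output:        one value, the greatest common divisor.
 * Definiteness:  each statement has exactly one meaning.
 * Finiteness:    b strictly decreases on every pass, so the loop ends.
 * Effectiveness: each step is a remainder or an assignment that could be
 *                done by hand. */
unsigned int gcd(unsigned int a, unsigned int b)
{
    while (b != 0) {
        unsigned int r = a % b;   /* remainder of a divided by b */
        a = b;
        b = r;
    }
    return a;
}
```

Calling gcd(48, 18) walks through the pairs (48, 18), (18, 12), (12, 6), (6, 0) and stops, which shows the finiteness condition in action.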

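Common Ways to Express an Algorithm
• Plain text: Chinese, English, numbers, and so on.
• Pseudocode: notation close to a high-level programming language; SPARKS and Pascal-like languages are commonly used.
• Tables or diagrams: arrays, tree diagrams, matrix diagrams, and so on.
• Flowcharts: data flow diagrams and control flow diagrams are a general-purpose representation with a fixed set of symbols.
• Programming languages: data structure algorithms are now usually expressed in a readable high-level language such as C, C++, Java, or Visual Basic. This text uses C.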

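Algorithms, Programs, and Flowcharts Compared
• An algorithm and a program are not the same thing, because a program does not have to satisfy finiteness. An operating system or a machine control program, unless it crashes, stays forever in a waiting loop or keeps logging the machine's status, which violates the finiteness condition of an algorithm.
• Every algorithm can be drawn as a program flowchart, but because a flowchart may contain an infinite loop, not every flowchart can be expressed as an algorithm.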

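3. Programs and Programming
The five phases of producing a program:
• Requirements: understand what problem the program must solve and what its inputs and outputs are.
• Design: based on the requirements, choose a suitable data structure and write an algorithm, in any representation, that solves the problem.
• Analysis: consider other possible algorithms and data structures, then settle on the most suitable choice.
• Coding: turn the conclusions of the analysis into a first version of the code.
• Testing: confirm that the program's output meets the requirements. This phase means running the program step by step and carrying out many related tests.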

Structured Programming
Structured programming is built from three basic structures: sequence, selection, and repetition (iteration).

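Advantages and Disadvantages of Structured Programming
• Advantages: the top-down approach makes programs more readable, which helps greatly with later modification and maintenance. Modular design also lets developers divide the work, lowering development cost.
• Disadvantages: keeping the code readable requires more instructions, which tends to use more memory, and execution may be slower than with unstructured programming.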

Levels of Data Types
• Atomic data type, also called physical data type
• Structured data type, also called virtual data type
• Abstract data type (ADT)

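Atomic Data Types
• A single basic data entity, such as the integers, real numbers, and characters found in most programming languages. Each language has a slightly different set of atomic types. In C they are the integer (int), the character (char), the single-precision floating-point number (float), and the double-precision floating-point number (double).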

Structured Data Types
• One level above atomic data types: a data entity that contains other data types, such as a string, a set, or an array.

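Abstract Data Types
• One level above structured data types: an ADT defines the operations that a structured data type supports and how they relate mathematically. The user does not need to know how the ADT is implemented, only how to use it. The stack and the queue are typical ADTs.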
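To make the ADT idea concrete, the following is a minimal sketch of a stack of integers in C. The names (Stack, stack_push, stack_pop, and so on) and the fixed capacity of 100 are assumptions chosen for illustration; the course material may define its stack differently.

```c
#include <stdbool.h>

#define STACK_CAPACITY 100      /* assumed fixed capacity for this sketch */

/* Callers are meant to use only the functions below; the array and the
 * top index are implementation details that could change. */
typedef struct {
    int items[STACK_CAPACITY];
    int top;                    /* index of the top element, -1 when empty */
} Stack;

void stack_init(Stack *s) { s->top = -1; }

bool stack_is_empty(const Stack *s) { return s->top < 0; }

/* Push value; returns false if the stack is already full. */
bool stack_push(Stack *s, int value)
{
    if (s->top >= STACK_CAPACITY - 1)
        return false;
    s->items[++s->top] = value;
    return true;
}

/* Pop into *out; returns false if the stack is empty. */
bool stack_pop(Stack *s, int *out)
{
    if (stack_is_empty(s))
        return false;
    *out = s->items[s->top--];
    return true;
}
```

A caller that pushes 1 and then 2 gets 2 back from the first pop and 1 from the second, without ever touching items or top directly. Replacing the array with a linked list would not change that calling code, which is the point of an ADT.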

4. Performance Analysis of Algorithms
Given a problem, there may be several possible implementations. Efficiency, in both time and space, is the most important consideration when choosing among them.
• Complexity theory: estimates the time and space a program needs, independently of the machine.
• Space complexity of a program: the amount of memory needed to run the program to completion.
• Time complexity of a program: the amount of computer time needed to run the program to completion.

Space Complexity S(P)
S(P) = Sc + Sp(I)
• Sc (fixed space requirement): instruction space, constants, simple variables, and fixed-size structures. Sc is independent of the number and size of the inputs and outputs. e.g. int i, sum = 0, A[100];
• Sp(I) (variable space requirement): depends on the particular instance I of the problem. It may be a function of the number, size, or values of the inputs and outputs. e.g. int *n; /* score list of a course */
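The split between the fixed part and the variable part can be sketched as follows. The function names are hypothetical; the sketch assumes the score list is allocated on the heap with malloc, which is one common way to obtain storage whose size depends on the instance.

```c
#include <stddef.h>
#include <stdlib.h>

/* Variable part Sp(I): bytes needed for a score list of n students.
 * It grows linearly with n. */
size_t variable_space_bytes(size_t n)
{
    return n * sizeof(int);
}

/* Allocates the score list. The pointer itself is a simple variable and
 * belongs to the fixed part Sc; the block it points to is the variable
 * part. The caller must free() the result. Returns NULL on failure. */
int *make_score_list(size_t n)
{
    int *scores = malloc(variable_space_bytes(n));
    return scores;
}
```

For a class of 50 students the variable part is 50 * sizeof(int) bytes, while the space for the pointer and any loop counters stays the same for every class size.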

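Time Complexity T(P)
T(P) = Tc + Tp(I)
• Tc (fixed time requirement): compile time. Tc is independent of any instance of the problem.
• Tp(I) (variable time requirement): execution time, which depends on the particular instance I of the problem.
e.g. matrix multiplication C[ ][ ] = A[ ][ ] * B[ ][ ] for n x n matrices:
  C(0,0) combines the n entries of row 0 of A, (0,0), (0,1), (0,2), (0,3), ..., with the n entries of column 0 of B, (0,0), (1,0), (2,0), ....
  One element of C: n multiplications + n additions => 2n operations.
  All n*n elements: 2n * n*n operations, plus lower-order terms an^2 + bn + c => 2n^3 + an^2 + bn + c.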
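The 2n^3 term can be checked by counting operations directly. This is a minimal sketch under our own assumptions: the function name matmul_count and the bound N_MAX are hypothetical, and only the multiplications and additions in the innermost statement are counted, not the loop overhead that produces the lower-order terms.

```c
#define N_MAX 8   /* assumed maximum matrix dimension for this sketch */

/* Multiplies two n x n matrices (n <= N_MAX), stores the product in c,
 * and returns the number of arithmetic operations performed:
 * one multiplication and one addition per innermost pass. */
long matmul_count(int n, double a[][N_MAX], double b[][N_MAX],
                  double c[][N_MAX])
{
    long ops = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double sum = 0.0;
            for (int k = 0; k < n; k++) {
                sum += a[i][k] * b[k][j];   /* 1 multiply + 1 add */
                ops += 2;
            }
            c[i][j] = sum;
        }
    }
    return ops;
}
```

For n = 2 the function performs 16 operations, which equals 2 * 2^3, and doubling n multiplies the count by 8.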

Asymptotic Notations (for measuring space and time complexity)
• O (Big-oh)
• Ω (omega)
• Θ (theta)

Introduction to Big-oh
• O(g(n)) means that the running time an algorithm needs on a computer does not exceed some constant multiple of g(n). We then say that the algorithm's space or time complexity is O(g(n)), read "big-oh of g(n)" or "order g(n)".
• Definition: The function f(n) is said to be of order at most g(n) if there are positive constants c and n0 such that f(n) <= c*g(n) for all n >= n0.
• Therefore, Big-oh gives an upper bound on f(n). By convention, one quotes the smallest g(n) that works.

Common Big-oh classes:
• O(1): constant
• O(log2 n): logarithmic
• O((log2 n)^2): log squared
• O(n): linear
• O(n log2 n): n log n
• O(n^2): quadratic
• O(n^3): cubic
• O(2^n): exponential

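Introduction to Ω (omega)
• Ω is also an asymptotic notation for complexity. Where Big-oh bounds the running time from above (it is usually quoted for the worst case), Ω bounds it from below (it is usually quoted for the best case). The definition of Ω is:
• Definition: The function f(n) is said to be of order at least g(n) if there are positive constants c and n0 such that f(n) >= c*g(n) for all n >= n0.
• Therefore, Ω gives a lower bound on f(n). By convention, one quotes the largest g(n) that works.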

Introduction to Θ (theta)
• Θ is a more precise asymptotic notation than Big-oh or Ω. It is defined as follows:
• Definition: The function f(n) is Θ(g(n)) iff there exist positive constants c1, c2, and n0 such that c1*g(n) <= f(n) <= c2*g(n) for all n >= n0.
• Therefore, Θ is a tight bound: g(n) is both an upper bound and a lower bound of f(n), up to constant factors.

Examples for asymptotic notations
1. 3n + 2 = O(n), because 3n + 2 ≤ 4n for all n ≥ 2, c = 4.
2. 3n + 3 = O(n), because 3n + 3 ≤ 4n for all n ≥ 3, c = 4.
3. 3n + 3 = O(n^2), because 3n + 3 ≤ 3n^2 for all n ≥ 2, c = 3.
(2) is the preferred statement. (3) satisfies the definition but is not tight: Big-oh should be quoted with the smallest function of n that works.
4. 10n^2 + 4n + 2 = O(n^2), because 10n^2 + 4n + 2 ≤ 11n^2 for all n ≥ 5, c = 11.
5. 6·2^n + n^2 = O(2^n), because 6·2^n + n^2 ≤ 7·2^n for all n ≥ 4, c = 7.
6. 3n + 3 = Ω(n), because 3n + 3 ≥ 3n for all n ≥ 1, c = 3.
7. 3n + 3 = Ω(1), because 3n + 3 ≥ 3 for all n ≥ 1, c = 3.
(6) is the preferred statement. (7) satisfies the definition but is not tight: Ω should be quoted with the largest function of n that works.
8. 3n + 2 = Θ(n), because 3n ≤ 3n + 2 ≤ 4n for all n ≥ 2, c1 = 3, c2 = 4, and n0 = 2.
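The constants c and n0 in examples like these can be sanity-checked numerically. The sketch below uses hypothetical helper names and tests the inequality f(n) <= c*g(n) only over a finite range, so it can expose a wrong choice of c or n0 but it is not a proof for all n.

```c
#include <stdbool.h>

typedef long (*func_t)(long);

/* Returns true if f(n) <= c * g(n) for every n in [n0, n_max]. */
bool holds_upper_bound(func_t f, func_t g, long c, long n0, long n_max)
{
    for (long n = n0; n <= n_max; n++) {
        if (f(n) > c * g(n))
            return false;
    }
    return true;
}

/* f and g for example 1: 3n + 2 = O(n) */
long f_linear(long n) { return 3 * n + 2; }
long g_n(long n)      { return n; }

/* f and g for example 4: 10n^2 + 4n + 2 = O(n^2) */
long f_quad(long n)   { return 10 * n * n + 4 * n + 2; }
long g_n2(long n)     { return n * n; }
```

With c = 4, holds_upper_bound(f_linear, g_n, 4, 2, 1000) succeeds, while starting from n0 = 1 fails because 3*1 + 2 = 5 exceeds 4*1. This is why example 1 needs n ≥ 2.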

Example for S(P): add up the values of the n elements in an array called list.

Version 1, everything inside main (schematic; n is the number of elements):
    main( )
    {
        float list[n], temp = 0.0;
        int i;
        for (i = 0; i < n; i++)
            temp += list[i];
        return temp;
    }
    Smain(n) = c + n = O(n)

Version 2, a separate function:
    float addup(float list[ ], int n)
    {
        float temp = 0.0;
        int i;
        for (i = 0; i < n; i++)
            temp += list[i];
        return temp;
    }
    Saddup(n) = c = O(1)

Other languages may need to pass the whole array, but in C addup receives only the address of the first element and the size of the array.

Example for T(P): use a step count instead of execution time.
1. In-line step count
    float sum(float list[ ], int n)   /* calculate the sum of an array */
    {
        float tmp = 0.0;  count++;    /* tmp assignment */
        int i;
        for (i = 0; i < n; i++) {
            count++;                  /* for loop */
            tmp += list[i];  count++; /* adding up */
        }
        count++;                      /* last time the for loop is checked and fails */
        count++;                      /* for the return, counted before it executes */
        return tmp;
    }
    => step count = 2*n + 3
    => Tsum(n) = O(n)
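A compilable version of the instrumented function is sketched below so that the 2*n + 3 result can be checked by running it. The global counter name step_count is our own; the slide's code calls it count and does not show its declaration.

```c
long step_count = 0;   /* global step counter (hypothetical name) */

/* Sums the n elements of list while counting program steps. */
float sum(const float list[], int n)
{
    float tmp = 0.0f;
    step_count++;                 /* tmp assignment */
    for (int i = 0; i < n; i++) {
        step_count++;             /* loop test that succeeded */
        tmp += list[i];
        step_count++;             /* the addition */
    }
    step_count++;                 /* final loop test, which fails */
    step_count++;                 /* the return, counted before it runs */
    return tmp;
}
```

With step_count reset to 0, summing a 5-element array leaves step_count at 13, which is 2*5 + 3. An empty array gives 3, the constant part of the count.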

2. Tabular method
    Statement                                      steps/exec   frequency       total
    void add(int a[ ][MAX_SIZE], int b[ ][MAX_SIZE],
             int c[ ][MAX_SIZE], int row, int col)
    {
        int i, j;                                  0            0               0
        for (i = 0; i < row; i++)                  1            row + 1         row + 1
            for (j = 0; j < col; j++)              1            row*(col + 1)   row*col + row
                c[i][j] = a[i][j] + b[i][j];       1            row*col         row*col
    }
In total, step count = 2*row*col + 2*row + 1.
=> Tadd(row, col) = O(row*col)
The 2*row term comes from the outer loop. Thus, if row >> col, one may want to exchange the i and j loops so that the count becomes 2*row*col + 2*col + 1, reducing the step count and execution time.
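The totals in the table can be confirmed by instrumenting the loops. In this sketch the names add_steps, steps_closed_form, and MAX_COLS are hypothetical, and the counter is returned rather than kept in a global.

```c
#define MAX_COLS 16   /* assumed column bound for this sketch */

/* Adds a and b into c and returns the number of steps executed,
 * counted the same way as in the tabular method. */
long add_steps(int a[][MAX_COLS], int b[][MAX_COLS], int c[][MAX_COLS],
               int row, int col)
{
    long count = 0;
    for (int i = 0; i < row; i++) {
        count++;                      /* outer test succeeded */
        for (int j = 0; j < col; j++) {
            count++;                  /* inner test succeeded */
            c[i][j] = a[i][j] + b[i][j];
            count++;                  /* the assignment */
        }
        count++;                      /* inner test failed */
    }
    count++;                          /* outer test failed */
    return count;
}

/* Closed form of the step count for a given loop order. */
long steps_closed_form(long outer, long inner)
{
    return 2 * outer * inner + 2 * outer + 1;
}
```

For row = 3 and col = 4 the function returns 31, matching 2*3*4 + 2*3 + 1. Comparing steps_closed_form(1000, 2) with steps_closed_form(2, 1000) gives 6001 against 4005 steps, which shows the benefit of putting the shorter dimension in the outer loop.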

3. Execution time measurement
#include <time.h>
                       Method 1: clock               Method 2: time
    Before execution   start = clock();              start = time(NULL);
    After execution    stop = clock();               stop = time(NULL);
    Return type        clock_t                       time_t
    Result in sec.     (stop - start) / CLK_TCK;     difftime(stop, start);
CLK_TCK is the name used by older compilers; standard C calls this constant CLOCKS_PER_SEC.
When measuring the execution of a program, we obtained:
    CLK_TCK = 18.2, beginclk = 66, stopclk = 193, diffclk = 127, time = 6.98
    begintime = 825316157, stoptime = 825316164, difftime = 7
Notes:
1. The value returned by time() is counted from around the start of 1970.
2. Use %ld to print a long integer.
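Both measurement methods can be sketched with standard C library calls. The workload function busy_work and the wrapper names are hypothetical, and the sketch uses the standard macro CLOCKS_PER_SEC in place of CLK_TCK.

```c
#include <time.h>

/* Hypothetical workload: adds up i * 0.5 for i = 0 .. iterations - 1.
 * volatile keeps the compiler from removing the loop. */
double busy_work(long iterations)
{
    volatile double x = 0.0;
    for (long i = 0; i < iterations; i++)
        x += i * 0.5;
    return x;
}

/* Method 1: processor time in seconds, measured with clock(). */
double cpu_seconds(long iterations)
{
    clock_t start = clock();
    busy_work(iterations);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}

/* Method 2: wall-clock time in seconds, measured with time().
 * The resolution is typically one second. */
double wall_seconds(long iterations)
{
    time_t start = time(NULL);
    busy_work(iterations);
    time_t stop = time(NULL);
    return difftime(stop, start);
}
```

clock() suits short runs because its resolution is much finer than one second, while time() with difftime() is adequate only for runs lasting several seconds. Printing either result with printf("%f") avoids the format mismatch that the %ld note warns about.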

Trade-off between program space and execution time
Example: two ways to interchange two elements, using a function or a macro.

1. Using a function
    void swap(int *x, int *y)
    {
        int temp;
        temp = *x;
        *x = *y;
        *y = temp;
    }
    Adv: a single copy of the function => saves space.
    Disadv: calls and returns => wastes time.

2. Using a macro
    #define swap(a,b,t) ((t)=(a), (a)=(b), (b)=(t))
    …
    swap(p, q, temp);
    …
    Adv: no calls and returns, so execution time is reduced.
    Disadv: each use expands in place, so many uses of the macro may take a lot of space.

Therefore, it is important to evaluate the trade-offs among the various aspects carefully before implementing a solution.
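The two variants can live in one file if they are given different names. In this sketch swap_fn and SWAP are our own names, chosen so that the function and the macro do not collide.

```c
/* Function version: one copy in the program, one call per use. */
void swap_fn(int *x, int *y)
{
    int temp = *x;
    *x = *y;
    *y = temp;
}

/* Macro version: expanded at every use, so there is no call overhead.
 * t is a caller-supplied temporary of the same type as a and b. */
#define SWAP(a, b, t) ((t) = (a), (a) = (b), (b) = (t))
```

With int p = 1, q = 2, t; the call swap_fn(&p, &q) and the expansion SWAP(p, q, t) each exchange the two values. The macro also works for other types such as double, because it is plain text substitution, whereas the function is tied to int.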
