Biocomputing Final Project

Department of Computer Science and Information Engineering, National Chi Nan University, 2003/05/20

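Reading command-line arguments in C/C++ • int main(int argc, char* argv[]) • argc: the number of arguments, counting the program name itself (argc > 0 in practice) • argv[0]: the name of the file being executed, as a string • argv[1]: the first argument, as a string • argv[2]: the second argument • …and so on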
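A minimal sketch, assuming the comparison program is run as `prog fileA fileB`. That calling convention and the helper name `get_two_files` are hypothetical and are not specified by the assignment. The helper only checks argc and copies argv[1] and argv[2].

```cpp
#include <string>

// Hypothetical helper: take two input file names from the command line.
// Expects exactly two arguments after the program name (argv[0]).
// Returns false when the argument count is wrong.
bool get_two_files(int argc, char* argv[], std::string& first, std::string& second)
{
    if (argc != 3)          // argv[0] plus two file names
        return false;
    first = argv[1];        // first argument, as a C string
    second = argv[2];       // second argument
    return true;
}
```

A `main(int argc, char* argv[])` would pass its own argc and argv straight through and exit with an error status when the helper returns false.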

The files the instructor has given you include (1) *.dat (2) *.cpp (3) *.lex (4) *.seq

punct.dat • punctuation of the C/C++ programming language • keywords.dat • keywords of the C/C++ programming language • corpus.dat • identifiers that you should assume have the same meaning in every file • In all three files, entries are separated by newlines (\n). You may assume that a string appearing in any of these three files has the same meaning in every C/C++ source file.

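*.cpp • All the C/C++ source files for you to test with • Naming convention: • pnnn-v.cpp: • nnn: a number • v: 1 is the original version, 2 is a version the instructor has altered • Files with the same nnn come from the same source • Taken from solutions to ACM INTERNATIONAL COLLEGIATE PROGRAMMING CONTEST problems • http://acm.uva.es/problemset/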
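The pnnn-v naming rule makes it easy to pair each original file with its altered version. Below is a small sketch using `std::sscanf`. The name `parse_name` is hypothetical, and it assumes every file name carries an extension (.cpp, .lex or .seq).

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: split a name such as "p101-2.cpp" into its problem
// number nnn (101) and version v (2; 1 = original, 2 = altered).
// Returns false if the name does not follow the pnnn-v.ext pattern.
bool parse_name(const std::string& name, int& problem, int& version)
{
    char ext[16];
    // 'p', a number, '-', a number, '.', then up to 15 extension characters
    return std::sscanf(name.c_str(), "p%d-%d.%15s", &problem, &version, ext) == 3;
}
```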

*.lex • Each source file split into a list of tokens, with comments already removed. • For example, p101-2.lex is the tokenized form of p101-2.cpp.

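*.seq • Each *.lex file converted into a sequence of numbers, separated by \t (tab) • Every string that appears in a *.dat file gets the same number in all files and is represented by a negative number; all other strings are represented by positive integers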

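Grading criteria • Every student must at least be able to compare *.seq files with each other. • Ideally, you should be able to compare *.lex files. • Comparing *.cpp files directly is better still. • Base scores: • 70, 80, 90 • Completeness: the program must at least run • Correctness: keep false positives and false negatives low • Efficiency: avoid overly brute-force approaches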
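The slides do not prescribe a comparison method. One common option, offered here only as a sketch, is the length of the longest common subsequence (LCS) of two *.seq sequences. It is computed by dynamic programming in O(nm) time and O(m) memory, far cheaper than trying all alignments. Dividing by the longer length gives a similarity in [0, 1]. The function names are hypothetical. Any threshold for flagging a pair would have to be tuned against false positives and false negatives.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Length of the longest common subsequence of a and b.
// Classic DP keeping a single row: O(|a|*|b|) time, O(|b|) memory.
std::size_t lcs_length(const std::vector<int>& a, const std::vector<int>& b)
{
    std::vector<std::size_t> row(b.size() + 1, 0);
    for (std::size_t i = 1; i <= a.size(); ++i) {
        std::size_t diag = 0;                 // L[i-1][j-1]
        for (std::size_t j = 1; j <= b.size(); ++j) {
            std::size_t up = row[j];          // L[i-1][j], before overwrite
            if (a[i - 1] == b[j - 1])
                row[j] = diag + 1;
            else
                row[j] = std::max(row[j], row[j - 1]);
            diag = up;
        }
    }
    return row[b.size()];
}

// Similarity in [0, 1]: LCS length divided by the longer sequence length.
double similarity(const std::vector<int>& a, const std::vector<int>& b)
{
    std::size_t longest = std::max(a.size(), b.size());
    if (longest == 0)
        return 1.0;                           // two empty files are identical
    return static_cast<double>(lcs_length(a, b)) / longest;
}
```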

C/C++ comments • Remove all comments: • /* ……… */ • // ……… • Note that a C++ comment ends at the end of the line (EOL): // this is a comment • A C comment may span several lines, but comments cannot be nested • /* this is a valid comment //*/

/* this is another
 * valid comment /* */
• // this is a strange comment, /*
  #include <stdio.h>
  /* but still valid */
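The rules above can be handled by a small state machine. Below is a sketch with a hypothetical `strip_comments`. It also copies string and character literals verbatim, since a /* inside a string is not a comment. Following the C standard, each removed comment is replaced by one space so that tokens on either side do not merge. Backslash-newline line splicing is not handled.

```cpp
#include <cstddef>
#include <string>

// Remove /* */ and // comments from src. Block comments do not nest: the
// first "*/" closes them. String/char literals are copied unchanged.
// Each comment becomes one space; the newline ending a // comment is kept.
std::string strip_comments(const std::string& src)
{
    enum State { CODE, LINE_COMMENT, BLOCK_COMMENT, STRING_LIT, CHAR_LIT };
    State st = CODE;
    std::string out;
    for (std::size_t i = 0; i < src.size(); ++i) {
        char c = src[i];
        char next = (i + 1 < src.size()) ? src[i + 1] : '\0';
        switch (st) {
        case CODE:
            if (c == '/' && next == '/') { st = LINE_COMMENT; ++i; }
            else if (c == '/' && next == '*') { st = BLOCK_COMMENT; ++i; }
            else {
                if (c == '"') st = STRING_LIT;
                else if (c == '\'') st = CHAR_LIT;
                out += c;
            }
            break;
        case LINE_COMMENT:                 // ends at end of line
            if (c == '\n') { out += ' '; out += '\n'; st = CODE; }
            break;
        case BLOCK_COMMENT:                // ends at the first "*/"
            if (c == '*' && next == '/') { out += ' '; st = CODE; ++i; }
            break;
        case STRING_LIT:
        case CHAR_LIT:
            out += c;
            if (c == '\\' && i + 1 < src.size()) {
                out += next; ++i;          // copy the escaped character as-is
            } else if ((st == STRING_LIT && c == '"') ||
                       (st == CHAR_LIT && c == '\'')) {
                st = CODE;                 // closing quote
            }
            break;
        }
    }
    return out;
}
```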

Problems when handling strings • Escape codes • "say \"hello\"" • '\'' • Interaction with comments • "3 /* ...... */ 4" • "3 //4" • Problems caused by whitespace • "const int a=10"
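All three problems go away if a quoted literal is scanned as a single token. In that token a backslash always consumes the following character, and spaces, /* and // inside it carry no special meaning. This is a sketch, and `literal_end` is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// src[start] must be '"' or '\''. Returns the index just past the matching
// closing quote, treating each backslash escape (\" \' \\ ...) as a
// two-character unit. Returns src.size() if the literal is unterminated.
std::size_t literal_end(const std::string& src, std::size_t start)
{
    const char quote = src[start];
    std::size_t i = start + 1;
    while (i < src.size()) {
        if (src[i] == '\\')
            i += 2;                 // skip the backslash and the escaped char
        else if (src[i] == quote)
            return i + 1;           // closing quote found
        else
            ++i;                    // ordinary character, spaces included
    }
    return src.size();
}
```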

Line breaks and whitespace • int a = 1; and int a=1; are equivalent. • if (a==1) return 1; • void f() { return; }
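A tokenizer in which whitespace only separates tokens produces the same output for both spellings. To split `a==1` correctly, it must take the longest punctuator that matches (`==`, not `=` twice). This is a sketch under several assumptions. It expects comment-free input, the name `tokenize` is hypothetical, the punctuator list is passed in (it could come from punct.dat), and number scanning is simplified.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split comment-free C/C++ source into tokens. The longest entry of
// puncts matching at the current position wins; unknown symbols become
// one-character tokens.
std::vector<std::string> tokenize(const std::string& src,
                                  const std::vector<std::string>& puncts)
{
    std::vector<std::string> tokens;
    std::size_t i = 0, n = src.size();
    while (i < n) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }               // skip blanks
        std::size_t start = i;
        if (std::isalpha(c) || c == '_') {                    // identifier/keyword
            while (i < n && (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
        } else if (std::isdigit(c)) {                         // number (simplified)
            while (i < n && (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '.'))
                ++i;
        } else if (c == '"' || c == '\'') {                   // string/char literal
            ++i;
            while (i < n && src[i] != static_cast<char>(c))
                i += (src[i] == '\\') ? 2 : 1;                // skip escapes
            i = (i < n) ? i + 1 : n;                          // past closing quote
        } else {                                              // punctuator
            std::size_t best = 1;
            for (const std::string& p : puncts)
                if (p.size() > best && src.compare(i, p.size(), p) == 0)
                    best = p.size();
            i += best;
        }
        tokens.push_back(src.substr(start, i - start));
    }
    return tokens;
}
```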

Split tokens systematically • Work out every possible case in advance

[Figure: state-transition diagram for the tokenizer. Legend: o/w = otherwise, EOF = end-of-file]

ctype.h • isalnum: letters and digits • iscntrl: control characters • ispunct: punctuation characters • isalpha: letters • isdigit: decimal digits • isspace: whitespace (space, \f \r \n \v \t)
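A small sketch of classifying characters with these functions (`char_class` is a hypothetical name). Two details matter in a tokenizer. First, the argument should be cast to unsigned char, because passing a negative char value to these functions is undefined behavior. Second, in the default C locale `'_'` counts as punctuation, so an identifier scanner must test for it explicitly.

```cpp
#include <cctype>
#include <string>

// Classify one character with the <cctype> predicates listed above.
// isspace is checked first because '\n' and '\t' are also control characters.
std::string char_class(char ch)
{
    unsigned char c = static_cast<unsigned char>(ch);
    if (std::isspace(c)) return "space";     // ' ', \f \n \r \t \v
    if (std::isdigit(c)) return "digit";     // '0'..'9'
    if (std::isalpha(c)) return "letter";    // letters (C locale: A-Z, a-z)
    if (std::ispunct(c)) return "punct";     // printable, not alnum, not space
    if (std::iscntrl(c)) return "control";
    return "other";
}
```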

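Building a symbol table
int SymbolTable::hash(char *symbol)
{
    unsigned int sum = 0;
    while (*symbol != '\0')
        sum += *symbol++;        // add up the character codes
    return sum % TABLE_SIZE;     // bucket index in [0, TABLE_SIZE)
}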
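The additive hash on this slide ignores character order, so anagrams such as `ab` and `ba` always land in the same bucket and lengthen its chain. The sketch below contrasts it with a polynomial hash (h = 31·h + c), a common alternative that is not part of the slides. The value `TABLE_SIZE = 1021` and both function names are assumptions.

```cpp
#include <cstddef>

const std::size_t TABLE_SIZE = 1021;   // assumed table size (a prime)

// Additive hash as on the slide: sum of character codes mod TABLE_SIZE.
std::size_t additive_hash(const char* symbol)
{
    unsigned int sum = 0;
    while (*symbol != '\0')
        sum += static_cast<unsigned char>(*symbol++);
    return sum % TABLE_SIZE;
}

// Alternative (not from the slides): polynomial hash h = h*31 + c, which
// depends on character order, so anagrams usually get different buckets.
std::size_t poly_hash(const char* symbol)
{
    std::size_t h = 0;
    while (*symbol != '\0')
        h = h * 31 + static_cast<unsigned char>(*symbol++);
    return h % TABLE_SIZE;
}
```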

Separate chaining

Node* SymbolTable::search(char *symbol)
{
    int posn = hash(symbol);
    Node *temp;
    // walk the chain stored in bucket posn
    for (temp = table[posn]; temp != NULL; temp = temp->next) {
        if (strcmp(symbol, temp->symbol) == 0)
            return temp;         // found
    }
    return NULL;                 // not in the table
}

Node* SymbolTable::insert(char *symbol)
{
    Node *temp = search(symbol);
    if (temp != NULL)            // symbol is already in the table
        return temp;

    int pos = hash(symbol);
    temp = new Node;             // create a new node
    strcpy(temp->symbol, symbol);  // assumes Node::symbol is large enough
    temp->sn = ++counter;        // unique id
    temp->next = table[pos];     // link in at the head of the chain
    table[pos] = temp;
    return temp;                 // the newly created node
}
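For experimentation, the pieces above can be combined into one self-contained class: the additive hash, separate chaining, search, and insert with sequential ids. This is a sketch under stated assumptions. `TABLE_SIZE = 211` is arbitrary. `std::string` replaces the fixed-size char array that `strcpy` could overflow. The destructor that frees the chains is an addition not shown on the slides.

```cpp
#include <string>

// Symbol table with separate chaining (sketch; TABLE_SIZE is an assumption).
class SymbolTable {
public:
    struct Node {
        std::string symbol;
        int sn;          // unique id, assigned 1, 2, 3, ... in insertion order
        Node* next;      // next node in the same bucket
    };

    static const int TABLE_SIZE = 211;

    SymbolTable() : counter(0) {
        for (int i = 0; i < TABLE_SIZE; ++i) table[i] = nullptr;
    }
    ~SymbolTable() {                                    // free every chain
        for (int i = 0; i < TABLE_SIZE; ++i) {
            Node* p = table[i];
            while (p != nullptr) { Node* next = p->next; delete p; p = next; }
        }
    }
    SymbolTable(const SymbolTable&) = delete;           // owns raw nodes
    SymbolTable& operator=(const SymbolTable&) = delete;

    // Return the node holding symbol, or nullptr if it is absent.
    Node* search(const std::string& symbol) const {
        for (Node* p = table[hash(symbol)]; p != nullptr; p = p->next)
            if (p->symbol == symbol) return p;
        return nullptr;
    }

    // Return the existing node, or insert a new one at the head of its chain.
    Node* insert(const std::string& symbol) {
        if (Node* found = search(symbol)) return found;
        int pos = hash(symbol);
        table[pos] = new Node{symbol, ++counter, table[pos]};
        return table[pos];
    }

private:
    static int hash(const std::string& symbol) {
        unsigned int sum = 0;
        for (unsigned char c : symbol) sum += c;        // additive hash
        return static_cast<int>(sum % TABLE_SIZE);
    }

    Node* table[TABLE_SIZE];
    int counter;
};
```

Because "ab" and "ba" hash to the same bucket, they exercise the chaining path. Inserting an existing symbol returns the node already stored for it, so its id is stable.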
