Simplified Molecular Graphics Scripting Language (SMGSL). A simplified scripting language which can easily translate to the three established molecular graphics scripting languages: Pymol, Rasmol, and Jmol.
From Humble Beginnings, The First Lexer. Our very first lexer was quite simple. Its only function was to tokenize any input and print the tokens to the screen.
From Humble Beginnings, The First Lexer.

%{
#include <stdio.h>
%}

NUM [0-9]
VAR [a-zA-Z]
WS  [ \t]

%%
set       printf( "(STMT set) " );
{NUM}+    printf( "(NUM %s) ", yytext );
{VAR}+    printf( "(VAR %s) ", yytext );
 /* Operators */
-         printf( "(OP minus) " );
\+        printf( "(OP plus) " );
=         printf( "(OP equal) " );
;         printf( "(END stmt) " );
\n        printf( "\n\n" );
{WS}+     /* Ignore whitespace */
<<EOF>>   printf( "End of parse.\n" ); yyterminate();
%%
The Language Defined…Cont. SMGSL was first created and tested with a Pymol backend. We limited our command set to a simple load function and a rotate function, then output a Pymol script file with these commands translated. We later took the same approach with Rasmol and Jmol.
The First Grammar File...

%{
#include <stdio.h>
#include <stdlib.h>   /* system() */
#include <string.h>

int yylex(void);                    /* provided by the flex-generated lexer */
void yyrestart(FILE *input_file);   /* provided by the flex-generated lexer */

FILE *fp;

void yyerror(const char *str)
{
    fprintf(stderr, "error: %s\n", str);
}

/* Append a Pymol rotate-about-x call to the output script. */
void appendRotateX(char *degrees, char *output)
{
    fp = fopen(output, "a");
    fprintf(fp, "pymol.cmd.rotate(\"x\",%s)\n", degrees);
    fclose(fp);
}

/* Append a Pymol load call to the output script. */
void loadMol(char *fName, char *output)
{
    fp = fopen(output, "a");
    fprintf(fp, "pymol.cmd.load(\"%s\")\n", fName);
    fclose(fp);
}

int main()
{
    /* Start a fresh script with the Pymol imports. */
    fp = fopen("script.py", "w");
    fprintf(fp, "import pymol,sys\nfrom pymol import cmd\n");
    fclose(fp);

    FILE *inputf;
    inputf = fopen("input.mgsl", "r");
    yyrestart(inputf);
    yyparse();
    fclose(inputf);
    /* fp is already closed here; each helper closes it after appending. */
    system("pymol script.py");
    return 0;
}
%}

%token LOAD_TOKEN
%token WAIT_TOKEN
%token CLEAR_TOKEN
%token ROTATE_TOKEN
%token TRANSLATE_TOKEN
%token COLOR_TOKEN
%token SELECT_TOKEN
%token SET_TOKEN
%token DESELECT_TOKEN
%token NUM_TOKEN
%token VAR_TOKEN
%token COMMENT_TOKEN
%token OP_TOKEN
%token OP_MINUS_TOKEN
%token OP_PLUS_TOKEN
%token OP_DIV_TOKEN
%token OP_EQUAL_TOKEN
%token OP_MULT_TOKEN
%token END_TOKEN
%token ERROR_TOKEN
%token QUOTE_TOKEN
%token OPEN_BRACE_TOKEN
%token CLOSE_BRACE_TOKEN
%token OPEN_PAR_TOKEN
%token CLOSE_PAR_TOKEN
%token PERCENT_TOKEN
%token DEPTH_TOKEN
%token SLAB_TOKEN
%token NEWLINE_TOKEN
%token X_TOKEN
%token Y_TOKEN
%token Z_TOKEN

%%
/* TODO rewrite simpler */
input:    /* empty */
        | input line                 /* {printf("-----------------------------\n");} */
        ;

line:     NEWLINE_TOKEN              /* {printf("{\\n}");} */
        | statement NEWLINE_TOKEN    /* {printf("{statement \\n}");} */
        ;

statement:
          VAR_TOKEN
            { printf("<<<<VAR %s>>>>", $1); }
        | ROTATE_TOKEN X_TOKEN NUM_TOKEN   /* todo: replace VAR with working NUM */
            { appendRotateX((char*)$3, "script.py"); }
        | DEPTH_TOKEN DEPTH_TOKEN
            { printf("<<<depth %s>>>>>", $2); }
        | LOAD_TOKEN VAR_TOKEN
            { loadMol((char*)$2, "script.py"); }
        | NUM_TOKEN
            { printf("<<num: %s>>>\n", $1); }
        ;

/* axis does not work at the moment
axis:     X_TOKEN
        | Y_TOKEN
        | Z_TOKEN
        ;
*/
%%
The Language Defined…Cont.
Comments: Any text from the # to the end of the line is ignored; the comment ends when a newline is reached. Denoted in the lexer as: COMMENT \#[^\n]*
Variables: Variables must begin with at least one alphabetic character, which may be followed by any number of alphanumeric characters, underscores, or decimal points. Denoted in the lexer as: VAR [a-zA-Z]+[.a-zA-Z0-9_]*
Numbers: Numbers are simply read as character strings rather than converted to numeric values, since no mathematical operations are needed.
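As a rough illustration of the two patterns above, here is a minimal C sketch that checks them by hand. The helper names is_smgsl_var and comment_length are hypothetical and are not part of the project; the sketch also assumes the default "C" locale, where isalpha and isalnum accept only ASCII letters and digits.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if s matches VAR, i.e. [a-zA-Z]+[.a-zA-Z0-9_]* */
bool is_smgsl_var(const char *s)
{
    assert(s != NULL);
    if (!isalpha((unsigned char)s[0])) {
        return false;                     /* must start with a letter */
    }
    for (size_t i = 1; s[i] != '\0'; i++) {
        unsigned char c = (unsigned char)s[i];
        if (!isalnum(c) && c != '_' && c != '.') {
            return false;                 /* character outside the VAR class */
        }
    }
    return true;
}

/* Hypothetical helper: length of the comment starting at s, matching
   COMMENT (\#[^\n]*), or 0 if s does not start with '#'. */
size_t comment_length(const char *s)
{
    assert(s != NULL);
    if (s[0] != '#') {
        return 0;
    }
    size_t n = 1;
    while (s[n] != '\0' && s[n] != '\n') {
        n++;                              /* the newline itself is not consumed */
    }
    return n;
}
```

Under this pattern a name such as s1 is a variable, while a file name such as 3cro.pdb is not, because it starts with a digit.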
The Language Defined (Keywords)…Cont.
rotate {axis} {value}: Takes a number of degrees as its argument and rotates the molecule about the given axis by that many degrees. e.g. rotate x 90
select {expression}: Defines the currently selected region of the molecule. The parameter is an atomic expression; for details, please refer to the expression sections of the manual. e.g. select atomno<36
slab <value>: Enables, disables, or positions the z-clipping plane of the molecule.
spacefill {<value>}: Renders all the selected atoms as solid spheres. The optional value controls the radius of the spheres in Rasmol units; by default the radius is the van der Waals radius.
trace {<value>}: Displays a smooth spline between consecutive alpha carbon positions of the selected molecule. This command also takes an optional parameter value.
translate <axis> <value>: Moves the center of the molecule in the viewport. The axis parameter defines the axis along which the center is moved, and the value determines by how much.
write {<format>} <filename>: Writes the current image of the molecule in a standard format such as bmp, gif, ppm, pict, etc.
zap: Deletes the contents of the current database and resets parameter variables to their initial default state.
zoom <value>: Changes the magnification of the currently displayed image.
background {color}: Changes the background to the desired color. e.g. background red
cartoon {number}: Represents the currently selected residues as a deep ribbon with the width specified by the command's argument. The depth of the cartoon may be specified using the set cartoon <value> command. e.g. cartoon 5
color {object} {color}: Colors an atom or the selected object with the given color. The color names used are the predefined RasMol colors. e.g. color s1 red
colorRGB {object} (RGBtriplet): Colors an atom or the selected object with the given color, specified as an RGB triplet. e.g. colorRGB s1 (100,50,100)
define <identifier> <expression>: Associates an arbitrary set of atoms with a unique identifier, which allows user-defined sets. e.g. define s1 resno < 23
load {filename}: Loads the molecule file named in the argument. e.g. load 3cro.pdb
restrict {expression}: Defines the selection and disables the representation of those parts of the molecule no longer selected. For details of the expression, please refer to the expression sections of the manual. e.g. restrict atomno <=36
ribbons {<value>}: Displays the currently loaded protein or nucleic acid as a smooth solid "ribbon" surface. The optional parameter controls the width of the ribbons.
The Language Defined (Atomic Expressions)…Cont.
Atomic expressions are used to manipulate sets of atoms from the molecule. They are passed as parameters to different commands.
chain: refers to the chain. Usage: select chain P
resino: refers to the residue number. Usage: resino 120
resi: refers to the residue name (uses the three-letter code of the residue name). Usage: resi cys
symbol: the chemical symbol of the element in the periodic table. Usage: symbol c
atomno: the atomic number of the element in the periodic table. Usage: atomno 14
The Lexer…Revisited. With our language definition solidifying, it was time to revisit our simple lexer and modify it to accept the new load and rotate commands. Having run Bison on our grammar file with the -d option, we could now include the generated grammar.tab.h file in our lexer. We also redirected the tokens to the parser rather than to the screen.
The Lexer…Revisited.

%{
#include <stdio.h>
#include <string.h>
#include "grammar.tab.h"
%}

WS      [ \t]
COMMENT \#[^\n]*
NUM     [0-9]+
FLOAT   {NUM}"."[0-9]*
STRING  [a-zA-Z]
VAR     [a-zA-Z0-9]+[-.a-zA-Z0-9_]*

%%
set        return SET_TOKEN;
load       return LOAD_TOKEN;
wait       return WAIT_TOKEN;
clear      return CLEAR_TOKEN;
rotate     return ROTATE_TOKEN;
translate  return TRANSLATE_TOKEN;
color      return COLOR_TOKEN;
select     return SELECT_TOKEN;
deselect   return DESELECT_TOKEN;
slab       return SLAB_TOKEN;
depth      return DEPTH_TOKEN;
x          return X_TOKEN;
y          return Y_TOKEN;
z          return Z_TOKEN;
bAs        return BAS_TOKEN;
bgColor    return BG_TOKEN;
hide       return HIDE_TOKEN;
show       return SHOW_TOKEN;
 /* Values: the matched text is copied and handed to the parser via yylval */
{NUM}      yylval=(size_t)strdup(yytext); printf(" (NUM %s)",yytext); return NUM_TOKEN;
{FLOAT}    yylval=(size_t)strdup(yytext); printf(" (FLOAT %s)",yytext); return FLOAT_TOKEN;
{VAR}      yylval=(size_t)strdup(yytext); printf("(VAR %s)",yytext); return VAR_TOKEN;
 /* Operators and punctuation */
\"         return QUOTE_TOKEN;
\+         return OP_PLUS_TOKEN;
\*         return OP_MULT_TOKEN;
\-         return OP_MINUS_TOKEN;
\/         return OP_DIV_TOKEN;
=          return OP_EQUAL_TOKEN;
\{         return OPEN_BRACE_TOKEN;
\}         return CLOSE_BRACE_TOKEN;
\(         return OPEN_PAR_TOKEN;
\)         return CLOSE_PAR_TOKEN;
{COMMENT}  yylval=(size_t)strdup(yytext); return COMMENT_TOKEN;
;          return END_TOKEN;
\n         return NEWLINE_TOKEN;
{WS}+      /* Ignore whitespace */
<<EOF>>    printf( "[End of parse.]\n" ); yyterminate();
%%
The Lexical issues…. At first our lexer would only handle commands which were either all capitalized or all lowercase, e.g. load or LOAD. We had to change this to allow any combination of uppercase and lowercase letters. An example of how we resolved this is the BGCOLOR command. We define
BGCOLOR [Bb][Gg][Cc][Oo][Ll][Oo][Rr]
and then return it as
{BGCOLOR} return BG_TOKEN;
Another issue was adjusting for spaces between commands: any number of spaces can be entered between a command and its parameters. The last issue we encountered was dealing with the file extension. Our regular expression needed to be changed to allow only a specific recognized extension for the molecule file. We decided to fix the extension to PDB, CIF, or ALC.
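The same two checks, case-insensitive keywords and a fixed set of file extensions, can also be written in plain C instead of lexer patterns. The sketch below is only an illustration under that assumption; equals_ignore_case and has_molecule_extension are hypothetical names, not functions from the project.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Case-insensitive string equality, so "bgcolor", "BGCOLOR" and "BgCoLoR"
   all compare equal (portable stand-in for POSIX strcasecmp). */
bool equals_ignore_case(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return false;
        }
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';      /* equal only if both ended together */
}

/* True if filename ends in .pdb, .cif, or .alc, in any letter case. */
bool has_molecule_extension(const char *filename)
{
    static const char *const known[] = { "pdb", "cif", "alc" };
    assert(filename != NULL);
    const char *dot = strrchr(filename, '.');
    if (dot == NULL || dot == filename) {
        return false;                     /* no extension, or no base name */
    }
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if (equals_ignore_case(dot + 1, known[i])) {
            return true;
        }
    }
    return false;
}
```

With this sketch, load 3cro.pdb and load 3CRO.PDB would both pass the extension check, while a file such as notes.txt would be rejected.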
The Lexical issues…. A further issue we encountered was including expressions that needed to be enumerated, as in the previous BGCOLOR example. The enumerated values for RED, BLUE, GREEN, etc. needed to be passed to the parser. Our enumerated colors were based on the RasMol color schemes: Color = {Black, Blue, BlueTint, …, Yellow, YellowTint}. This issue is currently still unresolved.
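One possible way to pass enumerated colors to the parser is a lookup table from color name to enum value. The sketch below is not the project's solution (the slides state the issue was unresolved); SmgslColor and lookup_color are hypothetical names, and the table holds only the colors named on the slides, not the full RasMol list.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical enum: only the colors named on the slides are listed. */
typedef enum {
    COLOR_BLACK,
    COLOR_BLUE,
    COLOR_BLUETINT,
    COLOR_GREEN,
    COLOR_RED,
    COLOR_YELLOW,
    COLOR_YELLOWTINT,
    COLOR_UNKNOWN                         /* returned when the name is not in the table */
} SmgslColor;

/* Case-insensitive equality, so "red", "RED" and "Red" all match. */
static int same_name(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Map a color name to its enum value, ignoring letter case. */
SmgslColor lookup_color(const char *name)
{
    static const struct { const char *name; SmgslColor value; } table[] = {
        { "black",      COLOR_BLACK },
        { "blue",       COLOR_BLUE },
        { "bluetint",   COLOR_BLUETINT },
        { "green",      COLOR_GREEN },
        { "red",        COLOR_RED },
        { "yellow",     COLOR_YELLOW },
        { "yellowtint", COLOR_YELLOWTINT },
    };
    assert(name != NULL);
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (same_name(name, table[i].name)) {
            return table[i].value;
        }
    }
    return COLOR_UNKNOWN;
}
```

A lexer rule for color names could then store the result of lookup_color(yytext) in yylval, so the parser receives an integer rather than a string.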
Parsing Issues…. The largest issue we faced with the parser was the handling of the all-important SELECT command with its atomic expressions. An atomic expression can be a single atom in a chain, a group of atoms from a chain, an entire chain, or the entire molecule. Due to the flexibility of the SELECT command, we had to try to break the command down to its simplest form.
Parsing Issues….
Example: Select expr1 AND expr2 OR expr3
Broken down:
t1 = expr1 AND expr2
t2 = t1 OR expr3
Select t2
// t1 and t2 are where the intermediate atomic expressions are stored
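The breakdown in the example can be sketched as a small C routine that emits one temporary per operator. This is an illustration only: break_down_select is a hypothetical name, and the sketch assumes strict left-to-right evaluation with no operator precedence, which matches the example but is not confirmed as the project's rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes the breakdown of "select e0 op0 e1 op1 e2 ..." into out.
   operands has n entries and operators has n - 1 entries.
   Returns 0 on success, or -1 if out is too small. */
int break_down_select(const char *const operands[],
                      const char *const operators[],
                      size_t n, char *out, size_t out_size)
{
    assert(operands != NULL && out != NULL && n >= 1 && out_size > 0);
    assert(n == 1 || operators != NULL);
    size_t used = 0;
    int written;

    for (size_t i = 1; i < n; i++) {
        if (i == 1) {
            /* First temporary combines the first two operands. */
            written = snprintf(out + used, out_size - used, "t1 = %s %s %s\n",
                               operands[0], operators[0], operands[1]);
        } else {
            /* Each later temporary combines the previous one with the next operand. */
            written = snprintf(out + used, out_size - used, "t%zu = t%zu %s %s\n",
                               i, i - 1, operators[i - 1], operands[i]);
        }
        if (written < 0 || (size_t)written >= out_size - used) {
            return -1;
        }
        used += (size_t)written;
    }

    /* The select itself refers to the last temporary (or the lone operand). */
    if (n == 1) {
        written = snprintf(out + used, out_size - used, "select %s\n", operands[0]);
    } else {
        written = snprintf(out + used, out_size - used, "select t%zu\n", n - 1);
    }
    if (written < 0 || (size_t)written >= out_size - used) {
        return -1;
    }
    return 0;
}
```

Called with the operands expr1, expr2, expr3 and the operators AND, OR, the routine produces the three lines of the example: the two temporaries followed by select t2.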
SMGSL Backend…. The parser uses the actual RasMol, Jmol, and Pymol environments as its backend. As it stands now, the parser translates our language into standard script files for the three environments and loads the generated script files in those environments.
Intermediate Language. RasMol was chosen to be the intermediate language due to its similarity with Jmol. The only true translation then needed would be to Pymol. For the Pymol translation, the conscript engine was needed to translate the intermediate RasMol to Pymol.
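As a rough sketch of what per-backend output could look like for a single command, the C function below emits a rotate line for each target. The Pymol line follows the appendRotateX helper from the first grammar file. The RasMol and Jmol line is assumed to mirror the SMGSL form rotate x 90, in keeping with the similarity noted above. Backend and emit_rotate are hypothetical names, and this is not the conscript engine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical backend selector. */
typedef enum { BACKEND_RASMOL, BACKEND_JMOL, BACKEND_PYMOL } Backend;

/* Writes one rotate command for the chosen backend into out.
   Returns 0 on success, or -1 if the backend is unknown or out is too small. */
int emit_rotate(Backend backend, char axis, const char *degrees,
                char *out, size_t out_size)
{
    assert(degrees != NULL && out != NULL && out_size > 0);
    int written;

    switch (backend) {
    case BACKEND_PYMOL:
        /* Same text that appendRotateX writes, with the axis made a parameter. */
        written = snprintf(out, out_size, "pymol.cmd.rotate(\"%c\",%s)\n",
                           axis, degrees);
        break;
    case BACKEND_RASMOL:
    case BACKEND_JMOL:
        /* Assumption: both accept the same line as the SMGSL rotate command. */
        written = snprintf(out, out_size, "rotate %c %s\n", axis, degrees);
        break;
    default:
        return -1;
    }
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

Keeping RasMol and Jmol on one switch branch reflects the point of the intermediate language: only the Pymol case needs a genuinely different output.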
Conclusion… Although the project is not completed, it has come a long way from its very humble beginnings. The team effort in developing this compiler has led to a deeper understanding of various issues that will most certainly come up in our engineering careers.