Prepared by: Stephen Edmonds December 2004

Developing the Monash Research Directory

What is it? • A searchable web-based directory of research publications and researchers at Monash University. • Developed using Perl and open source modules.

Search form

Author search results

Publication search results

Author details

Publication details

Why? • Each year the research activities at Monash University produce a significant amount of output in the form of: • Journal articles • Books • Conference papers • and more… • Unfortunately only a limited number of people are aware of the full range of output.

Why? • A publicly available directory could raise the profile of research activities at the University. • Additionally the Monash Research Directory would be the first of a series of research-oriented tools for: • Researchers at Monash • People interested in research

Initial requirements • Publicly available through the Monash website. • Restricted access interface through the my.monash staff and student portal. • Utilise existing information from systems around the University. • Present the most up to date information possible. • Only display research output generated by current staff members of the University.

Research Master • A commercial product used to track research activities around the University. • Information regarding the research activities is entered by representatives from each faculty within the University. • Within Research Master one module contains details of the research output.

Research Master • … and another contains details of the authors of the research output. • 30,000 publications covering 8 years. • 25,000 distinct authors. • The information is stored in an Oracle database for use with a client application.

Monash Directory Service • Contains an entry for each current student or member of staff of the University. • Automatically updated from a number of sources such as the payroll system or the internal telephone directory. • Staff members have the ability to enter additional information into their entry such as: • Research interests • Professional associations • Biography • Photograph (as a JPEG) • A standard LDAP service.

Public Monash website • Farm of Linux boxes running Apache web servers. • Perl CGI is one of many technologies available.

my.monash portal • An integrated view of the University for both staff members and students. • Uses HTML::Mason, a dynamic web site authoring system written in Perl.

The problem so far… • Two backend systems: • Research Master (Oracle database) • Monash Directory Service (LDAP service) • Two frontend environments: • my.monash portal (Perl through HTML::Mason) • Public website (Perl CGI)

The problem so far… • Some kind of glue is required between these four systems:

And the answer was… • A module or set of modules. • Written in Perl.

But how? • The preliminary analysis showed that an author: • Has a variety of details. • Relates to one or more publications. • While a publication: • Has a variety of details. • Relates to one or more authors.

But how? • This data can be represented by a simple hierarchy:
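As a rough illustration of such a hierarchy, the many-to-many relationship between authors and publications might be modelled along the following lines. The package names Sketch::Author and Sketch::Publication and their internals are assumptions made for this sketch and are not taken from the Monash::ResearchDirectory source; only the method names mirror those used elsewhere in this talk.

```perl
use strict;
use warnings;

# Hypothetical author class: a name plus a list of linked publications.
package Sketch::Author;

sub new {
    my ($class, %args) = @_;
    return bless { name => $args{name}, publications => [] }, $class;
}
sub name         { return $_[0]{name} }
sub publications { return @{ $_[0]{publications} } }

sub add_publication {
    my ($self, $publication) = @_;
    push @{ $self->{publications} }, $publication;
    return $self;
}

# Hypothetical publication class: a title plus a list of linked authors.
package Sketch::Publication;

sub new {
    my ($class, %args) = @_;
    return bless { title => $args{title}, authors => [] }, $class;
}
sub title   { return $_[0]{title} }
sub authors { return @{ $_[0]{authors} } }

sub add_author {
    my ($self, $author) = @_;
    push @{ $self->{authors} }, $author;
    return $self;
}

package main;

# Link one author and one publication in both directions.
my $author = Sketch::Author->new(name => 'John Smith');
my $paper  = Sketch::Publication->new(title => 'An example title');
$author->add_publication($paper);
$paper->add_author($author);

foreach my $publication ($author->publications()) {
    print $author->name(), ": ", $publication->title(), "\n";
}
```

Each object holds references to the objects on the other side of the relationship, so results can be walked from either an author or a publication.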

But how? • Encapsulating the business logic completely within the classes means that the usage code is simply:

    my $research = Monash::ResearchDirectory->new( ... );
    if ($research->search('name' => 'john smith')) {
        foreach my $author ($research->authors()) {
            print $author->name(), "\n";
            foreach my $publication ($author->publications()) {
                print $publication->title(), "\n";
            }
        }
    }

Publication data issues • The data contained within the Monash Directory Service is clearly defined. • However the data stored in Research Master for a publication can vary from category to category • … and even from year to year.

Publication data issues

Publication data issues • A solution was to retrieve the field labels from the database and then generalise the access methods on the publication class:

    foreach my $field ($publication->fields()) {
        my ($label, $value) = $publication->field($field);
        if ($value) {
            print $label, "\t", $value, "\n";
        }
    }
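One way the generalised accessors could be implemented is sketched below. This is a guess at the shape of the idea, not the actual class: the package name Sketch::FlexiblePublication, the column names col1 to col3 and the sample labels are all invented, and the labels are hard-coded here where the real system retrieves them from the database.

```perl
use strict;
use warnings;

package Sketch::FlexiblePublication;

# labels: hash ref mapping a column name to its display label.
# values: hash ref mapping the same column names to this record's values.
sub new {
    my ($class, %args) = @_;
    return bless { labels => $args{labels}, values => $args{values} }, $class;
}

# Column names, sorted so that the output order is stable.
sub fields { return sort keys %{ $_[0]{labels} } }

# Returns the (label, value) pair for one column.
sub field {
    my ($self, $field) = @_;
    return ($self->{labels}{$field}, $self->{values}{$field});
}

package main;

# Invented sample record; col3 has no value and should be skipped.
my $publication = Sketch::FlexiblePublication->new(
    labels => { col1 => 'Title', col2 => 'Journal', col3 => 'Volume' },
    values => { col1 => 'An example title', col2 => 'An example journal', col3 => undef },
);

my @lines;
foreach my $field ($publication->fields()) {
    my ($label, $value) = $publication->field($field);
    next unless defined $value && length $value;
    push @lines, "$label\t$value";
}
print "$_\n" for @lines;
```

Because the calling code never names a specific field, a category or year with a different set of labels needs no change to the display code.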

Internals • As already stated the act of encapsulating as much business logic as possible in the classes means that the CGI script and HTML::Mason component aspects become trivial. • At first it appeared to be the opposite case for the internals of the classes • … however it fortunately did not become as complicated as feared.

Publication title search • Walkthrough of some of the interesting parts of the search process when the following call is made:

    $research->search('name' => 'john smith');

Querying Research Master • Simplified by being able to query the backend Oracle database directly. • A compromise between performance and maintenance resulted in a single SQL query. • Unfortunately information is now duplicated in the results …

Querying Research Master • … which can be selectively ignored during processing:

    while (my $row = $sth->fetchrow_hashref('NAME_lc')) {
        my $author = $self->_find_or_create_author($row);
        my $publication = $self->_find_or_create_publication($row);
        $author->add_publication($publication);
        $publication->add_author($author);
    }
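The find-or-create step can be illustrated without a database. In the sketch below the rows, the column names (author_id, author_name, pub_id, title) and the use of plain hashes in place of objects are all assumptions made for illustration; the point is only that a lookup keyed on an identifier lets the duplicated columns in the joined result be ignored.

```perl
use strict;
use warnings;

# Rows as a single joined query might return them: the publication
# columns repeat once per author. All names and values are invented.
my @rows = (
    { author_id => 1, author_name => 'A. Author', pub_id => 10, title => 'First title'  },
    { author_id => 2, author_name => 'B. Writer', pub_id => 10, title => 'First title'  },
    { author_id => 1, author_name => 'A. Author', pub_id => 11, title => 'Second title' },
);

my (%authors, %publications);

foreach my $row (@rows) {
    # ||= creates the record only the first time an identifier is seen;
    # later rows with the same identifier reuse the existing record.
    my $author = $authors{ $row->{author_id} }
        ||= { name => $row->{author_name}, publications => [] };
    my $publication = $publications{ $row->{pub_id} }
        ||= { title => $row->{title}, authors => [] };

    # Link the two records in both directions.
    push @{ $author->{publications} }, $publication;
    push @{ $publication->{authors} }, $author;
}

printf "%d authors, %d publications\n",
    scalar(keys %authors), scalar(keys %publications);
```

Three rows collapse to two authors and two publications, with each publication still aware of all of its authors.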

Querying the Monash Directory Service • A filter is constructed from the employee numbers obtained by querying Research Master, and is then used to query the Monash Directory Service using Net::LDAP:

    my @numbers = map { $_->employeenumber() || () } $self->authors();
    my $ldap_filter = q{(|}
                    . join(q{}, map { qq{(employeenumber=$_)} } @numbers)
                    . q{)};
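The filter construction can be tried in isolation. The sketch below is not the module's code: the helper names build_employee_filter and escape_value are invented, the employee numbers are made up, and the escaping step is an addition that the original slide does not show.

```perl
use strict;
use warnings;

# Escape the characters that are special inside an LDAP filter value
# (backslash, asterisk, parentheses and NUL) as a backslash plus two hex digits.
sub escape_value {
    my ($value) = @_;
    $value =~ s/([\\*()\0])/sprintf('\\%02x', ord $1)/ge;
    return $value;
}

# Build an "or" filter matching any of the given employee numbers.
# Returns undef for an empty list, since an empty "(|)" is of no use.
sub build_employee_filter {
    my @numbers = @_;
    return unless @numbers;
    my @clauses = map { '(employeenumber=' . escape_value($_) . ')' } @numbers;
    return '(|' . join('', @clauses) . ')';
}

my $filter = build_employee_filter(qw(1001 1002 1003));
print "$filter\n";
# prints: (|(employeenumber=1001)(employeenumber=1002)(employeenumber=1003))
```

Note the parentheses around the arguments to join: without them the trailing concatenation binds to the list being joined rather than to the result. Net::LDAP::Util also offers escape_filter_value for the escaping step.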

Correlating results • Results from the Monash Directory Service are then attached to the appropriate author object:

    foreach my $author ($self->authors()) {
        my $entry = $self->_get_ldap_entry($author->employeenumber());
        $author->set_ldap_entry($entry) if $entry;
    }

Correlating results • The publications which do not have at least one current staff member of the University as an author are now removed from the results:

    foreach my $publication ($self->publications()) {
        unless (grep { $_->is_monash() } $publication->authors()) {
            $self->destroy_publication($publication);
        }
    }

Correlating results • Finally all the authors without any publications are removed from the results:

    foreach my $author ($self->authors()) {
        unless ($author->publications()) {
            $self->remove_author($author);
        }
    }
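The two pruning passes can be seen working together on a small stand-in data set. Everything in this sketch is invented for illustration: plain hashes replace the author and publication objects, and the is_monash flag stands in for whether a directory entry was found for the author.

```perl
use strict;
use warnings;

# Stand-in data: a1 is a current staff member, a2 and a3 are not.
my %authors = (
    a1 => { name => 'Current Staff',   is_monash => 1 },
    a2 => { name => 'Former Staff',    is_monash => 0 },
    a3 => { name => 'External Person', is_monash => 0 },
);
my %publications = (
    p1 => { title => 'Kept title',    authors => [qw(a1 a2)] },
    p2 => { title => 'Dropped title', authors => [qw(a3)]    },
);

# Pass 1: drop publications with no current staff member among the authors.
foreach my $id (keys %publications) {
    my @current = grep { $authors{$_}{is_monash} } @{ $publications{$id}{authors} };
    delete $publications{$id} unless @current;
}

# Pass 2: drop authors who no longer appear on any remaining publication.
my %still_listed = map { ($_ => 1) } map { @{ $_->{authors} } } values %publications;
delete @authors{ grep { !$still_listed{$_} } keys %authors };

print join(', ', sort keys %authors), "\n";
# prints: a1, a2
```

The order of the passes matters: a2 survives because a co-authored publication with a current staff member was kept, while a3 is removed only after its sole publication has been dropped.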

Results • At this point the research directory object holds enough author and publication objects for the search results to be displayed:

    $research->search('name' => 'john smith');
    foreach my $author ($research->authors()) {
        print $author->name(), "\n";
        foreach my $publication ($author->publications()) {
            print $publication->title(), "\n";
        }
    }

Limitations • At no point do the author or publication objects in existence represent the entire Research Directory. • This means that a fresh search is required for each of the various pages in the interface. • Not much of an issue due to the stateless nature of the web.

Complicated scientific formula in titles • Plain text: • 2] • Rich text formatted: • {\rtf1\ansi\deff0{\fonttbl{\f0\fswiss Arial;}{\f1\fnil\fcharset2 Symbol;}} \viewkind4\uc1\pard\lang1033\f0\fs24 2] \fs18 Unprecedented \f1\fs24 m-h\up5\fs14 2:\up0\fs24 h\up5\fs14 2\up0\f0\fs18 - pyrazolate coordination in [\{Yb(\f1\fs24 h\up5\f0\fs14 2\up0\fs18 - \f1\fs24\'a6\f0\fs18 Bu\dn5\fs14 2\up0\fs18 pz)(\f1\fs24 m\f0\fs18 -\f1\fs24 h\up5\f0\fs14 2\up0\fs18 :\f1\fs24 h\up5\f0\fs14 2\up0\fs18 -\f1\fs24\'a6\f0\fs18 Bu\dn5\fs14 2\up0\fs18 pz)(thf)\}\dn5\fs14 2\up0\fs18 ] \par } • Correctly rendered: • 2] Unprecedented μ−η2:η2- pyrazolate coordination in [{Yb(η2- ƒBu2pz)(μ-η2:η2-ƒBu2pz)(thf)}2]

Complicated scientific formula in titles • Unfortunately this cannot be reliably rendered using HTML. • The Perl module RTF::HTML::Converter is able to convert the RTF above to: • 2] Unprecedented m-h2:h2- pyrazolate coordination in [{Yb(h2 - ¦Bu2pz)(m-h2:h2 -¦Bu2pz)(thf)}2] • While not perfect it is a significant improvement and was deemed satisfactory.

Conclusion • A practical example of how Perl can be used to draw information from two sources, one a commercial application, and present the information in two similar but disparate environments. • All by using two widely used modules: • DBI (and DBD::Oracle) • Net::LDAP • And a third publicly available module: • RTF::HTML::Converter

Thank you • Any questions? • The public version of the Monash Research Directory is available at: • http://monash.edu/research/directory/
