Overview of Component Search System SPARS-J


Presentation Transcript


  1. Overview of Component Search System SPARS-J • Tetsuo Yamamoto*, Makoto Matsushita**, Katsuro Inoue** • *Japan Science and Technology Agency • **Osaka University

  2. Outline • Motivation and research aim • SPARS-J • Outline • System architecture • Ranking method • Each part • Analysis part • Retrieval part • User Interface • Experiment • Conclusion and Future work

  3. Motivation • Reuse of software components • is a technique for developing new software by using components developed in the past. • Examples of reusable components: source code, documents, etc. • improves productivity and quality, and reduces development cost as a result. • However, component reuse is not exploited effectively. • Developers do not know that suitable components exist. • Although many components are available, they are not organized. • To take advantage of reuse, components must be managed and suitable components must be easy to find

  4. Research aim • We have built a system with the following functions • Collects software components eagerly, without preserving their inherent structures • Manages component information automatically • Provides components suited to the user's request • Targets • Intranet • Closed software development inside a company • Internet • Large open source software development web sites • SourceForge, Jakarta Project, etc.

  5. Outline • Motivation and research aim • SPARS-J • Outline • System architecture • Ranking method • Each part • Analysis part • Retrieval part • User Interface • Experiment • Conclusion and Future work

  6. SPARS-J (Software Product Archive, analysis and Retrieval System for Java) • A system for archiving, analyzing and retrieving Java software products • Many components are analyzed automatically. • A search engine is built on the analysis information. • Component: the source code of a class or interface • Features • Keyword search • Two ranking methods • Frequency of use of a word • Use relations • Analyzed information • Components using/used by a component • Package hierarchy

  7. Structure of SPARS-J • [Architecture diagram: the user sends a query and receives results; a library of Java source files is the input; hit components, analyzed information and component information flow between the parts and the database] • User interface part • delivers the query to the component retrieval part • shows search results • Component analysis part • extracts components from a file • stores analyzed information in the DB • clusters and ranks components using the DB • Component retrieval part • searches the DB for components matching the query • ranks components based on the frequency of use of a keyword • aggregates the two rankings • Database • stores analyzed information and components

  8. Ranking search results • Ranking method • 1. Components suited to the user's request • Ranking based on the frequency of use of a word: Keyword Rank (KR) • 2. Components that are used most • Ranking based on component use relations: Component Rank (CR) • A component that ranks high on both 1 and 2 is ranked high overall • Search results are shown in the order obtained by aggregating the two ranks

  9. Outline • Motivation and research aim • SPARS-J • Outline • System architecture • Ranking method • Each part • Analysis part • Retrieval part • User Interface • Experiment • Conclusion and Future work

  10. Component analysis part • Extracts a component and its information from a Java source file • The process • Extract a component • Index the component • Extract use relations • Cluster similar components • Rank components based on use relations (CR method)

  11. Extract and index a component • Extracting a component • Find the class or interface block in a Java source file • Record its location in the file (start line number, end line number) • Indexing • Extract index keys from the component • Index key: a word and its kind • Reserved words are not extracted • Count the frequency of use of each word • Example component: public final class Sort { /*quicksort*/ private static void quicksort(…) { int pivot; : quicksort(…); quicksort(…); } } • [Table: index key, frequency]
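The indexing step on this slide can be sketched in a few lines of Python. This is only an illustration, not the SPARS-J implementation: the reserved-word set is a small hypothetical subset, the tokenizer is a plain regular expression, and the "kind" of each word (class name, method name, comment, and so on) is ignored for brevity.

```python
import re
from collections import Counter

# Assumption: a small illustrative subset of Java reserved words.
JAVA_RESERVED = {
    "public", "private", "protected", "static", "final", "class",
    "interface", "void", "int", "return", "new", "extends", "implements",
}

def index_component(source: str) -> Counter:
    """Count identifier-like words in a component, skipping reserved words."""
    words = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    return Counter(w for w in words if w not in JAVA_RESERVED)

# The example component from the slide; elided parts are kept as "...".
example = """
public final class Sort {
    /*quicksort*/
    private static void quicksort(...) {
        int pivot;
        quicksort(...);
        quicksort(...);
    }
}
"""

freq = index_component(example)
for word, count in freq.most_common():
    print(word, count)
```

A fuller index would store each word together with its kind, since the Keyword Rank later weights kinds differently.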

  12. Extract use relations • Extract use relations among components using semantic analysis • Build a component graph from the use relations • Node: component • Edge: use relation • Example: public class Test extends Data { : public static void main(…) { : Sort.quicksort(super.array); : } } • Kinds of use relation in the example: inheritance (Test to Data), field access (Test to Data), method call (Test to Sort) • [Figure: component graph with nodes Data, Sort and Test]

  13. Similar component • A similar component is a copied component or a component with minor modifications • Similar components are merged into a single component • The merged component has all the use relations that the components had before merging • [Figure: component graph with nodes A to G, and the clustered component graph with merged nodes AD, BF, CG and node E]
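A minimal sketch of the merge step, assuming the component graph is stored as a mapping from each component to the set of components it uses. The edges below are hypothetical (the figure's edges are not recoverable from the transcript), and dropping self-loops inside a cluster is an assumption of this sketch.

```python
def merge_components(uses, cluster_of):
    """Build the clustered graph: each cluster inherits the use relations
    of all its members. Relations inside one cluster are dropped (assumption)."""
    merged = {}
    for src, targets in uses.items():
        src_cluster = cluster_of[src]
        merged.setdefault(src_cluster, set())
        for dst in targets:
            dst_cluster = cluster_of[dst]
            merged.setdefault(dst_cluster, set())
            if dst_cluster != src_cluster:
                merged[src_cluster].add(dst_cluster)
    return merged

# Hypothetical use relations among components A to G.
uses = {
    "A": {"B"}, "D": {"F", "E"},
    "B": {"C"}, "F": {"G"},
    "C": set(), "G": set(), "E": set(),
}
# Cluster membership as in the slide: A+D, B+F, C+G are similar; E stays alone.
cluster_of = {"A": "AD", "D": "AD", "B": "BF", "F": "BF",
              "C": "CG", "G": "CG", "E": "E"}

clustered = merge_components(uses, cluster_of)
print(clustered)
```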

  14. Clustering components • Components are merged based on characteristic metrics • Components are compared by the difference ratio of each metric • Metrics • Complexity • Number of methods, cyclomatic complexity, etc. • Represents structural characteristics • Token composition • Number of appearances of each token • Represents surface characteristics
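The slide does not give the exact definition of the difference ratio or the threshold used for merging. The sketch below assumes a relative difference (absolute difference divided by the larger value) and a hypothetical threshold of 0.1, purely to make the idea concrete.

```python
def difference_ratio(x: float, y: float) -> float:
    """Relative difference between two metric values (assumed definition)."""
    if x == y:
        return 0.0
    return abs(x - y) / max(abs(x), abs(y))

def is_similar(metrics_a: dict, metrics_b: dict, threshold: float = 0.1) -> bool:
    """Two components count as similar if every metric differs by at most
    `threshold`. Both dicts are assumed to hold the same metric names."""
    return all(
        difference_ratio(metrics_a[name], metrics_b[name]) <= threshold
        for name in metrics_a
    )

# Hypothetical metric values for three components.
original = {"methods": 10, "cyclomatic": 20, "tokens": 200}
modified_copy = {"methods": 10, "cyclomatic": 21, "tokens": 205}
unrelated = {"methods": 3, "cyclomatic": 5, "tokens": 60}

print(is_similar(original, modified_copy))
print(is_similar(original, unrelated))
```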

  15. Ranking based on use relation • Component Rank (CR) • A reusable component has many use relations • There are many examples of its use • General-purpose component • Sophisticated component • Use relations are measured quantitatively, and components are ranked accordingly • A component used by many components is important • A component used by an important component is also important • Reference: Katsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, Shinji Kusumoto: "Component Rank: Relative Significance Rank for Software Component Search", ICSE, Portland, OR, May 6, 2003.

  16. Propagating weights • Ad-hoc weights are assigned to each node • [Figure: graph with nodes A, B, C; values shown: 0.34, 0.33, 0.17, 0.17, 0.33, 0.33, 0.33]

  17. Propagating weights • The node weights are redefined by the incoming edge weights • [Figure: graph with nodes A, B, C; values shown: 0.33, 0.17, 0.175, 0.175, 0.5, 0.17, 0.5]

  18. Propagating weights • We get new node weights • [Figure: graph with nodes A, B, C; values shown: 0.25, 0.25, 0.345, 0.175, 0.5, 0.175, 0.345]

  19. Propagating weights • We get a stable weight assignment • Next-step weights are the same as the previous ones • Component Rank: the order of nodes sorted by weight • [Figure: graph with nodes A, B, C; values shown: 0.4, 0.2, 0.2, 0.2, 0.4, 0.2, 0.4]
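The propagation shown on slides 16 to 19 can be reproduced with a short Python sketch. The edge set A→B, A→C, B→C, C→A is an assumption: the figure itself is not in the transcript, but this graph is consistent with the stable weights 0.4, 0.2, 0.4 on slide 19. Splitting a node's weight evenly over its outgoing edges is also an assumption of this sketch, and every node is assumed to have at least one outgoing edge.

```python
def propagate(weights, edges):
    """One step: split each node weight evenly over its outgoing edges,
    then redefine each node weight as the sum of its incoming edge weights."""
    new_weights = {node: 0.0 for node in weights}
    for src, targets in edges.items():
        share = weights[src] / len(targets)
        for dst in targets:
            new_weights[dst] += share
    return new_weights

def stable_weights(weights, edges, tol=1e-9, max_steps=1000):
    """Repeat propagation until the weights stop changing."""
    for _ in range(max_steps):
        nxt = propagate(weights, edges)
        if max(abs(nxt[n] - weights[n]) for n in weights) < tol:
            return nxt
        weights = nxt
    return weights

# Assumed edges; the start weights are those of slide 16.
EDGES = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
START = {"A": 0.34, "B": 0.33, "C": 0.33}

first_step = propagate(START, EDGES)
stable = stable_weights(START, EDGES)
ranking = sorted(stable, key=stable.get, reverse=True)
print(first_step)
print(stable)
print(ranking)
```

Under these assumptions A and C end up with equal weight, so their relative order in the ranking is arbitrary, while B comes last.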

  20. Outline • Motivation and research aim • SPARS-J • Outline • System architecture • Ranking method • Each part • Analysis part • Retrieval part • User Interface • Experiment • Conclusion and Future work

  21. Component retrieval part • Searches the database for components and ranks them • The process • Search components • Rank components by suitability to the user's request (KR) • Aggregate the two ranks (CR and KR)

  22. Search components • Search query • Words entered by the user • Search conditions: the kind of index word, package name • Components that match the query are retrieved from the database

  23. Ranking suited to a user request • Keyword Rank (KR) • Components that contain the words given by the user are retrieved • Components are ranked by a value calculated from index word weights • The index word weight is high for • a word used frequently in the component • a word contained only in particular components • a word that represents the component's function, such as a class name • Components are sorted by the sum of the weights of all given words • TF-IDF weighting, a technique used by full-text search engines

  24. Calculation of KR value • Calculate the weight Wct of word t in component c • TFi: the frequency with which word t occurs as kind i in component c • IDF: the total number of components / the number of components containing word t • kwi: the weight of kind i • The KR value is the sum of Wct over all given words
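The formula for Wct is not present in the transcript, so the sketch below assumes the simplest combination of the listed terms: Wct = (sum over kinds i of kwi × TFi) × IDF, with IDF taken exactly as defined on the slide. The kind names, kind weights and counts are hypothetical.

```python
def word_weight(tf_by_kind, kind_weight, total_components, components_with_word):
    """Assumed form: W_ct = (sum_i kw_i * TF_i) * IDF."""
    idf = total_components / components_with_word
    weighted_tf = sum(kind_weight[kind] * tf for kind, tf in tf_by_kind.items())
    return weighted_tf * idf

def kr_value(query_words, component_index, kind_weight, total_components, doc_freq):
    """KR value of one component: sum of W_ct over the given query words."""
    total = 0.0
    for word in query_words:
        if word in component_index and doc_freq.get(word, 0) > 0:
            total += word_weight(component_index[word], kind_weight,
                                 total_components, doc_freq[word])
    return total

# Hypothetical kind weights: class names count more than comments.
KIND_WEIGHT = {"class_name": 3.0, "method_name": 2.0, "comment": 1.0}
# Hypothetical index of one component: word -> {kind: frequency}.
component_index = {
    "sort": {"class_name": 1, "comment": 2},
    "quick": {"method_name": 3},
}
# Hypothetical collection statistics: number of components containing each word.
doc_freq = {"sort": 10, "quick": 4}

kr = kr_value(["sort", "quick"], component_index, KIND_WEIGHT, 100, doc_freq)
print(kr)
```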

  25. Aggregate two ranks • Aggregate the two ranks KR and CR • Aggregation method • Borda count method, known as a voting system • Used for single-seat or multiple-seat elections • This form of voting is widely used in determining awards • SPARS-J • Ranks components by both KR and CR • Using KR and CR, a highly ranked component is suited to the user's request, reusable and sophisticated

  26. Borda Count method • There are 10 voters and 5 candidates (A to E) • Each voter ranks the candidates • 1 point for last place, 2 points for second from last place, ..., and N points for first place • 1st = 5 points, 2nd = 4 points, ... • A: 15 + 3 + 6 + 4 = 28 points • B: 38 points • C: 38 points • D: 22 points • E: 26 points • [Figure: aggregation of the votes]
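The point scheme above is easy to sketch in Python. In the SPARS-J setting the "voters" would be the two rankings KR and CR; the component names and rankings below are hypothetical, and how SPARS-J breaks ties is not stated in the slides.

```python
from collections import defaultdict

def borda_count(ballots):
    """Each ballot lists candidates from first to last place.
    With N candidates: N points for first place, ..., 1 point for last."""
    scores = defaultdict(int)
    for ballot in ballots:
        n = len(ballot)
        for position, candidate in enumerate(ballot):
            scores[candidate] += n - position
    return dict(scores)

# Hypothetical rankings of three components by Keyword Rank and Component Rank.
kr_ranking = ["Sort", "QuickSort", "Util"]
cr_ranking = ["Util", "Sort", "QuickSort"]

scores = borda_count([kr_ranking, cr_ranking])
aggregated = sorted(scores, key=scores.get, reverse=True)
print(scores)
print(aggregated)
```

Here Sort scores 3 + 2 = 5, Util 1 + 3 = 4 and QuickSort 2 + 1 = 3, so Sort is listed first.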

  27. Outline • Motivation and research aim • SPARS-J • Outline • System architecture • Ranking method • Each part • Analysis part • Retrieval part • User Interface • Experiment • Conclusion and Future work

  28. User interface • Receives the user's query and provides the search results through a web browser • Microsoft Internet Explorer, Mozilla, etc. • The process • Parse the query words and the search conditions • Show the results in rank order • Show the analyzed information of a component • Components used by/using the component • Metrics

  29. Analyzed information • The following information is shown for a component • Metrics • Number of methods and variables • LOC, cyclomatic complexity • Etc. (metrics measurable within the component itself) • Components used by/using the component • Shows lists of nodes reached by following use relations • Components similar to the component • Shows lists of similar components

  30. Package browsing • The naming structure of Java packages is hierarchical • A user can easily list the components in the same package as a given component

  31. Screenshot (top page)

  32. Screenshot (search results)

  33. Screenshot (source code)

  34. Screenshot (similar components)

  35. Screenshot (using the component)

  36. Screenshot (used by the component)

  37. Screenshot (package browsing)

  38. Outline • Motivation and research aim • SPARS-J • Outline • System architecture • Ranking method • Each part • Analysis part • Retrieval part • User Interface • Experiment • Conclusion and Future work

  39. Experiment (1/2) • Comparison with Google • Registered about 130,000 components collected from the Internet • Query words: 'calculator applet' and 'chat server client' • Calculated the relevance ratio of the top 10 results • Relevance: the component is reusable source code • Google is a web search engine, so • the term 'java source' was added to the query words • one link was followed from each result web page

  40. Experiment (2/2) • Example 1: • "calculator applet" • SPARS-J • 9 hits • 7 suitable components • Example 2: • "chat server client" • SPARS-J • 69 hits • 57 suitable components • With SPARS-J, suitable components are ranked high • [Figures: Example 1, Example 2]

  41. Conclusion and Future work • We developed the component search engine SPARS-J • Using SPARS-J, frequently used components can be retrieved easily. • Future work • Morphological analysis of index keywords • Collaborative filtering • Investigate the best ranking method • The values of the weights • Aggregation of ranks • Evaluation of SPARS-J • Usability

  42. End

  43. Component graph • Node: component • Edge: use relation • [Figure: components A to I belonging to System X and System Y, connected by use relations]

  44. Weight of nodes • Sum of all node weights = 1 ... (1) • The weight of a node represents the significance of the node • [Figure: components A to I in System X and System Y; node weights shown: 0.1, 0.1, 0.1, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05]

  45. Weights of edges • Node weight is distributed to each outgoing edge • Edge weights are collected at the destination node • d: distribution ratio • Sum of all outgoing edge weights = origin node weight ... (2) • Sum of all incoming edge weights = destination node weight ... (3) • [Figure: nodes A and B; four edges with d = 1/4 and weight 0.05 each; other values shown: 0.2, 0.15, 0.05, 0.4, 0.2]

  46. Definition of weights • Under constraints (1) to (3), we have the simultaneous equations W = D^t W • W: node weight vector • D^t: transposed matrix of distribution ratios • These simultaneous equations can be solved by propagating node weights through the edges of the graph

  47. Pseudo use relation • Weight computation does not always converge • Add a pseudo edge from a node to another node if there is no 'real' edge between them • Distribution ratios: pseudo edges << real edges • [Figure: graph with nodes A, B, C]
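A rough sketch of how pseudo edges can be folded into the distribution ratios. The slides only say that pseudo-edge ratios are much smaller than real-edge ratios; the concrete scheme here (a hypothetical fraction `pseudo_share` of each node's weight spread evenly over its pseudo edges) and the example graph are assumptions. The graph is assumed to have at least two nodes.

```python
def distribution_ratios(nodes, edges, pseudo_share=0.01):
    """Ratio d for every edge leaving each node. Real edges share
    1 - pseudo_share; pseudo edges (to nodes without a real edge) share
    pseudo_share. A node with no real edges spreads everything over pseudo edges."""
    ratios = {}
    for src in nodes:
        real = [d for d in edges.get(src, []) if d != src]
        pseudo = [d for d in nodes if d != src and d not in real]
        if not pseudo:
            real_total = 1.0
        elif real:
            real_total = 1.0 - pseudo_share
        else:
            real_total = 0.0
        row = {d: real_total / len(real) for d in real}
        row.update({d: (1.0 - real_total) / len(pseudo) for d in pseudo})
        ratios[src] = row
    return ratios

def step(weights, ratios):
    """Distribute node weights along edges and collect them at destinations."""
    nxt = {n: 0.0 for n in weights}
    for src, row in ratios.items():
        for dst, ratio in row.items():
            nxt[dst] += weights[src] * ratio
    return nxt

def component_rank(nodes, edges, pseudo_share=0.01, tol=1e-10, max_steps=10000):
    ratios = distribution_ratios(nodes, edges, pseudo_share)
    weights = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_steps):
        nxt = step(weights, ratios)
        if max(abs(nxt[n] - weights[n]) for n in nodes) < tol:
            return nxt
        weights = nxt
    return weights

# Hypothetical graph: C uses nothing, so without pseudo edges its weight would be lost.
NODES = ["A", "B", "C"]
REAL_EDGES = {"A": ["B"], "B": ["C"]}

ratios = distribution_ratios(NODES, REAL_EDGES)
cr = component_rank(NODES, REAL_EDGES)
print(ratios)
print(cr)
```

With the pseudo edges in place every node can reach every other node, so the total weight stays at 1 and the iteration settles on a single weight assignment for this example.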

  48. Markov model • The Component Rank model can be regarded as a Markov chain of the user's focus • The user's focus moves from one component to another along a use relation at fixed time intervals • The node weight represents the probability that the user's focus is on that node in the infinite future • [Figure: values shown: 0.02, 0.01, 0.01, 0.05, 0.03, 0.001, 0.1]

  49. Related Works • Markov models of documentation traversal • Influence Weight: impact factor of journal publications through incoming references • PageRank: weight of HTML pages on the Internet through incoming web links • Explicit use relations • No clustering (clustering is important for software products) • Measurement of the reusability of components or interfaces • Uses various characteristic metrics • Indirect indicator of reusability • Our approach directly reflects the usage of components

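  50. Calculation of CR values • Iterative computation based on the clustered component graph • Procedure • 1. Assign arbitrary weights to the nodes • The sum of the weights is 1 • 2. Compute the weight of each directed edge • Each node weight is distributed over its outgoing edges • 3. Recompute the weight of each node • The sum of the weights of the incoming edges is redefined as the node weight • 4. Repeat steps 2 and 3 until the node weights converge • The converged weight of a node is the CR value of the component cluster corresponding to that node • The evaluation value of a component is the CR value of the cluster it belongs to • [Figure: propagation steps over clusters C1, C2, C3, starting from weights of about 0.333 each, with edge labels v1×50%, v1×50%, v2×100%, v3×100%, and converging to C1 0.400, C2 0.200, C3 0.400]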
