
Software Engineering Foundations

Searching Sorting Storing. Software Engineering Foundations. Monica Farrow EM G30 email : monica@macs.hw.ac.uk Material available on Vision and my website. A first sight of Big-O. Big-O notation refers to the Order of time an operation takes


Presentation Transcript


  1. Searching, Sorting, Storing
     • Software Engineering Foundations
     • Monica Farrow, EM G30, email: monica@macs.hw.ac.uk
     • Material available on Vision and my website
     • F21SF1 Search, sort, store

  2. A first sight of Big-O
     • Big-O notation refers to the order of time an operation takes.
     • We write a big O and then a term in brackets.
     • The term is the highest-order term in the expression for the time taken, with any constant factor dropped.
     • For example, we might find that a certain operation, carried out on a collection of n objects, takes 3n^2 + 4n + 1024 units of time. We would say that the operation works in O(n^2) time.
     • Big-O refers to the growth in time of an operation with respect to the number (n) of objects involved.
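
To see why the lower-order terms can be dropped, here is a short worked bound on the slide's own expression. The constant 1031 is just one convenient choice, not something stated in the lecture.

```latex
\[
3n^2 + 4n + 1024 \;\le\; 3n^2 + 4n^2 + 1024n^2 \;=\; 1031\,n^2
\qquad \text{for all } n \ge 1 ,
\]
\[
\text{so } 3n^2 + 4n + 1024 \in O(n^2).
\]
```

For large n the n^2 term dominates: the ratio of the whole expression to n^2 approaches 3.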

  3. Big O notation (figure not reproduced in the transcript)

  4. Some thoughts on Big-O
     • An operation in constant time is not affected by the size of the collection. This is sometimes known as O(c). If really only one thing has to be done, we say O(1).
     • O(n) operations are linear. The time taken increases in direct proportion to the number of objects. They’re OK when n is small, but can be slow when it’s large.
     • Note that constant time is not necessarily less than variable time: a constant-time operation with a large fixed cost can take longer than a linear one when n is small.
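
The last point can be illustrated with two invented cost models. This is only a sketch: the class name `CostModels` and the figures 1024 and 3 are assumptions chosen for illustration, not measurements.

```java
// Hypothetical cost models in arbitrary units of time (the numbers are invented).
class CostModels {
    // O(1): the cost never changes, but the fixed amount is large.
    static long constantCost(long n) {
        return 1024;
    }

    // O(n): the cost grows with n, but each item is cheap.
    static long linearCost(long n) {
        return 3 * n;
    }

    public static void main(String[] args) {
        long[] sizes = {10, 100, 1_000, 10_000};
        for (long n : sizes) {
            System.out.println("n=" + n
                    + "  constant=" + constantCost(n)
                    + "  linear=" + linearCost(n));
        }
    }
}
```

With these particular numbers the linear operation is cheaper up to n = 341 and more expensive from n = 342 onwards.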

  5. How computers spend their time
     • Computers spend a lot of time searching and sorting. Many (for example, the half million Linux servers that Google uses) do practically nothing else.
     • One could be tempted to think that, as hardware these days has such high capacity and works so quickly, it doesn’t much matter which algorithms we use.
     • On the contrary, data are growing even faster: algorithms become more, not less, important.

  6. Fortunes made
     • Google “searching and sorting” and you’re given a million results in a tenth of a second.
     • Google is worth billions because it’s very good at searching and sorting.
     • Google is based on the idea of PageRank (developed at Stanford by the founders of the company): articles on the web are assessed in terms of the number of links to them from other pages.

  7. Searching
     • Consider the ListOfPeople used in L12: Person objects held in an ArrayList.
     • Suppose we wish to find the Person object for the name “Jane Grey”.
     • We would write code for a linear search.
     • A linear search method looks for a key, which arrives by parameter.
     • By convention, the method will return the object corresponding to the key or, if unsuccessful, the value ‘null’.

  8. Searching: linear search
     • We have to look at each element in the array in turn, comparing the name. We stop when we find the right object.
     • If we’re looking for a name that isn’t in the list, and if the Person objects are not stored in name order, we have to go right to the end before we know whether it is there or not.
     • If they were in alphabetical name order, we could stop searching once we’ve gone past the place where it should be.
     • How easy is it to keep the list in alphabetical order?
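
The early-stop idea for a list kept in name order might look like the sketch below. The `Person` class here is a minimal stand-in (the lecture's real class is not shown in the transcript), and `SortedLinearSearch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Minimal stand-in for the lecture's Person class (assumed shape).
class Person {
    private final String name;

    Person(String name) {
        this.name = name;
    }

    String getName() {
        return name;
    }
}

class SortedLinearSearch {
    // Assumes people is sorted by name in ascending String.compareTo order.
    static Person search(List<Person> people, String key) {
        for (Person p : people) {
            int cmp = p.getName().compareTo(key);
            if (cmp == 0) {
                return p;      // found the key
            }
            if (cmp > 0) {
                return null;   // gone past where the key would be: stop early
            }
        }
        return null;           // reached the end without finding the key
    }

    public static void main(String[] args) {
        List<Person> people = Arrays.asList(
                new Person("Anne Boleyn"),
                new Person("Jane Grey"),
                new Person("Mary Tudor"));
        Person found = search(people, "Jane Grey");
        System.out.println(found == null ? "not found" : found.getName());
        System.out.println(search(people, "Henry Tudor") == null ? "not found" : "found");
    }
}
```

The search is still O(n) in the worst case; the early exit only helps unsuccessful searches, which now stop as soon as a larger name is seen.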

  9. Maintaining a sorted list
     • Easy if the objects are created in the right order.
     • Otherwise, either create the list and then sort it, or add each element into its correct position. Adding in position involves, each time, moving some elements up to make space.
     • Issues arising:
       • Once it is sorted, what is the best way to search the list?
       • What is the best way to sort the list?
       • Is a list actually the best way to store this data?
     • Answers to these questions in the next few lectures.
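
The "add each element into correct position" option can be sketched as follows. For brevity this assumes a list of plain name strings rather than Person objects, and `SortedInsert` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class SortedInsert {
    // Inserts name so that names stays in ascending order.
    // The slot is found by a linear scan; ArrayList.add(index, element)
    // then shifts every later element up by one to make space.
    static void insertSorted(List<String> names, String name) {
        int i = 0;
        while (i < names.size() && names.get(i).compareTo(name) <= 0) {
            i++;
        }
        names.add(i, name);
    }

    // Convenience helper: builds a sorted list by repeated insertion.
    static List<String> buildSorted(String... input) {
        List<String> names = new ArrayList<>();
        for (String name : input) {
            insertSorted(names, name);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(buildSorted("Jane Grey", "Mary Tudor", "Anne Boleyn"));
    }
}
```

Each insertion is O(n) because of the shifting, so building a list of n elements this way is O(n^2) overall.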

  10. A linear search method
      Algorithm:
        for each person in the list
          if name is the same as the parameter
            it is found, so return this person
        Here nothing matching was found, so return null
      Code:
        public Person search(String key) {
            for (Person p : ListOfPeople) {
                if (p.getName().equals(key))
                    return p;
            }
            return null;
        }

  11. Binary search
      • Linear search is O(n): fine when n is small, not so good when n is large. (Imagine, for example, an unsorted phone book.)
      • If the list were sorted, we could use a binary search.
      • In a binary search, we look for the key in the middle of the list.
      • If we get a match, the search is over.
      • If the key is greater than the thing in the middle of the list, we search the top half.
      • If the key is smaller, we search the bottom half.
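
The steps above can be sketched as an iterative method. This version assumes a sorted list of name strings and returns an index (or -1) rather than a Person object; `BinarySearchSketch` is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

class BinarySearchSketch {
    // Returns the index of key in sortedNames, or -1 if it is absent.
    // Assumes sortedNames is in ascending String.compareTo order.
    static int binarySearch(List<String> sortedNames, String key) {
        int low = 0;
        int high = sortedNames.size() - 1;
        while (low <= high) {
            int mid = low + (high - low) / 2;   // middle of the remaining range
            int cmp = key.compareTo(sortedNames.get(mid));
            if (cmp == 0) {
                return mid;                     // match: the search is over
            } else if (cmp > 0) {
                low = mid + 1;                  // key is greater: search the top half
            } else {
                high = mid - 1;                 // key is smaller: search the bottom half
            }
        }
        return -1;                              // range is empty: key is not present
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("Anne", "Jane", "Mary", "Zoe");
        System.out.println(binarySearch(names, "Mary"));
        System.out.println(binarySearch(names, "Bob"));
    }
}
```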

  12. Binary search for 123 (figure not reproduced in the transcript)

  13. Sequential and binary searches
      • Sequential search is O(n).
      • Binary search is O(log n).
      • If n is a million, for example, sequential search will take on average 500 000 comparisons, and might take 1 000 000. After one unsuccessful comparison, we still have 999 999 items left to look at.
      • Binary search will take at most 20 comparisons. After one unsuccessful comparison, we have halved the number of items left to look at.
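
The figure of 20 comes from repeated halving, which a few lines of code can confirm. This is a counting sketch only (it performs no real search), and `ComparisonCounts` is a hypothetical name.

```java
class ComparisonCounts {
    // Worst-case number of elements a binary search examines in a sorted
    // list of n items: each unsuccessful probe leaves at most half of them.
    static int maxBinaryProbes(int n) {
        int probes = 0;
        while (n > 0) {
            n /= 2;
            probes++;
        }
        return probes;
    }

    // Worst case for a sequential search: every element is examined.
    static int maxSequentialProbes(int n) {
        return n;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        System.out.println("sequential worst case: " + maxSequentialProbes(n));
        System.out.println("binary worst case:     " + maxBinaryProbes(n));
    }
}
```

Since 2^19 < 1 000 000 < 2^20, a million items can be halved at most 20 times before nothing is left.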

  14. Sorting
      • There are many sorting algorithms. Books characteristically give the matter 50 pages or so.
      • Two things matter: ease and speed.
      • Easy sorts (such as bubble sort, insertion sort, shellsort) are O(n^2). Order of a million for a thousand things. OK for a list of friends or something, but much too slow for anything major.
      • More highbrow algorithms (mergesort, quicksort) are O(n log n). Order of ten thousand for a thousand things. A bit more like it.
      • We shall look at one of each.

  15. Selection sort
      • Repeatedly find the smallest element and swap it with the element that is where it should be.
      • Each time, one more element on the LHS is correctly sorted.
        11  9 17  5 12
         5  9 17 11 12
         5  9 17 11 12   (don’t need to move)
         5  9 11 17 12
         5  9 11 12 17

  16. Selection sort - efficiency
      Algorithm:
        for each element in turn
          for each element above
            find the minimum
          swap
      • This involves 2 nested loops, so it is O(n^2).
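
The algorithm above might be written as follows for an array of ints. This is a sketch, and `SelectionSortSketch` is a hypothetical name; the sample data is the sequence from the selection sort slide.

```java
import java.util.Arrays;

class SelectionSortSketch {
    // Sorts a in place into ascending order and returns it for convenience.
    static int[] selectionSort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            // Find the index of the smallest element in a[i..end].
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) {
                    min = j;
                }
            }
            // Swap it into position i, unless it is already there.
            if (min != i) {
                int tmp = a[i];
                a[i] = a[min];
                a[min] = tmp;
            }
        }
        return a;
    }

    public static void main(String[] args) {
        int[] data = {11, 9, 17, 5, 12};
        System.out.println(Arrays.toString(selectionSort(data)));
    }
}
```

The inner loop runs n-1, n-2, ..., 1 times, a total of n(n-1)/2 comparisons, which is where the O(n^2) comes from.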

  17. Merge sort
      The basic algorithm for sorting a list looks like this:
      • Split the list into two halves.
      • Sort the left half.
      • Sort the right half.
      • Merge the two.
      Merge sort is O(n log n).
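
A minimal recursive version of these four steps, again for an array of ints. This is a sketch that copies the halves for clarity rather than sorting in place; `MergeSortSketch` is a hypothetical name.

```java
import java.util.Arrays;

class MergeSortSketch {
    // Returns the elements of a in ascending order; a itself is not modified.
    static int[] mergeSort(int[] a) {
        if (a.length <= 1) {
            return a;                                  // nothing to sort
        }
        int mid = a.length / 2;                        // split into two halves
        int[] left = mergeSort(Arrays.copyOfRange(a, 0, mid));
        int[] right = mergeSort(Arrays.copyOfRange(a, mid, a.length));
        return merge(left, right);
    }

    // Merges two sorted arrays into one sorted array.
    static int[] merge(int[] left, int[] right) {
        int[] out = new int[left.length + right.length];
        int i = 0, j = 0, k = 0;
        while (i < left.length && j < right.length) {
            out[k++] = (left[i] <= right[j]) ? left[i++] : right[j++];
        }
        while (i < left.length) {
            out[k++] = left[i++];                      // copy what is left over
        }
        while (j < right.length) {
            out[k++] = right[j++];
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(mergeSort(new int[] {11, 9, 17, 5, 12})));
    }
}
```

The list is halved about log n times and each level of merging touches all n elements, giving O(n log n).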

  18. Merge sort (figure not reproduced in the transcript)

  19. Sort comparison
      Time in milliseconds, from Big Java by Horstmann.

          n     Merge   Selection
      10000       110        3460
      20000       160       13240
      30000       220       28290
      40000       280       51520
      50000       360       82670
      60000       450      121820

  20. Speed of access
      • We are discussing collections where the most common activity will be searching; where searching will be more common than, say, adding and removing.
      • We will look at binary search trees, where searching is O(log n) in the expected case and O(n) in the worst case.
      • We will also look at hashing, whose worst case is also dreadful, O(n), but whose expected case is very fast, O(1).

  21. How We View a Tree (figure with two panels, “Nature Lovers View” and “Computer Scientists View”, not reproduced in the transcript)

  22. Drawbacks of lists
      • The problem with lists is the linear access time: O(n). This makes searching slow.
      • If searching is a frequent operation, a linear list is not the thing.
      • The problem with array lists is the linear adding or removal time: O(n).
      • If adding or removing is frequent, an array-based structure is not the thing.

  23. Where trees come in
      • Trees are more complicated than lists.
      • In return for this complexity, we can organise trees so that searching, adding and removing can all be done in something like O(log n) time.
      • An array is slow to search. Data stored in a sorted order in a tree is far quicker to search.
      • At each node, we can quickly see which subtree to continue searching.

  24. A binary search tree
      • In a binary tree, each node has at most 2 children, one right, one left.
      (Figure: a binary search tree; node labels as extracted: a e p c m h s a d r g l x.)

  25. Traversing a tree
      • Trees can be traversed in various different ways.
      • One of the most useful ways is inorder: left child -> parent -> right child, starting with the leftmost node.
      • Inorder result (spaces put in for clarity): acd e gh j lm p rsx
      (Figure: tree with node labels j p e c m h s a d r g l x.)
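
A small binary search tree of characters shows searching by subtree and the inorder traversal together. This is a sketch: `BstSketch` is a hypothetical name, and the insertion order `jepchmsadglrx` is an assumption chosen so that the node labels match those on the slide.

```java
class BstSketch {
    static class Node {
        char key;
        Node left, right;

        Node(char key) {
            this.key = key;
        }
    }

    // Inserts key into the subtree rooted at root and returns the (new) root.
    static Node insert(Node root, char key) {
        if (root == null) {
            return new Node(key);
        }
        if (key < root.key) {
            root.left = insert(root.left, key);
        } else if (key > root.key) {
            root.right = insert(root.right, key);
        }
        return root;                        // duplicates are ignored
    }

    // At each node, one comparison tells us which subtree to continue in.
    static boolean contains(Node node, char key) {
        while (node != null) {
            if (key == node.key) {
                return true;
            }
            node = (key < node.key) ? node.left : node.right;
        }
        return false;
    }

    // Inorder: left subtree, then the node itself, then right subtree.
    static void inorder(Node node, StringBuilder out) {
        if (node == null) {
            return;
        }
        inorder(node.left, out);
        out.append(node.key);
        inorder(node.right, out);
    }

    static Node build(String insertionOrder) {
        Node root = null;
        for (char c : insertionOrder.toCharArray()) {
            root = insert(root, c);
        }
        return root;
    }

    static String inorderOf(String insertionOrder) {
        StringBuilder out = new StringBuilder();
        inorder(build(insertionOrder), out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(inorderOf("jepchmsadglrx"));
        System.out.println(contains(build("jepchmsadglrx"), 'g'));
    }
}
```

An inorder traversal of a binary search tree always visits the keys in sorted order, whatever order they were inserted in.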

  26. Balancing a tree
      • Ideally a tree should be fairly well balanced, otherwise it is no better than a list.
      • Binary search trees work well when data is inserted in a random order, but not if the data comes in already sorted.
      • There are more sophisticated tree structures with methods to keep the trees balanced (by continually rearranging).
      (Figure: an unbalanced tree with node labels j p s x.)
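
The effect of insertion order can be checked by measuring tree height. This sketch uses a plain unbalanced binary search tree; `TreeHeightSketch` is a hypothetical name and the two insertion orders are illustrative assumptions.

```java
class TreeHeightSketch {
    static class Node {
        char key;
        Node left, right;

        Node(char key) {
            this.key = key;
        }
    }

    // Plain binary search tree insertion with no rebalancing.
    static Node insert(Node root, char key) {
        if (root == null) {
            return new Node(key);
        }
        if (key < root.key) {
            root.left = insert(root.left, key);
        } else if (key > root.key) {
            root.right = insert(root.right, key);
        }
        return root;
    }

    // Height counted in nodes: an empty tree has height 0.
    static int height(Node node) {
        if (node == null) {
            return 0;
        }
        return 1 + Math.max(height(node.left), height(node.right));
    }

    // Builds a tree from the given insertion order and returns its height.
    static int heightAfterInserting(String insertionOrder) {
        Node root = null;
        for (char c : insertionOrder.toCharArray()) {
            root = insert(root, c);
        }
        return height(root);
    }

    public static void main(String[] args) {
        System.out.println("mixed order:  " + heightAfterInserting("jepchmsadglrx"));
        System.out.println("sorted order: " + heightAfterInserting("acdeghjlmprsx"));
        System.out.println("j p s x:      " + heightAfterInserting("jpsx"));
    }
}
```

The same 13 keys give a tree of height 4 in the mixed order but height 13 when inserted already sorted, at which point searching is as slow as in a list.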

  27. Balancing a tree
      • The Java library uses red-black trees, efficient balanced binary trees.
      • How they work is not covered in this module.
      • If you’re interested, see http://www.ececs.uc.edu/~franco/C321/html/RedBlack/redblack.html for an excellent animation.
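
One place the library's red-black tree shows up is `java.util.TreeMap`. The sketch below is a usage example only; the class name `TreeMapSketch`, the names and the phone numbers are invented for illustration.

```java
import java.util.TreeMap;

class TreeMapSketch {
    // Builds a small phone book; the entries are invented.
    static TreeMap<String, String> phoneBook() {
        TreeMap<String, String> book = new TreeMap<>();
        book.put("Mary Tudor", "555-0103");
        book.put("Jane Grey", "555-0102");
        book.put("Anne Boleyn", "555-0101");
        return book;
    }

    public static void main(String[] args) {
        TreeMap<String, String> book = phoneBook();
        System.out.println(book.keySet());          // keys come back in sorted order
        System.out.println(book.get("Jane Grey"));  // lookup walks down the tree
    }
}
```

Because the tree is kept balanced, `put` and `get` stay O(log n) even if the keys arrive already sorted.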

  28. Background: hash tables
      • A hash table is a data structure which associates keys with values. E.g. store a person’s name (key) and their phone number (value). Then, given a name, we can look up the phone number.
      • A hash function is applied to the key, transforming it to a hash code. This is used as an index to retrieve the data.
      • An efficient hash function leads to an even distribution of items, with very few, if any, items having the same hash number.
      • In Java, the hash function is called hashCode().
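
The name-to-phone-number example can be tried with `java.util.HashMap`, which calls `hashCode()` on each key behind the scenes. A sketch only: `PhoneBookSketch` is a hypothetical name and the phone numbers are invented.

```java
import java.util.HashMap;
import java.util.Map;

class PhoneBookSketch {
    // Builds a small name -> phone number table; the numbers are invented.
    static Map<String, String> phoneBook() {
        Map<String, String> book = new HashMap<>();
        book.put("John Smith", "555-0111");
        book.put("Lisa Smith", "555-0112");
        book.put("Sara White", "555-0113");
        return book;
    }

    public static void main(String[] args) {
        Map<String, String> book = phoneBook();
        // The hash code that the map derives its table index from.
        System.out.println("Lisa Smith".hashCode());
        // Given a name, look up the phone number.
        System.out.println(book.get("Lisa Smith"));
        // An absent key yields null.
        System.out.println(book.get("Jim Young"));
    }
}
```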

  29. Example of hash table
      (Figure with three columns, Keys, Indexes and Stored elements: the keys John Smith, Lisa Smith, Sara White and Jim Young are mapped to indexes; the indexes shown include 0, 1, 872, 873, 998 and 999.)
      • Collision at index 1, where both keys result in the same index.
      • Collisions must be kept to a minimum.
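
A collision is easy to reproduce with two short strings that happen to share a hash code. In this sketch the reduction from hash code to index (`indexFor`, using a table of 1000 slots) is an assumption for illustration and is not how `HashMap` computes its index internally; `CollisionSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionSketch {
    // Assumed way of turning a hash code into a table index.
    // Math.floorMod keeps the result non-negative for negative hash codes.
    static int indexFor(String key, int tableSize) {
        return Math.floorMod(key.hashCode(), tableSize);
    }

    public static void main(String[] args) {
        // "Aa" and "BB" are different strings with the same hashCode().
        System.out.println(indexFor("Aa", 1000));
        System.out.println(indexFor("BB", 1000));

        // A real hash table must still keep colliding keys apart.
        Map<String, String> table = new HashMap<>();
        table.put("Aa", "first");
        table.put("BB", "second");
        System.out.println(table.size());
    }
}
```

Both keys land on the same index, yet the map still holds two separate entries; resolving the collision costs extra comparisons, which is why collisions should be rare.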

  30. Next lectures
      • We see how lists, sets, trees and maps are used in the Java Collections Framework.
