Fall 2013, Databases, Exam 2 Questions for the second exam. Your answers are due by Dec. 18 at 4PM. (This is the final exam slot.) And please type your answers. (You can include hand drawn images if you want.)
1. Cassandra • Describe a database-centric application that would be an excellent match for developing with Cassandra as its data store.(You can use Hbase or Mongo instead; but you must use one of these 3.) • What business owns this application? What product or service to they sell? • What does the data represent in the real world? • Most likely, some of their data is standard transaction processing data that they have to manipulate on an ongoing basis – assume that this is not your concern. • But some of the data they must process every day consists of documents that have internal structure. This is what you are concerned with. • Why would Cassandra be the best thing for them? • List the reasons • Why not a relational DBMS? • Why not Hbase or Mongo?
Cassandra, continued • Draw out the column families (in the data model of Cassandra) for the application • Make them specific and detailed • Make sure your specification isn’t trivial • Write 3 queries for your Cassandra application using CQL • One that creates data with an Insert • One that performs an SQL like summary task • One that creates the column families for your application
2. An XML language • Create an XML language for the general application domain that you used to describe the business or organization you invented in problem 1. • Your goal is to create a shared language that can be used to describe other companies that manage data of the sort sold by your chosen environment. • Note: in problem 1, you described a particular business or organization, and you built a specific data structure for an application, but your XML language needs to be generic, suitable any other companies or organizations similar to the one you described in problem 1. • IMPORTANT: make sure that some of your elements refer to data types and that some of them refer to operations that can be performed on the data. YOU WILL NEED both of THESE IN PROBLEM 4.)
An XML language, continued • For the company that you invented in problem 1, use your new XML language to create a data description for the data you defined structurally in question 1. In other words, you are specifying the XML version of the Cassandra structures you specified in problem 1. • Keep in mind that your language is just a list of elements and attribute types, and that users of your language will created hierarchies and lists of elements by using your primitives; in other words, they can build xml documents that embed your elements inside each other. • Make sure that… • … your language is not trivial. It should have at least 5 sorts of elements and 3 sorts of attributes • … the structure you define by using your XML language uses ALL of the elements and attributes you specified in your XML language. • …the structure you define has at least 2 different element instances with nested elements inside of them.
3. Relational DBs and OLAP • Consider problem 1. You did not have to concern yourself with the transaction processing data in problem 1. Now, you will. Create a relational schema for the online transaction processing data from the app you invented in problem 1. • Specify all Primary Keys and FK’s • Specify all relevant FD’s and MVD’s – but you do not have to normalize your schema • Write 2 OLTP queries for your schema • One does an insert into two tables • One does a general set oriented query with at least one join in it. • Your schema must use only terms from your XML language.
OLAP and OLTP, continued • Create a data warehousing schema to go with your relational schema • Add to your relational schema if you need to do so in order to make your warehousing schema non-trivial. You do not have to go back and add anything new to problems 1 or 2. • Assume that the basic “fact” data being studied over time is money. • Assume that time is one of the dimensions, and break your time data down by year/month/day. • Create three different transactions that can be run in order to look for trends over time. You do not have to use any OLAP operators, regular SQL is fine.
OLAP and OLTP, continued • Carefully describe the data that is taken from the transaction database and how it is aggregated into the dimension tables of the warehouse to create new time slices. • Write a couple of sentences describing the reason for your warehouse and queries – why are these very different from the OLTP transactions?
4. Assertions and inferences • For the application you created in question 1, write a set of assertions. • Create at least 10 triples • The “lefts” and “rights” refer to data; the “middles” refer to the operations that can be applied to the data lefts and the rights. • Make sure that all of the “middles” of your assertions are from the xml language you specified. You can add to your XML language if you want. These are the operations you placed in your XML language. • Make sure that some of the “lefts” and “rights” of your triples come from your XML language (these are the data types you placed in your XML language), and that some of the others consist of URLs pointing to specific documents. • In other words, you will have two kinds of triples, generic ones that are true for all data of certain types (like Cows giveMilk), and data-focused ones that relate specific documents to each other (like Betsy-the-cow give(s) chocolate-milk). Only your data will consist of documents, not farm animals.
Assertions and inferences, continued • Create at least two inferences based on your assertions • one that is generic and states a new assertion about how all data of these types are related … • …and one that is based on specific documents data types in the Cassandra database, i.e., creates a new assertion that is based only on specific documents. • In other words, your assertions and your inferences will be of two categories: relating data types and relating individual documents. Important: you do not have to use the same operations in both kinds of assertions. It might be that cows give milk, but that Betsy-the-cow is the color brown.
5. Distributed DBs • Create a heterogeneous distributed database • Use your relational schema from problem 3 as one of them • Specify two other schemas that concern the same general application domain, but do NOT use any of your XML language terms. • Instead, they use their own terms, but make some of them similar to your XML language terms and make others quite different. • For your 2 new schemas, specify their keys, FK’s, FDs and MVDs and deliberately make them conflict with the original schema and with each other
Distributed DBs, continued • Build a set of primitives that you might use to translate these conflicting schemas into a single schema. • For example, you will specify a primitive that can be used to to pick an FD from one schema over an FD from another schema (given that the two FDs conflict). • Be creative and think about other conflicts that must be settled. • IMPORTANT: Your primitives are generic and not based on the terms in the 3 schemas. You are in essence laying out some primitives that can be used to merge two or more schemas • Use your new primitives to turn your 3 schemas into 1. In other words, write a problem with your primitives that merges the 3 schemas into 1.
6. Ambient Intelligence • Describe an instance of ambient intelligence that would be useful to some sizeable and diverse group of people who are in a public place like a train station or a bank. • In what public place is this system located? • Make sure that computer-based device is end-user friendly. Describe what it does. You can draw one or more picture of the interface, if you want. • Choose an application where people are aware that they are accessing a computer; i.e., don’t use a hidden imbedded computer, like one controlling fuel usage in a car. • How is your device able to instantly adapt to these diverse people? • Would it require special hardware, like gaming hardware? If so, describe it. It’s perfectly fine, and maybe a good idea if you use specialized hardware.
Ambient Intelligence, continued • Make sure that it serves hands-on users, but is also web-connected. • How would the connectivity to the web enhance your systems value to people? • How might this connectivity help whoever it is that has placed this ambient intelligence device in this public place? • In other words, the web connectivity could be in the best interest of the users and/or the people who own the device. • Pick a database system (among the ones we have talked about in class) that will store the data that is collected from this device and uploaded into the web server. • Why is this the best database to use? Be specific. • It’s okay to suggest that more than one database system should be used, but you have to defend both and explain why.
7. Database Systems Comparison • Create a chart that includes • Neo4J • Mongo • MySQL • Cassandra • Hbase • In your chart, the vertical axis will have the databases on them and the horizontal axis will have (at least 4) properties that specifically focus on the way these database systems differ in their capabilities (not the kinds of apps they are meant for, but in their actual functionalities).