Administrivia
Final Exam: Tuesday, 5/20, 5-8 pm. Cumulative, with stress on the end of the semester. 2 cribsheets.
Final Review Session: watch for announcement.
Office Hours Next Week: tentative office hours on 5/15; watch the web page.
As you study
"Reading maketh a full man, conference a ready man, and writing an exact man." - Francis Bacon
"If you want truly to understand something, try to change it." - Kurt Lewin
"I hear and I forget. I see and I remember. I do and I understand." - Chinese Proverb
"Knowledge is a process of piling up facts; wisdom lies in their simplification." - Martin H. Fischer
Database Lessons to Live By
"If we do well here, we shall do well there; I can tell you no more if I preach a whole" - Edwin 1749
Recall Lecture 1
Lessons of Data Independence: high-level, declarative programming; maintenance in the face of change; automatic re-optimization.
Data integrity: declarative consistency constraints.
Concurrent access, recovery from
Simplicity is Beautiful
The relational model is simple.
Simple query language means simple implementation: basically just indexes, join algorithms, sorting, and grouping.
Simple data model means easy schema evolution.
Simple data model provides clean analysis of schemas: FDs and NFs are essentially automatic.
Every other structured data model has proved to be
XML has found a niche, but not as a database.
There's a reason that the backend of web search looks so much like a relational database.
Bulk Processing, I/O Go
Disks provide data a page at a time.
Databases deal with data a set at a time (sets usually bigger than a page).
This means I/O costs are usually justified: much better than other techniques, which are object-at-a-time.
Set-at-a-time allows for optimization: can do bulk operations (e.g., sort or hash), or can do things tuple-at-a-time (e.g., nested loops).
Optimize the Memory Hierarchy
DBMS worries about disk vs. RAM: spend lotsa CPU cycles planning disk access.
I/O cost hides the think time.
Similar hierarchies exist in other parts of a
Various caches on and off CPU chips: less time to spare optimizing here.
Change is happening here: disk is the new tape, flash is the new disk, RAM is really big.
Query Processing is Predictable
Big queries take many predictable steps, unlike typical OS workloads, which depend on what small task users decide to do next.
DBMSs can use this knowledge to optimize: for caching, prefetching, admission control, memory allocation, etc.
These lessons should be applied whenever you know your access patterns; again, especially for bulk operations.
Applied Algorithm Analysis
Know the practical costs of your algorithms.
The optimizer needs to know anyway.
How many disk I/Os really needed to access a
In many applications, the bottlenecks determine the cost model.
E.g., I/O is the traditional DB bottleneck; in another setting it might be network or processor cache locality.
This affects the practical analysis of the
Indexing Is Simple
Hash indexes: easy and quick for equality. Worth reading about linear hashing in the
Trees can be used for just about anything.
Each tree level partitions the dataset.
Labels in the tree direct query traffic to the right data.
All you need to think about in designing a tree is how to partition, and how to
Not enough memory? Partition!
Traditional main-memory algorithms can be extended to disk-based algorithms:
partition input (runs for sorting, partitions for hash table);
process partitions (sort runs, hash partitions);
merge partitions (merge runs, concatenate partitions).
Sorting and hashing are very similar: their I/O patterns are dual.
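The partition / process / merge recipe above can be sketched as follows. This is a minimal illustration, not a disk-based implementation: Python lists stand in for files on disk, and the names `external_sort`, `partitioned_group_count`, `mem_capacity`, and `num_partitions` are hypothetical, chosen only for this example.

```python
import heapq
from collections import defaultdict


def external_sort(records, mem_capacity):
    """Sort by building memory-sized runs, then merging them (assumed sketch)."""
    # Partition: cut the input into runs of at most mem_capacity records.
    runs = [records[i:i + mem_capacity]
            for i in range(0, len(records), mem_capacity)]
    # Process: sort each run on its own; each run fits in "memory".
    sorted_runs = [sorted(run) for run in runs]
    # Merge: a k-way merge of the sorted runs gives the final order.
    return list(heapq.merge(*sorted_runs))


def partitioned_group_count(records, num_partitions):
    """Count occurrences by hash-partitioning first, then hashing each partition."""
    # Partition: route records by hash so equal values land in the same partition.
    partitions = defaultdict(list)
    for rec in records:
        partitions[hash(rec) % num_partitions].append(rec)
    result = {}
    for part in partitions.values():
        # Process: build a small hash table for this partition only.
        counts = defaultdict(int)
        for rec in part:
            counts[rec] += 1
        # Merge: partitions are disjoint, so per-partition results just concatenate.
        result.update(counts)
    return result
```

Note the duality: sorting does the cheap split first and the real work in the merge, while hashing does the real work in the split and only concatenates at the end.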
Declarative languages are simple: say what you want, not how to get it.
Should correctly convert to an imperative language. Codd's Theorem says relational calculus is equivalent to relational algebra; there is no such theorem for text ranking.
If you can convert in different ways, you get to
Hides complexity from user.
Accommodates changes in database without requiring applications to be recompiled.
Especially important when app rate of change << physical rate of change.
A reborn trend in computing: declarative networking, security, robotics, natural language processing, distributed systems.
SQL: The good, the bad, the ugly
SQL is very simple: SELECT, FROM, WHERE.
Well, SQL is kind of tricky: aggregation, GROUP BY, HAVING.
OK, OK, SQL is complicated: duplicates, NULLs, subqueries; dups, NULLs, subqueries, aggregat... together.
Remember, SQL is not entirely declarative. But it beats the heck out of writing and maintaining C or Java programs for every query.
Query Operators and Optimization
Query operators are actually all similar: sorting, hashing, iteration.
Query optimization is 3-part harmony: define a plan space; estimate costs for plans; use an algorithm to search in the plan space for the cheapest.
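To make the three parts concrete, here is a toy sketch under loudly stated assumptions: the plan space is every left-deep join order, the cost estimate is the total size of intermediate results (computed from made-up cardinalities and join selectivities), and the search is plain exhaustive enumeration rather than the dynamic programming a real optimizer would use. The names `estimate_cost` and `cheapest_plan` are hypothetical.

```python
from itertools import permutations


def estimate_cost(order, cardinality, selectivity):
    """Cost estimate (assumed model): sum of intermediate join result sizes."""
    rows = cardinality[order[0]]
    joined = [order[0]]
    cost = 0.0
    for table in order[1:]:
        rows *= cardinality[table]
        # Apply the selectivity of each join predicate that links `table`
        # to a table already in the plan; no predicate means cross product.
        for prev in joined:
            rows *= selectivity.get(frozenset((prev, table)), 1.0)
        joined.append(table)
        cost += rows
    return cost


def cheapest_plan(tables, cardinality, selectivity):
    """Plan space: all left-deep join orders. Search: exhaustive enumeration."""
    return min(permutations(tables),
               key=lambda order: estimate_cost(order, cardinality, selectivity))


# Illustrative statistics only; these numbers are invented for the example.
cardinality = {"R": 1000, "S": 100, "T": 10}
selectivity = {frozenset(("R", "S")): 0.01, frozenset(("S", "T")): 0.1}
best = cheapest_plan(["R", "S", "T"], cardinality, selectivity)
```

Each piece can be swapped independently: a richer plan space (bushy trees), a better cost model (I/O counts), or a smarter search all plug into the same skeleton.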
Research on each of the 3 pieces goes on independently (usually).
Nice clean model for attacking a hard
Database Design
And you thought SQL was confusing!
This is not simple stuff: it requires a lot of thought, a lot of
There's no cookbook to follow.
Decisions can make a huge difference down the road.
The basic steps we studied (conceptual design, schema refinement, physical design) break up the problem somewhat, but also interact with each other.
Complexity in DB design pays off at query time, and in consistency.
CC and Recovery: House Specialties
RDBMSs nailed concurrency and reliability: transactions, 2-phase locking, write-ahead logging.
Details are tricky, worked out over 20
Also models for relaxing transactions: lower degrees of consistency.
Other systems are now taking pieces: journaling file systems, transactional memories, Web infrastructure locking services.
The Rebirth of Information
A lonely backwater in the 70's, 80's, early 90's.
Now a driver of research and industry.
We saw that it's easy to get working. But there's tons more.
Watering hole for ideas from databases, AI, approximation algorithms, distributed systems, power-efficient processors, HCI.
Kicking off the new generation of parallel
Pushing to yet another level of scalability. Always a game changer.
Databases: The natural way to leverage parallelism and distribution
The promise of CS research for the last 15 yrs: there are millions of computers; they are spread all over the world; harness them all for the world's best supercomputer.
This was routinely disappointing, except for data-intensive applications (DBs, Web).
2 reasons for success: data-intensive apps are easy to parallelize and distribute; lots of people want to share data, fewer people want to share computation.
The parallelism craze is BACK.
Intel, AMD, etc. need us to take advantage of parallelism.
They have nothing else to do with all those transistors.
Google convinced people that bulk data analysis is cool (MapReduce).
Incoming freshmen will get this in 61A and through the curriculum.
"More, more, I'm still not satisfied" - Tom Lehrer
Grad classes at Berkeley
CS262A: a grad-level intro to DBMS and OS research.
CS286: grad DBMS course. Read and discuss lots of research papers. See evolution of different communities on similar
Undertake a research project, often big successes.
CS298 12: Database group seminar.
Upcoming seminar courses: Alon Halevy from Google will offer something in Fall.
But wait, there's more!
Graduate study in databases: used to be rare (Berkeley, Wisconsin). You are living in the golden age.
Berkeley, Wisconsin, Stanford, MIT, Brown, Cornell, CMU, Maryland, Penn, Duke, Washington, Michigan, many others.
Tons of DB-related companies, lots of hiring:
Search companies.
DB elephants (IBM, Oracle, MS).
Midstage DB startups (ANTs, Greenplum, Netezza).
Early startups (Truviso, Streambase, Coral8, Vertica, Paraccel).
Enterprise app firms (e.g., SAP, Salesforce).
Every Web 2.0 company.
A note: ask for the job you want.
E.g., not just engineering: sales, marketing, R&D, management.
Parting Thoughts
"Education is the ability to listen to almost anything without losing your temper or your self-confidence." - Robert Frost
"It is a miracle that curiosity survives formal education." - Albert Einstein
"Humility, yet pride and scorn; instinct and study; love and hate; audacity, reverence. These must mate" - Herman Melville
"The only thing one can do with good advice is to pass it on. It is never of any use to oneself."

