Introduction to Database Systems - Tufts CS

  • Date: 18 Aug 2020

Transcription:

Data Warehousing/Mining
Comp 150 DW
Chapter 7: Classification and Prediction
Instructor: Dan Hebert

Chapter 7. Classification and Prediction
  • What is classification? What is prediction?
  • Issues regarding classification and prediction
  • Classification by decision tree induction
  • Bayesian classification
  • Classification by backpropagation
  • Classification based on concepts from association rule mining
  • Other classification methods
  • Prediction
  • Classification accuracy
  • Summary

Classification vs. Prediction
  • Classification:
    predicts categorical class labels
    classifies data: constructs a model based on the training set and the values (class labels) of a classifying attribute, and uses it to classify new data
  • Prediction:
    models continuous-valued functions, i.e., predicts unknown or missing values
  • Typical applications:
    credit approval, target marketing, medical diagnosis, treatment effectiveness analysis

Classification: A Two-Step Process
  • Step 1, model construction: describe a set of predetermined classes
    Each tuple (sample) is assumed to belong to a predefined class, as determined by the class label attribute
    The set of tuples used for model construction is the training set
    The model is represented as classification rules, decision trees, or mathematical formulae
  • Step 2, model usage:
    Estimate the accuracy of the model: the known label of each test sample is compared with the model's classification of it
    The accuracy rate is the percentage of test set samples that are correctly classified by the model
    The test set is independent of the training set
    Use the model to classify future or unknown objects

Classification Process (1): Model Construction
Training data → Classification algorithm → Classifier (model)

  NAME     RANK            YEARS  TENURED
  Mike     Assistant Prof  3      no
  Mary     Assistant Prof  7      yes
  Bill     Professor       2      yes
  Jim      Associate Prof  7      yes
  Dave     Assistant Prof  6      no
  Anne     Associate Prof  3      no

Classifier (model):
  IF rank = 'professor' OR years > 6
  THEN tenured = 'yes'

Classification Process (2): Use the Model in Prediction
The classifier is applied first to testing data, then to unseen data.

  Testing data:
  NAME     RANK            YEARS  TENURED
  Tom      Assistant Prof  2      no
  Merlisa  Associate Prof  7      no
  George   Professor       5      yes
  Joseph   Assistant Prof  7      yes

Accuracy on the testing data: the rule classifies 3 of the 4 tuples correctly (75%); Merlisa (Associate Prof, 7 years) is predicted 'yes' but is not tenured.
Unseen data: (Jeff, Professor, 4). Tenured? The rule answers 'yes', because rank = 'professor'.

Supervised vs. Unsupervised Learning
  • Supervised learning (classification)
    Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
    New data is classified based on the training set
  • Unsupervised learning (clustering)
    The class labels of the training data are unknown
    Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
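The two-step process and the tenure example can be traced in a few lines of Python. This is a minimal sketch: the rule from the slide is hard-coded as the "learned" model rather than induced by an algorithm, and the names `classify` and `accuracy` are illustrative only. Because every tuple carries its class label, this is supervised learning in the sense just described.

```python
# Tuples from the tenure example: (name, rank, years, tenured).
TRAINING = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]
TESTING = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def classify(rank: str, years: int) -> str:
    """Step 1 output (hard-coded): IF rank = 'professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"


def accuracy(samples) -> float:
    """Step 2: percentage of labelled samples whose known label matches the model's output."""
    correct = sum(classify(rank, years) == label for _, rank, years, label in samples)
    return 100.0 * correct / len(samples)


print(f"training accuracy: {accuracy(TRAINING):.0f}%")
print(f"test accuracy: {accuracy(TESTING):.0f}%")
print("Jeff (Professor, 4 years):", classify("Professor", 4))
```

Measuring accuracy on the independent testing tuples, not on the training tuples, is what exposes the Merlisa error: the rule fits all six training tuples perfectly.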

Issues Regarding Classification and Prediction (1): Data Preparation
  • Data cleaning
    Preprocess data in order to reduce noise (e.g., with a smoothing technique) and handle missing values (e.g., replace with the most commonly occurring value)
  • Relevance analysis (feature selection)
    Remove the irrelevant or redundant attributes
  • Data transformation
    Generalize to higher-level concepts and/or normalize data

Issues Regarding Classification and Prediction (2): Comparing Classification Methods
  • Predictive accuracy: ability of the model to correctly predict the class label of new data
  • Speed and scalability: time to construct the model; time to use the model
  • Robustness: handling noise and missing values
  • Scalability: efficiency in disk-resident databases
  • Interpretability: understanding and insight provided by the model
  • Goodness of rules: decision tree size; compactness of classification rules

Classification by Decision Tree Induction
  • Decision tree: a flowchart-like tree structure
    An internal node denotes a test on an attribute
    A branch represents an outcome of the test
    Leaf nodes represent class labels or class distribution
  • Decision tree generation consists of two phases
    Tree construction: at the start, all the training examples are at the root; examples are partitioned recursively based on selected attributes
    Tree pruning: identify and remove branches that reflect noise or outliers
  • Use of decision tree: classifying an unknown sample
    Test the attribute values of the sample against the decision tree

Training Dataset
  age     income  student  credit_rating  buys_computer
  <=30    high    no       fair           no
  <=30    high    no       excellent      no
  31..40  high    no       fair           yes
  >40     medium  no       fair           yes
  >40     low     yes      fair           yes
  >40     low     yes      excellent      no
  31..40  low     yes      excellent      yes
  <=30    medium  no       fair           no
  <=30    low     yes      fair           yes
  >40     medium  yes      fair           yes
  <=30    medium  yes      excellent      yes
  31..40  medium  no       excellent      yes
  31..40  high    yes      fair           yes
  >40     medium  no       excellent      no

Example: A Decision Tree for buys_computer

  age?
    <=30   → student?
               no  → no
               yes → yes
    31..40 → yes
    >40    → credit_rating?
               excellent → no
               fair      → yes

Non-leaf nodes test on an attribute; leaf nodes hold the class (buys_computer).

Algorithm for Decision Tree Induction
  • Basic algorithm (a greedy algorithm)
    The tree is constructed in a top-down, recursive, divide-and-conquer manner
    At the start, all the training examples are at the root
    Attributes are categorical (if continuous-valued, they are discretized in advance)
    Examples are partitioned recursively based on selected attributes
    Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
  • Conditions for stopping partitioning
    All samples for a given node belong to the same class
    There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf
    There are no samples left
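The greedy, top-down induction just outlined can be sketched in Python on the buys_computer training data. This is a sketch under stated assumptions, not a reference implementation: it uses information gain (the measure the following slides define) to pick test attributes, represents a tree as nested dicts with class labels at the leaves, and the names `generate_decision_tree`, `gain`, `majority` and `DOMAINS` are hypothetical. The three stopping conditions appear as the three leaf cases.

```python
from collections import Counter
from math import log2

ATTRS = ["age", "income", "student", "credit_rating"]
# Each row: attribute values in ATTRS order, then the class label (buys_computer).
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
# Known values of each attribute, taken from the whole training set.
DOMAINS = {i: sorted({row[i] for row in ROWS}) for i in range(len(ATTRS))}


def info(labels):
    """Expected information (entropy, in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain(rows, i):
    """Information gain from partitioning rows on attribute i."""
    expected = 0.0
    for value in DOMAINS[i]:
        subset = [row[-1] for row in rows if row[i] == value]
        if subset:
            expected += len(subset) / len(rows) * info(subset)
    return info([row[-1] for row in rows]) - expected


def majority(rows):
    """Most common class label among rows."""
    return Counter(row[-1] for row in rows).most_common(1)[0][0]


def generate_decision_tree(rows, attribute_list):
    classes = {row[-1] for row in rows}
    if len(classes) == 1:            # all samples belong to the same class
        return classes.pop()
    if not attribute_list:           # no attributes left: majority voting
        return majority(rows)
    best = max(attribute_list, key=lambda i: gain(rows, i))
    remaining = [i for i in attribute_list if i != best]
    node = {"attribute": ATTRS[best], "branches": {}}
    for value in DOMAINS[best]:
        subset = [row for row in rows if row[best] == value]
        # No samples left for this value: label the leaf with the parent's majority class.
        node["branches"][value] = (
            generate_decision_tree(subset, remaining) if subset else majority(rows)
        )
    return node


tree = generate_decision_tree(ROWS, list(range(len(ATTRS))))
print(tree)
```

On this data the sketch reproduces the example tree: age at the root, student under "<=30", a "yes" leaf for "31..40", and credit_rating under ">40".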

Algorithm for Decision Tree Induction (continued)
  Basic algorithm: generate_decision_tree(samples, attribute_list)
    Create a node N
    If samples are all of the same class C, then
      return N as a leaf node labeled with the class C
    If attribute_list is empty, then
      return N as a leaf node labeled with the most common class in samples
    Select test_attribute, the attribute among attribute_list with the highest information gain
    Label node N with test_attribute
    For each known value a_i of test_attribute:
      grow a branch from node N for the condition test_attribute = a_i
      let s_i be the set of samples in samples for which test_attribute = a_i
      if s_i is empty, then
        attach a leaf labeled with the most common class in samples
      else
        attach the node returned by generate_decision_tree(s_i, attribute_list - test_attribute)

Information Gain
  • Select the attribute with the highest information gain
  • Assume there are two classes, P and N
    Let the set of examples S contain p elements of class P and n elements of class N
    The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined as

      $$I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$$

Information Gain in Decision Tree Induction
  • Assume that using attribute A, a set S will be partitioned into sets {S_1, S_2, ..., S_v}
    If S_i contains p_i examples of P and n_i examples of N, the entropy, or the expected information needed to classify objects in all subtrees S_i, is

      $$E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i)$$

  • The encoding information that would be gained by branching on A is

      $$\mathrm{Gain}(A) = I(p, n) - E(A)$$

Attribute Selection by Information Gain Computation
  • Class P: buys_computer = "yes"; Class N: buys_computer = "no"
  • I(p, n) = I(9, 5) = 0.940
  • Compute the entropy for age:

      age     p_i  n_i  I(p_i, n_i)
      <=30    2    3    0.971
      31..40  4    0    0
      >40     3    2    0.971

      $$E(\mathrm{age}) = \frac{5}{14} I(2, 3) + \frac{4}{14} I(4, 0) + \frac{5}{14} I(3, 2) = 0.694$$

  • Hence

      $$\mathrm{Gain}(\mathrm{age}) = I(p, n) - E(\mathrm{age}) = 0.940 - 0.694 = 0.246$$

  • Similarly, Gain(income) = 0.029, Gain(student) = 0.151, Gain(credit_rating) = 0.048
  • Age has the highest information gain, so it is selected as the test attribute at the root
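The worked figures can be checked directly from the formulas. This sketch feeds the (p_i, n_i) counts, which we read off the 14-tuple training table, into functions named after the slide's I, E and Gain; the names `PARTITIONS` and `gain` are our own. Values printed here may differ from the slide in the last digit, because the slide rounds intermediate results.

```python
from math import log2


def I(p: int, n: int) -> float:
    """Information needed to decide P vs. N; terms with a zero count contribute 0."""
    total = p + n
    return -sum((k / total) * log2(k / total) for k in (p, n) if k > 0)


def E(partition) -> float:
    """Expected information E(A) for a split into subsets with counts (p_i, n_i)."""
    p = sum(pi for pi, _ in partition)
    n = sum(ni for _, ni in partition)
    return sum((pi + ni) / (p + n) * I(pi, ni) for pi, ni in partition)


def gain(partition) -> float:
    """Gain(A) = I(p, n) - E(A), with p and n the totals over all subsets."""
    p = sum(pi for pi, _ in partition)
    n = sum(ni for _, ni in partition)
    return I(p, n) - E(partition)


# (p_i, n_i) = (count of "yes", count of "no") for each value of each attribute.
PARTITIONS = {
    "age": [(2, 3), (4, 0), (3, 2)],         # <=30, 31..40, >40
    "income": [(2, 2), (4, 2), (3, 1)],      # high, medium, low
    "student": [(3, 4), (6, 1)],             # no, yes
    "credit_rating": [(6, 2), (3, 3)],       # fair, excellent
}

for attribute, partition in PARTITIONS.items():
    print(f"Gain({attribute}) = {gain(partition):.3f}")
print("root attribute:", max(PARTITIONS, key=lambda a: gain(PARTITIONS[a])))
```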

Extracting Classification Rules from Trees
  • Represent the knowledge in the form of IF-THEN rules
  • One rule is created for each path from the root to a leaf
  • Each attribute-value pair along a path forms a conjunction
  • The leaf node holds the class prediction
  • Rules are easier for humans to understand
  • Example
    IF age = "<=30" AND student = "no" THEN buys_computer = "no"
    IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
    IF age = "31..40" THEN buys_computer = "yes"
    IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
    IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"
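Rule extraction is a depth-first walk that accumulates one attribute-value pair per edge. In this minimal sketch the buys_computer tree is written by hand as nested dicts (an internal node is `{"attribute": ..., "branches": {...}}`, a leaf is a class label); that representation and the name `extract_rules` are assumptions for illustration.

```python
# The buys_computer example tree as nested dicts; leaves are class labels.
TREE = {
    "attribute": "age",
    "branches": {
        "<=30": {"attribute": "student", "branches": {"no": "no", "yes": "yes"}},
        "31..40": "yes",
        ">40": {"attribute": "credit_rating", "branches": {"excellent": "no", "fair": "yes"}},
    },
}


def extract_rules(node, conditions=(), target="buys_computer"):
    """Yield one IF-THEN rule per root-to-leaf path."""
    if not isinstance(node, dict):
        # Leaf: the conjunction of conditions on the path implies this class.
        body = " AND ".join(f'{attr} = "{value}"' for attr, value in conditions)
        yield f'IF {body} THEN {target} = "{node}"'
        return
    for value, child in node["branches"].items():
        yield from extract_rules(child, conditions + ((node["attribute"], value),), target)


for rule in extract_rules(TREE):
    print(rule)
```

Because Python dicts keep insertion order, the rules come out in the same order as the slide's list.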

Avoid Overfitting in Classification
  • The generated tree may overfit the training data
    Too many branches, some of which may reflect anomalies due to noise or outliers
    The result is poor accuracy for unseen samples
  • Two approaches to avoid overfitting
    Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
      Difficult to choose an appropriate threshold
    Postpruning: remove branches from a "fully grown" tree to get a sequence of progressively pruned trees
      Use a set of data different from the training data to decide which is the "best pruned tree"
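Postpruning with a separate data set can be illustrated with reduced-error pruning, one simple variant swapped in here for illustration; it is not necessarily the procedure the slide has in mind, which builds a whole sequence of pruned trees. The sketch assumes each internal node stores a `"majority"` field (its most common training class), and both the tree and the validation tuples are hypothetical.

```python
# Hypothetical over-grown tree: the income split under student = "no"
# fits a noisy training tuple. "majority" is each node's training majority class.
TREE = {
    "attribute": "student",
    "majority": "yes",
    "branches": {
        "yes": "yes",
        "no": {
            "attribute": "income",
            "majority": "no",
            "branches": {"high": "no", "medium": "no", "low": "yes"},
        },
    },
}
# Hypothetical validation set, disjoint from the training data: (sample, label).
VALIDATION = [
    ({"student": "yes", "income": "low"}, "yes"),
    ({"student": "no", "income": "low"}, "no"),
    ({"student": "no", "income": "high"}, "no"),
    ({"student": "no", "income": "medium"}, "yes"),
]


def classify(node, sample):
    """Follow branches to a leaf; an unseen value falls back to the node's majority."""
    while isinstance(node, dict):
        value = sample.get(node["attribute"])
        if value not in node["branches"]:
            return node["majority"]
        node = node["branches"][value]
    return node


def prune(node, validation):
    """Reduced-error postpruning, bottom-up; modifies the tree in place."""
    if not isinstance(node, dict):
        return node
    for value, child in list(node["branches"].items()):
        reaching = [(s, y) for s, y in validation if s.get(node["attribute"]) == value]
        node["branches"][value] = prune(child, reaching)
    subtree_correct = sum(classify(node, s) == y for s, y in validation)
    leaf_correct = sum(node["majority"] == y for _, y in validation)
    # Replace the subtree by a majority leaf unless that lowers validation accuracy.
    return node["majority"] if leaf_correct >= subtree_correct else node


pruned = prune(TREE, VALIDATION)
print(pruned)
```

The income subtree is collapsed to the leaf "no" (2 validation tuples correct as a leaf vs. 1 as a subtree), while the root split survives (3 correct vs. 2). Note the design choice in `>=`: ties favor the smaller tree, so a subtree that no validation tuple reaches is always pruned.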

Enhancements to Basic Decision Tree Induction
  • Allow for continuous-valued attributes
    Dynamically define new discrete-valued attributes that partition the continuous attribute value into a discrete set of intervals
  • Handle missing attribute values
    Assign the most common value of the attribute
    Assign a probability to each of the possible values
  • Attribute construction
    Create new attributes based on existing ones that are sparsely represented
    This reduces fragmentation, repetition, and replication
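Two of these enhancements are small enough to sketch. The first function below chooses a binary cut for a continuous attribute by trying the midpoint between each pair of adjacent distinct values and keeping the one with the highest information gain (one common way to define the "new discrete-valued attribute"; other discretizations exist). The second fills missing values with the attribute's most common value. The raw ages, the labels, and all function names are hypothetical.

```python
from collections import Counter
from math import log2


def info(labels):
    """Entropy (bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_threshold(samples):
    """Return (t, gain) for the split 'value <= t' vs. 'value > t' with the highest gain.

    samples: list of (numeric value, class label). Candidate cuts are the
    midpoints between adjacent distinct values; returns (None, -1.0) if there
    is only one distinct value.
    """
    base = info([y for _, y in samples])
    values = sorted({x for x, _ in samples})
    best_t, best_gain = None, -1.0
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in samples if x <= t]
        right = [y for x, y in samples if x > t]
        expected = (len(left) * info(left) + len(right) * info(right)) / len(samples)
        if base - expected > best_gain:
            best_t, best_gain = t, base - expected
    return best_t, best_gain


def fill_missing(values):
    """Replace None with the most common observed value of the attribute."""
    mode = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


# Hypothetical raw (undiscretized) ages with class labels.
AGES = [(25, "no"), (28, "no"), (33, "yes"), (38, "yes"), (45, "yes"), (52, "no")]
print(best_threshold(AGES))
print(fill_missing(["fair", None, "excellent", "fair"]))
```

Only midpoints between adjacent distinct values need to be tried, since any other cut in the same gap produces the same partition.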

Classification in Large Databases
  • Classification: a classical problem extensively studied by statisticians and machine learning researchers
  • Scalability: classifying data sets with millions of examples and hundreds of attributes with reasonable speed
  • Why decision tree induction in data mining?
    relatively faster learning speed than other classification methods
    convertible to simple and easy-to-understand classification rules
    can use SQL queries for accessing databases
    comparable classification accuracy with other methods

Scalable Decision Tree Induction Methods in Data Mining Studies
  • SLIQ (EDBT'96, Mehta et al.): builds an index for each attribute; only the class list and the current attribute list reside in memory
  • SPRINT (VLDB'96, J. Shafer et al.): constructs an attribute list data structure
  • PUBLIC (VLDB'98, Rastogi & Shim): integrates tree splitting and tree pruning: stops growing the tree earlier
  • RainForest (VLDB'98, Gehrke & Ramakrishnan): separates the scalability aspects from the criteria that determine the quality of the tree; builds an AVC-list (attribute, value, class label)

Data Cube-Based Decision Tree Induction
  • Integration of generalization with decision tree induction (Kamber et al. '97)
  • Classification at primitive concept levels
    E.g., precise temperature, humidity, outlook, etc.
    Low-level concepts, scattered classes, bushy classification trees
    Semantic interpretation problems
  • Cube-based multi-level classification
    Relevance analysis at multiple levels
    Information-gain analysis with dimension and level
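Two points above fit together: decision tree induction "can use SQL queries for accessing databases", and RainForest's AVC-list holds counts per (attribute, value, class label). A minimal sketch using Python's built-in `sqlite3` module, assuming the training data sits in a relational table named `training` (a hypothetical schema), computes one attribute's AVC counts with a single `GROUP BY`. It shows only the counting idea, not how RainForest itself manages memory.

```python
import sqlite3

# The buys_computer training tuples: (age, income, student, credit_rating, class).
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
ATTRIBUTES = {"age", "income", "student", "credit_rating"}

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE training (age TEXT, income TEXT, student TEXT, "
    "credit_rating TEXT, buys_computer TEXT)"
)
conn.executemany("INSERT INTO training VALUES (?, ?, ?, ?, ?)", ROWS)


def avc_set(conn, attribute):
    """(value, class label, count) triples for one attribute, via SQL GROUP BY."""
    # Column names cannot be bound as parameters, so allow only known columns.
    if attribute not in ATTRIBUTES:
        raise ValueError(f"unknown attribute: {attribute}")
    query = (
        f"SELECT {attribute}, buys_computer, COUNT(*) FROM training "
        f"GROUP BY {attribute}, buys_computer ORDER BY {attribute}, buys_computer"
    )
    return conn.execute(query).fetchall()


for row in avc_set(conn, "age"):
    print(row)
```

These per-value class counts are exactly the p_i and n_i that the information-gain formulas need, so attribute selection can run on AVC counts without rescanning the raw tuples.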

Presentation of Classification Results
[Screenshots: classification results displayed as a decision tree, as classification rules, and as a tree grid]

DBMiner Classification
  • Show help on the classification module and classification results
  • Run examples 1-5

Bayesian Classification: Why?
  • Probabilistic learning: calculate explicit probabilities for hypotheses; among the most practical approaches to certain types of learning problems
  • Incremental: each training example can incrementally increase or decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
  • Probabilistic prediction: predict multiple hypotheses, weighted by their probabilities
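Two of these properties, incremental updating and probabilistic prediction, can be illustrated with a count-based classifier. This is a hedged sketch that assumes the naive Bayes model (attributes conditionally independent given the class), which this slide does not itself define; the Laplace (add-one) smoothing and the class name `IncrementalNaiveBayes` are our own choices.

```python
from collections import defaultdict

# The buys_computer training tuples: (age, income, student, credit_rating, class).
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


class IncrementalNaiveBayes:
    def __init__(self):
        self.total = 0
        self.class_counts = defaultdict(int)   # class -> count
        self.value_counts = defaultdict(int)   # (attribute index, value, class) -> count
        self.domains = defaultdict(set)        # attribute index -> values seen so far

    def update(self, sample, label):
        """Fold one training example into the counts; no retraining is needed."""
        self.total += 1
        self.class_counts[label] += 1
        for i, value in enumerate(sample):
            self.value_counts[(i, value, label)] += 1
            self.domains[i].add(value)

    def posterior(self, sample):
        """Return P(class | sample) for every class seen so far."""
        scores = {}
        for label, count in self.class_counts.items():
            score = count / self.total  # prior P(C)
            for i, value in enumerate(sample):
                seen = self.value_counts.get((i, value, label), 0)
                # Laplace-smoothed estimate of P(x_i | C)
                score *= (seen + 1) / (count + len(self.domains[i]))
            scores[label] = score
        norm = sum(scores.values())
        return {label: s / norm for label, s in scores.items()}


model = IncrementalNaiveBayes()
for *values, label in ROWS:
    model.update(tuple(values), label)
print(model.posterior(("<=30", "medium", "yes", "fair")))
```

The prediction is a probability for every class rather than a single label, and each call to `update` shifts those probabilities without revisiting earlier examples.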