Data Mining

 

 

 

 

20. Consider the task of building a classifier from random data, where the attribute values are generated randomly irrespective of the class labels. Assume the data set contains records from two classes, “+” and “−.” Half of the data set is used for training while the remaining half is used for testing.

(a) Suppose there are an equal number of positive and negative records in the data and the decision tree classifier predicts every test record to be positive. What is the expected error rate of the classifier on the test data?

(b) Repeat the previous analysis assuming that the classifier predicts each test record to be positive class with probability 0.8 and negative class with probability 0.2.

(c) Suppose two-thirds of the data belong to the positive class and the remaining one-third belong to the negative class. What is the expected error of a classifier that predicts every test record to be positive?

(d) Repeat the previous analysis assuming that the classifier predicts each test record to be positive class with probability 2/3 and negative class with probability 1/3.

Chapter 6 exercises

5. Prove Equation 6.3 in the book. (Hint: First, count the number of ways to create an itemset that forms the left hand side of the rule. Next, for each size k itemset selected for the left-hand side, count the number of ways to choose the remaining d − k items to form the right-hand side of the rule.)

17. Suppose we have market basket data consisting of 100 transactions and 20 items. If the support for item a is 25%, the support for item b is 90% and the support for itemset {a, b} is 20%. Let the support and confidence thresholds be 10% and 60%, respectively.

(a) Compute the confidence of the association rule {a} -> {b}. Is the rule interesting according to the confidence measure?

(b) Compute the interest measure for the association pattern {a, b}. Describe the nature of the relationship between item a and item b in terms of the interest measure.

(c) What conclusions can you draw from the results of parts (a) and (b)?

(d) NOT NEEDED FOR THE TEST

Chapter 7 exercises

5. For the data set with the attributes given below, describe how you would convert it into a binary transaction data set appropriate for association analysis. Specifically, indicate for each attribute in the original data set.
(a) How many binary attributes it would correspond to in the transaction data set,

(b) How the values of the original attribute would be mapped to values of the binary attributes, and

(c) If there is any hierarchical structure in the data values of an attribute that could be useful for grouping the data into fewer binary attributes. The following is a list of attributes for the data set along with their possible values. Assume that all attributes are collected on a per-student basis:

• Year : Freshman, Sophomore, Junior, Senior, Graduate: Masters, Graduate: PhD, Professional

• Zip code : zip code for the home address of a U.S. student, zip code for the local address of a non-U.S. student

• College : Agriculture, Architecture, Continuing Education, Education, Liberal Arts, Engineering, Natural Sciences, Business, Law, Medical, Dentistry, Pharmacy, Nursing, Veterinary Medicine

• On Campus : 1 if the student lives on campus, 0 otherwise

• Each of the following is a separate attribute that has a value of 1 if the person speaks the language and a value of 0, otherwise.

– Arabic
– Bengali
– Chinese Mandarin
– English
– Portuguese
– Russian
– Spanish
Chapter 8 exercises

1. Consider a data set consisting of 2^(20) data vectors, where each vector has 32 components and each component is a 4-byte value. Suppose that vector quantization is used for compression and that 2^(16) prototype vectors are used. How many bytes of storage does that data set take before and after compression and what is the compression ratio?

8. Consider the mean of a cluster of objects from a binary transaction data set. What are the minimum and maximum values of the components of the mean? What is the interpretation of components of the cluster mean? Which components most accurately characterize the objects in the cluster?

9. Give an example of a data set consisting of three natural clusters, for which (almost always) K-means would likely find the correct clusters, but bisecting K-means would not.

11. Total SSE is the sum of the SSE for each separate attribute. What does it mean if the SSE for one variable is low for all clusters? Low for just one cluster? High for all clusters? High for just one cluster? How could you use the per variable SSE information to improve your clustering?

13. The Voronoi diagram for a set of 1( points in the plane is a partition of all the points of the plane into K regions, such that every point (of the plane) is assigned to the closest point among the 1( specified points. (See Figure 8.38.) What is the relationship between Voronoi diagrams and K-means clusters? What do Voronoi diagrams tell us about the possible shapes of K-means clusters?

 

 

 

 

 

The post Data Mining first appeared on COMPLIANT PAPERS.

TutorPro
Calculate your paper price
Pages (550 words)
Approximate price: -
Tutorpro

High Quality Papers

We always make sure that writers follow all your instructions precisely. You can choose your academic level or professional level, and we will assign a writer who has a respective degree.

Tutorpro

Experienced Writers

We have a team of professional writers with experience in academic and business writing. Many are native speakers and able to perform any task for which you need help.

Tutorpro

Free Revisions

If you think we missed something, send your order for a free revision. You have 10 days to submit the order for review after you have received the final document.

Tutorpro

Timely Delivery

All papers are always delivered on time. In case we need more time to master your paper, we may contact you regarding the deadline extension.A 100% refund is guaranteed.

Tutorpro

100% Confidential

We use several writing tools checks to ensure that all documents you receive are free from plagiarism. Our editors carefully review all quotations in the text.

Tutorpro

24/7 Customer Services

Tutorpro support agents are available 24 hours a day 7 days a week and committed to providing you with the best customer experience. Get in touch whenever you need any assistance.

Try it now!

Calculate the price of your order

Total price:
$0.00

How it works?

Follow these simple steps to get your paper done

Place your order

Fill in the order form and provide all details of your assignment.

Proceed with the payment

Choose the payment system that suits you most.

Receive the final file

Once your paper is ready, we will email it to you.

Tutorpro Homework Help Services

At Tutorpro, we have top rated masters and PhD writers who will help you tacke that homework and score A+ grade. Tutorpro services covers all levels of education : high school, college, university undergraduate, masters and PhD academic level.

Essays

Essay Writing

No matter what kind of academic paper you need and how urgent you need it, you are welcome to choose your academic level and the type of your paper at an affordable price. We take care of all your paper needs and give a 24/7 customer care support system.

Tutorpro

Admissions

Admission Essays

An admission essay is an essay or other written statement by a candidate, often a potential student enrolling in a college, university, or graduate school. You can be rest assurred that through our service we will write the best admission essay for you.

Tutorpro

Editing

Essay Editing

Tutorpro academic writers and editors make the necessary changes to your paper so that it is polished. We also format your document by correctly quoting the sources and creating reference lists in the formats APA, Harvard, MLA, Chicago / Turabian.

Tutorpro

Revision

Revision Support

If you think your paper could be improved, you can request a review. In this case, your paper will be checked by the writer or assigned to an editor. This is free because we want you to be completely satisfied with the service offered.