Intro to Data Analytics: Business Data Mining

In this homework, you will perform linear regression on a data set of your choice. Complete the following steps:

  • Find a data set of your choice from a valid source. You may use a previously used data set. If you have categorical data, you may want to create dummy variables.
  • Split up your data set into a training set and a scoring set. Rename the data sets appropriately. The scoring data set should not include the column that you are trying to predict.
  • Import both data sets into RapidMiner and check the ranges of all attributes. If any observation in the scoring data set has an attribute value below the training data set’s lower bound or above its upper bound for that attribute, remove that observation. Take a screenshot of the loaded data sets.
  • Set the role of the attribute in the training stream that you are trying to predict as a label.
  • Perform linear regression by adding the “Linear Regression” operator to the training stream and adding the “Apply Model” operator to connect the training stream to the scoring stream. Take a screenshot of the final process stream.
  • Run the model and take screenshots of both the linear regression results (i.e., table with regression coefficients) and the results of predictions made on the scoring data set. Evaluate and interpret your results. Examine your attribute coefficients and the predictions made in the scoring data set. In your interpretation of results, you should include answers to the following questions:
    • Which attributes have the greatest weight?
    • What would the resulting mathematical formula be for the regression line?
    • Were any attributes dropped from the data set as non-predictors? If so, which ones and why do you think they weren’t effective predictors?
    • What can you conclude from the predictions made?
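
The range check and the regression formula in the steps above can be illustrated outside RapidMiner with a small sketch. This is not RapidMiner's implementation: it assumes a single numeric predictor, uses made-up toy data (square_feet and price are hypothetical attribute names), and fits the line with the textbook ordinary least squares formulas. The helper names within_training_range and fit_simple_regression are illustrative only.

```python
from statistics import mean

# Hypothetical toy data, not from any real source: (square_feet, price) rows.
training = [(800, 160.0), (1000, 200.0), (1200, 240.0), (1500, 300.0)]
# The scoring set holds the predictor only; the label column is absent.
scoring = [700, 900, 1400, 1600]


def within_training_range(values, train_values):
    """Keep only scoring values inside the training set's [min, max] range."""
    lo, hi = min(train_values), max(train_values)
    return [v for v in values if lo <= v <= hi]


def fit_simple_regression(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


train_x = [x for x, _ in training]
train_y = [y for _, y in training]

# Drop scoring observations that fall outside the training range.
kept = within_training_range(scoring, train_x)

# Fit on the training set, then apply the model to the remaining scoring rows.
intercept, slope = fit_simple_regression(train_x, train_y)
predictions = [(x, intercept + slope * x) for x in kept]
```

With the fitted values, the regression line takes the form price = intercept + slope × square_feet. With several attributes, the formula gains one coefficient-times-attribute term per attribute that the model retains.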

Submission Instructions:

Please type up your homework using the homework template posted on Blackboard under Assignments. You should include at least four screenshots: (1) data set loaded in RapidMiner, (2) final process stream, (3) linear regression results (i.e., table with regression coefficients), and (4) results of predictions made on the scoring data set. Remember to interpret your results and answer all questions above in step

 