学堂在线 (XuetangX): Theory and Application of Data Science, Final Exam

1. Single choice (3 points)

Which of the following areas of knowledge is NOT required of data scientists?

A

Computer science and information technology

B

Math and statistics

C

Business knowledge

D

Biomedical Engineering

Correct answer: D

2. Single choice (3 points)

Which of the following is NOT a basic analytical approach to data science?

A

Regression

B

Classification

C

Descriptive Statistics

D

Clustering

Correct answer: C

3. Single choice (3 points)

What is the main difference between supervised learning and unsupervised learning?

A

Supervised learning is done using ground truth

B

Unsupervised learning does not have labeled outputs

C

Supervised learning aims to learn a function

D

Unsupervised learning can infer the natural structure present in a set of data points

Correct answer: A

4. Single choice (3 points)

When one column of a 10*10 long-format table is converted to wide format, how many columns does the resulting table have?

A

17

B

18

C

19

D

20

Correct answer: B

5. Single choice (3 points)

Which of the following descriptions of the weighted mean is incorrect?

A

calculated by multiplying the weight (or probability)

B

associated with a particular event or outcome with its associated quantitative outcome

C

very useful when calculating a theoretically expected outcome

D

each outcome has a different probability of occurring

Correct answer: D
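
As a worked illustration of the weighted mean described in question 5, the sketch below multiplies each outcome by its weight (or probability) and divides by the total weight. The function name `weighted_mean` and the payoff values are assumptions made for this example; they do not come from the course material.

```python
def weighted_mean(outcomes, weights):
    """Return sum(w * x) / sum(w) for paired outcomes and weights."""
    if len(outcomes) != len(weights):
        raise ValueError("outcomes and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(outcomes, weights)) / total_weight


# Hypothetical payoffs and their probabilities (illustrative values only).
payoffs = [0, 10, 100]
probabilities = [0.5, 0.25, 0.25]

# With probabilities as weights, the weighted mean is the expected outcome.
expected_payoff = weighted_mean(payoffs, probabilities)
print(expected_payoff)
```

When every weight is equal, the result reduces to the ordinary arithmetic mean.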

6. Single choice (3 points)

Univariate visualizations do not include ______.

A

boxplot

B

histogram

C

line chart

D

density estimate

Correct answer: C

7. Single choice (3 points)

What data type will the code "a=as.vector(c(list('a',1),list('afo',222)))" assign to a?

A

vector

B

list

C

NULL

D

The code will be Error

Correct answer: B

8. Single choice (3 points)

What output will the code "as.integer(as.factor(c(0,1)))" produce?

A

[1] 0 1

B

[1] 1 0

C

[1] 1 2

D

[1] 2 1

Correct answer: C
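
The answer to question 8 follows from how a factor stores its values: as 1-based integer codes that index into the sorted distinct levels, not as the original numbers. The Python sketch below only imitates that behaviour for illustration; `factor_codes` is a hypothetical helper, not an R or Python library function.

```python
def factor_codes(values):
    """Imitate as.integer(as.factor(values)): 1-based position of each
    value among the sorted distinct levels."""
    levels = sorted(set(values))
    position = {level: index + 1 for index, level in enumerate(levels)}
    return [position[value] for value in values]


codes = factor_codes([0, 1])
print(codes)  # the levels are 0 and 1, so the codes are 1 and 2
```

This is why converting a numeric factor straight to integer returns level positions rather than the values that were originally stored.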

9. Single choice (3 points)

Which of the following statements about Tidy Data is incorrect?

A

every column is a variable

B

every row is an observation

C

every cell is a single numerical value

D

All the above descriptions about Tidy Data are correct

Correct answer: C

10. Single choice (3 points)

Which function is most suitable for splitting the data by one or two categorical variables?

A

theme()

B

geom_bar()

C

facet_grid()

D

facet_wrap()

Correct answer: C

11. Single choice (3 points)

Which layer can provide a new perspective of data interpretation for visual analysis?

A

The facets layer 

B

The theme layer

C

The coordinate layer

D

The statistics layer

Correct answer: C

12. Single choice (3 points)

If we want to change individual elements, such as the background color or font of our title, what functions can we use?

A

geom_bar()

B

theme()

C

facet_grid()

D

facet_wrap()

Correct answer: B

13. Single choice (3 points)

In regression analysis, there are _____ main hypothesis tests.

A

one

B

two

C

three

D

four

Correct answer: D

14. Single choice (3 points)

If the significance level is 0.05, the corresponding confidence level is ( ).

A

93% 

B

94% 

C

95% 

D

96%

Correct answer: C

15. Single choice (3 points)

Which of the following functions presents the result of a regression?

A

anova()

B

summary()

C

confint()

D

predict()

Correct answer: B

16. Single choice (3 points)

Which of the following algorithms is not a decision tree algorithm?

A

ID3

B

YOLOv5

C

CART

D

C4.5

Correct answer: B

17. Single choice (3 points)

Which of the following algorithms is not an example of an eager learner?

A

K-Nearest Neighbors

B

Logistic regression

C

Decision tree

D

Naive Bayes

Correct answer: A

18. Single choice (3 points)

The attribute selection measure used by CART is ______.

A

information gain

B

information gain ratio

C

basic information entropy

D

Gini index

Correct answer: D
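
To make the Gini index from question 18 concrete, the sketch below computes the impurity 1 - sum(p_k^2) of a set of class labels and the size-weighted impurity of a binary split, which is the quantity a CART-style tree tries to minimise when choosing an attribute. The function names and the toy labels are assumptions for this example.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def gini_of_split(left, right):
    """Size-weighted Gini impurity of a binary split."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


# Toy labels (illustrative only): a 50/50 node and a split that separates it perfectly.
mixed = ["yes", "yes", "no", "no"]
print(gini_impurity(mixed))
print(gini_of_split(["yes", "yes"], ["no", "no"]))
```

A pure node has impurity 0, and a two-class node split evenly has impurity 0.5, so lower values indicate a better split.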

19. Single choice (3 points)

Which type of clustering uses the concept of a dendrogram?

A

Prototype-based clustering

B

Density-based clustering

C

Hierarchical clustering

D

Partitioning clustering

Correct answer: C

20. Single choice (3 points)

Which strategy or algorithm below belongs to hierarchical clustering?

A

AGNES

B

K-means

C

DBSCAN

D

SMC

Correct answer: A
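
AGNES is an agglomerative (bottom-up) method: every point starts as its own cluster, and the two closest clusters are merged repeatedly. The sketch below is a minimal version for one-dimensional points. It assumes single linkage (distance between the closest members), which is only one of several possible linkage choices, and the function name and data are made up for illustration.

```python
def agnes_single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters and the ordered list of merges,
    which is the information a dendrogram draws.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > n_clusters:
        best = None  # (distance, index_i, index_j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((sorted(clusters[i]), sorted(clusters[j]), d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters, merges


# Illustrative 1-D data only.
clusters, merges = agnes_single_linkage([1.0, 2.0, 9.0, 10.0, 25.0], 2)
print(clusters)
print(merges)
```

Each merge is final: once two clusters are joined, later steps never split them again, which is why an early bad merge cannot be undone.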

21. Single choice (3 points)

What should you be aware of when you use a collaborative filtering strategy?

A

It determines the features of items that can be used to measure their similarity.

B

It could be useless at the beginning since the records you have are not enough.

C

It won't recommend an item that hasn't been bought before.

D

The "over-specialization" problem still exists.

Correct answer: B

22. Multiple choice (4 points)

What skills do data scientists need to use to deal with data?

A

The machine learning algorithms

B

The knowledge of programming languages

C

Processing of financial statements

D

Data visualization knowledge

Correct answer: A,B,D (no credit for partial selections)

23. Multiple choice (4 points)

In general, histograms are plotted such that

A

empty bins are included in the graph.

B

bins are equal in width.

C

the number of bins is up to the user.

D

bars are contiguous. That is, no empty space shows between bars unless there is an empty bin

Correct answer: A,B,C,D (no credit for partial selections)

24. Multiple choice (4 points)

The common problems we can find with raw data can be______.

A

missing data

B

noisy data

C

unstructured data

D

inconsistent data

Correct answer: A,B,D (no credit for partial selections)

25. Multiple choice (4 points)

Which of the following are auxiliary layers of the ggplot2 package?

A

Data

B

Facets

C

Statistics

D

Geometries

Correct answer: B,C (no credit for partial selections)

26. Multiple choice (4 points)

Which of the following are classical OLS assumptions for linear regression? ( )

A

the regression model is linear in the coefficients and the error term.

B

all independent variables are uncorrelated with the error term.

C

the error term has a constant variance.

D

the error term is normally distributed.

Correct answer: A,B,C,D (no credit for partial selections)

27. Multiple choice (4 points)

Which of the following are the advantages of the decision tree algorithms?

A

Hard to overfit

B

Different attribute division methods have different preferences for attribute selection

C

Ability to fit data with irrelevant features and missing values

D

Easy to understand, explain and visually analyze

Correct answer: C,D (no credit for partial selections)

28. Multiple choice (4 points)

User-based and item-based filtering have different performances in different situations. Which choices below are correct?

A

User-based filtering is more suitable for time-sensitive items like news.

B

Item-based filtering is more suitable when items are simple and relatively stable.

C

User-based filtering is more suitable when the number of users is larger than the number of items.

D

Item-based filtering is more suitable for tailoring to personal taste.

Correct answer: A,B,D (no credit for partial selections)
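
User-based and item-based filtering differ only in what is compared: users by the items they rated, or items by the users who rated them. The sketch below shows both views with cosine similarity over a tiny, made-up ratings table. Treating missing ratings as 0 and choosing cosine similarity are assumptions for this example, not something the exam specifies.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: score}. Illustrative values only.
ratings = {
    "ann": {"news": 5, "film": 3},
    "bob": {"news": 4, "film": 3, "book": 5},
    "cat": {"film": 1, "book": 2},
}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # no ratings yet, so nothing to compare
    return dot / (norm_u * norm_v)


def item_vectors(user_ratings):
    """Transpose user -> item ratings into item -> user ratings."""
    items = {}
    for user, rated in user_ratings.items():
        for item, score in rated.items():
            items.setdefault(item, {})[user] = score
    return items


items = item_vectors(ratings)

user_similarity = cosine(ratings["ann"], ratings["bob"])  # user-based view
item_similarity = cosine(items["news"], items["film"])    # item-based view
new_user_similarity = cosine({}, ratings["bob"])          # user with no history

print(user_similarity, item_similarity, new_user_similarity)
```

The last value is 0.0 because a user with no recorded ratings has nothing to compare, which is the situation in which collaborative filtering has too few records to be useful.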

29. True/False (1 point)

Raw data is the original data provided by the users or collected through some techniques, such as crawlers.

Correct answer: False

30. True/False (1 point)

We can only talk about the correlation between two variables.

Correct answer: False

31. True/False (1 point)

If an analysis requires data preprocessing, it must be done before data analysis.

Correct answer: True

32. True/False (1 point)

When we create a plot skeleton, we first need to think about how to map the data variables to the aesthetics in the graph.

Correct answer: False

33. True/False (1 point)

Hypothesis testing helps you prove if your data is statistically significant and unlikely to have occurred by chance alone.

Correct answer: True

34. True/False (1 point)

To address this concern, nearest-neighbor methods often use weighted voting or similarity moderated voting such that each neighbor's contribution is scaled by its similarity.

Correct answer: True
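
The similarity-weighted voting mentioned in question 34 can be sketched as follows: each neighbor adds its similarity to the score of its label, so one very close neighbor can outweigh several distant ones. The function names, labels, and similarity values are assumptions made for this example.

```python
from collections import Counter, defaultdict


def majority_vote(neighbors):
    """Plain k-NN vote: every neighbor counts once."""
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    return Counter(label for label, _ in neighbors).most_common(1)[0][0]


def weighted_vote(neighbors):
    """Similarity-weighted vote: each neighbor contributes its similarity."""
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    scores = defaultdict(float)
    for label, similarity in neighbors:
        scores[label] += similarity
    return max(scores, key=scores.get)


# Hypothetical (label, similarity) pairs for the 3 nearest neighbors.
neighbors = [("yes", 0.9), ("no", 0.4), ("no", 0.3)]

print(majority_vote(neighbors))  # two of the three neighbors say "no"
print(weighted_vote(neighbors))  # "yes" scores 0.9 against about 0.7 for "no"
```

With these values the two schemes disagree, which shows how weighting changes the outcome when the nearest neighbor is much more similar than the rest.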

35. True/False (1 point)

In hierarchical clustering, you can choose the number of clusters depending on the dendrogram it produces, and can always turn back after making the wrong decision.

Correct answer: False

36. True/False (1 point)

We can use correlation analysis to predict a driver's travel time by using miles traveled and number of deliveries.( )

Correct answer: False

37. True/False (1 point)

In the narrow sense, a data science product is a product facilitated with a particular data science technique.

Correct answer: True

