XuetangX (学堂在线) Theory and Application of Data Science Final Exam

wangke, XuetangX answers, 2025-03-19 21:58:36

1. Single choice (3 points)

Which of the following areas of knowledge is NOT required of data scientists?

Computer science and information technology

Math and statistics

Business knowledge

Biomedical Engineering

Correct answer: D

2. Single choice (3 points)

Which of the following is NOT a basic analytical approach to data science?

Regression

Classification

Descriptive Statistics

Clustering

Correct answer: C

3. Single choice (3 points)

What is the main difference between supervised learning and unsupervised learning?

Supervised learning is done using ground truth

Unsupervised learning does not have labeled outputs

Supervised learning aims to learn a function

Unsupervised learning can infer the natural structure present in a set of data points

Correct answer: A

4. Single choice (3 points)

How many columns of a 10*10 long data type table will be converted to a wide data type for a column?

Correct answer: B

5. Single choice (3 points)

Which of the following descriptions of the weighted mean is incorrect?

calculated by multiplying the weight (or probability)

associated with a particular event or outcome with its associated quantitative outcome

very useful when calculating a theoretically expected outcome

each outcome has a different probability of occurring

Correct answer: D
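
As a hedged illustration of the weighted mean described in this question, the sketch below multiplies each outcome by its weight (here a probability) and sums the products. The payoffs and probabilities are made-up values for illustration only, and `weighted_mean` is a hypothetical helper, not part of the exam.

```python
def weighted_mean(values, weights):
    """Return sum(w * x) / sum(w); with probabilities the divisor is 1."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * x for x, w in zip(values, weights)) / total_weight


# Hypothetical payoffs and their probabilities (illustrative only).
payoffs = [0, 10, 100]
probabilities = [0.5, 0.25, 0.25]
expected_payoff = weighted_mean(payoffs, probabilities)
print(expected_payoff)
```

When every weight is equal, the result reduces to the ordinary arithmetic mean.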

6. Single choice (3 points)

The univariate visualizations don't include______.

boxplot

histogram

line chart

density estimate

Correct answer: C

7. Single choice (3 points)

What data type will the code "a=as.vector(c(list('a',1),list('afo',222)))" assign to a?

vector

list

NULL

The code will raise an error

Correct answer: B

8. Single choice (3 points)

What output will the code "as.integer(as.factor(c(0,1)))" produce?

[1] 0 1

[1] 1 0

[1] 1 2

[1] 2 1

Correct answer: C
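
The answer follows from how R stores factors: as.factor() sorts the distinct values into levels, and as.integer() then returns each element's 1-based level position rather than its original value. The Python sketch below only emulates that behaviour for illustration; `factor_codes` is a hypothetical helper, not an R or Python library function.

```python
def factor_codes(values):
    """Emulate as.integer(as.factor(values)): 1-based positions in the sorted levels."""
    levels = sorted(set(values))
    position = {level: index + 1 for index, level in enumerate(levels)}
    return [position[v] for v in values]


codes = factor_codes([0, 1])
print(codes)  # [1, 2]
```

This is why converting a numeric-looking factor straight to integer returns level codes instead of the original numbers.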

9. Single choice (3 points)

Which of the following statements about Tidy Data is incorrect?

every column is a variable

every row is an observation

every cell is a single numerical value

All the above descriptions about Tidy Data are correct

Correct answer: C

10. Single choice (3 points)

Which function is most suitable for splitting the data by one or two categorical variables?

theme()

geom_bar()

facet_grid()

facet_wrap()

Correct answer: C

11. Single choice (3 points)

Which layer can provide a new perspective of data interpretation for visual analysis?

The facets layer

The theme layer

The coordinate layer

The statistics layer

Correct answer: C

12. Single choice (3 points)

If we want to change individual elements, such as the background color or the font of the title, which function can we use?

geom_bar()

theme()

facet_grid()

facet_wrap()

Correct answer: B

13. Single choice (3 points)

In regression analysis, there are _____ main hypothesis tests.

one

two

three

four

Correct answer: D

14. Single choice (3 points)

If the significance level is 0.05, the corresponding confidence level is ( ).

93%

94%

95%

96%

Correct answer: C

15. Single choice (3 points)

Which of the following functions presents the results of a regression?

anova()

summary()

confint()

predict()

Correct answer: B

16. Single choice (3 points)

Which of the following algorithms is not a decision tree algorithm?

ID3

Yolo v5

CART

C4.5

Correct answer: B

17. Single choice (3 points)

Which of the following algorithms is not an example of an eager learner?

K-Nearest Neighbors

Logistic regression

Decision tree

Naive Bayes

Correct answer: A

18. Single choice (3 points)

The attribute selection measure used by CART is ______.

information gain

information gain ratio

basic information entropy

Gini index

Correct answer: D
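
For reference, the Gini index of a node is one minus the sum of the squared class proportions, and a CART-style split is scored by the size-weighted Gini of its child nodes (lower is better). The sketch below is a minimal illustration under those definitions; the labels are made up and the helper names `gini` and `split_gini` are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Size-weighted Gini impurity of a binary split."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)


# Hypothetical class labels, for illustration only.
parent = ["yes", "yes", "no", "no"]
pure_split = split_gini(["yes", "yes"], ["no", "no"])
mixed_split = split_gini(["yes", "no"], ["yes", "no"])
print(gini(parent), pure_split, mixed_split)
```

A split that separates the classes perfectly scores 0, while a split that leaves both children as mixed as the parent gives no improvement.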

19. Single choice (3 points)

Which type of clustering uses the concept of a dendrogram?

Prototype-based clustering

Density-based clustering

Hierarchical clustering

Partitioning clustering

Correct answer: C

20. Single choice (3 points)

Which strategy or algorithm below belongs to hierarchical clustering?

AGNES

K-means

DBSCAN

SMC

Correct answer: A
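
AGNES (agglomerative nesting) builds the hierarchy bottom-up: every point starts as its own cluster and the two closest clusters are merged repeatedly. The sketch below is a simplified illustration, assuming one-dimensional points and single linkage (closest pair of members) for brevity; the function name `agnes` and the sample points are hypothetical.

```python
def agnes(points, k):
    """Merge the two closest clusters (single linkage) until k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None  # (distance, i, j) of the closest pair found so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # a merge is never undone later
        del clusters[j]
    return clusters


# Hypothetical 1-D data, for illustration only.
result = agnes([1.0, 1.2, 5.0, 5.1, 9.0], k=3)
print(result)
```

Recording the order and distance of each merge is what a dendrogram visualizes; cutting it at a given height corresponds to choosing k.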

21. Single choice (3 points)

What should you be aware of when using a collaborative filtering strategy?

It determines the features of items that can be used to measure their similarity.

It could be useless at the beginning since the records you have are not enough.

It won't recommend an item that hasn't been bought before.

The "over-specialization" problem still exists.

Correct answer: B

22. Multiple choice (4 points)

What skills do data scientists need to use to deal with data?

The machine learning algorithms

The knowledge of programming languages

Processing of financial statements

Data visualization knowledge

Correct answer: A, B, D (no credit for partial selection)

23. Multiple choice (4 points)

In general, histograms are plotted such that

empty bins are included in the graph.

bins are equal in width.

the number of bins is up to the user.

bars are contiguous. That is, no empty space shows between bars unless there is an empty bin

Correct answer: A, B, C, D (no credit for partial selection)

24. Multiple choice (4 points)

Common problems found in raw data include ______.

missing data

noisy data

unstructured data

inconsistent data

Correct answer: A, B, D (no credit for partial selection)

25. Multiple choice (4 points)

Which of the following belong to the auxiliary layers of the ggplot2 package?

Data

Facets

Statistics

Geometries

Correct answer: B, C (no credit for partial selection)

26. Multiple choice (4 points)

Which of the following belong to the classical OLS assumptions for linear regression?

the regression model is linear in the coefficients and the error term.

all independent variables are uncorrelated with the error term.

the error term has a constant variance.

the error term is normally distributed.

Correct answer: A, B, C, D (no credit for partial selection)

27. Multiple choice (4 points)

Which of the following are the advantages of the decision tree algorithms?

Hard to overfit

Different attribute division methods have different preferences for attribute selection

Ability to fit data with irrelevant features and missing values

Easy to understand, explain and visually analyze

Correct answer: C, D (no credit for partial selection)

28. Multiple choice (4 points)

User-based and item-based filtering have different performances in different situations. Which choices below are correct?

User-based filtering is more suitable for time-sensitive items like news.

Item-based filtering is more suitable when items are simple and relatively stable.

User-based filtering is more suitable when the number of users is more significant than the items.

Item-based filtering is more suitable for tailoring to personal taste.

Correct answer: A, B, D (no credit for partial selection)

29. True/False (1 point)

Raw data is the original data provided by the users or collected through some techniques, such as crawlers.

Correct answer: False

30. True/False (1 point)

We can only talk about the correlation between the two variables.

Correct answer: False

31. True/False (1 point)

If an analysis requires data preprocessing, it must be done before data analysis.

Correct answer: True

32. True/False (1 point)

When we create a plot skeleton, we first need to think about how to map the data variables to the aesthetics in the graph.

Correct answer: False

33. True/False (1 point)

Hypothesis testing helps you prove if your data is statistically significant and unlikely to have occurred by chance alone.

Correct answer: True

34. True/False (1 point)

To address this concern, nearest-neighbor methods often use weighted voting or similarity moderated voting such that each neighbor's contribution is scaled by its similarity.

Correct answer: True
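
The weighted voting described in this statement can be sketched as follows. This is a minimal illustration, assuming one-dimensional features and inverse-distance weights (one common choice of similarity, not something the exam specifies); the function `weighted_knn_predict` and the training data are hypothetical.

```python
from collections import defaultdict


def weighted_knn_predict(train, query, k=3, eps=1e-9):
    """Classify `query` by similarity-weighted voting among its k nearest neighbors.

    `train` is a list of (feature, label) pairs with 1-D numeric features.
    Each neighbor votes with weight 1 / (distance + eps).
    """
    neighbors = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    votes = defaultdict(float)
    for feature, label in neighbors:
        votes[label] += 1.0 / (abs(feature - query) + eps)
    return max(votes, key=votes.get)


# Hypothetical training data: two distant "a" points and one very close "b" point.
train = [(1.0, "a"), (2.0, "a"), (5.1, "b"), (10.0, "b")]
prediction = weighted_knn_predict(train, query=5.0, k=3)
print(prediction)
```

With these values a plain majority vote over the three nearest neighbors would pick "a" (two votes to one), whereas the similarity-weighted vote picks "b" because the single "b" neighbor is far closer to the query.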

35. True/False (1 point)

In hierarchical clustering, you can choose the number of clusters depending on the dendrogram it produces, and can always turn back after making the wrong decision.

Correct answer: False

36. True/False (1 point)

We can use correlation analysis to predict a driver's travel time by using miles traveled and number of deliveries.

Correct answer: False

37. True/False (1 point)

In the narrow sense, a data science product is a product facilitated with a particular data science technique.

Correct answer: True