1. Single choice (3 points)
Which of the following areas of knowledge is NOT required of data scientists?
A
Computer science and information technology
B
Math and statistics
C
Business knowledge
D
Biomedical Engineering
Correct answer: D
2. Single choice (3 points)
Which of the following is NOT a basic analytical approach in data science?
A
Regression
B
Classification
C
Descriptive Statistics
D
Clustering
Correct answer: C
3. Single choice (3 points)
What is the main difference between supervised learning and unsupervised learning?
A
Supervised learning is done using ground truth
B
Unsupervised learning does not have labeled outputs
C
Supervised learning aims to learn a function
D
Unsupervised learning can infer the natural structure present in a set of data points
Correct answer: A
4. Single choice (3 points)
How many columns will a 10*10 long-format table have after one of its columns is converted to wide format?
A
17
B
18
C
19
D
20
Correct answer: B
5. Single choice (3 points)
Which of the following descriptions of the weighted mean is incorrect?
A
calculated by multiplying the weight (or probability)
B
associated with a particular event or outcome with its associated quantitative outcome
C
very useful when calculating a theoretically expected outcome
D
each outcome has a different probability of occurring
Correct answer: D
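As a hedged illustration of the weighted mean described in the options above, the Python sketch below multiplies each outcome by its weight (or probability), sums the products, and divides by the total weight. The function name `weighted_mean` and the sample numbers are assumptions made for this example; they are not part of the quiz.

```python
def weighted_mean(outcomes, weights):
    """Return sum(w * x) / sum(w) for paired outcomes and weights."""
    if len(outcomes) != len(weights):
        raise ValueError("outcomes and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    # Multiply each outcome by its weight, then normalize by the total weight.
    return sum(w * x for x, w in zip(outcomes, weights)) / total_weight


# Hypothetical example: a payoff of 10 with probability 0.25
# and a payoff of 20 with probability 0.75.
expected = weighted_mean([10, 20], [0.25, 0.75])
print(expected)
```

When the weights are probabilities that sum to 1, the result is the theoretically expected outcome; with equal weights it reduces to the ordinary mean.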
6. Single choice (3 points)
Univariate visualizations do not include ______.
A
boxplot
B
histogram
C
line chart
D
density estimate
Correct answer: C
7. Single choice (3 points)
What data type will the code "a=as.vector(c(list('a',1),list('afo',222)))" assign to a?
A
vector
B
list
C
NULL
D
The code will raise an error
Correct answer: B
8. Single choice (3 points)
What output will the code "as.integer(as.factor(c(0,1)))" produce?
A
[1] 0 1
B
[1] 1 0
C
[1] 1 2
D
[1] 2 1
Correct answer: C
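The answer follows from how a factor stores its data: the sorted unique values become the levels, and each element is stored as the 1-based position of its level. The plain-Python sketch below is only a simplified emulation of that behavior, not R itself; the helper name `factor_codes` is hypothetical.

```python
def factor_codes(values):
    """Return the 1-based position of each value among the sorted unique values."""
    # The sorted unique values play the role of the factor levels.
    levels = sorted(set(values))
    position = {level: index + 1 for index, level in enumerate(levels)}
    # Each element is replaced by the integer code of its level.
    return [position[value] for value in values]


print(factor_codes([0, 1]))
print(factor_codes([1, 0]))
```

The codes depend on the sorted order of the levels, not on the numeric values themselves, which is why the original 0 and 1 are not recovered.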
9. Single choice (3 points)
Which of the following statements about Tidy Data is incorrect?
A
every column is a variable
B
every row is an observation
C
every cell is a single numerical value
D
All the above descriptions about Tidy Data are correct
Correct answer: C
10. Single choice (3 points)
Which function is most suitable for splitting the data by one or two categorical variables?
A
theme()
B
geom_bar()
C
facet_grid()
D
facet_wrap()
Correct answer: C
11. Single choice (3 points)
Which layer can provide a new perspective of data interpretation for visual analysis?
A
The facets layer
B
The theme layer
C
The coordinate layer
D
The statistics layer
Correct answer: C
12. Single choice (3 points)
If we want to change individual elements, such as the background color or the font of our title, which function can we use?
A
geom_bar()
B
theme()
C
facet_grid()
D
facet_wrap()
Correct answer: B
13. Single choice (3 points)
In regression analysis, there are _____ main hypothesis tests.
A
one
B
two
C
three
D
four
Correct answer: D
14. Single choice (3 points)
If the significance level is 0.05, the corresponding confidence level is ( ).
A
93%
B
94%
C
95%
D
96%
Correct answer: C
15. Single choice (3 points)
Which of the following functions can present the results of a regression?
A
anova()
B
summary()
C
confint()
D
predict()
Correct answer: B
16. Single choice (3 points)
Which of the following algorithms is not a decision tree algorithm?
A
ID3
B
YOLOv5
C
CART
D
C4.5
Correct answer: B
17. Single choice (3 points)
Which of the following algorithms is not an example of an eager learner?
A
K-Nearest Neighbors
B
Logistic regression
C
Decision tree
D
Naive Bayes
Correct answer: A
18. Single choice (3 points)
The attribute selection measure used by CART is ______.
A
information gain
B
information gain ratio
C
basic information entropy
D
Gini index
Correct answer: D
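For reference, the Gini index of a node is one minus the sum of the squared class proportions. The Python sketch below computes it for a list of class labels; the function name `gini_impurity` and the sample labels are assumptions for illustration, and the code does not reproduce any particular CART implementation.

```python
from collections import Counter


def gini_impurity(labels):
    """Return 1 - sum(p_k ** 2) over the class proportions p_k in labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    # A pure node (one class only) gives 0; more mixing gives a larger value.
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


# Hypothetical node with two classes in equal proportion.
print(gini_impurity(["yes", "yes", "no", "no"]))
```

A tree builder would typically compare candidate splits by the size-weighted Gini index of the resulting child nodes and prefer the split with the lowest value.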
19. Single choice (3 points)
Which type of clustering uses the concept of a dendrogram?
A
Prototype-based clustering
B
Density-based clustering
C
Hierarchical clustering
D
Partitioning clustering
Correct answer: C
20. Single choice (3 points)
Which strategy or algorithm below belongs to hierarchical clustering?
A
AGNES
B
K-means
C
DBSCAN
D
SMC
Correct answer: A
21. Single choice (3 points)
What should you be aware of when using a collaborative filtering strategy?
A
It determines the features of items that can be used to measure their similarity.
B
It could be useless at the beginning since the records you have are not enough.
C
It won't recommend an item that hasn't been bought before.
D
The "over-specialization" problem still exists.
Correct answer: B
22. Multiple choice (4 points)
What skills do data scientists need in order to work with data?
A
The machine learning algorithms
B
The knowledge of programming languages
C
Processing of financial statements
D
Data visualization knowledge
Correct answer: A,B,D (no credit for partial selection)
23. Multiple choice (4 points)
In general, histograms are plotted such that
A
empty bins are included in the graph.
B
bins are equal in width.
C
the number of bins is up to the user.
D
bars are contiguous. That is, no empty space shows between bars unless there is an empty bin.
Correct answer: A,B,C,D (no credit for partial selection)
24. Multiple choice (4 points)
Common problems found in raw data include ______.
A
missing data
B
noisy data
C
unstructured data
D
inconsistent data
Correct answer: A,B,D (no credit for partial selection)
25. Multiple choice (4 points)
Which of the following are auxiliary layers of the ggplot2 package?
A
Data
B
Facets
C
Statistics
D
Geometries
Correct answer: B,C (no credit for partial selection)
26. Multiple choice (4 points)
Which of the following are classical OLS assumptions for linear regression? ( )
A
the regression model is linear in the coefficients and the error term.
B
all independent variables are uncorrelated with the error term.
C
the error term has a constant variance.
D
the error term is normally distributed.
Correct answer: A,B,C,D (no credit for partial selection)
27. Multiple choice (4 points)
Which of the following are advantages of decision tree algorithms?
A
Hard to overfit
B
Different attribute division methods have different preferences for attribute selection
C
Ability to fit data with irrelevant features and missing values
D
Easy to understand, explain and visually analyze
Correct answer: C,D (no credit for partial selection)
28. Multiple choice (4 points)
User-based and item-based filtering have different performances in different situations. Which choices below are correct?
A
User-based filtering is more suitable for time-sensitive items like news.
B
Item-based filtering is more suitable when items are simple and relatively stable.
C
User-based filtering is more suitable when the number of users is more significant than the items.
D
Item-based filtering is more suitable for tailoring to personal taste.
Correct answer: A,B,D (no credit for partial selection)
29. True/False (1 point)
Raw data is the original data provided by the users or collected through some techniques, such as crawlers.
Correct answer: False
30. True/False (1 point)
We can only talk about the correlation between the two variables.
Correct answer: False
31. True/False (1 point)
If an analysis requires data preprocessing, it must be done before data analysis.
Correct answer: True
32. True/False (1 point)
When we create a plot skeleton, we first need to think about how to map the data variables to the aesthetics in the graph.
Correct answer: False
33. True/False (1 point)
Hypothesis testing helps you prove if your data is statistically significant and unlikely to have occurred by chance alone.
Correct answer: True
34. True/False (1 point)
To address this concern, nearest-neighbor methods often use weighted voting or similarity moderated voting such that each neighbor's contribution is scaled by its similarity.
Correct answer: True
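The similarity-weighted voting described in this statement can be sketched in Python as follows. Each neighbor contributes its similarity score to its own class, and the class with the largest total wins. The function name `weighted_vote`, the `(label, similarity)` pair format, and the sample values are assumptions for illustration only.

```python
from collections import defaultdict


def weighted_vote(neighbors):
    """Pick the label whose neighbors have the largest summed similarity.

    neighbors is a list of (label, similarity) pairs.
    """
    if not neighbors:
        raise ValueError("at least one neighbor is required")
    scores = defaultdict(float)
    for label, similarity in neighbors:
        # Each neighbor's vote is scaled by how similar it is to the query point.
        scores[label] += similarity
    return max(scores, key=scores.get)


# Hypothetical neighbors: one very similar "A" and two less similar "B" points.
neighbors = [("A", 0.9), ("B", 0.4), ("B", 0.3)]
print(weighted_vote(neighbors))
```

In this example an unweighted majority vote would choose "B" (two votes to one), while the weighted vote favors the single, much closer "A" neighbor.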
35. True/False (1 point)
In hierarchical clustering, you can choose the number of clusters depending on the dendrogram it produces, and can always turn back after making the wrong decision.
Correct answer: False
36. True/False (1 point)
We can use correlation analysis to predict a driver's travel time by using miles traveled and number of deliveries.( )
Correct answer: False
37. True/False (1 point)
In the narrow sense, a data science product is a product facilitated with a particular data science technique.
Correct answer: True