Data Science: Theory and Application - Nanjing University - XuetangX (学堂在线)
1. Single choice (3 points)
Which of the following areas of knowledge is NOT required of data scientists?
A. Computer science and information technology
B. Math and statistics
C. Business knowledge
D. Biomedical engineering
Correct answer: D
2. Single choice (3 points)
Which of the following is NOT a basic analytical approach in data science?
A. Regression
B. Classification
C. Descriptive statistics
D. Clustering
Correct answer: C
3. Single choice (3 points)
What is the main difference between supervised learning and unsupervised learning?
A. Supervised learning is done using ground truth
B. Unsupervised learning does not have labeled outputs
C. Supervised learning aims to learn a function
D. Unsupervised learning can infer the natural structure present in a set of data points
Correct answer: A
4. Single choice (3 points)
How many columns of a 10*10 long data type table will be converted to a wide data type for a column?
A. 17
B. 18
C. 19
D. 20
Correct answer: B
5. Single choice (3 points)
Which of the following descriptions of the weighted mean is incorrect?
A. calculated by multiplying the weight (or probability)
B. associated with a particular event or outcome with its associated quantitative outcome
C. very useful when calculating a theoretically expected outcome
D. each outcome has a different probability of occurring
Correct answer: D
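Note on question 5: as a hedged illustration, the sketch below computes a weighted mean by multiplying each outcome by its weight (or probability), summing the products, and dividing by the total weight. The function name `weighted_mean` and the die-roll data are assumptions chosen for illustration; they are not part of the course material.

```python
def weighted_mean(outcomes, weights):
    """Return sum(w * x) / sum(w) for paired outcomes and weights."""
    if len(outcomes) != len(weights):
        raise ValueError("outcomes and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    # Multiply each outcome by its weight, then normalize by the total weight.
    return sum(w * x for x, w in zip(outcomes, weights)) / total_weight


# Illustrative data: the expected value of a fair six-sided die,
# where every face has probability 1/6.
faces = [1, 2, 3, 4, 5, 6]
probabilities = [1 / 6] * 6
expected_roll = weighted_mean(faces, probabilities)
```

When the weights are probabilities that sum to 1, the division changes nothing and the result is the theoretically expected outcome.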
6. Single choice (3 points)
Univariate visualizations do not include ______.
A. boxplot
B. histogram
C. line chart
D. density estimate
Correct answer: C
7. Single choice (3 points)
What data type will the code "a=as.vector(c(list('a',1),list('afo',222)))" assign to a?
A. vector
B. list
C. NULL
D. The code will raise an error
Correct answer: B
8. Single choice (3 points)
What output will the code "as.integer(as.factor(c(0,1)))" produce?
A. [1] 0 1
B. [1] 1 0
C. [1] 1 2
D. [1] 2 1
Correct answer: C
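Note on question 8: R's as.factor stores each value as a 1-based index into the sorted unique levels, and as.integer returns those indices rather than the original numbers. The block below is a rough Python analogue of that behavior, not R itself; `factor_codes` is a hypothetical helper introduced only for illustration.

```python
def factor_codes(values):
    """Mimic as.integer(as.factor(values)): 1-based index of each value
    within the sorted unique levels."""
    levels = sorted(set(values))  # the factor's levels
    position = {level: i + 1 for i, level in enumerate(levels)}
    return [position[v] for v in values]


# The values 0 and 1 become level indices 1 and 2.
codes = factor_codes([0, 1])
```

The same reasoning explains why converting a factor straight to an integer in R does not recover the original numeric values.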
9. Single choice (3 points)
Which of the following statements about Tidy Data is incorrect?
A. Every column is a variable
B. Every row is an observation
C. Every cell is a single numerical value
D. All the above descriptions about Tidy Data are correct
Correct answer: C
10. Single choice (3 points)
Which function is most suitable for splitting the data by one or two categorical variables?
A. theme()
B. geom_bar()
C. facet_grid()
D. facet_wrap()
Correct answer: C
11. Single choice (3 points)
Which layer can provide a new perspective of data interpretation for visual analysis?
A. The facets layer
B. The theme layer
C. The coordinate layer
D. The statistics layer
Correct answer: C
12. Single choice (3 points)
If we want to change individual elements, such as the background color or the font of our title, which function can we use?
A. geom_bar()
B. theme()
C. facet_grid()
D. facet_wrap()
Correct answer: B
13. Single choice (3 points)
In regression analysis, there are _____ main hypothesis tests.
A. one
B. two
C. three
D. four
Correct answer: D
14. Single choice (3 points)
If the significance level is 0.05, the corresponding confidence level is ( ).
A. 93%
B. 94%
C. 95%
D. 96%
Correct answer: C
15. Single choice (3 points)
Which of the following functions presents the result of a regression?
A. anova()
B. summary()
C. confint()
D. predict()
Correct answer: B
16. Single choice (3 points)
Which of the following algorithms is not a decision tree algorithm?
A. ID3
B. YOLOv5
C. CART
D. C4.5
Correct answer: B
17. Single choice (3 points)
Which of the following algorithms is not an example of an eager learner?
A. K-Nearest Neighbors
B. Logistic regression
C. Decision tree
D. Naive Bayes
Correct answer: A
18. Single choice (3 points)
The attribute selection measure used by CART is ______.
A. information gain
B. information gain ratio
C. basic information entropy
D. Gini index
Correct answer: D
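Note on question 18: the Gini index of a set of class labels is commonly defined as 1 minus the sum of squared class proportions, and a candidate split is scored by the size-weighted Gini index of its two branches. The sketch below assumes that standard definition; the function names `gini_index` and `gini_of_split` are illustrative and do not come from the course or from any particular library.

```python
from collections import Counter


def gini_index(labels):
    """Gini index of a list of class labels: 1 - sum(p_k ** 2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((count / n) ** 2 for count in counts.values())


def gini_of_split(left, right):
    """Size-weighted Gini index of a binary split (at least one side non-empty)."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_index(left) + (len(right) / n) * gini_index(right)


# A perfectly mixed two-class node versus a split that separates the classes.
mixed = gini_index(["a", "a", "b", "b"])
separated = gini_of_split(["a", "a"], ["b", "b"])
```

A lower value means purer nodes, so a tree builder using this measure would prefer the split with the smallest weighted Gini index.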
19. Single choice (3 points)
In which type of clustering do you need to use the concept of a dendrogram?
A. Prototype-based clustering
B. Density-based clustering
C. Hierarchical clustering
D. Partitioning clustering
Correct answer: C
20. Single choice (3 points)
Which strategy or algorithm below belongs to hierarchical clustering?
A. AGNES
B. K-means
C. DBSCAN
D. SMC
Correct answer: A
21. Single choice (3 points)
What should you watch out for when you use a collaborative filtering strategy?
A. It determines the features of items that can be used to measure their similarity.
B. It could be useless at the beginning since the records you have are not enough.
C. It won't recommend an item that hasn't been bought before.
D. The "over-specialization" problem still exists.
Correct answer: B
22. Multiple choice (4 points)
What skills do data scientists need to use to deal with data?
A. Machine learning algorithms
B. Knowledge of programming languages
C. Processing of financial statements
D. Data visualization knowledge
Correct answer: A, B, D (no credit for a partial selection)
23. Multiple choice (4 points)
In general, histograms are plotted such that
A. empty bins are included in the graph.
B. bins are equal in width.
C. the number of bins is up to the user.
D. bars are contiguous. That is, no empty space shows between bars unless there is an empty bin.
Correct answer: A, B, C, D (no credit for a partial selection)
24. Multiple choice (4 points)
Common problems found in raw data include ______.
A. missing data
B. noisy data
C. unstructured data
D. inconsistent data
Correct answer: A, B, D (no credit for a partial selection)
25. Multiple choice (4 points)
Which of the following belong to the auxiliary layers of the ggplot2 package?
A. Data
B. Facets
C. Statistics
D. Geometries
Correct answer: B, C (no credit for a partial selection)
26. Multiple choice (4 points)
Which of the following belong to the classical OLS assumptions for linear regression? ( )
A. The regression model is linear in the coefficients and the error term.
B. All independent variables are uncorrelated with the error term.
C. The error term has a constant variance.
D. The error term is normally distributed.
Correct answer: A, B, C, D (no credit for a partial selection)
27. Multiple choice (4 points)
Which of the following are advantages of decision tree algorithms?
A. Hard to overfit
B. Different attribute division methods have different preferences for attribute selection
C. Ability to fit data with irrelevant features and missing values
D. Easy to understand, explain and visually analyze
Correct answer: C, D (no credit for a partial selection)
28. Multiple choice (4 points)
User-based and item-based filtering perform differently in different situations. Which choices below are correct?
A. User-based filtering is more suitable for time-sensitive items like news.
B. Item-based filtering is more suitable when items are simple and relatively stable.
C. User-based filtering is more suitable when the number of users is greater than the number of items.
D. Item-based filtering is more suitable for tailoring to personal taste.
Correct answer: A, B, D (no credit for a partial selection)
29. True or false (1 point)
Raw data is the original data provided by the users or collected through some techniques, such as crawlers.
Correct answer: False
30. True or false (1 point)
We can only talk about the correlation between two variables.
Correct answer: False
31. True or false (1 point)
If an analysis requires data preprocessing, it must be done before data analysis.
Correct answer: True
32. True or false (1 point)
When we create a plot skeleton, we first need to think about how to map the data variables to the aesthetics in the graph.
Correct answer: False
33. True or false (1 point)
Hypothesis testing helps you determine whether your result is statistically significant and unlikely to have occurred by chance alone.
Correct answer: True
34. True or false (1 point)
To address this concern, nearest-neighbor methods often use weighted voting or similarity-moderated voting such that each neighbor's contribution is scaled by its similarity.
Correct answer: True
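Note on question 34: the weighted voting idea can be sketched as follows. Each of the k nearest neighbors adds its similarity to the score of its own label, and the label with the highest total wins. This is a minimal sketch under stated assumptions: similarity is taken to be the inverse of Euclidean distance, and the function name `weighted_knn_vote` and the sample points are hypothetical.

```python
import math
from collections import defaultdict


def weighted_knn_vote(query, training_points, k=3):
    """Classify `query` from (features, label) pairs by similarity-weighted voting."""
    # Keep the k training points closest to the query.
    nearest = sorted(training_points, key=lambda item: math.dist(query, item[0]))[:k]
    scores = defaultdict(float)
    for features, label in nearest:
        distance = math.dist(query, features)
        # Assumed similarity: inverse distance, with a small constant to avoid
        # division by zero when a neighbor coincides with the query.
        scores[label] += 1.0 / (distance + 1e-9)
    return max(scores, key=scores.get)


# Illustrative data: one very close "red" neighbor and two distant "blue" ones.
points = [((0.1, 0.0), "red"), ((2.0, 0.0), "blue"), ((0.0, 2.0), "blue")]
predicted = weighted_knn_vote((0.0, 0.0), points, k=3)
```

With plain majority voting the two "blue" neighbors would win; scaling each vote by similarity lets the much closer "red" neighbor decide instead.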
35. True or false (1 point)
In hierarchical clustering, you can choose the number of clusters based on the dendrogram it produces, and you can always go back after making a wrong decision.
Correct answer: False
36. True or false (1 point)
We can use correlation analysis to predict a driver's travel time by using miles traveled and number of deliveries.
Correct answer: False
37. True or false (1 point)
In the narrow sense, a data science product is a product facilitated with a particular data science technique.
Correct answer: True