XuetangX Big Data Analysis Final Exam Answers

wangke XuetangX Answers 2025-03-19 10:06:44

Big Data Analysis - Beijing Institute of Technology - XuetangX

1. Single choice (2 points) In the general big data architecture, the data processing system has three parts. Which option best describes them? ( )

A. Data storage, data processing algorithm, computing engine and platform

B. Data storage, computing model, computing engine and platform

C. Data processing algorithm, computing model, computing engine and platform

D. Data processing algorithm, computing engine, platform

Correct answer: C

2. Single choice (2 points) Based on the requirements, we can build the business model, which includes ( ) and ( ). ( )

A. Conceptual model, logical model

B. Logical model, physical model

C. Process model, data model

D. Process model, logical model

Correct answer: C

3. Single choice (2 points) ( ) extracts only the data that is new or has been modified in the database since the last extraction, and it normally does not have a big impact on the running business system. ( )

A. Incremental data extraction

B. Full extraction

C. Timestamp extraction

D. Trigger

Correct answer: A
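
Note: to illustrate the idea behind incremental extraction, here is a minimal sketch that assumes the source table carries a last-modified timestamp. The `orders` table, the `updated_at` column, and the function names are hypothetical and do not come from the exam; an in-memory SQLite database stands in for the running business system.

```python
import sqlite3


def extract_incremental(conn, last_extracted_at):
    """Return only the rows created or modified after the previous extraction."""
    cur = conn.execute(
        "SELECT id, name, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_extracted_at,),
    )
    return cur.fetchall()


def demo():
    # Hypothetical source table; ISO-8601 strings sort chronologically.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, name TEXT, updated_at TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            (1, "knife", "2025-03-01T10:00:00"),
            (2, "board", "2025-03-02T09:30:00"),
            (3, "pan", "2025-03-03T08:15:00"),
        ],
    )
    # Pretend the last extraction ran at the end of 2025-03-01.
    rows = extract_incremental(conn, "2025-03-01T23:59:59")
    conn.close()
    return rows
```

Only the rows touched after the stored extraction time are read, which is why this approach tends to put less load on the source system than a full extraction.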

4. Single choice (2 points) Which of the following big data characteristics best describes "data in doubt" (uncertainty due to data inconsistency and incompleteness, ambiguities, latency, deception, and model approximations)? ( )

A. Volume

B. Variety

C. Veracity

D. Velocity

Correct answer: C

5. Single choice (2 points) Which of the following is NOT an advantage of NoSQL databases? ( )

A. Can support ultra-large-scale data storage

B. A flexible data model that supports Web 2.0 applications well

C. Strong horizontal expansion capabilities

D. Mathematical theoretical foundation

Correct answer: D

6. Single choice (2 points) MPP (Massively Parallel Processing) improves performance through ( ) parallelism. ( ) coordinates work with ( ), and ( ) coordinates work with one or more ( ). ( ) process queries in parallel. ( ) have their own CPU, disk, and memory in a shared-nothing architecture. A high-speed interconnect provides continuous pipelining of data processing. ( )

A. segment hosts, Master, segment host, Segment host, segment instances, Segment instances, Segment hosts

B. segment instance, Master, segment host, Segment host, segment instances, Segment instances, Segment hosts

C. segment instance, Master, segment host, Segment host, segment instances, Segment instances, Segment instances

D. segment hosts, Master, segment host, Segment host, segment instances, Segment hosts, Segment hosts

Correct answer: B

7. Single choice (2 points) What is the right order of the steps for reading data in HDFS? ( )

a) DistributedFileSystem makes an RPC call to the namenode to determine the locations of the datanodes where the file is stored in the form of blocks. For each block, the namenode returns the addresses of the datanodes (metadata of blocks and datanodes) that have a copy of the block. Datanodes are sorted according to proximity (depending on network topology information).

b) The client opens the file by calling the open() method on DistributedFileSystem.

c) The client then calls read() on the stream. DFSInputStream, which has stored the datanode addresses for the first few blocks in the file, then connects to the first (closest) datanode for the first block in the file.

d) DistributedFileSystem returns an FSDataInputStream (an input stream that supports file seeks) to the client for it to read data from. FSDataInputStream in turn wraps a DFSInputStream, which manages the datanode and namenode I/O.

e) Data is streamed from the datanode back to the client (in the form of packets), and read() is called repeatedly on the stream by the client.

f) When the client has finished reading, it calls close() on the FSDataInputStream.

g) When the end of the block is reached, DFSInputStream closes the connection to the datanode and then finds the best datanode for the next block.

[Figure: 4-2-2.jpg]

A. ABDCEGF

B. BADCEGF

C. BADCEFG

D. BACDEGF

Correct answer: B

8. Single choice (2 points) Which attribute subset selection method is shown in the diagram? ( )

[Figure: Test 3-5-2.jpg]

A. Forward stepwise attribute subset selection

B. Backward stepwise attribute subset selection

C. Combination of forward selection and backward deletion

D. Decision tree induction

Correct answer: A
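
Note: forward stepwise selection starts from an empty attribute set and greedily adds the attribute that improves a score the most, stopping when no candidate helps. The sketch below is one possible reading of that idea; the `toy_score` function, the attribute names `A1` to `A6`, and the `min_gain` threshold are made up for illustration and are not taken from the course material.

```python
def forward_stepwise(attributes, score, min_gain=1e-9):
    """Greedy forward selection: add the best attribute until nothing improves."""
    selected = []
    best = score(selected)
    remaining = list(attributes)
    while remaining:
        # Score every candidate subset formed by adding one more attribute.
        candidates = [(score(selected + [a]), a) for a in remaining]
        top_score, top_attr = max(candidates, key=lambda pair: pair[0])
        if top_score - best <= min_gain:
            break  # no remaining attribute improves the score
        selected.append(top_attr)
        remaining.remove(top_attr)
        best = top_score
    return selected


def toy_score(subset):
    # Hypothetical score: A1, A4 and A6 carry signal; each attribute costs 0.1.
    useful = {"A1": 1.0, "A4": 0.8, "A6": 0.5}
    return sum(useful.get(a, 0.0) for a in subset) - 0.1 * len(subset)


chosen = forward_stepwise(["A1", "A2", "A3", "A4", "A5", "A6"], toy_score)
```

In practice the score would usually be a validation metric of a model trained on the candidate subset; backward stepwise selection runs the same loop in reverse, starting from the full set and removing attributes.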

9. Single choice (2 points) According to organizational boundaries, data resources can be divided into two categories. ( )

A. Online data and offline data

B. Organization data and government data

C. Internal data and external data

D. System data and IoT data

Correct answer: C

10. Single choice (2 points) ( ) is a user-friendly API standard for machine learning and is the central high-level API used to build and train models. ( )

A. SavedModel

B. TensorFlow Hub

C. Premade Estimators

D. tf.keras

Correct answer: D

11. Single choice (2 points) The idea of distributed computing is to use ( ) to achieve ( ). ( )

A. redundancy, reliability

B. reliability, redundancy

C. redundancy, performance

D. reliability, performance

Correct answer: A

12. Single choice (2 points) The superstep execution process is ( )

1) Send messages to other nodes, causing them to become active;

2) Modify node and arc properties;

3) Remove existing arcs or create new arcs;

4) Receive messages from the inbox;

5) Halt itself until a new message is received;

A. 42513

B. 12345

C. 42135

D. 42351

Correct answer: A

13. Single choice (2 points) Which of the following big data characteristics is like panning for gold in the sand? ( )

A. Value

B. Variety

C. Veracity

D. Velocity

Correct answer: A

14. Single choice (2 points) Which description of the term "big data" is NOT suitable? ( )

A. Big data can be analyzed for insights that lead to better decisions and strategic business moves

B. Just large

C. Both structured and unstructured

D. Hard-to-manage volumes of data

Correct answer: B

15. Single choice (2 points) In the general big data architecture, the data storage system has four parts. Which option best describes them? ( )

A. Data collection, data modeling, data storage (including distributed file system and distributed database), unified data access interface

B. Data collection, data preprocessing, data storage (including distributed file system and distributed database), unified data access interface

C. Data preprocessing, data modeling, data storage (including distributed file system and distributed database), unified data access interface

D. Data preprocessing, data modeling, distributed file system, distributed database

Correct answer: A

16. Single choice (2 points) In the following picture, what are the right terms for each number? ( )

[Figure: test 1-6.jpg]

A. Data sources, data storage, data collection, data processing, data visualization, report monitoring

B. Data sources, data collection, data storage, data visualization, data processing, report monitoring

C. Data sources, data collection, data storage, data processing, data visualization, report monitoring

D. Data sources, data collection, data storage, data processing, report monitoring, data visualization

Correct answer: C

17. Single choice (2 points) ( ) is responsible for resource monitoring and job scheduling. ( ) monitors the health status of all ( ) and jobs, and if it finds a failure, it transfers the corresponding tasks to other nodes. ( ) tracks the task execution progress, resource usage, and other information and informs the ( ), and ( ) selects the appropriate task to use these resources when resources become free. ( )

A. JobTracker, JobTracker, TaskTrackers, JobTracker, TaskScheduler, TaskScheduler

B. JobTracker, TaskTrackers, JobTracker, JobTracker, TaskScheduler, TaskScheduler

C. JobTracker, JobTracker, JobTracker, TaskTrackers, TaskScheduler, TaskScheduler

D. JobTracker, JobTracker, TaskTrackers, TaskScheduler, JobTracker, TaskScheduler

Correct answer: A

18. Single choice (2 points) According to Gartner, an estimated 20% of an organization's data is ( ) data, and the remaining majority is ( ) data. ( )

A. structured, unstructured

B. unstructured, structured

C. structured, semi-structured

D. unstructured, semi-structured

Correct answer: A

19. Single choice (2 points) Which of the following is NOT a dimensionality reduction method? ( )

A. Wavelet transformation

B. Attribute subset selection

C. Principal component analysis

D. Data cube aggregation

Correct answer: D

20. Single choice (2 points) Redundant and repeated records belong to the data quality category ( )

A. Single data resource, model level

B. Single data resource, instance level

C. Multiple data resources, model level

D. Multiple data resources, instance level

Correct answer: B

21. Single choice (2 points) Storm is a native stream processing system; that is, stream data is processed one piece of data at a time, and its parallel computation is implemented based on a directed topology graph. A topology is composed of data sources ( ) and processing units ( ). The topology defines the ( ) of parallel computing; that is, it designs the calculation steps and processes from the perspective of function and architecture. ( )

A. Bolt, Spout, physical model

B. Spout, Bolt, physical model

C. Bolt, Spout, logical model

D. Spout, Bolt, logical model

Correct answer: D

22. Single choice (2 points) Why do we say data is like crude oil? Which of the following is NOT a reason? ( )

A. It is valuable

B. It needs to be refined

C. One data set can be adapted for different purposes

D. It can be sold

Correct answer: D

23. Single choice (2 points) Which of the following is described as organized, structured, categorized, useful, condensed, calculated data? ( )

A. Data

B. Information

C. Wisdom

D. Knowledge

Correct answer: B

24. Single choice (2 points) Which of the following descriptions of the graph parallel computing architecture is NOT correct? ( )

A. The whole graph is broken down into multiple "partitions"

B. Each partition contains a large number of nodes

C. A partition is a unit of execution and typically has an execution thread associated with it

D. A "worker" machine hosts one "partition"

Correct answer: D

25. Single choice (2 points) Spark has several components to facilitate different types of computing tasks, such as streaming and graph processing. The components include ( )

1) Spark Core API

2) Resilient distributed dataset (RDD)

3) Spark SQL

4) Spark topology

5) Spark Streaming

6) MLlib (Machine Learning Library)

7) GraphX

8) Sklearn

A. 12345

B. 13456

C. 13567

D. 13578

Correct answer: C

26. Single choice (2 points) Data modeling could include defining ( )

1) Metadata

2) Data structure

3) Attributes

4) Value range

5) Association relationship

6) Consistency

7) Timeliness

A. 12345

B. 1234567

C. 134567

D. 123567

Correct answer: B

27. Single choice (2 points) Which of the following is described as idea, learning, notion, concept, synthesized, compared, thought-out, discussed? ( )

A. Data

B. Information

C. Wisdom

D. Knowledge

Correct answer: D

28. Single choice (2 points) Spark's advantages include ( )

1) Fast processing

2) Flexibility

3) In-memory computing

4) Real-time processing

5) Better analytics

6) Fault tolerance

7) Need for extra persistent storage

A. 123567

B. 123456

C. 134567

D. 234567

Correct answer: B

29. Single choice (2 points) The difference between machine learning and deep learning: machine learning algorithms employ ( ) for pattern recognition, while deep learning is modeled using ( ). Both can learn in a supervised or unsupervised way. ( )

A. Statistical analysis techniques, neural networks

B. Neural networks, statistical analysis techniques

C. Statistical analysis techniques, statistical analysis techniques

D. Neural networks, neural networks

Correct answer: A

30. Single choice (2 points) Which of the following is described as understanding, integration, applied, reflected upon, actionable, accumulated, principles, patterns, decision-making process? ( )

A. Data

B. Information

C. Wisdom

D. Knowledge

Correct answer: C

31. Single choice (2 points) The execution model is based on the BSP (Bulk Synchronous Processing) model. In this model, multiple processing units proceed in parallel in a sequence of "supersteps". Within each "superstep", the processing sequence is ( )

a) Each processing unit first receives all messages delivered to it from the preceding "superstep".

b) When all the processing units finish the message delivery (hence the synchronization point),

c) It may queue up the messages that it intends to send to other processing units.

d) The queued-up messages will be delivered to the destined processing units but won't be seen until the next "superstep".

e) It manipulates its local data.

f) the next superstep can be started,

g) and the cycle repeats until the termination condition has been reached.

A. aedcbfg

B. aecdbfg

C. acedbfg

D. adecbfg

Correct answer: B
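
Note: the superstep sequence above can be simulated in a few lines. The sketch below is an illustrative single-process toy, not a real BSP engine: it propagates the maximum value through a small graph, and the function name `bsp_max`, the graph, and the initial values are all assumptions made for this example.

```python
def bsp_max(graph, values):
    """Toy BSP run: every node converges to the largest value in the graph.

    graph maps each node to its list of neighbours; values maps each node to a number.
    """
    values = dict(values)
    # Superstep 0: every node queues its own value for its neighbours.
    inbox = {node: [] for node in graph}
    for node, neighbours in graph.items():
        for other in neighbours:
            inbox[other].append(values[node])
    supersteps = 1
    # Termination condition: no messages were delivered at the last barrier.
    while any(inbox.values()):
        outbox = {node: [] for node in graph}
        for node in graph:
            received = inbox[node]                  # (a) receive messages
            if received and max(received) > values[node]:
                values[node] = max(received)        # (e) update local data
                for other in graph[node]:
                    outbox[other].append(values[node])  # (c) queue outgoing messages
        # (d)/(b) barrier: queued messages become visible only in the next superstep.
        inbox = outbox
        supersteps += 1                             # (f) start the next superstep
    return values, supersteps


graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
final_values, steps = bsp_max(graph, {"a": 3, "b": 6, "c": 2})
```

Swapping `inbox` for `outbox` only after every node has been processed is what plays the role of the synchronization point in this toy version.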

32. Single choice (2 points) In HDFS, the name node and the data node have their own responsibilities. Select the responsibilities of the name node and the data node respectively. Name nodes ( ), data nodes ( )

1) Realize the mapping of data blocks to the local file system of the data node

2) Manage the file system namespace

3) Store file data blocks

4) Save the "file to data block to data node" mapping relationship

5) Schedule client access to files

6) Store the data blocks on the local disk

7) Store the metadata in memory for quick access

A. 1237, 456

B. 2457, 136

C. 1245, 367

D. 2456, 137

Correct answer: B

33. Single choice (2 points) ( ) evaluates the changed data during data extraction using the database's own log. ( )

A. Log comparison

B. Timestamp

C. Triggers

D. Full table comparison

Correct answer: A

34. Single choice (2 points) Which of the following are attribute subset selection methods? ( )

1) Forward stepwise attribute subset selection

2) Backward stepwise attribute subset selection

3) Combination of forward selection and backward deletion

4) Principal component analysis

5) Reduction based on statistical analysis

6) Decision tree induction

A. 12346

B. 12345

C. 12356

D. 123456

Correct answer: C

35. Single choice (2 points) Database connection programming interfaces such as ( ) can support SQL access by applications to the database, but they cannot provide complex functions such as transaction management, concurrent scheduling, buffer management, heterogeneous database conversion, and inheritance in a distributed computing environment. This introduces the ( ), a layer of software that provides data exchange functions on top of the database. When the system is extended and needs to access cross-platform heterogeneous databases, the OS could be UNIX, Linux, or Windows, and the data could take the form of mails, XML documents, EJB components, Web services, images, audio/video files, or other unstructured data. The technology of the big data application layer is also diversified and follows various standards. The design of the ( ) needs to be compatible with various standard technologies and products, which introduces the ( ). ( )

A. ODBC and JDBC; DAL data access layer; unified data access interface; unified data access interface

B. ODBC and JDBC; DAL data access layer; DAL data access layer; unified data access interface

C. DAL data access layer; ODBC and JDBC; DAL data access layer; unified data access interface

D. ODBC and JDBC; DAL data access layer; unified data access interface; DAL data access layer

Correct answer: B

36. Single choice (2 points) ( ) uses ( ) to divide the amount of resources (CPU, memory, etc.) on its node. A task has a chance to run only after it gets a ( ), and the role of the ( ) is to allocate idle ( ) on each ( ) to tasks. ( )

A. JobTracker, slot, slot, Hadoop scheduler, slots, TaskTracker

B. TaskTracker, slot, slot, Hadoop scheduler, slots, TaskTracker

C. TaskTracker, slot, slot, Task scheduler, slots, TaskTracker

D. TaskTracker, slot, slot, Hadoop scheduler, task, TaskTracker

Correct answer: B

37. Single choice (2 points) Attribute dependence belongs to the data quality category ( )

A. Single data resource, model level

B. Single data resource, instance level

C. Multiple data resources, model level

D. Multiple data resources, instance level

Correct answer: A

38. Single choice (2 points) The machine learning pipeline in Spark MLlib is ( )

A. 1. Load/clean data, 2. Transformer, 3. Estimator, and 4. Evaluator

B. 1. Load/clean data, 2. Feature extraction, 3. Model training, and 4. Model evaluation

C. 1. Load/clean data, 2. Feature extraction, 3. Estimator, and 4. Model evaluation

D. 1. Load/clean data, 2. Transformer, 3. Model training, and 4. Evaluator

Correct answer: A

39. Single choice (2 points) The correct chronological order of the four paradigms is ( )

A. Empirical - Theoretical - Computational - Data exploration

B. Theoretical - Empirical - Computational - Data exploration

C. Empirical - Computational - Theoretical - Data exploration

D. Empirical - Theoretical - Data exploration - Computational

Correct answer: A

40. Single choice (2 points) The right order of the steps for writing to the datanodes in HDFS is ( )

a)DistributedFileSystem makes an RPC call to the namenode to create a new file in the filesystem’s namespace, with no blocks associated with it.

b)The client creates the file by calling create() method on DistributedFileSystem.

c)The list of datanodes forms a pipeline, and default replication level is three, so there are three nodes in the pipeline. The DataStreamer streams the packets to the first datanode in the pipeline, which stores the packet and forwards it to the second datanode in the pipeline.

d)The namenode performs various checks to make sure the file doesn’t already exist and the client has the right permissions to create the file. If all these checks pass, the namenode makes a record of the new file; otherwise, file creation fails and the client is thrown an IOException.

e)The DistributedFileSystem returns an FSDataOutputStream for the client to start writing data to the datanodes. FSDataOutputStream wraps a DFSOutputStream, which handles communication with the datanodes and namenode.

f)As the client writes data, DFSOutputStream splits it into packets, which it writes to an internal queue, called the data queue. The data queue is consumed by the DataStreamer, which is responsible for asking the namenode to allocate new blocks by picking a list of suitable datanodes to store the replicas.

g)Similarly, the second datanode stores the packet and forwards it to the third (and last) datanode in the pipeline.

h)DFSOutputStream also maintains an internal queue of packets that are waiting to be acknowledged by datanodes, called the ack queue. A packet is removed from the ack queue only when it has been acknowledged by all the datanodes in the pipeline.

i)When the client has finished writing data, it calls close() on the stream. It flushes all the remaining packets to the datanode pipeline and waits for acknowledgments before contacting the namenode to signal that the file is complete.

j)The namenode already knows which blocks the file is made up of, so it only has to wait for blocks to be minimally replicated before returning successfully.

[Figure: 4-2-1.jpg]

A. badefcghij

B. abdefcghij

C. badefchgij

D. badefcgihj

Correct answer: A

41. Open-ended question (10 points) Dynamic Web Crawler:

Xiapi ("shrimp skin") shopping website crawler requirement

Task:

You need to crawl kitchen knife information from the Xiapi ("shrimp skin") website and store it in search_keyword.csv. This information includes itemid-shopid-catid, commodity name, price, rating_star, and so on.

Xiapi website: https://xiapi.xiapibuy.com/

Reference: Manual and codes.

Submission requirement:

Please upload a screenshot of the crawled result (search_keyword.csv). The file name should be ID_NAME.PNG.

[Figure: Dynamic Crawler 1.jpg]
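
Note: the course manual and reference code are not reproduced on this page, so the sketch below covers only the final storage step of the task and makes no claim about how the site is actually crawled. It assumes the crawler has already produced a list of dictionaries; the record layout, the `name` key, and the sample values are hypothetical, while the other column names follow the fields listed in the task.

```python
import csv
import io

# Column names follow the task description; "name" stands in for the commodity name.
FIELDS = ["itemid", "shopid", "catid", "name", "price", "rating_star"]


def write_items(items, out):
    """Write crawled item records to a CSV stream, one row per item."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for item in items:
        # Missing keys become empty cells instead of raising an error.
        writer.writerow({field: item.get(field, "") for field in FIELDS})


def demo():
    # Hypothetical records, shaped like what a crawler might collect.
    items = [
        {"itemid": 1, "shopid": 10, "catid": 100,
         "name": "Chef knife", "price": 12.5, "rating_star": 4.8},
        {"itemid": 2, "shopid": 11, "catid": 100,
         "name": "Cleaver", "price": 9.9},
    ]
    buffer = io.StringIO()
    write_items(items, buffer)
    return buffer.getvalue()
```

To produce the file named in the task, the same function can be called with `open("search_keyword.csv", "w", newline="", encoding="utf-8")` as the output stream.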


42. True/False (2 points) The data in HDFS is immutable. ( )

Correct answer: True

43. True/False (2 points) In customer collaborative filtering, similar users are defined based on the common items they purchased. ( )

Correct answer: True
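
Note: one simple way to make "similar users share purchased items" concrete is Jaccard similarity over purchase sets. The sketch below uses that measure as an assumption (the course may use a different one, such as cosine similarity), and the user names and purchases are invented.

```python
def jaccard(a, b):
    """Share of items two users have in common, relative to all items either bought."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def most_similar_user(target, purchases):
    """Return the other user whose purchase set overlaps most with the target's."""
    others = [user for user in purchases if user != target]
    return max(others, key=lambda user: jaccard(purchases[target], purchases[user]))


def recommend(target, purchases):
    """Suggest items the most similar user bought that the target has not."""
    neighbour = most_similar_user(target, purchases)
    return sorted(purchases[neighbour] - purchases[target])


# Hypothetical purchase history.
purchases = {
    "ann": {"knife", "board", "pan"},
    "bob": {"knife", "board", "pot"},
    "cat": {"lamp"},
}
```

Here `ann` and `bob` share two of four distinct items, so `bob` is the nearest neighbour and his remaining item becomes the recommendation for `ann`.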

44. True/False (2 points) In HDFS, each stored file is first divided into multiple data blocks with a flexible length according to the data size. ( )

Correct answer: False
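
Note: the statement is false because HDFS splits files into fixed-size blocks, with only the last block allowed to be shorter. The sketch below illustrates the arithmetic; the function name is hypothetical, and the 128 MB default is the commonly used block size, which a cluster may configure differently.

```python
def split_into_blocks(file_size, block_size=128 * 1024 * 1024):
    """Return the sizes of the blocks a file of file_size bytes would occupy."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    # Every block has the fixed size except possibly the last one.
    return [block_size] * full_blocks + ([remainder] if remainder else [])


# A 300-unit file with a 128-unit block size needs two full blocks plus a short one.
example = split_into_blocks(300, block_size=128)
```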

45. True/False (2 points) We can find a single kind of tool to deal with all the data management problems of big data. ( )

Correct answer: False

46. True/False (2 points) HDFS supports batch reading, writing, and updating operations. ( )

Correct answer: False

47. True/False (2 points) The item-based collaborative filtering algorithm calculates item similarity according to item features. ( )

Correct answer: False

48. True/False (2 points) A DFS (distributed file system) provides the logical storage structure of the data. ( )

Correct answer: False

49. True/False (2 points) We can find a single kind of tool to deal with all the data management problems of the database. ( )

Correct answer: True

50. True/False (2 points) A database provides the physical storage structure. ( )

Correct answer: False

51. True/False (2 points) Hadoop is the only big data architecture. ( )

Correct answer: False
