# Big Data Computing Assignment 3 Answers NPTEL 2021

Big Data Computing Assignment 3 Answers NPTEL 2021: This article provides the answers to Big Data Computing Assignment 3.

Q1. In Spark, a ______________________ is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost.

• Spark Streaming
• FlatMap
• Driver
• Resilient Distributed Dataset (RDD)

Q2. Given the following definition about the join transformation in Apache Spark:

def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]

Here, the join operation combines two datasets. When it is called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs containing every pairing of elements that share a key.

What is the result of joinrdd when the following code is run?

val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54)))

val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))

val joinrdd = rdd1.join(rdd2)

joinrdd.collect

• Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (h,(63,64)), (s,(54,61)), (s,(54,62)))
• Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (e,(57,58)), (s,(54,61)), (s,(54,62)))
• Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))
• None of the mentioned

Answer:- Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))
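The reasoning behind this answer can be checked without a Spark cluster. The following is a minimal sketch in plain Python that emulates the pairing semantics of an inner join on key-value pairs; `inner_join` is a hypothetical helper written for illustration, not part of any Spark API. It assumes, as the definition above states, that only keys present in both datasets produce output, which is why "e" (only in rdd1) and "h" (only in rdd2) are absent from the result.

```python
from collections import defaultdict


def inner_join(left, right):
    """Emulate inner-join pairing on lists of (key, value) tuples.

    Hypothetical helper for illustration only; it is not a Spark function.
    """
    # Group the right-hand values by key for fast lookup.
    right_by_key = defaultdict(list)
    for key, w in right:
        right_by_key[key].append(w)

    # For every left pair, emit one output pair per matching right value.
    # Keys missing on either side contribute nothing to the result.
    return [
        (key, (v, w))
        for key, v in left
        for w in right_by_key.get(key, [])
    ]


rdd1 = [("m", 55), ("m", 56), ("e", 57), ("e", 58), ("s", 59), ("s", 54)]
rdd2 = [("m", 60), ("m", 65), ("s", 61), ("s", 62), ("h", 63), ("h", 64)]

joined = inner_join(rdd1, rdd2)

# Sorted only to make the printed output deterministic.
for pair in sorted(joined):
    print(pair)
```

The sketch yields eight pairs: four for "m" and four for "s". In Spark itself the order of elements returned by collect depends on partitioning, so the ordering shown in the answer options should be treated as one possible arrangement of the same set of pairs.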

Q3. Consider the following statements in the context of Spark:

Statement 1: Spark improves efficiency through in-memory computing primitives and general computation graphs.

Statement 2: Spark improves usability through high-level APIs in Java, Scala, Python and also provides an interactive shell.

• Only statement 1 is true
• Only statement 2 is true
• Both statements are true
• Both statements are false

Q4. True or False? Resilient Distributed Datasets (RDDs) are fault-tolerant and immutable.

• True
• False

Q5. Which of the following is not a NoSQL database?

• HBase
• Cassandra
• SQL Server
• None of the mentioned

Q6. True or False? Apache Spark can potentially run batch-processing programs up to 100 times faster than Hadoop MapReduce in memory, or 10 times faster on disk.

• True
• False

Q7. ______________ leverages Spark Core's fast scheduling capability to perform streaming analytics.

• MLlib
• Spark Streaming
• GraphX
• RDDs

Q8. ____________________ is a distributed graph processing framework on top of Spark.

• MLlib
• Spark Streaming
• GraphX
• All of the mentioned

Q9. Point out the incorrect statement in the context of Cassandra:

• It is a centralized key-value store
• It was originally designed at Facebook
• It is designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure
• It uses a ring-based DHT (Distributed Hash Table) but without finger tables or routing

Answer:- It is a centralized key-value store

Q10. Consider the following statements:

Statement 1: Scale out means grow your cluster capacity by replacing with more powerful machines.

Statement 2: Scale up means incrementally grow your cluster capacity by adding more COTS machines (Components Off the Shelf).

• Only statement 1 is true
• Only statement 2 is true
• Both statements are false
• Both statements are true