Big Data Computing Assignment 3 Answers NPTEL 2021:- This article provides the answers to Big Data Computing Assignment 3.
Q1. In Spark, a ______________________ is a read-only collection of objects partitioned across a set of machines that can be rebuilt if a partition is lost.
- Spark Streaming
- FlatMap
- Driver
- Resilient Distributed Dataset (RDD)
Answer:- Resilient Distributed Dataset (RDD)
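The "read-only" and "can be rebuilt" parts of that definition can be illustrated with a toy model. The sketch below is not Spark's RDD class. `ToyDataset`, `parallelize`, and the round-robin partitioning are hypothetical names and choices made only for this illustration. The idea it shows is that a dataset is described by a recipe for computing each partition, so a lost partition can be recomputed from that recipe rather than restored from a copy.

```scala
object LineageSketch {
  // Hypothetical toy model, NOT Spark's RDD: the dataset stores a recipe
  // (compute) for producing each partition instead of the data itself.
  final case class ToyDataset[A](numPartitions: Int, compute: Int => Vector[A]) {
    // A transformation never modifies this dataset. It returns a new
    // dataset whose recipe wraps the parent's recipe.
    def map[B](f: A => B): ToyDataset[B] =
      ToyDataset(numPartitions, p => compute(p).map(f))

    // Gather all partitions into one local collection.
    def collect(): Vector[A] =
      (0 until numPartitions).toVector.flatMap(compute)
  }

  // Assumption for this sketch: elements are assigned to partitions round-robin.
  def parallelize[A](data: Vector[A], numPartitions: Int): ToyDataset[A] =
    ToyDataset(
      numPartitions,
      p => data.zipWithIndex.collect { case (x, i) if i % numPartitions == p => x }
    )

  val base: ToyDataset[Int]    = parallelize(Vector(1, 2, 3, 4, 5, 6), 3)
  val doubled: ToyDataset[Int] = base.map(_ * 2)

  // "Losing" partition 1 is harmless: it is rebuilt by re-running the recipe.
  val rebuiltPartition1: Vector[Int] = doubled.compute(1)

  def main(args: Array[String]): Unit = {
    println(doubled.collect())
    println(rebuiltPartition1)
  }
}
```

In this model `base` is unchanged after `base.map(_ * 2)`, which mirrors the immutability asked about in Q4.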
Q2. Given the following definition of the join transformation in Apache Spark:
def join[W](other: RDD[(K, W)]): RDD[(K, (V, W))]
The join operation is used for joining two datasets. When it is called on datasets of type (K, V) and (K, W), it returns a dataset of (K, (V, W)) pairs with all pairs of elements for each key.
What is the value of joinrdd when the following code is run?
val rdd1 = sc.parallelize(Seq(("m",55),("m",56),("e",57),("e",58),("s",59),("s",54)))
val rdd2 = sc.parallelize(Seq(("m",60),("m",65),("s",61),("s",62),("h",63),("h",64)))
val joinrdd = rdd1.join(rdd2)
joinrdd.collect
- Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (h,(63,64)), (s,(54,61)), (s,(54,62)))
- Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (e,(57,58)), (s,(54,61)), (s,(54,62)))
- Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))
- None of the mentioned
Answer:- Array[(String, (Int, Int))] = Array((m,(55,60)), (m,(55,65)), (m,(56,60)), (m,(56,65)), (s,(59,61)), (s,(59,62)), (s,(54,61)), (s,(54,62)))
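To see why the keys "e" and "h" drop out, the inner-join semantics can be reproduced with plain Scala collections, without a Spark installation. This is a minimal sketch under that assumption. `JoinSketch` and `innerJoin` are hypothetical names used only here, and the code is not how Spark implements `join` internally. Spark also does not guarantee the order of the collected pairs, so the results are best compared as sets.

```scala
object JoinSketch {
  // Illustration only: emulate an inner join on key-value pairs by pairing
  // every left element with every right element that has the same key.
  def innerJoin[K, V, W](left: Seq[(K, V)], right: Seq[(K, W)]): Seq[(K, (V, W))] =
    for {
      (k1, v) <- left
      (k2, w) <- right
      if k1 == k2
    } yield (k1, (v, w))

  // Same data as rdd1 and rdd2 in the question.
  val left: Seq[(String, Int)] =
    Seq(("m", 55), ("m", 56), ("e", 57), ("e", 58), ("s", 59), ("s", 54))
  val right: Seq[(String, Int)] =
    Seq(("m", 60), ("m", 65), ("s", 61), ("s", 62), ("h", 63), ("h", 64))

  val joined: Seq[(String, (Int, Int))] = innerJoin(left, right)

  def main(args: Array[String]): Unit = joined.foreach(println)
}
```

Only "m" and "s" appear in both inputs, and each has two values on each side, so the join yields 2 x 2 pairs per key, eight in total. "e" exists only in the first dataset and "h" only in the second, so neither appears in the result.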
Q3. Consider the following statements in the context of Spark:
Statement 1: Spark improves efficiency through in-memory computing primitives and general computation graphs.
Statement 2: Spark improves usability through high-level APIs in Java, Scala, Python and also provides an interactive shell.
- Only statement 1 is true
- Only statement 2 is true
- Both statements are true
- Both statements are false
Answer:- Both statements are true
Q4. True or False? Resilient Distributed Datasets (RDDs) are fault-tolerant and immutable.
- True
- False
Answer:- True
Q5. Which of the following is not a NoSQL database?
- HBase
- Cassandra
- SQL Server
- None of the mentioned
Answer:- SQL Server
Q6. True or False? Apache Spark can potentially run batch-processing programs up to 100 times faster than Hadoop MapReduce in memory, or 10 times faster on disk.
- True
- False
Answer:- True
Q7. ______________ leverages Spark Core fast scheduling capability to perform streaming analytics.
- MLlib
- Spark Streaming
- GraphX
- RDDs
Answer:- Spark Streaming
Q8. ____________________ is a distributed graph processing framework on top of Spark.
- MLlib
- Spark Streaming
- GraphX
- All of the mentioned
Answer:- GraphX
Q9. Point out the incorrect statement in the context of Cassandra:
- It is a centralized key-value store
- It was originally designed at Facebook
- It is designed to handle large amounts of data across many commodity servers, providing high availability with no single point of failure
- It uses a ring-based DHT (Distributed Hash Table) but without finger tables or routing
Answer:- It is a centralized key-value store
Q10. Consider the following statements:
Statement 1: Scale out means grow your cluster capacity by replacing with more powerful machines.
Statement 2: Scale up means incrementally grow your cluster capacity by adding more COTS machines (Components Off the Shelf).
- Only statement 1 is true
- Only statement 2 is true
- Both statements are false
- Both statements are true
Answer:- Both statements are false