Top Apache Storm (2023) Interview Questions | JavaInUse

Apache Storm Interview Questions

In this post we will look at Apache Storm interview questions. Examples are provided with explanations.

  1. What is Apache Storm?
  2. What do you mean by "spouts" and "bolts"?
  3. Where would you use Apache Storm?
  4. What are the characteristics of Apache Storm?
  5. How would one split a stream in Apache Storm?
  6. Is there an easy way to deploy Apache Storm on a local machine (say, Ubuntu) for evaluation?
  7. What is a directed acyclic graph in Storm?
  8. What do you mean by Nodes?
  9. What are the Elements of Storm?
  10. What are Storm Topologies?
  11. What is the TopologyBuilder class?
  12. How do you Kill a topology in Storm?
  13. What transpires when Storm kills a topology?
  14. What is the suggested approach for writing integration tests for an Apache Storm topology in Java?
  15. What does the swap command do?
  16. How do you monitor topologies?
  17. How do you rebalance the number of executors for a bolt in a running Apache Storm topology?
  18. What are Streams?
  19. What can tuples hold in Storm?
  20. What is Kryo?
  21. What are Spouts?
  22. What are Bolts?

What is Apache Storm?

Apache Storm is a free and open-source distributed stream processing framework written predominantly in Clojure. Created by Nathan Marz and the team at BackType, the project was open-sourced after being acquired by Twitter. Storm makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. Storm is simple and can be used with any programming language.

What do you mean by "spouts" and "bolts"?

Apache Storm uses custom-created "spouts" and "bolts" to define data sources and transformations, enabling distributed processing of streaming data.

Where would you use Apache Storm?

Storm is used for:

  1. Stream processing- Apache Storm is used to process streams of data in real time and update multiple databases. The processing rate must keep up with that of the input data.
  2. Distributed RPC- Apache Storm can parallelize an intense query, enabling its computation in real time.
  3. Continuous computation- Data streams are processed continuously, and Storm presents the results to clients in real time. This might require processing each message as it arrives or processing messages in small batches over a short period. Streaming trending topics from Twitter into web browsers is an example of continuous computation.
  4. Real-time analytics- Apache Storm analyzes and responds to data as it arrives from multiple data sources in real time.

What are the characteristics of Apache Storm?

  1. It is a fast and reliable processing system.
  2. It can handle huge volumes of data at tremendous speed.
  3. It is open source and part of the Apache project family.
  4. It helps in processing big data.
  5. Apache Storm is horizontally scalable and fault-tolerant.

How would one split a stream in Apache Storm?

One can use multiple streams if the use case requires it. This is not really splitting, but it gives a lot of flexibility; for example, it can be used for content-based routing from a bolt. Declaring the streams in the bolt:
public void declareOutputFields(final OutputFieldsDeclarer outputFieldsDeclarer) {
    outputFieldsDeclarer.declareStream("stream1", new Fields("field1"));
    outputFieldsDeclarer.declareStream("stream2", new Fields("field1"));
}
Emitting from the bolt on a stream:
collector.emit("stream1", new Values("field1Value"));
Listening to the correct stream through the topology:
builder.setBolt("myBolt1", new MyBolt1()).shuffleGrouping("boltWithStreams", "stream1");
builder.setBolt("myBolt2", new MyBolt2()).shuffleGrouping("boltWithStreams", "stream2");
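Putting those fragments together, a complete bolt might look like the following sketch. RoutingBolt and its routing condition are invented for illustration; it emits each tuple on "stream1" or "stream2" based on the tuple's content:

```java
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class RoutingBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        String value = tuple.getStringByField("field1");
        // Content-based routing: pick the output stream from the payload.
        if (value.startsWith("a")) {
            collector.emit("stream1", tuple, new Values(value));
        } else {
            collector.emit("stream2", tuple, new Values(value));
        }
        collector.ack(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declareStream("stream1", new Fields("field1"));
        declarer.declareStream("stream2", new Fields("field1"));
    }
}
```

Note that the emits anchor the output tuples to the input tuple, so downstream failures are replayed from the spout.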

Is there an easy way to deploy Apache Storm on a local machine (say, Ubuntu) for evaluation?

If you use the code below, the topology is submitted to the cluster through the active Nimbus node:
StormSubmitter.submitTopology("Topology_Name", conf, Topology_Object);
But if you use the code below, the topology is submitted locally on the same machine. In this case, a new local cluster is created with Nimbus, ZooKeeper, and the supervisors all running in the same process:
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("Topology_Name", conf, Topology_Object);
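A fuller local-mode sketch, assuming Storm's bundled test components TestWordSpout and TestWordCounter are on the classpath; it runs the topology for ten seconds and then shuts the local cluster down:

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.testing.TestWordCounter;
import org.apache.storm.testing.TestWordSpout;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.tuple.Fields;
import org.apache.storm.utils.Utils;

public class LocalRun {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("words", new TestWordSpout(), 1);
        builder.setBolt("counter", new TestWordCounter(), 2)
               .fieldsGrouping("words", new Fields("word"));

        Config conf = new Config();
        conf.setDebug(true); // log every tuple that flows through the topology

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("eval-topology", conf, builder.createTopology());

        Utils.sleep(10_000);                  // let the topology run for ten seconds
        cluster.killTopology("eval-topology");
        cluster.shutdown();
    }
}
```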

What is a directed acyclic graph in Storm?

A Storm application is designed as a "topology" in the shape of a directed acyclic graph (DAG), with spouts and bolts serving as the graph vertices. Edges on the graph are called streams and forward data from one node to the next. Collectively, the topology operates as a data transformation pipeline.

What do you mean by Nodes?

The two classes of nodes are the Master Node and the Worker Nodes. The Master Node runs a daemon called Nimbus, which assigns work to machines and monitors their performance. Each Worker Node runs a daemon called the Supervisor, which listens for work assigned to its machine and starts or stops worker processes as necessary based on what Nimbus has assigned to it.

What are the Elements of Storm?

Storm has three crucial elements: Topology, Stream, and Spout. A topology is a network composed of streams and spouts. A stream is an unbounded pipeline of tuples, and a spout is the source of the data streams: it converts the incoming data into a stream of tuples and forwards them to the bolts to be processed.

What are Storm Topologies?

The logic for a real-time application is packaged into a Storm topology. A Storm topology is comparable to a MapReduce job. One fundamental distinction is that a MapReduce job eventually finishes, whereas a topology runs forever (or until you kill it, of course). A topology is a graph of spouts and bolts connected with stream groupings.

What is the TopologyBuilder class?

java.lang.Object -> org.apache.storm.topology.TopologyBuilder
public class TopologyBuilder
extends Object
TopologyBuilder exposes the Java API for specifying a topology for Storm to execute. Topologies are Thrift structures underneath, but since the Thrift API is so verbose, TopologyBuilder eases the creation of topologies. Template for creating and submitting a topology:
TopologyBuilder builder = new TopologyBuilder();
builder.setSpout("1", new TestWordSpout(true), 5);
builder.setSpout("2", new TestWordSpout(true), 3);
builder.setBolt("3", new TestWordCounter(), 3)
.fieldsGrouping("1", new Fields("word"))
.fieldsGrouping("2", new Fields("word"));
builder.setBolt("4", new TestGlobalCount()).globalGrouping("2");
Map conf = new HashMap();
conf.put(Config.TOPOLOGY_WORKERS, 4);
StormSubmitter.submitTopology("mytopology", conf, builder.createTopology());

How do you Kill a topology in Storm?

storm kill topology-name [-w wait-time-secs]
Kills the topology with the name topology-name. Storm will first deactivate the topology's spouts for the duration of the topology's message timeout to let all currently processing messages finish. Storm will then shut down the workers and clean up their state. You can override the amount of time Storm waits between deactivation and shutdown with the -w flag.

What transpires when Storm kills a topology?

Storm does not kill the topology instantly. Instead, it deactivates all the spouts so that they do not emit any more tuples, and then Storm waits Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS seconds before destroying all the workers. This gives the topology enough time to finish the tuples it was processing when it was killed.
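The timeout itself is configurable per topology. A minimal fragment (assuming an existing TopologyBuilder named builder):

```java
// The message timeout doubles as the grace period Storm waits between
// deactivating the spouts and destroying the workers on kill.
Config conf = new Config();
conf.setMessageTimeoutSecs(60); // equivalent to conf.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 60)
StormSubmitter.submitTopology("mytopology", conf, builder.createTopology());
```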

What is the suggested approach for writing integration tests for an Apache Storm topology in Java?

You can use LocalCluster for integration testing, and Storm's own integration tests are a good source of inspiration. Useful tools are the FeederSpout and FixedTupleSpout. A topology in which all spouts implement the CompletableSpout interface can be run to completion using the tools in the Testing class. Storm tests can also "simulate time," which means the Storm topology will idle until you call LocalCluster.advanceClusterTime. This allows you to do asserts in between bolt emits, for example.
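A hedged sketch of such a test, loosely modeled on the examples in Storm's testing documentation; ExclaimBolt is a hypothetical bolt under test that appends "!" to each word:

```java
// Run the topology to completion on mocked spout data and assert on its output.
Testing.withSimulatedTimeLocalCluster(new TestJob() {
    @Override
    public void run(ILocalCluster cluster) throws Exception {
        // Build the topology under test.
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new FeederSpout(new Fields("word")));
        builder.setBolt("exclaim", new ExclaimBolt()).shuffleGrouping("spout");
        StormTopology topology = builder.createTopology();

        // Replace the spout's output with fixed test data.
        MockedSources mockedSources = new MockedSources();
        mockedSources.addMockData("spout", new Values("nathan"), new Values("storm"));

        CompleteTopologyParam params = new CompleteTopologyParam();
        params.setMockedSources(mockedSources);

        // Run until all tuples are processed, capturing every component's emits.
        Map result = Testing.completeTopology(cluster, topology, params);

        // Assert on the bolt's output, ignoring ordering.
        assertTrue(Testing.multiseteq(
            new Values(new Values("nathan!"), new Values("storm!")),
            Testing.readTuples(result, "exclaim")));
    }
});
```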

What does the swap command do?

A planned feature is a storm swap command that replaces a running topology with a new one, ensuring minimal downtime and no chance of both topologies processing tuples at the same time.

How do you monitor topologies?

The best place to monitor a topology is the Storm UI. The Storm UI provides information about errors happening in tasks, along with fine-grained statistics on the throughput and latency of every component of each running topology.

How do you rebalance the number of executors for a bolt in a running Apache Storm topology?

You always need at least as many tasks as executors. Since the number of tasks per component is fixed, you need to define a larger initial number of tasks than initial executors to be able to scale up the number of executors at runtime. The number of tasks acts as an upper bound on the number of executors:
#executors <= #tasks
The rebalancing itself can then be done from the command line, for example:
storm rebalance topology-name -e bolt-name=4

What are Streams?

A Stream is the core abstraction in Storm. A stream is an unbounded sequence of tuples that is processed and created in parallel in a distributed fashion. Streams are defined with a schema that names the fields in the stream's tuples.

What can tuples hold in Storm?

By default, tuples can hold integers, longs, shorts, bytes, strings, doubles, floats, booleans, and byte arrays. You can also define your own serializers so that custom types can be used natively within tuples.
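Custom serializers are registered on the topology's Config; TradeEvent and TradeEventSerializer below are hypothetical names:

```java
// Register a custom type so tuples can carry it via Kryo serialization.
Config conf = new Config();
conf.registerSerialization(TradeEvent.class);                             // use Kryo's default FieldSerializer
conf.registerSerialization(TradeEvent.class, TradeEventSerializer.class); // or a custom Kryo Serializer
```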

How do we check httpd.conf for consistency and errors?

We check the configuration file by using:
httpd -S
The command gives a description of how Apache httpd parsed the configuration file. A careful examination of the IP addresses and virtual hosts might help in uncovering configuration errors.

What is Kryo?

Storm uses Kryo for serialization. Kryo is a flexible and fast serialization library that produces small serializations.

What are Spouts?

A spout is a source of streams in a topology. Generally, spouts read tuples from an external source and emit them into the topology. Spouts can be reliable or unreliable. A reliable spout is capable of replaying a tuple if it was not processed by Storm, whereas an unreliable spout forgets about the tuple as soon as it is emitted.

Spouts can emit more than one stream. To do so, declare multiple streams using the declareStream method of OutputFieldsDeclarer and specify the stream to emit to when using the emit method on SpoutOutputCollector.

The main method on spouts is nextTuple. nextTuple either emits a new tuple into the topology or simply returns if there are no new tuples to emit. It is imperative that nextTuple does not block in any spout implementation, because Storm calls all the spout methods on the same thread.

The other main methods on spouts are ack and fail. These are called when Storm detects that a tuple emitted from the spout either successfully made it through the topology or failed to be completed. ack and fail are only called for reliable spouts. See the Javadoc for more information.
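As a minimal illustration of those methods, here is a spout sketch; the class name RandomWordSpout and its word list are invented for the example:

```java
import java.util.Map;
import java.util.Random;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class RandomWordSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private final String[] words = {"apple", "banana", "cherry"};
    private final Random random = new Random();

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        Utils.sleep(100); // never block: sleep briefly instead when there is nothing to do
        String word = words[random.nextInt(words.length)];
        collector.emit(new Values(word), word); // second argument is the message id, enabling ack/fail
    }

    @Override
    public void ack(Object msgId) { /* the tuple was fully processed */ }

    @Override
    public void fail(Object msgId) { /* a reliable spout could re-emit the tuple here */ }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
```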

What are Bolts?

All processing in topologies is done in bolts. Bolts can do everything from filtering, aggregations, functions, and joins to talking to databases, and more. Bolts can perform simple stream transformations; doing complex stream transformations usually requires multiple steps and hence multiple bolts.
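A minimal bolt sketch illustrating a simple filtering transformation (FilterBolt is a hypothetical name); it anchors emitted tuples to the input and acks every tuple it receives:

```java
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;

public class FilterBolt extends BaseRichBolt {
    private OutputCollector collector;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple input) {
        String word = input.getStringByField("word");
        // Only pass along words longer than three characters.
        if (word.length() > 3) {
            collector.emit(input, new Values(word)); // anchored to the input for reliability
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
    }
}
```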

Compare Apache Storm with Kafka.

Apache Kafka is a distributed, durable messaging system (a publish-subscribe log) used to store and transport streams of messages, whereas Apache Storm is a distributed real-time computation engine used to process streams of data. The two are complementary rather than competing: Kafka is commonly used as the data source for a Storm topology via a Kafka spout.

See Also

Spring Boot Interview Questions | Apache Camel Interview Questions | Drools Interview Questions | Java 8 Interview Questions | Enterprise Service Bus (ESB) Interview Questions | JBoss Fuse Interview Questions | Top ElasticSearch Interview Questions