Apache Zeppelin is a web-based, multi-purpose notebook for data analytics and data visualization, and a valuable tool for data engineers. Here I will discuss some of its components.
WebSocket support is the most striking feature of Apache Zeppelin. A normal web application communicates over HTTP, which follows a request/response model: the client must first send a request, and the server processes it and sends back a response. The server cannot send anything until the client asks for it, and in the simplest case a new connection is established for each request.
A WebSocket, by contrast, keeps the connection between the client and the server open until it is explicitly closed. Once the connection is established, either side can send messages to the other at any time, so the server can push results to the user as soon as it has processed a request, without waiting for the client to ask again.
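To make the "initial establishment of connection" concrete: per RFC 6455, a WebSocket starts life as an HTTP request carrying an `Upgrade: websocket` header and a random `Sec-WebSocket-Key`. The server answers `101 Switching Protocols` with a `Sec-WebSocket-Accept` value derived from that key, and from then on the same TCP connection carries small frames in both directions. The stdlib sketch below computes the accept value and builds one server-to-client text frame. It covers only these two pieces of the protocol (no masking, fragmentation, or long payloads) and is not a usable WebSocket implementation; the function names are hypothetical.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455, section 1.3.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept header the server sends back
    in its 101 Switching Protocols response."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def text_frame(message: str) -> bytes:
    """Build a single, unmasked server-to-client text frame.

    Only payloads of at most 125 bytes are handled, so the length fits
    in the 7-bit length field of the second header byte.
    """
    payload = message.encode("utf-8")
    if len(payload) > 125:
        raise ValueError("this sketch only supports payloads up to 125 bytes")
    # 0x81 = FIN bit set (final frame) + opcode 0x1 (text frame).
    return bytes([0x81, len(payload)]) + payload


if __name__ == "__main__":
    # Sample key from RFC 6455, section 1.3; the RFC gives the matching
    # accept value "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=".
    print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))
    # Bytes 81 05 followed by the UTF-8 bytes of "Hello".
    print(text_frame("Hello").hex())
```

The handshake is the only HTTP round trip; after it, the server can call something like `text_frame` whenever it has new data, which is what lets a notebook receive updates without polling.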
Typical applications that use WebSockets include:
- Chat applications
As a real-world example, suppose financial results need to be shared with the sales team. Visualizing the financial results helps the company understand its revenue growth, and we need to drill down into the results to get finer details. We could achieve this by building our own web application. For detailed information on WebSockets, please refer to http://blog.teamtreehouse.com/an-introduction-to-websockets
The same can be achieved with Apache Zeppelin. Let's understand why. When we build a web application on our own, we need to write the business logic for department-based revenue and also rely on charting libraries such as D3.js, Google Charts, etc. With Zeppelin we can instead apply one of the main principles of software engineering, reusability: Apache Zeppelin provides the data visualization, and we just need to provide the functional logic.
In Apache Zeppelin, once we open a notebook, a WebSocket connection is established between the notebook and the Zeppelin server. When you run a paragraph of the notebook, a “RUN_PARAGRAPH” event is sent to the Zeppelin server to process the paragraph. Once the Zeppelin server receives the code from the paragraph, it adds the paragraph to a job queue, where it waits to be processed by the interpreters configured in the system. Zeppelin then broadcasts “PROGRESS” events to the client (the notebook) so the progress of the paragraph can be tracked. Finally, a “PARAGRAPH_APPEND_OUTPUT” event broadcasts the processed output to the notebook.
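The event flow just described can be modeled in a few lines. The sketch below is a hypothetical, single-process Python toy, not Zeppelin's server (which is written in Java): messages are JSON objects with an `op` name and a `data` payload, and the payload field names (`id`, `paragraph`, `paragraphId`, `progress`) are simplified assumptions. What it does show is the ordering: `RUN_PARAGRAPH` only enqueues a job, and the `PROGRESS` and `PARAGRAPH_APPEND_OUTPUT` messages are pushed back over the already-open connection once the job runs.

```python
import json
from collections import deque
from typing import Callable, Deque, List, Tuple


class FakeClient:
    """Hypothetical stand-in for the notebook's WebSocket connection.

    It simply records every JSON message the server pushes to it.
    """

    def __init__(self) -> None:
        self.received: List[dict] = []

    def send(self, raw: str) -> None:
        self.received.append(json.loads(raw))


class MiniNotebookServer:
    """Toy model of RUN_PARAGRAPH -> job queue -> PROGRESS -> output."""

    def __init__(self, interpreter: Callable[[str], str]) -> None:
        # The "interpreter" is any function from paragraph code to output text.
        self.interpreter = interpreter
        self.jobs: Deque[Tuple[FakeClient, str, str]] = deque()

    def on_message(self, client: FakeClient, raw: str) -> None:
        msg = json.loads(raw)
        if msg["op"] == "RUN_PARAGRAPH":
            # Only enqueue the job here; execution happens in run_pending_jobs().
            data = msg["data"]
            self.jobs.append((client, data["id"], data["paragraph"]))

    def run_pending_jobs(self) -> None:
        while self.jobs:
            client, paragraph_id, code = self.jobs.popleft()
            self._send(client, "PROGRESS", {"id": paragraph_id, "progress": 0})
            output = self.interpreter(code)
            self._send(client, "PROGRESS", {"id": paragraph_id, "progress": 100})
            self._send(client, "PARAGRAPH_APPEND_OUTPUT",
                       {"paragraphId": paragraph_id, "data": output})

    @staticmethod
    def _send(client: FakeClient, op: str, data: dict) -> None:
        client.send(json.dumps({"op": op, "data": data}))


if __name__ == "__main__":
    server = MiniNotebookServer(interpreter=lambda code: "ran: " + code)
    notebook = FakeClient()
    server.on_message(notebook, json.dumps(
        {"op": "RUN_PARAGRAPH", "data": {"id": "paragraph-1", "paragraph": "1 + 1"}}))
    server.run_pending_jobs()
    print([m["op"] for m in notebook.received])
    # ['PROGRESS', 'PROGRESS', 'PARAGRAPH_APPEND_OUTPUT']
```

Separating `on_message` from `run_pending_jobs` mirrors the queueing step: the WebSocket handler returns immediately, and results arrive later as server-initiated messages.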
A Zeppelin notebook's URL can be shared among collaborators, and Apache Zeppelin broadcasts any changes in real time, much like collaboration in Google Docs. Apache Zeppelin also provides a URL that displays only the result, which can be embedded in any application. In addition, there are some more interesting features, described below.
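Real-time collaboration implies some bookkeeping on the server: it has to remember which connections currently have each note open, so that one user's change can be pushed to everyone else viewing that note. The sketch below models only that fan-out step. The class and method names are hypothetical and do not correspond to Zeppelin's Java API.

```python
from collections import defaultdict
from typing import Dict, List, Set


class Connection:
    """Hypothetical stand-in for one collaborator's WebSocket connection."""

    def __init__(self, user: str) -> None:
        self.user = user
        self.inbox: List[str] = []

    def send(self, message: str) -> None:
        self.inbox.append(message)


class NoteBroadcaster:
    """Remembers which connections have each note open and fans changes out to them."""

    def __init__(self) -> None:
        self.watchers: Dict[str, Set[Connection]] = defaultdict(set)

    def open_note(self, note_id: str, conn: Connection) -> None:
        self.watchers[note_id].add(conn)

    def close_note(self, note_id: str, conn: Connection) -> None:
        self.watchers[note_id].discard(conn)

    def broadcast(self, note_id: str, message: str) -> int:
        """Push message to every connection watching note_id; return how many received it."""
        conns = self.watchers.get(note_id, set())
        for conn in conns:
            conn.send(message)
        return len(conns)


if __name__ == "__main__":
    hub = NoteBroadcaster()
    alice, bob, carol = Connection("alice"), Connection("bob"), Connection("carol")
    hub.open_note("note-1", alice)
    hub.open_note("note-1", bob)
    hub.open_note("note-2", carol)
    print(hub.broadcast("note-1", "paragraph text changed"))  # 2
    print(carol.inbox)  # [] because carol only has note-2 open
```

Because every viewer already holds an open WebSocket, the broadcast is just a loop over existing connections; no client has to refresh or poll to see the change.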
A Zeppelin interpreter is a plug-in that lets Zeppelin users work with a specific language or data-processing backend. Currently, Zeppelin supports many interpreters, such as Scala (with Apache Spark), Python, Spark SQL, JDBC, Cassandra, Hive, R and so on.
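The plug-in idea can be illustrated with a small registry. In Zeppelin itself, interpreters are Java classes built on `org.apache.zeppelin.interpreter.Interpreter` and implement methods such as `interpret`; the Python below is only an analogue with hypothetical class names. It shows the two parts of the pattern: every backend implements one common interface, and a paragraph beginning with `%name` is dispatched to the backend registered under that name, falling back to a default otherwise.

```python
from abc import ABC, abstractmethod
from typing import Dict


class Interpreter(ABC):
    """Hypothetical Python analogue of a Zeppelin interpreter plug-in."""

    @abstractmethod
    def interpret(self, code: str) -> str:
        """Run one paragraph's code and return its output as text."""


class EchoInterpreter(Interpreter):
    """Toy backend that returns the code unchanged."""

    def interpret(self, code: str) -> str:
        return code


class WordCountInterpreter(Interpreter):
    """Toy backend that counts whitespace-separated words."""

    def interpret(self, code: str) -> str:
        return str(len(code.split()))


class InterpreterRegistry:
    """Maps interpreter names to plug-ins and dispatches paragraphs to them."""

    def __init__(self, default: str) -> None:
        self.default = default
        self.plugins: Dict[str, Interpreter] = {}

    def register(self, name: str, plugin: Interpreter) -> None:
        self.plugins[name] = plugin

    def run_paragraph(self, text: str) -> str:
        name, code = self.default, text
        if text.startswith("%"):
            # "%name rest-of-paragraph": the first token selects the plug-in.
            parts = text[1:].split(None, 1)
            if not parts:
                raise ValueError("missing interpreter name after '%'")
            name = parts[0]
            code = parts[1] if len(parts) > 1 else ""
        if name not in self.plugins:
            raise KeyError(f"no interpreter registered as {name!r}")
        return self.plugins[name].interpret(code)


if __name__ == "__main__":
    registry = InterpreterRegistry(default="echo")
    registry.register("echo", EchoInterpreter())
    registry.register("wc", WordCountInterpreter())
    print(registry.run_paragraph("hello zeppelin"))     # hello zeppelin
    print(registry.run_paragraph("%wc one two three"))  # 3
```

Adding a new backend means writing one more `Interpreter` subclass and registering it; the notebook and dispatch code stay unchanged.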
By default, Apache Zeppelin prints an interpreter's response as plain text, as shown below.
However, we can customize the output with display directives that are built into Apache Zeppelin.
The %html directive renders your output as HTML, as captured below.
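As a minimal sketch, assuming the paragraph runs under Zeppelin's Python interpreter, it is enough to print a string that starts with `%html`. The helper name `revenue_html` and the sample department and figure are hypothetical, chosen only to echo the revenue example earlier in this post.

```python
import html


def revenue_html(department: str, revenue: float) -> str:
    """Build paragraph output that Zeppelin renders as HTML because it starts with %html."""
    # Escape the department name so it cannot inject markup into the page.
    return (f"%html <h4>{html.escape(department)}</h4>"
            f"<p>Revenue: <b>{revenue:,.2f}</b></p>")


# In a Python paragraph, printing the string is enough for Zeppelin to render it.
print(revenue_html("Sales", 125000.0))
```

Escaping matters here because anything after `%html` is inserted into the page as markup, so a value such as `R&D` would otherwise be interpreted as HTML.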
The %table directive uses Zeppelin's built-in visualizations to display tabular output, where rows are separated by newlines, columns by tabs, and the first row is the header.
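This is where "we just need to provide the functional logic" pays off: the paragraph only has to produce tab- and newline-separated text, and Zeppelin supplies the table and its built-in charts. The sketch below assumes a Python paragraph; the helper `to_zeppelin_table` and the revenue figures are hypothetical, made up purely for illustration.

```python
from typing import Iterable, Sequence


def to_zeppelin_table(header: Sequence[str], rows: Iterable[Sequence[object]]) -> str:
    """Format rows as %table output: tab-separated columns, newline-separated rows."""
    def clean(value: object) -> str:
        # A tab or newline inside a value would shift the columns, so replace it.
        return str(value).replace("\t", " ").replace("\n", " ")

    lines = ["\t".join(clean(h) for h in header)]
    lines += ["\t".join(clean(v) for v in row) for row in rows]
    # The first line after %table is treated as the header row.
    return "%table " + "\n".join(lines)


# Made-up figures, purely to illustrate department-based revenue.
revenue_by_department = [("Sales", 125000), ("Marketing", 48000), ("Support", 31500)]
print(to_zeppelin_table(["department", "revenue"], revenue_by_department))
```

Compared with a hand-built web application, there is no charting code here at all; switching the same output between a table, a bar chart, or a pie chart is left to Zeppelin's visualization toolbar.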