
Apache Airflow Interview Questions

 

1. How will you describe Airflow?

Apache Airflow is an open-source platform for workflow management. It is commonly used as a workflow orchestration tool for data transformation pipelines that follow the Extract, Transform, Load (ETL) pattern. The project started at Airbnb in October 2014, where it offered a solution for managing the company's increasingly complicated workflows. Airflow allows teams to programmatically write, schedule and regulate workflows, and ships with a built-in user interface for tracking them.
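To illustrate what "programmatically write and schedule" looks like in practice, here is a minimal sketch of a two-step DAG. It assumes Airflow 2.x is installed; the DAG id, task ids and commands are illustrative only.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="example_etl",                # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",          # run once per day
    catchup=False,
) as dag:
    extract = BashOperator(task_id="extract", bash_command="echo extracting")
    load = BashOperator(task_id="load", bash_command="echo loading")

    extract >> load                      # load runs only after extract succeeds

Dropping a file like this into the Airflow DAGs folder is enough for the scheduler to pick it up and for the DAG to appear in the built-in user interface.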

2. What are the problems resolved by Airflow?

Some of the problems that Airflow resolves include:

  • Maintaining an audit trail of every completed task
  • Scaling as workloads grow, since it is scalable by design
  • Creating and maintaining relationships between tasks with ease (a short sketch follows this list)
  • Providing a UI that can track and monitor the execution of workflows, and more
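As a small sketch of how task relationships are declared (assuming Airflow 2.3 or later for EmptyOperator; the DAG and task ids are made up):

from datetime import datetime

from airflow import DAG
from airflow.operators.empty import EmptyOperator

with DAG(dag_id="relationship_demo", start_date=datetime(2023, 1, 1),
         schedule_interval=None) as dag:
    t1 = EmptyOperator(task_id="t1")
    t2 = EmptyOperator(task_id="t2")
    t3 = EmptyOperator(task_id="t3")

    t1 >> [t2, t3]    # fan-out: t2 and t3 both wait for t1
    t2 >> t3          # t3 additionally waits for t2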

3. What are some of the features of Apache Airflow?

Some of the features of Apache Airflow include:

  • It schedules all the jobs and keeps their historical status
  • It supports executions through the web UI, including CRUD operations on DAGs
  • It lets you view Directed Acyclic Graphs and the dependency relationships between their tasks

4. How does Apache Airflow act as a Solution?

Airflow solves a variety of problems, such as:

  • Failures: It can automatically retry a task when it fails (see the sketch after this list).
  • Monitoring: It helps check whether a task or run has succeeded or failed.
  • Dependency: There are two different types of dependencies:
    • Data dependencies, where a task relies on data produced by an upstream task
    • Execution dependencies, which control the order in which tasks run and new changes are rolled out
  • Scalability: It centralises the scheduler, so the rest of the system can scale out.
  • Deployment: It is useful for deploying changes with ease.
  • Processing Historical Data: It is effective at backfilling historical data.
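The sketch below shows, under assumed values, how retries and backfilling are typically configured; the retry counts, dates and names are illustrative only.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash import BashOperator

default_args = {
    "retries": 2,                         # retry a failed task up to two times
    "retry_delay": timedelta(minutes=5),  # wait five minutes between retries
}

with DAG(
    dag_id="resilient_pipeline",          # hypothetical DAG name
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=True,                         # create runs for past dates (backfill) when first enabled
    default_args=default_args,
) as dag:
    step = BashOperator(task_id="step", bash_command="echo processing")

Historical ranges can also be backfilled on demand from the command line, for example with airflow dags backfill -s 2023-01-01 -e 2023-01-31 resilient_pipeline.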

5. Define the basic concepts in Airflow.

Airflow has four basic concepts:

  • DAG: A Directed Acyclic Graph that describes the order in which the work should run
  • Task Instance: A specific run of a task, tied to a particular DAG run and point in time
  • Operator: A template that defines how a single unit of work is carried out
  • Task: A parameterized instance of an operator, assigned to a DAG (see the sketch after this list)
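A short sketch that ties these concepts together, assuming Airflow 2.x; the DAG and task names are made up.

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="concepts_demo", start_date=datetime(2023, 1, 1),
         schedule_interval="@daily", catchup=False) as dag:
    # BashOperator is the operator (the template); parameterizing it with a
    # task_id and a bash_command turns it into a task that belongs to this DAG.
    print_date = BashOperator(task_id="print_date", bash_command="date")

Each scheduled run of concepts_demo then creates a task instance of print_date for that run's logical date, and that task instance is what actually gets executed.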

6. Define the integrations of Airflow.

Some of the integrations that you will find in Airflow include (a sketch using one of them follows the list):

  • Apache Pig
  • Amazon EMR
  • Kubernetes
  • Amazon S3
  • AWS Glue
  • Hadoop
  • Azure Data Lake
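As one example, the Amazon S3 integration is available through the apache-airflow-providers-amazon package. The sketch below assumes that package is installed and that an Airflow connection named aws_default exists; the bucket and prefix names are made up.

from airflow.providers.amazon.aws.hooks.s3 import S3Hook

def list_daily_files():
    """List object keys under a prefix in a hypothetical S3 bucket."""
    hook = S3Hook(aws_conn_id="aws_default")
    return hook.list_keys(bucket_name="my-data-bucket", prefix="daily/")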
