Sunday, 22 January 2017

Sqoop Password Encryption

Sqoop Password Encryption ➽➽➽➣

Sqoop ==> A tool designed to easily IMPORT data from DBMS(Relational database management system) into Hadoop cluster and EXPORT data from hadoop cluster to RDBMS.





Sqoop has gained a lot of popularity amongst Hadoop ecosystem which made it one of the darling tools in the Bigdata industry.

Firstly, when we import or export data from database through Sqoop then it is mandatory to provide password either through command line interface or through a file placed on a secured server location.

Let's Discuss in detail the various options available for managing passwords in Sqoop.

The First :--PASSWORD

This is a very rudimentary acceptable way of passing a database password to Sqoop if security is not a major concern.

sqoop  import   --connect jdbc:mysql://localhost:3306/sqoop  \
--username root  \
--password root  \
--table student

Second: -P

One of the problems with the first option  --password is that it logs the password in shell history.

Luckily, Sqoop supports reading passwords via input stream (STDIN) with echo turned off which in turn prompt users to input the password via. command line interface.

sqoop import  --connect jdbc:mysql://localhost:3306/sqoop \
--username root  \
-P  \
--table student

The -P option will tell Sqoop to prompt for a password. This will allow you to pass your password through the standard input stream as below snippet.






Third: Passing password through a password file :

The Password file is stored in HDFS(Hadoop Distributed File System) not in local system.

This password file should have 400 (read+write -Only for owner) permission so that it is not exposed to other users.

Let's follow few steps to create a password file:


  • vi sqoop.pwd — Enter the password and save it.
  • Transfer the sqoop.pwd file to HDFS by the command below:

                             hadoop fs –put sqoop.pwd /sqoop.pwd

  • Change the ownership and permission of sqoop.pwd file for more security.
  • Remove the sqoop.pwd in current directory(non-hdfs)

Post which execute the below command:

sqoop import --connect jdbc:mysql://localhost:3306/sqoop \
--username root \
--password-file /sqoop.pwd \
--table student \
--target-dir  '/sqoop_output/student_test'

Fourth:  HADOOP CREDENTIAL PROVIDER API

In Hadoop 2.6, a fundamental feature was introduced – the Credential API.

The CredentialProvider API in Hadoop provides a layer of abstraction between application and password the way they are stored

Sqoop 1.4.5 and later versions support the credential API keystore.

Now let's follow below steps to generate an encrypted credential keystore.

Note: On prompt, enter the password that will be used to access the database.

usre@nn1:- $ hadoop credential create mydb.password.alias -provider jceks://hdfs/user/root/mysql.password.jceks
Enter password:
Enter password again:

After successful generation of keystore below message will be printed.

mydb.password.alias has been successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider has been updated.

Now you can make use of mydb.password.alias file in your sqoop command

Add Below Property in sqoop command and change the password.

-Dhadoop.security.credential.provider.path = jceks://hdfs/user/root/mysql.password.jceks

Then follow the below command:

sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/root/mysql.password.jceks
--connect jdbc:mysql://localhost:3306/sqoop \
--username sqoop \
--password-alias mydb.password.alias \
--table stdent \
--target-dir '/sqoop_output/student_test'

This way password is hidden inside jceks://hdfs/user/root/mysql.password.jceks
Note: Alias “mydb.password.alias” is used to provide the password to the sqoop command to connect to the source database.