Sqoop Password Encryption ➽➽➽➣
Sqoop ==> A tool designed to easily IMPORT data from DBMS(Relational database management system) into Hadoop cluster and EXPORT data from hadoop cluster to RDBMS.
Sqoop has gained a lot of popularity amongst Hadoop ecosystem which made it one of the
darling tools in the Bigdata industry.
Firstly, when we import or
export data from database through Sqoop then it is mandatory to provide password either through command line interface or through a file placed on a secured server location.
Let's Discuss in detail the various options available for managing passwords in Sqoop.
The First :--PASSWORD
Let's Discuss in detail the various options available for managing passwords in Sqoop.
The First :--PASSWORD
This is a very rudimentary acceptable way of passing a database password to Sqoop if security is not a major concern.
sqoop import --connect jdbc:mysql://localhost:3306/sqoop \
sqoop import --connect jdbc:mysql://localhost:3306/sqoop \
--username root \
--password root \
--table student
Second: -P
One of the problems
with the first option --password is that it logs the
password in shell history.
Luckily, Sqoop supports reading passwords via input stream (STDIN) with echo turned off which in turn prompt users to input the password via. command line interface.
Luckily, Sqoop supports reading passwords via input stream (STDIN) with echo turned off which in turn prompt users to input the password via. command line interface.
sqoop import --connect jdbc:mysql://localhost:3306/sqoop \
--username root \
-P \
--table student
The -P option will tell Sqoop to prompt for a password. This will allow you to pass your password through the standard input stream as below snippet.
The -P option will tell Sqoop to prompt for a password. This will allow you to pass your password through the standard input stream as below snippet.
Third: Passing password through a password file :
The Password file is stored in HDFS(Hadoop Distributed File System) not in local system.
This password file should have 400 (read+write -Only for owner) permission so that it is not exposed to other users.
This password file should have 400 (read+write -Only for owner) permission so that it is not exposed to other users.
Let's follow few steps to create a password file:
- vi sqoop.pwd — Enter the password and save it.
- Transfer the sqoop.pwd file to HDFS by the command below:
hadoop fs –put sqoop.pwd /sqoop.pwd
- Change the ownership and permission of sqoop.pwd file for more security.
- Remove the sqoop.pwd in current directory(non-hdfs)
Post which execute the below command:
sqoop import --connect jdbc:mysql://localhost:3306/sqoop
\
--username root \
--password-file /sqoop.pwd \
--table student \
--target-dir '/sqoop_output/student_test'
Fourth: HADOOP CREDENTIAL PROVIDER
API
In Hadoop 2.6, a fundamental feature
was introduced – the Credential API.
The CredentialProvider API in Hadoop provides a layer of abstraction between application and password the way they are stored
Sqoop 1.4.5 and later versions support the credential API keystore.
Now let's follow below steps to generate an encrypted credential keystore.
The CredentialProvider API in Hadoop provides a layer of abstraction between application and password the way they are stored
Sqoop 1.4.5 and later versions support the credential API keystore.
Now let's follow below steps to generate an encrypted credential keystore.
Note: On prompt, enter the password
that will be used to access the database.
usre@nn1:- $ hadoop credential create
mydb.password.alias -provider jceks://hdfs/user/root/mysql.password.jceks
Enter password:
Enter password again:
After successful generation of keystore below message will be printed.
After successful generation of keystore below message will be printed.
mydb.password.alias has been
successfully created.
org.apache.hadoop.security.alias.JavaKeyStoreProvider
has been updated.
Now you can make use of mydb.password.alias
file in your sqoop command
Add Below Property in sqoop command and change the password.
Add Below Property in sqoop command and change the password.
-Dhadoop.security.credential.provider.path
= jceks://hdfs/user/root/mysql.password.jceks
Then follow the below command:
Then follow the below command:
sqoop import -Dhadoop.security.credential.provider.path=jceks://hdfs/user/root/mysql.password.jceks
--connect jdbc:mysql://localhost:3306/sqoop \
--username sqoop \
--password-alias mydb.password.alias \
--table stdent \
--username sqoop \
--password-alias mydb.password.alias \
--table stdent \
--target-dir '/sqoop_output/student_test'
This way password is hidden inside
jceks://hdfs/user/root/mysql.password.jceks
Note: Alias “mydb.password.alias” is used to provide the password to the sqoop command to connect to the source database.