How to run Sqoop command from oozie

In the Importing data from Sqoop into Hive External Table with Avro encoding updated i blogged about how you can use sqoop to import data from RDBMS into Hadoop. I wanted to test if i can use Oozie for invoking Sqoop command and i followed these steps for doing that.
  1. First i tried executing this command from my command line on Hadoop cluster to make sure that i can actually run sqoop without any problem
    sqoop import --connect jdbc:mysql://localhost/test 
    --username root 
    --password cloudera 
    --table CUSTOMER 
  2. Once the sqoop command was successfully executed i went back and deleted the CUSTOMER directory from HDFS to make sure that i could re-import data using following command
    hdfs dfs -rm -R CUSTOMER
  3. Next i went to Hue to create oozie workflow with single sqoop command that i had executed before
    But if your not using the Hue console you can create workflow.xml manually like this Also make sure to create file like this Take a look at Enabling Oozie console on Cloudera VM 4.4.0 and executing examples for information on how to run oozie job from command line
  4. Next when i ran the Oozie workflow, the job failed with following error, which indicates that Oozie does not have the MySQL JDBC driver.
    java.lang.RuntimeException: Could not load db driver class: com.mysql.jdbc.Driver
     at org.apache.sqoop.manager.SqlManager.makeConnection(
     at org.apache.sqoop.manager.GenericJdbcManager.getConnection(
     at org.apache.sqoop.manager.SqlManager.execute(
     at org.apache.sqoop.manager.SqlManager.execute(
     at org.apache.sqoop.manager.SqlManager.getColumnInfoForRawQuery(
     at org.apache.sqoop.manager.SqlManager.getColumnTypesForRawQuery(
     at org.apache.sqoop.manager.SqlManager.getColumnTypes(
     at org.apache.sqoop.manager.ConnManager.getColumnTypes(
     at org.apache.sqoop.orm.ClassWriter.getColumnTypes(
     at org.apache.sqoop.orm.ClassWriter.generate(
     at org.apache.sqoop.tool.CodeGenTool.generateORM(
     at org.apache.sqoop.tool.ImportTool.importTable(
     at org.apache.sqoop.Sqoop.runSqoop(
     at org.apache.sqoop.Sqoop.runTool(
     at org.apache.sqoop.Sqoop.runTool(
     at org.apache.sqoop.Sqoop.main(
     at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(
  5. So first thing i did was to check if mysql driver is there in the oozie shared lib by executing following commands
    export OOZIE_URL=http://localhost:11000/oozie
    oozie admin -shareliblist sqoop
    I noticed that the mysql-connector-java.jar was not there in the list of shared libs for Oozie + sqoop
  6. Next step was to find the mysql-connector-java.jar in my sandbox that i could do by finding it like this
    sudo find / -name mysql*
    I found mysql-connector-java.jar on my local machine at /var/lib/sqoop/mysql-connector-java.jar
  7. I wanted to update the Oozie shared lib to include the mysql driver jar. So i executed following command to figure out the directory where the oozie sqoop shared lib is
    oozie admin -sharelibupdate
    From this output i got HDFS directory location for Oozie shared lib which is /user/oozie/share/lib/lib_20160406022812
  8. Then i used following two commands to first copy the db driver into the oozie shared lib and making sure it is accessible to other users hdfs -copyFromLocal /var/lib/sqoop/mysql-connector-java.jar /user/oozie/share/lib/sqoop/. hdfs dfs -chmod 777 /user/oozie/share/lib/sqoop/mysql-connector-java.jar
  9. Now the last step was to let Oozie know that it should reload the sharedlib and i did that by executing following two commands
    oozie admin -sharedlibupdate
    oozie admin -shareliblist sqoop | grep mysql*
    The second command queries oozie to get current list of shared jars and i could see mysql-connector-java.jar listed in it like this
When i re-executed the ooize job again this time it ran successfully.

No comments: