Hive is a great big data tool because it translates SQL queries into a series of MapReduce jobs. In some cases, though, loading data into Hive becomes a problem of its own. A typical situation is as follows: a large amount of structured data has been generated by some process and saved on HDFS, and you now want to build a database on top of that data so that it can be queried with SQL. The problem is that, because of the sheer volume of data, the high cost of moving it from where it was produced into the Hive data directory can be hard to avoid.
The rest of this post presents a practical solution.