Building a Distributed File System -- HDFS
In the previous post, "Building a Distributed File System -- Hadoop Configuration and Installation", we set up a basic Hadoop cluster. But the road to a distributed file system has only just begun, and the first step on that journey is HDFS.
There are plenty of introductions to HDFS online, and most cover the same ground, so I won't repeat them here. A good starting point:
http://www.cnblogs.com/forfuture1978/archive/2010/03/14/1685351.html
Hadoop's command line provides a complete set of file-system commands, as convenient to use as ordinary Linux commands. For example:
bin/hadoop dfs
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
[-chown [-R] [OWNER][:[GROUP]] PATH...]
[-copyFromLocal [-f] [-p] <localsrc> ... <dst>]
[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-count [-q] <path> ...]
[-cp [-f] [-p] <src> ... <dst>]
[-createSnapshot <snapshotDir> [<snapshotName>]]
[-deleteSnapshot <snapshotDir> <snapshotName>]
[-df [-h] [<path> ...]]
[-du [-s] [-h] <path> ...]
[-expunge]
[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
[-getmerge [-nl] <src> <localdst>]
[-help [cmd ...]]
[-ls [-d] [-h] [-R] [<path> ...]]
[-mkdir [-p] <path> ...]
[-moveFromLocal <localsrc> ... <dst>]
[-moveToLocal <src> <localdst>]
[-mv <src> ... <dst>]
[-put [-f] [-p] <localsrc> ... <dst>]
[-renameSnapshot <snapshotDir> <oldName> <newName>]
[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
Generic options supported are
-conf <configuration file> specify an application configuration file
-D <property=value> use value for given property
-fs <local|namenode:port> specify a namenode
-jt <local|jobtracker:port> specify a job tracker
-files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars> specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines
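For instance, a handful of everyday operations look like this (the paths here are only illustrative):

bin/hadoop fs -mkdir -p /user/neoyin/test
bin/hadoop fs -put localfile.txt /user/neoyin/test/
bin/hadoop fs -ls /user/neoyin/test
bin/hadoop fs -cat /user/neoyin/test/localfile.txt
bin/hadoop fs -rm -r /user/neoyin/test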
The full command reference is available here:
http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/FileSystemShell.html#expunge
Rather than demonstrating each command, let's move straight to the Java API.
Since the Hadoop version we installed is 2.2, we need a matching version of hadoop-client. First add the corresponding repository to the Maven project's pom.xml:
<repositories>
    <repository>
        <id>cloudera-repo-releases</id>
        <url>https://repository.cloudera.com/artifactory/repo/</url>
    </repository>
</repositories>
Then add the dependency:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0-cdh5.0.0-beta-1</version>
</dependency>
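To confirm that the artifact resolves from the Cloudera repository, you can inspect the dependency tree (a standard Maven command, not specific to Hadoop):

mvn dependency:tree -Dincludes=org.apache.hadoop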
Code example. For clarity the methods are collected into a single utility class; the class name and the hdfs URI below are placeholders, so point the URI at your own NameNode:
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.util.Progressable;

public class HdfsDemo {

    // NameNode URI; adjust the host and port to your own cluster
    private static final String hdfs = "hdfs://master:9000/";
    private static final Configuration conf = new Configuration();

    /**
     * Create a directory (only if it does not already exist)
     * @param dir
     * @throws IOException
     */
    public static void mkdirs(String dir) throws IOException {
        Path path = new Path(hdfs + dir);
        FileSystem fs = FileSystem.get(URI.create(hdfs), conf);
        if (!fs.exists(path)) {
            fs.mkdirs(path);
        }
        fs.close();
    }

    // Delete a path recursively; fs.delete removes it immediately,
    // whereas deleteOnExit would only delete when the FileSystem closes
    public static void rmDir(String dir) throws IOException {
        Path path = new Path(hdfs + dir);
        FileSystem fs = FileSystem.get(URI.create(hdfs), conf);
        fs.delete(path, true);
        fs.close();
    }

    // Print a file's contents to stdout; expects a full hdfs:// path
    public static void cat(String file) throws IOException {
        Path path = new Path(file);
        FileSystem fs = FileSystem.get(URI.create(hdfs), conf);
        FSDataInputStream fsdis = null;
        System.out.println("cat: " + file);
        try {
            fsdis = fs.open(path);
            IOUtils.copyBytes(fsdis, System.out, 4096, false);
        } finally {
            IOUtils.closeStream(fsdis);
            fs.close();
        }
    }

    // Copy a local file to HDFS, printing a marker as the write progresses
    public static void upload() throws IOException {
        String path = "/home/neoyin/Desktop/testtest";
        String fpath = hdfs + "ttt";
        InputStream in = new BufferedInputStream(new FileInputStream(path));
        FileSystem fs = FileSystem.get(URI.create(fpath), conf);
        OutputStream out = fs.create(new Path(fpath), new Progressable() {
            @Override
            public void progress() {
                System.out.println("===");
            }
        });
        // the final 'true' closes both streams when the copy completes
        IOUtils.copyBytes(in, out, 4096, true);
    }
}
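A minimal driver to exercise these methods might look like the following (the directory and file names are arbitrary; add it inside the same class):

public static void main(String[] args) throws IOException {
    mkdirs("test");        // create hdfs://.../test
    upload();              // copy the local file to hdfs://.../ttt
    cat(hdfs + "ttt");     // print the uploaded file
    rmDir("test");         // clean up
}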
The above only scratches the surface of Hadoop through its Java API. After running the program, open the NameNode web UI at http://$master_ip:50070 to confirm that the files and directories were created as expected.
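You can also verify from code rather than the web UI. Here is a small sketch, assuming the same class fields as above plus an import of org.apache.hadoop.fs.FileStatus, that lists a directory much like hadoop fs -ls:

// List the entries under a directory, similar to 'hadoop fs -ls'
public static void ls(String dir) throws IOException {
    FileSystem fs = FileSystem.get(URI.create(hdfs), conf);
    for (FileStatus status : fs.listStatus(new Path(hdfs + dir))) {
        System.out.println(status.getPath() + "\t" + status.getLen());
    }
    fs.close();
}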