
Getting Started with Hadoop


轉(zhuǎn)載的。

原文出自? http://www.infosci.cornell.edu/hadoop/mac.html ?


NOTICE: The Web Lab Hadoop cluster was closed at the end of September 2011

Quick Guide to Developing and Running Hadoop Jobs (Mac OS X 10.6)

This guide helps Cornell students using Mac OS X 10.6 set up a development environment for working with Hadoop and run Hadoop jobs on the Cornell Center for Advanced Computing (CAC) Hadoop cluster. It walks you through compiling and running a simple example Hadoop job. More information is available in the official Hadoop Map-Reduce Tutorial.

The overall process of developing a Hadoop job is as follows:

  1. Install Hadoop on your development machine (personal or lab computer)
  2. Compile the Hadoop job, create a JAR file
  3. Run the Hadoop job JAR file on your development machine, for testing and debugging
  4. Run the Hadoop job JAR file on the CAC Hadoop cluster, for production

1. Installing Hadoop

This section shows you how to download Hadoop and prepare it for use on a Mac. Note: Hadoop versions after 0.19.2 require Java 1.6; the following instructions take this into account.

  1. Obtain the latest stable Hadoop release. The file is named hadoop-<version>.tar.gz and can be downloaded from the Apache Hadoop releases page. Unpack the downloaded file and place the resulting folder on your Desktop (or another location).

  2. To make Hadoop run on a Mac, you need to edit two files (the same edits can be scripted from the Terminal; see the sketch after this list). First, open the file conf/hadoop-env.sh within the Hadoop folder you just unpacked in your favorite text editor. Find the following line:

    # export JAVA_HOME=/usr/lib/j2sdk1.6-sun

    and change it to:

    export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/

    Save the file. Second, open the file bin/hadoop within the Hadoop folder in your text editor. Search the file for the following line:

    JAVA=$JAVA_HOME/bin/java

    and change it to:

    JAVA=$JAVA_HOME/Commands/java

    Save the file and exit the editor. You have now set up Hadoop for development purposes on your computer.
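
If you prefer to script those two edits, here is a minimal Terminal sketch, assuming the hadoop-0.19.2 release sits as a tarball on your Desktop (adjust the version to match your download; the empty '' argument is required by the BSD sed that ships with Mac OS X):

    cd ~/Desktop
    tar xzf hadoop-0.19.2.tar.gz
    cd hadoop-0.19.2
    # uncomment and repoint JAVA_HOME in conf/hadoop-env.sh
    sed -i '' 's|# export JAVA_HOME=/usr/lib/j2sdk1.6-sun|export JAVA_HOME=/System/Library/Frameworks/JavaVM.framework/Versions/1.6.0/|' conf/hadoop-env.sh
    # point bin/hadoop at the Mac location of the java binary
    sed -i '' 's|JAVA=$JAVA_HOME/bin/java|JAVA=$JAVA_HOME/Commands/java|' bin/hadoop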

2. Compiling a Hadoop job into a JAR file

This section guides you through compiling the WordCount example from the Hadoop Map-Reduce Tutorial. It assumes you are using the Eclipse IDE; if not, you should be able to adapt these instructions for your IDE.

  1. Create a new Java Project.
    Launch Eclipse, and from the File menu select New, then use the wizard to create a new Java Project. Enter a project name, in this example WordCount. Make sure that the selected JRE is version 1.6.0. Click Finish.

  2. Add the Hadoop library to the project
    In Eclipse, right-click (control-click) your project, go to Build Path, then Add External Archives. Browse to the Hadoop folder on your Desktop, select the file hadoop-<version>-core.jar, and click Open.

  3. Add source code file
    From the File menu, select New, then File. Select the parent folder WordCount/src (make sure this is right, or you will run into trouble when exporting the JAR file below) and name the new file WordCount.java; click Finish. Copy the WordCount code from the tutorial, paste it into the new file, and save it. Eclipse compiles the file as soon as you save it.
  4. Export JAR file
    From the File menu, select Export. Under Java, select JAR file and click Next. Select all resources to be exported; in this case, select the entire WordCount project. Make sure the export classes checkbox is checked. Select an export destination for your JAR file; you can use your Desktop or another directory. For simplicity, name the file WordCount.jar and export it to your Desktop.
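
If you prefer the command line to Eclipse, you can build the same JAR directly with the JDK tools. A minimal sketch, assuming Hadoop 0.19.2 unpacked on your Desktop and WordCount.java in the current directory (adjust the version to match yours):

    mkdir -p classes
    # compile against the Hadoop core JAR
    javac -classpath ~/Desktop/hadoop-0.19.2/hadoop-0.19.2-core.jar -d classes WordCount.java
    # package the compiled classes into WordCount.jar on the Desktop
    jar cvf ~/Desktop/WordCount.jar -C classes .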

3. Running a Hadoop job on your development machine

This section shows you how to run your job on your own machine for testing purposes. Hadoop will run in "standalone mode", which means it runs within a single process, without any parallel processing. This is much slower than running on the cluster, so you may want to use a reduced data set for testing.

  1. Create or obtain test data
    For this example, the input data will be this web page. Copy this entire web page and, using your favorite text editor, save it as a plain text file named testing.txt. Place this file within a folder called input on your Desktop; one way to do this from the Terminal is sketched below.
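
    A minimal sketch, assuming you have just copied the page text to the clipboard (pbpaste is the standard Mac utility that prints the clipboard contents):

    mkdir -p ~/Desktop/input
    # write the copied page text into the test file
    pbpaste > ~/Desktop/input/testing.txt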

  2. Run the job
    First, go to the command line. (To access the command line, go to the Finder, then to "Applications", then "Utilities", and finally launch "Terminal".) If you are not familiar with the UNIX command line, it is worth finding a basic guide first. Change into your Hadoop directory, ~/Desktop/hadoop-0.19.2 or similar, and execute the following command:

    ./bin/hadoop jar ~/Desktop/WordCount.jar WordCount ~/Desktop/input ~/Desktop/output

    You may need to alter the paths if any of the files were saved to different places.

  3. Retrieve the results
    The results have been written to a new folder called output on your Desktop. There should be one file, named part-00000, which lists all the words on this web page along with their occurrence counts. Note that before running Hadoop again you will need to delete the entire output folder, since Hadoop will not do this for you; a quick way to inspect the results and clean up is sketched below.
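
    For example (paths assume the defaults used above):

    head ~/Desktop/output/part-00000                    # first few word/count pairs
    sort -k2 -nr ~/Desktop/output/part-00000 | head     # ten most frequent words
    rm -r ~/Desktop/output                              # required before the next run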

4. Running a Hadoop job on the CAC cluster

This section shows you how to take the JAR file you created above along with the test data, and run the job on the CAC cluster.

  1. Obtain a CAC account
    If you are taking a course which requires the use of the cluster, the instructor should organize the CAC account for you. If you are using the cluster for research, the Principal Investigator will add you to their CAC project. In either case, you will receive an email to your Cornell email address with your username and password for the CAC.

  2. Use SSH to connect to the job tracker node
    To connect to the cluster and run Hadoop jobs, use SSH in the Macintosh Terminal window, which provides a Bash shell. First, connect to the CAC:

    ssh netid@wl01.cac.cornell.edu

    where netid is replaced by your CAC username. Note: the address starts with doubleu-el-zero-one, NOT doubleu-zero-one-zero. Enter your CAC password when prompted. The first time that you log in you will be required to change your password to something secure and easy to remember. Once you are logged in, you will be placed in your CAC home directory.

  3. Copy JAR and input files to CAC
    Copy your WordCount.jar file and input folder from your Desktop into your CAC home directory. You can use scp from the Terminal window to copy files (see the sketch below), or you can mount the CAC directory on your Macintosh. To do this from the Macintosh Finder, select the Connect to Server option from the Go menu and enter the following path:

    smb://cacfs01.cac.cornell.edu/netid

    replacing netid with your CAC username. Enter your CAC username and password when prompted. You should then see a new Finder window showing the contents of your CAC home directory. Please note, this directory is only accessible from within the Cornell firewall; if you wish to access it from off-campus, you will first need to VPN into Cornell.
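
    A minimal scp sketch, assuming the host names above (replace netid with your CAC username):

    # copy the job JAR and the input folder into your CAC home directory
    scp ~/Desktop/WordCount.jar netid@wl01.cac.cornell.edu:~
    scp -r ~/Desktop/input netid@wl01.cac.cornell.edu:~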

  4. Copy input files into HDFS
    Make a directory in the Hadoop Distributed File System (DFS) for your input files. You can see the list of commands available for working on the DFS by executing the following:

    hadoop dfs

    More information about the commands is available in the Hadoop documentation. Note that to execute any DFS command, you type hadoop dfs -<command>, where <command> is the DFS command to run.

    To copy the input data files into the DFS from your home directory, do the following (a quick check of the result is sketched after the command):

    hadoop dfs -copyFromLocal input .
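
    To confirm the copy, you can list the DFS contents (the exact output format varies by Hadoop version):

    hadoop dfs -ls          # the new input directory should appear
    hadoop dfs -ls input    # and should contain testing.txt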

  5. Run your job
    Perform the following:

    hadoop jar WordCount.jar WordCount input output

    This will place the result files in a directory called "output" in the DFS. You can then copy these files back to your CAC home directory by executing the following:

    hadoop dfs -copyToLocal output output

    Now you can retrieve the output files in the same fashion that you copied the input files to your home directory. Note that one output file is produced by each reduce task. The WordCount example uses the system-configured limit on the number of reduce tasks, so do not be surprised to see 10-20 output files (the exact number depends on the number of cluster nodes running and their configuration). You can control this limit programmatically via the setNumReduceTasks() method of the JobConf class in the Hadoop API, as sketched below. Refer to the Map-Reduce Tutorial for more details on running map-reduce jobs.
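
    As a sketch of that knob, using the old org.apache.hadoop.mapred API that this era's WordCount example is written against (the surrounding class and method names are hypothetical; only JobConf and setNumReduceTasks come from the Hadoop API):

    import org.apache.hadoop.mapred.JobConf;

    public class ReduceCountExample {
        // One reducer yields a single part-00000 output file
        // instead of the 10-20 produced by the cluster default.
        public static void limitReducers(JobConf conf) {
            conf.setNumReduceTasks(1);
        }
    }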

    When you are finished with the output files, you should delete the output directory. Hadoop will not do this automatically, and it will throw an error if you run the job while an old output directory exists. To delete it, execute:

    hadoop dfs -rmr output

Last revised: March 15, 2010
bjk/wya



