Big Data - 3 (Word Counting with Hadoop)


■ Word Counting

 

su - hadoop

[hadoop@localhost ~]$ cd $HADOOP_HOME/etc/hadoop
[hadoop@localhost hadoop]$ ls -al

Check that mapred-site.xml appears in the listing, then open it in vi and add the following properties inside the <configuration> element:

vi ./mapred-site.xml

 

<property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
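
After saving the file, it can help to confirm that all three properties actually made it in. The following is a minimal sketch, not part of the original steps: count_mapred_env_props is a hypothetical helper name, and it assumes each <name> element sits on its own line, as in the snippet above.

```shell
# Hypothetical helper: count how many of the three *.env properties
# from the snippet above are present in the given file.
# Assumes each <name>...</name> element is on its own line.
count_mapred_env_props() {
  grep -c -E '<name>(yarn\.app\.mapreduce\.am|mapreduce\.map|mapreduce\.reduce)\.env</name>' "$1"
}

# Example (prints 3 when all three properties are present):
#   count_mapred_env_props "$HADOOP_HOME/etc/hadoop/mapred-site.xml"
```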

 

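The other XML files only need to be checked; no changes are required.

Next, move to the Hadoop installation directory: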

 

[hadoop@localhost hadoop]$ cd /opt/hadoop3/yoki3
[hadoop@localhost yoki3]$ pwd
/opt/hadoop3/yoki3

[hadoop@localhost yoki3]$ ls -al

 

The output looks like this:

 

drwxr-xr-x. 10 hadoop hadoop    161  8월  8 04:46 .
drwxr-xr-x.  3 hadoop hadoop     39  8월  8 03:50 ..
-rw-rw-r--.  1 hadoop hadoop 150571  7월 12  2022 LICENSE.txt
-rw-rw-r--.  1 hadoop hadoop  21932  7월 12  2022 NOTICE.txt
-rw-rw-r--.  1 hadoop hadoop   1361  7월 12  2022 README.txt
drwxr-xr-x.  2 hadoop hadoop    203  7월 12  2022 bin
drwxr-xr-x.  3 hadoop hadoop     20  7월 12  2022 etc
drwxr-xr-x.  2 hadoop hadoop    106  7월 12  2022 include
drwxr-xr-x.  3 hadoop hadoop     20  7월 12  2022 lib
drwxr-xr-x.  4 hadoop hadoop   4096  7월 12  2022 libexec
drwxrwxr-x.  3 hadoop hadoop   4096  8월  8 09:01 logs
drwxr-xr-x.  3 hadoop hadoop   4096  7월 12  2022 sbin
drwxr-xr-x.  4 hadoop hadoop     31  7월 12  2022 share

From this listing, we will run a word count on README.txt.

 

The wordcount program is written in Java.

WordCount.java ==> WordCount.class ==> packaged into a .jar file
==> the jar is stored under share/hadoop/mapreduce
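
To get a feel for what the wordcount job computes, the same result can be approximated locally with standard Unix tools. This is only a sketch for comparison, not the Hadoop implementation: local_wordcount is a hypothetical name, and it assumes the example job splits on whitespace and emits word, tab, count in byte order, which should be checked against the real output.

```shell
# Hypothetical local stand-in for the wordcount job (no Hadoop involved).
# Reads text on stdin and prints "word<TAB>count", one word per line.
local_wordcount() {
  tr -s '[:space:]' '\n' |          # "map": one token per line, split on whitespace
    grep -v '^$' |                  # drop the empty line left by leading whitespace
    LC_ALL=C sort |                 # "shuffle": bring identical words together
    uniq -c |                       # "reduce": count each group
    awk '{ print $2 "\t" $1 }'      # reorder to word<TAB>count
}

# Example:
#   local_wordcount < README.txt
```

The three stages mirror the map, shuffle, and reduce phases, which is why the pipeline is a handy mental model for the job that runs later in this post.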

 

[hadoop@localhost yoki3]$ cd share/hadoop/mapreduce
[hadoop@localhost mapreduce]$ hadoop fs -ls /
(Nothing is printed, because nothing has been created in HDFS yet.)
[hadoop@localhost mapreduce]$ cd $HADOOP_HOME


Plain ls lists the local Linux filesystem, while hadoop fs -ls lists HDFS.

 

[hadoop@localhost yoki3]$ hadoop fs -mkdir /inputdir
[hadoop@localhost yoki3]$ ls

LICENSE.txt  README.txt  etc      lib      logs  share
NOTICE.txt   bin         include  libexec  sbin

 

[hadoop@localhost yoki3]$ hadoop fs -ls /

Found 1 items
drwxr-xr-x   - hadoop supergroup          0 2023-08-08 11:11 /inputdir

 

[hadoop@localhost yoki3]$ hadoop fs -put ./README.txt /inputdir
[hadoop@localhost yoki3]$ cd share/hadoop/mapreduce
[hadoop@localhost mapreduce]$ ls ha*.jar

This lists:

 

hadoop-mapreduce-client-app-3.2.4.jar
hadoop-mapreduce-client-common-3.2.4.jar
hadoop-mapreduce-client-core-3.2.4.jar
hadoop-mapreduce-client-hs-3.2.4.jar
hadoop-mapreduce-client-hs-plugins-3.2.4.jar
hadoop-mapreduce-client-jobclient-3.2.4-tests.jar
hadoop-mapreduce-client-jobclient-3.2.4.jar
hadoop-mapreduce-client-nativetask-3.2.4.jar
hadoop-mapreduce-client-shuffle-3.2.4.jar
hadoop-mapreduce-client-uploader-3.2.4.jar
hadoop-mapreduce-examples-3.2.4.jar

 

[hadoop@localhost mapreduce]$ hadoop jar hadoop-mapreduce-examples-3.2.4.jar wordcount /inputdir /out200
[hadoop@localhost mapreduce]$ hadoop fs -cat /out200/part-r-00000

 

Running these commands prints the word counts for README.txt.
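
One thing to watch for when repeating the job: MapReduce refuses to start if the output directory (here /out200) already exists. A minimal sketch of a rerunnable wrapper follows. run_wordcount is a hypothetical helper, it assumes the current directory is share/hadoop/mapreduce, and it deletes any previous results in the output directory, so use it with care.

```shell
# Hypothetical wrapper: remove a stale output directory, then run wordcount.
# Usage: run_wordcount <hdfs-input-dir> <hdfs-output-dir>
# Assumes the current directory contains hadoop-mapreduce-examples-3.2.4.jar.
run_wordcount() {
  in_dir="$1"
  out_dir="$2"
  # "hadoop fs -test -d" exits 0 when the path exists and is a directory.
  if hadoop fs -test -d "$out_dir"; then
    hadoop fs -rm -r "$out_dir"    # WARNING: deletes the previous results
  fi
  hadoop jar hadoop-mapreduce-examples-3.2.4.jar wordcount "$in_dir" "$out_dir"
}

# Example:
#   run_wordcount /inputdir /out200
```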

Still in the mapreduce directory, append the result to a local file:

 

hadoop fs -cat /out200/part-r-00000 >> $HADOOP_HOME/wcount
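
Once the counts are in a local file, ordinary shell tools are enough to inspect them. As a small sketch, the hypothetical top_words helper below lists the most frequent words. It assumes the file holds word, tab, count on each line, as the part-r-00000 output does.

```shell
# Hypothetical helper: show the N most frequent words in a saved result file.
# Usage: top_words <file> [N]   (N defaults to 10)
# Assumes each line is "word<TAB>count".
top_words() {
  tab=$(printf '\t')
  # Sort numerically, descending, on the second (count) field.
  sort -t "$tab" -k2,2nr "$1" | head -n "${2:-10}"
}

# Example:
#   top_words "$HADOOP_HOME/wcount" 5
```

Because the save command above uses >>, running it twice leaves every word in the file twice, so it is worth removing the old wcount file before saving again.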

Finally, copy the wcount file to the Windows desktop with scp.
