AWGJournal: Hadoop, Avro experimentation

Followed http://hadooptutorial.info/avro-mapreduce-word-count-example/, with one caution: I needed to set the javac CLASSPATH to the following instead:
export CLASSPATH="$HADOOP_HOME/share/hadoop/tools/lib/*"
export CLASSPATH="$HADOOP_HOME/share/hadoop/mapreduce/*:$CLASSPATH"
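The two exports can be collapsed into one step. A minimal sketch, assuming a POSIX shell; the hadoop_classpath helper name and the /usr/local/hadoop fallback are placeholders of mine, not from the tutorial:

```shell
# Build the javac classpath from a Hadoop install directory.
# hadoop_classpath is a hypothetical helper; entry order matches the two exports above.
hadoop_classpath() {
    printf '%s:%s\n' "$1/share/hadoop/mapreduce/*" "$1/share/hadoop/tools/lib/*"
}

# /usr/local/hadoop is only a placeholder default; use your own install path.
HADOOP_HOME="${HADOOP_HOME:-/usr/local/hadoop}"
CLASSPATH="$(hadoop_classpath "$HADOOP_HOME")"
export CLASSPATH
echo "$CLASSPATH"
```

The wildcards stay quoted on purpose: javac expands each `*` entry to the jars in that directory itself, so the shell should not expand them first.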

When I used the jar command, it failed at first. This post helped me register it with alternatives: http://johnglotzer.blogspot.in/2012/09/alternatives-install-gets-stuck-failed.html

Used the hadoop fs -put command to copy the test.txt file onto the Hadoop filesystem (HDFS), per http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/
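The upload step can be sketched as below. The hdfs_upload helper, the HADOOP_CMD dry-run hook and the /user/$USER/input target directory are illustrative assumptions; only hadoop fs -put and test.txt come from my notes.

```shell
# Upload a local file into an HDFS directory, creating the directory first.
# HADOOP_CMD is a hypothetical hook: set HADOOP_CMD=echo to print the commands only.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

hdfs_upload() {
    src="$1"
    dest_dir="$2"
    $HADOOP_CMD fs -mkdir -p "$dest_dir" &&
    $HADOOP_CMD fs -put "$src" "$dest_dir/" &&
    $HADOOP_CMD fs -ls "$dest_dir"
}

# Example call (the target directory is an assumption, adjust to your HDFS layout):
# hdfs_upload test.txt "/user/$USER/input"
```

Listing the directory afterwards is a cheap way to confirm the file actually landed before pointing the job at it.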

When running the MapReduceAvroWordCount application, I got errors similar to those described at http://stackoverflow.com/questions/20586920/hadoop-connecting-to-resourcemanager-failed

Added yarn.resourcemanager.address to the YARN config, per the answer.
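For reference, a sketch of what that property looks like in yarn-site.xml, written out from the shell. The localhost host is my assumption for a single-node setup; 8032 is the ResourceManager port that shows up in the retry messages in this journal. The output path is a sample location so that no real config gets overwritten.

```shell
# Write a sample yarn-site.xml holding only the ResourceManager address.
# OUT and RM_ADDRESS are hypothetical variables; localhost is an assumed host.
OUT="${OUT:-${TMPDIR:-/tmp}/yarn-site.sample.xml}"
RM_ADDRESS="${RM_ADDRESS:-localhost:8032}"

cat > "$OUT" <<EOF
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>$RM_ADDRESS</value>
  </property>
</configuration>
EOF
echo "wrote $OUT"
```

The property would then be merged by hand into the real yarn-site.xml under the Hadoop config directory, followed by a restart of the YARN daemons.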

The jps command (see http://stackoverflow.com/questions/11738070/hadoop-cannot-use-jps-command) shows that the Hadoop services appear to be running.
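Eyeballing jps output gets tedious after a few restarts. A small sketch that lists which daemons are missing; the missing_daemons name and the daemon list (a single-node HDFS plus YARN setup) are my assumptions:

```shell
# Print the expected Hadoop daemons that are absent from `jps` output read on stdin.
# The daemon list assumes a single-node HDFS + YARN setup.
missing_daemons() {
    out="$(cat)"
    for d in NameNode DataNode ResourceManager NodeManager; do
        # -w keeps "SecondaryNameNode" from counting as "NameNode"
        printf '%s\n' "$out" | grep -qw "$d" || echo "$d"
    done
}

# Check the live JVM list only when jps is on the PATH.
if command -v jps >/dev/null 2>&1; then
    jps | missing_daemons
fi
```

A daemon appearing in jps only shows that its JVM is alive, not that clients can reach it on the configured address.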

Stopped and restarted the services per http://solaimurugan.blogspot.com/2014/05/step-by-step-instruction-how-startstop.html

This time I didn't get the same error, but the job seems stuck.

Reading http://www.alexjf.net/blog/distributed-systems/hadoop-yarn-installation-definitive-guide/

Issued stop-all.sh and modified the Hadoop config files to match the single-node installation.

The tests under the single-node Testing section gave the same error as before: INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032.

Reinstalling to local user dir per alexjf blog.

Still attempts to connect to 0.0.0.0/0.0.0.0:8032

Reinstalled Fedora, set a static IP address, and installed the JDK per http://tecadmin.net/install-java-8-on-centos-rhel-and-fedora/

Enabled ssh per https://docs.fedoraproject.org/en-US/Fedora/16/html/System_Administrators_Guide/s2-ssh-configuration-sshd.html

Tried the tests and got an error this time.

Attempted to disable IPv6 using http://www.itsprite.com/linuxhow-to-disable-ipv6-networking-on-redhatcentosfedoraubuntu-linux-system/ but this resulted in OS instability. Forced a reboot.

Could not log in.

Tried reinstalling the OS.

Installed the JDK and enabled ssh. Tried enabling passwordless ssh per http://allthingshadoop.com/2010/04/20/hadoop-cluster-setup-ssh-key-authentication/ but it would not work.

Tried the Hadoop installation and test from the alexjf blog. The test failed; pulling up the URL shows a memory limit being exceeded as the cause of failure.

Removed all settings from yarn-site.xml, then stopped and restarted the services using $HADOOP_PREFIX/sbin/stop-yarn.sh, stop-dfs.sh, and the matching start scripts, but the test then seemed to run indefinitely.
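Since I keep repeating this restart cycle, here is a sketch of it as one function. The restart_hadoop name and the RUN dry-run hook are my own; the script names are the standard ones under sbin, and the ordering (stop YARN, stop HDFS, start HDFS, start YARN) is my assumption about a sensible sequence.

```shell
# Restart HDFS and YARN: stop YARN first, then HDFS; start them in the reverse order.
# RUN is a hypothetical dry-run hook: RUN=echo prints the commands instead of running them.
RUN="${RUN:-}"

restart_hadoop() {
    prefix="$1"
    for script in stop-yarn.sh stop-dfs.sh start-dfs.sh start-yarn.sh; do
        $RUN "$prefix/sbin/$script"
    done
}

# Example call:
# restart_hadoop "$HADOOP_PREFIX"
```

Running it once with RUN=echo first shows exactly which scripts would be invoked.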

Used the first set of instructions at https://coderwall.com/p/a5kbtw/installing-apache-hadoop-on-linux to enable passwordless ssh.
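The usual shape of a passwordless ssh setup, as a sketch. This is not a copy of the linked instructions; the authorize_key helper is hypothetical, and the key paths in the commented usage are the common defaults.

```shell
# Append a public key to an authorized_keys file (only once) and tighten permissions.
# authorize_key is a hypothetical helper name.
authorize_key() {
    pub="$1"
    auth="$2"
    dir="$(dirname "$auth")"
    # The parent directory (normally ~/.ssh) must not be group/world writable.
    mkdir -p "$dir" && chmod 700 "$dir"
    touch "$auth"
    grep -qxF -e "$(cat "$pub")" "$auth" || cat "$pub" >> "$auth"
    chmod 600 "$auth"
}

# Typical use (commented out so nothing in ~/.ssh changes by accident):
# ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa"
# authorize_key "$HOME/.ssh/id_rsa.pub" "$HOME/.ssh/authorized_keys"
# ssh localhost true   # should return without a password prompt
```

If a password prompt persists, overly open permissions on ~/.ssh or authorized_keys are one common cause, since sshd's StrictModes check rejects them. That is a guess on my part; I did not record why the earlier attempt failed.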

Restarted the Hadoop services, and it didn't ask for a password this time. The test still fails.

Modified the test command to set --num_containers 1 --master_memory 512, and it now reports that it completed successfully.

Tried on a Mac per http://amodernstory.com/2014/09/23/installing-hadoop-on-mac-osx-yosemite/ and it works.

Back on the Fedora setup: retried MapReduceAvroWordCount and it worked. Next, try writing a schema based on a Java class: https://gist.github.com/QwertyManiac/4724582. How do I convert the schema to its JSON representation (an avsc file?) and use that instead of the Java class?
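As a starting point for that question, a sketch of what an avsc file could look like, written out from the shell. The WordCount record, its namespace, its two fields and the output path are all hypothetical, not taken from the gist or the tutorial. On the Java side, Avro's ReflectData.get().getSchema(SomeClass.class).toString(true) prints a class-derived schema as JSON, and new Schema.Parser().parse(file) reads an avsc file back into a Schema; I have not tried either in this job yet.

```shell
# Write a sample Avro schema (.avsc is plain JSON) for a hypothetical word-count record.
AVSC="${AVSC:-${TMPDIR:-/tmp}/wordcount.sample.avsc}"

cat > "$AVSC" <<'EOF'
{
  "type": "record",
  "name": "WordCount",
  "namespace": "example.avro",
  "fields": [
    {"name": "word", "type": "string"},
    {"name": "count", "type": "int"}
  ]
}
EOF
echo "wrote $AVSC"
```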

Sunday, March 6, 2016
