Tuesday, March 29, 2016

AWS EC2, RoR, ruby...

Created an Ubuntu EC2 instance, then followed https://gorails.com/setup/ubuntu/14.04 to install and set up Ruby on Rails.
Also used https://www.digitalocean.com/community/tutorials/how-to-setup-ruby-on-rails-with-postgres
for the postgres setup:
In config/database.yml set:
production:
  <<: *default
  database: myrailsapp_production
  username: <linux user account here; also create a role with the same name in the postgres db via the "create role" command>
  password: <db password from the "create role" command>
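For what it's worth, the <<: *default line merges in whatever keys the default anchor defines earlier in database.yml. A standalone Ruby sketch of that merge behavior (the default values here are hypothetical):

```ruby
require 'yaml'

# Trimmed, hypothetical database.yml showing the anchor / merge-key pattern.
yml = <<~YAML
  default: &default
    adapter: postgresql
    encoding: unicode
    pool: 5

  production:
    <<: *default
    database: myrailsapp_production
    username: deploy
    password: secret
YAML

# aliases: true is needed on newer Psych versions to allow *default.
config = YAML.safe_load(yml, aliases: true)

# The merge key pulls the default keys into production:
puts config['production']['adapter']   # => postgresql
puts config['production']['database']  # => myrailsapp_production
```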

Added a security group "custom tcp rule" to permit port 3000 inbound. Still wasn't getting a response from Rails after running 'rails server'. netstat -tulpn showed it was listening only on 127.0.0.1. To make it listen on all interfaces, followed https://fullstacknotes.com/make-rails-4-2-listen-to-all-interface/ , i.e. run instead: rails server -b 0.0.0.0
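The -b flag matters because a socket bound to 127.0.0.1 only accepts loopback connections, while 0.0.0.0 accepts them on every interface. A quick standalone Ruby illustration (not Rails itself, just plain TCPServer):

```ruby
require 'socket'

# Bound to loopback only: reachable from the instance itself, not from outside.
loopback = TCPServer.new('127.0.0.1', 0)  # port 0 = any free port
# Bound to all interfaces: what `rails server -b 0.0.0.0` does.
wildcard = TCPServer.new('0.0.0.0', 0)

loopback_ip = loopback.addr[3]  # numeric bound address
wildcard_ip = wildcard.addr[3]

puts loopback_ip  # => 127.0.0.1
puts wildcard_ip  # => 0.0.0.0

loopback.close
wildcard.close
```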

should try this later.

Sunday, March 6, 2016

Hadoop, Avro experimentation

Followed http://hadooptutorial.info/avro-mapreduce-word-count-example/ but caution: I needed to set the javac CLASSPATH instead to:
export CLASSPATH="$HADOOP_HOME/share/hadoop/tools/lib/*"
export CLASSPATH="$HADOOP_HOME/share/hadoop/mapreduce/*:$CLASSPATH"

When I first used the jar command it failed. This post helped to get it registered with alternatives: http://johnglotzer.blogspot.in/2012/09/alternatives-install-gets-stuck-failed.html

Used hadoop fs -put command to put the test.txt file onto the hadoop filesystem per http://hortonworks.com/hadoop-tutorial/using-commandline-manage-files-hdfs/

When running the MapReduceAvroWordCount application, got errors similar to those described at http://stackoverflow.com/questions/20586920/hadoop-connecting-to-resourcemanager-failed
Added yarn.resourcemanager.address per the answer.
The jps command (see http://stackoverflow.com/questions/11738070/hadoop-cannot-use-jps-command) shows the hadoop services appear to be running.
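For reference, that property lives in yarn-site.xml; a hedged sketch, assuming a single-node setup where the ResourceManager runs locally (the localhost value is an assumption — 8032 is the default ResourceManager port the client was retrying):

```xml
<!-- yarn-site.xml (snippet); host value is an assumption for single-node -->
<property>
  <name>yarn.resourcemanager.address</name>
  <value>localhost:8032</value>
</property>
```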

This time didn't get the same error, but the job seems stuck.

Issued stop-all.sh and modified the hadoop config files to match a single-node installation.
The tests under the single-node Testing section gave the same error as before: INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032.
Reinstalled to a local user dir per the alexjf blog.
Still attempts to connect to 0.0.0.0/0.0.0.0:8032.
Reinstalled Fedora, set a static IP address, and installed the JDK per http://tecadmin.net/install-java-8-on-centos-rhel-and-fedora/
Tried the tests; got an error this time.
Attempted to disable IPv6 using http://www.itsprite.com/linuxhow-to-disable-ipv6-networking-on-redhatcentosfedoraubuntu-linux-system/ but it resulted in OS instability. Forced a reboot.
Could not log in.
Reinstalled the OS.
Installed the JDK and enabled ssh. Tried enabling passwordless ssh per http://allthingshadoop.com/2010/04/20/hadoop-cluster-setup-ssh-key-authentication/
but it would not work.
Tried the hadoop installation and test from the alexjf blog. It failed the test; pulling up the URL shows a memory limit exceeded as the cause of the failure.
Removed all settings from yarn-site.xml and stopped/restarted using $HADOOP_PREFIX/sbin/stop-yarn.sh, stop-dfs.sh, then the start scripts, etc., but the job seemed to run forever.
Used the first set of instructions at https://coderwall.com/p/a5kbtw/installing-apache-hadoop-on-linux to enable passwordless ssh.
Restarted the hadoop services; this time it didn't ask for a password. The test still fails.
Modified the command to set --num_containers 1 --master_memory 512 and it says completed successfully.
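The memory-limit failure plus the --master_memory 512 workaround suggest the containers were asking for more memory than the NodeManager would allocate. A hedged yarn-site.xml sketch for a small VM (the 2048 MB values are assumptions, not from the blog):

```xml
<!-- yarn-site.xml (snippet); sizes are assumptions for a small test VM -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>
```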

Tried on a Mac now per http://amodernstory.com/2014/09/23/installing-hadoop-on-mac-osx-yosemite/ and it works.

Back on the Fedora setup; retried MapReduceAvroWordCount and it worked. Next, try writing a schema based on a Java class: https://gist.github.com/QwertyManiac/4724582. How to convert the schema to a JSON representation (avsc file?) and use that instead of the Java class?
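On that last question: the .avsc file is just the schema's JSON representation (Avro's reflection support can emit it from a Java class, e.g. ReflectData.get().getSchema(SomeClass.class).toString()). A hypothetical .avsc for a word-count record might look like:

```json
{
  "type": "record",
  "name": "WordCount",
  "namespace": "example.avro",
  "fields": [
    {"name": "word",  "type": "string"},
    {"name": "count", "type": "int"}
  ]
}
```

Loading that file with new Schema.Parser().parse(...) should then give the same schema without needing the Java class on the classpath.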