Sentiment Analysis of Tweets

Entailed in this project is a methodology for doing Data Analytics on twitter hashtags. All artifacts (code, algorithms and data) are released under Apache 2. The artifacts are available at https://github.com/phumlani-mbabela/data-analytics .

The primary objective is to stream(download) tweets referencing the #ZumaMustFall hashtag, the secondary objective to is do data visualisation of the tweets. The traditional methods of processing such data are going to require a lot of coding(programming), as a result, I choose to use the Cloudera Hadoop Ecosystem, the VM is available here. The cloudera VM comes equipped with the Hadoop Ecosystem. This simplifies much of the work to be done, we need to use Flume to stream the tweets from twitter(Firehose) and save the tweets to hdfs, after we are going to use Zeppelin do visualise the data using Hive.

In the second phase of this project we are going to use machine learning to determine a tweet’s sentiment, meaning if a tweet is positive or negative and apply statistical regression using Spark-ML.

The Hadoop Ecosystem(Cloudera)

Cloudera QuickStart virtual machines (VMs) includes “virtually” every application you need for Big Data or Machine Learning related tasks, cloudera is a our execution environment.

The Cloudera cluster comes pre-loaded with Spark, Impala, Crunch, Hive, Pig, Sqoop, Kafka, Flume, Kite, Hue, Oozie, DataFu, and many others (See a full list). In addition the cluster also comes with Python (2.6, 2.7, and 3.4), Perl 5.10, Elephant Bird, Cascading 2.6, Brickhouse, Hive Swarm, Scala 2.11, Scalding, IDEA, Sublime, Eclipse, and NetBeans. This is an ideal toolset for data hacking.

The articles defining the source code and data can be found at https://pmbabela.wordpress.com/big-data-articles/

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
data		data
src/main		src/main
LICENSE		LICENSE
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sentiment Analysis of Tweets

About

Releases

Packages

Languages

License

phumlani-mbabela/data-analytics

Folders and files

Latest commit

History

Repository files navigation

Sentiment Analysis of Tweets

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages