We have developed complete training courses for Big Data engineers over the years. The training materials have been tested and evaluated by various trainers and students.
- Sustainable Big Data Infrastructure (SBDI) vs Amazon Elastic Compute Cloud v1.0
- Apache Hadoop installation and benchmarking
- Data Mining training seminars (contact Dr Song at email@example.com)
- Statistical Analysis and Quantitative Analysis training seminars (contact Dr Song at firstname.lastname@example.org)
- General Programming Language training (contact Dr Song at email@example.com)
- Big Data Research Training (contact Dr Song at firstname.lastname@example.org)
Join Big Data Club
KOPO Certified Big Data Infrastructure developers
- Sashiraj Chandrasekaran: Big Data Architect, Hadoop, Amazon EC2
For training, development, research, and consulting services, please contact Dr Insu Song (email@example.com)
What is Big Data? It is any data that is too big and/or too complex to be processed fast enough on computers that you can afford to buy. The main idea is to divide the data into smaller pieces that can be handled by many cheaper computers. Obviously, we then need lots of computers working on the smaller pieces, plus methods to divide the tasks and data and to put the analysis results back together. That sounds complicated, but luckily you can simply use Apache Hadoop for big data analysis. An additional benefit of using Hadoop is that you can store everything, and store it faster, without having to think first about how to structure the data: you structure or represent the data in the form you need when you read it. This makes it ideal for processing unstructured data.
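The divide-and-combine idea above can be sketched in a few lines of Python. This is a toy word count, and the function names are ours, not a Hadoop API; in a real cluster each chunk would be processed on a different machine:

```python
from collections import Counter

def split_into_chunks(lines, n_chunks):
    """Divide the data into smaller pieces; in a real cluster each
    chunk would live on a different cheap machine."""
    return [lines[i::n_chunks] for i in range(n_chunks)]

def map_chunk(chunk):
    """Each worker counts words in its own piece of the data."""
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts

def reduce_counts(partial_counts):
    """Put the partial analysis results back together."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total

lines = ["big data is big", "data is everywhere"]
chunks = split_into_chunks(lines, 2)
result = reduce_counts(map_chunk(c) for c in chunks)
print(result["big"])   # 2
print(result["data"])  # 2
```

Hadoop automates exactly this pattern, plus the hard parts we skipped: distributing the chunks, restarting failed workers, and moving results across the network.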
Quick Start Big Data
It is surprisingly easy and free. Simply go to Hortonworks.com and download a virtual appliance: a pre-configured virtual machine that lets you do big data analysis right away. The virtual appliance comes with a pre-configured Linux system and the Hadoop service.
Here are the steps:
1. Go to http://hortonworks.com/products/hortonworks-sandbox/
2. Download the Sandbox for VirtualBox (1.9 GB).
3. Install Oracle VirtualBox if necessary.
4. Import the Sandbox into VirtualBox, following the import instructions.
5. Start the Sandbox. Just wait until the machine loads up; there is no need to log in. The boot-up screen shows instructions for the next steps.
6. When it finishes loading, open a web browser on your host machine and go to http://127.0.0.1:8888/
7. Follow the instructions there and use the Sandbox to do Big Data analysis with the Hadoop service. Of course, you need a lot of computers to do Big Data analysis for real, but you get the idea.
Additionally, you can watch the Hortonworks Sandbox video tutorial to get started with the Sandbox.
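On the Sandbox, a common way to run a Hadoop job from Python is Hadoop Streaming, which pipes lines of text through a mapper script and a reducer script (they are typically submitted with the Hadoop Streaming jar shipped with Hadoop). Here is a hedged sketch of the classic word-count pair, simulated locally so you can see the data flow:

```python
from itertools import groupby

def mapper(lines):
    """Mapper: emit one 'word<TAB>1' pair per word, the format
    Hadoop Streaming expects on stdout."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(pairs):
    """Reducer: Hadoop sorts mapper output by key before the reduce
    phase, so identical words arrive consecutively and can be summed."""
    keyed = (pair.split("\t") for pair in pairs)
    for word, group in groupby(keyed, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Simulate the shuffle/sort step that Hadoop performs between phases.
mapped = sorted(mapper(["big data is big", "data is everywhere"]))
for out in reducer(mapped):
    print(out)
```

In a real Streaming job, `mapper` and `reducer` would each read stdin and write stdout in separate scripts, and Hadoop would handle the sort and the distribution across machines.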
Google offers a Big Data analysis service called BigQuery. The implementation is based on Dremel.
To use this service, simply activate the BigQuery API service from your Google API Console as follows:
1. Log in to your Gmail account.
2. Open a new tab and go to the BigQuery sign-up page: https://developers.google.com/bigquery/sign-up
3. Follow the instructions to activate the BigQuery API service.
Don’t worry: you won’t be charged unless you activate billing. You can try out BigQuery using the provided sample data. Unless you have terabytes of data to process, there is really no need to use the BigQuery service; however, it is a good idea to try it out to learn its limitations and its potential for your big business ideas.
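As a sketch of what querying looks like once the API is activated, here is a minimal example using the `google-cloud-bigquery` Python client library (our choice of tool, not part of the sign-up steps above; the web console works too). It queries Google's public Shakespeare sample table and assumes you have application-default credentials configured:

```python
# Sketch only: assumes the BigQuery API is activated and Google Cloud
# application-default credentials are set up on your machine.
QUERY = """
    SELECT corpus, SUM(word_count) AS total_words
    FROM `bigquery-public-data.samples.shakespeare`
    GROUP BY corpus
    ORDER BY total_words DESC
    LIMIT 5
"""

def run_query():
    """Submit the query and return (corpus, total_words) rows.
    A query over this small sample table stays within the free tier."""
    # Requires: pip install google-cloud-bigquery
    from google.cloud import bigquery
    client = bigquery.Client()
    job = client.query(QUERY)          # starts an asynchronous query job
    return [(row.corpus, row.total_words) for row in job.result()]

# rows = run_query()  # runs the query once credentials are configured
```

The query scans only a few megabytes, so it is a safe way to explore the service before pointing it at your own data.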