Big Data Updates

Lesson Posted on 16 Sep IT Courses/Hadoop IT Courses/Big Data IT Courses/Hadoop/Hadoop Testing IT Courses/Big Data/Big Data Testing

How to create UDF (User Defined Function) in Hive

Sachin Patil

Proficient Technical Trainer and Developer skilled at developing training programs, classes and modules...

1. User Defined Function (UDF) in Hive using Java.

2. Download hive-0.4.1.jar and add it to the project: lib -> Build Path -> Add JAR to libraries.

3. Q: Find the cube of the number passed:

import org.apache.hadoop.hive.ql.exec.UDF;

// Hive calls evaluate() once per row with the column value.
public class Cube extends UDF {
    public int evaluate(int number) {
        return number * number * number;
    }
}

4. Create the JAR file and copy it to HDFS:
terminal > sudo hadoop fs -put /home/cloudera/Desktop/Cube.jar /user/cloudera/

5. Add the JAR to the Hive session (syntax: ADD JAR EXPORTED_FILE_NAME.jar;):
hive> ADD JAR hdfs://quickstart.cloudera:8020/user/cloudera/Cube.jar;

6. Check that the JAR is listed:
hive> list jars;

7. Create a temporary function (syntax: CREATE TEMPORARY FUNCTION func_name AS 'fully_qualified_class_name';):
hive> CREATE TEMPORARY FUNCTION Cube AS 'Cube';

8. Use the temporary function:
hive> select Cube(marks) from student;

9. Create a permanent function:
hive> CREATE FUNCTION Cube AS 'Cube' USING JAR 'hdfs://quickstart.cloudera:8020/user/cloudera/Cube.jar';

10. Use the permanent function:
hive> select Cube(marks) from student;
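
One thing the Cube example above does not cover: Hive columns can hold NULL, and an int result overflows silently once the input goes above 1290. The following is only a sketch of evaluate logic that handles both cases. It is written without the Hive dependency so that it can be run on its own, and the class name CubeLogic and the method name cube are made up for illustration. In a real UDF the same body would presumably sit inside evaluate, using Integer and Long instead of int.

```java
public class CubeLogic {

    /**
     * Cubes a possibly-null input (hypothetical helper, not part of Hive).
     * Returns null for a null input, mirroring SQL NULL semantics, and
     * throws ArithmeticException if the result does not fit in a long.
     */
    public static Long cube(Integer number) {
        if (number == null) {
            return null;
        }
        long n = number.longValue();
        // n * n always fits in a long for int inputs; the final multiply may not.
        return Math.multiplyExact(n * n, n);
    }

    public static void main(String[] args) {
        System.out.println(cube(3));    // prints 27
        System.out.println(cube(-2));   // prints -8
        System.out.println(cube(null)); // prints null
    }
}
```

Returning a wider type than the input is a design choice for this sketch; a UDF that must return int could instead return null or fail when the result is out of range.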


Lesson Posted on 16 Sep IT Courses/Big Data IT Courses/Hadoop/Hadoop Testing Tuition/BTech Tuition/Big Data Analytics

Use of Piggybank and Registration in Pig

Sachin Patil

What is a Piggybank?

Piggybank is a JAR containing a collection of user-contributed UDFs that is released along with Pig. These UDFs are not included in the Pig JAR, so we have to register them manually in our script.

1. Download piggybank.jar.

2. Copy this JAR to /usr/lib/pig/lib:
Terminal > sudo cp /home/cloudera/Desktop/piggybank.jar /usr/lib/pig/lib/

3. Register this JAR with Pig:
Terminal > pig
Grunt > REGISTER /usr/lib/pig/lib/piggybank.jar;

4. Now we are set to use the Piggybank UDFs, for example to process a CSV file in Pig:

Grunt > tweets = load '/user/cloudera/tweets.csv' using org.apache.pig.piggybank.storage.CSVExcelStorage() as (date:chararray, timing:chararray, Tweet_Text:chararray, Type:chararray, Media_Type:chararray, Hashtags:chararray, Tweet_Id:long,
Tweet_Url:chararray, twt_favourites:long, Retweets:long, col1:chararray, col2:chararray);

5. Dump the result:

Grunt > Dump tweets;
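
Why use a CSV-aware loader for tweets rather than splitting on commas? Tweet text often contains commas and quotes, so a plain split would break the columns. The sketch below illustrates the kind of Excel-style quoting rules such a loader has to apply. It is not Piggybank's implementation: the class CsvLineSketch and the method parseLine are hypothetical, and it handles only a single line with double-quote quoting.

```java
import java.util.ArrayList;
import java.util.List;

public class CsvLineSketch {

    /** Splits one CSV line into fields, honouring "quoted, fields" and "" escapes. */
    public static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // doubled quote is a literal quote
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas inside quotes are data
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // The quoted tweet text stays one field despite its comma.
        System.out.println(parseLine("2017-01-01,\"Hello, world\",42"));
    }
}
```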


Lesson Posted on 04 Aug Functional Training/Data Analytics IT Courses/Big Data IT Courses/Programming Languages/Python IT Courses/R Programming IT Courses/Scala Training IT Courses/Hadoop IT Courses/Big Data/Big Data Testing IT Courses/Tableau IT Courses/Data Modeling

Analytics Expert Program

Bestintown Analytics Private Limited

Best in Town Analytics is a leading Data Science training and consulting firm with a proven track record....

Analytics Expert Program is the best program for anyone to get started and learn Analytics from scratch to implementation.

Key Program Benefits:
• Essential Math and Stats for Analytics - Unique Thinking in Analytics approach using innovative techniques and Excel

• Business Intelligence - Learn how to transform, reshape, prepare data for modelling using Power BI or Tableau

• Programming for Data Analytics - Understand how to analyse data using R or Python

• Predictive Modeling - Learn how to convert data to insights by applying fundamental statistical learning techniques

• Advanced Machine Learning - Improve model accuracy and performance by using cutting-edge ML algorithms and their applications

• Other Data Analytics Aspects - Get an introduction to AI Applications, Big Data Ecosystem and Big Data Analytics.

Course Modules:
• Module 1: Essential Math and Stat for Analytics: Provide strong statistical thinking with a good understanding of the applications of probability, calculus and linear algebra in model building. Build a strong reasoning framework to convert any business problem into an inferential analytics framework.

• Module 2: R for Data Science: Provide a solid foundation in programming concepts required for a successful data analyst and data scientist. We learn to use R to perform data analysis and visualisation. Additionally, we inculcate skills to investigate data issues interactively using R Studio.

• Module 3: Python for Data Science: Provide a solid foundation in programming concepts required for a successful data analyst and data scientist. We learn to use Python to perform data analysis and visualisation. Additionally, we inculcate skills to investigate data issues interactively using Spyder.

• Module 4: Machine Learning Algorithms & Implementation: Convert raw data into insights and help understand what factors drive a specific business outcome and to what extent. We help in building strategies to link these models to business actions.

• Module 5: Other Aspects of Data Science: A robust understanding of Data Science also requires knowledge of a few other topics. This section focuses on the other important topics a Data Science professional needs.



Looking for Big Data Training

Find best Big Data Training in your locality on UrbanPro.


Asked on 31 May IT Courses/Big Data

Firstly, congratulations on scoring 8 bands.

Yes, 2 months is more than enough to achieve the required PTE score with proper guidance and practice. First assess which PTE modules you are weakest in, then plan accordingly.


Lesson Posted on 22 May IT Courses/Big Data

Case Study: TIBCO

Tecksphere

We are taking this chance to introduce you to Tecksphere Training and Consulting Services, a US based...

OVERVIEW

Our client is a leading shoe retailer operating over 3,500 stores across the United States of America. The retailer offers a wide variety of fashion accessories and footwear products, with an emphasis on modern style and quick response. The client operates many warehouse facilities and a supply chain managed by a Warehouse Management System.

BACKGROUND

Initially, the client used a traditional batch processing solution to execute outstanding orders. The mainframe system handled the data, which was fed into the Warehouse Management System during a scheduled window in the early morning or at night. E-Commerce orders were processed the same way, which increased the volume of batch data. The resulting load imbalance led to several time and processing constraints in the Warehouse Management System. In addition, issues arose when joining the data, because data from the various distribution centres had to be combined. The client needed a solution that was reliable and architecturally compatible between the Eastern Distribution Center and Western Distribution Center solutions. The client approached TeckSphere and asked us to integrate the incoming ticket requests from the mainframe system into a real-time system so that outstanding orders could be processed quickly.

TECKSPHERE SOLUTION

TeckSphere recommended a real-time solution that uses the TIBCO suite to integrate the enterprise data management products. The solution was built and implemented separately for the two Warehouse Management Systems and deployed to a central enterprise integration management and administration application. While processing E-Commerce orders, the mainframe system generates a pick ticket file, and the data in the pick ticket file is populated into the Warehouse Management System staging table after the ETL procedure.

Our team employed a Publisher-Subscriber architecture to collect, examine and transform the incoming data before publishing it to a Java Messaging Service (JMS) server and finally uploading it to the target system. A TIBCO adapter is used in the file transfer operation to ensure reliability. TIBCO File Server is used to parse the transaction, generate the order message, transform the message into a canonical format and publish the individual transactions. The subscribing Business Information Warehouse converts the message into the Eastern Distribution Center and Western Distribution Center formats and places the transactions into their corresponding Warehouse Management System staging tables. The mainframe system is then alerted that the process is complete by firing a trigger.

Based on the business needs, our TeckSphere experts developed a complete data transformation and integration solution. The solution is reusable and adheres to retail industry standards.
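
To make the publish-subscribe flow described above more concrete, here is a minimal in-memory sketch. It does not use TIBCO or JMS APIs. The Broker class stands in for the messaging server, and the topic name, the "orderId|sku|qty" input layout, the canonical format and the EDC/WDC prefixes are all invented for illustration and are not taken from the actual project.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

public class PubSubSketch {

    /** Tiny in-memory topic broker, a stand-in for a real messaging server. */
    static class Broker {
        private final Map<String, List<Consumer<String>>> subscribers = new HashMap<>();

        void subscribe(String topic, Consumer<String> handler) {
            subscribers.computeIfAbsent(topic, t -> new ArrayList<>()).add(handler);
        }

        void publish(String topic, String message) {
            List<Consumer<String>> handlers = subscribers.get(topic);
            if (handlers == null) {
                return; // nobody is listening on this topic
            }
            for (Consumer<String> handler : handlers) {
                handler.accept(message);
            }
        }
    }

    /** Turns a raw "orderId|sku|qty" line into an assumed canonical form. */
    static String toCanonical(String rawLine) {
        String[] parts = rawLine.split("\\|");
        return "order=" + parts[0].trim() + ";sku=" + parts[1].trim() + ";qty=" + parts[2].trim();
    }

    /** Publishes each line once; every subscriber writes its own staging row. */
    static List<String> run(List<String> rawLines) {
        Broker broker = new Broker();
        List<String> staging = new ArrayList<>();
        broker.subscribe("orders", msg -> staging.add("EDC:" + msg)); // eastern format
        broker.subscribe("orders", msg -> staging.add("WDC:" + msg)); // western format
        for (String line : rawLines) {
            broker.publish("orders", toCanonical(line));
        }
        return staging;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("1001 | SHOE-42 | 3")));
    }
}
```

The point of the pattern is that the publisher only knows the canonical format, so a further distribution centre could be added as one more subscriber without touching the publishing side.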

OUTCOME

  • Retail orders are processed in real time.
  • Enhanced order management support.
  • Well-organized and centralized distribution operations.
  • Improved system scalability and data consolidation.
  • Reduced overall system usage by avoiding use of the Warehouse Management System.

TECHNOLOGIES USED

  • TIBCO BusinessWorks
  • TIBCO DataExchange
  • TIBCO Rendezvous
  • TIBCO EMS
  • TIBCO Object Star


Lesson Posted on 28 Apr IT Courses/Big Data IT Courses/Apache Spark IT Courses/Hadoop

Let's look at Apache Spark's competitors: who are the top competitors to Apache Spark today?

Biswanath Banerjee

We have trained more than 1000 students on Big Data Technologies - Hadoop ecosystem, Apache Spark, Tableau,...

Apache Spark is the most popular open source product today for working with Big Data. More and more Big Data developers are using Spark to build solutions for Big Data problems. It is the de facto standard tool today. But are there any tools or products that can claim to be a close competitor to Apache Spark? Putting the question another way: if I am given a choice, can I, as a Big Data Architect, think beyond Apache Spark as the one tool for all my Big Data tasks?

I would like to analyse this question taking different use cases.

Firstly, consider the data itself. The data scientist or user gets data from some source, understands it, cleans it up and correlates it with other sources. The size of the data matters a lot here. If the data is a few gigabytes (GB), we can choose between R, MySQL, SQLite or a Python notebook with pandas. Spark is more useful when the data is too large to process with such tools. For huge data, AWS Athena or Google BigQuery can be good competitors, but Spark has a richer feature set, so in that case Spark wins out over its competitors.

Secondly, consider data visualization and dashboards that provide monitoring and insights based on data streams. Spark does not measure up for this use case. BI tools like Tableau and Sisense provide much better support than Spark for streaming data, within a certain range of data set sizes.

Thirdly, as an ETL tool, Spark works well, especially when the data scales up considerably. But the user has to do a lot of work around Spark to make sure that everything runs smoothly. This usually means that when Spark is used for ETL, data is delayed considerably, by several hours or even a day. Apache Flink and Spark Streaming are two alternatives for this use case, but the user needs to write a lot of code and manage the cluster.

Fourth and last, when considering Machine Learning as a use case, we can break the entire process into the following steps:
1. Preparing your data set
2. Building your models
3. Using your models in a production environment

Spark is considered very good for the first two jobs: preparing the data sets and building the models. Apache Spark scores higher than other tools on data discovery and data manipulation, and it has rich Machine Learning libraries for building models. However, a key-value data store like Cassandra is also required here, which increases the complexity of the solution, and running these models in production for real-time predictions is where Spark struggles and the process usually falls apart. A few alternatives to Spark for this particular use case are Google's TensorFlow and scikit-learn.


Answered on 08 Apr IT Courses/SAP/SAP HANA IT Courses/ETL IT Courses/Big Data

Big Data Tech

Senior Data Scientist

The future is Big Data or nothing. All companies are moving their workloads (data processing) from traditional RDBMSs to Big Data tools. The majority of use cases can be handled by Hive, Spark SQL and Sqoop, which together provide a complete ETL pipeline for processing structured data.

Lesson Posted on 05 Mar IT Courses/Big Data IT Courses/Cloud Computing IT Courses/Cloud Security

Cloud Computing

Namrata

Introduction:

In the online world, we get information with just one click. But where is all this information stored? How can we store so much data from anywhere and access it from everywhere? No time limits, no distance restrictions, everything is so easy. How is it possible? How does it work? What is it? It has only one name and one answer: "CLOUD COMPUTING".

It is an environment that enables users to access and store all their data over the internet. Until now, we stored information on various hardware such as HDDs (hard drives), pen drives, CDs and DVDs. Each of these traditional methods has its own advantages and disadvantages. Viruses, data corruption and data loss are common major problems with them, and storing large amounts of information on them always increases the cost. After a lot of research, storing information online is by far the best solution the industry has found. Careers in this field are booming.

Time Line:

  • 1996: The phrase "cloud computing" first appeared in an internal Compaq document.
  • 2000: Cloud computing came into existence.
  • 2006: Amazon.com released its Elastic Compute Cloud.
  • 2008: Google released Google App Engine.
  • 2010: Microsoft released Microsoft Azure (February 2010).
  • 2010: Rackspace Hosting and NASA jointly launched an open-source cloud-software initiative known as OpenStack (July 2010).
  • 2011: IBM announced the IBM SmartCloud framework.
  • 2012: Google Compute Engine was released in preview.

Services in Cloud Computing:

  • Infrastructure as a service (IaaS)
  • Platform as a service (PaaS)
  • Software as a service (SaaS)
  • Mobile "backend" as a service (MBaaS)
  • Serverless computing

Cloud Applications


Answered on 17 Jan IT Courses/Big Data

Ashok

Data Scientist , AI Project Expert

Big Data has good prospects, and such technologies will soon be disruptive. In the enterprise world, there are still challenges to overcome with data governance (of the data lake) and privacy.

Answered on 25/10/2017 IT Courses/Big Data

Gopal Raj

Big Data Architect

If you want to shape your career for the next 4-5 years, go for Apache Spark. Knowing Hadoop would be a big plus, since the Spark ecosystem integrates with YARN.

About UrbanPro

UrbanPro.com helps you to connect with the best Big Data Training in India. Post Your Requirement today and get connected.

Overview

Questions 270

Lessons 55

7,245 Followers


UrbanPro.com is India's largest network of most trusted tutors and institutes. Over 25 lakh students rely on UrbanPro.com to fulfill their learning requirements across 1,000+ categories. Using UrbanPro.com, parents and students can compare multiple Tutors and Institutes and choose the one that best suits their requirements. More than 6.5 lakh verified Tutors and Institutes are helping millions of students every day and growing their tutoring business on UrbanPro.com. Whether you are looking for a tutor to learn mathematics, a German language trainer to brush up your German language skills or an institute to upgrade your IT skills, we have got the best selection of Tutors and Training Institutes for you.