Posts

Showing posts from January, 2015

How to find Top 5 Salaries

There are three ways to find the top 5 salaries.

Method-1

WITH RANK_TBL AS (SELECT LASTNAME, SALARY, RANK() OVER (ORDER BY SALARY DESC) AS RANK_NUM FROM EMP) SELECT * FROM RANK_TBL WHERE RANK_NUM < 6;

What is the RANK function? The RANK function orders and ranks a result. In this example, the result is being ranked by SALARY in descending sequence, so the highest salary has a rank of 1. When there is a tie, all rows with the tied value receive the same rank, and the appropriate number of ranks is skipped. In this example, since there were 2 salary values in second place, the value of 3 is skipped. The result set may be ordered using an ORDER BY for the entire SELECT, and this order need not be the same as the column being ranked. When this is done, 2 sorts may need to be performed to achieve the desired result.

Method-2

WITH RANK_TBL AS (SELECT LASTNAME, SALARY, DENSE_RANK() OVER (ORDER BY SALARY DESC) AS RANK_NUM FROM EMP) SELECT * FROM RANK_TBL WHERE RAN...
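To see how RANK and DENSE_RANK differ on tied salaries, here is a minimal sketch that runs both queries against an in-memory SQLite database from Python. SQLite stands in for the actual database only for illustration (it needs SQLite 3.25 or later for window functions), and the EMP rows and the `top_salaries` helper are made up for this example.

```python
import sqlite3

# Hypothetical sample rows (assumption): two employees tie for second place.
ROWS = [
    ("HAAS", 5000), ("LUCCHESI", 4000), ("THOMPSON", 4000),
    ("GEYER", 3500), ("KWAN", 3000), ("PULASKI", 2500), ("STERN", 2000),
]

# Same shape as the queries in the post; {func} is RANK or DENSE_RANK.
QUERY = """
WITH RANK_TBL AS (
    SELECT LASTNAME, SALARY,
           {func}() OVER (ORDER BY SALARY DESC) AS RANK_NUM
    FROM EMP
)
SELECT LASTNAME, SALARY, RANK_NUM
FROM RANK_TBL
WHERE RANK_NUM < 6
ORDER BY RANK_NUM, LASTNAME
"""


def top_salaries(func):
    """Run the ranking query with the given function name.

    func must be one of the two fixed names below; it is never user input.
    """
    if func not in ("RANK", "DENSE_RANK"):
        raise ValueError("unsupported ranking function: " + func)
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE EMP (LASTNAME TEXT, SALARY INTEGER)")
        conn.executemany("INSERT INTO EMP VALUES (?, ?)", ROWS)
        return conn.execute(QUERY.format(func=func)).fetchall()
    finally:
        conn.close()


rank_rows = top_salaries("RANK")
dense_rows = top_salaries("DENSE_RANK")

for name, rows in (("RANK", rank_rows), ("DENSE_RANK", dense_rows)):
    print(name)
    for lastname, salary, rank_num in rows:
        print(f"  {rank_num}  {lastname:<10} {salary}")
```

With this sample data, RANK assigns 1, 2, 2, 4, 5 (rank 3 is skipped after the tie) and returns five rows, while DENSE_RANK assigns 1, 2, 2, 3, 4, 5 with no gap, so the same `RANK_NUM < 6` filter returns six rows. Which behavior counts as "top 5" depends on whether you want five rows or five distinct salary levels.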

Big SQL - Architecture and Tutorial (1 of 5)

Part-1 Introduction to Big SQL

When considering SQL-on-Hadoop, the most fundamental question is: What is the right tool for the job? For interactive queries that require a few seconds (or even milliseconds) of response time, MapReduce (MR) is the wrong choice. On the other hand, for queries that require massive scale and runtime fault tolerance, an MR framework works well. MR was built for large-scale processing on big data, viewed mostly as “batch” processing. As enterprises start using Apache Hadoop as a central data repository for all data (originating from sources as varied as operational systems, sensors, smart devices, metadata and internal applications), SQL processing becomes an optimal choice. A fundamental reason is that most enterprise data management and analytical tools rely on SQL. As a tool for interactive query execution, SQL processing (of relational data) benefits from decades of research, usage experience and optimizations. Clearly, the SQL sk...

Big SQL - Hadoop, JDBC, ODBC

What is Big SQL? This question occurs to every software professional. We all know what SQL is: a language used to access data in an RDBMS. Big SQL provides ANSI SQL access, via JDBC or ODBC, to data across any system from Hadoop, seamlessly, whether that data resides in Hadoop or in a relational database. This means that developers familiar with the SQL programming language can access data in Hadoop without having to learn new languages or skills.

There are different types of queries in Big SQL:

Point queries - These are queries that need to return very fast, like HBase queries, for example. For these types of queries, you cannot use MapReduce.
Big ad-hoc queries - In larger, more complex jobs, MapReduce parallelism becomes very important for breaking down these massive data sets.
Standards-compliant via JDBC - This is how most applications access databases, and in this usage pattern you can use the same interface to access your Hadoop-based data store.
St...