2.25.2015

SQL Tuning and Query Optimization Tips

Optimized SQL Writing Techniques

1. WHERE EXISTS vs IN
If the sub-query returns more records than the main query, use WHERE EXISTS. If the sub-query returns fewer records than the main query, use IN. EXISTS returns a Boolean value, whereas IN compares actual values.
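As a hedged illustration (the customers and orders tables and their columns here are hypothetical), the two forms look like this:

-- EXISTS: the sub-query only has to confirm that a matching row exists.
SELECT c.customer_id, c.customer_name
FROM   customers c
WHERE  EXISTS (SELECT 1
               FROM   orders o
               WHERE  o.customer_id = c.customer_id);

-- IN: the sub-query builds a list of values for the outer query to compare against.
SELECT c.customer_id, c.customer_name
FROM   customers c
WHERE  c.customer_id IN (SELECT o.customer_id
                         FROM   orders o);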

2. NOT IN, != (not equal)
Avoid NOT IN and != (or <>) on indexed columns. These predicates can prevent the optimizer from using indexes.
For example - you can write WHERE amount > 0 instead of WHERE amount != 0.

3. IS NULL / IS NOT NULL
Avoid NULL checks on indexed columns. They can prevent the optimizer from using indexes.

For example - avoid writing WHERE amount IS NOT NULL.

4. OR and Full table scan
Try to use IN or UNION in place of OR on indexed columns. OR conditions on indexed columns can cause the optimizer to fall back to a full table scan.
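For example (orders and its status column are hypothetical, with status assumed to be indexed):

-- Instead of:
SELECT * FROM orders WHERE status = 'NEW' OR status = 'PENDING';

-- Prefer:
SELECT * FROM orders WHERE status IN ('NEW', 'PENDING');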

5. Calculation on Indexed column
Try to avoid calculations on indexed columns; applying arithmetic or a function to the column prevents a plain index on that column from being used.
Example - write WHERE amount > 1800 * 3 instead of WHERE amount / 3 > 1800 (move the arithmetic to the constant side).

6. Using hints
Sometimes using optimizer hints can bring a benefit.

For example -
SELECT /*+ INDEX(t1 t1_abc) INDEX(t2 t2_abc) */
       COUNT(*)
FROM   t1, t2
WHERE  t1.col1 = t2.col2;

Sometimes, when you only need to show the first rows of a list -

SELECT /*+ FIRST_ROWS(10) */ column_nm1
FROM   table_nm1
WHERE  column_nm1 LIKE 'abc%';

If you want to actually understand what you are doing with your SQL, start playing with the following things (a short sketch follows the list) -

Execution plan - shows how the optimizer intends to run the query.
Trace - tells you how much time is spent on each statement.
SQL plan baseline - a stored, accepted execution plan for a statement.
SQL profile - auxiliary statistics that guide the optimizer toward a better plan.
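A minimal Oracle-flavoured sketch of the first two items (the orders table and its filter are hypothetical); in SQL Server you would look at the graphical execution plan in SSMS or use SET STATISTICS TIME ON instead:

-- Execution plan: ask the optimizer how it intends to run the query.
EXPLAIN PLAN FOR
  SELECT * FROM orders WHERE customer_id = 100;
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

-- Trace: record timing for the statements in the current session,
-- then review the generated trace file with TKPROF.
ALTER SESSION SET SQL_TRACE = TRUE;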


Indexing Tips:

1. Consider adding small, frequently accessed (but not frequently updated) columns to an existing index. This enables some queries to be answered from the index alone, without touching the table.

2. Regularly perform index monitoring. Check whether the indexes are actually being used by the application queries (see the sketch after this list).

3. After adding a new index, analyze the index and compute statistics.
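A hedged, Oracle-style sketch for tips 2 and 3 (the index name is hypothetical); SQL Server exposes similar information through sys.dm_db_index_usage_stats and UPDATE STATISTICS:

-- Tip 2: turn on usage monitoring for an index, then check later whether it was used.
ALTER INDEX orders_cust_idx MONITORING USAGE;
SELECT index_name, used FROM v$object_usage WHERE index_name = 'ORDERS_CUST_IDX';

-- Tip 3: gather statistics on a newly added index.
EXEC DBMS_STATS.GATHER_INDEX_STATS(ownname => USER, indname => 'ORDERS_CUST_IDX');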

Applying the Normalization Rules

The process of applying the rules of Normalization to your database design is called Normalizing the database, or just Normalization.

Why Normalization?
- To see if the tables are structured correctly.
- To divide your information items into the appropriate tables.
- To avoid data redundancy.


Five normal forms are widely accepted - the first normal form through the fifth normal form. But the first three are required for the majority of database designs.


1NF - First Normal Form

A number of things must be satisfied for a table to be in 1NF (a small sketch follows this list) -

1. Values must be atomic.
First normal form states that at every row and column intersection in the table there exists a single value, and never a list of values.

For example, you cannot have a field named Price in which you place more than one price. If you think of each intersection of rows and columns as a cell, each cell can hold only one value.

2. All entries in a column must be of the same type.
3. Each column name must be unique.
4. No two rows are identical.
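A small, hedged sketch of the atomic-value rule, using a hypothetical customer phone list: several numbers crammed into one cell violate 1NF, while one value per row/column intersection satisfies it.

-- Violates 1NF: a list of values in a single cell.
-- customer_id | phones
-- 1           | '555-1111, 555-2222'

-- Satisfies 1NF: one atomic value per cell.
CREATE TABLE customer_phone (
  customer_id INT         NOT NULL,
  phone       VARCHAR(20) NOT NULL,
  PRIMARY KEY (customer_id, phone)
);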


2NF - Second Normal Form

First of all, 1NF must be satisfied.

Every non-prime attribute must be fully functionally dependent on each candidate key.

Second normal form requires that each non-key column be fully dependent on the entire primary key, not on just part of the key. This rule applies when you have a primary key that consists of more than one column.

For example, suppose you have a table containing the following columns, where Order ID and Product ID form the primary key:

Order ID + Product ID (primary key)
Product Name

This design violates second normal form, because Product Name is dependent on Product ID, but not on Order ID, so it is not dependent on the entire primary key. You must remove Product Name from the table. It belongs in a different table (Products).
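A hedged sketch of that fix (column names and types are assumptions):

-- Violates 2NF: Product Name depends on Product ID alone, which is only part of the key.
-- order_id + product_id (PK) | product_name

-- Fix: Product Name moves to a Products table keyed on Product ID by itself.
CREATE TABLE products (
  product_id   INT PRIMARY KEY,
  product_name VARCHAR(100)
);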


3NF - Third Normal Form

First of all, 1NF and 2NF must be satisfied.

Third normal form requires that not only every non-key column be dependent on the entire primary key, but that non-key columns be independent of each other.

Another way of saying this is that each non-key column must depend on the primary key and nothing but the primary key.

For example, suppose you have a table containing the following columns:
ProductID (primary key)
Name
SRP
Discount

Assume that Discount depends on the suggested retail price (SRP). This table violates third normal form because a non-key column, Discount, depends on another non-key column, SRP. Column independence means that you should be able to change any non-key column without affecting any other column. If you change a value in the SRP field, the Discount would change accordingly, thus violating that rule. In this case Discount should be moved to another table that is keyed on SRP.
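A hedged sketch of that fix (column types are assumptions):

-- Violates 3NF: Discount depends on SRP, another non-key column.
-- product_id (PK) | name | srp | discount

-- Fix: move Discount to a table keyed on SRP.
CREATE TABLE srp_discount (
  srp      DECIMAL(10,2) PRIMARY KEY,
  discount DECIMAL(5,2)
);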

Refining the Database Design

Once you have the tables, fields, and relationships you need, you should create and populate your tables with sample data and try working with the information: creating queries, adding new records, and so on. Doing this helps highlight potential problems. For example, you might need to add a column that you forgot to insert during your design phase, or you may have a table that you should split into two tables to remove duplication.

See if you can use the database to get the answers you want. Create rough drafts of your forms and reports and see if they show the data you expect. Look for unnecessary duplication of data and, when you find any, alter your design to eliminate it.

As you try out your initial database, you will probably discover room for improvement. Here are a few things to check for:

Did you forget any columns? If so, does the information belong in the existing tables? If it is information about something else, you may need to create another table. Create a column for every information item you need to track. If the information can’t be calculated from other columns, it is likely that you will need a new column for it.

Are any columns unnecessary because they can be calculated from existing fields? If an information item can be calculated from other existing columns (for example, a discounted price calculated from the retail price), it is usually better to do just that and avoid creating a new column.

Are you repeatedly entering duplicate information in one of your tables? If so, you probably need to divide the table into two tables that have a one-to-many relationship.

Do you have tables with many fields, a limited number of records, and many empty fields in individual records? If so, think about redesigning the table so it has fewer fields and more records.

Has each information item been broken into its smallest useful parts? If you need to report, sort, search, or calculate on an item of information, put that item in its own column.

Does each column contain a fact about the table's subject? If a column does not contain information about the table's subject, it belongs in a different table.

Are all relationships between tables represented, either by common fields or by a third table? One-to-one and one-to-many relationships require common columns. Many-to-many relationships require a third table.

Specify Primary Keys

Each table should include a column or set of columns that uniquely identifies each row stored in the table. This is often a unique identification number, such as an employee ID number or a serial number. In database terminology, this information is called the primary key of the table. Access uses primary key fields to quickly associate data from multiple tables and bring the data together for you.

If you already have a unique identifier for a table, such as a product number that uniquely identifies each product in your catalog, you can use that identifier as the table’s primary key — but only if the values in this column will always be different for each record. You cannot have duplicate values in a primary key. For example, don’t use people’s names as a primary key, because names are not unique. You could easily have two people with the same name in the same table.

A primary key must always have a value; it cannot be null.

You should always choose a primary key whose value will not change. In a database that uses more than one table, a table’s primary key can be used as a reference in other tables. If the primary key changes, the change must also be applied everywhere the key is referenced. Using a primary key that will not change reduces the chance that the primary key might become out of sync with other tables that reference it.

Often, an arbitrary unique number is used as the primary key.

If you don’t have in mind a column or set of columns that might make a good primary key, consider using an automatically generated surrogate key: an AutoNumber field in MS Access, an IDENTITY column in SQL Server, or a sequence in Oracle. In each case the database assigns the value for you. Such an identifier is factless; it contains no factual information describing the row that it represents. Factless identifiers are ideal for use as a primary key because they do not change. A primary key that contains facts about a row — a telephone number or a customer name, for example — is more likely to change, because the factual information itself might change.
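A brief sketch of both options (table, column, and sequence names are hypothetical):

-- SQL Server: an IDENTITY column as a factless surrogate key.
CREATE TABLE customers (
  customer_id INT IDENTITY(1,1) PRIMARY KEY,
  first_name  VARCHAR(50),
  last_name   VARCHAR(50)
);

-- Oracle: a sequence plays the same role; its NEXTVAL supplies the key value
-- (for example, customer_seq.NEXTVAL in the INSERT statement or in a trigger).
CREATE SEQUENCE customer_seq START WITH 1 INCREMENT BY 1;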

In some cases, you may want to use two or more fields that, together, provide the primary key of a table. For example, an Order Details table that stores line items for orders would use two columns in its primary key: Order ID and Product ID. When a primary key employs more than one column, it is also called a composite key.

How to turn information items into database columns

The following list shows a few tips for determining your columns:

1. Don’t include calculated data:

In most cases, you should not store the result of calculations in tables. Instead, you can have Access perform the calculations when you want to see the result.

2. Store information in its smallest logical parts:

You may be tempted to have a single field for full names, or for product names along with product descriptions. If you combine more than one kind of information in a field, it is difficult to retrieve individual facts later. Try to break down information into logical parts; for example, create separate fields for first and last name, or for product name, category, and description.

Once you have refined the data columns in each table, you are ready to choose each table's primary key.

Dividing the information into Tables

To divide the information into tables, choose the major entities, or subjects:

For example, after finding and organizing information for a product sales database, suppose the major entities are the products, the suppliers, the customers, and the orders. Therefore, it makes sense to start out with these four tables: one for facts about products, one for facts about suppliers, one for facts about customers, and one for facts about orders.

When you design your database, always try to record each fact just once. If you find yourself repeating the same information in more than one place, such as the address for a particular supplier, place that information in a separate table. Once you have chosen the subject that is represented by a table, columns in that table should store facts only about the subject. For instance, the product table should store facts only about products.

Finding and Organizing the Required Information for Database Design

To find and organize the information required, start with your existing information.

For example, you might record purchase orders in a ledger or keep customer information on paper forms in a file cabinet. Gather those documents and list each type of information shown (for example, each box that you fill in on a form). If you don't have any existing forms, imagine instead that you have to design a form to record the customer information.

For example, suppose you currently keep the customer list on index cards. Examining these cards might show that each card holds a customer's name, address, city, state, postal code and telephone number. Each of these items represents a potential column in a table.

As you prepare this list, don’t worry about getting it perfect at first. Instead, list each item that comes to mind. If someone else will be using the database, ask for their ideas, too.

For example, suppose you give customers the opportunity to opt in to (or out of) periodic e-mail updates, and you want to print a listing of those who have opted in. To record that information, you add a “Send e-mail” column to the customer table. For each customer, you can set the field to Yes or No. A key point to remember is that you should break each piece of information into its smallest useful parts. In the case of a name, to make the last name readily available, you will break the name into two parts — First Name and Last Name.

Think about the questions you might want the database to answer. For instance, how many sales of your featured product did you close last month? Where do your best customers live? Who is the supplier for your best-selling product? Anticipating these questions helps you zero in on additional items to record. After gathering this information, you are ready for the next step: dividing the information into tables.

Determining the purpose of your Database

It is a good idea to write down the purpose of the database on paper — its purpose, how you expect to use it, and who will use it. For a small database for a home based business, for example, you might write something simple like "The customer database keeps a list of customer information for the purpose of producing mailings and reports." If the database is more complex or is used by many people, as often occurs in a corporate setting, the purpose could easily be a paragraph or more and should include when and how each person will use the database. The idea is to have a well developed mission statement that can be referred to throughout the design process. Having such a statement helps you focus on your goals when you make decisions.

Creating the Table Relationships

Now that you have divided your information into tables, you need a way to bring the information together again in meaningful ways.
Microsoft Access, SQL Server, MySQL, and Oracle are all relational database management systems. In a relational database, you divide your information into separate, subject-based tables. You then use table relationships to bring the information together as needed.


Creating a one-to-many relationship

Consider this example: the Suppliers and Products tables in the product orders database. A supplier can supply any number of products. It follows that for any supplier represented in the Suppliers table, there can be many products represented in the Products table. The relationship between the Suppliers table and the Products table is, therefore, a one-to-many relationship.




Creating a many-to-many relationship

Consider the relationship between the Products table and Orders table.

A single order can include more than one product. On the other hand, a single product can appear on many orders. Therefore, for each record in the Orders table, there can be many records in the Products table. And for each record in the Products table, there can be many records in the Orders table. This type of relationship is called a many-to-many relationship because for any product, there can be many orders; and for any order, there can be many products. Note that to detect many-to-many relationships between your tables, it is important that you consider both sides of the relationship.

The subjects of the two tables — orders and products — have a many-to-many relationship. This presents a problem. To understand the problem, imagine what would happen if you tried to create the relationship between the two tables by adding the Product ID field to the Orders table. To have more than one product per order, you need more than one record in the Orders table per order. You would be repeating order information for each row that relates to a single order — resulting in an inefficient design that could lead to inaccurate data. You run into the same problem if you put the Order ID field in the Products table — you would have more than one record in the Products table for each product. How do you solve this problem?

The answer is to create a third table, often called a junction table or link table, that breaks down the many-to-many relationship into two one-to-many relationships. You insert the primary key from each of the two tables into the third table. As a result, the third table records each occurrence or instance of the relationship.




Each record in the Order Details table represents one line item on an order. The Order Details table’s primary key consists of two fields — the foreign keys from the Orders and the Products tables. Using the Order ID field alone doesn’t work as the primary key for this table, because one order can have many line items. The Order ID is repeated for each line item on an order, so the field doesn’t contain unique values. Using the Product ID field alone doesn’t work either, because one product can appear on many different orders. But together, the two fields always produce a unique value for each record.

In the product sales database, the Orders table and the Products table are not related to each other directly. Instead, they are related indirectly through the Order Details table. The many-to-many relationship between orders and products is represented in the database by using two one-to-many relationships:

The Orders table and Order Details table have a one-to-many relationship. Each order can have more than one line item, but each line item is connected to only one order.

The Products table and Order Details table have a one-to-many relationship. Each product can have many line items associated with it, but each line item refers to only one product.

From the Order Details table, you can determine all of the products on a particular order. You can also determine all of the orders for a particular product.
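A hedged DDL sketch of the three tables described above (column names and types are assumptions):

CREATE TABLE orders (
  order_id   INT PRIMARY KEY,
  order_date DATE
);

CREATE TABLE products (
  product_id   INT PRIMARY KEY,
  product_name VARCHAR(100)
);

-- The junction table: one row per line item, a composite primary key,
-- and each half of the key is a foreign key back to its parent table.
CREATE TABLE order_details (
  order_id   INT NOT NULL REFERENCES orders (order_id),
  product_id INT NOT NULL REFERENCES products (product_id),
  quantity   INT NOT NULL,
  PRIMARY KEY (order_id, product_id)
);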

Creating a one-to-one relationship

Another type of relationship is the one-to-one relationship. For instance, suppose you need to record some special supplementary product information that you will need rarely or that only applies to a few products. Because you don't need the information often, and because storing the information in the Products table would result in empty space for every product to which it doesn’t apply, you place it in a separate table. Like the Products table, you use the ProductID as the primary key. The relationship between this supplemental table and the Product table is a one-to-one relationship. For each record in the Product table, there exists a single matching record in the supplemental table. When you do identify such a relationship, both tables must share a common field.

When you detect the need for a one-to-one relationship in your database, consider whether you can put the information from the two tables together in one table. If you don’t want to do that for some reason, perhaps because it would result in a lot of empty space, the following list shows how you would represent the relationship in your design:
 
If the two tables have the same subject, you can probably set up the relationship by using the same primary key in both tables.
If the two tables have different subjects with different primary keys, choose one of the tables (either one) and insert its primary key in the other table as a foreign key.

Determining the relationships between tables helps you ensure that you have the right tables and columns. When a one-to-one or one-to-many relationship exists, the tables involved need to share a common column or columns. When a many-to-many relationship exists, a third table is needed to represent the relationship.

FAQ - Database

OLTP vs. OLAP

We can divide IT systems into transactional (OLTP) and analytical (OLAP). In general we can assume that OLTP systems provide source data to data warehouses, whereas OLAP systems help to analyze it.




OLTP (On-line Transaction Processing):

-  This system is characterized by a large number of short on-line transactions (INSERT, UPDATE, DELETE).

- OLTP System deals with operational data. Operational data are those data  involved in the operation of a particular system.

- In an OLTP system, data is frequently updated and queried, so a quick response to a request is highly expected. Since OLTP systems involve a large number of update queries, the database tables are optimized for write operations.

- The main emphasis for OLTP systems is put on very fast query processing, maintaining data integrity in multi-access environments and an effectiveness measured by number of transactions per second.

- To prevent data redundancy and update anomalies, the database tables are normalized (usually to 3NF), giving a set of tables that are normalized and fragmented. Normalization makes write operations on the database tables more efficient.

Example: In a banking System, you withdraw amount from your account. Then Account Number, Withdrawal amount, Available Amount, Balance Amount, Transaction Number etc are operational data elements.


OLAP
(On-line Analytical Processing):

- This system is characterized by relatively low volume of transactions.

- In OLAP database there is aggregated, historical data, stored in multi-dimensional schemas (usually star schema).

- Updates are very rare in a data warehouse. OLAP queries return aggregated information about what happened in the past over a period of time, which helps management with strategic decision making. Hence, OLAP queries have significant importance in strategic decision making.

Example: If we collect the last 10 years of data about flight reservations, the data can give us much meaningful information, such as trends in reservations. This may reveal useful information like peak travel times and what kinds of people are travelling in the various classes (Economy/Business), etc.


The following table summarizes the major differences between OLTP and OLAP system design.




Aspect         | OLTP                                    | OLAP
Application    | Operational: ERP, CRM, legacy apps, ... | Management Information System, Decision Support System
Typical users  | Staff                                   | Managers, Executives
Horizon        | Weeks, Months                           | Years
Refresh        | Immediate                               | Periodic
Data model     | Entity-relationship                     | Multi-dimensional
Schema         | Normalized                              | Star
Emphasis       | Update                                  | Retrieval

  
Another  table to understand the differences -

OLTP System - Online Transaction Processing (Operational System)
OLAP System - Online Analytical Processing (Data Warehouse)

Source of data
  OLTP: Operational data; OLTP systems are the original source of the data.
  OLAP: Consolidation data; OLAP data comes from the various OLTP databases.

Purpose of data
  OLTP: To control and run fundamental business tasks.
  OLAP: To help with planning, problem solving, and decision support.

What the data reveals
  OLTP: A snapshot of ongoing business processes.
  OLAP: Multi-dimensional views of various kinds of business activities.

Inserts and updates
  OLTP: Short and fast inserts and updates initiated by end users.
  OLAP: Periodic long-running batch jobs refresh the data.

Queries
  OLTP: Relatively standardized and simple queries returning relatively few records.
  OLAP: Often complex queries involving aggregations.

Processing speed
  OLTP: Typically very fast.
  OLAP: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes.

Space requirements
  OLTP: Can be relatively small if historical data is archived.
  OLAP: Larger, due to the existence of aggregation structures and historical data; requires more indexes than OLTP.

Database design
  OLTP: Highly normalized, with many tables.
  OLAP: Typically de-normalized, with fewer tables; uses star and/or snowflake schemas.

Backup and recovery
  OLTP: Back up religiously; operational data is critical to run the business, and data loss is likely to entail significant monetary loss and legal liability.
  OLAP: Instead of regular backups, some environments may simply reload the OLTP data as a recovery method.

2.24.2015

SSIS, SSAS and SSRS

SQL Server Business Intelligence implementation steps:
Collect the data (SSIS) -> Analyze (SSAS) -> View/Report (SSRS)
SSIS => Integration - ETL
SSAS => Analysis
SSRS => Reporting



SQL Server Integration Services (SSIS)

The SQL Server Integration Services (SSIS) is the data-warehousing arm, equipped with superior Extract, Transform and Load (ETL) capabilities. It provides the vehicle for moving data from one data source to another and changing the data, if necessary. The three components in the SSIS platform for starting the data integration process are the Import and Export Wizard, SSIS Designer and SSIS API Programming. The Import and Export Wizard simply transfers data from source to destination but does not include data transformation capabilities. The SSIS Designer is an integrated component of the Business Intelligence Development Studio used for the development and maintenance of Integration Services packages. The SSIS API Programming module allows you to code SSIS packages using any number of programming languages.


Project / Package (XML file)
  |
  |---> Task 1 ---+---> Extract/Read the data
  |               +---> Transform the data
  |               +---> Load the new form of data
  |
  |---> Task 2 ---> ...

So here the control flow is Task 1, Task 2, and so on, and the data flow (the ETL work) sits inside the control flow.

 

SQL Server Reporting Services (SSRS)

The SQL Server Reporting Services (SSRS) is a framework of reporting mechanisms such as the Report Builder, Report Designer, Report Manager and Report Server that work together through a Web interface to enable the development of concise interactive reporting solutions in print or Web format. Report Builder and Report Designer are two SSRS components for generating reports. Report Builder is a simple solution for the information worker or business user to create quick reports without the need to understand the core structure of the data. The Report Designer is a tool for developers, as it supports more complex custom report development. Using this tool requires an understanding of the Business Intelligence Development Studio shell in Visual Studio and the underlying structure of the data. According to Microsoft, the Report Server is the core process engine in SSRS that manages the processing and delivery of reports using processors. The Report Manager is an administrative tool that controls reporting services through a Web interface.


SQL Server Analysis Services (SSAS)

The SQL Server Analysis Services (SSAS) is a multidimensional analysis tool that features Online Analytical Processing, powerful data mining capabilities, and deeper dimensions to business information within a relational database. Multidimensional analysis is an OLAP technique that produces the ability to analyze large quantities of data by storing data in axes and cells instead of the traditional relational two-dimensional view in rows and columns. SSAS places predictive analytic capabilities in the hands of information workers by creating an instant connection to backend data using familiar applications such as Microsoft Excel and SharePoint for analysis, visual presentation and collaboration.

2.23.2015

Comparison between Oracle and Microsoft SQL Server

There are many similarities and differences between the two database platforms.  
Language 

SQL: Both systems use a version of Structured Query Language (SQL).

PLSQL vs T-SQL: Oracle uses Procedural Language/SQL (PL/SQL).
MS SQL Server uses Transact SQL (T-SQL), which is an extension of SQL originally developed by Sybase and used by Microsoft. Both are different “flavors” or dialects of SQL and both languages have different syntax and capabilities. The main difference between the two languages is how they handle variables, stored procedures, and built-in functions. 

PL/SQL in Oracle can also group procedures together into packages, which can’t be done in MS SQL Server. PL/SQL is comparatively complex and potentially more powerful, while T-SQL is much simpler and easier to use.
 

Transaction Control
 

Another one of the biggest differences between Oracle and MS SQL Server is transaction control. A transaction can be defined as a group of operations or tasks that should be treated as a single unit: for instance, a collection of SQL queries modifying records that all must be updated at the same time, where a failure to update any single record among the set should result in none of the records being updated. By default, MS SQL Server will execute and commit each command/task individually, and it will be difficult or impossible to roll back changes if any errors are encountered along the way. To properly group statements, the “BEGIN TRANSACTION” command is used to declare the beginning of a transaction, and a COMMIT statement is used at the end. The COMMIT statement writes the changed data to disk and ends the transaction. Within a transaction, ROLLBACK will discard any changes made within the transaction block. When properly used with error handling, ROLLBACK allows for some degree of protection against data corruption. After a COMMIT is issued, it is not possible to roll back any further than the COMMIT command.
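A minimal T-SQL sketch of this pattern (the accounts table is hypothetical):

BEGIN TRY
    BEGIN TRANSACTION;

    UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
    UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;

    COMMIT TRANSACTION;        -- both updates become permanent together
END TRY
BEGIN CATCH
    IF @@TRANCOUNT > 0
        ROLLBACK TRANSACTION;  -- any error discards both updates
END CATCH;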

Within Oracle, on the other hand, each new database connection is treated as a new transaction. As queries are executed and commands are issued, changes are made only in memory and nothing is committed until an explicit COMMIT statement is given (with a few exceptions related to DDL commands, which include “implicit” commits and are committed immediately). After the COMMIT, the next command issued essentially initiates a new transaction, and the process begins again. This provides greater flexibility and helps with error control as well, as no changes are committed to disk until the DBA explicitly issues the command to do so.
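The same idea in Oracle's implicit-transaction style (again, the accounts table is hypothetical):

-- A transaction is already open; changes stay uncommitted until COMMIT.
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
SAVEPOINT after_debit;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
ROLLBACK TO after_debit;  -- undoes only the second update
COMMIT;                   -- makes the remaining change permanent and ends the transaction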
 

Organization of Database Objects

The last difference I want to discuss is how the RDBMS organizes database objects. MS SQL Server organizes all objects, such as tables, views, and procedures, by database names. Users are assigned to a login which is granted access to the specific database and its objects. Also, in SQL Server each database has a private, unshared disk file on the server. In Oracle, all the database objects are grouped by schemas, which are subset collections of database objects, and all the database objects are shared among all schemas and users. Even though it is all shared, each user can be limited to certain schemas and tables via roles and permissions.

Can I install SQL Server on Linux and UNIX platforms?

No, SQL Server is only available for Windows platforms. However, the SQL Server ODBC driver enables you to access SQL Server from applications running on Linux and UNIX platforms.

SQL Server uses a protocol called Tabular Data Stream (TDS) for communication between client applications and SQL Server. Because TDS can be used over TCP/IP, remote client applications can communicate with SQL Server. It is not therefore necessary for client applications and SQL Server to run on the same machine.

ODBC, which provides a standard way for applications to access databases, is one of the available client library APIs that can provide an interface to TDS. Microsoft has supported ODBC in all versions of SQL Server and has committed to continue to support ODBC in future versions of SQL Server. Microsoft’s SQL Server Native Client, which enables Windows client applications to access SQL Server, contains an ODBC driver.

The SQL Server ODBC driver provides ODBC access to SQL Server for Linux and UNIX applications, and uses TDS to communicate with SQL Server.

If you want to use SQL Server, but do not have any Windows machines at your site, you can use SQL Azure instead. SQL Azure provides database functionality as a pay-as-you-go utility service and is built on SQL Server technologies. Because SQL Azure uses Tabular Data Stream (TDS), client applications can use the same tools and libraries to access SQL Azure Database as they do to access SQL Server. The SQL Azure ODBC driver enables you to access SQL Azure from Linux and UNIX platforms.



FAQ - SQL Server


Define Some DB Server tools of SQL server:
Profiler, DTA, SSMS, SAC, SSCM, PerfMon, DMVs, System procs

Profiler

SQL Server Profiler can perform various significant functions such as tracing what is running under the SQL Server Engine's hood, and finding out how queries are resolved internally and what scripts are running to accomplish any T-SQL command. The major functions this tool can perform have been listed below:

- Creating trace
- Watching trace
- Storing trace
- Replaying trace

A trace includes all the T-SQL scripts running on SQL Server. Because a trace captures everything running on the server, it often tends to become considerably large. Hence, it is always good practice to capture only the data that is actually required for analysis.


DTA - Database Tuning Advisor

In SQL Server, the DTA utility is the command-prompt version of Database Engine Tuning Advisor. It is designed to allow you to use Database Engine Tuning Advisor functionality in applications and scripts.

Like Database Engine Tuning Advisor, the DTA utility analyzes a workload and recommends physical design structures to improve server performance for that workload. The workload can be a plan cache, a SQL Server Profiler trace file or table, or a Transact-SQL script. Physical design structures include indexes, indexed views, and partitioning. After analyzing a workload, the DTA utility produces a recommendation for the physical design of databases and can generate the necessary script to implement the recommendation. Workloads can be specified from the command prompt with the -if or the -it argument. You can also specify an XML input file from the command prompt with the -ix argument. In that case, the workload is specified in the XML input file.


SSMS - SQL Server Management Studio

SSMS is a comprehensive utility for managing SQL Server objects that combines an easy-to-use graphical interface with rich scripting capabilities. Management Studio can be used to manage Database Engine, Analysis Services, Integration Services, and Reporting Services.


SAC - Surface Area Configuration

In the default configuration of new installations of SQL Server, many features are not enabled. SQL Server selectively installs and starts only key services and features, to minimize the number of features that can be attacked by a malicious user. A system administrator can change these defaults at installation time and also selectively enable or disable features of a running instance of SQL Server. Additionally, some components may not be available when connecting from other computers until protocols are configured.

In Microsoft SQL Server, there are a number of services, ports, and connections that can be configured to control how much “surface area” is exposed. For instance, the SQL Server Surface Area Configuration tool allows administrators to control connection access as Local connections only, or Local and Remote connections. 


SSCM - SQL Server Configuration Manager

SSCM is a tool to manage the services associated with SQL Server, to configure the network protocols used by SQL Server, and to manage the network connectivity configuration from SQL Server client computers. SQL Server Configuration Manager is a Microsoft Management Console snap-in that is available from the Start menu, or can be added to any other Microsoft Management Console display. Microsoft Management Console (mmc.exe) uses the SQLServerManager10.msc file in the Windows System32 folder to open SQL Server Configuration Manager.


PerfMon - Performance Monitor

Perfmon measures performance statistics on a regular interval, and saves those stats in a file. The database administrator picks the time interval, file format, and which statistics are monitored. After the stats are gathered over a period of time (hours or days), we can do analysis by opening the results in Excel and setting up some basic formulas.

Perfmon is not just for SQL Server tuning; system administrators use it to monitor performance on Windows itself, Exchange, file & print servers, and anything else that can run into bottlenecks. As a result, it is easy to find Perfmon information online, but it's not always specific to SQL Server. Since every application has its own set of statistics, it helps to get SQL-related tips.


DMVs - Dynamic management views

DMVs and functions return server state information that can be used to monitor the health of a server instance, diagnose problems, and tune performance.
Using DMVs, metadata can be exposed regarding the connections, sessions, transactions, statements and processes that are, or have been, executing against a database instance.
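A couple of hedged example queries (these DMVs exist in SQL Server 2005 and later):

-- Currently executing requests, with status and wait information.
SELECT session_id, status, command, wait_type, total_elapsed_time
FROM   sys.dm_exec_requests;

-- Active user sessions.
SELECT session_id, login_name, host_name, program_name
FROM   sys.dm_exec_sessions
WHERE  is_user_process = 1;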

System procs - System stored Procedures
Many administrative and informational activities in SQL Server can be performed by using system stored procedures.
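For example, a few commonly used ones:

EXEC sp_who2;        -- current sessions and what they are doing
EXEC sp_helpdb;      -- databases on the instance and their sizes
EXEC sp_spaceused;   -- space used in the current database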



2.04.2015

Hadoop


Hadoop is an open-source project from Apache that develops a software library for reliable, scalable, distributed computing systems capable of handling Big Data, and it provides one of the first viable platforms for Big Data analytics. Hadoop is already used by most Big Data pioneers. For example, LinkedIn currently uses Hadoop to generate over 100 billion personalized recommendations every week.

What Hadoop does is distribute the storage and processing of large data sets across groups or "clusters" of server computers using a simple programming model. The number of servers in a cluster can also be scaled easily as requirements dictate, from maybe 50 machines to perhaps 2,000 or more. Whereas traditional large-scale computing solutions rely on expensive server hardware with high fault tolerance, Hadoop detects and compensates for hardware failures or other system problems at the application level.


Technically, Hadoop consists of two key elements - 

1. HDFS: 
Hadoop Distributed File System (HDFS), which permits the high-bandwidth, cluster-based storage essential for Big Data computing.

2. MapReduce: 
The second part of Hadoop is then a data processing framework called MapReduce. 

Google developed MapReduce to support distributed computing on large data sets on computer clusters. Inspired by Google's MapReduce and Google File System (GFS) papers, Doug Cutting created Hadoop while he was at Yahoo.

Based on Google's search technology, this distributes or "maps" large data sets across multiple servers. Each of these servers then performs processing on the part of the overall data set it has been allocated, and from this creates a summary. The summaries created on each server are then aggregated in the so-termed "Reduce" stage. This approach allows extremely large raw data sets to be rapidly pre-processed and distilled before more traditional data analysis tools are applied.


At present, many Big Data pioneers are deploying a Hadoop ecosystem alongside their legacy IT systems in order to allow them to combine old and new data in new ways. However, in time, Hadoop may be destined to replace many traditional data warehouse and rigidly-structured relational database technologies and to become the dominant platform for many kinds of data processing.

Many organizations are unlikely to have the resources and expertise to implement their own Hadoop solutions. Fortunately they do not have to, as cloud solutions are already available. Offered by providers including Amazon, Netapp and Google, these allow organizations of all sizes to start benefiting from the potential of Big Data processing. Where public Big Data sets need to be utilized, running everything in the cloud also makes a lot of sense, as the data does not have to be downloaded to an organization's own system. For example, AWS already hosts many public data sets. These include US and Japanese Census data, and many genomic and other medical and scientific Big Data repositories.



Big Data

Why we need to know Big Data?

Big data will continue to grow in the coming years, due in part to the rise of the Internet of Things, which has the power to embed technology in practically anything. As ever-larger volumes of data are created, it's vital to know how to collect and analyze that data, particularly when it relates to customer preferences and business processes. No matter what industry you're in, you'll miss out on key marketing and decision-making opportunities by ignoring big data.



What is Big Data?

Big data is a buzzword, or catch-phrase, used to describe a massive volume of both structured and unstructured data that is so large that it's difficult to process using traditional database and software techniques. In most enterprise scenarios the data is too big or it moves too fast or it exceeds current processing capacity. Big data has the potential to help companies improve operations and make faster, more intelligent decisions.


Is Big Data a Volume or a Technology?

While the term may seem to reference the volume of data, that isn't always the case. The term big data, especially when used by vendors, may refer to the technology (which includes tools and processes) that an organization requires to handle the large amounts of data and storage facilities. The term big data is believed to have originated with Web search companies who needed to query very large distributed aggregations of loosely-structured data.


An Example of Big Data

An example of big data might be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data consisting of billions to trillions of records of millions of people - all from different sources (e.g. Web, sales, customer contact center, social media, mobile data and so on). The data is typically loosely structured data that is often incomplete and inaccessible.

“So in your own company, look at your own structured data, look at the data that you have, look for insights in the unstructured data, and perhaps more importantly, look for opportunities to accumulate more data that provides some value – any data you can get your hands on and start accumulating it, because that’s a Growing asset.
And the beauty of big data is you can sell it multiple times. You can keep selling the same asset over and over again for different applications, for different uses, for different opportunities. So look at the unstructured data and get as much of it as you can.”


Big Data Sources

User-, application-, system- and sensor-generated data


Tools and Technology

Big Data is too large to process using traditional methods. 

Amazon AWS Big Data solutions for every stage of the big data life-cycle:
Collect > Stream > Store > RDBMS | Data Warehouse | NoSQL > Analytics > Archive


Some technologies that are the driver of Big Data system are as follows -

Hosting Environment:
Distributed Servers or Cloud.  For example -
Amazon Elastic Compute Cloud (Amazon EC2)

Data storage:
Distributed Storage. For example -
Amazon Simple Storage Service (Amazon S3)

Programming Model:
Distributed Processing or Distributed Computing System. For example -
Hadoop MapReduce

Database:
High-performance relational and schema-free (NoSQL) databases. For example -

Relational Database 
- Amazon RDS (Relational Database Service)

NoSQL Database
- Amazon DynamoDB
- MongoDB (open source)
- HBase
- Cassandra

Data warehouse:
Amazon Redshift

Different Operations performed on Data:
Analytic/Semantic Processing