Here is my attempt at creating the first draft of my whitepaper on Data Governance. I will be working on improving this version by adding the missing elements soon.
Read the paper here:
First page for preview:
This script creates a named copy of a database with a timestamp appended to the name. The CREATE DATABASE … AS COPY OF syntax used here works on Azure SQL Database.
-- Builds a database name like MyDB_YYYYMMDD_HHMI and creates it as a copy of MyDB
DECLARE @cmd nvarchar(255);

SET @cmd = N'CREATE DATABASE '
    + CONCAT('MyDB_', REPLACE(REPLACE(REPLACE(CONVERT(varchar(16), GETDATE(), 120), '-', ''), ':', ''), ' ', '_'))
    + N' AS COPY OF MyDB';

EXECUTE sp_executesql @cmd;
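Note that the copy operation runs asynchronously, so the new database may still be seeding when the statement returns. A minimal way to check progress (assuming you are connected to the master database of the logical server) is:
-- Lists database copies currently in progress on this logical server
SELECT * FROM sys.dm_database_copies;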
The example below converts columns into rows using the UNPIVOT clause.
Assuming you have a table like this:
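Since the screenshot is not reproduced here, a minimal sketch of such a table is shown below; the table name Table_1 comes from the query further down, while the column names and values are only assumptions for illustration.
-- Hypothetical source table with one key column and three value columns
CREATE TABLE Table_1 (
    ID   int,
    col1 int,
    col2 int,
    col3 int
);

INSERT INTO Table_1 (ID, col1, col2, col3)
VALUES (1, 10, 20, 30),
       (2, 40, 50, 60);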
And you want to convert like this:
Use the query:
SELECT
ID, col, val
FROM
Table_1
UNPIVOT
(val for col in (col1, col2, col3)) p
You can also apply a WHERE clause like this:
SELECT
ID, col, val
FROM
Table_1
UNPIVOT
(val for col in (col1, col2, col3)) p
WHERE id=1
Result:
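With the hypothetical sample rows above, the filtered query would return something like this:
ID | col | val
1 | col1 | 10
1 | col2 | 20
1 | col3 | 30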
“Dimension relationship” refers to the direct or indirect relationship between a dimension and the measure groups in a cube.
Regular | A standard relationship, where a key column in the dimension is joined directly to the fact table. |
Reference | A key column in the dimension is joined indirectly to the fact table by referencing another dimension. |
Fact / Degenerate | The dimension is constructed from attribute columns in the fact table rather than from attribute columns in a dimension table. |
Many-to-Many | A single dimension member can relate to many facts, and a single fact can relate to many dimension members, joined through an intermediate fact table. |
Note: My study notes
Must read for data enthusiasts – https://www.kimballgroup.com/data-warehouse-business-intelligence-resources/kimball-techniques/dw-bi-lifecycle-method/
Well, the question is slightly wrong until the context is specified, because it is possible to build a Modern Data Warehouse that includes Cosmos DB in the architecture. This is especially relevant today because data is no longer straightforward content with human-readable entities and relations (structured); it is also unstructured and/or streaming. The pace of data flow, and of business requirements, is also becoming near real-time.
See a reference architecture below:
Here, in this blog, the context is the possibility of a Traditional Data Warehouse, where you will be modelling the data, specifying relationships, and so on. Let us look at the definition of a Data Warehouse in the Oracle docs:
“A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing.”
Now let us ask the right question – why may Cosmos DB not be apt as the data store for a Data Warehouse? It is not apt because Cosmos DB is a NoSQL database, where it is genuinely difficult to define relationships between entities/tables/data. Check what an MSDN blog said about this:
“Cosmos DB is not a relational database. You cannot just take your relational database and expect it to run in Cosmos DB. You could move tables of data into Cosmos, but not the relational aspects of your existing data structures.”
As of today, this is the conclusion. But we cannot say what will happen to these concepts tomorrow, because Cosmos DB is becoming more and more powerful and I am already in love with it.
You can read about common scenarios (use cases) where you can use Cosmos DB, and where companies already use it, here.
Do you have different thoughts on this? Please comment.
Azure Databricks is the same Databricks platform, but offered as a managed service on Azure. This managed service allows data scientists, developers, and analysts to create, analyse, and visualize data science projects in the cloud.
Databricks is a user-friendly analytics platform built on top of Apache Spark. It acts as a UI layer, a WYSIWYG dashboard where you can create clusters, manage notebooks, write code, and analyse data without knowing the internals of the system. Apache Spark is a unified analytics engine for large-scale data processing, and it currently supports popular languages such as Python, Scala, SQL, and R.
If you already know Databricks, a tutorial is not really necessary to get started, because Azure Databricks uses the same management portal.
Though there are different possible strategies for creating and managing Databricks projects, I have followed the flow below in this article:
Screenshots and steps provided in this article are valid as of 20 Sept 2018. Technology advances at a fast pace, and the Azure portal is upgraded along with it, so please be aware of any changes in the portal flow when you try this out. I will try to keep this tutorial up to date.
You need at least a trial account to get started. Visit the Azure home page to get one – https://azure.microsoft.com/
The first step in creating a Databricks project is creating a Workspace.
The typical steps are: click “+ Create a resource” → “Analytics” → “Azure Databricks”.
In the workspace creation wizard, you will have to provide the details below:
A. Workspace name: Give a unique name (retry until you get a green tick mark on the right; a red X means someone has already taken your favourite name).
B. Subscription: Choose an appropriate subscription plan, or leave the default value if you do not know what this is about.
C. Resource Group: Choose an existing resource group, or create a new one (provide a new name if you do not know what this box is about).
D. Location: This is the data center. Select your nearest location in the dropdown, or keep the default.
E. Pricing Tier: This is about cost, so be careful. I would prefer to go with the Free trial when doing this for learning purposes. You can read more about the pricing tiers here.
Click the “Create” button and wait till the workspace gets created. This will take a couple of minutes, and you will get a notification once it is completed.
Once the workspace is created, you can go to “All resources” and click your newly created workspace name in the list.
The resource dashboard will look like this:
Now it is time for some action. Click the “Launch Workspace” button, and you will be directed to a new browser page, where you will be signed into the portal automatically.
Your Azure Databricks journey starts here.
From here, different strategies are possible for executing projects. Since a full-fledged project involving meaningful data analysis is out of scope for this article, we will try out a simple example, such as querying a dataset and plotting a bar chart.
Let us load a dataset and visualize using a notebook.
For this purpose, I have downloaded a dataset from the internet about the literacy rate in India. You may also download a freely available one, or create a dataset of your own. We are not going to do any complex analysis in this example, so this simple dataset is enough. Note that the values in the dataset are not real values. My CSV file looks like this, with the first row as the header row.
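Since the screenshot of the file is not reproduced here, the layout below is only a hypothetical illustration of such a CSV; the column names State and LiteracyRate are assumptions and the values are made up:
State,LiteracyRate
StateA,81.5
StateB,74.2
StateC,68.9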
For storing the data and doing the processing, we need some powerful machines. These are called clusters, and we will create one in this section.
On the dashboard, click on “New Cluster”
I am giving the cluster the name “MyFirstCluster”. If you already know the Azure portal well, you will recognize most of the input parameters on the page. If you are a beginner, I suggest leaving all the other settings as they are and clicking the “Create Cluster” button to proceed.
It will take some time to complete the cluster creation. For me it took about 5-10 minutes. You can see the status of the cluster creation on the next screen.
Once the cluster is created, the status will change from “Pending” to “Running”.
Once the cluster is running, we are ready to upload data and create notebooks. Let us upload the data first.
Upload the already prepared/downloaded dataset to the newly created cluster.
Go back to the dashboard and click “Upload Data”
In the next screen, give the dataset a name and upload it. In my case I am using a CSV file with about 35 rows. Your dataset can be bigger, but note that depending on its size the upload and processing can take more time.
Once upload is completed, you can create the Notebook.
A Notebook, in this context, is an interactive web-based editor that allows data scientists, analysts, and developers to write and collaborate on scripts and notes to analyse and visualize data.
You can create the Notebook either by clicking “Create Table” on the Dashboard screen, or as a continuation of the last step. When you click the “Create Table in Notebook” button in the above screen, the Databricks service will create a sample notebook for you with sufficient sample code, with Python as the default language.
Make sure that you have a cluster attached to this notebook. If you see a “Detached” status at the top-left, choose a cluster by clicking on the “Detached” text. Without a cluster, you cannot run the scripts.
Now it is time to test the script. You can see the sample Python scripts in various script boxes on the page. Click the play button at the top-right of any script snippet box:
You should be able to see the script getting executed, and the result will be displayed below in the form of a table. If there are errors, you will get proper error messages that you can use to debug the script.
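If you would rather query the uploaded data yourself instead of relying only on the generated sample, a minimal Spark SQL sketch could look like the one below. The table name literacy_rate, the file path, and the header/schema options are assumptions; use whatever name and location you chose during the upload, and prefix the cell with %sql if the notebook's default language is Python.
-- Hypothetical path: files uploaded through the UI typically land under /FileStore/tables/
CREATE TABLE IF NOT EXISTS literacy_rate
USING CSV
OPTIONS (path '/FileStore/tables/literacy_rate.csv', header 'true', inferSchema 'true');

-- Preview the first few rows
SELECT * FROM literacy_rate LIMIT 10;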
Now it is your time for experimenting and more learning.
As a bonus, let us see how to visualize the same data using a bar chart. Click on the bar chart icon. If you do not see a chart auto-generated, click “Plot Options” and play around with the parameters.
Click “Apply”, and now you can see the bar chart updated in the Notebook.
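If you want the chart to be driven by your own query rather than the generated sample, a simple sketch reusing the hypothetical literacy_rate table and columns from above could be:
-- Sort by the hypothetical LiteracyRate column so the bar chart shows states in descending order
SELECT State, LiteracyRate
FROM literacy_rate
ORDER BY LiteracyRate DESC;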
Happy Learning!
The Row-Level Security (RLS) capability was introduced with SQL Server 2016, and the same is available in Azure SQL Database today. This blog details a simple example of how to implement it.
This is the planned implementation flow:
For the purpose of this example, we will take the case of an imaginary supermarket. Let us assume there are supervisors assigned to each department in the shop, and we want each supervisor to see only the items he is responsible for.
RLS is applied to tables, but in this example we will apply it to a VIEW, which makes more sense as it is closer to real-world scenarios.
Find the schema and sample data I used for the example:
Table: dbo.Employee
CREATE TABLE [dbo].[Employee](
[EmpID] [int] NULL,
[Department] [varchar](50) NULL,
[Name] [nvarchar](150) NULL,
[Username] [varchar](50) NULL
) ON [PRIMARY]
Table: dbo.StockByDepartment
CREATE TABLE [dbo].[StockByDepartment](
[Department] [varchar](50) NULL,
[Item] [nvarchar](100) NULL,
[UnitPrice] [money] NULL
) ON [PRIMARY]
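The sample data I used is not reproduced here, so the INSERT statements below are only an illustrative stand-in; the departments, names, and prices are made up, but the usernames match the test user created later in this post.
-- Hypothetical sample rows for illustration only
INSERT INTO dbo.Employee (EmpID, Department, Name, Username)
VALUES (1, 'Bakery', N'Paul',  'paul'),
       (2, 'Dairy',  N'Maria', 'maria');

INSERT INTO dbo.StockByDepartment (Department, Item, UnitPrice)
VALUES ('Bakery', N'Bread',  2.50),
       ('Bakery', N'Muffin', 1.20),
       ('Dairy',  N'Milk',   1.80);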
View: dbo.Stock
CREATE VIEW [dbo].[Stock]
WITH SCHEMABINDING
AS
SELECT e.Department, e.EmpID, e.Name, e.Username, d.Item, d.UnitPrice
FROM dbo.StockByDepartment d
INNER JOIN dbo.Employee e
ON d.Department=e.Department
Run this script to create the security predicate function for the dbo.Stock view:
CREATE FUNCTION dbo.fn_SecurityPredicate(@username sysname)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    -- Allow a row only when the row's Username value matches the current database user
    SELECT 1 AS [fn_SecurityPredicate_result]
    WHERE @username = USER_NAME();
Run this script to apply the security policy to the dbo.Stock view:
CREATE SECURITY POLICY DepartmentFilter
ADD FILTER PREDICATE dbo.fn_SecurityPredicate(Username)
ON dbo.Stock
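While testing, it can be handy to switch the policy off and on, or to remove it entirely. A quick sketch:
-- Temporarily disable the filter (the policy is created with STATE = ON by default)
ALTER SECURITY POLICY DepartmentFilter WITH (STATE = OFF);

-- Re-enable it
ALTER SECURITY POLICY DepartmentFilter WITH (STATE = ON);

-- Remove the policy completely when you are done testing
DROP SECURITY POLICY DepartmentFilter;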
Now it is time for us to try out the applied security with various users.
Create sample users with the names you have provided in the Username column of the dbo.Employee table, log in to SSMS as one of them, and try SELECTing the records in dbo.Stock.
For demo purposes, the code below will create a sample user and grant SELECT permission on the dbo.Stock view:
CREATE User paul WITHOUT LOGIN
GRANT SELECT on dbo.Stock TO paul
Now, for the sake of testing, you can use the below code:
EXECUTE('SELECT * FROM dbo.Stock') as USER='paul'
You should see filtered data like the one below:
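With the hypothetical sample rows shown earlier, the user paul (Bakery department) would see only the Bakery rows, roughly like this:
Department | EmpID | Name | Username | Item | UnitPrice
Bakery | 1 | Paul | paul | Bread | 2.50
Bakery | 1 | Paul | paul | Muffin | 1.20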
This script will create a SQL Server Agent job that takes a daily database backup to a folder, with the date appended to the filename.
DECLARE @job varchar(100) = 'Backup_testdb_daily' -- Name of Job
DECLARE @db varchar(100) = 'testdb' -- DB to backup
DECLARE @bakfile varchar(100) = 'd:\_temp\' + @db -- Backup file path
DECLARE @date varchar(8) = '20180720' -- Job Start date
DECLARE @time varchar(8) = '135400' -- Job run time in HHMMSS format (e.g. 135400 = 13:54:00)
--------------------------------------------------------------------
DECLARE @cmd varchar(200) = CONCAT ('DECLARE @bakfile varchar(200) = ''' , @bakfile , ''' + ''_'' + convert(varchar(100),GetDate(),112) + ''.bak'';') +
CONCAT('BACKUP DATABASE ', @db, ' TO DISK = @bakfile');
USE msdb
EXEC dbo.sp_add_job
@job_name = @job;
EXEC sp_add_jobstep
@job_name = @job,
@step_name = 'Backup database',
@subsystem = 'TSQL',
@command = @cmd
EXEC sp_add_jobschedule
@job_name = @job,
@name = 'DB Backup Schedule',
@freq_type = 4, -- daily
@freq_interval = 1,
@active_start_date = @date,
@active_start_time = @time
EXEC dbo.sp_add_jobserver
@job_name = @job,
@server_name = @@SERVERNAME
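To verify the job without waiting for the schedule, you can start it manually and check its history; both procedures live in msdb:
-- Run the job once immediately so you can confirm the backup file is created
EXEC msdb.dbo.sp_start_job @job_name = 'Backup_testdb_daily';

-- Review the job's execution history
EXEC msdb.dbo.sp_help_jobhistory @job_name = 'Backup_testdb_daily';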