SQL SERVER – Move Data Files

Here is how you can move data files in SQL Server.

1. Retrieve DB LogicalName and current location.

SELECT name, physical_name, state_desc AS OnlineStatus
FROM sys.master_files  
WHERE database_id = DB_ID(N'AdventureWorks2014')  
GO

2. Set Database offline.

ALTER DATABASE AdventureWorks2014 SET OFFLINE;
GO

3. If your database it is taking too slow to become offline, you should kill all session ID’s that are connected to the database (including the sleeping connections). I have a script below to do just that. Just specify the database name for the @DBName variable.

DECLARE @DBName VARCHAR(50) = 'AdventureWorks2014'
DECLARE @sql NVARCHAR (500)
DECLARE @spid INT
DECLARE @numberOfRowsToKill INT
DECLARE @i INT = 0
DECLARE @sessionID INT

SET NOCOUNT ON

SET @sql = 'sp_who'

IF OBJECT_ID('tempdb..#spids') IS NOT NULL
	DROP TABLE #spids

CREATE TABLE #spids
(
	spid INT,
	ecid INT,
	status VARCHAR(30),
	loginname VARCHAR(30),
	HostName VARCHAR(50),
	Blk INT,
	DBName VARCHAR(30),
	Command VARCHAR(MAX),
	RequestID INT
)

INSERT INTO #spids
EXEC (@sql)


SELECT @numberOfRowsToKill = COUNT(*) FROM #spids WHERE DBName = @DBName

IF  @i < @numberOfRowsToKill
	WHILE @i < @numberOfRowsToKill
	BEGIN
		SELECT TOP 1 @sessionID = spid from #spids WHERE DBName = @DBName
		SET @sql =  'kill ' + CAST(@sessionID AS NVARCHAR(10))

		EXEC (@sql)
		PRINT 'SPID ' + CAST(@sessionID AS VARCHAR) + ' has been killed'

		DELETE FROM #spids WHERE spid = @sessionID
		SET @i = @i + 1
	END
ELSE
	PRINT 'There are no more sessions connected to ' + @DBName + ' database.'


Preview(opens in a new tab)

4. Move the physical files using Powershell. Run Powershell as Administrator and run the following command.

Move-Item -Path "C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\CreditData.mdf" -Destination "C:\SQL\DATA"

Move-Item -Path "C:\Program Files\Microsoft SQL Server\MSSQL12.MSSQLSERVER\MSSQL\DATA\CreditLog.ldf" -Destination "C:\SQL\DATA"

5. Modify the file name. Specify the new file path.

ALTER DATABASE AdventureWorks2014 MODIFY FILE (NAME = 'LogicalName'	,FILENAME = 'C:\SQL\DATA\CreditData.mdf')
GO

ALTER DATABASE AdventureWorks2014 MODIFY FILE (NAME = 'LogicalName',FILENAME = 'C:\SQL\DATA\CreditLog.ldf')
GO

6. Set database online.

ALTER DATABASE AdventureWorks2014 SET ONLINE;
GO

7. Verify that the change is okay.

SELECT name, physical_name, state_desc AS OnlineStatus
FROM sys.master_files  
WHERE database_id = DB_ID(N'AdventureWorks2014')  
GO

Cheers!

Knowledge worth sharing...Share on linkedin
Linkedin
Share on facebook
Facebook
Share on google
Google
Share on twitter
Twitter

SQL SERVER – The logical file name is already in use

I just encountered the error below.

Msg 1828, Level 16, State 4, Line 3
The logical file name “OTDB_FG_WRTestResults_01” is already in use. Choose a different

I was able to solve it by executing the following scripts. Essentially, you just have to rename the logical file name to different one, then rename it back to your preferred name.

ALTER DATABASE [DBName] MODIFY FILE (NAME=N'OTDB_FG_WRTestResults_01', NEWNAME=N'OTDB_FG_WRTestResults_01_Temp')

The file name ‘OTDB_FG_WRTestResults_03_Temp’ has been set.

Then change it back to the original name

ALTER DATABASE [OptimalTestDB] MODIFY FILE (NAME=N'OTDB_FG_WRTestResults_01_Temp', NEWNAME=N'OTDB_FG_WRTestResults_01')

Done.

Cheers!

SQL SERVER – Backup Database to Multiple Files (Faster, and more manageable files)

Benefits of writing database backups to multiple files:

  • Backup process can finish faster because it will run using multiple threads
  • Better I/O throughput if you have multiple disk arrays where you can write your  backup files.

Here are two methods of writing your backup to multiple files:

Using SQL Server Management Studio 

Using T-SQL

BACKUP DATABASE [AdventureWorks2014] TO  
DISK = N'C:\SQLBackup\AdventureWorks_backupfile1.bak',  
DISK = N'C:\SQLBackup\AdventureWorks_backupfile2.bak', 
DISK = N'C:\SQLBackup\AdventureWorks_backupfile3.bak',  
DISK = N'C:\SQLBackup\AdventureWorks_backupfile4.bak' 
WITH NOFORMAT, NOINIT,  NAME = N'AdventureWorks2014-Full Database Backup', SKIP, NOREWIND, NOUNLOAD, COMPRESSION,  STATS = 10
GO

SQL SERVER – Trace Flag 1117 and 1118

Trace flag 1117 will allow us to take care of tempdb contention. It permits us to grow our data files at the same time.

We also use 1118 to prevent tempdb contention. Trace flag 1118  deals with how we allocate extents whether it be mixed extents or uniform extents

These trace flags are enabled by default in SQL Server 2016. For prior versions, you have to manually turn-on these trace flags.

Below are the ways to enable Trace Flags:

To Enable Trace Flags at Startup

Go to Configuration Manager > Select your instance > Choose Properties > Go to Startup Parameters tab.

Enable Trace Flags with DBCC TRACEON/TRACEOFF

Add the -1 to enable it globally on the instance

DBCC TRACEON( 1117, -1)

DBCC TRACEON( 1118, -1)

To see which trace are enabled:

DBCC TRACESTATUS

 

 

SQL SERVER – Changing MAXDOP and Cost Threshold of Parallelism

Firstly, you have to check out Recommendations and guidelines for the “max degree of parallelism” configuration option in SQL Server .

The KB basically says, if we have 8 or more logical processors in a NUMA node, generally we will get the best performance by setting maxdop to 8 or lower.

Secondly, use the script below to get to instantly get the suggested MAXDOP for your SQL instance.

DECLARE @CoreCount int;
DECLARE @NumaNodes int;

SET @CoreCount = (SELECT i.cpu_count from sys.dm_os_sys_info i);
SET @NumaNodes = (
    SELECT MAX(c.memory_node_id) + 1 
    FROM sys.dm_os_memory_clerks c 
    WHERE memory_node_id < 64
    );

IF @CoreCount > 4 /* If less than 5 cores, don't bother. */
BEGIN
    DECLARE @MaxDOP int;

    /* 3/4 of Total Cores in Machine */
    SET @MaxDOP = @CoreCount * 0.75; 

    /* if @MaxDOP is greater than the per NUMA node
       Core Count, set @MaxDOP = per NUMA node core count
    */
    IF @MaxDOP > (@CoreCount / @NumaNodes) 
        SET @MaxDOP = (@CoreCount / @NumaNodes) * 0.75;

    /*
        Reduce @MaxDOP to an even number 
    */
    SET @MaxDOP = @MaxDOP - (@MaxDOP % 2);

    /* Cap MAXDOP at 8, according to Microsoft */
    IF @MaxDOP > 8 SET @MaxDOP = 8;

    PRINT 'Suggested MAXDOP = ' + CAST(@MaxDOP as varchar(max));
END
ELSE
BEGIN
    PRINT 'Suggested MAXDOP = 0 since you have less than 4 cores total.';
    PRINT 'This is the default setting, you likely do not need to do';
    PRINT 'anything.';
END

 

Cost Threshold Parallelism

Cost Threshold of Parallelism is like a bar that we set to control which queries qualifies to go parallel and which queries are cheap, that it is going to get 1 core no matter what.

The default value is 5 which is very low . The starting value for Cost Threshold of Parallelism is 50.

To change the values of these 2 parameters, execute the scripts below.

-- Change MAXDOP value
EXEC sp_configure 'show advanced options', 1;  
GO  
RECONFIGURE WITH OVERRIDE;  
GO  
EXEC sp_configure 'max degree of parallelism', 8;  
GO  
RECONFIGURE WITH OVERRIDE;  
GO    
 
-- Change Cost Threshold for Parallelism value
EXEC sys.sp_configure N'cost threshold for parallelism', N'50'
GO
RECONFIGURE WITH OVERRIDE
GO

-- Verify the changes made
SELECT
	name
	,value
	,value_in_use
	,description
	,is_dynamic
	,is_advanced
FROM
	sys.configurations WITH (NOLOCK)
WHERE
	name IN ('max degree of parallelism','cost threshold for parallelism',)

 

Cheers!

 

POWERSHELL – Running scripts is disabled on this system (SOLVED)

I encountered the following error after executing a script in Powershell.

The reason for this error is because the Execution Policy is set to Restricted by default. We have to change it to prevent this error by running the script below.

PS C:\> Set-ExecutionPolicy RemoteSigned

Verify the change.

PS C:\> get-executionpolicy

You can also run the script below to bypass the policy.

c:\> powershell -ExecutionPolicy ByPass -File script.ps1

Cheers!

SQL SERVER – Sending HTML Reports to Email Using T-SQL

Below is a script that you can use to send HTML Reports to your email just by using T-SQL. The example below retrieves the disk space usage of the DB Server then sends the report to my email.

 -- Disk Space Usage
USE [master]

begin
set nocount on
/*******************************************************/
/* Enabling Ole Automation Procedures */
exec sp_configure 'show advanced options', 1
RECONFIGURE with override

exec sp_configure 'Ole Automation Procedures', 1
RECONFIGURE with override


/*******************************************************/
DECLARE @hr int
DECLARE @fso int
DECLARE @drive char(1)
DECLARE @odrive int
DECLARE @TotalSize varchar(20) DECLARE @MB Numeric ; SET @MB = 1048576

CREATE TABLE #drives 
(
	drive char(1) PRIMARY KEY, 
	FreeSpace int NULL,
	TotalSize int NULL
) 

INSERT #drives(drive,FreeSpace) 
EXEC master.dbo.xp_fixeddrives 
EXEC @hr=sp_OACreate 'Scripting.FileSystemObject',@fso OUT IF @hr <> 0 
EXEC sp_OAGetErrorInfo @fso

DECLARE dcur CURSOR LOCAL FAST_FORWARD
FOR SELECT drive from #drives ORDER by drive
OPEN dcur FETCH NEXT FROM dcur INTO @drive
WHILE @@FETCH_STATUS=0

BEGIN
	EXEC @hr = sp_OAMethod @fso,'GetDrive', @odrive OUT, @drive
		IF @hr <> 0 EXEC sp_OAGetErrorInfo @fso EXEC @hr =
		sp_OAGetProperty
		@odrive,'TotalSize', @TotalSize OUT IF @hr <> 0 EXEC sp_OAGetErrorInfo
		@odrive UPDATE #drives SET TotalSize=@TotalSize/@MB WHERE
		drive=@drive FETCH NEXT FROM dcur INTO @drive
End
Close dcur
DEALLOCATE dcur
EXEC @hr=sp_OADestroy @fso IF @hr <> 0 EXEC sp_OAGetErrorInfo @fso


 
/*******************************************************/
/* Disabling Ole Automation Procedures */
exec sp_configure 'show advanced options', 1
RECONFIGURE with override

exec sp_configure 'Ole Automation Procedures', 0
RECONFIGURE with override
/*******************************************************/

END

	
-- Check LogSpace Usage
DECLARE @SQLString nvarchar(500);
 
SET @SQLString = 'DBCC SQLPERF(LOGSPACE)'

CREATE TABLE #LogSpace
(
	DatabaseName VARCHAR(MAX)
	,LogSize INT
	,LogSpaceUsedPct INT
	,Status INT
)

INSERT INTO #LogSpace
EXEC sp_executesql @SQLString
GO

-- Check failed job

CREATE TABLE #FailedJobs
(	
	ServerName VARCHAR(50)
	,JobName VARCHAR(50)
	,JobStatus VARCHAR (20)
	,LastRunStatus VARCHAR(30)
	,LastRunDate DATETIME
)

INSERT INTO #FailedJobs
SELECT Convert(varchar(20),SERVERPROPERTY('ServerName')) AS ServerName,
j.name AS job_name,
CASE j.enabled WHEN 1 THEN 'Enabled' Else 'Disabled' END AS job_status,
CASE jh.run_status WHEN 0 THEN 'Error Failed'
				WHEN 1 THEN 'Succeeded'
				WHEN 2 THEN 'Retry'
				WHEN 3 THEN 'Cancelled'
				WHEN 4 THEN 'In Progress' ELSE
				'Status Unknown' END AS 'last_run_status',
ja.run_requested_date as last_run_date
FROM
(msdb.dbo.sysjobactivity ja LEFT JOIN msdb.dbo.sysjobhistory jh ON ja.job_history_id = jh.instance_id)
join msdb.dbo.sysjobs_view j on ja.job_id = j.job_id
WHERE ja.session_id=(SELECT MAX(session_id)  from msdb.dbo.sysjobactivity) 
AND jh.run_status = 0
ORDER BY job_name,job_status

-- Check Error logs
declare @Time_Start datetime;
declare @Time_End datetime;
set @Time_Start=getdate()-2;
set @Time_End=getdate();
-- Create the temporary table
CREATE TABLE #ErrorLog (logdate datetime
                      , processinfo varchar(255)
                      , Message varchar(MAX))
-- Populate the temporary table
INSERT #ErrorLog (logdate, processinfo, Message)
EXEC master.dbo.xp_readerrorlog 0, 1, null, null , @Time_Start, @Time_End, N'desc';



-- Create HTML Table for Disk Space Usage Report

DECLARE @tableHTML NVARCHAR(MAX);

SET @TableHTML = N'<H1>Disk Space Usage</H1>' + N'<table border="1">' + N'<tr>' +
				 N'<th bgcolor="#404040"><font color="#FFFFFF">drive</th><th bgcolor="#404040"><font color="#FFFFFF">Total(GB)</th>
					<th bgcolor="#404040"><font color="#FFFFFF">Free(GB)</th><th bgcolor="#404040"><font color="#FFFFFF">Used(GB)</th>
					<th bgcolor="#404040"><font color="#FFFFFF">PercentUsed</th>' + CAST ((
			  SELECT td = drive
			  ,''
			  ,td = cast(TotalSize/1024.0 as decimal(8,2)) 
			  ,'' 
			  ,td = cast(FreeSpace/1024.0 as decimal(8,2)) 
			  ,''
			  ,td = cast(TotalSize/1024.0-FreeSpace/1024.0 as decimal(8,2))
			  ,''
			  ,td = cast((TotalSize/1024.0-FreeSpace/1024.0)/(TotalSize/1024.0)*100 as decimal (8,2)) 
	   FROM
			  #drives
	   ORDER BY drive
	   FOR XML PATH('tr')
                    ,TYPE
                ) AS NVARCHAR(MAX)) + N'</table>';



-- Create HTML Table for Log Space Used
DECLARE @tableHTML_LogSpaceUsage NVARCHAR(MAX);

SET @tableHTML_LogSpaceUsage = N'<H1>Log Space Used</H1>' + N'<table border="1">' + N'<tr>' +
				 N'<th bgcolor="#404040"><font color="#FFFFFF">DatabaseName</th><th bgcolor="#404040"><font color="#FFFFFF">LogSize(MB)</th><th bgcolor="#404040"><font color="#FFFFFF">LogSpaceUsedPct</th>' + CAST ((
			  SELECT td = DatabaseName
			  ,''
			  ,td = LogSize
			  ,'' 
			  ,td = LogSpaceUsedPct
	   FROM
			  #LogSpace
	   FOR XML PATH('tr')
                    ,TYPE
                ) AS NVARCHAR(MAX)) + N'</table>';



-- Create HTML Table for Failed Jobs
DECLARE @tableHTML_FailedJobs NVARCHAR(MAX);

SET @tableHTML_FailedJobs = N'<H1>Failed Jobs</H1>' + N'<table border="1">' + N'<tr>' +
							N'<th bgcolor="#404040"><font color="#FFFFFF">ServerName</th><th bgcolor="#404040"><font color="#FFFFFF">JobName</th><th bgcolor="#404040"><font color="#FFFFFF">JobStatus</th>
							  <th bgcolor="#404040"><font color="#FFFFFF">LastRunStatus</th><th bgcolor="#404040"><font color="#FFFFFF">LastRunDate</th>' + CAST ((
							  SELECT td = ServerName
								  ,''
								  ,td = JobName
								  ,''
								  ,td = JobStatus
								  ,''
								  ,td = LastRunStatus
								  ,''
								  ,td = LastRunDate
							  FROM
								  #FailedJobs
							  FOR XML PATH('tr')
							               ,TYPE
									  ) AS NVARCHAR(MAX)) + N'</table>';

					
-- Create HTML Table for Error Log
DECLARE @tableHTML_ErrorLog NVARCHAR(MAX);
SET @tableHTML_ErrorLog = N'<H1>Error Log</H1>' + N'<table border="1">' + N'<tr>' +
						  N'<th bgcolor="#404040"><font color="#FFFFFF">LogDate</th><th bgcolor="#404040"><font color="#FFFFFF">Message</th>' + CAST ((
							SELECT td = LogDate
								,''
								,td = Message
							FROM
								#ErrorLog
							
							WHERE 
								(Message LIKE '%error%' OR Message LIKE '%failed%') AND processinfo NOT LIKE 'logon'
							ORDER BY
								logdate DESC
							 FOR XML PATH('tr')
							               ,TYPE
									  ) AS NVARCHAR(MAX)) + N'</table>';
									  
								
			
-- Send Email
DECLARE @AllTables NVARCHAR(MAX);

IF @tableHTML IS NOT NULL
	SET @AllTables = @tableHTML
	
IF @tableHTML_LogSpaceUsage IS NOT NULL
	SET @AllTables = @AllTables + @tableHTML_LogSpaceUsage

IF @tableHTML_FailedJobs IS NOT NULL
	SET @AllTables = @AllTables + @tableHTML_FailedJobs

IF @tableHTML_ErrorLog IS NOT NULL
	SET @AllTables = @AllTables + @tableHTML_ErrorLog

    EXEC msdb.dbo.sp_send_dbmail @body = @AllTables 
        ,@body_format = 'HTML'
        ,@profile_name = N'DBA'
        ,@recipients = N'ricky.valencia@optimalplus.com'
        ,@Subject = N'Database Health Check Report - ASEKH'



DROP TABLE #FailedJobs
DROP TABLE #LogSpace
DROP TABLE #drives
DROP TABLE #ErrorLog

You may put this script in SQL job.

Use this script as a template for whatever report you may think about.

Cheers!

SQL SERVER – Setup Blocked Process Report in Extended Events

In this blog post, I will demonstrate how to setup a Blocked Process Report using Extended Events.

1. In Management Studio under Extended Events, right-click Sessions then choose New Session Wizard.

2. Click Next.

3.  Input Session name. Click Next.

4. Choose “Do not use a template“. Click Next.

5. Search for “blocked” in the Event Library. Select blocked_process_report. Click Next.

6. Check database_name and also sql_text (optional because these values will show in the XML report). Click Next.

7. Click Close.

8. Set the blocked process threshold. In this case I have set it to 10 seconds.

sp_configure 'show advanced options', 1 ;  
GO  
RECONFIGURE ;  
GO  
sp_configure 'blocked process threshold', 10 ;  
GO  
RECONFIGURE ;  
GO  

9. Start Extended Events.

ALTER EVENT SESSION [Blocked Process Report] 
ON SERVER STATE = START 

 

Cheers!

VERTICA – Data Loading

Before you can analyse your data, you need to get it loaded into your database. Vertica supports multiple methods of data loading. We can load structed and semi-structed data directly from delimited data files or we can collect data from other data sources using client connectors such as ODBC, JDBC and ADO.NET.

We can connect to a HADOOP database or move data from Hadoop into VERTICA. We can even load real-time streaming data using KAFKA.

This blog covers how VERTICA manages to quickly load and store data from any source using a hybrid data storage model.

Before we can talk about loading data into the VERTICA database, we need to do a quick review of the construct that the data is loaded into projections. Remember, VERTICA supports tables as a logical concept. Since the SQL language is common and familiar, you can directly act with your data using standard SQL commands which reference tables. However, that is not where the data is stored. We physically store the data in projections (column-oriented structures that are based on the tables) each Projections contains a sorted subset of the rows in your table.

There are two basic types of Projections.  1) Superprojection contains all the columns in your table. For every table, VERTICA creates at least one super projection. While this projection type will allow you to query any data from the table, it may not give you the best query performance. For that we can create 2) Query-Specific Projections which only contain a portion of the tables columns. The columns are ordered and sorted based on your most commonly run queries, increasing the performance even further.

VERTICA OBJECT HEIRARCHY

To understand what happens when data is loaded into VERTICA, you will first need to understand the object hierarchy. While VERTICA runs in a multi-node environment with the database distributed as evenly as possible across all the nodes, for simplicity sake, we are showing the object hierarchy on only one node. We’ve already discussed the tables which defines the data you want to store and projections which organize that data based upon your most commonly run queries to increase performance. But these are only structures for organizing your data and accessing it using a common language.

What happens when data is loaded into your database?

Every time data is writing to disk, VERTICA creates a data storage structure on disk called a CONTAINER. We are only showing the containers created for 4 data loads for this first projection. Although of course, each projection has its own containers created on each write to disk. To maximize query performance, VERTICA has a limit of 1024 containers per projection per node. There is no limit on the size of the containers.

Within each container a file is created on each column in the projection. The dot GT format of the file keeps the data sorted, encoded and identifiable across nodes. This is what allows VERTICA to distribute data across multiple nodes.

VERTICA’s HYBRID STORAGE MODEL

To support loading data into the database intermixed with queries in a typical data warehouse workload. VERTICA implements the storage model shown below. This model is the same on each VERTICA node.

The Write Optimized Store (WOS) is a memory resident data store temporarily storing data in memory speeds up the loading process and reduces fragmentation on disk. The data is still available for queries. For organizations who continually load small amounts of data, loading the data to memory first is faster than writing it to disk, making the data accessible quickly.

The Read Optimized Store (ROS) is a disk resident data store. When the TUPLE MOVER tasks move out its run. Containers are created in the ROS and the data is organized in projections on disk. The TUPLE MOVER is the VERTICA database optimizer component that manages the movement of data from memory (WOS) to disk (ROS), as well as managing the number of containers per projection on the ROS. The TUPLE MOVER command MOVE OUT, moves data from memory to disk. The command MERGE OUT, combines the containers on disk.

LOADING DATA: Trickle and Bulk Loads

Why did VERTICA build this loading model? To support different types and different sizes of data loads. If you have small frequent data loads or trickle loads, The BEST practice is to load the records into memory (into the WOS). Data loaded into the WOS is still available for query results. The size of the WOS is limited to 25% of the available RAM or 2GB whichever is smaller. If the amount of data loaded to WOS exceeds this size, the data is automatically spilled to disk in the ROS. For the initial bulk data load and for subsequent large loads, the best practice is to load the data directly to disk where it will be stored in the ROS. This process leads to the most efficient loading with the least demand on cluster resources.

The Tuple Mover: The moveout Task

Here we show how data is moved from memory to disk using the tuple move tasks MOVEOUT. Initially some data has already been written to disk. You see the containers in the ROS. By default, small data batches are loaded into the WOS. It is written to this memory resident data store in row format. This allows you to query data as fast as it is loaded. Data in the WOS is available to be queried but it is not optimized and volatile. It will take longer to return a result from data in the WOS.

When the WOS reaches its maximum capacity (25% of the available RAM per node up to 2GB or at regularly configured intervals) by default every 5 minutes, the tuple mover task MOVEOUT writes the data to temporary memory space where it is reorganized into columnar sorted and encoded format as defined by the projection definitions.

Once the data reorganization is complete. The MOVEOUT task writes the data to disk in containers in the ROS. When the data is committed to disk, it is removed from the temporary space

The Tuple Mover: The MERGOUT Task

Because the process of querying data from many files can be relatively slow, the MERGEOUT task compresses multiple containers into fewer containers this allows the queries to run more efficiently. By default, the MERGEOUT task runs automatically every 10 minutes. Initially there have been multiple loads of data into the ROS creating multiple containers.

The MERGEOUT task first moves the containers into temp space where they are condensed. The data is resorted and re-encoded.

The condensed containers are written back to the ROS. This process creates new larger containers

Finally, the data is removed from temp space.

VERTICA – Introduction to Projections

To process a simple query (shown below), a row-store database must read all columns in all of the table named in the query regardless of how wide the tables might be or how many columns are actually needed. This limits the speed at which can get an answer to your query. This is even more of a concern as your query get more complex referencing multiple tables to return a result.

Vertica database transforms the information from a table into a column-based structure we call projections.

In this query, we are only referencing 3 of the columns from the table (symbol, date, price).

These are the only columns that need to be contained in the projection. You will get a faster query response by not having to reference all the information contained in the table.

There are 2 basic types of projections that can be built based on a table.

If you have frequently run queries, you can build projections specific to those queries. Vertica will automatically choose the projections that best fits the query. The query performance is also increased because the data in each projection is automatically sorted.

But what if you need to run an adhoc query?

For each table, Vertica will also create at least one Superprojections.

Superprojections contain all the columns in the table and each of them can sort the information differently based on the type of adhoc queries that you expect.

PROJECTION SEGMENTATION

Vertica is installed in a clustered machine called nodes. The physical database is distributed across these nodes and all of them participate in the processing of information.

If you have a large FACTS table, that is a table that contains many rows, Vertica will segment that data into related projections and distribute those segments across all the nodes in the database. This way, the processing load of a query request is distributed and the performance of the query increases.

However, if you have small dimension tables, it may not pay to segment that data across the nodes.

In that case, Vertica will simply make a copy of that data that is replicated on each of the nodes.

QUERY JOINS AND PROJECTIONS

Queries made across multiple tables is where the power of projection segmentation and replication is best demonstrated. In this case, we want to join information between two tables. Each node working independently, runs the query on the data located on that node because the node contains only a segment of the data from table A, the query is processed much more quickly.

The results from each node are then aggregated and the results will be returned to the requestor.

Cheers!