VERTICA – Introduction to Projections

To process a simple query (shown below), a row-store database must read all columns in all of the tables named in the query, regardless of how wide the tables might be or how many columns are actually needed. This limits the speed at which you can get an answer to your query. It becomes even more of a concern as your queries get more complex and reference multiple tables to return a result.

The Vertica database transforms the information from a table into column-based structures called projections.

In this query, we are only referencing 3 of the columns from the table (symbol, date, price).

These are the only columns that need to be contained in the projection. You will get a faster query response by not having to reference all the information contained in the table.
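The difference can be illustrated with a small Python sketch. It is only a toy model, not Vertica's storage engine: the `volume` and `exchange` columns and all the data values are invented for illustration, and "cost" is simply the number of values touched.

```python
# Toy table: the query needs only symbol, date and price.
# The volume and exchange columns are hypothetical extras.
ROWS = [
    {"symbol": "AAA", "date": "2020-01-02", "price": 10.0, "volume": 100, "exchange": "X"},
    {"symbol": "BBB", "date": "2020-01-02", "price": 20.0, "volume": 200, "exchange": "Y"},
    {"symbol": "AAA", "date": "2020-01-03", "price": 11.0, "volume": 150, "exchange": "X"},
]


def to_columns(rows):
    """Pivot a list of row dicts into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}


def row_store_values_read(rows):
    """A row store touches every value of every row, whatever the query needs."""
    return sum(len(row) for row in rows)


def column_store_values_read(columns, needed):
    """A column store touches only the values in the requested columns."""
    return sum(len(columns[name]) for name in needed)


COLUMNS = to_columns(ROWS)
NEEDED = ["symbol", "date", "price"]

row_cost = row_store_values_read(ROWS)
col_cost = column_store_values_read(COLUMNS, NEEDED)
print(row_cost, col_cost)
```

With wider tables and more rows, the gap between the two counts grows accordingly.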

There are 2 basic types of projections that can be built based on a table.

If you have frequently run queries, you can build projections specific to those queries. Vertica will automatically choose the projection that best fits the query. Query performance also improves because the data in each projection is automatically sorted.
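To see why sorted data helps, consider the following minimal Python sketch. It assumes a hypothetical projection holding (symbol, price) pairs sorted by symbol, and uses binary search as a stand-in for whatever Vertica does internally: all rows for one symbol sit next to each other, so they can be located without scanning everything.

```python
from bisect import bisect_left, bisect_right

# Hypothetical projection: (symbol, price) pairs kept sorted by symbol.
projection = sorted([("CCC", 30.0), ("AAA", 10.0), ("BBB", 20.0), ("AAA", 11.0)])
symbols = [symbol for symbol, _ in projection]


def prices_for(symbol):
    """Find the contiguous run of rows for 'symbol' with two binary searches."""
    lo = bisect_left(symbols, symbol)
    hi = bisect_right(symbols, symbol)
    return [price for _, price in projection[lo:hi]]


aaa_prices = prices_for("AAA")
print(aaa_prices)
```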

But what if you need to run an ad hoc query?

For each table, Vertica will also create at least one superprojection.

Superprojections contain all the columns in the table, and each of them can sort the information differently based on the type of ad hoc queries that you expect.
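The relationship between query-specific projections and superprojections can be sketched as a simple selection rule. This is an illustrative assumption, not Vertica's optimizer, which weighs far more than column counts: here the narrowest projection that covers the query's columns wins, and the superprojection is the fallback because it covers everything. The projection names and the `volume` and `exchange` columns are hypothetical.

```python
# Hypothetical projections on one table, described by the columns they hold.
PROJECTIONS = {
    "trades_super": {"symbol", "date", "price", "volume", "exchange"},
    "trades_by_symbol": {"symbol", "date", "price"},
}


def pick_projection(query_columns, projections):
    """Return the name of the narrowest projection covering all query columns."""
    needed = set(query_columns)
    candidates = [
        (len(columns), name)
        for name, columns in projections.items()
        if needed <= columns
    ]
    if not candidates:
        raise ValueError("no projection covers the requested columns")
    return min(candidates)[1]


frequent = pick_projection(["symbol", "date", "price"], PROJECTIONS)
ad_hoc = pick_projection(["symbol", "volume"], PROJECTIONS)
print(frequent, ad_hoc)
```

The ad hoc query needs a column the narrow projection lacks, so only the superprojection can answer it.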

PROJECTION SEGMENTATION

Vertica is installed on a cluster of machines called nodes. The physical database is distributed across these nodes, and all of them participate in the processing of information.

If you have a large fact table, that is, a table that contains many rows, Vertica will segment that data into related projections and distribute those segments across all the nodes in the database. This way, the processing load of a query request is distributed and query performance improves.

However, if you have small dimension tables, it may not pay to segment that data across the nodes.

In that case, Vertica will simply make a copy of that data that is replicated on each of the nodes.
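The two placement strategies can be sketched in Python as follows. This is a simplified model under stated assumptions: the node names, table contents, and the use of CRC32 as the hash function are all illustrative and are not taken from Vertica itself.

```python
import zlib

NODES = ["node1", "node2", "node3"]  # hypothetical cluster


def node_for(key, nodes):
    """Map a key to a node with a deterministic hash (CRC32 as a stand-in)."""
    return nodes[zlib.crc32(str(key).encode("utf-8")) % len(nodes)]


def segment(rows, key, nodes):
    """Large fact table: each row lives on exactly one node."""
    placement = {node: [] for node in nodes}
    for row in rows:
        placement[node_for(row[key], nodes)].append(row)
    return placement


def replicate(rows, nodes):
    """Small dimension table: every node gets a full copy."""
    return {node: list(rows) for node in nodes}


fact_rows = [{"trade_id": i, "symbol": s}
             for i, s in enumerate(["AAA", "BBB", "CCC", "AAA", "BBB", "CCC"])]
dim_rows = [{"symbol": "AAA", "name": "Alpha"}, {"symbol": "BBB", "name": "Beta"}]

segmented = segment(fact_rows, "trade_id", NODES)
replicated = replicate(dim_rows, NODES)

for node in NODES:
    print(node, len(segmented[node]), "fact rows,", len(replicated[node]), "dimension rows")
```

Segmentation spreads the large table so each node handles a share of the work, while replication keeps the small table available locally on every node.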

QUERY JOINS AND PROJECTIONS

Queries made across multiple tables are where the power of projection segmentation and replication is best demonstrated. In this case, we want to join information between two tables. Each node, working independently, runs the query on the data located on that node. Because the node contains only a segment of the data from table A, the query is processed much more quickly.

The results from each node are then aggregated and returned to the requestor.
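The flow described above can be sketched in Python. The sketch assumes table A is segmented across three hypothetical nodes and that the second table (called table B here) is small and replicated; the column names and data are invented, and the per-node work runs sequentially rather than in parallel.

```python
# Table A, already segmented: each node holds only its own rows.
segments = {
    "node1": [{"trade_id": 1, "symbol": "AAA"}, {"trade_id": 4, "symbol": "BBB"}],
    "node2": [{"trade_id": 2, "symbol": "BBB"}],
    "node3": [{"trade_id": 3, "symbol": "AAA"}],
}

# Table B, replicated: every node has this full copy.
table_b = [{"symbol": "AAA", "name": "Alpha"}, {"symbol": "BBB", "name": "Beta"}]


def local_join(fact_segment, dim_copy):
    """Join one node's segment of table A against its local copy of table B."""
    names = {row["symbol"]: row["name"] for row in dim_copy}
    return [
        {**fact, "name": names[fact["symbol"]]}
        for fact in fact_segment
        if fact["symbol"] in names
    ]


def distributed_join(node_segments, dim_rows):
    """Run the join on each node's data, then combine the partial results."""
    results = []
    for fact_segment in node_segments.values():
        results.extend(local_join(fact_segment, dim_rows))
    return results


joined = distributed_join(segments, table_b)
for row in sorted(joined, key=lambda r: r["trade_id"]):
    print(row)
```

Because table B is present on every node, no node needs rows from another node to complete its part of the join.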

Cheers!
