Monday, 28 July 2014

Verifying Credit Card Numbers

Recently we came across an interesting problem involving credit card theft. As part of the investigation we found tons of digit strings that matched the pattern of a credit card number. With that much data it was difficult to report the correct numbers: any random 16-digit or 15-digit number could be a credit card number, some identifier, or just a dummy value in the system. The problem was really interesting.

After some research on the web we found that most credit card companies do not generate just any random number; they use an algorithm to generate it. The Luhn test is one of the widely accepted algorithms for distinguishing credit card numbers. The steps and the SQL code below can be used to validate a credit card number. We are taking 49927398716 as the example number for the steps below.

Luhn’s Test:
  1. Get the credit card number. For our example we are assuming 49927398716 as the given number.
  2. Reverse the digits of the given number. For our example it becomes 61789372994.
  3. Take the sum of the digits in the odd positions of the reversed number: 6 + 7 + 9 + 7 + 9 + 4 = 42.
  4. Take the digits in the even positions, i.e. 1, 8, 3, 2, 9, and multiply each of them by two. For our example these become 2, 16, 6, 4, 18.
  5. Wherever a result is greater than nine, sum its digits to make it a single-digit number. The numbers from step 4 become 2, 7, 6, 4, 9.
  6. Sum the numbers from step 5: 2 + 7 + 6 + 4 + 9 = 28.
  7. Now add the results of step 3 and step 6: 42 + 28 = 70.
  8. If the sum ends in zero then it can be a valid credit card number. In other words, if the remainder of the step 7 total divided by 10 is zero, then it can be a valid credit card number. In our case 70 % 10 = 0, hence it can be a valid credit card number.


Sample SQL Code for Luhn’s Test:

Here we have written a T-SQL script to run the above test, but it can be implemented in any other manner as well.


DECLARE
    @cardNo NVARCHAR(20) = '49927398716'
    ,@reverseCardNo NVARCHAR(20)
    ,@sum SMALLINT
    ,@number SMALLINT
    ,@counter SMALLINT

SELECT
    @cardNo = LTRIM(RTRIM(@cardNo))
    ,@reverseCardNo = REVERSE(@cardNo)
    ,@sum = 0
    ,@counter = 1

WHILE(@counter <= LEN(@cardNo))
BEGIN
    SELECT
        @number = TRY_CONVERT(SMALLINT, LEFT(@reverseCardNo, 1)) --NEXT DIGIT OF THE REVERSED NUMBER

    IF(@counter%2 = 0) --DIGIT IN AN EVEN POSITION OF THE REVERSED NUMBER
    BEGIN
        SELECT
            @number *= 2
        IF (@number > 9) --TWO-DIGIT RESULT: ADD ITS DIGITS TO GET A SINGLE DIGIT
        BEGIN
            SELECT
                @number = TRY_CONVERT(SMALLINT, LEFT(@number, 1)) + TRY_CONVERT(SMALLINT, RIGHT(@number, 1))
        END
    END
   
    SELECT
        @sum += @number --ADD THE DIGIT TO THE RUNNING TOTAL
        ,@counter += 1
        ,@reverseCardNo = RIGHT(@reverseCardNo, LEN(@reverseCardNo)-1) --DROP THE PROCESSED DIGIT
END

IF(@sum > 0 AND @sum%10 = 0)
    PRINT CONCAT('Test passed. ', @cardNo, ' can be a potential card number.')
ELSE
    PRINT CONCAT('Test failed. ', @cardNo, ' invalid card number.')
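
If the check has to be reused for many candidate numbers, the same logic can also be wrapped in a scalar function. Below is a minimal sketch; the function name dbo.LuhnCheck and the BIT return convention are our own choices and not part of the original script.

CREATE FUNCTION dbo.LuhnCheck (@cardNo NVARCHAR(20))
RETURNS BIT
AS
BEGIN
    DECLARE
        @reverseCardNo NVARCHAR(20) = REVERSE(LTRIM(RTRIM(@cardNo)))
        ,@sum SMALLINT = 0
        ,@number SMALLINT
        ,@counter SMALLINT = 1

    WHILE(@counter <= LEN(@reverseCardNo))
    BEGIN
        SELECT @number = TRY_CONVERT(SMALLINT, SUBSTRING(@reverseCardNo, @counter, 1))

        IF (@number IS NULL) RETURN 0 --NON-NUMERIC CHARACTER, CANNOT BE A CARD NUMBER

        IF (@counter%2 = 0) --DIGIT IN AN EVEN POSITION OF THE REVERSED NUMBER
        BEGIN
            SELECT @number *= 2
            IF (@number > 9) SELECT @number -= 9 --SAME AS SUMMING THE TWO DIGITS
        END

        SELECT @sum += @number, @counter += 1
    END

    RETURN CASE WHEN @sum > 0 AND @sum%10 = 0 THEN 1 ELSE 0 END
END

For example, SELECT dbo.LuhnCheck('49927398716') returns 1 for the sample number above and 0 for a number that fails the test.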

Wednesday, 29 May 2013

Considerations – Loading Oracle Data Using SSIS

With the changing dynamics of technology, extracting data from heterogeneous databases is an enterprise need. Microsoft SSIS serves as a good ETL tool to extract data from multiple, heterogeneous sources. Below are some considerations and best practices to take care of during the pull.


Pulling Data from Oracle

There are several methods to pull data from Oracle. This link provides detailed documentation for creating the ETL and planning an SSIS solution using the Oracle OLE DB provider. Below are some more points to consider while pulling data from Oracle.


Choosing Provider

Choosing a provider can change the speed of the data pull by multiple folds, hence it is one of the most critical parts of planning the ETL. Currently the options below can be used.

  1. The Microsoft OLE DB provider for Oracle. It is available only in a 32-bit version and is deprecated.
  2. The Oracle OLE DB provider, which can be downloaded from here. This is widely used, very stable in terms of performance and maturity, and causes the fewest issues.
  3. The fastest providers are shipped as the Microsoft Connectors by Attunity for Oracle, which support both 32-bit and 64-bit versions. The SQL Server 2012 version can be downloaded from here. These providers tend to be very fast, almost 20 percent faster than the Oracle providers.

In-Memory Transformations & Minimized/Required Column Approach

SSIS is an in-memory pipeline. It works very fast for in-memory operations, but if memory fills up the transformations spill to disk and performance starts degrading. The methods below can be considered to keep the work in memory.

  1. Ensure that only the required, minimized set of columns is pulled from the Oracle source, either through views or by using a SQL command that selects only the required columns (see the example query after this list). Minimizing the columns keeps the data size per row to a minimum.
  2. Prefer SQL queries over the table drop-down option. The drop-down option uses a different set of commands to prepare the SQL and then execute it. Because of this, SSIS may sometimes appear to hang even though the same query returns results very quickly in Management Studio. The same behavior is visible when selecting a table from the drop-down takes a long time just to list the tables.
  3. Whether transformations stay in memory can be checked with the “Buffers spooled” perfmon counter for SSIS. Here are the details about the performance counters for SSIS. The initial value of this counter is zero; anything greater than zero indicates that the SSIS pipeline is swapping to disk.
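
As an illustration of point 1, the OLE DB source can be given a SQL command like the one below instead of the whole table; the schema, table and column names are purely illustrative, and the filter is pushed down so Oracle returns only the rows and columns the ETL needs.

SELECT ORDER_ID, CUSTOMER_ID, ORDER_DATE, ORDER_AMOUNT
FROM SALES.ORDERS
WHERE ORDER_DATE >= TO_DATE('2013-05-01', 'YYYY-MM-DD')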

Data Type Transformations

Using the most efficient data type always ensures the best performance and the minimum resource usage. Consider a big integer value like ‘19876234987’: it can be stored as an integer as well as characters. Numbers use different algorithms than characters for indexing, comparison, etc., so in this case characters will give slower performance, need more space to store the data, and use more memory (RAM) while doing the ETL. The best practice is to use the optimized length for each data type and make it as narrow as possible. For monetary values the Money data type is best and it is fast; Numeric and Float have larger precision, which leads to performance issues.
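
As a rough illustration of the difference, the quick check below stores the same value both ways and compares the storage; the variable names are ours.

DECLARE
    @asBigInt BIGINT = 19876234987
    ,@asVarchar VARCHAR(20) = '19876234987'

SELECT
    DATALENGTH(@asBigInt) AS BigIntBytes --8 BYTES
    ,DATALENGTH(@asVarchar) AS VarcharBytes --11 BYTES, PLUS SLOWER COMPARISONS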
Below is the default mapping of the data types from Oracle to SSIS.


Oracle Data Type | Integration Services Data Type
VARCHAR2 | DT_STR
NVARCHAR2 | DT_WSTR
NUMBER | DT_R8
NUMBER(P, S) | DT_I1, DT_I2, DT_I4 or DT_NUMERIC(P, 0) according to the precision (P) when the scale is 0; otherwise DT_NUMERIC(P, S)
LONG | Not supported
DATE | DT_DBTIMESTAMP
RAW | DT_BYTES
LONG RAW | Not supported
ROWID | Not supported
CHAR | DT_STR
User-defined type (object type, VARRAY, nested table) | Not supported
REF | Not supported
CLOB, NCLOB, BLOB, BFILE | Not supported
TIMESTAMP, TIMESTAMP WITH TIME ZONE, INTERVAL YEAR TO MONTH, INTERVAL DAY TO SECOND, TIMESTAMP WITH LOCAL TIME ZONE | DT_STR
UROWID | Not supported

We can use the data conversion task to convert between data types, and the package might look like the image below. Data type casting is important, but too much casting degrades the performance of the pull as well. Staging tables play a vital role for these types of issues.

Using Bulk Operations and Use Minimize Logging

While inserting data into SQL Server, minimally logged operations provide the maximum performance. Below are some of the options which can be used.
  1. In the data flow task, use the fast load option to put data into the target tables.
  2. For a fast ETL, foreign keys can be removed so that inserts/updates run in parallel. If they are really needed, the best practice is to drop all the foreign keys and recreate them once the data pull is complete. If a large amount of data is changing in the target table, dropping primary keys and other indexes is also part of the best practices.
  3. Use TABLOCK if the target table is not in frequent use, and consider disabling check constraints.
  4. Set a proper value for “Maximum insert commit size” to ensure batch-wise inserts.
  5. Prefer using NOLOCK for lookups and other tasks to minimize locking impacts.
  6. Using SQL Server partitioning and the SWITCH statement increases performance by several folds while working with large data (see the sketch after this list).
  7. Minimize SSIS blocking and slow-performing operations, e.g. Sort and aggregations like SUM, MIN, MAX, etc.
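
A minimal sketch of the partition SWITCH idea from point 6, assuming the staging table has already been loaded (for example with the fast load and TABLOCK options) and matches the structure of the partitioned fact table; the table names and partition number are illustrative only.

--MOVE THE FRESHLY LOADED STAGING DATA INTO PARTITION 5 OF THE FACT TABLE
--THIS IS A METADATA-ONLY OPERATION, SO IT IS PRACTICALLY INSTANT EVEN FOR LARGE DATA
ALTER TABLE dbo.Staging_Sales
SWITCH TO dbo.Fact_Sales PARTITION 5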

Revisit ETL Design

Every ETL tool has its own pros and cons. To get optimized performance out of SSIS, the design considerations below can be taken into account. Most of them are for large sets of data.
  1. Organize the data using staging tables in such a way that TRUNCATE is used rather than DELETE.
  2. To UPDATE large amounts of data from data flow tasks, prefer using staging tables and running the UPDATE statement in SQL Server rather than row-by-row OLE DB calls; for small dimensions row-by-row calls are fine (see the example after this list).
  3. Set-based operations and transformations perform best in SQL Server, as it is designed for set-based operations. Row-by-row transformations are best left to SSIS.
  4. Date and time conversions tend to be faster in SQL Server than in SSIS, so it is better to do date and time conversions in SQL Server.
  5. Delta data loads should be evaluated well before implementation. If a large portion of the data is changing, a full load gives better performance than the delta detection logic.
  6. The ideal pull from a heterogeneous source looks like the image below. Different parts of the ETL can be removed or added as per the need.
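
A minimal sketch of the staging-table update from point 2; the dimension, staging table and column names are illustrative only.

--AFTER THE DATA FLOW HAS LANDED THE CHANGED ROWS IN A STAGING TABLE,
--APPLY THEM TO THE DIMENSION WITH ONE SET-BASED STATEMENT INSTEAD OF ROW-BY-ROW CALLS
UPDATE d
SET d.CustomerName = s.CustomerName
    ,d.City = s.City
FROM dbo.Dim_Customer AS d
JOIN dbo.Staging_Customer AS s
    ON s.CustomerKey = d.CustomerKey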

Thursday, 7 February 2013

(5 + 1) Steps to Resolve Network Related Instance Error


This post was inspired by spending ample time resolving a very common error in SQL Server while connecting to an instance: "A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible …". Below are the steps to follow to resolve the error.

1.    Check Services: Make sure the "SQL Server" and "SQL Server Browser" services are up and running. We can check this by going to "All Programs → SQL Server → Configuration Tools → SQL Server Configuration Manager", or by searching "SQL Server Con" in Windows search.

2.    Check Client Protocols: Now check that the native client protocol in use, e.g. TCP/IP or Named Pipes, is enabled. Most of the time this information appears towards the end of the error message. Enable the protocol if it is not already enabled. Check the image below.

3.    Check Server Protocols: Check "SQL Server Network Configuration" and make sure that the respective protocol is enabled here as well. Check the image below.

4.    Know the ports and open them: Now check the properties of the protocol by right-clicking and selecting Properties, or by double-clicking it. Select the "IP Addresses" tab and scroll to the end of the window until "IPAll" appears, as in the image below. Check the port on which SQL Server is listening; in this case it is 3680 (not the default 1433) and it is dynamic. A static port can also be specified in the "TCP Port" section. Now we need to open that same port, in this case 3680, on the local machine by creating an inbound rule (or exception) in "Windows Firewall". We may also need to open the port for the "SQL Server Browser" service.

5.    Check Remote Connection: Make sure that SQL Server is enabled for remote connections by running the T-SQL command below, or from the screen reached by "Right-click on the server in SSMS → Select Connections".
EXEC sp_configure 'remote access', 1
RECONFIGURE WITH OVERRIDE
GO


6.    Force Protocol and Port: While connecting from SSMS we can also provide the port and protocol to connect with.
The protocol can be forced by prefixing the protocol name, e.g. "tcp:ServerName,PortNumber" for TCP/IP or "np:ServerName" for named pipes. The same can be achieved from the "Options >>" button as well.
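
The same forced protocol and port can also be tested quickly from a command prompt with sqlcmd; the server name and port below are the illustrative ones from step 4.

sqlcmd -S tcp:MyServer,3680 -E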

Friday, 1 February 2013

SQL Server Database Showing in Recovery


It was 7 o’clock in the evening and I was about to step out of the office when I found one of my colleagues looking disturbed and struggling. On asking, I came to know that the database of a production application had been showing "In Recovery" for almost 4-5 hours. This happened after a new code deployment to the production server. The good part was that they had backups up to the time of failure, so the application could recover just by restoring the backups. So we thought of a quick action plan and executed it as below.

· Check what is blocking the database recovery, which can be done by inspecting the query below.
SELECT DB_NAME(dbid), * FROM sys.sysprocesses WHERE spid > 50 AND spid <> @@SPID
· We tried to kill the blocking SPIDs but got the error message “Only user processes can be killed”.
· We tried to drop the database by putting it into single-user mode, but the execution just went on; inspecting it with the query from the first step showed that the DROP DATABASE was waiting on the recovery to finish.
· We tried to take the database offline, but that was also waiting on the recovery to finish.
· Lastly we decided to:
a.    Stop the SQL Server service, because the application was already down anyway.
b.    Cut and paste the ldf, mdf and ndf files to another folder (cut-paste within the same disk is a very fast option as it only changes pointers on the disk).
c.    Start the SQL Server service again; the recovery state was gone.
· Delete the database and restore the latest backups.
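
Before reaching for such a step it is also worth checking how far the recovery has actually progressed; a quick check we could have used is below (assuming the recovery shows up as a DB STARTUP command, which is the usual case).

SELECT session_id, command, percent_complete, estimated_completion_time
FROM sys.dm_exec_requests
WHERE command LIKE 'DB STARTUP%'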

The query to find the file path of a database is as below.
SELECT D.database_id, D.name, F.physical_name
FROM sys.master_files F
JOIN sys.databases D ON F.database_id = D.database_id

This may not be a perfect way to deal with the situation, but it worked quickly and nicely because recovery is done through the files, and when there are no files no recovery can be done. Moreover, we had the liberty to restart the SQL Server service, which others may not have in these scenarios.

Wednesday, 23 January 2013

AlwaysOn Scenarios - Impacts on Application Architecture


With SQL Server 2012, AlwaysOn is a centralized and better method of providing HADR (High Availability and Disaster Recovery) compared to older versions. AlwaysOn, combined with read-only secondary replicas, provides handy alternatives while defining database, data warehouse and application architectures, along with hardware sizing considerations. It perfectly suits middle and small scale businesses by providing HADR and significant load balancing without denting hardware and software costs.

As a prerequisite, read-only routing should be enabled and the client should use a read-intent connection request. Enabling and setting up read-only routing is explained here in detail, and the T-SQL approach for the same can be found here.

Figure-1 shows some of the scenarios where an active secondary provides handy architectural alternatives along with HADR. These scenarios are discussed below in detail. Resource Governor can be combined wisely with the AlwaysOn availability groups to provide even more robust and responsive systems. The connection string received by the availability group listener is solely responsible for diverting a request to a secondary, so the AlwaysOn connection property “ApplicationIntent=ReadOnly” should be added to the connection string for each such request.
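
For example, a read-intent connection string pointed at the availability group listener might look like the one below; the listener name, port and database name are illustrative only.

Server=tcp:AGListener,1433;Database=SalesDB;Integrated Security=SSPI;ApplicationIntent=ReadOnly;MultiSubnetFailover=True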

Figure-1


Real Time Reporting
This is one of the most frequently asked requirements from users. Most of the time, reports come from multiple databases because one database serves one single purpose. Using the AlwaysOn feature, multiple databases can be maintained within a single availability group, reducing setup and maintenance overhead. With a simple change in the report application’s connection string, i.e. adding the AlwaysOn connection property “ApplicationIntent=ReadOnly”, the requests will be routed to the secondary replicas. This off-loads the primary servers, resulting in faster response times for application and report users, with better resource utilization.

Database Backups
Taking backups is one of the important daily tasks. Executing this task in peak hours can impact the user experience. It becomes even more difficult if the application has users around the globe, which turns every minute into a peak hour. This scenario is detailed by Microsoft in this link.

Data Distribution Replicas for ETL
Analysis is critical for any business today. Most of the time, analysis results in data warehouses (DW) and ETLs. ETLs often put a lot of load on the main application(s), resulting in performance degradation and performance escalations. We can configure the ETL connection strings to use a read-intent connection, so the queries execute on one of the secondary replicas.

SSAS Cube Processing
SSAS cubes are an active part of most enterprise applications. SSAS cube processing is a very resource-intensive task. SSAS generates queries at runtime based on the cube design and fires them at the respective supporting database. We can embed the AlwaysOn connection property in the SSAS data source so that the application machines are off-loaded. Take a data warehouse scenario where we have several cubes as well as user queries: the read-only secondary replicas provide a handy and robust method to offload the core data warehouse along with the HADR capabilities.

Conclusion
AlwaysOn availability groups provide many handy solutions, but secondary replica routing is highly dependent on the routing tables and should be considered with deep thought while defining the solution. Although the feature provides very fast synchronization of data, in the true sense it can never be truly real time; the time spent moving the logs from one replica to another also depends on the networking and hardware resources.

For better insight into AlwaysOn, the links below are just a click away.
Enjoy AlwaysOn …