Thursday, 7 February 2013

(5 + 1) Steps to Resolve Network Related Instance Error


This post was inspired by spending ample time resolving a very common SQL Server error while connecting to an instance: "A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible …". Below are the steps to follow to resolve the error.

1.    Check Services: Make sure the "SQL Server" and "SQL Server Browser" services are up and running. We can check this by going to "All Programs → SQL Server → Configuration Tools → SQL Server Configuration Manager", or by searching for "SQL Server Con" in Windows search.

2.    Check Client Protocols: Now check that the native client protocols (e.g. TCP/IP, Named Pipes) are enabled. Most of the time this information appears towards the end of the error message. Enable the protocol if it is not already enabled. Check the image below.

3.    Check Server Protocols: Check "SQL Server Network Configuration" and make sure that the respective protocol is enabled on the server side as well. Check the image below.

4.    Know the ports and open them: Now open the properties of the protocol by right-clicking it and selecting Properties, or by double-clicking it. Select the "IP Addresses" tab and scroll to the end of the window until the "IP All" section appears, as in the image below. Check the port where SQL Server is listening; in this case it is 3680 (not the default 1433) and it is dynamic. A static port can also be specified in the "TCP Port" section. Now open that same port (3680 in this case) on the local machine by creating an inbound rule (or exception) in Windows Firewall. We may also need to open the port used by the "SQL Server Browser" service (UDP 1434).
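If a query window can already be opened against the instance (for example locally), the listening port can also be confirmed from T-SQL, provided the session itself is connected over TCP/IP; a minimal check is sketched below.
-- Returns the TCP port the current session is connected on (NULL for shared memory / named pipes).
SELECT local_tcp_port FROM sys.dm_exec_connections WHERE session_id = @@SPID;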

5.    Check Remote Connections: Make sure that SQL Server is enabled for remote connections, either by running the T-SQL command below or from the screen reached via "Right Click on the Server from SSMS → Select Connections".
SP_CONFIGURE 'remote access', 1
RECONFIGURE WITH OVERRIDE
GO


6.    Force Protocol and Port: While connecting from SSMS we can also specify the port and protocol to connect with.
The protocol can be forced by prefixing the protocol name to the server name, e.g. "tcp:ServerName,PortNumber" for TCP/IP or "np:ServerName" for named pipes. The same can be achieved from the "Options >>" button as well.
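Once connected, a quick way to confirm that the forced protocol (and port) was actually used is to inspect the current connection, as sketched below.
-- net_transport shows TCP, Named pipe or Shared memory for the current session.
SELECT net_transport, local_tcp_port FROM sys.dm_exec_connections WHERE session_id = @@SPID;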

Friday, 1 February 2013

SQL Server Database Showing in Recovery


It was 7 o'clock in the evening and I was about to step out of the office when I found one of my colleagues looking disturbed and struggling. On asking, I came to know that a production application database was showing "In Recovery" and had been recovering for almost 4-5 hours. This happened after a new code deployment on the production server. The best part was that they had backups up to the time of failure, so the application could recover just by restoring them. So we thought of a quick action plan and executed it as below.

·         Check what is blocking the database recovery, which can be done by inspecting the output of the query below.
SELECT DB_NAME(dbid),* FROM dbo.sysprocesses WHERE spid > 50 and spid <> @@SPID
·         Try to kill the blocking SPIDs; this failed with the error message “Only user processes can be killed”.
·         Try to drop the database after putting it into single-user mode, but the execution just kept going; inspecting the query from the first step showed that the DROP DATABASE was waiting on recovery to finish (rough T-SQL for these attempts is sketched after this list).
·         Try to take the database offline, but that was also waiting on recovery to finish.
·         Lastly we decided to:
a.    Stop the SQL Server service, because the application was already down anyway.
b.    Cut and paste the ldf, mdf and ndf files to another folder (cut-paste is a very fast option as it just changes the pointer on the disk).
c.    Start the SQL Server service again; the database no longer showed as recovering.
·         Delete the database and restore the latest backups.
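For reference, the single-user, drop and offline attempts from the list above look roughly like the sketch below ([ProdDB] is a placeholder database name).
-- Attempted while the database was stuck in recovery; both ended up waiting on recovery to finish.
ALTER DATABASE [ProdDB] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
DROP DATABASE [ProdDB];

ALTER DATABASE [ProdDB] SET OFFLINE WITH ROLLBACK IMMEDIATE;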

The query to find the file path of a database is as below.
SELECT D.dbid, D.name, F.physical_name
FROM sys.master_files F
JOIN dbo.sysdatabases D on F.database_id = D.dbid

This may not be a perfect way to deal with the situation, but it worked quickly and nicely: recovery works on the database files, and when there are no files, no recovery can be done. Moreover, we had the liberty to restart the SQL Server service, which others may not have in these scenarios.

Wednesday, 23 January 2013

AlwaysOn Scenarios - Impacts on Application Architecture


With SQL Server 2012, AlwaysOn is a centralized and better method of providing HADR (High Availability and Disaster Recovery) compared to older versions. AlwaysOn, together with read-only secondary replicas, provides handy alternatives while defining database, data warehouse and application architectures, along with hardware sizing considerations. It suits middle and small scale businesses perfectly by providing HADR and significant load balancing without denting hardware and software costs.

As a prerequisite, read-only routing should be enabled and the client should use a read-intent connection request. Enabling and setting up read-only routing is explained here in detail, and the T-SQL approach for the same can be found here.
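For reference, the T-SQL side of that setup looks roughly like the sketch below; the availability group, replica and routing URL names are placeholders, and the linked articles cover the complete procedure.
-- Allow read-intent connections on the secondary and give it a routing URL.
ALTER AVAILABILITY GROUP [AG1]
MODIFY REPLICA ON N'SQLNODE2'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY,
                      READ_ONLY_ROUTING_URL = N'TCP://SQLNODE2.contoso.com:1433'));

-- Tell the primary where to route read-intent requests.
ALTER AVAILABILITY GROUP [AG1]
MODIFY REPLICA ON N'SQLNODE1'
WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = (N'SQLNODE2')));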

Figure-1 shows some of the scenarios where an active secondary provides handy architectural alternatives along with HADR. These scenarios are discussed below in detail. Resource Governor can be combined wisely with AlwaysOn availability groups to provide even more robust and responsive systems. The connection string received by the availability group listener is solely responsible for diverting a request to a secondary, so the AlwaysOn connection property “ApplicationIntent=ReadOnly” should be added to the connection string for each such request.

Figure-1


Real Time Reporting
This is one of the most frequently asked requirements from users. Most of the time reports draw from multiple databases, because each database serves one single purpose. Using the AlwaysOn feature, multiple databases can be maintained within a single availability group, reducing the overhead of setup and maintenance. With a simple change in the report application's connection string, i.e. adding the AlwaysOn connection property “ApplicationIntent=ReadOnly”, the requests will be routed to the secondary replicas. This off-loads the primary servers, resulting in faster response times for application and report users and better resource utilization.

Database Backups
Taking backups is one of the important daily tasks. Executing this task in peak hours can impact the user experience, and it becomes even more difficult if the application has users around the globe, turning every minute into a peak hour. This scenario is detailed by Microsoft in this link.
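A small sketch of the idea ('AppDB' and the backup path are placeholders): a scheduled backup job can run on every replica but do the actual work only where the availability group's backup preference points.
-- Runs the backup only on the preferred backup replica for this availability database.
IF sys.fn_hadr_backup_is_preferred_replica(N'AppDB') = 1
BEGIN
    -- Full backups taken on a secondary replica must be copy-only.
    BACKUP DATABASE [AppDB]
    TO DISK = N'\\BackupShare\AppDB.bak'
    WITH COPY_ONLY, COMPRESSION;
END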

Data Distribution Replicas for ETL
Analysis is critical for any business today. Most of the time analysis results in data warehouses (DW) and ETLs, and ETLs in turn often put a lot of load on the main applications, resulting in performance degradation and escalations. We can configure the ETL connection strings to use a read-intent connection so that the queries execute on one of the secondary replicas.

SSAS Cube Processing
SSAS cubes are an active part of most enterprise applications, and SSAS cube processing is a very resource intensive task. SSAS generates queries at runtime based on the cube design and fires them against the respective supporting database. By embedding the AlwaysOn connection property in the SSAS data source, the application machines are off-loaded. Take a data warehouse scenario where we have several cubes as well as user queries: the read-only secondary replicas provide a handy and robust way to off-load the core data warehouse along with the HADR capabilities.

Conclusion
AlwaysOn availability groups provide many handy solutions, but secondary replica routing is highly dependent on the routing tables and should be considered carefully while defining the solution. Although the feature provides very fast synchronization of data, it can never be truly real time; the time spent moving the logs from one replica to another also depends on the networking and hardware resources.

For a better insight into AlwaysOn, the links below are just a click away.
Enjoy AlwaysOn …

Tuesday, 14 August 2012

Top 5 Considerations to Snowflake a Dimension

Schema design for a warehouse is a crucial task for an architect and data modeler. There are two well established design patterns for a warehouse schema: the Star Schema and the Snowflake Schema. Both have their own advantages, disadvantages and design life cycle.

This post highlights the top 5 considerations while designing a snowflake dimension.

1.       Different Granularity of Facts is one of the primary reasons for snowflaking. When a dimension cuts two different fact tables at different levels of a hierarchy, or at two different grains, that can be considered a must-snowflake scenario. For instance, take a dimension DimCustomer containing personal details plus the address, which is geography information like city, state, country etc., and another dimension DimStore containing basic store information along with geographical details. Assume one transactional fact, FactSales, which can be analyzed at the Store, Geography and Customer levels. Here the fact table contains two different grains of customer, i.e. Customer and Geography, and two different grains of store, i.e. Store and Geography. Geography, being common, should be snowflaked out of both dimensions, as sketched below.
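A minimal T-SQL sketch of this example (table and column names are illustrative):
-- Shared, snowflaked geography dimension.
CREATE TABLE DimGeography (
    GeographyKey INT PRIMARY KEY,
    City         NVARCHAR(50),
    [State]      NVARCHAR(50),
    Country      NVARCHAR(50)
);
CREATE TABLE DimCustomer (
    CustomerKey  INT PRIMARY KEY,
    CustomerName NVARCHAR(100),
    GeographyKey INT REFERENCES DimGeography(GeographyKey)
);
CREATE TABLE DimStore (
    StoreKey     INT PRIMARY KEY,
    StoreName    NVARCHAR(100),
    GeographyKey INT REFERENCES DimGeography(GeographyKey)
);
-- FactSales can now be analyzed at Customer, Store and Geography grains.
CREATE TABLE FactSales (
    CustomerKey  INT REFERENCES DimCustomer(CustomerKey),
    StoreKey     INT REFERENCES DimStore(StoreKey),
    GeographyKey INT REFERENCES DimGeography(GeographyKey),
    SalesAmount  MONEY
);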

2.       Sparsity in the Dimension is mostly the data modeler's viewpoint. Many times we have seen developers merge these snowflaked dimensions back into the primary dimension due to data availability or code management later in the project lifecycle. Consider a dimension DimProduct which contains a large number of fields (columns) where most of the values are optional, hence they are either populated with Unknown values or kept null. These sparse fields are collected together to create another, snowflaked dimension. The result is simple: the primary dimension contains solidly populated values and far fewer unknowns.

3.       Monster Dimensions are a point of interest for architects and developers struggling to get good query performance. Here we consider the number of records when snowflaking: if the number of records is enormous, we split the dimension into a snowflake. Another cause of monster dimensions can be fast changing dimensions, which can be solved with mini dimensions, but that is outside the scope of this post. Take the example of a date dimension, DimDate, containing information down to the second. If we store data for 20 years, the number of records in the date dimension will be around 630,720,000 (20*365*24*60*60), which is large and difficult to manage. It can instead be snowflaked as DimDate holding data to the day, i.e. 7,300 rows (365*20), and DimTime with 86,400 rows (24*60*60), as sketched below. Most of the time data modelers do not prefer snowflaking for monster dimensions, but practically it is very useful and manageable.
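A rough sketch of the date/time split (column lists trimmed for brevity; names are illustrative):
-- 7,300 date rows plus 86,400 time rows instead of one ~630 million row dimension.
CREATE TABLE DimDate (
    DateKey  INT PRIMARY KEY,   -- e.g. 20120814
    [Date]   DATE
);
CREATE TABLE DimTime (
    TimeKey  INT PRIMARY KEY,   -- 0..86399, seconds since midnight
    [Hour]   TINYINT,
    [Minute] TINYINT,
    [Second] TINYINT
);
-- A fact row then carries both keys instead of one second-level date key.
CREATE TABLE FactEvent (
    DateKey INT REFERENCES DimDate(DateKey),
    TimeKey INT REFERENCES DimTime(TimeKey),
    Measure DECIMAL(18, 2)
);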

4.       Reporting Needs considerations ease the complexity of queries, attracting the interest of architects and developers. Let us understand this with an example. The requirement is a report with Product Name and Product Category as the user selection lists, where the category list has to show all categories. If we denormalize the structure into a single DimProduct, then it will only contain the categories for which we have products. This leads to requirement inconsistency, and again the snowflake design serves as a rescue boat. This point is mostly requirement driven and meant to ease some of the queries. Although the data modeler and higher level architect viewpoints often do not support this design, it really is very handy.

5.       Technical Considerations/Constraints are part of the data modeler's and architect's viewpoint, but ironically this is the most overlooked part of it. Every database has limits for storing data, e.g. SQL Server has a limit of 8060 bytes per row and 1024 columns per non-wide table. If we have a dimension violating these kinds of limits then we have to do vertical partitioning, and a snowflake can be one of the solutions. We can achieve the same by splitting into two primary dimensions, but the better approach is to snowflake. This snowflaking need is a bit different because both the primary and secondary dimensions will be at the same level and will serve the same grains.

Most of the time we see a hybrid approach to the schema where some of the dimensions are snowflaked. Choosing a schema and driving the solution architecture involves many other considerations, but the target should lean towards star for two primary reasons: one, fast query results, and two, better usability for the business users.

Monday, 13 August 2012

SSAS Role Members Limit


Working with AMO (Analysis Management Objects) sometimes leaves you amazed, and adding role members is one of those times. SSAS has a limit on the number of members that can be added to a role, and that limit is 1820. Adding the next member throws a much weirder error than expected, as below.

Microsoft.AnalysisServices.OperationException: The following system error occurred:  The parameter is incorrect.

   at Microsoft.AnalysisServices.AnalysisServicesClient.SendExecuteAndReadResponse(ImpactDetailCollection impacts, Boolean expectEmptyResults, Boolean throwIfError)
   at Microsoft.AnalysisServices.AnalysisServicesClient.Alter(IMajorObject obj, ObjectExpansion expansion, ImpactDetailCollection impact, Boolean allowCreate, XmlaWarningCollection warnings)
   at Microsoft.AnalysisServices.Server.Update(IMajorObject obj, UpdateOptions options, UpdateMode mode, XmlaWarningCollection warnings, ImpactDetailCollection impactResult)
   at Microsoft.AnalysisServices.Server.SendUpdate(IMajorObject obj, UpdateOptions options, UpdateMode mode, XmlaWarningCollection warnings, ImpactDetailCollection impactResult)
   at Microsoft.AnalysisServices.MajorObject.Update(UpdateOptions options, UpdateMode mode, XmlaWarningCollection warnings)
   at Microsoft.AnalysisServices.MajorObject.Update()
   at CheckRoleCreation.Program.AddMembersToRole(Database db, Int32 num, Role role)



On investigating more, we found that:

1.       There is a limit on the number of members in a role, and it is 1820. We can repro it with the code segment below.

// Aliases and helper members assumed by this snippet (not shown in the original post):
//   using A  = Microsoft.AnalysisServices;   // AMO
//   using DS = System.DirectoryServices;
int MemberCounter = 0;                              // running counter used to generate member names (assumed)
const string RoleMemberName = "{0}AmoTestUser{1}";  // assumed name pattern: optional "MACHINE\" prefix + index

void AddMembers(A.Database db)
{
    // Adding 1821 members to any role reproduces the error; 1820 succeeds.
    foreach (A.Role r in db.Roles)
        AddMembersToRole(db, 1821, r);
}
void AddMembersToRole(A.Database db, int num, A.Role role)
{
    if (null != role)
    {
        // Remove all existing members from the role.
        for (int count = (role.Members.Count - 1); count >= 0; count--)
            role.Members.RemoveAt(count);
        role.Update();

        // Create local Windows users and add them to the role.
        int i = MemberCounter;
        for (int count = MemberCounter; count < (num + i); count++, MemberCounter++)
        {
            AddMembersToLocalComp(String.Format(RoleMemberName, String.Empty, MemberCounter), false);
            role.Members.Add(new A.RoleMember(String.Format(RoleMemberName, Environment.MachineName + @"\", MemberCounter)));
        }
        role.Update();   // fails with "The parameter is incorrect" once the member count exceeds 1820

        Console.WriteLine("Total Role Members in role {0} is: {1}", role.Name, role.Members.Count);
    }
}
void AddMembersToLocalComp(string memberName, bool printMemberInfo)
{
    // Create a local Windows account so that the role member actually resolves.
    string path = "WinNT://" + Environment.MachineName;
    string schema = "User";
    DS.DirectoryEntry localMc = new DS.DirectoryEntry(path);
    try
    {
        DS.DirectoryEntry newUser = localMc.Children.Add(memberName, schema);
        newUser.CommitChanges();
    }
    catch (System.Runtime.InteropServices.COMException e)
    {
        if (e.ErrorCode != -2147022672) // mask the "user already exists" error
            throw;                      // rethrow anything else, preserving the stack trace
    }
    if (printMemberInfo)
        foreach (DS.DirectoryEntry entry in localMc.Children)
            if (entry.SchemaClassName.Equals(schema)) Console.WriteLine("MemberName: {0}", entry.Name);
}



2.       We can have more than 1820 roles; the limit applies only to the members (individual users or groups) within a single role.

3.       We can have more than 1820 members in total as long as they are scattered across various roles.

4.       There are many workarounds, but one of them is to add one more role with the same permissions and check the member count before exceeding 1820 members. These situations usually arise only when working with automated scripts.

Monday, 23 July 2012

Analysis of Business Workflows & Pipelines

Systematic expansion of businesses across geographies brings in both simple and complex business processes. These processes combine numerous workflows and pipelines representing the life cycle of business events. Many data warehouse (DW) sponsors deprioritize the analytical needs of workflows and pipelines, which ironically are a very rich source of analytical reports that can act as a game changer for that particular business.


Let us start with an example business workflow for an online retail store. The company, named “eBayz”, sells commodities online. Customers can log into the website and place orders. For out-of-stock items a purchase order is raised to the respective supplier, the item is shipped to the warehouse, and then it follows the in-stock workflow. In-stock products are dispatched from the warehouse to a local area store and from there delivered to the customers. Payment can be made while placing the order or at the time of delivery.


With the existing analytics, business users found that customer complaints related to delivery issues had increased. The “Customer Complaints” analytics showed the trend graph touching all-time highs for the last three months, and the reactions captured were...


 Alas! This f**k is going to suck our business. What to do now?
What the problem is?
Where the problem is?
How do we know about the problem?

These are business questions which can't be answered from the regular analytics. Here the accumulated snapshot fact serves as a rescue boat. The whole lifecycle/pipeline/workflow of the order is captured in an accumulated snapshot fact table: every event has a date, and each event date is captured as a column. For our example, the fact table design may look like the one below to capture the complete pipeline.




Interestingly, the time lags are also part of the fact table. With this design we can find the answers to the above business questions: we can easily see where the lag is largest, and that is likely the problem area. Here it was found that, for the same time period, the trend chart for “ArrivedToStoreLag” was also touching peaks, implying that the problem area was late product shipping to the local store. On investigation it turned out that, to save shipping cost, products were kept on hold for more days, which produced the peaks in the “Customer Complaints” analytics. This answers all the related questions while raising some new ones like…

What drove the cost saving for shipping? Is it the low in-stock commodities?
How many new orders and how many undelivered orders do we have on a daily basis?

These new business questions are also supported by the current design. New orders can be queried directly by joining the DimDate and FactOnlineOrders entities, whereas undelivered orders are really a daily snapshot calculated using more than one role of the date dimension. Here, the undelivered orders for 4-June-2012 are the records having a “DeliveryDate” greater than, and an “OrderPlacedDateId” less than, 4-June-2012, involving two different roles of the date dimension.
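As an illustration, a query for the undelivered-orders count might look like the sketch below; the column names follow the example design, and the exact handling of not-yet-delivered rows (NULL versus a future surrogate date) depends on how the accumulated snapshot is loaded.
-- Undelivered orders as of 4-June-2012, using two roles of the date dimension.
DECLARE @AsOfDateId INT = 20120604;

SELECT COUNT(*) AS UndeliveredOrders
FROM FactOnlineOrders F
WHERE F.OrderPlacedDateId <= @AsOfDateId
  AND (F.DeliveryDateId IS NULL OR F.DeliveryDateId > @AsOfDateId);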

Another question, from the DW designer's perspective, is: “Is this the only design supporting the above features and answering all the business questions in the most performant manner?” The answer lies in the individual's assessment of the problem statements being targeted by the analytics. Here is one more design approach that captures the pipeline in a transactional manner, i.e. stores the lifecycle in multiple rows rather than putting it together as a single row. Obviously it has its own pros and cons, and they are discussed below with the changed fact design and a newly introduced dimension.




The latter design moves the multiple columns into rows, i.e. the fact table is transformed from an accumulated snapshot grain to a transactional grain. Event grouping gives us the flexibility to have more than one lifecycle in the same dimension without turning it into a junk dimension. The new design answers the questions asked in the initial state, but restrictively. In the former design we can easily keep multiple time lags, such as the individual lags and the overall lag in the process, which cannot be done with the latter approach. Moreover, it is also difficult to get other advanced answers, like the counts of undelivered orders, orders waiting at the warehouse, or orders waiting at the store on a daily basis, which we were getting easily from the former approach. One way to answer these questions is to create one table per question on a daily basis, but that needs very strong archive and purge policies or the data will grow exponentially.


From the above discussion, the accumulated snapshot provides the flexible and preferred design. The Kimball Group advocates addressing the analytical needs of workflows and pipelines using accumulated snapshots; the behavior of accumulated snapshot facts is detailed in “Design Tip #37, #130 & #145” at the Kimball Group. Microsoft's SSAS supports the workflow and pipeline scenarios with accumulated snapshot facts very nicely, with a few tricks in place, particularly around role playing dimensions. The blog “Counting 'in-flight' events in MDX” from Mosha discusses handling these scenarios in the cube and MDX in detail.

Friday, 20 July 2012

Handling Multiple Fiscal Calendars

As enterprises expand their horizons, the enterprise data warehouse faces multiple challenges, and date and time is one of them. It is amazingly interesting and challenging to work with the date dimension in an enterprise scenario. The same challenge came to us while developing data warehouse changes for one of our customers.

This post discusses solution approaches for the problem. Putting in multiple hierarchies is just like giving different names to the same date and can be done easily as below. The solutions below do not deal with a fiscal year starting from the middle of a month, e.g. fiscal year 2012 starting from 15-April-2011; we assume that the fiscal year starts on the 1st of some month, e.g. fiscal year 2012 starting from 01-April-2011.

Approach 1: There are several solutions to the problem, and one of the most common is to add multiple date hierarchies to the same date dimension, one for each consumer's fiscal year. Diagram 1 visualizes the data model for this.


Approach 2: The first approach works nicely if only a few consumers of the data warehouse have different fiscal years; adding hundreds of consumers with hundreds of fiscal years will ruin it. A small modification helps maintain all the fiscal years for any number of consumers: understanding that all of those fiscal years will always converge onto just twelve possible starting months, the date dimension is modified to include all twelve hierarchies. Diagram 2 visualizes the data model for this.

Each consumer can consume the respective hierarchy based upon their fiscal year. E.g. if the fiscal year for 'Consumer-1' starts on 01-July then they will use the JulFiscalYear, JulFiscalQuarter, JulFiscalMonth etc. columns. It can be extended to week and other fields, but we just have to make sure that the row size doesn't exceed the maximum row size.
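As a small illustration, the July-start columns could be derived while loading the dimension roughly as below (a [Date] column in DimDate is assumed; the month shift changes for each of the twelve hierarchies, and the year label follows the convention above, i.e. the calendar year the fiscal year ends in).
-- Shifting the date by 6 months maps July..June onto a single calendar year.
SELECT
    [Date],
    YEAR(DATEADD(MONTH, 6, [Date]))              AS JulFiscalYear,
    DATEPART(QUARTER, DATEADD(MONTH, 6, [Date])) AS JulFiscalQuarter,
    MONTH(DATEADD(MONTH, 6, [Date]))             AS JulFiscalMonth
FROM DimDate;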