SQL Server FileStream and Full Text Search–PART I

 

If your are planning to implement a custom document library application using SQL Server and .NET, full-text search comes handy to implement content search on your document. Storing your documents in SQL Server FileSteam gives you many advantages. You can implement Google-like search on your document  as well. In PART I of this post I tried to provide some insight into some of these features. In Part II, I will show you how to configure SQL Server to enable filestream and do full-text search and I will also include a sample .NET application.

SQL Server 2008 provides the functionality for applications and users to issue full-text queries against character-based data in SQL Server tables. SQL Server supports full text indexing on columns with any of the following data types: char, varchar, nchar, nvarchar, text, ntext, image, xml, varbinary, or varbinary(max). Full-text queries perform linguistic searches against text data in full-text indexes by operating on words and phrases based on rules of a particular language such as English. Full-text queries can include simple words and phrases or multiple forms of a word or phrase.

Full-text search supports more than 50 languages, such as English, Spanish, Chinese, Japanese, Arabic and German. For each supported language, SQL Server provides language-specific linguistic components, including a word breaker and stemmer and an empty thesaurus file. For each full-text language, SQL Server also provides a file in which you can optionally define language-specific synonyms to extend the scope of search queries (a thesaurus file).

For writing full-text queries, SQL Server provides a set of full-text predicates (CONTAINS and FREETEXT) and rowset-valued functions (CONTAINSTABLE and FREETEXTTABLE). Using these, applications and users can perform a variety of types of full-text searches, such as searching on a single word or phrase (and optionally ranking the result set), searching on a word or phrase close to another word or phrase, or searching on synonymous forms of a specific word. 

 

SQL Server File Stream

FILESTREAM enables SQL Server-based applications to store unstructured data, such as documents and images, on the file system. Applications can leverage the rich streaming APIs and performance of the file system and at the same time maintain transactional consistency between the unstructured data and corresponding structured data. FILESTREAM integrates the SQL Server Database Engine with an NTFS file system by storing varbinary(max) binary large object (BLOB) data as files on the file system. Transact-SQL statements can insert, update, query, search, and back up FILESTREAM data. Win32 file system interfaces provide streaming access to the data. FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.

FILESTREAM is not automatically enabled when you install or upgrade SQL Server. You must enable FILESTREAM by using SQL Server Configuration Manager and SQL Server Management Studio. To use FILESTREAM, you must create or modify a database to contain a special type of filegroup. Then, create or modify a table so that it contains a varbinary(max) column with the FILESTREAM attribute. After you complete these tasks, you can use Transact-SQL and Win32 to manage the FILESTREAM data.

When you use FILESTREAM to store binary large object (BLOB) data, you can use Win32 APIs to work with the files. To support working with FILESTREAM BLOB data in Win32 applications, SQL Server provides  couple of functions and some APIs.

PathName returns a path as a token to a BLOB. An application uses this token to obtain a Win32 handle and operate on BLOB data.

GET_FILESTREAM_TRANSACTION_CONTEXT() returns a token that represents the current transaction of a session. An application uses this token to bind FILESTREAM file system streaming operations to the transaction.

The OpenSqlFilestream API obtains a Win32 file handle. The application uses the handle to stream the FILESTREAM data.

All FILESTREAM data container access is performed in a SQL Server transaction. Transact-SQL statements can be executed in the same transaction to maintain consistency between SQL data and FILESTREAM data.

SQL Server Full-Text Engine

The SQL Server Full-Text Engine is a full-text indexing and search engine. In SQL Server 2008, the Full-Text Engine has been fully integrated into the Database Engine. The Full-Text Engine now resides in the SQL Server process rather than residing in a separate process like in the previous versions of SQL Servers. Integrating the Full-Text Engine into the Database Engine has improved full-text manageability, optimization of mixed query, and overall performance. For each instance of SQL Server, there is a dedicated instance of the Full-Text Engine, including dedicated components such as word breakers and filters, resources such as memory, and configuration such as service-level settings at the instance level.

image

 
SQL Server Process

Full-text search uses the following components of the SQL Server process:

User tables: These tables contain the data to be full-text indexed.

Crawl Thread (Full-text gatherer): The full-text gatherer works with the full-text crawl threads. It is responsible for scheduling and driving the population of full-text indexes, and also for monitoring full-text catalogs.

Thesaurus files: These files contain synonyms of search terms

Stoplist objects: Stoplist objects contain a list of common words that are not useful for the search.

SQL Server query processor: The query processor compiles and executes SQL queries. If a SQL query includes a full-text search query, the query is sent to the Full-Text Engine, both during compilation and during execution. The query result is matched against the full-text index.

Full-Text Engine: The Full-Text Engine in SQL Server is now fully integrated with the query processor. The Full-Text Engine compiles and executes full-text queries. As part of query execution, the Full-Text Engine might receive input from the thesaurus and stoplist. In SQL Server 2008 and later versions, the Full-Text Engine for SQL Server runs inside the SQL Server query processor.

Indexer: The index writer builds the structure that is used to store the indexed tokens.

Filter daemon manager: The filter daemon manager is responsible for monitoring the status of the Full-Text Engine filter daemon host.

 

Filter Daemon Host

The filter daemon host is a process that is started by the Full-Text Engine. It runs the following full-text search components, which are responsible for accessing, filtering, and word breaking data from tables, as well as for word breaking and stemming the query input:

Protocol handler: One of its responsibilities is to gather data from the columns being full-text indexed and pass it to the filter daemon host, which will apply filtering and word breaker as required.

Filters: Some data types require filtering before the data in a document can be full-text indexed, including data in varbinary, varbinary(max), image, or xml columns. The filter used for a given document depends on its document type. For example, different filters are used for Microsoft Word (.doc) documents, Microsoft Excel (.xls) documents, and XML (.xml) documents. Then the filter extracts chunks of text from the document, removing embedded formatting and retaining the text and, potentially, information about the position of the text. The result is a stream of textual information.

Word breakers and stemmers: A word breaker is a language-specific component that finds word boundaries based on the lexical rules of a given language (word breaking). Each word breaker is associated with a language-specific stemmer component that conjugates verbs and performs inflectional expansions. At indexing time, the filter daemon host uses a word breaker and stemmer to perform linguistic analysis on the textual data from a given table column. The language that is associated with a table column in the full-text index determines which word breaker and stemmer are used for indexing the column.

Full Text Catalog

The most commonly used indexes in a SQL Server database are clustered and non-clustered indexes that are organized in a B-tree structure. You can create these types of indexes on most columns in a table or a view, except those columns configured with large object (LOB) data types, such as text and varchar(max). Although this limitation is not a problem in many cases, there will be times when you’ll want to query such column types. However, without indexes defined on the columns, the query engine must perform a full table scan to locate the necessary data.

A full-text index is made up of word tokens that are derived from the text being indexed. For example, if the indexed text contains the phrase “tables can include indexes,” the full-text index would contain four tokens: “tables,” “can,” “include,” and “indexes.” Because the list of tokens can be easily searched, full-text queries can quickly locate the necessary records.

A full-text catalog provides a mechanism for organizing full-text indexes. Each catalog can contain zero or more indexes, but each index can be associated with only one catalog. Catalogs are implemented differently in SQL Server 2005 and 2008:

  • SQL Server 2005: A full-text catalog is a physical structure that must reside on the local hard drive associated with the SQL Server instance. Each catalog is part of a specific filegroup. If no filegroup is specified when the catalog is created, the default filegroup is used.
  • SQL Server 2008: A full-text catalog is a logical concept that refers to a group of full-text indexes. The catalog is not associated with a filegroup.
StopList

To prevent a full-text index from becoming bloated, SQL Server has a mechanism that discards commonly occurring strings that do not help the search. These discarded strings are called stopwords (aka noise words). During index creation, the Full-Text Engine omits stopwords from the full-text index. This means that full-text queries will not search on stopwords.

Search Filter

When a cell in a varbinary(max), or image column contains a document with a supported document-file extension, full-text search uses a filter to interpret the binary data. The filter, which implements the iFilter interface, extracts the textual information from the document and submits it for indexing. To identify the filters included in an instance of SQL Server, use the sp_help_fulltext_system_components (Transact-SQL) stored procedure, which returns information for the registered word-breakers, filter, and protocol handlers.

Many document types can be stored in a single varbinary(max), or image column. For each document, SQL Server chooses the correct filter based on the file extension. Because the file extension is not visible when the file is stored in a varbinary(max), or image column, the file extension must be stored in a separate column in the table, called a type column. This type column can be of any character-based data type and contains the document file extension, such as .doc for a Microsoft Word document.

When creating a full-text index on a varbinary(max), or image column you must identify a corresponding type column that has the extension information so that SQL Server knows which filter to use. The IDs of the full-text indexed column and its associated type column can be found using the sys.full-text_index_columns catalog view.

After the varbinary(max), or image column is full-text indexed, it can be queried using the search predicates CONTAINS and FREETEXT.

 

Thesaurus

Thesaurus provides the ability to define synonyms and acronyms and use the synonyms in full text searches. With SQL commands like CONTAINS, CONTAINSTABLE, the thesaurus is used to identify expressions or replacements for the searched terms. The thesaurus files are in XML format with a global file (tsGLOBAL.xml) and then 18 language specific files. By default all of the files have sample XML that is commented out. So by default no synonyms are setup when a full text search is issued.

The files are located in SQL_Server_install_path\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\FTDATA\.

The thesaurus file includes a <diacritics_sensitive> tag. This element contains an integer value that controls accent sensitivity. For example, suppose you specify the pattern “café” to be replaced by other patterns in a Full-Text Search query. If the thesaurus file is accent-insensitive, Full-Text Search replaces the patterns “café” and “cafe”. If the thesaurus file is accent-sensitive, Full-Text Search replaces only the pattern “café”. Note that this setting can only be applied one time in the file, and applies to all the search patterns in the file.

 

Conclusion

In this post, I tried to explain the filestream and full-text search feature in SQL Server. In SQL Sever 2008, there is great improvement in these features compared to older versions of SQL Server.  One of the limitations with full text search is lack of support for soundex search, but we can use Google spell checker or some of the third-party tools such as aspNetSpell or the RADSpell from telerik.

Stay tuned for the PART II .


Seven Habits Of Highly Effective People

 

0671708635.01._PE_PI_SCMZZZZZZZ_

    I. From Dependence to Independence

  • Be ProActive
  • Begin with End in Mind
  • Put First things First

II. From Independence to Interdependence

  • Think WinWn
  • Seek First to Understand, Then to Be Understood
  • SynerGize

III. Continuous Improvement

  • Sharpen the Saw

Anti Patterns

Most of you heard a lot about the software design patterns and most you have used it your day to day software design. The Gang of Four (GoF) patterns are generally considered the foundation for all other patterns. Design patterns are recurring solutions to software design problems in application development.  DoFactory is a good place to start exploring the design patterns, if you are new to patterns.

Ok, enough said about the patterns, here I want talk about the anti patterns. 

What is an anti pattern any way?

It is pattern that tells how to go from a problem to a bad solution to that problem. Anti-Patterns generally represent an effort to define and classify reoccurring non-solutions, i.e., things that lots of projects do that fail to yield a solution, or actually prevent a project from working or being finished. Anti patterns tells you why a  bad solution looks very attractive, and why it is bad and what good pattern are applicable in that case. Identifying bad practices can be as valuable as identifying good practices.

You can think of it as the  old adage, “If your only tool is a hammer, everything looks like a nail”.

Anti Pattern Categories

Architectural Anti Patterns

  • Over-Generalized Interfaces:  Architects often attempt to create systems with infinite flexibility, but only succeed in creating systems that are impossible to maintain. With interfaces,  architects fall in love with the xml interface. All communication between components is done through xml. This created a maintenance nightmare where there are no explicit contracts between interfaces.
  • Over buzzword compliance:  Architects love to find a way to shoehorn the latest buzzword into their architectures, even when there is no fit. Whatever the latest buzzword (xml, REST, WCF), architects will find a way to use it.
  • Product Architecture : Some Architects that are only familiar with a certain product or vendor will design the architecture around the products. The result is often just a marketing sheet, and helps very little to solving the problem at hand.
  • FUD Architecture : The fear of being wrong, or creating an architecture that will change later, results in an architecture that actually solves nothing.

Development Anti Patterns

There is several development anti patterns related to Developers, Scoping,  UI, Tools,  Software Strategy and Development process. You can find more details on these categories in Development Anti Pattern Road Map.  I will point out some of the interesting and most frequent ones here.

  • Control Freek: If a product has a custom language for interacting with the system or its database that doesn’t have equivalent APIs in a common programming language, it ‘will be’ costly to use, customize, and maintain. These specialized languages tend to create high priced experts. This steers business to the vendor’s own consulting services. Most commonly seen in big enterprise software “solutions”.
  • Job Keeper: An undocumented code written for the job security of the current programmer, who can allegedly fix it, usually with more DuctTape.
  • Programmers often miss the distinction between what’s needed and what’s “neat”.
  • ‘Thats Not Really An Issue’: Inexperienced team members running a project ignore concerns that they don’t understand.
  • Voodoo Chicken Coding is when something is not fully understood in a programming problem. This results, typically, in a lot of cut-and-pasting from other code that does something similar, but works; arbitrary reordering of statements; and other tricks.

Management Anti Patterns

  • Band Aid: There is a deep-rooted problem in a company/project that will cause the company/project to fail unless addressed.Unlike the human body, a company/project does NOT heal itself once a band-aid is put over a wound. This Anti Pattern is popular because it is usually quick and painless in the short term, but many long term consequences are ignored. Despite the apparent glossy exterior, a project may now be doomed to fail, because the root problems were not addressed completely or at all!
  • Blame Storming: A discussion – usually amongst adjacent peer-group levels, in which members attempt to assign blame for a particular botch. The  blame is spread thinly enough that not too much sticks to any one individual, and in particular, to no-one in the current group.
  • Scape Goat:  The troubles, even failure, of the project are highly visible. The project is late or over budget or in serious technical difficulty. People are angry and disgruntled and some are even leaving. Management is pressuring to rectify issues with the project. Management find a scapegoat to punish through demotion, re-scoping or removal of responsibilities, or banishment to some area of no value or importance, the ScapeGoat usually does get sacked.
  • Idiot Proof Process: A software team is not delivering.You are under pressure to change something, because the project is clearly broken. You spend weeks in process meetings arguing. In each meeting, you attempt to prevent all possible mistakes that a naive or malignant coder could make. Next, you then argue about whether a given mistake is really always a mistake, or is sometimes a good idea. You begin to doubt the competence of everybody in the meeting, including yourself. Lessons Leaned: You cannot develop an idiot-proof process. A good process depends on good developers to execute it.

See Management Anti Pattern Road Map to read more…

User Interface Anti Patterns

  • Use of rocket science to solve simple problems: Sometimes a designer constructs a solution that is much more complex and confusing than the prior solution. The anti-pattern is often seen when designers have explored new ways of doing things that are then forced upon simple problems. An example is use of drag-and-drop techniques for simple attribute setting. Does a shopping cart need drag-and-drop functionality and are you sure you need fade and fold effects to make your user interface better and more understandable?
  • Designing for the perfect use-case: Some say the user is dumb. But the user never stands a chance guessing your initial intentions. Design your interface defensively and provide feedback to unintended input. Think about how your user interface can and is interpreted by different users and remember to do user testing!
  • Bloated user interface: A bloated user interface tries to incorporate as many operations as can possibly fit into it with the end result of confusing more than helping the user to perform his or her task. A bloated user interface often include features that cannot be used on kinds of data the interface handles.

Organizational Anti Patterns

  • Fear Of Success: Past, often successful projects were threatened by visible problems; Managers implement number of disciplines to eliminate the problems – requirements, testing, architecture, management. Managers have seen those things work in the past. The current important project must be protected from any imaginable problem. No risk is too small to add a full-time worker to mitigate it. Management creates a defensive environment, analogous to large numbers of defenders on a soccer field trying to defend the goal and never attacking.
  • Untested but Finished:  The project is lagging in implementation, and the apparent cause is usually a programming bottleneck – too few or too slow programmers. Managers need to demonstrate progress or enlarge their span of influence; at worse, they want to be able to pretend to make progress. Managers hire outsiders to complete a  portion of the work. On delivery from the outsiders, dictate that the work is complete and only needs to be integrated by the main team. But the work that has supposedly been “finished” by outside forces is unfinished and  unusable.

For more information Organizational Anti Pattern.

Grey Patterns Anti Patterns

Generic Data Model: Generic Data Model is a pattern where all the attributes are separated from the object and saved in different table. It is a very flexible solution but in large databases the performance could be terrible.

Conclusion

This is just a scratch on the surface.  Anti Patterns are good tools to achieve better results in an IT project.  They are as important as the design patterns. Anti Patterns help to identify problems in a project and help to avoid implementing a bad solution to that problem.


WCF Compared to other distributed Technologies

WCF provides a single unified solution, compared to other Microsoft distributed technologies.


The five Dysfunctions of a Team

#1 Absence of Trust

The fear of being vulnerable with team members prevents the building of trust within the team.

#2 Fear of Conflict

The desire to preserve the artificial harmony stifles the occurrence of productive, ideological conflict.

#3 Lack of Commitment

The lack of clarity or buy-in prevent the team members from making decisions they will stick to.

#4 Avoidance of Accountability

The need to avoid interpersonal discomfort prevents team members from holding one another accountable for their behaviors and performance.

#5 Inattention to Results

The pursuit of individual goals and personal status erode the focus on collective success.

The Role of a Leader

  1. Go first!
  2. Mine for conflicts.
  3. Force Clarity and Closure
  4. Confront difficult issues.
  5. Focus on collective outcomes

From the book. “The five Dysfunctions of a Team” by Patrick Lencioni


ACID (Automicity, Consistency, Isolation and Durability)

ACID (an acronymn for Atomicity Consistency Isolation Durability) is a concept that Database Professionals generally look for when evaluating databases and application architectures. For a reliable database all this four attributes should be achieved. It is a set of properties that guarantee database transactions are processed reliably. In the context of database, transaction is a single logical operation the data.

Automicity

Atomicity means that all transactions must follow “all or nothing” rule. Each transaction is said to be automic. If one part of the transaction fails, the entire transaction fails.

For example, consider an ATM transaction where you are moving money from one account to another. There are two parts in this transaction, first you remove money from one account, then you add money to another account. If one of these two parts fail, The entire transaction is considered invalid, and the transaction must be rolled back to the state before the transaction started.

Consistency

This means that, the database will always be in a consistent state. Only valid data will be written to the database. If a transaction is executed that violate the consistency rules, the entire transaction will be rolled back. The database will be restored to a state consistent with those rules. If the transaction is successful, that will take the database to a new state which is also consistent with the database consistency rules.

For example, if a column is constrained to be NOT NULL and an application attempts to add a row with a NULL value in that column, the entire transaction must fail, and no part of the row may be added to the database.

In this example, if consistency were not upheld, the NULL value would initially still not be added as part of the row, but the remaining parts of the row would be added. However, since no value would be specified for the NOT NULL column, it would revert to NULL, anyway, and violate the rules of the database.

Isolation

Multiple transactions occurring at the same time should not impact each others execution. Isolation keep transaction separated from each other until they are finished.

Consider for example, you and your partner and trying to withdraw all your money from your joint account at the same time from different ATMs. The transaction isolation guarantees that these two transactions operate on he database in an isolated manner. The database performs your transaction first and then your partner’s or vice versa. This prevents your transaction from reading intermediate data produced as a side effect of part of your partners’s transaction that will not eventually be committed to the database. However, the isolation property does not ensure which transaction will execute first, merely that they will not interfere with each other.

Durability

This ensures that the transaction committed to the database will not be lost. Durability is ensured through the use of database backups and transaction logs that facilitate the restoration of committed transactions in spite of any subsequent software or hardware failures.

If you are buying something from an online shop, if he system show that your credit card is processed and the order is placed, then that order remain in place, even if he system crashes.

Conclusion

Many databases rely upon locking to provide ACID capabilities. Locking means that the transaction marks the data that it accesses so that the DBMS knows not to allow other transactions to modify it until the first transaction succeeds or fails. The lock must always be acquired before processing data, including data that are read but not modified. Non-trivial transactions typically require a large number of locks, resulting in substantial overhead as well as blocking other transactions.


Nested Loop, Hash and Merge Joins

When I was explaining some of my developers about the SQL Server Execution Plan, we came across the different types of joins SQL Server performs to optimize data retrieval. They are Nested Loop Join, Hash Join, Sort Merge Join. Interesting questions came up, what are the difference between them and how SQL Server determines which algorithm to use. So here is my take on explaining those questions.

 

Nested Loop Join (or NL Join)

A Nested Join compares each row from the Outer table to each row from the Inner table looking for the rows which satisfy the Join predicate.

For example, if you have statement like this one;

   SELECT A.*, B.* FROM  
   Customer A,  
   Sales B    
   WHERE A.CusomerId = B.CustomerId

It is interpreted as

For Each Row CR in Customer
    For Each Row SR in Sales
        IF CR join with SR Then
          Return (CR, SR)

The total number of rows compared and the cost of this algorithm is proportional to the size of the outer table multiplied by the size of the inner table. Since this cost grows quickly as the size of the input tables grow, in practice we try to minimize the cost by reducing the number of inner rows that we must consider for each outer row.

Lets take a look at the below SQL statement;

set statistics profile on;
select * from Sales.Store A 
Inner Join Sales.SalesPerson B on A.BusinessEntityID = B.BusinessEntityID 

You will see the execution plan as shown below.

SqlExecPlan

Rows    Executes    StmtText    StmtId    NodeId    Parent    PhysicalOp    LogicalOp    Argument    DefinedValues    EstimateRows    EstimateIO    EstimateCPU    AvgRowSize    TotalSubtreeCost    OutputList    Warnings    Type    Parallel    EstimateExecutions

2    1    select * from Sales.Store A  Inner Join Sales.SalesPerson B on A.BusinessEntityID = B.BusinessEntityID    1    1    0    NULL    NULL    NULL    NULL    1    NULL    NULL    NULL    0.00918446    NULL    NULL    SELECT    0    NULL

2    1      |--Nested Loops(Inner Join, OUTER REFERENCES:([B].[BusinessEntityID]))    1    2    1    Nested Loops    Inner Join    OUTER REFERENCES:([B].[BusinessEntityID])    NULL    1    0    7.106E-05    4176    0.00918446    [A].[BusinessEntityID], [A].[Name], [A].[SalesPersonID], [A].[Demographics], [A].[rowguid], [A].[ModifiedDate], [B].[BusinessEntityID], [B].[TerritoryID], [B].[SalesQuota], [B].[Bonus], [B].[CommissionPct], [B].[SalesYTD], [B].[SalesLastYear], [B].[rowguid], [B].[ModifiedDate]    NULL    PLAN_ROW    0    1

18    1           |--Clustered Index Scan(OBJECT:([AdventureWorks2008R2].[Sales].[SalesPerson].[PK_SalesPerson_BusinessEntityID] AS [B]))    1    3    2    Clustered Index Scan    Clustered Index Scan    OBJECT:([AdventureWorks2008R2].[Sales].[SalesPerson].[PK_SalesPerson_BusinessEntityID] AS [B])    [B].[BusinessEntityID], [B].[TerritoryID], [B].[SalesQuota], [B].[Bonus], [B].[CommissionPct], [B].[SalesYTD], [B].[SalesLastYear], [B].[rowguid], [B].[ModifiedDate]    17    0.003125    0.0001757    76    0.0033007    [B].[BusinessEntityID], [B].[TerritoryID], [B].[SalesQuota], [B].[Bonus], [B].[CommissionPct], [B].[SalesYTD], [B].[SalesLastYear], [B].[rowguid], [B].[ModifiedDate]    NULL    PLAN_ROW    0    1

2    18           |--Clustered Index Seek(OBJECT:([AdventureWorks2008R2].[Sales].[Store].[PK_Store_BusinessEntityID] AS [A]), SEEK:([A].[BusinessEntityID]=[AdventureWorks2008R2].[Sales].[SalesPerson].[BusinessEntityID] as [B].[BusinessEntityID]) ORDERED FORWARD)    1    4    2    Clustered Index Seek    Clustered Index Seek    OBJECT:([AdventureWorks2008R2].[Sales].[Store].[PK_Store_BusinessEntityID] AS [A]), SEEK:([A].[BusinessEntityID]=[AdventureWorks2008R2].[Sales].[SalesPerson].[BusinessEntityID] as [B].[BusinessEntityID]) ORDERED FORWARD    [A].[BusinessEntityID], [A].[Name], [A].[SalesPersonID], [A].[Demographics], [A].[rowguid], [A].[ModifiedDate]    1    0.003125    0.0001581    4107    0.0058127    [A].[BusinessEntityID], [A].[Name], [A].[SalesPersonID], [A].[Demographics], [A].[rowguid], [A].[ModifiedDate]    NULL    PLAN_ROW    0    17

In this case the outer table is Sales.Store and the inner table is Sales.SalesPerson. As you can see there is a clustered index on column BusinessEntityID on Sales.SalesPerson table and Clustered Index on BusinessEntityID on Sales.Stores table. In this the SQL optimized determines that the optimal execution is to take SalesPerson as Outer table and Stores as inner table. Thus it performs an Clustered Index Scan on the Sales.SalesPerson table and Clustered Index Seek on the Stores table.

Notice that the index seek depends on A. BusinessEntityID which comes from the outer table of the join, which in this plan is SalesPerson. Each time we execute the index seek, A. BusinessEntityID has a different value. We refer to A. BusinessEntityID as a “correlated parameter”. If a nested loops join has correlated parameters, it is outputted in the plan as “OUTER REFERENCES.” This type of nested loops join where we have an index seek that depends on a correlated parameter is referred as an “index join.”

There are 3 variants of nested Join. In the simplest case, the search scans an entire table or index; this is called a naive nested loops join. If the search exploits an index, it is called an index nested loops join. If the index is built as part of the query plan (and destroyed upon completion of the query), it is called a temporary index nested loops join.

All these variants are considered by the query optimizer. A nested loops join is particularly effective if the outer input is quite small and the inner input is pre-indexed and quite large. In many small transactions, such as those affecting only a small set of rows, index nested loops joins are far superior to both merge joins and hash joins. In large queries, however, nested loops joins are often not the optimal choice.

 
Merge Join

The merge join requires that both inputs be sorted on the merge columns, which are defined by the equality (WHERE) clauses of the join predicate. The query optimizer typically scans an index, if one exists on the proper set of columns, or places a sort operator below the merge join. In rare cases, there may be multiple equality clauses, but the merge columns are taken from only some of the available equality clauses.

Because each input is sorted, the Merge Join operator gets a row from each input and compares them. For example, for inner join operations, the rows are returned if they are equal. If they are not equal, whichever row has the lower value is discarded and another row is obtained from that input. This process repeats until all rows have been processed.

The full operation is done as shown below.

Get First Row T1R from Table1
Get First Row T2R from Table2

While Not End of Table1 and Not End of Table2
  If T1R joins with T2R
      Get Next Row from Table2
      Returns (T1R, T2R)
  Else T1R < T2R
      Get NEXT Row from Table1
  Else
      Get Next Row from Table2
End Loop

the below statement in SQL Server Management Studio, and look at the execution plan.

set statistics profile on;
select * from Production.Product A
Inner Join Production.ProductInventory B On A.ProductID = B.ProductID

SqlExecPlan3

Rows    Executes    StmtText    StmtId    NodeId    Parent    PhysicalOp    LogicalOp    Argument    DefinedValues    EstimateRows    EstimateIO    EstimateCPU    AvgRowSize    TotalSubtreeCost    OutputList    Warnings    Type    Parallel    EstimateExecutions
1069 1 select * from Production.Product A Inner Join Production.ProductInventory B On A.ProductID = B.ProductID 1 1 0 NULL NULL NULL NULL 968.269 NULL NULL NULL 0.03063076 NULL NULL SELECT 0 NULL
1069 1 |--Merge Join(Inner Join, MERGE:([A].[ProductID])=([B].[ProductID]), RESIDUAL:([AdventureWorks2008R2].[Production].[ProductInventory].[ProductID] as [B].[ProductID]=[AdventureWorks2008R2].[Production].[Product].[ProductID] as [A].[ProductID])) 1 2 1 Merge Join Inner Join MERGE:([A].[ProductID])=([B].[ProductID]), RESIDUAL:([AdventureWorks2008R2].[Production].[ProductInventory].[ProductID] as [B].[ProductID]=[AdventureWorks2008R2].[Production].[Product].[ProductID] as [A].[ProductID]) NULL 968.269 0 0.009000127 274 0.03063076 [A].[ProductID], [A].[Name], [A].[ProductNumber], [A].[MakeFlag], [A].[FinishedGoodsFlag], [A].[Color], [A].[SafetyStockLevel], [A].[ReorderPoint], [A].[StandardCost], [A].[ListPrice], [A].[Size], [A].[SizeUnitMeasureCode], [A].[WeightUnitMeasureCode], [A].[Weight], [A].[DaysToManufacture], [A].[ProductLine], [A].[Class], [A].[Style], [A].[ProductSubcategoryID], [A].[ProductModelID], [A].[SellStartDate], [A].[SellEndDate], [A].[DiscontinuedDate], [A].[rowguid], [A].[ModifiedDate], [B].[ProductID], [B].[LocationID], [B].[Shelf], [B].[Bin], [B].[Quantity], [B].[rowguid], [B].[ModifiedDate] NULL PLAN_ROW 0 1
504 1 |--Clustered Index Scan(OBJECT:([AdventureWorks2008R2].[Production].[Product].[PK_Product_ProductID] AS [A]), ORDERED FORWARD) 1 3 2 Clustered Index Scan Clustered Index Scan OBJECT:([AdventureWorks2008R2].[Production].[Product].[PK_Product_ProductID] AS [A]), ORDERED FORWARD [A].[ProductID], [A].[Name], [A].[ProductNumber], [A].[MakeFlag], [A].[FinishedGoodsFlag], [A].[Color], [A].[SafetyStockLevel], [A].[ReorderPoint], [A].[StandardCost], [A].[ListPrice], [A].[Size], [A].[SizeUnitMeasureCode], [A].[WeightUnitMeasureCode], [A].[Weight], [A].[DaysToManufacture], [A].[ProductLine], [A].[Class], [A].[Style], [A].[ProductSubcategoryID], [A].[ProductModelID], [A].[SellStartDate], [A].[SellEndDate], [A].[DiscontinuedDate], [A].[rowguid], [A].[ModifiedDate] 504 0.01201389 0.0007114 229 0.01272529 [A].[ProductID], [A].[Name], [A].[ProductNumber], [A].[MakeFlag], [A].[FinishedGoodsFlag], [A].[Color], [A].[SafetyStockLevel], [A].[ReorderPoint], [A].[StandardCost], [A].[ListPrice], [A].[Size], [A].[SizeUnitMeasureCode], [A].[WeightUnitMeasureCode], [A].[Weight], [A].[DaysToManufacture], [A].[ProductLine], [A].[Class], [A].[Style], [A].[ProductSubcategoryID], [A].[ProductModelID], [A].[SellStartDate], [A].[SellEndDate], [A].[DiscontinuedDate], [A].[rowguid], [A].[ModifiedDate] NULL PLAN_ROW 0 1
1069 1 |--Clustered Index Scan(OBJECT:([AdventureWorks2008R2].[Production].[ProductInventory].[PK_ProductInventory_ProductID_LocationID] AS [B]), ORDERED FORWARD) 1 4 2 Clustered Index Scan Clustered Index Scan OBJECT:([AdventureWorks2008R2].[Production].[ProductInventory].[PK_ProductInventory_ProductID_LocationID] AS [B]), ORDERED FORWARD [B].[ProductID], [B].[LocationID], [B].[Shelf], [B].[Bin], [B].[Quantity], [B].[rowguid], [B].[ModifiedDate] 1069 0.007569444 0.0013329 54 0.008902345 [B].[ProductID], [B].[LocationID], [B].[Shelf], [B].[Bin], [B].[Quantity], [B].[rowguid], [B].[ModifiedDate] NULL PLAN_ROW 0 1

The merge join operation may be either a regular or a many-to-many operation. A many-to-many merge join uses a temporary table to store rows. If there are duplicate values from each input, one of the inputs will have to rewind to the start of the duplicates as each duplicate from the other input is processed. If a residual predicate is present, all rows that satisfy the merge predicate will evaluate the residual predicate, and only those rows that satisfy it will be returned.

Merge join itself is very fast, but it can be an expensive choice if sort operations are required. However, if the data volume is large and the desired data can be obtained presorted from existing B-tree indexes, merge join is often the fastest available join algorithm.

 
Hash Join

The hash join has two inputs: the build input and probe input. The query optimizer assigns these roles so that the smaller of the two inputs is the build input.

Hash joins are used for many types of set-matching operations: inner join; left, right, and full outer join; left and right semi-join; intersection; union; and difference. Moreover, a variant of the hash join can do duplicate removal and grouping (such as SUM(salary) GROUP BY department). These modifications use only one input for both the build and probe roles.

Similar to a merge join, a hash join can be used only if there is at least one equality (WHERE) clause in the join predicate. However, because joins are typically used to reassemble relationships, expressed with an equality predicate between a primary key and a foreign key, most joins have at least one equality clause. The set of columns in the equality predicate is called the hash key, because these are the columns that contribute to the hash function. Additional predicates are possible and are evaluated as residual predicates separate from the comparison of hash values. The hash key can be an expression, as long as it can be computed exclusively from columns in a single row. In grouping operations, the columns of the group by list are the hash key. In set operations such as intersection, as well as in the removal of duplicates, the hash key consists of all columns.

 

In-Memory Hash Join

The hash join first scans or computes the entire build input and then builds a hash table in memory. Each row is inserted into a hash bucket depending on the hash value computed for the hash key. If the entire build input is smaller than the available memory, all rows can be inserted into the hash table. This build phase is followed by the probe phase. The entire probe input is scanned or computed one row at a time, and for each probe row, the hash key’s value is computed, the corresponding hash bucket is scanned, and the matches are produced.

 

Grace Hash Join

If the build input does not fit in memory, a hash join proceeds in several steps. Each step has a build phase and probe phase. Initially, the entire build and probe inputs are consumed and partitioned (using a hash function on the hash keys) into multiple files. The number of such files is called the partitioning fan-out. Using the hash function on the hash keys guarantees that any two joining records must be in the same pair of files. Therefore, the task of joining two large inputs has been reduced to multiple, but smaller, instances of the same tasks. The hash join is then applied to each pair of partitioned files.

 

Recursive Hash Join

If the build input is so large that inputs for a standard external merge sorts would require multiple merge levels, multiple partitioning steps and multiple partitioning levels are required. If only some of the partitions are large, additional partitioning steps are used for only those specific partitions. In order to make all partitioning steps as fast as possible, large, asynchronous I/O operations are used so that a single thread can keep multiple disk drives busy.

Lets take a look at the execution plan of the below sql statement.

set statistics profile on;
select * from Sales.Customer C
Inner Join Sales.SalesOrderHeader O on C.CustomerID = O.CustomerID
You can see SQL optimizer chose Hash Join to retrieve data in this plan.

SqlExecPlan2

Rows    Executes    StmtText    StmtId    NodeId    Parent    PhysicalOp    LogicalOp    Argument    DefinedValues    EstimateRows    EstimateIO    EstimateCPU    AvgRowSize    TotalSubtreeCost    OutputList    Warnings    Type    Parallel    EstimateExecutions
31465 1 select * from Sales.Customer C Inner Join Sales.SalesOrderHeader O on C.CustomerID = O.CustomerID 1 1 0 NULL NULL NULL NULL 31098.69 NULL NULL NULL 1.455217 NULL NULL SELECT 0 NULL
31465 1 |--Hash Match(Inner Join, HASH:([C].[CustomerID])=([O].[CustomerID])) 1 2 1 Hash Match Inner Join HASH:([C].[CustomerID])=([O].[CustomerID]) NULL 31098.69 0 0.7886466 389 1.455217 [C].[CustomerID], [C].[PersonID], [C].[StoreID], [C].[TerritoryID], [C].[rowguid], [C].[ModifiedDate], [C].[AccountNumber], [O].[SalesOrderID], [O].[RevisionNumber], [O].[OrderDate], [O].[DueDate], [O].[ShipDate], [O].[Status], [O].[OnlineOrderFlag], [O].[PurchaseOrderNumber], [O].[AccountNumber], [O].[CustomerID], [O].[SalesPersonID], [O].[TerritoryID], [O].[BillToAddressID], [O].[ShipToAddressID], [O].[ShipMethodID], [O].[CreditCardID], [O].[CreditCardApprovalCode], [O].[CurrencyRateID], [O].[SubTotal], [O].[TaxAmt], [O].[Freight], [O].[Comment], [O].[rowguid], [O].[ModifiedDate], [O].[SalesOrderNumber], [O].[TotalDue] NULL PLAN_ROW 0 1
0 0 |--Compute Scalar(DEFINE:([C].[AccountNumber]=[AdventureWorks2008R2].[Sales].[Customer].[AccountNumber] as [C].[AccountNumber])) 1 3 2 Compute Scalar Compute Scalar DEFINE:([C].[AccountNumber]=[AdventureWorks2008R2].[Sales].[Customer].[AccountNumber] as [C].[AccountNumber]) [C].[AccountNumber]=[AdventureWorks2008R2].[Sales].[Customer].[AccountNumber] as [C].[AccountNumber] 19820 0 0.001982 56 0.1179369 [C].[CustomerID], [C].[PersonID], [C].[StoreID], [C].[TerritoryID], [C].[rowguid], [C].[ModifiedDate], [C].[AccountNumber] NULL PLAN_ROW 0 1
0 0 | |--Compute Scalar(DEFINE:([C].[AccountNumber]=isnull('AW'+[AdventureWorks2008R2].[dbo].[ufnLeadingZeros]([AdventureWorks2008R2].[Sales].[Customer].[CustomerID] as [C].[CustomerID]),''))) 1 4 3 Compute Scalar Compute Scalar DEFINE:([C].[AccountNumber]=isnull('AW'+[AdventureWorks2008R2].[dbo].[ufnLeadingZeros]([AdventureWorks2008R2].[Sales].[Customer].[CustomerID] as [C].[CustomerID]),'')) [C].[AccountNumber]=isnull('AW'+[AdventureWorks2008R2].[dbo].[ufnLeadingZeros]([AdventureWorks2008R2].[Sales].[Customer].[CustomerID] as [C].[CustomerID]),'') 19820 0 0.001982 56 0.1159549 [C].[CustomerID], [C].[PersonID], [C].[StoreID], [C].[TerritoryID], [C].[AccountNumber], [C].[rowguid], [C].[ModifiedDate] NULL PLAN_ROW 0 1
19820 1 | |--Clustered Index Scan(OBJECT:([AdventureWorks2008R2].[Sales].[Customer].[PK_Customer_CustomerID] AS [C])) 1 5 4 Clustered Index Scan Clustered Index Scan OBJECT:([AdventureWorks2008R2].[Sales].[Customer].[PK_Customer_CustomerID] AS [C]) [C].[CustomerID], [C].[PersonID], [C].[StoreID], [C].[TerritoryID], [C].[rowguid], [C].[ModifiedDate] 19820 0.09201389 0.021959 47 0.1139729 [C].[CustomerID], [C].[PersonID], [C].[StoreID], [C].[TerritoryID], [C].[rowguid], [C].[ModifiedDate] NULL PLAN_ROW 0 1
0 0 |--Compute Scalar(DEFINE:([O].[SalesOrderNumber]=[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[SalesOrderNumber] as [O].[SalesOrderNumber], [O].[TotalDue]=[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[TotalDue] as [O].[TotalDue])) 1 12 2 Compute Scalar Compute Scalar DEFINE:([O].[SalesOrderNumber]=[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[SalesOrderNumber] as [O].[SalesOrderNumber], [O].[TotalDue]=[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[TotalDue] as [O].[TotalDue]) [O].[SalesOrderNumber]=[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[SalesOrderNumber] as [O].[SalesOrderNumber], [O].[TotalDue]=[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[TotalDue] as [O].[TotalDue] 31465 0 0.0031465 341 0.548631 [O].[SalesOrderID], [O].[RevisionNumber], [O].[OrderDate], [O].[DueDate], [O].[ShipDate], [O].[Status], [O].[OnlineOrderFlag], [O].[PurchaseOrderNumber], [O].[AccountNumber], [O].[CustomerID], [O].[SalesPersonID], [O].[TerritoryID], [O].[BillToAddressID], [O].[ShipToAddressID], [O].[ShipMethodID], [O].[CreditCardID], [O].[CreditCardApprovalCode], [O].[CurrencyRateID], [O].[SubTotal], [O].[TaxAmt], [O].[Freight], [O].[Comment], [O].[rowguid], [O].[ModifiedDate], [O].[SalesOrderNumber], [O].[TotalDue] NULL PLAN_ROW 0 1
0 0 |--Compute Scalar(DEFINE:([O].[SalesOrderNumber]=isnull(N'SO'+CONVERT(nvarchar(23),[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[SalesOrderID] as [O].[SalesOrderID],0),N'*** ERROR ***'), [O].[TotalDue]=isnull(([AdventureWorks2008R2].[Sales].[SalesOrderHeader].[SubTotal] as [O].[SubTotal]+[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[TaxAmt] as [O].[TaxAmt])+[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[Freight] as [O].[Freight],($0.0000)))) 1 13 12 Compute Scalar Compute Scalar DEFINE:([O].[SalesOrderNumber]=isnull(N'SO'+CONVERT(nvarchar(23),[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[SalesOrderID] as [O].[SalesOrderID],0),N'*** ERROR ***'), [O].[TotalDue]=isnull(([AdventureWorks2008R2].[Sales].[SalesOrderHeader].[SubTotal] as [O].[SubTotal]+[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[TaxAmt] as [O].[TaxAmt])+[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[Freight] as [O].[Freight],($0.0000))) [O].[SalesOrderNumber]=isnull(N'SO'+CONVERT(nvarchar(23),[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[SalesOrderID] as [O].[SalesOrderID],0),N'*** ERROR ***'), [O].[TotalDue]=isnull(([AdventureWorks2008R2].[Sales].[SalesOrderHeader].[SubTotal] as [O].[SubTotal]+[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[TaxAmt] as [O].[TaxAmt])+[AdventureWorks2008R2].[Sales].[SalesOrderHeader].[Freight] as [O].[Freight],($0.0000)) 31465 0 0.0031465 341 0.5454844 [O].[SalesOrderID], [O].[RevisionNumber], [O].[OrderDate], [O].[DueDate], [O].[ShipDate], [O].[Status], [O].[OnlineOrderFlag], [O].[SalesOrderNumber], [O].[PurchaseOrderNumber], [O].[AccountNumber], [O].[CustomerID], [O].[SalesPersonID], [O].[TerritoryID], [O].[BillToAddressID], [O].[ShipToAddressID], [O].[ShipMethodID], [O].[CreditCardID], [O].[CreditCardApprovalCode], [O].[CurrencyRateID], [O].[SubTotal], [O].[TaxAmt], [O].[Freight], [O].[TotalDue], [O].[Comment], [O].[rowguid], [O].[ModifiedDate] NULL PLAN_ROW 0 1
31465 1 |--Clustered Index Scan(OBJECT:([AdventureWorks2008R2].[Sales].[SalesOrderHeader].[PK_SalesOrderHeader_SalesOrderID] AS [O])) 1 14 13 Clustered Index Scan Clustered Index Scan OBJECT:([AdventureWorks2008R2].[Sales].[SalesOrderHeader].[PK_SalesOrderHeader_SalesOrderID] AS [O]) [O].[SalesOrderID], [O].[RevisionNumber], [O].[OrderDate], [O].[DueDate], [O].[ShipDate], [O].[Status], [O].[OnlineOrderFlag], [O].[PurchaseOrderNumber], [O].[AccountNumber], [O].[CustomerID], [O].[SalesPersonID], [O].[TerritoryID], [O].[BillToAddressID], [O].[ShipToAddressID], [O].[ShipMethodID], [O].[CreditCardID], [O].[CreditCardApprovalCode], [O].[CurrencyRateID], [O].[SubTotal], [O].[TaxAmt], [O].[Freight], [O].[Comment], [O].[rowguid], [O].[ModifiedDate] 31465 0.5075694 0.0347685 305 0.542338 [O].[SalesOrderID], [O].[RevisionNumber], [O].[OrderDate], [O].[DueDate], [O].[ShipDate], [O].[Status], [O].[OnlineOrderFlag], [O].[PurchaseOrderNumber], [O].[AccountNumber], [O].[CustomerID], [O].[SalesPersonID], [O].[TerritoryID], [O].[BillToAddressID], [O].[ShipToAddressID], [O].[ShipMethodID], [O].[CreditCardID], [O].[CreditCardApprovalCode], [O].[CurrencyRateID], [O].[SubTotal], [O].[TaxAmt], [O].[Freight], [O].[Comment], [O].[rowguid], [O].[ModifiedDate] NULL PLAN_ROW 0 1

 

Note: If the build input is larger but not a lot larger than the available memory, elements of in-memory hash join and grace hash join are combined in a single step, producing a hybrid hash join.

It is not always possible during optimization to determine which hash join will be used. Therefore,  SQL Server starts using an in-memory hash join and gradually transitions to grace hash join, and recursive hash join, depending on the size of the build input.

If the optimizer anticipates wrongly which of the two inputs is smaller and, therefore, should have been the build input, the build and probe roles are reversed dynamically. The hash join makes sure that it uses the smaller overflow file as build input. This technique is called role reversal.

Conclusion

SQL Server Optimizer determines the best, optimized plan for executing the query. The optimizer evaluate the  indices on the table columns, number of rows in the tables , type of joins  etc. when it chose the best plan. 

Nested loop is generally chosen for smaller tables and if it is possible to do Index seek on the inner table to ensure better performance.

Merge Join is considered to be more efficient for large tables with the keys columns sorted. It only need to read the keys once, and does not require lot of CPU cycles.  

Hash Join is also better suited for large tables. This requires less IO, but need more CPU and requires lot of memory.

That being said, it is possible  to override the optimizer’s choice of the join by using the OPTION hint. However, the optimizer is very smart, it does not make very many bad choices. So use caution when you use the  OPTION hint in your queries.  Updating statistics to fix your indexes help the optimizer to choose  the right Join.


MVC, MVP and MVVM

MVC-MVP-MVVM

MVC

All input coming from user interaction, such mouse click. are directed to the Controller first. The Controller then kick off some functionality. A single Controller may render many different Views based on the operation being executed.  Also view doesn’t have any knowledge of or reference to the Controller. Controller interact with the Model and pass the model to the View. So there is a knowledge between the View and Model  being passed on to it.

MVP

MVP pattern looks very similar to MVC pattern, but has some key distinctions. In MVP, the input is directed to the View not the Presenter. For each View, there is a dedicated Presenter to render that View. The View hold the reference to the Presenter. The Presenter is also reacting to events being triggered from the View, so its aware of the View its associated with it. The Presenter updates the View with the data from the Model and vice versa. But he View is not aware of the Model.

MVVM

Like MVP pattern, the input is directed to  the View. View hold a reference to the ViewModel. But ViewModel is not aware of the View. So it is possible to have one ViewModel to support many Views. You can have a WPF view, Windows Phone or Silverlight View use the same ViewModel. Also the Model is decoupled from the View.  ViewModel interact with the Model and feed the View with the data.

Conclusion

This is just a scratch on the surface. This help you at a higher level, to understand the main differences between these 3 pattern. There are lot resources out which explain how to program using these patterns.  I will post some more details on these patterns in my future posts.


SQL Server Reporting Services – Deploying rdl Files

When it comes to deploying rdl files, it can be  very tedious, especially if  you have hundreds of rdl files and server test regions to deploy. There is no good tools available to automate the rdl deployment.

Here I will explain 3 available options to deploy rdl files.The samples and screen shots are based on SQL Server 2008 R2 and vary slightly in older versions of SQL Server. You can download all code sample here (SQL Report UploaderrsUtility Sample).

  • Report Manager
  • rs Utility
  • Reporting Web Service

Report Manager

First I will explain how to deploy your rdl files using Report Manager, more often using http://localhost/Reports.

  1. Go to Report Manager
  2. Create a new folder to keep all your reports or use an existing folder. Click on the New Folder button to create a new folder.
    image
  3. Create a Data Source, if you don’t have one already. Click on New Data Source  link on the toolbar  to create a new data source.
    image
  4. Upload the rdl file using the Upload Files link on the toolbar and navigate to the rdl file in your system.
  5. After upload, click on the Detail View link on the toolbar and edit the report  selecting  Manage  link.
    image
  6. Under the Data Source link, assign the previously created Data Source to the report.   Click on the Apply button to save your selection.
    image

This is manual process and can be used if you have small number of reports and you don’t have to deploy them frequently.

rs Utility

Developers and report server administrators can perform operations on a report server through the use of the rs utility (RS.exe). Using this utility, you can programmatically administer a report server using scripts written with Microsoft Visual Basic. Reporting Services scripts can be used to run any of the Report Server Web service operations. Scripting can be used to copy security to multiple reports on a server, to add and delete items, to copy report server items from one server to another and more.

RS.exe is located at \Program Files\Microsoft SQL Server\100\Tools\Binn. You can run the utility from any folder on your file system. To run the tool, you must have permission to connect to the report server instance you are running the script against. You can run scripts to make changes to the local computer or a remote computer. To make changes to a report server installed on a remote computer, specify the remote computer in the -s argument.

Syntax
rs {-?}
{-i input_file=}
{-s serverURL}
{-u username}
{-p password}
{-e endpoint}
{-l time_out}
{-b batchmode}
{-v globalvars=}
{-t trace}
Example
rs –i c:\scriptfiles\script_copycontent.rss -s http://localhost/reportserver

The .rss file contain the code to read the rdl file from the local system, upload the file to report server and set data source. A sample .rss file is shown below.

 1: Public Sub Main()
 2:     Dim folder As String = PARENTFOLDER
 3:     Dim name As String = RDLNAME
 4:     Dim data As String = DATASOURCE
 5:
 6:     Dim parent As String = "/" + folder
 7:     Dim location As String = name + ".rdl"
 8:
 9:     Dim overwrite As Boolean = True
 10:     Dim reportContents As Byte() = Nothing
 11:     Dim warnings As Warning() = Nothing
 12:
 13:     Dim fullpath As String = parent + "/" + name
 14:
 15:     'Common CatalogItem properties
 16:     Dim descprop As New [Property]
 17:     descprop.Name = "Description"
 18:     descprop.Value = ""
 19:     Dim hiddenprop As New [Property]
 20:     hiddenprop.Name = "Hidden"
 21:     hiddenprop.Value = "False"
 22:
 23:     Dim props(1) As [Property]
 24:     props(0) = descprop
 25:     props(1) = hiddenprop
 26:
 27:     'Read RDL definition from disk
 28:     Try
 29:         Dim stream As FileStream = File.OpenRead(location)
 30:         reportContents = New [Byte](stream.Length-1) {}
 31:         stream.Read(reportContents, 0, CInt(stream.Length))
 32:         stream.Close()
 33:
 34:         warnings = RS.CreateReport(name, parent, overwrite, reportContents, props)
 35:
 36:         If Not (warnings Is Nothing) Then
 37:             Dim warning As Warning
 38:             For Each warning In warnings
 39:                 Console.WriteLine(Warning.Message)
 40:             Next warning
 41:         Else
 42:             Console.WriteLine("Report: {0} published successfully with no warnings", name)
 43:         End If
 44:
 45:         'Set report DataSource references
 46:         Dim dataSources(0) As DataSource
 47:         Dim dsr0 As New DataSourceReference
 48:         dsr0.Reference = region + "/MyDataSource"
 49:         Dim ds0 As New DataSource
 50:         ds0.Item = CType(dsr0, DataSourceDefinitionOrReference)
 51:         ds0.Name = data
 52:         dataSources(0) = ds0
 53:
 54:         RS.SetItemDataSources(fullpath, dataSources)
 55:
 56:         Console.Writeline("Report DataSources set successfully")
 57:         Console.WriteLine("Report: {0} published successfully", name)
 58:
 59:     Catch e As IOException
 60:         Console.WriteLine(e.Message)
 61:     Catch e As SoapException
 62:         Console.WriteLine("Error : " + e.Detail.Item("ErrorCode").InnerText + " (" + e.Detail.Item("Message").InnerText + ")")
 63:         Console.WriteLine("Report: {0} published with error", name)
 64:     End Try
 65: End Sub
 66:

If you several rdl files to deploy, you can write a batch script to automate the deployment. Create batch file name it as deploy_rdl.cmd.

 1: @echo off
 2: ::Script Variables
 3: SET LOGFILE="RS Scripter log.txt"
 4: SET SCRIPTLOCATION=C:\SSRS Scripts\TestReports
 5: SET REPORTSERVER=http://localhost/Reports
 6: SET RS="C:\Program Files\Microsoft SQL Server\100\Tools\Binn\RS.EXE"
 7: SET TIMEOUT=60
 8:
 9: ::Clear Log file
 10: IF EXIST %logfile% DEL %logfile%
 11:
 12: ::Run Scripts
 13: %rs% -i "%SCRIPTLOCATION%\TrendReports.rss" -s %REPORTSERVER% -l %TIMEOUT% -v PARENTFOLDER="MyTestReports" -v RDLNAME="TrendReport" -v DATASOURCE="MyDataSource" >>%LOGFILE% 2>&1
 14: ECHO. >>%LOGFILE%
 15:
 16: %rs% -i "%SCRIPTLOCATION%\SLAReports.rss" -s %REPORTSERVER% -l %TIMEOUT% -v PARENTFOLDER="MyTestReports" -v RDLNAME="SLAReport" -v DATASOURCE="MyDataSource" >>%LOGFILE% 2>&1
 17: ECHO. >>%LOGFILE%
 18:
 19: ECHO. >>%LOGFILE%
 20: ECHO Finished Load at %DATE% %TIME% >>%LOGFILE%
 21: ECHO. >>%LOGFILE%

You download the code Sample for the above scripts  here.

Reporting Web Service

SQL Server Reporting Services provides access to the full functionality of the report server through the Report Server Web service. The Report Server Web service is an XML Web service with a SOAP API. It uses SOAP over HTTP and acts as a communications interface between client programs and the report server. The Web service provides two endpoints – one for report execution and one for report management – with methods that expose the functionality of the report server and enable you to create custom tools for any part of the report life cycle.

Programming Diagram – Web Service Development Options

IC156257

I will explain how to create a WCF client to access SQL Server Reporting Services , upload a report to the report server, Create Data Source and assign data source to a report.

Create a project using your favorite visual Studio IDE . Here I use Visual Studio 2008 and creating  a windows application project.

Add a Service reference to the Web Service URL The URL for the report execution endpoint of the Report Server Web service might look like this:

http://localhost:8080/ReportServer/ReportService2005.asmx

The URL contains the domain in which the Report Server Web service is deployed, the name of the folder containing the service, and the name of the discovery file for the service.

Note: the name of the discovery file is ReportService2005.asmx for SQL Server 2008.

Select the project and add a class file ReportUploader.cs .

Add a using (Import in Visual Basic) directive to the Report Server Web service reference.

 1: using Rhymond.ReportServices.Uploader.ReportingServices;

Declare the client variable

 1: /// <summary>
 2:        /// The sql reporting service client
 3:        /// </summary>
 4:        private ReportingService2005SoapClient client;

Create the client instance and setup permissions to access the report services.

 1: /// <summary>
 2: /// Initializes a new instance of the <see cref="ReportUploader"/> class.
 3: /// </summary>
 4: /// <param name="reportDefition">The report defition.</param>
 5: /// <param name="region">The region.</param>
 6: public ReportUploader(string reportDefition, string region)
 7: {
 8:     this.client = new ReportingService2005SoapClient("ReportExecutionServiceSoap");
 9:     this.client.ClientCredentials.Windows.ClientCredential = System.Net.CredentialCache.DefaultNetworkCredentials;
 10:     this.client.ClientCredentials.Windows.AllowedImpersonationLevel = System.Security.Principal.TokenImpersonationLevel.Impersonation;
 11: }

Create a folder on the reporting server.

 1: // Create Folder if the folder doesn't exists
 2: CatalogItem[] result = null;
 3: this.client.ListChildren(remoteRoot + "/" + this.Region, false, out result);
 4: if (result != null
 5:     && result.Where(i => i.Type == ItemTypeEnum.Folder && i.Name == report.RemotePath).Count() == 0)
 6: {
 7:     BatchHeader batch1 = this.GetBatchHeader();
 8:     this.client.CreateFolder(batch1, report.RemotePath, remoteRoot + "/" + this.Region, this.GetReportProperties(""));
 9:     this.client.ExecuteBatch(batch1);
 10: }

In SQL server 2008 the server calls are executed in batch. You have to create a BatchHeader and call client.ExecuteBatch with the BatchHeader.

 1: /// <summary>
 2: /// Gets the batch header.
 3: /// </summary>
 4: /// <returns></returns>
 5: private BatchHeader GetBatchHeader()
 6: {
 7:     string batchId = string.Empty;
 8:     this.client.CreateBatch(out batchId);
 9:
 10:     BatchHeader header = new BatchHeader();
 11:     header.BatchID = batchId;
 12:
 13:     return header;
 14: }

Upload a Report.

 1: // Upload Report
 2: Warning[] warnings;
 3: // Read rdl file contents to a byte array
 4: byte[] definition = this.GetReportContents(Path.Combine(Path.Combine(localRoot, report.LocalPath), report.Name) + ".rdl");
 5: BatchHeader batch2 = this.GetBatchHeader();
 6: this.client.CreateReport(batch2, report.Name, remoteReportPath, true, definition, this.GetReportProperties(report.Description), out warnings);
 7: this.client.ExecuteBatch(batch2);
 8:
 9: if (warnings != null && warnings.Length > 0)
 10: {
 11:     foreach (Warning warn in warnings)
 12:     {
 13:         Console.WriteLine(String.Format("{0}: {1}", warn.ObjectName, warn.Message));
 14:     }
 15: }

Create Data Source

 1: /// <summary>
 2: /// Creates the data source.
 3: /// </summary>
 4: /// <param name="rdlDataSource">The RDL data source.</param>
 5: public void CreateDataSource(ReportDataSource reportDataSource)
 6: {
 7:     DataSourceDefinition dsd = new DataSourceDefinition();
 8:     dsd.ConnectString = reportDataSource.ConnectionString;
 9:     dsd.Enabled = true;
 10:     dsd.Extension = reportDataSource.DataSourceType.ToString();
 11:
 12:     if (reportDataSource.WindowsAuth)
 13:     {
 14:         dsd.WindowsCredentials = true;
 15:     }
 16:     else
 17:     {
 18:         dsd.CredentialRetrieval = CredentialRetrievalEnum.Store;
 19:         dsd.UserName = reportDataSource.UserName;
 20:         dsd.Password = reportDataSource.Password;
 21:     }
 22:
 23:     BatchHeader batch = this.GetBatchHeader();
 24:     client.CreateDataSource(batch, reportDataSource.Name, reportDataSource.DataSourcePath, true, dsd, this.GetReportProperties(reportDataSource.Description));
 25:     this.client.ExecuteBatch(batch);
 26: }

Assign  data source to a report. Some reports may have multiple data sources, in that case GetItemDataSource return an array of datasources. Find your data source in that array and update he DataSourceReference.

 1: DataSourceReference dsRef = new DataSourceReference();
 2: dsRef.Reference = rdlDataSource.DataSourcePath + "/" + rdlDataSource.Name;
 3:
 4: DataSource[] dataSources = null;
 5: this.client.GetItemDataSources(remoteReportPath + "/" + rdlName, out dataSources);
 6: if (dataSources != null)
 7: {
 8:     if (dataSources.Length > 1)
 9:     {
 10:         DataSource ds = dataSources.Where(d => d.Name == rdlDataSource.Name).FirstOrDefault();
 11:         if (ds != null)
 12:         {
 13:             ds.Item = dsRef;
 14:
 15:             dataSources = new DataSource[1];
 16:             dataSources[0] = ds;
 17:
 18:         }
 19:     }
 20:     else
 21:     {
 22:         dataSources[0].Item = dsRef;
 23:     }
 24: }
 25: else
 26: {
 27:     dataSources = new DataSource[1];
 28:     dataSources[0].Item = dsRef;
 29: }
 30:
 31: BatchHeader batch = this.GetBatchHeader();
 32: this.client.SetItemDataSources(batch, remoteReportPath + "/" + rdlName, dataSources);
 33: this.client.ExecuteBatch(batch);

Note: If you try to create a new DataSource Array and assign DataSource reference, you might get a “Invalid input XML” error.  So you must use GetItemDataSoruces method to get all available data source for the report and update the DataSourceReference.

You can down load the  complete project here. This is a sample but working code; Feel free to make any changes; suggestions are always welcome.


Comparing Cloud Services

A cloud service allows the us to store  media and documents on the internet, so you can access it from anywhere, anytime from any device with an internet connection.  The general idea of the cloud services is to store and access data from the internet and opposed to keeping  it on the local hard drive.  Ideally with cloud services you can access any type of media and document such as video,  photos , e-book and your work and personal documents. Now with cloud services you can juggle around your data on multiple devices.

With the recent announcement of iCloud, Apple joins Google, Microsoft and Amazon in their push for cloud services. But cloud services vary between companies so much that the buzzwords can be very confusing. What are you getting from this service?  What is an online storage?  Does it stream video? How secure my data ?

The below chart give you a side-by-side comparison of  6 major cloud services.

  • .
  • Features
  • Device
  • Music
  • Apple
    iCloud
  • Includes 5 GB of free online storage, synchronization for music, photos, apps, documents, iBooks, contacts, e-mail and calendars;
  • Macs and iOS 5 (Windows PCs get PhotoStream and some basic features only)
  • All iTunes-purchased music can be shared between devices; iTunes Match: $25 per year to push 25,000 tracks in your library to be shared through iCloud
  • Google Apps
  • Includes 1 GB free online storage for Google Docs, 1 GB free storage for Picasa, 7 GB free storage for Gmail; streaming music, synchronized documents, contacts, e-mail, calendars; expandable to 16 TB for $4,000 per year
  • All devices with a web browser.
  • Google Music Beta lets you upload up to 20,000 tracks from your own library
  • Amazon
    Cloud Drive
  • Includes 5GB free online storage; additional storage can be purchased for $1 per gigabyte per year.
  • All devices compatible with Adobe Flash.
  • Includes Cloud Player music-streaming application.
  • Microsoft
    Windows Live SkyDrive
  • Includes 25 GB free storage for files and synchronization for photos.
  • Windows PC, Mac, Windows Phone 7
  • None
  • Dropbox
  • Includes 2GB free storage, upgradeable to 100 GB for $200 per year
  • All devices with a web browser or Dropbox client.
  • Built-in audio player in web interface and iOS client.
  • Box
  • Includes 5GB free storage, upgradeable to 50 GB for $20 per month
  • All devices with a web browser or Boxnet client.
  • None

Here are some more detail on how these services work.

iCloud Service is designed to work  automatic and effortless as if it were invisible. It is seamlessly integrated into your apps, so you can access your contents on all  your devices. iCloud stores music, photos, videos, calendar, apps, documents and more. And wirelessly pushes them to all your devices, automatically.

iCloud is so much more than a hard drive in the sky. It’s the effortless way to access just about everything on all your devices. iCloud stores your content so it’s always accessible from your iPad, iPhone, iPod touch, Mac, or PC (**with limited features). No syncing required. No management required. In fact, no anything required. iCloud does it all for you.

When you sign up for iCloud, you automatically get 5GB of free storage. And that’s plenty of room, because of the way iCloud stores your content. Your purchased music, apps, books, and TV shows, as well as your Photo Stream, don’t count against your free storage. Since your mail, documents, Camera Roll, account information, settings, and other app data don’t use as much space, you’ll find that 5GB goes a long way. And if you need more storage, you can easily purchase a storage upgrade right from your device.

Apple has left some questions unanswered as to whether iCloud will have a web app interface for accessing these services from any device with a browser, like MobileMe did. However, we believe it’s shortsighted to think that iCloud would not eventually have a web app suite to complement the aforementioned services.

Google Apps offer Reliable, secure online applications. It is a collection of web-based programs and file storage that run on a web browser.  There is no one stop destination that host your media and documents.  You have to use Picasa for your photos, Google Music for your music, GMail for your emails and Google docs for your documents.

Gmail offers more than 7 Gb for messages and attachements and other service offers at least 1 GB of free space, and you can plunk down an extra $5 per year to add 20 gigs for most of its services. You can rent up to 16 TB each year for $4,000.

Amazon Cloud Drive is like a personal hard drive in the cloud. Store your music, videos, photos, and documents on Amazon’s secure servers. All you need is a web browser to upload, download, and access your files from any computer. Cloud Drive comes with 5 GB of free storage—enough space to store up to 1000 songs. When you purchase songs or albums from the Amazon MP3 Store, you can now save your purchases to your Cloud Drive. All your purchases are backed up and available for you to download at any time. Even better, you can listen to your music from any web-connected computer with Amazon Cloud Player. You can only access the files from a device that supports Flash. (That means Cloud Drive is useless for any iPhone or iPad customer, since the devices do not support Flash.) . Amazon offer paid storage plans with up to 1000 GB of space and that cost about $1 per 1 Gig of storage.

Like Amazon Cloud Drive, SkyDrive is also your storage place in the cloud. Upload your files and photos so you can access them from anywhere, and share them to collaborate with others. You get 25 GB free, and that’s all you can have — no options to expand. You can throw whatever file you want into the SkyDrive. SkyDrive has some integration with Windows Phone 7. Snap a photo or shoot a video on your Windows phone, and it goes straight to the SkyDrive. With SkyDrive, you can also create, view, edit, and share Microsoft Office docs saved in Word, Excel, PowerPoint, and OneNote for free online.

SkyDrive has permission controls that let you share your files and Office docs with specific people. SkyDrive even has permission settings that allow people to view your files but not edit or delete them. If you’re collaborating on a document online, and you don’t like where an idea has headed, you can simply roll back to an earlier version of the document instead of starting over.

Dropbox, the hottest cloud solution in the startup world. It’s basically a web-connected folder that lives on your PC or smartphone. Any file you save to Dropbox also instantly saves to all your devices, and the Dropbox website. The Dropbox client works the same on every major platform — Windows, Mac, Linux, iOS and Android — and you can also access your files through the Dropbox web interface on any device with a web browser.  When you sign u p, you get 2GB of Dropbox for free, with subscriptions up to 100GB available. It works even when offline. You always have your files, whether or not you have a connection.

Box lets you store documents, media and all of your content online, so you can access, manage and share them from anywhere. You can access Box with Google Apps and several other mobile apps such as GoodReader, Notarize and QuickOffice. You can also access Box on mobile devices.  The Box client works the same on every major platform — Windows, Mac, Linux, iOS and Android — and you can also access your files through the Box web interface on any device with a web browser. When you signup for Personal account, you get 5 GB of Box for free and have a option of buying up to 50 GB for about $20.

Conclusion

  • Microsoft Windows Live give you the largest storage: 25GB, but is not easy to use.
  • Google provides many distributed services with at least 1GB free for each.
  • Dropbox is a good service for storing and sharing data but it’s the most expensive.
  • Amazon Cloud Drive: you have 5GB free storage. If you use iOS devices which don’t support Flash, it’s not easy to access your data.
  • iCloud looks very good but to use it efficiently, you need iOS devices or Macs.
  • Box is a good service for storing and sharing data but it’s the most expensive
The cloud service you choose will likely rely on the device you own. For people with an iPhone and a Mac, iCloud is an elegant solution for tying the two experiences together. Android and Windows customers will probably find good value in Amazon’s Cloud Drive, Google’s cloud suite or Microsoft’s Windows Live. Finally, Dropbox costs a bit more than the other services, but its cross-platform compatibility makes it ideal for people with a mix of devices, such as a Windows PC and an iPhone, or a Mac and an Android device.

Follow

Get every new post delivered to your Inbox.