Getting documents out of a SharePoint database


I recently helped a client get their documents out of WSS 3.0. Their SharePoint site was broken and we couldn't fix it quickly, but they really needed the Word, Excel, and PDF documents stored in it.

WSS 3.0 (Windows SharePoint Services) uses SQL Server for data storage. By default, the edition it uses is the Embedded Edition, and the instance shows up in the Services list as microsoft##ssee. The SQL Server client tools are usually not installed on the server, which makes it hard to work with the databases, so we installed SQL Server Management Studio on the box.

Then we had a hard time connecting to the instance. It turned out that we could only connect to the SharePoint WSS 3.0 databases through a named pipe. I used Windows authentication, entered \\.\pipe\mssql$microsoft##ssee\sql\query as the server name, and finally got in.
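
The same named pipe should work from code as well as from Management Studio. Here is a minimal sketch of building a connection string for it in C#, assuming the classic System.Data.SqlClient provider. The class name, the method name, and the database name passed in are made up for illustration; your content database will have its own name.

```csharp
using System;
using System.Data.SqlClient;

static class SseeConnection
{
    // Hypothetical helper: builds a connection string that targets the
    // embedded instance through its named pipe with Windows authentication.
    public static string BuildConnectionString(string database)
    {
        var builder = new SqlConnectionStringBuilder();
        // The pipe path is the same value typed into the "Server name" box.
        builder.DataSource = @"\\.\pipe\mssql$microsoft##ssee\sql\query";
        builder.IntegratedSecurity = true;   // Windows authentication
        builder.InitialCatalog = database;   // assumed database name
        return builder.ConnectionString;
    }
}
```

You would pass the result to new SqlConnection(...) and run it on the SharePoint server itself, since the pipe path starts with \\.\ and therefore points at the local machine.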

SharePoint uses at least two databases: the configuration database and the content database. By default, their files are located in c:\windows\SYSMSI\SSEE\MSSQL.2005\MSSQL\Data

I then moved the content database to a regular SQL Server 2005 server and decided to write a little program to pull the documents out of the database. It turned out that somebody had already written one. Follow the link here to get the C# source code of the program. Note that the SQL statement used should be the one listed below, as mentioned in comment No. 9 of that post:

com.CommandText = "select [AllDocs].[DirName], [AllDocs].[LeafName], [AllDocStreams].[Content] from [AllDocs],[AllDocStreams] where (LeafName like '%.doc' or LeafName like '%.xls' or LeafName like '%.pdf' or LeafName like '%.ppt') and [AllDocStreams].[Content] is not NULL and [AllDocs].[Id] = [AllDocStreams].[Id]";
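
For readers who can't get to the linked program, here is a rough sketch of what the surrounding loop might look like. This is not the linked author's code, just my guess at the general shape: run the query, then write each Content blob to a file under an output folder that mirrors DirName. The class and method names and the outputRoot parameter are made up for illustration, and the sketch assumes the System.Data.SqlClient provider.

```csharp
using System;
using System.Data.SqlClient;
using System.IO;

static class DocumentExporter
{
    // Hypothetical helper: maps a DirName/LeafName pair to a local file path.
    public static string BuildTargetPath(string outputRoot, string dirName, string leafName)
    {
        // DirName uses forward slashes; trim any leading one so the result
        // stays under outputRoot, then switch to the local separator.
        string relativeDir = dirName.TrimStart('/').Replace('/', Path.DirectorySeparatorChar);
        return Path.Combine(Path.Combine(outputRoot, relativeDir), leafName);
    }

    public static void Export(string connectionString, string outputRoot)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var com = conn.CreateCommand())
        {
            com.CommandText = "select [AllDocs].[DirName], [AllDocs].[LeafName], [AllDocStreams].[Content] from [AllDocs],[AllDocStreams] where (LeafName like '%.doc' or LeafName like '%.xls' or LeafName like '%.pdf' or LeafName like '%.ppt') and [AllDocStreams].[Content] is not NULL and [AllDocs].[Id] = [AllDocStreams].[Id]";
            conn.Open();
            using (SqlDataReader reader = com.ExecuteReader())
            {
                while (reader.Read())
                {
                    string path = BuildTargetPath(outputRoot, reader.GetString(0), reader.GetString(1));
                    // Create the folder tree first; this is a no-op if it already exists.
                    Directory.CreateDirectory(Path.GetDirectoryName(path));
                    // Loads the whole document into memory, which is fine for
                    // typical office files but not ideal for very large ones.
                    File.WriteAllBytes(path, (byte[])reader["Content"]);
                }
            }
        }
    }
}
```

Two caveats: if a document has several rows in AllDocStreams (for example, multiple versions), later rows will overwrite earlier ones at the same path, and names containing characters that are not valid in Windows file names would need extra handling.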

Enjoy!
