
Daniel Campos

IT Everywhere - OpenXML Enthusiast
May 05

Office Contest Winner 2009

Hey friends,
 
I'm very proud to say that I won first place in the Microsoft Office Contest here in Brazil with the Images 2 OpenXML solution. Competitors had to create an application that generates OpenXML files and uses Office as its base (APIs, front ends...). The Images 2 OpenXML project shows how to integrate the Office 2007 OCR engine with custom applications and use OpenXML and Speech Recognition as application features. The main purpose of the tool is to convert scanned documents to the new Office 2007 format (OpenXML). Speech recognition improves the application's user experience. The application uses the Office 2007 OCR API, called MODI (Microsoft Office Document Imaging), which really helps with simple document conversions. This project is based on the article I wrote some months ago; the tool has since been upgraded, and more features will come. Hope you like the application.
 
You can check the contest results and the descriptions of the winning projects at http://www.microsoft.com/brasil/office/resultadoconcurso/home.aspx (in Portuguese).
 
See ya.
 
 
January 29

A little bit busy

 

Hey friends,

Here I am again. First, I want to wish you all a (belated) happy new year. Things here at MIC have been a little busy; right now I'm working on lots of projects that will be released here soon, and there are a lot of new things coming up in the next few days (months =P). For now, I'm posting the latest news about Office development, OpenXML, and some other cool stuff I found around the web.

If you're looking for some good reading about OpenXML, here are some tips that might help:

Zeyad Rajabi keeps posting interesting articles about OpenXML; the latest one is Traversing in the OpenXML DOM. It's basically a continuation of what Ali began; they talk a little about OpenXML basics.

Doug Mahugh brings us the news about the stabilization of the IS 29500 standard (a.k.a. the OpenXML format ;-)). There's a very interesting article about the implementation notes for Office 2007 SP2; you can check it out here.

Mary Lee from the VSTO team explains how to deploy Office solutions using Windows Installer; you can see the article here. (I know it's not new, but it's very useful.)

You can check out an interesting video about how to customize an Office Ribbon bar within custom applications using MFC and a free software called Axialis. Here's the video.

There's a cool post on Code Project explaining how to create an iPhone UI for Windows Mobile. In this article you'll be introduced to concepts like AlphaBlend, image loading, transparency, and so on. Here's the link: http://www.codeproject.com/KB/mobile/IPhoneUI.aspx. If you want to see this code with some additions (to work with PocketPC 2003 and so on), send me an e-mail.

In December I spoke at MAD (Microsoft Academy Day), presenting the OpenXML SDK 2.0 and development with VSTO 3.0. There are some pictures here (if you want the demos and the presentation, send me an e-mail too):

daniel 003   daniel 002 daniel 005

 

Finally, here's an awesome picture from my vacation ;-)

image

By the way, this is where I live.

 

See ya soon guys

October 06

Microsoft Students to Business Program

 

Hey guys,

Last Friday, 9/03/2008, we started the 4th edition of the Microsoft Students to Business program here in Brazil. We had 36K+ registered people throughout the country.

The Students to Business (S2B) program is a Microsoft® Community Initiative designed to connect Microsoft partners and customers with qualified students for entry-level and internship positions.
The objective of the S2B program is to inspire local businesses to communicate the competency requirements for new talent, to evaluate the skills of students ready for an entry-level job or internship, and to collaborate with Microsoft and local education institutions to provide the curriculum and training needed to ensure students are prepared to meet the innovation needs of companies around the globe.
Students engaged in S2B benefit from unique mentoring, training and certification opportunities. Various offerings are available to students at each stage of S2B – when profiling, in application and after their job connection.
The S2B program was first piloted in Italy and has since been rolled out to 28 countries, aiming to connect 100,000 students with new career skills, leading to 15,000 students with jobs and internships as part of the Microsoft community.

The program has three stages. The first is a big class where we explain the local market, job positions, and so on (related to the IT area). The second stage consists of 36+ hours of training in two different tracks: System Development (using .NET) and Network Administration (Windows Server technologies). The third stage consists of developing an application using the technologies learned during the second stage (.NET, VSTS, WinForms, ASP.NET) or solving a specified problem with Microsoft network administration technologies.

The attendees will have a total of 80+ hours of training with highly specialized professionals (the guys from the Microsoft Innovation Centers, MVPs, and so on).

Congrats to everyone who passed to the second stage!!!

Students-to-Biz_3_bl

 

See ya!!!

September 19

Converting images to text using Office 2007 OCR, OpenXML and Speech Recognition

 

Hey folks,

Last week I posted an article on the Code Project portal, and I'll reproduce it here:

Introduction

Sometimes, while developing an application, we face situations where we have a scanned document (an image) and want to convert it to text (a Word 2007 document). Some scanners come with applications that perform this kind of conversion automatically, but most of the time the generated document is a .pdf, an .odt, and so on. If you want to convert directly to .docx (OpenXML) documents, you'll have to use third-party applications or develop one from scratch.

OpenXML became an ISO standard (IS 29500), and its adoption is growing day after day, driven by its performance, scalability, and security. It is the default format of Microsoft Office 2007 documents (.pptx, .docx, .xlsx). Files can be up to 75 percent smaller than comparable binary documents, and the format is based on two major technologies: ZIP and XML.
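
To make the "ZIP and XML" point concrete, here is a minimal sketch. It is written in Python with only the standard library, not in the language of the sample described in this article, and it does not use the OpenXML SDK. The helper names (make_sample_package, list_parts, read_text) and the sample content are made up for illustration, and the sample package holds a single part, so it is not a complete .docx that Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as w:p, w:r and w:t.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def make_sample_package():
    """Build a tiny stand-in package in memory (hypothetical sample content).

    Only word/document.xml is written; a real .docx also needs
    [Content_Types].xml and relationship parts.
    """
    document_xml = (
        '<w:document xmlns:w="%s"><w:body><w:p>'
        '<w:r><w:t>Hello</w:t></w:r>'
        '<w:r><w:t xml:space="preserve"> OpenXML</w:t></w:r>'
        '</w:p></w:body></w:document>' % W_NS
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("word/document.xml", document_xml)
    return buffer.getvalue()


def list_parts(package_bytes):
    """Return the names of the parts (ZIP entries) in the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return package.namelist()


def read_text(package_bytes):
    """Concatenate the text of every w:t element in the main document part."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    return "".join(t.text or "" for t in root.iter("{%s}t" % W_NS))
```

Calling read_text(make_sample_package()) returns "Hello OpenXML": the package is opened as an ordinary ZIP archive, and the text comes out of an ordinary XML part.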
Speech recognition is a feature included with .NET Framework 3.5. Developers can use this API to provide a better user experience, easy access to specific information, and so on. The API has been available since .NET Framework 3.0, and it's a default feature of Windows Vista.

Scenario

To make developers' work easier and avoid integration with third-party applications, Microsoft released an OCR (Optical Character Recognition) API with Office 2007 called MODI (Microsoft Office Document Imaging). It's important to remember that the API used in this sample is exclusive to Office 2007 (Office 2003 has its own OCR API).

In this article we'll create a Windows application that uses the Office 2007 OCR API to generate OpenXML documents. In addition, we'll use the Speech Recognition API to improve the application's user experience.

Before we start, you need to have the following requirements installed:

  • Visual Studio 2008
  • .NET Framework 3.5
  • OpenXML SDK 1.0
  • Office 2007

You also need the Microsoft Office Document Imaging 12.0 Type Library installed. The Office 2007 setup doesn't install this component by default, so you have to install it afterwards. To do this:

  • Run the Office 2007 installation setup
  • Click on the button Add or Remove Features
  • Make sure that the component is installed

Using the MODI

To use the Office 2007 OCR API, you have to add a reference to the Microsoft Office Document Imaging 12.0 Type Library. To do this:

  • In Solution Explorer, select Add Reference
  • On the COM tab, select Microsoft Office Document Imaging 12.0 Type Library

Create a MODI object:

image

In the Form class constructor, instantiate the MODI object:

image

After that, you just have to implement the conversion method. Let's see how to do this:

image

The OCRImplementation method converts image files (.tif, .jpg, .gif, .bmp; in this case we're using a TIFF file). The Create method of the md object receives the path of the file to be converted. The OCR method receives three parameters: the first one represents the language of the document, the second specifies whether the OCR engine attempts to determine the orientation of the page, and the third specifies whether the OCR engine attempts to fix small angles of misalignment from the vertical.

To retrieve the text, you need references to the Image and Layout objects. The Layout object allows the text retrieval. Its Words property has a Count property that lets you iterate through the list of words. You retrieve each word using an indexer; here we add a blank space between the words.

The method Close of the md object takes a boolean argument indicating whether to save changes to the image file.

 

Using OpenXML SDK

In Solution Explorer, add a reference to the DocumentFormat.OpenXML library. This library allows the converted text to become a Word document. There's a constant that will hold the structure and relationships of the document (it'll define the markup, in this case WordprocessingML).

image

The CreateDocument method is responsible for inserting the text into the document structure.

image

Speech Recognition

image

Add a reference to System.Speech on the .NET tab. After that, you just have to adjust the Volume and Rate properties and use the Speak method to speak a string.

image

Conclusion

Combining these powerful APIs is an interesting idea, and the OCR code is very short compared with third-party APIs. It is a tool that can be explored in many ways, and integrating it with the benefits of OpenXML and Speech Recognition improves your applications.

You can download the code here

 

See ya.

August 28

Creating OpenXML Documents with SDK 1.0

 

Hey folks,

Recently some guys asked me how to create Word 2007 documents (.docx) without using Microsoft Office Word 2007. That's the reason for this post. It's basically an idea of how to create Word documents with a custom application; you can improve this application by adding new features such as bold, italics, colors, size settings, and so on.

Well, let's begin our work:

First you need the OpenXML SDK 1.0 installed; if you don't have it, you can download it here.

Let's assume that you already have the SDK installed, so let's begin.

In Visual Studio, create a Windows Forms application and name it CustomTextEditor.

In Solution Explorer, add a reference to the DocumentFormat.OpenXML library (it's located on the .NET tab).

Add a TextBox control to the form. Set the Multiline property to True and the ScrollBars property to Vertical.

Add a label and change its Text property to Save Path.

To the right of the label add a text box, and to the right of this text box add a button. Set the button's Name property to btn_Path and its Text property to ' ... '

Add three more buttons to the form:

  • Button1 -> Name Property: btn_Save / Text Property: &Save
  • Button2 -> Name Property: btn_Clear / Text Property: &Clear
  • Button3 -> Name Property: btn_Close / Text Property: &Close

Add a saveFileDialog to the form.

You should have something like this:

image

Double-click each button to generate its event handler.

In the code window, add the following code:

image

This constant is responsible for defining the document body (relationships and markup).

Now we'll create a method that creates the document. This method receives a string as a parameter; this string is basically the text written in TextBox1.

The first thing we have to do is create an object to define the package. For this task we have the WordprocessingDocument class, which is responsible for defining the package that represents a Word document.

image 

As you can see, this class has a Create method, which receives two parameters. The first parameter is the path where your document will be saved; the second is an enum that defines the type of the document (Document, MacroEnabledDocument, MacroEnabledTemplate or Template).

After this we'll define the main document part:

image

The MainDocumentPart property gets the main part of the WordprocessingDocument. The AddMainDocumentPart method creates the main document part and adds it to the document.

After this we'll create a string in which the tag #OLD_TEXT# is replaced: the text from the text box is inserted into the structure (formatted within a WordprocessingML structure). To replace the text we'll use the Replace method of the String class.

image

After this we have to get the Stream of the document part, encode the string that contains the structured text, write it to the stream, and save it.

image

The Close method is responsible for saving and closing the OpenXML package.
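
Since the code screenshots may not show up for everyone, here is a rough sketch of the same flow: a template constant with the #OLD_TEXT# tag, a string replace, an encoded write into the main document part, and a close. Keep in mind this is Python using only the standard zipfile module, not the OpenXML SDK or the code of this application. The names DOC_TEMPLATE, CONTENT_TYPES, ROOT_RELS, build_document_xml, create_document and document_bytes are made up for illustration, and the content-types and relationship parts that the SDK writes for you are written by hand here.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Template for the main document part; #OLD_TEXT# is the tag that gets replaced.
DOC_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    '<w:body><w:p><w:r><w:t xml:space="preserve">#OLD_TEXT#</w:t></w:r></w:p></w:body>'
    '</w:document>'
)

# Declares the content type of each part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Package-level relationship that points at the main document part.
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)


def build_document_xml(text):
    """Swap the tag for the user's text, escaping &, < and > so the XML stays valid."""
    return DOC_TEMPLATE.replace("#OLD_TEXT#", escape(text))


def create_document(target, text):
    """Write a minimal .docx package to a path or a binary file-like object."""
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", build_document_xml(text).encode("utf-8"))
    # Leaving the with block closes the archive, which is what saves the package.


def document_bytes(text):
    """Convenience wrapper: build the package in memory and return its bytes."""
    buffer = io.BytesIO()
    create_document(buffer, text)
    return buffer.getvalue()
```

For example, create_document("sample.docx", "Hello from a custom editor") writes a file with one paragraph. One detail worth copying whatever language you use: escape the text before putting it into the template, otherwise a single & or < typed in the text box produces a broken document.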

Let's implement the button events:

The first event we have to implement is the btn_Path event; it is responsible for defining the name and path of the file.

image

Implement the btn_Save event. This event will call the method CreateDocument.

image

Implement the other two events with the following code:

image

 

 

Well guys, that's all. This is a simple application, but it demonstrates a little bit of the OpenXML SDK's power; you can extend your custom applications using this SDK.

 

See ya

 
