RichText to Text conversion -- fast and free

aikimarkGet vaccinated; Social distance; Wear a mask
CERTIFIED EXPERT
Published:

Introduction

I've recently encountered several questions about converting RTF (Rich Text Format) to plain text. This article shows an easy, fast, and free class module that will provide this functionality in the VBA environment. With a little tweaking, the class can be used in the VBScript environment.
 

History

In the VB6 days, we could convert RTF to plain text using the RichText control. With the passing of VB6, this control is becoming very scarce.
 

The RTF2Text Class (code)

Since I have uploaded the class file, there is no need to copy/paste the code snippet. I have displayed the code so that I can comment on it.

  • Since there is no clipboard object in the VBA/VBScript environment, we use MSForms.Dataobject
  • We are going to let Word do the RTF-to-Text translation
  • The majority of the work is done when the TextRTF property is assigned a value
  • Testing revealed that Word was not very tolerant of some rtf, so I clear the text in the document before I paste as well as looking for, and dismissing, a dialog window that popped up
  • For best performance, you should limit your conversions to rows that actually have RTF data.
  • Word and DataObject private variables are instantiated and destroyed when the class begins and ends its life
  • The StrConv() function does not exist in the VBScript environment, but we can substitute the ADODB Stream object
  • Since Word tends to have extra paragraph marks at the end of the document, I strip them before returning the plain text.
Option Explicit
                      
                      Private oWd As Object
                      Private oDO As Object   'msforms.dataobject -- for clipboard access
                      Private strRTF As String
                      Private strPlaintext As String
                      
                      Private Sub Class_Initialize()
                          Set oWd = CreateObject("word.application")
                          oWd.DisplayAlerts = 0   '=wdAlertsNone
                          oWd.documents.Add
                          Set oDO = CreateObject("new:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
                      End Sub
                      
                      Private Sub Class_Terminate()
                          oWd.activedocument.Close False  'no save
                          oWd.Quit
                          Set oWd = Nothing
                          Set oDO = Nothing
                      End Sub
                      
                      Public Property Get TextRTF() As Variant
                          TextRTF = strRTF
                      End Property
                      
                      Public Property Let TextRTF(ByVal vNewValue As Variant)
                          Static oTS As Object
                          strRTF = vNewValue
                          On Error Resume Next
                          'Convert to byte array and place in clipboard
                          oDO.Clear
                          oDO.SetText StrConv(strRTF, vbFromUnicode), "Rich Text Format"
                          oDO.PutInClipboard
                          'clear out whatever is in the document
                          oWd.activedocument.Range.Text = vbNullString
                          'Paste clipboard contents into Word object
                          oWd.activedocument.Range.Paste
                          If Err = 0 Then
                          Else
                              Err.Clear
                              AppActivate "Microsoft Office Word"
                              If Err = 0 Then
                                  SendKeys "{Enter}", True
                              End If
                              Err.Clear
                          End If
                          'Get the plain text
                          strPlaintext = oWd.activedocument.Range.Text
                          
                      End Property
                      
                      Public Property Get Text() As Variant
                          Dim boolFoundOne As Boolean
                          'remove trailing Word paragraph marks or CrLf before returning plain text
                          Do
                              boolFoundOne = False
                              
                              Do While Right(strPlaintext, 1) = vbCr
                                  strPlaintext = Left(strPlaintext, Len(strPlaintext) - 1)
                                  boolFoundOne = True
                              Loop
                              'remove any CrLf character pairs
                              Do While Right(strPlaintext, 2) = vbCrLf
                                  strPlaintext = Left(strPlaintext, Len(strPlaintext) - 2)
                                  boolFoundOne = True
                              Loop
                          
                          Loop While boolFoundOne
                          
                          Text = strPlaintext
                      End Property
                      
                      'Public Property Let Text(ByVal vNewValue As Variant)
                      '   Text is a read-only property
                      'End Property

Open in new window

Note: It is also possible to do this conversion without using the clipboard. In that version of the class, the RTF text was written to a temporary file and then opened with the Word object, interpreting the temporary file content as RTF. While this does work, each conversion takes a couple of seconds. When faced with many thousands of database records to update, I felt that a non-I/O solution would be best. I generally discourage the use of the clipboard because it will interfere with the user's regular work if it involves any copy/paste operations.
 
 

Using the class in VBScript environment

ADODB Stream object reference:
http://msdn.microsoft.com/en-us/library/windows/desktop/ms675032(v=vs.85).aspx You can substitute the following StringToByteArray() function for the StrConv() function in the class.
Function StringToByteArray(parmString)
                      	Dim OStream
                      	Set oStream = CreateObject("ADODB.Stream")
                      	oStream.Open
                      	oStream.Type = 1		'=adTypeBinary
                      	oStream.WriteText parmString
                      	StringToByteArray = oStream.Read
                      End Function

Open in new window


Using the class

Here is an example of the class used in a VBA routine.

Public Sub testRTF_class()
                          Dim oRTF As New clsRTF2Text
                          Dim strRTF As String, strResult As String
                          strRTF = "{\rtf1\ansi\ansicpg1252\deff0\deflang5129{\fonttbl{\f0\fnil\fcharset0 Tahoma;}}\viewkind4\uc1\pard\f0\fs17\par}"
                          oRTF.TextRTF = strRTF
                          strResult = oRTF.Text
                          Debug.Print strResult
                      End Sub

Open in new window

I am writing this article while solving this open question with this class.
 

Performance Test

I like to include some performance testing with applicable articles. This will show you that the slowest activity is when the class is instantiated, because we have to start an instance of Word. Once instantiated, the individual RTF-to-Text conversions are very very fast.

Public Sub testRTF_Perf()
                          Dim oRTF As clsRTF2Text
                          Dim sngstart As Single, strResult As String, lngLoop As Long
                          sngstart = Timer
                          Set oRTF = New clsRTF2Text
                          Debug.Print "Class instantiation", "Elapsed: " & Timer - sngstart
                          For lngLoop = 1 To 5
                              sngstart = Timer
                              oRTF.TextRTF = "{\rtf1\ansi\ansicpg1252\deff0\deflang1033{\fonttbl{\f0\fnil\fcharset0 MS Sans Serif;}}" & _
                                              "{\colortbl ;\red0\green0\blue0;}" & _
                                              "\viewkind4\uc1\pard\cf1\f0\fs20 6/26/07 Number was busy each time tried, LM at home number, but customer never called back to confirm address for returned catalog; confirm if reorder " & lngLoop & "\cf0\fs17" & _
                                              "\par }"
                              Debug.Print lngLoop, Timer, "Assign TextRTF value", Timer - sngstart
                              sngstart = Timer
                              strResult = oRTF.Text
                              Debug.Print lngLoop, Timer, "Retrieve plaintext value", Timer - sngstart
                              Debug.Print , strResult
                          Next
                          Set oRTF = Nothing
                      End Sub

Open in new window


Here are my performance figures running on my laptop. If you need more exact timing events, you will need to use a more granular timer,  such as the QueryPerformanceCounter API.
Event            	Elapsed (sec)
                      Class instantiation	2.34375
                      (1)Assign TextRTF value	0.3984375
                      (1)Retrieve plaintext	0
                      (2)Assign TextRTF value	0.0078125
                      (2)Retrieve plaintext	0
                      (3)Assign TextRTF value	0.0078125
                      (3)Retrieve plaintext	0
                      (4)Assign TextRTF value	0.0078125
                      (4)Retrieve plaintext	0
                      (5)Assign TextRTF value	0.0078125
                      (5)Retrieve plaintext	0

Open in new window

The original poster in the question reported the ability to process 20 seconds for 10K records (500/second). This performance measurement includes the row retrieval and update time.


Writing Your Query

As noted earlier, you will get better performance if you only try to convert fields that actually contain RTF data.  You should avoid Null values and the field should resemble welformed RTF text.

Where (RTF_fieldname Is Not Null) and (RTF_fieldname Like "{\rtf*}")

Open in new window


The Class file:

To make it easier for you to incorporate this functionality into your application, just import this class file into your VBProject.
clsRTF2Text.cls

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
If you liked this article and want to see more from this author, please click here.
 
If you found this article helpful, please click the Yes button near the:
 
      Was this article helpful?
 
label that is just below and to the right of this text.   Thanks!
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
1
9,133 Views
aikimarkGet vaccinated; Social distance; Wear a mask
CERTIFIED EXPERT

Comments (0)

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.