The solution was a combination of the suggestions: 1) don't use ranges that are updated frequently; 2) use a single string variable and search that.
First, here is a generic example of one line of my data. The structure is the same for each line. The document is about 50 lines with different keywords. I need the groups that have the word DATA in them:
XXXXXXX DATA1XXXXX XXXXXXXXXX KEYWORD XXXXX XXXX DATA2XX XXXX XXXXXXX
The first attempt to increase speed was to read each paragraph of the document into an array element. Each paragraph was one line of data for me due to document structure. I then searched each array element for the keyword and used character position and data length to extract data. This worked, and was faster the searching in the document, but it was still slow. Interestingly, the processing time per file varied wildly, between 10 seconds to over 2 minutes. I put some time tracking code in, and on average it was about 25% reading the document into the array, 75% extracting data, but this percentage also varied wildly. At one point I noticed in Task Manager that Word was using a lot of processor, so I thought I would close the document after reading it into the array which might speed things up. For reasons I don’t understand, when I implemented, the array data disappeared as soon as I closed the document. It was if the array existed in Word, even though the code was running in Excel.
The final code edit reads the entire document into a single string variable. The code then searches the string for the keyword and finds the character position where the keyword starts. It then uses a function to extract the data elements I need based off a relative character position from the keyword start and the data length (so if my data element starts 20 characters back from keyword start, the function is passed -20 as the relative position).
This method does the data extraction in less than 1 second. The speed limiter now is opening and closing each file. In total, I went from averaging about 1.5 minutes per file to about 10 seconds.
Sub ProcessWithString()
Dim WordApp As Object, WordDoc As Object
Dim strFile As String
Dim i As Integer
Dim shtData As Worksheet, shtTemp As Worksheet, shtYear As Worksheet
Dim strWholeDoc As String
Application.EnableEvents = False
Application.Calculation = xlCalculationManual
'Word session creation
Set WordApp = CreateObject("Word.Application")
'true to see it, false to not
WordApp.Visible = True
Set shtYear = Sheets("2024")
Set shtTemp = Sheets("temp data")
Set shtData = Worksheets("Data")
k = shtData.Cells(100000, 1).End(xlUp).Row + 1 'First row to output
iLastFile = shtYear.Range("A1").End(xlDown).Row
For i = 2 To iLastFile
'Fully pathed file name
strFile = shtYear.Cells(i, 1).Value
'open the .doc file
Application.DisplayAlerts = False
Set WordDoc = WordApp.Documents.Open(strFile)
Application.Wait (Now + TimeValue("0:00:2"))
'read entire document into string variable
WordApp.Selection.WholeStory
strWholeDoc = WordApp.Selection
With shtData
'function GetDataElement arguments: 'string of entire document, Search term, data relative start pos, data length
.Range("Z" & k) = GetDataElement(strWholeDoc, "KEYWORD", -26, 10) 'data 1
.Range("AA" & k) = GetDataElement(strWholeDoc, "KEYWORD", 51, 7) 'data2
'repeat using the function for each keyword
End With
nexti:
k = k + 1
Application.StatusBar = i & " of " & iLastFile
shtData.Activate
WordDoc.Close (wdDoNotSaveChanges) 'Leaves app open
Set WordDoc = Nothing
Next i
WordApp.Quit
Set WordApp = Nothing
Application.EnableEvents = True
Application.Calculation = xlCalculationAutomatic
End Sub
Function GetDataElement(strWholeDoc, strSearchTerm, iRelStartLoc, iLength)
iKeyStart = InStr(strWholeDoc, strSearchTerm)
If iKeyStart <> 0 Then
GetDataElement = Mid(strWholeDoc, iKeyStart + iRelStartLoc, iLength)
Else
GetDataElement = "Not found"
End If