Pairing general purpose computer programs to analyze qualitative data:
An illustration based on MS Word® and FileMaker Pro®

 

FROM: Cultural Anthropology Methods 10 (1), February, 1998, p. 4-9

Jon Wagner
Division of Education
University of California, Davis

Specialized computer programs for qualitative data analysis (QDA) are increasingly popular among anthropologists and sociologists, and for good reason: Good programs are available; they have features that can assist researchers in organizing, labeling, coding, analyzing and displaying data; enthusiastic colleagues can be found who know how to use them; and increasingly thoughtful software reviews make informed choice among programs a real possibility (see, for example: Borgatti, 1989; Buchignani, 1996; Conrad & Reinharz, 1984; Schensul, 1993; Seidel & Clark, 1984; Tesch, 1990; Tesch, 1993a; Tesch, 1993b; Weitzman & Miles, 1995).

However, enthusiasm for specialized QDA software can keep researchers from noticing the increased QDA potential of general purpose computer programs, some of which they may already be using. To illustrate this point, I will describe a strategy for linking a standard word processing program (Microsoft Word or "Word") with a standard data-base program (FileMaker Pro or "FileMaker"). Both programs are common in academic and business environments, both run on the three most popular personal computing platforms (Macintosh, Windows and DOS), and both are relatively inexpensive at educational and institutional discounts -- about $100 each for recent versions.

The Word/FileMaker pair provides excellent support for "code and retrieve" QDA functions (Weitzman & Miles, 1995). Used together, these two programs can help you organize, label, sort, keep track of, retrieve and display -- in whatever way you want, on a computer screen or a printed page -- specific chunks of text that you determine to be interesting, illuminating, or significant for one reason or another. These programs also support some forms of "text retrieving" and "text managing" (Weitzman & Miles, 1995), and you can create "buttons" and "scripts" in FileMaker to perform a wide range of functions, including finding and displaying text, graphics, audio and video documents from other programs.

Other word-processing programs that generate text which can be copied through the operating system -- and most do -- may work equally well. FileMaker is valuable because it allows very large blocks of text (up to 64K, or 15-20 pages) or graphics to be loaded into a data-base field within an individual record, and it also offers researchers great flexibility in defining and displaying data-base fields and field values -- including text chunks and graphics. But other data-base programs with similar features will also work just fine.

The rationale for this approach

Two developments in the last few years have substantially increased the QDA value of using word processors and data-base programs together. First, computer operating systems now allow different software programs to be open and running simultaneously: documents can be moved back and forth between different programs as if they were being moved between different components or files within the same program. Second, current data-base programs are much better at handling text and graphics than were their predecessors. Indeed, general purpose data-base programs such as FileMaker Pro can now be used to label, organize, code, sort, and search graphic artifacts and text blocks of varied lengths as individual data-base "records."

Applications context and overview

My own field research on the culture and social organization of schooling has drawn on diverse data sources: structured and open-ended interviews, surveys, field notes based on direct observations, and formal and informal documents prepared by research subjects themselves (e.g., examples of student work, agendas for teachers’ staff development meetings, school "mission statements," etc.). The Word-FileMaker pairing works well with data of this sort. It has also proved useful to graduate student and faculty colleagues who conduct research on different topics in different field settings and report their research to different audiences.

Primary and secondary data sets

I usually work with two related but distinct data sets. The primary data set consists of text files prepared in MS Word: mostly original field notes and transcripts of interviews and meetings. The secondary data set consists of chunks of these Word text files that have been copied into a FileMaker database, coded, and annotated.

The use of two data sets differs from approaches -- such as those supported by The Ethnograph or Folio Views -- that lead to a single set of materials in which all coding and commentary are fully embedded. In reading full interviews and field notes I like to work with "full pages" -- on paper or a full-page monitor -- that are clear and clean, free from code words and commentary, and formatted with a legible font and a fair amount of space between and around the lines. This format makes it easier for me to follow related speech acts, discursive ruminations, and narrative accounts of episodes and events. The most efficient way for me to prepare text in this format is with a full-featured word processor.

Designing a template for the secondary data set

Starting with a primary data set of clean transcripts and field notes as MS Word files, I create a data base in FileMaker to hold three kinds of information: (1) important chunks of text taken from the primary data documents; (2) context descriptors for the text chunk, including where it came from, who said it and when, etc.; and (3) thematic codes that I assign to the text chunk for further analysis.

In FileMaker Pro the first step down this path is to select the "new file" command from the "File" pull-down menu, and assign the "new file" a name. FileMaker then creates a new data-base file and presents you with a "Define Fields" window. At this point I can type in the names of the fields I want to include and be done with it, though it helps to specify what kinds of fields these are by choosing from the categories that FileMaker offers (e.g., text, date, number, etc.).

In defining a data base for QDA, I create several fields for contextual information (e.g. time, date, site, etc.), a "text chunk" field, one or more fields for attaching code words to the text chunk, and a field for comments. If I select "done" at this point, FileMaker will create a default data-entry template, or "layout," that could be used immediately to enter and code text chunks. In practice, I revise the default format by switching to the "layout mode" and moving and re-sizing field data entry blanks to correspond to the information they will contain (see Figure 1).
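In miniature, a data base of this kind can be modeled as a collection of records, each holding a text chunk, its context, and its codes. The sketch below (in Python) is illustrative only; the field names paraphrase the layout described above and are my own labels, not FileMaker's:

```python
# One record = one coded text chunk plus its context and codes.
# Field names are paraphrases of the layout described in the text.
def new_record():
    return {
        "text_chunk": "",     # excerpt copied from the primary Word file
        "source_date": "",    # contextual fields (time, date, site, etc.)
        "site": "",
        "position": "",
        "doc_file_name": "",
        "main_theme": "",     # thematic code fields
        "sub_theme": "",
        "comments": "",
    }

database = []                 # the data base is just a set of such records
record = new_record()
record["site"] = "Turin HS"
database.append(record)
```

The point of the sketch is simply that each field defined in the "Define Fields" window becomes one slot in every record.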

Coding primary chunks in the secondary data set

With both Word and FileMaker "open," and with their respective windows partially overlapped, I read a field note or interview transcript as it appears in MS Word, use the mouse or key commands to select a chunk of text -- all or part of a word, sentence, paragraph, or longer passage -- and then copy and paste the selected text through the operating system (e.g., the "clipboard") into the "text chunk" field of the open FileMaker record. I then enter information in the contextual fields, assign one or more thematic codes to the text chunk, add comments, and so on.

Figure 1
Revised data entry format for sample data base

I assign code words to text chunks in a "main theme" field and, in some cases, in fields labeled "sub theme" and "related theme." Code words come from a pre-determined list or reflect a new or "just interesting" tag created on the fly. For pre-determined code lists, FileMaker can display check boxes and pull-down menus that simplify the task of assigning codes to selected chunks (i.e., you can check a box or paste from a displayed list rather than having to type). FileMaker also can display an alpha-sorted list of all entries already made into a field. Code words can be pasted or selected directly from this open-ended "index" -- a great help in exploratory code building and in ensuring that entered code words are spelled consistently.
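The open-ended "index" amounts to an alphabetized list of the distinct values already entered in a field. A minimal sketch of that idea, using hypothetical records and field names:

```python
records = [
    {"main_theme": "assessment"},
    {"main_theme": "curriculum"},
    {"main_theme": "assessment"},
    {"main_theme": ""},          # blank fields are not indexed
]

def code_index(records, field_name):
    """Alphabetically sorted distinct values already entered in a field,
    analogous to the open-ended index FileMaker displays for pasting codes."""
    return sorted({r[field_name] for r in records if r[field_name]})

index = code_index(records, "main_theme")
```

Because the index is built from entries already made, pasting from it rather than retyping keeps spellings consistent.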

Research questions and concepts define a working list of things to look for, but I also code whatever text chunks seem interesting, confusing, significant, etc. -- a practice that combines what Strauss (1987) refers to as "open coding" and "selective coding". Usually, I end up with three different kinds of codes: (1) "common sense" categories that I want to keep track of and may want to de-construct or otherwise critically evaluate in light of other information (e.g., classroom instruction, assessment, administration, curriculum, etc.); (2) "in vivo" categories (Strauss, 1987) defined by words or phrases that appear significant to research subjects themselves; and (3) terms linked directly to research concepts that are neither commonsensical nor used by research subjects.

After coding the first chunk imported from an open MS Word file, and before I leave FileMaker, I select the "duplicate record" command. This creates a new record in the data base file that looks exactly like the record I just created, obviating the need to re-enter context data for the next and subsequent chunks I select from that primary data document. Then I click back to the open Word file, read and review, select a new chunk to code, and copy and paste that into the open record in FileMaker, where the new text chunk overwrites the old. I then code or annotate the new chunk, create a duplicate record once again, and move back to Word for more reading and review. The process is repeated until I’ve finished coding that primary data document, then I move on to the next.
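The duplicate-and-overwrite routine above can be sketched as a simple loop. This is only a model of the workflow, with invented chunk text and field names; "duplicate record" copies every field, and only the chunk and its codes are then replaced:

```python
def duplicate_record(record):
    """Mimic FileMaker's 'duplicate record' command: copy every field so
    context data need not be re-entered for the next chunk."""
    return dict(record)

database = []
current = {"doc_file_name": "TR92-05-13.IN", "site": "Turin HS",
           "text_chunk": "", "main_theme": ""}

# Coding loop: paste a chunk, code it, duplicate the record, repeat.
for chunk, theme in [("I'm interested in changes...", "assessment"),
                     ("We brainstormed all the...", "evaluation")]:
    current["text_chunk"] = chunk        # new chunk overwrites the old
    current["main_theme"] = theme
    database.append(current)
    current = duplicate_record(current)  # context carries forward
```

The duplicate carries the context fields forward, so only what changes from chunk to chunk has to be typed.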

Sorting and searching

Once text chunks from the primary data set are entered and coded in FileMaker, they constitute individual records in a full-fledged data base that supports a wide range of searching, sorting and display functions. In response to a "sort" command, FileMaker displays a dialog window in which you can select one or more fields for sorting records, assign priorities among the fields you select, and specify an ascending or descending order for each field. Stratified sorting is fully supported. In one operation, for example, you can sort by coded theme, within themes by field site, and within sites by date. This will put all the text chunks in the data base in chronological order for each field site for each coded theme.
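A stratified sort of this kind is equivalent to sorting on an ordered tuple of field values, one per priority level. A sketch with invented records (dates written year-first so that plain string order is chronological):

```python
records = [
    {"main_theme": "assessment", "site": "Turin HS", "source_date": "1995/05/20"},
    {"main_theme": "assessment", "site": "Mill HS",  "source_date": "1993/01/10"},
    {"main_theme": "assessment", "site": "Turin HS", "source_date": "1992/05/13"},
]

# Sort by coded theme, within themes by field site, within sites by date --
# the tuple order corresponds to the priorities set in the sort dialog.
records.sort(key=lambda r: (r["main_theme"], r["site"], r["source_date"]))
```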

Sorting is initiated from FileMaker’s "browse" or "preview" modes, but search functions are launched through a separate "find" mode. Search instructions are specified by selecting fields on a "blank" record that appears in the find mode and entering target values within each field selected. Target values can be "exact matches," "included" text strings, or, for numerical values, ranges and arithmetic relationships (e.g., "less than," "greater than or equal to," etc.). As one example, a single "find" request within the data base displayed in Figure 1 might specify "teacher" for position, "Turin HS" for site, and "96/10/15 . . . 97/10/15" for source date. Executing this request will bring records meeting all these criteria (this is a Boolean "AND" operation) into the foreground as a "found set" and will temporarily "hide" all other records. The "found set" can then be sorted, searched by other criteria, displayed on screen, or printed. All records in the data base can be restored to the "found set" at any time with a key command.

In contrast to the "AND" operations of a single find request with target values in more than one field, "OR" operations are specified by several find requests launched simultaneously. Through a second "find" request I could add one or more additional schools to "Turin HS," add another range for source dates, etc. In both "AND" and "OR" search requests, a check box also can be used to specify that records be "omitted" rather than found. I could search more narrowly within the examples noted above by adding a request to "omit" all records coded for "assessment," all records in which the text chunk contained the phrase "standardized testing," etc.
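These search semantics can be sketched compactly: criteria within one request are ANDed, multiple requests are ORed, and an "omit" request subtracts records from the found set. The records and field values below are invented for illustration:

```python
def matches(record, request):
    """All criteria within one find request must hold (Boolean AND)."""
    return all(record.get(f) == v for f, v in request.items())

def find(records, requests, omit=()):
    """Several requests launched together act as OR; omit-requests
    hide matching records from the found set."""
    found = [r for r in records if any(matches(r, q) for q in requests)]
    return [r for r in found if not any(matches(r, q) for q in omit)]

records = [
    {"site": "Turin HS", "position": "teacher", "main_theme": "assessment"},
    {"site": "Turin HS", "position": "teacher", "main_theme": "curriculum"},
    {"site": "Mill HS",  "position": "student", "main_theme": "assessment"},
]

# Find Turin HS teachers, then omit records coded for "assessment".
found = find(records,
             requests=[{"site": "Turin HS", "position": "teacher"}],
             omit=[{"main_theme": "assessment"}])
```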

Displaying and reviewing coded text chunks

The default layout that FileMaker creates for a data base is easy to revise, and new layouts can be created easily from scratch with the program’s "layout mode." Layout editing tools can be used to hide or accentuate field names or field values; change where fields appear on the page; change font types, styles, and sizes; resize the window in which each field value appears; and so on. Moreover, you can prepare as many different layouts for each data base as you want. Records displayed in each layout are drawn from the complete data base, but by designing different "layout" templates you control which specific fields are displayed and how they are organized on the screen or printed page.

This flexibility of FileMaker layouts -- one consequence of the program’s development as a general applications program -- is an extremely valuable tool for exploratory data analysis. For example, I can design one layout in which a main code word is prominent, but contextual information about the chunk is abbreviated and attenuated. This might be useful for what Strauss (1987) calls "axial coding" of a particular concept or coding category. To encourage analysis of the chunk in terms of information about the social location of the person to whom the text is attributed, I can design and use another layout in which context fields are prominent. If I have a specific question to investigate that involves comparisons over time, between sites, actors, etc., I can create a layout that displays several text chunks side-by-side, or vertically, arranged along the target dimension.

Figure 2 presents a multi-record display of a teacher’s commentary about assessment and evaluation at three different times. Viewed in this order, the three excerpts reflect an evolving perspective on assessment issues -- from generic questions about evaluation, to collective efforts to design an assessment framework, to the collective application of this framework in examining student work. These excerpts alone are inadequate to document this change, but they reveal a pattern worth pursuing further.

Figure 2
"Assessment" Comments from "Irene Zelia" (a pseudonym) at Three Points in Time

Record # 0103 -- Main theme: Assessment -- Source Date: 5/13/92 -- Interview Trans -- Turin HS -- Teacher -- Irene Zelia -- Doc/File Name: TR92-05-13.IN Zelia

I'm interested in what we're doing at Turin. I'm interested in changes in education. But I don't know very much about how to go about evaluating that stuff. In fact, I think evaluation in general sort of mystifies me, and mystifies a lot of people, but you know I want to know more about how to evaluate my kids' performance and how to evaluate my program's performance, how do I evaluate what we're doing compared to what other people are doing. How can you say this is working well? So, I don't know. But those are the kinds of things that have been running across my mind for the last six or eight months and I'm feel inspired to do something.

Record # 0254 -- Main theme: Assessment -- Source Date: 5/17/93 -- Interview Trans -- Turin HS -- Teacher -- Irene Zelia -- Doc/File Name: TR93-05-17.IN Zelia

In terms of assessing programs, we have developed a matrix in one of our other committees, called Monitoring and Evaluation, that is sort of a multiple-choice matrix. We brainstormed all the possible measures of a program… attendance, discipline, all the different qualitative… student reports, teacher reports. We looked at all those different ways, and then asked each program to evaluate itself on any of those ways. We make no requirement whatsoever, portfolios, whatever, the whole gamut.

Record # 0529 -- Main theme: Assessment -- Source Date: 5/20/95 -- Meeting Trans -- Turin HS -- Teacher -- Irene Zelia -- Doc/File Name: TR95-05-20.MT

There's I think the thing we're struggling with this year around assessment is that everybody's A’s aren't the same. And everybody's B’s aren't the same. And that -- or everybody's 1’s aren't the same. And that we all got together and graded our senior project papers together, 15 teachers, not all of whom were senior teachers, and we didn't read our own kid's papers and we did all the stuff that you would do to kind of norm. And still there were these huge discrepancies. Still people weren't able to put aside their own grading practices and the rubric wasn't adequate to the task so again we'll have to go back and look at the rubric and try to describe what it is about a paper that makes it a good paper, and what it is about a paper that makes it a medium paper, and what it is about a paper that makes it a not acceptable paper. That's been the thing this year for senior project. That we now know how to organize people into groups. We know how to do the process. But the assessment part is just not satisfactory and we're mad at each other.

As these comments suggest, how coded text chunks are displayed can encourage or discourage different kinds of analysis. Indeed, flexibility in displaying text data is an important resource for qualitative data exploration and analysis (programs that can generate only one kind of text display are limited in that regard). One consequence is that researchers themselves need to be involved directly in decisions about how data will be displayed, either by working closely with technical staff who design data display options or by designing them on their own. And hardware as well as software plays a role: three chunks from Figure 2 can appear on the same screen on a full-page monitor, but only one or two on a smaller monitor.

Other considerations and features

Revising the data base: As you work with a FileMaker data base you can add new fields or delete old ones; copy already-coded chunks or their parts and re-code them; edit text, field names, and values; and, as noted above, reconfigure how records are displayed on the screen or in printouts. You also can add new records at any time, including records constructed out of text already coded, research comments, etc., and you can copy and save the results of searches, sorts, etc. as a new database.

Structured and unstructured data: You can import into FileMaker tab-delimited text or information from another data base. You set the correspondence between imported and existing fields within a dialogue box, then import the new data with a single command. This can be particularly valuable when turning transcripts of survey responses into a data base -- i.e., each question becomes a field and each subject/respondent a record.
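In miniature, a tab-delimited import maps each column to a field and each row to a record -- for survey data, each question becomes a field and each respondent a record. A sketch with Python's csv module and invented survey data:

```python
import csv
import io

# Tab-delimited text: first row names the fields (survey questions),
# each subsequent row is one respondent's record.
raw = ("respondent\tq1\tq2\n"
       "R01\tyes\tPortfolios helped.\n"
       "R02\tno\tToo much work.\n")

reader = csv.DictReader(io.StringIO(raw), delimiter="\t")
records = list(reader)   # each row becomes one field->value record
```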

Multiple coding of the same text chunk: An easy way to go is to make a "duplicate" of the record and assign a new code word and, if necessary, new comments. This preserves all sorting and searching functions by code word and makes it easy to export data as tab-delimited text. However, it does make determining whether a specific chunk has been tagged with more than one code word a two-step process (i.e., you search by the "text chunk" field and paste in a character string from the specified chunk to see if more than one record can be found).

Multiple code words also can be assigned within the same field, as follows: In "layout mode" enlarge the "theme" data entry box and re-position it to run vertically, instead of horizontally, so that several code words can be seen at the same time. Then specify "pull-down menu" as the "field type" for that field. Selecting the field in "browse mode" will open the menu and let you select a single code word. With the "shift" key down, you can select more than one code word from the menu. This approach, suggested by Eben Weitzman, makes it easy to see at a glance all code words assigned to a specific chunk, and it also works very well when searching (you can even make "OR" searches for code words within a single "find request"). This approach does complicate sorting by code (i.e., sort functions will key on the first code word listed in the field and ignore the rest), and it also makes it hard to export data as tab-delimited text files that can be read by other data bases. However, in most cases, the attractions of this approach will outweigh these shortcomings.
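The trade-off can be sketched as follows. With several codes held in one field (modeled here as a list, with field names of my own invention), membership tests -- and hence searches -- stay simple, but a plain sort can only key on the first code listed:

```python
# Several code words assigned within a single "themes" field.
multi = {"text_chunk": "(chunk text)", "themes": ["assessment", "equity"]}

# Searching for a code word is a simple membership test:
assert "equity" in multi["themes"]

# But sorting such records keys only on the first code listed;
# later codes in the field play no part in the sort order.
records = [{"themes": ["curriculum"]},
           {"themes": ["assessment", "equity"]}]
records.sort(key=lambda r: r["themes"][0])
```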

"Automatic" searching: FileMaker will "index" text within any field in the FileMaker data base. This allows you to search for text strings within specified fields across all records in the data base. For example, you can find every text chunk in the data-base that had the word "principal" in it, or that contained the phrase "that depends on." If you also copy each primary data file as a whole into a the data base, FileMaker can find all records (i.e. primary data files) that include the string you are looking for.

Scripts: FileMaker has a powerful and easy to use "scripting" language that allows you to automate or repeat routine or complex command sequences, including instructions to find data (text, graphics, and/or audio and video records) in other files and programs.

Frequency counts: FileMaker automatically reports the number of records in the "found set" resulting from a search. Frequencies can also be produced as summary statements for specified fields in printed output. For more complex quantitative analyses, all or part of the data-base can be exported as a tab-delimited text file into a statistical analysis package (I’ve been using JMP 3.0). If you have entered only one code per field (as noted above), the statistics software can generate frequency distributions and correlations for different code values. Frequency tables also are useful in checking the distribution of coded chunks by original data source, reviewing consistency among coding categories, etc. (If you are so disposed, you also can use these numerical indices to test frequency-based hypotheses about the data set you are working with).
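Once codes are exported or tabulated, frequency distributions of this sort are a one-line operation; a sketch with Python's collections.Counter and invented code values:

```python
from collections import Counter

# One code word per coded chunk, as exported from a single-code field.
codes = ["assessment", "curriculum", "assessment",
         "administration", "assessment"]

# Distribution of coded chunks by theme -- e.g., for checking the
# balance of codes across the data set.
freq = Counter(codes)
```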

Finding primary documents from a secondary chunk: Locating the primary Word document from a text chunk in the FileMaker data set is a two-step process: Include the primary data file name as an entry in one of the "context fields" in the FileMaker record. You can then copy and paste the file name into an operating system "find" request to locate your original file quickly, wherever it is on your computer, supplemental drive, or floppy disk. (Macs and PCs with Windows or DOS differ a bit from each other in these details). To find a particular text chunk within the same primary data file, use the MS Word "find" command: copy and paste the first few words of the chunk and the program will take you to where that string appears in the original document.
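The two steps can be modeled compactly: a file-name search over a directory tree (the OS-level "find"), then a string search within the located file (the word-processor "find"). The file name and contents below are hypothetical:

```python
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as d:
    # Set up a stand-in "primary data file" (hypothetical name/contents).
    primary = pathlib.Path(d) / "TR92-05-13.txt"
    primary.write_text("I'm interested in changes in education.")

    # Step 1: locate the file by the name stored in the record's
    # context field (the operating-system "find" request).
    hits = list(pathlib.Path(d).rglob("TR92-05-13*"))

    # Step 2: find the chunk's opening words inside the file
    # (the word-processor "find" command).
    text = hits[0].read_text()
    offset = text.find("I'm interested")
```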

Comparisons with dedicated QDA programs

The Word-FileMaker pair is a powerful code-and-retrieve program combination that provides great flexibility in the following: specifying the size and composition of text chunks to be coded; displaying text, context and codes on screen or in printed output; and designing and revising data-base elements and structures. This program pair also offers a full range of data-base sort and search functions and well-developed protocols for importing, exporting and linking data with other programs.

Word-FileMaker does not support the kind of "conceptual network building" that can be pursued through specialized programs like SemNet or Inspiration (Weitzman & Miles, 1995). Nor will combining Word and FileMaker help you generate a code-embedded version of original field notes or transcripts -- if this is important, consider a program that provides code embedding either on the screen (e.g., Folio Views) or in a printed copy (e.g., The Ethnograph).

Three other limitations and qualifications also are worth noting: First, in contrast to some other programs (e.g., NUD*IST and The Ethnograph), FileMaker will not identify "nested" chunks (i.e., chunks coded within other coded chunks). Second, Word-FileMaker neither displays nor recognizes the hierarchical code relationships built into a program like NUD*IST. You can review codes and specify relationships between them with text comments and labels, but these labels have no logical properties relative to FileMaker searches and sorts (logical relationships between codes must be embedded in the search instructions themselves). And third, Word-FileMaker arrangements do not provide the "fully automated" coding functions available in some dedicated QDA programs (e.g., automatically searching primary documents for specified character strings, selecting a block of text around each string, and copying or concatenating all found text blocks into a new data file). The specific strengths and limitations noted above are more important to some research applications than to others.

On balance, the Word-FileMaker pairing illustrates that general purpose word processing and data base programs, when used together, can provide for many researchers a viable alternative to dedicated QDA programs. As an added benefit, by building on general applications programs already familiar to research faculty, students, and staff, arrangements of this sort may require less than dedicated QDA programs in the way of start-up costs, training and technical support.

References

Bernard, H. R., & Evans, M. J. (1983). New microcomputer techniques for anthropologists. Human Organization, 42(2), 182-185.

Borgatti, S. (1989). Using ANTHROPAC to investigate a cultural domain. Cultural Anthropology Methods, 1(2).

Buchignani, N. (1996). Using Folio Views for qualitative data analysis. Cultural Anthropology Methods, 8(2).

Carney, J. H., Joiner, J. F., & Tragou, H. (1997). Categorizing, coding, and manipulating qualitative data using the WordPerfect® word processor. The Qualitative Report, 3(1), http://www.nova.edu/ssss/QR/QR3-1/carney.html.

Cohen, A. (1995). Using word-processing software for analysis of fieldnotes and other research materials. Cultural Anthropology Methods, 7(1).

Conrad, P., & Reinharz, S. (1984). Computers and qualitative data [Special issue]. Qualitative Sociology, 7.

Gillespie, G. W. J. (1986). Using word processor macros for computer-assisted qualitative analysis. Qualitative Sociology, 9(3), 283-292.

Hong, L. K. (1984). List processing free responses: Analysis of open-ended questions with word processors. Qualitative Sociology, 7(1 & 2), 98-109.

Ryan, G. (1993). Using WordPerfect macros to handle fieldnotes. Cultural Anthropology Methods, 5(1).

Schensul, S. (1993). Using ETHNOGRAPH to build a survey instrument. Cultural Anthropology Methods, 5(2).

Seidel, J. V., & Clark, J. A. (1984). The ETHNOGRAPH: A computer program for the analysis of qualitative data. Qualitative Sociology, 7(1 & 2), 110-125.

Strauss, A. L. (1987). Qualitative analysis for social scientists. Cambridge: Cambridge University Press.

Tesch, R. (1990). Qualitative research: Analysis types and software tools. New York: Falmer.

Tesch, R. (1993a). Personal computers and qualitative data [Special issue]. Qualitative Sociology, 14(3, 4).

Tesch, R. (1993b). Personal computers in qualitative research. In M. D. LeCompte & J. Preissle (Eds.), Ethnography and qualitative design in educational research (pp. 279-314). San Diego, CA: Academic Press.

Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.

Tufte, E. R. (1990). Envisioning information. Cheshire, CT: Graphics Press.

Weitzman, E. A., & Miles, M. B. (1995). Computer programs for qualitative data analysis: A software sourcebook. Thousand Oaks, CA: SAGE.