Purpose: This study examined the test-retest reliability of selected measures of word-retrieval errors in the narrative discourse of individuals with aphasia, elicited with the AphasiaBank stimuli.

Method: Ten participants with aphasia were video-recorded producing picture-elicited narratives in 2 sessions. Discourses were transcribed and coded using AphasiaBank procedures, then analyzed for the stability of the per-minute rates of phonological errors, semantic errors, false starts, time fillers, and repetitions. Correlation coefficients and minimal detectable change scores were used to evaluate stability for research and clinical decision making.

Results: Test-retest reliability was poor when the discourses were analyzed separately by narrative subgenre. When the narrative discourses were combined for analysis, several measures appeared sufficiently stable across sessions for use in group studies, and one measure could be adequately stable for making clinical decisions about an individual.

Conclusions: Because the short speech samples yielded by the subgenre analyses showed poor test-retest reliability, all of the picture-based narrative discourse tasks should be combined when word-retrieval impairments are analyzed with the AphasiaBank stimuli. However, the confidence intervals associated with the reliability coefficients obtained in this study suggest caution in using these measures when they are based on performance in a single session. Further investigations of the test-retest reliability of measures used to study language impairment in discourse contexts are essential.
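The stability statistics named above can be illustrated with a minimal sketch. It assumes the common textbook definitions: error rates expressed as counts per minute of speech, a Pearson correlation between the two sessions, a Fisher z confidence interval for that correlation, and a 95% minimal detectable change computed as MDC95 = 1.96 × √2 × SEM, with SEM = SD × √(1 − r). These formulas, the choice of Pearson r rather than an intraclass correlation, the use of the pooled SD across both sessions, and all function names are assumptions for illustration; they are not necessarily the exact procedures used in the study, and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def rate_per_minute(count, duration_seconds):
    """Convert a raw error count into a rate per minute of speech."""
    return count / (duration_seconds / 60.0)


def pearson_r(x, y):
    """Pearson correlation between session-1 and session-2 scores."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.96):
    """Approximate 95% CI for a Pearson r via the Fisher z transform."""
    z = math.atanh(r)
    se = 1.0 / math.sqrt(n - 3)
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


def mdc95(session1, session2, reliability):
    """MDC95 = 1.96 * sqrt(2) * SEM, where SEM = SD * sqrt(1 - reliability).

    Assumption: SD is pooled over all scores from both sessions.
    """
    sd = stdev(session1 + session2)
    sem = sd * math.sqrt(1.0 - reliability)
    return 1.96 * math.sqrt(2.0) * sem


if __name__ == "__main__":
    # Hypothetical (count, seconds) pairs per participant; not study data.
    s1 = [rate_per_minute(c, t) for c, t in [(12, 240), (5, 180), (20, 300), (8, 200), (15, 260)]]
    s2 = [rate_per_minute(c, t) for c, t in [(10, 230), (6, 190), (18, 280), (9, 210), (13, 250)]]
    r = pearson_r(s1, s2)
    lo, hi = fisher_ci(r, len(s1))
    print(f"r = {r:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
    print(f"MDC95 = {mdc95(s1, s2, r):.2f} errors per minute")
```

With only a handful of participants, `fisher_ci` returns a wide interval, which mirrors the abstract's caution about relying on single-session scores. An individual's change between sessions would need to exceed the MDC95 value before it could be treated as real change rather than measurement error.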