Augmenting API Documentation with Insights from Stack Overflow -- Online Appendix
Coding Guide for "Meaningful Sentences"
The following kinds of sentences should be rated as not being meaningful on their own:
- The sentence contains an explicit reference to another sentence.
Example: See the next step if you need a mutable list.
- The sentence contains references to "it", "this", "that", etc., which are not resolved within the sentence.
Example: It's a good way to produce a [n][3]-Matrix.
- The sentence is a question.
Example: Why Guava?
- The sentence is prefacing a code snippet (often indicated by a colon).
Example: You could create a factory method:
- The sentence is grammatically incomplete.
Example: So yes, ArrayList.
- The sentence contains communication between Stack Overflow users.
Example: Thanks to the comments I have to update my answer.
- The sentence references code elements that come from user examples rather than the API.
Example: You might be surprised to find out that in your sample code, s1 == s2 returns true!
- The sentence references specific Stack Overflow users.
Example: Like EJP said the key part is to save in a byte array.
- The sentence only contains a link.
Example: http://www.jmagick.org/index.html.
- The sentence contains a reference to something that is not an obvious part of the API ("block" in the example).
Example: The block size is parameterized for run-time performance optimization.
- The sentence starts with "but", "and", "or", etc.
Example: And then another instance of string with content Hello world.
- The sentence is a generic statement that is unrelated to the API type.
Example: Most random number generators are, in fact, pseudo random.
- The sentence contains a comparison that is incomplete (i.e., one part of the comparison is missing).
Example: char[] is less vulnerable.
- The sentence resulted from a parsing error.
Example: Because the java.io.
- The sentence requires another sentence to be complete.
Example: First, you have to know the encoding of string that you want to convert.
- The sentence contains an explicit reference to a piece of context that's missing.
Example: What you're trying to do here is rather unusual, to say the least.
Other sentences are considered meaningful.
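The guide above is written for human raters, and most of its criteria require judgment. A few of them are mechanical enough to approximate in code, which may help when pre-screening sentences before manual rating. The following is a rough sketch under that assumption only; it is not part of the coding procedure described here, and the function name `mechanical_flags` and the flag labels are hypothetical.

```python
import re

# Hypothetical helper: flags only the criteria that can be checked
# mechanically (question, code preface, leading conjunction, link only).
# All other criteria in the guide still need a human rater.
LINK_ONLY = re.compile(r"^https?://\S+$")
CONJUNCTIONS = ("but", "and", "or")  # the guide says "etc."; extend as needed


def mechanical_flags(sentence):
    """Return a list of labels for the mechanical criteria a sentence meets."""
    text = sentence.strip()
    flags = []
    if text.endswith("?"):
        flags.append("question")
    if text.endswith(":"):
        flags.append("prefaces code")
    # First whitespace-separated token, lowercased, without a trailing comma.
    first = text.split(maxsplit=1)[0].lower().strip(",") if text else ""
    if first in CONJUNCTIONS:
        flags.append("starts with conjunction")
    if LINK_ONLY.match(text):
        flags.append("link only")
    return flags


if __name__ == "__main__":
    for s in ["Why Guava?", "You could create a factory method:"]:
        print(s, mechanical_flags(s))
```

An empty result does not mean a sentence is meaningful; it only means none of these four surface checks fired.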
Regular Expressions for the Detection of Code Elements
- [A-Z][a-zA-Z]+ ?<[A-Z][a-zA-Z]*>
- [a-zA-Z0-9\.]+[(][a-zA-Z_,\.]*[)]
- (https?://)?[a-zA-Z_\\-/]{2,}(\.[a-zA-Z_0-9\\-]{2,})+[^\s\<\>{\(\),'\"”’}:]*
- ([\.]?[/]?\w+\.\w+\.?\w+(?:\.\w+)*)
- [A-Za-z]+\.[A-Z]+
- [@][a-zA-Z]+
- (?:\s|^)([a-zA-z]{3,}\.[A-Za-z]+_[a-zA-Z_]+)
- \b([A-Z]{2,})\b
- (?:\s|^)([A-Z]+_[A-Z0-9_]+)
- (?:\s|^)([a-z]+_[a-z0-9_]+)
- \w{3,}:\w+[a-zA-Z0-9:]*
- (?:\s|^)([A-Z]+[a-z0-9]+[A-Z][a-z0-9]+\w*)(\s|\.\s|\.$|$|,\s)
- (?:\s|^)([A-Z]{3,}[a-z0-9]{2,}\w*)(\s|\.\s|\.$|$|,\s)
- (?:\s|^)([a-z0-9]+[A-Z]+\w*)(\s|\.\s|\.$|$|,\s)
- (?:\s|^)(\w+\([^)]*\))(\s|\.\s|\.$|$|,\s)
- ([A-Z][a-z]+[A-Z][a-zA-Z]+)(\s|,|\.|\))
- (?:\s|^)([a-z]+[A-Z][a-zA-Z]+)(\s|,|\.|\))
- ([a-z]+[A-Z][a-zA-Z]+)(\s|,|\.|\))
- ([a-z] )([A-Z][a-z]{3,11})( )
- </?[a-zA-Z0-9 ]+>
- \{\{[^\}]*\}\}
- \{\%[^\%]*\%\}
- /[^/]*/
- ‘[^’]*’
- __[^_]*__
- \$[A-Za-z\_]+
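To illustrate how patterns like these can be applied to a sentence, the sketch below compiles a small subset of them and collects the matched code elements. The appendix does not state which regular expression engine was used; Python's `re` module is an assumption here, and the function name `find_code_elements` is hypothetical. Where a pattern wraps the code element in a capturing group, the sketch takes the first group, otherwise the whole match.

```python
import re

# A small subset of the patterns listed above, copied verbatim.
PATTERNS = [
    r"[A-Z][a-zA-Z]+ ?<[A-Z][a-zA-Z]*>",            # generic type, e.g. List<String>
    r"[@][a-zA-Z]+",                                # annotation, e.g. @Override
    r"(?:\s|^)([A-Z]+_[A-Z0-9_]+)",                 # upper-case constant
    r"(?:\s|^)(\w+\([^)]*\))(\s|\.\s|\.$|$|,\s)",   # method call with parentheses
    r"(?:\s|^)([a-z]+[A-Z][a-zA-Z]+)(\s|,|\.|\))",  # camelCase identifier
]
COMPILED = [re.compile(p) for p in PATTERNS]


def find_code_elements(sentence):
    """Return matched code elements in pattern order, without duplicates."""
    found = []
    for regex in COMPILED:
        for match in regex.finditer(sentence):
            # Use the first capturing group when the pattern defines one.
            token = match.group(1) if regex.groups >= 1 else match.group(0)
            if token not in found:
                found.append(token)
    return found


if __name__ == "__main__":
    print(find_code_elements("Use ArrayList<String> and call toString() on it."))
```

Patterns that begin with `(?:\s|^)` only match at the start of a token, so an element embedded after a dot, such as the constant in `Integer.MAX_VALUE`, is left to the other patterns in the list.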