Java StringTokenizer: counting the first letter of each word (15 points)

2008-05-18 7:00 pm
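I want to count the first letter of each English word, but sometimes, when a line runs out of room, a hyphen (-) is used to show that the word continues on the next line.
For example: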
THE QUICK
BROWN FOX JUMP-
ED OVER THE DOG.

The program should then give:
T = 2
Q = 1
B = 1
F = 1
J = 1
E != 1 -- recognize that it was hyphenated, not the start of a word.
O = 1
D = 1

I tried StringTokenizer st = new StringTokenizer(text, "-."); but it does not work.
How should I write it correctly?

Answers (1)

2008-05-18 8:07 pm
✔ Best answer
It is an interesting problem, ...with the text.

How do you take care of 'legal' hyphens which appear in the middle of a line?

If you consider them as separate words, then you have a problem, because your program will not be able to tell whether a hyphen at the end of a line joins two words or splits a single word: an ambiguity of English punctuation.

However, if you consider hyphenated words as a single word, then all you have to do is to first remove hyphens and the surrounding blanks/end-of-line/tabs/line-feeds, etc.
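A minimal sketch of the hyphen-removal step, assuming that only a hyphen at the very end of a line marks a split word and that mid-line hyphens should be left alone. The class and method names (HyphenJoiner, joinHyphenatedLines) are made up for illustration, and the \R line-break matcher assumes Java 8 or later.

```java
class HyphenJoiner {
    // Removes a hyphen that ends a line, together with the line break and
    // any whitespace around it, so "JUMP-\nED" becomes "JUMPED".
    // Assumption: a hyphen in the middle of a line is a 'legal' hyphen and stays.
    static String joinHyphenatedLines(String text) {
        // "-"       the hyphen at the end of the line
        // "[ \t]*"  optional trailing blanks or tabs
        // "\R"      any line break (\n, \r\n, \r)
        // "\s*"     optional leading whitespace on the next line
        return text.replaceAll("-[ \\t]*\\R\\s*", "");
    }

    public static void main(String[] args) {
        String text = "THE QUICK\nBROWN FOX JUMP-\nED OVER THE DOG.";
        System.out.println(joinHyphenatedLines(text));
    }
}
```

After this step the text can be tokenized as usual, since the remaining line breaks only separate whole words.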

Your StringTokenizer delimiter was defined as "-.", which I believe should be at least " .,-", i.e. including a space.
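To show how the wider delimiter set could be used with StringTokenizer alone, here is a rough sketch that reads the text line by line and skips the first token of any line that follows a line ending in a hyphen. This is one possible approach, not necessarily the one in the linked program. The names FirstLetterCounter, countFirstLetters and WORD_DELIMS are hypothetical, the tab in the delimiter set is an added assumption, and mid-line hyphens are treated as word separators here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringTokenizer;

class FirstLetterCounter {
    // The " .,-" set from the answer, plus a tab (assumed addition).
    static final String WORD_DELIMS = " .,-\t";

    static Map<Character, Integer> countFirstLetters(String text) {
        // LinkedHashMap keeps letters in order of first appearance.
        Map<Character, Integer> counts = new LinkedHashMap<>();
        boolean continued = false; // true when the previous line ended with a hyphen
        StringTokenizer lines = new StringTokenizer(text, "\r\n");
        while (lines.hasMoreTokens()) {
            String line = lines.nextToken().trim();
            StringTokenizer words = new StringTokenizer(line, WORD_DELIMS);
            if (continued && words.hasMoreTokens()) {
                words.nextToken(); // tail of a hyphenated word, not the start of a word
            }
            while (words.hasMoreTokens()) {
                // Fold case so that 't' and 'T' are counted together.
                char first = Character.toUpperCase(words.nextToken().charAt(0));
                if (Character.isLetter(first)) {
                    counts.merge(first, 1, Integer::sum);
                }
            }
            continued = line.endsWith("-");
        }
        return counts;
    }

    public static void main(String[] args) {
        String text = "THE QUICK\nBROWN FOX JUMP-\nED OVER THE DOG.";
        countFirstLetters(text).forEach((letter, n) -> System.out.println(letter + " = " + n));
    }
}
```

For the sample text in the question this counts T twice and never counts E, because "ED" is recognized as the tail of "JUMP-".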

The API documentation recommends the String.split method instead, as it accepts regular expressions, for example
String[] myTokens=myText.split("[^a-zA-Z]+");
meaning any run of characters other than letters will be treated as a delimiter, including tabs, all punctuation, special characters, line-feeds and carriage returns.
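One detail worth checking with String.split: a pattern that matches a single non-letter produces an empty string between two adjacent separators (such as a period followed by a space), while a pattern with a trailing + treats the whole run as one separator. The sketch below compares the two; SplitDemo and its method names are hypothetical.

```java
import java.util.Arrays;

class SplitDemo {
    // Each single non-letter character is a separator.
    static String[] splitSingle(String text) {
        return text.split("[^a-zA-Z]");
    }

    // Each run of one or more non-letter characters is a separator.
    static String[] splitRuns(String text) {
        return text.split("[^a-zA-Z]+");
    }

    public static void main(String[] args) {
        String text = "THE DOG. OVER";
        System.out.println(Arrays.toString(splitSingle(text))); // [THE, DOG, , OVER]
        System.out.println(Arrays.toString(splitRuns(text)));   // [THE, DOG, OVER]
    }
}
```

Either way, text that begins with a separator still yields a leading empty string, so it is safer to skip empty tokens before calling charAt(0).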

If you need more information, feel free to PM me.

2008-05-19 10:16:23 Addendum:
Following the additional requirement (use StringTokenizer, not String.split()), your program has been modified slightly to handle both upper and lower case.
You will find the description of the question and the code at the following link:
http://mathmate.brinkster.net/programming.htm


Archived on: 2021-04-13 15:34:28
Original link [permanently dead]:
https://hk.answers.yahoo.com/question/index?qid=20080518000051KK00767
