Tagging of English and Cantonese data
Grammatical categories
The grammatical category labels for the English corpus are based on the MOR grammars for English in the CHILDES Windows Tools, while those for the Cantonese corpus are based on those of Cancorp (Lee et al. 1996), with thirty-three categories distinguished, as shown in Table 1 (see MacWhinney 2000:364-365). These are as used in Cancorp apart from the following modifications:
(i) the category 'particle' (prt) rather than 'clitic' is used for the postverbal modal dak1 and for postverbal dou3 introducing an extent complement;
(ii) the category 'localizer' (loc) is used for locative expressions such as dou6, as in zoeng1 toi2 dou6 '(lit.) the table there', as well as for expressions such as haa6bin6 'down there', which are tagged as locative noun phrases (nnloc) in Cancorp;
(iii) the category 'onomatopoeic expression' (onoma) is introduced in our Cantonese corpus for sounds such as wo1wo1 'barking of dogs' and baang4 'crashing/shooting noise';
(iv) the category 'ditransitive verb' (vd) is applied only to verbs which allow two NP objects, such as bei2 'give', excluding other three-place predicates such as baai2 'put'.
Table 1. Syntactic categories and examples

| No. | Tag | Syntactic category | Example | Glosses |
|---|---|---|---|---|
| 1 | adj | adjective | sau3 | thin, pretty, fast, good to listen to |
| 2 | advf | focus adverb | dou1 | also, first, again, still |
| 3 | advi | adverb of intensity | gam3 | so, very, too, most |
| 4 | advm | adverb of manner | gwaai1gwaai1dei2 | obediently, slowly |
| 5 | advs | sentential adverb | jan1wai6 | because, therefore, how about |
| 6 | asp | aspectual marker | zo2 | PFV, EXP, PROG, HAB, DEL |
| 7 | aux | auxiliary/modal verb | jing1goi1 | should, would, don't |
| 8 | cl | classifier | bun2 | CL |
| 9 | com | comparative morpheme | di1 | more beautiful, prettier than her |
| 10 | conj | connective | ding6hai6 | or, and, or |
| 11 | corr | correlative | jat1lou6 | while, the more...the more |
| 12 | det | determiner | li1 | this, that, number |
| 13 | dir | directional verb | lei4/lai4 | come, go, out, in, go up, go down |
| 14 | ex | expressive utterance | ai1jaa3 | oops, well, please/thanks |
| 15 | gen | genitive marker | ge3 | Timmy's friends |
| 16 | ins | emphatic inserted marker | gwai2 | what a mess! |
| 17 | loc | localizer | dou6 | on the table, up there |
| 18 | nn | noun | ce1 | car, toy, star, uncle |
| 19 | nnpr | pronoun | ngo5, lei5dei6 | I/me, you, s/he, we/us, you(pl), they/them |
| 20 | nnpp | proper noun | ciu1jan4 | Superman, Jesus, Britain |
| 21 | neg | negative morpheme | m4 | not, not, not have |
| 22 | onoma | onomatopoeic expression | wou1wou1, baang4 | ONOMA |
| 23 | prt | (postverbal) particle | dak1 | can, until, all, as well, finish |
| 24 | prep | preposition | hai2 | at, for |
| 25 | q | quantifier | jat1 | one, thirteen, each |
| 26 | rfl | reflexive pronoun | zi6gei2 | self |
| 27 | sfp | sentence-final particle | aa3 | SFP |
| 28 | vd | ditransitive verb | bei2 | give, give (as a gift) |
| 29 | verg | ergative (unaccusative) verb | dit3 | fall, break |
| 30 | vf | function verb | hai6 | be, have |
| 31 | vi | intransitive verb | siu3 | smile, rest, pray |
| 32 | vt | transitive verb | sik6 | eat, say, know |
| 33 | wh | wh-phrases | bin1go3 | |
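
The tag inventory in Table 1 can be used to check tagged data mechanically. The following is a minimal sketch in Python, not the tagging program used for the corpus. It assumes that each item on a %mor line is written as tag|form (the usual CHAT convention) and that items are separated by whitespace; the names CANTONESE_TAGS, parse_mor_item and unknown_tags are hypothetical and introduced here only for illustration.

```python
# Tag inventory copied from Table 1 (tag -> category name).
CANTONESE_TAGS = {
    "adj": "adjective", "advf": "focus adverb", "advi": "adverb of intensity",
    "advm": "adverb of manner", "advs": "sentential adverb",
    "asp": "aspectual marker", "aux": "auxiliary/modal verb",
    "cl": "classifier", "com": "comparative morpheme", "conj": "connective",
    "corr": "correlative", "det": "determiner", "dir": "directional verb",
    "ex": "expressive utterance", "gen": "genitive marker",
    "ins": "emphatic inserted marker", "loc": "localizer", "nn": "noun",
    "nnpr": "pronoun", "nnpp": "proper noun", "neg": "negative morpheme",
    "onoma": "onomatopoeic expression", "prt": "(postverbal) particle",
    "prep": "preposition", "q": "quantifier", "rfl": "reflexive pronoun",
    "sfp": "sentence-final particle", "vd": "ditransitive verb",
    "verg": "ergative (unaccusative) verb", "vf": "function verb",
    "vi": "intransitive verb", "vt": "transitive verb", "wh": "wh-phrases",
}


def parse_mor_item(item):
    """Split one 'tag|form' item into (tag, form).

    Assumption: items follow the tag|form layout. Raises ValueError if the
    separator or the form is missing, or if the tag is not in Table 1.
    """
    tag, sep, form = item.partition("|")
    if not sep or not form:
        raise ValueError(f"malformed item: {item!r}")
    if tag not in CANTONESE_TAGS:
        raise ValueError(f"unknown tag: {tag!r}")
    return tag, form


def unknown_tags(mor_line):
    """Return the tags on a whitespace-separated line that are not in Table 1.

    An item without a '|' separator is reported as a whole, since its
    tag cannot be identified.
    """
    tags = (item.partition("|")[0] for item in mor_line.split())
    return [tag for tag in tags if tag not in CANTONESE_TAGS]
```

For instance, parse_mor_item("vt|sik6") returns the pair ("vt", "sik6"), while unknown_tags applied to a line containing an item such as xx|foo reports "xx" for manual checking.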
Morpheme tier %mor
The %mor tier was generated using a tagging program developed by Lawrence Cheung. Since Cantonese has many homophonous morphemes, it was necessary to carry out disambiguation with respect to word class. The disambiguation and checking were performed by Gene Chu and Simon Huang for both Cantonese and English files.

Cantonese tier %can
The child's Cantonese was first transcribed using romanized Cantonese instead of Chinese characters. The %can tier was generated at a later stage to provide readers who can read Chinese characters with quicker access to the speakers' utterances. Fonts for Cantonese characters are available at the Hong Kong SAR government website, http://www.5c.org/, as well as through Microsoft. The same characters are used for allophonic representations of a morpheme. Due to ongoing sound changes, there is variation especially between n/l and ng/