A set of utterances collected from a plurality of contributors is received. Semantically irrelevant utterances are removed from the set of utterances to obtain a processed set of utterances, including by applying a machine learning model to the set of utterances. An annotation user interface is provided to a plurality of human annotators to perform annotation on the processed set of utterances to obtain an annotated set of utterances. A curation user interface is provided to one or more domain experts to perform curation of the annotated set of utterances to obtain a curated set of utterances. The curated set of utterances is outputted as a training set for an automated dialogue agent.
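The flow described above (filter, annotate, curate, output) can be illustrated with a minimal Python sketch. It is not the claimed implementation: every name in it (`Utterance`, `filter_irrelevant`, `annotate`, `curate`, `build_training_set`) is hypothetical, the `relevance_score` callable merely stands in for the machine learning model, the 0.5 threshold and the majority vote across annotators are assumptions, and the annotation and curation user interfaces are reduced to plain callables.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Utterance:
    text: str
    contributor: str
    labels: List[str] = field(default_factory=list)  # one label per annotator
    intent: Optional[str] = None                     # consensus label (assumed)


def filter_irrelevant(
    utterances: List[Utterance],
    relevance_score: Callable[[str], float],  # stand-in for the ML model
    threshold: float = 0.5,                   # assumed cut-off, not from the source
) -> List[Utterance]:
    """Drop utterances the model scores as semantically irrelevant."""
    return [u for u in utterances if relevance_score(u.text) >= threshold]


def annotate(
    utterances: List[Utterance],
    annotators: List[Callable[[str], str]],  # stand-ins for human annotators
) -> List[Utterance]:
    """Collect one label per annotator and keep the most common one."""
    for u in utterances:
        u.labels = [label_fn(u.text) for label_fn in annotators]
        u.intent = Counter(u.labels).most_common(1)[0][0]
    return utterances


def curate(
    utterances: List[Utterance],
    expert_accepts: Callable[[Utterance], bool],  # stand-in for a domain expert
) -> List[Utterance]:
    """Keep only the annotated utterances that the expert accepts."""
    return [u for u in utterances if expert_accepts(u)]


def build_training_set(
    raw: List[Utterance],
    relevance_score: Callable[[str], float],
    annotators: List[Callable[[str], str]],
    expert_accepts: Callable[[Utterance], bool],
) -> List[Dict[str, Optional[str]]]:
    """Run the stages in order and emit records for a dialogue agent."""
    processed = filter_irrelevant(raw, relevance_score)
    annotated = annotate(processed, annotators)
    curated = curate(annotated, expert_accepts)
    return [{"text": u.text, "intent": u.intent} for u in curated]
```

In a real system the callables would be replaced by a trained relevance classifier and by the results gathered through the annotation and curation interfaces; the sketch only fixes the order of the stages and the shape of the data passed between them.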