High-quality, real-world speech datasets

Speech Collection Datasets

We create speech datasets, including transcripts, for machine learning purposes. Our service is used by developers of technologies that create new, or improve existing, automatic speech recognition (ASR) models using natural language processing (NLP) for select languages and various domains.

Each dataset can be tailored by dialect, speaker demographics, domain or any other required criteria.

Speech datasets for select languages and industries are available off the shelf, and bespoke speech collection projects are available on request.


Speech Collection for AI training

Why Use Way With Words

99%+ Accurate

We produce highly accurate transcripts.

On Time

We complete your transcripts on time.

Data Compliant

We are fully GDPR and DPA 2018 compliant.

Priority Support

We treat all your questions as a priority.

Speech Collection Process

How it works

Our speech collection process can be customised to suit your needs.

STEP 1

Request a speech dataset

Submit your speech dataset requirements using our request form. We review your request and send you a proposed job and pricing plan for approval.

STEP 2

We create a speech dataset

Once you accept, we begin your job, which involves recording the required speech and transcribing it.

STEP 3

Receive speech dataset

On completion, or at agreed intervals, we transfer your speech datasets (recordings with completed transcripts) to you.

Frequently Asked Questions about our Speech Collection Services

Who uses your Speech Collection service?

Our Speech Collection service is available to clients that want to create new, or improve existing, automatic speech recognition models. Off-the-shelf datasets are available for these purposes. They consist of unscripted, natural conversations conducted by participants who are recruited, trained and approved to simulate real-world conversations in common domains. For custom datasets that require specific dialects, languages, domains or conventions, please get in touch to learn more.

Do you specialise in any languages or dialects?

Way With Words has completed Speech Collection projects across a range of English dialects, including Australian, Irish, Scottish, South African and Welsh. With a strong presence in Africa, we have also completed Speech Collection projects in languages such as Afrikaans, isiZulu and Sesotho.

Which domains have your Speech Collection services included?

Way With Words has created datasets across many domains, including healthcare, insurance, telecom, finance, retail, fast food, travel, airline, and many more. Custom domains can be commissioned to exact client requirements.

Do you sign Service Level Agreements?

For ongoing work, we prefer to work with an SLA. The SLA sets out a clear timetable that includes an initialisation period to set up the team and logistics required for the client's work. It also covers the terms and conditions relating to the work and to data privacy. If a client requires ongoing work over an agreed period, Way With Words usually also provides a dedicated MTP team with management oversight, along with recruitment, selection, assessment and training processes and any other logistical assistance the bespoke requirement needs.

Datasets Available for Purchase

Our speech data collection was planned, collected, annotated and curated with natural language processing best practice in mind.

Afrikaans Call Recording
Scottish Accented English Speech Collection

Bespoke Speech Collection Projects Completed

Our Speech Collection service is used by clients to improve speech recognition and voice recognition technologies, services or platforms. Speech datasets are required to support and enable acoustic modelling and automatic speech recognition.

Afrikaans Call Recording

Afrikaans

Scottish Accented English Speech Collection

English (Scottish Dialect)

United Kingdom Accented English Speech Collection

English (UK Dialect)

United Kingdom Accented English Speech Collection

English (UK Expats)

Irish Accented English Speech Collection

English (Irish Dialect)

Australian Accented English Speech Collection

English (Australian Dialect)