Table 2. Top 50 classifier features, ranked by information gain.

From: Lymelight: forecasting Lyme disease risk using web search data

Feature                                        Information gain (bits)
--------------------------------------------   -----------------------
Lyme                                           1.10E-03
Lyme disease                                   1.08E-03
Tick                                           6.90E-04
Ticks                                          6.60E-04
Of lyme                                        6.40E-04
Disease                                        6.20E-04
[Lyme disease] (KG concept)                    5.50E-04
A tick                                         5.10E-04
[Tick] (KG concept)                            4.70E-04
Parasites                                      4.50E-04
Tick borne                                     4.40E-04
Tick bite                                      4.30E-04
Tick bites                                     3.80E-04
[Pathogenic bacteria] (KG concept)             3.80E-04
Borrelia                                       3.70E-04
For lyme                                       3.50E-04
Conditions lyme                                3.50E-04
Diseases                                       3.40E-04
Bite                                           3.30E-04
Borne                                          3.30E-04
Burgdorferi                                    3.30E-04
cdc                                            3.30E-04
[Disease vectors] (KG concept)                 3.20E-04
[Disease] (KG concept)                         3.20E-04
Borrelia burgdorferi                           3.20E-04
Disease cdc                                    3.20E-04
Disease is                                     3.10E-04
[Infectious diseases] (KG concept)             3.10E-04
Ticks are                                      3.10E-04
Ticks and                                      2.80E-04
The tick                                       2.80E-04
[Disease or medical conditions] (KG concept)   2.80E-04
Symptoms                                       2.70E-04
Blacklegged                                    2.60E-04
Of ticks                                       2.50E-04
Disease symptoms                               2.50E-04
The bite                                       2.40E-04
Of tick                                        2.40E-04
Disease lyme                                   2.40E-04
Lyme disease                                   2.40E-04
Health                                         2.30E-04
Infection                                      2.20E-04
Bites                                          2.20E-04
Treatment                                      2.20E-04
Infected                                       2.20E-04
Rash                                           2.20E-04
Transmitted                                    2.20E-04
About lyme                                     2.20E-04
With lyme                                      2.10E-04
Deer ticks                                     2.00E-04

Note: KG concepts are those found in the Google Knowledge Graph.
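The table orders features by information gain, measured in bits: the reduction in entropy of the class label once a feature's value is known. The table does not say how the features were encoded or which labels they were scored against, so the following is only a minimal Python sketch under stated assumptions: each feature is binary (present or absent in an example) and the label is discrete. The names `entropy`, `information_gain`, and `rank_features` are hypothetical and are not the authors' code.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a sequence of discrete class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def information_gain(feature_present, labels):
    """Entropy reduction (bits) from knowing whether a binary feature is present.

    Assumption: feature_present is a sequence of bools, one per example,
    aligned with labels. This encoding is hypothetical.
    """
    if len(feature_present) != len(labels):
        raise ValueError("feature_present and labels must have the same length")
    n = len(labels)
    h_before = entropy(labels)
    # Conditional entropy: weight each subset's entropy by its share of examples.
    h_after = 0.0
    for value in (True, False):
        subset = [y for x, y in zip(feature_present, labels) if x == value]
        h_after += (len(subset) / n) * entropy(subset)
    return h_before - h_after


def rank_features(features, labels, top_k=50):
    """Return the top_k (name, gain) pairs, sorted by decreasing information gain."""
    scored = [(name, information_gain(col, labels)) for name, col in features.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

On four toy examples with labels `[1, 1, 0, 0]`, a feature present exactly in the positive examples gains 1 bit, while a feature present in half of each class gains 0 bits, so `rank_features` places the first ahead of the second.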