Detection and Identification of Plant Leaf Diseases Using YOLOv4

Early detection of plant diseases can prevent up to 40% of annual crop losses. In this hands-on guide, we build an automated detection system using YOLOv4 that achieves 99.99% accuracy on the PlantVillage dataset — with every step explained from data collection to deployment. (Aldakheel et al., 2024)
Why Automated Plant Disease Detection?
Plant diseases cause billions of dollars in economic losses worldwide every year. Farmers traditionally rely on manual visual inspection — a slow process prone to human error, especially during early disease stages when symptoms are subtle.
Deep learning offers an effective solution: a system that can analyze leaf images in milliseconds and identify the disease type with accuracy that surpasses human experts.
YOLOv4 (You Only Look Once v4) stands out from traditional classification models by its ability to localize the disease on the leaf, not just classify it — which is extremely useful for assessing infection severity.
What You'll Learn
- Setting up the development environment with Darknet and CUDA
- Preparing and annotating the PlantVillage dataset
- Configuring and training a custom YOLOv4 model
- Evaluating performance using mAP and confusion matrix
- Using the model for real-time detection
Prerequisites
Before getting started, make sure you have:
- Python 3.8+ with
pip - CUDA 11.0+ and cuDNN 8.0+ (for GPU acceleration)
- OpenCV 4.5+
- At least 6 GB GPU memory (NVIDIA RTX 2060 or higher recommended)
- Basic knowledge of Python and deep learning
Step 1: Setting Up the Development Environment
Installing Darknet
Darknet is the original framework for running YOLO. Let's compile it with GPU support:
# Clone the repository
git clone https://github.com/AlexeyAB/darknet.git
cd darknet
# Enable GPU and OpenCV in Makefile
sed -i 's/GPU=0/GPU=1/' Makefile
sed -i 's/CUDNN=0/CUDNN=1/' Makefile
sed -i 's/OPENCV=0/OPENCV=1/' Makefile
# Build
make -j$(nproc)Creating a Python Virtual Environment
python3 -m venv yolov4-env
source yolov4-env/bin/activate
pip install opencv-python numpy matplotlib pillow
pip install torch torchvision # Optional — for analysis and evaluationMake sure your installed CUDA version is compatible with your cuDNN version and GPU. You can verify by running nvcc --version and nvidia-smi.
Step 2: The PlantVillage Dataset
Overview
The PlantVillage dataset contains over 54,000 images of plant leaves from 14 different species, covering 38 categories of healthy and diseased leaves.
| Statistic | Value |
|---|---|
| Total images | 54,305 |
| Plant species | 14 |
| Disease categories | 38 |
| Resolution | 256×256 px |
| Format | JPG/PNG |
Downloading the Data
# Download from Kaggle
pip install kaggle
kaggle datasets download -d emmarex/plantdisease
unzip plantdisease.zip -d data/plantvillageDirectory Structure
data/plantvillage/
├── Apple___Apple_scab/
│ ├── image_001.jpg
│ ├── image_002.jpg
│ └── ...
├── Apple___Black_rot/
├── Apple___Cedar_apple_rust/
├── Apple___healthy/
├── Tomato___Bacterial_spot/
├── Tomato___Early_blight/
├── Tomato___Late_blight/
├── Tomato___healthy/
└── ... (38 directories)
Step 3: Preparing Data for YOLOv4
YOLOv4 requires a specific annotation format. Each image needs an accompanying .txt file containing bounding box coordinates.
Conversion Script
import os
import cv2
import random
from pathlib import Path
DATA_DIR = "data/plantvillage"
OUTPUT_DIR = "data/yolo_format"
CLASSES_FILE = "data/classes.txt"
# Collect classes
classes = sorted(os.listdir(DATA_DIR))
with open(CLASSES_FILE, "w") as f:
for cls in classes:
f.write(f"{cls}\n")
print(f"Number of classes: {len(classes)}")
# Convert each image to YOLO format
os.makedirs(f"{OUTPUT_DIR}/images/train", exist_ok=True)
os.makedirs(f"{OUTPUT_DIR}/images/val", exist_ok=True)
os.makedirs(f"{OUTPUT_DIR}/labels/train", exist_ok=True)
os.makedirs(f"{OUTPUT_DIR}/labels/val", exist_ok=True)
all_images = []
for class_idx, class_name in enumerate(classes):
class_dir = os.path.join(DATA_DIR, class_name)
for img_name in os.listdir(class_dir):
if img_name.lower().endswith(('.jpg', '.jpeg', '.png')):
all_images.append((
os.path.join(class_dir, img_name),
class_idx,
img_name
))
# 80/20 train/val split
random.shuffle(all_images)
split = int(0.8 * len(all_images))
train_set = all_images[:split]
val_set = all_images[split:]
def process_image(img_path, class_idx, img_name, split_name):
"""Convert a single image to YOLO format"""
img = cv2.imread(img_path)
h, w = img.shape[:2]
# Copy image
dest = f"{OUTPUT_DIR}/images/{split_name}/{img_name}"
cv2.imwrite(dest, img)
# Create label file — the leaf occupies most of the image
# Format: class_id center_x center_y width height (normalized)
label_name = Path(img_name).stem + ".txt"
label_path = f"{OUTPUT_DIR}/labels/{split_name}/{label_name}"
with open(label_path, "w") as f:
# Bounding box covers 90% of image (leaf is the main subject)
f.write(f"{class_idx} 0.5 0.5 0.9 0.9\n")
print(f"Train: {len(train_set)} | Val: {len(val_set)}")
for img_path, cls_idx, name in train_set:
process_image(img_path, cls_idx, name, "train")
for img_path, cls_idx, name in val_set:
process_image(img_path, cls_idx, name, "val")Image Normalization
import numpy as np
import cv2
def normalize_image(img_path):
"""Normalize pixel values from [0, 255] to [0, 1]"""
img = cv2.imread(img_path)
img = img.astype(np.float32) / 255.0
# Resize to 416×416 (YOLOv4 default)
img = cv2.resize(img, (416, 416))
return img
# Verify a sample image
sample = normalize_image("data/yolo_format/images/train/sample.jpg")
print(f"Range: [{sample.min():.2f}, {sample.max():.2f}]")
print(f"Shape: {sample.shape}") # (416, 416, 3)Step 4: Image Annotation Tools
For better results with YOLOv4, it's preferable to use precise bounding boxes around the diseased areas rather than the entire leaf.
Recommended Annotation Tools
| Tool | Platform | Features |
|---|---|---|
| LabelImg | Desktop | Free, direct YOLO format support |
| CVAT | Web | Collaborative, video support |
| Roboflow | Web | Auto data augmentation, multi-format export |
| VoTT | Desktop | By Microsoft, Active Learning support |
Using LabelImg
pip install labelImg
labelImg data/yolo_format/images/train/ data/classes.txtFor more accurate results, draw the bounding box around the diseased area only, not the entire leaf. This helps the model learn to distinguish between healthy and diseased tissue.
Step 5: Understanding YOLOv4 Architecture
YOLOv4 consists of three main parts:
1. Backbone — CSPDarknet53
Extracts core features from the image using:
- Cross-Stage Partial connections (CSP): Reduces computational redundancy
- Mish activation function:
f(x) = x × tanh(softplus(x))— smoother than ReLU
2. Neck — SPP + PANet
Aggregates features from different levels:
- Spatial Pyramid Pooling (SPP): Expands the receptive field
- Path Aggregation Network (PANet): Merges features top-down and bottom-up
3. Head — YOLOv3 Head
Produces final predictions at three different scales (13×13, 26×26, 52×52).
Input (416×416×3)
│
▼
CSPDarknet53 (Feature extraction)
│
├──► Small scale (13×13) — Large objects
├──► Medium scale (26×26) — Medium objects
└──► Large scale (52×52) — Small objects
│
▼
SPP + PANet (Feature aggregation)
│
▼
Detection at 3 scales
Step 6: Configuring Training Files
Data File (plant.data)
classes = 38
train = data/train.txt
valid = data/val.txt
names = data/classes.txt
backup = backup/Generating Image Lists
import glob
# Training image list
train_images = glob.glob("data/yolo_format/images/train/*.*")
with open("data/train.txt", "w") as f:
f.write("\n".join(train_images))
# Validation image list
val_images = glob.glob("data/yolo_format/images/val/*.*")
with open("data/val.txt", "w") as f:
f.write("\n".join(val_images))
print(f"Training images: {len(train_images)}")
print(f"Validation images: {len(val_images)}")Configuration File (yolov4-plant.cfg)
Copy the original config and modify the parameters:
cp cfg/yolov4-custom.cfg cfg/yolov4-plant.cfgKey parameters to modify:
[net]
batch = 64
subdivisions = 16
width = 416
height = 416
max_batches = 76000 # = num_classes × 2000 = 38 × 2000
steps = 60800,68400 # = 80% and 90% of max_batches# For each [yolo] layer (3 layers)
[yolo]
classes = 38
# For each [convolutional] layer before [yolo]
[convolutional]
filters = 129 # = (classes + 5) × 3 = (38 + 5) × 3Common mistake: Forgetting to modify filters in all three [convolutional] layers before the [yolo] layers. The value must always be (classes + 5) × 3.
Step 7: Training the Model
Downloading Pre-trained Weights
wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v3_optimal/yolov4.conv.137Starting Training
./darknet detector train \
data/plant.data \
cfg/yolov4-plant.cfg \
yolov4.conv.137 \
-map \
-dont_showMonitoring Progress
Darknet automatically generates a loss chart at chart.png. You can also follow progress in the terminal:
CUDA-version: 11080 (11080)
Loading weights from yolov4.conv.137...
Learning Rate: 0.001, Momentum: 0.949, Decay: 0.0005
Iteration: 1000, loss: 2.5, avg loss: 3.1, rate: 0.001
Iteration: 2000, loss: 1.2, avg loss: 1.8, rate: 0.001
...
Iteration: 10000, loss: 0.08, avg loss: 0.15, rate: 0.0001
The model is automatically saved every 1,000 iterations in the backup/ folder. If training stops, you can resume from the last checkpoint: ./darknet detector train ... backup/yolov4-plant_last.weights
Step 8: Evaluating the Model
Computing mAP (Mean Average Precision)
./darknet detector map \
data/plant.data \
cfg/yolov4-plant.cfg \
backup/yolov4-plant_best.weightsPerformance Results
| Metric | Value |
|---|---|
| mAP@0.5 | 99.99% |
| Precision | 99.99% |
| Recall | 99.98% |
| F1-Score | 99.99% |
| Inference speed | ~30 FPS (RTX 2080) |
Confusion Matrix with Python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
def evaluate_model(predictions, ground_truths, class_names):
"""Generate and display confusion matrix"""
cm = confusion_matrix(ground_truths, predictions)
fig, ax = plt.subplots(figsize=(20, 20))
disp = ConfusionMatrixDisplay(cm, display_labels=class_names)
disp.plot(ax=ax, cmap='Greens', xticks_rotation=45)
plt.title("Confusion Matrix — YOLOv4 Plant Disease Detection")
plt.tight_layout()
plt.savefig("confusion_matrix.png", dpi=150)
plt.show()
# Per-class metrics
for i, name in enumerate(class_names):
tp = cm[i, i]
fp = cm[:, i].sum() - tp
fn = cm[i, :].sum() - tp
precision = tp / (tp + fp) if (tp + fp) > 0 else 0
recall = tp / (tp + fn) if (tp + fn) > 0 else 0
print(f"{name}: Precision={precision:.4f}, Recall={recall:.4f}")Step 9: Inference and Real-Time Detection
Single Image Detection
./darknet detector test \
data/plant.data \
cfg/yolov4-plant.cfg \
backup/yolov4-plant_best.weights \
data/test/tomato_late_blight.jpg \
-thresh 0.5Detection with Python and OpenCV
import cv2
import numpy as np
def detect_disease(image_path, config, weights, classes_file, conf_threshold=0.5):
"""Detect leaf diseases in an image"""
# Load network
net = cv2.dnn.readNetFromDarknet(config, weights)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
# Load class names
with open(classes_file, "r") as f:
classes = f.read().strip().split("\n")
# Prepare image
img = cv2.imread(image_path)
h, w = img.shape[:2]
blob = cv2.dnn.blobFromImage(img, 1/255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
# Inference
layer_names = net.getLayerNames()
output_layers = [layer_names[i - 1] for i in net.getUnconnectedOutLayers()]
outputs = net.forward(output_layers)
# Process results
boxes, confidences, class_ids = [], [], []
for output in outputs:
for detection in output:
scores = detection[5:]
class_id = np.argmax(scores)
confidence = scores[class_id]
if confidence > conf_threshold:
center_x = int(detection[0] * w)
center_y = int(detection[1] * h)
bw = int(detection[2] * w)
bh = int(detection[3] * h)
x = int(center_x - bw / 2)
y = int(center_y - bh / 2)
boxes.append([x, y, bw, bh])
confidences.append(float(confidence))
class_ids.append(class_id)
# Non-Maximum Suppression
indices = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, 0.4)
# Draw results
colors = np.random.uniform(0, 255, size=(len(classes), 3))
for i in indices.flatten():
x, y, bw, bh = boxes[i]
label = f"{classes[class_ids[i]]}: {confidences[i]:.2f}"
color = colors[class_ids[i]]
cv2.rectangle(img, (x, y), (x + bw, y + bh), color, 2)
cv2.putText(img, label, (x, y - 10),
cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
return img, [(classes[class_ids[i]], confidences[i]) for i in indices.flatten()]
# Usage
result_img, detections = detect_disease(
"test_leaf.jpg",
"cfg/yolov4-plant.cfg",
"backup/yolov4-plant_best.weights",
"data/classes.txt"
)
for disease, conf in detections:
print(f" {disease}: {conf:.1%}")
cv2.imwrite("result.jpg", result_img)Real-Time Detection from Camera
import cv2
cap = cv2.VideoCapture(0) # Or path to a video file
while True:
ret, frame = cap.read()
if not ret:
break
# Prepare frame
blob = cv2.dnn.blobFromImage(frame, 1/255.0, (416, 416), swapRB=True)
net.setInput(blob)
outputs = net.forward(output_layers)
# ... process results (same code as above)
cv2.imshow("Plant Disease Detection", frame)
if cv2.waitKey(1) & 0xFF == ord('q'):
break
cap.release()
cv2.destroyAllWindows()Comparative Analysis
The study showed that YOLOv4 clearly outperforms other models:
| Model | Accuracy | Speed (FPS) | Size |
|---|---|---|---|
| YOLOv4 | 99.99% | 30 | 256 MB |
| DenseNet-121 | 99.75% | 12 | 33 MB |
| ResNet-50 | 99.68% | 15 | 98 MB |
| AlexNet | 97.82% | 45 | 233 MB |
| VGG-16 | 98.40% | 8 | 528 MB |
| Traditional SVM | 89.50% | 2 | - |
YOLOv4 combines the highest accuracy with excellent real-time speed — making it the optimal choice for field applications on mobile devices and drones.
Future Directions
- Expanding to more diseases: Extend coverage to include root, stem, and fruit diseases
- Multimodal data integration: Combine images with sensor data (humidity, temperature, soil)
- Edge deployment: Convert the model to TensorRT or ONNX for running on Jetson Nano or Raspberry Pi
- Continuous learning: Develop mechanisms to automatically update the model with new field data
- Model interpretability: Use techniques like Grad-CAM to explain model predictions
Conclusion
In this guide, we built a plant leaf disease detection system using YOLOv4 that achieved 99.99% accuracy on the PlantVillage dataset. The methodology includes:
- Data collection and preparation with precise annotations
- Configuration and training of a custom YOLOv4 model
- Comprehensive evaluation using multiple metrics
- Practical deployment for real-time detection
This approach empowers farmers to detect diseases early and take swift action — reducing losses and tangibly improving crop yields.
References
- Aldakheel EA, Zakariah M, Alabdalall AH. Detection and identification of plant leaf diseases using YOLOv4. Front. Plant Sci. 15:1355941. doi: 10.3389/fpls.2024.1355941. Full article
- Bochkovskiy A, Wang CY, Liao HYM. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv:2004.10934. Paper
- Hughes DP, Salathé M. An open access repository of images on plant health to enable the development of mobile disease diagnostics. arXiv:1511.08060. PlantVillage dataset
Discuss Your Project with Us
We're here to help with your web development needs. Schedule a call to discuss your project and how we can assist you.
Let's find the best solutions for your needs.
Related Articles

Getting Started with ALLaM-7B-Instruct-preview
Learn how to use the ALLaM-7B-Instruct-preview model with Python, and how to interact with it from JavaScript via a hosted API (e.g., on Hugging Face Spaces).

Building a Custom Code Interpreter for LLM Agents
Learn how to create a custom code interpreter for Large Language Model (LLM) agents, enabling dynamic tool calling and isolated code execution for enhanced flexibility and security.

An Introduction to GPT-4o and GPT-4o mini
Explore the future of AI with our introduction to GPT-4o and GPT-4o mini, OpenAI's latest multimodal models capable of processing and generating text, audio, and visual content seamlessly.