Building Realistic Test Data in Python Using Polyfactory with Dataclasses, Pydantic, Attrs, and Nested Models

February 8, 2026 3 Min Read

In this hands-on guide, we explore how to generate realistic, structured test data in Python using Polyfactory. Starting from simple dataclasses, we move on to custom field generation, dependent values, nested objects, attrs integration, and Pydantic models. By the end of this tutorial, you will be able to replace hand-written fixtures with factories that build typed instances of your own models, with far less boilerplate.

Table of Contents

1️⃣ Installing Required Libraries
2️⃣ Generating Data from Basic Dataclasses
3️⃣ Customizing How Fields Are Generated
4️⃣ Derived and Dependent Fields
5️⃣ Working with Nested Models
6️⃣ Using Attrs Classes
7️⃣ Overriding Specific Values
8️⃣ Generating Data for Pydantic Models
✅ Conclusion

This tutorial focuses on practical implementation with clear examples that mirror real-world application models.

1️⃣ Installing Required Libraries

We begin by installing the libraries needed for generating structured test data.

import sys
import subprocess

def pip_install(pkg):
    # Install quietly into the same interpreter that runs this script or notebook
    subprocess.check_call([sys.executable, "-m", "pip", "install", pkg, "-q"])

dependencies = [
    "polyfactory",
    "faker",
    "pydantic",
    "attrs",
    "email-validator"
]

for dep in dependencies:
    pip_install(dep)

print("All packages installed successfully.")

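2️⃣ Generating Data from Basic Dataclasses

Subclassing DataclassFactory with the model as its type parameter is all the setup a plain dataclass needs. Calling build() returns one populated instance, and batch(n) returns a list of n instances.
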
from dataclasses import dataclass
from typing import List, Optional
from datetime import date
from uuid import UUID
from polyfactory.factories import DataclassFactory

@dataclass
class Location:
    street: str
    city: str
    country: str
    postal_code: str

@dataclass
class UserProfile:
    user_id: UUID
    full_name: str
    email: str
    age: int
    joined_on: date
    active: bool
    location: Location
    tags: List[str]
    about: Optional[str] = None


class UserProfileFactory(DataclassFactory[UserProfile]):
    pass


sample_user = UserProfileFactory.build()
print(sample_user)

multiple_users = UserProfileFactory.batch(3)
print(multiple_users)

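3️⃣ Customizing How Fields Are Generated

Defining a classmethod with the same name as a field replaces Polyfactory's default generator for that field. Drawing values from cls.__random__ and cls.__faker__, rather than from the global random module, keeps them tied to the factory's own random state.
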
from faker import Faker
from datetime import date

fake = Faker()

@dataclass
class Staff:
    staff_code: str
    name: str
    department: str
    salary: float
    start_date: date
    email: str


class StaffFactory(DataclassFactory[Staff]):
    __faker__ = fake
    __random_seed__ = 7

    @classmethod
    def staff_code(cls) -> str:
        return f"STF-{cls.__random__.randint(1000, 9999)}"

    @classmethod
    def department(cls) -> str:
        return cls.__random__.choice(["IT", "Sales", "Support", "HR"])

    @classmethod
    def salary(cls) -> float:
        return round(cls.__random__.uniform(40000, 120000), 2)

    @classmethod
    def email(cls) -> str:
        return cls.__faker__.company_email()


print(StaffFactory.batch(2))

4️⃣ Derived and Dependent Fields

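Some fields depend on the values of others. Overriding build() lets the factory compute them once the object exists; here final_price is derived from the generated price and discount.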

@dataclass
class InventoryItem:
    code: str
    title: str
    price: float
    discount: float
    final_price: Optional[float] = None


class InventoryFactory(DataclassFactory[InventoryItem]):

    @classmethod
    def code(cls) -> str:
        return f"ITEM-{cls.__random__.randint(100, 999)}"

    @classmethod
    def price(cls) -> float:
        return round(cls.__random__.uniform(20, 500), 2)

    @classmethod
    def discount(cls) -> float:
        return round(cls.__random__.uniform(5, 25), 2)

    @classmethod
    def build(cls, **kwargs):
        obj = super().build(**kwargs)
        obj.final_price = round(obj.price * (1 - obj.discount / 100), 2)
        return obj


print(InventoryFactory.build())

5️⃣ Working with Nested Models

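You can compose factories for complex object graphs. Below, InvoiceFactory delegates to LineItemFactory.batch() so that every line item has its total computed, and then sums those totals into grand_total.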

from datetime import datetime
from enum import Enum

class Status(str, Enum):
    NEW = "new"
    SENT = "sent"
    RECEIVED = "received"

@dataclass
class LineItem:
    name: str
    quantity: int
    rate: float
    total: Optional[float] = None


class LineItemFactory(DataclassFactory[LineItem]):
    @classmethod
    def build(cls, **kwargs):
        obj = super().build(**kwargs)
        obj.total = round(obj.quantity * obj.rate, 2)
        return obj


@dataclass
class Invoice:
    invoice_no: str
    created_at: datetime
    status: Status
    items: List[LineItem]
    grand_total: Optional[float] = None


class InvoiceFactory(DataclassFactory[Invoice]):

    @classmethod
    def items(cls) -> List[LineItem]:
        return LineItemFactory.batch(3)

    @classmethod
    def build(cls, **kwargs):
        inv = super().build(**kwargs)
        # Round the sum so floating-point noise does not leak into the total
        inv.grand_total = round(sum(i.total for i in inv.items), 2)
        return inv


print(InvoiceFactory.build())

6️⃣ Using Attrs Classes

Polyfactory also supports attrs.

import attrs
from polyfactory.factories.attrs_factory import AttrsFactory
from datetime import datetime

@attrs.define
class Article:
    title: str
    author: str
    body: str
    published: bool = False
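    # Use a factory so the default is evaluated per instance, not once at class definition
    created_at: datetime = attrs.field(factory=datetime.now)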


class ArticleFactory(AttrsFactory[Article]):
    pass


print(ArticleFactory.build())

7️⃣ Overriding Specific Values

Pass keyword arguments to build() to pin specific fields; every field you do not pass is still generated.

custom_user = UserProfileFactory.build(
    full_name="John Doe",
    age=28,
    email="john@example.com"
)

print(custom_user)

8️⃣ Generating Data for Pydantic Models

ModelFactory plays the same role for Pydantic models that DataclassFactory plays for dataclasses.

from pydantic import BaseModel
from polyfactory.factories.pydantic_factory import ModelFactory

class Payment(BaseModel):
    method: str
    verified: bool = False


class PaymentFactory(ModelFactory[Payment]):
    __model__ = Payment


print(PaymentFactory.batch(3))

✅ Conclusion

Polyfactory generates structured, type-correct test data directly from your model definitions. Whether you are working with dataclasses, attrs, or Pydantic models, a factory takes only a few lines of code, and you can override or derive individual fields when a test needs specific values.

This approach cuts down on hand-written mock objects, speeds up prototyping, and makes it cheaper to test against varied inputs.
