feat: add OpenAI DALL-E image generation and make OpenAI default (v0.1.0)

- Implement OpenAI DALL-E provider for generating educational flashcard images
- Add support for DALL-E 2 and DALL-E 3 with configurable size, quality, and style
- Implement intelligent caching to minimize API costs
- Make OpenAI the default provider for both audio (TTS) and images (DALL-E)
- Add automatic fallback to free alternatives (espeak/pixabay) when OpenAI is unavailable
- Fix bug where cached images couldn't be copied to output directory
- Update documentation with OpenAI setup instructions and examples
- Add comprehensive unit tests for OpenAI image provider
- Bump version to 0.1.0

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
author: Paul Buetow <paul@buetow.org> 2025-07-15 07:48:58 +0300
committer: Paul Buetow <paul@buetow.org> 2025-07-15 07:48:58 +0300
commit: c791e9fdd57af52599de266facbaba0077f31558 (patch)
tree: a5eb002a02b4cf279897d748bdcdc5b179c50098 /README.md
parent: b03d096d12df59b66cf52991c46dfce44c20ae3b (diff)
1 file changed, 75 insertions, 16 deletions
diff --git a/README.md b/README.md
index f4d6b96..d27b734 100644
--- a/README.md
+++ b/README.md
@@ -1,13 +1,18 @@
 # totalrecall - Bulgarian Anki Flashcard Generator
 
-`totalrecall` is a command-line tool that generates Anki flashcard materials from Bulgarian words. It creates audio pronunciation files using espeak-ng or OpenAI TTS and downloads representative images from web search APIs.
+`totalrecall` is a command-line tool that generates Anki flashcard materials from Bulgarian words. It creates audio pronunciation files and generates images using AI.
+
+⚠️ **Important:** This tool uses OpenAI services by default, which requires an API key. See [Quick Start](#quick-start) for setup instructions or use the free alternatives with `--audio-provider espeak --image-api pixabay`.
 
 ## Features
 
 - Audio generation with multiple providers:
   - **espeak-ng**: Free, offline Bulgarian voices (robotic quality)
   - **OpenAI TTS**: High-quality, natural-sounding voices (requires API key)
-- Image search via Pixabay and Unsplash APIs
+- Image search and generation:
+  - **Pixabay**: Free stock photo search (optional API key)
+  - **Unsplash**: High-quality photo search (requires API key)
+  - **OpenAI DALL-E**: AI-generated educational images (requires API key)
 - Batch processing of multiple words
 - Anki-compatible CSV export
 - Configurable voice variants and speech speed
@@ -59,17 +64,27 @@ go install codeberg.org/snonux/totalrecall/cmd/totalrecall@latest
 
 ## Quick Start
 
-1. Generate materials for a single word:
+**Note:** By default, totalrecall uses OpenAI for both audio and images. Make sure to set your OpenAI API key:
+```bash
+export OPENAI_API_KEY="sk-..."
+```
+
+1. Generate materials for a single word (uses OpenAI by default):
    ```bash
    totalrecall ябълка
    ```
 
-2. Process multiple words from a file:
+2. Use free alternatives (espeak + pixabay):
+   ```bash
+   totalrecall ябълка --audio-provider espeak --image-api pixabay
+   ```
+
+3. Process multiple words from a file:
    ```bash
    totalrecall --batch words.txt
    ```
 
-3. Generate with Anki CSV:
+4. Generate with Anki CSV:
    ```bash
    totalrecall ябълка --anki
    ```
@@ -80,7 +95,7 @@ Create a `.totalrecall.yaml` file in your home directory or project folder:
 
 ```yaml
 audio:
-  provider: openai       # Audio provider (espeak or openai)
+  provider: openai       # Audio provider (espeak or openai) - default: openai
   format: mp3           # Audio format (wav or mp3)
   
   # ESpeak settings
@@ -99,10 +114,19 @@ audio:
   cache_dir: "./.audio_cache"
 
 image:
-  provider: pixabay       # Image provider (pixabay or unsplash)
+  provider: openai       # Image provider (pixabay, unsplash, or openai) - default: openai
   pixabay_key: ""        # Optional API key for higher limits
   unsplash_key: ""       # Required for Unsplash
-  size: medium           # Image size preference
+  
+  # OpenAI DALL-E settings
+  openai_model: "dall-e-2"  # Model: dall-e-2 or dall-e-3
+  openai_size: "512x512"    # Size: 256x256, 512x512, 1024x1024
+  openai_quality: "standard" # Quality: standard or hd (dall-e-3 only)
+  openai_style: "natural"    # Style: natural or vivid (dall-e-3 only)
+  
+  # Caching
+  enable_cache: true
+  cache_dir: "./.image_cache"
 
 output:
   directory: ./anki_cards
@@ -125,21 +149,27 @@ totalrecall [word] [flags]
 - `--skip-audio`: Skip audio generation
 - `--skip-images`: Skip image download
 - `--images-per-word int`: Number of images per word (default 1)
-- `--image-api string`: Image source - pixabay or unsplash (default "pixabay")
+- `--image-api string`: Image source - pixabay, unsplash, or openai (default "openai")
 
 #### Audio Provider Options
-- `--audio-provider string`: Audio provider - espeak or openai (default "espeak")
+- `--audio-provider string`: Audio provider - espeak or openai (default "openai")
 
 #### ESpeak Tuning Options
 - `--pitch int`: Pitch adjustment 0-99 (default 50, lower=deeper, espeak only)
 - `--amplitude int`: Volume 0-200 (default 100, espeak only)
 - `--word-gap int`: Gap between words in 10ms units (default 0, espeak only)
 
-#### OpenAI Options
+#### OpenAI Audio Options
 - `--openai-model string`: Model - tts-1 or tts-1-hd (default "tts-1")
 - `--openai-voice string`: Voice - alloy, echo, fable, onyx, nova, shimmer (default "nova")
 - `--openai-speed float`: Speech speed 0.25-4.0 (default 1.0)
 
+#### OpenAI Image Options
+- `--openai-image-model string`: Model - dall-e-2 or dall-e-3 (default "dall-e-2")
+- `--openai-image-size string`: Size - 256x256, 512x512, 1024x1024 (default "512x512")
+- `--openai-image-quality string`: Quality - standard or hd (default "standard", dall-e-3 only)
+- `--openai-image-style string`: Style - natural or vivid (default "natural", dall-e-3 only)
+
 ## API Keys
 
 ### Pixabay
@@ -150,15 +180,21 @@ totalrecall [word] [flags]
 - Required for Unsplash searches
 - Get your key at: https://unsplash.com/developers
 
+### OpenAI
+- Required for both OpenAI TTS audio and DALL-E image generation
+- Get your key at: https://platform.openai.com/api-keys
+- Set via environment variable: `export OPENAI_API_KEY="sk-..."`
+- Or add to config file as `audio.openai_key`
+
 ## Examples
 
 ### Basic Usage
 ```bash
-# Single word with espeak-ng
+# Single word (uses OpenAI by default)
 totalrecall котка
 
-# Using OpenAI TTS (requires API key in config)
-totalrecall котка --audio-provider openai
+# Using espeak-ng (free alternative)
+totalrecall котка --audio-provider espeak
 
 # High-quality OpenAI with specific voice
 totalrecall ябълка --audio-provider openai --openai-model tts-1-hd --openai-voice alloy
@@ -174,6 +210,15 @@ totalrecall куче --skip-images
 
 # Generate Anki import file
 totalrecall --batch words.txt --anki
+
+# Generate AI images with OpenAI DALL-E
+totalrecall ябълка --image-api openai
+
+# High-quality DALL-E 3 images
+totalrecall котка --image-api openai --openai-image-model dall-e-3 --openai-image-quality hd
+
+# Combine OpenAI audio and images
+totalrecall куче --audio-provider openai --image-api openai
 ```
 
 ### Batch File Format
@@ -213,9 +258,23 @@ Make sure espeak-ng is installed and in your PATH.
 
 ### OpenAI API errors
 - Verify your API key is correct and has credits
-- Check the API key has TTS permissions enabled
+
+## Cost Considerations
+
+### OpenAI Services
+- **TTS Audio**: ~$0.015 per 1K characters (tts-1), ~$0.030 (tts-1-hd)
+- **DALL-E 2 Images**: ~$0.02 per image (512x512)
+- **DALL-E 3 Images**: ~$0.04 per image (standard), ~$0.08 (HD)
+- Both services cache results to avoid regenerating identical content
+
+### Free Alternatives
+- **Audio**: Use espeak-ng (free but robotic quality)
+- **Images**: Use Pixabay without API key (limited rate)
+
+### OpenAI Troubleshooting
+- Check the API key has proper permissions enabled
 - If you get rate limit errors, wait a moment and try again
-- The tool will automatically fall back to espeak-ng if OpenAI fails
+- The tool will automatically fall back to espeak-ng if OpenAI audio fails
 
 ### Audio sounds robotic
 The Bulgarian voice in espeak-ng can sound robotic. To improve quality:
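The Cost Considerations section added in this patch lists approximate per-unit OpenAI prices. A rough estimate for a batch run can be sketched from those figures alone. The prices below are copied from the README as of this commit and may have changed since; the helper `estimateCost` is hypothetical, and it assumes every word is a cache miss (an empty cache), one audio file per word, and billing per character of the word text.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Approximate prices from the README's "Cost Considerations" section (USD).
const (
	ttsPer1KChars   = 0.015 // tts-1, per 1K characters
	ttsHDPer1KChars = 0.030 // tts-1-hd, per 1K characters
	dalle2Image512  = 0.02  // dall-e-2, 512x512, per image
	dalle3Standard  = 0.04  // dall-e-3, standard, per image
	dalle3HD        = 0.08  // dall-e-3, hd, per image
)

// estimateCost (hypothetical helper) returns a rough USD cost for one audio
// file per word plus imagesPerWord images per word, assuming no cache hits.
func estimateCost(words []string, ttsRate, imageRate float64, imagesPerWord int) float64 {
	chars := 0
	for _, w := range words {
		// Count characters, not UTF-8 bytes: each Cyrillic letter is 2 bytes.
		chars += utf8.RuneCountInString(w)
	}
	audio := float64(chars) / 1000 * ttsRate
	images := float64(len(words)*imagesPerWord) * imageRate
	return audio + images
}

func main() {
	words := []string{"ябълка", "котка", "куче"}
	fmt.Printf("defaults (tts-1 + dall-e-2): $%.4f\n", estimateCost(words, ttsPer1KChars, dalle2Image512, 1))
	fmt.Printf("tts-1-hd + dall-e-3 hd:      $%.4f\n", estimateCost(words, ttsHDPer1KChars, dalle3HD, 1))
}
```

With the defaults introduced by this commit, images dominate the bill: the three sample words total 15 characters, so audio costs about $0.0002 while three 512x512 DALL-E 2 images cost about $0.06. The `--images-per-word` flag multiplies the image term directly.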