author Paul Buetow <paul@buetow.org> 2025-07-15 07:48:58 +0300
committer Paul Buetow <paul@buetow.org> 2025-07-15 07:48:58 +0300
commit c791e9fdd57af52599de266facbaba0077f31558
tree a5eb002a02b4cf279897d748bdcdc5b179c50098 /README.md
parent b03d096d12df59b66cf52991c46dfce44c20ae3b
feat: add OpenAI DALL-E image generation and make OpenAI default (v0.1.0)
- Implement OpenAI DALL-E provider for generating educational flashcard images
- Add support for DALL-E 2 and DALL-E 3 with configurable size, quality, and style
- Implement intelligent caching to minimize API costs
- Make OpenAI the default provider for both audio (TTS) and images (DALL-E)
- Add automatic fallback to free alternatives (espeak/pixabay) when OpenAI unavailable
- Fix bug where cached images couldn't be copied to output directory
- Update documentation with OpenAI setup instructions and examples
- Add comprehensive unit tests for OpenAI image provider
- Bump version to 0.1.0

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Diffstat (limited to 'README.md')
-rw-r--r-- README.md 91
1 files changed, 75 insertions, 16 deletions
diff --git a/README.md b/README.md
index f4d6b96..d27b734 100644
--- a/README.md
+++ b/README.md
@@ -1,13 +1,18 @@
# totalrecall - Bulgarian Anki Flashcard Generator
-`totalrecall` is a command-line tool that generates Anki flashcard materials from Bulgarian words. It creates audio pronunciation files using espeak-ng or OpenAI TTS and downloads representative images from web search APIs.
+`totalrecall` is a command-line tool that generates Anki flashcard materials from Bulgarian words. It creates audio pronunciation files and generates images using AI.
+
+⚠️ **Important:** This tool uses OpenAI services by default, which requires an API key. See [Quick Start](#quick-start) for setup instructions or use the free alternatives with `--audio-provider espeak --image-api pixabay`.
## Features
- Audio generation with multiple providers:
- **espeak-ng**: Free, offline Bulgarian voices (robotic quality)
- **OpenAI TTS**: High-quality, natural-sounding voices (requires API key)
-- Image search via Pixabay and Unsplash APIs
+- Image search and generation:
+ - **Pixabay**: Free stock photo search (optional API key)
+ - **Unsplash**: High-quality photo search (requires API key)
+ - **OpenAI DALL-E**: AI-generated educational images (requires API key)
- Batch processing of multiple words
- Anki-compatible CSV export
- Configurable voice variants and speech speed
@@ -59,17 +64,27 @@ go install codeberg.org/snonux/totalrecall/cmd/totalrecall@latest
## Quick Start
-1. Generate materials for a single word:
+**Note:** By default, totalrecall uses OpenAI for both audio and images. Make sure to set your OpenAI API key:
+```bash
+export OPENAI_API_KEY="sk-..."
+```
+
+1. Generate materials for a single word (uses OpenAI by default):
```bash
totalrecall ябълка
```
-2. Process multiple words from a file:
+2. Use free alternatives (espeak + pixabay):
+ ```bash
+ totalrecall ябълка --audio-provider espeak --image-api pixabay
+ ```
+
+3. Process multiple words from a file:
```bash
totalrecall --batch words.txt
```
-3. Generate with Anki CSV:
+4. Generate with Anki CSV:
```bash
totalrecall ябълка --anki
```
@@ -80,7 +95,7 @@ Create a `.totalrecall.yaml` file in your home directory or project folder:
```yaml
audio:
- provider: openai # Audio provider (espeak or openai)
+ provider: openai # Audio provider (espeak or openai) - default: openai
format: mp3 # Audio format (wav or mp3)
# ESpeak settings
@@ -99,10 +114,19 @@ audio:
cache_dir: "./.audio_cache"
image:
- provider: pixabay # Image provider (pixabay or unsplash)
+ provider: openai # Image provider (pixabay, unsplash, or openai) - default: openai
pixabay_key: "" # Optional API key for higher limits
unsplash_key: "" # Required for Unsplash
- size: medium # Image size preference
+
+ # OpenAI DALL-E settings
+ openai_model: "dall-e-2" # Model: dall-e-2 or dall-e-3
+ openai_size: "512x512" # Size: 256x256, 512x512, 1024x1024
+ openai_quality: "standard" # Quality: standard or hd (dall-e-3 only)
+ openai_style: "natural" # Style: natural or vivid (dall-e-3 only)
+
+ # Caching
+ enable_cache: true
+ cache_dir: "./.image_cache"
output:
directory: ./anki_cards
@@ -125,21 +149,27 @@ totalrecall [word] [flags]
- `--skip-audio`: Skip audio generation
- `--skip-images`: Skip image download
- `--images-per-word int`: Number of images per word (default 1)
-- `--image-api string`: Image source - pixabay or unsplash (default "pixabay")
+- `--image-api string`: Image source - pixabay, unsplash, or openai (default "openai")
#### Audio Provider Options
-- `--audio-provider string`: Audio provider - espeak or openai (default "espeak")
+- `--audio-provider string`: Audio provider - espeak or openai (default "openai")
#### ESpeak Tuning Options
- `--pitch int`: Pitch adjustment 0-99 (default 50, lower=deeper, espeak only)
- `--amplitude int`: Volume 0-200 (default 100, espeak only)
- `--word-gap int`: Gap between words in 10ms units (default 0, espeak only)
-#### OpenAI Options
+#### OpenAI Audio Options
- `--openai-model string`: Model - tts-1 or tts-1-hd (default "tts-1")
- `--openai-voice string`: Voice - alloy, echo, fable, onyx, nova, shimmer (default "nova")
- `--openai-speed float`: Speech speed 0.25-4.0 (default 1.0)
+#### OpenAI Image Options
+- `--openai-image-model string`: Model - dall-e-2 or dall-e-3 (default "dall-e-2")
+- `--openai-image-size string`: Size - 256x256, 512x512, 1024x1024 (default "512x512")
+- `--openai-image-quality string`: Quality - standard or hd (default "standard", dall-e-3 only)
+- `--openai-image-style string`: Style - natural or vivid (default "natural", dall-e-3 only)
+
## API Keys
### Pixabay
@@ -150,15 +180,21 @@ totalrecall [word] [flags]
- Required for Unsplash searches
- Get your key at: https://unsplash.com/developers
+### OpenAI
+- Required for both OpenAI TTS audio and DALL-E image generation
+- Get your key at: https://platform.openai.com/api-keys
+- Set via environment variable: `export OPENAI_API_KEY="sk-..."`
+- Or add to config file as `audio.openai_key`
+
## Examples
### Basic Usage
```bash
-# Single word with espeak-ng
+# Single word (uses OpenAI by default)
totalrecall котка
-# Using OpenAI TTS (requires API key in config)
-totalrecall котка --audio-provider openai
+# Using espeak-ng (free alternative)
+totalrecall котка --audio-provider espeak
# High-quality OpenAI with specific voice
totalrecall ябълка --audio-provider openai --openai-model tts-1-hd --openai-voice alloy
@@ -174,6 +210,15 @@ totalrecall куче --skip-images
# Generate Anki import file
totalrecall --batch words.txt --anki
+
+# Generate AI images with OpenAI DALL-E
+totalrecall ябълка --image-api openai
+
+# High-quality DALL-E 3 images
+totalrecall котка --image-api openai --openai-image-model dall-e-3 --openai-image-quality hd
+
+# Combine OpenAI audio and images
+totalrecall куче --audio-provider openai --image-api openai
```
### Batch File Format
@@ -213,9 +258,23 @@ Make sure espeak-ng is installed and in your PATH.
### OpenAI API errors
- Verify your API key is correct and has credits
-- Check the API key has TTS permissions enabled
+
+## Cost Considerations
+
+### OpenAI Services
+- **TTS Audio**: ~$0.015 per 1K characters (tts-1), ~$0.030 (tts-1-hd)
+- **DALL-E 2 Images**: ~$0.02 per image (512x512)
+- **DALL-E 3 Images**: ~$0.04 per image (standard), ~$0.08 (HD)
+- Both services cache results to avoid regenerating identical content
+
+### Free Alternatives
+- **Audio**: Use espeak-ng (free but robotic quality)
+- **Images**: Use Pixabay without API key (limited rate)
+
+### OpenAI Troubleshooting
+- Check the API key has proper permissions enabled
- If you get rate limit errors, wait a moment and try again
-- The tool will automatically fall back to espeak-ng if OpenAI fails
+- The tool will automatically fall back to espeak-ng if OpenAI audio fails
### Audio sounds robotic
The Bulgarian voice in espeak-ng can sound robotic. To improve quality: