| author | Paul Buetow <paul@buetow.org> | 2025-07-15 07:48:58 +0300 |
|---|---|---|
| committer | Paul Buetow <paul@buetow.org> | 2025-07-15 07:48:58 +0300 |
| commit | c791e9fdd57af52599de266facbaba0077f31558 (patch) | |
| tree | a5eb002a02b4cf279897d748bdcdc5b179c50098 /README.md | |
| parent | b03d096d12df59b66cf52991c46dfce44c20ae3b (diff) | |
feat: add OpenAI DALL-E image generation and make OpenAI default (tag: v0.1.0)
- Implement OpenAI DALL-E provider for generating educational flashcard images
- Add support for DALL-E 2 and DALL-E 3 with configurable size, quality, and style
- Implement intelligent caching to minimize API costs
- Make OpenAI the default provider for both audio (TTS) and images (DALL-E)
- Add automatic fallback to free alternatives (espeak/pixabay) when OpenAI unavailable
- Fix bug where cached images couldn't be copied to output directory
- Update documentation with OpenAI setup instructions and examples
- Add comprehensive unit tests for OpenAI image provider
- Bump version to 0.1.0
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
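
The automatic fallback mentioned in the commit message could look roughly like the Go sketch below. It is an illustration under assumptions, not the project's code: the `AudioProvider` interface, `stubProvider`, and `generateWithFallback` are hypothetical names, and the stubs only simulate a failing or working backend.

```go
package main

import (
	"errors"
	"fmt"
)

// AudioProvider is a hypothetical interface for this sketch; the real
// totalrecall types may look different.
type AudioProvider interface {
	Name() string
	Generate(word string) (path string, err error)
}

// stubProvider stands in for a real backend such as OpenAI TTS or espeak-ng.
type stubProvider struct {
	name string
	fail bool // simulate an unavailable service, e.g. a missing API key
}

func (s stubProvider) Name() string { return s.name }

func (s stubProvider) Generate(word string) (string, error) {
	if s.fail {
		return "", errors.New("service unavailable")
	}
	return word + "." + s.name + ".mp3", nil
}

// generateWithFallback tries each provider in order and returns the output
// path and the name of the first provider that succeeds.
func generateWithFallback(word string, providers ...AudioProvider) (string, string, error) {
	if len(providers) == 0 {
		return "", "", errors.New("no providers configured")
	}
	var errs []error
	for _, p := range providers {
		path, err := p.Generate(word)
		if err == nil {
			return path, p.Name(), nil
		}
		// Remember why this provider failed, then try the next one.
		errs = append(errs, fmt.Errorf("%s: %w", p.Name(), err))
	}
	return "", "", errors.Join(errs...)
}

func main() {
	openai := stubProvider{name: "openai", fail: true} // pretend the API key is missing
	espeak := stubProvider{name: "espeak"}
	path, used, err := generateWithFallback("котка", openai, espeak)
	fmt.Println(path, used, err)
}
```

Listing the preferred provider first and the free one last keeps the default behaviour (OpenAI) while still producing output when the paid service is unreachable.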
Diffstat (limited to 'README.md')
| -rw-r--r-- | README.md | 91 |
1 files changed, 75 insertions, 16 deletions
@@ -1,13 +1,18 @@
 # totalrecall - Bulgarian Anki Flashcard Generator
 
-`totalrecall` is a command-line tool that generates Anki flashcard materials from Bulgarian words. It creates audio pronunciation files using espeak-ng or OpenAI TTS and downloads representative images from web search APIs.
+`totalrecall` is a command-line tool that generates Anki flashcard materials from Bulgarian words. It creates audio pronunciation files and generates images using AI.
+
+⚠️ **Important:** This tool uses OpenAI services by default, which requires an API key. See [Quick Start](#quick-start) for setup instructions or use the free alternatives with `--audio-provider espeak --image-api pixabay`.
 
 ## Features
 
 - Audio generation with multiple providers:
   - **espeak-ng**: Free, offline Bulgarian voices (robotic quality)
   - **OpenAI TTS**: High-quality, natural-sounding voices (requires API key)
-- Image search via Pixabay and Unsplash APIs
+- Image search and generation:
+  - **Pixabay**: Free stock photo search (optional API key)
+  - **Unsplash**: High-quality photo search (requires API key)
+  - **OpenAI DALL-E**: AI-generated educational images (requires API key)
 - Batch processing of multiple words
 - Anki-compatible CSV export
 - Configurable voice variants and speech speed
@@ -59,17 +64,27 @@ go install codeberg.org/snonux/totalrecall/cmd/totalrecall@latest
 
 ## Quick Start
 
-1. Generate materials for a single word:
+**Note:** By default, totalrecall uses OpenAI for both audio and images. Make sure to set your OpenAI API key:
+```bash
+export OPENAI_API_KEY="sk-..."
+```
+
+1. Generate materials for a single word (uses OpenAI by default):
    ```bash
    totalrecall ябълка
    ```
 
-2. Process multiple words from a file:
+2. Use free alternatives (espeak + pixabay):
+   ```bash
+   totalrecall ябълка --audio-provider espeak --image-api pixabay
+   ```
+
+3. Process multiple words from a file:
    ```bash
    totalrecall --batch words.txt
    ```
 
-3. Generate with Anki CSV:
+4. Generate with Anki CSV:
    ```bash
    totalrecall ябълка --anki
    ```
@@ -80,7 +95,7 @@ Create a `.totalrecall.yaml` file in your home directory or project folder:
 
 ```yaml
 audio:
-  provider: openai  # Audio provider (espeak or openai)
+  provider: openai  # Audio provider (espeak or openai) - default: openai
   format: mp3  # Audio format (wav or mp3)
 
   # ESpeak settings
@@ -99,10 +114,19 @@ audio:
   cache_dir: "./.audio_cache"
 
 image:
-  provider: pixabay  # Image provider (pixabay or unsplash)
+  provider: openai  # Image provider (pixabay, unsplash, or openai) - default: openai
   pixabay_key: ""  # Optional API key for higher limits
   unsplash_key: ""  # Required for Unsplash
-  size: medium  # Image size preference
+
+  # OpenAI DALL-E settings
+  openai_model: "dall-e-2"  # Model: dall-e-2 or dall-e-3
+  openai_size: "512x512"  # Size: 256x256, 512x512, 1024x1024
+  openai_quality: "standard"  # Quality: standard or hd (dall-e-3 only)
+  openai_style: "natural"  # Style: natural or vivid (dall-e-3 only)
+
+  # Caching
+  enable_cache: true
+  cache_dir: "./.image_cache"
 
 output:
   directory: ./anki_cards
@@ -125,21 +149,27 @@ totalrecall [word] [flags]
 - `--skip-audio`: Skip audio generation
 - `--skip-images`: Skip image download
 - `--images-per-word int`: Number of images per word (default 1)
-- `--image-api string`: Image source - pixabay or unsplash (default "pixabay")
+- `--image-api string`: Image source - pixabay, unsplash, or openai (default "openai")
 
 #### Audio Provider Options
-- `--audio-provider string`: Audio provider - espeak or openai (default "espeak")
+- `--audio-provider string`: Audio provider - espeak or openai (default "openai")
 
 #### ESpeak Tuning Options
 - `--pitch int`: Pitch adjustment 0-99 (default 50, lower=deeper, espeak only)
 - `--amplitude int`: Volume 0-200 (default 100, espeak only)
 - `--word-gap int`: Gap between words in 10ms units (default 0, espeak only)
 
-#### OpenAI Options
+#### OpenAI Audio Options
 - `--openai-model string`: Model - tts-1 or tts-1-hd (default "tts-1")
 - `--openai-voice string`: Voice - alloy, echo, fable, onyx, nova, shimmer (default "nova")
 - `--openai-speed float`: Speech speed 0.25-4.0 (default 1.0)
 
+#### OpenAI Image Options
+- `--openai-image-model string`: Model - dall-e-2 or dall-e-3 (default "dall-e-2")
+- `--openai-image-size string`: Size - 256x256, 512x512, 1024x1024 (default "512x512")
+- `--openai-image-quality string`: Quality - standard or hd (default "standard", dall-e-3 only)
+- `--openai-image-style string`: Style - natural or vivid (default "natural", dall-e-3 only)
+
 ## API Keys
 
 ### Pixabay
@@ -150,15 +180,21 @@ totalrecall [word] [flags]
 - Required for Unsplash searches
 - Get your key at: https://unsplash.com/developers
 
+### OpenAI
+- Required for both OpenAI TTS audio and DALL-E image generation
+- Get your key at: https://platform.openai.com/api-keys
+- Set via environment variable: `export OPENAI_API_KEY="sk-..."`
+- Or add to config file as `audio.openai_key`
+
 ## Examples
 
 ### Basic Usage
 ```bash
-# Single word with espeak-ng
+# Single word (uses OpenAI by default)
 totalrecall котка
 
-# Using OpenAI TTS (requires API key in config)
-totalrecall котка --audio-provider openai
+# Using espeak-ng (free alternative)
+totalrecall котка --audio-provider espeak
 
 # High-quality OpenAI with specific voice
 totalrecall ябълка --audio-provider openai --openai-model tts-1-hd --openai-voice alloy
@@ -174,6 +210,15 @@ totalrecall куче --skip-images
 
 # Generate Anki import file
 totalrecall --batch words.txt --anki
+
+# Generate AI images with OpenAI DALL-E
+totalrecall ябълка --image-api openai
+
+# High-quality DALL-E 3 images
+totalrecall котка --image-api openai --openai-image-model dall-e-3 --openai-image-quality hd
+
+# Combine OpenAI audio and images
+totalrecall куче --audio-provider openai --image-api openai
 ```
 
 ### Batch File Format
@@ -213,9 +258,23 @@ Make sure espeak-ng is installed and in your PATH.
 
 ### OpenAI API errors
 - Verify your API key is correct and has credits
-- Check the API key has TTS permissions enabled
+
+## Cost Considerations
+
+### OpenAI Services
+- **TTS Audio**: ~$0.015 per 1K characters (tts-1), ~$0.030 (tts-1-hd)
+- **DALL-E 2 Images**: ~$0.02 per image (512x512)
+- **DALL-E 3 Images**: ~$0.04 per image (standard), ~$0.08 (HD)
+- Both services cache results to avoid regenerating identical content
+
+### Free Alternatives
+- **Audio**: Use espeak-ng (free but robotic quality)
+- **Images**: Use Pixabay without API key (limited rate)
+
+### OpenAI Troubleshooting
+- Check the API key has proper permissions enabled
 - If you get rate limit errors, wait a moment and try again
-- The tool will automatically fall back to espeak-ng if OpenAI fails
+- The tool will automatically fall back to espeak-ng if OpenAI audio fails
 
 ### Audio sounds robotic
 The Bulgarian voice in espeak-ng can sound robotic. To improve quality:
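
To make the approximate prices in the "Cost Considerations" section of this diff concrete, here is a small Go sketch that estimates the cost of a batch run with the defaults (tts-1 audio, one 512x512 DALL-E 2 image per word). The function name `estimateCost` is hypothetical, the prices are copied from the README text and may be outdated, and the sketch assumes that nothing is served from cache and that billing counts one character per Unicode code point.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Approximate USD prices taken from the README's "Cost Considerations"
// section. They are estimates, not authoritative pricing.
const (
	ttsPer1KChars  = 0.015 // tts-1, per 1,000 characters
	dalle2PerImage = 0.02  // DALL-E 2, 512x512
)

// estimateCost returns a rough cost for generating one audio clip and
// imagesPerWord images for every word, assuming no cache hits.
func estimateCost(words []string, imagesPerWord int) float64 {
	chars := 0
	for _, w := range words {
		// Count code points, not bytes: Cyrillic letters take two bytes in UTF-8.
		chars += utf8.RuneCountInString(w)
	}
	audio := float64(chars) / 1000 * ttsPer1KChars
	images := float64(len(words)*imagesPerWord) * dalle2PerImage
	return audio + images
}

func main() {
	words := []string{"ябълка", "котка", "куче"}
	fmt.Printf("estimated cost for %d words: $%.4f\n", len(words), estimateCost(words, 1))
}
```

For short vocabulary words the image price dominates the total, which is why the image cache and the free Pixabay alternative matter more for cost than the choice of TTS model.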
