# totalrecall - Bulgarian Anki Flashcard Generator

`totalrecall` is a command-line tool that generates Anki flashcard materials from Bulgarian words. It creates audio pronunciation files and generates images using AI.

It has mainly been vibe coded using Claude Code CLI.

⚠️ **Important:** This tool uses OpenAI services for audio and image generation, which requires an API key. See [Quick Start](#quick-start) for setup instructions.

## Features

- Audio generation using **OpenAI TTS**: High-quality, natural-sounding voices (requires API key)
  - Random voice selection by default for variety
  - Option to generate in all 11 available voices
- Automatic Bulgarian to English translation
  - Saves translations to separate text files
  - Includes translations in Anki CSV export
- Image generation:
  - **OpenAI DALL-E**: AI-generated educational images with random art styles (requires API key)
- Batch processing of multiple words
- Anki-compatible CSV export with translations
- Configurable voice variants and speech speed
- Support for WAV and MP3 audio formats
- Audio and image caching to save API costs

## Installation

### Prerequisites

1. **For OpenAI TTS** (required for audio generation):
   - Create an account at https://platform.openai.com
   - Generate an API key at https://platform.openai.com/api-keys
   - Set the key using one of these methods:
     - Environment variable: `export OPENAI_API_KEY="sk-..."`
     - Configuration file: Add to `.totalrecall.yaml`
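
A quick way to confirm that one of the two methods above is in place before running the tool is sketched below. The `has_openai_key` helper is hypothetical and not part of totalrecall. It assumes keys begin with `sk-`, as in the examples here, and that the config file sits in the current directory or in your home directory.

```bash
# Hypothetical helper, not part of totalrecall.
# Usage: has_openai_key [CONFIG_FILE...]
# Succeeds if OPENAI_API_KEY is set, or if any listed config file has an
# openai_key entry whose value starts with "sk-".
has_openai_key() {
  if [ -n "${OPENAI_API_KEY:-}" ]; then
    return 0
  fi
  # Default locations are an assumption, not confirmed tool behavior.
  if [ "$#" -eq 0 ]; then
    set -- ./.totalrecall.yaml "$HOME/.totalrecall.yaml"
  fi
  for cfg in "$@"; do
    if [ -f "$cfg" ] && grep -Eq '^[[:space:]]*openai_key:[[:space:]]*"?sk-' "$cfg"; then
      return 0
    fi
  done
  return 1
}
```

For example, `has_openai_key || echo "No OpenAI key found" >&2` prints a warning when neither method is configured.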

### Building from Source

```bash
git clone https://codeberg.org/snonux/totalrecall.git
cd totalrecall
go build -o totalrecall ./cmd/totalrecall
```

Or install directly:

```bash
go install codeberg.org/snonux/totalrecall/cmd/totalrecall@latest
```

## Quick Start

**Note:** By default, totalrecall uses OpenAI for both audio and images. Make sure to set your OpenAI API key:
```bash
export OPENAI_API_KEY="sk-..."
```

1. Generate materials for a single word (uses OpenAI by default):
   ```bash
   totalrecall ябълка
   ```

2. Generate with a specific DALL-E model:
   ```bash
   totalrecall ябълка --openai-image-model dall-e-3
   ```

3. Process multiple words from a file:
   ```bash
   totalrecall --batch words.txt
   ```

4. Generate materials together with an Anki import CSV:
   ```bash
   totalrecall ябълка --anki
   ```

## Configuration

Create a `.totalrecall.yaml` file in your home directory or project folder:

```yaml
audio:
  format: mp3           # Audio format (wav or mp3)
  
  # OpenAI settings
  openai_key: "sk-..."  # Your OpenAI API key
  openai_model: "gpt-4o-mini-tts" # Model: tts-1, tts-1-hd, or gpt-4o-mini-tts
  openai_voice: "nova"  # Voice: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse
  openai_speed: 0.8     # Speed: 0.25 to 4.0 (may be ignored by gpt-4o-mini models)
  openai_instruction: "You are speaking Bulgarian language (български език). Pronounce the Bulgarian text with authentic Bulgarian phonetics, not Russian." # For gpt-4o-mini models only
  
  # Caching
  enable_cache: true
  cache_dir: "./.audio_cache"

image:
  provider: openai       # Image provider (currently only openai is supported)
  
  # OpenAI DALL-E settings
  openai_model: "dall-e-2"  # Model: dall-e-2 or dall-e-3
  openai_size: "512x512"    # Size: 256x256, 512x512, 1024x1024
  openai_quality: "standard" # Quality: standard or hd (dall-e-3 only)
  openai_style: "natural"    # Style: natural or vivid (dall-e-3 only)
  
  # Caching
  enable_cache: true
  cache_dir: "./.image_cache"

output:
  directory: ./anki_cards
  naming: "{word}_{type}"
```

## Usage

```bash
totalrecall [word] [flags]
```

### Flags

- `-v, --voice string`: Voice variant (default "bg+f1")
- `-o, --output string`: Output directory (default "./anki_cards")
- `-f, --format string`: Audio format - wav or mp3 (default "mp3")
- `--batch string`: Process words from file (one per line)
- `--anki`: Generate Anki import CSV file
- `--skip-audio`: Skip audio generation
- `--skip-images`: Skip image generation
- `--images-per-word int`: Number of images per word (default 1)
- `--image-api string`: Image source - currently only openai is supported (default "openai")
- `--all-voices`: Generate audio in all available OpenAI voices (creates 11 files per word)

#### OpenAI Audio Options
- `--openai-model string`: Model - tts-1, tts-1-hd, or gpt-4o-mini-tts (default "gpt-4o-mini-tts", requires special access)
- `--openai-voice string`: Voice - alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, verse (default: random)
- `--openai-speed float`: Speech speed 0.25-4.0 (default 0.8, may be ignored by gpt-4o-mini-tts)
- `--openai-instruction string`: Voice instructions for gpt-4o-mini-tts model (e.g., "speak with a Bulgarian accent")

#### OpenAI Image Options
- `--openai-image-model string`: Model - dall-e-2 or dall-e-3 (default "dall-e-2")
- `--openai-image-size string`: Size - 256x256, 512x512, 1024x1024 (default "512x512")
- `--openai-image-quality string`: Quality - standard or hd (default "standard", dall-e-3 only)
- `--openai-image-style string`: Style - natural or vivid (default "natural", dall-e-3 only)

## API Keys

### OpenAI
- Required for both OpenAI TTS audio and DALL-E image generation
- Get your key at: https://platform.openai.com/api-keys
- Set via environment variable: `export OPENAI_API_KEY="sk-..."`
- Or add to config file as `audio.openai_key`

## Examples

### Basic Usage
```bash
# Single word (uses OpenAI by default)
totalrecall котка

# High-quality OpenAI with specific voice
totalrecall ябълка --openai-model tts-1-hd --openai-voice alloy

# Use gpt-4o-mini-tts with custom voice instructions
totalrecall ябълка --openai-instruction "Speak like a patient Bulgarian teacher, very slowly and clearly"

# Multiple words with custom output
totalrecall --batch animals.txt -o ./animal_cards

# Skip images, audio only
totalrecall куче --skip-images

# Generate Anki import file
totalrecall --batch words.txt --anki

# Generate AI images with OpenAI DALL-E
totalrecall ябълка --image-api openai

# High-quality DALL-E 3 images
totalrecall котка --image-api openai --openai-image-model dall-e-3 --openai-image-quality hd

# Combine OpenAI audio and images
totalrecall куче --image-api openai

# Generate audio in all 11 OpenAI voices
totalrecall котка --all-voices --skip-images
```

### Batch File Format
Create a text file with one Bulgarian word per line:
```
ябълка
котка
куче
хляб
вода
```
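
Word lists pasted from other sources often carry stray whitespace, Windows line endings, blank lines, or repeated entries. How totalrecall treats those is not stated here, so the sketch below normalizes a list before it is passed to `--batch`. The `clean_wordlist` name is hypothetical and not part of the tool.

```bash
# Hypothetical helper, not part of totalrecall.
# Usage: clean_wordlist FILE
# Prints FILE with carriage returns removed, surrounding whitespace trimmed,
# blank lines dropped, and repeated words removed (first occurrence wins).
clean_wordlist() {
  tr -d '\r' < "$1" \
    | LC_ALL=C sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' \
    | LC_ALL=C awk 'NF && !seen[$0]++'
}
```

A typical use would be `clean_wordlist raw.txt > words.txt` followed by `totalrecall --batch words.txt`.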

### Output Files
For each word, the tool generates:
- `word.mp3` - Audio pronunciation (random voice)
- `word_translation.txt` - English translation
- `word_1.jpg`, `word_2.jpg`, etc. - Generated images
- `anki_import.csv` - Anki import file (when using --anki flag)

With `--all-voices` flag:
- `word_alloy.mp3`, `word_nova.mp3`, etc. - Audio in all 11 voices
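
With 11 files per word, an incomplete `--all-voices` run can be easy to miss. The following sketch lists the voices for which no file exists yet. It assumes MP3 output and the `word_voice.mp3` naming shown above; `missing_voices` is a hypothetical helper, not part of totalrecall.

```bash
# Hypothetical helper, not part of totalrecall.
# Usage: missing_voices OUTPUT_DIR WORD
# Prints one voice name per line for every expected file that is absent.
missing_voices() (
  dir=$1
  word=$2
  # Voice list taken from the --openai-voice flag description.
  for voice in alloy ash ballad coral echo fable onyx nova sage shimmer verse; do
    [ -e "$dir/${word}_${voice}.mp3" ] || echo "$voice"
  done
)
```

Running `missing_voices ./anki_cards котка` prints nothing when all 11 files are present.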

## Anki Import

1. Generate materials with the `--anki` flag
2. In Anki, go to File → Import
3. Select the generated `anki_import.csv`
4. Copy all media files to your Anki media folder
5. Map fields appropriately during import
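
Step 4 can be scripted. The sketch below copies only audio and image files and leaves the CSV and translation files where they are. `copy_media` is a hypothetical helper, and the location of the Anki media folder (named `collection.media` inside your Anki profile folder) varies by platform, so it is passed in as an argument rather than guessed.

```bash
# Hypothetical helper, not part of totalrecall.
# Usage: copy_media SOURCE_DIR ANKI_MEDIA_DIR
copy_media() (
  src=$1
  dest=$2
  if [ ! -d "$dest" ]; then
    echo "media folder not found: $dest" >&2
    exit 1
  fi
  copied=0
  for f in "$src"/*.mp3 "$src"/*.wav "$src"/*.jpg; do
    [ -e "$f" ] || continue   # skip patterns that matched nothing
    cp -- "$f" "$dest"/
    copied=$((copied + 1))
  done
  echo "copied $copied media file(s) to $dest"
)
```

On a typical Linux install this might be called as `copy_media ./anki_cards "$HOME/.local/share/Anki2/User 1/collection.media"`; treat that path as an assumption and check your own profile location first.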

## Voice Variants

Available Bulgarian voices:
- `bg` - Default Bulgarian voice
- `bg+m1`, `bg+m2`, `bg+m3` - Male voices
- `bg+f1`, `bg+f2`, `bg+f3` - Female voices

## Troubleshooting


### No images found
- Check your internet connection
- Verify API keys in configuration
- Try using English translations for better results

### OpenAI API errors
- Verify your API key is correct and has credits

## Cost Considerations

### OpenAI Services
- **TTS Audio**: ~$0.015 per 1K characters (tts-1), ~$0.030 (tts-1-hd)
- **DALL-E 2 Images**: ~$0.02 per image (512x512)
- **DALL-E 3 Images**: ~$0.04 per image (standard), ~$0.08 (HD)

### Cost Savings
- Both audio and images are cached to avoid regenerating identical content
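
As a rough worked example, the per-unit prices listed under OpenAI Services can be combined into an estimate for a whole word list. The `estimate_cost` function below is a hypothetical sketch: it ignores caching, takes the prices as arguments because they may change, and does not cover gpt-4o-mini-tts, for which no price is given here.

```bash
# Hypothetical helper, not part of totalrecall.
# Usage: estimate_cost WORDS AVG_CHARS_PER_WORD IMAGES_PER_WORD \
#                      TTS_PRICE_PER_1K_CHARS PRICE_PER_IMAGE
estimate_cost() {
  LC_ALL=C awk -v words="$1" -v chars="$2" -v images="$3" \
               -v tts="$4" -v img="$5" 'BEGIN {
    audio = words * chars / 1000 * tts   # TTS is billed per 1K characters
    pics  = words * images * img         # images are billed per image
    printf "audio=%.4f images=%.2f total=%.4f\n", audio, pics, audio + pics
  }'
}

# 100 words of about 6 characters, one DALL-E 2 image each, tts-1 audio:
estimate_cost 100 6 1 0.015 0.02
# prints: audio=0.0090 images=2.00 total=2.0090
```

In this example the images account for nearly all of the cost, so `--skip-images` or a smaller `--images-per-word` has far more effect on the bill than the choice of TTS model.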

### OpenAI Troubleshooting
- Check the API key has proper permissions enabled
- If you get rate limit errors, wait a moment and try again
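
The "wait and try again" advice can be automated with a small retry wrapper. This sketch assumes that totalrecall exits with a non-zero status when an API call fails, which is not confirmed here, and it retries on any failure rather than on rate limits specifically. `retry` is a hypothetical helper.

```bash
# Hypothetical helper, not part of totalrecall.
# Usage: retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds, doubling the delay after each failure.
retry() (
  max=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempt(s): $*" >&2
      exit 1
    fi
    echo "attempt $attempt failed; retrying in ${delay}s" >&2
    sleep "$delay"
    attempt=$((attempt + 1))
    delay=$((delay * 2))
  done
)
```

For instance, `retry 4 5 totalrecall --batch words.txt` would wait 5, 10, and then 20 seconds between attempts. Because results are cached, words that already succeeded should not be regenerated on a retry.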


### OpenAI TTS Configuration

OpenAI TTS provides natural Bulgarian pronunciation:

```bash
# Option 1: Use environment variable
export OPENAI_API_KEY="sk-your-key-here"
totalrecall ябълка

# Option 2: Set the key in .totalrecall.yaml instead (YAML, not a shell command):
#   audio:
#     openai_key: "sk-your-key-here"

# Use with custom voice
totalrecall ябълка --openai-voice alloy
```

**OpenAI TTS Models**:
- **gpt-4o-mini-tts** (default): New model with voice instruction support for customizable speech styles. Requires special API access.
- **tts-1**: Standard quality at $0.015 per 1K characters (~$0.0001 per word)
- **tts-1-hd**: Higher quality at $0.030 per 1K characters (~$0.0002 per word)

The gpt-4o-mini-tts model allows you to control how the voice speaks using natural language instructions, making it ideal for language learning applications. The tool caches audio to avoid repeated API calls for the same words.

## License

MIT License - see LICENSE file for details