[Athen] AI tool question
Kelly Ford via athen-list
athen-list at u.washington.edu
Sat Feb 1 18:44:39 PST 2025
Hi,
I suspect that with a bit more experimentation and/or better knowledge on my part, the following Python script could solve this situation. I used ChatGPT to create it, asking it to find all images in a Word document, get image descriptions from OpenAI, and then add those descriptions to the document. I couldn’t yet figure out how to have the descriptions used as alt text directly in the Word document, so they are currently added to the end of the document. I expect I need to learn more about the docx module for Python to solve that.
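One possible direction for the alt text problem, offered as an untested sketch rather than a confirmed fix: in the Word file format, an inline picture's alt text is stored in the descr attribute of its wp:docPr element, and python-docx appears to expose that element through the same private _inline attribute the script already relies on. The helper names set_alt_text and add_alt_text_to_document are hypothetical, and because _inline is a private attribute, this could break in a future python-docx release.

```python
def set_alt_text(inline_shape, description):
    """Write description into the alt text field of one inline picture.

    Assumption: inline_shape is a python-docx InlineShape whose private
    _inline element has a docPr child, and Word reads alt text from the
    'descr' attribute of that child.
    """
    doc_pr = inline_shape._inline.docPr
    doc_pr.set("descr", description)
    return doc_pr.get("descr")


def add_alt_text_to_document(input_path, output_path, descriptions):
    """Apply one description per inline picture, in document order.

    Hypothetical helper: descriptions is a list of strings, for example
    the text returned for each image by the description step.
    """
    import docx  # imported here so set_alt_text can be used on its own

    doc = docx.Document(input_path)
    for shape, description in zip(doc.inline_shapes, descriptions):
        set_alt_text(shape, description)
    doc.save(output_path)
```

Inside the existing loop, a call such as set_alt_text(shape, description) placed after the description is generated would be the equivalent step. Floating (non-inline) pictures are not covered by this sketch.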
Additionally, within the script, you may want to adjust the prompt text. It currently generates lengthy descriptions. Also note that the file locations are hard coded to a specific directory on my computer, so you’d want to modify those.
This does require an OpenAI API key and money on the account, but my experience in generating batches of descriptions with different APIs as I experiment is that, in general, I’ve spent less than one cent per description.
There may be entirely better ways to do this. I just don’t like the default automatic alt text from Word and this was a way to experiment with something I’ve already been doing to try different image description technology. This is by no means production ready.
Here is more from ChatGPT on all the requirements and then the script.
Here is a comprehensive guide to setting up and running a Python-based script that extracts images from a Word document, generates detailed descriptions using llm, and appends the descriptions to the same document under a dedicated section.
1. Prerequisites
Before running the script, ensure you have the following installed:
* Python 3.x (Recommended: Latest stable version)
* Download & install from: https://www.python.org/downloads/
* During installation, check the box for "Add Python to PATH"
* Virtual Environment (Recommended but Optional)
* Open Command Prompt (cmd.exe) and run:
* python -m venv myenv
* myenv\Scripts\activate (For Windows)
* source myenv/bin/activate (For macOS/Linux)
* Required Python Libraries
Install necessary dependencies via pip:
* pip install python-docx pillow lxml
* llm CLI Installation (if not installed)
* pip install llm
Test if llm is correctly installed by running:
llm --help
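The prerequisites above can also be checked from Python before running anything. This is a minimal sketch using only the standard library; the function name check_prerequisites is hypothetical, and the module names docx, PIL, and lxml are the import names that correspond to the pip packages listed above.

```python
import importlib.util
import shutil


def check_prerequisites(modules, commands):
    """Report which importable modules and command-line tools are present.

    Returns a dict mapping each name to True (found) or False (missing).
    """
    status = {}
    for module in modules:
        # find_spec returns None when a top-level module is not installed
        status[module] = importlib.util.find_spec(module) is not None
    for command in commands:
        # shutil.which returns None when the command is not on PATH
        status[command] = shutil.which(command) is not None
    return status


def report():
    """Print one line per prerequisite used by the script."""
    results = check_prerequisites(["docx", "PIL", "lxml"], ["llm"])
    for name, found in results.items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

Calling report() inside the activated virtual environment lists anything that still needs to be installed.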
2. File Structure Setup
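The script works with the following files and folder. Note that the input and output paths are hard coded in the script's configuration section (C:\playground by default), so either place your document there or edit those paths: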
* testinput.docx → Your input Word document with images
* testoutput.docx → The script will generate this as the output file
* extracted_images/ → The script automatically creates this folder for temporary image storage
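Since the file locations and the prompt are the settings most likely to change, they could be taken from the command line instead of being edited in the script. This is a sketch using the standard argparse module; the function name parse_args and the option names --image-dir and --prompt are hypothetical choices, not part of the original script.

```python
import argparse
from pathlib import Path

# Default prompt copied from the script's configuration section
DEFAULT_PROMPT = (
    "Describe this image in vivid detail including all colors, "
    "objects, and as many details as you can."
)


def parse_args(argv=None):
    """Parse input/output paths, image folder, and prompt."""
    parser = argparse.ArgumentParser(
        description="Describe the images in a Word document."
    )
    parser.add_argument("input_docx", type=Path, help="Word document to read")
    parser.add_argument("output_docx", type=Path, help="Word document to write")
    parser.add_argument(
        "--image-dir",
        type=Path,
        default=Path("extracted_images"),
        help="folder for temporary image files",
    )
    parser.add_argument(
        "--prompt",
        default=DEFAULT_PROMPT,
        help="prompt sent with each image; shorten it for briefer descriptions",
    )
    return parser.parse_args(argv)
```

With this in place, a run might look like: python process_word_images.py testinput.docx testoutput.docx --prompt "Describe this image in two sentences."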
3. Running the Script
1. Save the script as process_word_images.py
2. Run it from the command line:
   python process_word_images.py
3. The script will:
* Extract images from testinput.docx
* Send them to llm for detailed descriptions
* Append a "Detailed Image Descriptions" section at the end of testoutput.docx, listing each image with its description
4. Verifying Results
* Open testoutput.docx in Microsoft Word
* Scroll to the section "Detailed Image Descriptions"
* Check that each image has a heading and a generated description
If you encounter any issues, ensure:
* Python and required libraries are installed
* llm is correctly configured
* The API key is valid and properly set in the script
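The last item assumes the key is pasted into the script. An alternative, sketched here under assumptions, is to read the key from an environment variable and pass --key only when one is supplied. The function names are hypothetical. To my understanding the llm tool can also fall back to a key stored with its own key management or to the OPENAI_API_KEY environment variable when --key is omitted, but check the llm documentation before relying on that.

```python
import os


def key_from_environment(env=None):
    """Return the API key from OPENAI_API_KEY, or None if it is not set."""
    env = os.environ if env is None else env
    return env.get("OPENAI_API_KEY") or None


def build_llm_command(prompt, image_path, api_key=None):
    """Build the argument list for one llm call, as used by the script.

    The --key option is added only when a key is given, so the key never
    has to be written into the script itself.
    """
    command = ["llm", prompt, "-a", image_path]
    if api_key:
        command += ["--key", api_key]
    return command
```

The resulting list can be passed to subprocess.run in place of the hard-coded list in the script.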
import os
import subprocess

import docx

# Configuration: adjust the paths, key, and prompt for your own setup
INPUT_DOCX = r"C:\playground\testinput.docx"
OUTPUT_DOCX = r"C:\playground\testoutput.docx"
IMAGE_DIR = "extracted_images"
API_KEY = "<your api key here>"
PROMPT = "Describe this image in vivid detail including all colors, objects, and as many details as you can."

# Ensure the image directory exists
os.makedirs(IMAGE_DIR, exist_ok=True)

# Load the Word document
doc = docx.Document(INPUT_DOCX)

# Track images found
image_index = 0

# Heading for the section that will hold all descriptions
doc.add_paragraph("Detailed Image Descriptions", style="Heading 2")

# Process each inline shape (image) in the document
for shape in doc.inline_shapes:
    image_index += 1
    # Note: every image is saved with a .png name, whatever its real format
    image_name = f"image_{image_index}.png"
    image_path = os.path.join(IMAGE_DIR, image_name)

    # Look up the relationship id of the embedded image, then its bytes
    image_data = shape._inline.graphic.graphicData.pic.blipFill.blip.embed
    image_part = doc.part.rels[image_data].target_part
    image_bytes = image_part.blob

    # Save extracted image
    with open(image_path, "wb") as img_file:
        img_file.write(image_bytes)
    print(f"Processing image: {image_name}")

    # Get description using LLM
    try:
        result = subprocess.run(
            ["llm", PROMPT, "-a", image_path, "--key", API_KEY],
            capture_output=True, text=True, check=True
        )
        description = result.stdout.strip()
    except subprocess.CalledProcessError as e:
        description = f"Error processing image: {e.stderr}"

    # Append description at the end with heading
    doc.add_paragraph(f"Image {image_index}: {image_name}", style="Heading 3")
    doc.add_paragraph(description)
    print(f"Added description for {image_name}")

# Save the updated document
doc.save(OUTPUT_DOCX)
print(f"Processing complete. Descriptions added at the end of {OUTPUT_DOCX}.")
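One detail in the script that may matter: every extracted image is saved with a .png name, even if the embedded image is a JPEG or another format. A small sketch of choosing the extension from the image's content type follows. The function name image_file_name and the mapping table are hypothetical, and the suggestion that image_part.content_type holds a MIME type such as image/jpeg is an assumption about python-docx to verify.

```python
# Hypothetical mapping from common image MIME types to file extensions
EXTENSIONS = {
    "image/png": ".png",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
    "image/bmp": ".bmp",
    "image/tiff": ".tiff",
}


def image_file_name(index, content_type):
    """Return a file name such as image_3.jpg for the given content type.

    Unknown content types get a .bin extension so they are easy to spot.
    """
    extension = EXTENSIONS.get(content_type, ".bin")
    return f"image_{index}{extension}"
```

In the loop, this would replace the fixed name with something like image_file_name(image_index, image_part.content_type), computed after image_part is looked up.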
From: Kelly Ford
Sent: Saturday, February 1, 2025 4:22 PM
To: enews at toptechtidbits.com; athen-list at u.washington.edu
Cc: Deborah Armstrong <armstrongdeborah at fhda.edu>; athen-list at u.washington.edu
Subject: Re: [Athen] AI tool question
There are still possible solutions here, but the last one I would suggest is the automatic alt text generation in Office 365. It is far and away not using any of the newest technology available to generate descriptions, even the newest from Microsoft.
On Feb 1, 2025, at 4:01 PM, Top Tech Tidbits via athen-list <athen-list at u.washington.edu> wrote:
I could not agree with you more Debee. While there isn’t yet a widely known tool that fully automates the process of describing all images in a Word document or PDF at once, there are some promising approaches and partial solutions:
1. Microsoft 365’s Accessibility Checker & Alt Text Auto-Generation
* Word and PowerPoint in Microsoft 365 have an automatic alt text generation feature. It can generate descriptions for images, but it requires manual review and editing.
* You can run the Accessibility Checker (Review > Check Accessibility) to find missing alt text and fill in some of it automatically.
2. Adobe Acrobat Pro (for PDFs)
* Acrobat has "Auto-Tag Document" in its accessibility tools, which sometimes adds descriptions, but they are basic and often need improvement.
* You can also extract all images and process them separately with an AI tool.
3. Seeing AI & Lookout (for scanning documents)
* If the document is printed or saved as an image-based PDF, Seeing AI (iOS) and Google Lookout (Android) can scan pages and read out descriptions of images alongside the text.
4. Custom AI Workflows (Python + GPT-based models)
* If you're open to a more technical approach, you can extract images from a document using Python (with PyMuPDF or pdf2image for PDFs, or python-docx for Word) and run them through AI vision models like OpenAI's GPT-4V, Google Vision AI, or Microsoft Azure Computer Vision to generate descriptions automatically.
5. Be My Eyes (AI-Powered Virtual Assistant Mode)
* The latest AI-powered "Virtual Volunteer" mode in Be My Eyes (powered by GPT-4V) allows users to upload an entire document with images and get descriptions, though still not fully automated.
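As an illustration of the image extraction step in the custom workflow item above, here is a sketch that uses only Python's standard zipfile module rather than the libraries named in the list. It relies on the fact that a .docx file is a ZIP archive in which embedded images are normally stored under word/media/. The function name extract_docx_media is hypothetical, and the description step (sending each file to a vision model) is left out.

```python
import zipfile
from pathlib import Path


def extract_docx_media(docx_path, out_dir):
    """Copy every file under word/media/ in a .docx into out_dir.

    Returns the list of saved paths in archive order. Unlike iterating
    over inline shapes, this also picks up images that are not inline,
    but it does not tell you where in the document each image appears.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            # Skip directory entries and anything outside word/media/
            if name.startswith("word/media/") and not name.endswith("/"):
                target = out_dir / Path(name).name
                target.write_bytes(archive.read(name))
                saved.append(target)
    return saved
```

Each saved file could then be passed to whichever description tool is being tried.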
Possible Future Solutions
It would be amazing if JAWS, NVDA, or another screen reader implemented a feature where you could just press a button, and it would automatically describe all images in a document in one go. Hopefully, developers will take note of this need!
Aaron Di Blasi, PMP
From: athen-list <athen-list-bounces at mailman12.u.washington.edu> On Behalf Of Deborah Armstrong via athen-list
Sent: Saturday, February 1, 2025 5:04 PM
To: Access Technology Higher Education Network <athen-list at u.washington.edu>
Subject: [Athen] AI tool question
Has anyone found a tool that will automatically describe all pictures in a Word document or PDF, such as a class handout, a slide deck or a textbook chapter?
I know JAWS has a great Picture Smart AI feature that lets you locate a graphic on a web page or in a document and have it thoroughly described, but the user has to locate the picture, focus on it, and hit the right keystrokes.
And users of other screen readers can download the free Be My Eyes app for Windows to do the same thing.
A variety of iPhone and Android apps also describe pictures and scenes for the visually impaired, including Seeing AI, Lookout, Speak-A-Boo, Focus Assist and Be My Eyes. And of course the Meta smart glasses are super for this as well if properly prompted.
But I know of no tool that has automated this for an entire document.
It would be so cool if such a tool existed.
--Debee