2.3 Automated visual analyses

Last modified by Tomi Toivio on 2024/11/15 11:24

Multimodal LLMs

Machine learning tools

  • Whisper for audio-to-text transcription.
  • CLIP for image classification.
  • BLIP for image description generation.
  • OpenCV for object detection (e.g. with YOLO models), scene detection, etc.



Image recognition systems

Many commercial image recognition services automatically label image content, including Google Vision, Azure AI Vision, and Amazon Rekognition; the open-source library OpenCV can be used for similar tasks. Each takes an image as input and returns a set of labels for the content recognised in it.
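The image-in, labels-out workflow can be sketched with Amazon Rekognition as an example. This is a minimal sketch, not a recommended setup: it assumes a `client` object exposing Rekognition's `detect_labels` call (for instance one created with `boto3.client("rekognition")`), and the helper names `extract_labels` and `label_image` as well as the 80 % confidence threshold are illustrative assumptions.

```python
def extract_labels(response):
    """Return sorted, lower-cased label names from a DetectLabels-style response.

    The response is expected to look like
    {"Labels": [{"Name": "Sidewalk", "Confidence": 99.1}, ...]}.
    """
    return sorted({item["Name"].lower() for item in response.get("Labels", [])})


def label_image(client, image_path, min_confidence=80.0):
    """Send one image to the service and return the labels it recognised.

    `client` is assumed to provide Rekognition's `detect_labels` method;
    `min_confidence` (in percent) is an arbitrary illustrative threshold.
    """
    # Read the image as raw bytes, which is what the API accepts.
    with open(image_path, "rb") as image_file:
        image_bytes = image_file.read()

    response = client.detect_labels(
        Image={"Bytes": image_bytes},
        MinConfidence=min_confidence,
    )
    return extract_labels(response)
```

With boto3 installed and AWS credentials configured, a call such as `label_image(boto3.client("rekognition"), "Figure1.png")` would return a list of label names. Other services follow the same pattern but use different client libraries and response formats.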

Figure1.png

However, image recognition systems differ in the labels they provide. The table below shows the labels that three services returned for the image above, in the example given by Berg & Nelimarkka (2023). The services clearly differ both in the number of labels and in their content. To address these challenges, Berg & Nelimarkka developed a tool that evaluates the quality of labels using the Cross-service Label Agreement Score (COSLAB).

 

Google

Azure

AWS

apartment

 ✓

 

 

architecture

 

 

asphalt

 

 

backpack

 

 

 ✓

bag

 

 

building

campus

 ✓

 

car

 

 

city

 ✓

 ✓

cityscape

 

 

clothing

 

 

cloud

 

commercial building

 ✓

 

 

condo

 

 

condominium

 ✓

 

 

downtown

 ✓

 ✓

driveway

 ✓

 

 

evening

 ✓

 

 

facade

 ✓

 

 

footwear

 

 

 ✓

freeway

 

 

 ✓

grass

 ✓

 ✓

headquarters

 ✓

 

 

high rise

 

 

 ✓

highway

 

 

 ✓

home

 ✓

 

 

house

 ✓

 

 

housing

 

 

 ✓

intersection

 

 

landscape

 ✓

 

 

lane

 ✓

 

 

leisure

 ✓

 

 

metropolis

 

 

mixed-use

 ✓

 

 

nature

 

 

neighborhood

 

office building

 

outdoor

 

 

outdoors

 

 

park

 

 ✓

parking

 ✓

 

 

path

 

 

 ✓

person

 

 

plant

public space

 

 ✓

 

recreation

 ✓

 

 

road

 ✓

road surface

 

shadow

 ✓

 

 

shoe

 

 

 ✓

sidewalk

 ✓

 ✓

 ✓

sky

 

street

 ✓

 ✓

street light

 ✓

 

 

suburb

 ✓

 

 ✓

tar

 ✓

 

 

tarmac

 

 

 ✓

thoroughfare

 ✓

 ✓

 

tower block

 ✓

 

 

transportation

 

 ✓

tree

 ✓

 ✓

urban

 

 

urban design

 ✓

 

 

vegetation

 

 

 ✓

vehicle

 

 

 ✓

walking

 

 

 ✓

walkway

 ✓

 

 

n

37

15

39
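The disagreement between services can be made concrete with a simple overlap measure. The sketch below is not the COSLAB method of Berg & Nelimarkka (2023); it only counts exact label matches, using Jaccard overlap between pairs of services and a per-label count of how many services returned it. The label sets are a small illustrative subset of rows from the table above, and the function names are hypothetical.

```python
from itertools import combinations

# Illustrative subset of labels only; not the full output of any service.
labels_by_service = {
    "Google": {"apartment", "sidewalk", "suburb", "thoroughfare"},
    "Azure": {"public space", "sidewalk", "thoroughfare"},
    "AWS": {"backpack", "sidewalk", "suburb"},
}


def jaccard(a, b):
    """Share of labels two services have in common (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_agreement(labels):
    """Jaccard overlap for every pair of services, keyed by sorted service names."""
    return {
        (first, second): jaccard(labels[first], labels[second])
        for first, second in combinations(sorted(labels), 2)
    }


def label_support(labels):
    """Number of services that returned each label."""
    support = {}
    for service_labels in labels.values():
        for label in service_labels:
            support[label] = support.get(label, 0) + 1
    return support


agreement = pairwise_agreement(labels_by_service)
support = label_support(labels_by_service)
```

Exact matching treats near-synonyms such as "tar" and "tarmac" as disagreements, which is one reason a semantic agreement score can be more informative than raw overlap.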

Building image classifier

It is possible to build an image classifier by manually labelling different kinds of images and then training a neural network model (usually by fine-tuning a pretrained model) to replicate those labels. For details, see materials such as