VizWiz-Visual Grounding

MOTIVATION

We aim to build a smart system that can automatically answer visual questions from blind people. The images and questions you will see come from visually impaired people, and the answers come from crowd workers.

TASK

In this task, we ask you to carefully review the question, image, and the answer provided, and then finish step 1, step 2, step 3 (if activated), and step 4. There are examples to follow for each step.

Step 1: Indicate if there is more than one question asked.

Show step 1's examples

Question: Could you tell me the expiration date on this milk and is this lactose free?
Answer: Feb 20 2021. Yes.

Yes

No

Though there are two question markers, only one question needs to be answered.

Question: Is this a chair what color?
Answer: Yes. Black.

Yes

No

It just refers to one object.

Step 2: Indicate if the answer refers to multiple regions/objects.

Question: What is this?
Answer: Mushroom.

No

It refers to the mushroom as a whole.

Question: What is this?
Answer: Rice Vinegar, garlic powder, and pepper.

Yes

The answer focuses on multiple objects.

Question: How many mushrooms are there?
Answer: Around 60

Yes

Usually, the answer to a counting question refers to more than one object, unless the count is one.

Question: How many bottles are there?
Answer: 5

Yes

The answer is referring to 5 objects instead of one.

If both Step 1 and Step 2 are "No", go to step 3. Otherwise, step 3 will not be activated; please skip it and go to step 4.
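The activation rule above can be sketched as a small check. This is only an illustration: the function name `step3_is_activated` and its string parameters are assumptions made for the example, not part of the task interface.

```python
def step3_is_activated(step1_answer: str, step2_answer: str) -> bool:
    """Return True only when both Step 1 and Step 2 were answered "No"."""
    # Hypothetical helper: answers are the labels of the Yes/No options.
    return step1_answer == "No" and step2_answer == "No"


# Both answers are "No": step 3 is activated.
print(step3_is_activated("No", "No"))   # True
# Any "Yes": skip step 3 and go to step 4.
print(step3_is_activated("Yes", "No"))  # False
```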

Step 3: You have two options for step 3:

Option (a): If the answer is not in the image, select the "cannot draw" option and indicate the reason why you cannot draw it.

Option (b): If the answer exists in the image, draw ONE closed polygon to segment the region/object that most prominently justifies the answer.

If you selected option (b), please follow the following instructions to draw the polygon:

• How to draw: click the image to draw points one by one to form a polygon. No drag operation is needed.

• How to finish drawing: move your cursor to the first point (the polygon will turn purple when your cursor is on the first point you drew), and click it to finish. Alternatively, press the keyboard shortcut 'Enter' to finish.

• How to undo a point: use the keyboard shortcut 'Ctrl+Z' or click the Undo button.

• After finishing, the cursor will be disabled. If you would like to make a change, either click the Clear button or Undo button or use the keyboard shortcut 'Ctrl+Z' to enable the cursor.
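The drawing behaviour described in the bullets above can be modelled roughly as follows. This is a minimal sketch, not the tool's actual code: the class `PolygonDraft`, the three-point minimum, and the choice that the first undo after finishing only re-enables the cursor are all assumptions made for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]


class PolygonDraft:
    """Hypothetical model of the drawing state; not the actual tool's code."""

    def __init__(self) -> None:
        self.points: List[Point] = []
        self.closed = False  # once closed, the cursor is disabled

    def click(self, x: float, y: float) -> None:
        # Each click adds one point; clicks are ignored after finishing.
        if not self.closed:
            self.points.append((x, y))

    def finish(self) -> bool:
        # Clicking the first point or pressing Enter closes the polygon.
        # Assumption: a valid polygon needs at least three points.
        if len(self.points) >= 3:
            self.closed = True
        return self.closed

    def undo(self) -> None:
        # Ctrl+Z or the Undo button.
        if self.closed:
            self.closed = False  # assumption: first undo only re-enables the cursor
        elif self.points:
            self.points.pop()

    def clear(self) -> None:
        # The Clear button removes everything and re-enables the cursor.
        self.points = []
        self.closed = False


draft = PolygonDraft()
for x, y in [(10, 10), (60, 10), (60, 40)]:
    draft.click(x, y)
draft.finish()
draft.click(0, 0)  # ignored: the cursor is disabled after finishing
print(len(draft.points), draft.closed)  # 3 True
```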

Show step 3 option (b)'s examples

Please view the 5 tabs to see the 5 different kinds of examples.

If the object (e.g., tyre, donut, ring) has a hole, you just need to draw the outside boundary.

Question: What is this?
Answer: laundry detergent

Only one closed polygon is allowed.

When the answer is related to text, please first identify whether the object is mentioned in the question.

If not, see the example on the left. Usually, they ask questions like “what is this?”. In this case, text is used to describe the object and you should draw the outline of the object.

If so, see the examples in the middle and on the right. You should draw an outline around the related text.
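The rule for text-related answers can be summarised as a simple decision. The sketch below is illustrative only: the function name `text_answer_target` and its return strings are hypothetical, and deciding whether the question mentions the object remains your own judgement.

```python
def text_answer_target(object_mentioned_in_question: bool) -> str:
    """Pick what to outline when the answer is related to text."""
    if object_mentioned_in_question:
        # e.g. "What's the brand of the lotion?": outline the related text.
        return "related text"
    # e.g. "What is this?": the text only describes the object.
    return "whole object"


print(text_answer_target(False))  # whole object
print(text_answer_target(True))   # related text
```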

Question: What is this?
Answer: CeraVe daily moisturizing lotion

The question is asking what the object is. Thus you need to draw the outline of the whole object. Note that the answer to this question is the same as the answer for the middle image, but the annotated areas are different.

Question: What type of the lotion is this?
Answer: CeraVe daily moisturizing lotion

The question has mentioned what the object is. Thus you only need to draw an outline around the related text area.

Question: What's the brand of the lotion?
Answer: CeraVe

The question has mentioned what the object is. Thus you only need to draw the outline of the related text area.

If the answer is referring to the whole image, draw a rectangle to ground the whole picture as the target region. Usually, this happens when the camera is set too close to the object.

Question: What is this?
Answer: CeraVe daily moisturizing lotion

Draw a rectangle to include the whole image
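For reference, a rectangle covering the whole image corresponds to a four-point polygon at the image corners. This is a minimal sketch assuming pixel coordinates with the origin at the top-left corner; the function name `whole_image_polygon` is hypothetical, and in the task itself you simply click near the four corners.

```python
from typing import List, Tuple


def whole_image_polygon(width: int, height: int) -> List[Tuple[int, int]]:
    """Return the four corner points of an image as a closed polygon."""
    # Assumption: pixel indices run from 0 to width-1 and 0 to height-1.
    return [
        (0, 0),                   # top-left
        (width - 1, 0),           # top-right
        (width - 1, height - 1),  # bottom-right
        (0, height - 1),          # bottom-left
    ]


print(whole_image_polygon(640, 480))
# [(0, 0), (639, 0), (639, 479), (0, 479)]
```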

Please try your best to trace the boundary of the object as tightly as possible. Only when the boundary is too complex may you trace it less than perfectly.

Question: What is this?
Answer: CeraVe daily moisturizing lotion for normal to dry skin.

Please trace the boundary as tightly as you can.

Question: What plant is this?
Answer: Dracaena sanderiana

Please trace the boundary as tightly as possible

If something obscures the target object, please do not include it. Label only the visible part of the target object.

Question: What is this?
Answer: CeraVe daily moisturizing lotion for normal to dry skin

Avoid occlusion if you can.

Question: What is this?
Answer: CeraVe daily moisturizing lotion for normal to dry skin

In this case, it's acceptable to include the chopsticks. Otherwise, you could not label this image with just one closed polygon.

Question: What is this?
Answer: CeraVe daily moisturizing lotion for normal to dry skin

You should not include the hand in this image annotation.

Step 4: Click next and go to the next image.

NOTE

You will annotate five image-question pairs in one HIT.
You cannot go to the next page until you finish the current one.
Please do not refresh the webpage once you have started working, as you will lose all your progress, and have to start at the beginning.
It is possible that some images could be meaningless, inappropriate, or offensive. We cannot control what pictures are taken. Kindly use your best judgement for this task.

You can see this information anytime by clicking the "Hide / Show Details" button above.

Please read the following question and answer about the image to the left and finish the 3 steps below.

Step 1: Is more than one question asked?

Step 2: Does the answer refer to multiple regions/objects?

Step 3: Draw one closed polygon to localize the region that the answer is referring to by clicking on the image.
