Abstract-In radiotherapy using 18-fluorodeoxyglucose

positron emission tomography (18F-FDG-PET), the accurate delineation of the

biological tumour volume (BTV) is a crucial step. In this study, new approach

to segment the BTV in F-FDG-PET images is suggested. The technique is based on

the k-means clustering algorithm incorporating automatic optimal cluster number

estimation, using intrinsic positron emission tomography image information. Partitioning

data into a finite number of k homogenous and separate clusters (groups)

without use of prior knowledge is carried out by some unsupervised partitioning

algorithm like the k-means clustering algorithm. To evaluate these resultant

clusters for finding optimal number of clusters, properties such as cluster

density, size, shape and separability are typically examined by some cluster

validation methods. Mainly the aim of clustering analysis is to find the

overall compactness of the clustering solution, for example variance within

cluster should be a minimum and separation between the clusters should be a

maximum.

I.

INTRODUCTION

There

are several approaches to validate the segmentation techniques such as phantom

studies and the macroscopic surgical specimen obtained from histology. The use

of macroscopic samples for validation of segmentation techniques in positron

emission tomography (PET) images is one of the most promising approaches

reported so far in clinical studies, the procedure consists of the comparison

of the tumour volumes defined on the PET data with actual tumour volumes

measured on the macroscopic samples recorded from histology (where PET was

performed prior to surgery). Segmentation using the cluster –based algorithms

is very popular, but the main problem in this case is the determination of the

optimal and desired number of clusters. In this, we have implemented an

approach based on k-means algorithm with an automatic estimation12 of the

optimal number of clusters, based on the maximum intensity ratio in a given

volume of interest (VOI).

II.

METHEDOLOGY

Calculate

the VE for a range of k (2–50

clusters), and the optimal number which corresponds to the minimum of SVEs. This method gives

good results but consumes a significant computation time by performing the

clustering for a large range of cluster values before selecting the optimal

number of clusters. Several approaches have been proposed in the literature to

identify the optimal cluster number to better fit the data, three of them are

used.

Unfortunately,

the results are not promising because they are not adapted to PET image

segmentation. So our goal in this study is to improve the k-means clustering

method, by incorporating an automatic determination of the optimal number of

clusters using a new criterion based on PET image features.

After

analysing the variation of the maximum activity (intensity) of the uptake

(

) in

the VOI by scanning all slices, we conclude for all patients that the maximum

intensity value decreases from middle to frontier slices, and the maximum

intensity is often situated almost at the centre of the BTV. The optimal

cluster number has a minimum value at the centre of the BTV, and increases from

central to frontier slices. This correlation between the optimal number of

clusters and the maximum intensity motivates our choice of the following slice

image feature:

Where

is the maximum activity

(intensity) of the uptake

in the corresponding slice,

is the maximum activity

in all slices that encompasses tumour volume BTV inside the

, and

is the difference between the maximum and the

minimum values of (Imax (slice)/Imax (VOI)) in the

.

Similarly

to

, the new criterion

, has a maximum value for the middle

slices and decreases for the frontiers of the BTV. Note that the r values

range from ‘0’ to ‘1’ for all patients.

1.

Modelling: This section is dedicated to finding a

relationship between the optimal number of clusters k, and the new criterion r.

This relationship could be used to determine the optimal cluster number for the

segmentation of new PET images using only the new slice image feature r.

After analysing the variation of k in function of r criterion

(for all patients included in this study), we use two fitting models: an

exponential and a power function given by (a) and (b), respectively,

k

= ? ? e?. r + 1 (a)

k

= a ?

+ 1 (b)

Where

?, ?, a, b are coefficients of fitting models and r

is the proposed criterion. The fitting accuracy evaluation is based on the R-square

criterion. Note that we added ‘1’ to the original fitting equation to avoid

clustering the image with one cluster for the high values of r.

2.

Generalisation: The aim of this step is to automate the

choice of the optimal cluster number for all patients using one corresponding

relationship function by defining a generalised model for all patients. For

this reason, we have divided the database randomly into two parts of 50% each.

The first part (validation set), contains three patients is used for optimising

the model coefficients and fixing the optimal power and exponential generalised

model. The second part (test set), contains four patients, is used for testing

the accuracy of the fixed optimal model.

According

to the R-square criterion, the optimal exponential and power generalised

function can be rewritten as follows:

k

= 46.52e?5.918 × r

+ 1

k

= 1.683r?1.264 + 1

k

mean alogorithm 3 :

Fig

1. K-mean

clustering algorithm

3.

Optimization by new technique: The

underlying idea behind it is an analysis of the “movement” of objects between

clusters, considered either forward from k to k+1 or backwards from k+1 to k

groups. In other words we find the movement in membership or joint probability

around k groups. The joint probability obtained from adjacent consecutive k

numbers of groups will be used to produce a diagonally dominant probability

matrix for optimal value of k homogenous and separable groups. The maximum normalised

value of the trace as the greatest value for k within the range tested, will be

defined as the optimal value of k clusters for the given dataset. Formally, we

may describe our approach as follows. For a given choice of k = number of

clusters, a given choice of clustering technique U, and a given choice of V =

set of parameters v1…vn used to control the clustering technique, we first

construct a set of clusters

(U,V) = {

} with i=1..k. Next, we construct sets

of clusters

(U,V), and

(U,V) using the same clustering

technique. In the work reported here, we will not vary U and V so we may write

these cluster sets more simply as

,

and

. Now these three (

,

and

) consecutive groups around k will be

used to find the proportion of common objects from Ck to

,

and

, to create a rectangular proportion

matrix of size m x n, where m and n correspond to rows and columns of

proportion matrix

. We denote the proportion of data

elements in common between a particular pair of clusters, say cluster

from

and cluster

from

by

, which can be abbreviated to

. Similarly, we can compute

, k to create a rectangular matrix of

size n x m where n correspond to columns and m to rows. Note that in general

is not equal to

as they have different cardinality

meaning k+1 not equal to k and vice versa. To investigate how much movement of

objects occurs from Ck to Ck+1 and from

to

we will apply the dot product of matrix for

size m x n (

to

) and n x m rectangular matrices (

to

) to get the joint information in

square matrix m x m for

clusters. Due to the row sum constant of

1 the resultant

square matrix is also known as a row

stochastic matrix 4 or probability and transition matrix 5. The trace of

resultant set of

clusters will be normalised (average of

the trace) to determine the set of more stable (optimal k)

clusters as if the normalised trace is

maximum and may change occur in the

depending on the dataset for the range

of adjacent set of k values. This

matrix will be used to determine the

maximum normalised trace value for determining set of more stable or consistent

clusters , that will indicate set of clusters in

are stable and completely separated from

one another. The steps involves in determining the optimal value of k from the

resultant clusters are follows as:

1.

Create the m x n forward proportion matrix from k to k+1

=

(1)

2.

Create the n x m backward proportion matrix from k+1 to k

=

(2)

Where

=

1,2,3 … . .

and

=

1,2,3 … . .

The

dot product of (1) by (2) will give us a m x m matrix as in (3) below with the

entries showing the joint probabilities of the forward/back movement of the

objects between the set of clusters from k to k+1 and k+1 to k.

=

*

(3)

The

new index

can be calculated from

in (3) as follows:

=

(4)

From (4) the normalised maximum trace value

for

will indicate the set of stable cluster at k

value. In an extreme situation the normalised trace is equal to 1, that is

where the set of clusters in

will keep splitting until the value of

the trace is 1 and it may decrease or increase as we continue but will always

stay less than 1.

III.

CONCLUSION

A new unsupervised cluster-based approach for segmenting the BTV in

-FDG-PET

images is introduced. The system is more reliable and has very less error. It can be improved by

technique used in determining an optimal value of K in K-means clustering, for which k-means clustering it uses a

method to find an optimal value of k number of clusters, using the features and

variables inherited from datasets. The new proposed method is based on

comparison of movement of objects forward/back from k to k+1 and k+1 to k set

of clusters to find the joint probability, which is different from the other

methods and indexes that are based on the distance.