Dosimetric Validation of Treatment Planning System for Volumetric Modulated Arc Therapy Using AAPM Medical Physics Practice Guideline 5.b

Aim: To assess the accuracy of dose calculations for Volumetric Modulated Arc Therapy (VMAT) using megavoltage (MV) photon beams, we validated two algorithms: Acuros XB (AXB) and the Analytical Anisotropic Algorithm (AAA). The validation covered both flattening filter (FF) and flattening filter-free (FFF) beam modes, following AAPM Medical Physics Practice Guideline 5.b (MPPG 5.b). Materials and Methods: VMAT validation tests were generated for 6 MV FF and 6 MV FFF beams using the AAA and AXB algorithms in the Eclipse V.15.1 treatment planning system (TPS). Corresponding measurements were performed on a linear accelerator using a diode detector and a radiation field analyzer. Point dose (PD) and in-vivo measurements were conducted using an A1SL ion chamber and thermoluminescent dosimeters (TLD) from Thermofisher, respectively. The Rando Phantom was employed for end-to-end (E2E) tests. Results: The mean differences (MD) between the TPS-calculated and measured values for the PDD and output factors were within 1% and 0.5%, respectively, for both 6 MV FF and 6 MV FFF. In the TG 119 sets, the MD for PD with both AAA and AXB was <0.9%. For the TG 244 sets, the minimum, maximum, and mean deviations in PD for both 6 MV FF and 6 MV FFF beams were 0.3%, 1.4%, and 0.8%, respectively. In the E2E test using the Rando Phantom, the MD between the TLD dose and the TPS dose was within 0.08% for both 6 MV FF (p=1.0) and 6 MV FFF (p=0.018) beams. Conclusion: The accuracy of the TPS and its algorithms (AAA and AXB) has been successfully validated. The recommended tests in the VMAT/IMRT validation section proved invaluable for verifying the PDD, output factors, and the feasibility of complex clinical cases. E2E tests were instrumental in validating the entire workflow from CT simulation to treatment delivery.


Introduction
Modeling the treatment planning system (TPS) is crucial for ensuring accurate dose delivery in external beam therapy [1]. It plays a pivotal role in achieving the objectives of radiotherapy by calculating precise doses so that the prescribed dose is delivered to the tumor while radiation exposure to critical organs is minimized as much as possible [2]. Intensity-Modulated Radiotherapy (IMRT) and Volumetric Modulated Arc Therapy (VMAT) have been widely adopted for over a decade. The International Commission on Radiation Units [...] and validation of the IMRT/VMAT technique within the TPS using various methods and measurement detectors [1][6][7][8][9]. Typically, IMRT/VMAT commissioning involves several steps, ranging from collecting beam data to making additional measurements for small beam apertures and the multileaf collimator (MLC) [10][11][12]. The configuration of the MLC, with its rounded leaf ends, can introduce discrepancies between dosimetric and geometric field widths due to the additional X-ray transmission through the leaf ends [10]. The validation tests in the recently published AAPM Medical Physics Practice Guideline (MPPG) 5.b include basic photon beam dose calculation algorithm validation and electron beam dose calculation algorithm validation.
In this study, we chose to validate IMRT/VMAT in photon beams and to evaluate the dosimetric accuracy of VMAT using two dose calculation algorithms, the Analytical Anisotropic Algorithm (AAA) and Acuros XB (AXB). We used the recommended detectors and methods for validating the accuracy of VMAT dose delivery [5]. To verify the agreement between the TPS dose calculated with the AAA and AXB algorithms and the measured dose for the 6 MV FF and 6 MV FFF photon beams, patient-specific quality assurance (PSQA) was performed using the TG 119 and TG 244 study sets [13].
The thermoluminescent dosimeter (TLD) results were analyzed for the end-to-end (E2E) test using the Rando Phantom for the AAA and AXB algorithms and the 6 MV FF and 6 MV FFF photon energies. We also studied the Dosimetric Leaf Gap (DLG) and compared the target coverage metrics, Homogeneity Index (HI) and Conformity Index (CI), between the AAA- and AXB-calculated plans.

Materials and Methods
The Measurement and Verification methodology of MPPG 5.b (Guideline for Photon Beams, validation tests for VMAT/IMRT) is summarized below. The first test recommends verifying the PDD for very small fields with a diode detector. The IBA 3D Blue Phantom (IBA Dosimetry GmbH, Germany) with a diode detector (IBA Dosimetry GmbH, Germany; active volume thickness 0.06 mm) was used to verify the PDD measurements (as depicted in supplementary Figure 1). Parameters such as PDD10cm (percentage depth dose at 10 cm depth) and the depth of maximum dose (dmax) were compared with the TPS values for MLC field sizes of 1×1 cm² and 2×2 cm² for both 6 MV FFF and 6 MV FF photon beams generated by the TrueBeam SVC linear accelerator (Varian, Palo Alto, USA). A virtual water phantom was constructed in Eclipse version 15.1 (Varian Medical Systems, Palo Alto, USA), and the measured PDD values were compared with the TPS-calculated values.
The second test recommends verifying the output for small MLC-defined fields at a clinically relevant depth. We used the 0.053 cc A1SL Exradin ion chamber (Standard Imaging, Middleton, WI, USA) in slab phantoms at 10 cm depth to measure the output factors for field sizes of 2×2 cm², 3×3 cm², 4×4 cm², 5×5 cm², 6×6 cm², 10×10 cm², 15×15 cm², and 20×20 cm². These normalized output factors were compared with the TPS values calculated using the AAA and AXB algorithms.
The third test recommends using the TG-119 tests to plan, measure, and compare planning and QA results for both the head and neck and C-shaped cases with an ion chamber. We used the TG-119 data sets for the head and neck and C-shaped cases. VMAT plans were calculated in the TPS, and delivery quality assurance (DQA) was performed with point dose measurements using the A1SL ion chamber (sensitive volume 0.053 cc) in a cheese phantom (Standard Imaging, Middleton, WI, USA) and a fluence check using film dosimetry (Table 1).
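The output factor comparison described above can be sketched in a few lines of Python. This is only an illustration: the chamber readings and TPS factors below are hypothetical placeholders rather than data from this study, normalization to the 10×10 cm² field is assumed, and the function names are invented for the example.

```python
def output_factors(readings, ref_field=10):
    """Normalize chamber readings (field side in cm -> reading) to the reference field."""
    ref = readings[ref_field]
    return {fs: r / ref for fs, r in readings.items()}


def percent_diff(measured, calculated):
    """Percentage difference of measured relative to TPS-calculated values."""
    return {fs: 100.0 * (measured[fs] - calculated[fs]) / calculated[fs]
            for fs in measured}


# Hypothetical electrometer readings (nC) for square MLC-defined fields.
readings_nc = {2: 8.00, 5: 9.20, 10: 10.00, 20: 10.90}
# Hypothetical TPS output factors for the same fields.
tps_of = {2: 0.795, 5: 0.922, 10: 1.000, 20: 1.085}

measured_of = output_factors(readings_nc)
diffs = percent_diff(measured_of, tps_of)
for fs in sorted(diffs):
    print(f"{fs}x{fs} cm2: OF={measured_of[fs]:.3f}, diff={diffs[fs]:+.2f}%")
```

The same two functions could be run once per algorithm (AAA and AXB) and per beam mode, with the resulting differences averaged to obtain a mean difference.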

PTV Shape and Region of Interest for Point Dose Measurement
In the TG 119 cases, measurement points were chosen in the following areas: (1) C-shape case, PTV: high-dose region 2.5 cm anterior to the isocenter; (2) C-shape case, OAR: low-dose region at the isocenter; (3) H&N case, PTV: high-dose region at the isocenter; (4) H&N case, OAR: low-dose region 4.5 cm posterior to the isocenter.
The fourth test recommends clinical tests in which at least two relevant clinical cases are planned and measured and the results are analyzed in depth. We downloaded the AAPM TG 244 image sets of anonymized head and neck and abdomen clinical cases (https://www.aapm.org/pubs/MPPG/TPS/). With these cases, VMAT plans were generated for 6 MV FFF and 6 MV FF photon beams (Table 3). Patient-specific quality assurance for the point dose was performed using the cheese phantom with a 0.053 cc A1SL Exradin ion chamber (Standard Imaging, Middleton, WI, USA).

Phantom Selection for End-to-End Test and Invivo Dosimetry
The fifth test recommends simulating, planning, and treating an anthropomorphic phantom with embedded dosimeters. We scanned the head and neck section of the Rando Phantom (shown in supplementary Figure 2(a)) on our dedicated RT CT scanner, a LightSpeed RT16 (WIPRO GE Healthcare PVT LTD). VMAT plans were generated in the TPS following the institutional protocol for the OAR constraints (Table 5). Thermoluminescent dosimeters (TLD) (Thermofisher Scientific) were used for in-vivo dosimetry. The TLDs were inserted inside the PTV area of the oral cavity in the Rando Phantom. Cone Beam Computed Tomography (CBCT) was performed to verify the position of the Rando Phantom on the linear accelerator couch.

Complex Clinical Cases Selection and Plan Evaluation
In addition to the tests specified in the MPPG guidelines, all TG-244 plans were evaluated using the Conformity Index (CI) [14], given as:

$$ CI = \frac{(TV_{PIV})^{2}}{TV \times PIV} $$

where TV_PIV is the target volume covered by the prescription isodose, PIV is the prescription isodose volume in the total body, and TV is the target volume, and using the homogeneity index (HI) from ICRU Report 83, Prescribing, Recording, and Reporting Photon-Beam Intensity-Modulated Radiation Therapy (IMRT) [15], given as:

$$ HI = \frac{D_{2\%} - D_{98\%}}{D_{50\%}} $$

where D2% is the dose received by 2% of the target volume, D98% is the dose received by 98% of the target volume, and D50% is the dose received by 50% of the target volume.
A 0.6 cc Farmer chamber (PTW-Freiburg, Germany) with a PTW Unidos electrometer was used in a 30×30×30 cm³ solid water phantom to measure the DLG for the 6 MV FF and 6 MV FFF beams at a depth of 10 cm with a source-to-axis distance (SAD) of 100 cm.

Results
The differences between the calculated and measured PDD10cm values for the 1 cm × 1 cm field size using the 6 MV FF beam (Figure 2(a)) were 0.60% and 0.10% for AAA and AXB, respectively, and the differences between the measured and calculated dmax values using AAA and AXB were 0.09 mm and 0.19 mm, respectively. Similarly, for the 6 MV FFF beam (Figure 2(b)), the differences between the measured and calculated PDD10cm values using AAA and AXB were 0.78% and -0.98%, respectively, and the differences between the measured and calculated dmax values were 0.23 mm and 0.04 mm, respectively.
Similarly, for the 2 cm × 2 cm field size, the differences between the calculated and measured PDD10cm values using the 6 MV FF beam for AAA and AXB were 0.70% and 0.70%, respectively, and the differences between the measured and calculated dmax values using AAA and AXB were 0.09 mm and 0.03 mm, respectively. For the 6 MV FFF beam, the differences between the measured and calculated PDD10cm values using AAA and AXB were -1.40% and -1.10%, respectively, and the differences between the measured and calculated dmax values were 0.13 mm and 0.05 mm, respectively.
In the C-shaped case of the TG-119 tests, using the 6 MV FF beam, the deviations between the measured and TPS-calculated point doses using AAA and AXB were -2.1% and -0.45%, respectively, at the isocenter within the OAR region, whereas the deviations within the PTV region located 2.5 cm anterior to the isocenter were -2.05% and -2.05% (Table 4). Similarly, for the 6 MV FFF beam in the C-shaped case, the deviations using AAA and AXB were -0.45% and 1.45%, respectively, at the isocenter within the OAR region, whereas the deviations within the PTV region located 2.5 cm anterior to the isocenter were -1.3% and 0.45%, respectively. In the head and neck case of the TG-119 tests, using the 6 MV FF beam, the deviations between the measured and TPS-calculated point doses within the PTV region using AAA and AXB were 0.6% and -1.2%, respectively, whereas the deviations within the OAR region were -2.05% and 0.1%, respectively. Similarly, for the 6 MV FFF beam, the deviations within the PTV region using AAA and AXB were 0.4% and -1.8%, respectively, whereas the deviations within the OAR region were -1.55% and -0.75%, respectively. The confidence limit (CL) achieved was 3.07%, the standard deviation (SD) was 1.10%, and the mean was 0.9%. The gamma pass rates with the 3%/3 mm criterion for the C-shape case at the isocenter (minimum) and at 2.5 cm anterior (maximum) were 93% and 97%, respectively, with a mean gamma pass rate of 95% (as shown in Figure 3).
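For reference, the confidence limit quoted above follows the TG-119 definition, CL = |mean| + 1.96 × SD. The short Python sketch below applies that definition to the rounded mean and SD reported in the text; the function name is illustrative only.

```python
def tg119_confidence_limit(mean_dev, sd):
    """TG-119 confidence limit: |mean deviation| + 1.96 * standard deviation (both in %)."""
    return abs(mean_dev) + 1.96 * sd


# Rounded values reported in the text: mean = 0.9%, SD = 1.10%.
cl = tg119_confidence_limit(0.9, 1.10)
print(f"Confidence limit: {cl:.2f}%")
```

With these rounded inputs the result is about 3.06%, close to the reported 3.07%; the small gap is presumably due to rounding of the mean and SD before publication.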
In the head and neck case of the TG 244 sets, using the 6 MV FF beam, the deviations between the measured and TPS-calculated point doses using the AAA and AXB algorithms were 0.66% and 0.66%, respectively. Similarly, for the 6 MV FFF beam in the same study set, the deviations using AAA and AXB were 1.72% and 0.88%, respectively. In the abdomen case of the TG 244 sets, using the 6 MV FF beam, the deviations between the measured and TPS-calculated point doses using AAA and AXB were 0.56% and 1.69%, respectively. Similarly, for the 6 MV FFF beam in the same study set, the deviations using AAA and AXB were -0.13% and 1.57%, respectively. In the end-to-end (E2E) tests, the percentage deviations between the measured TLD dose and the TPS-calculated dose for the 6 MV FF beam with AAA and AXB were -0.54% and 9.4%, respectively, and for the 6 MV FFF beam with AAA and AXB were 2.36% and 8.03%, respectively. The overall mean difference between the TPS-calculated and measured TLD dose was within -0.08%.
The overall differences in HI and CI between AAA and AXB for the head and neck and abdomen cases of the TG-244 study sets were not statistically significant for either 6 MV FF (p=0.3) or 6 MV FFF (p=0.15) beams (as shown in Table 2(b)). The Dosimetric Leaf Gap (DLG) and MLC transmission are very important parameters for commissioning the rounded-end MLC in the Eclipse software. In addition to the MPPG tests, we studied the DLG; the values for the 6 MV FF and 6 MV FFF beams were 0.24 mm and 1.47 mm, respectively.
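As a minimal sketch of how such point dose deviations can be computed and summarized, the Python snippet below assumes the sign convention (measured − calculated) / calculated, which the text does not state explicitly. The doses in the first call are hypothetical; the two deviations passed to the summary are the reported 6 MV FF AAA values for the TG 244 head and neck and abdomen cases.

```python
def percent_deviation(measured_dose, tps_dose):
    """Percentage deviation of the measured dose from the TPS dose (assumed sign convention)."""
    return 100.0 * (measured_dose - tps_dose) / tps_dose


def summarize(deviations):
    """Return (minimum, maximum, mean) of a list of percentage deviations."""
    return min(deviations), max(deviations), sum(deviations) / len(deviations)


# Hypothetical example: 2.02 Gy measured against 2.00 Gy calculated.
example = percent_deviation(2.02, 2.00)

# Reported 6 MV FF AAA deviations for TG 244 head and neck and abdomen.
ff_aaa = [0.66, 0.56]
dev_min, dev_max, dev_mean = summarize(ff_aaa)
print(f"min={dev_min:.2f}%, max={dev_max:.2f}%, mean={dev_mean:.2f}%")
```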

Discussion
MPPG 5.b [5] provides a flexible, simple framework for validating TPS dose calculation algorithms. We found implementing MPPG 5.b to be a very valuable task. The Eclipse software met the minimum tolerances set by the MPPG 5.b guidelines [5], which makes the TPS calculation models clinically acceptable. We have shared our experience of implementing MPPG 5.b, which may benefit other medical physicists carrying out VMAT validation at their institutions. We compared the measured PDD values with the TPS PDD values calculated using the AAA and AXB algorithms for field sizes of 1 cm × 1 cm and 2 cm × 2 cm for the 6 MV FF and 6 MV FFF beams. The mean differences between the measured and calculated depth of dmax and PDD10cm values were 0.02 mm and 0.002%, respectively, for the 1 cm × 1 cm field size for both 6 MV FF and 6 MV FFF beams. Similarly, for the 2 cm × 2 cm field size, the mean differences between the measured and calculated depth of dmax and PDD10cm values were 0.07 mm and 0.009%, respectively, for both 6 MV FF and 6 MV FFF beams.

Table 5. Comparison of the Dose Achieved to the Critical Organs between AAA and AXB in the End-to-End Test
AAPM TG 155 [16] recommends appropriate detectors and methods for small-field dosimetry. The report gives recommendations on detector dimensions for small-field dosimetry and on the lateral charged-particle equilibrium range, referred to as rLCPE. From the analysis of point doses in the TG 119 sets [13], the minimum, maximum, and mean deviations of the point dose measurements from the AAA-calculated dose for the 6 MV FF beam were -2.1%, 0.6%, and -1.4%, whereas the minimum, maximum, and mean deviations from the AXB-calculated dose were -2.05%, -0.1%, and -0.95%. Similarly, for the 6 MV FFF beam, the minimum, maximum, and mean deviations of the point dose measurements from the AAA-calculated dose were -1.55%, -0.4%, and -0.93%, whereas those from the AXB-calculated dose were -1.8%, 1.45%, and -0.16%. From the TG 119 study set, we observed that the mean deviation in all point dose measurements was within 1% for both 6 MV FF and 6 MV FFF beams using the AAA and AXB algorithms [13]. For the IMRT/VMAT validation tests of AAPM MPPG 5.b [5], plans for two complex clinical cases from TG 244, head and neck and abdomen, were calculated and compared with the measured point doses. The minimum, maximum, and mean deviations in the point dose measurements for the 6 MV FF beam for the TG 244 head and neck and abdomen cases with AAA calculation were 0.56%, 0.66%, and 0.61%, whereas those with AXB calculation were 0.66%, 1.69%, and 1.17%. Similarly, for the 6 MV FFF beam, the minimum, maximum, and mean deviations for the same cases with AAA calculation were -0.13%, 1.72%, and 0.79%, respectively, whereas those with AXB calculation were 0.28%, 1.57%, and 0.92%, respectively. From the point dose measurements of the TG-244 study sets, we observed that the mean deviation in the point doses was within 1% [5]. Compared with the 6 MV FF beams (p=1.0), the result for the point dose measurements with the 6 MV FFF beams (p=0.018) was statistically significant (Table 2(a)). Similar results were reported in earlier studies by Sarkar et al. [17], Manikandan et al. [18], and Geurts et al. [5]. The current set of measurements for preclinical validation of machine- or patient-specific quality assurance is superior to electronic portal imaging-based measurements using AXB, as the latter equipment does not account for heterogeneity correction [19].
Verifying the calculation accuracy of the treatment planning system is very important in the quality assurance of the radiotherapy treatment process. The most basic measurement for patient-specific plan verification is a point dose measurement [17]. As a rule of thumb, the maximum and minimum doses across the ion chamber should be within 5% of the mean chamber dose to minimize the effect of volume averaging over a gradient region. Lang et al. [20] presented pre-treatment QA data for 224 cases from four centers, measured with different verification devices, to assess the reliability of flattening filter-free beam delivery for IMRT and VMAT techniques.
They found excellent agreement between dose calculation and dose delivery for these beams, with an average passing rate of 99.3% (±1.1%) for IMRT and 98.8% (±1.1%) for VMAT using tolerance limits of 3% and 3 mm. For 52 of the cases, dose verification at a single position was performed with an ion chamber, either a Pinpoint chamber or a Farmer chamber, with a mean dose deviation of only 0.34%. They found that the passing rate was independent of the maximum dose rate used during irradiation of the arc. However, QA results worsened slightly as the ratio of monitor units (MU) to the dose per fraction increased, indicating that highly modulated plans had slightly worse QA results.
In the end-to-end test with the Rando Phantom and TLDs embedded inside the PTV area, for both 6 MV FF and 6 MV FFF beams, the mean deviation between the TLD-measured dose and the TPS dose calculated using the AAA and AXB algorithms was within -0.08%. Carson et al. [6] investigated the performance of radiotherapy centers using an anthropomorphic phantom and found that 83% of the irradiations met the credentialing criteria, which were based on dose deviations and gamma analysis. However, only 31% of the irradiations met a stricter criterion that also required adequate target coverage. Several factors were associated with poor performance, such as complex delivery techniques, lack of image guidance, and errors in contouring and planning. Jacqmin et al. [21] observed that tests 3 and 4 were useful in refining the MLC model parameters, such as the MLC transmission and dosimetric leaf gap.
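The rule of thumb for chamber placement stated above can be expressed as a simple check. The sketch below is illustrative only: the function name is invented, and the dose samples across the chamber volume are hypothetical values of the kind that might be read from the TPS dose grid.

```python
def chamber_gradient_ok(doses, tolerance=0.05):
    """Check that min and max dose across the chamber lie within +/- tolerance of the mean."""
    mean_dose = sum(doses) / len(doses)
    lower = mean_dose * (1.0 - tolerance)
    upper = mean_dose * (1.0 + tolerance)
    return min(doses) >= lower and max(doses) <= upper


# Hypothetical TPS dose samples (Gy) inside the chamber's sensitive volume.
flat_region = [1.98, 2.00, 2.01, 2.02]
gradient_region = [1.50, 1.80, 2.10, 2.40]

print(chamber_gradient_ok(flat_region))      # low-gradient position
print(chamber_gradient_ok(gradient_region))  # steep-gradient position
```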
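For readers less familiar with the 3%/3 mm criterion used in these pass rates, the following Python sketch computes a one-dimensional global gamma index by brute-force search. It is a simplified illustration under stated assumptions: the dose tolerance is normalized to the maximum reference dose, the profiles are hypothetical, and clinical software (such as the film analysis used in this work) operates on 2D or 3D dose distributions with interpolation.

```python
import math


def gamma_1d(x_ref, d_ref, x_eval, d_eval, dose_tol=0.03, dta_mm=3.0):
    """Global 1D gamma index for each reference point (brute-force search)."""
    norm = dose_tol * max(d_ref)  # global dose criterion, relative to max reference dose
    gammas = []
    for xr, dr in zip(x_ref, d_ref):
        # Smallest combined distance/dose metric over all evaluated points.
        best = min(math.hypot((xe - xr) / dta_mm, (de - dr) / norm)
                   for xe, de in zip(x_eval, d_eval))
        gammas.append(best)
    return gammas


def pass_rate(gammas):
    """Percentage of points with gamma <= 1."""
    return 100.0 * sum(g <= 1.0 for g in gammas) / len(gammas)


# Hypothetical profiles: positions in mm, evaluated dose uniformly 2% above reference.
x = [float(i) for i in range(50)]
ref = [math.exp(-((xi - 25.0) / 10.0) ** 2) for xi in x]
ev = [1.02 * d for d in ref]

g = gamma_1d(x, ref, x, ev)
print(f"Gamma pass rate: {pass_rate(g):.1f}%")
```

Because the evaluated profile differs from the reference by only 2% everywhere, every point satisfies the 3%/3 mm criterion in this example.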
We additionally compared the PTV coverage metrics, HI and CI, of the TG-244 study set plans calculated using the AAA and AXB algorithms. The overall differences in HI and CI between AAA and AXB for the head and neck and abdomen cases of the TG 244 study sets were not statistically significant for either 6 MV FF or 6 MV FFF beams. This provides additional information about the quality of PTV coverage in the plans for the complex clinical cases.
In conclusion, the results demonstrate the accuracy of the Eclipse treatment planning software and its algorithms, AAA and AXB. In our experience, the validation tests can be performed effectively for dose calculation models. The recommended tests in the VMAT/IMRT validation section were helpful for verifying the PDD, output factors, and the feasibility of the complex TG-244 clinical cases, in addition to the benchmark TG-119 study sets. The end-to-end tests also help validate the complete workflow from CT simulation to treatment delivery.
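The two coverage metrics can be computed from dose-volume data as sketched below. This is a hedged illustration: the CI is assumed to take the Paddick-type form TV_PIV² / (TV × PIV), which matches the three volumes defined in the Methods, and the HI follows ICRU Report 83. All volumes and doses in the example are hypothetical.

```python
def conformity_index(tv_piv, tv, piv):
    """Assumed Paddick-type CI: TV_PIV^2 / (TV * PIV); volumes in the same units (e.g. cc)."""
    return tv_piv ** 2 / (tv * piv)


def homogeneity_index(d2, d98, d50):
    """ICRU 83 HI: (D2% - D98%) / D50%; doses in the same units (e.g. Gy)."""
    return (d2 - d98) / d50


# Hypothetical plan: 100 cc target, 110 cc prescription isodose volume,
# 95 cc of the target covered by the prescription isodose.
ci = conformity_index(tv_piv=95.0, tv=100.0, piv=110.0)

# Hypothetical target doses: D2% = 52.5 Gy, D98% = 47.5 Gy, D50% = 50 Gy.
hi = homogeneity_index(d2=52.5, d98=47.5, d50=50.0)

print(f"CI = {ci:.3f}, HI = {hi:.3f}")
```

Under these definitions, a CI closer to 1 indicates better conformity and an HI closer to 0 indicates a more homogeneous target dose.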

Figure 1. Comparison of Measured Output Factors with TPS AAA and Acuros XB, (a) 6 MV FF and (b) 6 MV FFF, for Various Field Sizes

Figure 3. Gamma Analysis of Gafchromic Film for the TG 119 C-Shape Target. Film Placed at the Isocenter in the Coronal Plane. Gamma Pass Criteria: 3%/3 mm

[Figure 2(b): PDD versus depth (cm) for the 1 cm × 1 cm field with the 6 MV FFF beam, AAA vs. Acuros XB]
Table 1. Summary of the Equipment Used for VMAT Dose Validation Tests from AAPM MPPG 5.b

| Test | Description | Equipment |
| ... | Plan, measure, and compare the planning and QA results to the TG 119 report for both head and neck and C-shaped cases | IBA slab phantom, A1SL chamber, Gafchromic film, and PTW Verisoft software |
| 4. Clinical tests | At least two clinical cases: plan, measure, and perform in-depth analysis of results | TG 244 clinical cases of head and neck and abdomen; A1SL chamber, portal dosimetry |
| 5. External review | Simulate, plan, and treat an anthropomorphic phantom with embedded dosimeters | Rando Phantom, head and neck site, with Thermofisher TLD dosimeters |

Table 2(a). Point Dose Measurements for the TG 244 Study Sets

Table 2(b). HI and CI Analysis for TG 244 Study Sets

Table 3. Constraints for Critical Organs for the TG 244 Benchmarking Test Suite

Table 4. Point Dose Measurements for the TG 119 Benchmarking Tests