
AI Did Not Pass Finnish Plastic Surgery Written Board Examination

Journal of Plastic, Reconstructive & Aesthetic Surgery (2023)

Abstract
Large language models (LLMs) use neural networks, algorithms that mimic the brain, and are trained to understand and generate human language and produce human-like responses [1: Sallam M. ChatGPT utility in healthcare education, research, and practice: systematic review on the promising perspectives and valid concerns. 2023; 11: 887]. The first publicly available LLM is ChatGPT (OpenAI, San Francisco, CA, USA), launched on Nov 30, 2022. ChatGPT came close to the 60% accuracy threshold in all three United States Medical Licensing Exam (USMLE) exams [2: Kung T.H., Cheatham M., Medenilla A., et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLoS Digit Health. 2023; 2: e0000198]. In Korean general surgery board exams, ChatGPT-3.5 achieved an overall accuracy of 46.8%, while GPT-4 reached 76.4% [3: Oh N., Choi G.S., Lee W.Y. ChatGPT goes to the operating room: evaluating GPT-4 performance and its potential in surgical education and training in the era of large language models. Ann Surg Treat Res. 2023; 104: 269-273]. It must be noted, however, that those examinations consisted of multiple-choice questions.

We were curious to test whether two publicly available and free LLMs, ChatGPT and Microsoft Bing, would pass the national Finnish plastic surgery written examination. Our written examination is based on three essay questions, supplemented with short-answer questions. Both authors are national board examiners. Previous national board examination questions for all specialties are freely available in Finnish at (https://www.laaketieteelliset.fi/ammatillinen-jatkokoulutus/erikoislaakari-jaerikoishammaslaakarikuulustelu/ilmoittautuminen/vanhat-kuulustelukysymykset). To pass the examination, a participant must accumulate a minimum of 15 points. In addition, no question may be given a score of 0 points.
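The pass rule stated above (at least 15 points in total, and no question scored 0) can be sketched as a small check. This is only an illustration, not the examiners' own tooling: the function name `passes_exam` is hypothetical, and it assumes that each scored item in Table 1 (1a, 1b, 1c, 2, 3, 4) counts as a separate question for the zero-score rule, which the text does not spell out.

```python
from typing import Sequence

PASS_MARK = 15.0  # minimum total stated in the text


def passes_exam(scores: Sequence[float], pass_mark: float = PASS_MARK) -> bool:
    """Return True if the total reaches the pass mark and no item scored 0.

    Assumption: every scored item counts as its own "question" for the
    zero-score rule; the article does not say how sub-items are treated.
    """
    total = sum(scores)
    return total >= pass_mark and all(score > 0 for score in scores)


# Microsoft Bing's per-item scores as read from Table 1 (1a, 1b, 1c, 2, 3, 4).
bing_scores = [2, 1, 0, 1, 0.5, 0.5]

# Hypothetical candidates, for illustration only.
hypothetical_pass = [1.5, 1.5, 1, 4, 4, 4]
hypothetical_zero = [2, 2, 0, 6, 6, 6]  # high total, but one item scored 0

for name, scores in [
    ("Bing", bing_scores),
    ("hypothetical pass", hypothetical_pass),
    ("hypothetical zero", hypothetical_zero),
]:
    print(name, sum(scores), passes_exam(scores))
```

Under these assumptions a candidate can fail on the total, on the zero-score rule, or on both; Bing's scores fail on both counts.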
The responses were scored according to pre-agreed criteria. Scoring was done independently, following the protocol of the real exam. Neither ChatGPT nor Microsoft Bing passed the exam (Table 1). ChatGPT performed slightly better; Table 2 shows the responses. ChatGPT's responses were wordy and read well, but the text reflected general rather than deep knowledge. The responses to the essay questions ended in a disclaimer.

Table 1. Questions and performance of ChatGPT and Microsoft Bing in Finnish plastic surgery board examination.

Question 1a) Quilting Stitches (QS) / Progressive Tension Stitches (PTS): Briefly explain what it means and why to use
Assessment:
- 0.5 p: skin (scarpa) is sewn to the base (e.g. muscle/ fascia) to reduce dead space.
- 0.5 p: These can be used e.g. omit drainage.
- 0.5 p: may reduce seroma formation.
- 0.5 p: used e.g. abdominoplasties and mastectomies.
Maximum points: 2. ChatGPT: 1. Bing: 2.

Question 1b) Venous flow through flap (VFTF)
Assessment:
- 0.5 p: These flaps consist of skin, subcutaneous tissue, and veenas.
- 0.5 p: inflow/ outflow from veena. VFTF does not have a native arterial inflow circuit. All flow proceeds to and from the flap through the venous plexus.
- 0.5 p: are thin and pliable, and their veins are similar in size to the veins of the hand.
- 0.5 p: raising them results in minimal donor site morbidity
Maximum points: 2. ChatGPT: 1. Bing: 1.

Question 1c) Fillet flap: briefly explain what kind of tab it is and what it can be used for, for example
Assessment:
- 0.5 p: an amputated part or part that cannot be spared, but can be used to treat a complex trauma or defect caused by a tumor.
- 0.5 p: axial tabs which can be used as a stem or microvascular graft
- 0.5 p: multi-tissue grafts
- 0.5 p: eg. With a crushing injury to the foot, the heel part can be put on the end of the shin stub.
Maximum points: 2. ChatGPT: 0.5. Bing: 0.

Question 2) A 75-year-old man who has had a wounding skin lesion at the end of his nose for a long time will be brought to your appointment. A biopsy taken at a health center has revealed squamous cell carcinoma. How do you examine and treat?
Assessment:
- 1 p: Refinement of the anamnesis: diseases of the patient, condition, ko-operation, self-help
- 1 p: Is the tumor mobile, what size, is there an indication of a larger growth than the tumor?
- 1 p: Do you feel something on your neck?
- 1 p: Are imaging studies needed? MRI/ CT/ neck US?
- 0.5 p: No routine sentinel lymph node examination
- 0.5 p: MDT mention, monitoring
- 0.5 p: What are the treatment options? Skin graft, tabs, cartilage graft
- 0.5 p: Neck dissection, in case of neck disease
Maximum points: 6. ChatGPT: 2.5. Bing: 1.

Question 3) A 34-year-old woman with a referral cutis laxa will come to your outpatient clinic. He has been taking semaglutide (Oxempic) for a year now with the indication for inadequate DM2 control. He has lost 40 kg during this time. She is 166 cm tall and weighs 87 kg when weighed in your practice, so her BMI is 31.6. The main trouble for him is from hanging belly coverings, folds brew and begin to smell. He hasn't had any skin infections that required ab-treatment. You notice a splash of skin below her navel, and a moderate splash of skin above her navel, plus a pubic mound hangs. She is interested in abdominoplasty and has already researched it herself online. Her biggest concern is having children and pregnancy after abdominoplasty. What do you do and what additional [...]
Assessment:
- 1 p national criteria not met – BMI above 30
- 1 p weight has not been stable for yet 1 year
- 1 p Question asked by: do you smoke
- 1 p No imaging studies
- 1 p not yet offering abdominoplasty to the patient – national criteria
- 0.5 p ASA class ≤ 3
- 0.5 p Pregnancy and childbirth are safe after abdominoplasty
Maximum points: 6. ChatGPT: 0.5. Bing: 0.5.

Question 4) An emergency responder calls for a consultation about an elderly 90-year-old woman who has tripped at home and bumped her leg. It has developed a large hematoma. The patient undergoes treatment with Marevan. How do you advise an on-call operator? What issues should be taken into account and how should the lady be treated?
Assessment:
- 1 p History (when come), clinical picture (size, skin condition/ color)
- 1 p evacuation in the operating room (puncture is not recommended), possibly left open.
- 1 p treatment of an open wound, later possibly skin graft.
- 1 p where to treat (hospital).
- 0.5 p: pretibial hematoma.
- 0.5 p: exclusion/consideration of other injuries (e.g. fractures).
- 0.5 p: INR value.
- 0.5 p: definition of dermatoporosis: thinning of the skin.
Maximum points: 6. ChatGPT: 1.5. Bing: 0.5.

Total. Maximum points: 24. ChatGPT: 7.5 (Fail). Bing: 5 (Fail).

Table 2. Answers of ChatGPT and Microsoft Bing in Finnish plastic surgery board examination.

Question 1a)
Assessment:
- 0.5 p: skin (scarpa) is sewn to the base (e.g. muscle/ fascia) to reduce dead space.
- 0.5 p: These can be used e.g. omit drainage.
- 0.5 p: may reduce seroma formation.
- 0.5 p: used e.g. abdominoplasties and mastectomies.

ChatGPT:
Quilting sutures (QS) or progressive tension sutures (PTS) are surgical techniques used to close a wound after surgery.

In the QS technique, special sutures are used, which are placed at the edges of the wound densely and deep into the tissue. These stitches are firmly fixed under the skin, thereby creating an even tension on the wound surface.
This will help reduce tension between the edges of the wound and prevent it from rupturing or dissipating.

The PTS technique is similar to QS, but involves the gradual or stepless placement of sutures on the edges of the wound. Initially, the sutures are placed near the center of the wound, and then proceed towards the edges. This will help to evenly distribute the tension over the entire wound area and reduce its concentration in one place.

QS and PTS techniques help reduce tension between wound edges, thereby preventing complications such as wound dehissiveness (opening) or infection. These techniques are especially used for large surgical wounds, such as the abdomen, chest, or back, where closing the wound can be challenging and tension higher.

It is important to note that the use of QS and PTS techniques is considered on a case-by-case basis and may be part of a broader wound care strategy. The surgeon decides on the best method, taking into account the individual needs of the patient and the characteristics of the wound.

Bing:
Quilting sutures (QS) or progressive tension sutures (PTS) are surgical sutures used to reduce seroma formation and other complications in abdominoplasty without the use of drains. PTS stitches are also useful for dead-space management, which minimizes tension in the suture line and reduces seroma formation. However, the use of PTS stitches is variable among plastic surgeons.
- Springer. (2022). Global Prevalence and Preferences of Progressive Tension Suture Usage in Abdominoplasty: A Systematic Review and Meta-analysis. https://doi.org/10.1007/s00266-022-03146-5
- PubMed. (2021). Global Prevalence and Preferences of Progressive Tension Suture Usage in Abdominoplasty: A Systematic Review and Meta-analysis. https://pubmed.ncbi.nlm.nih.gov/36316457/
- Oxford Academic. (2021). Expanding the Progressive Tension Suture Concept: A Study of Its Efficacy in Reducing Seroma Formation in Body Contouring Surgery. https://doi.org/10.1093/asj/sjab119

Question 1b)
Assessment:
- 0.5 p: These flaps consist of skin, subcutaneous tissue, and veenas.
- 0.5 p: inflow/ outflow from veena. VFTF does not have a native arterial inflow circuit. All flow proceeds to and from the flap through the venous plexus.
- 0.5 p: are thin and pliable, and their veins are similar in size to the veins of the hand.
- 0.5 p: raising them results in minimal donor site morbidity

ChatGPT:
Venous flow through flap (VFTF) is a concept that refers to the flow of venous blood through a graft or valve. It is a significant factor that affects the viability and healing of the graft or valve.

The purpose of VFTF is to ensure that the tissue to be transplanted or taken as a flap receives sufficient blood flow and oxygen supply to remain viable and heal successfully. Venous blood flows through the blood vessels of the valve or graft and carries oxygen and nutrients to the tissue. At the same time, venous blood removes metabolic by-products and carbon dioxide.

VFTF optimization is important in surgeries where tissue repairs or grafts are performed. Surgeons aim to establish adequate blood flow connections between the graft and the receiving area so that venous blood can flow unhindered. This can be achieved, for example, through microsurgical techniques such as anastomoses (connecting blood vessels) or microsurgical suturing.

Accurate evaluation and optimization of VFTF are critical to achieving successful tissue repairs or transplants. This helps ensure that the transplanted tissue or valve receives sufficient blood flow to support healing and avoid the risk of transplant failure or complications.

Bing:
Venous flow-through flaps (VFTFs) are useful reconstructive options, especially for the repair of segmental vascular damage. They are relatively easy to collect and give several advantages in the donor area. However, VFTF is based on one central vein, so their survival depends on this vein.
- PubMed. (2012). The nomenclature of venous flow-through flaps: updated classification and review of the literature. https://pubmed.ncbi.nlm.nih.gov/22434451/

Question 1c)
Assessment:
- 0.5 p: an amputated part or part that cannot be spared, but can be used to treat a complex trauma or defect caused by a tumor.
- 0.5 p: axial tabs which can be used as a stem or microvascular graft
- 0.5 p: multi-tissue grafts
- 0.5 p: eg. With a crushing injury to the foot, the heel part can be put on the end of the shin stub.

ChatGPT:
A fillet tab, also known as a soft tab, is a surgical technique that uses a piece of tissue or graft displaced or wrapped around a damaged or operated area. This will help protect the affected area, promote healing, and improve the esthetic outcome.

The fillet tab is most commonly used in the mouth and face area, especially to correct injuries to the lips, mouth or cheeks. For example, if a person has a wound in the lip area, the fillet tab can be taken from nearby soft tissues, such as the oral mucosa or cheek tissue, and transferred to the wound area.

The purpose of the tongue is to bring blood circulation and tissue support to the affected area, as well as to help prevent infection and support healing. It also improves the appearance of the wound and reduces the formation of scar tissue.

The use of a fillet tab is usually part of complex surgical treatment and requires special skills. The surgeon assesses on a case-by-case basis whether the use of a fillet tab is a suitable option and, based on this, decides on the patient's individual needs and wound characteristics.

Bing:
The fillet tab is a tissue tab containing muscle, fascia, subcutaneous layer, and skin. A tensor fasciae latae (TFL) tab taken from the anterior lateral parts of the thigh is usually used. The length of the tab can be up to 35-40 cm, and the width - 20-25 cm. Taking a tab does not cause significant functional inconvenience.

The Fillet tab can be used, for example, in breast prostheses correction and dilated ALT (=anterolateral thigh) tab indications.
- Duodecim. (2022). Kudoskielekkeet rekonstruktiivisen plastiikkakirurgian arkea. https://www.duodecimlehti.fi/duo96423
- Duodecim. (2017). Rintaproteesin rikkoutumisepäilyn tutkiminen ultraäänellä. https://www.duodecimlehti.fi/duo13517
- Duodecim. (2019). ALT-kielekkeen käyttö laajentunut. https://www.duodecimlehti.fi/duo95464

Question 2)
Assessment:
- 1 p: Refinement of the anamnesis: diseases of the patient, condition, ko-operation, self-help
- 1 p: Is the tumor mobile, what size, is there an indication of a larger growth than the tumor?
- 1 p: Do you feel something on your neck?
- 1 p: Are imaging studies needed? MRI/ CT/ neck US?
- 0.5 p: No routine sentinel lymph node examination
- 0.5 p: MDT mention, monitoring
- 0.5 p: What are the treatment options? Skin graft, tabs, cartilage graft
- 0.5 p: Neck dissection, in case of neck disease

ChatGPT:
In the case of suspected squamous cell carcinoma, I would perform the following studies and consider an appropriate method of treatment:

Clinical evaluation: I would carefully examine the patient's nasal head ulcer and assess its size, location, and possible symptoms. I would also ask the patient about the presence of other symptoms, such as nasal congestion or nosebleeds.

Imaging: I could order imaging studies, such as computed tomography (CT) or magnetic resonance imaging (MRI), to determine the spread of a possible tumor in relation to the nasal cavity or adjacent structures.

Histological confirmation: Although you mentioned that a biopsy had already been performed at a health center and squamous cell carcinoma had been diagnosed, it may be necessary to obtain a new histological evaluation of the tumor prior to treatment. This will help confirm the diagnosis and provide more accurate information about the characteristics of the tumor.

Treatment plan: Treatment depends on the spread of the tumor and the patient's general health. Often, squamous cell carcinoma of the nasal area can be treated surgically, in which the tumor is removed by surgical resection. After surgery, plastic surgery techniques, such as tissue grafts, can be performed if necessary to close the wound and improve the cosmetic outcome.

Follow-up: Squamous cell carcinoma may require continuous monitoring after treatment. The patient may be asked to come for regular control visits to assess wound healing, check for signs of recurrence, and evaluate nasal function.

It is important to remember that this answer is based on general treatment practices, and the treatment plan may vary from case to case. The best way to determine treatment methods and plans is individually based on the patient's case-specific factors, based on a thorough assessment and discussion by a doctor.

Bing:
Squamous cell carcinoma is the most common cancer in the nasal area. Its treatment depends on the spread of the cancer. If the cancer is small and has not spread elsewhere, it can be removed surgically. If the cancer has spread to a wider area, radiation therapy or chemotherapy is often required.

The examination may use nasal computed tomography (CT) or magnetic resonance imaging (MRI) to determine the spread of the cancer.
- Terveyskirjasto. (2022). Nenän alueen syövät. https://www.terveyskirjasto.fi/terveyskirjasto/tk.koti?p_artikkeli=dlk01005
- Terveyskirjasto. (2022). Nenän alueen syövät - tutkimukset ja hoito.
https://www.terveyskirjasto.fi/terveyskirjasto/tk.koti?p_artikkeli=dlk01005#s4

Question 3)
Assessment:
- 1 p national criteria not met – BMI above 30
- 1 p weight has not been stable for yet 1 year
- 1 p Question asked by: do you smoke
- 1 p No imaging studies
- 1 p not yet offering abdominoplasty to the patient – national criteria
- 0.5 p ASA class ≤ 3
- 0.5 p Pregnancy and childbirth are safe after abdominoplasty

ChatGPT:
In the patient's situation, I would first note cutis laxa, i.e. loosening of the skin in the abdominal area due to significant weight loss. The patient has pronounced symptoms, such as sagging abdominal covers and odor problems, which significantly affect his quality of life.

Anamnesis and clinical evaluation: I will find out more about the patient's weight loss history, as well as the onset of cutis laxa and the duration of symptoms. I carefully examine the abdominal area, assess the extent of loosening of the skin and the observed changes. I also discussed the patient's expectations and his interest in abdominoplasty.

History of pregnancy: I ask the patient about previous pregnancies and births, as well as possible complications such as rupture or diastasis (separation of the abdominal muscles).

Gynecological evaluation: If necessary, I recommend a gynecological examination of the patient, in which possible weakness of the pelvic floor muscles or other gynecological factors that may affect the post-pregnancy condition can be assessed.

Further studies: I would consider the patient's general health and the effects of diabetes medication (semaglutide) on surgery. If necessary, I would recommend laboratory tests, such as monitoring blood glucose levels and other laboratory tests that I deem relevant. Of the imaging examinations, I would consider, for example, abdominal ultrasound or computed tomography in order to more accurately assess the condition of the abdominal coverings, muscle diastasis, and any other anatomical abnormalities.

Informing the patient: I discuss with the patient the potential benefits, risks, and limitations of abdominoplasty, especially after pregnancy. I explain that pregnancy can affect the condition of the abdominal covers and possibly require further treatment or repetition of surgery.

It is important to emphasize that this answer is based on general treatment practices and a fictitious case. Final decisions and treatment plans should be made based on the individual needs of the patient and a thorough assessment by the doctor.

Bing:
Abdominoplasty is a tummy tuck repair surgery to remove excess sagging and loose skin from the lower abdomen. The final result will be seen in 3-6 months. After abdominoplasty, pregnancy is possible, but it can affect the result of surgery. After pregnancy, resurgery may be necessary. Additional questions and examinations may include, for example: Does the patient have other diseases? Is the patient pregnant or planning a pregnancy? Does the patient have allergies or other medications? Does the patient have previous surgeries?

Question 4)
Assessment:
- 1 p History (when come), clinical picture (size, skin condition/ color)
- 1 p evacuation in the operating room (puncture is not recommended), possibly left open.
- 1 p treatment of an open wound, later possibly skin graft.
- 1 p where to treat (hospital).
- 0.5 p: pretibial hematoma.
- 0.5 p: exclusion/consideration of other injuries (e.g. fractures).
- 0.5 p: INR value.
- 0.5 p: definition of dermatoporosis: thinning of the skin.

ChatGPT:
It sounds like you're advising the ER as follows, considering the hematoma caused by a trip in a 90-year-old lady and Marevan treatment:

Assess the severity of the injury: Ask the emergency worker for more information about the extent of the injury and the patient's symptoms. It is important to determine whether the hematoma is the only injury or if there are other signs, such as a fracture, open wound or severe pain.

Risk of bleeding: Marevan is a blood thinner that may increase the risk of bleeding. Ask if the patient has any previous bleeding problems or has had bleeding in the past while taking Marevan. This will help assess the risk of bleeding and the possible severity of bleeding.

Recommend a doctor's assessment: Given the size of the hematoma and the patient's age, it is important to recommend that the patient receive a doctor's evaluation. The doctor can assess the severity of the injury, check for a possible fracture or internal bleeding and, if necessary, perform other necessary examinations, such as an X-ray.

Blood thinners management: When taking Marevan, it is important to monitor blood clotting values (INR) and, if necessary, adjust medication. Recommend that the on-call physician discuss with the patient's physician the continuation of blood thinning therapy and, if necessary, the revision of the INR.

Application and follow-up: The on-call physician may be advised to recommend that the patient treat the hematoma with rest, cold compress, and elevation. It is also important to advise the patient to monitor for swelling, pain, bleeding, or other symptoms in the area of injury. If the condition worsens or worrying symptoms appear, the patient should be immediately referred for medical attention.

It is important to emphasize that this answer is based on general treatment practices and an imaginary situation. Final decisions and treatment plans should be made based on the patient's individual needs, the assessment of the on-call physician, and the evaluation by the doctor.

Bing:
Marevan is a blood thinner that prevents the formation of blood clots. For the treatment of hematoma, you can use cold therapy and elevation. If the hematoma is large or the patient is undergoing treatment with Marevan, it is important to monitor the INR and, if necessary, adjust the dose of Marevan.

You can advise the on-call operator as follows:
- The patient should be referred to a doctor's office.
- The patient should be instructed to keep the leg elevated and use cryotherapy.
- The patient should be instructed to monitor INR and, if necessary, adjust the dose of Marevan.

- Terveyskirjasto. (2022). Hematooma (verenpurkauma). https://www.terveyskirjasto.fi/terveyskirjasto/tk.koti?p_artikkeli=dlk00234
- Terveyskirjasto. (2022). Verenohennuslääkkeet. https://www.terveyskirjasto.fi/terveyskirjasto/tk.koti?p_artikkeli=dlk00325
- Terveyskirjasto. (2022). Marevan. https://www.terveyskirjasto.fi/terveyskirjasto/tk.koti?p_artikkeli=dlk00326

Microsoft Bing's responses were shorter, and it added references at the end of each response. Most of the references were appropriate, and the links led to reputable sites, in most cases Finnish databases (Table 2). For the abdominoplasty question, however, the links led to a private hospital website and a popular Finnish chatroom (data not shown). Neither LLM used national recommendations as sources. It also appeared that Microsoft Bing did not understand all the questions correctly, e.g. venous flow through flap.

We translated the questions and responses using ChatGPT and verified that the translation corresponded to the original text. However, some words in the translated text, most noticeably "tab" for "flap", are strange to say the least and are not the terminology we plastic surgeons use in our inter-specialty communication. Of note, the translation was done only for the purpose of this article.

Overall, these AI responses were not sufficient to pass our exam. The responses were incomplete, or the context was understood differently or incorrectly.
This was somewhat surprising, given the relatively good performance of LLMs in medical board exams in previous studies [2, 3]. The questions were copied and pasted into the LLMs, which might be one source of the inferior performance observed in this study. LLMs respond to prompts, i.e. inputs, queries, or work orders. Prompts direct the behavior of LLMs, and depending on the prompt design, the response may be more or less accurate. Prompts like "Venous flow through flap" therefore seem to be difficult for an LLM to understand, although for a human resident the work order is clear: write about the entity known as "venous flow through flap", not about "venous flow through" AND "flap". This comes down to human intelligence versus artificial intelligence.

The machine's lack of consciousness was evident in questions 2, 3, and 4 (Table 1), which required "larping" as a plastic surgeon: every ChatGPT response ended with a disclaimer (Table 2). It is vital to keep these disclaimers in AI responses in the future as well, as more and more people are seeking medical advice from AI [4: Van Bulck L., Moons P. What if your patient switches from Dr. Google to Dr. ChatGPT? A vignette-based survey of the trustworthiness, value, and danger of ChatGPT-generated responses to health questions. Eur J Cardiovasc Nurs. 2023; zvad038]. The terminology in everyday use in plastic surgery in Finland differs from that of the data from which the AI gathers its information. Microsoft Bing provided references or links showing where the information was retrieved.
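The point about prompt design can be illustrated with a minimal sketch. In this study the questions were pasted as-is; the template below is a hypothetical alternative that states explicitly that the quoted phrase is one named entity. The function name `build_prompt` and the wording of the template are assumptions made for illustration, and nothing here was tested against ChatGPT or Microsoft Bing, so no effect on the scores is claimed.

```python
def build_prompt(item: str, specialty: str = "plastic surgery") -> str:
    """Wrap a terse exam item in an explicit instruction.

    Hypothetical template: it tells the model that the quoted phrase is a
    single named entity rather than a set of separate keywords.
    """
    return (
        f"You are answering a written board examination in {specialty}. "
        f'Treat the quoted phrase as a single named entity: "{item}". '
        "Briefly explain what it is and what it is used for."
    )


# The bare item as it appears in the exam, and the more explicit version.
bare_item = "Venous flow through flap"
print(bare_item)
print(build_prompt(bare_item))
```

Quoting the term and stating the task separately mirrors what a human resident infers without being told: the item names one entity, and the answer should describe that entity.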
AI was unable to consider national practices and guidelines. In most cases, the references or resources were accurate, but we wonder why a popular chatroom was used as a reference. It is a recognized downside of LLMs that they rely on unvalidated references rather than on databases such as PubMed. Desaire et al. reported significant differences between human and AI-produced scientific text [5: Desaire H., Chua A.E., Isom M., Jarosova R., Hua D. ChatGPT or academic scientist? Distinguishing authorship with over 99% accuracy using off-the-shelf machine learning tools. arXiv preprint arXiv:2303.16352. 2023]; AI-produced scientific text was less complex than that of human scientists. AI tended to produce incomplete responses to questions, which is especially worrying if it is used for studying.

The situational knowledge at the core of medical expertise is not transmitted by AI. AI can be a supplementary source of information. Residents may not necessarily be able to distinguish what is correct from what is incorrect in the AI's responses, or know how to use the knowledge in real-life situations. AI may generate questions and model answers. With a model answer, examiners may change the focus of a question or make it more easily understandable.
Keywords
AI, Medical education, Plastic surgery, Surgery, Large language model, Board examination