Skip to main content

Command Palette

Search for a command to run...

Use GPT Test LeetCode SQL Problems

Published
6 min read
R

Engineer, DBA, data scientist. Author of an AI LLM driven SQL/BigData tool-TableChat, join our discord for products: https://discord.gg/6nhCmcNMhd

Many people have doubts about the capibility of LLMs when generating SQL with natural language. They worry about the efficiency, accuracy, usability of using text instead of writing SQL directly. Some people may think you need precise natural language that will be longer than SQL itself, and when it comes to complex queries, it is even hard to describe the problem. I also have the same doubt, so I use my tool TableChat(based on ChatGPT3.5, it is for developers and can easily transfer the schema to GPT) to test the free LeetCode SQL problems to check the performance of AI in SQL generation. Here is the result.

ProblemsOriginal TaskPromptsDiffcultyTried timesGenerated SQL
175Write a solution to report the first name, last name, city, and state of each person in the Person table. If the address of a personId is not present in the Address table, report null instead.Write a solution to report the first name, last name, city, and state of each person in the Person table. If the address of a personId is not present in the Address table, report null instead.Easy1
181Write a solution to find the employees who earn more than their managers.Write a solution to find the employees who earn more than their managers.Easy1Too easy to show
182Write a solution to report all the duplicate emails. Note that it's guaranteed that the email field is not NULL.Write a solution to report all the duplicate emails. Note that it's guaranteed that the email field is not NULL.Easy1Too easy to show
183Write a solution to find all customers who never order anything.Write a solution to find all customers who never order anything return name as CustomersEasy1Too easy to show
197Write a solution to find all dates' Id with higher temperatures compared to yesterdayWrite a solution to find all dates' Id with higher temperatures compared to yesterdayEasy2(description is not clear at first)Too easy to show
176Write a solution to find the second highest salary from the Employee table. If there is no second highest salary, return null (return None in Pandas).Write a solution to find the second highest salary from the Employee table. If there is no second highest salary, return null. return SecondHighestSalary.Medium1Too easy to show
178The ranking should be calculated according to the following rules:The scores should be ranked from the highest to the lowest.If there is a tie between two scores, both should have the same ranking.After a tie, the next ranking number should be the next consecutive integer value. In other words, there should be no holes between ranks.rank the item with score, After a tie, the next ranking number should be the next consecutive integer value. return the score and the rank.Medium2Too easy to show
180Find all numbers that appear at least three times consecutively.Find all distinct numbers that appear at least three times consecutivelyMedium1SELECT Distinct(num) as ConsecutiveNums FROM Logs WHERE (SELECT COUNT(*) FROM Logs AS l2 WHERE l2.num = Logs.num AND l2.id >= Logs.id AND l2.id <= Logs.id + 2) = 3;
184Write a solution to find employees who have the highest salary in each of the departments.Write a solution to find employees who have the highest salary in each of the departments.Medium1SELECT d.name AS department, e.name AS employee, e.salary FROM Employee e JOIN Department d ON e.departmentId = d.id WHERE e.salary = ( SELECT MAX(salary) FROM Employee WHERE departmentId = e.departmentId ) ORDER BY d.name;
550Write a solution to report the fraction of players that logged in again on the day after the day they first logged in, rounded to 2 decimal places. In other words, you need to count the number of players that logged in for at least two consecutive days starting from their first login date, then divide that number by the total number of players.Step1. PLAY_FIRST_LOGIN which select player_id, event_date from the dataset, the event_date is the first time the player appear. Step2. Choose from PLAY_FIRST_LOGIN and Activity to get the players who login the next day of the event_date in PLAY_FIRST_LOGIN, return the number of the playersStep3. calculate the fraction, divide that the number above by the total number of deduplicate players,rounded to 2 decimal placesMedium3+SELECT ROUND(COUNT(DISTINCT a.player_id) / (SELECT COUNT(DISTINCT player_id) FROM Activity), 2) AS fraction FROM ( SELECT player_id, MIN(event_date) AS first_login FROM Activity GROUP BY player_id ) a JOIN Activity b ON a.player_id = b.player_id WHERE b.event_date = DATE_ADD(a.first_login, INTERVAL 1 DAY);
185A company's executives are interested in seeing who earns the most money in each of the company's departments. A high earner in a department is an employee who has a salary in the top three unique salaries for that department.Write a solution to find the employees who are high earners in each of the departments.who earns the most money in each of the company's departments. A high earner in a department is an employee who has a salary in the top three unique salaries for that department. return Department, Employee and SalaryHard1SELECT d.name AS Department, e.name AS Employee, e.salary AS Salary FROM Department d JOIN Employee e ON d.id = e.departmentId WHERE (SELECT COUNT(DISTINCT salary) FROM Employee WHERE departmentId = d.id AND salary >= e.salary) <= 3 ORDER BY d.name, e.salary DESC;
601Write a solution to display the records with three or more rows with consecutive id's, and the number of people is greater than or equal to 100 for each.Return the result table ordered by visit_date in ascending order.Step1. return all records as Temp1(CTE1) with consecutive key id(order by id) and the people is greater than or equal to 100, calculated difference of id and rownumber as column id_diff. Step2: return all records and count group by id_diff rom Temp1 as CTE2.Step3: rentur id, visit_date, people FROM cte2 where cound is not less than 3 and order by visit_dateHard3+WITH Temp1 AS ( SELECT id, visit_date, people, id - ROW_NUMBER() OVER (ORDER BY id) AS id_diff FROM Stadium WHERE people >= 100 ), CTE2 AS ( SELECT id_diff, COUNT(*) AS count FROM Temp1 GROUP BY id_diff ) SELECT id, visit_date, people FROM CTE2 JOIN Temp1 ON CTE2.id_diff = Temp1.id_diff WHERE count >= 3 ORDER BY visit_date;

As we can see above:

  • GPT can easily handle all easy and most medium problems(175, 181, 182, 183, 197, 176, 178) without modify too much the description of the task.

  • For some medium or hardproblems, we should modify the description to make it more clear, such as problem 178, 185.

  • It can generate some complex statement such as SubQuery(184, 185), Function(180, 550) and even CTE(601).

  • For some hard problems(550, 601), it is hard to understand the simple natural language description, you should teach the GPT step by step, the clearer you tell it, the more stable it will perform.

Conclution: Using AI to generate SQL can be powerful especially for simple problems. As with any new technology, AI is not a panacea, and you shouldn't expect to be able to just copy & paste SQL queries without understanding how your data works. However, AI makes it significantly more effective and convenient to learn and wield the powers of complex SQL.