DL-$elect: a decision-list-based data-mining system

Karl Weinmeister

AAAI '98/IAAI '98 Proceedings of the fifteenth national/tenth conference on Artificial intelligence/Innovative applications of artificial intelligence (1998)

Abstract

The application of machine-learning algorithms to the financial markets has been increasing in popularity in recent years. The majority of systems that have been created for the purpose of selecting stocks have utilized neural-network techniques. Our research has dealt with the feasibility of inductive logic approaches and the creation of a decision-list-based data-mining system, DL-$elect. Neural networks can model a variety of data distributions and handle inconsistent data well. But for complex problems such as financial analysis, the structure of a neural network can be difficult to interpret. Decision lists (Rivest 1987), however, are represented in an easily understood form: an extended "if-then-elseif-...else-" rule. Iterative algorithms for decision lists append rules to a list and remove the examples covered by those rules from the data set. Effective later learning depends on early rule selection, which, if made poorly, can reduce the accuracy of the entire decision list. The algorithm described by Rivest (1987) avoids this obstacle by assuming that rules are 100% accurate on the training data, but consequently leaves open the problem of noisy data. The learning algorithm used in DL-$elect, BruteDL (Segal and Etzioni 1994), addresses this and other issues by conducting a single search for homogeneous rules, that is, rules whose accuracy is independent of list position. Since homogeneous rules need not be 100% accurate, BruteDL is better suited to handle the noise of financial-market data. There has been significant discourse in the financial and academic communities regarding the efficiency of financial markets. The efficient-markets hypothesis asserts that stock prices already reflect all available information, rendering forecasting attempts useless. DL-$elect is based on the notion that markets do in fact exhibit short-term inefficiencies: trends from the previous week of activity carry over to the next week.
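The "if-then-elseif-...else" form and the iterative append-and-remove procedure described above can be illustrated with a minimal Python sketch. This is a generic greedy covering loop, not BruteDL and not Rivest's exact algorithm; the attribute names (`pe_ratio`, `weekly_change`), the thresholds, the `"other"` label, and the helper names `classify` and `greedy_cover` are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Tuple

Example = Dict[str, float]
# A rule is (readable description, condition, predicted label).
Rule = Tuple[str, Callable[[Example], bool], str]


def classify(decision_list: List[Rule], default: str, example: Example) -> str:
    """Evaluate the list as if / elseif / ... / else: the first match wins."""
    for _description, condition, label in decision_list:
        if condition(example):
            return label
    return default


def greedy_cover(
    examples: List[Tuple[Example, str]],
    candidates: List[Rule],
    min_accuracy: float = 1.0,
) -> List[Rule]:
    """Append acceptable rules one at a time, removing the examples they cover.

    min_accuracy=1.0 mirrors the assumption of fully accurate rules;
    a lower value tolerates some noise in the covered examples.
    """
    remaining = list(examples)
    decision_list: List[Rule] = []
    progress = True
    while remaining and progress:
        progress = False
        for rule in candidates:
            _description, condition, label = rule
            covered = [(x, y) for x, y in remaining if condition(x)]
            if not covered:
                continue
            accuracy = sum(y == label for _x, y in covered) / len(covered)
            if accuracy >= min_accuracy:
                decision_list.append(rule)
                # Later rules are learned only from what is still uncovered.
                remaining = [(x, y) for x, y in remaining if not condition(x)]
                progress = True
                break
    return decision_list


# Hypothetical candidate rules and toy training data (not from the paper).
candidates: List[Rule] = [
    ("pe_ratio < 10", lambda s: s["pe_ratio"] < 10, "excellent"),
    ("weekly_change < 0", lambda s: s["weekly_change"] < 0, "other"),
]
examples = [
    ({"pe_ratio": 8.0, "weekly_change": 0.02}, "excellent"),
    ({"pe_ratio": 9.0, "weekly_change": -0.01}, "excellent"),
    ({"pe_ratio": 25.0, "weekly_change": -0.03}, "other"),
    ({"pe_ratio": 30.0, "weekly_change": 0.01}, "other"),
]

rules = greedy_cover(examples, candidates)
prediction = classify(rules, "other", {"pe_ratio": 7.0, "weekly_change": 0.05})
```

Because each accepted rule removes its covered examples, the order in which rules are accepted shapes everything learned afterwards, which is the sensitivity to early rule selection noted above.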
The portfolio gleaned from DL-$elect is not intended for a buy-and-hold strategy; rather, it is meant to be changed weekly. By using fresh data each week, DL-$elect avoids the issue of non-stationarity, in which statistical properties of the market change with time. Two key data elements are needed by DL-$elect: a list of stock attributes such as price/earnings ratio, and a list of price changes acquired one week later, corresponding to the first list. DL-$elect assembles 11 attributes for 600 stocks into the attribute list, inserts the price-change data, and cleans any malformed data. Next, stocks from the resulting data file are labeled as excellent if they perform in the top α% (in our simulation α=20), since BruteDL is a classification algorithm that requires a category to predict (John and Miller 1996). The data file is then randomly partitioned into training (60%), testing (30%), and pruning (10%) sets and passed to BruteDL. The generated rules identify which variables make a stock "excellent".
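The labeling and partitioning steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `label_top_fraction` and `partition`, the `"other"` label for stocks outside the top α%, the fixed random seed, and the synthetic price changes are all hypothetical and are not taken from DL-$elect itself.

```python
import random
from typing import Dict, List, Sequence, Tuple


def label_top_fraction(
    price_changes: Dict[str, float], alpha: float = 0.20
) -> Dict[str, str]:
    """Label the top alpha fraction of stocks by price change as 'excellent'."""
    ranked = sorted(price_changes, key=lambda t: price_changes[t], reverse=True)
    n_top = round(len(ranked) * alpha)
    top = set(ranked[:n_top])
    # 'other' is an assumed name for every stock outside the top group.
    return {t: ("excellent" if t in top else "other") for t in ranked}


def partition(
    items: Sequence[str],
    fractions: Tuple[float, float, float] = (0.6, 0.3, 0.1),
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Randomly split items into training, testing, and pruning sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * fractions[0])
    n_test = round(len(shuffled) * fractions[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    prune = shuffled[n_train + n_test:]  # whatever remains (about 10%)
    return train, test, prune


# Synthetic one-week price changes for 600 placeholder tickers.
price_changes = {f"S{i:03d}": i / 1000 for i in range(600)}

labels = label_top_fraction(price_changes, alpha=0.20)
train, test, prune = partition(sorted(labels))
```

With 600 stocks and α=20, this labels 120 stocks as excellent and yields sets of 360, 180, and 60 stocks. Repeating both steps on fresh data each week corresponds to the weekly retraining that the abstract describes.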

Keywords

decision-list-based data-mining system, efficient market hypothesis, financial market, iterative algorithm, data mining, neural network, financial analysis, machine learning
