Machine learning — puzzling Big Data nonsense

from Lars Syll

If we wanted highly probable claims, scientists would stick to low-level observables and not seek generalizations, much less theories with high explanatory content. In this day of fascination with Big Data’s ability to predict what book I’ll buy next, a healthy Popperian reminder is due: humans also want to understand and to explain. We want bold ‘improbable’ theories. I’m a little puzzled when I hear leading machine learners praise Popper, a realist, while proclaiming themselves fervid instrumentalists. That is, they hold the view that theories, rather than aiming at truth, are just instruments for organizing and predicting observable facts. It follows from the success of machine learning, Vladimir Cherkassky avers, that “realism is not possible.” This is very quick philosophy!

Quick indeed!

The central problem with the present ‘machine learning’ and ‘big data’ hype is that so many — falsely — think that they can get away with analysing real-world phenomena without any (commitment to) theory. But — data never speaks for itself. Without a prior statistical set-up, there actually are no data at all to process. And — using a machine learning algorithm will only produce what you are looking for.

Machine learning algorithms always express a view of what constitutes a pattern or regularity. They are never theory-neutral.

Clever data-mining tricks are not enough to answer important scientific questions. Theory matters.
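
A minimal sketch of the point that no learning algorithm is theory-neutral, assuming hypothetical toy data generated from y = x squared. The function names fit_line and fit_nearest_neighbour are illustrative and do not come from any library. Both learners see exactly the same five observations, yet each builds in its own view of what a pattern is, so they disagree as soon as we ask about a point outside the data.

```python
# Hypothetical toy data, generated from y = x**2 purely for illustration.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x ** 2 for x in xs]


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x.

    Built-in assumption: the pattern in the data is a straight line.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def fit_nearest_neighbour(xs, ys):
    """1-nearest-neighbour prediction.

    Built-in assumption: a new input behaves like the closest observed input.
    """
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


line = fit_line(xs, ys)
nearest = fit_nearest_neighbour(xs, ys)

x_new = 10.0  # a point well outside the observed range
print("linear model:     ", line(x_new))
print("nearest neighbour:", nearest(x_new))
print("generating rule:  ", x_new ** 2)
```

Neither answer is dictated by the data alone. Each follows from the assumption the algorithm was built on, and choosing between them is a theoretical decision, not a computational one.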

  1. February 13, 2019 at 7:49 am

    Lars is right. Theory matters and we need a theory or theories. In the machine-learning field, which was once called Artificial Intelligence, this question was called the frame problem. Deep learning is not exempt from this general rule.

    • Yoshinori Shiozawa
      February 14, 2019 at 3:14 am

      My comment was posted earlier than Frank Salter’s first comment, but it appeared later. This must be due to a strange checking system that requires a human check when a commenter has not posted for about a month. But that is not the point I want to argue in this “reply”.

      The frame problem can be argued from different angles. In our case, I believe, the most important point is that no Artificial Intelligence (AI) works without a frame. This is equivalent to saying that any machine learning system that works must have a frame. What Lars raised is a very old question that many have discussed and most people have forgotten. The choice of a better frame (or theory) is the question that comes next. It seems to me that the arguments between Frank Salter and Dave Taylor1 are missing the most important problem.

  2. Frank Salter
    February 13, 2019 at 8:33 am

    I am in total agreement with Lars’ comments. But again NO mention is made of the truly significant difference between valid theory and the curve fitting of data. Curve fitting can be done using arbitrary equations. It provides a concrete representation of a single data set. An abstract, that is, theoretical, quantitative representation must satisfy the quantity calculus and, if it is NOT to be invalidated empirically, will contain the real factors. A valid representation will describe every possible value which can occur in reality.

    The difference between these two is clearly seen in Figure 4 of my paper “Transient Development” on page 156, RWER-81, 2017. Every one of the Solow relationships can be seen to be incapable of describing anything other than the local data used to produce the fitted curves. Wild excursions can be seen in all but the linear case. The only curve which fits is the transient curve of equation (31), page 158. Please note that the dot has been lost from the q-dot on the left-hand side of the equation. The abstract relationship shown in equation (30), page 157, conforms to the quantity calculus.

    • February 13, 2019 at 11:46 am

      Frank, I too agree with Lars and I am sympathetic to your position. My point is that one can abstract even quantity and still have a valid (topological boundary) theory. This is rather like saying you cannot define a set without defining its boundaries, so defining the boundaries (and hence where to look) must come first.

      • Frank Salter
        February 13, 2019 at 12:42 pm

        The only point which I really understand is your “one can abstract even quantity”. This is precisely what I said. The abstract application of quantities MUST follow the quantity calculus. Anything else can NOT be a valid abstraction.

        So in terms of sets: the set of quantitative abstract relationships consists of those which conform to the quantity calculus and whose expressions are NOT invalidated by empirical evidence. Currently economists seem to believe that any expression, whatever the empirical evidence, may be theoretically valid. That would appear to be the set of all mathematical relationships.

      • February 13, 2019 at 1:57 pm

        But how can you “apply” quantities if you have abstracted them? In topology one has order numbers but not quantities.
        Compare a London Underground map with a street map showing the Underground lines, in which distances (hence quantitative measures) have been conserved.

      • Frank Salter
        February 13, 2019 at 2:26 pm

        The abstract relationship 𝐸 = 𝘮𝘤 squared is the abstract description of the quantity of energy produced when matter is converted to energy. It describes every possible eventuality. That is how the specific meaning of abstract is understood. Another example is 𝘷 = 𝘴 / 𝘵 which relates the velocity 𝘷 to the distance travelled, 𝘴, when an object is travelling at constant velocity, 𝘷, over the period of time, 𝘵.

        I would commend the reading of de Boer’s “On the History of Quantity Calculus and the International System”, 1995 Metrologia 31 pp. 405−429

  3. Frank Salter
    February 13, 2019 at 2:26 pm

    The abstract relationship 𝐸 = 𝘮𝘤 squared is the abstract description of the quantity of energy produced when matter is converted to energy. It describes every possible eventuality. That is how the specific meaning of abstract is understood. Another example is 𝘷 = 𝘴 / 𝘵 which relates the velocity 𝘷 to the distance travelled, 𝘴, when an object is travelling at constant velocity, 𝘷, over the period of time, 𝘵.

    I would commend the reading of de Boer’s “On the History of Quantity Calculus and the International System”, 1995 Metrologia 31 pp. 405−429

    • Frank Salter
      February 13, 2019 at 3:21 pm

      Sorry about the double posting, but the normally pre-filled email and name entries were missing, and the first entry did not seem to have gone through.

  4. February 13, 2019 at 4:18 pm

    Popper says of Russell’s Paradox (“Conjectures and Refutations”, 1981 pbk, RKP, p.260):

    “Russell [in 1904] did not mean to make a proposal – that we should consider these combinations [of symbols] as contrary to some (partly conventional) rules for forming sentences, in order to avoid the paradoxes. Rather, he thought he had discovered the fact that these apparently meaningful formulae expressed nothing, and that they were, in nature or in essence, meaningless pseudo-propositions. A formula like “a is an a” or “a is not an element of a” looked like a proposition (because it contained two subjects and a two-termed predicate), but it was not a genuine proposition (or sentence) because a formula of the form “x is an element of y” could be a proposition only if x was one type level lower than y: a condition which obviously could not be satisfied if the same symbol, a, was to be substituted for both x and y. This showed that a disregard of the type level of words (or of the entities designated by them) could make sentence-like expressions meaningless. … the same kind of confusion which nowadays is often called a category mistake.”

    Chesterton had almost certainly anticipated the difference in “entities designated by the same word” [hope] in “G F Watts”, published the same year. Here, Mayo’s point is that theorisation has two levels (often called the pure and the applied): “a healthy Popperian reminder is due: humans also want to understand and to explain” as well as “organizing and predicting observable facts” (for which the pure theory is that of organisation and prediction).

    Likewise, Frank’s word “abstraction” can be a generalisation for a set of meanings dependent on context (see any dictionary) as well as the particular understanding he has of it. As a scientist I distinguish an abstraction from a generalisation in much the same way as he does, but more generally:- forming it by “dividing out” not just the dimension t but all Ranganathan’s PMEST dimensions to leave myself with E becoming mc squared at the time of the Big Bang.

    • Frank Salter
      February 13, 2019 at 6:40 pm

      Dave, I believe I used the word abstract NOT abstraction. I would refer you to read section 2.3.2, p 147 of the de Boer paper which deals with concrete and abstract quantities. This paper sets out the exact meaning of how quantities may be represented mathematically and the rules of the algebra which must be applied.

      Your digression into the semantics of the different forms of the word abstract serves only to confuse and is off-topic.
